Remove Empty Chunks in Cloudfetch concatenation by jprakash-db · Pull Request #814 · databricks/databricks-sql-python

jprakash-db · 2026-05-27T18:13:59Z

What type of PR is this?

Description

ThriftResultSet.fetchmany_arrow and fetchall_arrow previously appended every chunk
returned by the underlying queue — including the 0-row placeholder that
CloudFetchQueue._create_empty_table() emits when self.table is None — into
partial_result_chunks, and then handed the list to concat_table_chunks.

When the placeholder's schema differs from the real downloaded chunks (which it can,
because it is built from schema_bytes that may be stale, or schemaless when
schema_bytes is None), pyarrow.concat_tables(..., promote_options="default")
silently introduces phantom columns filled with NULLs, or — for type mismatches —
raises.

Fix: hold any 0-row chunk aside in a local zero_row_table instead of appending it.
The concat list now only ever contains real chunks, which share a consistent schema.
If every chunk turned out to be 0-row (genuinely empty result set), fall back to
appending the held-aside placeholder so the method still returns a valid pyarrow.Table.

CloudFetchQueue itself is unchanged. The columnar fetch paths are unchanged
(they only run with ColumnQueue, whose empty slices always carry the right schema).
The SEA arrow methods are unchanged (single queue call per invocation, no concat).

How is this tested?

Added 4 regression tests in tests/unit/test_fetches.py:

test_fetchall_arrow_drops_mismatched_empty_placeholder — first queue returns a
0-row placeholder with column stale_col; second returns real data with col0.
Asserts the result has only col0. Verified to fail against the pre-fix code with
['stale_col', 'col0'] != ['col0'].
test_fetchmany_arrow_drops_mismatched_empty_placeholder — same for fetchmany_arrow.
test_fetchall_arrow_all_empty_returns_zero_row_table — every chunk is 0-row;
asserts a pyarrow.Table with num_rows == 0 is returned (the held-aside fallback fires).
test_fetchmany_arrow_all_empty_returns_zero_row_table — same for fetchmany_arrow.

Full unit suite (pytest tests/unit -x): 765 passed, 4 skipped.

Integration / e2e to be run separately.

gopalldb · 2026-05-30T04:13:13Z

P2 — Minor

Inconsistent type annotation
result_set.py:372

fetchall_arrow uses partial_result_chunks: List = [] while fetchmany_arrow:313 uses List["pyarrow.Table"]. Cosmetic; align both.

Pre-existing infinite-loop semantics preserved
result_set.py:325-336, 382-388

If a pathological server returned 0-row chunks while leaving has_more_rows=True forever, both methods spin. The old code (append-without-decrement) and the new code (continue) have identical semantics here.
Not introduced by this PR, but the new shape makes it easier to add a max-consecutive-empties safety bound as a follow-up.

No mid-stream-placeholder test
tests/unit/test_fetches.py

All four new tests place the placeholder first. The continue path is exercised, but only with the initial queue being the placeholder — there's no [real, placeholder, real] test that locks in the mid-stream
continue behaviour in isolation from the initial-call branch.

Stub queue minor mismatch with production next_n_rows
tests/unit/test_fetches.py:99-108

_StubArrowQueue returns the full table on first call and slice(0, 0) thereafter. Production next_n_rows(N) returns up to N rows. The tests happen to call fetchmany_arrow(3) against a 3-row table so the
difference doesn't matter. Pure fixture observation.

gopalldb

Added some minor comments

jprakash-db · 2026-05-30T06:14:03Z

Pre-existing infinite-loop semantics preserved
Why would the server return has more rows and then return 0 rows ? This is a server side issue

Comments regarding the tests are fine, but our main point is to check whether concatenation is averted or not with the empty chunks and this works, so its fine

jprakash-db added 2 commits May 27, 2026 23:21

Removed the unncecessary chunk

d04ff17

Added tests

4bfc1f5

jprakash-db deployed to azure-prod May 27, 2026 18:14 — with GitHub Actions Active

jprakash-db requested review from gopalldb and msrathore-db May 27, 2026 18:21

gopalldb approved these changes May 30, 2026

View reviewed changes

jprakash-db merged commit 13987de into main May 30, 2026
45 of 46 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove Empty Chunks in Cloudfetch concatenation#814

Remove Empty Chunks in Cloudfetch concatenation#814
jprakash-db merged 2 commits into
mainfrom
jprakash-db/rm-mt-chunk

jprakash-db commented May 27, 2026 •

edited

Loading

Uh oh!

gopalldb commented May 30, 2026

Uh oh!

gopalldb left a comment

Uh oh!

jprakash-db commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jprakash-db commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What type of PR is this?

Description

How is this tested?

Uh oh!

gopalldb commented May 30, 2026

Uh oh!

gopalldb left a comment

Choose a reason for hiding this comment

Uh oh!

jprakash-db commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jprakash-db commented May 27, 2026 •

edited

Loading