Skip to content

Fix: compact view buffers in ScalarValue::compact for all container t…#21934

Open
bert-beyondloops wants to merge 2 commits intoapache:mainfrom
bert-beyondloops:scalar_value_compact_view_types
Open

Fix: compact view buffers in ScalarValue::compact for all container t…#21934
bert-beyondloops wants to merge 2 commits intoapache:mainfrom
bert-beyondloops:scalar_value_compact_view_types

Conversation

@bert-beyondloops
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

ScalarValue::compact copies array data for container types to release sliced-buffer overhead, but never called .gc() on nested StringViewArray / BinaryViewArray. A scalar extracted from a large batch would therefore still hold a reference to the entire original view backing buffer, negating the benefit of compaction for any list, struct, or map whose values are view-typed.

What changes are included in this PR?

  • ScalarValue::compact now compacts nested view buffers for all container types (List, LargeList, FixedSizeList, ListView, LargeListView, Struct, Map), trimming StringViewArray / BinaryViewArray backing buffers to only the
    referenced bytes.
  • The internal compact_view_buffers helper recursively handles FixedSizeList, ListView, LargeListView, and Map.

Are these changes tested?

new unit tests are added in scalar::tests, one per container type. Each test verifies that after compact() the backing buffer is reduced to exactly the bytes of the referenced string, and that the scalar value is preserved.

Are there any user-facing changes?

No. The public compact / compacted API is unchanged; this PR only fixes the behaviour for view-typed nested arrays.

@github-actions github-actions Bot added the common Related to common crate label Apr 29, 2026
@bert-beyondloops
Copy link
Copy Markdown
Contributor Author

I tried to come up with a minimal solution.
If the implementation should somehow reside in another place, or made in another way, please feel free to comment.

@alamb alamb added the performance Make DataFusion faster label Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate performance Make DataFusion faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ScalarValue::compacted() does not free unused view-buffer memory for Utf8View / BinaryView arrays

3 participants