refactor: Use BufferedBatchState enum for SMJ spilling#17429
Merged
comphead merged 5 commits intoapache:mainfrom Sep 5, 2025
Merged
refactor: Use BufferedBatchState enum for SMJ spilling#17429comphead merged 5 commits intoapache:mainfrom
BufferedBatchState enum for SMJ spilling#17429comphead merged 5 commits intoapache:mainfrom
Conversation
Contributor
Author
2010YOUY01
approved these changes
Sep 5, 2025
Contributor
2010YOUY01
left a comment
There was a problem hiding this comment.
Thank you, this is a nice improvement to the existing code.
| /// None if the batch spilled to disk th | ||
| pub batch: Option<RecordBatch>, | ||
| /// Represents in memory or spilled record batch | ||
| pub batch: BufferedBatchState, |
Contributor
There was a problem hiding this comment.
I think this BufferedBatchState enum should include all intermediate data into the states, for example join_arrays. When the main batch is spilled, the join key array should either be also spilled or throw away, otherwise it will be leaking memory.
Perhaps we can leave a TODO in the comment now? This will be a big change.
comphead
approved these changes
Sep 5, 2025
Contributor
comphead
left a comment
There was a problem hiding this comment.
this is pretty neat thanks @jonathanc-n
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
This was brought up by @alamb in the SMJ spilling pull request.
Rationale for this change
Using enum to represent batch spill state is simpler than two enums
Wanted to get this out of the way before switching HJ spilling to SMJ during runtime.
What changes are included in this PR?
Use enum instead of double option for expressing the state of the buffered batch