Skip to content

GH-3501: Speed up DictionaryValuesWriter.shouldFallBack()#3502

Open
arouel wants to merge 1 commit intoapache:masterfrom
arouel:should-fallBack-check-dictionary-size-exceeded
Open

GH-3501: Speed up DictionaryValuesWriter.shouldFallBack()#3502
arouel wants to merge 1 commit intoapache:masterfrom
arouel:should-fallBack-check-dictionary-size-exceeded

Conversation

@arouel
Copy link
Copy Markdown

@arouel arouel commented Apr 19, 2026

Rationale for this change

DictionaryValuesWriter.shouldFallBack() is called by FallbackValuesWriter.checkFallback() after every single value write. The current implementation dispatches a virtual call to getDictionarySize() on every invocation, even for duplicate values that do not grow the dictionary. Both dictionaryByteSize and the dictionary entry count can only increase when a new entry is added (inside the if (id == -1) branch), so the size-exceeded condition can only transition from false to true at that exact point. The per-write virtual dispatch is redundant work for the common case.

What changes are included in this PR?

Single file change to DictionaryValuesWriter.java:

  • Add a private boolean dictionarySizeExceeded field, reset in resetDictionary().
  • Add a protected void checkDictionarySizeLimit(int newDictionarySize) method that sets the flag when dictionaryByteSize > maxDictionaryByteSize || newDictionarySize > MAX_DICTIONARY_ENTRIES.
  • Each subclass write method (Binary, FixedLenArray, Long, Double, Integer, Float) calls checkDictionarySizeLimit(id + 1) after inserting a new dictionary entry.
  • shouldFallBack() becomes return dictionarySizeExceeded, a simple field read with no virtual dispatch.

Are these changes tested?

Yes. All existing tests in TestDictionary pass without modification, confirming semantic equivalence. The new test testCheckDictionarySizeLimitExceedsByEntryCount directly exercises the previously untested MAX_DICTIONARY_ENTRIES boundary in checkDictionarySizeLimit, verifying correct trigger and reset behavior.

Are there any user-facing changes?

No. This is an internal optimization with no API, behavior, or configuration changes.

Closes #3501

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid per-write virtual dispatch in DictionaryValuesWriter.shouldFallBack() by caching the size-exceeded check

1 participant