
perf: Optimize lower, upper for ASCII inputs#21980

Open
neilconway wants to merge 2 commits into apache:main from neilconway:neilc/perf-case-conv

@neilconway (Contributor) commented May 1, 2026

Which issue does this PR close?

Rationale for this change

This PR implements two optimizations for lower and upper on ASCII strings:

  1. For the Utf8/LargeUtf8 code path, we previously did the case conversion via str::to_uppercase or str::to_lowercase. For ASCII inputs, it is somewhat faster to map u8::to_ascii_lowercase (or u8::to_ascii_uppercase) over the string's bytes directly and collect the result: although the standard library functions are well optimized, they must re-check on every string whether it is ASCII. Since we already know the input is all ASCII, we can skip that check.
  2. The Utf8View code path previously had no ASCII fast path; this PR adds one. As with the Utf8 path, we do the case conversion on bytes directly, which vectorizes well and avoids repeated ASCII checks. In addition, we build the output StringViewArray directly, avoiding the intermediate strings and extra allocations of the previous approach.
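A minimal sketch of the byte-level idea in item 1, written in plain Rust rather than against the arrow-rs API. The array is modeled as an Arrow-style offsets slice plus one concatenated values buffer; the names convert_ascii_strings and AsciiCase are hypothetical and are not the code in this PR. The sketch assumes the all-ASCII property is established once for the whole input (here with a single is_ascii() over the value bytes), after which each string is converted with a plain byte map and no further per-string checks.

```rust
/// Which ASCII case mapping to apply (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
enum AsciiCase {
    Lower,
    Upper,
}

/// Hypothetical helper, not DataFusion's code: case-convert every string of an
/// Arrow-style Utf8 array. `offsets[i]..offsets[i + 1]` delimits string `i`
/// inside `values`.
fn convert_ascii_strings(offsets: &[i32], values: &[u8], case: AsciiCase) -> Vec<String> {
    // One ASCII check for the whole values buffer instead of one per string.
    assert!(values.is_ascii(), "fast path requires all-ASCII input");
    let map_byte: fn(&u8) -> u8 = match case {
        AsciiCase::Lower => u8::to_ascii_lowercase,
        AsciiCase::Upper => u8::to_ascii_uppercase,
    };
    offsets
        .windows(2)
        .map(|w| {
            let bytes = &values[w[0] as usize..w[1] as usize];
            // Plain per-byte map: no UTF-8 decoding and no per-string ASCII check.
            let converted: Vec<u8> = bytes.iter().map(map_byte).collect();
            // SAFETY: every input byte is ASCII and ASCII case mapping yields
            // ASCII, so `converted` is valid UTF-8.
            unsafe { String::from_utf8_unchecked(converted) }
        })
        .collect()
}
```

The inner loop is a simple per-byte map with no branches on character width, which is the kind of loop compilers can typically auto-vectorize.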

Benchmarks (ARM64):

upper

  • upper_all_values_are_ascii: 5.4 → 4.1 µs (−24.1%)

lower — all-ASCII (the optimized paths)

  • lower_all_values_are_ascii: 1024: 5.4 → 4.0 µs (−25.9%)
  • lower_all_values_are_ascii: 4096: 22.6 → 15.6 µs (−31.0%)
  • lower_all_values_are_ascii: 8192: 41.9 → 30.8 µs (−26.5%)
  • string_views size:4096 str_len:10 null:0 mixed:false: 151.0 → 75.3 µs (−50.1%)
  • string_views size:4096 str_len:10 null:0 mixed:true: 175.9 → 134.6 µs (−23.5%)
  • string_views size:4096 str_len:10 null:0.1 mixed:false: 143.5 → 76.8 µs (−46.5%)
  • string_views size:4096 str_len:10 null:0.1 mixed:true: 166.6 → 125.0 µs (−25.0%)
  • string_views size:4096 str_len:64 null:0 mixed:false: 150.1 → 92.7 µs (−38.2%)
  • string_views size:4096 str_len:64 null:0 mixed:true: 185.2 → 140.1 µs (−24.4%)
  • string_views size:4096 str_len:64 null:0.1 mixed:false: 136.7 → 97.0 µs (−29.0%)
  • string_views size:4096 str_len:64 null:0.1 mixed:true: 173.7 → 131.2 µs (−24.5%)
  • string_views size:4096 str_len:128 null:0 mixed:false: 190.3 → 141.7 µs (−25.5%)
  • string_views size:4096 str_len:128 null:0 mixed:true: 197.0 → 153.7 µs (−22.0%)
  • string_views size:4096 str_len:128 null:0.1 mixed:false: 173.3 → 141.7 µs (−18.2%)
  • string_views size:4096 str_len:128 null:0.1 mixed:true: 184.0 → 142.8 µs (−22.4%)
  • string_views size:8192 str_len:10 null:0 mixed:false: 302.9 → 150.2 µs (−50.4%)
  • string_views size:8192 str_len:10 null:0 mixed:true: 352.9 → 279.0 µs (−20.9%)
  • string_views size:8192 str_len:10 null:0.1 mixed:false: 285.0 → 154.3 µs (−45.9%)
  • string_views size:8192 str_len:10 null:0.1 mixed:true: 334.2 → 266.4 µs (−20.3%)
  • string_views size:8192 str_len:64 null:0 mixed:false: 295.6 → 184.4 µs (−37.6%)
  • string_views size:8192 str_len:64 null:0 mixed:true: 371.4 → 290.7 µs (−21.7%)
  • string_views size:8192 str_len:64 null:0.1 mixed:false: 273.7 → 195.1 µs (−28.7%)
  • string_views size:8192 str_len:64 null:0.1 mixed:true: 347.0 → 279.6 µs (−19.4%)
  • string_views size:8192 str_len:128 null:0 mixed:false: 379.6 → 285.6 µs (−24.8%)
  • string_views size:8192 str_len:128 null:0 mixed:true: 397.1 → 317.4 µs (−20.1%)
  • string_views size:8192 str_len:128 null:0.1 mixed:false: 364.1 → 285.1 µs (−21.7%)
  • string_views size:8192 str_len:128 null:0.1 mixed:true: 379.3 → 302.3 µs (−20.3%)
  • lower_sliced_ascii parent=65536 slice=128 str_len=32: 980.2 → 797.9 ns (−18.6%)

lower — some non-ASCII string_views (mostly noise)

  • size:4096 str_len:10 null:0 mixed:false: 374.5 → 362.2 µs (−3.3%)
  • size:4096 str_len:10 null:0 mixed:true: 374.6 → 380.5 µs (+1.6%)
  • size:4096 str_len:10 null:0.1 mixed:false: 340.8 → 356.5 µs (+4.6%)
  • size:4096 str_len:10 null:0.1 mixed:true: 344.0 → 352.5 µs (+2.5%)
  • size:4096 str_len:64 null:0 mixed:false: 377.5 → 373.5 µs (−1.1%)
  • size:4096 str_len:64 null:0 mixed:true: 380.6 → 375.0 µs (−1.5%)
  • size:4096 str_len:64 null:0.1 mixed:false: 330.7 → 341.8 µs (+3.4%)
  • size:4096 str_len:64 null:0.1 mixed:true: 341.8 → 354.2 µs (+3.6%)
  • size:4096 str_len:128 null:0 mixed:false: 371.8 → 356.2 µs (−4.2%)
  • size:4096 str_len:128 null:0 mixed:true: 378.9 → 386.0 µs (+1.9%)
  • size:4096 str_len:128 null:0.1 mixed:false: 350.5 → 350.3 µs (−0.1%)
  • size:4096 str_len:128 null:0.1 mixed:true: 351.0 → 337.9 µs (−3.7%)
  • size:8192 str_len:10 null:0 mixed:false: 740.0 → 757.2 µs (+2.3%)
  • size:8192 str_len:10 null:0 mixed:true: 781.3 → 750.2 µs (−4.0%)
  • size:8192 str_len:10 null:0.1 mixed:false: 693.7 → 693.7 µs (0.0%)
  • size:8192 str_len:10 null:0.1 mixed:true: 681.5 → 705.2 µs (+3.5%)
  • size:8192 str_len:64 null:0 mixed:false: 755.5 → 768.6 µs (+1.7%)
  • size:8192 str_len:64 null:0 mixed:true: 759.6 → 754.3 µs (−0.7%)
  • size:8192 str_len:64 null:0.1 mixed:false: 711.5 → 667.8 µs (−6.1%)
  • size:8192 str_len:64 null:0.1 mixed:true: 682.1 → 688.2 µs (+0.9%)
  • size:8192 str_len:128 null:0 mixed:false: 771.5 → 765.9 µs (−0.7%)
  • size:8192 str_len:128 null:0 mixed:true: 747.7 → 792.6 µs (+6.0%)
  • size:8192 str_len:128 null:0.1 mixed:false: 687.1 → 701.3 µs (+2.1%)
  • size:8192 str_len:128 null:0.1 mixed:true: 679.2 → 696.8 µs (+2.6%)

lower — first/middle non-ASCII (flat)

  • lower_the_first_value_is_nonascii: 1024: 42.1 → 42.4 µs (+0.7%)
  • lower_the_first_value_is_nonascii: 4096: 173.9 → 173.3 µs (−0.3%)
  • lower_the_first_value_is_nonascii: 8192: 350.8 → 349.3 µs (−0.4%)
  • lower_the_middle_value_is_nonascii: 1024: 42.9 → 42.8 µs (−0.2%)
  • lower_the_middle_value_is_nonascii: 4096: 175.1 → 176.3 µs (+0.7%)
  • lower_the_middle_value_is_nonascii: 8192: 353.6 → 354.6 µs (+0.3%)

What changes are included in this PR?

  • Implement ASCII fast paths for lower and upper on the Utf8/LargeUtf8 and Utf8View code paths
  • Share StringViewArray buffer size constants with the bulk-NULL builders

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

github-actions (bot) added the functions label (Changes to functions implementation) on May 1, 2026.
Comment thread: datafusion/functions/src/string/common.rs

@comphead (Contributor) left a comment:

Thanks @neilconway!
I believe the optimization is tightly coupled to the specifics of German strings (the StringView layout), so it would be nice to add comments explaining the byte-level work.
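To illustrate the kind of byte-level work on German strings being discussed, here is a hedged sketch in plain Rust, independent of arrow-rs. In Arrow's Utf8View layout each view is a 16-byte little-endian value: bytes 0..4 hold the length; strings of at most 12 bytes are stored inline in bytes 4..16, while longer strings store a 4-byte prefix (bytes 4..8), a data buffer index (8..12) and an offset (12..16). The function name lower_ascii_view and the choice of a single output buffer (index 0) are assumptions for illustration, not the code in this PR.

```rust
/// Strings up to this many bytes are stored inline in the view.
const MAX_INLINE_LEN: usize = 12;

/// Read a little-endian u32 field starting at byte `at` of a 16-byte view.
fn read_u32(view: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes(view[at..at + 4].try_into().unwrap())
}

/// Hypothetical helper: ASCII-lowercase one view, appending long-string data
/// to `out_buf` (assumed to become data buffer 0 of the output array).
/// Assumes the referenced string bytes are all ASCII.
fn lower_ascii_view(view: u128, buffers: &[Vec<u8>], out_buf: &mut Vec<u8>) -> u128 {
    let mut bytes = view.to_le_bytes();
    let len = read_u32(&bytes, 0) as usize;
    if len <= MAX_INLINE_LEN {
        // Inline string: convert the data bytes in place; no data buffer is touched.
        bytes[4..4 + len].make_ascii_lowercase();
        return u128::from_le_bytes(bytes);
    }
    // Long string: locate the data via buffer index (bytes 8..12) and offset (12..16).
    let buf_idx = read_u32(&bytes, 8) as usize;
    let offset = read_u32(&bytes, 12) as usize;
    let src = &buffers[buf_idx][offset..offset + len];
    // Copy the converted bytes straight into the output buffer.
    let new_offset = out_buf.len();
    out_buf.extend(src.iter().map(u8::to_ascii_lowercase));
    // Rebuild the view: same length, converted 4-byte prefix, buffer 0, new offset.
    bytes[4..8].copy_from_slice(&out_buf[new_offset..new_offset + 4]);
    bytes[8..12].copy_from_slice(&0u32.to_le_bytes());
    bytes[12..16].copy_from_slice(&(new_offset as u32).to_le_bytes());
    u128::from_le_bytes(bytes)
}
```

In this sketch, inline views (12 bytes or fewer) are converted without touching any data buffer; only longer strings are copied into the output buffer, and their views are rewritten to point at it.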

@neilconway (Contributor, Author) commented:

Thanks for the review, @comphead! Please let me know if you have more feedback.

FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.



Development

Successfully merging this pull request may close these issues.

lower, upper could be further optimized for ASCII-only inputs
