perf: Use Arrow vectorized eq kernel for IN list with column references #20528

Merged
adriangb merged 4 commits into apache:main from zhangxffff:feat/arrow-eq-in-list on Feb 28, 2026

@zhangxffff (Contributor) commented Feb 24, 2026

Which issue does this PR close?

Rationale for this change

When the IN list contains column references (e.g. SELECT * FROM t WHERE a IN (b, c, d, e)), DataFusion falls back to a row-by-row make_comparator path, which is significantly slower than necessary. Arrow provides SIMD-optimized eq kernels that can compare entire arrays in one call.

What changes are included in this PR?

  • Use Arrow's vectorized eq kernel instead of row-by-row make_comparator for non-nested types (primitive, string, binary) in the column-reference IN list evaluation path
  • For nested types (Struct, List, etc.), fall back to make_comparator since Arrow's eq kernel does not support them
  • Add 6 unit tests covering the column-reference evaluation path (Int32, Utf8, NOT IN, NULL handling, NaN semantics)
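
The column-at-a-time evaluation described in the bullets above can be sketched roughly as follows. This is a std-only model, not the PR's code: `Column`, `BoolColumn`, `eq_columns`, `or_kleene`, and `in_list_columns` are hypothetical names, and plain `Vec<Option<_>>` values stand in for Arrow arrays (with `None` as NULL). The idea is that each list entry is compared against the value column as a whole, and the per-entry results are merged with a three-valued OR.

```rust
// Hypothetical std-only stand-ins for Arrow arrays; None models SQL NULL.
type Column = Vec<Option<i32>>;
type BoolColumn = Vec<Option<bool>>;

/// Whole-column equality (stand-in for a vectorized `eq` kernel).
/// A NULL on either side yields NULL.
fn eq_columns(a: &Column, b: &Column) -> BoolColumn {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(x == y),
            _ => None,
        })
        .collect()
}

/// Kleene (three-valued) OR: true beats NULL, NULL beats false.
fn or_kleene(a: &BoolColumn, b: &BoolColumn) -> BoolColumn {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(true), _) | (_, Some(true)) => Some(true),
            (Some(false), Some(false)) => Some(false),
            _ => None,
        })
        .collect()
}

/// `value IN (list[0], list[1], ...)`, one list column at a time.
fn in_list_columns(value: &Column, list: &[Column]) -> BoolColumn {
    let mut acc: BoolColumn = vec![Some(false); value.len()];
    for item in list {
        acc = or_kleene(&acc, &eq_columns(value, item));
    }
    acc
}

fn main() {
    let a: Column = vec![Some(1), Some(2), None, Some(4)];
    let b: Column = vec![Some(1), Some(9), Some(3), None];
    let c: Column = vec![Some(7), Some(2), Some(3), Some(5)];
    println!("{:?}", in_list_columns(&a, &[b, c]));
}
```

The loop does one pass per list entry rather than one comparator call per row, which is where a real vectorized kernel would get its advantage.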

Are these changes tested?

Yes. 6 new unit tests added:

  • test_in_list_with_columns_int32_scalars
  • test_in_list_with_columns_int32_column_refs
  • test_in_list_with_columns_utf8_column_refs
  • test_in_list_with_columns_negated
  • test_in_list_with_columns_null_in_list
  • test_in_list_with_columns_float_nan
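
The test names suggest that negation and NULL handling are among the covered cases. As a reference point, here is a minimal per-row sketch of standard SQL three-valued IN / NOT IN semantics. It does not reproduce the PR's actual assertions; `in_list_row` and `not_in_list_row` are hypothetical helpers, and `None` models NULL.

```rust
/// SQL three-valued `value IN (list...)` for a single row.
fn in_list_row(value: Option<i32>, list: &[Option<i32>]) -> Option<bool> {
    let value = value?; // NULL IN (...) is NULL
    let mut saw_null = false;
    for item in list {
        match item {
            Some(v) if *v == value => return Some(true), // a match wins outright
            Some(_) => {}
            None => saw_null = true, // comparison with NULL is unknown
        }
    }
    // No match: the answer is unknown if any list entry was NULL.
    if saw_null { None } else { Some(false) }
}

/// NOT IN negates the three-valued result; NULL stays NULL.
fn not_in_list_row(value: Option<i32>, list: &[Option<i32>]) -> Option<bool> {
    in_list_row(value, list).map(|b| !b)
}

fn main() {
    println!("{:?}", in_list_row(Some(1), &[Some(1), None]));
    println!("{:?}", in_list_row(Some(2), &[Some(1), None]));
    println!("{:?}", not_in_list_row(Some(2), &[Some(1), Some(3)]));
    println!("{:?}", not_in_list_row(Some(2), &[Some(1), None]));
}
```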

Are there any user-facing changes?

No API changes. Queries with column-reference IN lists will run faster.

@github-actions (bot) added the physical-expr (Changes to the physical-expr crates) label on Feb 24, 2026
@zhangxffff (Contributor, Author) commented

Benchmark result

Run with: cargo bench --bench in_list -- "in_list_cols"

Int32: 10-20x speedup across all scenarios. Improvement is greater with nulls (up to 20x) since the original row-by-row path has higher per-row null checking overhead.

Utf8: 2.3-11x speedup, with larger gains at lower match rates where vectorized comparison dominates over or_kleene merging cost.

(zhangxffff) zhangxffff@95d3d60664da ~/W/datafusion (main)> critcmp after before
group                                              after                                  before
-----                                              -----                                  ------
in_list_cols/Int32/list=28/match=0%/nulls=0%       1.00     92.6±1.06µs        ? ?/sec    9.97   923.4±13.94µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%      1.00    103.7±0.79µs        ? ?/sec    17.29 1792.5±15.14µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%     1.00     92.7±1.41µs        ? ?/sec    10.26  950.9±10.76µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%    1.00    104.4±1.72µs        ? ?/sec    17.28 1804.9±16.47µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%      1.00     92.5±0.67µs        ? ?/sec    15.65 1448.1±13.22µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%     1.00    106.2±2.56µs        ? ?/sec    19.70     2.1±0.02ms        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%        1.00     10.1±0.20µs        ? ?/sec    9.74     98.6±0.91µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%       1.00     11.2±0.16µs        ? ?/sec    16.17   181.9±1.58µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%      1.00     10.2±0.09µs        ? ?/sec    9.98    101.7±0.91µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%     1.00     11.3±0.12µs        ? ?/sec    16.28   184.0±3.56µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%       1.00     10.1±0.08µs        ? ?/sec    14.79   149.4±1.51µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%      1.00     11.2±0.13µs        ? ?/sec    18.23   204.1±1.75µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%        1.00     26.7±0.41µs        ? ?/sec    9.85    263.5±2.04µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%       1.00     30.2±0.23µs        ? ?/sec    16.49   498.3±3.58µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%      1.00     26.6±0.44µs        ? ?/sec    10.21   271.9±3.34µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%     1.00     29.7±0.31µs        ? ?/sec    17.07   507.8±6.23µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%       1.00     26.8±0.27µs        ? ?/sec    15.17   406.1±2.29µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%      1.00     29.9±0.63µs        ? ?/sec    19.82  592.9±10.95µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                 1.00    158.5±4.42µs        ? ?/sec    10.19 1615.5±10.33µs        ? ?/sec
in_list_cols/Utf8/list=28/match=100%               1.00   722.8±11.34µs        ? ?/sec    2.29  1655.1±11.99µs        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                1.00  1070.4±11.22µs        ? ?/sec    2.97      3.2±0.02ms        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                  1.00     15.5±0.43µs        ? ?/sec    11.09   171.6±2.69µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                1.00     70.0±1.08µs        ? ?/sec    2.34    163.4±2.15µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                 1.00    107.5±2.05µs        ? ?/sec    2.97    318.9±4.12µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                  1.00     42.4±1.63µs        ? ?/sec    10.88   461.3±4.24µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                1.00    194.9±1.06µs        ? ?/sec    2.40    467.1±4.39µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                 1.00    296.0±3.38µs        ? ?/sec    3.03    897.1±8.81µs        ? ?/sec
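
As a quick check on how to read the table, the factor printed next to each `before` timing appears to be that timing divided by the fastest timing for the same benchmark. The sketch below recomputes it for three rows copied from the table; `speedup` is a hypothetical helper, and the inputs are the reported means only (the ± spread is ignored).

```rust
/// Ratio of the slower time to the faster time, both in microseconds.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

fn main() {
    // (benchmark, after µs, before µs), copied from the critcmp output.
    let rows = [
        ("Int32/list=28/match=0%/nulls=0%", 92.6, 923.4),
        ("Int32/list=28/match=0%/nulls=20%", 103.7, 1792.5),
        ("Utf8/list=28/match=100%", 722.8, 1655.1),
    ];
    for (name, after, before) in rows {
        println!("{name}: {:.2}x", speedup(before, after));
    }
}
```

The recomputed factors agree with the 9.97, 17.29, and 2.29 shown in the table for those rows.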

@neilconway (Contributor) left a comment

This is really awesome! Nice improvement. Overall LGTM, other than one obscure issue with RunEndEncoded (REE) types.

// falling back to row-by-row comparator for nested types (Struct, List, etc.)
// where eq semantics are ambiguous.
let value = value.into_array(num_rows)?;
let use_arrow_eq = !value.data_type().is_nested();

@neilconway (Contributor) commented Feb 25, 2026

Do we know for sure that Arrow's eq kernel will work for all non-nested types? That might be a bit fragile if new types are added in the future. I wonder if we should explicitly whitelist the types that we know work?

Digging around a bit, it seems we panic if we try to pass RunEndEncoded types to eq, but REE types aren't considered nested.

@zhangxffff (Contributor, Author) replied

Thanks for the great catch! I've replaced the !is_nested() check with an explicit whitelist (supports_arrow_eq) that only enables Arrow's eq kernel for known-supported types. Please take another look when you have a chance.
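
To illustrate the difference between the two checks discussed in this thread, here is a small std-only sketch. `Ty` is a hypothetical, heavily simplified stand-in for Arrow's `DataType`, and both predicate functions are illustrative names rather than the PR's code. The model assumes, as the review describes, that a run-end-encoded type over a flat values type is not reported as nested.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug)]
enum Ty {
    Int32,
    Utf8,
    Struct,
    List,
    Dictionary(Box<Ty>),
    RunEndEncoded(Box<Ty>),
}

impl Ty {
    /// Assumption for this model: wrappers are nested only if their values are.
    fn is_nested(&self) -> bool {
        match self {
            Ty::Struct | Ty::List => true,
            Ty::Dictionary(v) | Ty::RunEndEncoded(v) => v.is_nested(),
            _ => false,
        }
    }
}

/// Deny-list: anything that is not nested goes to the vectorized path.
fn use_eq_deny_list(ty: &Ty) -> bool {
    !ty.is_nested()
}

/// Allow-list: only types known to work go to the vectorized path.
fn use_eq_allow_list(ty: &Ty) -> bool {
    match ty {
        Ty::Int32 | Ty::Utf8 => true,
        Ty::Dictionary(v) => use_eq_allow_list(v),
        _ => false,
    }
}

fn main() {
    let samples = [
        Ty::Int32,
        Ty::Utf8,
        Ty::Struct,
        Ty::List,
        Ty::Dictionary(Box::new(Ty::Utf8)),
        Ty::RunEndEncoded(Box::new(Ty::Int32)),
    ];
    for ty in &samples {
        println!(
            "{:?}: deny-list={} allow-list={}",
            ty,
            use_eq_deny_list(ty),
            use_eq_allow_list(ty)
        );
    }
}
```

In this model the two checks disagree only on the run-end-encoded sample: the deny-list lets it through, while the allow-list sends it to the fallback path. An allow-list also keeps any type added later on the fallback path until someone opts it in.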

@neilconway (Contributor) left a comment

Nice work!

Comment on lines +150 to +155
match dt {
    Boolean | Binary | LargeBinary | BinaryView | FixedSizeBinary(_) => true,
    dt if dt.is_primitive() || dt.is_null() || dt.is_string() => true,
    Dictionary(_, v) => supports_arrow_eq(v.as_ref()),
    _ => false,
}

Optional but maybe a bit nicer?

Suggested change
- match dt {
-     Boolean | Binary | LargeBinary | BinaryView | FixedSizeBinary(_) => true,
-     dt if dt.is_primitive() || dt.is_null() || dt.is_string() => true,
-     Dictionary(_, v) => supports_arrow_eq(v.as_ref()),
-     _ => false,
- }
+ match dt {
+     Boolean | Binary | LargeBinary | BinaryView | FixedSizeBinary(_) => true,
+     Dictionary(_, v) => supports_arrow_eq(v.as_ref()),
+     _ => dt.is_primitive() || dt.is_null() || dt.is_string(),
+ }

@zhangxffff (Contributor, Author) replied

> Optional but maybe a bit nicer?

Good suggestion, thanks! Much more concise this way. Applied in the latest commit.
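
The suggested refactor moves the predicate check from a guarded arm into the catch-all arm. A small sketch of why the two shapes can be expected to agree, using a hypothetical simplified `Ty` enum in place of Arrow's `DataType` (the predicates here are toy versions, and the equivalence relies on the dictionary variant never satisfying them):

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug)]
enum Ty {
    Boolean,
    Binary,
    Int32,
    Null,
    Utf8,
    Struct,
    RunEndEncoded,
    Dictionary(Box<Ty>),
}

impl Ty {
    fn is_primitive(&self) -> bool { matches!(self, Ty::Int32) }
    fn is_null(&self) -> bool { matches!(self, Ty::Null) }
    fn is_string(&self) -> bool { matches!(self, Ty::Utf8) }
}

/// Original shape: guarded arm plus an explicit `_ => false`.
fn supports_eq_guard(dt: &Ty) -> bool {
    match dt {
        Ty::Boolean | Ty::Binary => true,
        dt if dt.is_primitive() || dt.is_null() || dt.is_string() => true,
        Ty::Dictionary(v) => supports_eq_guard(v),
        _ => false,
    }
}

/// Suggested shape: the predicates become the catch-all result.
fn supports_eq_fallthrough(dt: &Ty) -> bool {
    match dt {
        Ty::Boolean | Ty::Binary => true,
        Ty::Dictionary(v) => supports_eq_fallthrough(v),
        _ => dt.is_primitive() || dt.is_null() || dt.is_string(),
    }
}

/// One sample per variant, plus dictionaries over supported and unsupported values.
fn samples() -> Vec<Ty> {
    vec![
        Ty::Boolean,
        Ty::Binary,
        Ty::Int32,
        Ty::Null,
        Ty::Utf8,
        Ty::Struct,
        Ty::RunEndEncoded,
        Ty::Dictionary(Box::new(Ty::Utf8)),
        Ty::Dictionary(Box::new(Ty::Struct)),
    ]
}

fn main() {
    for ty in samples() {
        let (a, b) = (supports_eq_guard(&ty), supports_eq_fallthrough(&ty));
        assert_eq!(a, b);
        println!("{:?}: {}", ty, a);
    }
}
```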

@zhangxffff (Contributor, Author) commented

Hi @adriangb,
Could you please take a look at this PR when you have a moment?
If everything looks good, I'd appreciate your help merging it.

Thanks!

@adriangb (Contributor) left a comment

Looks good to me 😄

/// binary (Binary/LargeBinary/BinaryView/FixedSizeBinary), Null, and
/// Dictionary-encoded variants of the above.
/// Unsupported: nested types (Struct, List, Map, Union) and RunEndEncoded.
fn supports_arrow_eq(dt: &DataType) -> bool {

It's unfortunate that Arrow itself doesn't expose this check.

@adriangb (Contributor) commented

run benchmark in_list

@alamb-ghbot commented

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/arrow-eq-in-list (38e0df4) to db5197b diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --features=parquet --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_arrow-eq-in-list
Results will be posted here when complete

@alamb-ghbot commented

🤖: Benchmark completed

Details

group                                                  feat_arrow-eq-in-list                  main
-----                                                  ---------------------                  ----
in_list/Float32/list=100/nulls=0%                      1.35     52.6±0.31µs        ? ?/sec    1.00     39.1±0.78µs        ? ?/sec
in_list/Float32/list=100/nulls=20%                     1.29     72.0±0.83µs        ? ?/sec    1.00     55.7±0.26µs        ? ?/sec
in_list/Float32/list=28/nulls=0%                       1.35     48.1±0.38µs        ? ?/sec    1.00     35.7±2.04µs        ? ?/sec
in_list/Float32/list=28/nulls=20%                      1.00     56.6±0.22µs        ? ?/sec    1.46     82.6±1.14µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                        1.00     30.1±0.16µs        ? ?/sec    1.08     32.3±0.09µs        ? ?/sec
in_list/Float32/list=3/nulls=20%                       1.05     31.3±0.07µs        ? ?/sec    1.00     29.8±0.09µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                        1.00     33.5±0.43µs        ? ?/sec    1.07     35.8±0.29µs        ? ?/sec
in_list/Float32/list=8/nulls=20%                       1.18     37.1±3.47µs        ? ?/sec    1.00     31.4±0.21µs        ? ?/sec
in_list/Int16/list=100/nulls=0%                        1.28     45.7±0.70µs        ? ?/sec    1.00     35.7±0.13µs        ? ?/sec
in_list/Int16/list=100/nulls=20%                       1.00     33.8±0.08µs        ? ?/sec    1.43     48.3±1.58µs        ? ?/sec
in_list/Int16/list=28/nulls=0%                         1.41     61.3±1.23µs        ? ?/sec    1.00     43.4±0.18µs        ? ?/sec
in_list/Int16/list=28/nulls=20%                        1.40     74.5±0.42µs        ? ?/sec    1.00     53.2±0.86µs        ? ?/sec
in_list/Int16/list=3/nulls=0%                          1.27     29.1±0.13µs        ? ?/sec    1.00     23.0±0.09µs        ? ?/sec
in_list/Int16/list=3/nulls=20%                         1.29     29.1±0.66µs        ? ?/sec    1.00     22.6±0.33µs        ? ?/sec
in_list/Int16/list=8/nulls=0%                          1.24     31.4±0.33µs        ? ?/sec    1.00     25.3±0.13µs        ? ?/sec
in_list/Int16/list=8/nulls=20%                         1.26     30.9±0.93µs        ? ?/sec    1.00     24.6±0.13µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                        1.02     52.0±0.58µs        ? ?/sec    1.00     51.0±0.32µs        ? ?/sec
in_list/Int32/list=100/nulls=20%                       1.52     58.7±0.36µs        ? ?/sec    1.00     38.5±0.14µs        ? ?/sec
in_list/Int32/list=28/nulls=0%                         1.42     48.2±0.88µs        ? ?/sec    1.00     34.0±0.29µs        ? ?/sec
in_list/Int32/list=28/nulls=20%                        1.00     34.6±0.14µs        ? ?/sec    1.03     35.7±0.49µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                          1.00     29.2±0.20µs        ? ?/sec    1.00     29.2±0.10µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                         1.00     28.8±0.16µs        ? ?/sec    1.01     29.0±0.08µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                          1.01     31.7±1.25µs        ? ?/sec    1.00     31.3±0.27µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                         1.00     31.6±0.52µs        ? ?/sec    1.38     43.8±0.35µs        ? ?/sec
in_list/TimestampNs/list=100/nulls=0%                  1.48    106.8±1.40µs        ? ?/sec    1.00     72.3±0.36µs        ? ?/sec
in_list/TimestampNs/list=100/nulls=20%                 1.22    131.9±1.84µs        ? ?/sec    1.00    108.5±2.87µs        ? ?/sec
in_list/TimestampNs/list=28/nulls=0%                   1.20     74.1±0.46µs        ? ?/sec    1.00     61.6±0.32µs        ? ?/sec
in_list/TimestampNs/list=28/nulls=20%                  1.07    133.9±0.69µs        ? ?/sec    1.00    125.5±1.19µs        ? ?/sec
in_list/TimestampNs/list=3/nulls=0%                    1.00     51.7±7.38µs        ? ?/sec    1.01     52.3±1.54µs        ? ?/sec
in_list/TimestampNs/list=3/nulls=20%                   1.02     96.3±0.66µs        ? ?/sec    1.00     94.2±3.34µs        ? ?/sec
in_list/TimestampNs/list=8/nulls=0%                    1.00     55.2±2.80µs        ? ?/sec    1.01     55.8±1.51µs        ? ?/sec
in_list/TimestampNs/list=8/nulls=20%                   1.02     98.5±0.84µs        ? ?/sec    1.00     96.8±3.89µs        ? ?/sec
in_list/UInt8/list=100/nulls=0%                        1.09     55.7±0.65µs        ? ?/sec    1.00     51.0±0.71µs        ? ?/sec
in_list/UInt8/list=100/nulls=20%                       1.10     64.9±0.50µs        ? ?/sec    1.00     59.0±1.07µs        ? ?/sec
in_list/UInt8/list=28/nulls=0%                         1.00     38.9±0.45µs        ? ?/sec    1.40     54.3±0.28µs        ? ?/sec
in_list/UInt8/list=28/nulls=20%                        1.29     54.2±0.73µs        ? ?/sec    1.00     42.1±0.77µs        ? ?/sec
in_list/UInt8/list=3/nulls=0%                          1.01     27.2±0.62µs        ? ?/sec    1.00     27.0±0.29µs        ? ?/sec
in_list/UInt8/list=3/nulls=20%                         1.00     27.7±0.18µs        ? ?/sec    1.00     27.7±0.18µs        ? ?/sec
in_list/UInt8/list=8/nulls=0%                          1.04     31.1±0.23µs        ? ?/sec    1.00     29.7±0.10µs        ? ?/sec
in_list/UInt8/list=8/nulls=20%                         1.08     30.4±0.32µs        ? ?/sec    1.00     28.2±0.67µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100                 1.02    151.1±4.73µs        ? ?/sec    1.00    148.1±1.75µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12                  1.18     99.9±2.37µs        ? ?/sec    1.00     84.7±0.52µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3                   1.01    102.9±0.39µs        ? ?/sec    1.00    102.2±0.30µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100                1.00    171.7±1.00µs        ? ?/sec    1.10    189.0±6.76µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12                 1.03    152.0±1.03µs        ? ?/sec    1.00    147.6±3.50µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3                  1.01    153.2±1.02µs        ? ?/sec    1.00    151.5±3.43µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=100                  1.00    150.2±0.96µs        ? ?/sec    1.18    177.9±1.69µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=12                   1.00    107.5±1.87µs        ? ?/sec    1.11    119.6±7.23µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=3                    1.13    108.9±0.39µs        ? ?/sec    1.00     96.4±0.58µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=100                 1.00    179.6±1.68µs        ? ?/sec    1.04    186.5±3.04µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=12                  1.00    127.7±0.61µs        ? ?/sec    1.02    130.3±0.89µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=3                   1.00    129.8±0.86µs        ? ?/sec    1.14    147.8±1.89µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100                   1.00    125.6±1.25µs        ? ?/sec    1.01    127.3±1.16µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12                    1.00     69.6±5.70µs        ? ?/sec    1.01     70.5±1.02µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3                     1.00     73.1±0.48µs        ? ?/sec    1.03     75.0±0.30µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100                  1.02    161.3±1.49µs        ? ?/sec    1.00    158.2±1.46µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12                   1.00    116.2±1.79µs        ? ?/sec    1.00    116.5±0.40µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3                    1.01    120.8±1.68µs        ? ?/sec    1.00    119.6±3.37µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100                   1.00    130.9±1.11µs        ? ?/sec    1.00    130.9±5.74µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12                    1.02    76.1±10.10µs        ? ?/sec    1.00     74.8±0.34µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3                     1.00     77.9±0.24µs        ? ?/sec    1.02     79.3±0.44µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100                  1.01    163.2±1.36µs        ? ?/sec    1.00    161.8±5.11µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12                   1.01    121.0±0.72µs        ? ?/sec    1.00    119.6±1.02µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3                    1.01    125.7±1.16µs        ? ?/sec    1.00    124.5±2.82µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=0%/nulls=0%          1.00    132.1±5.53µs        ? ?/sec    1.24    163.5±1.60µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=0%/nulls=20%         1.00    185.6±1.87µs        ? ?/sec    1.08    199.5±1.66µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=25%/nulls=0%         1.04    186.5±4.81µs        ? ?/sec    1.00    179.1±1.34µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=25%/nulls=20%        1.05    218.6±0.83µs        ? ?/sec    1.00    207.3±0.72µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=75%/nulls=0%         1.01    184.2±3.41µs        ? ?/sec    1.00    182.5±1.19µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=75%/nulls=20%        1.04    223.0±1.58µs        ? ?/sec    1.00   215.3±11.16µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=0%/nulls=0%           1.00    122.9±0.99µs        ? ?/sec    1.01    124.4±0.90µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=0%/nulls=20%          1.00    161.9±1.38µs        ? ?/sec    1.11    180.4±0.93µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=25%/nulls=0%          1.02    205.8±0.58µs        ? ?/sec    1.00    200.8±1.71µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=25%/nulls=20%         1.00    217.6±0.43µs        ? ?/sec    1.02    221.3±0.95µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=75%/nulls=0%          1.06    194.0±2.44µs        ? ?/sec    1.00    182.9±1.29µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=75%/nulls=20%         1.06    221.4±1.32µs        ? ?/sec    1.00    208.1±0.59µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=0%/nulls=0%            1.00    104.4±0.66µs        ? ?/sec    1.03    107.0±0.67µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=0%/nulls=20%           1.00    153.0±0.59µs        ? ?/sec    1.00    152.8±0.57µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=25%/nulls=0%           1.00    149.3±0.85µs        ? ?/sec    1.00    150.0±0.73µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=25%/nulls=20%          1.00    186.7±1.50µs        ? ?/sec    1.01    188.1±8.77µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=75%/nulls=0%           1.00    168.8±2.45µs        ? ?/sec    1.01    170.9±6.01µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=75%/nulls=20%          1.01    214.5±0.43µs        ? ?/sec    1.00    212.5±1.35µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=0%/nulls=0%            1.00    108.8±2.13µs        ? ?/sec    1.02    111.1±0.48µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=0%/nulls=20%           1.01    156.8±1.66µs        ? ?/sec    1.00    155.5±0.96µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=25%/nulls=0%           1.00    161.0±1.28µs        ? ?/sec    1.01    162.7±1.29µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=25%/nulls=20%          1.04   204.4±19.36µs        ? ?/sec    1.00    196.6±3.17µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=75%/nulls=0%           1.00    175.7±2.16µs        ? ?/sec    1.00    174.9±0.83µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=75%/nulls=20%          1.01    204.9±0.53µs        ? ?/sec    1.00    203.8±1.38µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100             1.01    151.5±0.89µs        ? ?/sec    1.00    150.2±1.06µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12              1.17    121.0±0.93µs        ? ?/sec    1.00    103.8±0.29µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3               1.00     69.2±0.38µs        ? ?/sec    1.17     81.2±0.20µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100            1.00    180.4±1.55µs        ? ?/sec    1.02    184.4±1.60µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12             1.02    146.7±1.06µs        ? ?/sec    1.00    143.6±0.83µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3              1.00    139.8±2.05µs        ? ?/sec    1.10    153.8±3.01µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=100              1.18    166.7±0.71µs        ? ?/sec    1.00    141.4±0.87µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=12               1.31     92.8±0.98µs        ? ?/sec    1.00     70.6±0.42µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=3                1.00     84.0±1.21µs        ? ?/sec    1.27    106.6±0.62µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=100             1.00    179.4±1.83µs        ? ?/sec    1.16    207.2±1.43µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=12              1.03    144.7±1.20µs        ? ?/sec    1.00    140.0±1.22µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=3               1.00    125.4±6.83µs        ? ?/sec    1.12    140.7±1.59µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100               1.00    129.5±3.69µs        ? ?/sec    1.01    130.9±0.84µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12                1.00     53.3±0.49µs        ? ?/sec    1.04     55.5±0.33µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3                 1.00     52.9±1.26µs        ? ?/sec    1.06     56.0±1.76µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100              1.01    163.1±1.77µs        ? ?/sec    1.00    161.1±0.84µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12               1.01    103.2±0.93µs        ? ?/sec    1.00    102.5±0.81µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3                1.00    103.1±0.40µs        ? ?/sec    1.00    103.5±1.58µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100               1.00    134.8±3.50µs        ? ?/sec    1.03   139.2±11.53µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12                1.00     59.1±1.27µs        ? ?/sec    1.04     61.7±0.58µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3                 1.00     58.0±1.31µs        ? ?/sec    1.06     61.3±1.15µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100              1.00    167.2±0.74µs        ? ?/sec    1.00    166.8±0.81µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12               1.00    106.5±0.87µs        ? ?/sec    1.01    107.6±0.87µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3                1.00    106.4±1.78µs        ? ?/sec    1.01    107.0±1.44µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=0%/nulls=0%      1.02    137.1±0.99µs        ? ?/sec    1.00    135.0±2.70µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=0%/nulls=20%     1.00    156.0±1.81µs        ? ?/sec    1.02    159.1±1.70µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=25%/nulls=0%     1.13    202.9±0.43µs        ? ?/sec    1.00    180.2±0.67µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=25%/nulls=20%    1.00    206.0±1.42µs        ? ?/sec    1.06    218.6±0.77µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=75%/nulls=0%     1.02    207.3±1.11µs        ? ?/sec    1.00    203.2±0.38µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=75%/nulls=20%    1.00    230.9±4.58µs        ? ?/sec    1.04    239.5±0.46µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=0%/nulls=0%       1.24    163.1±3.59µs        ? ?/sec    1.00    131.8±1.96µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=0%/nulls=20%      1.14    187.3±3.89µs        ? ?/sec    1.00    164.1±1.94µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=25%/nulls=0%      1.10    193.2±0.94µs        ? ?/sec    1.00    174.9±1.40µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=25%/nulls=20%     1.04    224.1±0.57µs        ? ?/sec    1.00    216.4±0.59µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=75%/nulls=0%      1.00    205.6±9.17µs        ? ?/sec    1.04    213.7±0.40µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=75%/nulls=20%     1.00    226.7±0.76µs        ? ?/sec    1.03    234.6±0.79µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=0%/nulls=0%        1.00     96.7±0.58µs        ? ?/sec    1.03     99.8±0.35µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=0%/nulls=20%       1.00    136.2±0.96µs        ? ?/sec    1.02    138.5±2.32µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=25%/nulls=0%       1.00    159.3±1.11µs        ? ?/sec    1.01    160.7±4.35µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=25%/nulls=20%      1.00    188.7±0.70µs        ? ?/sec    1.01    189.7±0.50µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=75%/nulls=0%       1.00    183.2±0.99µs        ? ?/sec    1.00    182.4±0.69µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=75%/nulls=20%      1.00    218.0±1.19µs        ? ?/sec    1.01    219.5±2.42µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=0%/nulls=0%        1.00    103.6±1.52µs        ? ?/sec    1.03    106.3±0.58µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=0%/nulls=20%       1.00    143.3±0.50µs        ? ?/sec    1.01    144.3±1.31µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=25%/nulls=0%       1.00    167.7±0.68µs        ? ?/sec    1.00    167.9±0.63µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=25%/nulls=20%      1.00    186.2±0.85µs        ? ?/sec    1.01    188.2±0.75µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=75%/nulls=0%       1.00    174.7±6.04µs        ? ?/sec    1.00    173.8±0.87µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=75%/nulls=20%      1.00    221.2±0.98µs        ? ?/sec    1.03   228.2±19.10µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=0%           1.00    166.7±1.22µs        ? ?/sec    8.70   1451.0±9.87µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%          1.00    193.0±1.53µs        ? ?/sec    16.18     3.1±0.04ms        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%         1.00    166.9±1.22µs        ? ?/sec    9.41   1570.7±5.22µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%        1.00   196.1±18.72µs        ? ?/sec    15.82     3.1±0.03ms        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%          1.00    166.5±1.28µs        ? ?/sec    13.31     2.2±0.01ms        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%         1.00    192.2±1.46µs        ? ?/sec    18.26     3.5±0.01ms        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%            1.00     17.9±0.36µs        ? ?/sec    8.62    154.7±0.56µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%           1.00     20.3±0.12µs        ? ?/sec    16.36   332.2±1.01µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%          1.00     17.9±0.14µs        ? ?/sec    9.41    168.6±1.61µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%         1.00     20.5±0.79µs        ? ?/sec    16.09   330.2±1.46µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%           1.00     17.9±0.33µs        ? ?/sec    13.23   237.2±6.77µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%          1.00     20.4±0.13µs        ? ?/sec    18.38   374.5±0.80µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%            1.00     47.4±0.34µs        ? ?/sec    8.78    415.8±7.62µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%           1.00     54.9±1.11µs        ? ?/sec    16.16   887.0±4.16µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%          1.00     48.4±0.98µs        ? ?/sec    9.25    447.7±0.60µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%         1.00     54.4±0.25µs        ? ?/sec    16.20   881.3±3.16µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%           1.00     47.4±1.26µs        ? ?/sec    13.34   632.2±1.70µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%          1.00     54.8±1.36µs        ? ?/sec    18.32  1003.5±5.88µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                     1.00    277.0±0.99µs        ? ?/sec    10.23     2.8±0.04ms        ? ?/sec
in_list_cols/Utf8/list=28/match=100%                   1.00    620.2±3.70µs        ? ?/sec    4.86      3.0±0.01ms        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                    1.00  1233.3±11.54µs        ? ?/sec    3.86      4.8±0.02ms        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                      1.00     29.6±0.30µs        ? ?/sec    9.97    295.1±1.24µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                    1.00     65.0±1.61µs        ? ?/sec    4.94    321.1±2.09µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                     1.00    122.7±0.91µs        ? ?/sec    4.13    506.6±4.34µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                      1.00     78.8±1.16µs        ? ?/sec    10.16   801.2±7.72µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                    1.00    175.1±1.32µs        ? ?/sec    4.92    860.4±6.14µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                     1.00    345.9±2.35µs        ? ?/sec    3.90   1350.5±2.99µs        ? ?/sec

@adriangb (Contributor) commented

run benchmark in_list

@alamb-ghbot commented

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/arrow-eq-in-list (38e0df4) to db5197b diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --features=parquet --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_arrow-eq-in-list
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                  feat_arrow-eq-in-list                  main
-----                                                  ---------------------                  ----
in_list/Float32/list=100/nulls=0%                      1.00     44.3±0.26µs        ? ?/sec    1.01     44.7±0.97µs        ? ?/sec
in_list/Float32/list=100/nulls=20%                     1.12     54.3±0.86µs        ? ?/sec    1.00     48.6±0.50µs        ? ?/sec
in_list/Float32/list=28/nulls=0%                       1.00     55.4±0.64µs        ? ?/sec    1.43     79.5±2.39µs        ? ?/sec
in_list/Float32/list=28/nulls=20%                      1.49     77.0±0.92µs        ? ?/sec    1.00     51.6±0.20µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                        1.01     30.0±0.70µs        ? ?/sec    1.00     29.7±0.12µs        ? ?/sec
in_list/Float32/list=3/nulls=20%                       1.00     29.8±0.45µs        ? ?/sec    1.06     31.6±0.18µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                        1.08     34.7±0.52µs        ? ?/sec    1.00     32.1±0.18µs        ? ?/sec
in_list/Float32/list=8/nulls=20%                       1.00     35.9±1.74µs        ? ?/sec    1.30     46.9±0.25µs        ? ?/sec
in_list/Int16/list=100/nulls=0%                        2.07     72.6±1.07µs        ? ?/sec    1.00     35.1±0.08µs        ? ?/sec
in_list/Int16/list=100/nulls=20%                       1.28     55.2±0.67µs        ? ?/sec    1.00     43.2±0.26µs        ? ?/sec
in_list/Int16/list=28/nulls=0%                         1.00     39.9±0.73µs        ? ?/sec    1.30     51.9±0.70µs        ? ?/sec
in_list/Int16/list=28/nulls=20%                        1.10     79.7±0.45µs        ? ?/sec    1.00     72.3±0.62µs        ? ?/sec
in_list/Int16/list=3/nulls=0%                          1.30     29.6±0.30µs        ? ?/sec    1.00     22.9±0.06µs        ? ?/sec
in_list/Int16/list=3/nulls=20%                         1.27     28.5±0.83µs        ? ?/sec    1.00     22.5±0.30µs        ? ?/sec
in_list/Int16/list=8/nulls=0%                          1.27     32.2±0.53µs        ? ?/sec    1.00     25.4±0.24µs        ? ?/sec
in_list/Int16/list=8/nulls=20%                         1.24     30.6±0.60µs        ? ?/sec    1.00     24.6±0.09µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                        1.04     36.2±0.15µs        ? ?/sec    1.00     34.8±0.21µs        ? ?/sec
in_list/Int32/list=100/nulls=20%                       1.00     34.4±0.39µs        ? ?/sec    1.41     48.5±0.33µs        ? ?/sec
in_list/Int32/list=28/nulls=0%                         1.44     55.2±0.35µs        ? ?/sec    1.00     38.4±0.13µs        ? ?/sec
in_list/Int32/list=28/nulls=20%                        1.02     59.9±0.78µs        ? ?/sec    1.00     58.5±0.37µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                          1.00     29.1±0.14µs        ? ?/sec    1.00     29.0±0.38µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                         1.00     28.8±0.16µs        ? ?/sec    1.01     29.0±0.56µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                          1.02     31.8±0.17µs        ? ?/sec    1.00     31.1±0.57µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                         1.07     33.0±4.71µs        ? ?/sec    1.00     30.8±0.52µs        ? ?/sec
in_list/TimestampNs/list=100/nulls=0%                  1.05     69.4±0.19µs        ? ?/sec    1.00     65.9±0.17µs        ? ?/sec
in_list/TimestampNs/list=100/nulls=20%                 1.00    110.4±0.54µs        ? ?/sec    1.19    131.7±2.16µs        ? ?/sec
in_list/TimestampNs/list=28/nulls=0%                   1.00     83.6±0.52µs        ? ?/sec    1.14     95.1±0.34µs        ? ?/sec
in_list/TimestampNs/list=28/nulls=20%                  1.00    109.7±1.09µs        ? ?/sec    1.16    127.3±0.75µs        ? ?/sec
in_list/TimestampNs/list=3/nulls=0%                    1.00     51.6±0.24µs        ? ?/sec    1.01     52.2±0.32µs        ? ?/sec
in_list/TimestampNs/list=3/nulls=20%                   1.02     96.0±0.64µs        ? ?/sec    1.00     94.5±2.44µs        ? ?/sec
in_list/TimestampNs/list=8/nulls=0%                    1.01     56.4±2.39µs        ? ?/sec    1.00     55.7±0.23µs        ? ?/sec
in_list/TimestampNs/list=8/nulls=20%                   1.01     97.3±0.72µs        ? ?/sec    1.00     96.7±0.56µs        ? ?/sec
in_list/UInt8/list=100/nulls=0%                        1.21     61.7±0.45µs        ? ?/sec    1.00     51.1±0.21µs        ? ?/sec
in_list/UInt8/list=100/nulls=20%                       1.03     62.9±0.39µs        ? ?/sec    1.00     61.1±2.25µs        ? ?/sec
in_list/UInt8/list=28/nulls=0%                         1.82     61.9±0.40µs        ? ?/sec    1.00     34.0±0.12µs        ? ?/sec
in_list/UInt8/list=28/nulls=20%                        1.18     43.1±0.59µs        ? ?/sec    1.00     36.4±0.48µs        ? ?/sec
in_list/UInt8/list=3/nulls=0%                          1.00     28.0±0.50µs        ? ?/sec    1.02     28.5±5.71µs        ? ?/sec
in_list/UInt8/list=3/nulls=20%                         1.08     28.6±4.75µs        ? ?/sec    1.00     26.5±0.13µs        ? ?/sec
in_list/UInt8/list=8/nulls=0%                          1.10     33.3±0.17µs        ? ?/sec    1.00     30.3±0.37µs        ? ?/sec
in_list/UInt8/list=8/nulls=20%                         1.08     30.5±0.14µs        ? ?/sec    1.00     28.2±0.09µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100                 1.01    142.4±1.31µs        ? ?/sec    1.00    140.6±9.07µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12                  1.00     93.6±0.32µs        ? ?/sec    1.00     93.7±0.47µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3                   1.35    132.0±2.00µs        ? ?/sec    1.00     97.5±0.36µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100                1.00    192.3±6.00µs        ? ?/sec    1.11    214.3±3.02µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12                 1.30    177.5±4.47µs        ? ?/sec    1.00    136.6±2.82µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3                  1.06    148.6±3.08µs        ? ?/sec    1.00    139.6±3.77µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=100                  1.00    167.0±1.00µs        ? ?/sec    1.06    177.0±4.68µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=12                   1.25    116.7±0.70µs        ? ?/sec    1.00     93.2±0.96µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=3                    1.00     84.1±0.31µs        ? ?/sec    1.24    104.5±0.36µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=100                 1.04    173.4±4.83µs        ? ?/sec    1.00    167.0±3.38µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=12                  1.22    151.5±0.77µs        ? ?/sec    1.00    124.0±0.62µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=3                   1.25    160.1±5.03µs        ? ?/sec    1.00    127.6±2.22µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100                   1.00    125.7±1.95µs        ? ?/sec    1.02    128.8±8.16µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12                    1.00     69.1±2.65µs        ? ?/sec    1.02     70.5±0.62µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3                     1.00     72.7±0.20µs        ? ?/sec    1.03     75.2±1.14µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100                  1.02    161.4±0.74µs        ? ?/sec    1.00    158.2±0.82µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12                   1.01    116.3±0.72µs        ? ?/sec    1.00    114.8±1.32µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3                    1.04   123.6±13.58µs        ? ?/sec    1.00    119.1±1.01µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100                   1.00    131.5±0.43µs        ? ?/sec    1.05   138.1±18.94µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12                    1.00     73.2±0.26µs        ? ?/sec    1.03     75.3±0.80µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3                     1.00     78.1±1.13µs        ? ?/sec    1.02     79.7±0.19µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100                  1.02    165.2±2.50µs        ? ?/sec    1.00    161.5±1.05µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12                   1.01    120.3±0.59µs        ? ?/sec    1.00    119.2±0.80µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3                    1.01    125.8±6.87µs        ? ?/sec    1.00    123.9±0.98µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=0%/nulls=0%          1.30    159.0±0.66µs        ? ?/sec    1.00    122.7±1.51µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=0%/nulls=20%         1.01    176.2±1.50µs        ? ?/sec    1.00    173.9±1.26µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=25%/nulls=0%         1.00    174.6±0.68µs        ? ?/sec    1.09    191.0±1.08µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=25%/nulls=20%        1.00    204.6±0.50µs        ? ?/sec    1.02    209.4±1.90µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=75%/nulls=0%         1.02    186.8±0.91µs        ? ?/sec    1.00    182.5±0.86µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=75%/nulls=20%        1.01    214.8±0.99µs        ? ?/sec    1.00    213.3±1.06µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=0%/nulls=0%           1.00    122.6±0.78µs        ? ?/sec    1.14    140.4±0.95µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=0%/nulls=20%          1.02    163.4±0.35µs        ? ?/sec    1.00    160.5±1.04µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=25%/nulls=0%          1.17    203.4±1.14µs        ? ?/sec    1.00    174.4±1.58µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=25%/nulls=20%         1.00    209.0±0.83µs        ? ?/sec    1.08    225.9±1.15µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=75%/nulls=0%          1.09    195.0±0.75µs        ? ?/sec    1.00    179.0±1.45µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=75%/nulls=20%         1.00    212.4±0.65µs        ? ?/sec    1.01    214.4±2.84µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=0%/nulls=0%            1.00    103.3±0.45µs        ? ?/sec    1.03    106.4±0.43µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=0%/nulls=20%           1.01    153.6±0.88µs        ? ?/sec    1.00    152.6±0.70µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=25%/nulls=0%           1.02    152.8±1.16µs        ? ?/sec    1.00    150.2±1.37µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=25%/nulls=20%          1.01    187.5±0.95µs        ? ?/sec    1.00    186.0±0.61µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=75%/nulls=0%           1.05   176.6±13.44µs        ? ?/sec    1.00    168.2±3.52µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=75%/nulls=20%          1.00    213.0±0.80µs        ? ?/sec    1.00    212.2±1.15µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=0%/nulls=0%            1.00    108.7±0.30µs        ? ?/sec    1.03    112.1±4.29µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=0%/nulls=20%           1.00    156.2±1.73µs        ? ?/sec    1.00    155.4±0.86µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=25%/nulls=0%           1.00    162.2±3.94µs        ? ?/sec    1.02    165.4±2.16µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=25%/nulls=20%          1.00    197.6±0.65µs        ? ?/sec    1.01    199.1±2.66µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=75%/nulls=0%           1.01   179.3±12.45µs        ? ?/sec    1.00    177.0±3.56µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=75%/nulls=20%          1.01    205.0±0.93µs        ? ?/sec    1.00    202.9±0.55µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100             1.12    175.3±0.76µs        ? ?/sec    1.00    157.0±1.09µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12              1.05     75.8±0.56µs        ? ?/sec    1.00     72.0±0.51µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3               1.00     64.5±0.38µs        ? ?/sec    1.12     72.3±4.20µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100            1.00    200.3±0.41µs        ? ?/sec    1.03    206.2±0.83µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12             1.16    146.5±6.85µs        ? ?/sec    1.00    126.4±8.42µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3              1.17    141.1±0.80µs        ? ?/sec    1.00    120.2±0.91µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=100              1.03    145.4±1.23µs        ? ?/sec    1.00    141.1±1.85µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=12               1.00     68.8±0.28µs        ? ?/sec    1.48    101.5±0.42µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=3                1.00     65.3±1.89µs        ? ?/sec    1.58    103.5±1.59µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=100             1.18    204.7±1.92µs        ? ?/sec    1.00    173.1±0.64µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=12              1.22    142.8±0.51µs        ? ?/sec    1.00    116.6±3.00µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=3               1.07    140.3±0.66µs        ? ?/sec    1.00    131.5±0.55µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100               1.00    128.0±0.94µs        ? ?/sec    1.02    131.1±0.69µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12                1.00     53.4±0.56µs        ? ?/sec    1.04     55.4±0.29µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3                 1.00     54.2±2.25µs        ? ?/sec    1.02     55.1±0.58µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100              1.01    162.9±4.26µs        ? ?/sec    1.00    161.7±0.66µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12               1.01    104.4±7.43µs        ? ?/sec    1.00    103.1±1.64µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3                1.13   116.6±26.25µs        ? ?/sec    1.00    103.2±2.18µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100               1.00    133.7±5.85µs        ? ?/sec    1.01    135.2±1.33µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12                1.00     58.6±2.03µs        ? ?/sec    1.03     60.6±0.37µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3                 1.00     59.1±1.55µs        ? ?/sec    1.02     60.5±0.54µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100              1.00    167.3±0.69µs        ? ?/sec    1.00    167.0±0.96µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12               1.00    106.1±0.58µs        ? ?/sec    1.02    107.7±2.51µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3                1.00    106.2±3.03µs        ? ?/sec    1.01    106.9±2.21µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=0%/nulls=0%      1.00    127.0±1.93µs        ? ?/sec    1.23    155.9±1.22µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=0%/nulls=20%     1.06    165.4±1.38µs        ? ?/sec    1.00    156.7±0.80µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=25%/nulls=0%     1.11    212.4±0.60µs        ? ?/sec    1.00    190.5±0.38µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=25%/nulls=20%    1.00    221.9±2.95µs        ? ?/sec    1.03    228.6±1.20µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=75%/nulls=0%     1.00    203.6±0.39µs        ? ?/sec    1.05    213.5±1.63µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=75%/nulls=20%    1.01    224.3±2.70µs        ? ?/sec    1.00    221.6±0.73µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=0%/nulls=0%       1.00    114.9±0.80µs        ? ?/sec    1.40    161.4±4.46µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=0%/nulls=20%      1.08    183.5±1.69µs        ? ?/sec    1.00    169.1±1.12µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=25%/nulls=0%      1.00    200.5±1.87µs        ? ?/sec    1.00    201.3±1.00µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=25%/nulls=20%     1.09    221.9±1.25µs        ? ?/sec    1.00    203.9±1.82µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=75%/nulls=0%      1.03    201.9±1.07µs        ? ?/sec    1.00    196.7±0.89µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=75%/nulls=20%     1.07    240.1±2.52µs        ? ?/sec    1.00    225.2±2.98µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=0%/nulls=0%        1.00     97.7±0.49µs        ? ?/sec    1.02    100.0±0.77µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=0%/nulls=20%       1.00    135.9±1.46µs        ? ?/sec    1.02    138.7±0.67µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=25%/nulls=0%       1.00    160.2±1.90µs        ? ?/sec    1.00    160.2±0.85µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=25%/nulls=20%      1.00    189.8±2.10µs        ? ?/sec    1.02    193.9±1.37µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=75%/nulls=0%       1.01    184.0±8.25µs        ? ?/sec    1.00    182.8±1.78µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=75%/nulls=20%      1.00    218.9±2.07µs        ? ?/sec    1.02    224.0±5.31µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=0%/nulls=0%        1.00    103.4±0.40µs        ? ?/sec    1.03    106.1±1.59µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=0%/nulls=20%       1.00    143.1±1.83µs        ? ?/sec    1.01    145.0±1.53µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=25%/nulls=0%       1.00    166.8±1.79µs        ? ?/sec    1.01    169.1±2.14µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=25%/nulls=20%      1.00    186.6±0.86µs        ? ?/sec    1.01    187.8±0.94µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=75%/nulls=0%       1.00    173.5±0.53µs        ? ?/sec    1.00    174.1±5.60µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=75%/nulls=20%      1.00    222.1±2.27µs        ? ?/sec    1.01    224.2±2.97µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=0%           1.00    167.4±4.90µs        ? ?/sec    8.64  1446.7±11.10µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%          1.00    191.5±5.05µs        ? ?/sec    16.27     3.1±0.04ms        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%         1.00    166.5±0.78µs        ? ?/sec    9.43   1570.1±8.39µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%        1.00    191.5±4.53µs        ? ?/sec    16.29     3.1±0.03ms        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%          1.00    166.2±0.76µs        ? ?/sec    13.30     2.2±0.03ms        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%         1.00    191.8±1.98µs        ? ?/sec    18.33     3.5±0.01ms        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%            1.00     17.9±0.19µs        ? ?/sec    8.65    154.8±0.45µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%           1.00     20.2±0.18µs        ? ?/sec    16.50   333.3±2.80µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%          1.00     17.9±0.40µs        ? ?/sec    9.40    168.1±1.62µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%         1.00     20.3±0.67µs        ? ?/sec    16.59  336.8±27.68µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%           1.00     17.8±0.10µs        ? ?/sec    13.25   236.0±0.42µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%          1.00     20.2±0.37µs        ? ?/sec    18.54   374.8±0.63µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%            1.00     47.6±0.77µs        ? ?/sec    8.66    412.1±0.76µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%           1.00     54.7±0.33µs        ? ?/sec    16.22   887.6±3.08µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%          1.00     47.7±0.22µs        ? ?/sec    9.41    448.5±1.87µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%         1.00     54.4±0.19µs        ? ?/sec    16.20   881.2±6.30µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%           1.00     48.0±0.12µs        ? ?/sec    13.15   631.3±0.88µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%          1.00     54.1±1.05µs        ? ?/sec    18.55  1004.2±7.34µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                     1.00    277.7±2.33µs        ? ?/sec    10.15     2.8±0.01ms        ? ?/sec
in_list_cols/Utf8/list=28/match=100%                   1.00    614.9±2.28µs        ? ?/sec    5.22      3.2±0.07ms        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                    1.00  1239.3±47.24µs        ? ?/sec    3.85      4.8±0.03ms        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                      1.00     29.6±0.82µs        ? ?/sec    9.96    294.4±0.75µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                    1.00     64.9±0.41µs        ? ?/sec    5.15    334.0±3.59µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                     1.00    122.9±1.00µs        ? ?/sec    4.12    506.2±2.53µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                      1.00     78.6±0.22µs        ? ?/sec    10.17   799.4±5.27µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                    1.00    174.5±0.50µs        ? ?/sec    5.14    896.9±2.68µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                     1.00    345.2±0.78µs        ? ?/sec    3.94  1358.9±17.31µs        ? ?/sec

@alamb
Contributor

alamb commented Feb 27, 2026

in_list_cols/Int32/list=28/match=0%/nulls=0%           1.00    167.4±4.90µs        ? ?/sec    8.64  1446.7±11.10µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%          1.00    191.5±5.05µs        ? ?/sec    16.27     3.1±0.04ms        ? ?/sec

Those are some pretty crazy improvements for Int32/Utf8 -- nice work

I wonder why the others don't show a similar improvement

Contributor

@adriangb adriangb left a comment

I'm also noticing there's quite a bit of variability in some of these benchmarks:

in_list/Float32/list=28/nulls=20%                      1.49     77.0±0.92µs        ? ?/sec    1.00     51.6±0.20µs        ? ?/sec

I think this is noise but it's quite a bit of noise. I think these would be good candidates for Codspeed.

That said I feel like this change makes a lot of sense and the numbers generally look good - I propose we go ahead and merge it.

@zhangxffff
Contributor Author

> in_list_cols/Int32/list=28/match=0%/nulls=0%           1.00    167.4±4.90µs        ? ?/sec    8.64  1446.7±11.10µs        ? ?/sec
> in_list_cols/Int32/list=28/match=0%/nulls=20%          1.00    191.5±5.05µs        ? ?/sec    16.27     3.1±0.04ms        ? ?/sec
>
> Those are some pretty crazy improvements for Int32/Utf8 -- nice work
>
> I wonder why the others don't show a similar improvement

This patch only optimizes the IN list path that has no static filter (the column-reference path), which is what the in_list_cols benchmarks exercise.

An IN list with a static filter is matched through a hash set, and that path is unchanged in this patch.

The in_list benchmarks therefore show no improvement; their differences are run-to-run variation.
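
A rough sketch of the distinction described above, in plain Rust with no Arrow or DataFusion types. The function names `in_constant_list` and `in_column_list` are hypothetical, NULL handling is left out, and the code is only meant to show why the two cases take different paths: a constant list can be hashed once and probed per row, while a list of column references has different values on every row and so has to be compared column by column.

```rust
use std::collections::HashSet;

/// Constant IN list: build the set once, then probe it for every row.
/// Illustrative only; this is not DataFusion's static-filter code.
fn in_constant_list(values: &[i32], list: &[i32]) -> Vec<bool> {
    let set: HashSet<i32> = list.iter().copied().collect();
    values.iter().map(|v| set.contains(v)).collect()
}

/// Column-reference IN list: every list item is itself a column, so row `i`
/// of `values` is compared with row `i` of each list column.
/// Assumes all columns have the same length as `values`.
fn in_column_list(values: &[i32], list_columns: &[Vec<i32>]) -> Vec<bool> {
    let mut result = vec![false; values.len()];
    for column in list_columns {
        // One whole-column equality pass, OR-ed into the accumulator.
        for ((hit, v), c) in result.iter_mut().zip(values).zip(column) {
            *hit |= v == c;
        }
    }
    result
}
```

Only the second shape is touched by this PR, which is why the `in_list` benchmarks (constant lists) stay flat.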

@adriangb adriangb added this pull request to the merge queue Feb 28, 2026
Merged via the queue into apache:main with commit acec058 Feb 28, 2026
32 checks passed
@alamb
Contributor

alamb commented Mar 1, 2026

Thanks again @zhangxffff and @adriangb -- very nice

@alamb alamb added the performance Make DataFusion faster label Mar 1, 2026
github-merge-queue Bot pushed a commit that referenced this pull request Mar 5, 2026
…es (#20694)

## Which issue does this PR close?

- Closes #20428 .

## Rationale for this change

Third PR in the IN list optimization series (split from #20428):
- PR1: benchmarks (#20444, merged)
- PR2: Arrow vectorized eq kernel (#20528, merged)
- **PR3 (this): short-circuit, collect_bool, and first-expr
initialization**

## What changes are included in this PR?
- **Short-circuit break**: convert `try_fold` to `for` loop; when all
non-null rows are already `true`, skip remaining list items (up to 27x
faster for match=100%/nulls=0%)
- **`BooleanBuffer::collect_bool`**: use in `make_comparator` fallback
path for nested types instead of `(0..n).map().collect()` (suggested by
@Dandandan in #20428)
- **First-expr initialization**: evaluate the first list expression
directly as the accumulator, avoiding a redundant `or_kleene(all_false,
rhs)` (suggested by @Dandandan in #20428)
- **Tests**: added 3 new tests covering short-circuit, short-circuit
with nulls, and struct column references (make_comparator fallback path)
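
The short-circuit and first-expr ideas in the list above can be sketched in plain Rust, modeling a nullable boolean column as `Vec<Option<bool>>` rather than using Arrow arrays. Everything here is hypothetical illustration: `eq_column` and `in_list_columns` are made-up names, the local `or_kleene` only mimics SQL three-valued OR, and the break condition used is the conservative "every row is already TRUE", which may differ from the exact check in the PR.

```rust
/// SQL three-valued OR: TRUE wins over NULL, NULL wins over FALSE.
fn or_kleene(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Row-wise equality where a NULL on either side yields NULL.
fn eq_column(lhs: &[Option<i32>], rhs: &[Option<i32>]) -> Vec<Option<bool>> {
    lhs.iter()
        .zip(rhs)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(l == r),
            _ => None,
        })
        .collect()
}

/// Evaluates `value IN (list[0], list[1], ...)` row by row and also returns
/// how many list columns were actually compared.
fn in_list_columns(
    value: &[Option<i32>],
    list: &[Vec<Option<i32>>],
) -> (Vec<Option<bool>>, usize) {
    let Some((first, rest)) = list.split_first() else {
        return (vec![Some(false); value.len()], 0);
    };
    // First-expr initialization: the first comparison is the accumulator,
    // so there is no OR against an all-false starting column.
    let mut acc = eq_column(value, first);
    let mut compared = 1;
    for column in rest {
        // Short-circuit: once every row is TRUE, later items cannot change it.
        if acc.iter().all(|r| *r == Some(true)) {
            break;
        }
        let rhs = eq_column(value, column);
        for (a, r) in acc.iter_mut().zip(rhs) {
            *a = or_kleene(*a, r);
        }
        compared += 1;
    }
    (acc, compared)
}
```

With this break condition, a NULL row keeps the loop running, which is consistent with the benchmark rows at nulls=20% showing no short-circuit gain.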

## Are these changes tested? 
Yes. New tests cover short-circuit, short-circuit with nulls, and
struct column references (make_comparator fallback path).
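
For the `collect_bool` change on the fallback path, the general idea of building a bitmap from an index-to-bool closure can be sketched as below. This is a plain-Rust approximation, not Arrow's implementation: `collect_bool_words` is a hypothetical name, and the LSB-first layout in `u64` words is an assumption made for the illustration.

```rust
/// Packs `len` booleans produced by `f` into 64-bit words, with bit `i % 64`
/// of word `i / 64` holding the value for index `i`.
fn collect_bool_words(len: usize, mut f: impl FnMut(usize) -> bool) -> Vec<u64> {
    let mut words = Vec::with_capacity((len + 63) / 64);
    for start in (0..len).step_by(64) {
        let end = (start + 64).min(len);
        let mut word = 0u64;
        // Fill one whole word in a register before writing it out once.
        for i in start..end {
            word |= (f(i) as u64) << (i - start);
        }
        words.push(word);
    }
    words
}
```

Writing a full word at a time avoids per-bit bookkeeping on the output buffer, which is the kind of overhead an iterator-based `collect()` of individual booleans tends to carry.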

Benchmark result:
```
(zhangxffff) zhangxffff/datafusion@95d3d60664da ~/W/datafusion ((bcc52cd))> critcmp after before
group                                              after                                  before
-----                                              -----                                  ------
in_list_cols/Int32/list=28/match=0%/nulls=0%       1.02     93.8±1.80µs        ? ?/sec    1.00     91.8±1.52µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%      1.03    105.3±1.95µs        ? ?/sec    1.00    102.2±1.59µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%     1.00      3.4±0.07µs        ? ?/sec    27.14    91.7±1.52µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%    1.07    107.7±1.91µs        ? ?/sec    1.00    100.4±1.33µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%      1.00     50.1±1.15µs        ? ?/sec    1.84     92.4±1.36µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%     1.05    105.1±1.49µs        ? ?/sec    1.00    100.0±0.84µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%        1.00      9.9±0.17µs        ? ?/sec    1.01     10.1±0.19µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%       1.02     11.0±0.18µs        ? ?/sec    1.00     10.8±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%      1.00      3.3±0.06µs        ? ?/sec    2.95      9.9±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%     1.01     10.9±0.19µs        ? ?/sec    1.00     10.8±0.09µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%       1.00     10.0±0.17µs        ? ?/sec    1.00      9.9±0.18µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%      1.05     11.3±0.24µs        ? ?/sec    1.00     10.8±0.11µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%        1.02     26.7±0.58µs        ? ?/sec    1.00     26.2±0.50µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%       1.04     29.6±0.57µs        ? ?/sec    1.00     28.5±0.45µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%      1.00      3.4±0.05µs        ? ?/sec    7.78     26.2±0.36µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%     1.05     30.0±0.65µs        ? ?/sec    1.00     28.7±0.55µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%       1.03     26.7±0.59µs        ? ?/sec    1.00     26.0±0.37µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%      1.04     29.9±0.57µs        ? ?/sec    1.00     28.7±0.46µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                 1.17    155.0±2.44µs        ? ?/sec    1.00    132.8±2.97µs        ? ?/sec
in_list_cols/Utf8/list=28/match=100%               1.02   726.6±14.54µs        ? ?/sec    1.00    712.4±9.09µs        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                1.02  1070.1±13.06µs        ? ?/sec    1.00   1051.8±8.17µs        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                  1.14     16.4±0.37µs        ? ?/sec    1.00     14.4±0.22µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                1.02     68.0±1.29µs        ? ?/sec    1.00     66.5±0.99µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                 1.15    107.6±2.05µs        ? ?/sec    1.00     93.6±1.88µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                  1.16     44.0±0.61µs        ? ?/sec    1.00     37.9±0.95µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                1.00    190.4±2.71µs        ? ?/sec    1.03    195.7±2.01µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                 1.03    295.9±4.45µs        ? ?/sec    1.00    287.3±3.26µs        ? ?/sec
```
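
As a reading aid for the table above: in this comparison format, the number in front of each timing is that side's mean divided by the faster side's mean, so the faster side always shows 1.00. A small sketch of that arithmetic follows; `relative_ratios` is a hypothetical helper, and because the printed means are rounded, recomputing from them only approximates the reported ratio (for example 27.14 for `list=28/match=100%/nulls=0%`).

```rust
/// Returns (ratio_a, ratio_b), each mean divided by the smaller of the two,
/// so the faster side is exactly 1.0.
fn relative_ratios(mean_a_us: f64, mean_b_us: f64) -> (f64, f64) {
    let fastest = mean_a_us.min(mean_b_us);
    (mean_a_us / fastest, mean_b_us / fastest)
}
```

For the row mentioned above, `relative_ratios(3.4, 91.7)` gives roughly `(1.0, 26.97)`, close to the reported 27.14 given the rounding of the displayed means.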

## Are there any user-facing changes?

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
…es (apache#20528)

## Which issue does this PR close?

- Relates to apache#20427 .

## Rationale for this change

When the IN list contains column references (e.g. `SELECT * FROM t WHERE
a IN (b, c, d, e)`), DataFusion falls back to a row-by-row
`make_comparator` path which is significantly slower than it needs to
be. Arrow provides SIMD-optimized `eq` kernels that can compare entire
arrays in one call.
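
To illustrate the rationale above, here is a minimal sketch in plain Rust with no Arrow dependency. `make_row_comparator`, `eq_row_by_row`, and `eq_whole_array` are hypothetical names: the boxed closure only loosely models a `make_comparator`-style per-row callback, and the slice loop stands in for a vectorized `eq` kernel. Both assume equal-length inputs without NULLs.

```rust
use std::cmp::Ordering;

/// Row-by-row path: a boxed comparator is called once per row, which means
/// a dynamic dispatch and bounds checks for every element.
fn make_row_comparator(
    left: Vec<i32>,
    right: Vec<i32>,
) -> Box<dyn Fn(usize, usize) -> Ordering> {
    Box::new(move |i, j| left[i].cmp(&right[j]))
}

fn eq_row_by_row(left: &[i32], right: &[i32]) -> Vec<bool> {
    let cmp = make_row_comparator(left.to_vec(), right.to_vec());
    (0..left.len())
        .map(|i| cmp(i, i) == Ordering::Equal)
        .collect()
}

/// Whole-array path: one tight loop over both slices, a shape the compiler
/// can auto-vectorize and that SIMD kernels are written for.
fn eq_whole_array(left: &[i32], right: &[i32]) -> Vec<bool> {
    left.iter().zip(right).map(|(l, r)| l == r).collect()
}
```

Both functions return the same mask; the difference lies in how much work is done per element, which is where the speedup for non-nested types comes from.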

## What changes are included in this PR?

- Use Arrow's vectorized `eq` kernel instead of row-by-row
`make_comparator` for non-nested types (primitive, string, binary) in
the column-reference IN list evaluation path
- For nested types (Struct, List, etc.), fall back to `make_comparator`
since Arrow's `eq` kernel does not support them
- Add 6 unit tests covering the column-reference evaluation path (Int32,
Utf8, NOT IN, NULL handling, NaN semantics)

## Are these changes tested?

  Yes. 6 new unit tests added:
  - `test_in_list_with_columns_int32_scalars`
  - `test_in_list_with_columns_int32_column_refs`
  - `test_in_list_with_columns_utf8_column_refs`
  - `test_in_list_with_columns_negated`
  - `test_in_list_with_columns_null_in_list`
  - `test_in_list_with_columns_float_nan`


## Are there any user-facing changes?

No API changes. Queries with column-reference IN lists will run faster.

de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
…es (apache#20694)

## Which issue does this PR close?

- Closes apache#20428.

## Rationale for this change

Third PR in the IN list optimization series (split from apache#20428):
- PR1: benchmarks (apache#20444, merged)
- PR2: Arrow vectorized eq kernel (apache#20528, merged)
- **PR3 (this): short-circuit, collect_bool, and first-expr
initialization**

## What changes are included in this PR?
- **Short-circuit break**: convert `try_fold` to `for` loop; when all
non-null rows are already `true`, skip remaining list items (up to 27x
faster for match=100%/nulls=0%)
- **`BooleanBuffer::collect_bool`**: use it in the `make_comparator`
fallback path for nested types instead of `(0..n).map().collect()`
(suggested by @Dandandan in apache#20428)
- **First-expr initialization**: evaluate the first list expression
directly as the accumulator, avoiding a redundant `or_kleene(all_false,
rhs)` (suggested by @Dandandan in apache#20428)
- **Tests**: added 3 new tests covering short-circuit, short-circuit
with nulls, and struct column references (make_comparator fallback path)
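
The first-expr initialization and short-circuit ideas above can be sketched in plain Rust. This is a simplified model, not the PR's code: arrays are `&[Option<i32>]` slices, `eq_kleene` and `in_list_short_circuit` are hypothetical names, and the break condition used here (every row is already `Some(true)`) is a conservative assumption rather than the exact check in DataFusion. The second return value counts how many list columns were actually compared.

```rust
/// SQL three-valued equality over two columns (NULL if either side is NULL).
fn eq_kleene(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<bool>> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| match (x, y) {
            (Some(x), Some(y)) => Some(x == y),
            _ => None,
        })
        .collect()
}

/// Returns the IN-list result and the number of list columns evaluated.
fn in_list_short_circuit(
    needle: &[Option<i32>],
    list: &[Vec<Option<i32>>],
) -> (Vec<Option<bool>>, usize) {
    // Assumption for this sketch: an empty list matches nothing.
    let Some((first, rest)) = list.split_first() else {
        return (vec![Some(false); needle.len()], 0);
    };
    // First-expr initialization: the first comparison *is* the accumulator,
    // so there is no OR against an all-false array.
    let mut acc = eq_kleene(needle, first);
    let mut evaluated = 1;
    for col in rest {
        // Short-circuit: once every row is true, later columns cannot
        // change the result under Kleene OR.
        if acc.iter().all(|v| *v == Some(true)) {
            break;
        }
        let rhs = eq_kleene(needle, col);
        for (a, r) in acc.iter_mut().zip(rhs) {
            *a = match (*a, r) {
                (Some(true), _) | (_, Some(true)) => Some(true),
                (Some(false), Some(false)) => Some(false),
                _ => None,
            };
        }
        evaluated += 1;
    }
    (acc, evaluated)
}
```

With a fully matching first column the loop exits after one comparison, which mirrors the large win in the `match=100%/nulls=0%` benchmark rows. A NULL needle row never becomes `true`, so in this sketch all columns are still evaluated when nulls are present.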

## Are these changes tested?
Yes. Added tests covering short-circuit, short-circuit with nulls, and
struct column references (the `make_comparator` fallback path).

Benchmark result:
```
(zhangxffff) zhangxffff/datafusion@95d3d60664da ~/W/datafusion ((bcc52cd))> critcmp after before
group                                              after                                  before
-----                                              -----                                  ------
in_list_cols/Int32/list=28/match=0%/nulls=0%       1.02     93.8±1.80µs        ? ?/sec    1.00     91.8±1.52µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%      1.03    105.3±1.95µs        ? ?/sec    1.00    102.2±1.59µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%     1.00      3.4±0.07µs        ? ?/sec    27.14    91.7±1.52µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%    1.07    107.7±1.91µs        ? ?/sec    1.00    100.4±1.33µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%      1.00     50.1±1.15µs        ? ?/sec    1.84     92.4±1.36µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%     1.05    105.1±1.49µs        ? ?/sec    1.00    100.0±0.84µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%        1.00      9.9±0.17µs        ? ?/sec    1.01     10.1±0.19µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%       1.02     11.0±0.18µs        ? ?/sec    1.00     10.8±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%      1.00      3.3±0.06µs        ? ?/sec    2.95      9.9±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%     1.01     10.9±0.19µs        ? ?/sec    1.00     10.8±0.09µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%       1.00     10.0±0.17µs        ? ?/sec    1.00      9.9±0.18µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%      1.05     11.3±0.24µs        ? ?/sec    1.00     10.8±0.11µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%        1.02     26.7±0.58µs        ? ?/sec    1.00     26.2±0.50µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%       1.04     29.6±0.57µs        ? ?/sec    1.00     28.5±0.45µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%      1.00      3.4±0.05µs        ? ?/sec    7.78     26.2±0.36µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%     1.05     30.0±0.65µs        ? ?/sec    1.00     28.7±0.55µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%       1.03     26.7±0.59µs        ? ?/sec    1.00     26.0±0.37µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%      1.04     29.9±0.57µs        ? ?/sec    1.00     28.7±0.46µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                 1.17    155.0±2.44µs        ? ?/sec    1.00    132.8±2.97µs        ? ?/sec
in_list_cols/Utf8/list=28/match=100%               1.02   726.6±14.54µs        ? ?/sec    1.00    712.4±9.09µs        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                1.02  1070.1±13.06µs        ? ?/sec    1.00   1051.8±8.17µs        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                  1.14     16.4±0.37µs        ? ?/sec    1.00     14.4±0.22µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                1.02     68.0±1.29µs        ? ?/sec    1.00     66.5±0.99µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                 1.15    107.6±2.05µs        ? ?/sec    1.00     93.6±1.88µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                  1.16     44.0±0.61µs        ? ?/sec    1.00     37.9±0.95µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                1.00    190.4±2.71µs        ? ?/sec    1.03    195.7±2.01µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                 1.03    295.9±4.45µs        ? ?/sec    1.00    287.3±3.26µs        ? ?/sec
```

## Are there any user-facing changes?

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Labels

performance Make DataFusion faster physical-expr Changes to the physical-expr crates

5 participants