fix(fsst): pick i32 vs i64 codes offsets per call #7836
connortsui20 wants to merge 5 commits into develop from claude/move-fsst-regression-test-7724L
Conversation
Polar Signals Profiling Results: Latest Run
Merging this PR will degrade performance by 25%.

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | new_bp_prim_test_between[i32, 32768] | 141.1 µs | 170.7 µs | -17.31% |
| ❌ | Simulation | new_bp_prim_test_between[i32, 16384] | 94.8 µs | 109.9 µs | -13.73% |
| ❌ | Simulation | new_bp_prim_test_between[i16, 32768] | 120.5 µs | 134.8 µs | -10.63% |
| ❌ | Simulation | new_bp_prim_test_between[i64, 16384] | 115.3 µs | 145.3 µs | -20.67% |
| ❌ | Simulation | new_bp_prim_test_between[i64, 32768] | 178.1 µs | 237.5 µs | -25% |
| ❌ | Simulation | new_alp_prim_test_between[f64, 16384] | 127.1 µs | 149.7 µs | -15.1% |
Comparing claude/move-fsst-regression-test-7724L (e618376) with develop (d0d9a8b)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.022x ➖
datafusion / vortex-file-compressed (1.022x ➖, 0↑ 1↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.067x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.051x ➖, 0↑ 0↓)
datafusion / parquet (1.058x ➖, 0↑ 2↓)
datafusion / arrow (1.083x ➖, 0↑ 7↓)
duckdb / vortex-file-compressed (1.085x ➖, 0↑ 8↓)
duckdb / vortex-compact (1.065x ➖, 0↑ 3↓)
duckdb / parquet (1.041x ➖, 1↑ 3↓)
duckdb / duckdb (1.050x ➖, 0↑ 2↓)
Full attributed analysis
File Sizes: TPC-H SF=1 on NVME
File Size Changes (195 files changed, -98.4% overall, 0↑ 195↓)
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.014x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.974x ➖, 1↑ 0↓)
datafusion / parquet (0.986x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.979x ➖, 1↑ 1↓)
duckdb / vortex-compact (0.994x ➖, 1↑ 0↓)
duckdb / parquet (1.014x ➖, 0↑ 1↓)
Full attributed analysis
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.007x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.000x ➖, 2↑ 1↓)
datafusion / parquet (0.987x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.008x ➖, 4↑ 3↓)
duckdb / vortex-compact (1.006x ➖, 1↑ 3↓)
duckdb / parquet (1.009x ➖, 0↑ 0↓)
duckdb / duckdb (1.010x ➖, 0↑ 2↓)
Full attributed analysis
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.093x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.926x ➖, 1↑ 0↓)
datafusion / parquet (1.008x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.977x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.966x ➖, 0↑ 0↓)
duckdb / parquet (1.004x ➖, 0↑ 0↓)
Full attributed analysis
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.060x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.030x ➖, 0↑ 1↓)
duckdb / parquet (1.025x ➖, 0↑ 0↓)
Full attributed analysis
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.028x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.029x ➖, 0↑ 0↓)
datafusion / parquet (1.038x ➖, 0↑ 1↓)
datafusion / arrow (1.038x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.043x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.032x ➖, 0↑ 0↓)
duckdb / parquet (1.024x ➖, 0↑ 0↓)
duckdb / duckdb (1.024x ➖, 0↑ 0↓)
Full attributed analysis
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: Random Access
Vortex (geomean): 0.981x ➖
unknown / unknown (0.995x ➖, 2↑ 0↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.074x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.023x ➖, 0↑ 1↓)
datafusion / parquet (0.992x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.027x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.024x ➖, 0↑ 0↓)
duckdb / parquet (1.034x ➖, 0↑ 0↓)
Full attributed analysis
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.087x ➖, 1↑ 22↓)
datafusion / parquet (1.141x ❌, 0↑ 29↓)
duckdb / vortex-file-compressed (1.048x ➖, 1↑ 12↓)
duckdb / parquet (1.079x ➖, 0↑ 15↓)
duckdb / duckdb (1.039x ➖, 0↑ 8↓)
Full attributed analysis
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: Compression
Vortex (geomean): 1.006x ➖
unknown / unknown (0.976x ➖, 10↑ 5↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.043x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.982x ➖, 1↑ 2↓)
datafusion / parquet (1.037x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (1.020x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.032x ➖, 0↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
Full attributed analysis
@mprammer seems to work well.
Adds an `#[ignore]`d regression test for #7833 to the existing `encodings/fsst/src/tests.rs`. The test allocates ~5 GiB total, so it is opt-in via `--ignored`:

`cargo test --release -p vortex-fsst -- --ignored fsst_compress_offsets`

This is an alternative to #7832 that keeps the test alongside the other FSST tests instead of introducing a new module, and avoids the `test-with` dev-dependency.

Signed-off-by: Claude <noreply@anthropic.com>
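For readers who want the shape of such an opt-in test, here is a minimal sketch. The input sizes and the commented-out `fsst_compress` call are illustrative assumptions, not the PR's actual test body.

```rust
// Minimal sketch of an opt-in regression test; sizes and the commented-out
// fsst_compress call are assumptions rather than the PR's real code.
#[test]
#[ignore = "allocates ~5 GiB; run with `cargo test --release -- --ignored`"]
fn fsst_compress_offsets_overflow_i32() {
    // Build enough input that the cumulative output bytes would cross i32::MAX
    // if the codes offsets were stored as i32.
    let chunk = vec![b'a'; 16 * 1024 * 1024]; // 16 MiB per value
    let values: Vec<&[u8]> = std::iter::repeat(chunk.as_slice()).take(320).collect(); // ~5 GiB

    // Hypothetical call: compressing this input should no longer panic.
    // let compressed = fsst_compress(values.iter().copied(), values.len(), ...);

    assert!(values.iter().map(|v| v.len()).sum::<usize>() > i32::MAX as usize);
}
```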
`fsst_compress_iter` previously hardcoded `VarBinBuilder::<i32>` for the FSST output, panicking once cumulative compressed bytes crossed `i32::MAX`. Switch to `VarBinBuilder::<i64>` so large inputs compress without overflow. The `FSSTMetadata.codes_offsets_ptype` field already records the offset PType, so existing serialized arrays continue to deserialize unchanged.

Widening exposed a latent bug in `VarBin::compare`: with i64 offsets, the LHS is converted to Arrow `LargeBinary`/`LargeUtf8` (per `preferred_arrow_type`), but the RHS scalar was hardcoded to `Binary`/`Utf8`. Arrow refuses `LargeBinary == Binary`. The RHS now picks the matching Arrow type from the LHS Datum.

The previously-ignored regression test `fsst_compress_offsets_overflow_i32` now passes when run with `--ignored`. It still allocates ~5 GiB and stays `#[ignore]`d.

Signed-off-by: Claude <noreply@anthropic.com>
Force-pushed from 2cf4232 to b9becd9.
Can we please edit these descriptions?
`fsst_compress` now does an upfront pass over the input to compute the total uncompressed byte count, then picks `VarBinBuilder<i32>` when the FSST upper-bound compressed size (`2 * total + 7 * n`) fits in `i32::MAX`, falling back to `VarBinBuilder<i64>` only for inputs that would actually overflow (the case from #7833). This restores the compact i32 layout for the common case while still avoiding the overflow.

`fsst_compress_iter` (the single-pass, iterator-only API) keeps its signature and now always uses i64, since it can't size-estimate without consuming the iterator. Direct callers are test-only.

Replace the previous ~5 GiB `#[ignore]`d regression test with three boundary unit tests of the new `upper_bound_fits_i32` helper plus a small round-trip test that asserts a fresh FSST array keeps i32 `codes_offsets` for typical inputs.

Signed-off-by: Claude <noreply@anthropic.com>
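A minimal sketch of the dispatch rule this commit describes, assuming a helper with the following shape; only the `2 * total + 7 * n` bound comes from the PR text, the name placement and signature are assumptions.

```rust
/// Sketch: does the FSST worst-case output size (2 * total + 7 * n) fit in i32?
/// The signature is an assumption; only the bound itself comes from the PR text.
fn upper_bound_fits_i32(total_uncompressed: usize, n_values: usize) -> bool {
    total_uncompressed
        .checked_mul(2)
        .and_then(|doubled| n_values.checked_mul(7).and_then(|ovh| doubled.checked_add(ovh)))
        .is_some_and(|upper_bound| upper_bound <= i32::MAX as usize)
}
```

Under that rule, `fsst_compress` would use `VarBinBuilder<i32>` when the helper returns true and fall back to `VarBinBuilder<i64>` otherwise.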
So this new approach just collects the sum of the lengths of the strings to figure out which builder to use. I am happy to clean this up later, but @mprammer, since this is your issue, feel free to pick this branch up. It seems to work well, but I don't have time to dig into this.
// guaranteed to fit; otherwise we widen to `i64` to avoid the overflow
// tracked in #7833. The upfront pass over the iterator is cheap
// relative to the compression pass.
let total_uncompressed =
Is the signature wrong?
Would changing the method sig help here?
Per review: callers virtually always already know the total uncompressed byte count (e.g. `varbin.bytes().len()` for `VarBinArray`, `views().iter().map(|v| v.len()).sum()` for `VarBinViewArray`), so the extra upfront iterator pass inside `fsst_compress` was wasted work.

`fsst_compress` and `fsst_compress_iter` now take `total_uncompressed: usize` after `len`, and use it to dispatch i32 vs i64 codes offsets via the same `upper_bound_fits_i32` rule. All in-tree callers updated. Public API lock files refreshed.

Signed-off-by: Claude <noreply@anthropic.com>
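As a sketch of the caller-side pattern this commit describes; the exact parameter order of `fsst_compress` and the compressor argument are assumptions here, so the call is left as a comment.

```rust
// Callers typically already hold the total uncompressed byte count, so passing
// it in avoids an extra pass over the data before compression.
fn compress_values(values: &[&[u8]]) {
    let total_uncompressed: usize = values.iter().map(|v| v.len()).sum();

    // Hypothetical call shape: length and total byte count passed explicitly.
    // let array = fsst_compress(values.iter().copied(), values.len(), total_uncompressed, &compressor);
    let _ = total_uncompressed;
}
```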
The link triggered `rustdoc::private_intra_doc_links` and failed the docs CI check. Replaced with an inline description of the bound.

Signed-off-by: Claude <noreply@anthropic.com>
I don't think this is right; I think Claude doesn't really know what it is doing. I think that just always using i64 is really the best solution here, and as the benchmarks showed before, it had no effect on perf and no file changes.
Fixes #7833.
`fsst_compress_iter` previously hardcoded `VarBinBuilder::<i32>` for the FSST output, panicking once cumulative compressed bytes crossed `i32::MAX`.

Approach

`fsst_compress` now does an upfront pass over the input to compute the total uncompressed byte count, then picks `VarBinBuilder<i32>` when the FSST upper-bound compressed size (`2 * total + 7 * n`) fits in `i32::MAX`, falling back to `VarBinBuilder<i64>` only when it would actually overflow. The common case keeps the compact i32 layout; pathological inputs are handled correctly.
`fsst_compress_iter` (single-pass iterator API) keeps its public signature and now always uses i64. It can't size-estimate without consuming the iterator, and direct callers are test-only.
`vortex-array/src/arrays/varbin/compute/compare.rs`: a latent bug surfaced by the i64 path. With i64 offsets, `Datum::try_new` produces an Arrow `LargeBinary`/`LargeUtf8`, but the RHS scalar was hardcoded to `Binary`/`Utf8`. Arrow refuses `LargeBinary == Binary`. The RHS now picks the matching Arrow scalar from `lhs.data_type()`.
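A sketch of the type-matching idea using arrow-rs `DataType` directly; the actual Vortex helper in `compare.rs` and its `Datum` plumbing are not reproduced here.

```rust
use arrow_schema::DataType;

// Pick the Arrow type for the RHS scalar so it matches the LHS side, since
// Arrow's comparison kernels reject mixed widths such as LargeBinary == Binary.
fn rhs_scalar_type(lhs: &DataType) -> DataType {
    match lhs {
        DataType::LargeUtf8 => DataType::LargeUtf8,
        DataType::Utf8 => DataType::Utf8,
        DataType::LargeBinary => DataType::LargeBinary,
        // Fallback assumption: treat anything else as plain binary.
        _ => DataType::Binary,
    }
}
```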
Tests

The previous ~5 GiB `#[ignore]`d regression test is replaced with:
- boundary unit tests of the new `upper_bound_fits_i32` helper, and
- a small round-trip test that asserts a fresh FSST array keeps i32 `codes_offsets` for typical inputs.

The i64 path is exercised only for inputs whose worst-case compressed size exceeds `i32::MAX`, which is too expensive to test directly; the boundary unit tests cover the dispatch (sketched below).
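Boundary tests in this spirit might look like the following, assuming the `upper_bound_fits_i32(total_uncompressed, n_values)` shape sketched earlier; the PR's actual tests may differ.

```rust
#[test]
fn upper_bound_exactly_i32_max_fits() {
    // With n = 1, choose total so that 2 * total + 7 == i32::MAX exactly.
    let total = (i32::MAX as usize - 7) / 2;
    assert!(upper_bound_fits_i32(total, 1));
}

#[test]
fn upper_bound_one_past_i32_max_widens() {
    // One more byte pushes the worst case past i32::MAX, so i64 is required.
    let total = (i32::MAX as usize - 7) / 2 + 1;
    assert!(!upper_bound_fits_i32(total, 1));
}
```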
Checks

- `cargo test -p vortex-fsst --lib`: 83 passed (was 78 before this PR).
- `cargo test -p vortex-array --lib arrays::varbin`: 93 passed.
- `cargo test -p vortex-btrblocks --lib`: 35 passed.
- `cargo +nightly fmt -- --check` and `cargo clippy -p vortex-fsst -p vortex-array --all-targets --all-features`: clean.
- `./scripts/public-api.sh`: no public-api.lock changes.

🤖 Generated with Claude Code