Support `convert_to_state` for `AVG` accumulator by alamb · Pull Request #11734 · apache/datafusion

alamb · 2024-07-30T21:48:31Z

Note: There are ~20 lines of code in this PR; the rest is documentation and tests

Which issue does this PR close?

Closes #11816 (Improve performance of Avg aggregate: implement convert_to_state)
Follow-on to #11627 (Skipping partial aggregation when it is not helping for high cardinality aggregates) from @korowa

Rationale for this change

To take advantage of #11627, a new method (convert_to_state) must be implemented for each GroupsAccumulator.

At least one ClickBench query (the one in #6937) uses AVG so let's implement that

What changes are included in this PR?

Implement convert_to_state for AVG accumulator
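As a rough illustration of the idea (not the code in this PR), the sketch below models an AVG state as a (sum, count) pair and converts each raw input row straight into a state row, which is what lets the partial aggregation step be skipped. The names `AvgState`, `convert_to_state` and `merge_and_evaluate` are hypothetical stand-ins that use plain `Vec`s instead of Arrow arrays.

```rust
/// Intermediate AVG state for one row or one group: a running sum and a count.
/// (Hypothetical type; the real accumulator stores these as Arrow arrays.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    sum: f64,
    count: u64,
}

/// Hypothetical stand-in for `convert_to_state`: turn each raw input value
/// directly into a state row without grouping first.
/// A null input (None) contributes nothing, so it maps to a null state.
fn convert_to_state(values: &[Option<f64>]) -> Vec<Option<AvgState>> {
    values
        .iter()
        .map(|v| v.map(|x| AvgState { sum: x, count: 1 }))
        .collect()
}

/// Stand-in for the final aggregation step: merge state rows, then divide.
/// Returns None when every state row was null.
fn merge_and_evaluate(states: &[Option<AvgState>]) -> Option<f64> {
    let mut sum = 0.0;
    let mut count = 0u64;
    for s in states.iter().flatten() {
        sum += s.sum;
        count += s.count;
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

Merging the converted rows gives the same average as accumulating the raw values, which is the property the final aggregation relies on when the partial step is bypassed.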

Are these changes tested?

Yes with new unit tests

Performance benchmarks:

Clickbench on the whole looks better

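┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_support_avg ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 28    │ 14589.57ms │        15650.15ms │  1.07x slower │
│ QQuery 31    │  1648.84ms │         1351.42ms │ +1.22x faster │
│ QQuery 32    │  7674.18ms │         4546.37ms │ +1.69x faster │
└──────────────┴────────────┴───────────────────┴───────────────┘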
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 88225.75ms │
│ Total Time (alamb_support_avg)   │ 85177.63ms │
│ Average Time (main_base)         │  2051.76ms │
│ Average Time (alamb_support_avg) │  1980.88ms │
│ Queries Faster                   │          2 │
│ Queries Slower                   │          1 │
│ Queries with No Change           │         40 │
└──────────────────────────────────┴────────────┘

Interestingly my benchmark shows Q31 and Q32 get faster (they both have avg):

datafusion/benchmarks/queries/clickbench/queries.sql

Lines 32 to 33 in 4e278ca

    
    SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits WHERE "SearchPhrase" <> '' GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
    SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;

But Q28 gets slower

datafusion/benchmarks/queries/clickbench/queries.sql

Line 29 in 4e278ca

    
    SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

Details

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_support_avg ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.65ms │            0.66ms │     no change │
│ QQuery 1     │    68.39ms │           69.21ms │     no change │
│ QQuery 2     │   123.19ms │          124.32ms │     no change │
│ QQuery 3     │   130.87ms │          129.52ms │     no change │
│ QQuery 4     │   975.14ms │          956.34ms │     no change │
│ QQuery 5     │  1073.94ms │         1051.31ms │     no change │
│ QQuery 6     │    65.20ms │           65.15ms │     no change │
│ QQuery 7     │    72.73ms │           74.63ms │     no change │
│ QQuery 8     │  1442.03ms │         1426.67ms │     no change │
│ QQuery 9     │  1359.96ms │         1341.42ms │     no change │
│ QQuery 10    │   453.35ms │          451.60ms │     no change │
│ QQuery 11    │   491.08ms │          487.39ms │     no change │
│ QQuery 12    │  1174.58ms │         1163.25ms │     no change │
│ QQuery 13    │  2175.10ms │         2105.75ms │     no change │
│ QQuery 14    │  1613.52ms │         1591.44ms │     no change │
│ QQuery 15    │  1100.45ms │         1090.39ms │     no change │
│ QQuery 16    │  2887.87ms │         2882.07ms │     no change │
│ QQuery 17    │  2821.88ms │         2801.39ms │     no change │
│ QQuery 18    │  5627.08ms │         5535.29ms │     no change │
│ QQuery 19    │   118.97ms │          118.65ms │     no change │
│ QQuery 20    │  1686.34ms │         1653.84ms │     no change │
│ QQuery 21    │  1998.87ms │         2017.09ms │     no change │
│ QQuery 22    │  4835.67ms │         4795.01ms │     no change │
│ QQuery 23    │ 11438.22ms │        11139.93ms │     no change │
│ QQuery 24    │   750.23ms │          754.42ms │     no change │
│ QQuery 25    │   671.80ms │          671.75ms │     no change │
│ QQuery 26    │   827.26ms │          834.02ms │     no change │
│ QQuery 27    │  2530.97ms │         2520.82ms │     no change │
│ QQuery 28    │ 14589.57ms │        15650.15ms │  1.07x slower │
│ QQuery 29    │   573.58ms │          562.79ms │     no change │
│ QQuery 30    │  1299.57ms │         1296.16ms │     no change │
│ QQuery 31    │  1648.84ms │         1351.42ms │ +1.22x faster │
│ QQuery 32    │  7674.18ms │         4546.37ms │ +1.69x faster │
│ QQuery 33    │  5072.36ms │         5086.50ms │     no change │
│ QQuery 34    │  5030.77ms │         5034.60ms │     no change │
│ QQuery 35    │  1854.58ms │         1829.27ms │     no change │
│ QQuery 36    │   320.08ms │          314.68ms │     no change │
│ QQuery 37    │   218.66ms │          218.22ms │     no change │
│ QQuery 38    │   189.78ms │          196.34ms │     no change │
│ QQuery 39    │   977.91ms │          977.56ms │     no change │
│ QQuery 40    │    85.17ms │           85.73ms │     no change │
│ QQuery 41    │    79.63ms │           78.92ms │     no change │
│ QQuery 42    │    95.74ms │           95.59ms │     no change │
└──────────────┴────────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 88225.75ms │
│ Total Time (alamb_support_avg)   │ 85177.63ms │
│ Average Time (main_base)         │  2051.76ms │
│ Average Time (alamb_support_avg) │  1980.88ms │
│ Queries Faster                   │          2 │
│ Queries Slower                   │          1 │
│ Queries with No Change           │         40 │
└──────────────────────────────────┴────────────┘

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_support_avg ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  247.69ms │          229.35ms │ +1.08x faster │
│ QQuery 2     │  126.01ms │          126.86ms │     no change │
│ QQuery 3     │  125.02ms │          129.29ms │     no change │
│ QQuery 4     │   94.59ms │           88.65ms │ +1.07x faster │
│ QQuery 5     │  172.20ms │          175.13ms │     no change │
│ QQuery 6     │   59.18ms │           59.55ms │     no change │
│ QQuery 7     │  204.27ms │          208.79ms │     no change │
│ QQuery 8     │  156.08ms │          163.42ms │     no change │
│ QQuery 9     │  253.29ms │          254.54ms │     no change │
│ QQuery 10    │  227.42ms │          232.69ms │     no change │
│ QQuery 11    │   99.08ms │           98.47ms │     no change │
│ QQuery 12    │  146.89ms │          138.62ms │ +1.06x faster │
│ QQuery 13    │  290.02ms │          291.07ms │     no change │
│ QQuery 14    │   82.66ms │           81.43ms │     no change │
│ QQuery 15    │  116.98ms │          135.97ms │  1.16x slower │
│ QQuery 16    │   88.83ms │           88.72ms │     no change │
│ QQuery 17    │  232.60ms │          218.48ms │ +1.06x faster │
│ QQuery 18    │  330.29ms │          330.22ms │     no change │
│ QQuery 19    │  149.17ms │          161.32ms │  1.08x slower │
│ QQuery 20    │  139.35ms │          137.98ms │     no change │
│ QQuery 21    │  280.84ms │          263.72ms │ +1.06x faster │
│ QQuery 22    │   65.21ms │           65.46ms │     no change │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)           │ 3687.65ms │
│ Total Time (alamb_support_avg)   │ 3679.71ms │
│ Average Time (main_base)         │  167.62ms │
│ Average Time (alamb_support_avg) │  167.26ms │
│ Queries Faster                   │         5 │
│ Queries Slower                   │         2 │
│ Queries with No Change           │        15 │
└──────────────────────────────────┴───────────┘

Are there any user-facing changes?

Faster performance

alamb · 2024-07-31T12:29:23Z

There is something strange going on with AVG in this query -- it is giving different answers when convert to state is enabled vs not. Maybe it is due to float rounding, but I am not confident

korowa · 2024-07-31T17:17:36Z

> There is something strange going on with AVG in this query -- it is giving different answers when convert to state is enabled vs not. Maybe it is due to float rounding, but I am not confident

Kind of expected -- result set is sorted by COUNT(*) which equals 2 for only 4 records, and 1 for all other records, so this ordering may be considered as nondeterministic.

I've got different results after two consecutive runs even on main branch (without any skipped aggregation).

alamb · 2024-07-31T17:33:38Z

I debugged this a bit more and I think the issue may be that AVG uses Float64 internally to accumulate the sum, and since this column has giant integers that can't fit into a float64 precisely, the order of operations affects the final output. I will update the test to use a column other than this giant int64 column I think
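To make the suspected effect concrete, here is a minimal sketch (not DataFusion code) that sums the same integers in two different orders using an f64 accumulator. The helper name `sum_as_f64` and the chosen values are illustrative assumptions; the only fact relied on is that f64 cannot represent every integer above 2^53.

```rust
/// Sum `values` left to right in f64, the way a float-based accumulator would.
/// (Hypothetical helper for illustration only.)
fn sum_as_f64(values: &[i64]) -> f64 {
    values.iter().fold(0.0, |acc, &v| acc + v as f64)
}

/// The same three integers, summed in two different orders.
/// 2^53 + 1 is not representable in f64, so adding 1 to 2^53 rounds back
/// down to 2^53, while adding 2 (= 1 + 1) to 2^53 is exact.
fn order_dependent_sums() -> (f64, f64) {
    let big = 1i64 << 53;
    let big_first = sum_as_f64(&[big, 1, 1]); // each +1 is lost to rounding
    let small_first = sum_as_f64(&[1, 1, big]); // 1 + 1 = 2 survives
    (big_first, small_first)
}
```

With these inputs the two sums differ by 2, so an average computed from them can differ depending on the order in which rows or partial states are combined.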

korowa · 2024-07-31T18:14:41Z

Oh, the AVG values themselves, got it.

alamb · 2024-08-05T18:45:10Z

This PR fails as follows when running the TPC-H benchmarks. I think there may be a bug related to types in the intermediates. I will keep debugging

Query 19 iteration 3 took 161.5 ms and returned 1 rows
Query 19 iteration 4 took 171.7 ms and returned 1 rows
Query 19 avg time: 164.29 ms
Error: External(ArrowError(InvalidArgumentError("column types must match schema types, expected Decimal128(25, 2) but found Decimal128(38, 10) at column index 2"), None))

Update: turns out it is q18:

(venv) andrewlamb@Andrews-MBP-2:~/Software/datafusion/benchmarks/data/tpch_sf1$ /Users/andrewlamb/Software/datafusion/datafusion-cli/target/debug/datafusion-cli -f ../../queries/q18.sql

Update: filed real issue here #11832

korowa · 2024-08-08T05:08:09Z

LGTM, thank you @alamb.

Regarding the q28 slowdown from the PR description -- I suppose it's not a stable slowdown, just the result of a single benchmark run (since the regexp over the "Referer" field in the query doesn't seem to produce high enough cardinality to skip partial aggregation)?

alamb · 2024-08-08T11:22:14Z

> LGTM, thank you @alamb.
>
> Regarding the q28 slowdown from the PR description -- I suppose it's not a stable slowdown, just the result of a single benchmark run (since the regexp over the "Referer" field in the query doesn't seem to produce high enough cardinality to skip partial aggregation)?

That is my understanding too. I also have high hopes that the StringView work will make that query in particular faster as well

2010YOUY01

This looks great, thank you

2010YOUY01 · 2024-08-09T13:34:36Z

+/// │false│    │      │NULL │            │NULL │
+/// │false│           │true │            │true │
+/// └─────┘           └─────┘            └─────┘
+/// array           opt_filter           output nulls


Looks like output nulls has a typo; should it be false; true; false; false; false?

Yes you are correct -- thank you for catching that. I fixed it in 149406b
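For readers without the full diagram, the rule it documents can be sketched as follows: a row is kept as valid in the output only when the input value is non-null and the filter value is true, so a false or NULL filter entry turns the row into a null. This is a simplified model using slices of `bool` rather than Arrow buffers, and the function name `filtered_validity` is a hypothetical stand-in, not the helper added in nulls.rs.

```rust
/// Hypothetical stand-in for the null-mask helper discussed above.
/// `input_valid[i]` is true when row i of the input array is non-null.
/// `opt_filter[i]` is Some(true) / Some(false), or None for a NULL filter value.
/// A row is valid in the output only if the input is non-null AND the
/// filter is Some(true). With no filter at all, the input validity is kept.
fn filtered_validity(input_valid: &[bool], opt_filter: Option<&[Option<bool>]>) -> Vec<bool> {
    match opt_filter {
        None => input_valid.to_vec(),
        Some(filter) => input_valid
            .iter()
            .zip(filter)
            .map(|(&valid, &f)| valid && f == Some(true))
            .collect(),
    }
}
```

For example, input validity `[true, true, false, true]` combined with filter `[true, false, true, NULL]` yields `[true, false, false, false]` under this model.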

comphead

lgtm thanks @alamb

comphead · 2024-08-12T17:40:35Z

 4 11 14
 5 8 7

+# Test avg for tinyint / float


just wondering why the test is only for tinyint / floats?

The idea was that the AVG accumulator is already tested elsewhere -- this test is only to exercise the partial aggregate skipping logic

alamb · 2024-08-12T21:33:50Z

Thanks @comphead -- we are making progress here slowly -- but I am pretty stoked to see it 🚀

Dandandan · 2024-08-13T08:18:39Z

Nice work 🎉

alamb · 2024-08-14T19:55:44Z

I am very excited to get the StringView work (#11752) done and enabled -- and then rerun the clickbench benchmarks again for DataFusion. 🚀

@Rachelint is also working on some good stuff.

## Which issue does this PR close?

- Part of apache#17964.

## Rationale for this change

SparkAvg's AvgGroupsAccumulator doesn't implement supports_convert_to_state (defaults to false), which prevents the skip-partial-aggregation optimization from kicking in for queries that use Spark's avg().

I ran into this while benchmarking a Spark Connect engine built on DataFusion. On TPC-H q17 at SF10, the partial aggregate for avg(l_quantity) grouped by l_partkey (~2M groups out of 60M rows) was not triggering skip-aggregation:

| Metric | Without convert_to_state | With convert_to_state |
|--------|-------------------------|-----------------------|
| Partial aggregate memory | 923 MB | 40 MB |
| Partial aggregate elapsed | 4.75s | 109ms |

The skip-aggregation probe (apache#11627) detects when a partial aggregate isn't reducing cardinality and falls back to passing rows through as state directly. This needs convert_to_state so the accumulator can produce [sum, count] state arrays from raw input. The built-in Avg already has this (apache#11734), but it wasn't carried over when SparkAvg was migrated from Comet in apache#17871.

## What changes are included in this PR?

Adds convert_to_state() and supports_convert_to_state() to AvgGroupsAccumulator in datafusion-spark. Follows the same approach as the built-in Avg, adapted for SparkAvg's differences:

- State order is [sum, count] (vs [count, sum] in the built-in)
- Count type is Int64 (vs UInt64 in the built-in)
- Null handling uses NullBuffer::union directly instead of pulling in datafusion-functions-aggregate-common as a dep

Also cleaned up the fully-qualified arrow::array::BooleanArray references in update_batch / merge_batch since adding BooleanArray to the import block triggered the unused_qualifications lint.

## Are these changes tested?

Yes, unit tests covering basic conversion, null propagation, filter handling, and a roundtrip through merge_batch to verify the converted state produces correct results end-to-end.

## Are there any user-facing changes?

No. Queries using avg() through the Spark function registry will automatically benefit from skip-partial-aggregation on high-cardinality groupings.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
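The probe described above can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the DataFusion implementation: the struct `SkipAggregationProbe`, its fields, and the threshold values passed in by the caller are all hypothetical. The idea is only that after observing enough input rows, a high groups-to-rows ratio indicates that partial aggregation is barely reducing cardinality.

```rust
/// Hypothetical probe that decides whether partial aggregation is worthwhile.
struct SkipAggregationProbe {
    rows_threshold: usize, // rows to observe before deciding (assumed knob)
    ratio_threshold: f64,  // groups/rows ratio at which to skip (assumed knob)
    input_rows: usize,     // rows seen so far
    num_groups: usize,     // distinct groups seen so far
}

impl SkipAggregationProbe {
    fn new(rows_threshold: usize, ratio_threshold: f64) -> Self {
        Self { rows_threshold, ratio_threshold, input_rows: 0, num_groups: 0 }
    }

    /// Record one processed batch: how many rows it had and how many
    /// distinct groups the hash table holds afterwards.
    fn update(&mut self, batch_rows: usize, total_groups: usize) {
        self.input_rows += batch_rows;
        self.num_groups = total_groups;
    }

    /// Skip partial aggregation once enough rows were seen and almost
    /// every row produced its own group.
    fn should_skip(&self) -> bool {
        self.input_rows >= self.rows_threshold
            && self.num_groups as f64 / self.input_rows as f64 >= self.ratio_threshold
    }
}
```

Once such a probe fires, the remaining input rows would be passed through convert_to_state instead of being inserted into the partial hash table.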

alamb marked this pull request as draft July 30, 2024 21:48

alamb mentioned this pull request Jul 30, 2024

Skipping partial aggregation when it is not helping for high cardinality aggregates #11627

Merged

github-actions Bot added documentation Improvements or additions to documentation logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Jul 30, 2024

alamb force-pushed the alamb/support_avg branch from ee5ac1c to a8b5a05 Compare July 31, 2024 11:29

alamb marked this pull request as ready for review July 31, 2024 11:37

alamb marked this pull request as draft July 31, 2024 11:37

alamb force-pushed the alamb/support_avg branch from a8b5a05 to 7020dcf Compare July 31, 2024 12:09

alamb mentioned this pull request Aug 5, 2024

Improve performance of Avg aggregate: implement convert_to_state #11816

Closed

alamb force-pushed the alamb/support_avg branch from 7020dcf to 42daa94 Compare August 5, 2024 13:41

github-actions Bot removed documentation Improvements or additions to documentation logical-expr Logical plan and expressions labels Aug 5, 2024

alamb marked this pull request as ready for review August 5, 2024 13:47

alamb force-pushed the alamb/support_avg branch from 42daa94 to b60e1aa Compare August 5, 2024 14:58

alamb marked this pull request as draft August 5, 2024 15:00

alamb mentioned this pull request Aug 5, 2024

Use filtered_null_mask in CountGroupsAccumulator and PrimitiveGroupsAccumulator #11825

Closed

3 tasks

alamb force-pushed the alamb/support_avg branch from b60e1aa to f05c4cd Compare August 5, 2024 16:11

alamb force-pushed the alamb/support_avg branch from f05c4cd to e3eb80f Compare August 5, 2024 22:06

Support convert_to_state for AVG accumulator

f3bedc0

alamb force-pushed the alamb/support_avg branch from 0c4df2e to f3bedc0 Compare August 6, 2024 10:58

alamb marked this pull request as ready for review August 6, 2024 12:17

korowa reviewed Aug 7, 2024

Comment thread datafusion/physical-expr-common/src/aggregate/groups_accumulator/nulls.rs Outdated

korowa approved these changes Aug 8, 2024

2010YOUY01 approved these changes Aug 9, 2024

alamb added 6 commits August 11, 2024 10:43

Update datafusion/physical-expr-common/src/aggregate/groups_accumulator/nulls.rs

92decac

fix documentation

149406b

Merge remote-tracking branch 'apache/main' into alamb/support_avg

e725150

Fix after merge

6186171

Merge remote-tracking branch 'apache/main' into alamb/support_avg

ba3f00c

fix for change in location

8d4ea5b

github-actions Bot added the functions Changes to functions implementation label Aug 12, 2024

comphead approved these changes Aug 12, 2024

alamb merged commit 00ef820 into apache:main Aug 12, 2024

alamb deleted the alamb/support_avg branch August 14, 2024 19:54

azhangd mentioned this pull request Apr 11, 2026

perf: implement convert_to_state for SparkAvg #21548

Merged
