[AURON #2182] Implement native support for percent_rank window function #2204
weimingdiit wants to merge 1 commit into apache:master
Pull request overview
This PR adds end-to-end native execution support for Spark SQL’s percent_rank() window function in Auron, including Spark plan conversion, native planner/proto wiring, native execution, and coverage tests so queries can remain on the native path.
Changes:
- Add `PERCENT_RANK` to the window function protobuf enum and native planner mapping.
- Extend Spark-side window expression conversion to recognize `PercentRank` and serialize it into native plans.
- Implement native `PercentRankProcessor` and introduce a “full-partition” execution path in the native window executor, plus add Spark- and native-level tests.
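The need for a full-partition path can be illustrated with a minimal Rust sketch. It is not the PR's `PercentRankProcessor`: `PercentRankBuffer`, `push_batch`, and `finish` are hypothetical names, the ordering key is simplified to a single `i64` column that arrives already sorted, and the formula assumed is `(rank - 1) / (partition_size - 1)` with `0.0` for a single-row partition. The point is only that no value can be emitted until the last batch of the partition has been seen, because the denominator depends on the total row count.

```rust
/// Hypothetical accumulator (not Auron's API): buffers one partition's
/// already-sorted ORDER BY keys across input batches.
#[derive(Default)]
struct PercentRankBuffer {
    keys: Vec<i64>,
}

impl PercentRankBuffer {
    /// Append one input batch; nothing can be emitted yet because the
    /// partition size is still unknown.
    fn push_batch(&mut self, batch: &[i64]) {
        self.keys.extend_from_slice(batch);
    }

    /// Called once the partition is complete: compute percent_rank per row.
    fn finish(self) -> Vec<f64> {
        let n = self.keys.len();
        let mut rank = 1usize; // 1-based rank of the current peer group
        let mut out = Vec::with_capacity(n);
        for i in 0..n {
            // A new peer group starts when the key changes; its rank is
            // the 1-based position of its first row.
            if i > 0 && self.keys[i] != self.keys[i - 1] {
                rank = i + 1;
            }
            out.push(if n > 1 {
                (rank - 1) as f64 / (n - 1) as f64
            } else {
                0.0 // a single-row partition has no denominator
            });
        }
        out
    }
}
```

Feeding the keys `[10, 20]` and then `[20, 30]` as two batches gives ranks 1, 2, 2, 4 over four rows, so the first row maps to 0.0, the two tied rows to 1/3, and the last row to 1.0.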
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Adds Spark PercentRank expression conversion to native window protobuf. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Adds Spark SQL test asserting percent_rank() stays native and matches Spark results. |
| native-engine/datafusion-ext-plans/src/window_exec.rs | Adds full-partition processing path and a native unit test for percent-rank. |
| native-engine/datafusion-ext-plans/src/window/window_context.rs | Adds requires_full_partition() helper to drive execution strategy selection. |
| native-engine/datafusion-ext-plans/src/window/processors/percent_rank_processor.rs | Introduces native percent-rank computation processor. |
| native-engine/datafusion-ext-plans/src/window/processors/mod.rs | Exposes the new percent-rank processor module. |
| native-engine/datafusion-ext-plans/src/window/mod.rs | Wires PercentRank into window function enum/dispatch and flags it as requiring full partition. |
| native-engine/auron-planner/src/planner.rs | Maps protobuf PercentRank to native WindowFunction::PercentRank. |
| native-engine/auron-planner/proto/auron.proto | Adds PERCENT_RANK to the WindowFunction enum. |
f20933f to b366278
…function Signed-off-by: weimingdiit <weimingdiit@gmail.com>
b366278 to cdfb75f
slfan1989 left a comment
@weimingdiit Thanks for the contribution!
The overall implementation looks correct: percent_rank semantics align with rank-style window functions, and the partition-by streaming handling is well done.
      windowExprBuilder.setFuncType(pb.WindowFunctionType.Window)
      windowExprBuilder.setWindowFunc(pb.WindowFunction.DENSE_RANK)

    case e: PercentRank =>
PercentRank should require ORDER BY. If orderSpec is empty, native percent_rank will treat all rows as peers and return 0.0 for the whole partition. Can we assert orderSpec.nonEmpty here (or fail fast) to avoid silent wrong results if upstream validation regresses?
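A fail-fast check of the kind suggested here could look like the following sketch. It is written in Rust for illustration only; the check proposed in this comment would sit in the Scala conversion code, and `SortExpr` and `ensure_order_by` are hypothetical names rather than anything in Auron or Spark.

```rust
/// Hypothetical stand-in for one ORDER BY expression in a window spec.
#[derive(Debug, Clone)]
struct SortExpr {
    column: String,
    ascending: bool,
}

/// Reject a rank-style window function whose window spec has no ORDER BY,
/// instead of silently producing 0.0 for every row.
fn ensure_order_by(func_name: &str, order_spec: &[SortExpr]) -> Result<(), String> {
    if order_spec.is_empty() {
        return Err(format!(
            "{func_name} requires a non-empty ORDER BY in its window spec"
        ));
    }
    Ok(())
}
```

Returning an error keeps the failure visible at plan-conversion time, so a regression in upstream validation would surface as a rejected plan rather than as wrong results.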
        }
    }

    impl WindowFunctionProcessor for PercentRankProcessor {
percent_rank currently derives ordering from WindowContext.get_order_rows(); if order_spec is empty, all rows compare equal and percent_rank becomes 0.0. This is why an ORDER BY guard is important.
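The effect described in this comment can be reproduced with a small Rust sketch. It does not use `WindowContext` or any Auron type: `compare_order_keys` and `percent_rank_by_keys` are hypothetical helpers, and each row is represented only by its ORDER BY key values as a `Vec<i64>`. With zero order columns the comparison loop never runs, every pair of rows compares equal, and the whole partition collapses into one peer group with rank 1.

```rust
use std::cmp::Ordering;

/// Compare two rows on their ORDER BY key values only (lexicographic).
/// With no order columns the loop body never executes, so the result
/// is always Equal.
fn compare_order_keys(a: &[i64], b: &[i64]) -> Ordering {
    for (x, y) in a.iter().zip(b.iter()) {
        match x.cmp(y) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}

/// percent_rank over rows that are already sorted by their order keys.
fn percent_rank_by_keys(order_keys: &[Vec<i64>]) -> Vec<f64> {
    let n = order_keys.len();
    let mut rank = 1usize; // 1-based rank of the current peer group
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Start a new peer group only when the keys actually differ.
        if i > 0 && compare_order_keys(&order_keys[i - 1], &order_keys[i]) != Ordering::Equal {
            rank = i + 1;
        }
        out.push(if n > 1 {
            (rank - 1) as f64 / (n - 1) as f64
        } else {
            0.0
        });
    }
    out
}
```

Three rows with empty key vectors all receive 0.0, while three rows keyed 1, 2, 3 receive 0.0, 0.5, and 1.0.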
Which issue does this PR close?
Closes #2182
Rationale for this change
Auron does not currently support `percent_rank()` in native window execution, so queries using this function fall back to Spark. This leaves a gap in native window function coverage.
`percent_rank()` cannot be implemented with the existing streaming-style window processors alone, because its result depends on both the row rank and the total number of rows in the partition. To match Spark semantics, the native engine needs to evaluate it with full-partition context.
What changes are included in this PR?
This PR adds native support for the `percent_rank()` window function end to end.
The main changes are:
- Add `PERCENT_RANK` to the window function protobuf and planner conversion path so Spark plans can be serialized into native plans correctly.
- Extend `NativeWindowBase` to recognize Spark's `PercentRank` expression and convert it to the native window function enum.
- Add a native `PercentRankProcessor` that computes percent rank with Spark-compatible semantics:
  - `(rank - 1) / (partition_size - 1)`
  - `0.0` for single-row partitions
- Add tests for `percent_rank()`.
Are there any user-facing changes?
Yes.
Queries using the `percent_rank()` window function can now stay on the native execution path instead of falling back to Spark, as long as the rest of the plan is supported by Auron. No user-facing configuration changes are introduced.
How was this patch tested?
CI.