
[AURON #2182] Implement native support for percent_rank window function #2204

Open
weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/native-window-percent_rank


@weimingdiit (Contributor)

Which issue does this PR close?

Closes #2182

Rationale for this change

Auron does not currently support percent_rank() in native window execution, so queries using this function fall back to Spark. This leaves a gap in native window function coverage.

percent_rank() cannot be implemented with the existing streaming-style window processors alone, because its result depends on both the row rank and the total number of rows in the partition. To match Spark semantics, the native engine needs to evaluate it with full-partition context.
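To make the full-partition requirement concrete, here is a minimal sketch of a buffering driver. It assumes rows arrive already sorted by partition key and uses a plain `Vec<(K, V)>` instead of Arrow record batches; `for_each_partition` is a hypothetical name, not Auron's API. Rows are held back until the partition key changes, because only then is the partition size known.

```rust
/// Hypothetical full-partition driver (illustration only).
/// Input rows are assumed to be sorted by partition key `K`.
/// Each complete partition is handed to `on_partition` in one piece,
/// so a function like percent_rank can see the partition size first.
pub fn for_each_partition<K: PartialEq, V>(
    rows: Vec<(K, V)>,
    mut on_partition: impl FnMut(Vec<V>),
) {
    let mut current: Option<K> = None;
    let mut buf: Vec<V> = Vec::new();
    for (key, value) in rows {
        // A key change marks a partition boundary: the buffered
        // partition is now complete and can be evaluated.
        if current.as_ref().map_or(false, |k| *k != key) {
            on_partition(std::mem::take(&mut buf));
        }
        current = Some(key);
        buf.push(value);
    }
    // Flush the last partition at end of input.
    if !buf.is_empty() {
        on_partition(buf);
    }
}
```

In this sketch, memory grows with the largest partition, which is the trade-off of leaving the streaming path: a function like row_number() could emit each row as soon as it arrives.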

What changes are included in this PR?

This PR adds end-to-end native support for the percent_rank() window function.
The main changes are:

  • Add PERCENT_RANK to the window function protobuf and planner conversion path so Spark plans can be serialized into native plans correctly.
  • Extend NativeWindowBase to recognize Spark's PercentRank expression and convert it to the native window function enum.
  • Introduce a native PercentRankProcessor that computes percent rank with Spark-compatible semantics:
    • rows in the same peer group share the same rank
    • the result is (rank - 1) / (partition_size - 1)
    • single-row partitions return 0.0
  • Add a full-partition execution path in native window execution for functions that require complete partition context, and use that path for percent_rank().
  • Add tests on both the native execution side and the Spark SQL side to verify correctness.
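The semantics listed for PercentRankProcessor can be sketched as follows. This is an illustration under stated assumptions, not the processor in this PR: it takes one partition whose rows are already sorted by the ORDER BY keys, plus a hypothetical `is_peer` predicate that reports whether two adjacent rows tie on every key. Ranks follow rank() semantics (ties share a rank and leave gaps), and the result is (rank - 1) / (partition_size - 1).

```rust
/// Sketch of Spark-style percent_rank for a single partition.
/// `rows` must already be sorted by the ORDER BY keys; `is_peer(a, b)`
/// (hypothetical) returns true when two rows tie on every ORDER BY key.
pub fn percent_rank_sorted<T>(rows: &[T], is_peer: impl Fn(&T, &T) -> bool) -> Vec<f64> {
    let n = rows.len();
    let mut out = Vec::with_capacity(n);
    if n == 0 {
        return out;
    }
    // Single-row partition: the denominator would be zero, so return 0.0.
    if n == 1 {
        out.push(0.0);
        return out;
    }
    let denom = (n - 1) as f64;
    let mut rank = 1usize; // 1-based rank of the current peer group
    for i in 0..n {
        // A new peer group starts when a row differs from its predecessor;
        // its rank is the 1-based position of its first row (gaps allowed).
        if i > 0 && !is_peer(&rows[i - 1], &rows[i]) {
            rank = i + 1;
        }
        out.push((rank - 1) as f64 / denom);
    }
    out
}
```

For ORDER BY keys 10, 20, 20, 30 the ranks are 1, 2, 2, 4, so the results are 0.0, 1/3, 1/3 and 1.0.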

Are there any user-facing changes?

Yes.

Queries using the percent_rank() window function can now stay on the native execution path instead of falling back to Spark, as long as the rest of the plan is supported by Auron. No user-facing configuration changes are introduced.

How was this patch tested?

CI.


Copilot AI left a comment


Pull request overview

This PR adds end-to-end native execution support for Spark SQL’s percent_rank() window function in Auron, including Spark plan conversion, native planner/proto wiring, native execution, and coverage tests so queries can remain on the native path.

Changes:

  • Add PERCENT_RANK to the window function protobuf enum and native planner mapping.
  • Extend Spark-side window expression conversion to recognize PercentRank and serialize it into native plans.
  • Implement native PercentRankProcessor and introduce a “full-partition” execution path in the native window executor, plus add Spark- and native-level tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Adds Spark PercentRank expression conversion to native window protobuf. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Adds Spark SQL test asserting percent_rank() stays native and matches Spark results. |
| native-engine/datafusion-ext-plans/src/window_exec.rs | Adds full-partition processing path and a native unit test for percent-rank. |
| native-engine/datafusion-ext-plans/src/window/window_context.rs | Adds requires_full_partition() helper to drive execution strategy selection. |
| native-engine/datafusion-ext-plans/src/window/processors/percent_rank_processor.rs | Introduces native percent-rank computation processor. |
| native-engine/datafusion-ext-plans/src/window/processors/mod.rs | Exposes the new percent-rank processor module. |
| native-engine/datafusion-ext-plans/src/window/mod.rs | Wires PercentRank into window function enum/dispatch and flags it as requiring full partition. |
| native-engine/auron-planner/src/planner.rs | Maps protobuf PercentRank to native WindowFunction::PercentRank. |
| native-engine/auron-planner/proto/auron.proto | Adds PERCENT_RANK to the WindowFunction enum. |
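The requires_full_partition() strategy split described above can be illustrated with a small hypothetical enum. The variant names and the helper `any_requires_full_partition` are assumptions for this sketch and do not reproduce the actual definitions in mod.rs or window_context.rs. The idea is that a window operator can stay on the streaming path only if none of its functions needs the whole partition.

```rust
/// Hypothetical stand-in for the native window-function enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowFunction {
    RowNumber,
    Rank,
    DenseRank,
    PercentRank,
}

impl WindowFunction {
    /// True when the function needs the whole partition (for percent_rank,
    /// the partition size) before it can produce any output row.
    pub fn requires_full_partition(self) -> bool {
        matches!(self, WindowFunction::PercentRank)
    }
}

/// Strategy selection: buffer whole partitions if any function needs it,
/// otherwise keep the streaming path.
pub fn any_requires_full_partition(funcs: &[WindowFunction]) -> bool {
    funcs.iter().any(|f| f.requires_full_partition())
}
```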


Comment threads (2) on native-engine/datafusion-ext-plans/src/window_exec.rs
@weimingdiit force-pushed the feat/native-window-percent_rank branch 2 times, most recently from f20933f to b366278 (April 23, 2026 15:57)
…function

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
@weimingdiit weimingdiit force-pushed the feat/native-window-percent_rank branch from b366278 to cdfb75f Compare April 24, 2026 06:41

@slfan1989 left a comment


@weimingdiit Thanks for the contribution!

Overall, the implementation looks correct: percent_rank semantics align with rank-style window functions, and the partition-by streaming handling is well done.

```scala
windowExprBuilder.setFuncType(pb.WindowFunctionType.Window)
windowExprBuilder.setWindowFunc(pb.WindowFunction.DENSE_RANK)

case e: PercentRank =>
```

PercentRank should require ORDER BY. If orderSpec is empty, native percent_rank will treat all rows as peers and return 0.0 for the whole partition. Can we assert orderSpec.nonEmpty here (or fail fast) to avoid silent wrong results if upstream validation regresses?
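A native-side analogue of the suggested fail-fast check can be sketched in Rust. The function name and the `String` error type are hypothetical; the point is that the plan is rejected when percent_rank() arrives with an empty order spec, instead of running and returning 0.0 for every row.

```rust
/// Hypothetical planner-side guard for percent_rank().
/// `order_spec_len` is the number of ORDER BY expressions in the window spec.
pub fn check_percent_rank_order_spec(order_spec_len: usize) -> Result<(), String> {
    if order_spec_len == 0 {
        // Fail fast: without ORDER BY every row is a peer and the
        // function would silently return 0.0 for the whole partition.
        Err("percent_rank() requires ORDER BY in its window specification".to_string())
    } else {
        Ok(())
    }
}
```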

```rust
    }
}

impl WindowFunctionProcessor for PercentRankProcessor {
```

percent_rank currently derives ordering from WindowContext.get_order_rows(); if order_spec is empty, all rows compare equal and percent_rank becomes 0.0. This is why an ORDER BY guard is important.
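The mechanism can be sketched with a hypothetical peer test over a row's ORDER BY columns. `is_peer` and `peer_group_count` are illustration names, and rows are plain `Vec<i64>` rather than whatever `get_order_rows()` produces. With no ORDER BY columns, `Iterator::all` over an empty iterator is vacuously true, so the whole partition collapses into one peer group with rank 1, and (1 - 1) / (n - 1) = 0.0 for every row.

```rust
/// Hypothetical peer test: rows are peers when they agree on every ORDER BY column.
/// `order_cols` lists the indices of the ORDER BY columns within each row.
pub fn is_peer(a: &[i64], b: &[i64], order_cols: &[usize]) -> bool {
    // `all` on an empty iterator returns true, so with no ORDER BY
    // columns every pair of rows is treated as peers.
    order_cols.iter().all(|&c| a[c] == b[c])
}

/// Counts peer groups in a partition that is already sorted by `order_cols`.
pub fn peer_group_count(rows: &[Vec<i64>], order_cols: &[usize]) -> usize {
    if rows.is_empty() {
        return 0;
    }
    // Each adjacent pair that is not a peer pair starts a new group.
    1 + rows
        .windows(2)
        .filter(|w| !is_peer(&w[0], &w[1], order_cols))
        .count()
}
```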


Projects: None yet

Development: successfully merging this pull request may close these issues:
  • Implement native support for percent_rank window function

3 participants