[AURON #2170] Correctness Testing: All Spark Versions - Add Aggregate operator related tests #2213
Draft
ShreyeshArangath wants to merge 6 commits into apache:master
Introduce empty test modules for Spark 3.1/3.2/3.4/3.5/4.0/4.1 alongside the existing spark33 module. Each module ships only a Maven pom and an empty AuronSparkTestSettings stub so that profile activation and the reflection lookup in common/SparkTestSettings both succeed. Per-area suites (Aggregate/Sort/Parquet/Functions/Expressions) will land in separate follow-up PRs tracked under apache#2170-apache#2174.
…ests across all versions

Wire up a new spark-tests.yml workflow that exercises the auron-spark-tests module for every supported Spark profile (3.1/3.2/3.3/3.4/3.5/4.0/4.1) using the JDK+Scala combos already validated in tpcds.yml. The build step installs the Auron extension + spark-tests modules with tests skipped, then a scoped `mvn test` targets only auron-spark-tests/common + the per-version submodule so the job does not redundantly re-run every other module's tests.
…/3.4/3.5/4.0/4.1

Mirror the three aggregate suites from spark33 (AuronDataFrameAggregateSuite, AuronDatasetAggregatorSuite, AuronTypedImperativeAggregateSuite) and wire them into each per-version AuronSparkTestSettings with the same exclude list (collect functions prefix, SPARK-19471 overridden locally, SPARK-24788) so the matrix CI exercises aggregates on every supported Spark profile.
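As a rough illustration of how such an exclude list can gate a suite, here is a minimal, self-contained sketch. `SuiteSettings`, `excludeByPrefix`, and `shouldRun` are hypothetical names invented for this example and are not Auron's actual `AuronSparkTestSettings` API. The sketch also assumes that the exclusions apply only to `AuronDataFrameAggregateSuite` and that every entry is matched by prefix, so no full upstream test titles need to be guessed.

```scala
// Hypothetical stand-in for a per-version settings registry; NOT Auron's real API.
final case class SuiteSettings(
    suite: String,
    excludedPrefixes: Vector[String] = Vector.empty) {

  // Returns a copy that also skips tests whose names start with any given prefix.
  def excludeByPrefix(prefixes: String*): SuiteSettings =
    copy(excludedPrefixes = excludedPrefixes ++ prefixes)

  // A test runs unless its name starts with one of the excluded prefixes.
  def shouldRun(testName: String): Boolean =
    !excludedPrefixes.exists(p => testName.startsWith(p))
}

object AggregateSettingsSketch {
  // The three mirrored suites; only the DataFrame suite carries exclusions here
  // (an assumption made for this sketch).
  val suites: Map[String, SuiteSettings] = Seq(
    SuiteSettings("AuronDataFrameAggregateSuite")
      .excludeByPrefix("collect functions", "SPARK-19471", "SPARK-24788"),
    SuiteSettings("AuronDatasetAggregatorSuite"),
    SuiteSettings("AuronTypedImperativeAggregateSuite")
  ).map(s => s.suite -> s).toMap
}
```

Keeping the settings immutable means each per-version module could reuse the same base value and extend it without affecting the others.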
…ests-common

The common test module compiles against every Spark version we support, but it called several APIs that were reshaped in Spark 4:

* `Column.apply(Expression)` was removed; the classic module now exposes it as `ExpressionUtils.column(expr)`.
* `SparkSession.internalCreateDataFrame` lives on `classic.SparkSession` in 4.x and requires the `isStreaming` argument.
* `DataFrame.logicalPlan` is no longer on the `api.Dataset` trait, and the `SQLExecution.withSQLConfPropagated` overload now takes a `classic.SparkSession` rather than the abstract `SparkSession`.

Wrap the two Spark-4-only calls in `@sparkver` helpers so the right implementation is emitted under each profile, switch to `df.queryExecution.logical` / `df.queryExecution.sparkSession` (both public on `QueryExecution` across every supported version and returning the concrete session type in 4.x), and pull in the `spark-version-annotation-macros` dependency the helpers need.
…e aggregates

Three ported DataFrameAggregateSuite tests fail not because of a regression but because they assert on Spark-specific internals that Auron's native aggregation deliberately replaces:

* Spark 3.2 SPARK-34837 (`avg` on ANSI intervals) emits invalid Java when Spark's HashAggregate codegen consumes values produced by Auron's native project; later Spark versions avoid this path.
* Spark 3.5 SPARK-16484 negative tests assert the thrown error implements `SparkThrowable`, but `SparkUDAFWrapper` surfaces UDAF failures as `RuntimeException`.
* Spark 3.5 SPARK-43876 greps for `public class hashAgg_FastHashMap_0` in the WholeStageCodegen output, which never exists when the aggregate runs natively.

Exclude these three tests in the relevant per-version `AuronSparkTestSettings`, matching the existing precedent for the SPARK-19471 / SPARK-24788 cases.
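The layering described above (a shared exclude list plus version-specific additions) might look roughly like the following. This is a sketch only: `PerVersionExcludesSketch` and its members are hypothetical names, the version keys are plain strings chosen for illustration, and tests are matched by JIRA-id prefix rather than by their full upstream titles.

```scala
// Hypothetical sketch; the real per-version AuronSparkTestSettings may be organized differently.
object PerVersionExcludesSketch {
  // Exclusions shared by every Spark profile (taken from the existing precedent).
  private val shared: Seq[String] =
    Seq("collect functions", "SPARK-19471", "SPARK-24788")

  // Extra exclusions that apply only to specific Spark versions.
  private val extra: Map[String, Seq[String]] = Map(
    "3.2" -> Seq("SPARK-34837"),
    "3.5" -> Seq("SPARK-16484", "SPARK-43876")
  )

  // Full prefix list for one Spark version: shared entries plus any extras.
  def excludedPrefixes(sparkVersion: String): Seq[String] =
    shared ++ extra.getOrElse(sparkVersion, Seq.empty)

  // A test runs unless its name starts with an excluded prefix for that version.
  def shouldRun(sparkVersion: String, testName: String): Boolean =
    !excludedPrefixes(sparkVersion).exists(p => testName.startsWith(p))
}
```

With this shape, a SPARK-34837 test would be skipped under the 3.2 key but still run under 3.5, which is the kind of scoping the exclusions above rely on.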
Which issue does this PR close?
Closes #2170
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
How was this patch tested?