Support PPL timewrap command #5241
Conversation
PR Reviewer Guide 🔍(Review updated until commit 41f00ee)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 41f00ee Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 026f481
Suggestions up to commit 83d18be
Suggestions up to commit b072dbf
Suggestions up to commit 0fe5dde
Suggestions up to commit 11061de
|
|
Persistent review updated to latest commit 6276cba |
6276cba to
e0a03f0
Compare
|
Persistent review updated to latest commit e0a03f0 |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 4ba32b8.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit 219a704 |
Implement the timewrap command that reshapes timechart output by wrapping each time period into a separate data series, enabling day-over-day, week-over-week, and other recurring interval comparisons. Signed-off-by: Jialiang Li <jialiang.li@hey.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>
Add timewrap_test to SHOW TABLES expected output (25 tables). Signed-off-by: Jialiang Li <jialiang.li@hey.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>
…etic - Extract all timewrap helper methods to TimewrapUtils.java in calcite/utils/ - Add precise EXTRACT-based period computation for month/quarter/year - Add cumDaysBeforeMonth with leap year CASE expression for precise quarter offset - Month/quarter/year period assignment now uses calendar arithmetic instead of approximate fixed-length conversions Signed-off-by: Jialiang Li <jialiang.li@hey.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>
- Remove Calcite PIVOT from timewrap: no more MAX_PERIODS limit or crash risk - Add post-processing pivot in execution engine: dynamically builds columns from actual data using HashMap grouping - Add series parameter: series=relative (default), series=short (s0, s1) - Add series=exact grammar support (falls back to short at runtime) - Add time_format grammar support for series=exact - Benchmark: 1000 period columns in ~50ms (Calcite PIVOT crashed at 1000) - 32 IT tests, all using verifySchema + verifyDataRows Signed-off-by: Jialiang Li <jialiang.li@hey.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>
219a704 to
4ba32b8
Compare
|
Persistent review updated to latest commit 4ba32b8 |
The timewrap pivot (turning the unpivoted [display_ts, value, __base_offset__, __period__] rows into Splunk-style period columns) was done as post-processing in OpenSearchExecutionEngine.buildResultSet, gated on CalcitePlanContext thread-locals. The analytics-engine route executes the RelNode via AnalyticsExecutionEngine and never reaches that code, so timewrap queries came back unpivoted (CalciteTimewrapCommandIT 0/32 on the analytics route). Extract the pivot into a shared core helper (TimewrapPivot) and call it from both execution engines so they produce identical output. AnalyticsExecutionEngine captures the timewrap signals at execute() entry via TimewrapSignals because the result callback runs on a different worker thread than the planning thread that set the thread-locals. CalciteTimewrapCommandIT: 32/32 on both the Calcite/v2 and analytics-engine routes. Signed-off-by: Kai Huang <ahkcs@amazon.com>
… docs, unit tests Fixes from PR review: - Thread-local leak: visitTimewrap sets timewrap signals on CalcitePlanContext thread-locals during planning, but they were only cleared on the execute success path. explain and exception-after-planning paths leaked the signals onto the pooled worker thread, so the next query was wrongly treated as timewrap. Centralize clearing in CalcitePlanContext.clearTimewrapSignals(), called from CalcitePlanContext.run()'s finally (v2 path) and from RestUnifiedQueryAction's finally (analytics path, which doesn't use run()). Both engines' existing clears now route through the one helper. - Anonymizer: add visitTimewrap to PPLQueryDataAnonymizer so anonymized query logs include the timewrap command instead of silently dropping the pipe segment. span magnitude is masked; align/series are constrained keywords. - align=now off-by-one: baseOffset used integer DIVIDE (truncates toward zero), giving wrong period labels when the reference is below maxEpoch (future-dated data under align=now). Use FLOOR(double-divide) for true floor division. - Docs/dead code: remove the unimplemented "max 20 period columns" claim from timewrap.md and the unused MAX_PERIODS constant (the pivot is intentionally unbounded). - Tests: add CalcitePPLTimewrapTest (verifyLogical + verifyPPLToSparkSQL) per the PPL-command checklist, and a timewrap case in PPLQueryDataAnonymizerTest. CalciteTimewrapCommandIT: 32/32 on both the Calcite/v2 and analytics routes. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ivot Follow-up cleanups from PR review: - series=exact: keep the documented fallback to short "s<N>" naming, but remove the write-only CalcitePlanContext.timewrapTimeFormat thread-local (set in visitTimewrap and cleared in clearTimewrapSignals, but never read). The AST Timewrap.timeFormat field is retained for when exact formatting is implemented. Collapse the identical short/exact switch arms with a comment. - TimewrapPivot: precompute each period's display name once into a Map<Long,String> instead of re-running split/parseInt/switch inside the per-row loop (was O(rows x valueCols)). Collapse the unreachable two-pass column-index scan into one — visitTimewrap always emits the bookkeeping columns last, so the fallback scan never ran. CalciteTimewrapCommandIT: 32/32 on both the Calcite/v2 and analytics routes. Signed-off-by: Kai Huang <ahkcs@amazon.com>
|
Persistent review updated to latest commit 11061de |
Two explain-plan tests following the existing CalciteExplainIT pattern (explainQueryYaml + loadExpectedPlan + assertYamlEqualsIgnoreId): - testExplainTimewrap: fixed-length unit (1day), epoch-based arithmetic path. - testExplainTimewrapMonth: variable-length unit (1month), EXTRACT-based calendar arithmetic path. Both pin the align=end reference with a WHERE @timestamp <= upper bound so the base_offset literal is deterministic across runs. Signed-off-by: Kai Huang <ahkcs@amazon.com>
CalciteNoPushdownIT re-runs CalciteExplainIT with pushdown disabled, which loads expected plans from expectedOutput/calcite_no_pushdown/ instead of expectedOutput/calcite/. The two timewrap explain tests were missing their no-pushdown variants, causing "resource ... not found" failures in CI. Logical plans match the pushdown variants; the physical plans differ (EnumerableLimit/EnumerableSort + full index scan, no pushed-down aggregation). Signed-off-by: Kai Huang <ahkcs@amazon.com>
|
Persistent review updated to latest commit b072dbf |
| // series=exact (+ time_format) is not yet implemented; it intentionally falls back to the | ||
| // short "s<N>" naming. TODO: format the period start date with time_format. |
There was a problem hiding this comment.
Is this tracked somewhere?
There was a problem hiding this comment.
Not previously due to low usage — opened a follow-up issue #5606 to track series=exact with time_format
| case "s" -> value; | ||
| case "m" -> value * 60L; | ||
| case "h" -> value * 3_600L; | ||
| case "d" -> value * 86_400L; | ||
| case "w" -> value * 7L * 86_400L; | ||
| case "M" -> value * 30L * 86_400L; | ||
| case "q" -> value * 91L * 86_400L; | ||
| case "y" -> value * 365L * 86_400L; |
There was a problem hiding this comment.
Should this be approximate or exact? What's expected for leap-year February boundary?
There was a problem hiding this comment.
spanToSeconds is only used on the fixed-length path (s/m/h/d/w), where the second count is exact — no calendar variability. The variable-length units (M/q/y) don't use this method; they go through calendarUnitNumber/calendarUnitStartEpoch, which handle the Feb/leap-year boundary precisely via cumDaysBeforeMonth. I've updated spanToSeconds to throw on M/q/y so the approximate values can never be silently used, and clarified the javadoc.
| java.time.LocalDateTime.parse( | ||
| s, java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); | ||
| return ldt.toEpochSecond(java.time.ZoneOffset.UTC); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Narrow to catch (DateTimeParseException)
There was a problem hiding this comment.
Done — narrowed both to DateTimeParseException
| visitChildren(node, context); | ||
|
|
||
| // Signal the execution engine to strip all-null columns and rename with absolute offsets | ||
| CalcitePlanContext.stripNullColumns.set(true); |
There was a problem hiding this comment.
Can we add a leak-guard test: a timewrap query followed by a non-timewrap query on the same pooled thread, asserting the second result has no base_offset/period artifacts?
There was a problem hiding this comment.
Added TimewrapSignalsLeakTest in 41f00ee
spanToSeconds is only used on the fixed-length timewrap path (s/m/h/d/w), where the second count is exact. The M/q/y arms returned approximate 30/91/365-day values that were never used for wrapping (those units go through the calendar arithmetic path with exact leap-year handling). Throw instead so the approximation can never be silently consumed. Signed-off-by: Kai Huang <ahkcs@amazon.com>
|
Persistent review updated to latest commit 83d18be |
Both LocalDateTime.parse and Instant.parse throw only DateTimeParseException on bad input; catch that specific type instead of Exception so genuine programming errors surface. Signed-off-by: Kai Huang <ahkcs@amazon.com>
|
Persistent review updated to latest commit 026f481 |
Asserts that after a timewrap query runs through CalcitePlanContext.run, the pivot thread-locals are cleared so the next non-timewrap query on the same pooled thread is not wrongly pivoted (no __base_offset__/__period__ artifacts). Verified the test fails if run()'s clearTimewrapSignals guard is removed. Signed-off-by: Kai Huang <ahkcs@amazon.com>
|
Persistent review updated to latest commit 41f00ee |
Description
Adds the
timewrapcommand to PPL, which reshapestimechartoutput by wrapping each time period into separate columns for side-by-side comparison (day-over-day, week-over-week, etc.). Behavior verified against Splunk.Syntax:
... | timechart <agg> | timewrap <span> [align=end|now]Key features
<agg>_<N><unit>_before,<agg>_latest_<unit>,<agg>_<N><unit>_afteralign=end(default): uses WHERE upper bound as reference for deterministic column namesalign=now: uses current time as referenceImplementation
timewrapCommandrule reusing existingspanLiteralTimewrapnode with unit, value, alignUNIX_TIMESTAMP/FROM_UNIXTIMEfor epoch conversion__base_offset__, and handlelatest_<unit>naming@timestamp <=filter foralign=endreferenceKnown limitations
BYclause not yet supportedTesting
Issues Resolved
Resolves #5237
Check List
--signoff