Optimize performance_datum queries in alert generation#9550
Conversation
generate_new_alerts_in_series and generate_new_test_alerts_in_series run on every signature during ingestion. Their PerformanceDatum reads were not using the (repository, signature, push_timestamp) index: - The `series` query filtered only on signature + push_timestamp, so Postgres could not seek the composite index (its leading column, repository, was unconstrained). Add repository=signature.repository; a signature has exactly one repository, so results are unchanged. - The replicate lookup used `IN (SELECT ... WHERE EXISTS (...))`. The EXISTS was redundant -- a replicate row's parent datum is in the set by definition. Replace with a plain join via the new shared helper _collect_replicates_map(), removing the duplicated block in both functions and the now-unused Exists/OuterRef/Subquery imports. Add a characterization test pinning the replicate-map behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #9550 +/- ##
========================================
Coverage 82.94% 82.94%
========================================
Files 613 613
Lines 35372 35376 +4
Branches 3208 3278 +70
========================================
+ Hits 29338 29342 +4
+ Misses 5880 5665 -215
- Partials 154 369 +215 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Trim the PerformanceDatum series query to the 4 columns actually used (.only id/push_id/push_timestamp/value) and drop the no-op select_related on the latest-alert-timestamp lookup in both generate_new_alerts_in_series and generate_new_test_alerts_in_series. Extract the now-identical data-loading prologue into _load_revision_data, parameterized by the latest-alert-timestamp queryset and the RevisionDatum class, so the two alerting paths share one copy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 2 - extra perf enhancementsFile: Two nearly-identical functions — 1. Trimmed over-fetching on the
|
junngo
left a comment
There was a problem hiding this comment.
At that time, we spent quite a bit of effort tuning the datum replicate query shape.
The Subquery and Exists conditions may look logically redundant, but they helped the DB optimizer choose the intended index path for PerformanceDatumReplicate.
I’m happy to hear any ideas for improving it further, and I’ll also try to double-check whether there’s a better query shape here.
| performance_datum__signature=signature, | ||
| performance_datum__repository=signature.repository, | ||
| performance_datum__push_timestamp__gte=alert_after_ts, | ||
| ).values_list("performance_datum_id", "value") |
There was a problem hiding this comment.
I’m concerned this may introduce a query plan regression.
This query [0] takes over 2 minutes per call in Redash. From the query plan [1], performance_datum uses the intended index quickly, but performance_datum_replicate still performs a full scan.
[0]
SELECT
pdr.performance_datum_id,
pdr.value
FROM
performance_datum_replicate pdr
INNER JOIN
performance_datum pd
ON pdr.performance_datum_id = pd.id
WHERE pd.push_timestamp >= '2026-05-11'
and pd.repository_id = 77
and pd.signature_id = 5763463
[1]
Execution Time: 138,480 ms ≈ 2m 18s
Gather
└─ Hash Join
Join: pdr.performance_datum_id = pd.id
├─ Outer: Parallel Seq Scan on performance_datum_replicate pdr
│ ├─ ....
│ ├─ Total scanned: ~1.36 billion rows
│
└─ Inner: Hash
└─ Index Scan on performance_datum pd
├─ Index: performance_reposit_c9d328_idx
├─ Cond:
│ repository_id = 77
│ signature_id = 5763463
│ push_timestamp >= 2026-05-11
├─ Actual rows: 55,900
└─ Actual time: ~49ms
| ) | ||
| replicates = PerformanceDatumReplicate.objects.filter( | ||
| performance_datum_id__in=Subquery(datum_with_replicates.values("id")) | ||
| ).values_list("performance_datum_id", "value") |
There was a problem hiding this comment.
The original query takes about 100 ms per call, which looks reasonable to me.
Execution Time: 107 ms
Nested Loop
├─ Outer: Aggregate / Gather
│ └─ Nested Loop Semi
│ ├─ Bitmap Heap Scan on performance_datum
│ │ ├─ Index: performance_reposit_c9d328_idx
│ │ ├─ Actual rows: ~55,494 total
│ │ └─ Uses repository/signature/push_timestamp condition
│ │
│ └─ Index Only Scan on performance_datum_replicate u0
│ ├─ Index: performance_datum_replicate_performance_datum_id_fe2ed518
│ ├─ Cond: performance_datum_id = v0.id
│ ├─ Loops: 55,494
│ └─ Heap Fetches: 0
│
└─ Inner: Index Scan on performance_datum_replicate
└─ Not executed because no matching datum ids with replicates
Bug 2041948
generate_new_alerts_in_seriesandgenerate_new_test_alerts_in_seriesrun on every signature during ingestion. TheirPerformanceDatumreads were not using the (repository,signature,push_timestamp) index:The
seriesquery filtered only onsignature+push_timestamp, so Postgres could not seek the composite index (its leading column,repository, was unconstrained). Add repository=signature.repository; a signature has exactly one repository, so results are unchanged.The replicate lookup used
IN (SELECT ... WHERE EXISTS (...)). TheEXISTSwas redundant -- a replicate row's parent datum is in the set by definition. Replace with a plain join via the new shared helper_collect_replicates_map(), removing the duplicated block in both functions and the now-unused Exists/OuterRef/Subquery imports.De-duplicated some code that was identical. This streamlines the code where both queries had the same issues, so this was optimal.
Add a characterization test pinning the replicate-map behavior.