
Hotfix: apply LIMIT before CALL subqueries in dataset/transgene queries #44

Merged
Robbie1977 merged 1 commit into main from fix/perf-limit-before-call on May 29, 2026

@Robbie1977
Contributor

Hotfix for the perf-test failure on main after PR #43 was merged: https://github.com/VirtualFlyBrain/VFBquery/actions/runs/26659171834/job/78577045325

PR #43 added CALL subqueries to get_aligned_datasets, get_all_datasets, and get_transgene_expression_here, but LIMIT was appended at the end of the constructed query. Because the LIMIT sat after the CALL subqueries, Cypher applied it only once they had run, so every candidate ds/ep was enriched through four (or two) CALL subqueries before the result was trimmed.
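
A check like the following could catch this shape in a unit test before it reaches the perf suite. It is a minimal sketch and not part of VFBquery: `limit_applied_after_call` is a hypothetical helper that scans a Cypher string, tracks brace depth so that a LIMIT inside a `CALL { ... }` body is ignored, and reports whether the first top-level CALL comes before the first top-level LIMIT. The example queries assume a `DataSet` label and are illustrative only.

```python
import re

# Braces (to track CALL { ... } nesting) plus the two keywords of interest.
_TOKEN_RE = re.compile(r"\{|\}|\bCALL\b|\bLIMIT\b", re.IGNORECASE)


def limit_applied_after_call(query: str) -> bool:
    """Return True if a top-level CALL appears before the first top-level LIMIT.

    Hypothetical heuristic: only tokens at brace depth 0 count, so a LIMIT
    inside a subquery body is ignored. String literals and comments are not
    parsed, so braces or keywords inside them can mislead it.
    """
    depth = 0
    first_call = None
    first_limit = None
    for match in _TOKEN_RE.finditer(query):
        token = match.group(0).upper()
        if token == "{":
            depth += 1
        elif token == "}":
            depth -= 1
        elif depth == 0 and token == "CALL" and first_call is None:
            first_call = match.start()
        elif depth == 0 and token == "LIMIT" and first_limit is None:
            first_limit = match.start()
    if first_call is None or first_limit is None:
        return False
    return first_limit > first_call


# Illustrative queries (assumed schema): the pre-fix shape trims at the end...
BEFORE = """
MATCH (ds:DataSet)
WITH DISTINCT ds
CALL { WITH ds MATCH (ds)<-[:has_source]-(img) RETURN count(DISTINCT img) AS n_images }
RETURN ds.label AS name, n_images
ORDER BY name
LIMIT 10
"""

# ...while the fixed shape trims before any subquery runs.
AFTER = """
MATCH (ds:DataSet)
WITH DISTINCT ds
ORDER BY ds.label
LIMIT 10
CALL { WITH ds MATCH (ds)<-[:has_source]-(img) RETURN count(DISTINCT img) AS n_images }
RETURN ds.label AS name, n_images
"""

print(limit_applied_after_call(BEFORE))  # True
print(limit_applied_after_call(AFTER))   # False
```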

For AlignedDatasets that meant 86 datasets × 4 subqueries each (one of which is a count(DISTINCT img) over has_source edges); for AllDatasets, 130 datasets; and for TransgeneExpressionHere on the mushroom body, 2,340 EPs, each going through a 5-hop thumbnail join.
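
The amount of work the fix removes can be estimated with simple multiplication. This sketch assumes each enrichment subquery runs once per row that reaches it, that both dataset queries use four subqueries and the transgene query two (the "4 (or 2)" above), and pairs each case with the LIMIT used in the dry run below. It counts executions only and ignores differences in per-subquery cost.

```python
# Assumed model: subquery executions = rows reaching the CALLs x subqueries per row.
CASES = {
    # name: (candidate rows, subqueries per row, LIMIT used in the dry run)
    "AlignedDatasets": (86, 4, 10),
    "AllDatasets": (130, 4, 20),
    "TransgeneExpressionHere (mushroom body)": (2340, 2, 10),
}


def subquery_runs(rows: int, per_row: int) -> int:
    """Number of subquery executions when `rows` rows are enriched."""
    return rows * per_row


for name, (candidates, per_row, limit) in CASES.items():
    before = subquery_runs(candidates, per_row)             # LIMIT after the CALLs
    after = subquery_runs(min(candidates, limit), per_row)  # LIMIT before the CALLs
    print(f"{name}: {before} -> {after} subquery executions")
```

It prints 344 -> 40, 520 -> 80 and 4680 -> 20, so after the fix the number of subquery executions depends on LIMIT rather than on the number of candidate rows.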

The fix moves LIMIT to just after WITH DISTINCT, before the CALL subqueries fire, so only the rows that are kept get enriched. It also drops the ORDER BY name from _dataset_return_clause and moves it next to LIMIT in each caller, so the rows kept are the first ones by name (can't have two ORDER BYs).
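
A caller-side builder that follows this ordering might look like the sketch below. `build_enriched_query` and its parameters are hypothetical stand-ins, not VFBquery's actual helpers; the clause strings in the example assume a `DataSet` label and a `ds.label` property.

```python
from typing import Optional, Sequence


def build_enriched_query(
    match_clause: str,
    row_var: str,
    order_key: str,
    subqueries: Sequence[str],
    return_clause: str,
    limit: Optional[int] = None,
) -> str:
    """Assemble a Cypher query that sorts and trims rows before enriching them.

    Hypothetical helper: every clause string comes from the caller.
    """
    parts = [
        match_clause,
        f"WITH DISTINCT {row_var}",
        # Sort here, next to LIMIT, so the rows kept are the first ones by key.
        f"ORDER BY {order_key}",
    ]
    if limit is not None:
        # Trim while each row is still a bare node, so the CALL subqueries
        # below execute at most `limit` times.
        parts.append(f"LIMIT {int(limit)}")
    # Each CALL { ... } imports the row variable and adds columns to the row.
    parts.extend(subqueries)
    # No ORDER BY in the return clause: sorting already happened above.
    parts.append(return_clause)
    return "\n".join(parts)


query = build_enriched_query(
    match_clause="MATCH (ds:DataSet)",
    row_var="ds",
    order_key="ds.label",
    subqueries=[
        "CALL { WITH ds MATCH (ds)<-[:has_source]-(img) "
        "RETURN count(DISTINCT img) AS n_images }",
    ],
    return_clause="RETURN ds.label AS name, n_images",
    limit=10,
)
print(query)
```

Keeping ORDER BY and LIMIT together in the WITH step means the sort and the cut both happen before any subquery columns are computed, which is the point of the hotfix.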

Dry run against pdb.v4 (public, read-only):

  • AlignedDatasets LIMIT 10: 1.64 s (timed out before the fix)
  • AllDatasets LIMIT 20: 1.10 s
  • TransgeneExpressionHere LIMIT 10 on mushroom body: 0.51 s

All comfortably under their thresholds (3 s, 3 s, 15 s).
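
The pass/fail rule implied by these numbers can be written as a small table-driven check. This is a sketch, not the project's perf test: the constant names THRESHOLD_MEDIUM and THRESHOLD_SLOW are taken from the text below, the timings are the dry-run figures above, and treating a run as failing when it reaches or exceeds its threshold is an assumption.

```python
# Threshold values in seconds, as quoted in this PR.
THRESHOLD_MEDIUM = 3.0
THRESHOLD_SLOW = 15.0

# Dry-run timings in seconds, paired with the threshold each query must meet.
DRY_RUN = {
    "AlignedDatasets LIMIT 10": (1.64, THRESHOLD_MEDIUM),
    "AllDatasets LIMIT 20": (1.10, THRESHOLD_MEDIUM),
    "TransgeneExpressionHere LIMIT 10 (mushroom body)": (0.51, THRESHOLD_SLOW),
}


def failing(results: dict[str, tuple[float, float]]) -> list[str]:
    """Names of queries whose time reaches or exceeds their threshold (assumed rule)."""
    return [name for name, (seconds, threshold) in results.items() if seconds >= threshold]


for name, (seconds, threshold) in DRY_RUN.items():
    # Report each timing as a share of its threshold.
    print(f"{name}: {seconds:.2f} s of {threshold:.0f} s ({seconds / threshold:.0%})")

print("failing:", failing(DRY_RUN))
```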

PR #43 broke THRESHOLD_MEDIUM (3 s) on AlignedDatasets / AllDatasets
and THRESHOLD_SLOW (15 s) on TransgeneExpressionHere because LIMIT
was appended at the end of each Cypher and applied AFTER the four
(or two) CALL subqueries fired for every candidate ds/ep. One of
the dataset subqueries does count(DISTINCT img) across has_source
edges; the transgene one traverses a 5-hop image join inside the
CALL.

Move LIMIT after `WITH DISTINCT ds` (or `WITH DISTINCT ep`) and
before the CALL subqueries so only the rows we keep get enriched.
Drop the ORDER BY from `_dataset_return_clause` and move it next
to LIMIT in each caller, since you can't ORDER BY twice in the
same query.

Dry-run against pdb.v4 (public read-only):
  AlignedDatasets    LIMIT 10 -> 1.64 s
  AllDatasets        LIMIT 20 -> 1.10 s
  TransgeneExpr...   LIMIT 10 -> 0.51 s

All under their respective thresholds.
Robbie1977 merged commit f8d8618 into main on May 29, 2026
5 of 6 checks passed