Make parallel query-cycle reporting deterministic #157343
r? @Zoxc could you check if this generally makes sense? Sorting based on spans is still not fully deterministic, because all the spans can potentially be the same (including dummy spans), but it may be more deterministic than an arbitrarily selected order, which may be enough for our parallel test suite in practice. (The change is most likely LLM-generated, so feel free to close if you don't want to look.)
When the parallel front-end detects a query cycle, the deadlock handler collects the active query jobs in a nondeterministic order. `process_cycle` then picked `entry_points[0]`, or the first entry point that has a waiter, so the query the cycle was anchored at changed from run to run and the reported cycle text was unstable.

Order the entry points instead of taking the first one. The primary key is the incoming-edge span: the cycle stack records, for each query, the span where its predecessor in the cycle requested it. For the recursive-definition cycles these errors come from, the entry point with the latest incoming-edge span is the query the single-threaded path anchors at, so this keeps the parallel output matching the committed `.stderr`. Span ties prefer an entry point that has an outside waiter, so the "cycle used when ..." note is still produced, then fall back to the query's stable description.

This is a heuristic rather than a guaranteed total order: if two entry points share both a span and a description, the choice is still arbitrary, and there is no stable per-job key to order them by, since `QueryJobId` is assigned in racy execution order. For the cycles these tests exercise, the edge spans are distinct, so the anchor is deterministic in practice, which is what the parallel test suite needs. This only changes `process_cycle`, which runs exclusively under the parallel front-end, so single-threaded output is unchanged.
These tests were marked `ignore-parallel-frontend` because the reported cycle anchor was nondeterministic. Now that `process_cycle` picks the entry point deterministically, their output is stable, so replace the directive with a blank line rather than deleting it. The following source lines then keep their line numbers, so the expected stderr is unchanged.
Using spans here seems inferior to just reverting #152229. For parallel tests I think we should have a flag which removes entry points from messages entirely and shifts query cycles so that the lowest query is on top. That would be more robust.
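The alternative suggested in the comment above, rotating a reported cycle so that its lowest query comes first, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not code from the PR or from rustc: the cycle is modelled as a slice of comparable keys, `canonicalize_cycle` is a hypothetical name, and the key would have to be something stable such as the query description rather than `QueryJobId`, which the PR notes is assigned in racy execution order. Because a cycle has no inherent starting point, rotating it to its minimum yields the same sequence no matter where the deadlock handler happened to enter it, provided the keys are distinct.

```rust
/// Hypothetical helper: rotates a cycle in place so its smallest key
/// comes first. Assumes keys are distinct (each query appears once).
fn canonicalize_cycle<T: Ord>(cycle: &mut [T]) {
    // Find the index of the minimum element (the first one if tied).
    if let Some(min_idx) = cycle
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| a.cmp(b))
        .map(|(i, _)| i)
    {
        // Shift left so that element lands at index 0, preserving cycle order.
        cycle.rotate_left(min_idx);
    }
}

fn main() {
    // Hypothetical query descriptions: the same cycle observed from two
    // different starting points, as two racy runs might report it.
    let mut run_a = vec!["type_of `B`", "type_of `C`", "type_of `A`"];
    let mut run_b = vec!["type_of `C`", "type_of `A`", "type_of `B`"];
    canonicalize_cycle(&mut run_a);
    canonicalize_cycle(&mut run_b);
    // Both runs now start at the lowest key and report identical text.
    assert_eq!(run_a, run_b);
    println!("{:?}", run_a);
}
```

Unlike the span heuristic, this does not depend on which query is chosen as the entry point, which is presumably why the comment calls it more robust; it does not cover the other half of the suggestion (dropping entry points from the message).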
The parallel front-end reports query cycles through the deadlock handler, which collects the active query jobs in a nondeterministic order. `process_cycle` then anchored the reported cycle at `entry_points[0]` (or the first entry point that has a waiter), so the query the cycle text was anchored at changed from run to run, and several UI tests had unstable output under `-Zthreads`.

This orders the entry points instead of taking the first one. The primary key is the incoming-edge span: the cycle stack records, for each query, the span where its predecessor in the cycle requested it. For the recursive-definition cycles these errors come from, the entry point with the latest incoming-edge span is the query the single-threaded path anchors at, so this keeps the parallel output matching the committed `.stderr`. Span ties prefer an entry point that has an outside waiter, so the "cycle used when" note is still produced, then fall back to the query's stable description.

This is a heuristic rather than a guaranteed total order: if two entry points share both a span and a description, the choice is still arbitrary, and there is no stable per-job key to order them by, since `QueryJobId` is assigned in racy execution order. For the cycles these tests exercise, the edge spans are distinct, so the anchor is deterministic in practice, which is what the parallel test suite needs. `process_cycle` only runs under the parallel front-end, so single-threaded output is unchanged.
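The three-level ordering described above can be sketched as a comparator. This is a minimal, hypothetical model rather than the PR's actual code: `EntryPoint`, its fields, `rank` and `pick_anchor` are illustrative names, spans are modelled as `(lo, hi)` byte offsets, and the direction of the description tie-break (smaller description wins) is an assumption, since the text only says it falls back to the description.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a cycle entry point (not rustc's real type).
#[derive(Debug, Clone)]
struct EntryPoint {
    /// Incoming-edge span, modelled as (lo, hi) byte offsets.
    span: (u32, u32),
    /// Whether a job outside the cycle is waiting on this query.
    has_outside_waiter: bool,
    /// Stable, human-readable query description.
    description: String,
}

/// Orders entry points so the preferred anchor compares greatest:
/// later span first, then having an outside waiter (true > false),
/// then (assumed) the lexicographically smaller description.
fn rank(a: &EntryPoint, b: &EntryPoint) -> Ordering {
    a.span
        .cmp(&b.span)
        .then(a.has_outside_waiter.cmp(&b.has_outside_waiter))
        // Arguments swapped so that a *smaller* description ranks higher.
        .then(b.description.cmp(&a.description))
}

/// Picks the anchor. `Iterator::max_by` returns the last of several equal
/// maxima, so entries tied on every key still depend on input order.
fn pick_anchor(entry_points: &[EntryPoint]) -> Option<&EntryPoint> {
    entry_points.iter().max_by(|a, b| rank(a, b))
}

fn main() {
    let eps = vec![
        EntryPoint { span: (10, 20), has_outside_waiter: true, description: "type_of `A`".to_string() },
        EntryPoint { span: (40, 50), has_outside_waiter: false, description: "type_of `B`".to_string() },
        EntryPoint { span: (40, 50), has_outside_waiter: true, description: "type_of `C`".to_string() },
    ];
    // The same set in a different (racy) collection order.
    let mut reversed = eps.clone();
    reversed.reverse();
    let a = pick_anchor(&eps).unwrap();
    let b = pick_anchor(&reversed).unwrap();
    // Distinct keys give the same anchor regardless of collection order.
    assert_eq!(a.description, b.description);
    println!("anchor: {}", a.description);
}
```

With these inputs the two `(40, 50)` spans tie, and the outside-waiter key selects `type_of `C``. The `max_by` behaviour on full ties is the same residual arbitrariness the description acknowledges.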