Skip to content

Rewrite webhooks reference with event lifecycle and v1/v2 schema notes#52

Draft
samgutentag wants to merge 2 commits into
mainfrom
sam-gutentag/webhook-lifecycle
Draft

Rewrite webhooks reference with event lifecycle and v1/v2 schema notes#52
samgutentag wants to merge 2 commits into
mainfrom
sam-gutentag/webhook-lifecycle

Conversation

@samgutentag
Copy link
Copy Markdown
Member

Summary

  • Rewrites merge-queue/webhooks.mdx (previously a near-empty stub deferring to Svix) into a real reference covering subscription families, event lifecycle, batch behavior, payload caveats, v1/v2 test_case.status_changed, quarantining events, and AI investigation events.
  • Adds a cross-link from flaky-tests/webhooks/index.mdx to the Merge Queue webhooks page so integrators landing on either side find the other.

Why

Sourced from customer feedback mining (cluster webhook-event-lifecycle, verdict missing, 11 pairs across 7 customers). The current webhooks.mdx is a stub that defers to Svix; customers integrating against Trunk webhooks hit recurring confusion on batch vs PR lifecycle, v1/v2 schema, bisection semantics, and event-subscription requirements.

Items flagged for review

  • v2.test_case.status_changed naming — used the literal event name based on the Brex/Healthie Slack replies and the existing flaky-tests/webhooks/index.mdx page. Confirm this matches the Svix catalog exactly.
  • pull_request.queued reason field enum — claims BISECTION_REQUIRED (entering) and BISECTION_TEST_RUN_PASSED (returning). Pulled verbatim from the Faire thread; confirm these are the exact enum strings in the live payload.
  • Bisection lifecycle for higher-queue failures — claim that PRs above a pending-failure batch can get kicked first and the lower ones re-queued (so later pull_request.queued events don't necessarily map back to the original pending_failure). Confirm phrasing matches actual queue behavior.
  • Payload timestamp absence — the existing flaky-tests/webhooks/index.mdx payload tables list timestamp as an ISO 8601 field for v2.test_case.status_changed, but the AgencyAnalytics thread says payloads have no timestamp. One of the two is wrong — flagging so the right one stays.
  • Quarantining enum — claims test_case.quarantining_setting_changed fires only for manual ALWAYS / NEVER overrides, not auto-quarantine. Pulled from the Descript thread; confirm there isn't a third trigger.
  • current_status == FLAKY check — taken verbatim from the Descript reply. Confirm the field is literally current_status (not new_status or status) on the test_case.status_changed payload.
  • flaky-tests/get-test-details API path — linked to api.trunk.io/docs#tag/flaky-tests as a best guess. Confirm the actual API reference URL.
  • Replaced Slack-community support link with support@trunk.io per global memory (no community Slack for support). The sibling migrating-from-github-merge-queue.mdx still uses the old Slack link — separate cleanup pass if desired.

Customer signal

@samgutentag samgutentag added the needs review PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge label May 20, 2026
@mintlify
Copy link
Copy Markdown
Contributor

mintlify Bot commented May 20, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
trunk 🟢 Ready View Preview May 20, 2026, 11:04 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@samgutentag
Copy link
Copy Markdown
Member Author

samgutentag commented May 21, 2026

Verification status (2026-05-26): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

  • Eng PR: none referenced in PR body
  • Flag: n/a (no eng work to verify)
  • Signals checked:
    • No trunk-io/<repo>#NNN or PR URL references in PR body
    • No TRUNK-XXX Linear ticket linked
    • PR scope is content (documenting existing webhooks event lifecycle), not a flag-gated new feature

This is a content-accuracy PR, not a feature-rollout PR. The seven "items flagged for review" in the PR body (event name, enum values, bisection lifecycle, timestamp absence, quarantining triggers, current_status field, API path) are the real blockers and need direct verification against source. The code-verification comment (2026-05-21) found 3 contradictions that still need to be resolved before publishing.

@samgutentag samgutentag marked this pull request as draft May 21, 2026 07:10
@samgutentag
Copy link
Copy Markdown
Member Author

samgutentag commented May 21, 2026

Code verification (2026-05-26)

Re-verification. No new commits since last check (2026-05-21). The 3 contradictions identified previously remain unresolved in the current diff.

Claims checked:

  • v2.test_case.status_changed event name — confirmed in trunk-io/trunk2 flake-detection/src/types.ts and svix-publish-schemas/src/event-types.ts
  • reason field on pull_request.queued with BISECTION_REQUIRED / BISECTION_TEST_RUN_PASSED — docs say reason, code says queued_reason (trunk-io/trunk events_merge.ts#L157)
  • ✅ PRs above a pending-failure batch can be kicked, surviving PRs re-queued — confirmed in trunk-io/trunk merge_graph.ts#L2784-L2794
  • ❌ "Webhook payloads do not include an event timestamp" — code shows timestamp is a required field on the base schema for all pull_request.* events (events_merge.ts#L91-L93) and on v2.test_case.status_changed (flake-detection/src/types.ts#L239-L242)
  • test_case.quarantining_setting_changed fires only on manual override, not auto-quarantine — confirmed in trunk-io/trunk flaky_tests_service.ts#L782-L794
  • ❌ Quarantining overrides described as ALWAYS or NEVER — code uses ALWAYS_QUARANTINE / NEVER_QUARANTINE (trunk-io/trunk2 prisma/flakyDb/kysely/types.ts)
  • test_case.investigation_completed fires for AI/autofix analysis — confirmed in svix-publish-schemas/src/events/test-case-investigation-completed.ts: "Emitted when a flaky test analysis completes"
  • ❌ "check for current_status == FLAKY in the payload" (in Quarantining events section) — v2 payload uses new_status, not current_status. current_status only exists nested in the legacy v1 event as status_change.current_status.value
  • pull_request.* and pull_request_batch.* as event families — confirmed via internal topics proto (TOPIC_MERGE_GRAPH_ITEM_PENDING_FAILURE) and integration tests referencing pull_request.queued, pull_request_batch.pending_failure
  • ⚠️ pull_request_batch.pending_failure specific event name — not directly found in trunk2 Svix schemas (MQ events registered in trunk-io/trunk), but internal topics proto has TOPIC_MERGE_GRAPH_ITEM_PENDING_FAILURE and prior verification confirmed the event name
  • ⚠️ No dedicated bisection_completed event — could not locate such an event in either repo (consistent with the claim being true)
  • ✅ v2 event omits aggregate metrics like failure_rate_last_7d — confirmed. The enrichV2TestCaseStatusChanged function in enrich.ts does not include these fields. Only the legacy enrichTestCaseStatusChanged computes them from ClickHouse.

Contradictions still present (unchanged since 2026-05-21):

  1. Field name: The PR says "watch the reason field on pull_request.queued". The actual field name is queued_reason. Fix: replace reason with queued_reason.

  2. Timestamp claim: The PR says "Webhook payloads currently do not include an event timestamp. The practical workaround is to treat the time your endpoint receives the webhook as the event time." This is wrong. All MQ webhook payloads include a required ISO 8601 timestamp field. The entire "Payload notes" section should be deleted or corrected.

  3. Field name in quarantining section: The PR says "check for current_status == FLAKY in the payload". For the v2 event, the correct field is new_status. Additionally, the quarantining override values should be ALWAYS_QUARANTINE / NEVER_QUARANTINE, not ALWAYS / NEVER.

Verdict: needs eng review

@samgutentag samgutentag added the needs eng review verify-docs-against-code: at least one claim contradicts source. label May 21, 2026
samgutentag and others added 2 commits May 26, 2026 13:10
Three contradictions caught by source verification (see PR comment
4505684245):

- pull_request.queued field is queued_reason, not bare reason. The
  whole event family uses <state>_reason (queued_reason,
  failure_reason, merged_reason, cancellation_reason, submitted_reason).
- Payloads do include an ISO 8601 timestamp field; rewrite the
  "no timestamp" section to point readers at the actual field and
  explain why receive-time is unsafe (retries, out-of-order delivery).
- v2.test_case.status_changed uses top-level new_status (and
  previous_status), not nested status_change.current_status from v1.
  Simplify the quarantine example to v2 only and use the correct
  field name and string casing.

Plus:
- Quarantining enum strings on test_case.quarantining_setting_changed
  are ALWAYS_QUARANTINE / NEVER_QUARANTINE (not bare ALWAYS / NEVER).
- Cross-link from flaky-tests/webhooks/index.mdx to the Merge Queue
  webhooks page is wrapped in an <Info> callout for prominence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs eng review verify-docs-against-code: at least one claim contradicts source. needs review PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge

Development

Successfully merging this pull request may close these issues.

1 participant