P5b: Generation pipeline (Recall webhook → 6 docs → tasks → master record → notify) #1

Open
itkujo wants to merge 2 commits into main from feat/p5b-generation-pipeline

itkujo (Collaborator) commented Jun 22, 2026

Second half of P5 — the meeting generation path — built on the P5a ingest foundation (dd4cf08) and the previously-dormant @gracie/shared/ai groundwork, which is now wired in, not rebuilt (GENERATED_DOC_SPECS / GENERATED_DOC_ORDER, assemblePrompt, parseTaskExtraction, the provider interface).

Brief: docs/plan/p5b-generation-pipeline.md. Spec authority: docs/06-ai-pipeline.md §2/§3/§4/§6/§8/§9.

What's here

  • Queue contract (@gracie/shared, client-safe): generate + watchdog queues, job names, scheduler id, SLA constants; GenerationJobPayload (meetingId, botJobId, optional transcriptOverride) + WatchdogJobPayload.
  • Generation core (apps/worker/src/lib/generate.ts): pipeline-agnostic, runs the 6 docs sequentially in the fixed order (D7), authors each layer-3 instruction, assembles the 5-layer prompt, and calls the provider interface only (D11 — never the OpenAI SDK). One stricter re-ask on invalid task JSON (docs/06 §8). Reusable from the upload path later.
  • Meeting processor (generate.processor.ts, mirrors ingest.processor.ts): transcript (override or Recall fetch) → store raw in MinIO → embed (pinned 1536-dim, source_type='transcript') → historical context (match_embeddings top-5 client-scoped, excluding self, + open tasks) → 6 docs → store .md + insert documents (correct document_type mapping; docs 3 & 6 requires_review) → parse → tasks (owner/due/priority resolution, source_document_id = checklist) → append master_record_entries → pipeline_runs → mark complete + per-attendee documents_ready notifications. Failure handling: transient errors retry w/ backoff; final attempt → needs_attention + failed pipeline_runs.
  • Transcript watchdog (watchdog.processor.ts): repeatable 15-min sweep; meetings awaiting a transcript past the 90-min SLA → needs_attention + in-app notification to the lead. Resend alert deferred to P7 (TODO).
  • Webhook (POST /api/webhooks/recall, runtime='nodejs'): Svix signature verification (pure, unit-tested), enforced once RECALL_WEBHOOK_SECRET exists and skipped with a warning until then; confirms a meeting matches bot_job_id (else 4xx) → enqueue → pipeline_status='processing' → 202.
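
For reviewers unfamiliar with the Svix scheme, here is a minimal sketch of what a pure verifier can look like. It is not the code in this PR: `signSvix`, `verifySvixSignature`, and `SvixHeaders` are hypothetical names, and the sketch assumes the standard Svix layout (HMAC-SHA256 over `id.timestamp.body`, keyed with the base64 part of a `whsec_` secret, and a space-separated list of `v1,<base64>` signatures). Timestamp tolerance checking is deliberately left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical header shape; the real route would read these from the request.
interface SvixHeaders {
  id?: string; // svix-id
  timestamp?: string; // svix-timestamp (unix seconds)
  signature?: string; // svix-signature: space-separated "v1,<base64>" entries
}

// Compute the base64 signature for one message (also handy for building test fixtures).
function signSvix(secret: string, id: string, timestamp: string, body: string): string {
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  return createHmac("sha256", key).update(`${id}.${timestamp}.${body}`).digest("base64");
}

// Returns true if any v1 signature in the header matches the expected one.
function verifySvixSignature(secret: string, headers: SvixHeaders, body: string): boolean {
  const { id, timestamp, signature } = headers;
  if (!id || !timestamp || !signature) return false; // missing headers never verify

  const expected = Buffer.from(signSvix(secret, id, timestamp, body), "base64");
  return signature.split(" ").some((entry) => {
    const [version, sig] = entry.split(",");
    if (version !== "v1" || !sig) return false;
    const candidate = Buffer.from(sig, "base64");
    // timingSafeEqual throws on length mismatch, so check lengths first.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

Keeping the verifier free of I/O is what makes cases like tampered-body, wrong-secret, and multi-signature headers cheap to assert in unit tests.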

Acceptance — verified

  • pnpm -w typecheck, pnpm -w lint, pnpm --filter web build all pass.
  • End-to-end against live Supabase/MinIO/Redis (enqueued generate w/ transcriptOverride for a seeded CMS meeting): 6 documents (correct types; docs 3 & 6 requires_review=true/needs_review), 4 tasks (owners resolved, 2026-05-15 due parsed, priority flag set, source_document_id set), 1 master_record_entries, 1 pipeline_runs (success, documents_generated=6), 2 transcript embeddings @ 1536-dim, pipeline_status='complete', 3 documents_ready notifications, six .md objects in MinIO.
  • [VERIFY: …] tags appear where the model is uncertain (9 tags on a deliberately ambiguous transcript; clean transcript → 0, as expected). Generation model read from settings.ai_model via getActiveProvider().
  • ✅ Webhook 404 on a non-matching bot_job_id, 400 on a missing id, 202 on a valid one (live).
  • ✅ Watchdog flagged an overdue meeting live; the Recall-fetch failure path drove a meeting to needs_attention + a failed pipeline_runs row.
  • ✅ Svix signature verification: 12/12 unit assertions (valid / multi-sig / tampered-body / wrong-sig / wrong-secret / missing-headers + payload parsing).
  • ✅ No secrets staged (*.env.local + docs/SECRETS.md git-ignored).
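
The webhook status codes verified above (400, 404, 202) can be summarized as a small decision function. This is only an illustrative sketch: `decideWebhookResponse`, `WebhookDecision`, and the in-memory map standing in for the meeting lookup are assumptions, not names from the route handler.

```typescript
// Hypothetical result type: either an error status with a reason, or 202 with the matched meeting.
type WebhookDecision =
  | { status: 400; reason: "missing_bot_job_id" }
  | { status: 404; reason: "no_matching_meeting" }
  | { status: 202; meetingId: string };

// The map stands in for a database lookup of meetings by bot_job_id.
function decideWebhookResponse(
  botJobId: string | undefined,
  meetingsByBotJobId: ReadonlyMap<string, string>,
): WebhookDecision {
  if (!botJobId) return { status: 400, reason: "missing_bot_job_id" };
  const meetingId = meetingsByBotJobId.get(botJobId);
  if (meetingId === undefined) return { status: 404, reason: "no_matching_meeting" };
  // Only at this point would the handler enqueue the job and set pipeline_status='processing'.
  return { status: 202, meetingId };
}
```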

Deploy-time follow-ups (flagged, non-blocking — per brief Escalate §)

  • Provision RECALL_WEBHOOK_SECRET and register the webhook with Recall (signature verification is built + tested behind it).
  • Swap the legacy Recall transcript endpoint for the modern transcript_retrieve flow — the live key reached Recall but reported /bot/{id}/transcript/ is deprecated. Pipeline is proven via transcriptOverride; wiring the modern flow needs a real bot payload. Documented in apps/worker/src/lib/recall.ts.

Out of scope (later phases)

Manual-upload doc-set selection, .docx rendering (stored as .md), Intelligence chat / KB (P6), calendar scan / bot dispatch (P4), Resend delivery (P7).

🤖 Generated with Claude Code

Daniel Velez and others added 2 commits June 22, 2026 12:43
P5a ingest is done; queue up P5b (Recall webhook -> 6 docs -> tasks ->
master record -> pipeline_runs -> notify + transcript watchdog), wiring the
dormant @gracie/shared AI groundwork. HANDOFF 'Now' section updated; notes the
2026-06-18 APP_ENCRYPTION_KEY rotation + reconstructed worker env (P4 deferred).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s → master → notify)

Builds the second half of P5 — the meeting GENERATION path — on top of the P5a
ingest foundation and the dormant @gracie/shared/ai groundwork (GENERATED_DOC_SPECS,
assemblePrompt, parseTaskExtraction), which are now wired in (not rebuilt).

Queue contract (@gracie/shared):
- Add `generate` + `watchdog` queue names, job names, scheduler id, and the
  transcript-watchdog interval/SLA constants (client-safe; names + types only).
- Add `GenerationJobPayload` (meetingId, botJobId, optional transcriptOverride)
  and `WatchdogJobPayload`.

Generation core (apps/worker/src/lib/generate.ts):
- Pipeline-agnostic: runs the 6 documents SEQUENTIALLY in the fixed order (D7),
  authoring the per-doc layer-3 instruction, assembling the 5-layer prompt, and
  calling the provider interface (D11 — never the OpenAI SDK). Parses the task
  checklist with one stricter re-ask on invalid JSON (docs/06 §8). Reusable from
  both the meeting processor and (later) the upload path.
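
A rough sketch of the "one stricter re-ask" step, for illustration only. `Complete` stands in for the provider interface, `tryParseTasks` stands in for `parseTaskExtraction`, and the `Task` shape and the wording of the stricter instruction are assumptions; throwing after the second failure is also an assumption about how the caller wants to learn of it.

```typescript
type Task = { title: string; owner?: string; due?: string }; // assumed shape
type Complete = (prompt: string) => Promise<string>; // stand-in for the provider interface

// Returns the parsed task list, or null when the output is not a JSON array of titled objects.
function tryParseTasks(raw: string): Task[] | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return null;
    const valid = parsed.every(
      (t) => typeof t === "object" && t !== null && typeof (t as Task).title === "string",
    );
    return valid ? (parsed as Task[]) : null;
  } catch {
    return null; // JSON.parse failed
  }
}

// Ask once; on invalid JSON, ask exactly one more time with a stricter instruction.
async function extractTasksWithReask(complete: Complete, prompt: string): Promise<Task[]> {
  const first = tryParseTasks(await complete(prompt));
  if (first) return first;

  const stricter = `${prompt}\n\nReturn ONLY a valid JSON array of task objects, with no prose or code fences.`;
  const second = tryParseTasks(await complete(stricter));
  if (second) return second;

  throw new Error("task extraction returned invalid JSON after one re-ask");
}
```

Capping the re-ask at one bounds the extra provider cost per meeting while still recovering from the common case of a model wrapping its JSON in prose.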

Meeting processor (generate.processor.ts), per docs/06 §4:
- transcript (override or Recall fetch) → store raw in MinIO → embed (pinned
  1536-dim, source_type='transcript') → historical context (match_embeddings
  top-5 client-scoped, excluding self, + open tasks) → generate 6 docs → store
  .md + insert `documents` (GeneratedDocType→enum mapping; docs 3 & 6
  requires_review/needs_review) → parse → insert `tasks` (owner/due/priority
  resolution, source_document_id = checklist) → append `master_record_entries`
  → `pipeline_runs` (success|partial) → mark `complete` + notify attendees.
  Failure handling: transient errors retry w/ backoff; final attempt →
  needs_attention + failed `pipeline_runs` row.
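
The failure rule can be pictured as a pure decision on the attempt count. A minimal sketch under stated assumptions: `onGenerateFailure`, the exponential schedule, and the 5-second base delay are hypothetical and not taken from the worker configuration.

```typescript
type FailureAction =
  | { kind: "retry"; delayMs: number }
  | { kind: "needs_attention" }; // caller also writes the failed pipeline_runs row

// attemptsMade counts the attempt that just failed (1-based).
function onGenerateFailure(
  attemptsMade: number,
  maxAttempts: number,
  baseDelayMs = 5_000, // assumed base delay
): FailureAction {
  if (attemptsMade >= maxAttempts) return { kind: "needs_attention" };
  // Exponential backoff: base, 2x base, 4x base, ...
  return { kind: "retry", delayMs: baseDelayMs * 2 ** (attemptsMade - 1) };
}
```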

Transcript watchdog (watchdog.processor.ts, docs/06 §8):
- Repeatable 15-min sweep flags meetings awaiting a transcript past the 90-min
  SLA → needs_attention + in-app notification to the lead. Resend alert deferred
  to P7 (TODO).
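
The sweep's selection step amounts to a time filter. A hedged sketch: `AwaitingMeeting`, `awaitingSince`, and `findOverdueMeetings` are hypothetical names, and measuring the SLA from the moment the meeting started waiting for its transcript is an assumption about where the clock starts.

```typescript
const TRANSCRIPT_SLA_MS = 90 * 60 * 1000; // 90-minute SLA from the commit message

// Hypothetical row shape for a meeting still waiting on its transcript.
interface AwaitingMeeting {
  id: string;
  leadUserId: string;
  awaitingSince: Date; // assumed reference point for the SLA
}

// Returns the meetings that have waited strictly longer than the SLA.
function findOverdueMeetings(
  meetings: readonly AwaitingMeeting[],
  now: Date,
  slaMs: number = TRANSCRIPT_SLA_MS,
): AwaitingMeeting[] {
  return meetings.filter((m) => now.getTime() - m.awaitingSince.getTime() > slaMs);
}
```

Each returned meeting would then be moved to needs_attention and its lead notified in-app.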

Web webhook (POST /api/webhooks/recall, runtime=nodejs):
- Svix signature verification (pure, unit-tested) enforced once
  RECALL_WEBHOOK_SECRET is provisioned (skipped-with-warning until deploy);
  confirms a meeting matches bot_job_id (else 4xx) → enqueue → set
  pipeline_status='processing' → 202.

Verified end-to-end against live infra: enqueued generate jobs with sample
transcripts for seeded meetings → 6 `documents`, 4 `tasks`, 1
`master_record_entries`, 1 success `pipeline_runs` (documents_generated=6),
1536-dim transcript `embeddings`, pipeline_status=complete, per-attendee
`documents_ready` notifications, .md objects in MinIO; [VERIFY] tags appear on an
ambiguous transcript; webhook returns 404 on a bad bot_job_id and 202 on a valid
one; watchdog + failure→needs_attention paths exercised live. typecheck + lint +
`pnpm --filter web build` all pass.

Deploy-time follow-ups (per brief): provision RECALL_WEBHOOK_SECRET and register
the webhook; swap the legacy Recall transcript endpoint for the modern
transcript_retrieve flow (the live key reached Recall, which reported the legacy endpoint as deprecated).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>