P5b: Generation pipeline (Recall webhook → 6 docs → tasks → master record → notify) #1
Open
itkujo wants to merge 2 commits into
P5a ingest is done; queue up P5b (Recall webhook -> 6 docs -> tasks -> master record -> pipeline_runs -> notify + transcript watchdog), wiring the dormant @gracie/shared AI groundwork. HANDOFF 'Now' section updated; notes the 2026-06-18 APP_ENCRYPTION_KEY rotation + reconstructed worker env (P4 deferred).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s → master → notify)

Builds the second half of P5 (the meeting GENERATION path) on top of the P5a ingest foundation and the dormant @gracie/shared/ai groundwork (GENERATED_DOC_SPECS, assemblePrompt, parseTaskExtraction), which are now wired in (not rebuilt).

Queue contract (@gracie/shared):
- Add `generate` + `watchdog` queue names, job names, scheduler id, and the transcript-watchdog interval/SLA constants (client-safe; names + types only).
- Add `GenerationJobPayload` (meetingId, botJobId, optional transcriptOverride) and `WatchdogJobPayload`.

Generation core (apps/worker/src/lib/generate.ts):
- Pipeline-agnostic: runs the 6 documents SEQUENTIALLY in the fixed order (D7), authoring the per-doc layer-3 instruction, assembling the 5-layer prompt, and calling the provider interface (D11: never the OpenAI SDK). Parses the task checklist with one stricter re-ask on invalid JSON (docs/06 §8). Reusable from both the meeting processor and (later) the upload path.

Meeting processor (generate.processor.ts), per docs/06 §4:
- transcript (override or Recall fetch) → store raw in MinIO → embed (pinned 1536-dim, source_type='transcript') → historical context (match_embeddings top-5 client-scoped, excluding self, + open tasks) → generate 6 docs → store .md + insert `documents` (GeneratedDocType→enum mapping; docs 3 & 6 requires_review/needs_review) → parse → insert `tasks` (owner/due/priority resolution, source_document_id = checklist) → append `master_record_entries` → `pipeline_runs` (success|partial) → mark `complete` + notify attendees.
- Failure handling: transient errors retry w/ backoff; final attempt → needs_attention + failed `pipeline_runs` row.

Transcript watchdog (watchdog.processor.ts, docs/06 §8):
- Repeatable 15-min sweep flags meetings awaiting a transcript past the 90-min SLA → needs_attention + in-app notification to the lead. Resend alert deferred to P7 (TODO).

Web webhook (POST /api/webhooks/recall, runtime=nodejs):
- Svix signature verification (pure, unit-tested) enforced once RECALL_WEBHOOK_SECRET is provisioned (skipped-with-warning until deploy); confirms a meeting matches bot_job_id (else 4xx) → enqueue → set pipeline_status='processing' → 202.

Verified end-to-end against live infra: enqueued generate jobs with sample transcripts for seeded meetings → 6 `documents`, 4 `tasks`, 1 `master_record_entries`, 1 success `pipeline_runs` (documents_generated=6), 1536-dim transcript `embeddings`, pipeline_status=complete, per-attendee `documents_ready` notifications, .md objects in MinIO; [VERIFY] tags appear on an ambiguous transcript; webhook returns 404 on a bad bot_job_id and 202 on a valid one; watchdog + failure→needs_attention paths exercised live. typecheck + lint + `pnpm --filter web build` all pass.

Deploy-time follow-ups (per brief): provision RECALL_WEBHOOK_SECRET and register the webhook; swap the legacy Recall transcript endpoint for the modern transcript_retrieve flow (live key confirmed working).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
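Reviewer note: the "pure, unit-tested" Svix verification mentioned in the webhook bullet could look roughly like the sketch below. This is not the code from this PR. It follows the publicly documented Svix scheme (HMAC-SHA256 over `id.timestamp.body`, keyed with the base64 part of a `whsec_` secret); the names `SvixHeaders`, `signSvix`, `verifySvix`, and the 5-minute tolerance are assumptions made for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical header bundle; the route handler in this PR may pass these differently.
export interface SvixHeaders {
  id: string; // value of the svix-id header
  timestamp: string; // value of the svix-timestamp header (unix seconds)
  signature: string; // value of the svix-signature header, e.g. "v1,<base64> v1,<base64>"
}

// Assumed replay window; not taken from the PR.
const TOLERANCE_SECONDS = 5 * 60;

// Computes the base64 HMAC-SHA256 signature over "<id>.<timestamp>.<body>".
export function signSvix(secret: string, id: string, timestamp: string, body: string): string {
  // Svix secrets are "whsec_" followed by the base64-encoded key bytes.
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  return createHmac("sha256", key).update(`${id}.${timestamp}.${body}`).digest("base64");
}

// Pure check: no I/O, the clock is injected so the function is easy to unit-test.
export function verifySvix(
  secret: string,
  headers: SvixHeaders,
  body: string,
  nowMs: number = Date.now(),
): boolean {
  const ts = Number(headers.timestamp);
  if (!Number.isFinite(ts)) return false;
  // Reject messages whose timestamp is too far from "now" (replay protection).
  if (Math.abs(nowMs / 1000 - ts) > TOLERANCE_SECONDS) return false;

  const expected = Buffer.from(signSvix(secret, headers.id, headers.timestamp, body), "base64");

  // The header may carry several space-separated signatures; any v1 match is enough.
  return headers.signature.split(" ").some((entry) => {
    const [version, sig] = entry.split(",");
    if (version !== "v1" || !sig) return false;
    const candidate = Buffer.from(sig, "base64");
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

The raw request body must be verified byte-for-byte as received, so a handler using something like this would read the body as text before parsing it as JSON.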
Second half of P5 (the meeting generation path), built on the P5a ingest foundation (dd4cf08) and the previously dormant `@gracie/shared/ai` groundwork, which is now wired in, not rebuilt (`GENERATED_DOC_SPECS` / `GENERATED_DOC_ORDER`, `assemblePrompt`, `parseTaskExtraction`, the provider interface).

Brief: `docs/plan/p5b-generation-pipeline.md`. Spec authority: `docs/06-ai-pipeline.md` §2/§3/§4/§6/§8/§9.

What's here
- Queue contract (`@gracie/shared`, client-safe): `generate` + `watchdog` queues, job names, scheduler id, SLA constants; `GenerationJobPayload` (`meetingId`, `botJobId`, optional `transcriptOverride`) + `WatchdogJobPayload`.
- Generation core (`apps/worker/src/lib/generate.ts`): pipeline-agnostic, runs the 6 docs sequentially in the fixed order (D7), authors each layer-3 instruction, assembles the 5-layer prompt, and calls the provider interface only (D11: never the OpenAI SDK). One stricter re-ask on invalid task JSON (docs/06 §8). Reusable from the upload path later.
- Meeting processor (`generate.processor.ts`, mirrors `ingest.processor.ts`): transcript (override or Recall fetch) → store raw in MinIO → embed (pinned 1536-dim, `source_type='transcript'`) → historical context (`match_embeddings` top-5 client-scoped, excluding self, + open tasks) → 6 docs → store `.md` + insert `documents` (correct `document_type` mapping; docs 3 & 6 `requires_review`) → parse → `tasks` (owner/due/priority resolution, `source_document_id` = checklist) → append `master_record_entries` → `pipeline_runs` → mark `complete` + per-attendee `documents_ready` notifications. Failure handling: transient errors retry w/ backoff; final attempt → `needs_attention` + `failed` `pipeline_runs`.
- Transcript watchdog (`watchdog.processor.ts`): repeatable 15-min sweep; meetings awaiting a transcript past the 90-min SLA → `needs_attention` + in-app notification to the lead. Resend alert deferred to P7 (TODO).
- Web webhook (`POST /api/webhooks/recall`, `runtime='nodejs'`): Svix signature verification (pure, unit-tested), enforced once `RECALL_WEBHOOK_SECRET` exists, skipped-with-warning until then; confirms a meeting matches `bot_job_id` (else 4xx) → enqueue → `pipeline_status='processing'` → 202.

Acceptance: verified
- `pnpm -w typecheck`, `pnpm -w lint`, `pnpm --filter web build` all pass.
- End-to-end against live infra (`generate` w/ `transcriptOverride` for a seeded CMS meeting): 6 `documents` (correct types; docs 3 & 6 `requires_review=true` / `needs_review`), 4 `tasks` (owners resolved, `2026-05-15` due parsed, priority flag set, `source_document_id` set), 1 `master_record_entries`, 1 `pipeline_runs` (`success`, `documents_generated=6`), 2 transcript `embeddings` @ 1536-dim, `pipeline_status='complete'`, 3 `documents_ready` notifications, six `.md` objects in MinIO.
- `[VERIFY: …]` tags appear where the model is uncertain (9 tags on a deliberately ambiguous transcript; clean transcript → 0, as expected). Generation model read from `settings.ai_model` via `getActiveProvider()`.
- Webhook returns 404 on a bad `bot_job_id`, 400 on a missing id, 202 on a valid one (live).
- Watchdog + failure paths exercised live: failure → `needs_attention` + a `failed` `pipeline_runs` row.
- `*.env.local` + `docs/SECRETS.md` git-ignored.

Deploy-time follow-ups (flagged, non-blocking; per brief Escalate §)
- Provision `RECALL_WEBHOOK_SECRET` and register the webhook with Recall (signature verification is built + tested behind it).
- Swap the legacy Recall transcript endpoint for the modern `transcript_retrieve` flow: the live key reached Recall, but Recall reported that `/bot/{id}/transcript/` is deprecated. Pipeline is proven via `transcriptOverride`; wiring the modern flow needs a real bot payload. Documented in `apps/worker/src/lib/recall.ts`.

Out of scope (later phases)
Manual-upload doc-set selection, `.docx` rendering (stored as `.md`), Intelligence chat / KB (P6), calendar scan / bot dispatch (P4), Resend delivery (P7).

🤖 Generated with Claude Code
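
Reviewer note: the "one stricter re-ask on invalid task JSON" behaviour described under the generation core can be pictured with the sketch below. It is an illustration only, not the code in `apps/worker/src/lib/generate.ts`: the `Provider` interface, `parseTasks` (a stand-in for `parseTaskExtraction`), the `ExtractedTask` shape, the wording of the stricter prompt, and the choice to throw after the second failure are all assumptions.

```typescript
// Hypothetical provider interface; the real one lives in @gracie/shared/ai.
export interface Provider {
  complete(prompt: string): Promise<string>;
}

// Hypothetical task shape, reduced to what the sketch needs.
export interface ExtractedTask {
  title: string;
  owner?: string;
  due?: string;
}

// Stand-in for parseTaskExtraction: returns null unless the reply is a JSON
// array of objects that each carry a string title.
export function parseTasks(raw: string): ExtractedTask[] | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return null;
    const valid = parsed.every(
      (t) => typeof t === "object" && t !== null && typeof (t as { title?: unknown }).title === "string",
    );
    return valid ? (parsed as ExtractedTask[]) : null;
  } catch {
    return null; // not JSON at all
  }
}

// Asks once; on an invalid reply, asks exactly one more time with a stricter
// instruction appended. A second invalid reply is surfaced as an error
// (assumed behaviour; the PR may record a partial run instead).
export async function extractTasks(provider: Provider, prompt: string): Promise<ExtractedTask[]> {
  const first = parseTasks(await provider.complete(prompt));
  if (first) return first;

  const stricter = `${prompt}\n\nReturn ONLY a valid JSON array of task objects. No prose, no code fences.`;
  const second = parseTasks(await provider.complete(stricter));
  if (second) return second;

  throw new Error("task extraction returned invalid JSON after one re-ask");
}
```

Capping the re-ask at a single attempt keeps the cost of a misbehaving model bounded at two provider calls per checklist.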