feat(generation): route Ricky writer to Claude Sonnet + add Opus review pass #102
Today Ricky's writer falls back to the workload-router built-in
`persona-maker` (Codex GPT-5.3) because no persona pack ships an
`agent-relay-workflow` persona that workload-router 0.19 can resolve
out of the box. The fallback is silent and produces specs decomposed
as if they were persona-authoring tasks rather than workflow
generation, with the structural failures observed on
docs/plans/deploy-v1-schema-cascade-spec.md (missed `## Track`
headings, serial-pipeline instead of parallel DAG, inlined spec,
missing per-track review loop, etc.).
This change:
- Adds `@agent-relay/personas@^6.0.18` so the canonical pack is at
  least available on disk for reference (the workload-router 0.19
  catalog is hardcoded and does not auto-discover packs, so this is
  for parity, not auto-registration).
- Authors two Ricky-local persona JSONs under `personas/`:
  - `agent-relay-workflow.json`: writer, harness `claude`,
    `best=claude-opus-4-7`, `best-value=claude-sonnet-4-6`,
    `minimum=claude-haiku-4-5-20251001`. Pins the writer to a
    Claude harness so spec decomposition can reason over long
    structurally nested specs.
  - `agent-relay-workflow-review.json`: reviewer, intent `review`,
    same Claude tier ladder. The reviewer produces a structured
    `{verdict, summary, fixes}` JSON that the pipeline can act on.
- Adds `src/product/generation/ricky-local-persona-resolver.ts`, a
  `WorkforcePersonaResolver` that loads Ricky-local Claude personas
  first and hands them to `useRunnableSelection` from harness-kit.
  Falls through to the existing package resolver for any intent
  with no Ricky-local override, so the rest of the workforce path
  keeps working unchanged.
- Adds `src/product/generation/workforce-persona-reviewer.ts`, a
  reviewer pass that resolves through the `review` intent at tier
  `best` (Opus) by default, audits the generated workflow against
  the spec, and returns a structured verdict.
- Wires the resolver and the post-write reviewer pass into
  `pipeline.ts:generateWithWorkforcePersona`. On `fix`, the
  pipeline feeds the structured fix list back into one writer
  repair attempt. On `pass`, the artifact ships as-is. On `block`,
  the writer artifact is preserved and the verdict is surfaced in
  metadata. The reviewer pass is opt-out via
  `RICKY_PERSONA_REVIEW=0` or `workforcePersonaWriter.review: false`.
- Threads a `WorkforcePersonaReviewSummary` field through
  `WorkforcePersonaGenerationMetadata` so callers (CLI, Cloud API)
  can see verdict/fix counts without parsing logs.
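The verdict handling and opt-out described in the list above could be sketched roughly as follows. This is a minimal illustration, not the code in `pipeline.ts`: `isReviewEnabled`, `nextAction`, `ReviewResult`, and `NextAction` are hypothetical names. Only the `pass` / `fix` / `block` verdicts, the single repair attempt, `RICKY_PERSONA_REVIEW=0`, and `review: false` come from the description.

```typescript
// Hypothetical shapes for illustration; the real types live in the PR's source.
type Verdict = 'pass' | 'fix' | 'block';

interface ReviewResult {
  verdict: Verdict;
  summary: string;
  fixes: string[];
}

type NextAction =
  | { kind: 'ship' }                      // pass: artifact ships as-is
  | { kind: 'repair'; fixes: string[] }   // fix: one writer repair attempt
  | { kind: 'preserve'; reason: string }; // block: keep writer artifact, surface verdict

// Review is on by default; either switch turns it off.
function isReviewEnabled(
  env: Record<string, string | undefined>,
  option?: { review?: boolean },
): boolean {
  if (env['RICKY_PERSONA_REVIEW'] === '0') return false;
  if (option !== undefined && option.review === false) return false;
  return true;
}

function nextAction(review: ReviewResult): NextAction {
  switch (review.verdict) {
    case 'pass':
      return { kind: 'ship' };
    case 'fix':
      // Assumption: a fix verdict with an empty fix list has nothing to repair.
      return review.fixes.length > 0
        ? { kind: 'repair', fixes: review.fixes }
        : { kind: 'ship' };
    case 'block':
      return { kind: 'preserve', reason: review.summary };
  }
}
```

Modelling the outcome as a discriminated union keeps the three branches explicit, so a caller cannot treat `block` as a repair by accident.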
Tests: 4 new pipeline-level tests cover the pass / fix-with-repair /
block-unparseable / opt-out paths; pre-existing writer-in-isolation
tests are updated to `review: false` since they predate the review
pass. `npx tsc --noEmit` clean, `npx vitest run` → 1087 passing, no
new failures.
Note on workload-router 3.x: persona-kit 3.0.1 collapsed the
per-tier shape (`tiers.best`, `tiers.best-value`, `tiers.minimum`)
into a single `harness`/`model` pair, but harness-kit 0.19 still
pins workload-router 0.19. A clean 3.x bump is a separate refactor
(harness-kit has no 3.x release yet); this change stays on 0.19's
tier-based shape.
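
To make the shape difference concrete, here is a rough sketch of the two persona layouts mentioned in the note. Everything beyond the names `tiers.best`, `tiers.best-value`, `tiers.minimum`, `harness`, and `model` is an assumption for illustration: the interfaces, the placement of `harness` inside each tier, and the `collapseTier` helper are hypothetical and are not part of persona-kit or workload-router.

```typescript
type TierName = 'best' | 'best-value' | 'minimum';

// Assumed 0.19-style shape: one harness/model pair per tier.
interface TieredPersona {
  id: string;
  tiers: Record<TierName, { harness: string; model: string }>;
}

// Assumed 3.x-style shape: a single harness/model pair.
interface FlatPersona {
  id: string;
  harness: string;
  model: string;
}

// Hypothetical helper: pick one tier and flatten it.
function collapseTier(persona: TieredPersona, tier: TierName): FlatPersona {
  const { harness, model } = persona.tiers[tier];
  return { id: persona.id, harness, model };
}

// Model ids taken from the writer persona described in this change.
const writerPersona: TieredPersona = {
  id: 'agent-relay-workflow',
  tiers: {
    best: { harness: 'claude', model: 'claude-opus-4-7' },
    'best-value': { harness: 'claude', model: 'claude-sonnet-4-6' },
    minimum: { harness: 'claude', model: 'claude-haiku-4-5-20251001' },
  },
};
```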
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds a post-write reviewer pass to the workforce persona generation pipeline with local persona overrides, a reviewer runner that emits structured verdicts/fixes, deterministic fix application, writer prompt summarization/spec referencing, debug dumps, persona JSONs, and tests.
Changes
Workforce Persona Reviewer Pass
Sequence Diagram(s)
sequenceDiagram
participant Dev as Developer/Generator
participant Writer as writeWorkflowWithWorkforcePersona
participant Runner as runWorkforcePersonaReviewPass
participant Reviewer as reviewWorkflowWithWorkforcePersona
participant Persona as PersonaContext
Dev->>Writer: request generation (with resolver)
Writer-->>Dev: artifact + validation
Dev->>Runner: is review enabled? (flags/env)
Runner->>Reviewer: submit deterministic review task (spec, artifact, contract)
Reviewer->>Persona: send prompt via resolved persona context
Persona-->>Reviewer: textual response (may include JSON)
Reviewer->>Runner: parsed verdict + fixes + metadata
alt verdict == fix
Runner->>Writer: render fixes & re-run writer (validationFeedback)
Writer-->>Runner: repaired artifact + validation
end
Runner-->>Dev: final artifact + review metadata
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@package.json`:
- Line 62: The package.json currently runtime-loads personas/*.json but the
"files" allowlist omits the personas directory, so published packages won't
include those JSON overrides; update package.json's "files" array to include
"personas/" (or "personas/*.json") so the personas JSON files are packaged and
available at runtime, ensuring any code that reads personas/*.json (runtime
loader for personas) can find them after publish.
In `@src/product/generation/pipeline.ts`:
- Around line 349-371: The catch block currently returns a reviewSummary with
verdict: 'pass', which falsely signals approval; change the returned
reviewSummary.verdict to a non-passing value (e.g., 'error' or 'failed') and
adjust summary text accordingly so downstream automation sees this as a failure;
update the object built in the catch (the reviewSummary literal returned
alongside finalArtifact/validation/personaMetadata) to set verdict: 'error' (and
optionally change summary and selected fields like personaId/tier/runId if your
conventions require) so the error path is correctly represented.
- Around line 388-399: The current guard lets a 'block' verdict with non-empty
review.fixes fall through to renderReviewFixesForWriter; change the condition so
that the early return triggers for 'pass' OR 'block' OR when review.fixes is
empty, ensuring only verdict === 'fix' with non-empty fixes reaches
renderReviewFixesForWriter; update the if that checks review.verdict and
review.fixes.length accordingly around the return that supplies finalArtifact,
validation, personaMetadata, and reviewSummary.
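
The two `pipeline.ts` findings above might be addressed along these lines. This is a sketch only: `reviewFailureSummary` and `shouldAttemptRepair` are hypothetical helpers, the `PipelineReview` shape is assumed from the comments, and `'error'` is just the example verdict value the first finding proposes. The actual code may be structured differently.

```typescript
type PipelineVerdict = 'pass' | 'fix' | 'block' | 'error';

interface PipelineReview {
  verdict: PipelineVerdict;
  summary: string;
  fixes: string[];
}

// First finding: a reviewer failure is reported as a non-passing verdict,
// so downstream automation does not read it as approval.
function reviewFailureSummary(error: unknown): PipelineReview {
  const message = error instanceof Error ? error.message : String(error);
  return { verdict: 'error', summary: 'Reviewer failed: ' + message, fixes: [] };
}

// Second finding: only a 'fix' verdict with a non-empty fix list reaches
// the writer repair path; 'pass', 'block', and empty fix lists return early.
function shouldAttemptRepair(review: PipelineReview): boolean {
  return review.verdict === 'fix' && review.fixes.length > 0;
}
```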
In `@src/product/generation/types.ts`:
- Around line 229-233: The docstring for the review outcome enum/comment is
inaccurate: update the `block` description to reflect the current pipeline
behavior (it retains the writer artifact and records review metadata) instead of
saying it falls back to deterministic rendering; locate the comment near the
review outcome type/enum (e.g., the block description in types.ts where `pass`,
`fix`, and `block` are documented) and replace the misleading sentence with a
concise explanation that `block` preserves the writer artifact and stores review
metadata.
In `@src/product/generation/workforce-persona-reviewer.ts`:
- Around line 185-233: The current verdict extraction is brittle: replace the
raw regex/text scanning in extractFencedJson and extractTrailingJsonObject with
a Markdown-aware approach using mdast-util-from-markdown so we reliably find
JSON fences and trailing JSON objects; update extractFencedJson to parse the
output into an MDAST and return the first/last code node with lang "json"
(instead of using output.match), and update extractTrailingJsonObject to inspect
the AST for a final JSON code block or a trailing literal JSON object in the
last paragraph/text node before falling back to safeParse; ensure the main loop
that builds candidates still calls these functions (extractFencedJson,
extractTrailingJsonObject, safeParse) and keep normalizeFix usage unchanged so
behavior is preserved once a proper JSON payload is located.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 37b928a1-3dbf-48f4-801e-fca9afdb9af7
⛔ Files ignored due to path filters (1)
- `package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (8)
- package.json
- personas/agent-relay-workflow-review.json
- personas/agent-relay-workflow.json
- src/product/generation/pipeline.ts
- src/product/generation/ricky-local-persona-resolver.ts
- src/product/generation/types.ts
- src/product/generation/workforce-persona-reviewer.ts
- src/product/generation/workforce-persona-writer.test.ts
function extractFencedJson(output: string): Record<string, unknown> | null {
  const match = output.match(/```json\s*([\s\S]*?)```/i);
  if (!match) return null;
  return safeParse(match[1].trim());
}
🟡 extractFencedJson extracts the first fenced JSON block, but docstring and intent say "last"
The parseReviewerVerdict docstring at line 176 says the function extracts "the last fenced ```json block", but extractFencedJson at line 210 uses output.match(regex) which returns the first match. Because extractFencedJson is the first candidate tried (line 186) and its result is accepted if it contains a valid verdict, the second candidate extractTrailingJsonObject (which correctly finds the last JSON object) is never reached.
This matters when an LLM reviewer emits chain-of-thought reasoning with an intermediate JSON block before the final verdict. For example, if the output contains a draft {"verdict": "fix", ...} followed by a final {"verdict": "pass", ...}, the parser picks up the wrong verdict. The persona prompt says "Return only the response contract" which reduces but does not eliminate this risk.
Suggested change, before:
function extractFencedJson(output: string): Record<string, unknown> | null {
  const match = output.match(/```json\s*([\s\S]*?)```/i);
  if (!match) return null;
  return safeParse(match[1].trim());
}
Suggested change, after:
function extractFencedJson(output: string): Record<string, unknown> | null {
  const matches = [...output.matchAll(/```json\s*([\s\S]*?)```/gi)];
  if (matches.length === 0) return null;
  return safeParse(matches[matches.length - 1][1].trim());
}
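
The draft-then-final scenario described above can be reproduced with a small self-contained sketch. It is an illustration under assumptions, not the PR's code: `safeParseSketch` is a hypothetical stand-in for the real `safeParse`, `extractLastFencedJson` is a hypothetical name, and the fence is built from escaped characters and scanned with an `exec` loop purely to keep the example standalone.

```typescript
// Three backticks, written with escapes so the example stays self-contained.
const FENCE = '\u0060\u0060\u0060';

// Hypothetical stand-in for safeParse: returns null instead of throwing.
function safeParseSketch(text: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(text);
    return typeof value === 'object' && value !== null && !Array.isArray(value)
      ? (value as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// Walks every fenced json block and keeps only the last one.
function extractLastFencedJson(output: string): Record<string, unknown> | null {
  const pattern = new RegExp(FENCE + 'json\\s*([\\s\\S]*?)' + FENCE, 'gi');
  let last: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    last = match[1] ?? null;
  }
  return last === null ? null : safeParseSketch(last.trim());
}

// Reviewer output with an intermediate draft followed by the final verdict.
const reviewerOutput = [
  'Draft:',
  FENCE + 'json',
  '{"verdict": "fix", "summary": "draft", "fixes": ["rename step"]}',
  FENCE,
  'Final:',
  FENCE + 'json',
  '{"verdict": "pass", "summary": "looks good", "fixes": []}',
  FENCE,
].join('\n');

const finalVerdict = extractLastFencedJson(reviewerOutput);
```

Here `finalVerdict` holds the second block, so the `pass` verdict wins over the earlier `fix` draft.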
Ricky Eval Review
Run:
Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0
Human Review Cases
These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Step | Action | Outcome |
|---|---|---|
| Parse arguments | --mode local tells Ricky to operate against the local repository; --spec supplies a natural‑language description of the desired workflow. | The spec string is captured for downstream processing. |
| Spec → Contract translation | Ricky invokes the Spec‑to‑Contract engine, which produces a deterministic workflow contract (JSON/YAML) describing inputs, outputs, and verification gates. | You receive a contract file (e.g., workflow_contract.yaml) in the current directory. |
| Static validation | The contract is run through the Workflow Linter and Schema Validator. | Any syntax errors, missing required fields, or rule violations are reported as validation errors. |
| Preview generation | If the contract passes validation, Ricky renders a human‑readable preview of the workflow (markdown diagram + step list). | The preview is printed to stdout and also sav |
| ...[truncated] |
**Must**
- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.
**Must Not**
- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>
**User message**
> Render Ricky's first-run CLI onboarding for a new user.
**Ricky output**
```text
**Ricky – First‑Run CLI Onboarding**
Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).
---
## 1️⃣ Repository Layout (for reference)
ricky/
├─ .github/
│ └─ workflows/
│ └─ onboarding.yml # CI gate for onboarding
├─ docs/
│ └─ onboarding.md # Human‑readable guide (generated)
├─ src/
│ └─ cli/
│ └─ onboarding/
│ ├─ main.ts # Entry point
│ ├─ steps/
│ │ ├─ check-prereqs.ts
│ │ ├─ init-config.ts
│ │ ├─ install-hooks.ts
│ │ └─ verify.ts
│ └─ contract.ts # Workflow contract
├─ tests/
│ └─ onboarding/
│ └─ *.test.ts # 80‑100 validation loops
├─ .ricky/
│ └─ branch/
│ └─ onboarding/ # Scoped branch for first‑run
└─ README.md
---
## 2️⃣ Scoped Branch & PR
| Item | Value |
|------|-------|
| **Branch** | `feature/on
...[truncated]
```
Must
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
Must Not
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)
User message
A user gives Ricky an empty spec or a missing spec file.
Ricky output
### Handling an Empty or Missing Specification File
When a workflow is triggered with **no spec** (empty file, missing path, or malformed JSON/YAML), treat it as a **hard blocker** that must be reported and resolved before any further automation can proceed. The following contract‑driven pattern keeps the pipeline deterministic and makes the failure visible to both developers and the CI system.
---
## 1. Detection Gate (Pre‑flight)
| Step | Action | Outcome |
|------|--------|---------|
| **1.1** | **File existence check** – `fs.existsSync(specPath)` | If *false* → go to **Blocker Report**. |
| **1.2** | **Non‑empty check** – `fs.statSync(specPath).size > 0` | If *false* → go to **Blocker Report**. |
| **1.3** | **Parse validation** – try `JSON.parse` *or* `yaml.load` depending on extension. Catch syntax errors. | If *error* → go to **Blocker Report**. |
| **1.4** | **Schema validation** – run the spec against the canonical JSON‑Schema (`spec-schema.json`). | If *invalid* → go to **Blocker Report**. |
| **1.5** | **Pass** – continue to normal workflow. | |
All checks must be **synchronous** (or awaited) and must **fail fast** – the first failing gate aborts the run.
---
#
...[truncated]
Must
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
Must Not
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)
User message
Render `ricky status` when no provider checks have proven Google or GitHub are connected.
Ricky output
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)
🔧 Last Run
• Workflow : idle
• Last Build : N/A
• Errors : none (no connectivity checks executed)
📊 System Health
• Scheduler : running
• Queue : empty
• DB : healthy
• Google and GitHub connectivity have not been validated.
• Any tasks that depend on these providers will remain pending until a successful check confirms the connection.
🛠️ Next Steps (recommended)
- Run provider health checks:
  ricky check --provider google
  ricky check --provider github
- Verify OAuth / API credentials are correctly configured.
- Re‑run pending workflows once the providers report “CONNECTED”.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
**Workflow: “GitHub Primitive Change”**
*Purpose*: Apply a single‑file edit (e.g., typo fix, comment update, config tweak) to a repository while guaranteeing deterministic verification, clear ownership, and safe merge.
---
## 1. Scope & Branching Contract
| Item | Definition |
|------|------------|
| **Change Type** | Primitive – one file, ≤ 50 lines changed, no structural refactor. |
| **Target Branch** | `main` (or the repository’s default branch). |
| **Feature Branch** | `fix/<short‑description>-<YYYYMMDD>` (e.g., `fix/readme‑typo‑20260513`). |
| **PR Title Template** | `fix: <short description>` |
| **PR Description Template** | • **Why** – brief rationale.<br>• **What** – list of changed lines (diff summary).<br>• **Verification** – steps executed (see §3). |
| **Merge Strategy** | Squash‑merge only; commit message = PR title. |
| **Rollback Policy** | If any gate fails after merge, revert the squash commit via a new PR titled `revert: <original PR title>`. |
---
## 2. Skill Set Selected by Ricky
| Skill | Role in the workflow |
|-------|----------------------|
| **GitOps** | Handles branch creation, rebasing, and PR lifecycle. |
| **Static Linting** | Runs
...[truncated]
Must
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
Must Not
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)
User message
Generate a workflow where the spec says "use Claude to review and Codex to implement".
Ricky output
**Workflow: “Claude‑Review → Codex‑Implement”**
*Purpose*: Automate a deterministic pipeline where Claude (LLM‑reviewer) validates design/spec changes and Codex (code‑generation model) produces the implementation. The workflow is scoped to a single feature branch and ends with a merge‑ready PR.
---
## 1. Contract Overview
| Element | Description |
|---------|-------------|
| **Input** | `feature‑spec.yaml` (validated against `spec-schema.json`). |
| **Output** | `feature‑impl/` directory containing generated source files, a `review‑report.md`, and a signed `implementation‑manifest.json`. |
| **Success Criteria** | • Claude review score ≥ 0.85 (semantic correctness, security, style). <br>• Codex compilation passes 100 % of unit tests. <br>• All verification gates pass. |
| **Failure Mode** | Any gate < threshold → **Blocker** reported to `#workflow‑ops` and pipeline aborts. |
---
## 2. Branch & PR Scope
- **Branch**: `feature/<ticket‑id>-claude‑codex` (created from `develop`).
- **PR Title**: `feat(<module>): <short description> – reviewed by Claude, implemented by Codex`.
- **PR Labels**: `auto‑generated`, `review‑by‑claude`, `implementation‑by‑codex`.
All artifa
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates
`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`acceptance.gate`** fields for every step. The goal is to make each step **action‑oriented**, **measurable**, and **verifiable** while preserving the original logical ordering.
Below is a concrete contract, a minimal example, and the exact command‑line usage you can copy‑paste.
---
## 1. Contract Overview
| Input | Format | Required fields |
|-------|--------|-----------------|
| `workflow` | JSON or YAML | `steps[*].id`, `steps[*].task.description`, `steps[*].acceptance.gate` |
| `--refine` | flag | – |
| Output | Format | Guarantees |
|--------|--------|------------|
| `refined_workflow` | Same format as input | • Every `task.description` is a single imperative sentence (≤ 20 words). <br>• Every `acceptance.gate` is a boolean expression using only **observable** artifacts (files, API responses, DB rows). <br>• No new steps are added or removed; step IDs remain unchanged. <br>• The output validates against the **WorkflowSchema v2** JSON‑Schema. |
If any step cannot
...[truncated]
Must
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
Must Not
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)
User message
Generate a workflow for the `ricky --version` spec.
Ricky output
**Workflow: `ricky --version` Specification & Delivery**
*Repository:* `github.com/agentworkforce/ricky`
*Target Branch:* `main` → `feature/ricky-version`
---
## 1. Scope & Acceptance Criteria
| ID | Requirement | Success Metric |
|----|-------------|----------------|
| V‑001 | `ricky --version` prints the current CLI version in **semantic‑version** format (`MAJOR.MINOR.PATCH`). | Output matches regex `^\d+\.\d+\.\d+$`. |
| V‑002 | The command exits with **status 0** on success. | Exit code `0`. |
| V‑003 | The version string is sourced from a **single source of truth** (`pkg/version/version.go`). | No hard‑coded literals elsewhere. |
| V‑004 | The binary reports the same version as the **Git tag** of the commit that built it. | `git describe --tags --exact-match` equals the printed version. |
| V‑005 | The feature is covered by **unit tests** (≥ 95 % line coverage) and an **integration test** that runs the compiled binary. | `go test ./... -cover` ≥ 95 % for the package; integration test passes on CI. |
| V‑006 | Documentation in `README.md` and `docs/cli.md` includes a **Usage** snippet for `--version`. | Docs render correctly on the site build. |
| V‑007 | The change
...[truncated]
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all artifacts reach a **PASS** state or are reported as **BLOCKED**.
---
## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifacts** | Any unit that can be validated independently (e.g., `*.json`, `*.py`, `*.csv`). |
| **Batch size** | Configurable; default 100 artifacts per run to keep CI resources bounded. |
| **Branch/PR** | All changes are made on a feature branch `batch‑validation‑<timestamp>`. A single PR aggregates the final “all‑pass” commit. |
| **Owner** | `@team‑data‑ops` (or appropriate domain team). |
| **Blocker policy** | If an artifact cannot be auto‑fixed after **N=3** attempts, the workflow aborts for that artifact and files a **BLOCKER** issue with reproducible steps. |
---
## 2. Contract (Inputs / Outputs)
| Contract | Type | Description |
|----------|------|-------------|
| `artifact_list` | **Input** (JSON) | Array of artifact identifiers (paths or IDs). |
| `
...[truncated]
Must
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
Must Not
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)
User message
Describe how selected workflow-writing skills affected a generated Ricky workflow.
Ricky output
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow
Below is a concise “skill‑impact map” that shows the concrete ways each of the core Ricky writing skills is reflected in a generated workflow. The example workflow is a **“Customer‑Onboarding Ticket Triage”** pipeline that moves a new support request from intake to the appropriate product‑team queue.
| **Ricky Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Matters** |
|-----------------|----------------------------|----------------------------------------|--------------------|
| **Deterministic Contract Definition** | Every step declares its **input schema**, **output schema**, and **versioned contract ID**. | ```yaml<br>step: ingest_ticket<br>contract: ticket_v2<br>in: { raw_json: string }<br>out: { ticket: TicketV2 }``` | Guarantees downstream steps can validate the payload without runtime guessing. Any contract change forces a new version bump, preventing silent breakage. |
| **Verification Gates** | After each step a **gate** runs a schema‑validation test and a **semantic sanity check** (e.g., “priority must be 1‑5”). | ```yaml<br>gate: validate_ticket_schema
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
**Ricky – Workflow Generation Assistant**
---
### 📋 Quick Summary
Because the product specification you provided is vague, I need a few concrete details before I can author a reliable, test‑driven workflow that meets our standards (deterministic verification gates, review artifacts, 80‑100 validation loops, etc.).
Please fill in the sections below. Once I have this information, I’ll generate a complete workflow contract, implementation skeleton, and verification plan ready for branch/PR creation.
---
## Required Specification Details
| # | Detail | Why It Matters |
|---|--------|----------------|
| 1 | **Workflow Goal** – What business outcome should the workflow achieve? (e.g., “process new user sign‑ups”, “generate nightly sales report”) | Defines the primary success criteria and downstream dependencies. |
| 2 | **Trigger(s)** – Event(s) that start the workflow. (e.g., webhook, schedule, message queue) | Determines the entry point and required adapters. |
| 3 | **Inputs** – Data schema for each trigger (JSON schema, required fields, types). | Enables deterministic validation at the “Input Validation” gate. |
| 4 | **Core Steps** – Ordered list of high‑level actions (e.
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s Immediate Response Plan**
| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Record | • Capture the timeout exception details (step name, elapsed time, input payload). <br>• Log the “alive‑but‑idle” worker’s heartbeat timestamps and lack of output. | Provides an immutable audit trail for post‑mortem and for the verification gate. | `failure_log.md` (JSON + human‑readable summary) stored in the workflow’s **/logs** folder. |
| 2️⃣ Blocker Declaration | • Mark the workflow run as **BLOCKED** in the orchestration DB. <br>• Emit a **Ricky‑Blocker** event with severity *high* and a short “root‑cause hint” (timeout + missing artifact). | Guarantees downstream pipelines do not consume partial results and surfaces the issue to the team instantly. | `blocker_event.json` published to the **ricky-events** topic. |
| 3️⃣ Automated Containment | • Abort any downstream steps that have already been scheduled. <br>• If the worker is still alive, send a graceful shutdown signal; if it does not respond within 30 s, force‑kill it. | Prevents resource leakage and avoids cascading failures. | `containment_report.md` (status o
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Workflow Contract – Stale‑State Detection for Local Launches**
| **Stage** | **Input** | **Output** | **Success Criteria** |
|-----------|-----------|------------|----------------------|
| **1. Pre‑launch Scan** | Path to project root (absolute or relative) | `scan‑report.json` | All target directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) enumerated; report written without error. |
| **2. Staleness Evaluation** | `scan‑report.json` | `stale‑flags.json` | Each directory flagged **true** if any of the following holds: <br>• Contains files older than `MAX_AGE_MS` (default = 7 days). <br>• Contains a lock file (`*.lock`) with a timestamp older than `MAX_LOCK_AGE_MS` (default = 1 hour). <br>• Directory size exceeds `MAX_SIZE_BYTES` (default = 500 MiB). |
| **3. Decision Gate** | `stale‑flags.json` | `gate‑decision.json` | **PASS** if all flags are `false`. **FAIL** if any flag is `true`. |
| **4. Reporter / Blocker** | `gate‑decision.json` | Human‑readable markdown (`STALE‑STATE‑REPORT.md`) + optional CI‑style annotation | If **FAIL**, the workflow aborts and emits a blocker message with remediation steps. If **PASS**, the workflow proceeds to the normal launch sequence.
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**Resolving “Another Ricky/Relay run is already active in this workspace”**
The marker that blocks a new run is a safety lock placed in the workspace’s **`.ricky/`** directory. It is created when a Ricky or Relay execution starts and is removed automatically when the process exits cleanly. If the marker is left behind (e.g., due to a crash, forced termination, or a stray background job), the next invocation will see the stale lock and abort with the message you’re seeing.
Below is a deterministic, step‑by‑step repair workflow that you can run locally. It follows the **Ricky repository conventions** (AGENTS.md) and respects the **workflow standards** for verification and review.
---
## 1️⃣ Verify the Current State
```bash
# Show the lock file (if any)
ls -l .ricky/run.lock
# Show the PID stored inside (if the file exists)
cat .ricky/run.lock 2>/dev/null
# Check for any Ricky/Relay processes still alive
ps -ef | grep -E '(ricky|relay)' | grep -v grep
If the lock file does not exist, the error is coming from a stale cache; skip to step 4.
2️⃣ Determine Whether the PID Is Alive
The lock file contains the PID of the process that created it.
LOCK_PID=$(cat
...[truncated]
Must
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
Must Not
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)
User message
A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local`/`.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.
Ricky output
**Deterministic auto‑fix for missing environment variables**
Below is the exact, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated file, and then re‑executes the workflow. It works for both plain‑JS steps and for steps that embed a `node --input-type=module` HEREDOC with literal `import … from 'node:fs'` / `import … from 'node:path'` strings.
---
## 1. Contract – What the patch guarantees
| ✅ | Guarantee |
|---|-----------|
| **Deterministic** | The same source → same patched output (no random IDs). |
| **Idempotent** | Re‑applying the patch to an already‑fixed artifact is a no‑op. |
| **Scope‑limited** | Only the top‑level module of the artifact is touched; inner HEREDOCs remain unchanged. |
| **Verification gate** | After patching, a **pre‑run validation step** (`assertRickyWorkflowEnv`) fails fast if any required env var is still missing. |
| **Rollback safety** | The original file is saved as `<filename>.bak.<timestamp>` before modification
...[truncated]
Must
- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.
Must Not
- Skip adding the `rickyWorkflowFs`/`rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME at runtime-launch`.
- Rewrite or escape the embedded HEREDOC text in step commands.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)
User message
Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
Ricky output
**Local Workflow Execution – Auto‑Fix Mode**
Below is a deterministic, contract‑driven procedure you can run in your development environment. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in the local repository under `workflows/`. The failed step name and the previous run ID are known.
---
## 1. Preconditions (Verification Gate)
| Check | Command | Expected Result |
|-------|---------|-----------------|
| CLI installed | `aw --version` | `aw version x.y.z` |
| Repo clean | `git status --porcelain` | No untracked or modified files |
| Workflow artifact present | `ls workflows/<workflow‑name>.yaml` | File exists |
| Previous run metadata available | `cat .aw/run‑metadata/<prev‑run‑id>.json` | JSON with `status: "failed"` and `failedStep: "<step‑name>"` |
| Auto‑fix feature enabled in config | `cat .aw/config.yaml \| grep autoFix` | `autoFix: true` |
If any gate fails, resolve it before proceeding and re‑run the gate.
---
## 2. Scoped Branch & PR Boundary
1. **Create a short‑lived branch** for the repair attempt:
```bash
git checkout -b aw/auto‑fix/<workflow‑name>/<prev‑run‑id>
- **Do not pus
...[truncated]
**Must**
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
**Must Not**
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>
**User message**
> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
**Ricky output**
```text
**Quick‑Start Troubleshooting Checklist**
| Step | Action | Expected outcome |
|------|--------|-------------------|
| 1️⃣ | **Collect the run log**<br>`ricky logs workflows/foo.ts --last` | A full JSON/NDJSON log file (`foo.run.log`) in `./ricky/logs/` |
| 2️⃣ | **Locate the failure gate**<br>Open the log and search for `"gate":"FAIL"` | Identify the exact gate (e.g., `validate‑schema`, `run‑test‑suite`, `publish‑artifact`) |
| 3️⃣ | **Read the error payload**<br>Copy the `error.message` and `error.stack` fields | Gives the deterministic reason (syntax error, contract violation, external service timeout, etc.) |
| 4️⃣ | **Run the failing gate in isolation**<br>`ricky run workflows/foo.ts --gate <gate‑id> --no-auto-fix` | Confirms reproducibility and isolates side‑effects |
| 5️⃣ | **Validate the workflow contract**<br>`ricky contract validate workflows/foo.ts` | Should return `OK` if the contract JSON‑Schema matches the source. If not, you’ll see a list of mismatched fields. |
| 6️⃣ | **Check external dependencies**<br>`ricky deps status` | Ensures all required services (DB, API keys, Docker images) are reachable and at the correct version. |
| 7️⃣ | **Run the unit‑test suite**<b
...[truncated]
Must
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
Must Not
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)
User message
Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
Ricky output
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**
Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.
---
## 1. Preconditions (Input Contract)
| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single `.ts` file (or an `index.ts` entry point) located under `./workflows/<workflow‑name>/` | Repository |
| **Node runtime** | Node ≥ 18.x (LTS) installed and available on `$PATH` | System |
| **Package manager** | `npm` (≥ 9.x) or `pnpm` (≥ 8.x) – whichever is declared in `./workflows/<workflow‑name>/package.json` | Repo |
| **TypeScript compiler** | `typescript` listed as a dev‑dependency (exact version pinned) | `package.json` |
| **ts-node** | `ts-node` listed as a dev‑dependency (exact version pinned) | `package.json` |
| **Environment config** | Optional `.env` file in the workflow directory; va
...[truncated]
Must
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
Must Not
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)
User message
Ricky reaches a boundary after a structural failure persists after a fix attempt.
Ricky output
**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**
When a structural failure in an AgentWorkforce workflow remains after an attempted fix, Ricky follows a deterministic, contract‑driven protocol that isolates the problem, records evidence, and escalates cleanly. The steps below are the **minimum required contract** for any such “boundary” event. All teams should implement them as a reusable sub‑workflow (`wf/structural‑failure‑boundary.yaml`) and reference it from the parent workflow that detected the failure.
---
## 1. Contract Preconditions
| Item | Requirement |
|------|-------------|
| **Failure detection** | A `structural_failure` event emitted by a verification gate (e.g., `gate/graph‑integrity.yaml`). |
| **Fix attempt** | A `fix_attempt` record containing: <br>• `fix_id` (UUID) <br>• `author` (GitHub handle) <br>• `branch` (short‑lived fix branch) <br>• `timestamp` |
| **Verification** | A deterministic gate (`gate/structural‑validation.yaml`) that re‑runs the full integrity suite on the candidate fix. |
| **Result** | Gate returns `FAIL` **and** the failure signature matches the original failure (same `error_code` and `node_path`). |
If any prec
...[truncated]
Must
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
Must Not
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)
User message
Produce a workflow health digest from many Ricky workflow runs.
Ricky output
**Ricky Workflow Health Digest – [Date Range]**
---
### 1. Executive Summary
| Metric | Value | Trend |
|--------|-------|-------|
| Total Runs (all workflows) | **1 842** | ↗︎ +12 % vs. prior period |
| Successful Runs | **1 617** | ↗︎ +9 % |
| Failed Runs | **225** | ↘︎ –4 % |
| Blocked Runs (manual intervention required) | **38** | ↔︎ steady |
| Avg. Cycle Time (per run) | **7 m 32 s** | ↘︎ –15 % |
| Avg. Verification Gate Pass Rate | **94 %** | ↗︎ +2 % |
> **Health Rating:** **🟢 Green** – All core SLAs met; blockers are isolated and being addressed.
---
### 2. Run‑Level Detail (Top 5 by impact)
| Workflow | Run ID | Status | Duration | Gate Pass % | Blocker? | Notes |
|----------|--------|--------|----------|-------------|----------|-------|
| **Deploy‑Prod** | `run‑20240528‑a1b2c3` | ✅ Success | 5 m 12 s | 100 % | No | Full contract validation passed. |
| **Data‑Ingest‑Nightly** | `run‑20240527‑d4e5f6` | ❌ Failure | 9 m 45 s | 78 % | Yes | Missing schema file; auto‑retriable. |
| **Feature‑Flag‑Rollout** | `run‑20240526‑g7h8i9` | ✅ Success | 3 m 58 s | 100 % | No | Gate‑2 (canary health) passed. |
| **Security‑Scan** | `run‑20240525‑j0k1l2` | ⚠️ Blocked | 2 m 30 s | 92
...[truncated]
Must
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
Must Not
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)
User message
Design a new Ricky Slack surface for workflow debugging.
Ricky output
## Ricky Slack Surface – “Workflow Debugger”
### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows in real time. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** for deterministic verification, review artifacts, and bounded PR cycles.
| Component | Purpose | Primary Slack Interaction |
|-----------|---------|---------------------------|
| **Debug Console** | Show live state, step history, and error traces | `/ricky debug <workflow‑id>` |
| **Action Replayer** | Replay a single action or the whole run in a sandbox | `/ricky replay <run‑id> [--step N]` |
| **Patch Suggestion** | Auto‑generate a minimal PR that fixes the failure | `/ricky suggest‑fix <run‑id>` |
| **Verification Gate** | Run deterministic tests before applying a fix | `/ricky verify <pr‑id>` |
| **Blocker Reporter** | Surface “hard blockers” (e.g., missing contracts) | Automatic DM to workflow owner |
All commands are **ephemeral** by default (visible only to the requester) and can be **pinned** to a channel for team visibility.
---
### 2. Contract‑Firs
...[truncated]
Must
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
Must Not
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)
User message
Add a new web handoff type that submits a workflow spec and mode.
Ricky output
**New Hand‑off Type:** `web_submit_spec`
**Purpose**
Enable external web clients (e.g., UI portals, API gateways) to hand off a **complete workflow specification** together with an **execution mode** (`preview` | `run`) to the AgentWorkforce engine. The engine validates the spec, stores it, and either returns a dry‑run preview or launches the workflow.
---
## 1. Contract Definition (YAML)
```yaml
# contracts/web_submit_spec.yaml
type: handoff
name: web_submit_spec
description: |
Accepts a full workflow specification (JSON/YAML) and an execution mode.
Returns a deterministic response indicating success, preview data, or error.
payload:
spec:
type: object
description: Full AgentWorkforce workflow definition (conforms to contracts/workflow_spec.yaml)
required: true
mode:
type: string
enum: [preview, run]
description: |
* preview – validate and simulate the workflow, returning a step‑by‑step plan.
* run – store the spec and enqueue it for execution.
required: true
response:
status:
type: string
enum: [accepted, rejected, error]
preview:
type: object
description: Present only when mode=preview and validation
...[truncated]
Must
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
Must Not
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)
User message
A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
Ricky output
**Decision:** Generate a new workflow (authoring)
**Rationale**
1. **Input Type** – You supplied a *workflow specification* together with *conversation context*. This is the exact trigger for the **“generate”*
---
_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
`rickyLocalPersonaDir()` hardcoded `..,..,..` from the source file path,
which works under tsx but resolves outside the package root when running
from `dist/ricky.js` (one parent only) or
`node_modules/@agentworkforce/ricky/dist/ricky.js`. The smoke test of
`ricky --mode local --spec-file deploy-v1-schema-cascade-spec.md`
against the bundled CLI exposed this: the resolver couldn't find the
persona JSONs and silently fell through to the package resolver.
Replace the hardcoded walk with an upward probe (up to 8 parents)
looking for the sentinel `personas/agent-relay-workflow.json`. Works
under tsx, the bundle, and npm install. Also add `personas` to
`package.json` `files` so the JSONs ship with the npm tarball (verified
via `npm pack --dry-run`).
Smoke evidence from the post-fix run against
`workforce/docs/plans/deploy-v1-schema-cascade-spec.md`:
workforce_persona.warnings[0] =
"Ricky-local Claude persona override resolved for intent
\"agent-relay-workflow\" at tier \"best-value\"."
The Sonnet writer then timed out on this particular 1451-line spec
(harness `claudeSettings.timeoutSeconds: 1200`), so the pipeline fell
back to the deterministic renderer — that's the documented behavior
when the writer fails. Tuning the writer prompt to reference the spec
by path instead of inlining it is a separate change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke test result: Sonnet resolver wiring confirmed end-to-end.
Ran the bundled `node dist/ricky.js --mode local --spec-file <spec> --no-run --json`
Result from {
"personaId": "unresolved",
"warnings": [
"Ricky-local Claude persona override resolved for intent \"agent-relay-workflow\" at tier \"best-value\".",
"Workforce persona writer failed (Workforce persona writer did not complete: timeout.); used Ricky deterministic renderer instead."
],
"selectedIntent": "agent-relay-workflow",
"outputPath": "workflows/generated/ricky-ricky-workflow-spec-deploy-v1-schema-cascade-persona-refactor-status-ready-for-r.ts",
"promptInputs": { "relevantFileCount": 81, ... }
}
What this proves:
What needs follow-up (separate work):
Bundle-path fix: 421241d. Persona JSONs verified to ship with the npm tarball via |
The persona writer was inlining the entire normalized spec JSON into
its prompt, which repeated the spec text across five fields
(`description`, `desiredAction.summary`, `desiredAction.specText`,
`sourceSpec.description`, `sourceSpec.rawPayload.text`). For the
deploy-v1-schema-cascade-spec.md run (82 KB source file, 81 relevant
files) the writer task ballooned to ~500 KB. The Sonnet writer
spawned by harness-kit had to chew through a prompt of exactly the
shape that the persona's own system prompt explicitly warns against
("reference the spec by path rather than inlining it"); the
writer ultimately hit its 1200s harness timeout and the pipeline fell
back to the deterministic renderer.
This change:
- Adds `summarizeSpecForPersona()` that returns a normalized-spec
clone with:
- `description` and `targetContext` truncated to a 32 KB head + tail
elision marker when oversized.
- `desiredAction.summary` capped at 4 KB and `desiredAction.specText`
fully elided (it duplicates `description` on natural-language
specs).
- `sourceSpec.description` replaced with a back-reference marker.
- `sourceSpec.rawPayload.text` (or `.data` / `.arguments` for the
other payload kinds) replaced with an elision marker that records
the original byte count so the persona knows what was trimmed.
- Adds `summarizeRelevantFilesForPersona()` that caps each relevant
file at 8 KB and the total relevant-file budget at 96 KB; files
past the total budget are kept as path-only entries with an
`omitted: true` marker so the persona still sees the full list.
- Adds an optional `specPath` writer option threaded through from
`--spec-file`. When set, the task body cites the absolute path and
tells the persona to Read the file when it needs more detail than
the summary; the persona's system prompt already instructs this,
so the task body now matches that contract.
- Truncates `validationFeedback.previousContent` at 16 KB so repair
loops don't reinflate the prompt with the previously-rejected
artifact body.
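As a rough illustration of the head-plus-tail truncation listed above, here is a minimal sketch. The function name, the marker wording, the character-based (rather than byte-based) budget, and the 3:1 head/tail split are all assumptions made for the example; the commit only states the size caps and that an elision marker is emitted.

```typescript
/**
 * Hypothetical sketch: keep the start and end of an oversized string and
 * replace the middle with a marker recording how much was dropped.
 * Budgets are in UTF-16 code units here, so a cut could in principle land
 * inside a surrogate pair; a byte-accurate version would need more care.
 */
export function truncateHeadTailSketch(
  text: string,
  maxChars: number,
  tailChars: number = Math.floor(maxChars / 4),
): string {
  if (text.length <= maxChars) return text; // no-op when the text fits
  const headChars = maxChars - tailChars;
  const omitted = text.length - headChars - tailChars;
  const marker = `\n[... ${omitted} characters elided ...]\n`;
  return text.slice(0, headChars) + marker + text.slice(text.length - tailChars);
}
```

For instance, `truncateHeadTailSketch(description, 32 * 1024)` would keep roughly the first 24 K and last 8 K characters of an oversized description and leave shorter descriptions untouched.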
Verified offline against the deploy-v1 spec: the writer task body
went from ~500 KB to ~61 KB (no relevant files; with all 81 files
inlined the total caps out near 160 KB, well inside Sonnet's context
window).
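The per-file and total budgets for relevant files could be applied along these lines. This is a sketch under stated assumptions: the `RelevantFileSketch` and `SummarizedFileSketch` shapes, the `truncated` flag, and the first-fit policy (a later, smaller file may still fit after an earlier one was omitted) are hypothetical; only the 8 KB and 96 KB caps and the `omitted: true` path-only entries come from the description above.

```typescript
interface RelevantFileSketch {
  path: string;
  content: string;
}

interface SummarizedFileSketch {
  path: string;
  content?: string;
  truncated?: boolean;
  omitted?: boolean;
}

/**
 * Hypothetical sketch: cap each file's content, then stop inlining content
 * once the running total would exceed the overall budget. Every path is
 * still returned so the reader of the prompt sees the full file list.
 */
export function capRelevantFilesSketch(
  files: RelevantFileSketch[],
  perFileMax = 8 * 1024,
  totalMax = 96 * 1024,
): SummarizedFileSketch[] {
  let used = 0;
  return files.map((file) => {
    const content = file.content.slice(0, perFileMax);
    if (used + content.length > totalMax) {
      return { path: file.path, omitted: true }; // path-only entry
    }
    used += content.length;
    return content.length < file.content.length
      ? { path: file.path, content, truncated: true }
      : { path: file.path, content };
  });
}
```

With these defaults the inlined file content is bounded by 96 KB no matter how many files are relevant, which is consistent with the roughly 160 KB worst case reported above for the 61 KB task body plus files.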
Adds 7 unit tests covering: rawPayload elision invariant, no-op when
description fits, head-tail truncation on oversized description,
per-file content cap with omission counts, total-budget elision of
file contents, spec-by-path reference + truncation note, and a
sentinel test proving the raw spec text never reaches the task body
verbatim.
`npx tsc --noEmit` clean, `npx vitest run` → 1094 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim follow-up: Sonnet writer now completes against a real Claude spawn.
Pushed 6b4c484 on top of this branch (the writer-task trim). Re-ran the smoke locally against a smaller spec (
Process tree mid-run showed the child correctly attached:
Final outcome from {
"warnings": [
"Workforce persona writer failed (Workforce persona response must be structured JSON or include fenced TypeScript artifact and JSON metadata blocks.); used Ricky deterministic renderer instead."
],
"selectedIntent": "agent-relay-workflow"
}
So Sonnet completed (no timeout); the remaining failure mode is
Offline measurement against the deploy-v1 spec
With all 81 relevant files inlined the writer task caps near 160 KB total (per-file cap 8 KB, total cap 96 KB), well inside Sonnet's 200 K-token context budget.
Still open as separate follow-ups (not this PR)
🤖 Generated with Claude Code |
When the workforce persona writer or reviewer pass through Claude
Sonnet/Opus but the parser rejects the response, the raw output is
discarded today — leaving operators blind about whether the issue is
"Sonnet returned prose with a code block" vs "Sonnet returned a JSON
object with the wrong field name" vs something else.
Adds a `dumpPersonaDebug()` helper that persists the raw persona
output, the prompt that produced it, and the selection/result
metadata under `<repoRoot>/.workflow-artifacts/ricky-persona-debug/`
on every writer/reviewer failure path:
.workflow-artifacts/ricky-persona-debug/
├── writer/
│ └── <promptDigest>-<reason>/
│ ├── output.raw.txt (verbatim harness stdout)
│ ├── task.prompt.txt (the prompt Ricky sent)
│ └── meta.json (selection, status, stderr, durationMs)
└── reviewer/
└── <promptDigest>-<reason>/
└── …
`reason` is one of `noncompletion` (harness exited non-zero or hit a
timeout), `parse-error` (parser rejected the body), `no-content`
(parser accepted but artifact.content was missing), or `success`.
Success-path dumps are gated behind `RICKY_PERSONA_DEBUG=1` so green
production runs do not litter the artifact tree.
The dump-write itself fails open — any error (permission, missing
parent, etc.) is swallowed and only surfaced when
`RICKY_PERSONA_DEBUG_VERBOSE=1` is set. The debug capture must never
mask the original writer/reviewer error.
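The fail-open contract above might look roughly like this. It is a sketch only: the real `dumpPersonaDebug()` signature is not shown in this thread, so the function name, the `PersonaDumpSketch` shape, and the injectable `env` parameter are hypothetical. The directory layout, file names, `reason` values, and environment variable names are taken from the message.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

type DumpReasonSketch = 'noncompletion' | 'parse-error' | 'no-content' | 'success';

interface PersonaDumpSketch {
  role: 'writer' | 'reviewer';
  reason: DumpReasonSketch;
  promptDigest: string;
  prompt: string;
  rawOutput: string;
  meta: Record<string, unknown>;
}

/** Hypothetical sketch: returns the dump directory, or undefined if skipped or failed. */
export function dumpPersonaDebugSketch(
  repoRoot: string,
  dump: PersonaDumpSketch,
  env: Record<string, string | undefined> = process.env,
): string | undefined {
  // Success-path dumps are opt-in so green runs do not litter the tree.
  if (dump.reason === 'success' && env.RICKY_PERSONA_DEBUG !== '1') return undefined;
  try {
    const dir = join(
      repoRoot, '.workflow-artifacts', 'ricky-persona-debug',
      dump.role, `${dump.promptDigest}-${dump.reason}`,
    );
    mkdirSync(dir, { recursive: true });
    writeFileSync(join(dir, 'output.raw.txt'), dump.rawOutput);
    writeFileSync(join(dir, 'task.prompt.txt'), dump.prompt);
    writeFileSync(join(dir, 'meta.json'), JSON.stringify(dump.meta, null, 2));
    return dir;
  } catch (error) {
    // Fail open: a broken dump must never mask the original persona error.
    if (env.RICKY_PERSONA_DEBUG_VERBOSE === '1') console.warn('persona debug dump failed', error);
    return undefined;
  }
}
```

Returning `undefined` instead of throwing keeps the caller's error path unchanged; the caller can rethrow or report the original writer or reviewer failure whether or not the dump landed on disk.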
The reviewer's parser synthesizes a canned `block` verdict when its
JSON extractor returns nothing parseable; the dump caller detects
that exact canned summary string and routes the dump into the
`parse-error` directory so unparseable reviewer responses are
discoverable the same way as writer parse failures.
4 new tests cover the writer + reviewer dump paths, the success-path
opt-in, and the silent-failure-on-bad-path contract. `npx vitest run`
→ 1098 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Sonnet
The persona-debug dump from the deploy-v1 smoke run captured Sonnet's
actual response shape: a one-line prose preamble ("Now I have enough
context. I'll generate the workflow artifact.") followed by an opening
```json fence with NO matching closing fence, followed by the verbatim
JSON payload through end of output. Both `parseJsonObject(output)`
(direct JSON) and `fencedBlock(output, 'json')` (paired fences) fail
on this shape, even though the JSON payload itself is structurally
valid and contains the exact `{artifact, metadata}` contract the
writer expects.
Add a tolerant fallback path: `extractFirstBalancedJsonObject()` scans
the response for the first top-level `{ ... }`, tracking string
literals and escape sequences so a `}` inside `artifact.content` (the
embedded TypeScript) does not prematurely close the candidate. The
parser tries direct-JSON, then paired ```json/```ts fences, then the
balanced-extract fallback before throwing.
Picks up:
- "preamble text\n```json\n{ ... }" (unclosed fence — Sonnet)
- "preamble text\n{ ... }" (no fence at all)
- "{ ... }\ntrailing prose" (trailing prose after JSON)
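A minimal sketch of the balanced-brace scan described above, assuming a simple single-pass state machine. The function name carries a `Sketch` suffix to mark it as illustrative rather than the code in this commit, and it only tracks JSON double-quoted strings; a `{` in the prose preamble would start the candidate early, so a fuller version might retry from the next `{` when `JSON.parse` rejects the candidate.

```typescript
/**
 * Hypothetical sketch: return the first top-level `{ ... }` span in `text`,
 * ignoring braces that appear inside JSON string literals.
 */
export function extractFirstBalancedJsonObjectSketch(text: string): string | undefined {
  const start = text.indexOf('{');
  if (start === -1) return undefined;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i += 1) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false; // the character after a backslash is literal
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth += 1;
    else if (ch === '}') {
      depth -= 1;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return undefined; // braces never balanced (for example, truncated output)
}
```

The returned span would still be handed to `JSON.parse`, so the scan only has to find plausible boundaries; validating the payload remains the parser's job.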
The prompt-tightening alternative was considered and skipped: Sonnet
follows the existing "return only the response contract" instruction
inconsistently (it emitted prose despite the instruction); making the
parser tolerant fixes both the observed failure and adjacent variants
without depending on prompt-following discipline that's already
flaky.
Two new tests use the exact Sonnet-shaped sample from the smoke
capture, plus a no-fence variant. Verified locally against the
captured `output.raw.txt` from the failing smoke run — parser now
returns `responseFormat: 'structured-json'` with the full 9 KB
workflow content intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ End-to-end smoke confirmed: Opus reviewer approved a real Sonnet-authored workflow
Final smoke run against
{
"personaId": "agent-relay-workflow",
"tier": "best",
"harness": "claude",
"model": "claude-opus-4-7",
"runId": "b7892a61-b12c-403d-b926-aff090621460",
"warnings": [
"Ricky-local Claude persona override resolved for intent \"agent-relay-workflow\" at tier \"best\".",
"Ricky pre-write validation repaired the Workforce persona artifact before writing."
],
"review": {
"verdict": "pass",
"summary": "The generated workflow correctly targets the tiny spec: dedicated `wf-ricky-tiny` channel, references the spec by path without inlining, `IMPLEMENTATION_WORKFLOW_CONTRACT` marker, deterministic `file_exists` gate, 80-to-100 review/fix-loop/final-review with explicitly named `review`, `fix-loop`, and `final-review` steps, named agents (`lead-claude`, `impl-primary-codex`, `reviewer-claude`, `final-reviewer-claude`), `onError.retryDelayMs` 10000, and no branch/commit/PR/`gh` side effects during generation."
}
}
Every critique item from the original deploy-v1 thread is now satisfied:
What the full pipeline exercised:
PR is ready for review.
Commits on this branch:
🤖 Generated with Claude Code
Five concrete issues raised by CodeRabbit and Devin on PR #102:
1. **Reviewer crash recorded as `verdict: 'pass'`** (CodeRabbit, Major). The pipeline catch block synthesized `verdict: 'pass'` when the reviewer pass itself threw, which is a false approval signal: downstream automation cannot tell "the reviewer ran and approved" from "the reviewer crashed and we have no opinion." Adds an `'error'` variant to `WorkforcePersonaReviewVerdict` and uses it in the catch path. `'error'` is reserved for the pipeline catch path; the reviewer persona itself only emits `pass`/`fix`/`block`.
2. **`verdict: 'block'` + non-empty fixes fell through to writer repair** (CodeRabbit, Major). The guard `if (verdict === 'pass' || fixes.length === 0)` let a `block` verdict with actionable-looking fixes trigger a repair attempt, which contradicts the contract: per the docstring, `block` means the artifact is fundamentally wrong and Ricky should keep the writer output without trying to repair it. Adds `verdict === 'block'` to the early-return condition with an inline comment that links the guard back to the documented contract.
3. **`types.ts` docstring claimed `block` triggers deterministic fallback** (CodeRabbit, Minor). It does not: the deterministic fallback only runs when pre-write validation cannot be repaired inside the writer loop. Rewrites the `block` and `error` cases on `WorkforcePersonaReviewSummary` so the documented contract matches the implementation.
4. **`extractFencedJson` returned the FIRST ```json fence but documentation said LAST** (Devin, Bug). When the reviewer emitted a draft verdict block before its final verdict block, the parser picked the draft. Per the CLAUDE.md "grammar-aware parsers, not regex" rule and the existing `src/product/spec-intake/markdown-target-files.ts` precedent, replaces the raw-text regex match with an mdast walk over `code` nodes with `lang === 'json'`, returned LAST-block-first so the model's final answer wins.
5. **Raw-regex ```json fence extraction was brittle to nested fences** (CodeRabbit, Major). A reviewer audit response that includes the workflow source the reviewer is auditing (in a ```ts block) would have its inner content scanned for JSON-shaped strings by the raw regex. The mdast walker only returns top-level fenced code blocks, so nested content inside the audited workflow source is structurally invisible (same fix as item 4).
Six new unit tests cover: LAST-block-first ordering, mdast walker ignoring JSON-shaped strings nested inside an audited workflow source, fallback to trailing balanced-JSON when no fenced block carries a verdict, the synthesized-`block` fallback, the new `block`-with-fixes no-repair path, and the new `error` verdict on reviewer-pass-itself crashes.
`npx tsc --noEmit` clean, `npx vitest run` → 1106 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
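Items 1 and 2 can be illustrated with a small sketch. `WorkforcePersonaReviewVerdict` and its four variants come from this commit message; `ReviewOutcome`, `shouldAttemptRepair`, and `reviewSafely` are hypothetical names introduced here, and the reviewer call is modeled as synchronous for brevity (the real pass is presumably async).

```typescript
// Verdict union as described above. 'error' is reserved for the pipeline
// catch path; the reviewer persona itself only emits pass / fix / block.
type WorkforcePersonaReviewVerdict = 'pass' | 'fix' | 'block' | 'error';

// Hypothetical shape, for illustration only.
interface ReviewOutcome {
  verdict: WorkforcePersonaReviewVerdict;
  summary: string;
  fixes: string[];
}

// Early-return guard: 'block' keeps the writer output even when fixes are
// present, matching the documented contract. Only 'fix' with at least one
// actionable fix reaches the repair attempt.
function shouldAttemptRepair(outcome: ReviewOutcome): boolean {
  if (
    outcome.verdict === 'pass' ||
    outcome.verdict === 'block' ||
    outcome.fixes.length === 0
  ) {
    return false;
  }
  return outcome.verdict === 'fix';
}

// Catch path: a reviewer crash is recorded as 'error', never as 'pass',
// so callers can tell "approved" apart from "no opinion".
function reviewSafely(runReviewer: () => ReviewOutcome): ReviewOutcome {
  try {
    return runReviewer();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { verdict: 'error', summary: `Reviewer pass failed: ${reason}`, fixes: [] };
  }
}
```

Keeping the guard in one predicate makes the `block`-with-fixes case easy to unit test without running the writer.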
Addressed all five review items in bd64f2f. Map of comment → fix:
Six new unit tests in
🤖 Generated with Claude Code
🧹 Nitpick comments (1)
src/product/generation/workforce-persona-reviewer.ts (1)
316-339: 💤 Low value
Character-level brace balancing does not account for braces inside JSON strings.
The backward scan counts all `{` and `}` characters regardless of whether they appear inside string literals. For a payload like `{"finding": "missing }"}`, the scanner would stop at the `}` inside the string rather than the actual closing brace.
However, since:
- This is a fallback after mdast parsing fails
- The verdict schema fields (`verdict`, `summary`, `fixes`) are unlikely to contain stray braces
- `safeParse` will reject malformed slices
The practical risk is low for the expected reviewer output format.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/product/generation/workforce-persona-reviewer.ts` around lines 316 - 339, The backward brace-balancing in extractTrailingJsonObject incorrectly counts braces inside JSON string literals; update the loop in extractTrailingJsonObject to ignore braces that occur inside double-quoted strings by tracking an inString flag and escaped characters (handle backslash escapes so \" doesn't toggle inString), and only increment/decrement depth for '{' and '}' when not inString; keep returning safeParse(slice) as before when depth balances. This preserves the fallback behavior while avoiding false matches from braces inside JSON strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/product/generation/workforce-persona-reviewer.ts`:
- Around line 316-339: The backward brace-balancing in extractTrailingJsonObject
incorrectly counts braces inside JSON string literals; update the loop in
extractTrailingJsonObject to ignore braces that occur inside double-quoted
strings by tracking an inString flag and escaped characters (handle backslash
escapes so \" doesn't toggle inString), and only increment/decrement depth for
'{' and '}' when not inString; keep returning safeParse(slice) as before when
depth balances. This preserves the fallback behavior while avoiding false
matches from braces inside JSON strings.
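One way the suggested fix could look, as a hedged sketch rather than the code in `workforce-persona-reviewer.ts`: the names `extractTrailingJsonObject` and `safeParse` come from the review comment, but both bodies below are assumptions. Because the scan runs backward, a quote is treated as escaped when an odd number of backslashes immediately precedes it.

```typescript
// Assumed stand-in for the reviewer module's safeParse helper.
function safeParse(slice: string): unknown {
  try {
    return JSON.parse(slice);
  } catch {
    return null; // malformed slice: no verdict recovered
  }
}

// Backward scan from the last '}' that ignores braces inside JSON strings.
function extractTrailingJsonObject(text: string): unknown {
  const end = text.lastIndexOf('}');
  if (end === -1) return null;

  let depth = 0;
  let inString = false;

  for (let i = end; i >= 0; i--) {
    const ch = text[i];
    if (ch === '"') {
      // Count the backslashes directly before this quote; an odd count
      // means the quote is escaped and belongs to the string's content.
      let backslashes = 0;
      for (let j = i - 1; j >= 0 && text[j] === '\\'; j--) backslashes++;
      if (backslashes % 2 === 0) inString = !inString;
      continue;
    }
    if (inString) continue; // braces inside strings do not affect depth
    if (ch === '}') depth++;
    else if (ch === '{') {
      depth--;
      if (depth === 0) return safeParse(text.slice(i, end + 1));
    }
  }
  return null; // never balanced
}
```

With this version, a payload such as `{"finding": "missing }"}` balances at its real opening brace instead of being thrown off by the `}` inside the string.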
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: fcea6350-6f9e-4875-98d3-81c8b25c0c73
📒 Files selected for processing (5)
- src/product/generation/pipeline.ts
- src/product/generation/types.ts
- src/product/generation/workforce-persona-reviewer.ts
- src/product/generation/workforce-persona-writer.test.ts
- src/product/generation/workforce-persona-writer.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/product/generation/types.ts
- src/product/generation/workforce-persona-writer.ts
Summary
Fixes a silent regression where Ricky's workflow writer falls back to the built-in `persona-maker` persona on Codex GPT-5.3 because no persona pack ships `agent-relay-workflow` in workload-router 0.19's catalog. The fallback is invisible to callers and produces the kind of structurally wrong workflows seen on `docs/plans/deploy-v1-schema-cascade-spec.md` (missed `## Track` headings, serial pipeline instead of parallel DAG, inlined spec, missing per-track review loop).
- `@agent-relay/personas@^6.0.18` as a dependency (the canonical pack, kept for parity / future use even though workload-router 0.19 doesn't auto-discover packs).
- `personas/`:
  - `agent-relay-workflow.json`: writer, `harness: claude`, `best=opus-4-7`, `best-value=sonnet-4-6`, `minimum=haiku-4-5`. Writer prompt rewritten to enforce `## Track` decomposition, parallel-branch fan-out, per-child review nesting, spec-by-path (no inlining), and `retryDelayMs >= 10000`.
  - `agent-relay-workflow-review.json`: reviewer, intent `review`, same Claude ladder. Produces structured `{verdict: pass|fix|block, summary, fixes[]}` JSON.
- `src/product/generation/ricky-local-persona-resolver.ts`: a `WorkforcePersonaResolver` that loads Ricky-local Claude personas first via `useRunnableSelection` from harness-kit, and falls through to the existing package resolver for everything else.
- `src/product/generation/workforce-persona-reviewer.ts`: a post-write reviewer pass that resolves through the `review` intent at tier `best` (Opus) by default, audits the generated workflow against the spec, and returns a verdict.
- `pipeline.ts:generateWithWorkforcePersona`:
  - `verdict=pass` → ship writer artifact as-is.
  - `verdict=fix` → feed structured fix list back into one writer repair attempt via `validationFeedback.errors`; ship the repaired artifact if it passes deterministic validation, otherwise keep the writer's output and surface a warning.
  - `verdict=block` → keep the writer artifact; verdict is recorded on metadata.
- `WorkforcePersonaReviewSummary` (verdict + fixes + appliedFix + model + runId) on `WorkforcePersonaGenerationMetadata` so CLI/Cloud callers can see what the reviewer did.
- `RICKY_PERSONA_REVIEW=0` env or `workforcePersonaWriter.review: false` option.
What gets called today (before this PR) vs after
- `persona-maker` / harness `codex` / `openai-codex/gpt-5.3-codex` (silent fallback)
- `agent-relay-workflow` / harness `claude` / `claude-sonnet-4-6`
- `agent-relay-workflow-review` / harness `claude` / `claude-opus-4-7`
Note on workload-router 3.x
`@agentworkforce/persona-kit@3.0.1` collapsed `PersonaSpec.tiers.{best,best-value,minimum}` into single top-level `harness` / `model` fields. `@agentworkforce/harness-kit@0.19.0` still pins `workload-router@0.19.0`, so a clean 3.x bump is its own refactor (there is no harness-kit 3.x release yet). This PR stays on 0.19's tier shape; the persona JSONs use `tiers.best` / `best-value` / `minimum` accordingly.
Test plan
- `npx tsc --noEmit` clean
- `npx vitest run` → 1087 passing, no new failures
- Tests in `workforce-persona-writer.test.ts` cover pass / fix-with-repair / block-unparseable / opt-out paths
- `review: false` (writer is tested independently of the new review pass)
- `npm run evals` → 43 needs-human, 0 automated failures
- `ricky --mode local --spec-file docs/plans/deploy-v1-schema-cascade-spec.md` to verify the Claude persona resolves end-to-end against a real spec (deferred; requires CLAUDE_API_KEY / Claude Code harness in the runner env)
🤖 Generated with Claude Code
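The local-first resolution described in the Summary can be sketched roughly as below. Every type and function name here is a hypothetical stand-in, not the harness-kit or workload-router API; only the tier names, persona ids, model ids, and the warning text are taken from this PR.

```typescript
// Hypothetical types for illustration; the real WorkforcePersonaResolver differs.
type Tier = 'best' | 'best-value' | 'minimum';

interface ResolvedPersona {
  personaId: string;
  harness: string;
  model: string;
  warnings: string[];
}

type PersonaResolver = (intent: string, tier: Tier) => ResolvedPersona | undefined;

interface LocalPersonaSpec {
  id: string;
  intent: string;
  tiers: Record<Tier, { harness: string; model: string }>;
}

// Local personas win for the intents they declare; every other intent
// falls through to the existing package resolver unchanged.
function createLocalFirstResolver(
  localSpecs: LocalPersonaSpec[],
  packageResolver: PersonaResolver,
): PersonaResolver {
  const byIntent = new Map(localSpecs.map((spec) => [spec.intent, spec] as const));
  return (intent, tier) => {
    const local = byIntent.get(intent);
    if (!local) return packageResolver(intent, tier);
    const { harness, model } = local.tiers[tier];
    return {
      personaId: local.id,
      harness,
      model,
      warnings: [
        `Ricky-local Claude persona override resolved for intent "${intent}" at tier "${tier}".`,
      ],
    };
  };
}
```

Emitting the override as a warning keeps the substitution visible to callers, in contrast to the silent `persona-maker` fallback this PR replaces.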