Guide Ricky workflow persona with squad review defaults #105

Merged
khaliqgant merged 2 commits into main from codex/ricky-squad-workflow-default-v2 on May 14, 2026

@khaliqgant
Member

Summary

  • Make the shadowed 2-3 agent squad loop the default for serious Ricky implementation workflows while preserving Ricky's ability to choose another workflow shape when the spec calls for it
  • Inject repo-local workflow policy docs into the Workforce persona writer prompt: WORKFLOW_STANDARDS, WORKFLOW_AUTHORING_RULES, and generated-workflow-template
  • Add a regression test proving those policy files are included in the persona task

Tests

  • npx vitest run src/product/generation/workforce-persona-writer.test.ts
  • git diff --check

@coderabbitai

coderabbitai Bot commented May 14, 2026

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This PR standardizes the multi-agent workflow squad pattern for serious implementation work across Ricky guidance and injects repo-local workflow policies into persona task generation.

Changes

Workflow Squad Loop Standardization and Policy Injection

| Layer / File(s) | Summary |
|-----------------|---------|
| **Workflow Squad Loop and Team-Shape Standards**<br>`AGENTS.md`, `docs/workflows/WORKFLOW_STANDARDS.md`, `workflows/meta/spec/generated-workflow-template.md`, `workflows/shared/WORKFLOW_AUTHORING_RULES.md`, `.agents/*`, `.claude/*` | Default serious implementation squad now specifies scoped implementers (`impl-<scope>-*`), per-scope shadow reviewers (`shadow-<scope>-*`), optional validators, and dual final reviewers (`final-reviewer-claude` and `final-reviewer-codex`). Team-shape rules, reliability ladders, and verification phases are expanded to enforce deterministic gating, implementer self-reflection artifacts, independent review loops, and post-fix re-verification across guidance, templates, and skill docs. |
| **Repo-Local Workflow Policy Injection into Persona Task**<br>`src/product/generation/workforce-persona-writer.ts`, `src/product/generation/workforce-persona-writer.test.ts` | Persona task generation now includes a `Ricky repo-local workflow policy context:` section rendered from a fixed set of repo policy Markdown files (truncated per-file) via `renderRickyWorkflowPolicyContext` and `safeReadPolicyFile`. Missing files produce `MISSING:` placeholders. A new Vitest test verifies policy file embedding and absence of placeholder text when files exist. |
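
The policy injection summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `readPolicyFile`, `renderPolicyContext`, and the `MAX_POLICY_FILE_CHARS` cap are hypothetical stand-ins (the PR's actual `MAX_POLICY_FILE_BYTES` value and `truncateText` helper are not shown here), and truncation is done by string length rather than bytes. The file paths and the `MISSING:` wording are taken from this PR.

```typescript
import { mkdtempSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';

// Hypothetical per-file cap (in UTF-16 code units, not bytes).
const MAX_POLICY_FILE_CHARS = 200;

// Policy files named in this PR.
const POLICY_FILES = [
  'docs/workflows/WORKFLOW_STANDARDS.md',
  'workflows/shared/WORKFLOW_AUTHORING_RULES.md',
  'workflows/meta/spec/generated-workflow-template.md',
];

// Read one policy file, truncated to the cap; null when it cannot be read.
function readPolicyFile(path: string): string | null {
  try {
    return readFileSync(path, 'utf8').slice(0, MAX_POLICY_FILE_CHARS);
  } catch {
    return null;
  }
}

// Render one section per policy file, with a MISSING: placeholder for absent files.
function renderPolicyContext(repoRoot: string): string {
  return POLICY_FILES.map((path) => {
    const absolute = resolve(repoRoot, path);
    const content = readPolicyFile(absolute);
    return content !== null
      ? `# ${path}\n${content}`
      : `# ${path}\nMISSING: Ricky workflow policy file was not found at ${absolute}.`;
  }).join('\n\n');
}

// Demo: a temporary repo containing only the first two policy files.
const repoRoot = mkdtempSync(join(tmpdir(), 'ricky-policy-'));
for (const path of POLICY_FILES.slice(0, 2)) {
  const absolute = join(repoRoot, path);
  mkdirSync(dirname(absolute), { recursive: true });
  writeFileSync(absolute, `Policy text for ${path}`);
}
const policyContext = renderPolicyContext(repoRoot);
console.log(policyContext);
```

In the demo, the two files that exist are embedded under their repo-relative headings and the third yields a single placeholder section, so a persona prompt built from this text makes a missing policy file visible instead of silently omitting it.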

Sequence Diagram(s)

Sequence diagrams are embedded in the hidden review stack artifact above.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

🐰 A rabbit builds the squad with care,
Shadow eyes and self-aware,
Policies now flow through the task,
Standards set—the agents ask:
"What patterns guide the work today?"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title directly and concisely describes the main change: making the shadowed squad review loop the default guidance for Ricky workflow personas, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description clearly relates to the changeset, outlining three interconnected changes: establishing squad loop defaults, injecting policy docs into the persona writer, and adding a regression test. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/product/generation/workforce-persona-writer.test.ts (1)

58-99: ⚡ Quick win

Add one negative-path assertion for missing policy files.

This test covers the happy path well, but the new fallback branch (MISSING: Ricky workflow policy file...) remains untested. A small companion test with one missing file would lock behavior and prevent regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/workforce-persona-writer.test.ts` around lines 58 -
99, Add a negative-path test in workforce-persona-writer.test.ts that calls
buildWorkflowPersonaTask (same args as the existing test) but leaves out one of
the repo-local policy files (e.g., don't create
workflows/shared/WORKFLOW_AUTHORING_RULES.md), then assert the returned task
contains the fallback marker string "MISSING: Ricky workflow policy file" (and
still contains the other existing file contents) to lock the fallback branch
behavior and prevent regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 99573ef4-c9d9-4bd0-94ae-c1d1596336d0

📥 Commits

Reviewing files that changed from the base of the PR and between 3a789d7 and c643f0a.

📒 Files selected for processing (6)
  • AGENTS.md
  • docs/workflows/WORKFLOW_STANDARDS.md
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts
  • workflows/meta/spec/generated-workflow-template.md
  • workflows/shared/WORKFLOW_AUTHORING_RULES.md

Comment on lines +686 to +690
function safeReadPolicyFile(path: string): string | null {
  try {
    return truncateText(readFileSync(path, 'utf8'), MAX_POLICY_FILE_BYTES).text;
  } catch {
    return null;

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Differentiate “file missing” from other read failures.

Line 689 swallows all I/O errors and later reports “not found,” which is misleading for permission/other read failures.

Suggested fix
-function safeReadPolicyFile(path: string): string | null {
+function safeReadPolicyFile(path: string): { content: string } | { missing: true } | { error: string } {
   try {
-    return truncateText(readFileSync(path, 'utf8'), MAX_POLICY_FILE_BYTES).text;
-  } catch {
-    return null;
+    return { content: truncateText(readFileSync(path, 'utf8'), MAX_POLICY_FILE_BYTES).text };
+  } catch (error) {
+    if ((error as NodeJS.ErrnoException)?.code === 'ENOENT') return { missing: true };
+    return { error: errorMessage(error) };
   }
 }
-  return RICKY_WORKFLOW_POLICY_FILES.map((path) => {
+  return RICKY_WORKFLOW_POLICY_FILES.map((path) => {
     const absolute = resolve(repoRoot, path);
-    const content = safeReadPolicyFile(absolute);
-    return content
-      ? `# ${path}\n${content}`
-      : `# ${path}\nMISSING: Ricky workflow policy file was not found at ${absolute}.`;
+    const read = safeReadPolicyFile(absolute);
+    if ('content' in read) return `# ${path}\n${read.content}`;
+    if ('missing' in read) return `# ${path}\nMISSING: Ricky workflow policy file was not found at ${absolute}.`;
+    return `# ${path}\nUNREADABLE: ${read.error}`;
   }).join('\n\n');
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/workforce-persona-writer.ts` around lines 686 - 690,
In safeReadPolicyFile, don’t swallow all read errors; catch the exception into a
variable, inspect its code (e.g., err.code === 'ENOENT') and return null only
for "file not found" errors, otherwise surface the problem (rethrow the error or
log it and return null as appropriate for your error-handling policy). Update
the function safeReadPolicyFile to read the file inside try, catch (err) { if
(err?.code === 'ENOENT') return null; throw err; } (or replace throw with a
processLogger.error call if you prefer logging) so permission and other I/O
errors are not misreported as “not found.”
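
The simpler variant described in the prompt above (return `null` only for `ENOENT`, rethrow everything else) might look like the sketch below. It is an illustration under assumptions: `readPolicyFileStrict` is a hypothetical name, truncation is omitted for brevity, and the demo relies on reading a directory as a file failing with a non-`ENOENT` code, which holds on most platforms (typically `EISDIR`) but is not guaranteed everywhere.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Return the file text, null only when the file does not exist, and rethrow anything else.
function readPolicyFileStrict(path: string): string | null {
  try {
    return readFileSync(path, 'utf8');
  } catch (err) {
    if ((err as NodeJS.ErrnoException)?.code === 'ENOENT') return null;
    throw err; // permission and other I/O errors must not be reported as "not found"
  }
}

const dir = mkdtempSync(join(tmpdir(), 'ricky-read-'));
const present = join(dir, 'WORKFLOW_STANDARDS.md');
writeFileSync(present, 'standards');

const presentResult = readPolicyFileStrict(present);
const missingResult = readPolicyFileStrict(join(dir, 'absent.md'));

// Reading a directory as a file is used here to provoke a non-ENOENT failure.
let rethrownCode: string | undefined;
try {
  readPolicyFileStrict(dir);
} catch (err) {
  rethrownCode = (err as NodeJS.ErrnoException).code;
}
console.log({ presentResult, missingResult, rethrownCode });
```

Compared with the discriminated-union fix suggested in the review, this keeps the `string | null` signature, so the caller's `MISSING:` branch stays unchanged, at the cost of letting unexpected I/O errors propagate to the caller.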

@github-actions
Contributor

github-actions Bot commented May 14, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-14T18-47-22-423Z-HEAD-provider
Mode: provider
Git SHA: 11d9914

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope of the change**  
- Extend the **Ricky** documentation to state that Ricky now leverages the **Agent Assistant** (AA) for deeper reasoning, tool orchestration, and self‑healing.  
- Add a thin **AA wrapper** in the codebase so that all high‑level workflow steps (validation gates, review artifact generation, blocker reporting) are delegated to AA via a deterministic contract.  
- Keep the public API unchanged; only internal implementation and docs are affected.  

---  

## 1. Documentation updates  

| File | Section | Change |
|------|---------|--------|
| `README.md` | Overview | Add sentence: “Ricky now embeds the Agent Assistant to provide deterministic reasoning, self‑healing, and coordinated review generation.” |
| `docs/architecture.md` | Architecture diagram | Insert a new block “Agent Assistant Layer” between **Workflow Engine** and **Tool Adapters**. |
| `docs/workflow_contracts.md` | Contracts | Add a contract definition `AA_Contract` (input JSON, output JSON, deterministic hash). |
| `docs/maintenance.md` | Self‑healing | Explain that AA runs the *repair loop* (80‑100 iterations) automatically when a blocker is reported. |
| `CHANGELOG.md` | vX.Y.Z | “Integrated
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
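
The deterministic checks listed for each case can be pictured with the sketch below. The eval harness code is not part of this page, so every type and function name here (`EvalCaseResult`, `CheckResult`, `checkStatus`, `checkMaxToolCalls`) is hypothetical, as is the `FAIL` wording; only the two `PASS` lines mirror the report.

```typescript
// Hypothetical shapes; the real Ricky eval harness types are not shown in this PR.
interface EvalCaseResult {
  id: string;
  executed: boolean;
  toolCalls: number;
}

interface CheckResult {
  passed: boolean;
  line: string;
}

// Check that the case ran at all.
function checkStatus(result: EvalCaseResult): CheckResult {
  const passed = result.executed;
  return {
    passed,
    line: `${passed ? 'PASS' : 'FAIL'} status: case ${passed ? 'executed' : 'did not execute'}`,
  };
}

// Check that the case stayed within its tool-call budget.
function checkMaxToolCalls(result: EvalCaseResult, max: number): CheckResult {
  const passed = result.toolCalls <= max;
  return {
    passed,
    line: `${passed ? 'PASS' : 'FAIL'} maxToolCalls: expected <= ${max} tool calls, got ${result.toolCalls}`,
  };
}

const sample: EvalCaseResult = {
  id: 'agent-assistant-boundary.real-reuse-not-rhetorical',
  executed: true,
  toolCalls: 0,
};
const checks = [checkStatus(sample), checkMaxToolCalls(sample, 0)];
for (const check of checks) console.log(check.line);

// A case goes to human review only after every deterministic check passes.
const reviewable = checks.every((check) => check.passed);
```

Passing these checks says only that the case executed within its budget; whether the captured output satisfies the Must / Must Not rubric is still left to the human verdict.
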
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑14)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents on production must expose a contract. | Identify the 6 agents lacking contracts; create minimal contracts (input: `turnId`, `sessionId`; output: `contextSnapshot`). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` schema compliance. | Every PR that touches `turn‑context` must pass a schema‑validation gate. | Add the gate to the remaining 10 agents’ CI pipelines (use `ricky-validate-context` step). |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” section in the review checklist. | 100 % compliance. | Update the shared PR template (`.github/PULL_REQUEST_TEMPLATE.md`) to make the section mandatory. |
| **Validation loops** | Average of 42 validation loops per turn (target 80‑100). | 80‑100 loops to ensure stability across edge cases. | Expand test matrix: add 3 new session‑type fixtures (voi
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities — workflow generation, validation, debugging, staged‑CLI UX, and blocker/evidence wording — into the **Agent Assistant** (AA) module.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in AA | Impact on reliability & coordination |
|------------|----------------------|--------------------------------------|
| **Workflow Generation** | AA already owns the *authoring contract* (template → concrete workflow). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on deterministic contracts. |
| **Validation** | AA provides the *deterministic verification gate* (schema + semantic checks) before a workflow is committed to a branch. | Early failure detection reduces downstream rework; validation artifacts become part of the PR review checklist. |
| **Debugging** | AA maintains the *debug‑trace* service (step‑level logs, state diffs). Embedding this in AA lets any agent request a reproducible debug session without re‑implementing tracing. | Improves root‑cause isolation; enables the 80‑to‑100 v
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Q3 2026)*  

| Phase | Duration | Objectives | Concrete Deliverables | Acceptance Contract | Verification Gate |
|-------|----------|------------|-----------------------|----------------------|-------------------|
| **1️⃣ Initiation & Stakeholder Alignment** | Weeks 1‑2 | • Secure executive sponsorship <br>• Define slice scope (core workflow reliability, coordination UI, authoring helpers) | • Stakeholder charter (signed) <br>• Scope‑definition doc (features, excluded items) | **Scope‑Lock Contract** – no new features added without change‑request PR | Review by PMO & Architecture board (sign‑off checklist) |
| **2️⃣ Baseline Architecture & Contract Draft** | Weeks 3‑4 | • Freeze the “Ricky Core API” <br>• Publish deterministic contracts for each subsystem | • API spec (OpenAPI 3.1) <br>• Data‑model schema (JSON‑Schema) <br>• Service‑level agreement (SLA) matrix | **API‑Stability Contract** – any breaking change requires a version bump and migration plan | Automated contract test suite runs against mock services; 100 % pass required |
| **3️⃣ Incremental Development (80‑100 loops)** | Weeks 5‑9 | 
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Slack / Web Support Design  
*Agent Assistant Package (AAP)‑based, deterministic, review‑ready, and scoped for incremental delivery.*

---

### 1. High‑Level Architecture

| Component | Responsibility | AAP Package | Runtime |
|-----------|----------------|------------|---------|
| **Ricky‑Core** | Workflow orchestration, contract enforcement, state store | `ricky-core` | Node 18 (Lambda) |
| **Slack‑Adapter** | Bi‑directional Slack events → Ricky actions | `ricky-slack` | Slack Bolt (Node) |
| **Web‑Adapter** | Embedded chat widget + REST API for UI | `ricky-web` | Express + Socket.io |
| **Auth‑Gateway** | OAuth2 (Slack) + JWT (Web) | `ricky-auth` | API‑Gateway + Cognito |
| **Verification Hub** | Deterministic gate execution, test harness, contract validation | `ricky-verify` | CI/CD (GitHub Actions) |
| **Observability** | Structured logs, metrics, trace IDs | `ricky-obs` | OpenTelemetry → CloudWatch/Datadog |
| **Data Store** | Durable workflow state, user preferences | `ricky-store` | DynamoDB (TS) |

All packages are published to the internal **Agent Assistant Registry** and version‑locked via `package.json` `peerDependencies`.  

---

### 2. Contract‑First
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

ricky --mode local --spec "generate a workflow for package checks"


* **`--mode local`** – Runs the CLI in the local‑only sandbox; no remote services are contacted.  
* **`--spec "…"`** – Supplies a natural‑language specification that Ricky will turn into a **workflow contract** (the JSON/YAML description of inputs, steps, outputs, and verification gates).  
* **No `--run` flag** – Tells Ricky **not to execute** the generated workflow; it will stop after producing the workflow definition and any associated validation artifacts.

So the command **creates** a workflow file (by default `ricky_workflow.json` in the current directory) and prints a short summary, but it **does not** start any checks, builds, or tests.

---

### Expected output

1. **Parsing the spec** – Ricky runs the spec‑to‑contract transformer, applying the shared authoring rules (e.g., naming conventions, scoped step IDs).  
2. **Generating the contract** – A deterministic JSON contract is emitted, containing:
   * `metadata` (author, timestamp, mode)
   * `inputs` (e.g., `package_path`, `config_file`)
   * `steps` (e.g., `lint`, `unit_test`, `security_scan`)
   * `verification_gate
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** CLI. It follows the repository conventions (AGENTS.md), includes deterministic verification gates, and produces the required review artifacts. The workflow is scoped to a single feature branch (`onboard‑first‑run`) and a corresponding PR (`RICKY‑001‑first‑run‑onboarding`).

---

## 1️⃣  Branch & PR Setup

| Item | Command |
|------|---------|
| Create feature branch | `git checkout -b onboard-first-run` |
| Push branch | `git push origin onboard-first-run` |
| Open PR (template `RICKY‑001‑first‑run‑onboarding`) | Use GitHub UI or `gh pr create -f -t "RICKY‑001 First‑Run Onboarding" -b "onboard-first-run"` |

*PR template includes sections for “Verification Gates”, “Artifacts”, and “Blockers”.*

---

## 2️⃣  Onboarding Script (`scripts/onboard.sh`)

```bash
#!/usr/bin/env bash
set -euo pipefail

# -------------------------------------------------
# Ricky – First‑Run CLI Onboarding
# -------------------------------------------------
# 1. Verify prerequisites
# 2. Install Ricky core
# 3. Initialise user config
# 4. Run self‑test suite (80‑100 validation l
```
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the expected path (`/specs/<workflow>.yaml`). | If `fs.exists?` returns **false** → **Blocker**. |
| **1.2** | If the file exists, parse it with the YAML validator. | If parsing yields **null** or **empty document** → **Blocker**. |
| **1.3** | Emit a deterministic blocker ticket (GitHub Issue or JIRA) with the exact reason: *“Spec file missing/empty for `<workflow>` – cannot proceed with authoring or validation.”* | Include: workflow name, repository, commit SHA, timestamp, and a link to the failing CI job. |

> **Rule:** Blockers must be raised before any downstream automation runs. No PR can be opened until the blocker is cleared.

---

### 2. Automated Placeholder Generation (Optional “bootstrap” mode)

If the team has opted‑in to a **bootstrap** policy (see `AGENTS.md → bootstrap‑spec‑policy`), Ricky can create a minimal, syntactically‑valid placeholder spec to unblock the pipeline while the real spec is being authored.

| Field | Value | Rationale |
|-------|--
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Configuration
• Google OAuth client ID :
• GitHub App ID :
• Credential store : present

⚙️ System Health
• Core services : RUNNING
• Scheduler : ACTIVE
• Database connection : OK

🚧 Blockers
• Google integration – pending successful connectivity check
• GitHub integration – pending successful connectivity check

📝 Next Steps

  1. Run provider health checks:
    ricky check --provider google
    ricky check --provider github
  2. Review any error logs under logs/provider/.
  3. Re‑run ricky status after checks succeed.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, config tweak) while guaranteeing traceability, deterministic verification, and safe merge.

| Phase | Step | Owner | Input | Output | Verification Gate |
|-------|------|-------|-------|--------|--------------------|
| **1️⃣ Planning** | 1.1 Create a scoped branch | **Ricky – Branch Manager** | Issue/Task ID, target repo, base branch | `feature/<issue‑id>-primitive` | Branch name matches regex `^feature\/[0-9]+-primitive$` |
| | 1.2 Draft change description | **Ricky – Spec Writer** | Issue details | `CHANGE.md` (one‑sentence summary, rationale) | Lint check: ≤ 80 char line length, required fields present |
| **2️⃣ Development** | 2.1 Checkout branch & apply edit | **Ricky – Editor** | Branch, `CHANGE.md` | Modified file(s) | `git diff --stat` shows ≤ 5 changed lines |
| | 2.2 Run local unit/format checks | **Ricky – Validator** | Modified code | Test report | All tests pass (`npm test` / `pytest -q`) **AND** formatter returns clean |
| **3️⃣ Review** | 3.1 Open PR with template | **Ricky – PR Opener** | Branch, `CHANGE.md` | PR # | PR title mat
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑and‑implement‑claude‑codex.yml`**  
*(conforms to AGENTS.md conventions, workflow standards, and shared authoring rules)*  

```yaml
name: Review & Implement (Claude → Codex)

on:
  push:
    branches: [ main ]
  workflow_dispatch:

env:
  # Shared environment variables
  WORKDIR: ${{ github.workspace }}/work
  LOG_LEVEL: info

jobs:
  # --------------------------------------------------------------
  # 1️⃣  INPUT VALIDATION
  # --------------------------------------------------------------
  validate_input:
    runs-on: ubuntu-latest
    outputs:
      spec_path: ${{ steps.set.outputs.spec_path }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Locate spec file
        id: set
        run: |
          SPEC=$(git ls-files | grep -E 'spec\.ya?ml$' | head -n1)
          if [[ -z "$SPEC" ]]; then
            echo "::error::Specification file not found (expected spec.yml or spec.yaml)."
            exit 1
          fi
          echo "spec_path=$SPEC" >> $GITHUB_OUTPUT
      - name: Validate spec schema
        uses: ./.github/actions/validate-schema
        with:
          file: ${{ steps.set.outputs.spec_path }}
       
```
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
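
The precedence these criteria imply (explicit spec hint first, then skill metadata, then project default, each with a recorded reason) might look like the sketch below. The function name, the `RunnerDecision` shape, and the step names are hypothetical; only the precedence order and the need for an auditable reason come from the criteria above.

```typescript
// Hypothetical decision record; something like this could be serialized
// into tool-selection.json or equivalent audit metadata.
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface RunnerDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

function chooseRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): RunnerDecision {
  // 1. An explicit hint from the user or spec always wins.
  const hinted = specHints[step];
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  // 2. Skill metadata only fills gaps the spec left open.
  const fromSkill = skillDefaults[step];
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-default", reason: `spec is silent; skill metadata suggests ${fromSkill}` };
  }
  // 3. Project default is the last resort.
  return { step, runner: projectDefault, source: "project-default", reason: `no hint or skill default for ${step}` };
}
```

For the user message in this case, the hints would be something like `{ review: "claude", implement: "codex" }`, so a skill default for `review` could never override the requested Claude reviewer.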

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites each **step.task** and **step.gate** to be:

1. **Concrete** – no vague verbs, explicit inputs/outputs.  
2. **Verifiable** – each gate becomes a single, testable predicate (e.g., “file‑exists”, “status‑code == 200”).  
3. **Scoped** – only the step being refined is touched; downstream steps are left unchanged unless a dependency changes.

Below is a minimal end‑to‑end pattern you can copy‑paste into your CI/CD pipeline or local dev script.

---

## 1. Prerequisites  

| Tool | Version |
|------|---------|
| `awf-cli` (AgentWorkforce CLI) | ≥ 2.4.0 (adds `--refine`) |
| `jq` (JSON processor) | ≥ 1.6 |
| `yamllint` (optional) | ≥ 1.28 |

Install the CLI if you haven’t already:

```bash
pip install awf-cli==2.4.0
```

## 2. Input Workflow (example)

# workflow.yaml
name: data‑ingest
steps:
  - id: fetch
    task: "Get data from source"
    gate: "response ok"
  - id: transform
    task: "Clean the data"
    gate: "cleaned"
  - id: load
    task: "Store data"
    gate: "stored"

3. R

...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---  

### 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

* Returns the current Ricky package version (semantic‑version string) on **stdout**.  
* Exits with status **0** on success, **1** on any internal error.  
* Is reproducible across supported runtimes (Python 3.9+, Node 18+, Go 1.22).  

The workflow is scoped to a single feature branch `feat/version-flag` and ends with a merge‑ready PR.

---  

### 2. Contract (Input / Output)

| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Stdout** | `<MAJOR>.<MINOR>.<PATCH>` (e.g., `2.4.1`) |
| **Stderr** | *empty* on success; error message on failure |
| **Exit Code** | `0` on success, `1` on failure |
| **Version Source** | Single source of truth: `ricky/__init__.py` (`__version__`) for Python, `package.json` for Node, `VERSION` file for Go. All language wrappers must read the same value. |
| **Determinism** | The flag must never emit additional whitespace, timestamps, or environment‑specific data. |

---  

### 3. Branch & PR Scope

| Item 
...[truncated]
```

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.
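
To make the contrast with a source-shape grep concrete, here is a small sketch of a gate that judges captured CLI behavior instead. It assumes, purely for illustration, that `ricky --version` prints exactly the resolved version string; the real output format and resolution order are defined by the spec, which is not reproduced in this report. `CommandResult` and `checkVersionGate` are hypothetical names.

```typescript
// Captured result of actually running the command (e.g. `ricky --version`).
interface CommandResult {
  stdout: string;
  exitCode: number;
}

interface GateResult {
  passed: boolean;
  reason: string; // kept so the gate outcome is auditable
}

// Assumption: the CLI prints exactly the expected version, ignoring surrounding whitespace.
function checkVersionGate(result: CommandResult, expectedVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { passed: false, reason: `command exited with code ${result.exitCode}` };
  }
  const printed = result.stdout.trim();
  if (printed !== expectedVersion) {
    return { passed: false, reason: `printed "${printed}", expected "${expectedVersion}"` };
  }
  return { passed: true, reason: `printed expected version ${expectedVersion}` };
}
```

The gate passes only when the observed behavior matches the acceptance criterion, so a bundle that merely contains `export` or `function` tokens cannot satisfy it.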

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifacts** | Any file/object that can be validated independently (e.g., `*.json`, `*.py`, `*.csv`). |
| **Batch size** | Up to 10 000 artifacts per run – split into sub‑batches of ≤ 500 for parallelism. |
| **Branch/PR** | One **feature branch** per batch (`batch‑<timestamp>`). All changes (fixes) are committed to this branch; a PR is opened automatically after the run finishes. |
| **Owner** | `@team‑artifact‑ops`. All blockers are assigned here. |
| **Success criteria** | 0 % failing artifacts after the final validation pass. |

---

## 2. Contract (Inputs / Outputs)

| Item | Type | Description |
|------|------|-------------|
| `artifact_list` | JSON array of strings | Paths (relative to repo root) of artifacts to process. |
| `validation_rules` | JSON object | Mapping `artifact_type → rule_set_id`. |

...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the direct effect each of the core Ricky workflow‑writing skills has on the structure, reliability mechanisms, and artefacts of a generated workflow. The map is illustrated with a concrete example: a **“Customer‑On‑Boarding (COB) – Identity Verification”** workflow that moves a new user from sign‑up to a verified status.

| **Skill** | **What the skill does** | **Concrete effect on the generated workflow** | **Resulting artefact / gate** |
|-----------|------------------------|-----------------------------------------------|------------------------------|
| **Deterministic Contract Definition** | Writes a **workflow contract** (`contract.yaml`) that enumerates inputs, outputs, pre‑conditions, post‑conditions, and SLA limits. | • Input contract: `signupPayload` (JSON schema).<br>• Output contract: `verifiedUser` (JSON schema).<br>• SLA: *max latency 5 s per step*. | Contract file is **validated** by the `contract‑validator` gate before any code is merged. |
| **Explicit Verification Gates** | Inserts **verification steps** that run automatically after each lo
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

---

### 📋 Quick Summary  
To turn a product specification into a reliable, review‑ready workflow we need a **clear, concrete contract** that defines:

1. **Inputs & Outputs** – data types, schemas, and ownership.  
2. **Stages & Tasks** – deterministic steps, owners, and success criteria.  
3. **Verification Gates** – automated tests, manual reviews, and exit conditions.  
4. **Review Artifacts** – design docs, test plans, and PR checklists.  
5. **Validation Loops** – 80‑100 % coverage loops for each gate.  
6. **Blocker Reporting** – explicit “Blocker” flags with remediation owners.  
7. **Branch/PR Scope** – one logical change per PR, with feature‑branch naming convention.

Because the spec you provided is **vague**, I’ll need a few concrete details before I can generate a complete workflow file (YAML/JSON) that satisfies our standards.

---

## 🔎 What I Need From You

| Area | Example Detail | Why It Matters |
|------|----------------|----------------|
| **Product Goal** | “Generate a PDF invoice from an order JSON.” | Drives the overall flow and required outputs. |
| **Primary Actors** | `frontend`, `order‑service`, `pdf‑generator`
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response to a Failed Workflow**

| **Phase** | **Action** | **Why** | **Resulting Artifact** |
|-----------|------------|---------|------------------------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (step Y). <br>• Write a structured log entry to the workflow‑run ledger (timestamp, step IDs, exit codes, heartbeat timestamps). | Provides an immutable audit trail for post‑mortem and for the deterministic verification gate. | `run‑<uuid>.log.json` |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the orchestration state store. <br>• Emit a **Ricky‑Blocker** ticket (e.g., `RB‑<run‑id>`) with severity = high, description, and affected artifacts. | Guarantees that downstream pipelines do not consume partial output and that the issue is surfaced to the team. | `RB‑<run‑id>.md` (auto‑generated ticket) |
| 3️⃣ Isolate & Quarantine | • Spin up a *sandbox* branch `fix/<run‑id>-timeout‑worker‑cleanup`. <br>• Freeze the current DAG node graph; prevent any new workers from attaching to the stale node. | Keeps the production DAG clean and provides a scoped branch for repair without affecting other runs. | Git branch `fix/
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
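
A classification step of the kind these criteria ask for could be sketched as below. The failure class names come from the criteria above; the `StepEvidence` fields, the heuristics, and the retry policy are assumptions for illustration only.

```typescript
type FailureClass =
  | "timeout"
  | "agent-runtime-opacity"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence record for one failed or stalled step.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  missingEnv: string[];
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  retryable: boolean;
  rawEvidence: StepEvidence; // preserved untouched for later review
}

function classifyStep(evidence: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (evidence.missingEnv.length > 0) {
    failureClass = "environment-blocker";
  } else if (evidence.timedOut) {
    failureClass = "timeout";
  } else if (evidence.workerAlive && evidence.artifactsProduced === 0) {
    // Alive but silent: the runtime is opaque, not necessarily broken.
    failureClass = "agent-runtime-opacity";
  }
  // Assumed policy: only a plain timeout is retried without further diagnosis.
  return { stepId: evidence.stepId, failureClass, retryable: failureClass === "timeout", rawEvidence: evidence };
}
```

Anything the heuristics cannot place stays `unknown` and non-retryable, which keeps the uncertainty visible instead of triggering a blind rerun.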

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – “Stale‑State Guard for Local AgentWorkforce Runs”**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local developer machines before any `aw run` (or `aw launch`) command. |
| **Inputs** | 1. Working‑tree root (detected via `git rev‑parse --show-toplevel`).<br>2. Optional config file `.aw/guard‑stale‑state.yml` (see *Config* section). |
| **Outputs** | - Exit code `0` → clean state, workflow may proceed.<br>- Exit code `1` → stale state detected; abort launch.<br>- Log file `aw‑guard‑stale‑state.log` in the repo root with details. |
| **Side‑effects** | None (read‑only). |
| **Idempotence** | Fully idempotent – repeated invocations produce the same result unless the filesystem changes. |
| **Performance SLA** | ≤ 150 ms on a typical repo (≤ 200 files in the three target directories). |
| **Failure Mode** | If any of the target directories exist **and** contain **any** file **older than** the configured freshness threshold, the guard fails. |
| **Recovery** | User runs `aw clean‑state` (provided in the same repo) or manually removes the stale directories. |

---

### 1. Rationale
Local runs re
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale CI workspace) the next invocation aborts with the message you’re seeing.

Below is the deterministic, step‑by‑step contract for clearing the marker safely, verifying that the workspace is clean, and re‑enabling normal runs.

---

## 1️⃣ Verify the Current State (Read‑Only)

| Command | Expected Output |
|---------|-----------------|
| `ricky status` | Shows **ACTIVE_RUN: false** (or “stale marker detected”) |
| `ls -l .ricky/run.marker` | File exists, size > 0, timestamp ≈ last run start |
| `ps -ef \| grep ricky` | No `ricky` or `relay` processes (or only the stale PID) |

If any `ricky`/`relay` process is still alive, **skip step 2** and kill it first:

```bash
# Example: kill the stray process
kill -9 <PID>
```

## 2️⃣ Remove the Stale Marker (Deterministic Write)

Contract: The marker must be removed only when no active process holds the lock.

# Ensure you are at the repository root
cd /path/to/y
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is parsed before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched** – even when they contain `node --input-type=module` HEREDOCs with literal `import { … } from 'node:fs'` or `from 'node:path'` strings.

---

## 1. Helper utilities (single source of truth)

Create (or update) `ricky-workflow-env.ts` in the workflow repo root:

```ts
// ricky-workflow-env.ts
import { config as loadDotenv } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * Returns the loaded env object for inspection.
 */
export function loadRickyWorkflowEnv(): NodeJS.Proces
```
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
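
The difference between substring matching and matching a real top-level alias import can be illustrated with a short sketch. The alias names `rickyWorkflowFs` and `rickyWorkflowPath` come from the criteria above; the helper names and the line-anchored regular expression are assumptions, and a production repair would more likely use a parser.

```typescript
// True only when a namespace import for this alias and module starts a line,
// e.g. `import * as rickyWorkflowFs from 'node:fs';`.
function hasTopLevelAliasImport(source: string, alias: string, moduleName: string): boolean {
  const escapeRegex = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegex(alias)}\\s+from\\s+['"]${escapeRegex(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Prepends whichever alias imports are missing and leaves the rest of the
// artifact, including HEREDOC bodies inside step commands, byte-for-byte intact.
function ensureAliasImports(source: string): string {
  const needed: Array<[string, string]> = [
    ["rickyWorkflowFs", "node:fs"],
    ["rickyWorkflowPath", "node:path"],
  ];
  const missing = needed
    .filter(([alias, moduleName]) => !hasTopLevelAliasImport(source, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because the check looks for the specific namespace-import form, an embedded `import { readFileSync } from 'node:fs'` line inside a HEREDOC does not suppress the injection. A line-anchored regex can still be fooled by a HEREDOC that happens to contain the identical alias line, which is why this is only a sketch.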

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Pre‑flight Checks

| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.4.0` |
| Repo root | `git rev-parse --show-toplevel` | prints absolute path |
| Clean working tree | `git status --porcelain` | no output (no uncommitted changes) |
| Docker daemon (if workflow uses containers) | `docker info` | daemon reachable |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Input Parameters (populate before running)

```bash
# ---- USER‑SUPPLIED VALUES ----
FAILED_STEP="my_step_name"          # exact name as shown in the failure log
PREV_RUN_ID="run-2023-11-07-abc123" # ID of the successful run that produced the artifact
WORKFLOW_PATH="./workflows/data‑pipeline.yml"
AUTO_FIX=true                       # keep true for th
```
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
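
A bounded repair and resume loop matching these criteria might be shaped like the sketch below. The `run` and `repairArtifact` callbacks stand in for the real executor and the Workforce workflow persona repair; their signatures, the result shapes, and the default of three attempts are hypothetical.

```typescript
interface RunOutcome {
  ok: boolean;
  runId: string; // id of this individual execution
  failedStep?: string;
}

interface AttemptSummary {
  attempt: number;
  resumedFrom: string | undefined;
  previousRunId: string | undefined;
  ok: boolean;
  note: string;
}

interface LoopResult {
  trackingRunId: string; // one id shared by every attempt
  ok: boolean;
  attempts: AttemptSummary[];
}

type ResumeOptions = { startFromStep?: string; previousRunId?: string };

function runWithAutoFix(
  trackingRunId: string,
  run: (resume: ResumeOptions) => RunOutcome,
  repairArtifact: (failedStep: string) => boolean, // true when the artifact was repaired
  maxAttempts = 3, // assumed budget for illustration
): LoopResult {
  const attempts: AttemptSummary[] = [];
  let resume: ResumeOptions = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run(resume);
    attempts.push({
      attempt,
      resumedFrom: resume.startFromStep,
      previousRunId: resume.previousRunId,
      ok: outcome.ok,
      note: outcome.ok ? "passed" : `failed at ${outcome.failedStep ?? "unknown step"}`,
    });
    if (outcome.ok) return { trackingRunId, ok: true, attempts };
    if (attempt === maxAttempts) break; // budget exhausted
    if (!outcome.failedStep || !repairArtifact(outcome.failedStep)) break; // nothing repairable
    // Resume from the failed step, pointing at the run that just failed.
    resume = { startFromStep: outcome.failedStep, previousRunId: outcome.runId };
  }
  return { trackingRunId, ok: false, attempts };
}
```

The repair callback only ever receives the failed step of the workflow artifact, and the loop exits as soon as the budget is spent or the repair declines, so every path ends with a full attempt summary under the same tracking id.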

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|------------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full execution trace – timestamps, stage names, and any **ERROR**/**WARN** entries. |
| 2️⃣  | `ricky status workflows/foo.ts` | Current state of the workflow (queued, running, failed, blocked). |
| 3️⃣  | `ricky run workflows/foo.ts --debug` | Re‑run with verbose logging; the debug output will include the exact **contract validation** payloads. |
| 4️⃣  | Open the generated **artifact bundle**: `cat .ricky/artifacts/foo.run.json` | JSON contains the input contract, intermediate results, and the failure payload. |
| 5️⃣  | Verify the **workflow contract**: `cat .ricky/contracts/foo.contract.json` | Ensure required fields, types, and any `enum` constraints match the input you supplied. |
| 6️⃣  | Run the contract validator manually: `ricky validate-contract .ricky/contracts/foo.contract.json .ricky/artifacts/foo.run.json` | Should return **PASS** or a list of mismatches. |
| 7️⃣  | Check the **deterministic verification gate** logs: `cat .ricky/gates/foo.gate.log` | Look for a gate that returned `FAIL` – the gate name
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Execution Playbook – Running a Local TypeScript Workflow Artifact**

Below is the concrete, reproducible contract Ricky follows to launch a TypeScript‑based workflow that lives in the *primary local path* (the repository root or the designated `workflows/` directory). Every step is deterministic, gated, and produces verifiable artifacts so that the run can be audited, reproduced, or rolled back automatically.

---

## 1. Preconditions & Scope

| Item | Requirement | Verification |
|------|-------------|--------------|
| **Repository checkout** | `git clone` of the AgentWorkforce repo, on the `main` (or target) branch. | `git status --porcelain` → clean |
| **Node runtime** | Node ≥ 18.x, npm ≥ 9.x installed and on PATH. | `node -v && npm -v` |
| **Primary local path** | `./workflows/<workflow‑name>/` (relative to repo root). | `test -d ./workflows/<workflow‑name>` |
| **TypeScript source** | `index.ts` (or entry defined in `workflow.json`). | `test -f ./workflows/<workflow‑name>/index.ts` |
| **Workflow contract** | `workflow.json` present, containing `name`, `entry`, `runtime`, `inputs` schema. | `jq . ./workflows/<workflow‑name>/workflow.json` succeeds |

If any verific
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Workflow: “Structural‑Failure Escalation – Persistent Failure after Fix Attempt”**  
*(Ricky – AgentWorkforce reliability & coordination)*  

---  

### 1. Scope & Ownership  
| Element | Detail |
|--------|--------|
| **Trigger** | A structural failure (e.g., service outage, data‑pipeline break, CI‑pipeline collapse) has been **re‑opened** after an initial remediation commit has been merged. |
| **Owner** | **Reliability Engineer (RE)** – primary executor. <br>**Escalation Lead (EL)** – secondary reviewer & blocker arbiter. |
| **Affected Assets** | Codebase branch, CI/CD pipelines, infrastructure‑as‑code (IaC) modules, monitoring alerts. |
| **Boundary** | When the failure persists **after three deterministic verification cycles** post‑fix, the workflow enters the **“Boundary‑Reached”** state and must be escalated to the **Incident Response Team (IRT)**. |

---  

### 2. Contract (Inputs / Outputs)

| Contract | Description |
|----------|-------------|
| **Input Artifact** | `failure_report.yaml` – contains: <br>• Failure ID (UUID) <br>• Original root‑cause analysis (RCA) <br>• Fix commit SHA <br>• Verification gate results (pass/fail) <br>• Timestamp of each verification run |
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – [Date Range]**

---

### 1. Executive Summary
| Metric | Value | Trend |
|--------|-------|-------|
| Total Runs Executed | **1 842** | ↗︎ +12 % vs. prior period |
| Successful Runs | **1 679** | ↗︎ +9 % |
| Failed Runs | **163** | ↘︎ ‑15 % |
| Blocked Runs (manual intervention) | **23** | ↔︎ steady |
| Avg. Cycle‑time (per run) | **4 m 27 s** | ↘︎ ‑8 % |
| Avg. Verification Gate Pass‑rate | **96.4 %** | ↗︎ +1.2 % |

> **Health Rating:** **🟢 Green** – > 95 % success, verification gates passing, no critical blockers.

---

### 2. Run‑level Detail (Top 10 % of runs by latency)

| Run‑ID | Workflow | Status | Cycle‑time | Gate Passes / Total | Blocker? | Notes |
|--------|----------|--------|------------|---------------------|----------|-------|
| `wf‑20240528‑0012` | `data‑ingest‑v3` | ✅ Success | **6 m 12 s** | 9/9 | – | Minor retry on step 3 (network hiccup) |
| `wf‑20240528‑0047` | `model‑train‑beta` | ❌ Failed | **3 m 45 s** | 4/7 | Yes | Missing secret in step 5 – auto‑blocked |
| `wf‑20240528‑0099` | `report‑gen‑v2` | ✅ Success | **2 m 58 s** | 5/5 | – | – |
| `wf‑20240528‑0115` | `data‑ingest‑v3` | ✅ Success | **5 m 01 s** | 9/9 | – | – |
| `w
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
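
As a rough illustration of analysis over normalized evidence, the sketch below groups failed runs by workflow and failure class without mutating its input. `WorkflowRunEvidence` is named in the criteria above, but every field shown here, along with the `findRecurringFailures` helper and its minimum count, is a hypothetical placeholder.

```typescript
// Hypothetical fields; the real WorkflowRunEvidence shape is not shown in this report.
interface WorkflowRunEvidence {
  runId: string;
  workflow: string;
  status: "passed" | "failed";
  failureClass?: string; // e.g. "environment-blocker" or "workflow-structure-bug"
}

interface RecurringFailure {
  workflow: string;
  failureClass: string;
  runIds: string[]; // concrete references for any recommendation
}

function findRecurringFailures(
  runs: readonly WorkflowRunEvidence[], // readonly: analysis must not mutate evidence
  minCount = 2,
): RecurringFailure[] {
  const groups = new Map<string, RecurringFailure>();
  for (const run of runs) {
    if (run.status !== "failed") continue;
    const failureClass = run.failureClass ?? "unclassified";
    // Keying by workflow and class keeps environment and logic failures apart.
    const key = `${run.workflow}::${failureClass}`;
    const group = groups.get(key) ?? { workflow: run.workflow, failureClass, runIds: [] };
    group.runIds.push(run.runId);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((g) => g.runIds.length >= minCount);
}
```

Each returned group names a specific workflow, a failure class, and the run ids behind it, which is the minimum a digest needs to make a recommendation that is not generic.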

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
A dedicated Slack app view that lets engineers **inspect, step‑through, and mutate** any Ricky‑managed workflow in real time. It is built as a **modal‑based UI** triggered from a slash‑command (`/ricky‑debug`) and follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules).

---

## 2. User Interaction Flow

| Step | Slack UI Element | Action | Result |
|------|------------------|--------|--------|
| 1 | `/ricky‑debug <workflow‑id>` (slash command) | Invokes the Debugger modal. | Modal opens with workflow summary. |
| 2 | **Workflow Summary** (section block) | Shows name, version, last run status, and a *“Load latest run”* button. | Loads the most recent execution graph. |
| 3 | **Execution Graph** (image block generated by GraphViz) | Clickable nodes (hover shows step metadata). | Selecting a node opens **Step Detail** pane. |
| 4 | **Step Detail** (input block) | Shows input, output, logs, and a *“Re‑run from here”* button. | Triggers a deterministic re‑run starting at the selected step. |
| 5 | **Live Controls** (actions block) | *Pause*, *Resume*, *Abort* buttons. | Sends
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **name** | `string` | Fixed value: `"web_submit_spec"` |
| **payload** | `object` | The body the UI sends to the AgentWorkforce engine. |
| **payload.spec** | `object` | Full workflow specification (validated against JSON Schema). |
| **payload.mode** | `enum` | One of `"draft"`, `"activate"`, or `"test"`; determines how the engine treats the spec. |
| **payload.metadata** *(optional)* | `object` | Caller-provided free-form data (e.g., `requestId`, `userId`). |
| **response** | `object` | Engine reply (see below). |

---

## 1. Contract Definition (JSON Schema)

```json
{
  "$id": "urn:awf:handoff:web_submit_spec",
  "type": "object",
  "required": ["name", "payload"],
  "properties": {
    "name": { "const": "web_submit_spec" },
    "payload": {
      "type": "object",
      "required": ["spec", "mode"],
      "properties": {
        "spec": { "$ref": "urn:awf:workflow:spec" },
        "mode": { "enum": ["draft", "activate", "test"] },
        "metadata": { "type": "object", "additionalProperties": true }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
```
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
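One way to read this rubric, as a rough sketch: the web surface validates the session, then a normalizer translates the handoff into one of the two shared request types. Only the names `LocalInvocationRequest` and `CloudGenerateRequest` come from the rubric; their fields, the `WebSubmitSpecHandoff` shape, the `"local" | "cloud"` meaning of `mode`, and both function names are assumptions made for illustration.

```typescript
// Assumed handoff shape; the real contract may differ.
interface WebSubmitSpecHandoff {
  kind: "web_submit_spec";
  spec: string;
  mode: "local" | "cloud"; // assumed meaning of "mode"
}

// Field layouts are hypothetical; only the type names appear in the rubric.
interface LocalInvocationRequest {
  target: "local";
  spec: string;
}
interface CloudGenerateRequest {
  target: "cloud";
  spec: string;
}
type DomainRequest = LocalInvocationRequest | CloudGenerateRequest;

// The only translation boundary: nothing web-specific survives past it.
function normalizeWebHandoff(handoff: WebSubmitSpecHandoff): DomainRequest {
  return handoff.mode === "local"
    ? { target: "local", spec: handoff.spec }
    : { target: "cloud", spec: handoff.spec };
}

// Web surface: auth/session validation happens before normalization,
// and the handler never calls an executor directly.
function handleWebSubmit(
  handoff: WebSubmitSpecHandoff,
  sessionToken: string,
  isSessionValid: (token: string) => boolean,
): DomainRequest {
  if (!isSessionValid(sessionToken)) {
    throw new Error("invalid session");
  }
  return normalizeWebHandoff(handoff);
}
```

Because the returned value carries no trace of its origin, downstream routing and execution can stay identical to the CLI path.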

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision:** Generate a new workflow (or a revised version) → **Action:** Create a concrete workflow contract, include deterministic verification gates, and set up the initial review artifacts.

**Rationale**

| Factor | Observation | I

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

@khaliqgant khaliqgant merged commit 752ae41 into main May 14, 2026
2 of 3 checks passed
@khaliqgant khaliqgant deleted the codex/ricky-squad-workflow-default-v2 branch May 14, 2026 18:48