
Detach Ricky background CLI runs #104

Merged
khaliqgant merged 2 commits into main from codex/background-run-detach
May 14, 2026

Conversation

@khaliqgant
Member

Summary

  • make power-user local --run --background commands persist a run id and spawn a detached child process
  • reuse that run id in the child monitor so status, logs, evidence, and fixes land in the normal local run state paths
  • add CLI coverage for parent detachment, child run-id reuse, and spinner suppression on detached background runs

Validation

  • npm test -- src/surfaces/cli/commands/cli-main.test.ts src/surfaces/cli/flows/local-run-monitor.test.ts
  • npm run typecheck

@coderabbitai

coderabbitai Bot commented May 14, 2026


Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 29483c36-5e1b-4a6a-a6a8-73bb3a7ae749

📥 Commits

Reviewing files that changed from the base of the PR and between d1ea921 and bd3897c.

📒 Files selected for processing (2)
  • src/surfaces/cli/commands/cli-main.test.ts
  • src/surfaces/cli/commands/cli-main.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/surfaces/cli/commands/cli-main.test.ts

📝 Walkthrough

Adds detached background run support: parent spawns a detached CLI child with an env-based run-id handoff and exits; child detects the env, reattaches by starting the local run monitor in foreground; local monitor state initialization now persists and returns a running state.
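
The parent-side handoff described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes Node's `child_process.spawn` with `detached: true` followed by `unref()`, and the `launchDetachedBackgroundRun` signature, the `cliEntry` parameter, the `DetachedChild` interface, the `stdio: "ignore"` choice, and the exact argument normalization are all hypothetical. Only the env var name, the `DetachedProcessSpawner` and `backgroundChildArgs` names, and the forced `--foreground` flag come from this thread.

```typescript
import { spawn } from "node:child_process";
import type { SpawnOptions } from "node:child_process";

// Env var name as quoted in the PR discussion.
const DETACHED_BACKGROUND_RUN_ID_ENV = "RICKY_DETACHED_BACKGROUND_RUN_ID";

// Hypothetical minimal view of a spawned child: the parent only needs unref().
interface DetachedChild {
  unref(): void;
}

// Hypothetical signature; the real DetachedProcessSpawner type may differ.
type DetachedProcessSpawner = (
  command: string,
  args: string[],
  options: SpawnOptions,
) => DetachedChild;

// Assumed normalization: drop --background and force --foreground so the
// child runs the monitor inline instead of spawning again.
function backgroundChildArgs(argv: string[]): string[] {
  const rest = argv.filter(
    (arg) => arg !== "--background" && arg !== "--foreground",
  );
  return [...rest, "--foreground"];
}

function launchDetachedBackgroundRun(
  cliEntry: string,
  argv: string[],
  runId: string,
  spawnDetachedProcess: DetachedProcessSpawner = (command, args, options) =>
    spawn(command, args, options),
): void {
  const child = spawnDetachedProcess(
    process.execPath,
    [cliEntry, ...backgroundChildArgs(argv)],
    {
      detached: true, // let the child outlive the parent
      stdio: "ignore", // assumption: no pipes tying the child to the parent
      env: { ...process.env, [DETACHED_BACKGROUND_RUN_ID_ENV]: runId },
    },
  );
  child.unref(); // do not keep the parent's event loop alive
}
```

Passing the spawner as a parameter mirrors the optional `spawnDetachedProcess` dependency mentioned in the changes table, which is what lets tests assert on child args, spawn options, env injection, and `unref` without starting a real process.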

Changes

Detached Background Process Feature

| Layer | File(s) | Summary |
|-------|---------|---------|
| Detached Background Type Contract & Dependencies | src/surfaces/cli/commands/cli-main.ts | DetachedProcessSpawner type introduced; CliMainDeps extended with optional spawnDetachedProcess; DETACHED_BACKGROUND_RUN_ID_ENV constant and child-process imports added. |
| Local Run Monitor State Initialization | src/surfaces/cli/flows/local-run-monitor.ts | initializeLocalRunMonitorState now persists and returns a running state. startLocalRunMonitor awaits initialization, derives handoff from runId, calls onMonitorStarted, and handles background execution without awaiting while persisting failures. |
| Detached Background Launcher & Dispatch Routing | src/surfaces/cli/commands/cli-main.ts | New helpers determine launch vs reattach, initialize background monitor state, spawn a detached child with normalized child args (force --foreground) and env run-id, provide a foreground reattachment runner, and render background-run/reattach output. cliMain dispatch short-circuits to reattach or launch paths. |
| Detached Background Feature Tests | src/surfaces/cli/commands/cli-main.test.ts | Tests updated and added: import type changes; background confirmation argv tweak; tests verify parent spawn behavior (child args, spawn options, env injection, unref), child reattachment persisting state.json with runId and reattachCommand, regression for inherited env with --no-run, and spinner suppression test refactor using temp repos and detached spawn stubs. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I’m a rabbit in the CLI lair,
I nudge a child to run elsewhere,
I plant a token in the air,
It finds its way and takes good care,
Logs and state persist — hooray, software! 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title 'Detach Ricky background CLI runs' directly and specifically describes the main change: enabling detached background CLI runs via spawned child processes with run id persistence. |
| Description check | ✅ Passed | The description clearly relates to the changeset, covering the three main objectives: persisting run ids for background commands, spawning detached child processes, and adding test coverage for the new functionality. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Comment thread src/surfaces/cli/commands/cli-main.ts Outdated
Comment on lines +1867 to +1869
if (cliHandoff && process.env[DETACHED_BACKGROUND_RUN_ID_ENV]) {
return runDetachedBackgroundChild(parsed, cliHandoff, deps);
}
Contributor

🟡 Detached child env-var check bypasses --no-run and mode safety gates, forcing workflow execution

The detached child detection at cli-main.ts:1867 only checks cliHandoff && process.env[DETACHED_BACKGROUND_RUN_ID_ENV] without validating any parsed args. In contrast, the parent-side gate shouldLaunchDetachedBackgroundRun (cli-main.ts:706-714) carefully validates parsed.runRequested === true, parsed.background === true, and parsed.mode !== 'cloud'. The child path applies none of these checks.

When runDetachedBackgroundChild is entered, it calls startLocalRunMonitorwithSafeRunOptions (local-run-monitor.ts:47), which forces stageMode: 'run' regardless of the user's intent. Because the env var is set via spawn() and inherited by all descendant processes, any nested ricky invocation from within the executing workflow (e.g., ricky local --spec 'something' --no-run) would be intercepted: it builds a cliHandoff, sees the inherited env var, enters the child path, and has its --no-run silently overridden to execute the workflow.

Minimal fix: mirror the parent-side safety checks

Adding parsed.foreground to the condition (or, at minimum, requiring that parsed.background is not set) would prevent accidental interception of nested invocations, since only the explicitly spawned child receives --foreground from backgroundChildArgs.

Suggested change
- if (cliHandoff && process.env[DETACHED_BACKGROUND_RUN_ID_ENV]) {
-   return runDetachedBackgroundChild(parsed, cliHandoff, deps);
- }
+ if (cliHandoff && parsed.foreground && process.env[DETACHED_BACKGROUND_RUN_ID_ENV]) {
+   return runDetachedBackgroundChild(parsed, cliHandoff, deps);
+ }

Member Author

Addressed in bd3897c. The detached-child path now requires the explicit spawned-child shape (--run --foreground, non-cloud, not --no-run) instead of only checking the inherited env var, and runDetachedBackgroundChild deletes RICKY_DETACHED_BACKGROUND_RUN_ID before starting the monitor so descendant Ricky invocations do not inherit it. Added regression coverage for an inherited env var plus nested --no-run command, and asserted the env is cleared before local execution.
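
For readers following the thread, the tightened child gate and env cleanup described in this reply might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from bd3897c: `ParsedChildArgs`, `isDetachedBackgroundChild`, and `takeDetachedRunId` are hypothetical names, and `--no-run` is modeled simply as `runRequested === false`.

```typescript
// Env var name as quoted in the reply above.
const DETACHED_BACKGROUND_RUN_ID_ENV = "RICKY_DETACHED_BACKGROUND_RUN_ID";

// Hypothetical subset of the parsed CLI args discussed in this thread.
interface ParsedChildArgs {
  runRequested: boolean; // false when --no-run was passed (assumption)
  foreground: boolean;
  mode: "local" | "cloud";
}

type Env = Record<string, string | undefined>;

// Only the explicitly spawned child shape (--run --foreground, non-cloud)
// may take the reattach path; an inherited env var alone is not enough.
function isDetachedBackgroundChild(parsed: ParsedChildArgs, env: Env): boolean {
  return (
    parsed.runRequested &&
    parsed.foreground &&
    parsed.mode !== "cloud" &&
    Boolean(env[DETACHED_BACKGROUND_RUN_ID_ENV])
  );
}

// Read the run id once, then delete it so descendant invocations
// started by the workflow do not inherit it.
function takeDetachedRunId(env: Env): string | undefined {
  const runId = env[DETACHED_BACKGROUND_RUN_ID_ENV];
  delete env[DETACHED_BACKGROUND_RUN_ID_ENV];
  return runId;
}
```

With a gate of this shape, a nested `ricky local --spec '...' --no-run` that inherits the variable fails the `runRequested` and `foreground` checks and falls through to the normal dispatch path.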

@github-actions
Contributor

github-actions Bot commented May 14, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-14T08-52-03-653Z-HEAD-provider
Mode: provider
Git SHA: c226a30

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic workflow for delivering the requested change. It follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes all required artifacts, verification gates, and review checkpoints.

---

## 1. Scope & Objectives
| Item | Description |
|------|-------------|
| **Target** | `README.md`, `docs/architecture.md`, and the `ricky/__init__.py` module. |
| **Change** | Add explicit language that Ricky *leverages* the **Agent Assistant** for: <br>• Prompt orchestration <br>• Context‑aware validation <br>• Adaptive branching decisions. |
| **Boundaries** | No functional code changes beyond updating the `AGENT_ASSISTANT_INTEGRATION` constant and docstrings. All other modules remain untouched. |
| **Blockers** | None anticipated. If the `agent_assistant` package version is < 2.0, raise a blocker (see verification gate 3). |

---

## 2. Workflow Contract (YAML)

```yaml
name: ricky-agent-assistant-update
description: Update docs & code to surface deeper Agent Assistant integration.
owner: ricky-team
branch: update/agent-
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(Compiled 14 May 2026 – based on the latest CI/CD dashboards, repository analytics, and stakeholder feedback)*  

---

## 1. Adoption Snapshot (last 90 days)

| Metric | Current Value | Target | Status |
|--------|---------------|--------|--------|
| **Agents with `turn-context` enabled** | 42 / 58 (72 %) | 100 % | ⚠️ |
| **Average turn‑latency impact** | +12 ms (±3 ms) per turn | ≤ +5 ms | ❌ |
| **Context‑drift incidents** | 3 incidents (all resolved) | 0 | ⚠️ |
| **PRs touching `turn-context`** | 27 PRs (13 merged, 14 open) | ≤ 5 open at any time | ⚠️ |
| **Documentation coverage** | 68 % of agents have updated README sections | 100 % | ⚠️ |
| **Automated test coverage** | 84 % of `turn-context` code paths | 90 % | ⚠️ |
| **Stakeholder satisfaction (survey)** | 4.1 / 5 | ≥ 4.5 | ⚠️ |

*Overall health: **Yellow** – functional but adoption gaps and performance regressions are evident.*

---

## 2. Concrete Findings

| Area | Observation | Impact | Evidence |
|------|-------------|--------|----------|
| **Enablement Gap** | 16 agents still run without the context wrapper. | Missed consistency, harder to audit state. | `agent
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged CLI‑UX with blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow definition). Centralising generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on a deterministic contract. |
| **Workflow Validation** | Validation gates (schema, type‑checking, deterministic‑step‑ordering) are the first line of defense before a workflow is persisted. Agent Assistant is the natural gatekeeper. | Early failure detection reduces downstream “broken‑pipeline” noise and keeps the 80‑100 validation loop tight. |
| **Workflow Debugging** | Debugging utilities (step‑trace, state‑snapshot, deterministic replay) require intimate knowledge of the workflow
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption – “Ricky v2.0 Slice”**  
*Target window: July 1 – Oct 31 2026*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package that lets **any Team** onboard a **Ricky‑powered AgentWorkforce (AW) workflow** in ≤ 2 working days, with built‑in reliability gates and measurable success criteria.

| Success Metric | Target |
|----------------|--------|
| Time‑to‑first‑run (TTFR) | ≤ 48 h after request |
| First‑run success rate | ≥ 95 % (no fatal errors) |
| Post‑run SLA compliance | 99.5 % of verification gates pass |
| Adoption NPS (survey after 2 weeks) | ≥ +7 |

---

## 2. Scope & Boundaries (Branch `slice/ricky‑v2‑adopt`)  

| Included | Excluded |
|----------|----------|
| **Core onboarding workflow** (request → repo fork → CI‑validated PR → merge → runtime deployment) | Custom UI extensions beyond the provided “Ricky‑Console” |
| **Deterministic verification gates** (syntax, contract, test‑suite, performance) | Legacy AW pipelines that are not migrated to the new CI template |
| **Documentation & training assets** (quick‑start guide, video, FAQ) | On‑premise self‑hosted AW clusters (focus on SaaS) |
| **Observability bundle** (Prometh
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Agent Assistant – Future Slack / Web Support Design  

Below is a **contract‑first workflow** that lets Ricky expose its reliability, coordination, and authoring capabilities through **Slack** (chat‑ops) and a **Web UI** (self‑service portal). The design follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).

---

### 1. High‑Level Architecture  

| Component | Responsibility | Tech Stack (suggested) |
|-----------|----------------|------------------------|
| **Ricky Core** | AgentWorkforce orchestration, contract validation, state store | Python 3.11, FastAPI, PostgreSQL, Redis |
| **Agent Assistant Packages (AAP)** | Re‑usable skill modules (e.g., “Ticket‑Lookup”, “Escalation‑Router”) | Poetry packages, type‑annotated, unit‑tested |
| **Slack Bot** | Slash‑commands, interactive messages, event listeners | Bolt for Python, Socket Mode, OAuth scopes `commands, chat:write, reactions:read` |
| **Web Portal** | Dashboard, form‑based ticket creation, real‑time status | React 18 + Vite, TailwindCSS, GraphQ
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag

Step Action Result
1️⃣ Parse arguments – Ricky reads --mode local and the --spec string. The CLI knows you want a local workflow generation based on the textual spec.
2️⃣ Validate the spec – The spec text is passed through the Spec‑Validator gate (deterministic, JSON‑schema‑based). If the spec is syntactically valid, validation passes; otherwise you get a clear error (e.g., “Spec must be a JSON/YAML object with name, steps, …”).
3️⃣ Generate workflow contract – The Workflow‑Builder creates a deterministic contract (YAML) that includes:
name (derived from the spec)
mode: local
steps (checkout, lint, unit‑test, package‑audit, report)
artifacts (logs, test‑reports, package‑sbom)
The contract is written to ./ricky-output/<workflow‑id>.yaml.
4️⃣ Skip execution – Because --run is omitted, the Execution‑Engine gate is not invoked. No containers are started, no checks are performed. You end up with a **static wo
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. It follows the repository conventions (AGENTS.md), the workflow standards, and the shared authoring rules. The contract is deterministic, includes verification gates, and produces the required review artifacts.

---

## 1️⃣ Workflow Contract

| Item | Description |
|------|-------------|
| **Name** | `ricky-onboard-first-run` |
| **Scope** | New‑user local environment setup (OS ≥ Ubuntu 20.04, macOS 12+, Windows 10+). |
| **Inputs** | None (interactive). |
| **Outputs** | • `~/.ricky/config.yaml` (validated config) <br>• `~/.ricky/credentials.json` (encrypted) <br>• `ricky version` printed <br>• Success flag file `~/.ricky/.onboard_complete` |
| **Pre‑conditions** | • `git` ≥ 2.30, `python` ≥ 3.9, `pip` ≥ 22.0 installed. <br>• Network connectivity to `pypi.org` and `github.com`. |
| **Post‑conditions** | • CLI commands (`ricky status`, `ricky workflow list`) run without error. <br>• No stray temporary files left in `/tmp`. |
| **Verification Gates** | 1. **Dependency Check** – abort if any required binary missing. <br>2. **Config Sc
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the expected path (`/specs/<workflow>.yaml`). | If the file is not found, raise a **Blocker**: <br>`🚨 BLOCKER: Spec file “<workflow>.yaml” is missing.` |
| **1.2** | If the file exists but is empty (0 bytes) or contains only comments/whitespace, raise a **Blocker**: <br>`🚨 BLOCKER: Spec file “<workflow>.yaml” is empty or contains no actionable content.` |
| **1.3** | Log the blocker in the workflow’s **#issues** channel and tag the requestor. | `@requestor Please provide a complete spec or confirm the intended workflow name.` |

---

### 2. Create a Minimal Placeholder (Optional, for traceability)

If the requestor cannot immediately supply a spec, generate a **placeholder** that satisfies the repository’s contract and makes the missing‑spec state explicit.

```yaml
# /specs/<workflow>.yaml  (placeholder)
# -------------------------------------------------
# NOTE: This is a generated placeholder. The real
# workflow definition is pending from the requestor.
# -----
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks succeeded for Google or GitHub)**  

=== AgentWorkforce Status Report ===
Timestamp: 2026‑05‑14T⟨HH:MM:SS⟩Z

🟠 Provider Connectivity
• Google : ❌ NOT CONNECTED (no successful health‑check)
• GitHub : ❌ NOT CONNECTED (no successful health‑check)

🔧 Workflow Engine
• Scheduler : ✅ RUNNING
• Queue : 0 pending jobs
• Workers : 4 active / 4 idle

📦 Repository State
• Branches : main (up‑to‑date)
• PR Queue : empty
• Last Commit : (⟨date⟩)

⚠️ Blockers
• Google OAuth token missing or invalid.
• GitHub personal‑access token missing or invalid.
• Dependent workflows that require Google or GitHub access are paused.

🛠️ Recommended Actions

  1. Verify OAuth credentials for Google and store them in the GOOGLE_TOKEN secret.
  2. Verify GitHub PAT and store it in the GITHUB_TOKEN secret.
  3. Re‑run provider health checks:
    ricky provider check google
    ricky provider check github
  4. After successful checks, re‑enable any paused workflows:
    ricky workflow resume <workflow‑name>

📚 Documentation
• Provider configuration: https
...[truncated]


**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, testing, and merge.  

### 1. Workflow Contract (YAML‑style summary)

```yaml
name: primitive-change
description: One‑file edit → PR → CI → Merge
trigger:
  - manual (Ricky UI) or API call
inputs:
  repo:               # full repo slug “org/project”
  branch: main        # target branch
  file_path:          # relative path to edited file
  new_content:        # full file contents (or diff patch)
  commit_message:     # PR title & body
  reviewers: []       # GitHub usernames (optional)
outputs:
  pr_url: string
  merge_status: enum[merged, closed, blocked]
gates:
  - syntax_check
  - lint
  - unit_test (if applicable)
  - review_approval
  - merge_protection
validation_loops: 80-100   # max attempts per gate before blocker

2. Step‑by‑Step Execution Plan

# Step Owner Artifact Deterministic Gate
1 Initialize – Ricky clones repo at branch. Ricky (automation) clone.log
2 Apply Change – Write new_content to `file_pat
...[truncated]

**Must**

- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.

**Must Not**

- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.tool-selector-honors-spec-hints</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow where the spec says "use Claude to review and Codex to implement".

**Ricky output**

```text
**Workflow Title:** Claude‑Review → Codex‑Implementation Pipeline  

**Purpose**  
Automate a deterministic hand‑off where Claude (LLM‑reviewer) validates design/spec changes and Codex (code‑generator) produces the implementation. The workflow enforces reproducible contracts, verification gates, and explicit review artifacts to guarantee reliability and traceability.

---

## 1. Contract Overview  

| Element | Definition |
|---------|------------|
| **Input Artifact** | `design_spec.md` – Markdown spec located in `specs/` (UTF‑8, ≤ 200 KB). |
| **Reviewer** | Claude (model `claude-3-opus`) invoked via `ricky invoke claude`. |
| **Generator** | Codex (model `codex-12b`) invoked via `ricky invoke codex`. |
| **Output Artifacts** | 1. `review_report.json` (Claude) 2. `implementation/` (generated source files) 3. `ci_report.xml` (test results). |
| **Success Criteria** | • Claude rating ≥ 4/5 on “spec completeness”. <br>• All generated unit tests pass (≥ 90 % coverage). <br>• No blocker flagged in `review_report.json`. |
| **Failure Modes** | • Rating < 4 → abort & raise blocker. <br>• Test failures → auto‑re‑run Codex up to 3 attempts, then raise blocker. |
| **Versioning** | Branch 
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and success criteria. <br>• Enforces the *“who‑does‑what‑when‑how”* template. |
| **Acceptance gates** | • Converts informal “check” statements into explicit **verification predicates** (e.g., `output.fileExists && output.size > 0`). <br>• Inserts **deterministic verification steps** (unit test, schema validation, checksum). <br>• Adds a **gate‑type tag** (`gate: unit-test`, `gate: schema`, `gate: smoke`). |

---

## 1. Basic Invocation  

```bash
agentworkforce generate workflow.yaml --model gpt‑4o-mini > raw_workflow.yaml
agentworkforce refine raw_workflow.yaml --out refined_workflow.yaml
  • raw_workflow.yaml – the initial draft (may contain free‑form prose).
  • refined_workflow.yaml – the output with sharpened tasks and explicit gates.

You can also pipe di
...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
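
One way to read the fallback rule above is as a guard that accepts a refined artifact only when the graph and agent assignments are untouched and validation still passes. The sketch below assumes hypothetical `Artifact` and `WorkflowStep` shapes, and models a timed-out or failed refinement pass as a `null` candidate; it does not implement the timeout or token budget itself.

```typescript
// Hypothetical artifact shape; only `task` and `gate` are treated as editable regions.
interface WorkflowStep {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string;
  gate: string;
}

interface Artifact {
  steps: WorkflowStep[];
}

interface RefineResult {
  artifact: Artifact;
  refined: boolean;
  warnings: string[];
}

// Fingerprint of everything refinement must not change: step ids, agents, edges.
function graphFingerprint(a: Artifact): string {
  return JSON.stringify(a.steps.map((s) => [s.id, s.agent, s.dependsOn.slice().sort()]));
}

function acceptRefinement(
  deterministic: Artifact,
  candidate: Artifact | null, // null models timeout or provider failure
  validate: (a: Artifact) => string[], // returns validation errors, empty when valid
): RefineResult {
  if (candidate === null) {
    return {
      artifact: deterministic,
      refined: false,
      warnings: ["refinement unavailable; returning deterministic artifact unchanged"],
    };
  }
  if (graphFingerprint(candidate) !== graphFingerprint(deterministic)) {
    return {
      artifact: deterministic,
      refined: false,
      warnings: ["refinement changed the graph or agent assignments; falling back"],
    };
  }
  const errors = validate(candidate);
  if (errors.length > 0) {
    return {
      artifact: deterministic,
      refined: false,
      warnings: errors.map((e) => "refined artifact failed validation: " + e),
    };
  }
  return { artifact: candidate, refined: true, warnings: [] };
}
```

Every fallback path carries a warning, so a caller that receives the deterministic artifact unchanged can still surface why.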

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---  

### 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

* Returns the current Ricky version string (e.g., `ricky 0.9.3`).
* Exits with status 0 on success, non‑zero on unexpected failure.
* Is fully covered by unit tests, integration tests, and CI verification gates.
* Is delivered via a scoped branch/PR with all required review artifacts.

---  

### 2. Contract (Interface Definition)

| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Output** | `ricky <semver>` printed to **stdout** (no trailing newline required). |
| **Exit Code** | `0` on success, `>0` on any internal error (e.g., version file missing). |
| **Side‑effects** | None (pure read‑only). |
| **Supported Platforms** | Linux, macOS, Windows (via the same entry point). |
| **Version Source** | Single source of truth: `src/__init__.py` variable `__version__ = "0.9.3"` (or the value from `pyproject.toml` under `[tool.poetry.version]`). |
| **Error Message** | If version cannot be resolved, print `Error: unable to determine Ric
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single feature branch (`batch‑validation‑loop`) and produces a single PR that contains only the generated CI configuration and supporting scripts.

---

## 1. Repository Layout (branch `batch‑validation‑loop`)

/.github/workflows/
    batch_validation.yml       # CI definition (this workflow)
/scripts/
    validate.py                # deterministic validator (returns JSON)
    fix.py                     # idempotent fixer (optional per‑artifact)
    report.py                  # aggregates results, creates PR comment
/config/
    validation_schema.json     # JSON‑Schema for artifact validation
/artifacts/
    *.json / *.csv / *.py …    # target artifacts (any type, independent)


All new files are added under `/scripts/` and `/config/`. No existing production code is modified.

---

## 2. Contract (Inputs / O
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step must declare its **input contract** (type, required fields, validation rules) and **output contract** (shape, optional fields, version). | The workflow begins with a `contract` block for the trigger event (`event.type: “ticket.created”`). Down‑stream steps reference `{{stepX.output}}` only after the contract is satisfied, preventing accidental field leakage. |
| **Deterministic Verification Gates** | After each logical block, a **gate** runs a pure‑function test that either passes or throws a deterministic error. | After the “Enrich Ticket” step, a gate checks `output.priority ∈ {P1,P2,P3,P4}`. If the gate fails, the workflow aborts with a clear `GateFailure: priority‑validation`. This makes the path reproducible and debuggable. |
| **Review Artifacts Generation** | The author must emit a **review‑artifact** markdown file that lists step purpose, contract diff, and test coverage. | The generated workflo
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

---

### 📋 Quick Summary  
To turn a product specification into a reliable, review‑ready workflow we need a **clear, concrete contract** that defines:

1. **Inputs** – data types, sources, and validation rules.  
2. **Processing steps** – deterministic actions, error‑handling, and any external calls.  
3. **Outputs** – artifacts, formats, and downstream hand‑off points.  
4. **Verification gates** – unit‑test suites, integration checks, and manual‑review checkpoints.  
5. **Review artifacts** – design doc, test plan, and PR checklist.  

Because the spec you provided is vague, I’ll ask for the missing details needed to create a scoped, deterministic workflow that meets our **80‑to‑100 validation loop** standard.

---

## 🔎 Information Needed

| Area | What I Need from You | Why It Matters |
|------|----------------------|----------------|
| **Goal & Scope** | A one‑sentence description of the product’s primary function (e.g., “Generate a PDF invoice from an order JSON”). | Defines the end‑to‑end boundary of the workflow. |
| **Inputs** | • Data format (JSON, CSV, etc.)  <br>• Source (API, file upload, DB)  <br>• Required fields & validat
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Detect & Log the anomaly**  
   - The orchestrator reports *Step X* timed‑out (exceeded its SLA).  
   - *Worker Y* is still “alive” (heartbeat present) but has not emitted any artifact or status update.

2. **Mark the step as failed**  
   - Update the workflow state machine: `StepX.status = FAILED_TIMEOUT`.  
   - Record the exact timestamp, configured timeout value, and any partial logs that were captured.

3. **Isolate the stray worker**  
   - Send a *graceful‑shutdown* command to Worker Y.  
   - If the worker does not acknowledge within a short grace period (e.g., 30 s), issue a *hard‑kill* (container stop / process kill).  
   - Record the shutdown outcome in the audit log.

4. **Generate a deterministic verification gate**  
   - **Gate G‑timeout‑artifact**:  
     - Preconditions: `StepX.status == FAILED_TIMEOUT` **AND** `WorkerY.state == TERMINATED`.  
     - Postconditions: No artifact for Step X exists **AND** a *failure ticket* is created.  
   - The gate must pass before any downstream steps are allowed to proceed.

5. **Create the failure ticket (review artifact)**  
   - Title: *Workflow <WF‑ID> – Step X timeout & orphan work
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
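
As a rough illustration of "classify before retry", the sketch below maps step evidence to one of the failure classes named in the rubric and keeps the raw evidence attached. The `StepEvidence` fields, the stderr heuristic, and the retry policy per class are all assumptions made for the example; workflow-structure bugs and validation-strategy mismatches would need richer signals than this sketch inspects, so they fall through to `unknown` here.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence shape for a single step.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  stderr: string;
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  retryable: boolean; // assumed policy, not Ricky's documented behaviour
  note: string;
  rawEvidence: StepEvidence; // preserved untouched for later review
}

function classifyFailure(e: StepEvidence): Classification {
  const base = { stepId: e.stepId, rawEvidence: e };
  if (e.timedOut) {
    return { ...base, failureClass: "timeout", retryable: true, note: "step exceeded its time budget" };
  }
  if (e.workerAlive && e.artifactsProduced === 0) {
    return {
      ...base,
      failureClass: "agent-runtime-opacity",
      retryable: false,
      note: "worker is alive but produced no artifacts; inspect before retrying",
    };
  }
  // Placeholder heuristic for environment problems.
  if (/ENOENT|EACCES|command not found/.test(e.stderr)) {
    return { ...base, failureClass: "environment-blocker", retryable: false, note: "environment must be fixed first" };
  }
  return { ...base, failureClass: "unknown", retryable: false, note: "class not established; keep evidence and escalate" };
}

// The scenario from the user message: one timed-out step, one silent worker.
const scenario = [
  classifyFailure({ stepId: "step-x", timedOut: true, workerAlive: false, artifactsProduced: 0, stderr: "" }),
  classifyFailure({ stepId: "step-y", timedOut: false, workerAlive: true, artifactsProduced: 0, stderr: "" }),
];
```

Keeping `unknown` as an explicit, non-retryable class is what prevents a blind rerun when the evidence is inconclusive.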

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `detect‑stale‑state` |
| **Scope** | Runs **once**, immediately before any local workflow is started (e.g., `ricky run …`). |
| **Inputs** | - Working directory (root of the repo).<br>- Optional config `ricky.yaml` flag `staleCheck.enabled` (default = true). |
| **Outputs** | - `stale: true|false` (boolean).<br>- `stalePaths: []string` (list of stale directories/files).<br>- `report: string` (human‑readable summary). |
| **Side‑effects** | - If `stale == true`, the workflow **fails fast** with a deterministic error code `ERR_STALE_STATE`.<br>- Writes a JSON artifact `./.ricky/artifacts/stale‑state.json`. |
| **Deterministic Verification Gate** | `verify‑stale‑state` (see below). |
| **Failure Mode** | Blocker – any stale state must be resolved before proceeding. |

---

### 1. Detection Logic (Implementation Sketch)

```yaml
# .ricky/workflows/detect-stale-state.yaml
name: detect-stale-state
description: Detect stale .agent‑relay/, .relay/, and .trajectories/ state before a local run.
runs:
  using: composite
  steps:
    - id: scan
    
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
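
A minimal pre-launch check along these lines might look like the sketch below. The three directory names come from the user message; everything else (`StaleStateReport`, `checkRuntimeState`, and the injected `isStale` predicate that decides what counts as stale) is a hypothetical stand-in. The function only reports and never deletes or overwrites anything.

```typescript
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  failureClass: "environment-contamination"; // not a workflow logic failure
  observedPaths: string[];
  recommendation: string;
  safeToLaunch: false;
}

// `isStale` is injected so the staleness criteria (age, dead PID, and so on)
// stay outside this sketch; the source does not specify them.
function checkRuntimeState(
  workspace: string,
  isStale: (path: string) => boolean,
): StaleStateReport | null {
  const observed = RUNTIME_STATE_DIRS
    .map((dir) => workspace + "/" + dir)
    .filter((path) => isStale(path));
  if (observed.length === 0) {
    return null; // nothing stale observed; launch may proceed
  }
  return {
    failureClass: "environment-contamination",
    observedPaths: observed,
    recommendation:
      "quarantine the listed directories or run in an isolated workspace before launching",
    safeToLaunch: false,
  };
}

// Example with a fake predicate: only .relay is considered stale.
const exampleReport = checkRuntimeState("/work/repo", (p) => p === "/work/repo/.relay");
```

A real caller would presumably build `isStale` on filesystem checks; that part is left out because the criteria are not described in this comment.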

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**How to clear the “another Ricky/Relay run is already active in this workspace” marker**

The marker is a lock file that prevents two concurrent executions of the same workspace. It can be left behind if a previous run crashed, was killed, or if you accidentally started a second instance. Follow the deterministic clean‑up contract below.

---

## 1. Verify the current lock state (read‑only)

```bash
# From the root of the workspace
ricky status          # shows active runs, PID, and lock file path
cat .ricky/run.lock   # optional: view raw lock contents
```

You should see something like:

Lock file: .ricky/run.lock
Owner PID: 12345
Started: 2026‑05‑14T09:32:11Z
Command: ricky run …

If the PID does not exist (e.g., ps -p 12345 returns nothing) the lock is stale.


## 2. Deterministic clean‑up gate

| Condition | Action |
|-----------|--------|
| PID is alive and you intentionally started another run | Abort the new run. Wait for the first run to finish or cancel it with ricky cancel <run‑id>. |
| PID is alive but the process is hung or you need to pre‑empt it | Terminate the process (kill -9 12345) only after confirming no other team member
...[truncated]

**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for a workflow that references a missing environment variable**

Below is the concrete, repeatable contract you can apply to any workflow artifact that fails because it tries to read `process.env.MISSING_ENV_VAR` (or any other undefined variable). The fix injects the Ricky environment loader and optional guard, then re‑renders the step bodies that contain inline `node --input-type=module` HEREDOCs.

---

## 1. Preconditions (Verification Gate)

| Check | Expected | Action if not met |
|------|----------|-------------------|
| Artifact is a **master‑rendered** workflow (i.e. a `.js` file that exports a `workflow` object). | ✅ | Abort – not a supported artifact. |
| The artifact contains at least one `process.env.MISSING_ENV_VAR` reference (or any `process.env.*` that is undefined at load time). | ✅ | Continue. |
| The artifact uses **step‑level HEREDOCs** with `node --input-type=module` (e.g. ```.step({ command: `node --input-type=module <<'EOF'\nimport …\nEOF` })```). | ✅ | Continue. |
| A `.env` or `.env.local` file exists at the repository root (or can be created). | ✅/❌ | If missing, create an empty `.env.local` file (git‑ignored). |

If any gate fails, 
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
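
The difference between substring matching and matching a real top-level alias import can be illustrated with a small sketch. This is not the fix that Ricky ships: `maskTemplateLiterals` and `hasTopLevelAliasImport` are hypothetical helpers, and the masking is deliberately naive (it ignores `${...}` nesting and backticks inside ordinary strings or comments). A production version would more likely rely on a real parser.

```typescript
// Blank out template-literal bodies so HEREDOC text inside
// .step({ command: `...` }) cannot be mistaken for a module-level import.
function maskTemplateLiterals(source: string): string {
  let out = "";
  let inTemplate = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (inTemplate && ch === "\\") {
      out += "  "; // mask the escape and the escaped character
      i++;
      continue;
    }
    if (ch === "`") {
      inTemplate = !inTemplate;
      out += ch;
      continue;
    }
    out += inTemplate && ch !== "\n" ? " " : ch;
  }
  return out;
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when `import * as <alias> from '<module>'` starts a line
// outside any template literal.
function hasTopLevelAliasImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]`,
    "m",
  );
  return pattern.test(maskTemplateLiterals(source));
}

// A step whose HEREDOC contains literal import lines (must stay untouched).
const heredocStep = [
  "workflow.step({ command: `node --input-type=module <<'EOF'",
  "import * as rickyWorkflowFs from 'node:fs'",
  "import { join } from 'node:path'",
  "EOF` });",
].join("\n");

const naiveSaysPresent = heredocStep.indexOf("from 'node:fs'") !== -1; // substring check is fooled
const realImportPresent = hasTopLevelAliasImport(heredocStep, "rickyWorkflowFs", "node:fs");
const repaired = "import * as rickyWorkflowFs from 'node:fs';\n" + heredocStep;
```

Because the check works on a masked copy, the original source (including the HEREDOC text) is never rewritten; the repair only prepends the alias import.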

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**  

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, detect the failure, invoke the auto‑fix routine, and re‑run the repaired artifact. All verification gates and review artifacts are included so you can audit each loop.

---

## 1️⃣ Prerequisites (one‑time)

| Item | Check | Command |
|------|-------|---------|
| `ricky` CLI installed (≥ 2.4) | `ricky --version` | `pip install ricky-workflow` |
| Workflow repo cloned & on a clean branch | `git status` shows “nothing to commit” | `git checkout -b wf‑auto‑fix‑run` |
| Docker (or the runtime specified in `workflow.yaml`) | `docker version` | Install Docker CE if missing |
| Access to the previous run ID (e.g., `run‑20240512‑a1b2c3`) | Provided by user | – |
| Failed step name (e.g., `build‑docker‑image`) | Provided by user | – |

---

## 2️⃣ Input Contract (what you must supply)

```json
{
  "workflow_id": "my_project/pipelines/ci.yml",   // path to the workflow definition
  "run_id": "run-20240512-a1b2c3",                // previous successful run (used for cache)
  "failed_step": "build-docker-image",           // ex
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
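
The loop this rubric describes (bounded attempts, a summary per attempt, resume from the failed step with the previous run id, one tracking run id throughout) could be sketched as below. All names are hypothetical, and the `run` and `repairArtifact` dependencies are injected so the sketch stays independent of how Ricky actually launches runs or asks the workflow persona for a repair.

```typescript
interface AttemptResult {
  ok: boolean;
  runId: string; // id of this individual attempt
  failedStep: string | undefined;
}

interface ResumeOptions {
  startFromStep: string | undefined;
  previousRunId: string | undefined;
}

interface AttemptSummary {
  attempt: number;
  ok: boolean;
  failedStep: string | undefined;
  resumedFrom: string | undefined;
  previousRunId: string | undefined;
}

interface AutoFixOutcome {
  trackingRunId: string; // stays constant across repair/resume attempts
  status: "ok" | "blocked";
  attempts: AttemptSummary[];
}

interface AutoFixDeps {
  run(options: ResumeOptions): AttemptResult;
  repairArtifact(failedStep: string): boolean; // true when the artifact was repaired
}

function runWithAutoFix(trackingRunId: string, maxAttempts: number, deps: AutoFixDeps): AutoFixOutcome {
  const attempts: AttemptSummary[] = [];
  let resume: ResumeOptions = { startFromStep: undefined, previousRunId: undefined };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = deps.run(resume);
    attempts.push({
      attempt,
      ok: result.ok,
      failedStep: result.failedStep,
      resumedFrom: resume.startFromStep,
      previousRunId: resume.previousRunId,
    });
    if (result.ok) {
      return { trackingRunId, status: "ok", attempts };
    }
    // Stop when the budget is spent, the failed step is unknown, or repair is not possible.
    if (attempt === maxAttempts || result.failedStep === undefined) break;
    if (!deps.repairArtifact(result.failedStep)) break;
    resume = { startFromStep: result.failedStep, previousRunId: result.runId };
  }
  return { trackingRunId, status: "blocked", attempts };
}
```

Under these assumptions, calling the function with `maxAttempts` set to 1 gives a single attempt with no repair, which is one way to picture the one-attempt behaviour expected when auto-fix is disabled.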

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Failure Checklist**

| Step | Action | Why it matters |
|------|--------|----------------|
| 1️⃣ | **Inspect the CLI output** – look for the *error code* and the *stack trace* printed just before the process exits. | The first line usually tells you which validation gate failed (e.g., `ContractViolation`, `TypeCheck`, `RuntimeAssert`). |
| 2️⃣ | **Open the workflow log** – `ricky logs workflows/foo.ts` (or check `~/.ricky/logs/foo-<timestamp>.json`). | Logs contain the deterministic verification gate results, the exact input payload, and any side‑effect snapshots. |
| 3️⃣ | **Run in debug mode** – `ricky run workflows/foo.ts --debug --no-auto-fix`. | Adds step‑by‑step tracing, prints the intermediate contract payloads, and disables any silent retries. |
| 4️⃣ | **Validate the contract file** – open `workflows/foo.contract.ts` (or `.json`) and run `ricky contract validate workflows/foo.contract.ts`. | Guarantees the contract is syntactically correct and that all required fields are present. |
| 5️⃣ | **Check the branch/PR boundary** – ensure you are on the branch that contains the latest PR for `foo.ts`. If the workflow was edited in a PR that isn’t merged, the runtime
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by convention `./workflows`). The steps are written as a deterministic “execution gate” that can be scripted, reviewed, and automatically verified.

---

## 1. Preconditions (Gate A – Environment Validation)

| Check | Command | Expected Result | Failure Action |
|-------|---------|----------------|----------------|
| Node version ≥ 18 | `node -v` | `v18.x` or higher | Abort – report “Node version too low”. |
| npm ≥ 9 | `npm -v` | `9.x` or higher | Abort – report “npm version too low”. |
| `ts-node` installed globally or locally | `npx ts-node -v` | `ts-node vX.Y.Z` | If missing, run **Gate B** to install. |
| Workspace clean (no uncommitted changes) | `git status --porcelain` | empty output | Abort – ask user to stash/commit. |
| `package.json` present in `./workflows` | `test -f ./workflows/package.json` | file exists | Abort – ask user to create a minimal `package.json`. |

If **all** checks pass, proceed to Gate B. If any check fails, 
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Workflow: “Structural‑Failure Escalation – Persistent Failure after Fix Attempt”**  
*(Ricky – AgentWorkforce reliability & coordination)*  

---

## 1. Scope & Ownership
| Element | Detail |
|---|---|
| **Trigger** | A structural failure (e.g., service outage, data‑pipeline break, infra‑component collapse) has been **re‑opened** after an initial remediation was merged. |
| **Owner** | **Reliability Engineer (RE)** – primary executor. <br>**Escalation Lead (EL)** – reviewer & gatekeeper. |
| **Stakeholders** | SRE team, Product Owner, Incident Commander, QA, Architecture Review Board (ARB). |
| **Branch** | `fix/<service>-<issue-id>` → `escalate/persistent‑<issue-id>` (protected). |
| **PR Target** | `main` (or the service‑specific release branch). |

---

## 2. Contract (Inputs / Outputs)

| Contract Item | Description |
|---|---|
| **Input Artifact** | `failure_report_<id>.json` – includes: <br>• Original failure ID, timestamps, logs, metrics. <br>• Fix commit SHA, diff, and deployment version. <br>• Re‑open reason (e.g., regression test failure, post‑deploy alert). |
| **Output Artifact** | `escalation_bundle_<id>.zip` – contains: <br>• Updated root‑cause analysis (RCA) markdo
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow – Health‑Digest Generator**  
*Purpose:* Aggregate the results of multiple Ricky workflow runs, compute key health metrics, and emit a concise “Health Digest” report (Markdown) for distribution to the Ops team.

---

## 1. Contract Overview  

| Item | Specification |
|------|----------------|
| **Input** | `run‑metadata.jsonl` – newline‑delimited JSON objects, one per workflow run. Each object must contain the fields listed in **Section 2**. |
| **Output** | `health‑digest.md` – Markdown file containing a summary table, trend charts (as embedded Base64 PNGs), and a “Blockers” section. |
| **Success Criteria** | - All required input fields are present for ≥ 95 % of runs.<br>- No validation errors are emitted.<br>- Digest file is written and passes schema validation (see § 4). |
| **Failure Modes** | - Missing required fields → **BLOCKER** (abort, raise issue).<br>- JSON parse error → **BLOCKER** (abort).<br>- No runs in the time window → **WARN** (empty digest). |
| **Idempotency** | The workflow is pure: given the same `run‑metadata.jsonl` it always produces the same `health‑digest.md`. |
| **Deterministic Gates** | 1️⃣ Input validation<br>2️⃣ Aggregation sanity c
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
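
To make the "recurring failure classes" requirement concrete, here is a small sketch that groups normalized evidence by workflow and failure class without mutating the input. The rubric names `WorkflowRunEvidence` but this comment does not show its fields, so the three fields used below are assumptions, as are `Finding` and `recurringFailures`.

```typescript
// Hypothetical subset of a normalized evidence record.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly failureClass: string | null; // null when the run succeeded
  readonly durationMs: number;
}

interface Finding {
  workflow: string;
  failureClass: string;
  occurrences: number;
}

// Returns failure classes that recur per workflow, most frequent first.
// Environment and workflow-logic failures stay in separate buckets because
// the grouping key includes the class.
function recurringFailures(
  runs: readonly WorkflowRunEvidence[],
  minOccurrences: number,
): Finding[] {
  const byKey = new Map<string, Finding>();
  for (let i = 0; i < runs.length; i++) {
    const run = runs[i];
    if (run.failureClass === null) continue;
    const key = JSON.stringify([run.workflow, run.failureClass]);
    const existing = byKey.get(key);
    if (existing) {
      existing.occurrences += 1;
    } else {
      byKey.set(key, { workflow: run.workflow, failureClass: run.failureClass, occurrences: 1 });
    }
  }
  return Array.from(byKey.values())
    .filter((f) => f.occurrences >= minOccurrences)
    .sort((a, b) => b.occurrences - a.occurrences);
}
```

Each finding names a specific workflow and class, which gives a recommendation something concrete to point at instead of generic advice.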

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers and product owners a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. Interaction Model

| Slack Command | Trigger | Payload | Response | Follow‑up |
|---------------|---------|---------|----------|----------|
| `/ricky debug <workflow‑id>` | Slash command in any channel/DM | `{workflowId: string}` | Interactive message with **Workflow Summary Card** (status, last run, owner, version) and **Action Buttons** | `View Run`, `Step`, `Patch`, `Abort` |
| `View Run` | Button | `{runId: string}` | Modal showing **Run Timeline** (nodes, timestamps, inputs/outputs, logs) | Click node to open **Node Detail** |
| `Step` | Button (enabled only on paused runs) | `{runId, nodeId}` | Modal with **Step Controls** (Continue, Skip, Re‑run, 
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**Feature:** Add a new *Web Handoff* type – **`WebSubmitSpec`** – that allows a client‑side form to submit a complete workflow specification together with an execution mode (e.g., `dry‑run`, `live`).  

The change is scoped to the **`workflow‑handoff`** module and will be delivered in a dedicated branch **`feat/web‑handoff‑submit‑spec`** with a single PR.

---

## 1. Contract Definition (API)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `spec` | `object` (JSON‑serializable) | ✅ | Full workflow spec as defined in `WorkflowSpec` (tasks, dependencies, contracts, etc.). |
| `mode` | `enum` {`dry-run`, `live`} | ✅ | Execution mode. `dry-run` validates and sim‑executes without side‑effects; `live` enqueues the workflow for real execution. |
| `metadata` | `object` (optional) | ❌ | Arbitrary key‑value pairs (e.g., `requestId`, `origin`). |
| `authToken` | `string` | ✅ | Bearer token for authorisation (validated by `AuthMiddleware`). |

**Endpoint**: `POST /api/v1/web-handoff/submit-spec`  
**Content‑Type**: `application/json`  
**Response**:  

```json
{
  "status": "accepted|rejected",
  "workflowId": "<uuid>",          // present if accepted
  "
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
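
The boundary this rubric describes can be sketched as two small functions: the web surface validates the session, and a normalizer translates the handoff into one of the two request types named above. The type names `LocalInvocationRequest` and `CloudGenerateRequest` come from the rubric; their fields, the `WebHandoff` shape, and the `mode` values are assumptions made only for illustration.

```typescript
// Hypothetical request shapes: the source names these types but not their fields.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web handoff variant carrying a workflow spec and a mode.
interface WebHandoff {
  surface: "web";
  sessionToken: string;
  spec: string;
  mode: "local" | "cloud";
}

// Pure translation: no auth, no execution, and no trace of the originating surface.
function normalizeWebHandoff(handoff: WebHandoff): NormalizedRequest {
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}

type WebSubmitResult =
  | { accepted: true; request: NormalizedRequest }
  | { accepted: false; reason: string };

// Web surface entry point: session validation happens here, before normalization.
function handleWebSubmit(
  handoff: WebHandoff,
  isSessionValid: (token: string) => boolean,
): WebSubmitResult {
  if (!isSessionValid(handoff.sessionToken)) {
    return { accepted: false, reason: "invalid session" };
  }
  return { accepted: true, request: normalizeWebHandoff(handoff) };
}
```

The normalized request deliberately drops `surface` and `sessionToken`, so downstream stages have no way to branch on the request having come from the web.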

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordina


Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

@khaliqgant khaliqgant merged commit 0346450 into main May 14, 2026
2 of 3 checks passed
@khaliqgant khaliqgant deleted the codex/background-run-detach branch May 14, 2026 08:53