Skip to content

bug(pipeline): llm_grader_results/ not pre-created, risks silent zero-score regression #1152

@christso

Description

@christso

Objective

pipeline input writes <test-id>/llm_graders/<name>.json and <test-id>/code_graders/<name>.json, but no parent llm_grader_results/ directory exists until a grader subagent creates one during Phase 2. In contrast, code_grader_results/ is created lazily by its writer — see apps/cli/src/commands/pipeline/grade.ts:296 and apps/cli/src/commands/pipeline/run.ts:363 (both mkdir before writing).

No such pipeline-side writer exists for llm_grader_results/ — the grader subagent is the writer. So the responsibility for ensuring the parent directory exists shifts to the subagent, which may not think to mkdir -p first.

Failure chain when subagent skips mkdir

  1. Grader subagent calls writeFile('<bench-dir>/<test-id>/llm_grader_results/<name>.json', ...) — throws ENOENT.
  2. The throw is captured inside the subagent's own output; never surfaced to the orchestrator as a pipeline error.
  3. The result file never lands on disk.
  4. pipeline bench tries to read llm_grader_results/ at bench.ts:80-111 — the try/catch swallows the missing directory silently.
  5. pass_rate=0 is reported with no grader results merged.

This is the same user-visible symptom #1148 fixed, via a different mechanism. We reproduced step (1) during PR #1151 e2e; the rest of the chain is from reading the code.

Design latitude

  1. Pre-create in pipeline input — add a symmetric mkdir(llmResultsDir, { recursive: true }) next to the existing mkdir(llmGradersDir, ...) at apps/cli/src/commands/pipeline/input.ts:244. Removes the requirement on subagents.
  2. Document the requirement in grader.md Step 9 — explicitly tell the subagent to mkdir -p the parent dir before writing.

Option 1 is more robust (doesn't rely on subagent discipline) and simpler to enforce. Option 2 is lower-risk if there's a reason not to pre-create.

Acceptance signals

  • A fresh pipeline input produces an empty llm_grader_results/ directory next to llm_graders/ for every test with LLM graders (if Option 1), OR grader.md Step 9 has an unambiguous mkdir -p instruction (if Option 2).
  • A grader subagent that writes <bench-dir>/<test-id>/llm_grader_results/<name>.json without manually creating the parent dir does not produce a silent zero-score.
  • Regression test (if Option 1): an assertion in apps/cli/test/commands/eval/pipeline/input.test.ts that llm_grader_results/ exists after pipeline input on a fixture with at least one llm-grader assertion.

Non-goals

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions