Objective
pipeline input writes <test-id>/llm_graders/<name>.json and <test-id>/code_graders/<name>.json, but no parent llm_grader_results/ directory exists until a grader subagent creates one during Phase 2. In contrast, code_grader_results/ is created lazily by its writer — see apps/cli/src/commands/pipeline/grade.ts:296 and apps/cli/src/commands/pipeline/run.ts:363 (both mkdir before writing).
No such pipeline-side writer exists for llm_grader_results/ — the grader subagent is the writer. So the responsibility for ensuring the parent directory exists shifts to the subagent, which may not think to mkdir -p first.
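The "created lazily by its writer" pattern can be sketched as follows. This is a minimal illustration, not the code at grade.ts:296 or run.ts:363: the helper name writeResultFile and the temp-directory fixture are hypothetical, and the synchronous node:fs calls are used only to keep the sketch short.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper: the writer creates its own parent directory
// immediately before writing, so no earlier phase has to pre-create it.
function writeResultFile(filePath: string, payload: unknown): void {
  // recursive: true creates missing ancestors and does not fail
  // if the directory already exists.
  mkdirSync(dirname(filePath), { recursive: true });
  writeFileSync(filePath, JSON.stringify(payload, null, 2));
}

// Usage against a throwaway bench directory (the layout is assumed
// from the paths quoted above).
const benchDir = mkdtempSync(join(tmpdir(), "bench-"));
const resultPath = join(benchDir, "test-1", "code_grader_results", "example.json");
writeResultFile(resultPath, { score: 1 });
```

A grader subagent has no such helper in its path, so nothing guarantees the equivalent mkdir happens before its write.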
Failure chain when subagent skips mkdir
1. Grader subagent calls writeFile('<bench-dir>/<test-id>/llm_grader_results/<name>.json', ...), which throws ENOENT.
2. The throw is captured inside the subagent's own output and is never surfaced to the orchestrator as a pipeline error.
3. The result file never lands on disk.
4. pipeline bench tries to read llm_grader_results/ at bench.ts:80-111; the try/catch silently swallows the missing directory.
5. pass_rate=0 is reported with no grader results merged.
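The first and fourth steps of this chain can be reproduced in isolation with a short sketch. It does not reproduce bench.ts: readResultNames is a hypothetical stand-in for a reader whose try/catch treats a missing directory as "no results", and the file name example.json is made up.

```typescript
import { mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const testDir = mkdtempSync(join(tmpdir(), "bench-"));
const resultsDir = join(testDir, "llm_grader_results"); // never created

// Write into a directory that does not exist. Node reports this
// as an error whose code is ENOENT.
let writeErrorCode: string | undefined;
try {
  writeFileSync(join(resultsDir, "example.json"), "{}");
} catch (err) {
  writeErrorCode = (err as NodeJS.ErrnoException).code;
}

// Hypothetical reader: any failure to list the directory is swallowed
// and reported as an empty result set.
function readResultNames(dir: string): string[] {
  try {
    return readdirSync(dir).filter((name) => name.endsWith(".json"));
  } catch {
    return [];
  }
}

const mergedResults = readResultNames(resultsDir);
console.log(writeErrorCode, mergedResults.length);
```

Because the reader cannot distinguish "directory missing" from "no graders ran", nothing downstream can flag the failed write.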
This is the same user-visible symptom #1148 fixed, via a different mechanism. We reproduced step (1) during PR #1151 e2e; the rest of the chain is from reading the code.
Design latitude
1. Pre-create in pipeline input: add a symmetric mkdir(llmResultsDir, { recursive: true }) next to the existing mkdir(llmGradersDir, ...) at apps/cli/src/commands/pipeline/input.ts:244. This removes the requirement on subagents.
2. Document the requirement in grader.md Step 9: explicitly tell the subagent to mkdir -p the parent dir before writing.
Option 1 is more robust (doesn't rely on subagent discipline) and simpler to enforce. Option 2 is lower-risk if there's a reason not to pre-create.
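A rough sketch of what Option 1 could look like, assuming the two directories are siblings under the test directory. The function name prepareLlmGraderDirs is hypothetical, and the actual code at input.ts:244 may be structured differently (for example, using the promise-based mkdir).

```typescript
import { existsSync, mkdirSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: create the grader config directory and its
// matching results directory in the same place, so they cannot drift.
function prepareLlmGraderDirs(testDir: string): {
  llmGradersDir: string;
  llmResultsDir: string;
} {
  const llmGradersDir = join(testDir, "llm_graders");
  const llmResultsDir = join(testDir, "llm_grader_results");
  mkdirSync(llmGradersDir, { recursive: true });
  // The symmetric call: with recursive: true it is safe to repeat.
  mkdirSync(llmResultsDir, { recursive: true });
  return { llmGradersDir, llmResultsDir };
}

// Usage against a throwaway test directory.
const testDir = join(mkdtempSync(join(tmpdir(), "bench-")), "test-1");
const dirs = prepareLlmGraderDirs(testDir);
```

Since recursive: true tolerates an existing directory, a subagent that still runs mkdir -p itself is unaffected.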
Acceptance signals
- A fresh pipeline input produces an empty llm_grader_results/ directory next to llm_graders/ for every test with LLM graders (if Option 1), OR grader.md Step 9 has an unambiguous mkdir -p instruction (if Option 2).
- A grader subagent that writes <bench-dir>/<test-id>/llm_grader_results/<name>.json without manually creating the parent dir does not produce a silent zero-score.
- Regression test (if Option 1): an assertion in apps/cli/test/commands/eval/pipeline/input.test.ts that llm_grader_results/ exists after pipeline input on a fixture with at least one llm-grader assertion.
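The first signal could be checked with something like the sketch below, which lists test directories that have llm_graders/ but no sibling llm_grader_results/. It assumes the <bench-dir>/<test-id>/ layout quoted above. The function name findTestsMissingResultsDir and the fixture names are hypothetical, and a real regression test would run pipeline input rather than build the fixture by hand.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical check: return the test IDs that have LLM graders
// configured but no pre-created results directory.
function findTestsMissingResultsDir(benchDir: string): string[] {
  return readdirSync(benchDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .filter((testId) => existsSync(join(benchDir, testId, "llm_graders")))
    .filter((testId) => !existsSync(join(benchDir, testId, "llm_grader_results")))
    .sort();
}

// Hand-built fixture: one compliant test, one missing the results
// directory, and one with no LLM graders at all (which is ignored).
const benchDir = mkdtempSync(join(tmpdir(), "bench-"));
mkdirSync(join(benchDir, "test-ok", "llm_graders"), { recursive: true });
mkdirSync(join(benchDir, "test-ok", "llm_grader_results"), { recursive: true });
mkdirSync(join(benchDir, "test-missing", "llm_graders"), { recursive: true });
mkdirSync(join(benchDir, "test-no-llm", "code_graders"), { recursive: true });

const missing = findTestsMissingResultsDir(benchDir);
```

Under Option 1, the expectation after a fresh pipeline input would be an empty list.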
Non-goals
- threshold ambiguity (docs(grader): clarify how threshold in llm_graders config should affect the passed boolean #1153).
Related