feat: independent context-isolated verifier auto-gate (Generator–Verifier split) by ewalliss · Pull Request #4 · pilotspace/ADD

ewalliss · 2026-06-10T12:10:48Z

Why

Under autonomy: auto, the run's "adversarial verify" and "completeness-critic" steps execute in the same agent and the same context that wrote the code (run.md). In-context self-critique is unreliable: a model favours its own output (self-enhancement bias) and repeats its own errors, so the in-context skeptic tends to approve the work uncritically (Zheng et al. 2023, LLM-as-judge; Huang et al. 2023, "Large Language Models Cannot Self-Correct Reasoning Yet"). This quietly reintroduces "one agent finds it plausible", the exact thing ADD sets out to eliminate.

What

Apply the Generator–Verifier (Generator–Critic) pattern for real: the engine enforces the structure and makes the deterministic decision, while the prompts drive the independent verification:

Context-isolated verifier (skill/add/verify-critic.md): the run spawns a fresh subagent per lens (wiring · concurrency-security · contract-conformance). Each sees only the §3 contract, the §4 tests, the diff and CONVENTIONS, never the build transcript, and its job is to try to refute the build.
Structured verdict (_validate_verdict): a verdict is data (lens/verdict/evidence), validated before write — no shallow "looks good" auto-PASS.
Self-consistency consensus (_consensus): any security finding or refutation → HARD-STOP; residue → ESCALATE; auto-PASS only with agreement across ≥3 distinct lenses.
Eval harness (add.py eval): scores the decision logic against labeled fixtures so the gate's own judgment is measurable — recall = no seeded-bad build slips through. TDD's red/green applied to the AI's judgment, not just the code.
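
A minimal sketch of how the three engine-side pieces above could fit together, for readers who want to see the decision rule spelled out. It is not the PR's code: the function names drop the leading underscore, and the verdict values (PASS / REFUTED / SECURITY / UNSURE), the 20-character evidence floor, and the fixture shape are all assumptions made for illustration. Only the lens names, the lens/verdict/evidence fields, and the three outcomes come from the description.

```python
# Hypothetical sketch; names and thresholds are assumptions, not the PR's API.
LENSES = ("wiring", "concurrency-security", "contract-conformance")
OUTCOMES = ("PASS", "REFUTED", "SECURITY", "UNSURE")  # assumed verdict values
MIN_EVIDENCE_CHARS = 20  # assumed floor that rejects a bare "looks good"


def validate_verdict(v: object) -> dict:
    """Reject anything that is not a well-formed lens/verdict/evidence record."""
    if not isinstance(v, dict):
        raise ValueError("verdict must be a mapping")
    if v.get("lens") not in LENSES:
        raise ValueError(f"unknown lens: {v.get('lens')!r}")
    if v.get("verdict") not in OUTCOMES:
        raise ValueError(f"unknown verdict: {v.get('verdict')!r}")
    evidence = v.get("evidence")
    if not isinstance(evidence, str) or len(evidence.strip()) < MIN_EVIDENCE_CHARS:
        raise ValueError("evidence is missing or too shallow")
    return v


def consensus(verdicts: list[dict]) -> str:
    """Deterministic gate decision: PASS, HARD-STOP or ESCALATE."""
    checked = [validate_verdict(v) for v in verdicts]
    # Any refutation or security finding stops the run outright.
    if any(v["verdict"] in ("REFUTED", "SECURITY") for v in checked):
        return "HARD-STOP"
    # Auto-PASS needs every verdict to pass, across at least 3 distinct lenses.
    all_pass = bool(checked) and all(v["verdict"] == "PASS" for v in checked)
    if all_pass and len({v["lens"] for v in checked}) >= 3:
        return "PASS"
    # Everything else (no verdicts, too few lenses, unsure) goes to a human.
    return "ESCALATE"


def recall(fixtures: list[tuple[list[dict], str]]) -> float:
    """Share of seeded-bad fixtures (label "bad") that the gate did not PASS."""
    bad = [verdicts for verdicts, label in fixtures if label == "bad"]
    if not bad:
        return 1.0
    caught = sum(1 for verdicts in bad if consensus(verdicts) != "PASS")
    return caught / len(bad)
```

Under these assumptions an empty or partial set of verdicts can never auto-PASS, so the gate fails closed, and a recall below 1.0 on the labeled fixtures means a seeded-bad build slipped through.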

New CLI: add.py verdict (append-only per-task ledger, fail-closed) · add.py consensus (read-only PASS/HARD-STOP/ESCALATE) · add.py eval. Wired into run.md's auto-gate and phases/6-verify.md. The engine stays LLM-free; cmd_gate is untouched (non-breaking).
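
One way an append-only, fail-closed per-task ledger could be sketched is shown below. The JSON Lines layout, the `<task_id>.jsonl` file name, the `LedgerError` type and both function names are assumptions for illustration; the description only states that the ledger is per-task, append-only and fail-closed.

```python
# Hypothetical sketch; file layout and names are assumptions, not the PR's code.
import json
from pathlib import Path

REQUIRED_FIELDS = ("lens", "verdict", "evidence")


class LedgerError(Exception):
    """Raised when the ledger cannot be trusted; callers must not auto-PASS."""


def append_verdict(ledger_dir: Path, task_id: str, verdict: dict) -> Path:
    """Validate first, then append one JSON line; existing lines are never rewritten."""
    for key in REQUIRED_FIELDS:
        if key not in verdict:
            raise ValueError(f"missing field: {key}")
    ledger_dir.mkdir(parents=True, exist_ok=True)
    path = ledger_dir / f"{task_id}.jsonl"
    with path.open("a", encoding="utf-8") as fh:  # "a" = append-only write
        fh.write(json.dumps(verdict, sort_keys=True) + "\n")
    return path


def read_verdicts(ledger_dir: Path, task_id: str) -> list[dict]:
    """Fail closed: a missing, empty or corrupt ledger raises instead of returning []."""
    path = ledger_dir / f"{task_id}.jsonl"
    if not path.exists():
        raise LedgerError(f"no ledger for task {task_id}")
    verdicts = []
    lines = path.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            raise LedgerError(f"line {number} is not valid JSON") from exc
        if not isinstance(entry, dict):
            raise LedgerError(f"line {number} is not a verdict object")
        verdicts.append(entry)
    if not verdicts:
        raise LedgerError(f"ledger for task {task_id} is empty")
    return verdicts
```

Raising on a damaged ledger, rather than skipping the bad line, keeps a read-only consumer such as a consensus step from deciding on partial data.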

Tests

Built test-first (tooling/test_independent_verify.py, 19 cases). The full suite (721 tests) is green on CI under Python 3.10 and 3.12, and all repo invariants are honored: three byte-identical add.py copies with a re-pinned ENGINE_MD5, skill-tree parity across canonical/_bundled/.claude, a regenerated bundle, the subcommand-coverage census, and the wording rubric.
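
The byte-identical-copies invariant can be pictured with a small check like the one below. This is only a sketch: where the three add.py copies live and where ENGINE_MD5 is stored are not stated here, so the function takes both as parameters, and `check_engine_copies` is a hypothetical name.

```python
# Hypothetical sketch; paths and the pin's location are left to the caller.
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 hex digest of a file's bytes (integrity pin, not a security control)."""
    return hashlib.md5(path.read_bytes(), usedforsecurity=False).hexdigest()


def check_engine_copies(copies: list[Path], pinned_md5: str) -> list[str]:
    """Return a list of problems; an empty list means the invariant holds."""
    problems = []
    digests = {path: md5_of(path) for path in copies}
    if len(set(digests.values())) > 1:
        problems.append("engine copies are not byte-identical")
    for path, digest in digests.items():
        if digest != pinned_md5:
            problems.append(f"{path}: digest {digest} does not match the pin")
    return problems
```

A change to the engine therefore needs both steps at once: copy the file to every location and re-pin the digest, otherwise the check reports a problem.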

ewalliss added 3 commits June 10, 2026 19:12

feat: skill - independent context-isolated verification (verify-critic)

af7f7c1

feat: add.py - verdict/consensus/eval commands for the independent-verifier auto-gate

4811f33

test: independent-verifier - behavioral spec (schema, consensus, ledger, eval)

88295ac

ewalliss force-pushed the feat/independent-verifier branch from d6b34b6 to 88295ac June 10, 2026 12:12

ewalliss wants to merge 3 commits into pilotspace:main from ewalliss:feat/independent-verifier
