feat(method): ground phase · ground-context · verify-integrity (foundation fv25→27)#6
Merged
…to (prose ≡ enforcement)
Task explicit-autonomy-dial closes milestone flag-first-freeze (2/2). Replaces the binary conservative-only high-risk guard with an explicit ordered ladder manual < conservative < auto, declared per task in the TASK.md header, and aligns every surface — engine, skill, book, glossary, templates — so the prose names exactly what the engine enforces.
Engine (add.py, all 3 trees byte-identical @ c0c9329c):
- _AUTONOMY_LEVELS = (manual, conservative, auto) + _AUTONOMY_LINE_RE; _autonomy_level() returns the rung / None (unset) / "?" (unknown token); _autonomy_lowered() = high-risk-safe.
- high-risk guard widened: cmd_gate + audit refuse risk:high without a LOWERED rung (manual OR conservative), not conservative-only (unguarded_high_risk_auto).
- cmd_check: unknown_autonomy_level (red) + implicit_autonomy (live-only WARN).
- cmd_status surfaces the active task's autonomy rung every session.
- TASK seed defaults autonomy: auto. engine_pin re-aimed to c0c9329c.
Docs/skill (synced ×3): GLOSSARY (survivor + template), 11-governance, 10-setup, appendix-c, run.md, streams.md, SKILL.md all name the 3-mode ladder; "dial" kept only as the formerly-bridge marker (vocab-linter exempt).
Tests: test_explicit_autonomy_dial.py DocsAccordTest extended from 1→4 doc surfaces (GLOSSARY + appendix-c + 10-setup + 11-governance), pinning prose ≡ enforcement so it cannot silently regress. 717 green · check 259/0 · audit clean (50).
Verify gate: PASS — human-confirmed (Tin, verify gate, 2026-06-10); risk:high·conservative, human-owned (not auto-resolved).
§7 emits 2 ADD competency deltas (docs-accord must pin every named surface; a word-ban linter misses a stale multi-valued description).
author: Tin Dang
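The ladder and the widened guard described above can be sketched roughly as follows. This is an illustration, not the code in add.py: the function names drop the leading underscore, the guard takes an already-parsed level, and treating an unset (`None`) or unknown (`"?"`) level as "not lowered" is an assumption read off the phrase "refuse risk:high without a LOWERED rung".

```python
from typing import Optional

# Ordered ladder from the commit message: manual < conservative < auto.
AUTONOMY_LEVELS = ("manual", "conservative", "auto")


def autonomy_lowered(level: Optional[str]) -> bool:
    """True only for a recognized rung strictly below 'auto'."""
    if level not in AUTONOMY_LEVELS:
        # Assumption: unset (None) or unknown ("?") is never high-risk-safe.
        return False
    return AUTONOMY_LEVELS.index(level) < AUTONOMY_LEVELS.index("auto")


def high_risk_guard(risk_high: bool, level: Optional[str]) -> Optional[str]:
    """Return a refusal code, or None when the gate may proceed."""
    if risk_high and not autonomy_lowered(level):
        return "unguarded_high_risk_auto"
    return None
```

Under these assumptions a `risk: high` task passes the guard on `manual` or `conservative` and is refused on `auto`, on an unset level, and on an unknown token.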
…ersion 23
Milestone flag-first-freeze (the freeze/autonomy seam) closed at 2/2 and archived.
Fold (human-gated, fold.md) — 4 open deltas → foundation-version 22→23:
- CONVENTIONS +2 bullets: verified-marker-scopes-enforcement-forward (a new guard stamps a marker on the guarded crossing and enforces only on MARKED records, never retro-redding predecessors); prose-accord-pins-every-surface + word-ban-blind-to-stale-enumeration (two faces of necessary-not-sufficient on a "prose ≡ enforcement" deliverable — DocsAccordTest pinned 1 of 4 named surfaces; a word-ban misses a stale "auto | conservative" enumeration).
- CONVENTIONS flip-cite: a lived working LABEL drifts from its canonical glossary TERM (bridge with "formerly …" or migrate, never silent-rename) onto the cross-surface-term bullet.
- PROJECT §Spec: flag-first-freeze SHIPPED bullet.
- PROJECT §Key Decisions: fold row; foundation-version 22→23; updated 2026-06-10.
- 4 deltas flipped open→folded in unflagged-freeze + explicit-autonomy-dial §7.
Close hygiene (disclosed): the MILESTONE.md was a never-authored stub (goal line only) — the milestone ran entirely task-driven. Back-filled at close with real scope + tasks + 3 observable exit criteria mapped to their tasks, checked [x] per the human's close-affirmation + both verify-gate PASSes. milestone-done passed the v20 goal-gate (3/3), wrote RETRO.md; archive-milestone removed it from active state (+ pre-archive backup).
check 249/0 · audit clean (48) · no open deltas · engine unchanged (c0c9329c ×3).
author: Tin Dang
The high-risk and autonomy guards read their tokens with `\b<token>:` and took the FIRST match anywhere in the scanned header region — which includes the freeform H1 title and quoted prose, not just the declaration lines.
Found at the init-auto-default freeze: a task titled "# TASK: Project seeds autonomy: auto by default at init" read as `auto` even though its declaration line said `autonomy: conservative` — the title substring won. Verified by execution: `_autonomy_level -> auto` (should be conservative), `_autonomy_lowered -> False`. The symmetric hazard is worse: a title containing "autonomy: conservative" on a task whose real declaration is `auto` + risk: high would read as LOWERED, so the `unguarded_high_risk_auto` guard would wave it through. `_RISK_HIGH_RE` shared the identical `\b` flaw.
Severity, honestly: a correctness defect in a guard's reader. The title and the declaration are written by the same human in the same file at the same time — no external input, no adversary — so this is a self-inflicted footgun, NOT a security gate event. Fixed, not ceremonialized as a HARD-STOP.
Fix: a token counts only at a DECLARATION position — line-start (optionally indented) OR just after the `·` slug-line separator — never a title/prose substring. The deliberately-supported inline form `… · autonomy: conservative` still reads; a title/prose `autonomy: <x>` / `risk: high` no longer does. The FROZEN grammar (`manual|conservative|auto`, line + inline forms) is UNCHANGED — the reader is made to honor it, not amended. Applied to BOTH readers so the two declaration-token readers share one collision behavior (the same fix protects the forthcoming `_project_autonomy`, which reads PROJECT.md prose).
Tests (new test_autonomy_reader_anchor.py, 9): line ✓ · inline ✓ · title ✗ · prose ✗ · guard-reliability (title cannot fake lowered) · unset-when-title-only; plus risk line/inline/title. Red-first against the defect, green after the fix.
add.py ×3 byte-identical, engine_pin re-aimed c0c9329c -> 6009233a. Full suite green (728) except the unrelated, in-progress init-auto-default red bundle.
author: Tin Dang
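A minimal sketch of the declaration-position anchoring described in the fix, assuming a header passed in as a plain string. The regexes and function names here are illustrative guesses, not the patterns in add.py; only the rule itself (line-start, optionally indented, or right after the `·` separator) comes from the commit message.

```python
import re
from typing import Optional

LEVELS = ("manual", "conservative", "auto")

# Declaration position: start of a line (optional indent) or right after "·".
_DECL = r"(?:^[ \t]*|·[ \t]*)"
_AUTONOMY_RE = re.compile(_DECL + r"autonomy:[ \t]*(\S+)", re.MULTILINE)
_RISK_HIGH_RE = re.compile(_DECL + r"risk:[ \t]*high\b", re.MULTILINE)


def autonomy_level(header: str) -> Optional[str]:
    """Return the declared rung, None when unset, or '?' for an unknown token."""
    match = _AUTONOMY_RE.search(header)
    if match is None:
        return None
    token = match.group(1)
    return token if token in LEVELS else "?"


def risk_high(header: str) -> bool:
    """True only when 'risk: high' sits at a declaration position."""
    return _RISK_HIGH_RE.search(header) is not None
```

With this anchoring, the title from the defect report no longer wins: a header whose H1 mentions `autonomy: auto` but whose declaration line says `autonomy: conservative` reads as `conservative`, and a title-only mention reads as unset.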
…autonomy default)
Open the goal-auto-ready sub-milestone and ship its first task. North-star
(recorded as direction, NOT this milestone's deliverable): challenge the spine —
drive toward fewer human gates. This milestone builds the PREREQUISITE — autonomy
earned by goal-clarity; whether a clarified goal may RELAX the freeze gate is
deferred to its own later milestone.
init-auto-default (task 1 · autonomy: conservative · human-gated verify PASS):
autonomy stops being a constant buried in the TASK.md template and becomes an
EXPLICIT, project-scoped, INHERITABLE posture.
- init declares `autonomy: auto` in PROJECT.md (PROJECT.md.tmpl, ×3 trees).
- new-task INHERITS it via `_project_autonomy(root)` — a PURE read-path mirroring
`_project_goal`, fail-SAFE: declared+recognized -> that rung; no line -> `auto`
(v7: absent = auto); garbled/unrecognized -> `conservative` + a
`garbled_project_autonomy` warning (NEVER silently `auto`). TASK.md.tmpl line 4
`autonomy: auto` -> `autonomy: {{autonomy}}`.
- status surfaces `project autonomy: <level> (default — new tasks inherit)`.
- the load-bearing proof: a NON-auto PROJECT.md default flows into a new task
(test_non_auto_default_inherited) — the declared line is load-bearing, not cosmetic.
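The fail-safe resolution rules in the bullets above can be sketched as below. This is an assumption-laden illustration: the real `_project_autonomy(root)` takes a project root, while this version takes the PROJECT.md text directly, and the regex, the return shape, and the `render_task_header` helper are hypothetical.

```python
import re
from typing import List, Tuple

LEVELS = ("manual", "conservative", "auto")

# A declaration line only: "autonomy: <token>" at line start (optional indent).
_LINE_RE = re.compile(r"^[ \t]*autonomy:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def project_autonomy(project_md: str) -> Tuple[str, List[str]]:
    """Resolve the project default as (level, warnings), failing safe."""
    match = _LINE_RE.search(project_md)
    if match is None:
        return "auto", []  # no line: absent means auto
    token = match.group(1)
    if token in LEVELS:
        return token, []  # declared and recognized
    # Garbled or unrecognized: never silently auto.
    return "conservative", ["garbled_project_autonomy"]


def render_task_header(template: str, project_md: str) -> str:
    """Fill the {{autonomy}} slot of a task template from the project default."""
    level, _warnings = project_autonomy(project_md)
    return template.replace("{{autonomy}}", level)
```

The non-auto case is the one that shows the declared line matters: a PROJECT.md declaring `autonomy: manual` yields a new task header carrying `autonomy: manual`.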
Frozen contract held under pressure: a bonus `project_autonomy` key added to
`status --json` was caught by the frozen-surface guard (test_json_surface_frozen,
`json_surface_unsanctioned_key`) and REVERTED — the frozen test was left intact,
not edited to pass. A JSON extension would need its own ratified change-request.
Tests: test_init_auto_default.py (8) — init-declares · inherit-auto · non-auto-
inherits (load-bearing) · absent->auto · garbled->conservative+warn · status-
surface · helper-resolves · templates-×3. Full suite 734 green.
add.py ×3 byte-identical; engine_pin re-aimed 6009233a -> cad072c1. .add/PROJECT.md
declares `autonomy: auto` (dogfood). Verify gate: human PASS (Tin) at the
conservative gate — auto-PASS disabled for a trust-layer touch.
4 open §7 deltas for the milestone-close fold: (ADD) declaration-token readers
anchor to a declaration position; (SDD) project-level inheritable autonomy default;
(SDD) deferred `init --autonomy` flag; (ADD) the build stays inside the frozen
contract even for "harmless additive" changes.
Builds on 55d64d9 (the reader-anchor defect fix that this task's own title exposed).
author: Tin Dang
Add the auto-ready-goal classifier: a milestone goal is AUTO-READY iff its `## Exit criteria` has >= 1 criterion AND every one cites a verifier `(verify: <test|command|metric>)`, so a self-driving run can check its own result against the goal without human judgement. Autonomy is EARNED by goal-clarity — the central lever for a fully-auto AI loop.
Engine (×3 trees, byte-identical; engine_pin re-aimed):
- _exit_criteria_cited(root, mslug) -> (cited, total): PURE read over MILESTONE.md exit criteria; a NON-EMPTY (verify: …) counts — a bare (verify:) does not (the mid-text substring trap).
- _goal_auto_ready(root, mslug) -> bool: total >= 1 AND cited == total.
- check: WARN goal_not_auto_ready (NEVER red) for the OPEN active milestone when total >= 1 AND cited < total; a zero-criteria milestone stays silent (writing criteria is milestone-shaping's nudge, separable from citing them).
- status: a goal-ready: line surfaced every session.
Live-only (Must #4): the WARN excludes a done-but-not-yet-archived active milestone (status=done stays the active pointer until archive clears it) as well as closed/archived predecessors — found at the verify gate and closed test-first (test_done_active_milestone_not_flagged, red→green).
Docs (prose ≡ enforcement, synced ×3): .add/GLOSSARY.md + GLOSSARY.md.tmpl define "auto-ready goal"; appendix-c-glossary.md + 11-governance.md + skill run.md name the term and the goal→autonomy link.
Surface-only by design: the freeze gate, the per-task autonomy contract, and milestone_goal_unmet are UNCHANGED.
Honest limit: the lint forces a citation slot but cannot prove the citation is honest ((verify: it works) passes) — citation-theater is the recorded irreducible-floor limit; resolving/running the cited verifier is the deferred upgrade.
Tests: new suite test_goal_auto_ready_gate (13); full suite 747 OK.
Verify gate: human PASS (conservative + risk: high; auto-PASS disabled).
Milestone goal-auto-ready closed (2/2 tasks, 3/3 exit criteria met); 7 open deltas flagged for the next foundation fold.
author: Tin Dang
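The classifier rules above can be sketched as a pure function over the MILESTONE.md text. The file layout assumed here (criteria as `- ` bullets under a `## Exit criteria` heading) and the status string `"open"` are guesses for illustration; the real helpers take `(root, mslug)` and read the file themselves.

```python
import re
from typing import Tuple

# A citation counts only when something non-empty follows "verify:".
_VERIFY_RE = re.compile(r"\(verify:\s*([^)]*?)\s*\)")


def exit_criteria_cited(milestone_md: str) -> Tuple[int, int]:
    """Return (cited, total) over the bullets under '## Exit criteria'."""
    cited = total = 0
    in_section = False
    for line in milestone_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            in_section = stripped.lower() == "## exit criteria"
            continue
        if in_section and stripped.startswith("- "):
            total += 1
            match = _VERIFY_RE.search(stripped)
            if match and match.group(1):
                cited += 1
    return cited, total


def goal_auto_ready(milestone_md: str) -> bool:
    cited, total = exit_criteria_cited(milestone_md)
    return total >= 1 and cited == total


def should_warn(status: str, milestone_md: str) -> bool:
    """goal_not_auto_ready: only for an open milestone with uncited criteria."""
    cited, total = exit_criteria_cited(milestone_md)
    return status == "open" and total >= 1 and cited < total
```

Note that the sketch shares the honest limit named in the commit: `(verify: it works)` fills the slot and counts as cited, because the check looks at presence, not truth.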
…sion 24
Milestone goal-auto-ready (autonomy earned by goal-clarity) closed at 2/2 tasks,
3/3 exit criteria; init-auto-default gate PASS (auto), goal-auto-ready-gate gate
PASS (human, conservative + risk:high — auto-PASS disabled).
Fold (human-gated, fold.md) — 7 open deltas → foundation-version 23→24, all 7
confirmed (none rejected):
- CONVENTIONS +3 bullets:
- anchor-declaration-token-reader-to-a-declaration-position — a freeform title
or quoted `token: value` must never read as a declaration; a title faking a
lowered rung can DEFEAT a guard (init-auto-default, fixed @ 55d64d9).
- live-only-guard-keys-on-terminal-STATUS — a done-but-not-yet-archived
milestone stays the active pointer until `archive`, so pointer-membership
alone flags a CLOSED milestone; key on status, never just the pointer.
- lint-forces-a-SLOT-not-honesty — verifier-citation raises the goal-clarity
floor but can't prove the citation is real (citation-theater); the
irreducible floor — the human still owns honesty.
- CONVENTIONS flip-cite onto frozen-guard→fix-build-not-matcher: a bonus
project_autonomy key on `status --json` tripped json_surface_unsanctioned_key
and was reverted (build fixed, frozen test intact).
- PROJECT §Spec: goal-auto-ready SHIPPED bullet (project auto default +
auto-ready-goal check; `init --autonomy` knob + the freeze-gate-relaxation
SPINE decision deferred OPEN).
- PROJECT §Key Decisions: fold row 23→24; the OSError-guard divergence on
_exit_criteria_cited recorded as an ACCEPTED CEILING (mirrors the sibling
_exit_criteria convention; not hardened in isolation).
- 7 deltas flipped open→folded in init-auto-default + goal-auto-ready-gate §7.
747 tests OK; add.py check 0 fail; no open deltas remain.
author: Tin Dang
Collapse the done goal-auto-ready milestone out of active state (2 tasks: init-auto-default, goal-auto-ready-gate). Files on disk untouched; the active state.json drops the milestone + its task records and clears active_milestone. A pre-archive-state.bak.json recovery snapshot is written (design-for-failure: an accidental archive stays recoverable — it captures the full milestone + member task records the archived slug-list summary drops).
author: Tin Dang
… fold → fv25
Add a `ground` phase-0 preamble before specify so a task's contract, tests and build are grounded in the REAL current codebase, not assumption. The seven steps (specify→observe = §1–§7) keep their brand; ground is AI-owned and adds NO new human gate (the one approval stays at the §3 contract freeze).
Three tasks (breadth-first), all done & PASS:
- ground-phase-engine: insert `ground` as PHASES[0] (PHASE_OWNER=ai · guide map · new-task→ground · advance ground→specify) + the `## 0 · GROUND` TASK.md template (×3) + phases/0-ground.md guide (×3) + ~12 downstream test conformances. Engine byte-identical ×3 (pin e6b8c3da).
- ground-bundle-wiring: the contract-freeze "grounded? cite anchors" checklist line + `add.py status`/`check` SURFACE the grounding state — tri-state measure (_grounded_state), human-readable + a never-red WARN on the existing `warnings` array, no new --json key (mirrors goal-auto-ready). Measure, never block.
- ground-prose-align: book (02-the-flow ×4 · appendix-c ×4) + skill phase-table (SKILL.md ×3) + GLOSSARY name `ground` and render it as the §0 preamble to the seven steps, byte-synced across all trees.
Retrospective consolidation (human-confirmed fold → foundation-version 24→25): 12 open deltas → 5 new CONVENTIONS bullets (ground-before-§3 · ordered-constant-index-hazard · additive-surface-byte-invisible · engine-derived-prose-guard · grandfather-retrofit-ceiling) + 1 flip-cite onto four-mirror-trees + 1 §Spec ground-phase ship bullet + the 7-step frozen-line parenthetical. Then archive-milestone + compact (3 task dirs → .add/archive/).
Honest ceiling: shipped with ZERO lived runs STARTING at ground — all 3 tasks were grandfathered at specify (created before ground existed); §0 retrofitted at build so each dogfoods `grounded ✓` live. First lived run is next-milestone.
Full suite 790 OK; dogfood check 0 failed.
author: Tin Dang
… the working folder
The ground phase now gathers more than code. `0-ground.md`'s `## Gather` gains a **Context (working folder)** bullet — docs/textbase · TODOs · config/manifests · data/fixtures (task-delta only) — and the `## 0 · GROUND` template gains one light `Context (working folder):` line between Touches and Honors. The grounding measure is untouched (the `Anchors the contract cites:` line keeps its role; add.py byte-identical to engine_pin), and §0 stays lean (one line, not a per-category block).
ground-context-sources ran the FULL flow as the FIRST lived ground run — a task created AT `ground` (not retrofitted), reaching `grounded ✓` live. Dogfooded the milestone's own technique: a haiku subagent did the broad working-folder sweep (returned the ×3/×3 sync md5s + the guard list) while the main context deepened on the precise guard assertions.
- §3 FROZEN @ v1 (human-approved: the new Context line over folding into Touches)
- test_ground_context.py written RED first (3 feature failures) -> green by the guide+template edits only; full suite 800 -> 810 OK; dogfood check 0 failed
- 0-ground.md ×3 + TASK.md.tmpl ×3 byte-identical; book prose untouched (scoped out)
Opens the ground-context sub-milestone (1 of 2 tasks). Closes the "zero lived runs starting at ground" ceiling folded at fv25.
author: Tin Dang
…heap, deepen task-specifically)
The ground phase now has an opinion on grounding *economics*, not only *completeness*. `0-ground.md` gains a compact gather-METHOD hint (HOW — distinct from task 1's WHAT): a "How — gather efficiently:" line closing `## Gather` (prefer a small-model subagent / fast index / skim for the BROAD sweep, then DEEPEN on what THIS task needs — never lock a shallow first pass), a leading Step 0 in the `## AI prompt`, and the intro reworded "REAL current codebase" -> "REAL current working folder" (closing task-1's §7 coherence follow-up). The hint RECOMMENDS a subagent; the engine spawns nothing (tool-agnostic) — add.py stays byte-identical to engine_pin.
Dogfooded the technique in-flight: a haiku subagent ran the broad working-folder sweep (returned the ×3/×3 sync md5s + the guard list) while the main context deepened on the guard assertions — exactly the sweep-cheap-then-deepen split this task ships.
- §3 FROZEN @ v1 (human-approved: compact "How" line + Step 0 + intro reword)
- test_ground_context.py EXTENDED red-first (GatherMethodHint: 3 RED -> green by the guide edits only); full suite 803 OK; dogfood check 259 passed, 0 failed
- 0-ground.md ×3 byte-identical (md5 ba7147e5); add.py == engine_pin (no engine action); book prose untouched (scoped out)
- verify auto-resolved PASS (autonomy: auto — prose-only, no security/concurrency/architecture residue)
Closes the ground-context sub-milestone (2 of 2 tasks). The milestone goal — ground gathers the full working-folder context efficiently + task-specifically — is met.
author: Tin Dang
Human-gated fold at ground-context close (2/2 tasks, 2/2 exit criteria). The
milestone gave ground a second axis: it gathers not only WHAT (the working-folder
categories) but HOW (sweep the broad pass cheaply, then deepen task-specifically).
Consolidate, not append-9-bullets (lean foundation): 9 open deltas → 4 new
CONVENTIONS bullets + 2 flip-cites onto fv25 (δ6 self-closed within-milestone).
CONVENTIONS.md (+4 bullets):
- (ADD) Ground has two axes — completeness (WHAT) + economics (HOW) [δ7]
- (ADD) A capability can be ADDED as guide-prose recommendation while the
engine stays tool-agnostic — the pin holds across the addition [δ8]
- (ADD) Dogfooding the shipped technique in-flight validates it [δ5 + δ9]
- (TDD) A prose feature is RED-greenable by token-presence guards; triage
the RED split [δ2 + δ3]
CONVENTIONS.md (+2 flip-cites onto fv25 bullets):
- additive-byte-invisible → the TEMPLATE twin held (additive §0 line invisible
to structure/token guards) [δ1]
- grandfather-ceiling → CLOSED: the first lived ground run (a task created AT
`ground`) reached `grounded ✓` live [δ4]
PROJECT.md:
- foundation-version 25 → 26
- §Spec ground-context ship bullet (SHIPPED 2026-06-11)
- §Key Decisions fold row
- δ6 self-closed: task 2 reworded the intro task 1 flagged (no bullet; ledger
annotated)
All 9 deltas flipped open→folded; `add.py deltas` → 0 open; `add.py check`
261/0; full suite 803 OK. Engine e6b8c3da ×3 unchanged (prose/template only).
author: Tin Dang
Closes the ground-context sub-milestone after its fv26 fold. Moves the milestone + 2 task bundles (5 files) to .add/archive/ground-context/ — state removed from active, files preserved for recovery (reverse the moves).
- archive-milestone: ground-context removed from active state (2 tasks)
- compact: milestones/ground-context/ (3 files) + tasks/ground-context-sources/ + tasks/ground-gather-hint/ -> .add/archive/ground-context/
- archived rollup now 18 milestones (58 tasks); no active task
- add.py check 249/0
author: Tin Dang
Verify-integrity milestone, task 1 of 3. The build→verify half of ADD
trusts the green; nothing stopped a build from GAMING that green by
editing the red tests or the frozen §3 contract after the red run. This
adds a mechanical floor: an md5 tripwire that catches test/contract
tampering without ever running a test (tool-agnostic).
Engine (×3 byte-identical — canonical · dogfood · bundled; pin bumped to
a6eed5e0c374694945cf4273d1a2581d):
- SNAPSHOT at the tests→build advance: inside cmd_advance's existing
`if nxt == "build":` block, unconditionally overwrite
state[task]["tripwire"] = {contract_md5, tests:{relpath:md5}}. The test
set is exactly what the resolver returns (reuses _resolved_test_files,
never re-globs); the §3 text is _raw_phase_bodies(...).get(3,""). The
existing flag_verified set in the same block is the co-witness.
- RE-CHECK at the verify gate: inside cmd_gate, before any COMPLETING
outcome, _tamper_guard re-reads + re-hashes. A weakened/deleted test or
an edited frozen §3 → _die HARD-STOP. A tamper is HARD-STOP-class — NOT
launderable through RISK-ACCEPTED (the guard sits before the waiver write).
- TRI-STATE, co-witnessed by flag_verified: present+match → pass ·
present+diverged → HARD-STOP · absent+flag_verified → HARD-STOP
(tripwire_missing, the self-erase bypass) · absent+unverified → skip (legacy).
- FAIL-CLOSED: any md5/read error on a tracked file → treated as diverged,
never a crash (design-for-failure).
- STANDING MONITOR: cmd_check gains a never-red build_tampered WARN for a
diverged non-done task (early signal; the gate is where it bites).
- Refactor: _tests_count/_declared_tests_count delegate to new
_primary_test_files/_declared_test_files/_resolved_test_files — one
resolution source, paths reused by the snapshot, every prior count preserved.
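The snapshot, re-check, tri-state and fail-closed rules above can be sketched in a few functions. This is a simplified illustration under assumptions: the function names, the verdict strings other than `tripwire_missing`, and passing the contract text and root explicitly are all hypothetical; the real engine stores the snapshot under `state[task]["tripwire"]` and calls `_die` on a HARD-STOP.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional


def _md5_text(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def _md5_file(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def snapshot(contract_text: str, test_files: List[Path], root: Path) -> Dict:
    """Taken once at the tests->build advance; always overwrites the old one."""
    return {
        "contract_md5": _md5_text(contract_text),
        "tests": {p.relative_to(root).as_posix(): _md5_file(p) for p in test_files},
    }


def divergence(tripwire: Dict, contract_text: str, root: Path) -> List[str]:
    """List every tracked item whose hash no longer matches the snapshot."""
    diffs = []
    if _md5_text(contract_text) != tripwire.get("contract_md5"):
        diffs.append("contract")
    for rel, want in tripwire.get("tests", {}).items():
        try:
            got = _md5_file(root / rel)
        except OSError:
            got = None  # fail-closed: an unreadable or deleted file counts as diverged
        if got != want:
            diffs.append(rel)
    return diffs


def tamper_verdict(tripwire: Optional[Dict], flag_verified: bool,
                   contract_text: str, root: Path) -> str:
    """Tri-state, co-witnessed by flag_verified."""
    if tripwire is None:
        # Absent snapshot on a verified task is the self-erase bypass.
        return "tripwire_missing" if flag_verified else "skip"
    return "diverged" if divergence(tripwire, contract_text, root) else "pass"
```

No test is ever executed here; the check only compares hashes, which is what keeps the floor tool-agnostic.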
Tests: add-method/tooling/test_tamper_tripwire.py — 12 red→green
(snapshot/overwrite · weakened/deleted/contract-edit block · not-launderable
+ clean-RISK-ACCEPTED placement guard · tri-state clean/legacy/suspicious ·
standing never-red WARN). Full suite 815 OK; dogfood check 254 passed, 0 failed.
Known limit (human-ratified at the verify gate, risk:high/conservative):
the snapshot lives in agent-writable state.json, so a determined agent that
forges both the snapshot AND flag_verified still slips — the mechanical floor
raises bypass cost, it does not seal. The semantic refute-read
(earned-green-rubric) + the bounded ≤3-attempt self-heal (heal-then-escalate)
+ the human gate are the backstops. md5 here is tamper-EVIDENCE not
authentication (consistent with engine_pin's file-identity idiom) — ratified
as NOT a security HARD-STOP.
footer: verify-integrity 1/3 — the mechanical floor only; the judgment-cheat
refute-read and the bounded self-heal loop remain.
author: Tin Dang
…ARNED its green
Task 2 of 3 in milestone verify-integrity. Task 1 shipped the MECHANICAL
floor (an md5 tamper tripwire that catches an edited test or frozen contract).
This task adds the JUDGMENT layer for the cheats the tripwire cannot see — a
build that makes the UNCHANGED red suite pass without earning it:
- src OVERFIT to the test fixtures (special-cased to the literal inputs)
- VACUOUS / tautological asserts (green even against an empty implementation)
- real logic STUBBED away (the function returns a constant the tests accept)
Scored by an INDEPENDENT adversarial refute-read — a reviewer (a subagent under
autonomy:auto is recommended; the engine never spawns one) prompted to argue
"the green was NOT earned". A confirmed earned-green failure is HARD-STOP-class:
never auto-passed, never RISK-ACCEPTED. The verify-gate, whole-suite
specialization of run.md's adversarial verify (single-source pointer).
Prose + template only — add.py stays byte-identical to engine_pin (the engine
stays judgment-free; the resolver, not the engine, judges earned-vs-gamed).
ENFORCEMENT (the auto-gate wiring + the <=3-attempt self-heal loop) is task 3
(heal-then-escalate), named as the explicit KNOWN LIMIT in the frozen contract.
Surfaces (the rubric stated identically across every copy):
- guide ×3 phases/6-verify.md — new "Part four — was the green earned?"
- book ×4 docs/08-step-6-verify.md (incl. the previously-unguarded root)
- TASK.md.tmpl §6 ×3 — one additive earned-green check line
- GLOSSARY.md.tmpl ×3 + the living .add/GLOSSARY.md — two new terms
(earned green · adversarial refute-read)
Guarded by a new prose-TDD suite (test_earned_green_rubric.py, 13 tests):
anchor-presence per surface, _norm whitespace-collapse for "stated identically"
across hard-wrapped copies, mirror-parity md5 (guide ×3 / book ×4 / template ×3
each one hash), a root<->canonical book guard (08 is not a woven chapter), a
scope guard that the task-3 loop machinery has NOT leaked into the task-2 guide,
and test_engine_unchanged (add.py == pin). test_xml_convention gains the new
Part-four heading so the over-tagging guard stays real.
Dogfood: this task crossed tests->build under task 1's LIVE tamper tripwire and
re-checked clean at the verify gate (gate PASS, exit 0, no HARD-STOP) — the
tripwire validated end-to-end on a real task. The gate auto-resolved under
autonomy:auto (normal-risk, complete evidence, no residue); the one human gate
was the §3 contract freeze.
Red -> green: the 13 tests ran red (8 anchor-absent failures) before the prose
build, green after. Full suite 815 -> 828 OK. Dogfood add.py check 259 passed,
0 failed.
author: Tin Dang
…escalate)
A confirmed cheat no longer dies on first sight — it returns the task to BUILD
for an honest redo, MONOTONICALLY up to a cap of 3, then records HARD-STOP to
the human. This closes the milestone's third exit criterion: "a confirmed cheat
self-heals for up to 3 honest re-build attempts before it HARD-STOPs; a gamed
green is never auto-passed."
Engine (×3 byte-identical trees + pin re-aimed a9a91cf7→7b05eaf9):
- HEAL_CAP = 3.
- _heal_or_escalate(root, state, slug, *, reason, source): attempts < cap ->
increment + phase="build" (DIRECT, no re-snapshot) + save_state BEFORE
raise SystemExit(3); attempts >= cap -> gate="HARD-STOP" + _die. The increment
is durable-before-exit so a re-run never grants a free attempt. MONOTONIC —
never auto-resets (an unguarded reset via the open cmd_phase would be a trivial
cap bypass; the advisor blocked reset-on-recross).
- _tamper_guard's `if diffs` branch rewired from immediate _die to the loop
(source "tamper", mechanical/ENFORCED); tripwire_missing stays immediate _die.
- `add.py heal <slug> --reason "..."` — the semantic entry (source "refute-read",
honor-system): requires phase==verify, always exits non-zero (3=redo, 1=escalate).
The engine names the CHANNEL but never spawns the read (tool-agnostic).
Detection is two-layer: the mechanical tripwire (task 1) feeds the loop's
enforced entry; the earned-green refute-read (task 2) feeds its honor-system
entry. A confirmed cheat is HARD-STOP-class — never RISK-ACCEPTED-waived, like
a security finding.
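The bounded loop described above can be sketched as follows. The task-record field names (`heal_attempts`, `last_heal`) and the injected `save_state` callable are hypothetical; the cap of 3, the direct return to `build`, the save-before-exit ordering, and the exit codes (3 = redo, 1 = escalate) come from the commit message.

```python
from typing import Callable, Dict

HEAL_CAP = 3


def heal_or_escalate(task: Dict, save_state: Callable[[Dict], None],
                     *, reason: str, source: str) -> None:
    """Send a confirmed cheat back to build, or HARD-STOP once the cap is spent."""
    attempts = task.get("heal_attempts", 0)  # hypothetical field name
    if attempts < HEAL_CAP:
        task["heal_attempts"] = attempts + 1  # monotonic: never reset here
        task["phase"] = "build"
        task["last_heal"] = {"reason": reason, "source": source}
        save_state(task)  # durable BEFORE exit, so a re-run grants no free attempt
        raise SystemExit(3)  # 3 = redo
    task["gate"] = "HARD-STOP"
    save_state(task)
    raise SystemExit(1)  # 1 = escalate to the human
```

Because the counter is persisted before the process exits and nothing in the function lowers it, re-running the gate after a crash cannot buy an extra attempt.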
Guides + book + glossary (mirrors all one md5):
- run.md gains "The bounded self-heal loop" (home); 6-verify.md + book 08 point
to it; 5-build.md + book 07 carry the honest-redo note; GLOSSARY adds the
"bounded self-heal" term (cap 3 · monotonic · escalation · honor-system).
Tests (839 OK; +11 new, 3 existing EVOLVED — not weakened):
- test_heal_then_escalate.py (NEW, 11): mechanical loop ×6, monotonic ×1,
semantic ×3, loop-documented ×1.
- test_tamper_tripwire.py: _assert_blocked phase=="verify" -> in ("verify",
"build") (a first tamper now returns-to-build); gate=="none" kept STRICT.
- test_earned_green_rubric.py: test_engine_unchanged 1->3 cheat tokens + the
"NOT earned" prompt (allows the frozen "refute-read" label); the absence-check
flipped to a presence/separation guard now the loop landed. Coverage up.
- test_min_pillar.py: added `heal` to the lifecycle census + a non-zero-exit
tolerance (heal is a loop/refusal verb).
Each existing-test edit is an evolution of an OBSOLETE assertion (the behavior
changed), not a weakening to force a pass: the real invariant stays guarded and
coverage holds-or-rises. Dogfooded task 2's own rubric on this build — an
independent adversarial refute-read returned EARNED (zero hard findings; its one
nit, a trivially-true assert, was strengthened before the gate).
Verify gate: human-gated PASS (risk: high · autonomy: conservative). Reviewed by
Tin Dang 2026-06-11 — the engine's first mechanical self-heal loop + a pin bump
human-owned, like task 1.
milestone: verify-integrity (3/3 tasks done · goal-ready)
author: Tin Dang
Milestone verify-integrity done (3/3 tasks, 3/3 exit criteria, each verifier-backed); RETRO.md written. The method's TRUST core gained its first mechanically-enforced HARD-STOP: a two-layer anti-cheat (mechanical tamper-tripwire + adversarial earned-green refute-read) plus a bounded self-heal that returns a confirmed cheat to build ≤3 times then HARD-STOPs. A gamed green is never auto-passed.
Human-gated fold of the 16 open deltas → foundation-version 26→27 (consolidate, not append-16; all 16 confirmed, none rejected):
CONVENTIONS.md — 5 new bullets:
- (ADD) build-integrity = a mechanical floor + a judgment ceiling; the floor on agent-writable state is necessary-not-sufficient, so the refute-read + human gate are the real backstop; a confirmed cheat is HARD-STOP-class.
- (ADD) a mechanical-HARD-STOP guard = snapshot-at-seam → re-check-at-gate → fail-closed; its self-heal cap is real only if it cannot be cleared without a recorded human action (monotonic; the phase verb is unguarded).
- (TDD) an engine change that invalidates an EXISTING assertion makes the test edit an EVOLUTION not a weakening iff the real invariant stays guarded, coverage holds-or-rises, and the reason is documented.
- (ADD) a security-line classification can EMERGE during build — surface it for human ratification AT the gate, never self-grant.
- (SDD) two how-we-author sharpenings: a scope guard against later-stage machinery leaking backward into earlier prose; a path-returning helper so a new feature and an existing counter share ONE resolution source.
CONVENTIONS.md — 3 flip-cite reinforcements (no new bullets):
- dogfood-at-own-gate (first normal task = cheapest E2E; method audits own builds)
- presence necessary-not-sufficient (existence on one surface ≠ agreement across two)
- prose-guide red→green (anchor-presence + mirror-parity, engine byte-pinned)
PROJECT.md — §Spec verify-integrity SHIPPED bullet (carrying the both-gate-paths validation + the live-dogfood re-anchor path); §Key Decisions fv27 row; foundation-version 26→27.
16 deltas flipped open→folded across the 3 task TASK.md files; add.py deltas reports no open deltas; check 265/0; engine 7b05eaf9 ×3 unchanged this step.
milestone: verify-integrity (closed)
author: Tin Dang
…re milestone
Heavy-archive of the done verify-integrity milestone: 6 files moved out of the active tree into .add/archive/verify-integrity/ (the MILESTONE.md + RETRO.md + the 3 task dirs). state.json updated to the archived rollup (19 milestones / 61 tasks); active tree is now clean (no active task). Recovery: reverse the moves; state needs no edit.
No engine or test change — the pin stays 7b05eaf9 ×3; dogfood check 249/0.
milestone: verify-integrity (archived)
author: Tin Dang
…aware docs twin
PR #6 was CI-red on Python 3.10/3.12 though green locally on 3.14 — two test-only portability defects the newer interpreter masked. Engine untouched (pin 7b05eaf9 ×3 unchanged); no behavior change, no weakened assertion.
Family B — argparse intermixing (2 failures): test_heal_then_escalate._gate and test_tamper_tripwire._to_verify_and_gate built `gate <outcome> --owner.. --ticket.. --expires.. <slug>` (slug LAST). argparse <=3.12 cannot bind an optional (nargs="?") positional that follows value-taking flags -> "unrecognized arguments: <slug>" -> the gate never records (gate stays 'none'/exit 2). 3.13+ intermixing fixed it, masking it on 3.14. Aligned both helpers to the house order the rest of the suite uses: `gate <outcome> <slug> [--flags]` (slug right after outcome). The behavior under test (RISK-ACCEPTED records a waiver / a cheat is not launderable) is unchanged — only the incidental argv order.
Family A — gitignored .add/docs twin (5 errors): test_explicit_autonomy_dial.DocsAccordTest and test_goal_auto_ready_gate.DocsAccordTest read the `.add/docs` twin unconditionally. `.add/docs` is gitignored (regenerated by `add.py init`) and ABSENT on a clean CI checkout -> FileNotFoundError. Adopted the present-trees idiom already used by test_foundation_update_loop / test_flow_diagram: assert parity only against twins that exist, requiring the tracked canonical + bundle. The ×3 byte-sync guarantee is preserved (the gitignored mirror is checked locally where it exists, skipped where it can't).
Verified GREEN on the CI interpreters: python3.10 839 OK · python3.13 839 OK (skipped=3) · python3.14 839 OK.
Follow-up (not in this commit): the engine itself rejects flags-before-slug on py<=3.12 for every optional-slug subcommand — a real robustness gap to harden behind a pin bump in a separate task. The natural order (slug after outcome) works on all versions.
author: Tin Dang
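The house argv order from Family B can be illustrated with a small stand-in parser. The parser below is a guess at the shape of the `gate` subcommand (an outcome positional, an optional slug positional, three value-taking flags) and is not the engine's actual CLI definition; `gate_argv` is a hypothetical test helper that always places the slug right after the outcome.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Stand-in for a CLI with an optional-slug 'gate' subcommand (assumed shape)."""
    parser = argparse.ArgumentParser(prog="add.py")
    sub = parser.add_subparsers(dest="cmd", required=True)
    gate = sub.add_parser("gate")
    gate.add_argument("outcome")
    gate.add_argument("slug", nargs="?")  # optional positional
    gate.add_argument("--owner")
    gate.add_argument("--ticket")
    gate.add_argument("--expires")
    return parser


def gate_argv(outcome: str, slug: str, **flags: str) -> List[str]:
    """Build argv in the portable order: gate <outcome> <slug> [--flags]."""
    argv = ["gate", outcome, slug]
    for name, value in flags.items():
        argv += [f"--{name}", value]
    return argv
```

Keeping both positionals adjacent means the parser binds the slug before it sees any flag, so the helper does not depend on the interpreter's handling of positionals that trail value-taking options.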
pilotspacex-byte added a commit that referenced this pull request on Jun 11, 2026
… + worker commits its report (#7)

* chore(method): log engine-argv-portability follow-up task

  Tracks the deferred robustness half found during PR #6 review: the CLI rejects `gate <outcome> --flags <slug>` (flags before the optional slug positional) on Python <=3.12 across every optional-slug subcommand, because argparse cannot bind an optional positional that follows value-taking flags (3.13+ intermixing fixed it). The test helpers were fixed to the natural order in 9d52302; the engine itself is left as-is and tracked here behind a future pin bump. Scaffold carries the repro + fix-direction in §0/§1 so it is actionable.

  author: Tin Dang

* feat(method): wave-protocol-runtime - streams.md merge-time fork-base + worker commits its report

  Amend the parallel-streams rubric (streams.md ×3) to close v19 wave deltas #7 and #8, so the concurrency protocol is satisfiable on a spawn-time-worktree runner and a worker durably persists its own report.

  Amendment A, merge-time fork-base shift: on a runner that creates each worktree AT spawn from a pool, the pre-spawn `rev-parse HEAD` evidence cell is unsatisfiable, so the `unverified_fork_base` check SHIFTS (it never skips) to worker step-0 (sync-to-base + re-echo), verified by the orchestrator at merge-time before merge-back. The pre-spawn rule stays the DEFAULT for fresh-HEAD-worktree runners; the merge-time path is an additive ALTERNATIVE.

  Amendment B, worker commits its report: the worker `<return>` contract now requires COMMITTING SUMMARY.md + deltas.md in the worktree (uncommitted files survive only by harness courtesy), so the serial-integration merge-back carries the worker's verdict.

  Bundle ran §1→§7 under risk:high · autonomy:conservative with red/green TDD: 4 token-presence + ×3-parity guards in test_streams.py (2 new-behaviour tests red before the build, 2 invariant tests green throughout). Full suite 843 OK on py3.10 AND py3.14.

  Tamper tripwire CLEAN (engine `_tripwire_divergence` -> []); §3 and test_streams.py byte-unchanged since the tests->build snapshot, because the build touched only streams.md ×3. engine_pin HOLDS (prose-only, no add.py change).

  Human verify gate: PASS (green EARNED per the refute-read). The one residue is the freeze-approved deferred-enforcement flag: these guards lock the WORDS and the MIRROR, not engine EXECUTION of the shift (the engine can't see a worktree pool). Logged as the engine-merge-base-enforcement follow-up so the disclosed gap is tracked, not forgotten.

  author: Tin Dang

* docs(method): wave-protocol-runtime - land Amendment A in its other §3-named home (ledger evidence-cell)

  PR #7 careful review found Amendment A stated the merge-time fork-base shift in the "Design for failure" bullet but NOT in the ledger "Evidence cells, not ticks" paragraph: a same-file second mention that still read pre-spawn-only, so a reader of the ledger section alone would think a spawn-time-pool-runner spawn is impossible (the opposite of what A enables).

  Add one clause to that paragraph: on a spawn-time pool runner the pre-spawn paste is unsatisfiable, so the fork-base cell holds the worker's step-0 post-sync echo (still == base) and the `unverified_fork_base` refusal shifts to merge-time before merge-back; it shifts, it never lifts.

  WITHIN the frozen §3: the contract named "the evidence-cell `unverified_fork_base` note" as a valid home and the freeze flag said "either spot is conformant". No test weakened, no contract edited, engine_pin HOLDS. streams.md ×3 re-synced byte-identical (md5 82e08b0d); full suite 843 OK on py3.10; dogfood check 256/0. A post-gate honesty note records that this text changed AFTER the PASS as a within-frozen-§3 merge-review refinement.

  author: Tin Dang

---------

Co-authored-by: Tin Dang <tindang.ht97@gmail.com>
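The "×3 byte-identical" guarantee, combined with the present-trees idiom from the earlier portability commit, can be sketched roughly as follows. This is an illustration under stated assumptions, not the suite's actual helper: `check_parity`, `md5_of`, and the file names are hypothetical, and the real tests may compare the trees differently.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hex md5 of a file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def check_parity(required, optional):
    """Assert byte parity across twins; return the paths actually compared.

    `required` twins (tracked files) must exist. `optional` twins, such as
    a gitignored mirror, are compared only when they are present.
    """
    missing = [p for p in required if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"required twin missing: {missing}")
    present = list(required) + [p for p in optional if p.is_file()]
    if len({md5_of(p) for p in present}) != 1:
        raise AssertionError(f"twins diverge: {[p.name for p in present]}")
    return present


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    canonical, bundle = root / "canonical.md", root / "bundle.md"
    mirror = root / "mirror.md"  # stands in for the gitignored twin
    canonical.write_text("same text\n")
    bundle.write_text("same text\n")

    # Clean checkout: the mirror is absent, so it is skipped, not an error.
    compared_clean = check_parity([canonical, bundle], [mirror])

    # Local checkout: the mirror exists, so it joins the comparison.
    mirror.write_text("same text\n")
    compared_local = check_parity([canonical, bundle], [mirror])

    # A drifted mirror is still caught wherever it exists.
    mirror.write_text("drifted\n")
    try:
        check_parity([canonical, bundle], [mirror])
        drift_detected = False
    except AssertionError:
        drift_detected = True
```

The design choice is that absence of an optional twin narrows the comparison rather than failing it, while the tracked twins remain mandatory, so the check cannot silently pass by comparing nothing.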
What this branch ships

Three ADD-method milestones, each closed with a human-gated fold into the versioned foundation (`foundation-version` 24 → 27):

1. `ground-phase`: a phase-0 GROUND preamble (→ fv25)

A new `ground` phase rides in front of the seven steps (`PHASES` is now 9: ground → … → done). It is AI-owned, with no new human gate (the one approval stays at the §3 contract freeze). Each task carries a `## 0 · GROUND` map (real files · symbols · the anchors §3 cites); `status`/`check` surface the grounding state as a never-red measure. Additive to the frozen 7-step flow.

2. `ground-context`: Ground gathers the whole working folder, efficiently (→ fv26)

The §0 gather now spans the working folder, not just code: docs/textbase · TODOs · config/manifests · data/fixtures. `0-ground.md` gained a gather-method hint (sweep the broad pass cheaply with a small-model subagent / fast index / skim, then deepen task-specifically), a recommendation the engine never spawns (tool-agnostic). Closed the fv25 zero-lived-run ceiling.

3. `verify-integrity`: prove the green was EARNED, not gamed (→ fv27)

The method's TRUST core gains its first mechanically-enforced HARD-STOP:

* `tamper-tripwire`: md5(red tests + §3 contract) snapshotted at tests→build, re-checked at the verify gate; any post-red edit blocks an auto-PASS.
* `earned-green-rubric`: overfit · vacuous · stubbed-away, scored by an independent adversarial refute-read (a subagent recommended under `auto`; the engine never spawns it, tool-agnostic).
* `heal-then-escalate`: a confirmed cheat (either layer) returns to build for ≤3 monotonic honest re-builds, then HARD-STOPs to the human. A gamed green is never auto-passed and never RISK-ACCEPTED-waived: it is HARD-STOP-class, like security.

Engine pin bumped to `7b05eaf9` ×3 trees.

Verification

`add.py check` 249/0. `verify-integrity` exit criteria each verifier-backed; every gate human-confirmed (or auto-resolved on complete evidence, per the autonomy ladder).

Foundation

16 verify-integrity competency deltas folded into `CONVENTIONS.md` (5 new conventions + 3 flip-cite reinforcements) and `PROJECT.md` (§Spec ship bullet + §Key Decisions row); `foundation-version: 27`.
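The tamper-tripwire idea described above (snapshot digests at tests→build, re-check at the verify gate) can be sketched as below. This is a minimal illustration and not the engine's `_tripwire_divergence`: the function names, the per-file digest layout, and the sample file contents are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each path to its md5 hex digest (None if the file is missing)."""
    return {
        str(p): hashlib.md5(p.read_bytes()).hexdigest() if p.is_file() else None
        for p in paths
    }


def divergence(baseline):
    """Return the paths whose current digest differs from the baseline."""
    current = snapshot([Path(p) for p in baseline])
    return sorted(p for p, digest in baseline.items() if current[p] != digest)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    red_test = root / "test_feature.py"   # hypothetical red test
    contract = root / "contract.md"       # hypothetical frozen contract
    red_test.write_text("assert add(2, 2) == 4\n")
    contract.write_text("add(a, b) returns a + b\n")

    baseline = snapshot([red_test, contract])  # taken at tests -> build
    clean = divergence(baseline)               # nothing edited yet

    red_test.write_text("assert True\n")       # post-red edit weakens the test
    tampered = divergence(baseline)            # re-check at the verify gate
    auto_pass_allowed = not tampered           # any divergence blocks auto-PASS
```

A per-file map is used here so that a divergence names the file that changed; a single combined digest over all guarded files would detect the same edit but not locate it.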