[devday] Reading order: deterministic, proof-derived "read these first" (co-change wired) #331

Open
rpatricksmith wants to merge 16 commits into main from feature/devday-scan

@rpatricksmith

Collaborator

What this ships

A deep-scan reading order — a deterministic, proof-derived "read these first" ranking written into the ana scan card and the project-context scaffold. It fuses three measured signals:

  1. Import-graph PageRank centrality (weighted toward architectural signal, away from ubiquitous "stopword" files)
  2. Proven rework risk from the proof chain (completed, contract-verified work items — not git churn)
  3. Proof-derived co-change: files that repeatedly changed together across verified work items, with hidden-coupling detection (co-changed but no shared import edge, a relationship the import structure alone can't see)
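
As a rough illustration of the hidden-coupling check, the sketch below classifies a co-changed pair against the import graph using the tri-state described in the commit notes (true with an edge, false when both files are in the graph without one, null when a file is missing). The `ImportGraph` shape, the function names, and the choice to treat an edge in either direction as a relationship are assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical shapes for illustration; the real types in the PR may differ.
interface ImportEdge { from: string; to: string; names: string[] }
interface ImportGraph { nodes: string[]; edges: ImportEdge[] }

// true  -> an import edge connects the two files (either direction: an assumption)
// false -> both files are in the graph and no edge connects them
// null  -> at least one file is missing from the graph, so nothing is claimed
function hasImportRelationship(graph: ImportGraph, a: string, b: string): boolean | null {
  const known = new Set(graph.nodes);
  if (!known.has(a) || !known.has(b)) return null;
  return graph.edges.some(
    (e) => (e.from === a && e.to === b) || (e.from === b && e.to === a),
  );
}

// A co-changed pair is "hidden coupling" only when the graph can positively
// say there is no edge; null (unknown) never counts as hidden coupling.
function isHiddenCoupling(graph: ImportGraph, a: string, b: string): boolean {
  return hasImportRelationship(graph, a, b) === false;
}
```

Keeping null distinct from false is the point of the design: a file the graph never saw should not be reported as decoupled.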

Plus the proof-history risk map, the TS/JS import graph + PageRank, and a non-blocking rescan-on-complete hook.

The headline fix: co-change is now real, not asserted

The branch previously computed co-change (intentCouples) but never threaded it into the fusion, while the scaffold claimed "Fused from import centrality, proven rework risk, and co-change." Two signals fused, three asserted — a by-construction-honesty violation in a product whose whole thesis is honesty.

Co-change is now wired first-class:

  • Honest provenance — "changed together in N verified items" from the proof chain, never a synthetic percentage.
  • Gated to ≥ 2 verified items: the gate filters out one-off co-touches, is what makes this honestly verified co-change, and dissolves the mega-refactor artifact.
  • The scaffold's co-change clause is conditional on the signal actually contributing.
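
A minimal sketch of how the gated count could be derived, assuming each verified work item exposes a slug and the list of files it touched. `WorkItem`, `coChangePairs`, and `describeCoChange` are hypothetical names; only `coTouchCount`, `slugs`, the ≥ 2 gate, and the "changed together in N verified items" wording come from this PR.

```typescript
// Hypothetical input shape: one verified work item and the files it touched.
interface WorkItem { slug: string; files: string[] }
interface CoChange { a: string; b: string; coTouchCount: number; slugs: string[] }

const MIN_VERIFIED_ITEMS = 2; // the gate described above
const comparePaths = (x: string, y: string): number => (x < y ? -1 : x > y ? 1 : 0);

function coChangePairs(items: WorkItem[]): CoChange[] {
  const byPair = new Map<string, CoChange>();
  for (const item of items) {
    // Dedupe and sort so each unordered pair gets exactly one key.
    const files = Array.from(new Set(item.files)).sort(comparePaths);
    files.forEach((a, i) => {
      for (const b of files.slice(i + 1)) {
        const key = `${a}\0${b}`;
        const row: CoChange = byPair.get(key) ?? { a, b, coTouchCount: 0, slugs: [] };
        row.coTouchCount += 1;
        row.slugs.push(item.slug);
        byPair.set(key, row);
      }
    });
  }
  return Array.from(byPair.values())
    .filter((row) => row.coTouchCount >= MIN_VERIFIED_ITEMS) // one-off co-touches drop out
    .sort((x, y) =>
      y.coTouchCount - x.coTouchCount || comparePaths(x.a, y.a) || comparePaths(x.b, y.b));
}

// Provenance is a plain count, never a percentage.
const describeCoChange = (row: CoChange): string =>
  `changed together in ${row.coTouchCount} verified items`;
```

Under this sketch a single large refactor yields many pairs with a count of 1, and every one of them is dropped by the gate.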

Evidence it helps (real before/after on this repo): work.ts rises from #6 to #1 (it's the genuine top hotspot), the pipeline core cluster (work/proof/artifact) surfaces correctly with hidden-coupling flags, and both promotions are legitimate high-churn core files, with no test/fixture noise promoted.

Also in this PR (quality + honesty hardening)

  • NUL-byte corruption fixed — proof-history used a literal 0x00 separator that made the file git-binary (unreviewable); escaped to \0, byte-identical runtime, now plain text.
  • Brittle tests fixturized — three tests pinned exact counts to the live, growing proof_chain.json (broke on this merge: 68→69). Exact counts now lock against a frozen synthetic fixture; live-data tests assert only drift-robust invariants.
  • 750-cap coverage honesty hole closed — the caveat now fires when the graph was built from the truncated sample, so a mid-size repo (~750–2500 files) no longer presents a partial ranking as whole-repo. Polyglot caveat reframed by language (no misleading %).
  • Determinism — co-change top-partner pick breaks ties on path, so the fusion self-determinizes regardless of input order.

Validation

  • Merged current main (post-config [devday-config] Optional configurability — honest Tier-1 core (Tier-2 excised) #330); clean, work.ts composition verified line-by-line (main's verdict gate + this branch's rescan hook are non-interacting).
  • Full suite: 4154 passed | 2 skipped, 0 failures (was 4141 + 2 failed pre-fix). Build, typecheck, typecheck:tests, lint all green.
  • Deep-scan latency measured at ~2.9s on a 6,000-file repo (within the "2–5s" claim; 750-cap bounds it); fusion timing gate added.
  • Adversarial review (3 lenses) reconciled — every real finding fixed and covered by a test.

🤖 Generated with Claude Code

rpatricksmith and others added 15 commits June 13, 2026 02:20
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the written-but-unused namedImport query so extractImports populates
names[] for TS/TSX (fixing the names:[] // Simplified stub). The query was
also malformed (clause-before-source ordering + name: field required) — fixed
in queries.ts for both typescript and tsx.

Add analyzers/graph/buildGraph.ts: a deterministic file->file import digraph.
Resolves relative + tsconfig-path specifiers (with NodeNext .js->.ts rewrite)
to in-repo files; unresolved/external specifiers produce NO edge. Node identity
is repo-relative POSIX (matching the symbol index); SymbolEntry is untouched.
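
The resolution rule above might look roughly like the following for relative specifiers. This is a sketch under stated assumptions: `resolveRelative` and its candidate order are hypothetical, tsconfig-path aliases are left out, and the real buildGraph.ts may resolve differently.

```typescript
// Hypothetical helper: map a relative import specifier to a repo-relative POSIX
// path, or null when it does not land on a known in-repo file (no edge then).
function resolveRelative(
  fromFile: string,
  specifier: string,
  repoFiles: ReadonlySet<string>,
): string | null {
  // Bare/external specifiers ("react", "node:fs") produce no edge.
  if (!specifier.startsWith("./") && !specifier.startsWith("../")) return null;

  const parts = fromFile.split("/").slice(0, -1); // directory of the importing file
  for (const segment of specifier.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would escape the repo root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  const base = parts.join("/");

  const candidates = [
    base,
    base.replace(/\.js$/, ".ts"), // NodeNext: "./x.js" in source refers to "./x.ts"
    base.replace(/\.jsx$/, ".tsx"),
    `${base}.ts`,
    `${base}.tsx`,
    `${base}/index.ts`,
  ];
  for (const candidate of candidates) {
    if (repoFiles.has(candidate)) return candidate;
  }
  return null; // unresolved: no edge
}
```

Returning null instead of a guessed path keeps the graph limited to edges that were actually resolved.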

Persistence to .ana/state/code-graph.json is opt-in via scanProject's new
persistGraphTo option, so ana scan stays read-only (no byte-parity regression).
init passes its staging state dir so the graph travels through the atomic swap
alongside scan.json / symbol-index.json. Deep-tier only, behind the 750 cap.

Tests: named-import wiring (incl. no cross-attribution, default/namespace -> no
names, Python unchanged); graph resolution (relative/index/alias/external/escape,
dedup, self-edge, determinism, fail-soft, .js->.ts rewrite, absolute->relative);
a 750-file deep-tier latency gate. Full suite green (3740 pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fuse import-graph PageRank centrality + Slice-1 proof-chain bug-magnet rate +
co-change into one token-budgeted "read these first" list, personalized toward
the active scope when present.

- analyzers/graph/pagerank.ts: ~40-line deterministic power-iteration PageRank
  (fixed 40 iterations, damping 0.85, uniform dangling-mass redistribution,
  node order from the already-sorted graph) — two runs byte-identical.
- analyzers/reading-order/index.ts: buildReadingOrder() fuses the three signals
  into ranked {file, score, reasons[]}, binary-searches the list to a ~1k-token
  budget, and boosts in-scope files (1.5x). Every reason states its measured
  basis (work items, rework cycles, co-change partner, import centrality) —
  never fabricated. Returns null below a 3-edge threshold.
  resolveImportRelationships() cross-refs co-change rows vs the graph to set
  hasImportRelationship
  (true with edge, false when both nodes in-graph without one, null — never
  false — when a file is missing / low-confidence).
- analyzers/reading-order/scope.ts: findActiveScope() reads the single active
  .ana/plans/active/<slug>/scope.md "Files affected" line (declines when zero
  or many are active — no guessing), rebasing repo-root paths onto node identity.
- scan-engine.ts: build the Slice-2 graph in memory at deep tier (persist still
  opt-in), then populate readingOrder + resolve hasImportRelationship. Surface
  tier and sparse graphs stay null (byte-parity preserved).
- scan.ts: render the "Start here" card; scaffold-generators.ts: inject the
  ~15-line Start Here block into project-context.md.
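
For readers unfamiliar with the technique, the power-iteration PageRank named in the list above can be sketched as follows, using the stated parameters (40 fixed iterations, damping 0.85, uniform dangling-mass redistribution). This is plain PageRank only: it assumes every edge endpoint appears in `nodes`, and it omits whatever weighting the PR applies on top. The function and type names are illustrative.

```typescript
interface Edge { from: string; to: string }

// Deterministic as long as `nodes` and `edges` arrive in a stable order:
// no randomness, no convergence test, a fixed number of iterations.
function pageRank(
  nodes: string[],
  edges: Edge[],
  iterations = 40,
  damping = 0.85,
): Map<string, number> {
  const n = nodes.length;
  let rank = new Map<string, number>();
  if (n === 0) return rank;

  const outDegree = new Map<string, number>();
  for (const node of nodes) {
    rank.set(node, 1 / n);
    outDegree.set(node, 0);
  }
  for (const e of edges) outDegree.set(e.from, (outDegree.get(e.from) ?? 0) + 1);

  for (let i = 0; i < iterations; i++) {
    // Rank held by nodes with no outgoing edges is spread uniformly.
    let danglingMass = 0;
    for (const node of nodes) {
      if ((outDegree.get(node) ?? 0) === 0) danglingMass += rank.get(node) ?? 0;
    }
    const base = (1 - damping) / n + (damping * danglingMass) / n;

    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, base);
    for (const e of edges) {
      const share = (rank.get(e.from) ?? 0) / (outDegree.get(e.from) ?? 1);
      next.set(e.to, (next.get(e.to) ?? 0) + damping * share);
    }
    rank = next;
  }
  return rank;
}
```

A fixed iteration count trades a little precision for reproducibility: two runs over the same sorted graph perform identical floating-point operations in the same order.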

Additive/nullable — existing scan output keeps byte-parity (populates a
currently-null field). Real ana scan surfaces work.ts (68 work items, 14 rework
cycles) and proofSummary.ts (36) fused with centrality; two real-repo scans are
byte-identical. 46 new tests; full suite 3804 pass / 2 skip, lint + test
typecheck clean.
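
The token-budget trim mentioned for buildReadingOrder() can be pictured as a binary search over prefix length. The sketch below is an assumption-heavy illustration: the 4-characters-per-token estimate, the `render` format, and the `fitToBudget` name are all hypothetical; only the ranked `{file, score, reasons[]}` shape and the ~1k-token budget come from the commit message.

```typescript
interface Ranked { file: string; score: number; reasons: string[] }

// Crude assumption for illustration: about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical rendering of the list; the real card format may differ.
const render = (items: Ranked[]): string =>
  items.map((r, i) => `${i + 1}. ${r.file}: ${r.reasons.join("; ")}`).join("\n");

// Largest prefix of the ranked list whose rendering fits the budget.
// Rendered size grows with prefix length, so binary search applies.
function fitToBudget(ranked: Ranked[], budgetTokens = 1000): Ranked[] {
  let lo = 0; // invariant: the first `lo` items fit
  let hi = ranked.length;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (estimateTokens(render(ranked.slice(0, mid))) <= budgetTokens) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return ranked.slice(0, lo);
}
```

Because only a prefix is ever kept, trimming never reorders the ranking; it only decides where the list stops.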

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lobal PageRank + in-degree floor + CJS) — fixes readingOrder ranking

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ orchestrator/entrypoint boost)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nary → text)

The intent-couple pair key used a literal NUL byte (0x00) as its separator,
which made git treat the whole file as binary — no diff, no merge, no review.
Replace the raw NUL with the \0 escape sequence: byte-identical runtime
behavior (NUL is still the separator, the one byte a POSIX path can't contain),
but the source is now plain ASCII and reviewable. Also refresh two stale
'2 of 202' comments to be count-agnostic. (This commit shows as a binary diff
for the last time — the new blob is text; all future diffs are normal.)
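
To make the fix concrete, here is a small sketch of a NUL-separated pair key written with the escape sequence. `pairKey` and `splitPairKey` are hypothetical names, not the functions in proof-history; the point is only that `"\0"` in source is plain ASCII while the runtime string still contains the single byte 0x00.

```typescript
// "\0" is two ASCII characters in the source file but one NUL character at
// runtime, so git sees text while the separator stays a byte no POSIX path
// can contain.
const SEP = "\0";

// Order the two paths so (a, b) and (b, a) produce the same key.
function pairKey(a: string, b: string): string {
  return a < b ? `${a}${SEP}${b}` : `${b}${SEP}${a}`;
}

function splitPairKey(key: string): [string, string] {
  const i = key.indexOf(SEP);
  return [key.slice(0, i), key.slice(i + 1)];
}
```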

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…signal

The scaffold claimed a three-signal fusion (centrality + rework + co-change)
while co-change was computed (intentCouples) but never threaded in — two
signals fused, three asserted. Wire intentCouples first-class into the
reading-order fusion: honest 'changed together in N verified items' provenance
(from coTouchCount/slugs, never a synthetic percentage), gated to >= 2 verified
items so one-off co-touches and mega-refactor artifacts don't count, with
hidden-coupling detection (co-changed but no shared import edge). The scaffold's
co-change clause is now conditional on the signal actually contributing.

Also harden two adversarial-review findings in the same path:
- Coverage caveat now fires when the import graph was built from the 750-file
  SAMPLE, closing a honesty hole where mid-size repos (~750-2500 files) showed a
  partial ranking as whole-repo; polyglot caveat reframed by language (no
  misleading percentage), percentage clamped to 100.
- Co-change top-partner pick now breaks ties on partner path, so the fusion is
  self-determinizing rather than relying on the caller pre-sorting couples.
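
The tie-break in the last bullet can be sketched as a single pass that prefers the higher count and, on equal counts, the lexicographically smaller partner path. The `Couple` shape and `pickTopPartner` name are assumptions for illustration.

```typescript
// Hypothetical shape: one co-change partner of a given file.
interface Couple { partner: string; coTouchCount: number }

// Highest count wins; equal counts fall back to the smaller partner path,
// so the result does not depend on the order of `couples`.
function pickTopPartner(couples: Couple[]): Couple | null {
  let best: Couple | null = null;
  for (const c of couples) {
    if (
      best === null ||
      c.coTouchCount > best.coTouchCount ||
      (c.coTouchCount === best.coTouchCount && c.partner < best.partner)
    ) {
      best = c;
    }
  }
  return best;
}
```

Without the path comparison, the first couple seen among equals would win, and the output would silently depend on how the caller happened to sort its input.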

gitIntelligence.coChangeCoupling stays null on purpose (reserved for a future
git-churn path; populating it would demand a fabricated percentage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cite

The rescan-on-complete refreshes scan.json only — it must NOT persist the import
graph, which would write an untracked .ana/state/code-graph.json into the
working tree after completion (state/ is only gitignored once ana init has run).
Nothing reads code-graph.json on the live path, so the marginal freshness isn't
worth churning the tree; it refreshes on the next ana init. Also corrects the
stale 'scan.ts:470' worktree-guard cite in the hook comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Fixturize proof-history count assertions: the 'verified premises' test pinned
  exact touch counts (68/42/36) against the LIVE, ever-growing proof_chain.json,
  so any merge that completed work broke it (work.ts 68 -> 69 on this merge).
  Lock exact counts against a FROZEN synthetic fixture instead; the live-data
  tests now assert only drift-robust invariants (rank, gate, determinism).
- Co-change: integration tests drive it end-to-end through scanProject (the live
  path always passed []) — proving it fires, gates one-offs, flags hidden
  coupling, stays null-safe on a degenerate chain, and a non-tautological
  order-flip test (co-change lifts d.ts above a.ts against the path tie-break,
  failing if the score term is removed — proving it changes ORDER, not reasons).
- New coverage: sampled-graph caveat, polyglot wording, partner-pick determinism
  under shuffled input, conditional co-change subtitle, and a fusion timing gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The reading-order differentiator (deterministic, proof-derived 'read these
first') was invisible in docs and the CHANGELOG was untouched by the whole
feature. Add a 'Where to start reading' section to the scan concept doc (three
signals, hidden coupling, determinism, the honest coverage caveat — no numbers)
and a comprehensive [Unreleased] CHANGELOG entry covering the reading order,
proof-history risk map, import graph, rescan-on-complete, and the co-change
honesty fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 18, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project   Deployment  Actions           Updated (UTC)
anatomia  Ready       Preview, Comment  Jun 18, 2026 1:16am

CI (clean checkout) caught a type error a stale local incremental tsc cache
hid: the perf-gate edge literals were {from,to} but ImportEdge requires names.
Annotate as ImportEdge[] and add names: [] — matches the edge() helper used by
the other reading-order tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
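
For context on the type error in the commit above: an object literal with only `from` and `to` is not assignable to a type that also requires `names`. The sketch below assumes `ImportEdge` has exactly these three fields and that the test helper `edge()` defaults `names` to an empty array; both are guesses based on the commit message, not the actual definitions.

```typescript
// Assumed shape, inferred from the commit message above.
interface ImportEdge { from: string; to: string; names: string[] }

// Hypothetical stand-in for the edge() helper used by the reading-order tests.
const edge = (from: string, to: string, names: string[] = []): ImportEdge =>
  ({ from, to, names });

// Annotating the array makes a missing `names` field a compile-time error
// at the literal itself rather than at a distant call site.
const perfEdges: ImportEdge[] = Array.from({ length: 3 }, (_, i) =>
  edge(`f${i}.ts`, `f${i + 1}.ts`),
);
```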