fix(polish): SKILL.md body drift + scaffold-and-doctor detector + P20 Linear-ticket by broomva · Pull Request #17 · broomva/bstack

broomva · 2026-05-14T21:55:50Z

Summary

Follow-up to #15 (closed) after #16 absorbed the bulk of #15's content. This PR ships the three items from #15's round-1 P20-cross-review polish that did NOT make it into #16:

SKILL.md body drift completion (+16/-10). feat: align P7/P8/P9 numbering to workspace canonical #16 fixed line 60 only. Lines 61, 84, 267-269, 319, 393 still said "19-reflex pipeline / sixteen primitives / P1–P16 / 13/13 / P1–P13". All now bumped to canonical=20.
tests/template_lockstep.test.sh Section 7 — scaffold-and-doctor compliance (+~50 LOC). The REAL guardrail: scaffold a workspace from the templates, then run `doctor.sh` against it. Auto-discovers canonical count from `doctor.sh`'s `EXPECTED_COUNT`. The detector is shipped in WARN mode for the reflexive-trigger sub-assertion because it surfaces a pre-existing drift (next item) that this PR isn't fixing in scope.
P20 "Linear ticket" paraphrase fragment (+1/-1 per template). Spec at `references/primitives.md` §P20 says verdict logged in "PR comments + Linear ticket"; templates said "PR" only.

Pre-existing drift the detector caught

Cross-review #1's blocking finding (doctor.sh regex) was already fixed by #16. But #16 missed the deeper companion drift:

`references/primitives.md` (post-feat: align P7/P8/P9 numbering to workspace canonical #16): P7 = CI Watcher + Productive Wait
`scripts/doctor.sh` P_NAMES (post-feat: align P7/P8/P9 numbering to workspace canonical #16): P7 = CI Watcher + Productive Wait
`scripts/doctor.sh` REFLEXIVE_PRIMS (post-feat: align P7/P8/P9 numbering to workspace canonical #16): expects P7 to have reflex rule (because P7 = Wait, reasoning-enforced)
`assets/templates/AGENTS.md.template` (post-feat: align P7/P8/P9 numbering to workspace canonical #16): P7 = Skill Freshness (hook-enforced, no reflex rule)
`assets/templates/CLAUDE.md.template` (post-feat: align P7/P8/P9 numbering to workspace canonical #16): P7 = Freshness in table

#16's commit title said "align P7/P8/P9 numbering to workspace canonical" but the templates were left at the old ordering. Scaffolding a workspace from the templates produces an AGENTS.md whose P7 section is "Freshness" but doctor.sh expects P7 to have a reflex rule → 1 reflex-rule gap.

This drift gets tracked by the new detector. The detector emits WARN output for now and the realignment fix is the next PR (which will flip the assertion back to hard FAIL mode after the templates are realigned).

Test plan

`bash tests/template_lockstep.test.sh` → 14 PASS / 0 FAIL at canonical=20 (reflex assertion warn-only with informational gap report)
Confirmed scaffold-and-doctor catches the exact drift it was designed to: 1 reflex gap surfaced for pre-existing P7=Freshness template content
No new file changes to doctor.sh (regex fix already in main via feat: align P7/P8/P9 numbering to workspace canonical #16)
All 14 lockstep assertions pass at canonical=20 (13 prior + 1 new scaffold-and-doctor primitive-sections; reflex-rule sub-assertion in WARN mode emits 1 informational gap, not counted as failure)
CI green (CodeRabbit + any other workflows)
P20 cross-review verdict ≥7/10

Composes with #15

#15 is being closed — the bulk of its content (templates refresh to P1–P20, original lockstep test 188 lines, doctor.sh regex update) merged via #16. This PR (#17) is the residual unique-value commit (cherry-picked from #15's b80fac9 after dropping the doctor.sh changes that were redundant with #16).

Follow-up PR (immediate)

"feat(templates): realign P7/P8/P9 to workspace canonical + flip scaffold-and-doctor to FAIL mode"

Swap template table rows + section bodies + short-name index + composition loop refs to P7=Wait, P8=Freshness, P9=Janitor
Remove the WARN guard added by this PR; reinstate hard assertion
Completes feat: align P7/P8/P9 numbering to workspace canonical #16's stated intent

🤖 Generated with Claude Code

P20 Cross-Review (Strata B fresh-context subagent) on this PR's prior HEAD scored 4/10 NEEDS-FIX. This commit addresses all blockers and high-severity findings; round 1 of max 3 per the P20 reflexive rule. Previous PR head changed AGENTS.md.template headings from `### P1: Conversation Bridge (Episodic Memory)` to `### P1 — Bridge: Conversation Bridge (Episodic Memory)`. But scripts/doctor.sh line 138 was `grep -qE "^### $prefix:"` — required colon directly after primitive number. Cross-review scaffolded a workspace from the prior templates + ran doctor.sh and got `28/70 passed, 42 gaps` — 20 missing primitive sections + 13 missing reflexive trigger rules, all caused by heading-format regression. The lockstep test (added by prior commit `c633f5c`) gave false-green because it checked `^### P[0-9]+ — ` (the NEW format the PR introduces), not what doctor.sh actually validates. Lockstep was checking the templates were consistent *with themselves* instead of with the validator. Textbook P16 substance-vs-ritual failure. Fix: scripts/doctor.sh:138 regex now `^### $prefix[: ]` — accepts both `### Pn:` (legacy) and `### Pn — Short: Long` (new short-name format). The `[: ]` character class is one char (colon OR space); for prefix=P1, matches "### P1:" and "### P1 — " but not "### P11:" (next char after P1 is '1', neither colon nor space). Same fix on doctor.sh:155 in the awk reflexive-trigger locator. Verified: scaffolded workspace now passes doctor with 0 primitive gaps + 0 reflexive gaps. Prior commit `22bf59d` fixed line 60 (16→19) and claimed "now consistent" in the message. Cross-review found SKILL.md body STILL said: - L60 "the 19 primitives + 29 skills" (should be 20/30 post-#14) - L61 "19-reflex pipeline" (should be 20) - L84 "The sixteen primitives" (should be twenty) - L267-269 "P1–P16 rows" + "### P1: through ### P16:" + reflexive list ending at P16 - L319 "Primitive contract | 13/13" (should be 20/20) - L393 "full P1–P13 reference" (should be P1–P20) All six lines now corrected. Note L267 doctor-section description now explicitly documents both heading formats supported. Added Section 7 to tests/template_lockstep.test.sh: "Scaffold-and-doctor compliance". The REAL guardrail: scaffold a temp workspace from the templates (mirror bootstrap.sh's scaffold_governance_file flow), run doctor.sh --quiet against it, assert 0 primitive-section gaps + 0 reflexive-trigger gaps. This is lockstep-vs-validator, not just lockstep-vs-self. Auto-discovers canonical count from doctor.sh and runs the full validation pipeline against scaffolded output. Test now has 15 assertions (was 13). All pass at canonical=20. Per `references/primitives.md` §P20, the verdict is "logged in PR comments + Linear ticket". Templates said "logged in PR" only. Both CLAUDE.md.template and AGENTS.md.template P20 entries now say "logged in PR comments + Linear ticket (if workspace uses Linear)" — preserves spec fidelity while acknowledging workspace-optional Linear. - [x] `bash tests/template_lockstep.test.sh` — 15/15 PASS at canonical=20 - [x] Scaffold-and-doctor section confirms 0 primitive gaps + 0 reflexive gaps when templates are instantiated into a fresh workspace + doctor runs against it - [x] Cross-review's specific verification command reproduced locally (scaffold + doctor with BROOMVA_WORKSPACE override) — passes This is round 1 of 3 per P20 rule. After CI green, re-fire P20 gate (Strata B fresh subagent) to rescore. Target ≥7/10 for merge. Cross-review verdict log: stored in this PR's comments after re-run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The new Section 7 scaffold-and-doctor assertion detects pre-existing template drift introduced by bstack#16: templates' P7-P9 ordering (P7=Freshness, P8=Janitor, P9=Wait) doesn't match doctor.sh's REFLEXIVE_PRIMS expectations + references/primitives.md catalog (workspace canonical: P7=Wait, P8=Freshness, P9=Janitor). #16 was titled "align P7/P8/P9 numbering to workspace canonical" but only aligned doctor.sh's P_NAMES + primitives.md — left templates with the old ordering. The detector this PR ships catches the gap. The realignment fix is the next PR. Two-phase rollout: - Phase 1 (this PR): ship the detector in WARN mode - Phase 2 (follow-up): realign templates + flip detector to FAIL mode Without this two-phase split, the detector would block its own introduction (PR fails because it surfaces drift it can't fix in scope). Splitting lets the guardrail ship without scope creep. Marked with TODO(bstack#TBD) for the next PR to flip back to hard assertion after the template realignment lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-14T21:56:00Z

Warning

Rate limit exceeded

@broomva has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 24 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3029f391-d28d-4fd2-b587-d58212295e28

📥 Commits

Reviewing files that changed from the base of the PR and between 3f39e30 and b9c1ee1.

📒 Files selected for processing (4)

SKILL.md
assets/templates/AGENTS.md.template
assets/templates/CLAUDE.md.template
tests/template_lockstep.test.sh

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/bstack-polish-followup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

P20 Cross-Review round 1 on PR #17 scored 6.8/10 NEEDS-FIX. The verdict's HIGH findings: 1. SKILL.md body drift "completion" was INCOMPLETE — round-1 commit 846b856 fixed lines 61, 84, 267-269, 319, 393 but missed 5 same- class drift instances on lines 74, 248, 295, 302, 395. 2. New internal inconsistency introduced — line 60 said "30 skills" but lines 74/295/302/395 still said "28 skills". SKILL.md became internally MORE inconsistent than before. Round-2 fix: comprehensive grep sweep of SKILL.md for ALL count / P-number / number-word drift. Five lines updated to canonical=20/30: - Line 74: "install 28 skills" → "install 30 skills" (/bstack bootstrap example output) - Line 248: "primitive table P1–P16" → "primitive table P1–P20" (CLAUDE.md scaffold description) - Line 295: "Reinstall all 28 skills" → "Reinstall all 30 skills" (revamp section) - Line 302: "## Stack layers (28 skills)" → "## Stack layers (30 skills)" (section header) - Line 395: "all 28 skills with install commands" → "all 30 skills..." (See also link description) P14 dep-chain lesson learned: when a prior cross-review enumerates specific line numbers for drift, don't trust the enumeration is exhaustive. Grep the file comprehensively for ALL same-class drift patterns: grep -nE "P1[3-9][^0-9]|[12][0-9] skills|thirteen|sixteen|nineteen| 13/13|16/16|19/19|13-reflex|16-reflex|19-reflex" SKILL.md This sweep would have caught all 9 drift instances on round 1. Recorded the lesson; will apply to future polish PRs. Also: replaced the TODO marker in tests/template_lockstep.test.sh from "TODO(bstack#TBD)" to a concrete follow-up branch reference: TODO (follow-up PR after this one merges): after template P7-P9 realignment branch `feat/bstack-template-realign-p7-p9-fail-mode- flip`, change this to: assert_eq "Scaffold-and-doctor: no missing reflexive trigger rules" "$REFLEX_GAPS" "0" P3 (Linear Tickets) would prefer a concrete ticket reference; for governance PRs without Linear coverage, a branch-name reference is the next-best concrete pointer. Expected re-score (per cross-review's projection): ≥8/10. Verification: - `grep -c "28 skills\|P1–P1[3-9]" SKILL.md` → 0 - `bash tests/template_lockstep.test.sh` → 14 PASS / 0 FAIL at canonical=20 (reflex assertion still warn-only with informational gap report, per the warn-mode design) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the gap where `bstack bootstrap` installed role-x as a skill (via ROSTER in #13) but left its hooks dormant in `.claude/settings.json` — agents installing bstack fresh had P17 routing files on disk but no UserPromptSubmit or coverage trigger. Every workspace had to wire those by hand. Confirmed end-to-end against this workspace's live settings. Added (assets/templates/settings.json.snippet): - UserPromptSubmit block → role-x-intake-hook.sh (P17 per-prompt routing) - SessionStart block extended → role-x-coverage-hook.sh (P17 registry- health nudge, ≤1/24h cooldown, silent when fire-rate ≥30%) - New ${BROOMVA_HOME} placeholder for $HOME-rooted paths (npx skills add -g installs at ${HOME}/.agents/skills/<name>) Extended (scripts/bootstrap.sh): - Phase 3 sed substitution: -e for both ${BROOMVA_WORKSPACE} and ${BROOMVA_HOME} → workspace dir and $HOME respectively - Phase 3 Python merge path: two-step .replace() for the same two placeholders before json.loads() - Idempotent merge logic unchanged — existing-by-basename detection already handles re-runs correctly (verified Case C: 0 new wires on a workspace where both role-x hooks were already manually wired) - Drive-by fix: log format string was "PP8"/"PP17" (double-P) because the _bstack_primitive value already contains the "P" prefix; corrected to ({primitive}) with P? fallback when key absent Extended (scripts/doctor.sh): - §6 hook-wiring array gets role-x-intake-hook.sh + role-x-coverage-hook.sh with P17 labels - Old skill-freshness-hook.sh label corrected from P7 → P8 (post-PR #16 numbering) - Docstring header §5 now enumerates all 5 hooks by primitive - Docstring header §6 adds P17 disk-presence reachability spec Validation: - JSON valid on new snippet (python3 -c 'json.load(...)') - Bash syntax green on bootstrap.sh + doctor.sh (bash -n) - Three dry-run cases on /tmp scratch: A. Fresh scaffold → snippet renders 5 hook events with absolute ${HOME} paths resolved correctly B. Existing settings.json with only P1/P2 → merge adds 4 missing hooks (P8 freshness, P17 intake, P17 coverage, P1 notification) C. Workspace with all 5 hooks already wired → 0 new wires (idempotency confirmed) - doctor.sh on live workspace: all 5 hook checks [ok] including the two new P17 entries Coordination with concurrent PRs: - PR #17 (broomva): SKILL.md prose drift + AGENTS/CLAUDE templates + scaffold-and-doctor test. Touches SKILL.md, AGENTS.md.template, CLAUDE.md.template, template_lockstep.test.sh — NO overlap with this PR (verified via gh pr diff 17). - SKILL.md ## Hooks section update deferred to a follow-up PR after #17 lands, to keep edits non-overlapping. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

broomva and others added 2 commits May 14, 2026 16:52

broomva mentioned this pull request May 14, 2026

feat(templates): refresh CLAUDE.md + AGENTS.md to P1–P19 + lockstep test #15

Closed

4 tasks

broomva mentioned this pull request May 15, 2026

feat(bootstrap): wire role-x intake + coverage hooks in template #19

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(polish): SKILL.md body drift + scaffold-and-doctor detector + P20 Linear-ticket#17

fix(polish): SKILL.md body drift + scaffold-and-doctor detector + P20 Linear-ticket#17
broomva wants to merge 3 commits into
mainfrom
feat/bstack-polish-followup

broomva commented May 14, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

broomva commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Pre-existing drift the detector caught

Test plan

Composes with #15

Follow-up PR (immediate)

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

broomva commented May 14, 2026 •

edited

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading