
chore: root reorg + GEPA skill optimization #12

Merged
jonathanpeterwu merged 20 commits into main from chore/root-reorg on Apr 20, 2026

Conversation

@jonathanpeterwu
Collaborator

Summary

  • Reorganize root: move SPEC.md, RELEASE_NOTES.md, vision.md, tomorrow.md into docs/
  • Clean up .gitignore and remove stale lint artifacts
  • GEPA v3: phase-level prompt optimization with auto-targeting from outcomes.jsonl
  • GEPA skill optimization: audit hook, 5 skill targets, eval harness for .md files
  • Fix session tests: mock canonicalStateStore
  • Add /learn context to CLAUDE.md maintenance section

Test plan

  • npm run lint passes
  • npm run test:run passes
  • npm run build passes
  • Verify skill-audit hook fires on /next or other skill invocations
  • Run node scripts/gepa/optimize.js skill-stats after a few skill calls

StackMemory Bot (CLI) and others added 20 commits April 8, 2026 12:25
This is a merge commit of the virtual branches in your workspace.

Due to GitButler managing multiple virtual branches, you cannot switch back and
forth between git branches and virtual branches easily. 

If you switch to another branch, GitButler will need to be reinitialized.
If you commit on this branch, GitButler will throw it away.

Here are the branches that are currently applied:
 - sbc-branch-1 (refs/gitbutler/sbc-branch-1)
   branch head: 2bba656
For more information about what we're doing here, check out our docs:
https://docs.gitbutler.com/features/branch-management/integration-branch

…480)

- Add CrossProjectSearch engine with FTS5/BM25 ranking across N databases
- Project registry (~/.stackmemory/projects.json) with CRUD + auto-discovery
- Read-only SQLite connections for safety, LIKE fallback for non-FTS databases
- 4 MCP tools: sm_cross_search, sm_cross_discover, sm_cross_register, sm_cross_list
- CLI: `stackmemory search --all-projects "query"` for cross-project search
- 17 tests: registry CRUD, multi-db FTS5 search, ranking, LIKE fallback, graceful skip
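As a rough illustration of ranking across several databases, the sketch below merges per-project hits into a single list. It is not taken from the CrossProjectSearch implementation: the function name `mergeCrossProjectResults` and the result shape are hypothetical. It assumes scores follow the SQLite FTS5 `bm25()` convention (numerically smaller means a better match) and that rows from the LIKE fallback carry no score, so they are given 0 and sort after ranked hits. BM25 scores computed in different databases depend on each corpus, so comparing them directly is only an approximation.

```javascript
// Hypothetical input shape (not from the PR):
//   [{ project: "name", rows: [{ id, score? }, ...] }, ...]
// score mimics FTS5 bm25(): smaller (typically negative) is better.
function mergeCrossProjectResults(perProject, limit = 10) {
  const merged = [];
  for (const { project, rows } of perProject) {
    for (const row of rows) {
      // LIKE-fallback rows have no score; assume 0 so they rank last.
      merged.push({ project, ...row, score: row.score ?? 0 });
    }
  }
  // Ascending sort puts the best (most negative) scores first.
  merged.sort((a, b) => a.score - b.score);
  return merged.slice(0, limit);
}

const hits = mergeCrossProjectResults(
  [
    { project: "alpha", rows: [{ id: 1, score: -2.5 }, { id: 2, score: -0.5 }] },
    { project: "beta", rows: [{ id: 3, score: -1.0 }, { id: 4 }] },
  ],
  3
);
console.log(hits.map((h) => `${h.project}:${h.id}`));
```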
Consolidate duplicate docs, relocate wandering files, and tighten
.gitignore for agent scratch dirs.

- Move SPEC.md, RELEASE_NOTES.md, tomorrow.md, vision.md to docs/
  (replacing stale docs/ copies with the up-to-date root versions)
- Move mcp_review_config.json to config/
- Untrack .lint-fix-log.json (ephemeral lint artifact)
- Delete stale .tsbuildinfo-* and .lint-errors.log
- Ignore agent scratch dirs (.ralph/, .swarm/, .bjarne/, .entire/,
  .opencode/, .git.backup/) and local trees (archive/, site/,
  voyager/, plugins/)
- Update README.md Vision link to docs/vision.md

Session tests mocked fs/promises but not the canonical-store module.
The canonicalStateStore singleton inherited the mocked fs, causing
pathExists to return true while readFile returned undefined — crashing
JSON.parse. Mock the entire canonical-store module with stubs for
upsertSession, appendEvent, and endSession.
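The failure described above can be reproduced in isolation. The sketch below is a stand-in, not the repository's test code: `mockedFs`, `loadState`, and `canonicalStoreStub` are hypothetical names, and only `upsertSession`, `appendEvent`, and `endSession` come from the commit message. It shows why a mocked file layer that reports "exists" but returns `undefined` contents makes `JSON.parse` throw, and what a whole-module stub looks like.

```javascript
// Hypothetical auto-mocked file layer: the existence check passes,
// but reading the file yields undefined instead of a string.
const mockedFs = {
  pathExists: async () => true,
  readFile: async () => undefined,
};

// Hypothetical loader resembling what a state store might do.
async function loadState(fs, path) {
  if (!(await fs.pathExists(path))) return {};
  // JSON.parse(undefined) coerces to the string "undefined" and throws SyntaxError.
  return JSON.parse(await fs.readFile(path, "utf8"));
}

// Stub for the whole store, so the loader above is never reached in tests.
const canonicalStoreStub = {
  calls: [],
  async upsertSession(session) { this.calls.push(["upsertSession", session]); },
  async appendEvent(event) { this.calls.push(["appendEvent", event]); },
  async endSession(id) { this.calls.push(["endSession", id]); },
};

loadState(mockedFs, "state.json").catch((err) => {
  console.log("partial mock fails with:", err.name);
});
```

With a test runner such as Vitest or Jest, the stub would typically be registered through the runner's module-mocking call rather than passed by hand.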
Split conductor prompt-template.md into 5 phase files (system,
understand, implement, validate, deliver). GEPA now auto-targets
the worst-performing phase from outcomes.jsonl instead of mutating
the entire template as a monolith.

- Phase-aware prompt building in orchestrator with DSPy bridge
- Assertion-based retry injects phase-specific error guidance
- promptVersions hash map in AgentOutcomeEntry for attribution
- Stop hook fires GEPA session accumulator (auto-optimize at threshold)
- after-run.sh triggers GEPA + DSPy (every 50 runs) automatically
- Gold sets mined from 71 outcomes across 4 phases
- eval-phases.js harness validates mutations before applying
- npm run gepa:eval / gepa:mine scripts
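One plausible way to pick the worst-performing phase from a JSONL log is sketched below. The record shape (`phase`, `success`) and the function name `worstPhase` are assumptions made for illustration; the actual fields in outcomes.jsonl and the targeting logic in the GEPA scripts may differ.

```javascript
// Assumed record shape (hypothetical): {"phase": "implement", "success": false}
function worstPhase(jsonl, minSamples = 1) {
  const stats = new Map();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of aborting the whole scan
    }
    if (typeof entry.phase !== "string") continue;
    const s = stats.get(entry.phase) ?? { runs: 0, passes: 0 };
    s.runs += 1;
    if (entry.success) s.passes += 1;
    stats.set(entry.phase, s);
  }

  // Choose the phase with the lowest pass rate among those with enough samples.
  let worst = null;
  for (const [phase, { runs, passes }] of stats) {
    if (runs < minSamples) continue;
    const rate = passes / runs;
    if (worst === null || rate < worst.rate) worst = { phase, rate, runs };
  }
  return worst; // null when no phase qualifies
}

const sample = [
  '{"phase":"understand","success":true}',
  '{"phase":"implement","success":false}',
  '{"phase":"implement","success":true}',
  '{"phase":"implement","success":false}',
  '{"phase":"validate","success":true}',
].join("\n");
console.log(worstPhase(sample));
```

The `minSamples` guard is a design choice for the sketch: without it, a phase with a single failed run would always be selected over phases with more evidence.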
Add GEPA support for optimizing Claude Code slash command .md files:
- skill-audit.js hook logs Skill tool calls to skill-audit.jsonl
- 5 skill targets in config (start, stop, learn, next, summary)
- skill-tasks.jsonl with 8 eval tasks for skill quality
- skill-stats and run-skills CLI commands
- getSkillAuditContext() feeds usage data into mutation prompts
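To make the audit-log idea concrete, here is a minimal sketch of writing one JSONL entry per skill call and summarizing usage afterwards. The entry fields (`ts`, `skill`) and the helper names `toAuditLine` and `skillStats` are hypothetical; the real skill-audit.js hook and the skill-stats command may record and report different data.

```javascript
// Hypothetical entry shape: {"ts": "<ISO timestamp>", "skill": "next"}
function toAuditLine(skill, now = new Date()) {
  return JSON.stringify({ ts: now.toISOString(), skill }) + "\n";
}

// Count invocations per skill, most used first.
function skillStats(jsonl) {
  const counts = new Map();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    const { skill } = JSON.parse(line);
    counts.set(skill, (counts.get(skill) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// In a real hook each line would be appended to a file such as
// skill-audit.jsonl; a string is used here to keep the sketch self-contained.
const auditLog = toAuditLine("next") + toAuditLine("learn") + toAuditLine("next");
console.log(skillStats(auditLog));
```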
- Add API key validation at startup (fail fast before burning budget)
- Fix callJudge() to log errors and use the config timeout (120s instead of 30s)
- Add ASI feedback field to judge schema (CoT + actionable suggestions)
- Persist judge feedback to results/feedback-{gen}.json
- Inject ASI feedback into mutation prompts via getRecentFeedback()
- Add extractCodeBlocks() for regex judge (focus on code, not prose)
- Add 10 new regex criterion patterns (shows_branch, concise_output, etc)
- Support custom regex from eval task definitions
- Add elitism tiebreaker (prefer baseline/incumbent on score ties)
- Add crossover operator (recombine sections from two parent variants)
- Add eval response cache (record/replay for deterministic baselines)
- Expand skill eval tasks from 8 to 30 with adversarial cases
- Add held-out eval partition (train/test split for Goodhart detection)
- Increase population 4→8, add crossoverCount=2, judge timeout 120s
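Two of the items above, the elitism tiebreaker and the crossover operator, can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the candidate shape (`id`, `score`, `isIncumbent`), the function names, and the choice to treat each `## ` heading as a section boundary are all hypothetical.

```javascript
// Elitism tiebreaker: the highest score wins; on an exact tie the incumbent is kept.
// Expects a non-empty array of { id, score, isIncumbent }.
function selectBest(candidates) {
  return candidates.reduce((best, c) => {
    if (c.score > best.score) return c;
    if (c.score === best.score && c.isIncumbent && !best.isIncumbent) return c;
    return best;
  });
}

// Split a markdown document so that each "## " heading starts a new section.
function splitSections(md) {
  return md.split(/^(?=## )/m);
}

// Section-level crossover: for each position take the section from parent A or
// parent B. `pick(i)` returns true to take A; it defaults to a coin flip.
function crossover(parentA, parentB, pick = () => Math.random() < 0.5) {
  const a = splitSections(parentA);
  const b = splitSections(parentB);
  const child = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] === undefined) child.push(b[i]);
    else if (b[i] === undefined) child.push(a[i]);
    else child.push(pick(i) ? a[i] : b[i]);
  }
  return child.join("");
}

const winner = selectBest([
  { id: "mutant", score: 0.8, isIncumbent: false },
  { id: "baseline", score: 0.8, isIncumbent: true },
]);
console.log(winner.id);
```

Preferring the incumbent on ties avoids replacing a known prompt with a variant that has not shown a measurable improvement.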
- Keep cache/ in .gitignore
- Remove tracked generation files (now gitignored)
- Take theirs for .before-optimize.md
@jonathanpeterwu merged commit b9cedca into main on Apr 20, 2026
4 of 7 checks passed


2 participants