chore: root reorg + GEPA skill optimization #12
Merged
jonathanpeterwu merged 20 commits into main on Apr 20, 2026
This is a merge commit of the virtual branches in your workspace.
Because GitButler manages multiple virtual branches, you cannot easily switch back and forth between git branches and virtual branches. If you switch to another branch, GitButler will need to be reinitialized. If you commit on this branch, GitButler will throw the commit away.
Here are the branches that are currently applied:
- sbc-branch-1 (refs/gitbutler/sbc-branch-1), branch head: 2bba656
For more information about what we're doing here, check out our docs: https://docs.gitbutler.com/features/branch-management/integration-branch
…480)
- Add CrossProjectSearch engine with FTS5/BM25 ranking across N databases
- Project registry (~/.stackmemory/projects.json) with CRUD + auto-discovery
- Read-only SQLite connections for safety, LIKE fallback for non-FTS databases
- 4 MCP tools: sm_cross_search, sm_cross_discover, sm_cross_register, sm_cross_list
- CLI: `stackmemory search --all-projects "query"` for cross-project search
- 17 tests: registry CRUD, multi-db FTS5 search, ranking, LIKE fallback, graceful skip
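The merge step behind a cross-project search can be sketched as follows. This is a minimal illustration, not the CrossProjectSearch implementation: `Hit`, `ProjectSearch`, and `crossProjectSearch` are hypothetical names, each project's read-only query is abstracted as a callback, and LIKE-fallback hits are assumed to carry no score. It relies on SQLite FTS5's `bm25()` convention that better matches get numerically smaller values. Raw BM25 scores from separate databases are not strictly comparable, since term statistics differ per corpus, so a plain global sort is a simplification.

```typescript
// One search hit from a single project database (hypothetical shape).
// FTS5's bm25() returns smaller values for better matches; LIKE-fallback
// hits have no rank, so bm25 is null for them (an assumption).
interface Hit {
  project: string;
  id: string;
  bm25: number | null;
}

// Per-project query, e.g. a read-only FTS5 or LIKE query (abstracted here).
type ProjectSearch = (query: string) => Hit[];

interface CrossResult {
  hits: Hit[];
  skipped: string[]; // projects whose database could not be searched
}

function crossProjectSearch(
  projects: Map<string, ProjectSearch>,
  query: string,
  limit = 10,
): CrossResult {
  const hits: Hit[] = [];
  const skipped: string[] = [];
  projects.forEach((search, name) => {
    try {
      hits.push(...search(query));
    } catch {
      // Graceful skip: one missing or locked database must not fail the search.
      skipped.push(name);
    }
  });
  // With default column weights bm25() is negative, so unranked (LIKE)
  // hits given rank 0 sort after every ranked FTS5 hit.
  const rank = (h: Hit): number => (h.bm25 === null ? 0 : h.bm25);
  hits.sort((a, b) => rank(a) - rank(b));
  return { hits: hits.slice(0, limit), skipped };
}

// Usage with three hypothetical projects, one of which fails to open.
const projects = new Map<string, ProjectSearch>([
  ["api", () => [{ project: "api", id: "a1", bm25: -2.4 }]],
  ["web", () => [
    { project: "web", id: "w1", bm25: -5.1 },
    { project: "web", id: "w2", bm25: null },
  ]],
  ["old", () => { throw new Error("SQLITE_CANTOPEN"); }],
]);
const { hits, skipped } = crossProjectSearch(projects, "auth token");
// hits: w1, a1, w2; skipped: ["old"]
```

Collecting failures in `skipped` rather than throwing lets a caller report which projects were left out while still returning results from the rest.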
Consolidate duplicate docs, relocate stray files, and tighten .gitignore for agent scratch dirs.
- Move SPEC.md, RELEASE_NOTES.md, tomorrow.md, and vision.md to docs/ (replacing stale docs/ copies with the up-to-date root versions)
- Move mcp_review_config.json to config/
- Untrack .lint-fix-log.json (ephemeral lint artifact)
- Delete stale .tsbuildinfo-* and .lint-errors.log
- Ignore agent scratch dirs (.ralph/, .swarm/, .bjarne/, .entire/, .opencode/, .git.backup/) and local trees (archive/, site/, voyager/, plugins/)
- Update README.md Vision link to docs/vision.md
Session tests mocked fs/promises but not the canonical-store module. The canonicalStateStore singleton inherited the mocked fs, so pathExists returned true while readFile returned undefined, which crashed JSON.parse. Mock the entire canonical-store module with stubs for upsertSession, appendEvent, and endSession.
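The crash mode described above can be reproduced in isolation. A minimal sketch, assuming a synchronous simplification of the fs/promises calls: `FsLike`, `loadState`, `partialFsMock`, and `createStubStore` are hypothetical names, and only the three method names (upsertSession, appendEvent, endSession) come from the commit. A partial fs mock that reports the file as present but yields no contents makes `JSON.parse` throw, while a stubbed store never touches fs at all. In the suite itself the equivalent is mocking the whole canonical-store module (for example with a `vi.mock` factory, if the suite uses Vitest).

```typescript
// Minimal fs surface a state store might use (hypothetical, synchronous).
interface FsLike {
  pathExists(path: string): boolean;
  readFile(path: string): string | undefined;
}

// Hypothetical loader: reads JSON state if the file exists.
function loadState(fs: FsLike, path: string): unknown {
  if (!fs.pathExists(path)) return {};
  const raw = fs.readFile(path);
  // JSON.parse(undefined) coerces to the string "undefined" and throws SyntaxError.
  return JSON.parse(raw as string);
}

// The leaky partial mock: pathExists is stubbed, readFile yields nothing.
const partialFsMock: FsLike = {
  pathExists: () => true,
  readFile: () => undefined,
};

// Stub for the whole store: records calls and performs no I/O.
interface CanonicalStoreStub {
  calls: string[];
  upsertSession(id: string): void;
  appendEvent(id: string, event: string): void;
  endSession(id: string): void;
}

function createStubStore(): CanonicalStoreStub {
  const calls: string[] = [];
  return {
    calls,
    upsertSession(id) { calls.push(`upsert:${id}`); },
    appendEvent(id, event) { calls.push(`event:${id}:${event}`); },
    endSession(id) { calls.push(`end:${id}`); },
  };
}

let crashed = false;
try {
  loadState(partialFsMock, "/tmp/state.json");
} catch (err) {
  crashed = err instanceof SyntaxError;
}
// crashed === true

const store = createStubStore();
store.upsertSession("s1");
store.appendEvent("s1", "tool_call");
store.endSession("s1");
// store.calls: ["upsert:s1", "event:s1:tool_call", "end:s1"]
```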
Split conductor prompt-template.md into 5 phase files (system, understand, implement, validate, deliver). GEPA now auto-targets the worst-performing phase from outcomes.jsonl instead of mutating the entire template as a monolith.
- Phase-aware prompt building in orchestrator with DSPy bridge
- Assertion-based retry injects phase-specific error guidance
- promptVersions hash map in AgentOutcomeEntry for attribution
- Stop hook fires GEPA session accumulator (auto-optimize at threshold)
- after-run.sh triggers GEPA + DSPy (every 50 runs) automatically
- Gold sets mined from 71 outcomes across 4 phases
- eval-phases.js harness validates mutations before applying
- npm run gepa:eval / gepa:mine scripts
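Picking the phase to mutate from outcomes.jsonl can be sketched as a per-phase success rate. This assumes a hypothetical record shape (`{ phase, success }` per line); the real AgentOutcomeEntry fields may differ, and `parseOutcome` and `worstPhase` are illustrative names. Phases with fewer than `minSamples` outcomes are ignored so that a single unlucky run does not redirect optimization, and malformed lines are skipped.

```typescript
type Phase = "system" | "understand" | "implement" | "validate" | "deliver";

// Hypothetical outcome record, one JSON object per line of outcomes.jsonl.
interface Outcome {
  phase: Phase;
  success: boolean;
}

function parseOutcome(line: string): Outcome | null {
  try {
    const v = JSON.parse(line) as Partial<Outcome>;
    return typeof v.phase === "string" && typeof v.success === "boolean"
      ? (v as Outcome)
      : null;
  } catch {
    return null; // skip malformed lines instead of aborting the run
  }
}

// Returns the phase with the lowest success rate, or null if none qualifies.
function worstPhase(jsonl: string, minSamples = 3): Phase | null {
  const stats = new Map<Phase, { runs: number; ok: number }>();
  for (const line of jsonl.split("\n")) {
    const o = line.trim() ? parseOutcome(line) : null;
    if (!o) continue;
    const s = stats.get(o.phase) ?? { runs: 0, ok: 0 };
    s.runs += 1;
    if (o.success) s.ok += 1;
    stats.set(o.phase, s);
  }
  let worst: Phase | null = null;
  let worstRate = Infinity;
  for (const [phase, s] of Array.from(stats)) {
    if (s.runs < minSamples) continue; // too little evidence for this phase
    const rate = s.ok / s.runs;
    if (rate < worstRate) {
      worst = phase;
      worstRate = rate;
    }
  }
  return worst;
}
```

A success rate is used here only because it needs no extra fields; a mean judge score per phase would slot into the same loop.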
Add GEPA support for optimizing Claude Code slash command .md files:
- skill-audit.js hook logs Skill tool calls to skill-audit.jsonl
- 5 skill targets in config (start, stop, learn, next, summary)
- skill-tasks.jsonl with 8 eval tasks for skill quality
- skill-stats and run-skills CLI commands
- getSkillAuditContext() feeds usage data into mutation prompts
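A skill audit log of this kind reduces to two steps: filter hook payloads down to Skill tool calls and append them as JSONL, then aggregate per-skill counts for a stats command. A minimal sketch: `tool_name` and `tool_input` follow the general shape of Claude Code tool-use hook input, but the `skill` key inside `tool_input`, the record fields, and the function names `toAuditLine` and `skillStats` are assumptions, not the actual skill-audit.js or skill-stats code.

```typescript
// Subset of a tool-use hook payload (the "skill" key in tool_input is assumed).
interface HookPayload {
  tool_name: string;
  tool_input: { [key: string]: unknown };
}

// Returns one JSONL record for a Skill call, or null for any other tool.
function toAuditLine(payload: HookPayload, now: Date): string | null {
  if (payload.tool_name !== "Skill") return null;
  const raw = payload.tool_input["skill"];
  const skill = typeof raw === "string" ? raw : "unknown";
  return JSON.stringify({ ts: now.toISOString(), skill });
}

// Aggregates audit lines into [skill, count] pairs, most-used first.
function skillStats(lines: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      const rec = JSON.parse(line) as { skill?: unknown };
      const skill = rec.skill;
      if (typeof skill !== "string") continue;
      counts.set(skill, (counts.get(skill) ?? 0) + 1);
    } catch {
      // ignore malformed lines (e.g. a partially written final line)
    }
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

Keeping the hook side to a single filter-and-serialize step keeps it cheap on every tool call; all aggregation happens later, when the stats command reads the file.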
- Add API key validation at startup (fail fast before burning budget)
- Fix callJudge() to log errors and use the configured timeout (120s instead of 30s)
- Add ASI feedback field to judge schema (CoT + actionable suggestions)
- Persist judge feedback to results/feedback-{gen}.json
- Inject ASI feedback into mutation prompts via getRecentFeedback()
- Add extractCodeBlocks() for regex judge (focus on code, not prose)
- Add 10 new regex criterion patterns (shows_branch, concise_output, etc.)
- Support custom regex from eval task definitions
- Add elitism tiebreaker (prefer baseline/incumbent on score ties)
- Add crossover operator (recombine sections from two parent variants)
- Add eval response cache (record/replay for deterministic baselines)
- Expand skill eval tasks from 8 to 30 with adversarial cases
- Add held-out eval partition (train/test split for Goodhart detection)
- Increase population from 4 to 8, add crossoverCount=2, and set judge timeout to 120s
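Two of the optimizer changes listed above, the elitism tiebreaker and the crossover operator, can be sketched together. This is an illustration under stated assumptions, not the project's optimizer code: a variant is modeled as a list of prompt sections (for example, split on headings), `Variant`, `selectElite`, and `crossover` are hypothetical names, and the random source is injected so runs are reproducible. On an exact score tie the incumbent is kept, so a mutation has to strictly beat it; crossover takes each section index from one parent at random and falls back to the other parent when the chosen one has no section at that index.

```typescript
interface Variant {
  id: string;
  sections: string[]; // prompt body split into sections (e.g. on "## " headings)
  score: number;      // judge score; -Infinity means not yet evaluated
  isIncumbent: boolean;
}

// Elitism with a tiebreaker: the highest score wins, and on an exact tie
// the incumbent (baseline) is preferred over a challenger.
function selectElite(population: Variant[]): Variant {
  if (population.length === 0) throw new Error("empty population");
  return population.reduce((best, v) => {
    if (v.score > best.score) return v;
    if (v.score === best.score && v.isIncumbent && !best.isIncumbent) return v;
    return best;
  });
}

// Uniform section-level crossover. rng() must return values in [0, 1).
function crossover(a: Variant, b: Variant, rng: () => number, id: string): Variant {
  const n = Math.max(a.sections.length, b.sections.length);
  const sections: string[] = [];
  for (let i = 0; i < n; i++) {
    const pickA = rng() < 0.5;
    const first = pickA ? a : b;
    const second = pickA ? b : a;
    // If the chosen parent has no section i, take it from the other parent.
    const section = i < first.sections.length ? first.sections[i] : second.sections[i];
    sections.push(section);
  }
  // The child must be re-judged before it can compete in selection.
  return { id, sections, score: -Infinity, isIncumbent: false };
}
```

Preferring the incumbent on ties guards against judge noise: without it, a mutation that merely matches the baseline could replace it and the population would drift without real improvement.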
- Keep cache/ in .gitignore
- Remove tracked generation files (now gitignored)
- Take theirs for .before-optimize.md
Summary
Test plan
- `npm run lint` passes
- `npm run test:run` passes
- `npm run build` passes
- `/next` or other skill invocations
- `node scripts/gepa/optimize.js skill-stats` after a few skill calls