
EPIC: the maestro as an interpreted skill tree — policy-as-data, self-improving orchestration #465

Description

@hadamrd

Vision (CRO)

The maestro's orchestration ('what to do') could itself be a skill, written as a tree that the maestro interprets ('one skill explains to the maestro what to do'), rather than hardcoded Python. This is recursive with the skill tree (#458): the maestro skill is just another node, so the same harvest -> inject -> curate machinery improves how the loop orchestrates, not only how it does a ticket. That gives self-improvement at two levels.

The boundary that makes it safe (non-negotiable)

Two layers, kept separate:

  • Mechanism (stays deterministic Python): side-effecting primitives (spawn worker, run critic, git/gh/sqlite) AND the safety gates: risk:high -> park, critic -> block merge, spend caps, merge-gate. These are NEVER interpretable. The loop is money-safe precisely because they are hardcoded.
  • Policy (becomes an interpreted skill tree): the soft judgment calls, i.e. next-ticket selection, triage promotion order, repair-vs-new priority, and frontier-complete/next-arc. This is the part that 'a skill explains to the maestro'.

The target is a thin interpreter over hard primitives, driven by an editable and evolvable policy skill. forge-loop is already halfway there: the brainstormer is LLM policy, while the tick is deterministic mechanism. This epic pushes more policy into declarative, learned skills WITHOUT touching mechanism or gates.
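A minimal sketch of the mechanism/policy split, under loud assumptions: the policy skill is represented as plain data (an ordered list of sort keys for next-ticket selection), and `Ticket`, `POLICY`, `select_next` and `merge_allowed` are hypothetical names, not forge-loop's actual code. What it illustrates is that the interpreter only reorders candidates, while the gate is an ordinary Python function with no policy parameter at all.

```python
from dataclasses import dataclass


# Hypothetical ticket shape; the fields are illustrative, not forge-loop's schema.
@dataclass(frozen=True)
class Ticket:
    id: int
    risk: str        # "low" or "high"
    is_repair: bool  # repair of existing work vs. new work
    age_ticks: int   # how many ticks it has waited


# Policy: data the maestro interprets (editable, evolvable). Hypothetical format.
POLICY = {
    "decision": "next_ticket",
    "order_by": [
        {"field": "is_repair", "desc": True},  # repairs before new work
        {"field": "age_ticks", "desc": True},  # then oldest first
    ],
}


def select_next(tickets, policy):
    """Interpret the policy's order_by rules and return the top ticket (or None)."""
    ranked = list(tickets)
    # Sort by the least significant key first; Python's sort is stable, so
    # earlier keys in order_by end up taking precedence.
    for rule in reversed(policy["order_by"]):
        ranked.sort(key=lambda t: getattr(t, rule["field"]), reverse=rule["desc"])
    return ranked[0] if ranked else None


# Mechanism: a hardcoded gate. It takes no policy argument, so no skill can reach it.
def merge_allowed(ticket, critic_blocked):
    """risk:high parks the ticket; a critic block stops the merge."""
    return ticket.risk != "high" and not critic_blocked


tickets = [
    Ticket(1, "low", False, 5),
    Ticket(2, "high", True, 1),
    Ticket(3, "low", True, 3),
]
chosen = select_next(tickets, POLICY)
print(chosen.id, merge_allowed(chosen, critic_blocked=False))  # 3 True
```

Because `merge_allowed` never receives the policy, no edit to `POLICY` can change its answer; that is one structural way to make "a safety gate is NEVER read from a skill" hold by construction rather than by convention.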

Safe incremental arc (each slice falsifiable, gates untouched)

  • S1 (zero behavior risk): extract the CURRENT hardcoded tick policy into an explicit, declarative maestro-policy skill (a tree/document), i.e. the literal 'maestro as a skill' written down. No runtime change yet; the tick still executes Python. Adversarial check: on a fixed fixture, the policy doc round-trips to the same decisions the tick makes.
  • S2: introduce a policy-interpreter seam for ONE soft decision (next-ticket selection). The tick consults the policy skill, with the hardcoded rule as fallback; prove parity and measure. Gates stay hard.
  • S3: make that policy skill evolvable: orchestration outcomes are harvested and curated into refined policy cards (meta-learning), reusing the substrate of #458 (EPIC: learned skill-tree, procedural memory the maestro retrieves and agents evolve).
  • Invariant for every slice: removing/garbling a policy skill degrades to the hardcoded default; a safety gate is NEVER read from a skill (enforced by a test that a tampered policy cannot bypass the merge/risk gate).
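A sketch of the S2 seam together with the S1 parity check and the every-slice invariant. The names (`hardcoded_next`, `policy_next`, `next_ticket`, `merge_gate`, `tick`) and the policy format (`{"decision": ..., "order_by": ...}`) are hypothetical, and the "oldest ticket first" rule is a placeholder, not the tick's real rule. The assertions at the end play the role of the adversarial tests the slices call for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    id: int
    risk: str
    age_ticks: int


SOFT_FIELDS = {"age_ticks", "id"}  # the only fields a policy may rank on


def hardcoded_next(tickets):
    """Placeholder for today's Python rule: oldest ticket first."""
    return max(tickets, key=lambda t: t.age_ticks, default=None)


def policy_next(tickets, policy):
    """Interpret a policy skill; raise ValueError if it is missing or malformed."""
    if not isinstance(policy, dict) or policy.get("decision") != "next_ticket":
        raise ValueError("unusable policy")
    field = policy.get("order_by")
    if not isinstance(field, str) or field not in SOFT_FIELDS:
        raise ValueError(f"unknown field {field!r}")
    return max(tickets, key=lambda t: getattr(t, field), default=None)


def next_ticket(tickets, policy):
    """S2 seam: consult the policy skill, degrade to the hardcoded default."""
    try:
        return policy_next(tickets, policy)
    except ValueError:
        return hardcoded_next(tickets)


def merge_gate(ticket, critic_blocked):
    """Mechanism: hardcoded, never reads the policy."""
    return ticket.risk != "high" and not critic_blocked


def tick(tickets, policy, critic_blocked=False):
    """One illustrative tick: soft choice from the policy, hard gate from code."""
    chosen = next_ticket(tickets, policy)
    if chosen is None:
        return None, False
    return chosen, merge_gate(chosen, critic_blocked)


FIXTURE = [Ticket(1, "low", 2), Ticket(2, "high", 7), Ticket(3, "low", 4)]
CURRENT_POLICY = {"decision": "next_ticket", "order_by": "age_ticks"}

# S1 parity: the written-down policy reproduces the hardcoded decision.
assert next_ticket(FIXTURE, CURRENT_POLICY) == hardcoded_next(FIXTURE)
# Invariant: a missing or garbled policy degrades to the default.
assert next_ticket(FIXTURE, None) == hardcoded_next(FIXTURE)
garbled = {"decision": "next_ticket", "order_by": "__class__"}
assert next_ticket(FIXTURE, garbled) == hardcoded_next(FIXTURE)
# Invariant: a tampered policy cannot bypass the risk gate.
tampered = {"decision": "next_ticket", "order_by": "age_ticks", "merge_gate": "off"}
assert tick(FIXTURE, tampered) == (Ticket(2, "high", 7), False)
```

The allow-list (`SOFT_FIELDS`) is one way to keep a policy from naming arbitrary attributes; anything it rejects falls through to the hardcoded rule instead of failing the tick.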

Relation to #458

Reuses the skill-card store, tags, supersession, and harvest/inject/curate. Adds a 'kind' or area namespace to separate orchestration-policy skills from repo-procedure skills. Built BY hand and carefully (it is the loop's own brain, so the self-loop stays stopped), to the adversarial bar; this is not a big-bang rewrite.
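The issue leaves open whether the namespace is a 'kind' field or an area; the sketch below assumes a `kind` field on each card and shows retrieval filtered by kind, with supersession hiding cards that curation has replaced. `SkillCard`, `SkillStore` and the two kind strings are hypothetical and are not #458's actual schema; the store is an in-memory stand-in.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional

KINDS = {"orchestration-policy", "repo-procedure"}  # hypothetical namespaces


@dataclass(frozen=True)
class SkillCard:
    id: str
    kind: str
    tags: FrozenSet[str]
    body: str
    superseded_by: Optional[str] = None


class SkillStore:
    """In-memory stand-in for the skill-card store."""

    def __init__(self):
        self._cards = {}

    def add(self, card):
        if card.kind not in KINDS:
            raise ValueError(f"unknown kind {card.kind!r}")
        self._cards[card.id] = card

    def supersede(self, old_id, new_card):
        """Curate: add the refined card and mark the old one as superseded."""
        self.add(new_card)
        self._cards[old_id] = replace(self._cards[old_id], superseded_by=new_card.id)

    def retrieve(self, kind, tag):
        """Inject: live (non-superseded) cards of one kind carrying the tag."""
        return [c for c in self._cards.values()
                if c.kind == kind and tag in c.tags and c.superseded_by is None]


store = SkillStore()
store.add(SkillCard("p1", "orchestration-policy", frozenset({"next-ticket"}), "oldest first"))
store.add(SkillCard("r1", "repo-procedure", frozenset({"pr"}), "run tests before opening a PR"))
store.supersede("p1", SkillCard("p2", "orchestration-policy",
                                frozenset({"next-ticket"}), "repairs first, then oldest"))
print([c.id for c in store.retrieve("orchestration-policy", "next-ticket")])  # ['p2']
```

Keeping the kind on the card, rather than in a separate store, is one way to let the same harvest/inject/curate code serve both kinds while retrieval for the maestro never mixes in a worker's repo procedures.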

Metadata

Assignees

No one assigned

    Labels

    epic (Multi-PR umbrella tracking a major theme)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests