Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 92 additions & 0 deletions docs/design/81-skill-tree.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# Skill-Tree — learned procedural memory the maestro grows and retrieves

Status: in implementation (branch `feat/skill-tree`). Epic: hadamrd/forge-loop#458.

## Problem

Every worker starts cold: it re-derives repo conventions, file maps, and test
recipes on every ticket — burning tokens, making inconsistent choices, and
repeating known mistakes. The biggest faster+cheaper+more-accurate lever is to
**retrieve learned procedure instead of re-deriving it**.

## What already exists (substrate — reuse, do not duplicate)

- `memory/store.py` `SqliteMemoryStore` at `.forge/memory.db` — `put` / `get` /
`list_active(kind=)` / `supersede`.
- `memory/models.py` `MemoryItem` (kind/title/body/provenance/tags/superseded_by),
`MemoryKind.PROCEDURAL`, `derive_skill_key(failing_signal, target)`,
`skill_tag` / `skill_from_tags`, tag-prefix pattern (`axis:`, `skill:`).
- `runner/learning.py` `record_procedural_skill(...)` — writes a procedural card
and supersedes the prior card of the same skill-key (lineage preserved).
- Episodic memory is already harvested on merge (`record_merged_outcomes` via
`runner/tick.py::_record_merged_memory`) and injected into briefs
(`worker_brief.py::_render_prior_episodes`).

**Gap:** procedural skills are never harvested, never retrieved/injected, and
have no tree structure.

## Design

### Tree structure (leaves + internal nodes)

Each card carries an **area path** as a tag: `area:<path>` (e.g.
`area:pulsar-node/http-route`). The `/`-delimited path IS the tree.

- **Leaf** = a concrete procedure card (has both a `skill:<sig>` tag and an
`area:<path>` tag). Body schema (plain text sections): `trigger`, `procedure`
(file map + template + test recipe), `pitfalls`.
- **Internal node** = a generalized card for an area subtree (has an
`area-node` marker tag + `area:<path>`), distilled from ≥N descendant leaves.

New helpers in `memory/models.py` (mirror `axis_tag`): `AREA_TAG_PREFIX`,
`area_tag(path)`, `area_from_tags(tags)`, `AREA_NODE_TAG`.

### Lifecycle

1. **Harvest** (`skill_librarian.py` + `runner/learning.py`): after a
critic-clean merge, a one-shot LLM "librarian" distills the merged diff +
acceptance criteria into a `SkillCard(area, trigger, procedure, pitfalls,
failing_signal, target, confidence)`. `harvest_skills_from_merge(...)` is pure
and injectable (the LLM call is a passed-in callable) so the wiring is unit
tested without network. It calls `record_procedural_skill` with the `area:`
tag and provenance carrying the merged SHA in `evidence_refs`. Emits
`skill_harvested`.
2. **Retrieve + inject** (`memory/skills.py` + `worker_brief.py`):
`retrieve_skills_for(store, query, *, k)` ranks active procedural cards by
area/title/tag token overlap with the ticket, walking most-specific-leaf →
ancestor-node, capped at `k` (token budget). `render_skill_section(...)`
formats them; injected into `make_brief` and `make_repair_brief`. Emits
`skill_injected` (rank) per card — enables measuring the turns/tokens win.
v1 retrieval is deterministic keyword/area ranking; a clean seam is left for
semantic (Lumen) retrieval as a fast-follow (NOT in this PR).
3. **Curate/evolve** (`memory/skills.py`): `expire_stale_skills(...)` supersedes
cards whose proof-SHA is absent from the current history / older than a
horizon; `promote_internal_nodes(store, *, min_leaves)` distills an
internal-node card once an area has ≥`min_leaves` leaves. Emits
`skill_expired` / `skill_promoted`.

### Traps engineered around

- **Stale > empty:** every card carries provenance SHA + confidence; expiry is
mandatory. Adversarial test: an expired card is not retrieved.
- **Retrieval is the hard part:** ranking is the real work; honest v1 is
keyword/area, semantic is a seam.
- **Contradiction sprawl:** one librarian writes; supersession (existing)
preserves lineage; no free-for-all.

## Events (new, `events.py`)

`SkillHarvestedEvent`, `SkillInjectedEvent`, `SkillPromotedEvent`,
`SkillExpiredEvent`.

## Testing

Pure modules (models helpers, ranking, expiry, promotion, harvest-wiring with an
injected fake librarian) are fully unit-tested against a real
`SqliteMemoryStore` (`tmp_path`). Each behavior has an adversarial test: remove
the harvest call / the inject / the expiry filter and a test goes red. The
LLM-dependent distiller is isolated behind a callable seam and tested with a fake.

## Out of scope (this PR)

Semantic/Lumen retrieval (seam only); a UI surface; cross-repo skill sharing.
47 changes: 47 additions & 0 deletions src/forge_loop/events.py
Original file line number Diff line number Diff line change
Expand Up @@ -445,6 +445,53 @@ class WorkerPolicyEnforcedEvent(EventBase):
withheld_secrets: list[str] = Field(default_factory=list)


@register_event
class SkillHarvestedEvent(EventBase):
"""A procedural skill card was distilled from a critic-clean merge."""

KIND: ClassVar[str] = "skill_harvested"
issue: int = Field(ge=1)
area: str = ""
skill_key: str = ""
memory_id: str = ""
sha: str = ""
confidence: float = Field(ge=0.0, le=1.0, default=1.0)


@register_event
class SkillInjectedEvent(EventBase):
"""A skill card was retrieved and injected into a worker brief.

``rank`` is its position in the retrieval ranking (0 = best match) so the
measurement track can correlate which cards actually drove a cheaper run.
"""

KIND: ClassVar[str] = "skill_injected"
issue: int = Field(ge=1)
memory_id: str = ""
area: str = ""
rank: int = Field(ge=0, default=0)


@register_event
class SkillPromotedEvent(EventBase):
"""Leaves under an area were generalised into an internal-node card."""

KIND: ClassVar[str] = "skill_promoted"
area: str = ""
memory_id: str = ""
leaf_count: int = Field(ge=1, default=1)


@register_event
class SkillExpiredEvent(EventBase):
"""A stale skill card was superseded/retired (its proof SHA aged out)."""

KIND: ClassVar[str] = "skill_expired"
memory_id: str = ""
reason: str = ""


# ---------------------------------------------------------------------------
# Emit + back-compat shim. ``emit`` is the typed path; ``append_event_with_
# registry_check`` is the back-compat wrapper called by state.append_event.
Expand Down
44 changes: 44 additions & 0 deletions src/forge_loop/memory/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,50 @@ def skill_from_tags(tags: tuple[str, ...]) -> str:
return ""


#: Tag prefix carrying a procedural skill's *area path* — its address in the
#: skill tree (e.g. ``area:pulsar-node/http-route``). The ``/``-delimited path
#: IS the tree: a leaf sits at a full path; an internal node summarises a
#: subtree prefix. Mirrors ``AXIS_TAG_PREFIX`` / ``SKILL_TAG_PREFIX``.
AREA_TAG_PREFIX = "area:"

#: Marker tag distinguishing an *internal node* card (a generalised pattern
#: distilled over an area subtree) from a *leaf* card (a concrete procedure).
AREA_NODE_TAG = "area-node"

#: Marker tag retiring a skill card whose proof commit has aged out of history
#: (rebased/squashed away). An expired card stays in the store for provenance
#: but is filtered out of retrieval — a confidently-stale recipe is worse than
#: none. Distinct from supersession, which means "replaced by a better version".
EXPIRED_TAG = "expired"


def area_tag(path: str) -> str:
"""Render an ``area:<path>`` tag for ``path`` (stripped)."""
return f"{AREA_TAG_PREFIX}{path.strip()}"


def area_from_tags(tags: tuple[str, ...]) -> str:
"""Extract the area path from an ``area:<path>`` tag, or ``""`` if absent."""
for tag in tags:
if tag.startswith(AREA_TAG_PREFIX):
return tag[len(AREA_TAG_PREFIX) :]
return ""


def area_ancestors(path: str) -> tuple[str, ...]:
"""Return the area path and its ancestors, most-specific first.

``"a/b/c"`` → ``("a/b/c", "a/b", "a")`` — the retrieval walk order: the most
specific leaf area first, then its ancestor internal nodes. A blank path
yields ``()``. Leading/trailing slashes are ignored.
"""
cleaned = path.strip().strip("/")
if not cleaned:
return ()
segments = cleaned.split("/")
return tuple("/".join(segments[:i]) for i in range(len(segments), 0, -1))


def derive_memory_id(source_key: str, *, prefix: str) -> str:
"""Derive a stable ``memory_id`` from a source key.

Expand Down
194 changes: 194 additions & 0 deletions src/forge_loop/memory/skills.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
"""Retrieve and render learned procedural skills (the skill tree) for a brief.

Retrieval is deliberately deterministic for v1: a card is ranked by token
overlap between the ticket text and the card's *area path* (weighted heavily,
since the area IS the tree address) and its title/body. A clean seam is left for
a future semantic (embedding) ranker — :func:`retrieve_skills_for` is the single
entry point a Lumen-backed implementation would replace, with the same shape.

Cards that overlap nothing with the query are NOT returned: injecting an
irrelevant or stale recipe is worse than injecting none.
"""

from __future__ import annotations

import re
from collections import defaultdict
from collections.abc import Callable
from dataclasses import dataclass, replace

from forge_loop.memory.models import (
AREA_NODE_TAG,
EXPIRED_TAG,
MemoryItem,
MemoryKind,
area_from_tags,
area_tag,
)
from forge_loop.memory.store import MemoryStore

__all__ = [
"PromotedNode",
"expire_stale_skills",
"promote_internal_nodes",
"render_skill_section",
"retrieve_skills_for",
]

_COMMIT_PREFIX = "commit:"

#: An area-path token match counts for more than an incidental body-word match:
#: the tree address is the strongest relevance signal.
_AREA_WEIGHT = 3
_MIN_TOKEN_LEN = 3
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def _tokens(text: str) -> set[str]:
return {tok for tok in _TOKEN_RE.findall(text.lower()) if len(tok) >= _MIN_TOKEN_LEN}


def _area_tokens(area: str) -> set[str]:
# Split the path on "/" and "-" so "pulsar-node/http-route" contributes
# {pulsar, node, http, route}.
return _tokens(area.replace("/", " ").replace("-", " "))


def _score(item: MemoryItem, query_tokens: set[str]) -> int:
area = area_from_tags(item.tags)
area_score = _AREA_WEIGHT * len(query_tokens & _area_tokens(area))
text_score = len(query_tokens & _tokens(f"{item.title} {item.body}"))
return area_score + text_score


def retrieve_skills_for(store: MemoryStore, query: str, *, k: int = 3) -> tuple[MemoryItem, ...]:
"""Return up to ``k`` active procedural skill cards most relevant to ``query``.

Ranks active ``PROCEDURAL`` cards by area-weighted token overlap; cards with
zero overlap are excluded. Ties break by confidence (desc) then insertion
order (stable). An empty/blank query returns ``()``.
"""
query_tokens = _tokens(query)
if not query_tokens:
return ()
scored = [
(score, item)
for item in store.list_active(kind=MemoryKind.PROCEDURAL)
if EXPIRED_TAG not in item.tags and (score := _score(item, query_tokens)) > 0
]
# Stable sort: equal (score, confidence) keeps list_active's insertion order.
scored.sort(key=lambda si: (si[0], si[1].provenance.confidence), reverse=True)
return tuple(item for _score_value, item in scored[:k])


def render_skill_section(items: tuple[MemoryItem, ...]) -> str:
"""Render retrieved skill cards as a brief section, or ``""`` if none."""
if not items:
return ""
lines = [
"## Learned skills for this repo (retrieved from past merges)",
(
"Distilled from prior successful, reviewed changes. Apply when "
"relevant, but verify against current code — they can go stale."
),
"",
]
for item in items:
area = area_from_tags(item.tags) or "general"
provenance = ", ".join(item.provenance.evidence_refs) or "—"
lines.append(f"### {area} (confidence {item.provenance.confidence:.2f})")
lines.append(item.body)
lines.append(f"_provenance: {provenance}_")
lines.append("")
return "\n".join(lines).rstrip() + "\n"


def _commit_shas(item: MemoryItem) -> tuple[str, ...]:
return tuple(
ref[len(_COMMIT_PREFIX) :]
for ref in item.provenance.evidence_refs
if ref.startswith(_COMMIT_PREFIX)
)


def expire_stale_skills(store: MemoryStore, *, live_shas: set[str]) -> tuple[str, ...]:
"""Retire procedural cards whose proof commit is no longer in history.

A card carries its proof as ``commit:<sha>`` evidence. When none of a card's
proof commits are in ``live_shas`` (the set of commits currently reachable in
the repo), the change it was distilled from has been rebased/squashed away —
the recipe may no longer match the code, so the card is tagged
:data:`~forge_loop.memory.models.EXPIRED_TAG` and dropped from retrieval. A
card with no commit provenance cannot be judged and is left untouched.
Returns the ``memory_id`` of each card expired this pass.
"""
expired: list[str] = []
for item in store.list_active(kind=MemoryKind.PROCEDURAL):
if EXPIRED_TAG in item.tags:
continue
shas = _commit_shas(item)
if not shas:
continue
if any(sha in live_shas for sha in shas):
continue
store.put(replace(item, tags=(*item.tags, EXPIRED_TAG)))
expired.append(item.memory_id)
return tuple(expired)


@dataclass(frozen=True)
class PromotedNode:
"""Descriptor of one internal node promoted from sibling leaves."""

memory_id: str
area: str
leaf_count: int


def promote_internal_nodes(
store: MemoryStore,
*,
min_leaves: int,
summarize: Callable[[tuple[MemoryItem, ...]], str],
) -> tuple[PromotedNode, ...]:
"""Generalise sibling leaves into an internal-node card per parent area.

Groups active *leaf* cards (a leaf has an ``area:`` tag and is neither an
:data:`~forge_loop.memory.models.AREA_NODE_TAG` node nor expired) by their
parent area path. When a parent gathers at least ``min_leaves`` leaves, the
injected ``summarize`` distils their shared pattern into the body of an
internal-node card filed at the parent path (tagged ``AREA_NODE_TAG``).
Internal nodes never count as leaves, so promotion does not cascade upward.
Idempotent: re-running upserts the node in place (stable per parent).
"""
from forge_loop.runner.learning import record_procedural_skill

groups: dict[str, list[MemoryItem]] = defaultdict(list)
for item in store.list_active(kind=MemoryKind.PROCEDURAL):
if AREA_NODE_TAG in item.tags or EXPIRED_TAG in item.tags:
continue
area = area_from_tags(item.tags)
if not area:
continue
parent = "/".join(area.split("/")[:-1])
if parent:
groups[parent].append(item)

promoted: list[PromotedNode] = []
for parent, leaves in groups.items():
if len(leaves) < min_leaves:
continue
body = summarize(tuple(leaves))
memory_id = record_procedural_skill(
store,
failing_signal="",
target=parent,
title=f"{parent} (pattern)",
body=body,
source_key=f"node:{parent}",
source_task_ref=f"area:{parent}",
extra_tags=(area_tag(parent), AREA_NODE_TAG),
confidence=0.7,
)
promoted.append(PromotedNode(memory_id, parent, len(leaves)))
return tuple(promoted)
Loading
Loading