CASCADE

Constraint-Adapted Subtask Cascade for Agentic Decomposition & Execution

Get frontier-quality work from small, local language models by growing a knowledge graph, with a person in the loop, until every step is small enough to do reliably.

A local command-line tool that decomposes hard, fuzzy tasks into tiny, verified steps a small model can actually finish — with you steering through an Obsidian vault.

Status: in active development. The engine, the knowledge-first two-stage pipeline, and the Obsidian human-in-the-loop layer are built, and their tests pass on the model-free path (the whole engine is testable with no model in the loop). What is still being hardened is end-to-end convergence with a real small model: keeping a small model on the rails across a long live run is the open hard problem, and that path is opt-in today.

Works today: build from source, then cascade init, cascade solve, cascade vault, and cascade mcp (serve CASCADE to an external agent over the Model Context Protocol). Coming: a larger-model elevation path, and packaged release binaries for macOS / Windows / Linux.

What is CASCADE?

CASCADE gets real work out of small language models running on your own machine (via Ollama) instead of a large, metered frontier API — by shrinking every step until it is small enough to do reliably, and keeping a person in the loop to supply the judgment a small model can't.

The durable bet: a small model fails when the decision entropy of a single step exceeds its working-memory headroom — not because the overall task is big. So the fix is mechanical: shrink each step until the decisions it forces are within reach (often down to "write one function with one test"), and it becomes reliable.

The pivot that defines CASCADE today: a small model cannot enumerate its own ignorance — it doesn't know what it doesn't know. So design, decisions, and the unknown-unknowns are offloaded to a person (and, later, to larger models), and the model elevates when it gets stuck — it asks for help loudly instead of thrashing or faking progress. Human-in-the-loop is the standard operating mode here, not an escape hatch for failures.

One engine, two modes: "split this" and "do this" are the same constrained, schema-validated, low-entropy call; only the node's type and the schema its output is checked against differ. The bet within the bet: producing a valid decomposition is far lower-entropy than producing a good plan. The grammar takes away the freedom a small model cannot handle.
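A minimal Go sketch of the "same call, different schema" idea follows. Everything in it is hypothetical: `Kind`, `Node`, `schemaFor`, and `validate` are illustrative names, and the toy "schema" only checks for required top-level fields, where a real system would use a full JSON Schema or a decoding grammar. What it shows is that splitting and doing share one validation path, and only the node's kind selects the schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Kind distinguishes "split this" from "do this" (hypothetical names).
type Kind string

const (
	Split Kind = "split"
	Do    Kind = "do"
)

// Node is a hypothetical, minimal stand-in for a graph node.
type Node struct {
	ID   string
	Kind Kind
}

// schemaFor returns the top-level fields an output must contain for a kind.
// This toy required-fields check is an assumption, not a real JSON Schema.
func schemaFor(k Kind) []string {
	switch k {
	case Split:
		return []string{"children"}
	case Do:
		return []string{"result"}
	}
	return nil
}

// validate is the single shared step: parse the model's raw output and check
// it against the schema selected by the node's kind. Nothing else differs.
func validate(n Node, raw []byte) error {
	var out map[string]json.RawMessage
	if err := json.Unmarshal(raw, &out); err != nil {
		return fmt.Errorf("node %s: output is not a JSON object: %w", n.ID, err)
	}
	required := schemaFor(n.Kind)
	if required == nil {
		return fmt.Errorf("node %s: unknown kind %q", n.ID, n.Kind)
	}
	for _, field := range required {
		if _, ok := out[field]; !ok {
			return fmt.Errorf("node %s: missing required field %q", n.ID, field)
		}
	}
	return nil
}

func main() {
	split := Node{ID: "t1", Kind: Split}
	do := Node{ID: "t2", Kind: Do}
	fmt.Println(validate(split, []byte(`{"children":["a","b"]}`))) // <nil>
	fmt.Println(validate(do, []byte(`{"children":[]}`)) != nil)   // true
}
```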

How decomposition works: two stages, two dimensions

CASCADE does not build one big plan upfront. It works in two stages across two linked graphs.

Stage 1 — Interrogate to a knowledge graph. Decomposition begins by asking, not planning: ask a question → answer it → each answer raises sub-questions → recurse → exhaust a branch → back up. Research questions are answered by grounding (fetching and probing real docs/APIs); capability questions ("can this machine do X?") by a tool registry. Unknown-unknowns surface naturally as answers raise new questions. The result is a durable knowledge.json.
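The ask, answer, recurse, back-up loop is a depth-first traversal. The Go sketch below assumes a hypothetical `Answerer` callback standing in for grounding and the tool registry, and a hypothetical `maxQuestions` budget in the spirit of `--max-questions` (its real semantics may differ). It records each fact together with the question that raised it, which is the edge the knowledge graph needs.

```go
package main

import "fmt"

// Fact is a hypothetical knowledge-graph node.
type Fact struct {
	Question string
	Answer   string
	Parent   string // the question that raised this one; "" for the root
}

// Answerer stands in for grounding (docs/APIs) or the tool registry.
// It returns an answer plus any sub-questions that answer raises.
type Answerer func(q string) (answer string, subQuestions []string)

// interrogate grows the graph depth-first: ask, answer, recurse into the
// sub-questions the answer raises, and back up when a branch is exhausted.
func interrogate(root string, ans Answerer, maxQuestions int) []Fact {
	var facts []Fact
	seen := map[string]bool{}
	var visit func(q, parent string)
	visit = func(q, parent string) {
		if len(facts) >= maxQuestions || seen[q] {
			return // budget spent, or this question was already asked
		}
		seen[q] = true
		a, subs := ans(q)
		facts = append(facts, Fact{Question: q, Answer: a, Parent: parent})
		for _, s := range subs {
			visit(s, q)
		}
	}
	visit(root, "")
	return facts
}

func main() {
	// Canned answers so the example is deterministic.
	canned := map[string][]string{
		"Build a weather app?": {"Which weather API?", "Which UI?"},
		"Which weather API?":   {"Does it need a key?"},
		"Does it need a key?":  nil,
		"Which UI?":            nil,
	}
	ans := func(q string) (string, []string) { return "answer to " + q, canned[q] }
	for _, f := range interrogate("Build a weather app?", ans, 10) {
		fmt.Printf("%q <- %q\n", f.Question, f.Parent)
	}
}
```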

Stage 2 — Plan from knowledge, then execute. The task DAG is projected out of the completed knowledge graph — not authored in advance. Tasks then execute bottom-up, and the leaves are agentic: they write files, run commands, and call cached tools. The facts gathered in Stage 1 flow down into each task as its grounding context.
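Executing bottom-up over a task DAG means running each task only after everything it depends on. A common way to get that order is Kahn's topological sort, sketched here in Go; it is a stand-in technique, not necessarily how CASCADE schedules work, and `executionOrder` and the task names in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// executionOrder returns a bottom-up order for a task DAG: every task comes
// after all the tasks it depends on. deps maps task ID -> IDs it depends on.
func executionOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for task, ds := range deps {
		if _, ok := indegree[task]; !ok {
			indegree[task] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[task]++
			dependents[d] = append(dependents[d], task)
		}
	}
	var ready []string
	for t, n := range indegree {
		if n == 0 {
			ready = append(ready, t)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order among ready tasks
		t := ready[0]
		ready = ready[1:]
		order = append(order, t)
		for _, next := range dependents[t] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("task graph has a cycle")
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"app":        {"fetch-api", "render-ui"},
		"render-ui":  {"parse-json"},
		"fetch-api":  {"parse-json"},
		"parse-json": nil,
	}
	order, err := executionOrder(deps)
	fmt.Println(order, err) // [parse-json fetch-api render-ui app] <nil>
}
```

Sorting the ready set keeps the order deterministic; a parallel executor would instead run every ready task at once.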

Two cross-linked dimensions. There are two graphs — a knowledge dimension (questions and facts) and an execution dimension (tasks) — cross-indexed by unique ID in both directions. A task knows the facts that ground it; a fact knows the tasks it informs. A task's context is its deepest linked knowledge node's path-to-root: the specific fact plus the chain that situates it, and no more.
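Here is a Go sketch of the two-way cross-index and the path-to-root rule. `KNode`, `Task`, `groundingContext`, and `reverseIndex` are hypothetical names, and the knowledge graph is assumed to be a tree with no cycles. A task's context is its deepest linked fact plus that fact's ancestors, root first, and nothing else.

```go
package main

import (
	"fmt"
	"strings"
)

// KNode is a hypothetical knowledge node: a fact with a parent link.
type KNode struct {
	ID     string
	Parent string // "" for the root
	Text   string
}

// Task is a hypothetical task that links to the knowledge nodes grounding it.
type Task struct {
	ID        string
	Knowledge []string
}

// depth counts edges from a node up to the root.
func depth(graph map[string]KNode, id string) int {
	d := 0
	for graph[id].Parent != "" {
		id = graph[id].Parent
		d++
	}
	return d
}

// groundingContext picks the task's deepest linked knowledge node and returns
// its path-to-root, root first: the specific fact plus the chain that
// situates it.
func groundingContext(graph map[string]KNode, t Task) []string {
	deepest, best := "", -1
	for _, id := range t.Knowledge {
		if d := depth(graph, id); d > best {
			deepest, best = id, d
		}
	}
	var path []string
	for id := deepest; id != ""; id = graph[id].Parent {
		path = append([]string{graph[id].Text}, path...)
	}
	return path
}

// reverseIndex builds the other direction: fact ID -> IDs of tasks it informs.
func reverseIndex(tasks []Task) map[string][]string {
	idx := map[string][]string{}
	for _, t := range tasks {
		for _, k := range t.Knowledge {
			idx[k] = append(idx[k], t.ID)
		}
	}
	return idx
}

func main() {
	graph := map[string]KNode{
		"k1": {ID: "k1", Text: "Build a weather app"},
		"k2": {ID: "k2", Parent: "k1", Text: "Use a public forecast API"},
		"k3": {ID: "k3", Parent: "k2", Text: "The API returns JSON"},
	}
	t := Task{ID: "t1", Knowledge: []string{"k2", "k3"}}
	fmt.Println(strings.Join(groundingContext(graph, t), " > "))
	fmt.Println(reverseIndex([]Task{t})["k3"]) // [t1]
}
```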

Two principles keep deep decomposition safe:

Verify before you descend — every split is checked (schema, then meaning) before any child is spawned, which resets per-step reliability at each level instead of letting it decay. Verify before you ascend — pieces that don't fit are caught by a separate reviewer as they recombine on the way back up.
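The two gates can be sketched as one recursive function. In this Go sketch the hooks in `Decomposer` are hypothetical placeholders for the model and the verifiers, and the toy domain (a "task" is a string; one character is atomic) exists only to make it runnable. A split is checked before any child is solved, and the recombined parts are checked before the parent accepts them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Decomposer bundles hypothetical hooks; none of these names come from CASCADE.
type Decomposer struct {
	IsAtomic      func(task string) bool
	Split         func(task string) []string
	VerifySplit   func(task string, children []string) error // schema, then meaning
	Execute       func(task string) string
	VerifyCombine func(task string, parts []string) error // the separate reviewer
}

// Solve checks every split before spawning any child (verify before you
// descend) and checks the children's results as they recombine (verify before
// you ascend). Hitting maxDepth is treated as "stuck" and returned as an error.
func (d Decomposer) Solve(task string, depth, maxDepth int) (string, error) {
	if d.IsAtomic(task) {
		return d.Execute(task), nil
	}
	if depth >= maxDepth {
		return "", fmt.Errorf("%q: depth limit reached, elevate", task)
	}
	children := d.Split(task)
	if err := d.VerifySplit(task, children); err != nil {
		return "", fmt.Errorf("%q: split rejected before descending: %w", task, err)
	}
	parts := make([]string, 0, len(children))
	for _, c := range children {
		r, err := d.Solve(c, depth+1, maxDepth)
		if err != nil {
			return "", err
		}
		parts = append(parts, r)
	}
	if err := d.VerifyCombine(task, parts); err != nil {
		return "", fmt.Errorf("%q: pieces rejected on the way up: %w", task, err)
	}
	return strings.Join(parts, ""), nil
}

func main() {
	d := Decomposer{
		// Toy domain: a task is a string, and one character is atomic.
		IsAtomic: func(t string) bool { return len(t) <= 1 },
		Split: func(t string) []string {
			mid := len(t) / 2
			return []string{t[:mid], t[mid:]}
		},
		VerifySplit: func(t string, cs []string) error {
			for _, c := range cs {
				if len(c) == 0 || len(c) >= len(t) {
					return errors.New("each child must be non-empty and smaller than its parent")
				}
			}
			return nil
		},
		Execute: strings.ToUpper,
		VerifyCombine: func(t string, parts []string) error {
			if len(strings.Join(parts, "")) != len(t) {
				return errors.New("recombined result does not cover the task")
			}
			return nil
		},
	}
	fmt.Println(d.Solve("abcd", 0, 5)) // ABCD <nil>
}
```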

And because structure is durable while conversation is disposable, the whole graph lives as JSON on disk (knowledge.json, tree.json). No model context is load-bearing — a run survives crashes, can be put down and picked up later, and can be rendered by any external tool. That last property is exactly what makes the Obsidian round-trip possible.
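One standard way to make a JSON file like tree.json survive a crash mid-write is write-to-temp-then-rename. The Go sketch below shows that technique; whether CASCADE's store does exactly this is an assumption, and the `Graph` type, `save`, `load`, and `roundTrip` are hypothetical stand-ins for the real schema and store.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Graph is a hypothetical, trimmed-down on-disk shape; the real tree.json and
// knowledge.json schemas are not reproduced here.
type Graph struct {
	Nodes map[string]string `json:"nodes"`
}

// save encodes to a temp file in the same directory, syncs it, then renames it
// over the target. On POSIX filesystems the rename is atomic, so a reader sees
// either the old file or the new one, never a half-written one.
func save(path string, g Graph) error {
	data, err := json.MarshalIndent(g, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless failure once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// load resumes from disk; no model context is needed to pick the run back up.
func load(path string) (Graph, error) {
	var g Graph
	data, err := os.ReadFile(path)
	if err != nil {
		return g, err
	}
	err = json.Unmarshal(data, &g)
	return g, err
}

// roundTrip saves into a throwaway directory and loads back, standing in for
// putting a run down and picking it up later.
func roundTrip(g Graph) (Graph, error) {
	dir, err := os.MkdirTemp("", "cascade-sketch-")
	if err != nil {
		return Graph{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "tree.json")
	if err := save(path, g); err != nil {
		return Graph{}, err
	}
	return load(path)
}

func main() {
	g, err := roundTrip(Graph{Nodes: map[string]string{"t1": "write one function with one test"}})
	fmt.Println(g.Nodes["t1"], err) // write one function with one test <nil>
}
```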

The person in the loop, via Obsidian

The engine walks JSON; you never have to edit JSON. The same graph is projected into an Obsidian vault as linked Markdown notes, because brainstorming, linking, and tagging are native there.

Round-trip engines. A deterministic, idempotent render (JSON → Markdown) and an ingest (Markdown → JSON). Your edits merge in: hand-authored content wins on the fields you own, and the engine-computed fields are never clobbered. cascade vault ingest --watch folds your edits into the graph live (one-way, Markdown → JSON) until you stop it; re-run render to refresh the notes.
Graph semantics in the vault. [[links]] are hard directional edges (parent / raises / depends / knowledge); #tags are soft topical labels; kind, status, and atomicity live in frontmatter. Reserved tags like #needs-human and #atomic are a comfort layer that the engine normalizes back into structured fields.
Elevate-when-stuck → a Resolver. When the model dead-ends, blocks, or exhausts its budget, it raises #needs-human in the vault instead of thrashing. A Resolver is whoever resolves it (a person, a larger model, or an external service), and all of them sit behind one common seam. Two calls are explicitly the person's, not the model's: "Is this atomic enough to just execute?" and "Do we keep decomposing this knowledge, or stop here and start executing?"
Iterative and online. Neither you nor the model solves it in one shot. You can jump in mid-run — add a fact, retag, relink — and the model picks up from the changed graph. Growing the graph live, together, is the point.
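The merge and tag rules can be sketched in Go as follows. The split between person-owned fields (`Title`, `Body`, `Tags`) and engine-computed ones (`Status`) is a hypothetical illustration, as are the `Node`, `normalize`, and `merge` names; CASCADE's real field ownership is defined in its specs. Reserved tags are folded into structured flags and removed from the soft tag list.

```go
package main

import "fmt"

// Node is a hypothetical graph node. Which fields a person owns and which the
// engine computes is an illustrative assumption.
type Node struct {
	// Person-owned: edits made in the vault win on ingest.
	Title string
	Body  string
	Tags  []string // soft topical labels, e.g. "#api"
	// Engine-computed: ingest never overwrites this.
	Status string
	// Structured flags that reserved tags are normalized into.
	Atomic     bool
	NeedsHuman bool
}

// normalize folds the reserved comfort-layer tags into structured flags and
// drops them from the soft tag list.
func normalize(n Node) Node {
	var soft []string
	for _, t := range n.Tags {
		switch t {
		case "#needs-human":
			n.NeedsHuman = true
		case "#atomic":
			n.Atomic = true
		default:
			soft = append(soft, t)
		}
	}
	n.Tags = soft
	return n
}

// merge applies a note ingested from the vault to the current graph node:
// person-owned fields come from the note, engine-computed fields are kept.
func merge(current, fromVault Node) Node {
	out := current
	out.Title = fromVault.Title
	out.Body = fromVault.Body
	out.Tags = fromVault.Tags
	return normalize(out)
}

func main() {
	current := Node{Title: "Fetch forecast", Status: "running"}
	edited := Node{
		Title:  "Fetch 7-day forecast",
		Body:   "Use metric units.",
		Tags:   []string{"#api", "#needs-human"},
		Status: "done", // a hand edit to an engine field: ignored
	}
	m := merge(current, edited)
	fmt.Println(m.Title, m.Status, m.NeedsHuman, m.Tags) // Fetch 7-day forecast running true [#api]
}
```

This sketch only raises flags from tags; a real ingest also needs a rule for clearing them when a tag is removed.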

How a run goes

Start from a concept — a Markdown note or a one-line ask becomes the seed.
Interrogate — the model asks, grounds its answers against real sources, and recurses, growing the knowledge graph.
Elevate when stuck — #needs-human surfaces in Obsidian; a person answers, links, tags, or says "that's enough — start executing."
Plan — the task DAG is projected from the knowledge graph.
Execute — bottom-up, agentic leaves do the work; every result is reviewed by a separate step before it is accepted.
Learn back (in progress) — facts discovered while executing flow back into the knowledge graph, so later tasks and reruns start smarter.
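The "one common seam" for elevation maps naturally onto a Go interface. In this sketch `Stuck`, `Resolver`, `scripted`, and `chain` are hypothetical names, and trying resolvers in order is only an illustrative policy. The point is that the engine calls one method whether a person, a larger model, or an external service answers.

```go
package main

import (
	"errors"
	"fmt"
)

// Stuck says why a node was elevated (hypothetical type).
type Stuck struct {
	NodeID string
	Reason string // e.g. "dead end", "blocked", "budget exhausted"
}

// Resolver is the one seam: a person, a larger model, or an external service
// can sit behind it, and the engine does not need to know which.
type Resolver interface {
	Resolve(s Stuck) (string, error)
}

// scripted stands in for a person answering in the vault or an agent over MCP.
type scripted map[string]string

func (r scripted) Resolve(s Stuck) (string, error) {
	if a, ok := r[s.NodeID]; ok {
		return a, nil
	}
	return "", errors.New("no answer yet")
}

// chain asks each resolver in turn; the ordering policy is illustrative only.
type chain []Resolver

func (c chain) Resolve(s Stuck) (string, error) {
	for _, r := range c {
		if a, err := r.Resolve(s); err == nil {
			return a, nil
		}
	}
	return "", fmt.Errorf("node %s still needs a human (%s)", s.NodeID, s.Reason)
}

func main() {
	var r Resolver = chain{scripted{}, scripted{"q7": "Stop here and start executing."}}
	fmt.Println(r.Resolve(Stuck{NodeID: "q7", Reason: "blocked"}))
}
```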

The person is the third gate, alongside verify before you descend and verify before you ascend — for the calls neither automated check can make.

Quick start

Requirements

Ollama installed and running, plus a small local model (cascade init pulls one for you).
Go 1.26+ to build from source. macOS, Windows, or Linux.
Obsidian, recommended for the human-in-the-loop workflow. It is optional; the CLI review gates work without it.

Install

Install the cascade binary straight from GitHub (lands in $(go env GOPATH)/bin — make sure that's on your PATH):

go install github.com/robinonsay/cascade/cmd/cascade@latest

Or build from source

git clone https://github.com/robinonsay/cascade
cd cascade
go build ./cmd/cascade

Packaged release binaries are coming. Until then, go install or build from source.

Commands

cascade init                         # detect Ollama, report installed models, create the cache + config
cascade solve                        # the nominal run: interrogate → knowledge graph → review → decompose + execute
cascade vault render --base <dir> --vault <dir>          # graph → Obsidian markdown notes
cascade vault ingest --watch --base <dir> --vault <dir>  # live one-way: your markdown edits → graph

cascade solve takes flags to point it at your own task (--root <file.md>), bound the interrogation (--max-questions <n>), tune the review gate (--review-depth <n>), turn on grounding sources (--project-root <dir>, --web-search), control the live stream (--quiet/--verbose), and auto-approve (--yes). Every flag and why it exists is in the command reference.
The human-in-the-loop workflow is meant to be attended and mediated through files in the vault, not run headless with --yes. Reserve --yes for demos and CI.

New here? Read the User Guide. It walks you from install → first run → steering in Obsidian → reading the result, with a full weather-app tutorial.

Drive it from an agent (MCP + a Claude Code skill)

A capable agent can hand a big, decomposable task to CASCADE instead of burning its own context on it — CASCADE decomposes and executes on the small local model while the agent spends a handful of tool calls and plays the Resolver (the role a person plays at the vault). Register the stdio MCP server with any MCP client:

{ "mcpServers": { "cascade": { "command": "cascade", "args": ["mcp"] } } }

It exposes four tools — solve, advance (the agent owns the knowledge gate: approve the graph or keep deepening it), resolve (answer a stuck node), and inspect. See the MCP server reference for the full contract.
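For orientation, here is a Go sketch of the messages an MCP client writes to `cascade mcp` over stdio. The framing (one JSON-RPC 2.0 message per line) and the `initialize`, `notifications/initialized`, `tools/list`, `tools/call` sequence come from the MCP specification. The `{"task": ...}` argument passed to `solve` is a hypothetical placeholder: the real input schemas are in the MCP server reference and are discoverable at runtime via `tools/list`. The protocol version string is one published MCP revision, chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rpc is a JSON-RPC 2.0 message. Notifications omit the id.
type rpc struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

func request(id int, method string, params any) rpc {
	return rpc{JSONRPC: "2.0", ID: &id, Method: method, Params: params}
}

// session builds the opening messages a client sends, one JSON object per line.
func session(task string) (string, error) {
	msgs := []rpc{
		request(1, "initialize", map[string]any{
			"protocolVersion": "2024-11-05",
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]string{"name": "sketch-client", "version": "0.0.1"},
		}),
		{JSONRPC: "2.0", Method: "notifications/initialized"},
		request(2, "tools/list", nil),
		// Hypothetical arguments: read the real input schema from tools/list.
		request(3, "tools/call", map[string]any{
			"name":      "solve",
			"arguments": map[string]any{"task": task},
		}),
	}
	var b strings.Builder
	for _, m := range msgs {
		line, err := json.Marshal(m)
		if err != nil {
			return "", err
		}
		b.Write(line)
		b.WriteByte('\n')
	}
	return b.String(), nil
}

func main() {
	out, err := session("build a weather app")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```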

Using Claude Code? Drop the bundled using-cascade skill into your project so Claude knows how to drive a run (grow the knowledge graph, check it for accuracy, give detailed resolutions). From your own repo:

mkdir -p .claude/skills/using-cascade
curl -fsSL https://raw.githubusercontent.com/robinonsay/cascade/main/.claude/skills/using-cascade/SKILL.md \
  -o .claude/skills/using-cascade/SKILL.md

Then ask Claude to "use cascade to solve …" and it will follow the operator playbook. The skill is a single self-contained file — copy it anywhere .claude/skills/ is read.

Why use it

Local & private. Your task, code, and data never leave your machine; the models run on Ollama.
Cheap. Small local models instead of per-token frontier API calls — cost scales with your hardware, not your token count.
Reliable on hard work. Verification gates at every level keep quality from decaying as a task is broken down.
A collaboration, not an autopilot. You supply judgment, decisions, and the unknown-unknowns; the small model does the narrow, well-specified work. You stay in the driver's seat.
Gets better over time. Tools it builds are cached per-machine and reused, so common work (fetch a URL, parse some JSON, verify a result) isn't rewritten from scratch.
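A per-machine tool cache can be as simple as a content-addressed map. In this Go sketch `ToolSpec`, `key`, `Cache`, and `NewCache` are hypothetical, the normalization is deliberately minimal, and the cache lives in memory, whereas a real per-machine cache would persist to disk. The key is a SHA-256 of the normalized spec, so the same request maps to the same cached tool and `build` runs only on a miss.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// ToolSpec is a hypothetical description of a small generated tool.
type ToolSpec struct {
	Name     string // e.g. "fetch-url"
	Contract string // what it takes and returns
}

// key derives a stable cache key from the spec, so the same request maps to
// the same cached tool. Normalization here is deliberately minimal.
func key(s ToolSpec) string {
	norm := strings.ToLower(strings.TrimSpace(s.Name)) + "\x00" + strings.TrimSpace(s.Contract)
	sum := sha256.Sum256([]byte(norm))
	return hex.EncodeToString(sum[:])
}

// Cache maps keys to tool source; build is only called on a miss.
type Cache struct {
	tools  map[string]string
	builds int // how many times a tool actually had to be built
}

func NewCache() *Cache { return &Cache{tools: map[string]string{}} }

func (c *Cache) Get(s ToolSpec, build func(ToolSpec) string) string {
	k := key(s)
	if src, ok := c.tools[k]; ok {
		return src // reuse instead of rewriting from scratch
	}
	src := build(s)
	c.builds++
	c.tools[k] = src
	return src
}

func main() {
	c := NewCache()
	build := func(s ToolSpec) string { return "// generated source for " + s.Name }
	c.Get(ToolSpec{Name: "fetch-url", Contract: "url -> body"}, build)
	c.Get(ToolSpec{Name: " Fetch-URL ", Contract: "url -> body"}, build)
	fmt.Println("builds:", c.builds) // builds: 1
}
```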

It's for someone who wants real output from local models on tasks too fuzzy for one-shot prompting — and who is willing to steer, brainstorming and correcting in Obsidian, rather than expecting full autonomy.

Roadmap

The engine was built in phases: Phase 0 skeleton (no model) → 1 local execution via Ollama → 2 runnable verification of every result → 3 escalation + parallel execution → 4 templates + tool cache → 5 reliability study. Two later layers cut across those phases and define what the system is today:

Knowledge-first interrogation — the two-stage rearchitecture: interrogate to a knowledge graph, then plan and execute from it.
Obsidian-mediated co-development — the human-in-the-loop surface and the JSON⇔Markdown round-trip engines.

A third layer is now delivered:

MCP delivery surface — cascade mcp serves the engine to an external agent, which drives the run and owns the knowledge gate (the agent is to MCP what a person is to the vault). See the MCP server reference.

Near-future:

Larger-model elevation — the non-person Resolver path: hand a stuck node to a bigger model.
Packaged release binaries + onboarding — cross-platform distribution.
A co-evolving graph — execution teaches the knowledge graph, and the two are traversed asymmetrically (knowledge breadth-first, tasks depth-first).
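The planned asymmetric traversal contrasts two standard orders. This Go sketch (hypothetical `bfs` and `dfs` helpers, a tree with no cycles assumed) shows breadth-first over knowledge, which surveys every sibling question before going deeper, and depth-first over tasks, which finishes one branch before starting the next.

```go
package main

import "fmt"

// bfs visits level by level: all sibling questions before any grandchildren.
func bfs(children map[string][]string, root string) []string {
	var order []string
	queue := []string{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		order = append(order, n)
		queue = append(queue, children[n]...)
	}
	return order
}

// dfs finishes one subtask branch completely before starting the next.
func dfs(children map[string][]string, root string) []string {
	var order []string
	var visit func(n string)
	visit = func(n string) {
		order = append(order, n)
		for _, c := range children[n] {
			visit(c)
		}
	}
	visit(root)
	return order
}

func main() {
	tree := map[string][]string{"root": {"a", "b"}, "a": {"a1", "a2"}, "b": {"b1"}}
	fmt.Println("knowledge (BFS):", bfs(tree, "root")) // [root a b a1 a2 b1]
	fmt.Println("tasks (DFS):", dfs(tree, "root"))     // [root a a1 a2 b b1]
}
```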

The headline demo — building a working weather app from a one-line request — is the milestone the whole system is measured against.

Learn more

Using CASCADE — the User Guide:

Installation — dependencies, build, cascade init.
Quickstart — your first run in five minutes.
Command reference — every command and flag, and why it exists.
The workflow — how a run goes, stage by stage.
Steering in Obsidian — the vault round-trip, giving feedback, elevation.
Watching & results — the live stream and reading the output.
Tutorial: build a weather app — end to end.
Working on a project over time — many tasks, one growing graph.

The design — the specs:

docs/cascade-spec.md — the architecture (the what).
docs/cascade-spec-v0.2-additions.md — agentic execution, the tool cache, MCP, and the two-phase control flow.

For contributors: cmd/cascade is the CLI; internal/* holds the components (engine, knowledge, interrogation, plan, vault, elevation, verifier, assembler, adapter, store, …); docs/ holds the specs. The on-disk graph is plain JSON, renderable by any external tool.

License

MIT
