robinonsay/cascade

CASCADE

Constraint-Adapted Subtask Cascade for Agentic Decomposition & Execution

Get frontier-quality work from small, local language models — by growing a
knowledge graph with a person in the loop until every step is small enough to do reliably.

A local command-line tool that decomposes hard, fuzzy tasks into tiny, verified steps a small model can actually finish — with you steering through an Obsidian vault.


Status — in active development. The engine, the knowledge-first two-stage pipeline, and the Obsidian human-in-the-loop layer are built and green on the model-free path (the whole engine is testable with no model in the loop). What is still being hardened is end-to-end convergence with a real small model — keeping a small model on-rails across a long live run is the open hard part, and that path is opt-in today.

Works today: build from source, then cascade init, cascade solve, cascade vault, and cascade mcp (serve CASCADE to an external agent over the Model Context Protocol). Coming: a larger-model elevation path, and packaged release binaries for macOS / Windows / Linux.

What is CASCADE?

CASCADE solving the weather-app demo: a knowledge graph growing in an Obsidian vault while the engine interrogates and executes

CASCADE gets real work out of small language models running on your own machine (via Ollama) instead of a large, metered frontier API — by shrinking every step until it is small enough to do reliably, and keeping a person in the loop to supply the judgment a small model can't.

The durable bet: a small model fails when the decision entropy of a single step exceeds its working-memory headroom — not because the overall task is big. So the fix is mechanical: shrink each step until the decisions it forces are within reach (often down to "write one function with one test"), and it becomes reliable.

The pivot that defines CASCADE today: a small model cannot enumerate its own ignorance — it doesn't know what it doesn't know. So design, decisions, and the unknown-unknowns are offloaded to a person (and, later, to larger models), and the model elevates when it gets stuck — it asks for help loudly instead of thrashing or faking progress. Human-in-the-loop is the standard operating mode here, not an escape hatch for failures.

One engine, two modes: "split this" and "do this" are the same constrained, schema-validated, low-entropy call — only the node's type and the schema its output is checked against differ. The bet within the bet: producing a valid decomposition is far lower-entropy than producing a good plan. The grammar removes the freedom small models lack.
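To make the "same call, different schema" idea concrete, here is a minimal sketch. Every name in it (`Kind`, `splitOutput`, `doOutput`, `validate`, `call`) is hypothetical and the two output shapes are invented for illustration; CASCADE's real schemas live in its specs. The point is only that one code path serves both modes and the node kind selects the validator.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Kind is a hypothetical node type: a node is either split or executed.
type Kind string

const (
	Split Kind = "split"
	Do    Kind = "do"
)

// splitOutput and doOutput are illustrative shapes, not CASCADE's schemas.
type splitOutput struct {
	Children []string `json:"children"`
}

type doOutput struct {
	Result string `json:"result"`
}

// validate checks raw model output against the schema selected by kind.
func validate(kind Kind, raw []byte) error {
	switch kind {
	case Split:
		var out splitOutput
		if err := json.Unmarshal(raw, &out); err != nil {
			return err
		}
		if len(out.Children) == 0 {
			return errors.New("split must produce at least one child")
		}
	case Do:
		var out doOutput
		if err := json.Unmarshal(raw, &out); err != nil {
			return err
		}
		if out.Result == "" {
			return errors.New("do must produce a result")
		}
	default:
		return fmt.Errorf("unknown kind %q", kind)
	}
	return nil
}

// call is the single engine path: the same invocation for both modes, with
// only the schema differing. model stands in for the constrained model call.
func call(kind Kind, prompt string, model func(Kind, string) []byte) ([]byte, error) {
	raw := model(kind, prompt)
	if err := validate(kind, raw); err != nil {
		return nil, fmt.Errorf("%s output rejected: %w", kind, err)
	}
	return raw, nil
}

func main() {
	// A canned stand-in for the model, so the sketch runs with no model at all.
	fake := func(k Kind, _ string) []byte {
		if k == Split {
			return []byte(`{"children":["fetch forecast","render forecast"]}`)
		}
		return []byte(`{"result":"wrote fetch.go"}`)
	}
	for _, k := range []Kind{Split, Do} {
		out, err := call(k, "weather app", fake)
		fmt.Println(k, string(out), err)
	}
}
```
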

How decomposition works: two stages, two dimensions

CASCADE does not build one big plan upfront. It works in two stages across two linked graphs.

Stage 1 — Interrogate to a knowledge graph. Decomposition begins by asking, not planning: ask a question → answer it → each answer raises sub-questions → recurse → exhaust a branch → back up. Research questions are answered by grounding (fetching and probing real docs/APIs); capability questions ("can this machine do X?") by a tool registry. Unknown-unknowns surface naturally as answers raise new questions. The result is a durable knowledge.json.
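The ask, answer, recurse, back-up loop can be sketched as a depth-first walk with a shared question budget. This is an illustration under assumptions, not the engine's code: `Question`, `Answerer`, `interrogate`, and `countAnswered` are hypothetical names, and the budget only loosely mirrors what `--max-questions` is described as doing.

```go
package main

import "fmt"

// Question is a hypothetical knowledge-graph node: a question, its answer,
// and the sub-questions that answer raised.
type Question struct {
	Text     string
	Answer   string
	Children []*Question
}

// Answerer stands in for grounding or the tool registry: it returns an answer
// plus any follow-up questions that answer raises.
type Answerer func(question string) (answer string, followUps []string)

// interrogate answers q, then recurses depth-first into each follow-up until a
// branch raises nothing new or the shared budget runs out.
func interrogate(q *Question, ask Answerer, budget *int) {
	if *budget <= 0 {
		return // out of budget: the question stays open for a person to resolve
	}
	*budget--
	answer, followUps := ask(q.Text)
	q.Answer = answer
	for _, text := range followUps {
		child := &Question{Text: text}
		q.Children = append(q.Children, child)
		interrogate(child, ask, budget) // returning from here is the "back up" step
	}
}

// countAnswered returns how many questions in the subtree have an answer.
func countAnswered(q *Question) int {
	n := 0
	if q.Answer != "" {
		n = 1
	}
	for _, c := range q.Children {
		n += countAnswered(c)
	}
	return n
}

func main() {
	// Canned follow-ups so the sketch runs without a model or network access.
	canned := map[string][]string{
		"How do we show the weather?":   {"Which API provides forecasts?", "Can this machine make HTTP requests?"},
		"Which API provides forecasts?": {"Does the API need a key?"},
	}
	ask := func(q string) (string, []string) {
		return "answer to: " + q, canned[q]
	}
	root := &Question{Text: "How do we show the weather?"}
	budget := 10
	interrogate(root, ask, &budget)
	fmt.Println("answered:", countAnswered(root), "budget left:", budget)
}
```

Questions left unanswered when the budget runs out remain in the tree, which is one plausible place for a person to pick them up.
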

The two linked dimensions: a knowledge graph of questions and facts cross-indexed to an execution graph of tasks

Stage 2 — Plan from knowledge, then execute. The task DAG is projected out of the completed knowledge graph — not authored in advance. Tasks then execute bottom-up, and the leaves are agentic: they write files, run commands, and call cached tools. The facts gathered in Stage 1 flow down into each task as its grounding context.

The task DAG projected out of the completed knowledge graph, then executed bottom-up
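Bottom-up execution of a task DAG amounts to running each task only after everything it depends on. A minimal sequential sketch, assuming a hypothetical `Task` type with a `DependsOn` list (the real `tree.json` layout is not shown in this README, and the real engine may schedule differently):

```go
package main

import "fmt"

// Task is a hypothetical execution-graph node; DependsOn lists the IDs of the
// tasks that must finish first.
type Task struct {
	ID        string
	DependsOn []string
}

// executeBottomUp runs every task reachable from rootID after all of its
// dependencies, using a depth-first post-order walk. It returns the order in
// which tasks ran.
func executeBottomUp(tasks map[string]Task, rootID string, run func(Task) error) ([]string, error) {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int)
	var order []string

	var visit func(id string) error
	visit = func(id string) error {
		switch state[id] {
		case done:
			return nil // already executed via another parent
		case inProgress:
			return fmt.Errorf("cycle detected at task %q", id)
		}
		task, ok := tasks[id]
		if !ok {
			return fmt.Errorf("unknown task %q", id)
		}
		state[id] = inProgress
		for _, dep := range task.DependsOn {
			if err := visit(dep); err != nil {
				return err
			}
		}
		if err := run(task); err != nil {
			return fmt.Errorf("task %q failed: %w", id, err)
		}
		state[id] = done
		order = append(order, id)
		return nil
	}

	if err := visit(rootID); err != nil {
		return order, err
	}
	return order, nil
}

func main() {
	tasks := map[string]Task{
		"app":    {ID: "app", DependsOn: []string{"fetch", "render"}},
		"fetch":  {ID: "fetch"},
		"render": {ID: "render", DependsOn: []string{"fetch"}},
	}
	order, err := executeBottomUp(tasks, "app", func(t Task) error {
		fmt.Println("executing", t.ID)
		return nil
	})
	fmt.Println(order, err)
}
```

A shared dependency such as `fetch` runs once even though two tasks need it, and a cycle is reported instead of recursing forever.
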

Two cross-linked dimensions. There are two graphs — a knowledge dimension (questions and facts) and an execution dimension (tasks) — cross-indexed by unique ID in both directions. A task knows the facts that ground it; a fact knows the tasks it informs. A task's context is its deepest linked knowledge node's path-to-root: the specific fact plus the chain that situates it, and no more.

A task's grounding context: its deepest linked knowledge node's path-to-root
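The path-to-root rule can be sketched as a walk up parent links. The types and function names here (`KnowledgeNode`, `pathToRoot`, `groundingContext`) are hypothetical, the sample facts are illustrative, and "deepest" is read as "longest chain to the root", which is an assumption about the engine's tie-breaking.

```go
package main

import (
	"fmt"
	"strings"
)

// KnowledgeNode is a hypothetical knowledge-graph node keyed by unique ID.
type KnowledgeNode struct {
	ID       string
	ParentID string // empty for the root
	Text     string
}

// pathToRoot walks parent links from id up to the root and returns the chain
// root-first.
func pathToRoot(nodes map[string]KnowledgeNode, id string) ([]KnowledgeNode, error) {
	var chain []KnowledgeNode
	seen := make(map[string]bool)
	for id != "" {
		if seen[id] {
			return nil, fmt.Errorf("parent cycle at %q", id)
		}
		seen[id] = true
		n, ok := nodes[id]
		if !ok {
			return nil, fmt.Errorf("unknown knowledge node %q", id)
		}
		chain = append(chain, n)
		id = n.ParentID
	}
	// The walk collected leaf-first; reverse so the root comes first.
	for i, j := 0, len(chain)-1; i < j; i, j = i+1, j-1 {
		chain[i], chain[j] = chain[j], chain[i]
	}
	return chain, nil
}

// groundingContext picks the deepest of a task's linked knowledge nodes and
// renders only that node's chain, indented by depth.
func groundingContext(nodes map[string]KnowledgeNode, linked []string) (string, error) {
	var deepest []KnowledgeNode
	for _, id := range linked {
		chain, err := pathToRoot(nodes, id)
		if err != nil {
			return "", err
		}
		if len(chain) > len(deepest) {
			deepest = chain
		}
	}
	lines := make([]string, len(deepest))
	for i, n := range deepest {
		lines[i] = strings.Repeat("  ", i) + n.Text
	}
	return strings.Join(lines, "\n"), nil
}

func main() {
	nodes := map[string]KnowledgeNode{
		"q1": {ID: "q1", Text: "Q: How do we show the weather?"},
		"q2": {ID: "q2", ParentID: "q1", Text: "Q: Which API provides forecasts?"},
		"f1": {ID: "f1", ParentID: "q2", Text: "F: (illustrative) the chosen API returns JSON"},
	}
	ctx, err := groundingContext(nodes, []string{"q2", "f1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ctx)
}
```

Sibling branches never enter the context, which is what keeps a task's prompt small.
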

Two principles keep deep decomposition safe:

  • Verify before you descend — every split is checked (schema, then meaning) before any child is spawned, which resets per-step reliability at each level instead of letting it decay.
  • Verify before you ascend — pieces that don't fit are caught by a separate reviewer as they recombine on the way back up.
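A rough sketch of the descend gate: run the cheap structural check first, then the semantic one, and spawn children only when both pass. The names (`Split`, `Check`, `schemaCheck`, `meaningCheck`, `descend`) and the specific rules (at least two distinct children, no child restating its parent) are assumptions made for illustration. In CASCADE the meaning check is a review step, not a string comparison.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Split is a hypothetical proposed decomposition of one parent node.
type Split struct {
	Parent   string
	Children []string
}

// Check is one verification stage; a nil error means the split passed.
type Check func(Split) error

// schemaCheck is the cheap structural gate: at least two distinct, non-empty
// children (an assumed rule).
func schemaCheck(s Split) error {
	if len(s.Children) < 2 {
		return errors.New("a split needs at least two children")
	}
	seen := make(map[string]bool)
	for _, c := range s.Children {
		c = strings.TrimSpace(c)
		if c == "" {
			return errors.New("empty child")
		}
		if seen[c] {
			return fmt.Errorf("duplicate child %q", c)
		}
		seen[c] = true
	}
	return nil
}

// meaningCheck stands in for the semantic review. Here it only rejects a child
// that restates its parent, since such a split makes no progress.
func meaningCheck(s Split) error {
	for _, c := range s.Children {
		if strings.EqualFold(strings.TrimSpace(c), strings.TrimSpace(s.Parent)) {
			return fmt.Errorf("child %q restates the parent", c)
		}
	}
	return nil
}

// descend runs the checks in order (schema, then meaning) and spawns children
// only if every check passes.
func descend(s Split, spawn func(child string)) error {
	for _, check := range []Check{schemaCheck, meaningCheck} {
		if err := check(s); err != nil {
			return fmt.Errorf("split of %q rejected: %w", s.Parent, err)
		}
	}
	for _, c := range s.Children {
		spawn(c)
	}
	return nil
}

func main() {
	spawn := func(child string) { fmt.Println("spawned:", child) }
	good := Split{Parent: "build weather app", Children: []string{"fetch forecast", "render forecast"}}
	bad := Split{Parent: "build weather app", Children: []string{"Build weather app", "render forecast"}}
	fmt.Println(descend(good, spawn))
	fmt.Println(descend(bad, spawn))
}
```

Because nothing is spawned until the whole split passes, a rejected split leaves the graph unchanged and can be retried or elevated.
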

And because structure is durable while conversation is disposable, the whole graph lives as JSON on disk (knowledge.json, tree.json). No model context is load-bearing — a run survives crashes, can be put down and picked up later, and can be rendered by any external tool. That last property is exactly what makes the Obsidian round-trip possible.
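One common way to make an on-disk graph survive a crash mid-write is to write to a temporary file and rename it into place. The sketch below shows that technique with a simplified, hypothetical `Graph` type. Whether CASCADE's store does exactly this is an assumption; the README states only that the graph lives as JSON on disk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Graph is a simplified, hypothetical stand-in for the on-disk graph.
type Graph struct {
	Nodes map[string]string `json:"nodes"` // node ID -> status
}

// save writes the graph to a temp file in the same directory and renames it
// into place, so a crash mid-write does not leave a truncated file at path.
func save(path string, g Graph) error {
	data, err := json.MarshalIndent(g, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	// Cleans up on failure; after a successful rename the temp name is gone
	// and the resulting error is ignored.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// load reads the graph back; a fresh process needs nothing but the file.
func load(path string) (Graph, error) {
	var g Graph
	data, err := os.ReadFile(path)
	if err != nil {
		return g, err
	}
	err = json.Unmarshal(data, &g)
	return g, err
}

// roundTrip saves g into a scratch directory and loads it back, standing in
// for "put the run down, pick it up later".
func roundTrip(g Graph) (Graph, error) {
	dir, err := os.MkdirTemp("", "cascade-sketch-")
	if err != nil {
		return Graph{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "tree.json")
	if err := save(path, g); err != nil {
		return Graph{}, err
	}
	return load(path)
}

func main() {
	g, err := roundTrip(Graph{Nodes: map[string]string{
		"fetch-forecast":  "done",
		"render-forecast": "pending",
	}})
	fmt.Println(g.Nodes, err)
}
```
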

The person in the loop, via Obsidian

Knowledge graph of the weather-app example

The engine walks JSON; you never have to edit JSON. The same graph is projected into an Obsidian vault as linked Markdown notes, because brainstorming, linking, and tagging are native there.

  • Round-trip engines. A deterministic, idempotent render (JSON → Markdown) and an ingest (Markdown → JSON). Your edits merge in: hand-authored content wins on the fields you own, and the engine-computed fields are never clobbered. cascade vault ingest --watch folds your edits into the graph live (one-way, Markdown → JSON) until you stop it; re-run render to refresh the notes.
  • Graph semantics in the vault. [[links]] are hard directional edges (parent / raises / depends / knowledge); #tags are soft topical labels; kind, status, and atomicity live in frontmatter. Reserved tags like #needs-human and #atomic are a comfort layer that the engine normalizes back into structured fields.
  • Elevate-when-stuck → a Resolver. When the model dead-ends, blocks, or exhausts its budget, it raises #needs-human in the vault instead of thrashing. A Resolver is whoever resolves it — a person, a larger model, or an external service — behind one common seam. Two calls are explicitly the person's, not the model's: is this atomic enough to just execute? and do we keep decomposing this knowledge, or stop here and start executing?
  • Iterative and online. Neither you nor the model solves it in one shot. You can jump in mid-run — add a fact, retag, relink — and the model picks up from the changed graph. Growing the graph live, together, is the point.
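The merge rule and the reserved-tag normalization described in the list above can be sketched together. Everything here is an assumption made for illustration: the `Note` type, which fields count as human-owned versus engine-computed, and the `Depth` field itself. Only the tag names `needs-human` and `atomic` come from this README.

```go
package main

import "fmt"

// Note is a hypothetical merged view of one graph node.
type Note struct {
	// Human-owned fields (assumed split; the real ownership list may differ).
	Body string
	Tags []string
	// Structured fields that reserved tags are normalized into.
	Status string
	Atomic bool
	// Engine-computed field (hypothetical); an edit must never overwrite it.
	Depth int
}

// merge folds a note parsed from Markdown into the engine's node.
func merge(engine, edited Note) Note {
	out := engine          // start from engine state so computed fields survive
	out.Body = edited.Body // human-owned: the edit wins
	out.Tags = nil
	for _, tag := range edited.Tags {
		switch tag {
		case "needs-human":
			out.Status = "needs-human" // reserved tag becomes a structured field
		case "atomic":
			out.Atomic = true
		default:
			out.Tags = append(out.Tags, tag) // soft topical label, kept as-is
		}
	}
	return out
}

func main() {
	engine := Note{Body: "Which API provides forecasts?", Status: "open", Depth: 2}
	edited := Note{
		Body:  "Which free API provides hourly forecasts?",
		Tags:  []string{"weather", "atomic"},
		Depth: 99, // a stray edit to a computed field is ignored by merge
	}
	fmt.Printf("%+v\n", merge(engine, edited))
}
```

Merging the same edit a second time yields the same node, which is the property an idempotent ingest needs.
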

How a run goes

  1. Start from a concept — a Markdown note or a one-line ask becomes the seed.
  2. Interrogate — the model asks, grounds its answers against real sources, and recurses, growing the knowledge graph.
  3. Elevate when stuck — #needs-human surfaces in Obsidian; a person answers, links, tags, or says "that's enough — start executing."
  4. Plan — the task DAG is projected from the knowledge graph.
  5. Execute — bottom-up, agentic leaves do the work; every result is reviewed by a separate step before it is accepted.
  6. Learn back (in progress) — facts discovered while executing flow back into the knowledge graph, so later tasks and reruns start smarter.

The person is the third gate, alongside verify before you descend and verify before you ascend — for the calls neither automated check can make.

Quick start

Requirements

  • Ollama installed and running, plus a small local model (cascade init pulls one for you).
  • Go 1.26+ to build from source. macOS, Windows, or Linux.
  • Obsidian is recommended for the human-in-the-loop workflow but optional; the CLI review gates work without it.

Install

Install the cascade binary straight from GitHub (lands in $(go env GOPATH)/bin — make sure that's on your PATH):

go install github.com/robinonsay/cascade/cmd/cascade@latest

Or build from source

git clone https://github.com/robinonsay/cascade
cd cascade
go build ./cmd/cascade

Packaged release binaries are coming. Until then, go install or build from source.

Commands

cascade init                         # detect Ollama, report installed models, create the cache + config
cascade solve                        # the nominal run: interrogate → knowledge graph → review → decompose + execute
cascade vault render --base <dir> --vault <dir>          # graph → Obsidian markdown notes
cascade vault ingest --watch --base <dir> --vault <dir>  # live one-way: your markdown edits → graph
  • cascade solve takes flags to point it at your own task (--root <file.md>), bound the interrogation (--max-questions <n>), tune the review gate (--review-depth <n>), turn on grounding sources (--project-root <dir>, --web-search), control the live stream (--quiet/--verbose), and auto-approve (--yes). Every flag and why it exists is in the command reference.
  • The human-in-the-loop workflow is meant to be attended and file-mediated through the vault, not run headless with --yes. --yes is for demos and CI.

New here? Read the User Guide. It walks you from install → first run → steering in Obsidian → reading the result, with a full weather-app tutorial.

Drive it from an agent (MCP + a Claude Code skill)

A capable agent can hand a big, decomposable task to CASCADE instead of burning its own context on it — CASCADE decomposes and executes on the small local model while the agent spends a handful of tool calls and plays the Resolver (the role a person plays at the vault). Register the stdio MCP server with any MCP client:

{ "mcpServers": { "cascade": { "command": "cascade", "args": ["mcp"] } } }

It exposes four tools — solve, advance (the agent owns the knowledge gate: approve the graph or keep deepening it), resolve (answer a stuck node), and inspect. See the MCP server reference for the full contract.
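For orientation, this is roughly what a tool invocation looks like on the wire. MCP clients call tools with a JSON-RPC 2.0 request whose method is `tools/call`; the sketch builds one such message per tool name. The arguments are deliberately left empty because each tool's argument schema is defined in the MCP server reference, not here, and a real client also performs the MCP initialization handshake before calling tools. `toolCall`, `toolCallParams`, and `newToolCall` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is a JSON-RPC 2.0 request for MCP's "tools/call" method.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// toolCallParams names the tool and carries its arguments.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newToolCall encodes one request. args is left to the caller because the
// real argument schema for each CASCADE tool is not shown in this README.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	if args == nil {
		args = map[string]any{} // encode as {} rather than null
	}
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
}

func main() {
	for i, tool := range []string{"solve", "advance", "resolve", "inspect"} {
		msg, err := newToolCall(i+1, tool, nil)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(msg))
	}
}
```

In practice an MCP client library handles this framing; the sketch only shows that the four tools share one request shape and differ by name and arguments.
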

Using Claude Code? Drop the bundled using-cascade skill into your project so Claude knows how to drive a run (grow the knowledge graph, check it for accuracy, give detailed resolutions). From your own repo:

mkdir -p .claude/skills/using-cascade
curl -fsSL https://raw.githubusercontent.com/robinonsay/cascade/main/.claude/skills/using-cascade/SKILL.md \
  -o .claude/skills/using-cascade/SKILL.md

Then ask Claude to "use cascade to solve …" and it will follow the operator playbook. The skill is a single self-contained file — copy it anywhere .claude/skills/ is read.

Why use it

  • Local & private. Your task, code, and data never leave your machine; the models run on Ollama.
  • Cheap. Small local models instead of per-token frontier API calls — cost scales with your hardware, not your token count.
  • Reliable on hard work. Verification gates at every level keep quality from decaying as a task is broken down.
  • A collaboration, not an autopilot. You supply judgment, decisions, and the unknown-unknowns; the small model does the narrow, well-specified work. You stay in the driver's seat.
  • Gets better over time. Tools it builds are cached per-machine and reused, so common work (fetch a URL, parse some JSON, verify a result) isn't rewritten from scratch.

It's for someone who wants real output from local models on tasks too fuzzy for one-shot prompting — and who is willing to steer, brainstorming and correcting in Obsidian, rather than expecting full autonomy.

Roadmap

The engine was built along a phased axis — Phase 0 skeleton (no model) → 1 local execution via Ollama → 2 runnable verification of every result → 3 escalation + parallel execution → 4 templates + tool cache → 5 reliability study. Two later layers cut across those phases and are what the system is today:

  • Knowledge-first interrogation — the two-stage rearchitecture: interrogate to a knowledge graph, then plan and execute from it.
  • Obsidian-mediated co-development — the human-in-the-loop surface and the JSON⇔Markdown round-trip engines.

A third layer is now delivered:

  • MCP delivery surface — cascade mcp serves the engine to an external agent, which drives the run and owns the knowledge gate (the agent is to MCP what a person is to the vault). See the MCP server reference.

Near-future:

  • Larger-model elevation — the non-person Resolver path: hand a stuck node to a bigger model.
  • Packaged release binaries + onboarding — cross-platform distribution.
  • A co-evolving graph — execution teaches the knowledge graph, and the two are traversed asymmetrically (knowledge breadth-first, tasks depth-first).

The headline demo — building a working weather app from a one-line request — is the milestone the whole system is measured against.

Learn more

Using CASCADE — the User Guide:

The design — the specs:

For contributors: cmd/cascade is the CLI; internal/* holds the components (engine, knowledge, interrogation, plan, vault, elevation, verifier, assembler, adapter, store, …); docs/ holds the specs. The on-disk graph is plain JSON, renderable by any external tool.

License

MIT
