AgenticGoKit · kunalkushwaha · Jun 21, 2026 · Jun 21, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,16 @@
+# Build artifacts
+/agk
+/agk.exe
+/agk-dev
+/dist/
+/build/
+
+# Test/coverage output
+coverage.txt
+coverage.html
+
+# AGK runtime output
+.agk/
+
+# Editor/OS
+.DS_Store
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,197 @@
+# CLAUDE.md
+
+Guidance for working in this repository.
+
+## What this is
+
+**AGK** is the official command-line developer toolchain for **AgenticGoKit**, a Go
+framework for building multi-agent AI systems. AGK is *not* the framework itself —
+it is the CLI that manages the lifecycle of agent projects built *with* the framework.
+
+It is one part of a three-repo ecosystem:
+
+| Part | Repo | Role |
+|------|------|------|
+| **Core framework** | `agenticgokit/agenticgokit` (sibling dir `../agenticgokit`) | The library agents are built with (`v1beta` builder API, workflows, memory, RAG, tools, observability). |
+| **CLI tooling (this repo)** | `agenticgokit/agk` | Scaffold, evaluate, and trace agent projects. |
+| **Template registry** | `agk-templates` | Remote templates that `agk init` can pull. |
+
+Typical user flow: **design** with the framework → **scaffold** with `agk init` →
+**test** with `agk eval` → **observe** with `agk trace`.
+
+- Module: `github.com/agenticgokit/agk`, Go **1.24.1**
+- Depends on the core framework: `github.com/agenticgokit/agenticgokit v0.5.5`
+  (uses its `observability` package directly; generated projects import its `v1beta` API).
+- Entry point: `main.go` → `cmd.Execute()` (Cobra root command `agk`).
+
+## Product vision (five pillars)
+
+The README frames AGK around a lifecycle. Three pillars are built; two are on the roadmap:
+
+1. **Create** ✅ — `init` scaffolding + template registry
+2. **Test** ✅ — `eval` semantic evaluation framework
+3. **Observe** ✅ — `trace` observability (TUI, mermaid, export)
+4. **Distribute** 🔜 — template `pack`/`push` (planned)
+5. **Deploy** 🔜 — `agk deploy` to cloud/k8s/edge (planned)
+
+When asked about "possibilities" or new features, the planned items are: multi-agent
+templates, template distribution (`pack`/`push`), cloud deploy engine, interactive
+init wizard (`agk init -i`), MCP server management, and RAG/knowledge-base management.
+
+## Commands (all under `cmd/`)
+
+| Command | File | What it does |
+|---------|------|--------------|
+| `agk init <name>` | `init.go` | Scaffold a project from a template (`--template`, `--llm`, `--output`, `--force`, `--list`). |
+| `agk run [path]` | `run.go` | `go run .` with tracing on by default; prints a trace summary on exit. Flags: `--watch`, `--no-trace`, `--trace-level`. |
+| `agk template list/add/remove` | `template.go` | Manage the local template cache (pull from GitHub/local/registry). |
+| `agk eval <file.yaml>` | `eval.go` | Run YAML-defined eval tests against a running EvalServer over HTTP. |
+| `agk trace [list/show/view/export/audit/mermaid]` | `trace.go` | Inspect traces stored in `.agk/runs/`. Bare `agk trace` launches the TUI explorer. |
+| `agk version` | `version.go` | Build/version info (injected via ldflags). |
+
+Global flags (in `cmd/root.go`): `--config`, `--verbose`, `--debug`, `--trace`,
+`--trace-exporter` (console|otlp|file), `--trace-endpoint`, `--trace-sample`,
+`--store-prompts`. Config is loaded by Viper from `$HOME/.agk.toml` with env prefix `AGK_`.
+
+## Package layout
+
+```
+cmd/                 Cobra commands (root, init, run, template, eval, trace, version)
+pkg/scaffold/        Project generation
+  template.go            TemplateType, TemplateMetadata, TemplateGenerator interface
+  template_registry.go   Built-in generators: Quickstart, Workflow + provider/model helpers
+  external_generator.go  Renders cached registry templates (text/template + Sprig, ".tmpl" stripped)
+  service.go             Higher-level Service wrapper
+  templates/             Embedded built-in templates (quickstart/, workflow/) as *.tmpl
+pkg/registry/        Template fetching/caching/resolution
+  resolver.go            Resolves "github.com/...", "./local", "@version", or registry name
+  fetcher.go             GitFetcher / LocalFetcher
+  cache.go               CacheManager (local template cache)
+  manifest.go            agk-template.toml schema (TemplateManifest/TemplateInfo/Variable)
+  index.go               Fetches registry index.json (DefaultRegistryURL → agk-templates/registry)
+internal/eval/       Evaluation framework
+  parser.go              Parses + validates YAML test suites
+  types.go               TestSuite/Target/Test/Expectation/SemanticConfig (canonical schema)
+  runner.go              Executes suites against an HTTPTarget
+  http_target.go         Talks to EvalServer: POST /invoke, GET /health
+  matcher.go             MatcherFactory: exact/contains/regex/semantic
+  embedding_matcher.go   Semantic strategy: embedding cosine similarity
+  llm_judge_matcher.go   Semantic strategy: LLM-as-judge
+  hybrid_matcher.go      Semantic strategy: hybrid (both)
+  reporter.go            Output: console/json/junit/markdown
+internal/audit/      Trace → reasoning analysis
+  collector.go           Reads .agk/runs/<id> spans → TraceObject of typed events
+  types.go               EventType (thought/tool_call/observation/llm_call/decision)
+  mermaid.go             Mermaid flowchart generation
+internal/tui/        Bubble Tea TUIs: trace_viewer, span_tree, styles
+internal/config/     agk.toml generator (ProjectConfig → TOML)
+internal/utils/      zerolog logging, filesystem, errors (has the only *_test.go files)
+```
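+
+For orientation, the reference forms that `resolver.go` accepts can be pictured as a small
+dispatch on prefixes plus an optional `@version` suffix. This is a minimal sketch under
+assumed rules; `sourceKind` and `parseSource` are hypothetical names, not the real resolver API:
+
+```go
+package main
+
+import "strings"
+
+// sourceKind is a hypothetical classification of a template reference.
+type sourceKind string
+
+const (
+	kindGit      sourceKind = "git"      // e.g. github.com/org/repo
+	kindLocal    sourceKind = "local"    // e.g. ./my-template
+	kindRegistry sourceKind = "registry" // bare name looked up in index.json
+)
+
+// parseSource splits an optional "@version" suffix, then picks a kind by prefix.
+// The prefix rules are assumptions, not the real resolver.go logic.
+func parseSource(ref string) (kind sourceKind, name, version string) {
+	name = ref
+	if i := strings.LastIndex(ref, "@"); i > 0 {
+		name, version = ref[:i], ref[i+1:]
+	}
+	switch {
+	case strings.HasPrefix(name, "./"), strings.HasPrefix(name, "../"), strings.HasPrefix(name, "/"):
+		kind = kindLocal
+	case strings.HasPrefix(name, "github.com/"):
+		kind = kindGit
+	default:
+		kind = kindRegistry
+	}
+	return kind, name, version
+}
+```
+
+A real resolver would also need to handle Windows paths and other Git hosts; the sketch only
+shows the shape of the dispatch.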
+
+## Built-in templates
+
+Two are compiled in (see `pkg/scaffold/templates/`):
+
+- **quickstart** (⭐, 2 files) — single `main.go` with a hardcoded agent via
+  `v1beta.NewBuilder(...).WithLLM(...).Build()` + streaming.
+- **workflow** (⭐⭐⭐, 3 files) — sequential multi-agent pipeline
+  (researcher → summarizer → formatter) via `NewSequentialWorkflow` + step streaming.
+
+Generated projects import `github.com/agenticgokit/agenticgokit/v1beta` and a provider
+plugin `plugins/llm/<provider>`. Provider→default-model and provider→API-key-env mappings
+live in `template_registry.go` (`getLLMModel`, `getAPIKeyEnv`). Supported `--llm` values:
+`openai` (gpt-4o), `anthropic` (claude-sonnet-4), `ollama` (llama3.2), `azure`.
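+
+The mapping above can be pictured as a lookup table. This is a hypothetical sketch, not the
+real `getLLMModel`/`getAPIKeyEnv` code: model names come from the list above,
+`OPENAI_API_KEY` comes from the README, and `ANTHROPIC_API_KEY` is an assumed env-var name.
+`azure` is left out because no default model is listed for it.
+
+```go
+package main
+
+import "fmt"
+
+// providerDefaults is a hypothetical pairing of default model and API-key env var.
+type providerDefaults struct {
+	Model     string // default model written into the generated main.go
+	APIKeyEnv string // env var the generated project reads ("" = no key needed)
+}
+
+var defaults = map[string]providerDefaults{
+	"openai":    {Model: "gpt-4o", APIKeyEnv: "OPENAI_API_KEY"},
+	"anthropic": {Model: "claude-sonnet-4", APIKeyEnv: "ANTHROPIC_API_KEY"}, // assumed env name
+	"ollama":    {Model: "llama3.2", APIKeyEnv: ""},                        // local server, no key
+}
+
+// lookupProvider returns the defaults for an --llm value, or an error for
+// providers this sketch does not cover.
+func lookupProvider(llm string) (providerDefaults, error) {
+	d, ok := defaults[llm]
+	if !ok {
+		return providerDefaults{}, fmt.Errorf("no defaults for provider %q", llm)
+	}
+	return d, nil
+}
+```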
+
+External/registry templates are rendered by `external_generator.go`, driven by an
+`agk-template.toml` manifest (`pkg/registry/manifest.go`).
+
+## Key conventions & filesystem layout
+
+- **Traces** are written to `.agk/runs/<run-id>/` when `AGK_TRACE=true`:
+  `trace.jsonl` (OTel spans), `events.jsonl`, `manifest.json`.
+  Run IDs are `run-<unixnano>` (see `generateRunID` in `cmd/root.go`).
+- **Eval reports** auto-save to `.agk/reports/eval-report-<timestamp>.md`.
+- `AGK_TRACE_LEVEL` controls capture granularity: `minimal` | `standard` | `detailed`
+  (use `detailed` to capture prompts/responses/tool args for `trace audit`).
+- Observability is OpenTelemetry-based; the file exporter produces the JSONL that the
+  `trace` and `audit` packages parse back.
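+
+A sketch of the run-ID format and per-run file layout described above. `newRunID` and
+`runFiles` are illustrative names; the real helper is `generateRunID` in `cmd/root.go`, and
+its implementation may differ in detail.
+
+```go
+package main
+
+import (
+	"fmt"
+	"path/filepath"
+	"time"
+)
+
+// newRunID follows the documented run-<unixnano> format.
+func newRunID(now time.Time) string {
+	return fmt.Sprintf("run-%d", now.UnixNano())
+}
+
+// runFiles lists the files a traced run is expected to leave in
+// <project>/.agk/runs/<run-id>/.
+func runFiles(projectDir, runID string) []string {
+	dir := filepath.Join(projectDir, ".agk", "runs", runID)
+	names := []string{"trace.jsonl", "events.jsonl", "manifest.json"}
+	paths := make([]string, 0, len(names))
+	for _, n := range names {
+		paths = append(paths, filepath.Join(dir, n))
+	}
+	return paths
+}
+```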
+
+## How `agk eval` actually works (important)
+
+`agk eval` does **not** run the agent in-process. It is an HTTP client. The flow is:
+
+1. The user's agent project runs a `v1beta.EvalServer` (see `../agenticgokit/v1beta/eval_server.go`),
+   exposing `POST /invoke` and `GET /health`.
+2. `agk eval tests.yaml` health-checks the target, then POSTs each test `input` to `/invoke`
+   and matches the returned `output` against the expectation.
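+
+The two HTTP calls can be sketched as a small client. The JSON field names (`input`,
+`output`) and the 30-second timeout are assumptions; the authoritative request/response types
+are in `../agenticgokit/v1beta/eval_types.go`, and `evalClient` is a hypothetical stand-in
+for `internal/eval/http_target.go`.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"time"
+)
+
+// Assumed payload shapes for POST /invoke.
+type invokeRequest struct {
+	Input string `json:"input"`
+}
+
+type invokeResponse struct {
+	Output string `json:"output"`
+}
+
+// evalClient is a hypothetical minimal EvalServer client.
+type evalClient struct {
+	baseURL string
+	client  *http.Client
+}
+
+func newEvalClient(baseURL string) *evalClient {
+	return &evalClient{baseURL: baseURL, client: &http.Client{Timeout: 30 * time.Second}}
+}
+
+// health performs the GET /health preflight before any test runs.
+func (c *evalClient) health() error {
+	resp, err := c.client.Get(c.baseURL + "/health")
+	if err != nil {
+		return fmt.Errorf("eval server unreachable: %w", err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		return fmt.Errorf("health check returned %s", resp.Status)
+	}
+	return nil
+}
+
+// invoke POSTs one test input to /invoke and returns the agent's output.
+func (c *evalClient) invoke(input string) (string, error) {
+	body, err := json.Marshal(invokeRequest{Input: input})
+	if err != nil {
+		return "", err
+	}
+	resp, err := c.client.Post(c.baseURL+"/invoke", "application/json", bytes.NewReader(body))
+	if err != nil {
+		return "", err
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		return "", fmt.Errorf("invoke returned %s", resp.Status)
+	}
+	var out invokeResponse
+	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
+		return "", err
+	}
+	return out.Output, nil
+}
+```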
+
+⚠️ **Doc vs. code mismatch:** the README's eval YAML example uses keys like `evalserver:`,
+`workflow_name:`, and `expected_output:`. The **actual parser** (`internal/eval/types.go` +
+`parser.go`) expects:
+
+```yaml
+name: "Suite name"            # required
+target:                       # required
+  type: http                  # only "http" is supported
+  url: http://localhost:8787
+semantic:                     # optional global config for "semantic" expectations
+  strategy: llm-judge         # llm-judge | embedding | hybrid
+  threshold: 0.7
+  llm: { provider: ollama, model: llama3.2 }
+tests:
+  - name: "..."               # required
+    input: "..."              # required
+    expect:
+      type: semantic          # exact | contains | regex | semantic
+      value: "..."            # value | values | pattern depending on type
+```
+
+Treat `internal/eval/types.go` as the source of truth for the schema, not the README.
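+
+To make the `expect.type` values concrete, here is a hypothetical sketch of the three
+deterministic matchers. The struct fields loosely mirror the YAML keys above but are not the
+real types from `internal/eval/types.go`, and treating `values` as "all must appear" is an
+assumption. `semantic` is omitted because it needs an embedding model or an LLM judge.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strings"
+)
+
+// expectation loosely mirrors the `expect:` keys; field names are assumptions.
+type expectation struct {
+	Type    string   // exact | contains | regex
+	Value   string   // exact, or a single contains needle
+	Values  []string // contains: every entry must appear (assumed semantics)
+	Pattern string   // regex, Go RE2 syntax
+}
+
+// match is a hypothetical sketch of the deterministic matcher strategies.
+func match(e expectation, output string) (bool, error) {
+	switch e.Type {
+	case "exact":
+		return output == e.Value, nil
+	case "contains":
+		needles := e.Values
+		if e.Value != "" {
+			needles = append([]string{e.Value}, needles...)
+		}
+		for _, n := range needles {
+			if !strings.Contains(output, n) {
+				return false, nil
+			}
+		}
+		return true, nil
+	case "regex":
+		re, err := regexp.Compile(e.Pattern)
+		if err != nil {
+			return false, fmt.Errorf("invalid pattern %q: %w", e.Pattern, err)
+		}
+		return re.MatchString(output), nil
+	default:
+		// "semantic" and unknown types are out of scope for this sketch.
+		return false, fmt.Errorf("unsupported expectation type %q", e.Type)
+	}
+}
+```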
+
+## Build, test, lint
+
+```bash
+make build            # go build -o agk main.go
+make test             # go test -v -race ./...
+make test-coverage    # coverage.txt + coverage.html
+make lint             # golangci-lint run ./...
+make fmt              # gofmt -s + goimports
+make install          # go install with version ldflags
+```
+
+- `make install`/release inject `Version`/`GitCommit`/`BuildDate` into the `cmd` package
+  via `-ldflags -X github.com/agenticgokit/agk/cmd.Version=...`.
+- Linting is strict (`.golangci.yml`): includes `gosec`, `gocyclo` (min-complexity 35),
+  `dupl`, `goconst`, `stylecheck`, `errcheck`, etc. Match existing style and keep new
+  functions under the complexity threshold.
+- `make test-integration` references `./test/integration/...`, which **does not exist yet** —
+  there is no `test/` directory. Real test coverage today is only in `internal/utils/`.
+- CI/release configured in `.github/workflows/` and `.goreleaser.yml`.
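+
+For reference, `-X` only overwrites package-level string variables (declared uninitialized
+or initialized to a constant string), so the version fields need roughly this shape. The
+default values and the `versionString` helper are assumptions, shown in a standalone `main`
+package:
+
+```go
+package main
+
+import "fmt"
+
+// Build metadata overwritten at link time with -ldflags "-X ...".
+// They must stay `var` strings: -X cannot set constants or non-string types.
+var (
+	Version   = "dev"     // assumed default for local builds
+	GitCommit = "none"    // assumed default
+	BuildDate = "unknown" // assumed default
+)
+
+// versionString is a hypothetical formatter for `agk version`.
+func versionString() string {
+	return fmt.Sprintf("agk %s (commit %s, built %s)", Version, GitCommit, BuildDate)
+}
+```
+
+Built standalone, `go build -ldflags "-X main.Version=v1.2.3" .` would set `Version`; in this
+repo the target is `github.com/agenticgokit/agk/cmd.Version`.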
+
+## Working with the sibling core framework
+
+`../agenticgokit` is the framework this CLI is built around. Reach for it when you need to
+understand:
+
+- the `v1beta` builder/workflow/streaming API that generated templates use
+  (`../agenticgokit/v1beta/builder.go`, `workflow.go`, `streaming.go`);
+- the `EvalServer` contract that `agk eval` targets (`v1beta/eval_server.go`,
+  `eval_types.go`, `eval_handlers.go`);
+- the `observability` package this repo imports directly for tracer setup
+  (`SetupTracer`, `WithRunID`, `WithLogger`, `GetTracer`).
+
+Note that the core framework is mid-migration: `v1beta` (formerly `vnext`) is the recommended
+API; legacy `core`/`core/vnext` will be removed at v1.0. New scaffold templates should
+target `v1beta`.
+
+## Conventions for changes
+
+- This is a Cobra CLI: each command is its own file in `cmd/`, registered via `init()` →
+  `rootCmd.AddCommand(...)`. Follow that pattern for new commands.
+- User-facing output uses `fatih/color`; structured logs use `rs/zerolog` (`cmd.GetLogger()`).
+- Keep `internal/` for implementation detail and `pkg/` for reusable scaffold/registry
+  logic (current split).
+- Built-in templates are embedded; after editing files under
+  `pkg/scaffold/templates/`, rebuild to pick them up.
diff --git a/FEATURES.md b/FEATURES.md
@@ -0,0 +1,135 @@
+# AGK — Feature Ideas & Roadmap
+
+Proposed improvements to the AGK CLI, aimed at the developer experience and the
+AI-agent building experience. Each item is grounded in a concrete observation from the
+codebase (file references included) so it's actionable, not aspirational.
+
+Status legend: 🔴 not started · 🟡 partially built · 🟢 quick win
+
+---
+
+## Priority recommendation
+
+In order of leverage:
+
+1. **`agk run` / `agk dev`** — closes the scaffold→run→observe loop (nothing else has this leverage). `agk run` has shipped (B1); the `agk dev` alias is still open.
+2. **Eval auto-serve + `expect.trace`** — turns the test pillar from "wire up two processes" into one command; the trace-assertion plumbing already exists.
+3. **`agk doctor`** — kills the most common first-run failures, very cheap to build.
+4. **Interactive `init` wizard** — already promised in the roadmap, TUI toolkit already imported.
+
+Then fold in the correctness fixes (Section A) as each area is touched.
+
+---
+
+## A. Fix / finish what's already half-built
+
+Low-risk, high-trust changes where the code already gestures at a feature but doesn't
+deliver. These remove "the docs lied to me" friction.
+
+### A1. Implement the `init -i` interactive flag 🟡
+- **Evidence:** `initInteractive` is parsed in `cmd/init.go` and passed into
+  `GenerateOptions.Interactive`, but no generator ever reads it.
+- **Action:** Build a Bubble Tea wizard (deps `bubbletea`/`bubbles`/`lipgloss` already
+  present). Flow: template → provider → model → features. Satisfies the "Interactive Init
+  Wizard" roadmap item.
+
+### A2. Wire up (or remove) `agk.toml` generation 🟡
+- **Evidence:** `internal/config/generator.go` builds a full project config, and
+  `cmd/init.go` help text promises "Project configuration (agk.toml)", but the built-in
+  generators in `pkg/scaffold/template_registry.go` only write `main.go` + `go.mod`.
+- **Action:** Either call the generator during `init` or drop the claim. A real project
+  `agk.toml` (provider/model/memory defaults) would also let `run`/`eval` stop depending
+  on env + flags.
+
+### A3. `template remove` by name 🟢
+- **Evidence:** Explicit `TODO` in `cmd/template.go` — only removes by exact source string.
+- **Action:** Add name→source lookup in `registry.CacheManager`.
+
+### A4. Fix eval doc/schema mismatch 🟢
+- **Evidence:** README shows `evalserver:` / `expected_output:`; the parser
+  (`internal/eval/types.go`, `parser.go`) expects `target:` / `expect:`.
+- **Action:** Align README to the actual schema (or add a compatibility shim). Currently a
+  confusing first-run failure.
+
+### A5. Real cost estimation 🟢
+- **Evidence:** `cmd/trace.go` hardcodes `estimatedCost := tokens * 0.00001`.
+- **Action:** Per-model pricing table so `trace view` / `trace list` cost numbers are
+  trustworthy.
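+
+A minimal sketch of a per-model pricing table, assuming prices are tracked per million input
+and output tokens. The model names and dollar figures are placeholders, not real vendor
+prices:
+
+```go
+package main
+
+import "fmt"
+
+// modelPrice is USD per one million tokens, split by direction.
+type modelPrice struct {
+	InputPerM  float64
+	OutputPerM float64
+}
+
+// priceTable uses placeholder names and figures; real values would live in
+// config so they can be updated without a release.
+var priceTable = map[string]modelPrice{
+	"example-hosted-model": {InputPerM: 2.50, OutputPerM: 10.00},
+	"example-local-model":  {InputPerM: 0, OutputPerM: 0}, // self-hosted: no per-token fee
+}
+
+// estimateCost prices input and output tokens separately, which a single
+// flat rate (tokens * 0.00001) cannot express.
+func estimateCost(model string, inputTokens, outputTokens int) (float64, error) {
+	p, ok := priceTable[model]
+	if !ok {
+		return 0, fmt.Errorf("no price for model %q", model)
+	}
+	return float64(inputTokens)/1e6*p.InputPerM + float64(outputTokens)/1e6*p.OutputPerM, nil
+}
+```
+
+Returning an error for unknown models, rather than falling back to a flat rate, would let
+`trace view` say "cost unknown" instead of printing a misleading number.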
+
+### A6. Implement `expect.trace` validation 🟡 (see B2 — biggest unlock)
+- **Evidence:** `TraceExpectation` (tool_calls, llm_calls, execution_path, min/max steps)
+  is fully typed in `internal/eval/types.go` but there is a
+  `// TODO: Validate trace expectations` at `internal/eval/runner.go:193`.
+- **Action:** Validate against the captured trace (see B2).
+
+---
+
+## B. Net-new features that close loop gaps
+
+### B1. `agk run` / `agk dev` — the missing center of the loop 🟢 ✅ SHIPPED
+- **Gap:** The CLI *scaffolds* and *observes* but never *runs*. `printNextSteps` in
+  `cmd/init.go` just tells the user to `go run main.go`.
+- **Delivered (`cmd/run.go`):**
+  - `agk run [path]` wraps `go run .`, auto-sets `AGK_TRACE=true` +
+    `AGK_TRACE_EXPORTER=file`, and inherits stdio;
+  - on exit, prints a compact trace summary (duration / spans / LLM calls / tokens / cost)
+    + a `→ agk trace view <run-id>` hint;
+  - `--watch` / `-w` re-runs on `.go` changes (debounced);
+  - `--no-trace` and `--trace-level minimal|standard|detailed` flags.
+- **Follow-ups:** a dedicated `agk dev` alias; making `agk trace` path-aware so summaries for
+  `agk run <subdir>` link correctly from any CWD.
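+
+The delivered behavior can be sketched with `os/exec`. `buildRunCmd` is a hypothetical name,
+and passing `--trace-level` through as `AGK_TRACE_LEVEL` is an assumption based on the
+variable documented in `CLAUDE.md`:
+
+```go
+package main
+
+import (
+	"os"
+	"os/exec"
+)
+
+// buildRunCmd wraps `go run .` in the project dir, inherits stdio, and turns
+// tracing on via env vars unless noTrace is set.
+func buildRunCmd(dir string, noTrace bool, level string) *exec.Cmd {
+	cmd := exec.Command("go", "run", ".")
+	cmd.Dir = dir
+	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
+	cmd.Env = os.Environ()
+	if !noTrace {
+		cmd.Env = append(cmd.Env, "AGK_TRACE=true", "AGK_TRACE_EXPORTER=file")
+		if level != "" {
+			cmd.Env = append(cmd.Env, "AGK_TRACE_LEVEL="+level) // assumed mapping
+		}
+	}
+	return cmd
+}
+```
+
+Appending after `os.Environ()` matters: when `Env` holds duplicate keys, `os/exec` uses the
+last value, so these settings override anything already exported in the shell.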
+
+### B2. Eval: auto-serve + behavioral assertions 🔴/🟡 ⭐
+- **Gap:** `agk eval` is an HTTP client (`internal/eval/http_target.go`) that assumes the
+  user is *separately* running a `v1beta.EvalServer` in another terminal.
+- **Proposal A — auto-serve:** `agk eval --serve ./...` (or `agk eval init`) builds/launches
+  the user's eval server, runs tests, and tears it down. One command instead of two
+  processes. `agk eval init` could also scaffold a starter `tests.yaml` + EvalServer wrapper.
+- **Proposal B — behavioral assertions:** Implement `expect.trace` (types already exist) to
+  assert "the `search` tool was called", "≤ 3 LLM calls", "path was research→summarize".
+  Turns eval from output-matching into **behavioral** testing — what agent devs actually
+  need. Validate against the existing `audit.TraceObject` event model
+  (`internal/audit/types.go`).
+- **Proposal C — more target types:** in-process / CLI target so simple agents don't need
+  HTTP at all (parser currently rejects anything but `type: http`).
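+
+Proposal B could be sketched as a single pass over the captured events. `event` and
+`traceExpect` are simplified, hypothetical stand-ins for `audit.TraceObject` and
+`TraceExpectation`, and the `Step` field (which workflow step an event ran in) is an
+assumption about what the trace exposes:
+
+```go
+package main
+
+import "fmt"
+
+// event is a simplified, hypothetical stand-in for one typed trace event.
+type event struct {
+	Type string // "tool_call", "llm_call", ... (subset of the EventType values)
+	Name string // tool name for tool_call events
+	Step string // workflow step the event ran in (assumed field)
+}
+
+// traceExpect mirrors part of TraceExpectation with hypothetical field names.
+type traceExpect struct {
+	ToolCalls     []string // each tool must be called at least once
+	MaxLLMCalls   int      // 0 means no limit
+	ExecutionPath []string // steps that must occur in this order, gaps allowed
+}
+
+// validateTrace returns one message per violated assertion (empty = pass).
+func validateTrace(events []event, want traceExpect) []string {
+	var failures []string
+	seenTools := map[string]bool{}
+	llmCalls := 0
+	var steps []string // step sequence with consecutive repeats collapsed
+	for _, e := range events {
+		switch e.Type {
+		case "tool_call":
+			seenTools[e.Name] = true
+		case "llm_call":
+			llmCalls++
+		}
+		if e.Step != "" && (len(steps) == 0 || steps[len(steps)-1] != e.Step) {
+			steps = append(steps, e.Step)
+		}
+	}
+	for _, tool := range want.ToolCalls {
+		if !seenTools[tool] {
+			failures = append(failures, fmt.Sprintf("tool %q was never called", tool))
+		}
+	}
+	if want.MaxLLMCalls > 0 && llmCalls > want.MaxLLMCalls {
+		failures = append(failures, fmt.Sprintf("%d LLM calls, want at most %d", llmCalls, want.MaxLLMCalls))
+	}
+	if !isSubsequence(want.ExecutionPath, steps) {
+		failures = append(failures, fmt.Sprintf("path %v not found in %v", want.ExecutionPath, steps))
+	}
+	return failures
+}
+
+// isSubsequence reports whether every element of want appears in got, in order.
+func isSubsequence(want, got []string) bool {
+	i := 0
+	for _, s := range got {
+		if i < len(want) && s == want[i] {
+			i++
+		}
+	}
+	return i == len(want)
+}
+```
+
+Treating `execution_path` as an ordered subsequence (gaps allowed) keeps assertions stable
+when an agent adds an extra retry step; an exact-match mode would be a stricter alternative.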
+
+### B3. `agk doctor` — preflight diagnostics 🔴
+- **Gap:** A large class of first-run failures is environmental (`OPENAI_API_KEY` unset,
+  Ollama not running on `:11434`, model not pulled, registry unreachable). These surface as
+  cryptic runtime errors deep in the agent.
+- **Proposal:** `agk doctor` checks: Go version, provider API keys, Ollama reachability,
+  registry index reachability, `.agk/` health. Cheap, big friction reducer.
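+
+Two of those checks reduce to a few standard-library calls. This is a sketch with
+hypothetical names (`check`, `checkEnv`, `checkTCP`); a plain TCP dial only proves that
+Ollama's port is open, not that a model has been pulled:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net"
+	"os"
+	"time"
+)
+
+// check is one hypothetical `agk doctor` result line.
+type check struct {
+	Name string
+	OK   bool
+	Hint string // remediation shown when OK is false
+}
+
+// checkEnv reports whether an API-key variable is set, without echoing its value.
+func checkEnv(name string) check {
+	if os.Getenv(name) == "" {
+		return check{Name: name, Hint: "export " + name + "=..."}
+	}
+	return check{Name: name, OK: true}
+}
+
+// checkTCP dials addr (e.g. Ollama's default "localhost:11434") with a timeout.
+// Success only means the port accepts connections.
+func checkTCP(name, addr string, timeout time.Duration) check {
+	conn, err := net.DialTimeout("tcp", addr, timeout)
+	if err != nil {
+		return check{Name: name, Hint: fmt.Sprintf("cannot reach %s: %v", addr, err)}
+	}
+	conn.Close()
+	return check{Name: name, OK: true}
+}
+```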
+
+### B4. More templates tied to framework strengths 🔴
+- **Gap:** Only `quickstart` + `workflow` ship (`pkg/scaffold/templates/`). The core
+  framework sells memory/RAG, MCP tools, multimodal, and parallel/DAG/loop workflows —
+  **none of which have a template.**
+- **Proposal:** Add self-contained templates: RAG agent, MCP-tool agent, parallel/DAG
+  workflow, chat-REPL agent. Each is small and high-value.
+
+### B5. `agk trace diff <a> <b>` 🔴
+- **Gap:** The observe pillar is the most mature; improvements are incremental.
+- **Proposal:** Run-to-run diff (latency / tokens / cost / execution path) — directly
+  answers "did my prompt change help?". A `trace watch` live-tail is also close, since
+  `internal/tui` `NewTraceViewerWithPath` already supports hot-reload.
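+
+A sketch of the diff output, assuming each run has already been reduced to a summary.
+`runSummary` and `diffRuns` are hypothetical, and loading them from `.agk/runs/<id>/` is left
+out:
+
+```go
+package main
+
+import (
+	"fmt"
+	"slices"
+	"time"
+)
+
+// runSummary is a hypothetical per-run rollup used only for diffing.
+type runSummary struct {
+	ID       string
+	Duration time.Duration
+	Tokens   int
+	CostUSD  float64
+	Path     []string // ordered workflow steps
+}
+
+// signedDuration always shows a sign so regressions and improvements stand out.
+func signedDuration(d time.Duration) string {
+	if d >= 0 {
+		return "+" + d.String()
+	}
+	return d.String()
+}
+
+// diffRuns describes run b relative to run a; the path line appears only if it changed.
+func diffRuns(a, b runSummary) []string {
+	lines := []string{
+		fmt.Sprintf("%s -> %s", a.ID, b.ID),
+		fmt.Sprintf("duration: %v -> %v (%s)", a.Duration, b.Duration, signedDuration(b.Duration-a.Duration)),
+		fmt.Sprintf("tokens:   %d -> %d (%+d)", a.Tokens, b.Tokens, b.Tokens-a.Tokens),
+		fmt.Sprintf("cost:     $%.4f -> $%.4f (%+.4f)", a.CostUSD, b.CostUSD, b.CostUSD-a.CostUSD),
+	}
+	if !slices.Equal(a.Path, b.Path) {
+		lines = append(lines, fmt.Sprintf("path:     %v -> %v", a.Path, b.Path))
+	}
+	return lines
+}
+```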
+
+### B6. MCP & RAG management 🔴 (longer-term, on-brand)
+- **Gap:** The framework differentiates on "batteries-included" MCP + chromem vector store,
+  but the CLI offers nothing here yet.
+- **Proposal:** `agk mcp add/list` (register MCP servers + scaffold tool wiring) and
+  `agk rag ingest <docs>` / `agk knowledge` (manage the embedded vector store). Bigger lifts,
+  but strategically aligned with the framework's identity and the existing roadmap.
+
+---
+
+## Mapping to the five-pillar vision
+
+| Pillar | Existing | Proposed additions |
+|--------|----------|--------------------|
+| **Create** | `init`, `template` | Interactive wizard (A1), `agk.toml` (A2), more templates (B4) |
+| **Run** *(new)* | `run` (B1) | `agk dev` alias (B1), `agk doctor` (B3) |
+| **Test** | `eval` | Auto-serve (B2-A), `expect.trace` (A6/B2-B), more targets (B2-C) |
+| **Observe** | `trace` | Cost table (A5), `trace diff`/`watch` (B5) |
+| **Distribute** | *(planned)* | template `pack`/`push` |
+| **Deploy** | *(planned)* | `agk deploy` (cloud/k8s/edge), MCP/RAG mgmt (B6) |
diff --git a/README.md b/README.md
@@ -51,10 +51,16 @@ go mod tidy
 # Set your API key
 export OPENAI_API_KEY=sk-...
 
-# Run the agent
-go run main.go
+# Run the agent (tracing on by default, prints a trace summary on exit)
+agk run
+
+# ...or re-run automatically on file changes
+agk run --watch
 ```
 
+> `agk run` wraps `go run .`, enables tracing, and surfaces a trace summary plus a
+> `agk trace view` hint when the program exits. Prefer plain `go run main.go`? That still works.
+
 ---
 
 ## Templates & Registry
@@ -199,6 +205,8 @@ agk trace mermaid > trace_flow.md
 |---------|-------------|
 | `init` | Create a new project from a template. |
 | `init --list` | Show details of all available templates. |
+| `run` | Build and run a project with tracing on; prints a trace summary on exit. |
+| `run --watch` | Re-run the project automatically when `.go` files change. |
 | `eval` | Run automated tests against workflows with semantic matching. |
 | `trace list` | List all captured trace runs. |
 | `trace show` | Display summary of a specific run. |
@@ -214,6 +222,7 @@ agk trace mermaid > trace_flow.md
 - **Smart Scaffolding** (Quickstart, Workflow bases)
 - **Eval Framework** (Semantic matching, LLM-as-judge, professional reports)
 - **Trace System** (Interactive TUI, Mermaid export, detailed spans)
+- **Run Command** (`agk run` / `--watch` — tracing-on execution with inline trace summary)
 - **Streaming Support** (Native across all templates)
 
 ### In Progress