16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
# Build artifacts
/agk
/agk.exe
/agk-dev
/dist/
/build/

# Test/coverage output
coverage.txt
coverage.html

# AGK runtime output
.agk/

# Editor/OS
.DS_Store
197 changes: 197 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,197 @@
# CLAUDE.md

Guidance for working in this repository.

## What this is

**AGK** is the official command-line developer toolchain for **AgenticGoKit**, a Go
framework for building multi-agent AI systems. AGK is *not* the framework itself —
it is the CLI that manages the lifecycle of agent projects built *with* the framework.

It is one part of a three-repo ecosystem:

| Part | Repo | Role |
|------|------|------|
| **Core framework** | `agenticgokit/agenticgokit` (sibling dir `../agenticgokit`) | The library agents are built with (`v1beta` builder API, workflows, memory, RAG, tools, observability). |
| **CLI tooling (this repo)** | `agenticgokit/agk` | Scaffold, evaluate, and trace agent projects. |
| **Template registry** | `agk-templates` | Remote templates that `agk init` can pull. |

Typical user flow: **design** with the framework → **scaffold** with `agk init` →
**test** with `agk eval` → **observe** with `agk trace`.

- Module: `github.com/agenticgokit/agk`, Go **1.24.1**
- Depends on the core framework: `github.com/agenticgokit/agenticgokit v0.5.5`
(uses its `observability` package directly; generated projects import its `v1beta` API).
- Entry point: `main.go` → `cmd.Execute()` (Cobra root command `agk`).

## Product vision (five pillars)

The README frames AGK around a lifecycle. Three pillars are built; two are on the roadmap:

1. **Create** ✅ — `init` scaffolding + template registry
2. **Test** ✅ — `eval` semantic evaluation framework
3. **Observe** ✅ — `trace` observability (TUI, mermaid, export)
4. **Distribute** 🔜 — template `pack`/`push` (planned)
5. **Deploy** 🔜 — `agk deploy` to cloud/k8s/edge (planned)

When asked about "possibilities" or new features, the planned items are: multi-agent
templates, template distribution (`pack`/`push`), cloud deploy engine, interactive
init wizard (`agk init -i`), MCP server management, and RAG/knowledge-base management.

## Commands (all under `cmd/`)

| Command | File | What it does |
|---------|------|--------------|
| `agk init <name>` | `init.go` | Scaffold a project from a template (`--template`, `--llm`, `--output`, `--force`, `--list`). |
| `agk run [path]` | `run.go` | `go run .` with tracing on by default; prints a trace summary on exit. Flags: `--watch`, `--no-trace`, `--trace-level`. |
| `agk template list/add/remove` | `template.go` | Manage the local template cache (pull from GitHub/local/registry). |
| `agk eval <file.yaml>` | `eval.go` | Run YAML-defined eval tests against a running EvalServer over HTTP. |
| `agk trace [list/show/view/export/audit/mermaid]` | `trace.go` | Inspect traces stored in `.agk/runs/`. Bare `agk trace` launches the TUI explorer. |
| `agk version` | `version.go` | Build/version info (injected via ldflags). |

Global flags (in `cmd/root.go`): `--config`, `--verbose`, `--debug`, `--trace`,
`--trace-exporter` (console|otlp|file), `--trace-endpoint`, `--trace-sample`,
`--store-prompts`. Config is loaded by Viper from `$HOME/.agk.toml` with env prefix `AGK_`.

## Package layout

```
cmd/                     Cobra commands (root, init, run, template, eval, trace, version)
pkg/scaffold/            Project generation
  template.go            TemplateType, TemplateMetadata, TemplateGenerator interface
  template_registry.go   Built-in generators: Quickstart, Workflow + provider/model helpers
  external_generator.go  Renders cached registry templates (text/template + Sprig, ".tmpl" stripped)
  service.go             Higher-level Service wrapper
  templates/             Embedded built-in templates (quickstart/, workflow/) as *.tmpl
pkg/registry/            Template fetching/caching/resolution
  resolver.go            Resolves "github.com/...", "./local", "@version", or registry name
  fetcher.go             GitFetcher / LocalFetcher
  cache.go               CacheManager (local template cache)
  manifest.go            agk-template.toml schema (TemplateManifest/TemplateInfo/Variable)
  index.go               Fetches registry index.json (DefaultRegistryURL → agk-templates/registry)
internal/eval/           Evaluation framework
  parser.go              Parses + validates YAML test suites
  types.go               TestSuite/Target/Test/Expectation/SemanticConfig (canonical schema)
  runner.go              Executes suites against an HTTPTarget
  http_target.go         Talks to EvalServer: POST /invoke, GET /health
  matcher.go             MatcherFactory: exact/contains/regex/semantic
  embedding_matcher.go   Semantic strategy: embedding cosine similarity
  llm_judge_matcher.go   Semantic strategy: LLM-as-judge
  hybrid_matcher.go      Semantic strategy: hybrid (both)
  reporter.go            Output: console/json/junit/markdown
internal/audit/          Trace → reasoning analysis
  collector.go           Reads .agk/runs/<id> spans → TraceObject of typed events
  types.go               EventType (thought/tool_call/observation/llm_call/decision)
  mermaid.go             Mermaid flowchart generation
internal/tui/            Bubble Tea TUIs: trace_viewer, span_tree, styles
internal/config/         agk.toml generator (ProjectConfig → TOML)
internal/utils/          zerolog logging, filesystem, errors (has the only *_test.go files)
```
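To make the resolver row concrete: `pkg/registry/resolver.go` has to decide which fetcher handles a given source string. The sketch below is a stdlib-only approximation under assumed precedence rules (a `./`, `../`, or `/` prefix means local; a `github.com/` prefix means Git; anything else is a registry name; a trailing `@version` is split off non-local sources). `SourceKind` and `classifySource` are hypothetical names, not the resolver's real API.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceKind is a hypothetical classification of a template source string.
type SourceKind string

const (
	KindLocal    SourceKind = "local"
	KindGit      SourceKind = "git"
	KindRegistry SourceKind = "registry"
)

// isLocal treats relative and absolute filesystem paths as local sources.
func isLocal(s string) bool {
	return strings.HasPrefix(s, "./") || strings.HasPrefix(s, "../") || strings.HasPrefix(s, "/")
}

// classifySource picks which fetcher should handle src (assumed rules).
// For non-local sources, a trailing "@version" is split off first.
func classifySource(src string) (kind SourceKind, ref, version string) {
	if isLocal(src) {
		return KindLocal, src, ""
	}
	ref = src
	if i := strings.LastIndex(src, "@"); i > 0 {
		ref, version = src[:i], src[i+1:]
	}
	if strings.HasPrefix(ref, "github.com/") {
		return KindGit, ref, version
	}
	return KindRegistry, ref, version
}

func main() {
	for _, s := range []string{"github.com/org/tmpl@v1.2.0", "./my-template", "rag-agent@v2"} {
		kind, ref, version := classifySource(s)
		fmt.Printf("%-28s -> %-8s ref=%s version=%q\n", s, kind, ref, version)
	}
}
```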

## Built-in templates

Two are compiled in (see `pkg/scaffold/templates/`):

- **quickstart** (⭐, 2 files) — single `main.go` with a hardcoded agent via
`v1beta.NewBuilder(...).WithLLM(...).Build()` + streaming.
- **workflow** (⭐⭐⭐, 3 files) — sequential multi-agent pipeline
(researcher → summarizer → formatter) via `NewSequentialWorkflow` + step streaming.

Generated projects import `github.com/agenticgokit/agenticgokit/v1beta` and a provider
plugin `plugins/llm/<provider>`. Provider→default-model and provider→API-key-env mappings
live in `template_registry.go` (`getLLMModel`, `getAPIKeyEnv`). Supported `--llm` values:
`openai` (gpt-4o), `anthropic` (claude-sonnet-4), `ollama` (llama3.2), `azure`.
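A provider→default-model lookup in the spirit of `getLLMModel` can be as small as a map. This sketch uses only the defaults listed above; the function name `llmModelFor`, its two-value return, and mapping `azure` to an empty model (no default is documented, presumably because Azure uses per-deployment names) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultModels mirrors the documented provider→model defaults for --llm.
// azure has no documented default, so it maps to "" in this sketch.
var defaultModels = map[string]string{
	"openai":    "gpt-4o",
	"anthropic": "claude-sonnet-4",
	"ollama":    "llama3.2",
	"azure":     "",
}

// llmModelFor is a hypothetical stand-in for getLLMModel. ok reports whether
// the provider is a supported --llm value at all.
func llmModelFor(provider string) (model string, ok bool) {
	model, ok = defaultModels[provider]
	return model, ok
}

func main() {
	names := make([]string, 0, len(defaultModels))
	for p := range defaultModels {
		names = append(names, p)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, p := range names {
		m, _ := llmModelFor(p)
		fmt.Printf("--llm %-9s -> model %q\n", p, m)
	}
	if _, ok := llmModelFor("gemini"); !ok {
		fmt.Println("gemini: unsupported provider")
	}
}
```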

External/registry templates are rendered by `external_generator.go`, driven by an
`agk-template.toml` manifest (`pkg/registry/manifest.go`).
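The rendering step itself is standard `text/template`. A stdlib-only sketch (Sprig's extra functions omitted) of rendering one file and stripping `.tmpl` from its output path; `renderFile` and the `ProjectName`/`Model` fields are hypothetical stand-ins for whatever variables a manifest declares.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// TemplateData holds hypothetical variables a template manifest might declare.
type TemplateData struct {
	ProjectName string
	Model       string
}

// renderFile parses and executes one template file, returning the output
// path with a trailing ".tmpl" removed ("main.go.tmpl" -> "main.go").
func renderFile(path, body string, data TemplateData) (outPath, out string, err error) {
	tmpl, err := template.New(path).Parse(body)
	if err != nil {
		return "", "", fmt.Errorf("parse %s: %w", path, err)
	}
	var buf bytes.Buffer
	// Referencing a field TemplateData lacks (e.g. {{.Missing}}) fails here.
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", "", fmt.Errorf("render %s: %w", path, err)
	}
	return strings.TrimSuffix(path, ".tmpl"), buf.String(), nil
}

func main() {
	path, out, err := renderFile("go.mod.tmpl", "module {{.ProjectName}}\n\n// model: {{.Model}}\n",
		TemplateData{ProjectName: "my-agent", Model: "llama3.2"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:\n%s", path, out)
}
```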

## Key conventions & filesystem layout

- **Traces** are written to `.agk/runs/<run-id>/` when `AGK_TRACE=true`:
`trace.jsonl` (OTel spans), `events.jsonl`, `manifest.json`.
Run IDs are `run-<unixnano>` (see `generateRunID` in `cmd/root.go`).
- **Eval reports** auto-save to `.agk/reports/eval-report-<timestamp>.md`.
- `AGK_TRACE_LEVEL` controls capture granularity: `minimal` | `standard` | `detailed`
(use `detailed` to capture prompts/responses/tool args for `trace audit`).
- Observability is OpenTelemetry-based; the file exporter produces the JSONL that the
`trace` and `audit` packages parse back.

## How `agk eval` actually works (important)

`agk eval` does **not** run the agent in-process. It is an HTTP client. The flow is:

1. The user's agent project runs a `v1beta.EvalServer` (see `../agenticgokit/v1beta/eval_server.go`),
exposing `POST /invoke` and `GET /health`.
2. `agk eval tests.yaml` health-checks the target, then POSTs each test `input` to `/invoke`
and matches the returned `output` against the expectation.
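A minimal sketch of the client side of that contract, with `httptest` standing in for the user's EvalServer. The request/response shapes (`{"input": ...}` in, `{"output": ...}` out) are assumptions inferred from the field names above; check `v1beta/eval_types.go` for the real contract. `healthy` and `invoke` are hypothetical helpers, not functions from `internal/eval/http_target.go`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Assumed wire shapes; the real EvalServer types may carry more fields.
type invokeRequest struct {
	Input string `json:"input"`
}

type invokeResponse struct {
	Output string `json:"output"`
}

var client = &http.Client{Timeout: 30 * time.Second}

// healthy reports whether GET <base>/health returns 200 OK.
func healthy(base string) bool {
	resp, err := client.Get(base + "/health")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// invoke POSTs one test input to <base>/invoke and returns the agent output.
func invoke(base, input string) (string, error) {
	body, err := json.Marshal(invokeRequest{Input: input})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(base+"/invoke", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("invoke: unexpected status %s", resp.Status)
	}
	var out invokeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", fmt.Errorf("invoke: decode: %w", err)
	}
	return out.Output, nil
}

// fakeEvalServer echoes the input back, standing in for a real agent.
func fakeEvalServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/invoke", func(w http.ResponseWriter, r *http.Request) {
		var req invokeRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		_ = json.NewEncoder(w).Encode(invokeResponse{Output: "echo: " + req.Input})
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeEvalServer()
	defer srv.Close()
	if !healthy(srv.URL) {
		panic("target not healthy")
	}
	out, err := invoke(srv.URL, "What is AGK?")
	fmt.Println(out, err)
}
```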

⚠️ **Doc vs. code mismatch:** the README's eval YAML example uses keys like `evalserver:`,
`workflow_name:`, and `expected_output:`. The **actual parser** (`internal/eval/types.go` +
`parser.go`) expects:

```yaml
name: "Suite name"          # required
target:                     # required
  type: http                # only "http" is supported
  url: http://localhost:8787
semantic:                   # optional global config for "semantic" expectations
  strategy: llm-judge       # llm-judge | embedding | hybrid
  threshold: 0.7
  llm: { provider: ollama, model: llama3.2 }
tests:
  - name: "..."             # required
    input: "..."            # required
    expect:
      type: semantic        # exact | contains | regex | semantic
      value: "..."          # value | values | pattern depending on type
```

Treat `internal/eval/types.go` as the source of truth for the schema, not the README.

## Build, test, lint

```bash
make build # go build -o agk main.go
make test # go test -v -race ./...
make test-coverage # coverage.txt + coverage.html
make lint # golangci-lint run ./...
make fmt # gofmt -s + goimports
make install # go install with version ldflags
```

- `make install`/release inject `Version`/`GitCommit`/`BuildDate` into the `cmd` package
via `-ldflags -X github.com/agenticgokit/agk/cmd.Version=...`.
- Linting is strict (`.golangci.yml`): includes `gosec`, `gocyclo` (min-complexity 35),
`dupl`, `goconst`, `stylecheck`, `errcheck`, etc. Match existing style and keep new
functions under the complexity threshold.
- `make test-integration` references `./test/integration/...`, which **does not exist yet** —
there is no `test/` directory. Real test coverage today is only in `internal/utils/`.
- CI/release configured in `.github/workflows/` and `.goreleaser.yml`.
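The `-X` linker flag can only overwrite package-level `string` variables (uninitialized or set to a constant). A self-contained sketch of the pattern, written in package `main` so it builds alone; in this repo the variables live in `cmd`, so the flag path is `github.com/agenticgokit/agk/cmd.Version` rather than `main.Version`. The `dev`/`unknown` defaults and the output format are assumptions.

```go
package main

import "fmt"

// Overwritten at link time, for example:
//   go build -ldflags "-X main.Version=v1.2.3 -X main.GitCommit=$(git rev-parse --short HEAD)" .
// -X applies only to package-level string variables, not constants.
var (
	Version   = "dev"
	GitCommit = "unknown"
	BuildDate = "unknown"
)

// versionString formats build info the way a `version` command might.
func versionString() string {
	return fmt.Sprintf("agk %s (commit %s, built %s)", Version, GitCommit, BuildDate)
}

func main() {
	fmt.Println(versionString())
}
```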

## Working with the sibling core framework

`../agenticgokit` is the framework this CLI is built around. Reach for it when you need to
understand:

- the `v1beta` builder/workflow/streaming API that generated templates use
(`../agenticgokit/v1beta/builder.go`, `workflow.go`, `streaming.go`);
- the `EvalServer` contract that `agk eval` targets (`v1beta/eval_server.go`,
`eval_types.go`, `eval_handlers.go`);
- the `observability` package this repo imports directly for tracer setup
(`SetupTracer`, `WithRunID`, `WithLogger`, `GetTracer`).

Note the core framework is mid-migration: `v1beta` (formerly `vnext`) is the recommended
API; legacy `core`/`core/vnext` will be removed at v1.0. New scaffold templates should
target `v1beta`.

## Conventions for changes

- This is a Cobra CLI: each command is its own file in `cmd/`, registered via `init()` →
`rootCmd.AddCommand(...)`. Follow that pattern for new commands.
- User-facing output uses `fatih/color`; structured logs use `rs/zerolog` (`cmd.GetLogger()`).
- Keep `internal/` for implementation detail and `pkg/` for reusable scaffold/registry
logic (current split).
- Built-in templates are embedded; after editing files under
`pkg/scaffold/templates/`, rebuild to pick them up.
135 changes: 135 additions & 0 deletions FEATURES.md
@@ -0,0 +1,135 @@
# AGK — Feature Ideas & Roadmap

Proposed improvements to the AGK CLI, aimed at the developer experience and the
AI-agent building experience. Each item is grounded in a concrete observation from the
codebase (file references included) so it's actionable, not aspirational.

Status legend: 🔴 not started · 🟡 partially built · 🟢 quick win

---

## Priority recommendation

In order of leverage:

1. **`agk run` / `agk dev`**: closes the scaffold→run→observe loop (nothing else has this leverage). `agk run` has shipped (B1); the `agk dev` alias is still a follow-up.
2. **Eval auto-serve + `expect.trace`**: turns the test pillar from "wire up two processes" into one command; the `expect.trace` types already exist and only the validation is missing (A6).
3. **`agk doctor`**: kills the most common first-run failures, very cheap to build.
4. **Interactive `init` wizard**: already promised in the roadmap, TUI toolkit already imported.

Then fold in the correctness fixes (Section A) as each area is touched.

---

## A. Fix / finish what's already half-built

Low-risk, high-trust changes where the code already gestures at a feature but doesn't
deliver. These remove "the docs lied to me" friction.

### A1. Implement the `init -i` interactive flag 🟡
- **Evidence:** `initInteractive` is parsed in `cmd/init.go` and passed into
`GenerateOptions.Interactive`, but no generator ever reads it.
- **Action:** Build a Bubble Tea wizard (deps `bubbletea`/`bubbles`/`lipgloss` already
present). Flow: template → provider → model → features. Satisfies the "Interactive Init
Wizard" roadmap item.

### A2. Wire up (or remove) `agk.toml` generation 🟡
- **Evidence:** `internal/config/generator.go` builds a full project config, and
`cmd/init.go` help text promises "Project configuration (agk.toml)", but the built-in
generators in `pkg/scaffold/template_registry.go` only write `main.go` + `go.mod`.
- **Action:** Either call the generator during `init` or drop the claim. A real project
`agk.toml` (provider/model/memory defaults) would also let `run`/`eval` stop depending
on env + flags.

### A3. `template remove` by name 🟢
- **Evidence:** Explicit `TODO` in `cmd/template.go` — only removes by exact source string.
- **Action:** Add name→source lookup in `registry.CacheManager`.
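A sketch of the name→source lookup, assuming the cache can list entries carrying both the manifest name and the source they were fetched from. `CacheEntry` and `resolveByName` are hypothetical, not the current `CacheManager` API. An exact source match wins, and an ambiguous name (two cached sources sharing it) is refused rather than guessed.

```go
package main

import (
	"fmt"
	"sort"
)

// CacheEntry is a hypothetical view of one cached template.
type CacheEntry struct {
	Name   string // from agk-template.toml
	Source string // e.g. "github.com/org/repo@v1.0.0" or "./templates/chat"
}

// resolveByName maps a user-supplied name or source to exactly one source.
// An exact source match wins; otherwise the name must be unique.
func resolveByName(entries []CacheEntry, arg string) (string, error) {
	var matches []string
	for _, e := range entries {
		if e.Source == arg {
			return e.Source, nil
		}
		if e.Name == arg {
			matches = append(matches, e.Source)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no cached template named or sourced %q", arg)
	case 1:
		return matches[0], nil
	default:
		sort.Strings(matches)
		return "", fmt.Errorf("%q is ambiguous; remove by source instead: %v", arg, matches)
	}
}

func main() {
	cache := []CacheEntry{
		{Name: "rag-agent", Source: "github.com/example/rag-agent@v0.1.0"},
		{Name: "chat", Source: "./templates/chat"},
	}
	src, err := resolveByName(cache, "rag-agent")
	fmt.Println(src, err)
}
```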

### A4. Fix eval doc/schema mismatch 🟢
- **Evidence:** README shows `evalserver:` / `expected_output:`; the parser
(`internal/eval/types.go`, `parser.go`) expects `target:` / `expect:`.
- **Action:** Align README to the actual schema (or add a compatibility shim). Currently a
confusing first-run failure.

### A5. Real cost estimation 🟢
- **Evidence:** `cmd/trace.go` hardcodes `estimatedCost := tokens * 0.00001`.
- **Action:** Per-model pricing table so `trace view` / `trace list` cost numbers are
trustworthy.
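A sketch of a per-model table that keeps today's flat rate as the fallback. Prices are expressed per million tokens and split into input and output, since providers commonly bill the two differently. Every model name and number below is a placeholder, not a real price; a shipped table would have to be sourced from each provider and kept current.

```go
package main

import "fmt"

// Price is USD per 1M tokens. Values below are placeholders, not real prices.
type Price struct {
	InputPerM  float64
	OutputPerM float64
}

// pricing is keyed by model name; entries are illustrative only.
var pricing = map[string]Price{
	"example-large": {InputPerM: 5.00, OutputPerM: 15.00},
	"example-small": {InputPerM: 0.50, OutputPerM: 1.50},
	"llama3.2":      {InputPerM: 0, OutputPerM: 0}, // local via Ollama: no API bill
}

// flatRatePerToken is the current hardcoded rate from cmd/trace.go.
const flatRatePerToken = 0.00001

// estimateCost prices input and output tokens separately for known models and
// falls back to the flat per-token rate otherwise. known reports a table hit.
func estimateCost(model string, inTokens, outTokens int) (cost float64, known bool) {
	p, ok := pricing[model]
	if !ok {
		return float64(inTokens+outTokens) * flatRatePerToken, false
	}
	return float64(inTokens)/1e6*p.InputPerM + float64(outTokens)/1e6*p.OutputPerM, true
}

func main() {
	for _, m := range []string{"example-large", "llama3.2", "unlisted-model"} {
		c, known := estimateCost(m, 12_000, 3_000)
		fmt.Printf("%-14s $%.4f (table hit: %v)\n", m, c, known)
	}
}
```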

### A6. Implement `expect.trace` validation 🟡 (see B2 — biggest unlock)
- **Evidence:** `TraceExpectation` (tool_calls, llm_calls, execution_path, min/max steps)
is fully typed in `internal/eval/types.go` but there is a
`// TODO: Validate trace expectations` at `internal/eval/runner.go:193`.
- **Action:** Validate against the captured trace (see B2).

---

## B. Net-new features that close loop gaps

### B1. `agk run` / `agk dev` — the missing center of the loop 🟢 ✅ SHIPPED
- **Gap:** The CLI *scaffolds* and *observes* but never *runs*. `printNextSteps` in
`cmd/init.go` just tells the user to `go run main.go`.
- **Delivered (`cmd/run.go`):**
  - `agk run [path]` wraps `go run .`, auto-sets `AGK_TRACE=true` +
    `AGK_TRACE_EXPORTER=file`, and inherits stdio;
  - on exit, prints a compact trace summary (duration / spans / LLM calls / tokens / cost)
    plus a `→ agk trace view <run-id>` hint;
  - `--watch` / `-w` re-runs on `.go` changes (debounced);
  - `--no-trace` and `--trace-level minimal|standard|detailed` flags.
- **Follow-ups:** a dedicated `agk dev` alias; making `agk trace` path-aware so summaries for
`agk run <subdir>` link correctly from any CWD.

### B2. Eval: auto-serve + behavioral assertions 🔴/🟡 ⭐
- **Gap:** `agk eval` is an HTTP client (`internal/eval/http_target.go`) that assumes the
user is *separately* running a `v1beta.EvalServer` in another terminal.
- **Proposal A — auto-serve:** `agk eval --serve ./...` (or `agk eval init`) builds/launches
the user's eval server, runs tests, and tears it down. One command instead of two
processes. `agk eval init` could also scaffold a starter `tests.yaml` + EvalServer wrapper.
- **Proposal B — behavioral assertions:** Implement `expect.trace` (types already exist) to
assert "the `search` tool was called", "≤ 3 LLM calls", "path was research→summarize".
Turns eval from output-matching into **behavioral** testing — what agent devs actually
need. Validate against the existing `audit.TraceObject` event model
(`internal/audit/types.go`).
- **Proposal C — more target types:** in-process / CLI target so simple agents don't need
HTTP at all (parser currently rejects anything but `type: http`).

### B3. `agk doctor` — preflight diagnostics 🔴
- **Gap:** A large class of first-run failures is environmental (`OPENAI_API_KEY` unset,
Ollama not running on `:11434`, model not pulled, registry unreachable). These surface as
cryptic runtime errors deep in the agent.
- **Proposal:** `agk doctor` checks: Go version, provider API keys, Ollama reachability,
registry index reachability, `.agk/` health. Cheap, big friction reducer.

### B4. More templates tied to framework strengths 🔴
- **Gap:** Only `quickstart` + `workflow` ship (`pkg/scaffold/templates/`). The core
framework sells memory/RAG, MCP tools, multimodal, and parallel/DAG/loop workflows —
**none of which have a template.**
- **Proposal:** Add self-contained templates: RAG agent, MCP-tool agent, parallel/DAG
workflow, chat-REPL agent. Each is small and high-value.

### B5. `agk trace diff <a> <b>` 🔴
- **Gap:** The observe pillar is the most mature; improvements are incremental.
- **Proposal:** Run-to-run diff (latency / tokens / cost / execution path) — directly
answers "did my prompt change help?". A `trace watch` live-tail is also close, since
`internal/tui` `NewTraceViewerWithPath` already supports hot-reload.

### B6. MCP & RAG management 🔴 (longer-term, on-brand)
- **Gap:** The framework differentiates on "batteries-included" MCP + chromem vector store,
but the CLI offers nothing here yet.
- **Proposal:** `agk mcp add/list` (register MCP servers + scaffold tool wiring) and
`agk rag ingest <docs>` / `agk knowledge` (manage the embedded vector store). Bigger lifts,
but strategically aligned with the framework's identity and the existing roadmap.

---

## Mapping to the five-pillar vision

| Pillar | Existing | Proposed additions |
|--------|----------|--------------------|
| **Create** | `init`, `template` | Interactive wizard (A1), `agk.toml` (A2), more templates (B4) |
| **Run** *(new)* | `run` (B1, shipped) | `agk dev` alias (B1), `agk doctor` (B3) |
| **Test** | `eval` | Auto-serve (B2-A), `expect.trace` (A6/B2-B), more targets (B2-C) |
| **Observe** | `trace` | Cost table (A5), `trace diff`/`watch` (B5) |
| **Distribute** | *(planned)* | template `pack`/`push` |
| **Deploy** | *(planned)* | `agk deploy` (cloud/k8s/edge), MCP/RAG mgmt (B6) |
13 changes: 11 additions & 2 deletions README.md
@@ -51,10 +51,16 @@ go mod tidy
# Set your API key
export OPENAI_API_KEY=sk-...

# Run the agent
go run main.go
# Run the agent (tracing on by default, prints a trace summary on exit)
agk run

# ...or re-run automatically on file changes
agk run --watch
```

> `agk run` wraps `go run .`, enables tracing, and surfaces a trace summary plus a
> `agk trace view` hint when the program exits. Prefer plain `go run main.go`? That still works.

---

## Templates & Registry
@@ -199,6 +205,8 @@ agk trace mermaid > trace_flow.md
|---------|-------------|
| `init` | Create a new project from a template. |
| `init --list` | Show details of all available templates. |
| `run` | Build and run a project with tracing on; prints a trace summary on exit. |
| `run --watch` | Re-run the project automatically when `.go` files change. |
| `eval` | Run automated tests against workflows with semantic matching. |
| `trace list` | List all captured trace runs. |
| `trace show` | Display summary of a specific run. |
@@ -214,6 +222,7 @@ agk trace mermaid > trace_flow.md
- **Smart Scaffolding** (Quickstart, Workflow bases)
- **Eval Framework** (Semantic matching, LLM-as-judge, professional reports)
- **Trace System** (Interactive TUI, Mermaid export, detailed spans)
- **Run Command** (`agk run` / `--watch` — tracing-on execution with inline trace summary)
- **Streaming Support** (Native across all templates)

### In Progress