feat(trace): add agk trace diff to compare two runs #17

Open
kunalkushwaha wants to merge 1 commit into main from feat/trace-diff

@kunalkushwaha

Member

What

Adds agk trace diff, which compares two trace runs to answer "did my change help?". This is feature B5 from the FEATURES.md roadmap, and it pairs naturally with agk run's trace summary (#13) and the per-model cost work (#15).

Behavior

Diffs two runs across duration, spans, LLM calls, tokens, and estimated cost, showing each delta with an arrow and a percentage change. For the lower-is-better metrics (duration, tokens, cost), improvements are green and regressions red.

📊 Trace Diff
  A (baseline): run-200-base
  B (new):      run-201-new
────────────────────────────────────────────────────────────────
METRIC     A (baseline)  B (new)  Δ
Duration   3.20s         2.10s    -1.10s ▼ -34%
Spans      8             6        -2 ▼ -25%
LLM Calls  3             2        -1 ▼ -33%
Tokens     2000          1200     -800 ▼ -40%
Est. Cost  $0.0200       $0.0120  -$0.0080 ▼ -40%
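
For reviewers who want to sanity-check the numbers in the sample output, here is a minimal sketch of how a delta for a lower-is-better metric could be computed and rendered. The names `Delta`, `computeDelta`, and `formatDelta` are hypothetical and are not taken from this PR's code; the `▲` arrow for increases and the zero-baseline handling are also assumptions, since the sample output only shows decreases.

```go
package main

import "fmt"

// Direction classifies a change for a lower-is-better metric.
type Direction int

const (
	Unchanged Direction = iota
	Improved            // value went down
	Regressed           // value went up
)

// Delta holds the change from baseline a to new value b.
// Hypothetical type for illustration only.
type Delta struct {
	Abs float64   // b - a
	Pct float64   // percentage change relative to a; 0 when a == 0 (assumption)
	Dir Direction // direction under lower-is-better semantics
}

// computeDelta returns the absolute and relative change from a to b.
func computeDelta(a, b float64) Delta {
	d := Delta{Abs: b - a}
	if a != 0 {
		d.Pct = (b - a) / a * 100
	}
	switch {
	case d.Abs < 0:
		d.Dir = Improved
	case d.Abs > 0:
		d.Dir = Regressed
	}
	return d
}

// formatDelta renders a delta as "<abs><unit> <arrow> <pct>%".
// The "▲" and "=" symbols are assumptions; only "▼" appears in the PR.
func formatDelta(d Delta, unit string) string {
	arrow := "="
	switch d.Dir {
	case Improved:
		arrow = "▼"
	case Regressed:
		arrow = "▲"
	}
	return fmt.Sprintf("%+.2f%s %s %+.0f%%", d.Abs, unit, arrow, d.Pct)
}

func main() {
	// Duration row from the sample output: 3.20s baseline, 2.10s new.
	fmt.Println(formatDelta(computeDelta(3.20, 2.10), "s")) // -1.10s ▼ -34%
}
```

A caller would pick green for `Improved` and red for `Regressed` when colorizing the string; the color handling is left out to keep the sketch short.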

Run selection

Form                      Compares
agk trace diff            the two most recent runs
agk trace diff <a>        <a> (baseline) vs the latest run
agk trace diff <a> <b>    the two explicit runs
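
The three forms above can be sketched as a small argument resolver. This is an illustration under stated assumptions, not the code in cmd/trace.go: `resolveRuns` is a hypothetical name, runs are represented as plain ID strings ordered oldest to newest, and in the zero-argument form the older of the two most recent runs is assumed to be the baseline. The real implementation reuses readManifest/TraceRun, whose signatures are not shown in this PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveRuns maps CLI args to a (baseline, new) pair of run IDs.
// runs is assumed to be ordered oldest to newest (assumption).
func resolveRuns(args, runs []string) (a, b string, err error) {
	switch len(args) {
	case 0:
		// No args: compare the two most recent runs.
		if len(runs) < 2 {
			return "", "", errors.New("not enough runs to diff: need at least two")
		}
		return runs[len(runs)-2], runs[len(runs)-1], nil
	case 1:
		// One arg: the given run is the baseline, the latest run is the new one.
		if len(runs) < 1 {
			return "", "", errors.New("not enough runs to diff: no runs found")
		}
		return args[0], runs[len(runs)-1], nil
	case 2:
		// Two args: explicit pair.
		return args[0], args[1], nil
	default:
		return "", "", fmt.Errorf("expected at most 2 run IDs, got %d", len(args))
	}
}

func main() {
	runs := []string{"run-199-old", "run-200-base", "run-201-new"}
	a, b, err := resolveRuns(nil, runs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("A (baseline): %s\nB (new):      %s\n", a, b)
}
```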

Testing

  • go build, go vet, go test ./..., gofmt all green.
  • Unit tests (the cmd package's first) for delta direction, the metric set, formatting, and explicit run resolution.
  • Verified end-to-end with synthetic manifests: explicit pair, zero-arg (picks two most recent), and the "not enough runs" error path.

Independent branch off main (alongside #13#16). Reuses the existing readManifest/TraceRun helpers in cmd/trace.go.

🤖 Generated with Claude Code

Answers "did my change help?" by diffing two trace runs across duration,
spans, LLM calls, tokens, and estimated cost, with colored deltas
(improvements green, regressions red) and percentage change.

Run selection:
  agk trace diff                 # two most recent runs
  agk trace diff <a>             # <a> (baseline) vs latest
  agk trace diff <a> <b>         # explicit pair

Pairs naturally with `agk run`'s trace summary and the per-model cost work.
Includes unit tests for the delta direction, metric set, and formatting
(the cmd package's first tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>