feat(eval): add --serve to auto-launch the EvalServer #20

Open
kunalkushwaha wants to merge 1 commit into main from feat/eval-serve

@kunalkushwaha (Member)

What

Removes the two-terminal dance from agk eval. Today you must start your project in EvalServer mode in one terminal, then run agk eval in another. With --serve, AGK builds and launches the project (setting AGK_EVAL_MODE=true), waits for it to become healthy, runs the tests, and tears it down, all in one command.

This is feature B2 (auto-serve) from the FEATURES.md roadmap, the biggest developer-experience (DX) win for eval.

Before / after

# before
AGK_EVAL_MODE=true ./myworkflow      # terminal 1
agk eval tests.yaml                  # terminal 2

# after
agk eval tests.yaml --serve          # one command

How it works

  • startEvalServer runs go run . (or a custom --serve-cmd) in its own process group, so go run's compiled child is reliably killed on teardown (SIGTERM → SIGKILL after a grace period).
  • waitForHealthy polls the /health endpoint of the test file's target.url until it reports ready or --serve-timeout elapses.
  • Server stdout/stderr is captured and printed if startup fails (and streamed with a [server] prefix under --verbose).
  • The lifecycle is signal-safe (Ctrl+C stops the server), and the server is torn down before os.Exit is called on test failure, since os.Exit skips deferred calls.

Flags

| Flag | Default | Description |
|---|---|---|
| `--serve` | off | Launch the project in EvalServer mode for the run, then stop it |
| `--serve-dir` | `.` | Project directory to launch |
| `--serve-cmd` | `go run .` | Custom launch command (e.g. a prebuilt binary) |
| `--serve-timeout` | `90` | Seconds to wait for the server to become healthy |

Testing

  • go build, go vet, go test ./..., gofmt all green.
  • Unit tests: parseServeCmd, plus waitForHealthy against an httptest server (both the becomes-healthy and timeout paths).
  • Verified end-to-end against a stub EvalServer: agk eval --serve launched it, waited for health, ran a contains test that passed, and tore it down cleanly. No process lingered afterward, confirming that the process-group kill works.
  • docs/EVAL.md "Run Tests" now leads with the one-command flow.

Independent branch off main. It only touches cmd/eval.go, the new cmd/eval_serve*.go files, and docs/EVAL.md, so there is no README conflict.

🤖 Generated with Claude Code

Removes the two-terminal dance from `agk eval`. With `--serve`, AGK builds and
launches the project in EvalServer mode (AGK_EVAL_MODE=true), waits for it to
become healthy, runs the tests, and tears it down — all in one command.

- startEvalServer runs `go run .` (or a custom --serve-cmd) in its own process
  group so the compiled child is reliably killed on teardown (SIGTERM→SIGKILL).
- waitForHealthy polls the test file's target.url /health until ready or timeout.
- Server stdout/stderr is captured and printed if startup fails (and streamed
  with a [server] prefix under --verbose).
- The lifecycle is signal-safe, and the server is torn down before os.Exit on test failure.
- Flags: --serve, --serve-dir, --serve-cmd, --serve-timeout.
- Docs: EVAL.md "Run Tests" now leads with the one-command flow.

Tests: parseServeCmd + waitForHealthy (httptest, healthy and timeout paths).
Verified end-to-end against a stub EvalServer (launch → run → clean teardown,
no lingering process).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>