Part of #1155.
Objective
Add a read-only `agentv history` subcommand that surfaces per-test score timelines from a sidecar baseline file by parsing its `git log`. This gives users "continuous improvement tracking" of eval scores over time without requiring any new storage format or server.
Why
AgentV's sidecar `<eval>.baseline.jsonl` files are already committed to repos; each time someone runs `agentv compare --update-baseline` (see #1158), the file gets a new commit. `git log -p` on that file is therefore already the source of truth for how scores have evolved, and `agentv history` is a thin CLI wrapper that renders that log as a per-test timeline.
Design latitude
```bash
# Timeline for one test across all commits that touched the baseline file:
agentv history --baseline evals/my-eval.baseline.jsonl --test "handles empty input"

# Example output:
# commit     date        score  verdict
# abc123def  2026-04-23  0.82   pass
# def456abc  2026-04-22  0.79   borderline
# ...

# JSONL for piping into downstream tools:
agentv history --baseline evals/my-eval.baseline.jsonl --test "handles empty input" --format jsonl
```
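One way the two output modes might be rendered, sketched in TypeScript. The language, the `TimelineEntry` shape, and the `renderTable`/`renderJsonl` helpers are illustrative assumptions, not existing AgentV code. The table abbreviates SHAs to nine characters to match the example output, while the JSONL keeps full SHAs so downstream tools get unambiguous commit IDs.

```typescript
// Hypothetical shape of one timeline row; the real output fields may differ.
interface TimelineEntry {
  commit: string; // full SHA
  date: string; // YYYY-MM-DD
  score: number;
  verdict: string | null;
}

// Human-readable table: 9-character SHAs, scores to two decimals, padded columns.
function renderTable(entries: TimelineEntry[]): string {
  const header = ["commit", "date", "score", "verdict"];
  const rows = entries.map((e) => [
    e.commit.slice(0, 9),
    e.date,
    e.score.toFixed(2),
    e.verdict ?? "-", // placeholder when a revision recorded no verdict
  ]);
  const all = [header, ...rows];
  // Each column is as wide as its longest cell.
  const widths = header.map((_, col) => Math.max(...all.map((row) => row[col].length)));
  return all
    .map((row) => row.map((cell, col) => cell.padEnd(widths[col])).join("  ").trimEnd())
    .join("\n");
}

// JSONL: one self-contained JSON object per line, each newline-terminated,
// so line-oriented tools such as jq can consume the stream.
function renderJsonl(entries: TimelineEntry[]): string {
  return entries.map((e) => JSON.stringify(e) + "\n").join("");
}
```

Keeping every JSONL line a standalone object is what would let something like `agentv history --baseline evals/my-eval.baseline.jsonl --test "handles empty input" --format jsonl | jq -r '.score'` work, assuming the field names above.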
- Uses git directly (`git log --follow -p <file>`) to walk the baseline's history, following the file across renames.
- Parses each revision of the JSONL to extract the named test's score and verdict.
- `--since <date>` and `--until <date>` for range filtering (optional).
- No new storage, no server, no config.
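A minimal sketch of the history walk in TypeScript on Node.js. It assumes each baseline line looks like `{"test_id": ..., "score": ..., "verdict": ...}` (hypothetical field names), and `parseLog`, `extractTest`, and `readHistory` are illustrative names. One deliberate substitution: instead of parsing `-p` patches, it asks git for the file's path at each commit with `--name-only` and reads each full revision with `git show <sha>:<path>`, so it never has to reconstruct JSONL from diff hunks.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical baseline record shape; AgentV's real field names may differ.
interface BaselineRecord {
  test_id?: string;
  score?: number;
  verdict?: string;
}

interface TimelineEntry {
  commit: string;
  date: string; // committer date, YYYY-MM-DD
  score: number;
  verdict: string | null;
}

interface LogEntry {
  commit: string;
  date: string;
  path: string; // path of the baseline at this commit (changes across renames)
}

// Run git and return stdout; stderr is captured instead of printed.
function git(args: string[]): string {
  return execFileSync("git", args, {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
    stdio: ["ignore", "pipe", "pipe"],
  });
}

// Parse `git log --date=short --format=%H%x09%cd --name-only` output:
// a "<sha>\t<date>" header line followed by the followed file's path.
function parseLog(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  let header: { commit: string; date: string } | null = null;
  for (const line of raw.split("\n")) {
    const m = /^([0-9a-f]{40,64})\t(\S+)$/.exec(line);
    if (m) {
      // A header with no path after it (e.g. a merge, which lists no files
      // by default) is simply overwritten by the next header.
      header = { commit: m[1], date: m[2] };
    } else if (line.trim() !== "" && header !== null) {
      entries.push({ ...header, path: line.trim() });
      header = null; // only one file is followed, so one path per commit
    }
  }
  return entries;
}

// Return the named test's score and verdict from one JSONL revision (first match wins).
function extractTest(content: string, testId: string): { score: number; verdict: string | null } | undefined {
  for (const line of content.split("\n")) {
    if (line.trim() === "") continue;
    let rec: BaselineRecord | null;
    try {
      rec = JSON.parse(line) as BaselineRecord | null;
    } catch {
      continue; // tolerate a malformed line rather than abort the whole walk
    }
    if (rec !== null && rec.test_id === testId && typeof rec.score === "number") {
      return { score: rec.score, verdict: rec.verdict ?? null };
    }
  }
  return undefined;
}

// Walk every commit that touched `baseline`, following renames, and collect
// the named test's score at each one.
function readHistory(baseline: string, testId: string, opts: { since?: string; until?: string } = {}): TimelineEntry[] {
  const args = ["log", "--follow", "--date=short", "--format=%H%x09%cd", "--name-only"];
  // Passed straight through to git, which filters on committer date.
  if (opts.since) args.push(`--since=${opts.since}`);
  if (opts.until) args.push(`--until=${opts.until}`);
  args.push("--", baseline);

  const commits = parseLog(git(args));
  if (commits.length === 0) {
    throw new Error(`no commits touch ${baseline} (is it tracked in git, and is the date range right?)`);
  }
  const timeline: TimelineEntry[] = [];
  for (const { commit, date, path } of commits) {
    let content: string;
    try {
      content = git(["show", `${commit}:${path}`]);
    } catch {
      continue; // the file does not exist at this commit, e.g. the commit deleted it
    }
    const hit = extractTest(content, testId);
    if (hit) timeline.push({ commit, date, ...hit });
  }
  if (timeline.length === 0) {
    throw new Error(`test "${testId}" not found in any revision of ${baseline}`);
  }
  return timeline; // git log's default order: newest first for linear history
}
```

Passing `--since`/`--until` straight to `git log` keeps date filtering out of the parser, and the two `throw`s map onto the error cases listed under the acceptance signals.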
Acceptance signals
- Works on a repo where the baseline file has at least 3 commits touching it (can be set up with a small test fixture).
- Outputs a correctly ordered (newest first) timeline.
- `--format jsonl` pipes cleanly into `jq`.
- Fails with a clear error message if the baseline file isn't tracked in git, or if the named test doesn't appear in any revision.
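The 3-commit fixture from the first acceptance signal could be built along these lines. This sketch assumes `git` is on `PATH`; `makeBaselineFixture` and the record fields are hypothetical, and the numbered commit messages exist only so a test can check ordering.

```typescript
import { execFileSync } from "node:child_process";
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Run git inside `cwd` with a throwaway identity and signing disabled, so the
// fixture commits succeed regardless of the developer's global git config.
function git(cwd: string, args: string[]): string {
  return execFileSync(
    "git",
    ["-c", "user.name=agentv-fixture", "-c", "user.email=fixture@example.com", "-c", "commit.gpgsign=false", ...args],
    { cwd, encoding: "utf8", stdio: ["ignore", "pipe", "pipe"] },
  );
}

// Hypothetical fixture: a fresh repo whose baseline file gets one commit per
// snapshot, oldest first. Each snapshot must differ from the previous one,
// otherwise `git commit` fails with nothing to commit.
function makeBaselineFixture(
  testId: string,
  snapshots: Array<{ score: number; verdict: string }>,
): { repo: string; baseline: string } {
  const repo = mkdtempSync(join(tmpdir(), "agentv-history-"));
  const baseline = "evals/my-eval.baseline.jsonl";
  git(repo, ["init", "--quiet"]);
  mkdirSync(join(repo, "evals"));
  snapshots.forEach((snap, i) => {
    // Assumed record shape: one JSONL line per test.
    writeFileSync(join(repo, baseline), JSON.stringify({ test_id: testId, ...snap }) + "\n");
    git(repo, ["add", baseline]);
    git(repo, ["commit", "--quiet", "-m", `update baseline (${i + 1})`]);
  });
  return { repo, baseline };
}
```

A test could then run `agentv history` with the fixture repo as its working directory and expect the three snapshots back, newest first.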
Non-goals
- Not a dashboard. Terminal/JSONL output only.
- Not a cross-repo or cross-file aggregator — one baseline file per invocation.
- Not writing to or mutating git history.
Depends on
Nothing new: works against AgentV's existing sidecar baseline convention. More useful once #1158 ships (because `--update-baseline` will produce regular commits to parse), but does not require it.