Skip to content

feat(cli): agentv history read command for per-test score timeline #1160

@christso

Description

@christso

Part of #1155.

Objective

Add a read-only agentv history subcommand that surfaces per-test score timelines from a sidecar baseline file by parsing its git log. Gives users "continuous improvement tracking" of eval scores over time without requiring any new storage format or server.

Why

AgentV's sidecar <eval>.baseline.jsonl files are already committed to repos; each time someone runs agentv compare --update-baseline (see #1158), the file gets a new commit. git log -p on that file is already the source of truth for how scores have evolved. agentv history is a thin CLI wrapper that renders that log as a per-test timeline.

Design latitude

# Timeline for one test across all commits that touched the baseline file:
agentv history --baseline evals/my-eval.baseline.jsonl --test "handles empty input"

# Example output:
# commit       date        score  verdict
# abc123def    2026-04-23  0.82   pass
# def456abc    2026-04-22  0.79   borderline
# ...

# JSONL for piping into downstream tools:
agentv history --baseline evals/my-eval.baseline.jsonl --test "handles empty input" --format jsonl
  • Uses git plumbing (git log --follow -p <file>) to walk the baseline's history.
  • Parses each revision of the JSONL to extract the named test's score and verdict.
  • --since <date> and --until <date> for range filtering (optional).
  • No new storage, no server, no config.

Acceptance signals

  • Works on a repo where the baseline file has at least 3 commits touching it (can be set up with a small test fixture).
  • Outputs a correctly-ordered (newest first) timeline.
  • --format jsonl pipes cleanly into jq.
  • Errors helpfully if the baseline file isn't tracked in git, or if the named test doesn't appear in any revision.

Non-goals

  • Not a dashboard. Terminal/JSONL output only.
  • Not a cross-repo or cross-file aggregator — one baseline file per invocation.
  • Not writing to or mutating git history.

Depends on

Nothing new — works against AgentV's existing sidecar baseline convention. More useful once #1158 ships (because --update-baseline will produce regular commits to parse), but does not require it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions