feat(core): optional provenance block in EVAL.yaml schema #1157

@christso

Part of #1155.

Objective

Add an optional provenance block to test cases in the EVAL.yaml schema. The block is opaque metadata: it records where a generated test came from (PR, issue, chat transcript, or other) so that generated evals can be traced back to their source artifacts.

Why

Generated evals (from the skill extension in #1155) need source traceability so users can answer "where did this test come from?" without resorting to grep. Hand-authored tests simply omit the field. The design aligns with W3C PROV conventions.

Design latitude

Add to the test-case schema (packages/core/src/evaluation/validation/eval-file.schema.ts):

provenance: z.object({
  source: z.enum(['pr', 'issue', 'chat', 'other']),
  url: z.string().url().optional(),
  commit: z.string().optional(),       // for source=pr
  generated_by: z.string().optional(), // e.g., "agentv-eval-writer@0.5.0"
  generated_at: z.string().datetime().optional(),
}).passthrough().optional(),

passthrough() keeps unknown keys instead of rejecting them, so future fields can be added without a breaking change. The exact field names are open for review.
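To make the intended shape concrete without depending on zod, here is a minimal sketch that mirrors the proposed fields as a plain TypeScript type plus a hand-written check. The names `Provenance`, `isProvenance`, `URL_LIKE`, and `ISO_UTC` are hypothetical and not part of the codebase, and the two regular expressions are only rough stand-ins for zod's `.url()` and `.datetime()` validation.

```typescript
// Hypothetical plain-TypeScript mirror of the proposed zod fragment.
const SOURCES = ["pr", "issue", "chat", "other"] as const;

type Provenance = {
  source: (typeof SOURCES)[number];
  url?: string;
  commit?: string; // for source=pr
  generated_by?: string; // e.g., "agentv-eval-writer@0.5.0"
  generated_at?: string;
  // Mirrors passthrough(): unknown keys are kept rather than rejected.
  [extra: string]: unknown;
};

// Rough approximations (assumptions), not zod's exact validation rules.
const URL_LIKE = /^[a-z][a-z0-9+.-]*:\/\/\S+$/i;
const ISO_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// An optional field passes when absent, or when it is a string matching the pattern.
function optionalString(value: unknown, pattern?: RegExp): boolean {
  if (value === undefined) return true;
  if (typeof value !== "string") return false;
  return pattern ? pattern.test(value) : true;
}

function isProvenance(value: unknown): value is Provenance {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    typeof v.source === "string" &&
    (SOURCES as readonly string[]).indexOf(v.source) !== -1 &&
    optionalString(v.url, URL_LIKE) &&
    optionalString(v.commit) &&
    optionalString(v.generated_by) &&
    optionalString(v.generated_at, ISO_UTC)
  );
}
```

Under these assumptions, `isProvenance({ source: "chat", ticket: "X-1" })` accepts the unknown `ticket` key, while `isProvenance({ source: "email" })` is rejected because `email` is not one of the listed sources.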

Acceptance signals

  • Schema accepts provenance block; validators treat it as opaque (no behavior tied to values).
  • Tests without provenance load and run identically to today.
  • agentv eval lint passes on evals with and without provenance.
  • A provenance block round-trips through the loader cleanly.
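The round-trip and "runs identically" signals above could be checked along the following lines. This is only a sketch: the `TestCase` shape and the `roundTrip` helper are hypothetical, and JSON serialization stands in for the real YAML loader to keep the example dependency-free.

```typescript
// Hypothetical test-case shape; only `id` and `provenance` matter for this sketch.
type TestCase = {
  id: string;
  provenance?: {
    source: "pr" | "issue" | "chat" | "other";
    [extra: string]: unknown;
  };
};

// JSON stands in for the YAML loader here (an assumption, not the real loader).
function roundTrip(testCase: TestCase): TestCase {
  return JSON.parse(JSON.stringify(testCase)) as TestCase;
}

const generated: TestCase = {
  id: "generated-case",
  provenance: {
    source: "issue",
    url: "https://example.com/issues/1",
    reviewer: "someone", // unknown key, allowed by the passthrough-style shape
  },
};
const handAuthored: TestCase = { id: "hand-authored-case" };

const generatedOut = roundTrip(generated);
const handAuthoredOut = roundTrip(handAuthored);

// Provenance, including the unknown `reviewer` key, should survive unchanged.
const provenancePreserved =
  JSON.stringify(generatedOut.provenance) === JSON.stringify(generated.provenance);

// A test without provenance must not gain the key on the way through.
const provenanceStaysAbsent = !("provenance" in handAuthoredOut);
```

In a real test the same two assertions would run against the actual loader and the schema in eval-file.schema.ts rather than the JSON stand-in.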

Non-goals

  • Not enforcing provenance on hand-authored tests.
  • Not implementing any behavior tied to provenance values (no "skip-if-generated" logic).
  • Not making UI or CLI changes; this is a pure schema addition.

Unblocks

Skill-extension sub-issue (#1155) can start stamping provenance on generated evals once this lands.

Labels

  • core: Anything pertaining to core functionality of AgentV
  • enhancement: New feature or request
  • eval-writer: Work on or enabling the agentv-eval-writer skill