Skip to content

feat(scorecard): 7-status taxonomy, per-row emission, antecedent propagation (schema 0.6)#62

Open
brettdavies wants to merge 5 commits into
devfrom
feat/u2-7-status-emission
Open

feat(scorecard): 7-status taxonomy, per-row emission, antecedent propagation (schema 0.6)#62
brettdavies wants to merge 5 commits into
devfrom
feat/u2-7-status-emission

Conversation

@brettdavies
Copy link
Copy Markdown
Owner

Summary

Ships U2 of the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md in the agentnative-site repo): the CLI now emits a 7-status taxonomy with per-row results, antecedent propagation for conditional requirements, and a bumped scorecard schema. The vendored spec snaps to agentnative-spec dev at commit b4f4d02 (PR brettdavies/agentnative#34, U1) so the new applicability.kind: conditional / antecedent.check_id shape parses against five conditional rows in p2 and p8.

The status enum gains opt_out (deliberate non-adoption) and n_a (conditional antecedent unmet); the existing five values are preserved. Summary gains matching counters. Every entry in results[] now represents one requirement row instead of one probe, carries the row's RFC 2119 tier (must/should/may), and names the originating probe in check_id. A probe whose Check::covers() lists multiple rows (e.g., p3-versionp3-must-version + p3-should-version-short) now produces multiple result entries. Two probes (p8-bundle-exists and p2-json-output) emit opt_out directly when their feature is absent; antecedent propagation rewrites every conditional row whose antecedent collapses to opt_out / n_a so downstream MUSTs and MAYs stop double-counting against tools that deliberately do not ship the prerequisite feature.

The badge score formula keeps its pass / (pass + warn + fail) shape but extends the exclusion set to cover the new statuses: opt_out is excluded from the denominator and n_a is excluded from both sides, per the plan's transitional posture (the final tier-weighted formula is the U3 spec issue, after the disambiguated input has been rescored). Text mode renders the new statuses as OPT and N/A badges; the summary line shows all seven counters.

Three parser strictness fixes the red team surfaced harden the build-time spec gate: whitespace-only check_id is rejected, mixed legacy if: plus new kind: in one applicability block now errors (instead of silently dropping the legacy prose), and any key inside antecedent beyond check_id errors by name so v2 compound-antecedent syntax (op: any_of) cannot ship under v1 semantics.

The Cargo.toml version and CHANGELOG.md are unchanged; both are release-PR artifacts per RELEASES.md.

Changelog

Added

  • New opt_out and n_a scorecard statuses surface in anc audit --output json (status field on each row and matching counters in summary). opt_out marks deliberate non-adoption (tool ships no --output flag, no AGENTS.md bundle); n_a marks a conditional requirement whose antecedent is unmet. Pre-0.6 consumers treat both as unknown and feature-detect.
  • Each entry in results[] now carries a tier field (must / should / may, or null for rows not in the registry) and a check_id field naming the probe that produced the row.
  • anc emit schema returns the schema 0.6 contract ($id: https://anc.dev/scorecard-v0.6.schema.json) with the new status enum values, summary counters, and per-row fields.

Changed

  • anc audit --output json emits one result entry per requirement-row instead of one per check_id. A probe like p3-version (covers p3-must-version and p3-should-version-short) now produces two distinct entries, each tier-stamped, so downstream scoring layers no longer need a coverage-matrix join to attribute a probe's outcome to a specific RFC 2119 level.
  • Conditional requirement rows whose antecedent collapses to opt_out or n_a are propagated to n_a in results[]; rows whose antecedent is skip or error inherit the indeterminacy. The propagated evidence string names the antecedent check id so the chain is legible from the JSON alone.
  • The badge denominator excludes opt_out (transitional) and excludes n_a from both sides, matching the plan's posture that no formula is provably fair until the input shape is disambiguated.
  • anc audit text mode renders OPT and N/A status badges alongside the existing five, and the summary line reports all seven counters.
  • p8-bundle-exists emits opt_out when no top-level AGENTS.md or SKILL.md is found (a malformed bundle still emits warn); p2-json-output emits opt_out when no --output or --format flag is detected at top level or in any subcommand.
  • The vendored agentnative-spec tree updates to dev commit b4f4d02 (PR feat(applicability): add machine-checkable conditional shape agentnative#34). Five rows in p2 and p8 migrate to the new applicability.kind: conditional / antecedent.check_id shape; the remaining 18 legacy applicability.if: <prose> rows stay as-is until each prerequisite grows a machine-readable check id.

Fixed

  • The spec parser now rejects three malformed inputs that previously fell through silently: antecedent.check_id containing only whitespace, an applicability block carrying both legacy if: and new kind: (the legacy branch only fired for single-key maps, so the prose was being dropped on the floor), and any key inside antecedent other than check_id (compound antecedents are deferred to v2 of the schema per the plan's Sub-decision 2b).

Documentation

  • schema/scorecard.schema.json regenerates against the 0.6 contract: new enum values, new required counters, tier and check_id on CheckResultView, and a refreshed examples[0] block.
  • coverage/matrix.json and docs/coverage-matrix.md regenerate against the new conditional applicability shape. Conditional rows surface with applicability.antecedent.check_id populated; legacy rows continue to emit applicability.condition: "<prose>".

Type of Change

  • feat: New feature (non-breaking change which adds functionality)

Related Issues/Stories

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing completed
  • All tests passing

Test Summary:

  • Unit and integration tests: 786 passing (29 net-new across parser, registry, scorecard, schema drift)
  • Pre-push gate: fmt, clippy -Dwarnings, full test suite, cargo-deny, shellcheck, Windows compat — all green
  • Manual dogfood: anc audit . --output json produces a schema 0.6 scorecard with the self-target at the v0.4.0 baseline (no opt_out / n_a, since the repo ships both --output and AGENTS.md); anc audit --command echo --output json exercises the propagation chain end-to-end (summary.opt_out: 1 for p2-must-output-flag from direct emission, summary.n_a: 1 for p2-must-schema-print from antecedent propagation)
  • anc emit coverage-matrix --check clean (committed artifacts match the registry plus covers() declarations)

Files Modified

Modified:

  • build_support/parser.rs: New conditional applicability shape (kind: conditional + antecedent.check_id) parses alongside the legacy if: form; three strictness fixes for the malformed-input edge cases.
  • src/principles/registry.rs: Applicability::Conditional becomes a struct variant with condition and antecedent fields; new Antecedent type carries check_id. Two registry-consistency guards added.
  • src/principles/matrix.rs: Matrix renderer handles both the legacy condition and the new antecedent shape; matrix.json emits the new shape via additive serde fields.
  • src/types.rs: CheckStatus gains OptOut(String) and NotApplicable(String).
  • src/scorecard/mod.rs: SCHEMA_VERSION bumps to 0.6; new fan_out_per_row and propagate_antecedents helpers run inside build_scorecard; CheckResultView gains tier + check_id; Summary gains opt_out + n_a counters; score_pct, build_summary, format_text, format_text_raw, exit_code updated for the new variants. Twenty-seven new tests covering fan-out, propagation, score formula, summary invariants, serialization roundtrip.
  • src/checks/project/bundle_exists.rs: Emits OptOut when no bundle file is found; malformed bundle still emits Warn.
  • src/checks/behavioral/json_output.rs: Emits OptOut when no --output / --format flag is detected at top level or in any subcommand.
  • src/principles/spec/principles/p1-non-interactive-by-default.md through p8-discoverable-skill-bundle.md: Vendored at agentnative-spec dev SHA b4f4d02; five rows in p2 and p8 carry the new conditional shape.
  • schema/scorecard.schema.json: Schema 0.6 contract ($id, enum values, required counters, tier + check_id on results, refreshed example).
  • coverage/matrix.json and docs/coverage-matrix.md: Regenerated against the new applicability shape.
  • tests/integration.rs: Schema version assertion bumped to 0.6; the audit_profile diagnostic-only test looks up the per-row id p5-must-dry-run (with check_id: "p5-dry-run") under per-row emission.
  • tests/scorecard_metadata_security.rs: Schema version assertion bumped to 0.6.
  • tests/scorecard_schema_v05.rs: Schema version assertion bumped to 0.6; assert_v05_shape walks the new summary.opt_out / summary.n_a keys and every row's tier / check_id. Six schema-drift guards assert the committed schema agrees with the live scorecard.
  • tests/build_parser.rs: Eight parser red-team tests covering malformed antecedent shapes, whitespace-only check_id, mixed legacy plus new shape, smuggled v2 op: key, and a defense-in-depth emit_rust quote-escape test.

Created:

  • None.

Renamed:

  • None.

Deleted:

  • None.

Breaking Changes

  • No breaking changes

Schema 0.6 is an additive evolution of 0.5: every pre-0.6 key remains, the new keys are populated alongside the existing ones, and consumers that do not understand opt_out / n_a should treat them as skip (the conservative bucket, matching the contract in the schema's own description). The site renderer, leaderboard ingest, and any third-party agent consumer can ship their schema-0.6 support out-of-band without coordination.

The one shape change that warrants a heads-up: each entry in results[] is now keyed by requirement row id (e.g., p3-must-version) instead of probe id (e.g., p3-version). Consumers that pinned on the probe id should switch to the new check_id field on each row, which carries the probe provenance forward.

Deployment Notes

  • No special deployment steps required

The release PR will bump Cargo.toml and CHANGELOG.md. Downstream consumers (site renderer, Homebrew tap, registry rescore) consume the published artifact after the tag pushes; no out-of-band coordination needed.

Checklist

  • Code follows project conventions and style guidelines
  • Commit messages follow Conventional Commits
  • Self-review of code completed
  • Tests added/updated and passing
  • No new warnings or errors introduced
  • Changes are backward compatible (consumers feature-detect new keys)

Pin-vendors agentnative-spec at commit b4f4d02 (PR brettdavies/agentnative#34 squash-merge on dev) as the basis for U2 of the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md in the site repo).

The vendored tree carries U1: `applicability.kind: conditional` with `antecedent.check_id` shape on five requirement rows: `p2-must-schema-print` and `p2-should-schema-file` (antecedent `p2-json-output`), plus `p8-must-bundle-install`, `p8-may-install-all`, `p8-may-bundle-update` (antecedent `p8-bundle-exists`). Replaces the legacy `applicability.if: <prose>` shape on those rows. p8 prose re-expressed in if-X-then-Y construction.

VERSION stays at 0.4.0 by design because the spec-side VERSION bump is deferred to spec's release PR per U1's status notes. The build pipeline already reads the vendored tree at `src/principles/spec/` via `build.rs`; downstream U2 work consumes the new shape from there.

Resolved short SHA: b4f4d02 (printed by sync-spec.sh during vendoring; recorded here so the pin is traceable post-merge).
Extends the build-time spec parser, runtime registry, and matrix renderer to accept the new conditional applicability shape introduced by agentnative-spec PR #34 (vendored at b4f4d02 in the prior commit) while keeping the legacy `{ if: "<prose>" }` shape for the 18 spec rows that have not yet migrated.

Schema change (parsed YAML, runtime enum, matrix.json output):

- `Applicability::Conditional { condition: Option<String>, antecedent: Option<Antecedent> }` replaces the previous `Conditional(String)` tuple variant. At least one of `condition` or `antecedent` must be set; both is permitted (the new shape supplements the legacy prose with a machine-readable check id).
- `Antecedent { check_id }` is the v1 machine-readable form. Compound antecedents (`all_of` / `any_of`) are deferred to a future schema bump per plan Sub-decision 2b.
- The matrix.json row's `applicability` block uses `#[serde(skip_serializing_if = "Option::is_none")]` so legacy rows continue to emit `{ "kind": "conditional", "condition": "<prose>" }` and new rows emit `{ "kind": "conditional", "antecedent": { "check_id": "<id>" } }`. No site renderer changes required for the legacy path.

The parser rejects four error modes loudly: `kind:` values other than `conditional`; `antecedent:` mappings missing `check_id`; `kind: conditional` with neither `condition:` nor `antecedent:` set; `condition:` values that are not non-empty strings. Each error names the offending file, requirement id, and the malformed field so build failures are actionable.

This commit is the foundation for U2 of the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md in the site repo). Per-row emission, antecedent status propagation, and the schema 0.6 bump that consume this new shape land in the following commits on this branch.
Implements the bulk of U2 from the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md). Three changes ship together because they share the same pattern-match exhaustiveness contract on `CheckStatus`: the new variants, the per-row pipeline that emits them, and the test fixtures that pin the JSON shape.

**7-status taxonomy (Decision 1).** `CheckStatus` gains `OptOut(String)` for deliberate non-adoption and `NotApplicable(String)` for conditional rows whose antecedent is unmet. JSON serializes as `opt_out` and `n_a` respectively. Every match site (`build_summary`, `score_pct`, `exit_code`, `format_text`, `format_text_raw`, `CheckResultView::from_row`) now handles all seven variants exhaustively. The plan's transitional `score_pct` formula keeps the historic `pass / (pass + warn + fail)` shape and excludes opt_out from the denominator and n_a from both sides (per U2 work item: "minimal change that respects the new semantics without committing to a new formula"); the final formula is the U3 spec issue.

**Per-row emission (Decision 2c).** New `fan_out_per_row(raw, catalog)` walks each probe-level `CheckResult` and emits one row per requirement in `Check::covers()`, replacing the row `id` with the requirement id and carrying the probe's check id forward as provenance. A probe like `p3-version` (covers `p3-must-version` + `p3-should-version-short`) now produces two rows in `results[]`; a probe whose check is not yet wired into the registry passes through as a single row keyed by check id. `CheckResultView` gains `tier` (registry-looked-up `must` / `should` / `may`, or `null` for unknown ids) and `check_id` (probe provenance). The shape change drives the schema bump from 0.5 to 0.6; pre-0.6 consumers feature-detect.

**Antecedent propagation (Decision 2a).** New `propagate_antecedents(rows, raw)` reads each row's registry `Applicability` and applies the propagation table to conditional rows: `opt_out` / `n_a` antecedent collapses the consequent to `n_a`; `skip` and `error` inherit; `pass` / `warn` / `fail` leave the consequent untouched (the probe's own status stands). Evidence text on the rewritten row cites the antecedent check id and its status so the provenance is legible in the scorecard JSON.

`Summary` gains `opt_out` and `n_a` counters alongside the historic five so consumers can read the new buckets without scanning `results[]`. The badge derivation reads the per-row vector (post-propagation) so the embed URL the JSON emits and the post-summary text hint stay in lock-step. The audience classifier and coverage_summary continue to read raw probe results because signal classification keys on check ids and coverage counts requirements covered by the underlying probes.

Test updates pin the new shape: 14 new scorecard-module tests cover fan-out for single-row and multi-row probes, the full propagation table (pass/warn/fail/opt_out/n_a/skip/error antecedents), score_pct exclusions, summary counting, and the `tier` / `check_id` round-trip; integration tests update the suppression test to look up the row id under per-row emission; the schema_v05 drift guard adds 0.6 keys (summary counters, per-row `tier`/`check_id`) so consumers' shape contract is checked end-to-end.
…rtifacts

Wires the two probe-level cases where the 7-status taxonomy's `opt_out` carries clear semantics, then regenerates the committed schema artifacts so the JSON Schema, coverage matrix, and live scorecard agree on the shape introduced in the previous commits on this branch.

**Probe-level opt_out emissions.**

- `p8-bundle-exists` (project layer): the "no top-level `AGENTS.md` / `SKILL.md` found" branch now returns `OptOut` instead of `Warn`. A bundle that exists but is malformed (missing YAML frontmatter or `name:` field) is a real SHOULD violation and stays `Warn`. The opt_out signal feeds antecedent propagation: every conditional row whose antecedent is `p8-bundle-exists` (`p8-must-bundle-install`, `p8-may-install-all`, `p8-may-bundle-update`) collapses to `n_a` automatically when the project ships no bundle.
- `p2-json-output` (behavioral layer): the two "no `--output` / `--format` flag detected" branches (top-level help and per-subcommand probe) now return `OptOut`. The conditional rows `p2-must-schema-print` and `p2-should-schema-file` are antecedent-gated on `p2-json-output`, so a tool without structured output sees those rows propagate to `n_a` rather than counting twice against the score.

The probe-level changes are deliberately scoped to the two cases the plan calls out by name; other probes that currently emit `Skip` for "feature absent" are left as-is for follow-up work (the propagation table handles them correctly through whatever conditional antecedent they sit under, when the antecedent gets a machine-readable shape via `sync-spec`).

**Regenerated artifacts.**

- `schema/scorecard.schema.json` bumps `$id` to `https://anc.dev/scorecard-v0.6.schema.json`. The `CheckResultView` definition adds `tier` (enum of `must` / `should` / `may` / `null`) and `check_id`; the `Summary` definition adds `opt_out` and `n_a` counters; the `status` enum gains `opt_out` and `n_a` values; the example block updates to a per-row row with `tier` + `check_id` populated. The Summary's `required` list now names all seven counters.
- `coverage/matrix.json` and `docs/coverage-matrix.md` regenerate against the conditional-applicability schema landed in the spec-scaffolding commit. The five conditional rows in p2 and p8 surface with `applicability.kind: conditional` and `applicability.antecedent.check_id` populated; legacy rows continue to emit `applicability.condition: "<prose>"` until each prerequisite grows a check id in the verifier catalog.

Dogfood verification against `--command echo` (a CLI with no `--output` flag and no project context) produces a scorecard with `summary.opt_out: 1` for the `p2-must-output-flag` row (direct emission) and `summary.n_a: 1` for the `p2-must-schema-print` row (antecedent-propagated). Against the agentnative-cli repo itself (which has both `--output` and `AGENTS.md`), nothing routes through opt_out or n_a; the score stays at the v0.4.0 dogfood baseline.
Twenty-two adversarial tests across the four U2 surfaces, plus three parser strictness fixes the red team surfaced.

**Parser strictness fixes (`build_support/parser.rs`).** Three error modes that the original v1 parser accepted silently now hard-fail with a hint at the schema the author probably meant:

- `antecedent.check_id` is now trimmed before the non-empty check (whitespace-only strings were previously accepted as a valid id).
- Mixed legacy `if:` and new `kind:` in the same applicability block now errors. The legacy branch only fired for `map.len() == 1`, so the original parser silently dropped the `if:` prose when both keys were present, which would footgun any author mid-migration.
- Extra keys inside `antecedent` (beyond `check_id`) now error by name. Compound antecedents (`op: any_of | all_of`) are explicitly deferred to a future schema bump per plan Sub-decision 2b; silently ignoring v2 keys on a v1 row would let v2 syntax ship under v1 semantics.

**Parser red team (`tests/build_parser.rs`, +8 tests).** Whitespace-only `check_id`, `antecedent:` as a string, `antecedent:` as a list (the v2 compound shape), `kind: Conditional` with the wrong case, mixed legacy + new shape in one row, antecedent with `op:` smuggled alongside `check_id`, `antecedent: null`, empty antecedent mapping. Plus an `emit_rust` defense-in-depth test asserting that hostile characters in a `check_id` (quotes, backslashes) escape correctly in the generated Rust source.

**Registry consistency red team (`src/principles/registry.rs`, +2 tests).** Walks `REQUIREMENTS` and asserts every conditional row's `antecedent.check_id` resolves to a real check in `all_checks_catalog()`, because a typo or rename would silently mute propagation in production. A second test asserts no conditional row names its own covering check as its antecedent (the "self-gate" edge case where a row would gate its status against itself).

**Propagation red team (`src/scorecard/mod.rs`, +5 tests).** Idempotency (a second pass produces identical output), no-op when the antecedent didn't run (source-only or filtered run), `--audit-profile` suppression of the antecedent propagates as Skip with the suppression reason preserved, full pipeline end-to-end (fan-out → propagation → summary → score with opt_out + n_a counted separately).

**Score formula red team (`src/scorecard/mod.rs`, +4 tests).** Score is 0 with no panic when every row is opt_out or n_a (division-by-zero guard). One pass amid 999 n_a rows is 100% (n_a excluded from both numerator and denominator). Skip + Error continue to be excluded under the new taxonomy. `summary.total` equals the sum of all seven per-status counters (catches a future status variant added without updating `build_summary`).

**Serialization red team (`src/scorecard/mod.rs`, +2 tests).** Evidence strings containing quotes, backslashes, control chars (`\n`, `\t`, `\u{0007}`) and Unicode (zero-width joiner, RTL override) round-trip through serde → parse → assert without loss or escape corruption. Sanitization belongs at the render layer, not here; the scorecard's job is faithful pass-through.

**Schema drift red team (`tests/scorecard_schema_v05.rs`, +6 tests).** The committed `schema/scorecard.schema.json` is the consumer contract for the site renderer and third-party leaderboards. Tests assert: `$id` pins to the current `SCHEMA_VERSION` (0.6); the `status` enum lists all seven taxonomy values; `Summary.required` includes `opt_out` + `n_a`; `CheckResultView` includes `tier` (with the three RFC 2119 levels plus null) + `check_id`; the schema's own `examples[0]` block satisfies its own `required` lists (catches doc-vs-schema drift); a live scorecard from a real `anc audit` run carries every key in the top-level and per-row `required` lists.

The single supporting production change beyond the parser strictness is `#[derive(Debug)]` on `Summary` so red-team assertions can surface counter mismatches inline rather than panicking with a useless message.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant