feat(scorecard): 7-status taxonomy, per-row emission, antecedent propagation (schema 0.6)#62
Open
brettdavies wants to merge 5 commits into
Pin-vendors agentnative-spec at commit b4f4d02 (PR brettdavies/agentnative#34 squash-merge on dev) as the basis for U2 of the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md in the site repo). The vendored tree carries U1: `applicability.kind: conditional` with `antecedent.check_id` shape on five requirement rows: `p2-must-schema-print` and `p2-should-schema-file` (antecedent `p2-json-output`), plus `p8-must-bundle-install`, `p8-may-install-all`, `p8-may-bundle-update` (antecedent `p8-bundle-exists`). Replaces the legacy `applicability.if: <prose>` shape on those rows. p8 prose re-expressed in if-X-then-Y construction. VERSION stays at 0.4.0 by design because the spec-side VERSION bump is deferred to spec's release PR per U1's status notes. The build pipeline already reads the vendored tree at `src/principles/spec/` via `build.rs`; downstream U2 work consumes the new shape from there. Resolved short SHA: b4f4d02 (printed by sync-spec.sh during vendoring; recorded here so the pin is traceable post-merge).
Extends the build-time spec parser, runtime registry, and matrix renderer to accept the new conditional applicability shape introduced by agentnative-spec PR #34 (vendored at b4f4d02 in the prior commit) while keeping the legacy `{ if: "<prose>" }` shape for the 18 spec rows that have not yet migrated.
Schema change (parsed YAML, runtime enum, matrix.json output):
- `Applicability::Conditional { condition: Option<String>, antecedent: Option<Antecedent> }` replaces the previous `Conditional(String)` tuple variant. At least one of `condition` or `antecedent` must be set; both is permitted (the new shape supplements the legacy prose with a machine-readable check id).
- `Antecedent { check_id }` is the v1 machine-readable form. Compound antecedents (`all_of` / `any_of`) are deferred to a future schema bump per plan Sub-decision 2b.
- The matrix.json row's `applicability` block uses `#[serde(skip_serializing_if = "Option::is_none")]` so legacy rows continue to emit `{ "kind": "conditional", "condition": "<prose>" }` and new rows emit `{ "kind": "conditional", "antecedent": { "check_id": "<id>" } }`. No site renderer changes required for the legacy path.
The parser rejects four error modes loudly: `kind:` values other than `conditional`; `antecedent:` mappings missing `check_id`; `kind: conditional` with neither `condition:` nor `antecedent:` set; `condition:` values that are not non-empty strings. Each error names the offending file, requirement id, and the malformed field so build failures are actionable.
This commit is the foundation for U2 of the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md in the site repo). Per-row emission, antecedent status propagation, and the schema 0.6 bump that consume this new shape land in the following commits on this branch.
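The four rejection modes described in this commit can be sketched as a single validation function. This is an illustration under stated assumptions, not the code in `build_support/parser.rs`: `RawApplicability`, `RawAntecedent`, and `parse_conditional` are hypothetical names standing in for the parsed YAML and the parser entry point, and the `Applicability` enum shows only the `Conditional` variant named above.

```rust
// Hypothetical stand-ins for the parsed YAML `applicability:` block.
// The real parser's input types are not shown in this PR text.
#[derive(Debug, Clone, Default)]
pub struct RawAntecedent {
    pub check_id: Option<String>,
}

#[derive(Debug, Clone, Default)]
pub struct RawApplicability {
    pub kind: String,
    pub condition: Option<String>,
    pub antecedent: Option<RawAntecedent>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Antecedent {
    pub check_id: String,
}

// Only the variant described in the commit message; other variants omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum Applicability {
    Conditional {
        condition: Option<String>,
        antecedent: Option<Antecedent>,
    },
}

/// Validates one conditional block. Every error names the file, the
/// requirement id, and the malformed field.
pub fn parse_conditional(
    file: &str,
    req_id: &str,
    raw: &RawApplicability,
) -> Result<Applicability, String> {
    let at = format!("{file}: requirement `{req_id}`");

    // Mode 1: `kind:` must be exactly `conditional` (case-sensitive).
    if raw.kind != "conditional" {
        return Err(format!(
            "{at}: applicability.kind must be `conditional`, got `{}`",
            raw.kind
        ));
    }

    // Mode 4: `condition:` must be a non-empty string when present.
    let condition = match raw.condition.as_deref() {
        Some("") => {
            return Err(format!(
                "{at}: applicability.condition must be a non-empty string"
            ))
        }
        other => other.map(str::to_owned),
    };

    // Mode 2: an `antecedent:` mapping must carry `check_id`.
    let antecedent = match &raw.antecedent {
        None => None,
        Some(a) => match &a.check_id {
            Some(id) => Some(Antecedent { check_id: id.clone() }),
            None => {
                return Err(format!(
                    "{at}: applicability.antecedent is missing `check_id`"
                ))
            }
        },
    };

    // Mode 3: at least one of `condition` / `antecedent` must be set.
    if condition.is_none() && antecedent.is_none() {
        return Err(format!(
            "{at}: `kind: conditional` needs `condition` or `antecedent`"
        ));
    }

    Ok(Applicability::Conditional { condition, antecedent })
}
```

Returning `Result<_, String>` keeps the sketch dependency-free; a build script could surface the `Err` text directly in the build failure.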
Implements the bulk of U2 from the scorecard fairness taxonomy plan (docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md). Three changes ship together because they share the same pattern-match exhaustiveness contract on `CheckStatus`: the new variants, the per-row pipeline that emits them, and the test fixtures that pin the JSON shape.
**7-status taxonomy (Decision 1).** `CheckStatus` gains `OptOut(String)` for deliberate non-adoption and `NotApplicable(String)` for conditional rows whose antecedent is unmet. JSON serializes as `opt_out` and `n_a` respectively. Every match site (`build_summary`, `score_pct`, `exit_code`, `format_text`, `format_text_raw`, `CheckResultView::from_row`) now handles all seven variants exhaustively. The plan's transitional `score_pct` formula keeps the historic `pass / (pass + warn + fail)` shape and excludes opt_out from the denominator and n_a from both sides (per U2 work item: "minimal change that respects the new semantics without committing to a new formula"); the final formula is the U3 spec issue.
**Per-row emission (Decision 2c).** New `fan_out_per_row(raw, catalog)` walks each probe-level `CheckResult` and emits one row per requirement in `Check::covers()`, replacing the row `id` with the requirement id and carrying the probe's check id forward as provenance. A probe like `p3-version` (covers `p3-must-version` + `p3-should-version-short`) now produces two rows in `results[]`; a probe whose check is not yet wired into the registry passes through as a single row keyed by check id. `CheckResultView` gains `tier` (registry-looked-up `must` / `should` / `may`, or `null` for unknown ids) and `check_id` (probe provenance). The shape change drives the schema bump from 0.5 to 0.6; pre-0.6 consumers feature-detect.
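The per-row fan-out can be pictured with a small standalone sketch. It is not the `fan_out_per_row(raw, catalog)` from `src/scorecard/mod.rs`: the `ProbeResult` and `Row` structs, the hard-coded `covers` lookup, and the id-based `tier_of` helper are assumptions made so the example compiles on its own (the commit message says the real tier comes from a registry lookup).

```rust
// Probe-level result, keyed by check id (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ProbeResult {
    pub id: String,     // e.g. "p3-version"
    pub status: String, // kept as a plain string to stay self-contained
}

// Per-requirement row after fan-out (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: String,                 // requirement id, or check id on pass-through
    pub check_id: String,           // probe provenance
    pub tier: Option<&'static str>, // must / should / may, None for unknown ids
    pub status: String,
}

// Assumption: a hard-coded stand-in for `Check::covers()`.
pub fn covers(check_id: &str) -> &'static [&'static str] {
    match check_id {
        "p3-version" => &["p3-must-version", "p3-should-version-short"],
        _ => &[],
    }
}

// Assumption: tier read off the id naming convention, for illustration only.
pub fn tier_of(req_id: &str) -> Option<&'static str> {
    if req_id.contains("-must-") {
        Some("must")
    } else if req_id.contains("-should-") {
        Some("should")
    } else if req_id.contains("-may-") {
        Some("may")
    } else {
        None
    }
}

/// Emits one row per covered requirement; a probe with no registered
/// coverage passes through as a single row keyed by its check id.
pub fn fan_out_per_row(raw: &[ProbeResult]) -> Vec<Row> {
    let mut rows = Vec::new();
    for probe in raw {
        let covered = covers(&probe.id);
        if covered.is_empty() {
            rows.push(Row {
                id: probe.id.clone(),
                check_id: probe.id.clone(),
                tier: None,
                status: probe.status.clone(),
            });
            continue;
        }
        for req_id in covered {
            rows.push(Row {
                id: (*req_id).to_string(),
                check_id: probe.id.clone(),
                tier: tier_of(req_id),
                status: probe.status.clone(),
            });
        }
    }
    rows
}
```

In this sketch a `p3-version` probe yields two rows, one with tier `must` and one with tier `should`, both carrying `check_id: "p3-version"`.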
**Antecedent propagation (Decision 2a).** New `propagate_antecedents(rows, raw)` reads each row's registry `Applicability` and applies the propagation table to conditional rows: `opt_out` / `n_a` antecedent collapses the consequent to `n_a`; `skip` and `error` inherit; `pass` / `warn` / `fail` leave the consequent untouched (the probe's own status stands). Evidence text on the rewritten row cites the antecedent check id and its status so the provenance is legible in the scorecard JSON.
`Summary` gains `opt_out` and `n_a` counters alongside the historic five so consumers can read the new buckets without scanning `results[]`. The badge derivation reads the per-row vector (post-propagation) so the embed URL the JSON emits and the post-summary text hint stay in lock-step. The audience classifier and coverage_summary continue to read raw probe results because signal classification keys on check ids and coverage counts requirements covered by the underlying probes.
Test updates pin the new shape: 14 new scorecard-module tests cover fan-out for single-row and multi-row probes, the full propagation table (pass/warn/fail/opt_out/n_a/skip/error antecedents), score_pct exclusions, summary counting, and the `tier` / `check_id` round-trip; integration tests update the suppression test to look up the row id under per-row emission; the schema_v05 drift guard adds 0.6 keys (summary counters, per-row `tier`/`check_id`) so consumers' shape contract is checked end-to-end.
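The propagation table above reduces to one `match` over the antecedent's status. The sketch below is a simplified single-row version, not `propagate_antecedents(rows, raw)` itself: the function name `propagate`, the `label` helper, the evidence wording, and the payloads on `Warn` / `Fail` / `Skip` / `Error` are assumptions (only `OptOut(String)` and `NotApplicable(String)` are confirmed by the commit message).

```rust
// Seven-status taxonomy. Payload shapes other than OptOut / NotApplicable
// are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum CheckStatus {
    Pass,
    Warn(String),
    Fail(String),
    Skip(String),
    Error(String),
    OptOut(String),
    NotApplicable(String),
}

/// JSON-style label for a status (`opt_out` and `n_a` per the commit message).
pub fn label(status: &CheckStatus) -> &'static str {
    match status {
        CheckStatus::Pass => "pass",
        CheckStatus::Warn(_) => "warn",
        CheckStatus::Fail(_) => "fail",
        CheckStatus::Skip(_) => "skip",
        CheckStatus::Error(_) => "error",
        CheckStatus::OptOut(_) => "opt_out",
        CheckStatus::NotApplicable(_) => "n_a",
    }
}

/// Applies the propagation table to one conditional row.
/// The result depends only on the antecedent when the row is rewritten,
/// so applying it twice gives the same answer as applying it once.
pub fn propagate(
    antecedent_id: &str,
    antecedent: &CheckStatus,
    consequent: CheckStatus,
) -> CheckStatus {
    // Evidence cites the antecedent check id and its status.
    let cite = format!("antecedent `{antecedent_id}` is {}", label(antecedent));
    match antecedent {
        // opt_out / n_a collapse the consequent to n_a.
        CheckStatus::OptOut(_) | CheckStatus::NotApplicable(_) => {
            CheckStatus::NotApplicable(cite)
        }
        // skip and error are inherited, keeping the original reason.
        CheckStatus::Skip(reason) => CheckStatus::Skip(format!("{cite}: {reason}")),
        CheckStatus::Error(reason) => CheckStatus::Error(format!("{cite}: {reason}")),
        // pass / warn / fail leave the probe's own status standing.
        CheckStatus::Pass | CheckStatus::Warn(_) | CheckStatus::Fail(_) => consequent,
    }
}
```

Because the `match` is exhaustive, adding an eighth status variant would fail to compile here until the table is extended, which mirrors the exhaustiveness contract the commit relies on.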
…rtifacts
Wires the two probe-level cases where the 7-status taxonomy's `opt_out` carries clear semantics, then regenerates the committed schema artifacts so the JSON Schema, coverage matrix, and live scorecard agree on the shape introduced in the previous commits on this branch.
**Probe-level opt_out emissions.**
- `p8-bundle-exists` (project layer): the "no top-level `AGENTS.md` / `SKILL.md` found" branch now returns `OptOut` instead of `Warn`. A bundle that exists but is malformed (missing YAML frontmatter or `name:` field) is a real SHOULD violation and stays `Warn`. The opt_out signal feeds antecedent propagation: every conditional row whose antecedent is `p8-bundle-exists` (`p8-must-bundle-install`, `p8-may-install-all`, `p8-may-bundle-update`) collapses to `n_a` automatically when the project ships no bundle.
- `p2-json-output` (behavioral layer): the two "no `--output` / `--format` flag detected" branches (top-level help and per-subcommand probe) now return `OptOut`. The conditional rows `p2-must-schema-print` and `p2-should-schema-file` are antecedent-gated on `p2-json-output`, so a tool without structured output sees those rows propagate to `n_a` rather than counting twice against the score.
The probe-level changes are deliberately scoped to the two cases the plan calls out by name; other probes that currently emit `Skip` for "feature absent" are left as-is for follow-up work (the propagation table handles them correctly through whatever conditional antecedent they sit under, when the antecedent gets a machine-readable shape via `sync-spec`).
**Regenerated artifacts.**
- `schema/scorecard.schema.json` bumps `$id` to `https://anc.dev/scorecard-v0.6.schema.json`. The `CheckResultView` definition adds `tier` (enum of `must` / `should` / `may` / `null`) and `check_id`; the `Summary` definition adds `opt_out` and `n_a` counters; the `status` enum gains `opt_out` and `n_a` values; the example block updates to a per-row row with `tier` + `check_id` populated. The Summary's `required` list now names all seven counters.
- `coverage/matrix.json` and `docs/coverage-matrix.md` regenerate against the conditional-applicability schema landed in the spec-scaffolding commit. The five conditional rows in p2 and p8 surface with `applicability.kind: conditional` and `applicability.antecedent.check_id` populated; legacy rows continue to emit `applicability.condition: "<prose>"` until each prerequisite grows a check id in the verifier catalog.
Dogfood verification against `--command echo` (a CLI with no `--output` flag and no project context) produces a scorecard with `summary.opt_out: 1` for the `p2-must-output-flag` row (direct emission) and `summary.n_a: 1` for the `p2-must-schema-print` row (antecedent-propagated). Against the agentnative-cli repo itself (which has both `--output` and `AGENTS.md`), nothing routes through opt_out or n_a; the score stays at the v0.4.0 dogfood baseline.
Twenty-two adversarial tests across the four U2 surfaces, plus three parser strictness fixes the red team surfaced.
**Parser strictness fixes (`build_support/parser.rs`).** Three error modes that the original v1 parser accepted silently now hard-fail with a hint at the schema the author probably meant:
- `antecedent.check_id` is now trimmed before the non-empty check (whitespace-only strings were previously accepted as a valid id).
- Mixed legacy `if:` and new `kind:` in the same applicability block now errors. The legacy branch only fired for `map.len() == 1`, so the original parser silently dropped the `if:` prose when both keys were present, which would footgun any author mid-migration.
- Extra keys inside `antecedent` (beyond `check_id`) now error by name. Compound antecedents (`op: any_of | all_of`) are explicitly deferred to a future schema bump per plan Sub-decision 2b; silently ignoring v2 keys on a v1 row would let v2 syntax ship under v1 semantics.
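A minimal sketch of the three strictness checks, assuming the applicability block and the `antecedent` mapping have already been reduced to plain key lists and string maps. The helper names (`antecedent_map`, `check_block_keys`, `check_antecedent`) and the error wording are hypothetical; returning the trimmed id is a choice of this sketch, since the commit message only says the value is trimmed before the non-empty check.

```rust
use std::collections::BTreeMap;

/// Test helper: builds an `antecedent` mapping from key/value pairs.
pub fn antecedent_map(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Rejects a block that carries both the legacy `if:` key and the new
/// `kind:` key, instead of silently dropping the legacy prose.
pub fn check_block_keys(keys: &[&str]) -> Result<(), String> {
    if keys.contains(&"if") && keys.contains(&"kind") {
        return Err(
            "applicability mixes legacy `if:` with new `kind:`; use one shape per row"
                .to_string(),
        );
    }
    Ok(())
}

/// Rejects unknown keys by name and whitespace-only ids.
pub fn check_antecedent(antecedent: &BTreeMap<String, String>) -> Result<String, String> {
    // Any key other than `check_id` (for example a v2 `op:`) is an error.
    if let Some(extra) = antecedent.keys().find(|k| k.as_str() != "check_id") {
        return Err(format!(
            "unexpected key `{extra}` in antecedent; only `check_id` is accepted"
        ));
    }
    // Trim before the non-empty check so "   " does not pass as an id.
    let id = antecedent.get("check_id").map(|s| s.trim()).unwrap_or("");
    if id.is_empty() {
        return Err("antecedent.check_id must be a non-empty string".to_string());
    }
    Ok(id.to_string())
}
```

Checking for unknown keys before reading `check_id` means a row that smuggles `op:` alongside a valid id still fails, which is the v2-under-v1 case the third fix targets.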
**Parser red team (`tests/build_parser.rs`, +8 tests).** Whitespace-only `check_id`, `antecedent:` as a string, `antecedent:` as a list (the v2 compound shape), `kind: Conditional` with the wrong case, mixed legacy + new shape in one row, antecedent with `op:` smuggled alongside `check_id`, `antecedent: null`, empty antecedent mapping. Plus an `emit_rust` defense-in-depth test asserting that hostile characters in a `check_id` (quotes, backslashes) escape correctly in the generated Rust source.
**Registry consistency red team (`src/principles/registry.rs`, +2 tests).** Walks `REQUIREMENTS` and asserts every conditional row's `antecedent.check_id` resolves to a real check in `all_checks_catalog()`, because a typo or rename would silently mute propagation in production. A second test asserts no conditional row names its own covering check as its antecedent (the "self-gate" edge case where a row would gate its status against itself).
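The two registry guards can be sketched as one pass over the requirement list. The `Requirement` and `Check` structs and the `registry_errors` function below are hypothetical simplifications; the real tests walk `REQUIREMENTS` and `all_checks_catalog()`, whose types are not shown here.

```rust
// Hypothetical, simplified registry shapes for illustration.
pub struct Requirement {
    pub id: &'static str,
    // `antecedent.check_id` for conditional rows, None otherwise.
    pub antecedent_check_id: Option<&'static str>,
}

pub struct Check {
    pub id: &'static str,
    pub covers: &'static [&'static str],
}

/// Returns one message per inconsistency:
/// - an antecedent that does not resolve to any check in the catalog
///   (a typo or rename would otherwise mute propagation silently);
/// - a row whose antecedent is its own covering check (the self-gate case).
pub fn registry_errors(requirements: &[Requirement], catalog: &[Check]) -> Vec<String> {
    let mut errors = Vec::new();
    for req in requirements {
        let ante = match req.antecedent_check_id {
            Some(id) => id,
            None => continue, // not a machine-readable conditional row
        };
        match catalog.iter().find(|c| c.id == ante) {
            None => errors.push(format!(
                "{}: antecedent `{}` is not in the catalog",
                req.id, ante
            )),
            Some(check) if check.covers.contains(&req.id) => errors.push(format!(
                "{}: self-gated by its own covering check `{}`",
                req.id, ante
            )),
            Some(_) => {}
        }
    }
    errors
}
```

Collecting every message before failing lets a single test run report all broken rows at once rather than stopping at the first.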
**Propagation red team (`src/scorecard/mod.rs`, +5 tests).** Idempotency (a second pass produces identical output), no-op when the antecedent didn't run (source-only or filtered run), `--audit-profile` suppression of the antecedent propagates as Skip with the suppression reason preserved, full pipeline end-to-end (fan-out → propagation → summary → score with opt_out + n_a counted separately).
**Score formula red team (`src/scorecard/mod.rs`, +4 tests).** Score is 0 with no panic when every row is opt_out or n_a (division-by-zero guard). One pass amid 999 n_a rows is 100% (n_a excluded from both numerator and denominator). Skip + Error continue to be excluded under the new taxonomy. `summary.total` equals the sum of all seven per-status counters (catches a future status variant added without updating `build_summary`).
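The transitional formula and the two invariants these tests pin can be sketched as follows. The `Summary` layout, the `u32` return type, rounding to the nearest integer, and exposing `total` as a method are assumptions of this sketch; the source only fixes the `pass / (pass + warn + fail)` shape, the exclusions, and the zero-denominator behavior.

```rust
// Seven per-status counters (field layout assumed for illustration).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Summary {
    pub pass: usize,
    pub warn: usize,
    pub fail: usize,
    pub skip: usize,
    pub error: usize,
    pub opt_out: usize,
    pub n_a: usize,
}

impl Summary {
    /// Sum of all seven counters. A new status variant that is not added
    /// here would break the `total` invariant the red-team test checks.
    pub fn total(&self) -> usize {
        self.pass + self.warn + self.fail + self.skip + self.error + self.opt_out + self.n_a
    }
}

/// Transitional score: pass / (pass + warn + fail), as a percentage.
/// skip, error, opt_out, and n_a never enter the denominator.
pub fn score_pct(s: &Summary) -> u32 {
    let denominator = s.pass + s.warn + s.fail;
    if denominator == 0 {
        // Division-by-zero guard: e.g. every row is opt_out or n_a.
        return 0;
    }
    (100.0 * s.pass as f64 / denominator as f64).round() as u32
}
```

With this shape, one `pass` among 999 `n_a` rows scores 100, and a summary made up only of `opt_out` and `n_a` rows scores 0 without panicking.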
**Serialization red team (`src/scorecard/mod.rs`, +2 tests).** Evidence strings containing quotes, backslashes, control chars (`\n`, `\t`, `\u{0007}`) and Unicode (zero-width joiner, RTL override) round-trip through serde → parse → assert without loss or escape corruption. Sanitization belongs at the render layer, not here; the scorecard's job is faithful pass-through.
**Schema drift red team (`tests/scorecard_schema_v05.rs`, +6 tests).** The committed `schema/scorecard.schema.json` is the consumer contract for the site renderer and third-party leaderboards. Tests assert: `$id` pins to the current `SCHEMA_VERSION` (0.6); the `status` enum lists all seven taxonomy values; `Summary.required` includes `opt_out` + `n_a`; `CheckResultView` includes `tier` (with the three RFC 2119 levels plus null) + `check_id`; the schema's own `examples[0]` block satisfies its own `required` lists (catches doc-vs-schema drift); a live scorecard from a real `anc audit` run carries every key in the top-level and per-row `required` lists.
The single supporting production change beyond the parser strictness is `#[derive(Debug)]` on `Summary` so red-team assertions can surface counter mismatches inline rather than panicking with a useless message.
Summary
Ships U2 of the scorecard fairness taxonomy plan (`docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md` in the agentnative-site repo): the CLI now emits a 7-status taxonomy with per-row results, antecedent propagation for conditional requirements, and a bumped scorecard schema. The vendored spec snaps to `agentnative-spec` `dev` at commit `b4f4d02` (PR brettdavies/agentnative#34, U1) so the new `applicability.kind: conditional` / `antecedent.check_id` shape parses against five conditional rows in p2 and p8.
The status enum gains `opt_out` (deliberate non-adoption) and `n_a` (conditional antecedent unmet); the existing five values are preserved. `Summary` gains matching counters. Every entry in `results[]` now represents one requirement row instead of one probe, carries the row's RFC 2119 `tier` (`must`/`should`/`may`), and names the originating probe in `check_id`. A probe whose `Check::covers()` lists multiple rows (e.g., `p3-version` → `p3-must-version` + `p3-should-version-short`) now produces multiple result entries. Two probes (`p8-bundle-exists` and `p2-json-output`) emit `opt_out` directly when their feature is absent; antecedent propagation rewrites every conditional row whose antecedent collapses to `opt_out` / `n_a` so downstream MUSTs and MAYs stop double-counting against tools that deliberately do not ship the prerequisite feature.
The badge score formula keeps its `pass / (pass + warn + fail)` shape but extends the exclusion set to cover the new statuses: `opt_out` is excluded from the denominator and `n_a` is excluded from both sides, per the plan's transitional posture (the final tier-weighted formula is the U3 spec issue, after the disambiguated input has been rescored). Text mode renders the new statuses as `OPT` and `N/A` badges; the summary line shows all seven counters.
Three parser strictness fixes the red team surfaced harden the build-time spec gate: whitespace-only `check_id` is rejected, mixed legacy `if:` plus new `kind:` in one applicability block now errors (instead of silently dropping the legacy prose), and any key inside `antecedent` beyond `check_id` errors by name so v2 compound-antecedent syntax (`op: any_of`) cannot ship under v1 semantics.
The `Cargo.toml` version and `CHANGELOG.md` are unchanged; both are release-PR artifacts per `RELEASES.md`.
Changelog
Added
- `opt_out` and `n_a` scorecard statuses surface in `anc audit --output json` (`status` field on each row and matching counters in `summary`). `opt_out` marks deliberate non-adoption (tool ships no `--output` flag, no `AGENTS.md` bundle); `n_a` marks a conditional requirement whose antecedent is unmet. Pre-0.6 consumers treat both as unknown and feature-detect.
- Each entry in `results[]` now carries a `tier` field (`must`/`should`/`may`, or `null` for rows not in the registry) and a `check_id` field naming the probe that produced the row.
- `anc emit schema` returns the schema 0.6 contract (`$id: https://anc.dev/scorecard-v0.6.schema.json`) with the new status enum values, summary counters, and per-row fields.
Changed
- `anc audit --output json` emits one result entry per requirement-row instead of one per `check_id`. A probe like `p3-version` (covers `p3-must-version` and `p3-should-version-short`) now produces two distinct entries, each tier-stamped, so downstream scoring layers no longer need a coverage-matrix join to attribute a probe's outcome to a specific RFC 2119 level.
- Conditional rows whose antecedent is `opt_out` or `n_a` are propagated to `n_a` in `results[]`; rows whose antecedent is `skip` or `error` inherit the indeterminacy. The propagated evidence string names the antecedent check id so the chain is legible from the JSON alone.
- The badge score formula excludes `opt_out` from the denominator (transitional) and excludes `n_a` from both sides, matching the plan's posture that no formula is provably fair until the input shape is disambiguated.
- `anc audit` text mode renders `OPT` and `N/A` status badges alongside the existing five, and the summary line reports all seven counters.
- `p8-bundle-exists` emits `opt_out` when no top-level `AGENTS.md` or `SKILL.md` is found (a malformed bundle still emits `warn`); `p2-json-output` emits `opt_out` when no `--output` or `--format` flag is detected at top level or in any subcommand.
- The vendored `agentnative-spec` tree updates to `dev` commit `b4f4d02` (PR feat(applicability): add machine-checkable conditional shape agentnative#34). Five rows in `p2` and `p8` migrate to the new `applicability.kind: conditional` / `antecedent.check_id` shape; the remaining 18 legacy `applicability.if: <prose>` rows stay as-is until each prerequisite grows a machine-readable check id.
Fixed
- The build-time spec parser now rejects three malformed inputs it previously accepted: an `antecedent.check_id` containing only whitespace, an applicability block carrying both legacy `if:` and new `kind:` (the legacy branch only fired for single-key maps, so the prose was being silently dropped), and any key inside `antecedent` other than `check_id` (compound antecedents are deferred to v2 of the schema per the plan's Sub-decision 2b).
Documentation
- `schema/scorecard.schema.json` regenerates against the 0.6 contract: new enum values, new required counters, `tier` and `check_id` on `CheckResultView`, and a refreshed `examples[0]` block.
- `coverage/matrix.json` and `docs/coverage-matrix.md` regenerate against the new conditional applicability shape. Conditional rows surface with `applicability.antecedent.check_id` populated; legacy rows continue to emit `applicability.condition: "<prose>"`.
Type of Change
feat: New feature (non-breaking change which adds functionality)
Related Issues/Stories
- `docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md` (agentnative-site)
- `docs/plans/2026-05-21-001-feat-scorecard-fairness-taxonomy-plan.md` (agentnative-site repo)
- `dev`, vendored here at SHA `b4f4d02`)
Testing
Test Summary:
- `-Dwarnings`, full test suite, cargo-deny, shellcheck, Windows compat: all green
- `anc audit . --output json` produces a schema 0.6 scorecard with the self-target at the v0.4.0 baseline (no opt_out / n_a, since the repo ships both `--output` and `AGENTS.md`); `anc audit --command echo --output json` exercises the propagation chain end-to-end (`summary.opt_out: 1` for `p2-must-output-flag` from direct emission, `summary.n_a: 1` for `p2-must-schema-print` from antecedent propagation)
- `anc emit coverage-matrix --check` clean (committed artifacts match the registry plus `covers()` declarations)
Files Modified
Modified:
- `build_support/parser.rs`: New conditional applicability shape (`kind: conditional` + `antecedent.check_id`) parses alongside the legacy `if:` form; three strictness fixes for the malformed-input edge cases.
- `src/principles/registry.rs`: `Applicability::Conditional` becomes a struct variant with `condition` and `antecedent` fields; new `Antecedent` type carries `check_id`. Two registry-consistency guards added.
- `src/principles/matrix.rs`: Matrix renderer handles both the legacy condition and the new antecedent shape; matrix.json emits the new shape via additive serde fields.
- `src/types.rs`: `CheckStatus` gains `OptOut(String)` and `NotApplicable(String)`.
- `src/scorecard/mod.rs`: `SCHEMA_VERSION` bumps to 0.6; new `fan_out_per_row` and `propagate_antecedents` helpers run inside `build_scorecard`; `CheckResultView` gains `tier` + `check_id`; `Summary` gains `opt_out` + `n_a` counters; `score_pct`, `build_summary`, `format_text`, `format_text_raw`, `exit_code` updated for the new variants. Twenty-seven new tests covering fan-out, propagation, score formula, summary invariants, serialization roundtrip.
- `src/checks/project/bundle_exists.rs`: Emits `OptOut` when no bundle file is found; malformed bundle still emits `Warn`.
- `src/checks/behavioral/json_output.rs`: Emits `OptOut` when no `--output` / `--format` flag is detected at top level or in any subcommand.
- `src/principles/spec/principles/p1-non-interactive-by-default.md` through `p8-discoverable-skill-bundle.md`: Vendored at `agentnative-spec` dev SHA `b4f4d02`; five rows in p2 and p8 carry the new conditional shape.
- `schema/scorecard.schema.json`: Schema 0.6 contract (`$id`, enum values, required counters, `tier` + `check_id` on results, refreshed example).
- `coverage/matrix.json` and `docs/coverage-matrix.md`: Regenerated against the new applicability shape.
- `tests/integration.rs`: Schema version assertion bumped to `0.6`; the `audit_profile diagnostic-only` test looks up the per-row id `p5-must-dry-run` (with `check_id: "p5-dry-run"`) under per-row emission.
- `tests/scorecard_metadata_security.rs`: Schema version assertion bumped to `0.6`.
- `tests/scorecard_schema_v05.rs`: Schema version assertion bumped to `0.6`; `assert_v05_shape` walks the new `summary.opt_out` / `summary.n_a` keys and every row's `tier` / `check_id`. Six schema-drift guards assert the committed schema agrees with the live scorecard.
- `tests/build_parser.rs`: Eight parser red-team tests covering malformed `antecedent` shapes, whitespace-only `check_id`, mixed legacy plus new shape, smuggled v2 `op:` key, and a defense-in-depth `emit_rust` quote-escape test.
Created:
Renamed:
Deleted:
Breaking Changes
Schema 0.6 is an additive evolution of 0.5: every pre-0.6 key remains, the new keys are populated alongside the existing ones, and consumers that do not understand `opt_out` / `n_a` should treat them as `skip` (the conservative bucket, matching the contract in the schema's own description). The site renderer, leaderboard ingest, and any third-party agent consumer can ship their schema-0.6 support out-of-band without coordination.
The one shape change that warrants a heads-up: each entry in `results[]` is now keyed by requirement row id (e.g., `p3-must-version`) instead of probe id (e.g., `p3-version`). Consumers that pinned on the probe id should switch to the new `check_id` field on each row, which carries the probe provenance forward.
Deployment Notes
The release PR will bump `Cargo.toml` and `CHANGELOG.md`. Downstream consumers (site renderer, Homebrew tap, registry rescore) consume the published artifact after the tag pushes; no out-of-band coordination needed.
Checklist