
Add governance benchmark substrate #184

Merged
pengfei-threemoonslab merged 2 commits into main from codex/governance-benchmark
Jun 6, 2026


@pengfei-threemoonslab
Contributor

Summary

  • Promote benchmark/agent-pr-governance/cases.yaml to v0.2 with executable, catalog-only, and external-evidence case statuses.
  • Add an experimental governance benchmark substrate that materializes base/head repos, runs the real verifier, builds capability locks, diffs them, and evaluates capability semantic expectations.
  • Add schema-safe benchmark models, generated catalog/result schemas, docs, and focused tests.
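The case statuses and the lock-diff step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `CaseStatus`, `BenchmarkCase`, `diff_locks`, and `evaluate_case` are hypothetical, the status strings are assumed to match the wording of the summary, and capability locks are modeled as plain sets of capability names rather than being built by running the real verifier on materialized base/head repos.

```python
from dataclasses import dataclass, field
from enum import Enum


class CaseStatus(str, Enum):
    # Assumed spellings; the real values live in cases.yaml.
    EXECUTABLE = "executable"
    CATALOG_ONLY = "catalog-only"
    EXTERNAL_EVIDENCE = "external-evidence"


@dataclass
class BenchmarkCase:
    # Hypothetical shape: the real substrate derives capabilities from
    # materialized repos; here they are supplied directly as sets.
    case_id: str
    status: CaseStatus
    base_capabilities: set[str] = field(default_factory=set)
    head_capabilities: set[str] = field(default_factory=set)
    expected_added: set[str] = field(default_factory=set)
    expected_removed: set[str] = field(default_factory=set)


def diff_locks(base: set[str], head: set[str]) -> dict[str, set[str]]:
    """Return capabilities added in head and removed relative to base."""
    return {"added": head - base, "removed": base - head}


def evaluate_case(case: BenchmarkCase) -> dict[str, str]:
    """Run only executable cases; report the others as skipped."""
    if case.status is not CaseStatus.EXECUTABLE:
        return {"case_id": case.case_id, "outcome": "skipped"}
    delta = diff_locks(case.base_capabilities, case.head_capabilities)
    passed = (
        delta["added"] == case.expected_added
        and delta["removed"] == case.expected_removed
    )
    return {"case_id": case.case_id, "outcome": "passed" if passed else "failed"}
```

In this sketch, catalog-only and external-evidence cases stay in the catalog but never run, which is one plausible reading of why the three statuses are kept separate.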

Boundaries

  • No report_schema_version bump; reports remain 0.23.
  • Capability lock export remains 0.1; capability lock diff remains 0.2.
  • No new public report field, verify integration, GitHub Action output, policy-pack behavior, or second release gate.
  • release_decision.decision remains the only release gate.
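The boundaries above could be guarded by a small check along these lines. This is a sketch under stated assumptions: `boundary_violations` is a hypothetical helper, and the dict layouts for the report and the benchmark result are assumed for illustration. Only the `report_schema_version` value of 0.23 and the `release_decision.decision` field name come from this description.

```python
def boundary_violations(report: dict, benchmark_result: dict) -> list[str]:
    """Collect boundary violations; an empty list means the boundaries hold.

    Assumes (hypothetically) that a report is a dict with a top-level
    "report_schema_version" string and a nested "release_decision" dict.
    """
    violations: list[str] = []

    # Reports must stay on schema version 0.23.
    if report.get("report_schema_version") != "0.23":
        violations.append("report_schema_version must remain 0.23")

    # The report keeps its single release gate.
    if "decision" not in report.get("release_decision", {}):
        violations.append("release_decision.decision is missing from the report")

    # The benchmark result must not introduce a second gate.
    if "release_decision" in benchmark_result:
        violations.append("benchmark result must not carry a release gate")

    return violations
```

Returning a list of messages rather than raising on the first failure lets a test report every broken boundary in one run.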

Validation

  • python -m pytest tests/test_governance_benchmark.py tests/test_capability_domain.py tests/test_capability_lock.py tests/test_capability_delta.py tests/test_verifier_scenarios.py -q
  • python scripts/run_governance_benchmark.py --catalog benchmark/agent-pr-governance/cases.yaml --json
  • python scripts/generate_schemas.py --check
  • ruff check .
  • python -m pytest tests/test_schema_boundaries.py tests/test_public_surface_contract.py -q
  • python -m pytest tests/test_docs_links.py -q

@pengfei-threemoonslab changed the title from "[codex] Add governance benchmark substrate" to "Add governance benchmark substrate" on Jun 6, 2026
@pengfei-threemoonslab marked this pull request as ready for review on June 6, 2026 17:44
@pengfei-threemoonslab merged commit fdbb8da into main on Jun 6, 2026
2 checks passed