Context
agent-kernel provides capability-based authorization, policy enforcement, context firewalling and audit for agent tool ecosystems.
A useful cross-repo scenario is an agent consuming a statistical/model-evaluation artifact, such as an offline policy evaluation report from skdr-eval. These artifacts can be misused if an agent treats a headline value estimate as deployment evidence while ignoring support diagnostics, uncertainty or warnings.
Goal
Add a policy pattern for gating agent actions based on structured evaluation artifacts.
Example principle:
An agent may summarize a high-risk evaluation artifact, but it must not recommend deployment or automatic rollout when support diagnostics are high_risk.
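The principle can be written as data rather than as producer-specific code. Below is a minimal Python sketch that assumes artifacts arrive as plain dictionaries. `ArtifactRule`, `is_permitted` and the action names are hypothetical illustrations, not part of agent-kernel's API:

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ArtifactRule:
    """Hypothetical rule: deny the listed actions when a top-level field equals a value."""
    field: str
    equals: Any
    denied_actions: frozenset[str]


# The example principle expressed as data (action names are hypothetical).
NO_DEPLOY_ON_HIGH_RISK = ArtifactRule(
    field="support_health",
    equals="high_risk",
    denied_actions=frozenset({"recommend_deployment", "automatic_rollout"}),
)


def is_permitted(artifact: Mapping[str, Any], action: str, rules: list[ArtifactRule]) -> bool:
    """An action is permitted unless some rule matches the artifact and denies that action."""
    for rule in rules:
        if artifact.get(rule.field) == rule.equals and action in rule.denied_actions:
            return False
    return True
```

Keeping rules as data means another artifact type could be gated by adding rules rather than new code paths, which fits the "generic artifact policy layer" idea.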
Suggested capabilities / policy checks
Support a generic artifact policy layer that can inspect fields such as:
- artifact_type
- support_health
- warnings
- uncertainty
- decision_stable
- recommendation.intent
- limitations
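One way for a generic layer to inspect these fields, including nested ones such as recommendation.intent, is a dotted-path lookup. This sketch assumes artifacts are nested JSON-like mappings. `get_field`, `POLICY_FIELDS` and `missing_fields` are hypothetical names:

```python
from collections.abc import Mapping
from typing import Any

# Sentinel distinguishing "field absent" from "field present with value None".
_MISSING = object()

# Fields the policy layer inspects (names taken from the list above).
POLICY_FIELDS = (
    "artifact_type",
    "support_health",
    "warnings",
    "uncertainty",
    "decision_stable",
    "recommendation.intent",
    "limitations",
)


def get_field(artifact: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as "recommendation.intent" against nested mappings.

    Returns `default` if a segment is missing or an intermediate value is not a mapping.
    """
    current: Any = artifact
    for segment in path.split("."):
        if not isinstance(current, Mapping) or segment not in current:
            return default
        current = current[segment]
    return current


def missing_fields(artifact: Mapping[str, Any]) -> list[str]:
    """Return the policy fields that the artifact does not provide at all."""
    return [p for p in POLICY_FIELDS if get_field(artifact, p, _MISSING) is _MISSING]
```

Reporting missing fields separately lets a policy fail closed (for example, by requiring human review) when a producer omits a diagnostic, instead of treating absence as a pass.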
Potential decisions:
- allow_summary
- allow_manual_review_recommendation
- require_human_review
- deny_deployment_recommendation
- deny_automatic_rollout
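The decision names above could be modelled as an enum, with each support state mapped to a set of decisions. The mapping chosen for caution is an assumption made for illustration, and unknown states are treated like high_risk so the policy fails closed; `decisions_for` is a hypothetical helper:

```python
from enum import Enum


class Decision(str, Enum):
    """Decision names from the list above."""
    ALLOW_SUMMARY = "allow_summary"
    ALLOW_MANUAL_REVIEW_RECOMMENDATION = "allow_manual_review_recommendation"
    REQUIRE_HUMAN_REVIEW = "require_human_review"
    DENY_DEPLOYMENT_RECOMMENDATION = "deny_deployment_recommendation"
    DENY_AUTOMATIC_ROLLOUT = "deny_automatic_rollout"


def decisions_for(support_health: str) -> frozenset[Decision]:
    """Map a support-health state to a set of decisions (illustrative mapping only)."""
    # Summaries and manual-review suggestions are allowed in every state.
    base = {Decision.ALLOW_SUMMARY, Decision.ALLOW_MANUAL_REVIEW_RECOMMENDATION}
    if support_health == "ok":
        return frozenset(base)
    if support_health == "caution":
        # Assumption: caution escalates to a human but does not deny outright.
        return frozenset(base | {Decision.REQUIRE_HUMAN_REVIEW})
    # "high_risk" or any unrecognised value: fail closed.
    return frozenset(base | {
        Decision.REQUIRE_HUMAN_REVIEW,
        Decision.DENY_DEPLOYMENT_RECOMMENDATION,
        Decision.DENY_AUTOMATIC_ROLLOUT,
    })
```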
Example scenario
An agent receives an EvaluationArtifact with:
- candidate appears better than baseline;
- support_health = high_risk;
- warnings include low ESS or poor overlap.
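Since fixture artifacts are preferred over external package dependencies, this scenario could be stored as a small JSON fixture and loaded with the standard library. The field values (offline_policy_evaluation, low_ess, poor_overlap, recommend_deployment) and the `load_fixture` helper are hypothetical:

```python
import json
from typing import Any

# Hypothetical fixture for the scenario above; no producer package is imported.
HIGH_RISK_FIXTURE_JSON = """
{
  "artifact_type": "offline_policy_evaluation",
  "headline": "candidate appears better than baseline",
  "support_health": "high_risk",
  "warnings": ["low_ess", "poor_overlap"],
  "recommendation": {"intent": "recommend_deployment"}
}
"""

# Minimal keys this sketch assumes the policy layer relies on.
REQUIRED_KEYS = ("artifact_type", "support_health", "warnings")


def load_fixture(text: str) -> dict[str, Any]:
    """Parse a JSON fixture and reject it if any required key is absent."""
    artifact = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in artifact]
    if missing:
        raise ValueError(f"fixture is missing required keys: {missing}")
    return artifact
```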
Expected behavior:
- allowed: summarize the result and explain the caveats;
- allowed: recommend improving logs/support;
- denied: recommend deployment, rollout, or automatic A/B promotion as if the result were reliable.
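A sketch of how this expected behavior might map to gate results, assuming the agent's proposed action is expressed as a hypothetical intent string. Summaries with caveats and support-improvement advice are always allowed; deployment-like intents are denied on high_risk and escalated to human review on any other non-ok state or outstanding warnings. `gate_intent` and the intent names are assumptions:

```python
from typing import Any, Mapping

# Hypothetical intent names for the behaviors listed above.
ALWAYS_ALLOWED = frozenset({"summarize_with_caveats", "recommend_support_improvement"})
DEPLOYMENT_LIKE = frozenset({"recommend_deployment", "recommend_rollout", "automatic_ab_promotion"})


def gate_intent(artifact: Mapping[str, Any], intent: str) -> str:
    """Return "allow", "deny" or "require_human_review" for a proposed agent intent."""
    if intent in ALWAYS_ALLOWED:
        return "allow"
    if intent not in DEPLOYMENT_LIKE:
        # Unrecognised intents fail closed to human review.
        return "require_human_review"
    health = artifact.get("support_health")
    if health == "high_risk":
        return "deny"
    if health != "ok" or artifact.get("warnings"):
        # caution, missing/unknown health, or outstanding warnings: escalate
        return "require_human_review"
    return "allow"
```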
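Acceptance criteria
- … ok, caution, and high_risk support states.
- … skdr-eval producers.
- … weaver-spec EvaluationArtifact contract if/when available.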
Non-goals
- Do not implement OPE/statistical estimation in agent-kernel.
- Do not hard-code a dependency on skdr-eval.
- Do not make policy decisions based only on a single numeric metric.
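To avoid gating on a single number, the checks can combine several diagnostics and deliberately ignore the headline estimate. A minimal sketch; `review_reasons` and the reason strings are hypothetical:

```python
from typing import Any, Mapping


def review_reasons(artifact: Mapping[str, Any]) -> list[str]:
    """List every diagnostic that argues against treating the headline estimate as evidence.

    The headline estimate is deliberately not consulted: an empty list means the
    diagnostics agree, not that the estimate is large.
    """
    reasons: list[str] = []
    health = artifact.get("support_health")
    if health != "ok":
        reasons.append(f"support_health={health}")
    for warning in artifact.get("warnings") or []:
        reasons.append(f"warning:{warning}")
    if artifact.get("decision_stable") is not True:
        reasons.append("decision_stable not confirmed")
    if artifact.get("uncertainty") is None:
        reasons.append("uncertainty not reported")
    return reasons
```

A non-empty list could map to require_human_review, and the reason strings are the kind of detail that could be recorded in an audit trail.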
AI agent notes
This is a policy-safety example. Keep it small, generic and testable. Prefer fixture artifacts rather than external package dependencies.