Skip to content

CRD-driven CEL admission rules with async validator#374

Merged
matthyx merged 18 commits into
mainfrom
feature/cel-admission-rules
Jun 17, 2026
Merged

CRD-driven CEL admission rules with async validator#374
matthyx merged 18 commits into
mainfrom
feature/cel-admission-rules

Conversation

@slashben

@slashben slashben commented May 20, 2026

Copy link
Copy Markdown
Contributor

Implements designs-and-proposals#3 (CRD-driven CEL admission rules) (merged).

Summary

Replaces the operator's hardcoded admission rules (R2000 Exec, R2001 Port-Forward) with CRD-defined CEL rules. Same Rules CRD schema used by the node-agent for eBPF events, extended with a new k8s-admission event type.

What's in this PR

Core pipeline

  • admission/cel/AdmissionCEL engine + AdmissionCelEvent adapter (exposes event.Kind, event.UserInfo, etc. to CEL; object/oldObject/options as top-level variables due to ext.NativeTypes() map limitation).
  • admission/rules/cel/CelRuleCreator and CelRuleEvaluator replace the legacy Go rule descriptors and factory.
  • admission/ruleswatcher/ — Dynamic informer over Rules CRDs; filters for enabled rules with at least one k8s-admission expression, calls CelRuleCreator.SyncRules.
  • admission/rulebinding/cache/ — Now takes the RuleCreator as a constructor argument and exposes RefreshRules() for the watcher to invoke.

Performance and safety (per the design doc)

  • Self-pod short-circuit — Validator parses the unverified sub claim from the projected ServiceAccount token at startup and drops any admission request whose UserInfo.Name matches. Hard guarantee against feedback loops from the operator's own writes. Falls back gracefully if the token isn't readable.
  • Kind-of-interest pre-filterCelRuleCreator extracts event.Kind == "X" patterns from loaded expressions at SyncRules time. Validator skips events for Kinds no rule could match. Conservative — falls back to wildcard for expressions with || or no Kind pin.
  • Async validator with bounded worker poolValidate() enqueues onto a 1000-slot buffered channel and returns immediately; 10 workers drain it. The API server never waits on CEL. Queue-full → drop with throttled warning log + atomic counter (exposed via DropCount() for Prometheus).

Bug fix

  • admission/rulebinding/cache/ — Both the RBCache and RulesWatcher adaptors share the dynamic watcher event stream. The RBCache now filters non-RuntimeRuleAlertBinding events before parsing, fixing the "cannot convert int64 to string" spam previously seen after adding the RulesWatcher.

Deleted

  • admission/rules/v1/r2000_exec_to_pod.go (+ test)
  • admission/rules/v1/r2001_portforward.go (+ test)
  • admission/rules/v1/factory.go
  • admission/rules/v1/rule.go

Kept: helpers.go (K8s enrichment functions), failureobject.go (GenericRuleFailure), rule_interface.go.

Validation

End-to-end on a kind cluster with the locally built image:

  • kubectl exec → R2000 fires, alert in operator logs with K8s enrichment populated
  • kubectl port-forward → R2001 fires
  • kubectl apply NetworkPolicy → R2002 fires (additional rule from the kind test), event.UserInfo.Username resolves correctly in the message expression
  • Pod CREATE with no matching rule → no alert (negative test)
  • Operator self-generated request → short-circuited
  • 309 unit tests pass

Dependencies

Dep Status
armoapi-go EventTypeK8sAdmission constant ✅ merged, v0.0.714
rulelibrary R2000/R2001 as CEL YAML TODO — separate PR
helm-charts updates (RBAC, webhook config, image bump) TODO — separate PR, marked DO NOT MERGE

Test plan

  • Unit tests: 309 pass on go test -race ./...
  • Kind-cluster e2e: exec, port-forward, NetworkPolicy creates, negative pod-create, self-request short-circuit
  • Reviewer feedback on the design doc and architecture

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced a CEL-based admission rules engine driven by Kubernetes Custom Resources, including rule pre-filtering by resource kind and per-rule evaluation caching.
    • Added an asynchronous admission evaluation pipeline (bounded worker pool) with queue-drop tracking.
    • Added support for dynamic rule syncing via a CRD watcher and a self-request short-circuit to reduce redundant evaluations.
  • Refactor / Change
    • Migrated pod exec and port-forward admission logic to CEL-based rules; removed the legacy built-in implementations.
  • Documentation
    • Added a feature guide for CEL admission rules.

@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

An error occurred during the review process. Please try again later.

📝 Walkthrough

Walkthrough

This PR replaces hardcoded admission rules with a CEL-driven, CRD-backed evaluation system. It introduces a CEL expression engine, dynamic rule loading via CRD watchers, and shifts webhook validation to an async worker pipeline with self-subject detection and kind-based pre-filtering.

Changes

CEL Admission Rules

Layer / File(s) Summary
CEL Engine foundation and event modeling
admission/cel/cel.go, admission/cel/cel_test.go, admission/cel/event.go, admission/cel/event_test.go, admission/cel/integration_test.go
AdmissionCEL encapsulates a CEL environment and thread-safe program cache with read-through and double-checked compilation. AdmissionCelEvent and AdmissionCelUserInfo model admission requests and user identity. NewAdmissionCelEvent translates Kubernetes admission attributes to CEL-evaluable events. Tests validate expression evaluation, caching behavior, compile-error persistence, and end-to-end rule matching for PodExecOptions and generic Pod events.
Kind filter optimization
admission/rules/cel/kindfilter.go, admission/rules/cel/kindfilter_test.go
KindFilter statically extracts Kubernetes admission event kinds from CEL rule expressions via regex matching on event.Kind == "X" patterns and disjunction detection. Enables fast pre-filtering before expensive CEL evaluation. Falls back to wildcard mode when expressions are unresolvable. Tests cover single/multiple rules, conjunctions, nil safety, and non-admission expression filtering.
Rule creator and per-event evaluator
admission/rules/cel/creator.go, admission/rules/cel/creator_test.go, admission/rules/cel/evaluator.go, admission/rules/cel/evaluator_test.go
CelRuleCreator syncs runtime rules, maintains a KindFilter, collects all CEL expression strings for cache management, and atomically builds evaluators by ID/name/tags/all. CelRuleEvaluator processes admission events, evaluates match/message/unique-id expressions, and optionally enriches alerts with pod/container/node/workload metadata from a Kubernetes cache. Tests verify lookups, tag matching, cache eviction on sync, parameter injection, enrichment applicability, and CEL event-type filtering.
Rule binding cache and CRD watcher
admission/rulebinding/cache/cache.go, admission/rulebinding/cache/cache_test.go, admission/rulebinding/cache/helpers.go, admission/rulebinding/cache/helpers_test.go, admission/ruleswatcher/watcher.go, admission/ruleswatcher/watcher_test.go
Cache now accepts injected rule creator parameter. RulesWatcher watches Rules CRDs, lists and filters enabled rules with admission expressions, extracts rule specs from CRD objects via JSON marshaling, syncs through rule creator, and optionally refreshes rule bindings. Handlers ignore non-RuntimeRuleAlertBinding kinds and gate evaluation on rule binding kind. Tests verify handler kind filtering, rule extraction from CRD specs, and admission-expression filtering.
Webhook validator async worker pipeline
admission/webhook/validator.go, admission/webhook/validator_test.go
Validate enqueues evaluation jobs instead of processing synchronously; workers consume a bounded queue, evaluate rules in background, and export alerts. Supports self-subject short-circuit for operator requests via JWT sub claim detection, kind-based pre-filter via KindAcceptor, and atomic drop counting with throttled warnings when queue is full. Graceful shutdown drains queued jobs with bounded timeout. Tests cover self-subject short-circuiting, kind filtering, worker pool processing, queue-full dropping, and job draining on shutdown.
Self-identity detection and main integration
admission/webhook/selfidentity.go, admission/webhook/selfidentity_test.go, main.go, docs/features/cel-admission-rules.md, go.mod
New helpers extract operator's JWT sub claim from projected service account token for self-request detection; validates token structure and JSON payload without verifying signature. main.go wires CEL engine, rule creator, rules watcher, and async validator together; starts workers and integrates with webhook server lifecycle. Documentation describes CEL variables, component architecture, and Rule CRD format. Dependencies: google/cel-go v0.26.1 added as direct requirement, armoapi-go bumped to v0.0.714.
Exporter test update
admission/exporter/http_exporter_test.go
Test simplified to construct rulesv1.GenericRuleFailure directly instead of invoking removed hardcoded R2000 rule, reducing test setup complexity.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A CEL-spring blooms where rules once were carved,
CRDs now sing what hardcode had starved.
Workers dance async, filters pre-screen the tide,
Self-awareness guards the operator's pride.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: implementing CRD-driven CEL admission rules with an async validator, which is the core objective and primary focus of this changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/cel-admission-rules

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown

Summary:

  • License scan: success
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

@matthyx matthyx left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to explain to me how the async pipeline blocks admission, because now I understand that we never block any admission
ok by design we never block anything, this is only for alerting

Comment thread admission/webhook/validator.go
func NewCelRuleCreator(celEngine *admissioncel.AdmissionCEL) *CelRuleCreator {
return &CelRuleCreator{
celEngine: celEngine,
// Empty creator accepts nothing — the kind filter starts non-wildcard

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fine to drop all admissions until we have rules? maybe we should only accept HTTP connections after we have rules?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO - yes, I don't want to hold any line up. What if the user doesn't install rules? The whole design idea is that we treat admission webhooks as best-effort event sources

Comment thread admission/rules/cel/evaluator.go Outdated
}

// Enrich with K8s details when a cache is available.
if access != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrichK8sDetails calls rulesv1.GetControllerDetails(attrs, clientset) which does a live clientset.CoreV1().Pods(namespace).Get(...), it only makes sense for a Pod or a Pod subresource.

Suggested change
if access != nil {
if access != nil && (attrs.GetResource().Resource == "pods" || attrs.GetKind().Kind == "PodExecOptions") {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right!

Comment thread admission/cel/cel.go Outdated
// evaluation of the same expression avoids re-compilation.
type AdmissionCEL struct {
env *cel.Env
programCache map[string]cel.Program

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache is never cleared if some rules are dropped

for {
select {
case <-ctx.Done():
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably drain the jobs channel before exit, other jobs enqueued just before context cancellation will never get processed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right

@Bezbran Bezbran left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything hurting my eyes.
At the beginning I wondered why we are not "blocking" but @matthyx asked and answered.

@matthyx matthyx moved this to Needs Reviewer in KS PRs tracking May 20, 2026
slashben added a commit that referenced this pull request May 24, 2026
enrichK8sDetails resolves a Pod via clientset.CoreV1().Pods(ns).Get(...)
through GetControllerDetails. That call only makes sense for Pod CRUD
and Pod subresources (exec, portforward, attach) — for other kinds
(NetworkPolicy, RoleBinding, Secret, ...) it either returns NotFound or
fetches an unrelated Pod that happens to share the request name.

Add a kind/resource gate so enrichment only runs when applicable. The
admission alert payload remains complete via buildAdmissionAlert; only
the Pod-derived RuntimeAlertK8sDetails fields are now omitted for
non-Pod events.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
slashben added a commit that referenced this pull request May 24, 2026
The AdmissionCEL program cache was append-only — once an expression had
been compiled it stayed cached forever, even if no loaded rule still
referenced it. After enough SyncRules cycles (rules added, removed,
edited) the cache would grow without bound.

Add an AdmissionCEL.RetainOnly(activeExpressions) that drops cached
programs whose expression is not in the active set. CelRuleCreator.SyncRules
collects every expression string referenced by the new rule set (each
RuleExpression, plus per-rule Message and UniqueID templates) and calls
RetainOnly after the swap.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
slashben added a commit that referenced this pull request May 24, 2026
Previously the worker's select picked between ctx.Done() and av.jobs,
returning immediately on cancel. Jobs enqueued just before cancellation
sat in the buffered channel forever.

On context cancellation each worker now switches to a non-blocking drain
pass that processes everything currently in av.jobs before exiting. Drain
uses a fresh context with a bounded timeout (10s) — the worker ctx is
already canceled and would short-circuit downstream API calls. If the
deadline is hit, remaining jobs are abandoned with a warning log
including the count.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
@slashben

Copy link
Copy Markdown
Contributor Author

Pushed three fixes addressing @matthyx's review:

# Issue Commit
3 enrichK8sDetails only valid for Pod / pod subresources a2db526 — added enrichmentApplicable gate (Pod, PodExecOptions, PodPortForwardOptions, PodAttachOptions). Other kinds skip enrichment instead of fetching an unrelated Pod by name.
4 CEL program cache leaks for removed rules d3af050 — added AdmissionCEL.RetainOnly(activeExpressions). CelRuleCreator.SyncRules now collects every expression string (RuleExpression + Message + UniqueID) and prunes the cache after each swap.
5 Drain queued jobs on shutdown c2920a8 — on ctx.Done() the worker switches to a bounded (10s) non-blocking drain pass before exiting. If the drain deadline trips, remaining jobs are logged with a count and abandoned.

On the "block admissions until rules are loaded" question (creator.go:26) — I'd like to keep the current behavior. The operator is an alert-only audit path, the events are best-effort information, and gating admission on rule readiness would couple API server availability to operator startup. Cheaper to drop the cold-start events than to add a sync+blocking dance.

Tests for all three fixes are in this push. go test -race ./... is green (322 tests across 21 packages).

Comment thread admission/rules/cel/creator.go Fixed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
admission/exporter/http_exporter_test.go (1)

31-55: ⚡ Quick win

Test no longer validates SendAdmissionAlert behavior directly.

This currently re-implements internal assembly logic instead of exercising the production SendAdmissionAlert path, so regressions in that method can slip through. Please route this assertion through a real SendAdmissionAlert invocation (e.g., via an httptest receiver) and verify ClusterName/ClusterUID in the emitted alert.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/exporter/http_exporter_test.go` around lines 31 - 55, The test
currently reconstructs RuntimeAlert internals instead of exercising
SendAdmissionAlert; update the test to call exporter.SendAdmissionAlert through
an httptest receiver (e.g., start a test HTTP server that captures the posted
alert) using the same ruleFailure input, then assert that the emitted
RuntimeAlert contains exporter.ClusterName and exporter.ClusterUID (check fields
ClusterName and ClusterUID on the received RuntimeAlert.RuntimeAlertK8sDetails).
Ensure you remove the manual k8sDetails injection and use the real
SendAdmissionAlert call path and validate the received payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@admission/cel/cel.go`:
- Around line 184-189: The current getOrCreateProgram in AdmissionCEL treats a
cached nil cel.Program as a success (returns nil, nil), which hides prior
compile failures; change the caching to record compile errors explicitly (e.g.,
add a programErrCache map[string]error guarded by cacheMu) and when a lookup
finds an entry, check programErrCache[expression] first and return that error if
present; when compilation fails, store the error into
programErrCache[expression] (and a nil/absent program in programCache), and when
compilation succeeds, store the Program in programCache and clear any entry in
programErrCache so subsequent lookups return the real error or the real program
rather than silently returning nil.

In `@admission/rules/cel/evaluator.go`:
- Around line 47-55: The per-binding parameter map stored via
CelRuleEvaluator.SetParameters/GetParameters (e.parameters) is never added to
the CEL evaluation context; update the code that builds the CEL evaluation
context (the method that currently constructs it from admission attributes) to
merge e.parameters into that context so CEL expressions can reference those
parameters, or if merging conflicts occur namespace the parameters (e.g., under
"params") when injecting into the context; ensure SetParameters/GetParameters
remain consistent with the injected key(s).

In `@admission/ruleswatcher/watcher.go`:
- Around line 41-49: NewRulesWatcher currently allows a nil ruleSyncer which
causes syncAllRules to panic when calling w.ruleSyncer.SyncRules; fix by
guarding that dependency: in NewRulesWatcher check if ruleSyncer == nil and
replace it with a small no-op implementation of RuleSyncer (e.g., noopRuleSyncer
implementing SyncRules returning nil), or alternatively add a nil check at the
start of syncAllRules to return early if w.ruleSyncer is nil; update references
to the RulesWatcher.ruleSyncer field and ensure syncAllRules calls are safe.

In `@admission/webhook/validator.go`:
- Around line 223-229: The loop currently may pick a job even after ctx is
canceled; to fix, check cancellation before attempting to receive from av.jobs:
test ctx.Err() (or do a non-blocking select on <-ctx.Done()) immediately before
the select that reads from av.jobs, call av.drainJobs() and return if canceled,
then proceed to receive and call av.evaluate(ctx, job.attrs); reference
av.drainJobs, av.jobs, av.evaluate and ctx.Done to locate the code to change.

In `@docs/features/cel-admission-rules.md`:
- Around line 12-20: The fenced architecture block starting with "Rules CRD 
──watch──▶  RulesWatcher  ──sync──▶  CelRuleCreator" is unlabeled and triggers
markdownlint MD040; update the opening fence to include a language identifier
(e.g., ```text) so the block becomes "```text" followed by the existing ASCII
diagram and a closing "```", ensuring consistent rendering and lint compliance.

---

Nitpick comments:
In `@admission/exporter/http_exporter_test.go`:
- Around line 31-55: The test currently reconstructs RuntimeAlert internals
instead of exercising SendAdmissionAlert; update the test to call
exporter.SendAdmissionAlert through an httptest receiver (e.g., start a test
HTTP server that captures the posted alert) using the same ruleFailure input,
then assert that the emitted RuntimeAlert contains exporter.ClusterName and
exporter.ClusterUID (check fields ClusterName and ClusterUID on the received
RuntimeAlert.RuntimeAlertK8sDetails). Ensure you remove the manual k8sDetails
injection and use the real SendAdmissionAlert call path and validate the
received payload.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8ca0e1c-dda9-4123-a89c-e49b86efdeb7

📥 Commits

Reviewing files that changed from the base of the PR and between 37ad382 and c2920a8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (31)
  • admission/cel/cel.go
  • admission/cel/cel_test.go
  • admission/cel/event.go
  • admission/cel/event_test.go
  • admission/cel/integration_test.go
  • admission/exporter/http_exporter_test.go
  • admission/rulebinding/cache/cache.go
  • admission/rulebinding/cache/cache_test.go
  • admission/rulebinding/cache/helpers.go
  • admission/rulebinding/cache/helpers_test.go
  • admission/rules/cel/creator.go
  • admission/rules/cel/creator_test.go
  • admission/rules/cel/evaluator.go
  • admission/rules/cel/evaluator_test.go
  • admission/rules/cel/kindfilter.go
  • admission/rules/cel/kindfilter_test.go
  • admission/rules/v1/factory.go
  • admission/rules/v1/r2000_exec_to_pod.go
  • admission/rules/v1/r2000_exec_to_pod_test.go
  • admission/rules/v1/r2001_portforward.go
  • admission/rules/v1/r2001_portforward_test.go
  • admission/rules/v1/rule.go
  • admission/ruleswatcher/watcher.go
  • admission/ruleswatcher/watcher_test.go
  • admission/webhook/selfidentity.go
  • admission/webhook/selfidentity_test.go
  • admission/webhook/validator.go
  • admission/webhook/validator_test.go
  • docs/features/cel-admission-rules.md
  • go.mod
  • main.go
💤 Files with no reviewable changes (6)
  • admission/rules/v1/r2000_exec_to_pod_test.go
  • admission/rules/v1/rule.go
  • admission/rules/v1/factory.go
  • admission/rules/v1/r2001_portforward.go
  • admission/rules/v1/r2001_portforward_test.go
  • admission/rules/v1/r2000_exec_to_pod.go

Comment thread admission/cel/cel.go
Comment thread admission/rules/cel/evaluator.go
Comment on lines +41 to +49
func NewRulesWatcher(k8sClient k8sclient.K8sClientInterface, ruleSyncer RuleSyncer, cacheRefresher RBCacheRefresher) *RulesWatcher {
return &RulesWatcher{
k8sClient: k8sClient,
ruleSyncer: ruleSyncer,
cacheRefresher: cacheRefresher,
watchResources: []watcher.WatchResource{
watcher.NewWatchResource(typesv1.RuleGvr, metav1.ListOptions{}),
},
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against nil ruleSyncer to avoid startup/runtime panic.

syncAllRules unconditionally calls w.ruleSyncer.SyncRules(...). If wiring passes nil, this panics on first event/list sync.

Suggested fix
 func NewRulesWatcher(k8sClient k8sclient.K8sClientInterface, ruleSyncer RuleSyncer, cacheRefresher RBCacheRefresher) *RulesWatcher {
+	if ruleSyncer == nil {
+		panic("ruleswatcher: ruleSyncer must not be nil")
+	}
 	return &RulesWatcher{
 		k8sClient:      k8sClient,
 		ruleSyncer:     ruleSyncer,
 		cacheRefresher: cacheRefresher,
 		watchResources: []watcher.WatchResource{
 			watcher.NewWatchResource(typesv1.RuleGvr, metav1.ListOptions{}),
 		},
 	}
 }

Also applies to: 94-94

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/ruleswatcher/watcher.go` around lines 41 - 49, NewRulesWatcher
currently allows a nil ruleSyncer which causes syncAllRules to panic when
calling w.ruleSyncer.SyncRules; fix by guarding that dependency: in
NewRulesWatcher check if ruleSyncer == nil and replace it with a small no-op
implementation of RuleSyncer (e.g., noopRuleSyncer implementing SyncRules
returning nil), or alternatively add a nil check at the start of syncAllRules to
return early if w.ruleSyncer is nil; update references to the
RulesWatcher.ruleSyncer field and ensure syncAllRules calls are safe.

Comment on lines +223 to +229
select {
case <-ctx.Done():
av.drainJobs()
return
case job := <-av.jobs:
av.evaluate(ctx, job.attrs)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prioritize cancellation before consuming more jobs.

Once ctx.Done() is ready, this select may still pick job := <-av.jobs, evaluating with an already-canceled context and dropping work due to canceled downstream calls.

Suggested fix
 func (av *AdmissionValidator) runWorker(ctx context.Context) {
 	defer av.wg.Done()
 	for {
+		// Give cancellation priority so post-cancel jobs are handled by drainCtx.
+		if ctx.Err() != nil {
+			av.drainJobs()
+			return
+		}
 		select {
 		case <-ctx.Done():
 			av.drainJobs()
 			return
 		case job := <-av.jobs:
 			av.evaluate(ctx, job.attrs)
 		}
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
select {
case <-ctx.Done():
av.drainJobs()
return
case job := <-av.jobs:
av.evaluate(ctx, job.attrs)
}
func (av *AdmissionValidator) runWorker(ctx context.Context) {
defer av.wg.Done()
for {
// Give cancellation priority so post-cancel jobs are handled by drainCtx.
if ctx.Err() != nil {
av.drainJobs()
return
}
select {
case <-ctx.Done():
av.drainJobs()
return
case job := <-av.jobs:
av.evaluate(ctx, job.attrs)
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/webhook/validator.go` around lines 223 - 229, The loop currently
may pick a job even after ctx is canceled; to fix, check cancellation before
attempting to receive from av.jobs: test ctx.Err() (or do a non-blocking select
on <-ctx.Done()) immediately before the select that reads from av.jobs, call
av.drainJobs() and return if canceled, then proceed to receive and call
av.evaluate(ctx, job.attrs); reference av.drainJobs, av.jobs, av.evaluate and
ctx.Done to locate the code to change.

Comment on lines +12 to +20
```
Rules CRD ──watch──▶ RulesWatcher ──sync──▶ CelRuleCreator
Admission Webhook ──▶ RBCache.ListRulesForObject ──▶ CelRuleEvaluator
AdmissionCEL.EvaluateRuleWithContext
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced architecture block.

Line 12 uses an unlabeled fenced block, which triggers markdownlint (MD040) and can reduce renderer/tooling consistency.

📝 Suggested diff
-```
+```text
 Rules CRD  ──watch──▶  RulesWatcher  ──sync──▶  CelRuleCreator
                                                       │
                                                       ▼
 Admission Webhook ──▶ RBCache.ListRulesForObject ──▶ CelRuleEvaluator
                                                       │
                                                       ▼
                                                AdmissionCEL.EvaluateRuleWithContext
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 12-12: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/features/cel-admission-rules.md around lines 12 - 20, The fenced
architecture block starting with "Rules CRD ──watch──▶ RulesWatcher ──sync──▶
CelRuleCreator" is unlabeled and triggers markdownlint MD040; update the opening
fence to include a language identifier (e.g., text) so the block becomes "text" followed by the existing ASCII diagram and a closing "```", ensuring
consistent rendering and lint compliance.


</details>

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

@github-actions

Copy link
Copy Markdown

Summary:

  • License scan: success
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

@slashben slashben requested a review from matthyx May 24, 2026 07:10
Comment thread admission/cel/cel.go Outdated
}
// nil means the program was previously cached as a compile failure.
if out == nil {
return false, nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to be careful to catch the compilation failure, otherwise we will think this rule never evaluates to false while it's just not compiling

@matthyx matthyx moved this from Needs Reviewer to Waiting on Author in KS PRs tracking May 26, 2026
slashben added 10 commits June 16, 2026 15:57
Introduce AdmissionCelEvent and AdmissionCelUserInfo structs that wrap
admission.Attributes into plain Go structs suitable for CEL evaluation.
NewAdmissionCelEvent extracts kind, name, namespace, operation, user
info, and unstructured object/oldObject/options maps from the attributes.

Docs-exempt: new internal package, no user-facing behavior change yet
Signed-off-by: Ben <ben@armosec.io>
CEL evaluation engine for admission webhook events. Uses cel-go native
types to register AdmissionCelEvent/AdmissionCelUserInfo, with
object/oldObject/options as top-level map variables (cel-go NativeTypes
does not support map[string]interface{} struct fields). Includes
RWMutex-protected program cache, expression compilation with nil-caching
for compile failures, and EvaluateRuleWithContext that implements AND
semantics with short-circuit and event-type filtering.

Docs-exempt: new internal package, no user-facing behavior change yet
Signed-off-by: Ben <ben@armosec.io>
Implements watcher.Adaptor that watches the Rules CRD, extracts rules
with k8s-admission expressions, and syncs them to the CelRuleCreator via
the RuleSyncer interface. Optionally calls RBCacheRefresher after sync.

Adds docs/features/cel-admission-rules.md documenting the full CEL
admission rules architecture.
Signed-off-by: Ben <ben@armosec.io>
…only

- NewCache now accepts a rules.RuleCreator parameter instead of hardcoding rulesv1.NewRuleCreator()
- Added RefreshRules() method to RBCache for rebuilding rule mappings when rules change
- Removed the rulesv1 import from cache package (no longer needed)
- Validator now logs all matching rules and sends alerts without blocking requests
- Updated cache_test.go to pass RuleCreatorMock to NewCache
Signed-off-by: Ben <ben@armosec.io>
Wire AdmissionCEL, CelRuleCreator, and RulesWatcher into main.go so the
admission controller now loads rules from CRDs at runtime. Delete the
hardcoded R2000/R2001 rule files and factory; fix CelRuleCreator to
satisfy the rules.RuleCreator interface with correct return types.
Signed-off-by: Ben <ben@armosec.io>
The dynamic watcher dispatches every event from every watched GVR to every
registered adaptor. After RulesWatcher was added alongside RBCache, the
RBCache started receiving Rules CRD events and tried to parse them as
RuntimeAlertRuleBinding, producing 'cannot convert int64 to string' errors
(the Rules CRD has integer severity; the binding has string severity).

Add an isRuleBinding kind check at the entry of each handler so RBCache
ignores cross-talk from other watched resources.

Signed-off-by: Ben <ben@armosec.io>
Before any rule evaluation, the validator now drops requests whose
UserInfo.Name matches the operator's own ServiceAccount subject. This is
a hard guarantee against positive feedback loops: the operator may write
to the API (status updates, leader election, alert exports), and we never
want those writes to re-enter the rule pipeline.

The subject is read at construction time by parsing the unverified 'sub'
claim from the projected ServiceAccount token at the standard mount path.
If reading fails (e.g. running outside a pod), the validator logs a
warning and continues without the short-circuit — degraded but functional.

Signed-off-by: Ben <ben@armosec.io>
slashben added 8 commits June 16, 2026 16:07
Add a KindFilter to CelRuleCreator built from the loaded rule expressions
at SyncRules time. The validator consults the creator (via a small
KindAcceptor interface) before any evaluation work and drops events whose
Kind no rule could possibly match.

The filter is conservative: expressions with || or no event.Kind ==
constraint collapse the filter to wildcard mode (accept everything),
preserving correctness at the cost of skipping the optimization. Only
expressions tagged EventType k8s-admission are inspected; other event
types are ignored entirely by this operator.

Signed-off-by: Ben <ben@armosec.io>
The validator's Validate() now snapshots the request and enqueues onto a
buffered channel, returning nil immediately so the API server never waits
on CEL evaluation. A pool of worker goroutines (10 by default, 1000-slot
queue) drains the channel and runs the full match + alert pipeline
out-of-band.

When the queue is full the request is dropped and an atomic counter is
incremented; drops are logged at exponentially throttled intervals to
avoid flooding the operator log under burst load. The counter is exposed
via DropCount() so it can be surfaced as a Prometheus metric in a follow-up.

Workers are bound to the webhook's serverContext and exit on cancellation.
Tests cover: enqueue+process, drop-on-full, context-cancel shutdown, and
short-circuit drops (self-pod, kind filter, nil object) staying out of
the queue entirely.

Signed-off-by: Ben <ben@armosec.io>
enrichK8sDetails resolves a Pod via clientset.CoreV1().Pods(ns).Get(...)
through GetControllerDetails. That call only makes sense for Pod CRUD
and Pod subresources (exec, portforward, attach) — for other kinds
(NetworkPolicy, RoleBinding, Secret, ...) it either returns NotFound or
fetches an unrelated Pod that happens to share the request name.

Add a kind/resource gate so enrichment only runs when applicable. The
admission alert payload remains complete via buildAdmissionAlert; only
the Pod-derived RuntimeAlertK8sDetails fields are now omitted for
non-Pod events.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
The AdmissionCEL program cache was append-only — once an expression had
been compiled it stayed cached forever, even if no loaded rule still
referenced it. After enough SyncRules cycles (rules added, removed,
edited) the cache would grow without bound.

Add an AdmissionCEL.RetainOnly(activeExpressions) that drops cached
programs whose expression is not in the active set. CelRuleCreator.SyncRules
collects every expression string referenced by the new rule set (each
RuleExpression, plus per-rule Message and UniqueID templates) and calls
RetainOnly after the swap.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
Previously the worker's select picked between ctx.Done() and av.jobs,
returning immediately on cancel. Jobs enqueued just before cancellation
sat in the buffered channel forever.

On context cancellation each worker now switches to a non-blocking drain
pass that processes everything currently in av.jobs before exiting. Drain
uses a fresh context with a bounded timeout (10s) — the worker ctx is
already canceled and would short-circuit downstream API calls. If the
deadline is hit, remaining jobs are abandoned with a warning log
including the count.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
CodeQL flagged len(rs)*3 in collectExpressions as a size computation
that may overflow. Drop the capacity hint — Go's slice growth handles
the small per-rule expression count fine and the heuristic was already
a guess.

CodeQL alert: github.com/kubescape/operator/security/code-scanning/34

Signed-off-by: Ben <ben@armosec.io>
Previously a compilation failure was cached as a nil cel.Program and the
caller treated nil as 'no match', silently swallowing the error after
the first attempt. A broken rule expression would then become invisible
to operators: the rule never fires, but no error is logged either.

Replace the cache value with cachedProgram{prog, err}. On every lookup,
return the cached error if one is present. Compile is still attempted
exactly once per expression (the error is sticky until the rule sync
evicts the entry), but the error is loud — propagated to the evaluator
which logs and returns nil-match.

Reviewed-by: matthyx (PR #374 line review on cel.go:100)

Signed-off-by: Ben <ben@armosec.io>
CelRuleEvaluator.SetParameters stored RuntimeAlertRuleBindingRule
parameters but they never reached CEL evaluation. The API was inert:
parameterized rules could never see the per-binding overrides supplied
by their RuntimeAlertRuleBinding.

Declare 'params' as a top-level CEL variable in the AdmissionCEL
environment and inject the active binding's parameters into the eval
context under that key. CreateEvalContext seeds params to an empty map
so expressions referencing it remain compilable even when no binding
overrides apply.

Rule expressions can now reference binding overrides as
params['threshold'], params['allowedNamespaces'], etc.

Reviewed-by: matthyx (PR #374 line review on evaluator.go:55)

Signed-off-by: Ben <ben@armosec.io>
@slashben slashben force-pushed the feature/cel-admission-rules branch from c2920a8 to 98b0398 Compare June 16, 2026 20:20
@slashben

Copy link
Copy Markdown
Contributor Author

Pushed the next round of fixes (branch force-pushed after rebase onto current main):

Issue Commit Notes
Cache compile failures as nil silently swallows broken rules (matthyx cel.go:100) 4d2cce5 Cache value is now cachedProgram{prog, err}; every lookup re-returns the cached error. The evaluator already logs and returns nil-match on errors, so a broken expression now produces a recurring loud warning instead of a silent no-op. New test verifies the error is surfaced on every call.
Per-binding parameters never reach CEL evaluation (matthyx + coderabbit evaluator.go:55) 98b0398 Declared params as a top-level CEL variable; ProcessEvent injects e.parameters under that key. CEL expressions can now reference params["allowedNamespaces"], params["threshold"], etc. New parameterized rule test confirms the override path.
CodeQL overflow alert on len(rs)*3 (creator.go:57) 4fd9434 Dropped the capacity hint; Go's slice growth handles the small per-rule expression count.
Branch was conflicting with main (rebase) Rebased onto current main; conflict was on the legacy R2000/R2001 v1 files that we delete intentionally.

go test -race ./... is green: 335 tests, 21 packages.

Not addressed in this push:

  • CodeRabbit's "give cancellation priority before consuming more jobs" — the ctx we'd pass to evaluate is the worker context, not the drain context, so a job picked just after cancellation evaluates against an already-canceled context. The drain path uses a fresh bounded context. I think the current behavior is correct, but happy to add the explicit ctx.Err() check if you'd like.
  • CodeRabbit's nil-ruleSyncer guard in NewRulesWatcher — main.go always constructs with a non-nil syncer; adding a panic guard would mask a wiring bug rather than catch it.
  • Markdown lint on the architecture fence in docs/features/cel-admission-rules.md — happy to fix in a followup if it blocks merge.

@slashben slashben requested a review from matthyx June 16, 2026 20:21
@slashben slashben added the release Create release label Jun 16, 2026
@github-actions

Copy link
Copy Markdown

Summary:

  • License scan: success
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

@matthyx matthyx left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now

@matthyx matthyx merged commit 9a5c8b0 into main Jun 17, 2026
11 checks passed
@matthyx matthyx deleted the feature/cel-admission-rules branch June 17, 2026 05:32
@matthyx matthyx moved this from Waiting on Author to To Archive in KS PRs tracking Jun 17, 2026
slashben added a commit to kubescape/helm-charts that referenced this pull request Jun 17, 2026
Companion to:
  - operator: kubescape/operator#374 (merged, v0.2.149)
  - rulelibrary: kubescape/rulelibrary#35 (merged)

Changes:
- operator.image.tag bumped to v0.2.149 — the first release with
  CRD-driven CEL admission rules, async validator, self-pod
  short-circuit, and kind pre-filter.
- Operator ClusterRole now lists/watches kubescape.io/rules so the
  RulesWatcher can discover CRD-defined admission rules.
- ValidatingWebhookConfiguration adds a CREATE rule for
  networking.k8s.io/networkpolicies so CEL rules can fire on
  NetworkPolicy creation.
- default-rules.yaml ships R2000 (Exec to pod) and R2001 (Port forward
  to pod) as CEL k8s-admission rules out of the box, mirroring the
  rulelibrary definitions.
- Snapshot tests regenerated to reflect the new rules.

Signed-off-by: Ben <ben@armosec.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release Create release

Projects

Status: To Archive

Development

Successfully merging this pull request may close these issues.

4 participants