AE filtered/whitelist arm emits 0 for the flagged pod — active-set not propagated/retained into the live whitelist (unfiltered works; proxy shows 161 http) #59

@entlein


AE filtered/whitelist arm captures 0 for the flagged pod (live filtered-write), while unfiltered + the computed proxy show it should keep it

Symptom (6a28bdc0, 2026-06-10, AE authenticated + scanning)

This was the first fully authenticated volproof run. The EVERYTHING (unfiltered) arm captured real data, but the AE (filtered/whitelist) arm wrote 0 rows:

log4shell_all (EVERYTHING/unfiltered) http 1011 rows / 17 pods   ✅
log4shell_ae  (filtered/whitelist)   http 0 / 0 pods             ❌

The AE filtered scan logs: streaming.TableScanner: query completed mode=whitelist pods=2 rows=0 table=http_events. The filtered query completes (it does not hit DeadlineExceeded) but returns 0 rows for its 2-pod whitelist.

Proof the data exists + should be kept

Computed proxy = EVERYTHING capture ∩ dx active-set (http_events.pod IN concat(aa.namespace,'/',aa.pod)):

  • The dx-flagged backend produced 161 http rows / 60,443 uncompressed / 5,114 compressed in-window and IS in adaptive_attribution (R0001 fired: exfil chain cut/tr/base32/getent).
    So the adaptive policy SHOULD keep 161 of the 1011 http rows (an 84.1% row / 94.6% compressed reduction vs ALL). The live filtered-write emitted none of them.
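
A minimal Go sketch of the computed proxy, for illustration only. The types HTTPRow and ActivePod and the functions ComputedProxy and RowReductionPct are hypothetical names, not code from dx or AE; only the join condition and the 161 / 1011 row counts come from this run.

```go
package main

import "fmt"

// HTTPRow is a hypothetical, simplified stand-in for one http_events row.
type HTTPRow struct {
	Pod string // "namespace/pod", as stored in http_events.pod
}

// ActivePod is a hypothetical stand-in for one adaptive_attribution entry.
type ActivePod struct {
	Namespace, Pod string
}

// ComputedProxy keeps the rows whose pod is in the active-set, mirroring
// http_events.pod IN concat(aa.namespace,'/',aa.pod).
func ComputedProxy(rows []HTTPRow, active []ActivePod) []HTTPRow {
	keys := make(map[string]struct{}, len(active))
	for _, a := range active {
		keys[a.Namespace+"/"+a.Pod] = struct{}{}
	}
	var kept []HTTPRow
	for _, r := range rows {
		if _, ok := keys[r.Pod]; ok {
			kept = append(kept, r)
		}
	}
	return kept
}

// RowReductionPct is the percentage of rows dropped relative to the
// unfiltered capture.
func RowReductionPct(kept, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * (1 - float64(kept)/float64(total))
}

func main() {
	// Row counts reported above: 161 kept out of 1011 unfiltered http rows.
	fmt.Printf("row reduction: %.1f%%\n", RowReductionPct(161, 1011))
}
```

The row-reduction helper reproduces the 84.1% figure from the counts alone; the compressed-size figure needs the per-row sizes, which are not modelled here.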

Root cause (whitelist membership/retention, NOT the regex)

  • The pod-name format matches: http_events.pod = ns/pod (from px.upid_to_pod_name), activeset.Key.Render() = ns/pod. The proxy join confirms they match. So scanner.go regex_match('^(ns/pod|…)$', df.pod) is not the problem.
  • The problem is that the live whitelist did not contain the http-producing flagged backend during its capture window. The AE whitelist was pods=2 (both benign), and the flagged backend (adaptive_attribution last_seen mid-window) was not retained in it. By the post-run check the whitelist had aged down to 2 non-http pods.
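
The two bullets above can be illustrated with a small Go sketch. It uses Go's regexp package rather than the regex_match call in scanner.go, and the pod names and the helpers WhitelistPattern and CountMatches are hypothetical. It shows that a correctly formatted pattern still yields 0 rows when the flagged pod is simply absent from the whitelist.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// WhitelistPattern builds an anchored alternation of the form ^(ns/a|ns/b)$.
// Pod names are escaped so that dots or other metacharacters match literally.
func WhitelistPattern(pods []string) string {
	quoted := make([]string, len(pods))
	for i, p := range pods {
		quoted[i] = regexp.QuoteMeta(p)
	}
	return "^(" + strings.Join(quoted, "|") + ")$"
}

// CountMatches returns how many row pod names match the whitelist pattern.
func CountMatches(pattern string, rowPods []string) (int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, p := range rowPods {
		if re.MatchString(p) {
			n++
		}
	}
	return n, nil
}

func main() {
	// Hypothetical names: two benign pods and one flagged backend.
	rows := []string{"demo/backend", "demo/backend", "demo/backend"}

	stale := WhitelistPattern([]string{"demo/benign-a", "demo/benign-b"})
	n, _ := CountMatches(stale, rows)
	fmt.Println("stale whitelist rows:", n) // flagged pod missing: nothing matches

	fresh := WhitelistPattern([]string{"demo/benign-a", "demo/benign-b", "demo/backend"})
	n, _ = CountMatches(fresh, rows)
	fmt.Println("fresh whitelist rows:", n) // flagged pod present: all rows match
}
```

With the stale two-pod whitelist the count is 0, matching the pods=2 rows=0 log line; adding the flagged pod is enough to make the same rows match.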

Why this regressed (didn't happen on f518)

On f518 the backend was warm/long-running → stably in the active-set → AE arm captured (vc1 http=142). The per-arm backend bounce added for the stateful-exploit fix (k8sstormcenter/bob#140) makes the flagged pod transient: it's bounced at arm start, flagged (R0001) only mid-window, and isn't propagated into / retained in the AE filtering whitelist long enough to capture. Relates to entlein/dx#62 (active-set not refreshed/retained).

What to fix

  1. dx→AE push: ensure an R0001-flagged pod is pushed into the AE filtering whitelist promptly on detection and retained for at least the capture window (TTL ≥ the AE stream window; don't age out a still-active flagged pod). (No StartExport/aeclient log lines were observed during the run — verify the control-push path actually fires for R0001.)
  2. AE FilterUpdater: confirm the live whitelist reflects adaptive_attribution (it lagged, holding 2 pods and missing the flagged backend).
  3. Verify with a fresh-but-warmed pod: flagged pod present in mode=whitelist AND query completed … rows>0.
  4. Secondary: DeadlineExceeded on heavy tables (conn_stats/amqp) at ADAPTIVE_STREAM_WINDOW_SEC=20. Raise the window for filtered mode, which scans only a few pods.

Interim

The valid ALL-vs-AE reduction number is obtainable via the computed proxy (EVERYTHING ∩ active-set) until the live filtered-write retains flagged pods. http: 84.1% rows / 94.6% compressed; dns ~100% (benign-only); combined 98.4% compressed (62×).
