Summary
The AE ClickHouse sink intermittently and silently drops writes: ClickHouse returns HTTP 200 with X-ClickHouse-Summary.written_rows=0 even though AE sent N>0 rows. This zeroes out protocol volume (esp. http_events) in the AE-filtered arm, and it is the real cause of the volproof "AE attrition" we chased to dx. It is NOT a dx active-set bug; see entlein/dx#62.
Evidence (live, 2026-06-09, clean3 = vizier-adaptive_export_image:0.14.19-aeprod-clean3, PG 6a2850d0)
streaming.BatchWriter: flush failed error="sink: pixie write to dns_events reported 19 rows_sent but CH summary written_rows=0 (silent drop): {...\"written_rows\":\"0\",\"written_bytes\":\"0\"...}" reason=timer
sink: pixie write completed body_bytes=17304 ch_summary={...\"written_rows\":\"0\",\"written_bytes\":\"0\"...} table=...
streaming.TableScanner: query failed; backing off error="pixieapi: ExecuteScript: rpc error: code = DeadlineExceeded desc = context deadline exceeded" table=tls_events/mysql_events (WINDOW/REFRESH=20)
- Live CH read, last 10 min: http_events rows=0 / 0 pods, but dns_events rows=54 / 3 pods, i.e. the active-set HAS pods; http writes specifically vanish. (Cross-rep f518: AE http pods 2→1→0 while dns pods 3→3→7: protocol-divergent, so NOT active-set aging.)
- CH read errors on AE/vector-written tables: "Code 432 UNKNOWN_CODEC: Unknown codec family code: 0" on adaptive_attribution, trigger_watermark, kubescape_logs. Possibly related part corruption.
Recurring
The guard's own comment cites a prior occurrence: 2026-05-23T20:58Z redis_events: rows_sent=1658, written_rows=0. So this is at least the second sighting.
Code
src/vizier/services/adaptive_export/internal/sink/clickhouse.go:241 — INSERT INTO db.tbl FORMAT JSONEachRow, Content-Type application/x-ndjson.
:85 setFailLoudSettings sets input_format_skip_unknown_fields=0 etc → a column/format mismatch would HTTP-error, NOT silently drop. AE sets no async_insert.
:287 the silent-drop guard returns an error → streaming.BatchWriter flush fails → backoff + re-loop (CPU burn; see also the 3.2-core note in the file header comment).
Leading hypothesis (to verify CH-side)
async_insert is enabled server-side (CHI user/profile config) → the INSERT response returns before the async buffer flushes → X-ClickHouse-Summary.written_rows=0 at response time even though rows later land. That would make the silent-drop guard a false positive that triggers re-loops.
Verify: SELECT name,value,changed FROM system.settings WHERE name IN ('async_insert','wait_for_async_insert') + CHI users.xml.
- If async: either set wait_for_async_insert=1 on AE's INSERTs, or read the post-flush summary, or check system.asynchronous_insert_log instead of the response header.
- If NOT async: capture one rejected body + run the INSERT manually to see why CH parses 0 rows.
Impact
Any volproof data-volume measurement is invalid while this is live: AE http→0 reads as "adaptive filtering reduced volume to nothing" when it is actually dropped writes. Secondary: DeadlineExceeded at WINDOW=20 adds throughput loss; consider per-refresh single-heavy-table scheduling or larger windows.
Repro
clean3 AE, streaming overflow MAX_WHITELIST=500, WINDOW/REFRESH=20, broad http+dns load → tail AE logs for "silent drop" and compare rows_sent vs CH written_rows.