fix(sf-323): activate cost-based eviction — measure per-record cost + mark records clean once durable by ivkan · Pull Request #87 · TopGunBuild/topgun

ivkan · 2026-06-24T20:17:07Z

Summary

Activates cost-based eviction, which was dormant in production. Two gates were keeping the eviction orchestrator from ever firing:

Cost hardwired to 0 — every record's metadata.cost was 0 at all three write sites, so owned_entry_cost() summed to 0, current_ram stayed below the high-water mark, and eviction never engaged.
Records permanently dirty — evict_lru filters !is_dirty(), but on_store() had zero production callers, so every put()-written record stayed dirty forever and was never eligible for eviction (a second dormancy gate found mid-implementation).

What changed

Cost is genuinely measured → stored → read end-to-end: estimated_cost(&RecordValue) returns the rmp_serde::to_vec_named serialized byte length (floor ≥ 1) + key.len(), stamped at put(), get() lazy-load, and hydrate_loaded().
Records marked clean once durable (R6) via a new atomic StorageEngine::mark_stored(key, now) (DashMap get_mut per-key write lock) called after a successful WAL-fsync'd data_store.add(). Null-store-backed records correctly stay dirty.
Cost stays local & never-serialized — RecordMetadata keeps no serde derives, enforced at compile time via assert_not_impl_any!(RecordMetadata: Serialize, DeserializeOwned).
Serde-failure cost fallback → tracing::warn! + bounded COST_ESTIMATE_FALLBACK = 4096 (a huge sentinel would make the orchestrator thrash evicting innocent records under an unreachable ceiling).

Correctness contract

mark_stored's last_update_time <= now is a best-effort timestamp guard, not a per-write identity guard — under a same-millisecond concurrent same-key put() it can transiently mark a not-yet-durable record clean. This window is bounded and self-healing (the concurrent persist completes independently of eviction; put() acks only after its own persist), so no acked-durable write is ever lost. The trait doc and put() comment state this honestly. The airtight per-write-token fix is tracked in TODO-537.

Verification

cargo fmt --check ✓ · cargo clippy --all-targets --all-features -- -D warnings ✓ · cargo test --release -p topgun-server --lib → 1487/0/0
Behavioral e2e (AC1–AC4, AC6): eviction fires under real cost pressure (drives the real private tick() defect branch, asserts dirty_count()==0 + >0 evicted, non-vacuous); query correct under active eviction; restart correct under active eviction (TODO-530 holds); cost non-leakage; write-through clean-marking.
Two independent review nets: fresh-context opus impl-reviewer (Review v4 APPROVED, wiring grep-confirmed) + glm-5.2 cross-vendor net (clean — confirmed the deferred items are correctly scoped).

Review trail

v1 CHANGES_REQUESTED (R6 TOCTOU) → Fix Response v1 (atomic mark_stored) → v2 APPROVED → v3 overturned by cross-vendor (timestamp-vs-identity guard) → Fix Response v2 (honest contract + 4 minors) → v4 APPROVED → post-v4 cross-vendor net clean.

Follow-ups

TODO-537 — per-write identity-token clean-marking (close the ms-tie false-clean window).
TODO-538 — cost-model heap accuracy (serialized-size proxy under-counts true heap footprint).
TODO-484 — G4b soak re-run under crash + active eviction to confirm TODO-530 closed end-to-end now eviction actually fires.

Note on file count

7 source files touched (vs the 5-file Rust spec cap) + test-only Cargo.toml/Cargo.lock. Breach-by-design: the R6 correctness fix is a single unit (one trait method + its sole impl), consistent with SPEC-310a/318/322c precedent.

…nvariant (G1) - Add pub fn estimated_cost(value: &RecordValue) -> u64 in record.rs using rmp_serde::to_vec_named with >= 1 floor (never panic on write path) - Document that cost field is local-only/never-wire in RecordMetadata doc - RecordMetadata intentionally retains NO Serialize/Deserialize derive - Call sites must add key.len() as u64 to capture key heap contribution

…coding test (G2) - put(): measure estimated_cost(&value) + key.len() before value moves into record - get() lazy-load: measure cost of loaded value + key.len() on cache miss - hydrate_loaded(): measure cost of hydrated value + key.len() - size_and_owned_entry_cost_reflect_records: assert > 0 (was assert_eq == 0) The old assertion encoded the dormant-eviction bug; now guards the fix

…n (G3) - AC1 (eviction_orchestrator.rs): real tick() fires under real cost pressure; drives private tick() from within the same cfg(test) mod; asserts > 0 records evicted and owned_entry_cost drops; non-vacuous (flush-then-eligible ordering) - AC2 (eviction_cost_test.rs): query correct under active eviction; full datastore scan via scan_values returns complete record set with evicted keys non-resident; proves SPEC-322c streaming path is load-bearing - AC3 (eviction_cost_test.rs): restart correct under active eviction; fresh engine over same redb, all written keys visible via datastore scan (TODO-530 closed) - AC4 (eviction_cost_test.rs): cost non-leakage; RecordValue round-trip via rmp_serde carries no cost field; NullDataStore probe confirms metadata never reaches the persistence layer - storage/mod.rs: register eviction_cost_test under #[cfg(test)]

- record.rs: add #[must_use] to estimated_cost() (must_use_candidate) - eviction_orchestrator.rs: backtick-quote identifiers in doc comments - eviction_cost_test.rs: backtick-quote identifiers in doc comments + fmt reformat

…able (R6) - put() with Client/CrdtMerge provenance: after data_store.add() succeeds, re-fetch the record from the engine and call metadata.on_store(now), marking it clean and eligible for eviction. Gated on !is_null() so NullDataStore (no real persistence) leaves records dirty as required. - get() lazy-load: records loaded from the datastore enter the engine clean (last_stored_time = now via on_store) — already persisted, immediately re-evictable in the evict→reload steady state. - hydrate_loaded(): same as get() — datastore-materialized records enter clean. Without this fix, evict_lru's !is_dirty() filter (line 416) excluded every put()-written record permanently, making eviction dead in production even after the cost fix (R2). on_store() had zero production callers before this commit.

AC1 (eviction_orchestrator.rs): - Remove manual last_stored_time re-insertion hack that faked dirty→clean transition. With R6, put() over a real redb datastore calls on_store(now) after add() succeeds, marking records genuinely clean. dirty_count() == 0 assertion added to prove real eligibility without manipulation. AC2/AC3 (eviction_cost_test.rs): - Remove write_and_flush_records helper's snapshot re-insertion hack. Replaced by write_records() which simply calls put() — records are clean because R6 marks them via the real write-through path. Added dirty_count() assertion in AC2 to confirm real eligibility. AC6 (eviction_cost_test.rs, new): - Part 1: put() with Client provenance over real redb → dirty_count() == 0 AND evict_lru evicts the record (genuinely eligible, no hack). - Part 2: put() over NullDataStore → dirty_count() >= 1, evict_lru returns 0 (no real persistence → stays dirty, correctly ineligible for eviction). - Part 3: get() lazy-load → record enters engine clean (dirty_count() == 0), evict_lru can evict it immediately (evict→reload steady state works).

…ored - Fixed: put() Step 6 re-fetch+re-put (engine.get -> on_store -> engine.put) was a read-modify-write with no per-key lock across Step 4 -> Step 6. Under concurrent same-key/same-partition writes it could re-put a clean-stamped value over a newer durable write (lost write + false-clean). - Add StorageEngine::mark_stored(key, now): atomic in-place clean-mark under the DashMap per-key write lock; applies only when last_update_time <= now so a newer concurrent (still-dirty) write is never falsely marked evictable and an already-clean newer record is never regressed. - Replace put() Step 6 RMW with engine.mark_stored; no whole-record re-put. - Add 3 unit tests: marks resident clean, skips newer write, absent-key false.

- Correct the StorageEngine::mark_stored trait doc and the put() write-through comment: the last_update_time<=now guard is a best-effort timestamp mark, NOT a per-write identity check. State plainly that two concurrent same-key puts in the same millisecond can transiently mark the other writer's not-yet-durable record clean, and why that is bounded + self-healing (concurrent persist completes independently of eviction; put() acks only after its own persist, so no acked-durable write is lost). Replaces the prior over-stated guarantee. - Trace mark_stored's false return (record evicted or superseded after persist) to surface write-then-immediately-evict churn.

- Replace the silent unwrap_or(1) (which blinds the eviction orchestrator to a record's true size) with a tracing::warn! plus COST_ESTIMATE_FALLBACK. The sentinel is deliberately moderate, not u64::MAX-large: a value that fails to encode also cannot be persisted, so it stays dirty/un-evictable, and a huge sentinel would make the orchestrator thrash evicting innocent records under a ceiling it can never reach. - Document that estimated_cost is a deliberate second encode of the value (persistence encodes it again); folding the two is a tracked follow-up.

- Add static_assertions dev-dependency and assert_not_impl_any!(RecordMetadata: Serialize, DeserializeOwned) in the AC4 module. Turns the cost-non-leakage invariant from a comment into an enforced contract: adding a serde derive to RecordMetadata now fails to compile, not just at runtime.

cloudflare-workers-and-pages · 2026-06-24T20:18:19Z

Deploying topgun with Cloudflare Pages

Latest commit:	`e58f016`
Status:	✅ Deploy successful!
Preview URL:	https://d650e983.topgun-f45.pages.dev
Branch Preview URL:	https://fix-sf-323-eviction-cost.topgun-f45.pages.dev

View logs

ivkan added 11 commits June 24, 2026 20:11

fix(sf-323): clippy -D warnings + cargo fmt clean (G4 gate)

720185b

- record.rs: add #[must_use] to estimated_cost() (must_use_candidate) - eviction_orchestrator.rs: backtick-quote identifiers in doc comments - eviction_cost_test.rs: backtick-quote identifiers in doc comments + fmt reformat

style(sf-323): cargo fmt eviction_cost_test.rs

5862fe9

ivkan merged commit 48b3266 into main Jun 24, 2026
16 checks passed

ivkan deleted the fix/sf-323-eviction-cost branch June 24, 2026 20:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(sf-323): activate cost-based eviction — measure per-record cost + mark records clean once durable#87

fix(sf-323): activate cost-based eviction — measure per-record cost + mark records clean once durable#87
ivkan merged 11 commits into
mainfrom
fix/sf-323-eviction-cost

ivkan commented Jun 24, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ivkan commented Jun 24, 2026

Summary

What changed

Correctness contract

Verification

Review trail

Follow-ups

Note on file count

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 24, 2026

Deploying topgun with Cloudflare Pages

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant