fix(sf-324): WriteBehind read surface buffer-aware + flush-vs-coalesce seq-identity (eviction read-consistency)#88
Merged
…ver-log passthrough

recv_decoded skipped to the first decodable message of any type, so a live ServerEvent delta (a QUERY_SUB registers a live subscription) interleaving with a request-response reply made read_all bail with 'expected QUERY_RESP, got ServerEvent' under churn. Skip unsolicited ServerEvent/ServerBatchEvent/QueryUpdate/JournalEvent pushes while awaiting the actual reply.

Add env-gated SOAK_SERVER_LOG_PASSTHROUGH to mirror server log lines to the harness stderr for operator diagnostics (e.g. confirming eviction fires). Surfaced while running the TODO-484 G4b soak re-run; needed to observe real convergence under active eviction (see TODO-539).
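The skip rule described above can be sketched as follows. This is a minimal illustration, not the harness code: the `Message` enum and the `recv_reply` helper are hypothetical stand-ins, with variant names borrowed from the commit message, and the real `recv_decoded` reads from a socket rather than an iterator.

```rust
/// Hypothetical message kinds; variant names follow the commit message,
/// the payload shapes are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    QueryResp(Vec<String>),
    ServerEvent,
    ServerBatchEvent,
    QueryUpdate,
    JournalEvent,
}

impl Message {
    /// True for pushes the server may send at any time because a live
    /// subscription exists, independent of the request being awaited.
    fn is_unsolicited_push(&self) -> bool {
        matches!(
            self,
            Message::ServerEvent
                | Message::ServerBatchEvent
                | Message::QueryUpdate
                | Message::JournalEvent
        )
    }
}

/// Consume decoded messages until the first one that is not a live push.
/// Returns `None` if the stream ends before a reply arrives.
pub fn recv_reply<I: Iterator<Item = Message>>(incoming: &mut I) -> Option<Message> {
    incoming.find(|m| !m.is_unsolicited_push())
}
```

With this shape the caller still checks that the returned message is the reply type it expects; the helper only removes the interleaved pushes from consideration.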
Under active eviction (SPEC-323), a buffered-but-not-yet-flushed record can have its resident engine copy evicted while the value lives only in the write-behind staging buffer. load()/load_all() already overlay staging for read-your-writes, but scan_values/scan_values_batched/enumerate_leaves/list_maps delegated straight to the durable inner store, so the full-scan query path and the Merkle leaf source returned stale-or-missing values for acked writes (read-your-writes broken under memory pressure).

Overlay the map's pending staging set onto each durable read:
- buffered Some replaces the durable row/leaf (newer buffered value; enumerate recomputes the leaf hash so a buffered-then-flushed record leaves the Merkle root unchanged)
- buffered None (pending delete) suppresses the durable row/leaf so an evicted-then-deleted key cannot resurrect
- staging-only keys (buffered, never flushed) are emitted exactly once: in the first scan_values batch (resumed batches overlay only) and after the durable leaf enumeration, propagating merkle_leaf_hash's None for OrMap entries that contribute no leaf
- list_maps unions the durable catalog with staging-only maps
- is_backup scans delegate straight to inner: staging holds only non-backup writes, so overlaying a backup scan would double-count

Staging set is collected upfront per map, bounded by the buffer capacity.
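The overlay rules above can be sketched for a single, unbatched scan. This is an illustrative simplification under stated assumptions: `overlay_scan`, the `String` keys, and the `Vec<u8>` values are hypothetical, and the batched path, leaf hashing, and `is_backup` short-circuit are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Overlay a map's pending staging set onto one full durable scan.
///
/// `staging` maps key -> `Some(value)` for a buffered write and `None`
/// for a pending delete. All names and types here are assumptions.
pub fn overlay_scan(
    durable: Vec<(String, Vec<u8>)>,
    staging: &BTreeMap<String, Option<Vec<u8>>>,
) -> Vec<(String, Vec<u8>)> {
    let mut seen = BTreeSet::new();
    let mut out = Vec::new();

    for (key, value) in durable {
        seen.insert(key.clone());
        match staging.get(&key) {
            // Buffered write: the newer buffered value replaces the durable row.
            Some(Some(buffered)) => out.push((key, buffered.clone())),
            // Pending delete: suppress the durable row so it cannot resurrect.
            Some(None) => {}
            // No staging entry: the durable row passes through unchanged.
            None => out.push((key, value)),
        }
    }

    // Staging-only keys (buffered, never flushed) are emitted exactly once.
    for (key, slot) in staging {
        if let Some(buffered) = slot {
            if !seen.contains(key) {
                out.push((key.clone(), buffered.clone()));
            }
        }
    }
    out
}
```

Collecting the staging set upfront, as the commit describes, is what keeps the second loop cheap: it walks at most as many entries as the buffer can hold.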
Drive a WriteBehindDataStore over a real redb inner with a long flush delay so writes stay buffered, then assert the overlay surface:
- AC2: buffered-only key surfaces in scan; no double-count after flush; newer buffered write overrides an older flushed value, exactly once
- AC3: a buffered pending delete hides the flushed value from scan and enumerate_leaves (no resurrection)
- AC4: leaf hash is identical buffered vs flushed (Merkle root unchanged); OrMap entry yields a leaf, OrTombstones (merkle_leaf_hash None) yields none
- AC5: backup scan/enumerate is the inner result, no non-backup overlay
- AC6: list_maps unions a staging-only map, excludes a delete-only map
- AC7: multi-batch scan over flushed + staging-only keys returns the full key set exactly once, no boundary miss, no staging-only duplicate
The background flush drained an older write, persisted it, then cleared the key's staging slot unconditionally. Because the partition-queue lock is released during the persist, a newer write can coalesce into staging in that window; the unconditional clear then wiped it, dropping read-your-writes back to the stale durable value. Under active eviction the resident copy is gone, so the staging overlay is the only correct source -- this is the soak's pre-crash `expected=2 actual=1` residual. Tag each staging slot with its originating write's sequence and remove it only when that seq still matches the flushed entry (clear_staging_if_current); a newer coalesced write survives until its own flush. pending_count still decrements per terminal flush. Adds ac8 white-box test driving the exact flush-vs-coalesce race window.
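The seq-identity check can be sketched as below. Only the name `clear_staging_if_current` comes from the commit message; the `Slot` struct, the `HashMap` layout, and the signature are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical staging slot: the buffered value (None = pending delete)
/// tagged with the sequence of the write that produced it.
#[derive(Debug, Clone, PartialEq)]
pub struct Slot {
    pub seq: u64,
    pub value: Option<Vec<u8>>,
}

/// Remove the staging slot for `key` only if it still belongs to the write
/// that was just flushed. Returns true when the slot was removed.
///
/// If a newer write coalesced into staging while the persist was in flight,
/// its seq differs from `flushed_seq` and the slot is left in place.
pub fn clear_staging_if_current(
    staging: &mut HashMap<String, Slot>,
    key: &str,
    flushed_seq: u64,
) -> bool {
    let is_current = staging
        .get(key)
        .map_or(false, |slot| slot.seq == flushed_seq);
    if is_current {
        staging.remove(key);
    }
    is_current
}
```

In the race window the commit describes, the flush of seq 1 finds a slot tagged seq 2 and returns without clearing, so reads keep seeing the newer value until seq 2 itself is flushed.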
After the read-surface fix, the soak's pre-crash full-scan convergence passes under active eviction, but two clients still read different Merkle roots (live) and the root + delta read-back change across restart. These are one defect: the Merkle root/index is built from the resident set, so eviction (mutating residency) and kill -9 (dropping the in-memory index) both change it. That is TODO-530's residency-independent-Merkle / SYNC-treewalk track, not the read-surface buffer-awareness this spec delivers. Route the live two-client Merkle disagreement and the post-restart merkle-root/delta/query gates to the non-fatal pending_gates bucket (logged, [TODO-530]-tagged), so the active-eviction soak reports the read-surface capability it actually verifies. Rename pending_322b -> pending_gates.
…ening)

Cross-vendor (glm-5.2) review of the seq-identity fix flagged that two concurrent writers on the same key can interleave between next_sequence() and the staging insert, letting an older write's value clobber the newer staged one (a plain insert is last-writer-wins by wall-clock, not by seq). That would make a slot's seq untruthful and break the identity clear_staging_if_current relies on. Route all three staging inserts through a monotonic stage() helper that only replaces the slot when the incoming seq >= the current seq. Adds stage_is_monotonic_by_seq test.
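A monotonic insert of this kind can be sketched as below. The `stage` name and the `>=` rule come from the commit message; the `Slot` struct, the `HashMap` layout, and the boolean return value are assumptions, and the real helper presumably runs under the store's own locking.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical staging slot tagged with its originating write's sequence.
#[derive(Debug, Clone, PartialEq)]
pub struct Slot {
    pub seq: u64,
    pub value: Option<Vec<u8>>,
}

/// Insert `incoming` for `key` only if its seq is at least the seq of the
/// slot already staged. Returns true when the slot was written.
///
/// A plain `insert` would let whichever writer arrives last win, even if it
/// carries the older sequence number.
pub fn stage(staging: &mut HashMap<String, Slot>, key: String, incoming: Slot) -> bool {
    match staging.entry(key) {
        Entry::Occupied(mut current) => {
            if incoming.seq >= current.get().seq {
                current.insert(incoming);
                true
            } else {
                // Older write arrived late: keep the newer staged value.
                false
            }
        }
        Entry::Vacant(vacant) => {
            vacant.insert(incoming);
            true
        }
    }
}
```

Keeping the slot's seq truthful is what lets a flush-side identity check compare sequences safely: a slot tagged seq N always holds the value written at seq N.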
Deploying topgun with
| Latest commit: | 5e6b09a |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://350f7588.topgun-f45.pages.dev |
| Branch Preview URL: | https://fix-sf-324-eviction-readcons.topgun-f45.pages.dev |
What
Makes the WriteBehind datastore read surface buffer-aware so active eviction no longer exposes stale/missing reads for acked-but-unflushed writes, and closes a flush-vs-coalesce staging race surfaced while fixing it.
Fixes TODO-539 (the TODO-484 G4b soak finding): with active eviction, `scan_values`/`scan_values_batched`/`enumerate_leaves`/`list_maps` delegated straight to inner redb while `load`/`load_all` overlaid the pending buffer, so the QUERY full-scan + Merkle read paths returned stale/missing values for acked writes (2421/4000 keys diverged pre-crash; no-eviction control passed).
Changes
- `write_behind.rs`: `scan_values`/`scan_values_batched`/`enumerate_leaves`/`list_maps` now mirror the `load`/`load_all` overlay: staging `Some` overlays the buffered value, `None` suppresses the durable row, `is_backup=true` short-circuits to inner (primary-only). `merkle_leaf_hash` `None` (OrMap-no-leaf) propagated (no placeholder leaf).
- `write_behind.rs`: background flush + max-retry discard now drop a staging slot only when its seq matches the flushed entry (`clear_staging_if_current`); `stage()` inserts are monotonic by seq so a concurrent older write can't clobber a newer staged value.
- `processors.rs` left unchanged (the prescribed merge fix R6 was empirically refuted via A/B).
- `benches/soak_harness/`: readback hardened against interleaved live pushes; residency-coupled Merkle gates carved to non-fatal `[TODO-530]` pending; env-gated server-log passthrough.
Verification
- `stage_is_monotonic_by_seq` green.
- `cargo fmt --check` 0 · `cargo clippy --all-targets --all-features -- -D warnings` 0 · lib suite 1495 / 0.
- … `ScanProcessor`).
Follow-ups
TODO-539 (residency-independent Merkle / SYNC-treewalk), TODO-540 (Fix C: eviction skips pending), TODO-541 (raw scan_values overlay hardening + terminal-discard/list_maps edge cases).
Next
On green CI → merge → TODO-484 G4b soak re-run under crash + ACTIVE eviction as the closing gate.