feat(flashblocks): State root computation via PayloadProcessor sparse trie task#2247
Conversation
mw2000
commented
Apr 16, 2026
- Replace inline synchronous state root computation with reth's PayloadProcessor sparse trie task, computing state roots in parallel during block building
- PayloadProcessor, ChangesetCache, and TreeConfig are stored on OpPayloadBuilder and reused across blocks so the sparse trie stays warm
- Falls back to synchronous computation on task failure
- Rewrite state_root benchmark to measure residual wait time (finish → root) with busy-wait simulation of 200ms inter-flashblock delays
🟡 Heimdall Review Status
```rust
let start_time = std::time::Instant::now();

// Signal that all state updates are done.
state_hook.finish();

let state_provider = state.database.as_ref();
let hashed_state = state_provider.hashed_post_state(&state.bundle_state);
```
Potential correctness issue: hashed_state computed before merge_transitions
hashed_post_state(&state.bundle_state) is called here, before build_block calls state.merge_transitions(BundleRetention::Reverts). In reth's State model, individual commit() calls push changes into transition_state, and merge_transitions() merges them into bundle_state. If there are un-merged transitions at this point, the hashed_state won't reflect all state changes.
This hashed_state is then passed to build_block and stored in BuiltPayloadExecutedBlock, which the engine uses for state persistence. If it's stale (pre-merge), the engine could persist an inconsistent view.
Consider calling state.merge_transitions(BundleRetention::Reverts) before computing the hashed state, or moving this computation to after build_block merges transitions. Alternatively, verify that all transitions are already merged into bundle_state by this point (i.e., no pending transitions exist).
```rust
if let Some(rx) = state_root_handle {
    match rx.recv() {
```
Blocking recv() on a tokio worker thread
rx.recv() is std::sync::mpsc::Receiver::recv(), which blocks the current thread. This is called from finalize_payload, which is invoked from async fn build_payload. Blocking a tokio worker thread while waiting for the sparse trie task can stall other tasks on the same runtime, especially under load.
Consider using tokio::sync::oneshot instead of std::sync::mpsc for StateRootHandle, or wrapping this in tokio::task::spawn_blocking / tokio::task::block_in_place.
```rust
let env = reth_engine_tree::tree::ExecutionEnv {
    evm_env: Default::default(),
    hash: B256::ZERO,
    parent_hash,
    parent_state_root,
    transaction_count: 0,
    withdrawals: None,
};
```
evm_env: Default::default() and hash: B256::ZERO — these are placeholder values for the ExecutionEnv since the builder manages its own execution loop. This works because the empty tx list means the processor won't actually execute anything with these fields. However, this coupling to internal PayloadProcessor behavior is fragile — if PayloadProcessor::spawn starts validating or using these fields (e.g., for logging, caching, or consistency checks), this will silently produce wrong results. A brief // SAFETY: or // INVARIANT: comment documenting why the defaults are correct would help future readers.
```rust
pub config: BuilderConfig,
/// Reusable payload processor — warm sparse trie persisted across blocks.
#[debug(skip)]
payload_processor: Arc<std::sync::Mutex<PayloadProcessor<BaseEvmConfig>>>,
```
Arc<std::sync::Mutex<PayloadProcessor>> — the lock is correctly scoped (dropped before any .await), but since OpPayloadBuilder is Clone (and payload_processor is Arc), concurrent build_payload calls would contend on this lock. If the service guarantees single-active-builder at a time, that's fine, but worth a comment noting the exclusivity assumption.
```rust
// PayloadProcessor is reused across blocks for warm sparse trie.
let evm_config = BaseEvmConfig::optimism(ctx.chain_spec());
let tree_config = TreeConfig::default();
```
TreeConfig::default() — the benchmark uses TreeConfig::default().with_disable_sparse_trie_cache_pruning(true), but production here uses the plain default (which has pruning enabled). If the benchmark is meant to reflect production, one of these should change. If cache pruning is intentionally enabled in production but disabled in benchmarks to keep the trie warm, that's fine but worth documenting the discrepancy.
…trics Expand ClientBounds with DatabaseProviderFactory, StateReader, and 'static to support OverlayStateProviderFactory and StateProviderBuilder needed by the upcoming PayloadProcessor integration. Add state_root_task_* metrics (started, completed, error counts and duration histogram) for observability.
Force-pushed from bca1ea0 to 8979c8d
```diff
@@ -935,16 +1012,18 @@ where
     .apply_pre_execution_changes()?;
```
Pre-execution state changes not sent to sparse trie task
apply_pre_execution_changes() (system contract calls like L1 block info updates, beacon root, etc.) modifies state before any sequencer transactions run. However, these state diffs are never sent through state_hook.send_state_update() — only sequencer transactions (in execute_sequencer_transactions) and pool transactions (in execute_best_transactions) feed the hook.
The sparse trie task computes the state root from the parent trie plus only the diffs it receives via on_state(). Since pre-execution diffs are missing, the background state root will be incorrect — it won't account for system contract state changes.
On the fallback path (synchronous state_root_with_updates), the root is computed from hashed_post_state(&state.bundle_state) which does include pre-execution changes, so the fallback produces the correct root while the background task doesn't.
Consider either:
- Sending the pre-execution state changes through the hook (add a `state_hook.send_state_update(...)` call after `apply_pre_execution_changes()`), or
- Committing the pre-execution changes to the `OverlayStateProviderFactory` before spawning the task so they're part of the base state the sparse trie sees.
```rust
// Merge pending transitions into bundle_state before reading it.
// revm's commit() pushes to transition_state, not bundle_state.
state.merge_transitions(BundleRetention::Reverts);
```
Double merge_transitions when called from finalize_payload
finalize_state_root calls state.merge_transitions(BundleRetention::Reverts) here, then the caller finalize_payload passes state to build_block which calls merge_transitions again (payload.rs:1035). The second merge is a no-op on the now-empty transition state, but build_block clones state.transition_state beforehand (payload.rs:1033) to restore it at the end — since the finalization path already merged, this clone captures an empty transition state.
This is safe only because finalize_payload is called at the end of block building and no further flashblocks are built afterward. However, the implicit coupling is fragile — if someone later calls build_block after finalize_state_root in a non-terminal context, state would be silently corrupted. Consider either:
- Having `finalize_state_root` not call `merge_transitions` itself (let the caller manage it), or
- Adding a comment documenting this coupling.
```rust
StateProviderFactory
    + ChainSpecProvider<ChainSpec = BaseChainSpec>
    + BlockReaderIdExt<Header = Header>
    + DatabaseProviderFactory<
        Provider: BlockReader
            + StageCheckpointReader
            + PruneCheckpointReader
            + ChangeSetReader
            + StorageChangeSetReader
            + BlockNumReader
            + StorageSettingsCache,
    > + StateReader
    + Clone
    + 'static
{
```
Breaking change to ClientBounds public trait
Adding DatabaseProviderFactory<Provider: ...> + StateReader + 'static is a breaking change — any downstream type implementing ClientBounds must now also satisfy these bounds. Worth noting in the PR description / changelog. If these bounds are only needed by spawn_state_root_task (which is pub(crate)), consider introducing a separate StateRootClientBounds trait to keep ClientBounds stable for external consumers.
Force-pushed from 8979c8d to 454d4e8
```rust
    state_root = %outcome.state_root,
    "Background state root task completed"
);
return Ok((outcome.state_root, outcome.trie_updates, hashed_state));
```
No consistency check between background trie_updates and locally-computed hashed_state
When the background task succeeds, outcome.trie_updates (computed by the sparse trie from per-tx diffs fed through the hook) is returned alongside hashed_state (computed locally from bundle_state on line 178). These two independently-derived views of the state are stored together in BuiltPayloadExecutedBlock and must be consistent for the engine to persist state correctly.
If any state update is ever lost in transit through the hook (e.g., a future refactoring accidentally skips a send_state_update call, or a new code path commits state without notifying the hook), the trie updates and hashed state would silently diverge — no error, no warning, just an inconsistent state persisted by the engine.
Consider adding a debug assertion or logging comparison (e.g., comparing the state root derived from hashed_state against outcome.state_root in debug/test builds) to catch divergence early. Something like:
```rust
return Ok((outcome.state_root, outcome.trie_updates, hashed_state));
// TODO: consider adding a debug assertion comparing the
// locally-derived state root against outcome.state_root to
// catch hook/bundle_state divergence.
```
```rust
if let Some(hook) = guard.hook.as_mut() {
    hook.on_state(StateChangeSource::Transaction(idx), state);
    guard.tx_count += 1;
}
```
Silent no-op after finish() is called
After finish() sets hook to None, subsequent send_state_update calls silently discard state updates with no warning. In the current code flow this can't happen (finish is only called from finalize_state_root as the last step), but it's a latent foot-gun — if someone adds a code path that calls send_state_update after finish(), the sparse trie task would produce an incorrect root with no diagnostic signal.
Consider logging a warning or debug-asserting when a state update is dropped:
```rust
if let Some(hook) = guard.hook.as_mut() {
    hook.on_state(StateChangeSource::Transaction(idx), state);
    guard.tx_count += 1;
} else {
    debug_assert!(false, "send_state_update called after finish()");
}
```

```toml
name = "state_root"
harness = false
```
Nit: extra trailing blank lines added to the file.
Review Summary

This PR integrates reth's PayloadProcessor sparse trie task to compute state roots in the background during block building.

New Findings

- Consistency gap between the background trie updates and the locally-computed hashed state: `outcome.trie_updates` and `hashed_state` are derived independently, with no check that they agree.
- Silent state update drops after `finish()`: `send_state_update` becomes a no-op with no diagnostic once the hook is cleared.
- Extra trailing blank lines added to the file (nit).

Previously Raised Items (from prior review round)

The following items were raised in a prior review round and remain relevant: