Skip to content

feat(flashblocks): State root computation via PayloadProcessor sparse trie task#2247

Draft
mw2000 wants to merge 3 commits intomainfrom
mw2000/state-root-fast
Draft

feat(flashblocks): State root computation via PayloadProcessor sparse trie task#2247
mw2000 wants to merge 3 commits intomainfrom
mw2000/state-root-fast

Conversation

@mw2000
Copy link
Copy Markdown
Contributor

@mw2000 mw2000 commented Apr 16, 2026

  • Replace inline synchronous state root computation with reth's PayloadProcessor sparse trie task, computing state roots in parallel during block building
  • PayloadProcessor, ChangesetCache, and TreeConfig are stored on OpPayloadBuilder and reused across blocks so the sparse trie stays warm
  • Falls back to synchronous computation on task failure
  • Rewrite state_root benchmark to measure residual wait time (finish → root) with busy-wait simulation of 200ms inter-flashblock delays

@cb-heimdall
Copy link
Copy Markdown
Collaborator

cb-heimdall commented Apr 16, 2026

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 1
Sum 2

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
base Ignored Ignored Preview Apr 17, 2026 7:12pm

Request Review

Comment on lines +137 to +143
let start_time = std::time::Instant::now();

// Signal that all state updates are done.
state_hook.finish();

let state_provider = state.database.as_ref();
let hashed_state = state_provider.hashed_post_state(&state.bundle_state);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential correctness issue: hashed_state computed before merge_transitions

hashed_post_state(&state.bundle_state) is called here, before build_block calls state.merge_transitions(BundleRetention::Reverts). In reth's State model, individual commit() calls push changes into transition_state, and merge_transitions() merges them into bundle_state. If there are un-merged transitions at this point, the hashed_state won't reflect all state changes.

This hashed_state is then passed to build_block and stored in BuiltPayloadExecutedBlock, which the engine uses for state persistence. If it's stale (pre-merge), the engine could persist an inconsistent view.

Consider calling state.merge_transitions(BundleRetention::Reverts) before computing the hashed state, or moving this computation to after build_block merges transitions. Alternatively, verify that all transitions are already merged into bundle_state by this point (i.e., no pending transitions exist).

Comment on lines +145 to +146
if let Some(rx) = state_root_handle {
match rx.recv() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking recv() on a tokio worker thread

rx.recv() is std::sync::mpsc::Receiver::recv(), which blocks the current thread. This is called from finalize_payload, which is invoked from async fn build_payload. Blocking a tokio worker thread while waiting for the sparse trie task can stall other tasks on the same runtime, especially under load.

Consider using tokio::sync::oneshot instead of std::sync::mpsc for StateRootHandle, or wrapping this in tokio::task::spawn_blocking / tokio::task::block_in_place.

Comment on lines +98 to +105
let env = reth_engine_tree::tree::ExecutionEnv {
evm_env: Default::default(),
hash: B256::ZERO,
parent_hash,
parent_state_root,
transaction_count: 0,
withdrawals: None,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evm_env: Default::default() and hash: B256::ZERO — these are placeholder values for the ExecutionEnv since the builder manages its own execution loop. This works because the empty tx list means the processor won't actually execute anything with these fields. However, this coupling to internal PayloadProcessor behavior is fragile — if PayloadProcessor::spawn starts validating or using these fields (e.g., for logging, caching, or consistency checks), this will silently produce wrong results. A brief // SAFETY: or // INVARIANT: comment documenting why the defaults are correct would help future readers.

pub config: BuilderConfig,
/// Reusable payload processor — warm sparse trie persisted across blocks.
#[debug(skip)]
payload_processor: Arc<std::sync::Mutex<PayloadProcessor<BaseEvmConfig>>>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arc<std::sync::Mutex<PayloadProcessor>> — the lock is correctly scoped (dropped before any .await), but since OpPayloadBuilder is Clone (and payload_processor is Arc), concurrent build_payload calls would contend on this lock. If the service guarantees single-active-builder at a time, that's fine, but worth a comment noting the exclusivity assumption.


// PayloadProcessor is reused across blocks for warm sparse trie.
let evm_config = BaseEvmConfig::optimism(ctx.chain_spec());
let tree_config = TreeConfig::default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TreeConfig::default() — the benchmark uses TreeConfig::default().with_disable_sparse_trie_cache_pruning(true), but production here uses the plain default (which has pruning enabled). If the benchmark is meant to reflect production, one of these should change. If cache pruning is intentionally enabled in production but disabled in benchmarks to keep the trie warm, that's fine but worth documenting the discrepancy.

…trics

Expand ClientBounds with DatabaseProviderFactory, StateReader, and
'static to support OverlayStateProviderFactory and StateProviderBuilder
needed by the upcoming PayloadProcessor integration.

Add state_root_task_* metrics (started, completed, error counts and
duration histogram) for observability.
@mw2000 mw2000 force-pushed the mw2000/state-root-fast branch from bca1ea0 to 8979c8d Compare April 17, 2026 18:20
@@ -935,16 +1012,18 @@ where
.apply_pre_execution_changes()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-execution state changes not sent to sparse trie task

apply_pre_execution_changes() (system contract calls like L1 block info updates, beacon root, etc.) modifies state before any sequencer transactions run. However, these state diffs are never sent through state_hook.send_state_update() — only sequencer transactions (in execute_sequencer_transactions) and pool transactions (in execute_best_transactions) feed the hook.

The sparse trie task computes the state root from the parent trie plus only the diffs it receives via on_state(). Since pre-execution diffs are missing, the background state root will be incorrect — it won't account for system contract state changes.

On the fallback path (synchronous state_root_with_updates), the root is computed from hashed_post_state(&state.bundle_state) which does include pre-execution changes, so the fallback produces the correct root while the background task doesn't.

Consider either:

  1. Sending the pre-execution state changes through the hook (add a state_hook.send_state_update(...) call after apply_pre_execution_changes()), or
  2. Committing the pre-execution changes to the OverlayStateProviderFactory before spawning the task so they're part of the base state the sparse trie sees.


// Merge pending transitions into bundle_state before reading it.
// revm's commit() pushes to transition_state, not bundle_state.
state.merge_transitions(BundleRetention::Reverts);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double merge_transitions when called from finalize_payload

finalize_state_root calls state.merge_transitions(BundleRetention::Reverts) here, then the caller finalize_payload passes state to build_block which calls merge_transitions again (payload.rs:1035). The second merge is a no-op on the now-empty transition state, but build_block clones state.transition_state beforehand (payload.rs:1033) to restore it at the end — since the finalization path already merged, this clone captures an empty transition state.

This is safe only because finalize_payload is called at the end of block building and no further flashblocks are built afterward. However, the implicit coupling is fragile — if someone later calls build_block after finalize_state_root in a non-terminal context, state would be silently corrupted. Consider either:

  • Having finalize_state_root not call merge_transitions itself (let the caller manage it), or
  • Adding a comment documenting this coupling.

Comment on lines 69 to 83
StateProviderFactory
+ ChainSpecProvider<ChainSpec = BaseChainSpec>
+ BlockReaderIdExt<Header = Header>
+ DatabaseProviderFactory<
Provider: BlockReader
+ StageCheckpointReader
+ PruneCheckpointReader
+ ChangeSetReader
+ StorageChangeSetReader
+ BlockNumReader
+ StorageSettingsCache,
> + StateReader
+ Clone
+ 'static
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking change to ClientBounds public trait

Adding DatabaseProviderFactory<Provider: ...> + StateReader + 'static is a breaking change — any downstream type implementing ClientBounds must now also satisfy these bounds. Worth noting in the PR description / changelog. If these bounds are only needed by spawn_state_root_task (which is pub(crate)), consider introducing a separate StateRootClientBounds trait to keep ClientBounds stable for external consumers.

@mw2000 mw2000 force-pushed the mw2000/state-root-fast branch from 8979c8d to 454d4e8 Compare April 17, 2026 19:12
state_root = %outcome.state_root,
"Background state root task completed"
);
return Ok((outcome.state_root, outcome.trie_updates, hashed_state));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No consistency check between background trie_updates and locally-computed hashed_state

When the background task succeeds, outcome.trie_updates (computed by the sparse trie from per-tx diffs fed through the hook) is returned alongside hashed_state (computed locally from bundle_state on line 178). These two independently-derived views of the state are stored together in BuiltPayloadExecutedBlock and must be consistent for the engine to persist state correctly.

If any state update is ever lost in transit through the hook (e.g., a future refactoring accidentally skips a send_state_update call, or a new code path commits state without notifying the hook), the trie updates and hashed state would silently diverge — no error, no warning, just an inconsistent state persisted by the engine.

Consider adding a debug assertion or logging comparison (e.g., comparing the state root derived from hashed_state against outcome.state_root in debug/test builds) to catch divergence early. Something like:

Suggested change
return Ok((outcome.state_root, outcome.trie_updates, hashed_state));
return Ok((outcome.state_root, outcome.trie_updates, hashed_state));
// TODO: consider adding a debug assertion comparing the
// locally-derived state root against outcome.state_root to
// catch hook/bundle_state divergence.

if let Some(hook) = guard.hook.as_mut() {
hook.on_state(StateChangeSource::Transaction(idx), state);
guard.tx_count += 1;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent no-op after finish() is called

After finish() sets hook to None, subsequent send_state_update calls silently discard state updates with no warning. In the current code flow this can't happen (finish is only called from finalize_state_root as the last step), but it's a latent foot-gun — if someone adds a code path that calls send_state_update after finish(), the sparse trie task would produce an incorrect root with no diagnostic signal.

Consider logging a warning or debug-asserting when a state update is dropped:

if let Some(hook) = guard.hook.as_mut() {
    hook.on_state(StateChangeSource::Transaction(idx), state);
    guard.tx_count += 1;
} else {
    debug_assert!(false, "send_state_update called after finish()");
}

name = "state_root"
harness = false


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra trailing blank lines added to the file.

@github-actions
Copy link
Copy Markdown
Contributor

Review Summary

This PR integrates reth's PayloadProcessor sparse trie task to pipeline state root computation alongside block building, with a synchronous fallback. The architecture is sound — state diffs are fed through a BuilderStateHook wrapper that bridges &self calling conventions to reth's &mut self OnStateHook, and the reusable PayloadProcessor keeps the sparse trie warm across blocks.

New Findings

Consistency gap between background trie updates and local hashed state (state_root_task.rs:194): When the background task succeeds, trie_updates come from the sparse trie (driven by hook-delivered diffs) while hashed_state is independently computed from bundle_state. These two representations must be consistent for correct engine state persistence. If any state update is lost in transit through the hook, divergence would be silent. A debug-mode cross-check would catch this class of bug early.

Silent state update drops after finish() (state_root_task.rs:63): send_state_update silently discards updates once finish() has been called. Current code flow prevents this, but a debug_assert! would guard against regressions.

Extra trailing blank lines (Cargo.toml:248): Minor — two extra blank lines at end of file.

Previously Raised Items (from prior review round)

The following items were raised in a prior review round and remain relevant:

  • Blocking std::sync::mpsc::recv() — documented as running on a spawn_blocking thread; verify this invariant holds
  • ClientBounds trait expansion is a breaking change for downstream implementors
  • Arc<std::sync::Mutex<PayloadProcessor>> relies on single-active-builder assumption (documented via comment)
  • Pre-execution state changes now correctly flow through the hook via SharedHookAdapter (addressed)
  • Double merge_transitions coupling between finalize_state_root and build_block (documented)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants