Skip to content

fix(l1sequencer): fail-fast on verifier startup when L1 unreachable#986

Merged
tomatoishealthy merged 2 commits into
feat/sequencer-finalfrom
fix/f03-verifier-failfast
Jun 10, 2026
Merged

fix(l1sequencer): fail-fast on verifier startup when L1 unreachable#986
tomatoishealthy merged 2 commits into
feat/sequencer-finalfrom
fix/f03-verifier-failfast

Conversation

@chengwenxi

Copy link
Copy Markdown
Collaborator

Problem

NewSequencerVerifier treated an initial syncHistory() failure as a soft error (Logger.Error only). When the L1 RPC is unreachable at boot, history stays empty, upgrade.SetUpgradeBlockHeight is never called, and the global upgrade.UpgradeBlockHeight is left at its -1 sentinel — yet the node continues booting.

Booting with UpgradeBlockHeight=-1 is unsafe and is the shared root cause of three audit findings:

  • F-01IsUpgraded() returns false for every height, so the PBFT state machine can run past the true upgrade height (split-brain at the boundary).
  • F-03 — on a block-synced fullnode restart, two crash points both keyed on IsUpgraded:
    • the handshake's sequencer-mode replay exemption (replay.go: IsUpgraded(storeHeight+1)) fails → ErrAppBlockHeightTooHigh (app height N > core upgradeHeight-1);
    • the consensus ctor's reconstructLastCommit guard (state.go:192) runs against a nil commit → panic.
  • F-13 — silent halt of an active verifier on the same path.

I reproduced F-03 on the devnet (pre-fix image): stopping a sentry node that had block-synced past the upgrade height (=10) and restarting it with the L1 RPC black-holed produced the verifier soft-error followed by error during handshake: ... app block height (723) is higher than core (9) — confirming the node boots into the -1 state and then crashes in the handshake.

Fix

Make verifier startup fail-fast: NewSequencerVerifier now returns an error when the initial sync fails or returns empty history (UpgradeBlockHeight still < 0), and cmd/node/main.go propagates it so the node exits at boot rather than booting into the unsafe -1 state. The operator's supervisor (Docker restart, systemd, k8s) restarts the node once L1 is reachable.

The error message carries the refusing to start with UpgradeBlockHeight=-1 sentinel that the audit verification scripts (f1-upgrade-init-race.sh, f3-nilcommit-panic.sh) assert on.

Scope

  • node/l1sequencer/verifier.goNewSequencerVerifier returns (*SequencerVerifier, error); fail-fast on sync failure / empty history.
  • node/cmd/node/main.go — propagate the error.

Based on the latest feat/sequencer-final (4a7b9119).

Verification

  • go build ./cmd/node/ ./l1sequencer/ — clean
  • go vet ./l1sequencer/ — clean
  • go test ./l1sequencer/ — pass

@chengwenxi chengwenxi requested a review from a team as a code owner June 10, 2026 01:06
@chengwenxi chengwenxi requested review from SecurityLife and removed request for a team June 10, 2026 01:06
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97432287-cd4c-41f2-b8c2-e61b7e64be37

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/f03-verifier-failfast

Comment @coderabbitai help to get the list of available commands and usage tips.

@chengwenxi chengwenxi changed the title fix(l1sequencer): fail-fast on verifier startup when L1 unreachable (F-03/F-01/F-13) fix(l1sequencer): fail-fast on verifier startup when L1 unreachable Jun 10, 2026
…F-03/F-01/F-13)

NewSequencerVerifier treated an initial syncHistory() failure as a soft
error (Logger.Error only). When L1 RPC is unreachable at boot, history
stays empty, upgrade.SetUpgradeBlockHeight is never called, and the
global UpgradeBlockHeight is left at its -1 sentinel.

Booting with UpgradeBlockHeight=-1 is unsafe:
- F-01: IsUpgraded() returns false for every height, so the PBFT state
  machine can run past the true upgrade height.
- F-03: on a block-synced fullnode restart, the handshake's sequencer-mode
  replay exemption (replay.go: `IsUpgraded(storeHeight+1)`) fails ->
  ErrAppBlockHeightTooHigh; and the consensus ctor's reconstructLastCommit
  guard (state.go:192) runs against a nil commit -> panic.

Make startup fail-fast: NewSequencerVerifier now returns an error when the
initial sync fails or returns empty history (UpgradeBlockHeight still < 0),
and main.go propagates it so the node exits at boot rather than booting
into the unsafe -1 state. The operator's supervisor restarts the node once
L1 is reachable.
@chengwenxi chengwenxi force-pushed the fix/f03-verifier-failfast branch from 9d66f68 to f1b5870 Compare June 10, 2026 01:17
The CI Lint job runs golangci-lint over the whole node/ package, not just
the PR diff, and was failing on pre-existing issues. Fix all of them so
the check passes:

- gofmt: struct field alignment / formatting (verifier.go, signer.go,
  ha_service.go)
- errcheck: check or explicitly ignore deferred Close/Cancel/SetDeadline
  return values (block_fsm.go, ha_service.go, enclave_signer.go)
- gosec G112: set ReadHeaderTimeout on http.Server (main.go metrics
  server, hakeeper/rpc/server.go) to avoid Slowloris
- gosec G301: tighten Raft storage dir perms 0o755 -> 0o750 (ha_service.go)
- misspell: cancelled -> canceled, initialised -> initialized
  (verifier.go, derivation.go, ha_service.go)

No behavioral change. `make lint` (go run build/lint.go) now exits 0.
@tomatoishealthy tomatoishealthy merged commit 7093489 into feat/sequencer-final Jun 10, 2026
7 checks passed
@tomatoishealthy tomatoishealthy deleted the fix/f03-verifier-failfast branch June 10, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants