A reorg removes previously canonical blocks, starting at the tip of the chain and going back to a specific “reorg depth”, in favor of an alternative chain.
The op-node and op-geth are expected to handle reorgs automatically. Long (or deep) reorgs may get full nodes stuck on the old chain due to timeout/processing problems; this can be resolved with backups (infra-specific) or by recovering nodes that are still synced to the pre-reorg chain (as described in this doc).
Very short reorgs (< 5 blocks) are not expected: the sequencer confirmation depth should ensure that unstable L1 information is not included.
Short reorgs (5-60 blocks) may happen if the L1 reorgs past the sequencer conf-depth; such L1 reorgs are expected to be at most ~10 L1 blocks deep (~60 L2 blocks, see the sketch below) and should be rare.
Medium reorgs (60+ blocks) may still happen, but indicate very poor L1 conditions. L1 reorgs this deep in terms of slots may happen due to L1 forkchoice bugs, but are not common in terms of the L1 execution-blocks that the rollup follows, since reorged slots are often empty (proposing two different blocks for the same slot is slashable).
Long reorgs may happen due to L2 operational problems: after failing to submit batch data for a prolonged time. This preserves the rollup guarantees, at the cost of reverting unconfirmed L2 blocks (whose txs may still be replayed, but only after first processing deposits).
And in all cases an L2 bug may also be a possible cause: reorgs are meant to be rare and should be RCA’d.
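For quick orientation, here is a minimal back-of-envelope sketch of the depth arithmetic above. The 12-second L1 and 2-second L2 block times are assumptions implied by the “10 L1 blocks ≈ 60 L2 blocks” figure, and the 12-hour sequencer window matches cause 3 below; check the chain's actual rollup config before relying on these numbers.

```go
// Back-of-envelope numbers for the reorg depth classes above.
// The constants are assumptions matching common OP Stack defaults,
// not values read from any chain config.
package reorgmath

const (
	l1BlockTimeSeconds  = 12           // assumed L1 block time
	l2BlockTimeSeconds  = 2            // assumed L2 block time
	sequencerWindowSecs = 12 * 60 * 60 // assumed 12-hour sequencer window (cause 3 below)
)

// EstimatedL2ReorgDepth converts an L1 reorg depth into the rough number of L2
// blocks that may be reorged along with it, e.g. 10 L1 blocks -> ~60 L2 blocks.
func EstimatedL2ReorgDepth(l1ReorgDepth uint64) uint64 {
	return l1ReorgDepth * l1BlockTimeSeconds / l2BlockTimeSeconds
}

// MaxUnsafeBlocksAtWindowExpiry is the rough number of unsafe L2 blocks that
// could be reverted if batch submission fails for a full sequencer window
// (~21600 blocks at the assumed block times).
func MaxUnsafeBlocksAtWindowExpiry() uint64 {
	return sequencerWindowSecs / l2BlockTimeSeconds
}
```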
Repeated reorgs are unusual and indicate that there is a bug in the Sequencer/Verifier: derivation (op-node) or batching (op-batcher). Please escalate the issue to the client pod.
In the early days of Span Batches, we can reasonably suspect that such a bug originated from the Span Batches deployment. In this case, please turn off Span Batches and roll back the op-batcher to post Singular Batches. Please refer to the Process section in this Runbook.
There are three possible causes of a reorg:
1. An L1 reorg deeper than the sequencer conf-depth.
2. A Sequencer/Verifier bug.
   - How: an L2 block was created and submitted, but is reproduced from the batch as a conflicting block, without an L1 reorg.
   - Limit: the reorg depth matches the number of invalidly created L2 blocks. This reorg should not include safe blocks: the verifier code-path should take priority over sequencer bugs, as it is more predictable to only undo unsafe blocks, and this does not require protocol overrides. Undoing safe blocks without L1 reorgs or derivation-rule changes is not possible.
   - Trigger: verifier code producing block-inputs that do not match the unsafe L2 blocks. When the verifier code processes block-inputs to advance the L2 safe-head while the L2 unsafe-head is ahead, it compares the would-have-been block-inputs against the previously known unsafe blocks (synced from p2p, or locally produced as sequencer). A mismatch then triggers a reorg of the L2 chain by inserting the new, correct block into the engine, while still traversing the same L1 chain (see the sketch after this list).
3. Failed batch-submission for a full sequencer-window (12 hours).
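The following is a minimal, hypothetical sketch of the consolidation check described under cause 2. The type and method names (BlockInputs, Engine, Consolidate) are illustrative only and do not correspond to the actual op-node API; the point is the match-then-promote vs. mismatch-then-reorg decision.

```go
// Illustrative sketch of the consolidation step of the derivation pipeline,
// under assumed (not real) interfaces.
package derive

// BlockInputs stands in for the inputs that fully determine an L2 block at a height.
type BlockInputs struct {
	ParentHash   string
	Timestamp    uint64
	Transactions []string // deposit txs followed by batched txs, in order
}

// Engine is a stand-in for the execution-engine (op-geth) interface.
type Engine interface {
	// UnsafeBlockAt returns the currently known unsafe block at this height, if any.
	UnsafeBlockAt(number uint64) (BlockInputs, bool)
	// ForceInsert inserts the derived block, reorging out the conflicting
	// unsafe block and all of its descendants.
	ForceInsert(number uint64, derived BlockInputs) error
	// MarkSafe promotes the existing unsafe block at this height to safe.
	MarkSafe(number uint64)
}

// Consolidate advances the safe head by one block while the unsafe head is ahead.
// If the derived inputs match the already-known unsafe block, that block is simply
// promoted to safe. On a mismatch, the derived block is force-inserted: this is the
// reorg, and L1 traversal continues on the same L1 chain.
func Consolidate(eng Engine, number uint64, derived BlockInputs) error {
	known, ok := eng.UnsafeBlockAt(number)
	if ok && inputsMatch(known, derived) {
		eng.MarkSafe(number)
		return nil
	}
	return eng.ForceInsert(number, derived)
}

func inputsMatch(a, b BlockInputs) bool {
	if a.ParentHash != b.ParentHash || a.Timestamp != b.Timestamp {
		return false
	}
	if len(a.Transactions) != len(b.Transactions) {
		return false
	}
	for i := range a.Transactions {
		if a.Transactions[i] != b.Transactions[i] {
			return false
		}
	}
	return true
}
```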
In all three cases, the common verifier code-path (the derivation pipeline) executes the reorg.
The sequencer should automatically start building a new L2 chain after executing the reorg the way a verifier would.
Upon reorgs, op-geth puts the transactions of the old chain back into the tx-pool. The sequencer may start including these in new blocks (i.e. replay them), or drop them if there are too many for the tx-pool to hold on to.
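As a rough illustration of that tx-pool behavior, here is a hypothetical sketch; the Pool type and ReinjectOldChain method are made up for illustration and are not the actual op-geth tx-pool implementation.

```go
// Illustrative only: how reorged-out transactions might be re-queued in a
// capacity-limited pool, with overflow dropped. Not op-geth code.
package txreplay

// Tx is a placeholder for a full transaction.
type Tx struct {
	Hash string
}

// Pool is a minimal stand-in for the tx-pool.
type Pool struct {
	Capacity int
	Pending  []Tx
}

// ReinjectOldChain re-queues transactions from the reorged-out blocks so the
// sequencer can replay them in new blocks; anything beyond capacity is dropped.
func (p *Pool) ReinjectOldChain(oldChainTxs []Tx) (dropped int) {
	for _, tx := range oldChainTxs {
		if len(p.Pending) >= p.Capacity {
			dropped++
			continue
		}
		p.Pending = append(p.Pending, tx)
	}
	return dropped
}
```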