A reorg removes previously canonical blocks, starting at the tip of the chain and going back to a specific “reorg depth”, in favor of an alternative chain.
The op-node and op-geth are expected to handle reorgs automatically. Long (or deep) reorgs may get full nodes stuck on the old chain due to timeout/processing problems; this can be resolved with backups (infra-specific) or by recovering nodes that are still synced to the pre-reorg chain (as described in this doc).
Very short reorgs (< 5 blocks) are not expected: the sequencer confirmation depth should ensure that unstable L1 information is not included.
Short reorgs (5-60 blocks) may happen if the L1 reorgs past the sequencer conf-depth; such L1 reorgs are expected to be at most ~10 L1 blocks deep (~60 L2 blocks, see the sketch below) and should be rare.
Medium reorgs (60+ blocks) may still happen, but indicate very poor L1 conditions. L1 reorgs this deep in terms of slots may happen due to L1 forkchoice bugs, but are not common in terms of the L1 execution-blocks that the rollup follows, since reorged slots are often empty (proposing two different blocks for the same slot is slashable).
Long reorgs may happen due to L2 operational problems: after failing to submit batch data for a prolonged time. This preserves the rollup guarantees, at the cost of reverting unconfirmed L2 blocks (whose txs may still be replayed, but only after first processing deposits).
And in all cases an L2 bug may also be a possible cause: reorgs are meant to be rare and should be RCA’d.
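For quick orientation, here is a minimal back-of-envelope sketch of the depth arithmetic above. The 12-second L1 and 2-second L2 block times are assumptions implied by the “10 L1 blocks ≈ 60 L2 blocks” figure, and the 12-hour sequencer window matches cause 3 below; check the chain's actual rollup config before relying on these numbers.

```go
// Back-of-envelope numbers for the reorg depth classes above.
// The constants are assumptions matching common OP Stack defaults,
// not values read from any chain config.
package reorgmath

const (
	l1BlockTimeSeconds  = 12           // assumed L1 block time
	l2BlockTimeSeconds  = 2            // assumed L2 block time
	sequencerWindowSecs = 12 * 60 * 60 // assumed 12-hour sequencer window (cause 3 below)
)

// EstimatedL2ReorgDepth converts an L1 reorg depth into the rough number of L2
// blocks that may be reorged along with it, e.g. 10 L1 blocks -> ~60 L2 blocks.
func EstimatedL2ReorgDepth(l1ReorgDepth uint64) uint64 {
	return l1ReorgDepth * l1BlockTimeSeconds / l2BlockTimeSeconds
}

// MaxUnsafeBlocksAtWindowExpiry is the rough number of unsafe L2 blocks that
// could be reverted if batch submission fails for a full sequencer window
// (~21600 blocks at the assumed block times).
func MaxUnsafeBlocksAtWindowExpiry() uint64 {
	return sequencerWindowSecs / l2BlockTimeSeconds
}
```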
Repeated reorgs are unusual and indicate that there is a bug in the Sequencer/Verifier: derivation (op-node) or batching (op-batcher). Please escalate the issue to the client pod.
In the early days of Span Batches, we can reasonably suspect that such a bug originated from the Span Batches deployment. In this case, please turn off Span Batches and roll back the op-batcher to post Singular Batches. Please refer to the Process section in this Runbook.
There are three possible causes of a reorg:
1. An L1 reorg deeper than the sequencer conf-depth.
2. A Sequencer/Verifier bug.
   - How: an L2 block was created and submitted, but is reproduced from the batch as a conflicting block, without an L1 reorg.
   - Limit: the reorg depth matches the number of invalidly created L2 blocks. This reorg should not include safe blocks: the verifier code-path should take priority over sequencer bugs, as it is more predictable to only undo unsafe blocks, and this does not require protocol overrides. Undoing safe blocks without L1 reorgs or derivation-rule changes is not possible.
   - Trigger: verifier code producing block-inputs that do not match the unsafe L2 blocks. When the verifier code processes block-inputs to advance the L2 safe-head while the L2 unsafe-head is ahead, it compares the would-have-been block-inputs against the previously known unsafe blocks (synced from p2p, or locally produced as sequencer). A mismatch then triggers a reorg of the L2 chain by inserting the new, correct block into the engine, while still traversing the same L1 chain (see the sketch after this list).
3. Failed batch-submission for a full sequencer-window (12 hours).
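The following is a minimal, hypothetical sketch of the consolidation check described under cause 2. The type and method names (BlockInputs, Engine, Consolidate) are illustrative only and do not correspond to the actual op-node API; the point is the match-then-promote vs. mismatch-then-reorg decision.

```go
// Illustrative sketch of the consolidation step of the derivation pipeline,
// under assumed (not real) interfaces.
package derive

// BlockInputs stands in for the inputs that fully determine an L2 block at a height.
type BlockInputs struct {
	ParentHash   string
	Timestamp    uint64
	Transactions []string // deposit txs followed by batched txs, in order
}

// Engine is a stand-in for the execution-engine (op-geth) interface.
type Engine interface {
	// UnsafeBlockAt returns the currently known unsafe block at this height, if any.
	UnsafeBlockAt(number uint64) (BlockInputs, bool)
	// ForceInsert inserts the derived block, reorging out the conflicting
	// unsafe block and all of its descendants.
	ForceInsert(number uint64, derived BlockInputs) error
	// MarkSafe promotes the existing unsafe block at this height to safe.
	MarkSafe(number uint64)
}

// Consolidate advances the safe head by one block while the unsafe head is ahead.
// If the derived inputs match the already-known unsafe block, that block is simply
// promoted to safe. On a mismatch, the derived block is force-inserted: this is the
// reorg, and L1 traversal continues on the same L1 chain.
func Consolidate(eng Engine, number uint64, derived BlockInputs) error {
	known, ok := eng.UnsafeBlockAt(number)
	if ok && inputsMatch(known, derived) {
		eng.MarkSafe(number)
		return nil
	}
	return eng.ForceInsert(number, derived)
}

func inputsMatch(a, b BlockInputs) bool {
	if a.ParentHash != b.ParentHash || a.Timestamp != b.Timestamp {
		return false
	}
	if len(a.Transactions) != len(b.Transactions) {
		return false
	}
	for i := range a.Transactions {
		if a.Transactions[i] != b.Transactions[i] {
			return false
		}
	}
	return true
}
```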
In all three cases, the common verifier code-path (the derivation pipeline) executes the reorg.
The sequencer should automatically start building a new L2 chain after executing the reorg the way a verifier would.
Upon reorgs, op-geth puts the transactions of the old chain back into the tx-pool. The sequencer may start including these in new blocks (i.e. replay them), or drop them if there are too many for the tx-pool to hold on to.
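As a rough illustration of that tx-pool behavior, here is a hypothetical sketch; the Pool type and ReinjectOldChain method are made up for illustration and are not the actual op-geth tx-pool implementation.

```go
// Illustrative only: how reorged-out transactions might be re-queued in a
// capacity-limited pool, with overflow dropped. Not op-geth code.
package txreplay

// Tx is a placeholder for a full transaction.
type Tx struct {
	Hash string
}

// Pool is a minimal stand-in for the tx-pool.
type Pool struct {
	Capacity int
	Pending  []Tx
}

// ReinjectOldChain re-queues transactions from the reorged-out blocks so the
// sequencer can replay them in new blocks; anything beyond capacity is dropped.
func (p *Pool) ReinjectOldChain(oldChainTxs []Tx) (dropped int) {
	for _, tx := range oldChainTxs {
		if len(p.Pending) >= p.Capacity {
			dropped++
			continue
		}
		p.Pending = append(p.Pending, tx)
	}
	return dropped
}
```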