This document describes how to rewind a whole network to a prior head, effectively forcing a reorg of the chain. This process should only be used for testing purposes or during emergencies, e.g. to roll the chain back past a critical consensus bug, most likely shortly after a broken fork activation.
Determine all nodes that are under your control that would need to be rewound. This includes sequencers and replicas.
Then determine the block number to rewind to.
<aside> ⚠️ ArgoCD auto-sync should be disabled on all nodes that will be rewound, so that op-nodes can freely be scaled down and up.
</aside>
If a chain rewind is being prepared as a contingency for a potentially broken fork activation, you can determine the block number before the network activation happens, because of the constant 2 second block time. In such cases it is advised to shut down the batcher and proposer shortly (about 1 minute) before the fork activation, to avoid posting batches of broken L2 blocks to L1.
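The block number calculation can be sketched with plain shell arithmetic. This is a minimal sketch under stated assumptions, not a vetted tool: it assumes L2 timestamps advance by exactly the block time from a known anchor block (for example the L2 genesis block from your rollup config), and that the fork activates at the first block whose timestamp is at or after the activation time. The function name and its parameters are hypothetical.

```shell
# Hypothetical helper: print the number of the last L2 block before a
# timestamp-based fork activation, assuming a constant block time.
#
# Arguments (integers, times in Unix seconds):
#   $1 anchor_number  number of a known L2 block (e.g. the L2 genesis block)
#   $2 anchor_time    timestamp of that block
#   $3 fork_time      fork activation timestamp
#   $4 block_time     L2 block time in seconds (defaults to 2)
last_pre_fork_block() {
  local anchor_number="$1" anchor_time="$2" fork_time="$3" block_time="${4:-2}"
  local elapsed="$(( fork_time - anchor_time ))"
  if [ "$elapsed" -le 0 ]; then
    echo "fork time must be after the anchor block" >&2
    return 1
  fi
  # First block whose timestamp is >= fork_time (ceiling division).
  local first_fork_block="$(( anchor_number + (elapsed + block_time - 1) / block_time ))"
  # The rewind target is the block right before it.
  echo "$(( first_fork_block - 1 ))"
}
```

For example, with an anchor block 0 at time 1000 and a fork at time 1010, `last_pre_fork_block 0 1000 1010 2` prints `4`, since block 5 is the first block at or after the activation time. Double-check the result against a live node before relying on it.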
1. Shut down all CL clients (`op-node`). Alternatively, call `admin_stopSequencer` on the sequencer `op-node`. In that case, you need to call `admin_resetDerivationPipeline` on all `op-node`s later.
2. Run `op-wheel engine rewind --set-head --to <block-num>` on all EL clients (most likely `op-geth`).
    - This calls `debug_setHead` followed by `engine_forkchoiceUpdated` using the provided block number. It also checks that the block exists in the EL's database. The safe and finalized tags are guaranteed to only be reset backwards, not forwards.
    - `op-wheel` needs to be set up with the right flags or env vars for the open and authenticated RPC EL endpoints, and the file path to the JWT secret. See `op-wheel engine rewind --help` for all available and required flags.
    - How to reach `op-geth` highly depends on the infrastructure setup. E.g. for a network managed with Kubernetes, you could use `kubectl port-forward op-geth-0 8545 8551` to forward both the open and the authenticated API.
3. Start the CL clients back up. If you used `admin_stopSequencer` before, instead of shutting down nodes, you should now call `admin_resetDerivationPipeline` on all nodes (including the sequencer) and then start the sequencer back up using `admin_startSequencer`.

The sequencer will pick up where the EL client got reset to and produce blocks from there. It might take a while for other nodes' unsafe heads to move. At the latest, all nodes should be back in sync once a batch tx on the new chain is confirmed on L1 and the safe head is derived from it.
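The `admin_stopSequencer` variant of this procedure can be sketched with a few shell helpers around `curl`. This is an illustrative sketch, not the internal script: the function names and all URLs are hypothetical, `curl` is assumed to be installed, and each URL is assumed to point at an `op-node` RPC endpoint with the admin API enabled. It also assumes that `admin_startSequencer` takes the hash of the block to build on as its single parameter; verify this against your `op-node` version.

```shell
# Build a JSON-RPC 2.0 request body; $2 is a JSON array of params.
rpc_payload() {
  local method="$1" params="${2:-[]}"
  printf '{"jsonrpc":"2.0","id":1,"method":"%s","params":%s}' "$method" "$params"
}

# POST a JSON-RPC request to an op-node endpoint and print the raw response.
# JSON-RPC errors usually arrive with HTTP 200, so inspect the output.
rpc_call() {
  local url="$1" method="$2" params="${3:-[]}"
  curl --silent --show-error --fail \
    --header 'Content-Type: application/json' \
    --data "$(rpc_payload "$method" "$params")" \
    "$url"
}

# Step 1 (alternative): stop block production without shutting nodes down.
# Usage: stop_sequencer <sequencer-url>
stop_sequencer() {
  rpc_call "$1" admin_stopSequencer
}

# Step 3 (alternative): reset derivation on every op-node, sequencer
# included, then resume sequencing on top of the rewound head.
# Usage: resume_after_rewind <head-hash> <sequencer-url> [replica-url...]
resume_after_rewind() {
  local head_hash="$1" sequencer="$2" node
  shift
  # "$@" now holds the sequencer URL followed by all replica URLs.
  for node in "$@"; do
    rpc_call "$node" admin_resetDerivationPipeline || return 1
  done
  rpc_call "$sequencer" admin_startSequencer "[\"$head_hash\"]"
}
```

Between `stop_sequencer` and `resume_after_rewind`, run `op-wheel engine rewind --set-head --to <block-num>` once against every EL client, with the endpoint and JWT settings supplied as described by `op-wheel engine rewind --help`.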
A complete working example zsh script for OP Labs internal infrastructure follows. It is able to rewind the full `internal-devnet` cluster within ~25 seconds.

- It assumes a working directory `work/`, with the `optimism/` monorepo and the Kubernetes configuration `k8s/` repositories next to it.
    - The `k8s` repo is just used for access to the jwt-keys.
    - `optimism` is just used to build `op-wheel`.