Summary

This document describes how to rewind a whole network to a prior head, effectively forcing a reorg of the chain. Use this process only for testing or during emergencies, e.g. to roll back blocks produced under a critical consensus bug, most likely shortly after a broken fork activation.

Related prior art:

Preparation

Determine all nodes that are under your control that would need to be rewound. This includes sequencers and replicas.

Then determine the block number to rewind to.

<aside> ⚠️ ArgoCD auto-sync should be disabled on all nodes that will be rewound, so that op-nodes can freely be scaled down and up.

</aside>

Production Network Contingency

If a chain rewind is being prepared as a contingency for a potentially broken fork activation, you can determine the target block number ahead of the activation, because the block time is a constant 2 seconds. In such cases it is advised to shut down the batcher & proposer shortly (about 1 min) before the fork activation, to avoid posting batches of broken L2 blocks to L1.
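The arithmetic behind this can be sketched as follows. This is a minimal illustration, assuming that the fork activates at the first block whose timestamp is greater than or equal to the fork timestamp, and that the L2 genesis timestamp and genesis block number are known from the chain's configuration. The function and parameter names are hypothetical and not part of any OP Stack tool.

```python
def last_pre_fork_block(genesis_time: int, fork_time: int,
                        block_time: int = 2, genesis_block: int = 0) -> int:
    """Return the number of the last L2 block with a timestamp before fork_time.

    Assumes a constant block time, i.e. block n has timestamp
    genesis_time + (n - genesis_block) * block_time.
    """
    if block_time <= 0:
        raise ValueError("block_time must be positive")
    if fork_time <= genesis_time:
        raise ValueError("fork is active at genesis; there is no pre-fork block")
    elapsed = fork_time - genesis_time
    # Ceiling division gives the offset of the first block at or after the
    # fork timestamp; the block right before it is the rewind target.
    first_fork_offset = -(-elapsed // block_time)
    return genesis_block + first_fork_offset - 1
```

For example, with a made-up genesis timestamp of 1000 and a fork timestamp of 1010, blocks 0 to 4 carry timestamps 1000 to 1008, so `last_pre_fork_block(1000, 1010)` returns 4 and block 5 is the first fork block.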

The Rewind

  1. Shut down batcher+proposer to avoid sending batches with blocks that would potentially be invalidated by a rewind
    1. If preparing for a contingency chain rewind after a fork, shut the batcher down right before the fork activation
  2. Shut down all consensus clients, including the sequencer (most likely op-node)
    1. Alternatively, you can call admin_stopSequencer on the sequencer op-node. In that case, you need to call admin_resetDerivationPipeline on all op-nodes later.
    2. A full shutdown is the safer approach.
  3. Call op-wheel engine rewind --set-head --to <block-num> on all EL clients (most likely op-geth).
    1. This will call debug_setHead followed by engine_forkchoiceUpdated using the provided block number. It also checks that the block exists in the EL's database. The safe and finalized tags are guaranteed to only be moved backwards, never forwards.
    2. op-wheel needs to be set up with the right flags or env vars for the open and authenticated EL RPC endpoints, and the file path to the JWT secret. See op-wheel engine rewind --help for all available and required flags.
    3. How you connect to op-geth depends heavily on the infrastructure setup. E.g. for a network managed with Kubernetes, you could use kubectl port-forward op-geth-0 8545 8551 to forward both the open and the authenticated API.
  4. Start up all consensus clients, starting with the sequencer.
    1. If you used admin_stopSequencer earlier instead of shutting down the nodes, you should now call admin_resetDerivationPipeline on all nodes (including the sequencer) and then start the sequencer back up using admin_startSequencer.
  5. Start up batcher+proposer.
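To make step 3 more concrete, the sketch below mirrors the behaviour described there: a debug_setHead request with the hex-encoded block number, followed by a forkchoice update in which the safe and finalized labels only ever move backwards. It is not op-wheel's actual implementation. The function name and the returned dictionary layout are made up for illustration, and it works on block numbers only, whereas the real engine_forkchoiceUpdated call takes block hashes.

```python
def plan_rewind(target: int, head: int, safe: int, finalized: int) -> dict:
    """Plan the EL-side changes for rewinding the chain to block `target`.

    `head`, `safe` and `finalized` are the EL's current block numbers for
    the respective labels.
    """
    if target < 0:
        raise ValueError("target block number must not be negative")
    if target > head:
        # Simple stand-in for the "block exists in the EL's database" check.
        raise ValueError("target is ahead of the current head; nothing to rewind")
    # debug_setHead takes the block number as a hex string.
    set_head_request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "debug_setHead",
        "params": [hex(target)],
    }
    # Labels are clamped to the target: they may move back, never forward.
    forkchoice_numbers = {
        "head": target,
        "safe": min(safe, target),
        "finalized": min(finalized, target),
    }
    return {
        "set_head_request": set_head_request,
        "forkchoice_numbers": forkchoice_numbers,
    }
```

With a head at 120, a safe block at 110 and a finalized block at 90, rewinding to block 100 moves the head and safe labels to 100 and leaves the finalized label at 90.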

The sequencer will just pick up where the EL client got reset to and produce blocks from there. It might take a while for other nodes’ unsafe heads to move. But at the latest once a batch tx on the new chain is confirmed on L1 and the safe head has been derived from it, all nodes should be back in sync.
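One way to check this condition is to compare the safe heads that each op-node reports via optimism_syncStatus. The sketch below is a heuristic, not an official health check. It assumes the responses have already been fetched and parsed into dictionaries that contain a safe_l2 entry with number and hash fields, and the node names and function name are hypothetical.

```python
def nodes_back_in_sync(statuses: dict, rewind_target: int) -> bool:
    """Return True if all nodes agree on a safe head past the rewind target.

    `statuses` maps a node name to its parsed optimism_syncStatus result.
    """
    if not statuses:
        return False
    safe_heads = [status["safe_l2"] for status in statuses.values()]
    # All nodes must have derived the same safe block ...
    same_block = len({ref["hash"] for ref in safe_heads}) == 1
    # ... and that block must lie beyond the rewind target, which means it
    # was derived from a batch posted after the rewind.
    past_target = all(ref["number"] > rewind_target for ref in safe_heads)
    return same_block and past_target
```

Because nodes are polled at slightly different moments, a single negative result is not conclusive; polling a few times until the check passes is the more robust usage.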

Example Script

A complete working example zsh script for OP Labs internal infrastructure follows. It is able to rewind the full internal-devnet cluster within ~25 seconds.

Assumptions