Tezos reconstruction benchmark by replay

We at DaiLambda are working on improving the Tezos blockchain storage layer, called the context. The context stores versions of the blockchain state, including balances and smart contracts.

Objective

We want to benchmark Tezos context reconstruction of the recent blocks, say 10000.

  • Only recent blocks, not from the genesis.
  • Blocks must be preloaded to exclude the network costs.

Currently we have 2 ways to replay blocks: reconstruct and replay.

tezos-node reconstruct commits to the context, but always starts from the genesis

tezos-node reconstruct reconstructs the contexts from the genesis. This takes too long for a benchmark: several days or even a week. We also do not want to benchmark the context reconstruction of the old cemented blocks, since a running node builds contexts only from floating blocks.

tezos-node replay can replay blocks from a recent block, but does not commit contexts

The tezos-node replay command replays the specified block levels, but it NEVER commits contexts: new versions of contexts are built in memory, then their hashes are compared with the contexts already imported on disk. For a precise benchmark, we want to commit the newly created contexts rather than just check the context hashes, since disk I/O is always a major performance factor of the node.

Solution: replay with reconstruction

We have developed a hybrid of these 2 methods, replay+reconstruct. With it, we can replay recent Tezos blocks and then commit their context updates to the disk. From the viewpoint of the context storage layer, a benchmark using this replay with reconstruction is very similar to what the layer does in an actual Tezos node.

Suppose that we want to replay blocks with the context reconstruction between levels $level1 and $level2.

The idea is:

  • store/ carries the blocks between $level1 and $level2.
  • context/ carries only the context at $level1.
  • Let the tezos-node replay command apply the blocks between $level1 and $level2 consecutively and commit their contexts to the context/ directory.

Prepare tezos-node

Your tezos-node binary must provide the replay command.

$ ./tezos-node --help
...
       replay
           Replay a set of previously validated blocks
...

Prepare a full node

Prepare a full node and let $srcdir be the directory of the full node.

Its storage version must be 0.0.5 or newer:

$ cat $srcdir/version.json
{ "version": "0.0.5" }
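
The version check can be scripted. The following is a minimal sketch, assuming that version.json keeps the one-field layout shown above and that sort -V (version sort, as in GNU coreutils) is available. The function names storage_version and storage_is_at_least are made up for this example and are not part of tezos-node.

```shell
# Hypothetical helpers, not part of tezos-node.

# Print the "version" field of DATADIR/version.json.
# Assumes the one-field layout shown above.
# usage: storage_version DATADIR
storage_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1/version.json"
}

# Succeed if the storage version of DATADIR is WANTED or newer.
# Relies on `sort -V` (version sort).
# usage: storage_is_at_least DATADIR WANTED
storage_is_at_least() {
  have=$(storage_version "$1")
  [ -n "$have" ] || return 1
  # The smaller of the two versions must be the wanted one.
  lowest=$(printf '%s\n%s\n' "$2" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example (assumes $srcdir is set):
#   storage_is_at_least "$srcdir" 0.0.5 || echo "upgrade the storage first"
```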

Upgrading storage

If the storage version is 0.0.4, you have to upgrade the data directory by:

$ ./tezos-node upgrade storage --data-dir $srcdir

NOTE: Recent master of tezos-node has a bug around --data-dir. You MUST make sure $srcdir/config.json exists and its data-dir field points to $srcdir.
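
To guard against this pitfall before upgrading, one could check config.json mechanically. This is only a sketch: it assumes the data-dir field appears on a single line as "data-dir": "<path>" and that the path is spelled exactly like $srcdir (no symlink or relative-path resolution). The function name config_points_to is hypothetical.

```shell
# Hypothetical helper, not part of tezos-node.
# Succeed if DATADIR/config.json exists and its "data-dir" field equals DATADIR.
# Assumes the field sits on one line and the path is written exactly as DATADIR.
# usage: config_points_to DATADIR
config_points_to() {
  [ -f "$1/config.json" ] || return 1
  configured=$(sed -n 's/.*"data-dir"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1/config.json" | head -n 1)
  [ "$configured" = "$1" ]
}

# Example (assumes $srcdir is set):
#   config_points_to "$srcdir" || echo "fix data-dir in $srcdir/config.json"
```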

Check the data

You must first check that the blocks between $level1 and $level2 are available in the node:

$ ./tezos-node replay --data-dir $srcdir $(($level1 + 1)) $level2

Snapshot of $level1

Make a snapshot of level $level1:

$ ./tezos-node snapshot export --data-dir $srcdir --block $level1 tezos-mainnet-snapshot-full.$level1

NOTE: The snapshot MUST be taken by the new store. Currently, snapshots of v9.1 called “legacy snapshots” are NOT properly imported by the latest tezos-node.

Prepare the starting context

Now import the snapshot to a new directory $importdir:

$ mkdir $importdir
$ ./tezos-node snapshot import --data-dir $importdir tezos-mainnet-snapshot-full.$level1

Prepare the replay directory

Make another directory $replaydir for the replay:

$ mkdir $replaydir

Copy the store and JSON files of $srcdir:

$ cp -a $srcdir/store $replaydir/
$ cp $srcdir/*.json $replaydir/

Copy the context of $importdir:

$ cp -a $importdir/context $replaydir/

Reinitialize the directory:

$ ./tezos-node config reset --data-dir $replaydir

Now $replaydir/ is ready for reconstruction from $level1:

  • store/ : enough blocks to reconstruct between $level1 and $level2.
  • context/: the context of $level1, the starting point of the reconstruction.
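
The file copies of this section can be bundled into one function. This is a minimal sketch, not an official tool: the name prepare_replaydir is made up, and the function creates the target directory itself with mkdir -p. The config reset step is left to the caller.

```shell
# Hypothetical helper, not part of tezos-node.
# Assemble REPLAYDIR from the full node (SRCDIR) and the imported snapshot (IMPORTDIR).
# usage: prepare_replaydir SRCDIR IMPORTDIR REPLAYDIR
prepare_replaydir() {
  mkdir -p "$3" || return 1
  # Blocks and JSON files come from the full node.
  cp -a "$1/store" "$3/" || return 1
  cp "$1"/*.json "$3/" || return 1
  # The starting context comes from the imported snapshot.
  cp -a "$2/context" "$3/" || return 1
}

# Example (assumes $srcdir, $importdir and $replaydir are set):
#   prepare_replaydir "$srcdir" "$importdir" "$replaydir" &&
#     ./tezos-node config reset --data-dir "$replaydir"
```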

Reconstruction by replay

Now the following should work:

$ ./tezos-node replay --data-dir $replaydir $((level1 + 1)) $((level1 + 2)) ...

tezos-node replay cannot take a range of block levels. You need to specify the levels one by one.
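
Typing thousands of levels by hand is impractical, so the list can be generated. The sketch below uses only shell arithmetic; the function name levels_after is hypothetical. It prints every level from $level1 + 1 through $level2, which an unquoted command substitution then expands into separate arguments.

```shell
# Hypothetical helper, not part of tezos-node.
# Print the levels FIRST+1 .. LAST, one per line.
# usage: levels_after FIRST LAST
levels_after() {
  level=$(($1 + 1))
  while [ "$level" -le "$2" ]; do
    echo "$level"
    level=$((level + 1))
  done
}

# Example (assumes $replaydir, $level1 and $level2 are set).
# The command substitution is deliberately unquoted so that each level
# becomes its own argument:
#   ./tezos-node replay --data-dir "$replaydir" $(levels_after "$level1" "$level2")
```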

When you rerun the reconstruction, you have to reset the context/ directory:

$ rm -rf $replaydir/context
$ cp -a $importdir/context $replaydir
$ ./tezos-node replay --data-dir $replaydir lev1 lev2 ...
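
For repeated benchmark runs, the reset and the replay can be wrapped together with a simple wall-clock measurement. This is a sketch under assumptions: the names reset_context and timed_rerun are made up, and the timing has only one-second resolution because it relies on date +%s.

```shell
# Hypothetical helpers, not part of tezos-node.

# Restore REPLAYDIR/context from the pristine copy in IMPORTDIR.
# usage: reset_context IMPORTDIR REPLAYDIR
reset_context() {
  # Refuse empty arguments so that rm -rf never sees "/context".
  [ -n "$1" ] && [ -n "$2" ] || return 1
  rm -rf "$2/context" || return 1
  cp -a "$1/context" "$2/"
}

# Reset the context, run COMMAND, and print the elapsed wall-clock seconds.
# Returns the exit status of COMMAND.
# usage: timed_rerun IMPORTDIR REPLAYDIR COMMAND [ARGS...]
timed_rerun() {
  reset_context "$1" "$2" || return 1
  shift 2
  started=$(date +%s)
  "$@"
  status=$?
  finished=$(date +%s)
  echo "elapsed_seconds=$((finished - started))"
  return "$status"
}

# Example (assumes $importdir and $replaydir are set):
#   timed_rerun "$importdir" "$replaydir" \
#     ./tezos-node replay --data-dir "$replaydir" lev1 lev2 ...
```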

Testing a modified node using historic data

If a modification to the node implementation changes context hashes, pure replay+reconstruct fails, because the block application may produce context hashes different from the expected ones.

To make replay+reconstruct work even in this situation, tezos-node replay has a new option --ignore-context-hash-mismatch in https://gitlab.com/dailambda/tezos/-/tree/jun@replay-reconstruct . With this option enabled, context hash mismatches no longer stop the validation: if a block $B_i$ with an expected context hash $H_i$ produces a context with a different hash $H'_i$, the replay continues and commits the resulting context under $H'_i$, remembering the pair $(H_i, H'_i)$ in memory. At the replay of the next block $B_{i+1}$, the node requires the context of its predecessor $B_i$. The context hash recorded in $B_i$ is $H_i$, which is NOT found in the context DB, so we instead check out the context of $H'_i$ paired with $H_i$.

Thanks to this --ignore-context-hash-mismatch option, we can quickly test and benchmark node modifications over historic block data, even if they produce different context hashes.

Future work

  • It takes a very long time to export and import a snapshot for $level1. All we need here is a single context; the exported store is not needed. It would be nice to have a tool that copies just one version of the context.
  • Benchmarking used to fail if the shell or a protocol had modifications that affect context hashes. Now we have the --ignore-context-hash-mismatch option.
  • Better UI.