Gradual migration to remove garbage files from Tezos context

Abstract

We are developing a gradual migration technique that performs huge context modifications at a protocol upgrade over multiple blocks, ultimately reducing the migration downtime to an unnoticeable level.

As the first example of this gradual migration, we plan to remove garbage files named spendable left in 283K contract directories on Tezos Mainnet. They should have been removed at the Babylon upgrade, but some were left untouched until today.

Retrospective of the Hangzhou context migration

At the Tezos Hangzhou upgrade we executed the context flattening, which flattens the deeply nested directory structures of the context, the Tezos blockchain state. Once migrated to the new directory structure, the block validation speed improved by more than 20%.

Unfortunately, the Tezos network experienced a long downtime at this migration. Faster nodes, especially those restarted from fresh full and rolling snapshots, finished the context flattening in less than 10 minutes. For archive nodes, however, it took nearly one hour. In addition, the same flattening was repeated many times on some nodes for unknown reasons. (The flattening itself had no problem; it seems block validations were somehow canceled during the flattening, probably because of network timeouts or disconnections from peers.) As a result, it took a long time for the entire network to validate the first blocks of Hangzhou.

The performance of context access degrades a lot when its data file is huge. Therefore, large modifications to the context take much longer on archive nodes, whose context data files reach several hundred GB, than on full and rolling nodes. This issue is being addressed by Tarides, who are introducing layers to the context to split the huge data file into smaller files.

Even if the layers keep each context file very small, the context flattening would still take around 10 minutes. Therefore the network would still experience at least about 20 minutes of downtime (the first 10 minutes for the baker to create a block, then another 10 minutes for the other nodes to validate it).

To reduce the downtime of a large-scale context modification like the context flattening to an unnoticeable level, we need something more than the layers.

Basic idea of gradual migration

A gradual migration splits the entire migration process into multiple smaller steps and performs one step per block at the beginning of the new protocol. If the time for each step is much smaller than the block time, the migration can be done without any noticeable network downtime. For example, if the context flattening migration had been split over 2,000 blocks, each step would have taken less than 2 seconds even on archive nodes, and the whole migration could have been finished within a day.
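
As a rough back-of-envelope check, using the roughly one-hour archive flattening time from the previous section and assuming the 30-second minimal block time of these protocols:

  \frac{1\ \text{hour}}{2000\ \text{blocks}} \approx 1.8\ \text{s per block},
  \qquad
  2000\ \text{blocks} \times 30\ \text{s} \approx 16.7\ \text{hours} < 1\ \text{day}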

A large context restructuring usually targets millions of directories and files in the context, such as the contract directories (/contracts/index/..) and the big_map directories (/big_maps/index/..). We can split these target directories and files into groups and perform the migration of one group per block.

This idea is simple but must be implemented carefully, since during the gradual migration the context becomes “unstable”: some parts of it are already migrated while others are not yet.

The migration set: items not yet migrated

Since the migration is split over multiple blocks, we must keep track of which items are not yet migrated. The cost of maintaining this migration set must be minimal.

It is a very bad idea to list the millions of files under a directory such as /contracts/index, since it takes nearly 1 minute even on a fresh full node started from a snapshot, and 30 minutes on an archive node.

Instead, we can make a copy of the directory and use it as the migration set. This costs almost nothing: the Tezos context is functional persistent storage, and copying a directory is just the creation of a link, no matter how many files the directory contains. When the migration of an item finishes, the item is removed from the migration set directory.

Suppose that we want to migrate 2 million subdirectories under /contracts/index. We first copy /contracts/index somewhere, say to /tmp/migration_set, and use it as the set of subdirectories not yet migrated. When the migration of a subdirectory /contracts/index/$d finishes, /tmp/migration_set/$d is removed.
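
Here is a minimal OCaml sketch of this idea. The STORE signature and the names copy, list, remove, mem and Gradual_migration are simplified stand-ins for the protocol's storage primitives (which are Lwt- and error-aware in the real code); they are illustrative assumptions, not the actual Tezos API.

  module type STORE = sig
    type t
    type key = string list
    val copy : t -> from:key -> to_:key -> t
    val list : ?length:int -> t -> key -> string list
    val remove : t -> key -> t
    val mem : t -> key -> bool
  end

  module Gradual_migration (S : STORE) = struct
    (* Paths used by the spendable clean-up described in this article. *)
    let contracts_index = ["contracts"; "index"]
    let migration_set = ["tmp"; "migration_set"]

    (* At the first block of the new protocol, the migration set starts as a
       copy of the whole /contracts/index subtree.  In the functional context
       this copy only creates a link to the existing tree, so it is cheap
       regardless of how many files the tree contains. *)
    let init_migration_set ctxt =
      S.copy ctxt ~from:contracts_index ~to_:migration_set
  end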

Migration in each block validation

At the beginning of each block validation, the protocol checks whether the migration is still ongoing by testing the existence of the migration set directory.

If the migration set is not empty, the protocol takes some of its subdirectories and migrates them. The selection must be stable so that all the nodes pick the same subdirectories at each block. (The Tezos context provides such a stable selection.) Once migrated, the subdirectories are removed from the migration set.

Once all the migrations finish, the migration set becomes empty. In the Tezos context, this automatically removes the migration set directory.
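
Continuing the sketch above, inside the same Gradual_migration functor, the per-block step could look as follows. The names batch_size, migrate_one and migration_step are illustrative, not part of the actual implementation.

    (* Number of items migrated per block; 2,000 matches the benchmark below. *)
    let batch_size = 2_000

    (* Run at the beginning of each block validation.  [migrate_one] migrates a
       single item named by its key fragment under the migration set. *)
    let migration_step ~migrate_one ctxt =
      if not (S.mem ctxt migration_set) then ctxt (* migration already finished *)
      else
        (* A bounded [list] must be deterministic, so that every node picks the
           same batch for the same block. *)
        let batch = S.list ~length:batch_size ctxt migration_set in
        List.fold_left
          (fun ctxt name ->
            let ctxt = migrate_one ctxt name in
            (* Drop the migrated item; removing the last one also removes the
               migration set directory itself. *)
            S.remove ctxt (migration_set @ [name]))
          ctxt batch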

Application: gradual removal of garbage files

As the first experiment with this gradual migration, we would like to remove the garbage files named spendable under the contract directories.

Background

The context of Tezos Mainnet contains over 283K garbage files named spendable under /contracts/index/<contract id>/ (see issue 2155 for further details). These files were used before Protocol 005 Babylon. They should have been deleted at the Babylon upgrade, but the removal was incomplete.

We first implemented a migration that deletes all these garbage files in one block by simply scanning all the contracts, but it took more than 30 minutes for archive nodes to fold over the contracts. To prevent this stall of block validation, we have to distribute the load among blocks by removing the garbage files gradually.

Algorithm

At the first block of the new protocol, we build the list of contracts waiting for migration in Init_storage.prepare_first_block. This is done by simply copying the entire tree of /contracts/index to /tmp/migration_set.

Init_storage.prepare checks the /tmp/migration_set directory at each block. If it is not empty, the function lists a small number of contracts under /tmp/migration_set using the list function. It then deletes the spendable files of the listed contracts in /contracts/index. Once done, the contracts are removed from /tmp/migration_set.
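
In terms of the sketch from the previous sections, the spendable clean-up only has to plug a trivial migrate_one function into the generic per-block step. The names remove_spendable and prepare below are illustrative, not the actual protocol code.

    (* Migrating one contract in the spendable clean-up just deletes its stale
       spendable file (a no-op for contracts that do not have one). *)
    let remove_spendable ctxt contract =
      S.remove ctxt (contracts_index @ [contract; "spendable"])

    (* Roughly what Init_storage.prepare would do at each block of the new
       protocol, on top of its usual work. *)
    let prepare ctxt = migration_step ~migrate_one:remove_spendable ctxt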

Experimental implementation

For experiments and benchmarks, we have integrated this algorithm into the Ithaca protocol upgrade. It is available at https://gitlab.com/dailambda/tezos/-/tree/jun@gradual_migration_spendable-v12.3.

Benchmark

Time consumption

We reconstructed the first 2,000 blocks of Ithaca with the gradual removal of spendable files, migrating 2,000 contracts per block. With about 2 million contracts, the migration finishes in around 1,000 blocks, which corresponds to less than half a day. Even split into 1,000 pieces, the migration caused no noticeable increase in the block validation time.

Snapshot size

Thanks to the hash consing mechanism of Irmin, copying the entire /contracts/index to /tmp/migration_set does not affect the snapshot size immediately. As the chain grows, the contents of /contracts/index and /tmp/migration_set diverge, and the snapshot size increases.

We made a snapshot in the middle of the gradual migration (level 2245108) and compared its size with a snapshot at the same block level of vanilla Ithaca. We needed a small patch to override the context hash differences caused by the modified protocol; see https://gitlab.com/dailambda/tezos/-/tree/jun@gradual_migration_spendable-v12.3-test:

  • Snapshot size of vanilla Ithaca at level 2245108: 28.83GiB
  • Snapshot size of Ithaca + gradual spendable removal at the same level: 28.93GiB

The increase in the snapshot size is about 100MiB, below 1% of the original snapshot file. This overhead will disappear once the gradual migration finishes and the migration set is completely removed from the context.

Possible alternative

spendable files could be removed without this gradual migration. Each time the protocol updates a contract directory, for example when changing its balance, it would check for spendable under that directory and remove it if it exists (see the sketch after the list below). This removal on demand is simple, but it can never complete the clean-up:

  • spendable files remain forever unless the contract is updated.
  • Future protocols would have to keep this clean-up code for spendable files.
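
For comparison, the on-demand removal could be sketched as follows, again inside the simplified Gradual_migration functor from the earlier sketches; update_contract and apply_update are hypothetical names.

    (* The on-demand alternative, for comparison: whenever the protocol updates
       a contract, it also drops the stale spendable file if one is present.
       This check would have to survive in every future protocol. *)
    let update_contract ctxt contract apply_update =
      let spendable = contracts_index @ [contract; "spendable"] in
      let ctxt = if S.mem ctxt spendable then S.remove ctxt spendable else ctxt in
      apply_update ctxt contract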

We prefer the gradual migration, since this sort of patch code should not live in the protocols forever.

Conclusion: towards more complex gradual migrations

We have developed a gradual migration technique to remove garbage files left in the Tezos context. The removal requires traversing more than 2 million contract directories, which would take 30 minutes at worst if done in a single block. The gradual migration spreads the traversal over 1,000 blocks, mitigating the overhead to an unnoticeable level.

Removing obsolete files gradually is the simplest form of gradual migration: each contract directory remains valid for the protocol whether it has been migrated or not.

We should also consider more complex gradual migrations that make the context unstable during the migration, where the already migrated part of the context and the not-yet-migrated part have some incompatibilities. The context flattening is such an unstable gradual migration. In the next article, we will consider how to perform unstable gradual migrations safely.