Optz: Michelson optimizer

Optz is an optimizer of Michelson programs, the base language for Tezos smart contract. It takes a Michelson code as input and then returns a smaller and more efficient equivalent code.

Optimization of Michelson opcode sequences

The optimization consists of 2 parts: rule-based transformation and optimal stack manipulation search by A* algorithm.

Rule based code transformation

Dozens of rewriting rules replace opcodes with better ones, such as:

PAIR; CDR  =>  DROP

DIP n { a }; DIP n { b }  =>  DIP n { a; b }

SWAP; op  =>  op  // when op is a commutative binary operator

NOT; IF { a } { b }  =>  IF { b } { a }

Search for the optimal stack manipulation sequences

After the rule-based transformation, A* algorithm finds the most optimal sequence of stack manipulation opcodes such as DROP, SWAP, DIG, DUG, DIP, DUP, CAR, CDR, PAIR, UNPAIR, and the push only opcodes like PUSH and EMPTY_SET. For example, the following opcode sequence found in an existing contract in Tezos mainnet:

DIG 1; DUP 1; DUG 2; PUSH c0; DIG 2; PUSH c1; DIG 4

is replaced with a shorter equivalent one:

PUSH c0; DIG 1; PUSH c1; DUP 4

The search space explodes exponentially when the range of the stack touched by the target opcodes and the number of constants in it grows. To finish the optimization in a reasonable time, the maximum size of the search space is introduced.

Special case for drop-only sequence

Compilers from higher-level languages to Michelson clean the stack when the program execution exits each functional and variable scope. We have observed some compilers tend to produce very long inefficient stack cleaning opcode sequences. They are too big for the above A* algorithm to optimize.

To optimize these stack cleaners, a special case of stack manipulation optimization is introduced for “drop only sequences”, which never push nor reorder stack elements but only drop some of them. The optimal opcodes are easily constructible for these drop-only sequences, therefore no size restriction is necessary. For example, the following code found in a smart contract in Tezos Mainnet:

DROP 2; DIG 1; DROP 1; DIG 1; DROP 1; DIG 3; DROP 1; DIG 3; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1

can be optimized to the following much shorter efficient code:

DROP 2; DUG 2; DROP 2; DIG 3; DIG 4; DROP 2; DIP 4 { DROP 6 }

Optimization of Tezos Mainnet contracts

We have tested Optz against 1711 different Michelson contracts deployed on Tezos Mainnet between 2021-02-18 and 2022-02-17.

All the contracts are optimized within 1 second by a laptop (Mac Book Pro 13-inch 2017) and the output Michelson encoding size is 92.8% of the inputs in average. Some are reduced by over 30%:

Optimization results of the contracts deployed on Tezos Mainnet between 2021-02-18 and 2022-02-17

The optimization performance highly depends on how hard the original high-level language to Michelson compilers optimizes its code.

What we learned

The drop-only sequence optimization greatly reduces the code sizes of outputs of some compilers. We recommend these compilers implement this optimization. It is cheap since the optimal opcodes can be constructed without a heavy A* search. All we need is a stack operation simulator.

Presentation

For in-depth explanation, slides are available:

Try Optz

Web app

You can try the web interface of Optz at https://dailambda.jp/optz-js. It is compiled to JavaScript and optimizes the input in your web browser.

Docker image

You can also use our Docker image. Use the following script as optz.sh:

#!/bin/sh
docker run --rm \
       -v `pwd`:/work dailambda/scaml:master \
       /root/.opam/4.12.1/bin/optz $1

then give your script at the current directory to the script to optimize:

$ ./optz.sh your_contract.tz
# optimized: bytes: 2516 -> 2438
parameter
  (...

Source code

The source code is available at https://gitlab.com/dailambda/scaml. Optz was originally written as the last phase of optimization in SCaml compiler, an OCaml backend for Michelson. Since this phase is a pure conversion from Michelson to Michelson, we made it an independent tool. The build is quite straightforward if you have already set up OPAM environment correctly:

$ git clone https://gitlab.com/dailambda/scaml
$ cd scaml
$ opam switch create . ocaml-base-compiler.4.12.1 --deps-only -y
$ dune build
$ dune build install

The idea of A* algorithm to optimize Tezos stack operations was originally published by Arthur Breitman. We did not notice this post when we started writing the optimizer using simple Dijkstra search. Later we extended our algorithm to A* and reduced the optimization times of some very heavy cases in Dijkstra.

Optz’s optimal stack operation search now supports CAR, CDR, PAIR, and UNPAIR opcodes. For example, UNPAIR; SWAP; DROP is now rewritten to CAR. The longest rewrite we found in Mainnet contracts is as follows:

DUP 1; CAR; DIP 1 { CDR }; DUP 1; CAR; DIP 1 { CDR }; DUP 1; CAR; DIP 1 { CDR; } DIG 2; DUP 1; CAR; DIP 1 { CDR }; DIG 4; DUG 2

=> UNPAIR; UNPAIR; UNPAIR; DIG 3; DIG 3; UNPAIR

This support of the pair opcodes improves the average reduction rate of optimization from 94.0% to 92.8%.

UNPAIR; PAIR => NOP is not always valid rewriting

At a first glance, the opcode sequence UNPAIR; PAIR is trivially contractable to NOP, but it is not always valid due to the field annotation (%name) of Michelson, therefore special care is required.

UNPAIR; PAIR clears the field annotations of the pair type of the top stack element: ex. pair (int %x) (int %y) => pair int int. Once the field annotations are removed, the pair type becomes compatible with other pair types with any field annotations, for example, pair (int %a) (int %b). If UNPAIR; PAIR is removed, the result stack type keeps the original field annotations and becomes no longer compatible with pair types with other field annotations.

Indeed, we have found several Michelson codes in Tezos Mainnet whose UNPAIR; PAIR cannot be removed away. For example,

IF { DIP { DROP } } { DROP; UNPAIR; PAIR }

is well-typed under the initial stack type pair (int %a) (int %b); pair (int %x) (int %y); A and the result typestack type is pair (int %a) (int %b); A. If UNPAIR; PAIR is removed from the code, it is no longer well typed under the initial stack type, since pair (int %a) (int %b) and pair (int %x) (int %y) are incompatible.

It requires type analysis to know exactly whether UNPAIR; PAIR and similar sequences are safely contractable, but Optz’s optimal stack operation search does not use types. Instead, it uses a conservative workaround: its rewriting has the following restriction: if a pair is formed by PAIR in the input opcode sequence, it must be also created by PAIR in the output. It makes sure the pair has a type without field annotations.