Automatically finding a program that implements a given specification is called program synthesis. The main difficulty is that the search space is huge: the number of programs of size $$n$$ grows exponentially. Naïvely enumerating every program of size $$n$$, checking whether each one satisfies the specification, and then moving on to programs of size $$n+1$$ and so on doesn’t scale. However, the field has advanced by using smarter search techniques to prune the search space, leveraging performance improvements in SMT solvers, and at times limiting the scope of the problem.

In this post, I’ll explain one approach to modern program synthesis: counterexample-guided iterative synthesis of component-based, loop-free programs, as described in Synthesis of Loop-Free Programs by Gulwani et al. We’ll dissect exactly what each of those terms mean, and we’ll also walk through an implementation written in Rust that uses the Z3 solver.

My hopes for this post are two-fold:

1. I hope that people who are unfamiliar with program synthesis — just like I was not too long ago — get a little less unfamiliar and learn something new about the topic. I’ve tried to provide many examples, and break down the dense logic formulas from the paper into smaller, approachable pieces.

2. I hope that folks who are already familiar with this kind of program synthesis can help me diagnose some performance issues in the implementation, where I haven’t been able to reproduce the synthesis results reported in the literature. For some of the more difficult benchmark problems, the synthesizer fails to even find a solution before my patience runs out.

## Motivation

Why write a program that writes other programs for me? Am I just too lazy to write them myself? Of course I am. However, there are many valid reasons why a person who is not as lazy as I am might want to synthesize programs.

Some programs are quite tricky to write correctly by hand, and a program synthesizer might succeed where you or I might fail. Quick! How do you isolate the rightmost zero bit in a word using only three bit manipulation instructions?!

              ,--- The rightmost zero bit.
|
V
Input:  011010011

Output: 000000100
^
|
'--- Only that bit is set.


Did you get it yet?

Our program synthesizer will find a solution in under a second, and that minimal-length solution in a minute or so. It would take me quite a while longer than that to do the same by hand. We’ll return to this problem throughout the rest of this post, and use it as a running example.

Another reason to use a program synthesizer might be that we need to write many more programs than we have time to write by hand. Take for example a compiler’s peephole optimizer: it considers a sliding window of instruction sequences, and for each sequence, it checks if it knows of an equivalent-but-faster-or-smaller instruction sequence. When it does know of a better instruction sequence, it replaces the original instructions with the better ones.

Peephole optimizers are typically constructed from pattern-matching rules that identify suboptimal instruction sequences paired with the improved instruction sequence to replace matches with:

Each replacementi is a little, optimized mini-program. If we were writing a new peephole optimizer from scratch and by hand, we would have to write $$n$$ optimized mini-programs ourselves. And $$n$$ can be big: LLVM’s InstCombine peephole optimizer has over 1,000 pattern-and-replacement pairs. Even half that many is way more than I want to write myself.

Instead of writing those optimized mini-programs by hand, we can use each original instruction sequence as a specification, feed it into a program synthesizer, and see if the synthesizer can find the optimal instruction sequence that does the same thing. Finally, we can use all these original instruction sequences and their synthesized, optimal instruction sequences as pattern-and-replacement pairs to automatically construct a peephole optimizer! This idea was first proposed by Bansal et al in Automatic Generation of Peephole Superoptimizers.

Edit: John Regehr pointed out to me that this idea has been floating around since much earlier than 2006, when the Bansal et al paper was published. He pointed me to The Design and Application of a Retargetable Peephole Optimizer by Davidson et al from 1980 as an example, but noted that even this wasn’t the first time it came up.

## An Overview of Our Task

Program synthesis is the act of taking a specification, and automatically finding a program that satisfies it. In order to make the problem a little more tractable, we’re limiting its scope in two ways:

1. Loop-free: We are only synthesizing programs without loops.

2. Component-based: We are only synthesizing programs that can be expressed as the composition of a given library of components.

The loop-free limitation is not very limiting for many use cases. For example, peephole optimizers often don’t consider instruction sequences that span across loop boundaries.

Component-based synthesis means that rather than synthesizing programs using any combination of any number of the target language’s expressions, the synthesizer is given a library of components and synthesizes programs that use each of those components exactly once. The synthesizer rearranges the components, rewiring their inputs and outputs, until it finds a configuration that satisfies the specification.

That is, given a library of $$N$$ components, it constructs a program of the form

where each parameter in paramsi is either a tempj variable defined earlier in the program or one of the original inputs.

For example, given the two components

• f(a)
• g(a, b)

and an input parameter x, the synthesizer can construct any of the following candidate programs (implicitly returning the variable defined last):

or

or

or

or

or

That’s it. That’s all of the programs it can possibly construct given just those two components.

The synthesizer cannot construct the following program, because it doesn’t use every component:

And the synthesizer cannot construct this program, because it uses the f component more than once:

And, finally, it cannot construct this last program, because this last program uses some function h that is not a component in our given library:

The following table describes some of the properties of component-based synthesis by comparing it to the fully general version of program synthesis:

General Synthesis Component-Based Synthesis
Shape of Synthesized Programs Using any number of any of the target language's expressions Using only the components in the library
Size of Synthesized Programs Varies Exactly the size of the library, since each component in the library is used exactly once

In our synthesizer, the components will be functions over fixed bit-width integers (also known as “bitvectors” in the SMT solver parlance) and they will correspond to a single instruction in our virtual instruction set: add, and, xor, etc. But in principle they could also be higher-level functions or anything else that we can encode in SMT queries. More on SMT queries later.

While component-based synthesis makes the synthesis problem easier, it does foist a decision on us each time we invoke the synthesizer: we must choose the library of available components. Each component is used exactly once in the synthesized program, but if we want to synthesize a program that performs multiple additions, we can include multiple instances of the add component in the library. Too few components, and the synthesizer might not be able to find a solution. Too many components will slow down the synthesizer, and let it generate non-optimal programs that potentially contain dead code.

To summarize, in component-based synthesis of loop-free programs, our synthesizer’s inputs are

• a specification, and

• a library of components.

Its output is a program that satisfies the specification, expressed in terms of the given components, or an error if can’t find such a program.

## Formalizing the Problem

In order to synthesize a program, we need a specification describing the desired program’s behavior. The specification is a logical expression that describes the output when the program is given these inputs.

We define the specification with:

• $$\vec{I}$$ as the program inputs,

• $$O$$ as the program output, and

• $$\phi_\mathrm{spec}(\vec{I}, O)$$ as the expression relating the inputs to the output. This expression should be true when $$O$$ is the desired output of running the program on inputs $$\vec{I}$$.

The library of components we’re given is a multi-set of specifications describing each component’s behavior. Each component specification comes with how many inputs it takes (e.g. an add(a, b) component takes two inputs, and a not(a) component takes one input) as well as a logical formula relating the component’s inputs to its output. The component inputs, output, and expression all have similar notation to the program specification, but with a subscript:

• $$\vec{I}_i$$ is the $$i^\mathrm{th}$$ component’s input variables,

• $$O_i$$ is the $$i^\mathrm{th}$$ component’s output variable, and

• $$\phi_i(\vec{I}_i, O_i)$$ is the logical expression relating the $$i^\mathrm{th}$$ component’s inputs with its output.

We define $$N$$ as the number of components in the library.

For our isolating-the-rightmost-zero-bit example, what is the minimal components library we could give to the synthesizer, while still preserving its ability to find our desired solution? It would be a library consisting of exactly the components that correspond to each of the three instructions in the solution program: a not, an add1, and an and component.

Component Definition Description
$$\phi_0(I_0, O_0)$$ $$O_0 = \texttt{bvadd}(1, I_0)$$ The add-one operation on bitvectors.
$$\phi_1(I_1, I_2, O_1)$$ $$O_1 = \texttt{bvand}(I_1, I_2)$$ The bitwise and operation on bitvectors.
$$\phi_2(I_3, O_2)$$ $$O_0 = \texttt{bvnot}(I_3)$$ The bitwise not operation on bitvectors.

Program synthesis can be expressed as an exists-forall problem: we want to find whether there exists some program $$P$$ that satisfies the specification for all inputs given to it and outputs it returns.

\begin{align} & \exists P: \\ & \quad \forall \vec{I},O: \\ & \quad \quad P(\vec{I}) = O \implies \phi_\mathrm{spec}(\vec{I}, O) \end{align}

Let’s break that down and translate it into English:

 $$\exists P$$ There exists some program $$P$$, such that $$\forall \vec{I},O$$ for all inputs $$\vec{I}$$ and output $$O$$, $$P(\vec{I}) = O$$ if we run the program on the inputs $$\vec{I}$$ to get the output $$O$$, $$\implies$$ then $$\phi_\mathrm{spec}(\vec{I}, O)$$ our specification $$\phi_\mathrm{spec}$$ is satisfied.

This exists-forall formalization is important to understand because our eventual implementation will query the SMT solver (Z3 in our case) with pretty much this formula. It won’t be exactly the same:

• $$P$$ is an abstraction that’s hiding some details about components,
• there are a few algebraic transformations we will perform, and
• we won’t pose the whole problem to the solver in a single query all at once.

Nonetheless, the implementation follows from this formalization, and we won’t get far if we don’t have a handle on this.

## A Brief Introduction to SMT Solvers

We can’t continue any further without briefly discussing SMT solvers and their capabilities. SMT solvers like Z3 take a logical formula, potentially containing unbound variables, and return whether it is:

• Satisfiable: there is an assignment to the unbound variables that makes the assertions true, and also here is a model describing those assignments.

• Unsatisfiable: the formula’s assertions are false; there is no assignment of values to the unbound variables that can make them true.

SMT solvers take their assertions in a Lisp-like input language called SMT-LIB2. Here is an example of a satisfiable SMT query:

Note that even though there isn’t any $$\exists$$ existential quantifier in there, the solver is implicitly finding a solution for $$x$$ in $$\exists x: x + 2 = 5$$, i.e. there exists some $$x$$ such that $$x + 2$$ equals 5. While some SMT solvers have some support for working with higher-order formulas with explicit $$\exists$$ existential and $$\forall$$ universal quantifiers nested inside, these modes tend to be much slower and also incomplete. We can only rely on first-order, implicitly $$\exists$$ existential queries: that is, formulas with potentially unbound variables and without any nested $$\exists$$ existential and $$\forall$$ universal quantification.

We can add a second assertion to our example that makes it unsatisfiable:

The assertions 10 = x + 1 and 5 = x + 2 put conflicting requirements on x, and therefore there is no value for x that can make both assertions true, and therefore the whole query is unsatisfiable.

## Counterexample-Guided Iterative Synthesis

Counterexample-guided iterative synthesis (CEGIS) enables us to solve second-order, exists-forall queries — like our program synthesis problem — with off-the-shelf SMT solvers. CEGIS does this by decomposing these difficult queries into multiple first-order, $$\exists$$ existentially quantified queries. These are the kind of queries that off-the-shelf SMT solvers excel at solving. First, we’ll look at CEGIS in general, and after that we’ll examine what is required specifically for component-based CEGIS.

CEGIS begins by choosing an initial, finite set of inputs. There has to be at least one, but it doesn’t really matter where it came from; we can use a random number generator. Then we start looping. The first step of the loop is finite synthesis, which generates a program that is correct at least for the inputs in our finite set. It may or may not be correct for all inputs, but we don’t know that yet. Next, we take that candidate program and verify it: we want determine whether it is correct for all inputs (in which case we’re done), or if there is some input for which the candidate program is incorrect (called a counterexample). If there is a counterexample, we add it to our set, and continue to the next iteration of the loop. The next program that finite synthesis produces will be correct for all the old inputs, and also this new counterexample. The counterexamples force finite synthesis to come up with more and more general programs that are correct for more and more inputs, until finally it comes up with a fully general program that works for all inputs.

Without further ado, here is the general CEGIS algorithm:

\begin{align} & \texttt{CEGIS}(\phi_\mathrm{spec}(\vec{I}, O)): \\ & \qquad S = \langle \text{initial finite inputs} \rangle \\ & \qquad \\ & \qquad \textbf{loop}: \\ & \qquad \qquad \texttt{// Finite synthesis.} \\ & \qquad \qquad \textbf{solve for P in } \exists P,O_0,\ldots,O_{\lvert S \rvert - 1}: \\ & \qquad \qquad \qquad \left( P(S_0) = O_0 \land \phi_\mathrm{spec}(S_0, O_0) \right) \\ & \qquad \qquad \qquad \land \ldots \\ & \qquad \qquad \qquad \land \left( P(S_{\lvert S \rvert - 1}) = O_{\lvert S \rvert - 1} \land \phi_\mathrm{spec}(S_{\lvert S \rvert - 1}, O_{\lvert S \rvert - 1}) \right) \\ & \qquad \qquad \textbf{if } \texttt{unsat}: \\ & \qquad \qquad \qquad \textbf{error} \text{ “no solution”} \\ & \qquad \qquad \\ & \qquad \qquad \texttt{// Verification.} \\ & \qquad \qquad \textbf{solve for \vec{I} in } \exists \vec{I},O: \,\, P(\vec{I}) = O \land \lnot \phi_\mathrm{spec}(\vec{I}, O) \\ & \qquad \qquad \textbf{if } \texttt{unsat}: \\ & \qquad \qquad \qquad \textbf{return P} \\ & \qquad \qquad \textbf{else}: \\ & \qquad \qquad \qquad \textbf{append \vec{I} to S} \\ & \qquad \qquad \qquad \textbf{continue} \end{align}

CEGIS decomposes the exists-forall synthesis problem into two parts:

1. Finite synthesis: The first query, the finite synthesis query, finds a program that is correct for at least the finite example inputs in $$S$$. Here’s its breakdown:

 $$\exists P,O_0,\ldots,O_{\lvert S \rvert - 1}:$$ There exists some program $$P$$ and outputs $$O_0,\ldots,O_{\lvert S \rvert - 1}$$ such that $$( \,\, P(S_0) = O_0$$ $$O_0$$ is the output of running the program on inputs $$S_0$$ $$\land$$ and $$\phi_\mathrm{spec}(S_0, O_0) \,\, )$$ the specification is satisfied for the inputs $$S_0$$ and output $$O_0$$, $$\land \ldots$$ and… $$\land \left( P(S_{\lvert S \rvert - 1}) = O_{\lvert S \rvert - 1} \land \phi_\mathrm{spec}(S_{\lvert S \rvert - 1}, O_{\lvert S \rvert - 1}) \right)$$ and $$O_{\lvert S \rvert - 1}$$ is the output of running the program on inputs $$S_{\lvert S \rvert - 1}$$, and the specification is satisfied for these inputs and output.

Note that this is a first-order, existential query; it is not using nested $$\forall$$ universal quantification over all possible inputs! Instead, it is instantiating a new copy of $$P(S_i) = O_i \land \phi_\mathrm{spec}(S_i, O_i)$$ for each example in our finite set $$S$$.

For example, if $$S = \langle 3, 4, 7 \rangle$$, then the finite synthesis query would be

\begin{align} & \exists P,O_0,O_1,O_2: \\ & \qquad \left( P(3) = O_0 \land \phi_\mathrm{spec}(3, O_0) \right) \\ & \qquad \land \,\, \left( P(4) = O_1 \land \phi_\mathrm{spec}(4, O_1) \right) \\ & \qquad \land \,\, \left( P(7) = O_2 \land \phi_\mathrm{spec}(7, O_2) \right) \\ \end{align}

This inline expansion works because the finite set of inputs $$S$$ is much smaller in practice (typically in the tens, if even that many) than the size of the set of all possible inputs (e.g. there are $$2^{32}$$ bitvectors 32 bits wide).

If the query was unsatisfiable, then there is no program that can implement the specification for every one of the inputs in $$S$$. Since $$S$$ is a subset of all possible inputs, that means that there is no program that can implement the specification for all inputs. And since that is what we are searching for, it means that the search has failed, so we return an error.

If the query was satisfiable, the resulting program $$P$$ satisfies the specification for all the inputs in $$S$$, but we don’t know whether it satisfies the specification for all possible inputs or not yet. For example, if $$S = \langle 0, 4 \rangle$$, then we know that the program $$P$$ is correct when given the inputs $$0$$ and $$4$$, but it may or may not be correct when given the input $$1$$. We don’t know yet.

2. Verification: Verification takes the program $$P$$ produced by finite synthesis and checks whether it satisfies the specification for all inputs. That’s naturally expressed as a $$\forall$$ universally quantified query over all inputs, but we can instead ask if there exists any input to the program for which the specification is not satisfied thanks to De Morgan’s law.

Here’s the breakdown of the verification query:

 $$\exists \vec{I}, O:$$ Does there exist some inputs $$\vec{I}$$ and output $$O$$ such that $$P(\vec{I}) = O$$ $$O$$ is the result of running the program on $$\vec{I}$$ $$\land$$ and $$\lnot \phi_\mathrm{spec}(\vec{I}, O)$$ the specification is not satisfied.

If the verification query is unsatisfiable, then there are no inputs to $$P$$ for which the specification is not satisfied, which means that $$P$$ satisfies the specification for all inputs. If so, this is what we are searching for, and we’ve found it, so return it!

However, if the verification query is satisfiable, then we’ve discovered a counterexample: a new input $$\vec{I}$$ for which the program does not satisfy the specification. That is, the program $$P$$ is buggy when given $$\vec{I}$$, so $$P$$ isn’t the program we are searching for.

Next, we add the new $$\vec{I}$$ to our finite set of inputs $$S$$, so that in the next iteration of the loop, we will synthesize a program that produces a correct result when given $$\vec{I}$$ in addition to all the other inputs in $$S$$.

As the loop iterates, we add more and more inputs to $$S$$, forcing the finite synthesis query to produce more and more general programs. Eventually it produces a fully general program that satisfies the specification for all inputs. In the worst case, we are adding every possible input into $$S$$: finite synthesis comes up with a program that fails verification when given 1, and then when given 2, and then 3, and so on. In practice, each counterexample $$\vec{I}$$ that verification finds tends to be representative of many other inputs that are also currently unhandled-by-$$P$$. By adding $$\vec{I}$$ to $$S$$, the next iteration of finite synthesis will produce a program that handles not just $$\vec{I}$$ but also all the other inputs that $$\vec{I}$$ was representative of.

For example, finite synthesis might have produced a program that handles all of $$S = \langle 0,1,5,13 \rangle$$ but which is buggy when given a positive, even number. Verification finds the counterexample $$I = 2$$, which gets appended to $$S$$, and now $$S = \langle 0,1,5,13,2 \rangle$$. Then, the next iteration of the loop synthesizes a program that doesn’t just also work for 2, but works for all positive even numbers. This is what makes CEGIS effective in practice.

Finally, notice that both finite synthesis and verification are first-order, $$\exists$$ existentially quantified queries that off-the-shelf SMT solvers like Z3 can solve.

### CEGIS with Components

Now that we know how CEGIS works in the abstract, let’s dive into how we can use it to synthesize component-based programs.

For every loop-free program that is a composition of components, we can flip the program’s representation into a location mapping:

• Instead of listing the program line-by-line, defining what component is on each line, we can list components, defining what line the component ends up on.

• Instead of referencing the arguments to each component by variable name, we can reference either the line on which the argument is defined (if it comes from the result of an earlier component) or as the $$n^\mathrm{th}$$ program input.

For example, consider our program to isolate the rightmost zero bit in a word:

We can exactly represent this program with the following location mapping:

With component-based CEGIS, we’ll be synthesizing this kind of location mapping. This lets us represent a whole component-based program with a handful of numbers for lines and argument indices. And numbers are something that we can represent directly in an SMT query.

#### Verifying a Component-Based Program

Let’s start with verying a component-based program before we look at their finite synthesis. Verification takes a location mapping, connects the components’ input and output variables together as described by the location mapping, and asks the SMT solver to find a counterexample.

For convenience, so we don’t have to keep repeating $$\vec{I}_0,\ldots,\vec{I}_{N-1}$$ all the time, we define $$\textbf{P}$$ as the set of all the parameter variables for each component in the library:

$$\textbf{P} = \, \vec{I}_0 \, \cup \, \ldots \, \cup \, \vec{I}_{N-1}$$

And similarly we define $$\textbf{R}$$ as the set of all temporary result variables for each component in the library:

$$\textbf{R} = \{O_0, \, \ldots, \, O_{N-1}\}$$

With our running example of isolating the rightmost zero bit, our minimal library consists of

\begin{align} \phi_0(I_0, O_0) &= [O_0 = \texttt{bvadd}(1, I_0)] \\ \phi_1(I_1, I_2, O_1) &= [O_1 = \texttt{bvand}(I_1, I_2)] \\ \phi_2(I_3, O_2) &= [O_2 = \texttt{bvnot}(I_3)] \end{align}

and therefore its

\begin{align} N &= 3 \\ \textbf{P} &= \{ I_0, I_1, I_2, I_3, I_4 \} \\ \textbf{R} &= \{ O_0, O_1, O_2 \} \end{align}

We want to constrain the whole library to behave according to its individual component specifications. The output of each and component should indeed be the bitwise and of its inputs, and the output of each not component should indeed be the bitwise not of its input, etc… We define $$\phi_\mathrm{lib}$$ as the combination of every component specification $$\phi_i$$:

$$\phi_\mathrm{lib}(\textbf{P}, \textbf{R}) = \phi_i(\vec{I}_0, O_0) \land \ldots \land \phi_i(\vec{I}_{N-1}, O_{N-1})$$

So for our minimal example library, the $$\phi_\mathrm{lib}$$ we get is:

\begin{align} \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) &= [ O_0 = \texttt{bvadd}(1, I_0) ] \\ &\land [ O_1 = \texttt{bvand}(I_1, I_2) ] \\ &\land [ O_2 = \texttt{bvnot}(I_3) ] \end{align}

That is, the library’s constraints are satisfied when all of

• $$O_0$$ is the wrapping addition of $$I_0$$ and 1,
• $$O_1$$ is the bitwise and of $$I_1$$ and $$I_2$$, and
• $$O_2$$ is the bitwise not of $$I_3$$,

Because finite synthesis runs before verification, we already have access to the candidate program’s location mapping when we’re constructing our verification query. This location mapping tells us which actual arguments align with which formal parameters of a component. That means we know what the connections are from each component’s input variables to the program inputs and the temporary result variables for other components. We know the dataflow between components.

Let’s make this concrete with our isolating-the-rightmost-zero-bit example. Having produced this candidate program:

With this library:

\begin{align} \phi_0(I_0, O_0) &= [O_0 = \texttt{bvadd}(1, I_0)] \\ \phi_1(I_1, I_2, O_1) &= [O_1 = \texttt{bvand}(I_1, I_2)] \\ \phi_2(I_3, O_2) &= [O_2 = \texttt{bvnot}(I_3)] \end{align}

We know that $$\texttt{a} = O_2$$, since it is the result of the not component $$\phi_2(I_3, O_2)$$. And since a is the first argument to and a, b, which uses component $$\phi_1(I_1, I_2, O_1)$$, we know that $$\texttt{a} = I_1$$. Therefore, we know that $$O_2 = I_1$$.

We have these equalities from each component input variable $$I_i$$ in $$\textbf{P}$$ to either some other component’s output variable $$O_j$$ in $$\textbf{R}$$ or to one of the program inputs $$\vec{I}$$. These equalities are given to us directly by the location mapping for the candidate program that we’re verifying.

Additionally, because our candidate program is implicitly returning the last temporary variable c, which is the result of the and component $$\phi_1(I_1, I_2, O_1)$$, and because the $$O$$ in $$\phi_\mathrm{spec}(\vec{I}, O)$$ represents the result of the whole program, we know that $$O = O_1$$.

If we put all these equalities together for our example program we get:

$$\left( I_0 = x \right) \land \left( I_1 = O_2 \right) \land \left( I_2 = O_1 \right) \land \left( I_3 = x \right) \land \left( O = O_1 \right)$$

This represents all the connections between our library’s various components and to the candidate program’s inputs and output. If you imagine connecting the components together like a circuit, this represents all the wires between each component.

We define these component-connecting equalities as $$\phi_\mathrm{conn}$$, and its general definition is:

\begin{align} \phi_\mathrm{conn}(\vec{I}, O, \textbf{P}, \textbf{R}) &= \left( O = O_\mathrm{last} \right) \\ & \land \left( \vec{I}_0 \, = \, \vec{V}_0 \right) \\ & \land \, \ldots \\ & \land \left( \vec{I}_{N-1} \, = \, \vec{V}_{N-1} \right) \end{align}

Where

• $$\vec{V}_i$$ are the actual arguments that the candidate program passes into the $$i^\mathrm{th}$$ component $$\phi_i$$. Each $$\vec{V}_i$$ is made up of entries from either the program’s inputs $$\vec{I}$$ or from temporary results from $$\textbf{R}$$ that are defined by earlier components in the program. This is equivalent to the arguments field defined for each component in our example location mapping’s components map.

• $$O_\mathrm{last}$$ is the output variable for the component on the last line of the program, according to our candidate program’s location mapping.

Once again, let’s break that down:

 $$\left( O = O_\mathrm{last} \right)$$ The output of the whole program is equal to the result of the component on the last line of the program, $$\land$$ and $$\left( \vec{I}_0 \, = \, \vec{V}_0 \right)$$ the first component's inputs and its assigned arguments are equal to each other, $$\land \, \ldots$$ and... $$\left( \vec{I}_{N-1} \, = \, \vec{V}_{N-1} \right)$$ the last component's inputs and its assigned arguments are equal to each other.

Note that both $$O_\mathrm{last}$$ and each $$\vec{V}_i$$ are properties of the candidate program’s location mapping, and are known at “compile time” of the verification query. They are not variables that we are $$\exists$$ existentially or $$\forall$$ universally quantifying over in the query itself. We expand them inline when constructing the verification query.

Ok, so with all of that out of the way, we can finally define the verification constraint that we use in component-based CEGIS:

\begin{align} & \exists \vec{I}, O, \textbf{P} , \textbf{R} : \\ & \qquad \phi_\mathrm{conn}(\vec{I}, O, \textbf{P}, \textbf{R}) \land \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) \land \lnot \phi_\mathrm{spec}(\vec{I}, O) \end{align}

The verification constraint asks: given that we’ve connected the components together as described by the candidate program’s location mapping, are there any inputs for which the specification is not satisfied?

Let’s break that down once more:

 $$\exists \vec{I}, O, \textbf{P} , \textbf{R} :$$ Does there exist some inputs and output such that $$\phi_\mathrm{conn}(\vec{I}, O, \textbf{P}, \textbf{R})$$ when the components are connected together as described by our candidate program's location mapping, $$\land \,\, \phi_\mathrm{lib}(\textbf{P}, \textbf{R})$$ and when the components behave as defined by our library, $$\land \,\, \lnot \phi_\mathrm{spec}(\vec{I}, O)$$ the specification is not satisfied?

Finding a solution to this query gives us a new counterexample $$\vec{I}$$ that we can add to our set of examples $$S$$ for future iterations of the CEGIS loop. Failure to find any solution to this query means that the candidate location mapping corresponds to a program that is correct for all inputs, in which case we’re done.

#### Finite Synthesis of a Component-Based Program

Finite synthesis composes the library components into a program that will correctly handle all the given example inputs. It does this by querying the SMT solver for a location mapping that contains assignments of components to lines in the program, and assignments of variables to each component’s actual arguments.

Recall our example location mapping:

To encode a location mapping in the finite synthesis query, every component parameter in $$\textbf{P}$$ and every component result in $$\textbf{R}$$ gets an associated location variable. The finite synthesis query is searching for an assignment to these location variables.

We call the set of all location variables $$L$$, and we refer to a particular location variable as $$l_x$$ where $$x$$ is either a component result in $$\textbf{R}$$ or component parameter in $$\textbf{P}$$:

$$L = \{ \, l_x \, \vert \, x \in \textbf{P} \cup \textbf{R} \, \}$$

The location variable for a result $$l_{O_i}$$ is equivalent to the line field for a component in our JSON-y syntax for example location mappings. It determines the line in the program that the component is assigned to, and therefore where its temporary result is defined.

The location variable for a parameter $$l_p$$ is equivalent to an entry in a component’s arguments list in our JSON-y syntax. These location variables determine where the associated parameter gets its value from: either the $$i^\mathrm{th}$$ program input or the temporary result defined on the $$j^\mathrm{th}$$ line of the program.

To use one index space for both line numbers and program inputs, we follow the same convention that we did with entries in the arguments list in the JSON syntax:

• When $$l_x$$ is less than the number of program inputs, then it refers to the $${l_x}^\mathrm{th}$$ program input.

• Otherwise, when $$l_x$$ is greater than or equal to the number of program inputs, then subtract the number of inputs from $$l_x$$ to get the line number it’s referring to.

Value of $$l_x$$ Refers To Location
0 Input 0 Program Inputs $$\vec{I}$$
1 Input 1
... ...
$$\lvert \vec{I} \rvert - 1$$ Input $$\lvert \vec{I} \rvert - 1$$
$$\lvert \vec{I} \rvert + 0$$ Line 0 Line Numbers
$$\lvert \vec{I} \rvert + 1$$ Line 1
... ...
$$\lvert \vec{I} \rvert + N - 1$$ Line $$N - 1$$

All loop-free, component-based programs can be described with a location mapping. However, the reverse is not true: not all location mappings describe a valid program.

Consider this location mapping:

This line mapping is inconsistent because it wants to put both its components on line zero of the program, but each line in the program can only use a single component.

To forbid the solver from providing bogus answers of this sort, we add the consistency constraint $$\psi_\mathrm{cons}$$ to the finite synthesis query. It requires that no pair of distinct component result location variables can be assigned the same line.

$$\psi_\mathrm{cons}(L) = \bigwedge\limits_{x,y \in \textbf{R}, x \not\equiv y} \left( l_x \neq l_y \right)$$

Once more, let’s break that down:

 $$\bigwedge\limits_{x,y \in \textbf{R}, x \not\equiv y}$$ For each $$x,y$$ pair of component results, where $$x$$ and $$y$$ are not the same result variable, $$\left( l_x \neq l_y \right)$$ the location of $$x$$ and the location of $$y$$ are not the same.

But there are even more ways that a location mapping might describe an invalid program! Consider this location mapping:

That location mapping describes this program:

The b temporary result is used before it is defined, and in order to compute b, we need to compute a, but computing a requires computing b, which requires computing a, etc… We have a cycle on our hands.

To forbid mappings that correspond to bogus programs with dataflow cycles, we use the acyclicity constraint $$\psi_\mathrm{acyc}$$. This constraint enforces that a particular component’s parameters are defined before this component’s line.

$$\psi_\mathrm{acyc}(L) = \bigwedge\limits_{i=0}^{N-1} \left( \bigwedge\limits_{p \in \vec{I}_i} \left( l_p < l_{O_i} \right) \right)$$

Let’s break that down:

 $$\bigwedge\limits_{i=0}^{N-1}$$ For each component index $$i$$, $$\bigwedge\limits_{p \in \vec{I}_i}$$ and for each of the $$i^\mathrm{th}$$ component's input parameters, $$l_p < l_{O_i}$$ the location of the parameter should be less than the location of the component, meaning that the parameter is defined before the component is used.

The only other way that location mappings can be invalid is if a location is out of bounds of the program inputs and line numbers, so we’re ready to define the well-formed-program constraint $$\psi_\mathrm{wfp}$$. This constraint enforces that any location mapping we synthesize will correspond to a well-formed program.

A well-formed program is

• consistent,

• acyclic,

• its component parameter locations point to either a program input or a line number, and

• its component temporary result locations point to a line number.

Let’s define $$M$$ as the number of program inputs plus the number of components in the library:

$$M = \lvert \vec{I} \rvert + N$$

A component parameter location $$l_{p \in \textbf{P}}$$ can point to either

• a program input in the range from zero to the number of program inputs: $$0 \leq l_{p} \lt \lvert \vec{I} \rvert$$, or

• a line number, which corresponds to the $$N$$ locations following the program inputs: $$\lvert \vec{I} \rvert \leq l_p \lt M$$.

Since those two ranges are contiguous, it means that component parameter locations ultimately fall within the range $$0 \leq l_p \lt M$$.

A component temporary result location $$l_{r \in \textbf{R}}$$ must point to a line number, which means that they fall within the range $$\lvert \vec{I} \rvert \leq l_r \lt M$$.

Put all that together and we get the well-formed-program constraint $$\psi_\mathrm{wfp}$$:

\begin{align} \psi_\mathrm{wfp}(L) &= \bigwedge\limits_{p \in \textbf{P}} \left( 0 \leq l_p \lt M \right) \\ & \land \, \bigwedge\limits_{r \in \textbf{R}} \left( \lvert \vec{I} \rvert \leq l_r \lt M \right) \\ & \land \, \psi_\mathrm{cons}(L) \\ & \land \, \psi_\mathrm{acyc}(L) \end{align}

And here is its breakdown:

 $$\bigwedge\limits_{p \in \textbf{P}} \left( 0 \leq l_p \lt M \right)$$ Each component parameter location $$l_p$$ points to either a program input or a line number, $$\land \, \bigwedge\limits_{r \in \textbf{R}} \left( \lvert \vec{I} \rvert \leq l_r \lt M \right)$$ and each component result location $$l_r$$ points to a line number, $$\land \, \psi_\mathrm{cons}(L)$$ and the location mapping is consistent, $$\land \, \psi_\mathrm{acyc}(L)$$ and the location mapping is acyclic.

Now that we can constrain finite synthesis to only produce location mappings that correspond to well-formed programs, all we need to do is encode the connections between components and the behavior of the library. This should sound familiar: we need the finite synthesis equivalent of $$\phi_\mathrm{conn}$$ and $$\phi_\mathrm{lib}$$ from verification. And it turns out that $$\phi_\mathrm{lib}$$ doesn’t need to be tweaked at all, because the behavior of the library remains the same whether we are in verification or finite synthesis. But while $$\phi_\mathrm{conn}$$ was checking a set of already-known connections between components, in finite synthesis we are searching for those connections, so we need a different query.

These connections define the dataflow between components. They are the wires in the circuit built from our components. A connection from some component result into another component’s input means that we need to constrain the result and input variables to be equal in the finite synthesis query. For example, if component $$\phi_i$$ get’s placed on line 3, and parameter $$p$$ is assigned the location 3, then $$p$$ must take on the same value as the output $$O_i$$ of that component.

This leads us to our definition of $$\psi_\mathrm{conn}$$: for every pair of location variables $$l_x$$ and $$l_y$$, if they refer to the same location, then $$x$$ and $$y$$ must have the same value.

$$\psi_\mathrm{conn}(L, \vec{I}, O, \textbf{P}, \textbf{R}) = \bigwedge\limits_{x,y \in \vec{I} \cup \textbf{P} \cup \textbf{R} \cup { O } } \left( \left( l_x = l_y \right) \implies \left( x = y \right) \right)$$

Here is its piece-by-piece breakdown:

 $$\bigwedge\limits_{x,y \in \vec{I} \cup \textbf{P} \cup \textbf{R} \cup \{ O \} }$$ For each pair of location variables $$l_x$$ and $$l_y$$, where $$x$$ and $$y$$ are either a program input, or a component's parameter, or a component's temporary result, or the program output, $$\left( l_x = l_y \right)$$ if the location variables refer to the same location, $$\implies$$ then $$\left( x = y \right)$$ $$x$$ and $$y$$ must have the same value.

We’re finally ready to define our finite synthesis query for a location mapping. This query asks the solver to find some location mapping that corresponds to a well-formed program that satisfies our specification for each example input in $$S$$. In other words, it must enforce that

• the location mapping corresponds to a well-formed program, and

• when the components are connected as described by the location mapping, and when the components behave as described by our library,

• then the specification is satisfied for each of our example inputs in $$S$$.

Here it is, finally, our finite synthesis query:

\begin{align} & \exists L, O_0, \ldots, O_{\vert S \rvert - 1}, \textbf{P}_0, \ldots, \textbf{P}_{\vert S \rvert - 1}, \textbf{R}_0, \ldots, \textbf{R}_{\vert S \rvert - 1}: \\ & \qquad \psi_\mathrm{wfp}(L) \,\, \land \\ & \qquad \qquad \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \phi_\mathrm{lib}(\textbf{P}_i, \textbf{R}_i) \land \psi_\mathrm{conn}(L, S_i, O_i, \textbf{P}_i, \textbf{R}_i) \land \phi_\mathrm{spec}(S_i, O_i) \right) \\ \end{align}

That’s quite a mouthful, so, one last time, let’s pull it apart and break it down:

 $$\exists L, O_0, \ldots, O_{\vert S \rvert - 1}, \textbf{P}_0, \ldots, \textbf{P}_{\vert S \rvert - 1}, \textbf{R}_0, \ldots, \textbf{R}_{\vert S \rvert - 1}:$$ There exists some location mapping $$L$$, and program outputs, component parameters, and component results variables for each example in $$S$$, such that $$\psi_\mathrm{wfp}(L) \,\, \land$$ the location mapping is a well-formed program, and $$\bigwedge\limits_{i=0}^{\lvert S \rvert - 1}$$ for each example input index $$i$$, $$\phi_\mathrm{lib}(\textbf{P}_i, \textbf{R}_i)$$ the components behave as described by the library, $$\land \,\, \psi_\mathrm{conn}(L, S_i, O_i, \textbf{P}_i, \textbf{R}_i)$$ and the components are connected as described by the location mapping, $$\land \,\, \phi_\mathrm{spec}(S_i, O_i)$$ and the specification is satisfied for the $$i^\mathrm{th}$$ example input.

When the solver finds a satisfiable assignment for this query, we get a new candidate location mapping that corresponds to a program that is correct for each of the example inputs in $$S$$. When the solver finds the query unsatisifiable, that means there is no locaiton mapping that corresponds to a program that is correct for each of the example inputs, which means that our search has failed.

## Implementation

I implemented a loop-free, component-based program synthesizer in Rust that uses Z3 to solve the finite synthesis and verification queries. The implementation’s repository is over here.

Our target language has all the operations you would expect for working with fixed-width integers. It has arithmetic operations like add and mul. It has bitwise operations like and and xor. It has comparison operations like eq, that evaluate to one if the comparison is true and zero otherwise. Finally, it has a select operation that takes three operands: a condition, a consequent, and an alternative. When the condition is non-zero, it evaluates to the consequent, and otherwise it evaluates to the alternative.

Values are neither signed nor unsigned. For operations like division that behave differently on signed and unsigned integers, we have a different instruction for each behavior: div_s for signed division and div_u for unsigned division.

### Program Representation

A program is a sequence of instructions:

An instruction has an operation and binds its result to an identifier:

An identifier is an opaque wrapper around a number:

We won’t support any nice names for identifiers, since essentially everything is a temporary. When we stringify programs, we’ll turn the identifiers into a, b, c, etc…

Additionally, we will uphold the invariant that Id(i) always refers to the value defined by the ith instruction in the program.

Finally, the Operator enumeration has a variant for each operation in the language, along with the identifiers of its operands:

### Building Programs

To make constructing programs by hand easier, we have a builder for programs. It wraps an inner Program and exposes a method for each operation, to append an instruction using that operation to the program thats being built.

Here is our isolate-the-righmost-zero-bit example program constructed via the ProgramBuilder:

Having the builder is nice since we intend to write unoptimized programs ourselves, that we then use as specifications for synthesizing optimized programs.

### Defining Components

Every operator has an associated component, so that we can synthesize programs that use that operator. Because a library can have a heterogeneous set of components, we’ll define an object-safe Component trait, so we can box them up as trait objects with dynamically dispatched methods, and store them all in the same collection.

A Component has two roles:

1. It must provide the constraints between its inputs and its output for the solver. This is its $$\phi_i(\vec{I}_i, O_i)$$ that will be part of the whole library’s $$\phi_\mathrm{lib}(\textbf{P}, \textbf{R})$$.

2. After we’ve synthesized some location mapping for a program that uses this component, in order to construct the corresponding Program, the component needs to know how to create the Operator it represents.

The make_expression trait method will handle the first role, and the make_operator trait method will handle the second. make_expression takes Z3 BitVec variables as its operands (the $$\vec{I}_i$$ input variables) and returns a new Z3 BitVec expression that represents the result of running the operation on the operands (the $$O_i$$ output variable). make_operator takes the Ids of its operands (as described by a synthesized location mapping) and returns its associated Operator with those Ids as its operands.

Note that, because different components have different arities, these methods don’t take their operands as multiple, individual parameters like a: BitVec, b: BitVec or a: Id, b: Id. Instead, components report their arity with the operand_arity trait method and callers pass in a slice of that many operands.

Finally, we also support synthesizing constant values for the const operator. These immediates are handled separately from operands, and so we also have the immediate_arity trait method, and make_expression and make_operator also takes slices of immediates.

Let’s take a look at the implementation of a simple, representative component: the add component.

• It has two operands and zero immediates.

• The make_operator implementation is mechanical, pulling out its individual operands from the given slice, just to push them into the Operator::Add.

• The make_expression implementation is similarly concise, but is not quite mechanical. This is where we encode the add operator’s semantics into a Z3 expression. SMT-LIB2 defines a bvadd operatoin for wrapping addition on bitvectors, and the z3 crate we’re using exposes it to us as a method on BitVec. Wrapping addition is exactly what we want for our add instruction, and so all we have to do to encode our add operation’s semantics is bvadd our two operands.

• Because we are always working with components as trait objects, we also define a helper function for boxing up the Add component and turning it into a trait object with dynamically dispatched methods.

Most components look pretty much like that: there is some direct encoding into a single SMT expression for the operation.

Other components, like eqz, don’t have a direct encoding as a single SMT expression, and we need to compose multiple.

The const component is the only component that uses immediates. It doesn’t have operands. Either its constant value is unbound, in which case we’re synthesizing the value, or we provide a value for it upon constructing the component:

Finally, a library is just a vector of components:

### Specifications

A specification looks quite similar to a component, but we only need to create its SMT constraints. We don’t need to construct Operators from it, so it doesn’t have an equivalent of the Component::make_operator method.

We want to use existing, unoptimized programs as specifications for new, optimized programs, and so we implement Specification for Program.

The arity of a program specification is how many input variables the program defines with the var operator:

To create a Z3 expression for a whole program, we build up subexpressions instruction by instruction in a table. This table serves as our lexical environment. By inspecting an operation’s operand Ids, we pluck the corresponding subexpressions out of this table to pass as operands into the instruction’s Component::make_expression implementation. The final entry in the table is the expression for the whole program, since the program implicitly returns its last value.

### The Synthesizer

The Synthesizer structure solves one synthesis problem for us. It takes a library and spec upon construction, and then we can configure how it will run with a few builder-style methods, and then finally we call its synthesize method to actually kick off the CEGIS loop.

This allows us to create, configure, and run a synthesis problem like this:

### Location Mappings

A location mapping uses a bunch of line indices to represent a component-based program. Lines are something we can represent directly in SMT expressions, and so location mappings are how we will encode programs for synthesis.

We have the choice of either encoding a line as a mathematical integer, or as a bitvector. I originally used mathematical integers, and perhaps unsurprisingly, it is way slower than using a bitvector. By using a bitvector representation, we can use the minimal number of bits required for our library of components. For example, if there is one program input and seven components in the library, the largest zero-based line number we will ever need is seven, which means we only need bitvectors that are three bits wide to represent every line. The narrower the bitvector, the faster Z3 can work with it. I haven’t seen line representation mentioned in any of the papers I’ve read, but I’m sure it is tribal knowledge that “everyone knows.”

Location mappings are used differently at different phases of CEGIS. Before we synthesize a program, they are a collection of unbound variables in an SMT query that we’re constructing. After synthesis, those variables get bound to concrete values and we only want to work with the values from then on; we don’t need the variables anymore. Therefore, we split the location mapping into two distinct representations:

1. Before and during finite synthesis we have LocationVars, which is a collection of line variables.

2. After finite synthesis, we pull the values assigned to those variables out into an Assignments structure that is a collection of index values.

A LocationVars has line variables for the program inputs $$\vec{I}$$, for all the components’ parameters $$\textbf{P}$$, for all the components’ temporary results $$\textbf{R}$$, and for the whole program’s output $$O$$. It also keeps track of how wide our line location bitvectors are.

Unlike the original paper, we do not always take the program output from the last line in the synthesized location mapping. Instead, we allow setting the output line earlier in the program, essentially inserting early returns. This lets us (optionally) synthesize programs with optimal code size: we can find the earliest output line for which we can still synthesize a correct program. This little trick is taken from Souper.

The Assignments are all those same things, but instead of the variable for the line, we have the value of the line that we pulled out of the model the SMT solver gave us. It also has any synthesized immediate values.

### Verification

Verification takes an Assignment produced by finite synthesis, and attempts find a counterexample. If it finds one, then we need to continue the CEGIS loop. If it can’t find a counterexample, then we’ve found a solution and we’re done.

In the original paper, they represent the candidate program in the verification query with $$\phi_\mathrm{lib}$$ and $$\phi_\mathrm{conn}$$, essentially asking the solver to interpret the connections between components. This is what I originally did as well, but then I had a realization:

• We already need to have the ability to turn an Assignments into a Program for when we find a solution.

• We can already turn a Program into an SMT expression, since we use unoptimized programs as specifications for finding optimized ones.

That means we can convert our Assignments into a Program and then into an SMT expression. Why not verify that this SMT expression implies the specification directly? This means that

• we don’t need to construct $$\phi_\mathrm{lib}$$ and $$\phi_\mathrm{conn}$$ during verification,

• the solver doesn’t need to unify all the connections between components’ inputs and outputs in $$\phi_\mathrm{conn}$$ to solve the verification query, and

• this should generally create smaller queries, which should generally be faster to solve than larger queries.

This did end up saving a little bit of code in the implementation, but does not seem to have yielded a speed up for overall synthesis time. Probably because nearly all time is spent in finite synthesis, and not verification.

Anyways, here is the verification implementation:

In the case where verification finds a counterexample, we make a call to the add_invalid_assignment method. This is another trick taken from Souper: we remember assignments that work for some inputs, but not all of them, and explicitly forbid them in future finite synthesis queries. This saves the SMT solver from reconsidering these seemingly promising assignments, only to find once again that they don’t work for one of the example inputs.

### Finite Synthesis

Recall that finite synthesis searches for a location mapping that satisfies the specification for each of our example inputs, but might or might not satisfy the specification when given other inputs.

The first constraint on the location mapping is that it corresponds to a well-formed program. This constraint actually remains the same for every iteration of CEGIS, so we generate the constraint once when creating a new Synthesizer instance, and then reuse the already constructed constraint for each finite synthesis query. Similarly, we also store the location variables in the Synthesizer and reuse them across all iterations of the CEGIS loop.

Recall the definition of the $$\psi_\mathrm{wfp}$$ well-formed program constraint:

\begin{align} \psi_\mathrm{wfp}(L) &= \bigwedge\limits_{p \in \textbf{P}} \left( 0 \leq l_p \lt M \right) \\ & \land \, \bigwedge\limits_{r \in \textbf{R}} \left( \lvert \vec{I} \rvert \leq l_r \lt M \right) \\ & \land \, \psi_\mathrm{cons}(L) \\ & \land \, \psi_\mathrm{acyc}(L) \end{align}

It says that each parameter location $$l_p$$ falls within the range $$0 \leq l_p < M$$, each temporary result location $$l_r$$ falls within the range $$\lvert \vec{I} \rvert \leq l_r < M$$, that the locations are consistent, and that there are no dataflow cycles.

Constructing the equivalent SMT expression in Z3 follows pretty much directly from that, except it is a bit noisier, since we’re passing contexts around and need to convert Rust usizes and u32s into Z3 bitvectors of the appropriate width:

Remember that the $$\psi_\mathrm{cons}$$ consistency constraint ensures that we don’t assign two components to the same line:

$$\psi_\mathrm{cons}(L) = \bigwedge\limits_{x,y \in \textbf{R}, x \not\equiv y} \left( l_x \neq l_y \right)$$

Our consistent method creates the corresponding SMT expression for Z3, and additionally records pairs of location indices into the invalid_connections map. This map keeps a record of location variables that well-formed programs cannot have dataflow between. Therefore, we don’t need to add connection clauses for these pairs of location variables in our eventual implementation of the $$\psi_\mathrm{conn}$$ connectivity constraint. Leaving those pairs out of the connectivity constraint keeps the SMT expressoin that much smaller, which should help Z3 solve it that much faster. Once again, this trick was taken from Souper.

The $$\psi_\mathrm{acyc}$$ acyclicity constraint enforces that our synthesized location mappings don’t have dataflow cycles:

$$\psi_\mathrm{acyc}(L) = \bigwedge\limits_{i=0}^{N-1} \left( \bigwedge\limits_{p \in \vec{I}_i} \left( l_p < l_{O_i} \right) \right)$$

Our implementation of the acyclicity constraint doesn’t do anything fancy, and is just a direct translation of the constraints into an SMT expression:

Our next steps are implementing the $$\psi_\mathrm{conn}$$ connectivity constraint and the $$\phi_\mathrm{lib}$$ library specification. These do change with each iteration of the CEGIS loop, because we instantiate them for each example input we’re working with. That means we can’t cache-and-reuse them, like we do with the well-formed program constraint.

The $$\psi_\mathrm{conn}$$ connectivity constraint encodes the dataflow connections between components in the library. If two location variables refer to the same location, then the entities whose location is assigned by those variables must have the same value.

$$\psi_\mathrm{conn}(L, \vec{I}, O, \textbf{P}, \textbf{R}) = \bigwedge\limits_{x,y \in \vec{I} \cup \textbf{P} \cup \textbf{R} \cup { O } } \left( \left( l_x = l_y \right) \implies \left( x = y \right) \right)$$

The connectivity method constructs a Z3 expression that implements this constraint. Previously, we had recorded pairs of location variables that cannot be connected directly. Now, we filter out any such connections from this constraint, keeping the SMT expression smaller, as explained earlier.

Our final prerequisite before we can implement the whole finite synthesis query is the $$\phi_\mathrm{lib}$$ library specification, which describes the behavior of the library as a whole:

$$\phi_\mathrm{lib}(\textbf{P}, \textbf{R}) = \phi_i(\vec{I}_0, O_0) \land \ldots \land \phi_i(\vec{I}_{N-1}, O_{N-1})$$

Implementing the $$\phi_\mathrm{lib}$$ library specification involves getting the immediates, operands, and result variables for each component, asking the component to make its SMT expression relating those variables to each other, and finally anding all those expressions together:

Ok, we finally have all the functions we need to implement finite synthesis! The finite synthesis query assigns values to our location variables such that the resulting mapping corresponds to a well-formed program, and that program satisfies our specification for each of our example inputs.

\begin{align} & \exists L, O_0, \ldots, O_{\vert S \rvert - 1}, \textbf{P}_0, \ldots, \textbf{P}_{\vert S \rvert - 1}, \textbf{R}_0, \ldots, \textbf{R}_{\vert S \rvert - 1}: \\ & \qquad \psi_\mathrm{wfp}(L) \,\, \land \\ & \qquad \qquad \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \phi_\mathrm{lib}(\textbf{P}_i, \textbf{R}_i) \land \psi_\mathrm{conn}(L, S_i, O_i, \textbf{P}_i, \textbf{R}_i) \land \phi_\mathrm{spec}(S_i, O_i) \right) \\ \end{align}

Our implementation of the finite synthesis query mostly follows the same structure as that formula:

• We already have the well-formed program constraint cached and ready for reuse, so we don’t need to recreate it here.

• We loop over each of our example inputs:

• Create fresh program output variables, fresh component parameter variables, and fresh component result variables for each example.

• Constrain each component’s parameters and output to behave as dictated by our library.

• Connect cross-component variables together as described by the location mapping, with the connectivity constraint.

• Require that the program specification is satisfied for this example input.

• Additionally, we assign the program’s output’s location to the line we were given. This lets callers optionally force the synthesis of optimally small programs.

• Finally, we run the query and parse the resulting location mapping’s assignments out of the solver’s model.

### The CEGIS Loop

When the synthesizer is configured to produce optimally small programs, we wrap the CEGIS loop in another loop, similar to how Souper does. Souper begins synthesizing at the shortest program length and then allows longer and longer programs until it finds an optimally short solution. In contrast, we start by synthesizing the longest programs that can be expressed with the given library first, and then try shorter and shorter programs from there. This is motivated by two insights:

1. In practice, it appears that a failed search for a program of length n-1, because no such program exists, takes much longer than searching for and successfully finding a program of length n. By iterating longest to shortest target program lengths, we only hit the expensive case once.

2. When we synthesize a longer-than-necessary program, often that program contains dead code. Running dead code elimination (DCE) on a loop-free program is super easy, and when the dead code is removed, it additionally lets us skip ahead a bunch of iterations in this loop.

When the synthesizer is not configured to produce optimally small programs, we adjust the range of target program lengths such that we only call into the CEGIS loop once. We accomplish this by setting the shortest length equal to the longest length.

Originally, this implementation was choosing the initial example inputs with a random number generator. After reading through the Souper sources, I saw that they are presenting the program specification to the solver with all the inputs and output variables unbound. Then they have solver solve for example inputs itself. These inputs are presumably more “interesting” than what a random number generator might choose, which should in turn help finite synthesis come up with better solutions more quickly.

I implemented this technique for our synthesizer and additionally constrain the input variables to be distinct from any that we’ve already added to our initial inputs $$S$$.

Here’s the logic formula for the find-an-example-input query:

\begin{align} & \exists \vec{I}, O: \\ & \qquad \phi_\mathrm{spec}(\vec{I}, O) \land \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \vec{I} \neq S_i \right) \end{align}

And here’s its breakdown:

 $$\exists \vec{I}, O:$$ Find some inputs $$\vec{I}$$ and output $$O$$ such that $$\phi_\mathrm{spec}(\vec{I}, O)$$ our specification is satisfied $$\land$$ and $$\bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \vec{I} \neq S_i \right)$$ we don't already have those inputs in our list of example inputs $$S$$.

The initial_concrete_inputs method constructs this query, passes it to Z3, and pulls each discovered example out of each query’s resulting model:

Alright so now we are ready to examine the implementation of our inner CEGIS loop!

Close readers will have noticed all the bit_width parameters being passed around everywhere and the usage of a FULL_BIT_WIDTH constant in earlier code samples. That’s because we implement the well-known optimization of synthesizing programs initially at narrow bit widths, and then verifying with wider and wider bit widths until we’ve verified at our target bit width: FULL_BIT_WIDTH, aka 32 in this implementation.

This works well because programs that are correct at, say, eight bits wide can often work just as well at 32 bits wide. Note that this is “often” and not “always”: widening synthesized constants, in particular, can be troublesome. However, with many fewer bits for the solver to keep track of, synthesis at eight bits wide is a lot faster than synthesis at 32 bits wide.

When verifying a program synthesized at a narrower bit width, we verify at the narrow bit width, and then verify at double that bit width, and then we double the bit width again, etc… until we’ve reached the full bit width, at which point we’re done. Whenever verification finds a counterexample, from then on we synthesize at whatever bit width we’ve reached so far. That is, we’ll never narrow our current bit width again after widening it.

## Results

We have a new program synthesizer, and the natural next question is: how well does it work? How does it compare to existing synthesizers? Can it solve as many synthesis problems as other synthesizers? Can it find programs as fast as other synthesizers can?

First off, our synthesizer can indeed find an optimal solution for isolating the rightmost zero bit in a word! However, there are multiple optimal solutions, and it doesn’t always find the one that’s exactly how we would probably write it by hand. According to our synthesizer’s cost model, which is just number of instructions used, a less-than-or-equal comparison that always evaluates to true is just as good as a const 1:

Phew!

The Synthesis of Loop-Free Programs paper by Gulwani et al defines a benchmark suite of 25 bitvector problems taken from the big book of bit-twiddling hacks, Hacker’s Delight. They use it to benchmark their program synthesizer, Brahma. Our running example of isolating the rightmost zero bit in a word is the seventh problem in the benchmark suite. Other benchmark problems include:

• turning off the rightmost one bit in a word,

• taking the average of two integers without overflow,

• rounding up to the next power of two,

• etc…

The problems are roughly ordered in increasing difficulty: the first problem is much easier than the last problem.

They use a standard library of twelve base components for the first seventeen problems. For the last eight problems, they augment that library with additional components, depending on the problem at hand.

I ported the benchmark problems into Programs that can be fed into our synthesizer as specifications, and created a small runner that times how long it takes our synthesizer to rediscover solutions to these problems.

Porting their standard library over to our synthesizer required some small changes. For example, our language does not allow immediates in the add operation, so their add1 component becomes two components in our library: an add component and a const 1 component. There are a couple other instances where they had a single component that corresponded to multiple operations. We don’t support that scenario, so I split those components into multiple components when porting them over to our synthesizer. Additionally, for the later, more difficult problems, I used the minimal library required to solve the problem, since these more difficult problems are prone to take quite a long time.

For these benchmarks, I’m using Z3 version 4.4.1 which comes in my OS’s package manager. Brahma used the Yices solver, and my understanding is that Souper can plug in a few different solvers, including Z3.

Our goal here is only to get a general sense of our synthesizer’s performance. We are not precisely evaluating our synthesizer’s performance and comparing it with Brahma in excruciating detail because this isn’t an apples-to-apples comparison. But we should get a sense of whether anything is horribly off.

All problems are run with a timeout of one hour, meaning that if we can’t find a solution in that amount of time, we give up and move on to the next problem. For comparison, the longest that Brahma took on any problem was 46 minutes.

First, I measured how long it takes to synthesize a program of any length for each problem. Next, I measured how long it takes to synthesize an optimally small program for each problem. Finally, I measured how long it takes to synthesize both an optimally small program and the constant values used in each const operation, rather than being given the necessary constant value as part of the const component.

The results are summarized in the table below, alongside the results reported in the paper for their Brahma synthesizer. The timings are in units of seconds.

Benchmark Brahma Our Synthesizer
Any Length Min. Length Min. Length & Synth. Constants
p1 3.2 0.5 109.0 170.4
p2 3.6 1.1 136.0 99.4
p3 1.4 0.6 32.8 31.9
p4 3.3 0.5 100.5 130.1
p5 2.2 0.3 125.2 119.2
p6 2.4 0.6 96.3 87.7
p7 1.0 0.5 100.1 80.3
p8 1.4 1.0 90.5 77.6
p9 5.8 36.5 241.4 165.6
p10 76.1 6.8 34.2 42.1
p11 57.1 4.1 31.9 36.9
p12 67.8 11.5 30.6 52.8
p13 6.2 20.9 129.7 165.0
p14 59.6 85.1 1328.7 1598.8
p15 118.9 20.0 1897.3 1742.7
p16 62.3 25.2 3600.0* 3600.0*
p17 78.1 8.2 102.0 243.5
p18 45.9 1.5 27.4 33.5
p19 34.7 3.3 10.0 6.0
p20 108.4 757.5 3600.1* 3600.0*
p21 28.3 timeout timeout timeout
p22 279.0 timeout timeout timeout
p23 1668.0 Z3 crash timeout timeout
p24 224.9 timeout timeout timeout
p25 2778.7 timeout timeout timeout

* For these problems, the synthesizer found a solution, but not necessarily the optimally small solution. It timed out before it could complete the search and either discover or rule out the existence of smaller programs.

For the easier, early problems, our synthesizer does pretty well. Finding an optimally small program takes much longer than finding a program of any length. Synthesizing constants doesn’t seem to add much overhead to synthesis over finding an optimally small program. Our synthesizer hits some sort of cliff around problems 20 and 21, and run times start skyrocketing and running into the timeout ceiling.

Problem 23 is to synthesize an implementation of popcount. Brahma solves the problem in just under 28 minutes. The Souper folks report that they can solve this problem as well, although they do say that their synthesizer scales less well than Braham does. When I was running this benchmark, Z3 repeatedly crashed while trying to solve a finite synthesis query. Ultimately, the synthesizer wasn’t able to complete the problem. I ran earlier versions of our synthesizer on this problem during development, when Z3 wasn’t crashing, and I was still never able to synthesize a program for popcount. I let the synthesizer run for nearly three days, and it couldn’t do it.

Maybe Yices is much better at solving these sorts of queries than Z3 is? Perhaps there are some well-known optimizations to make these queries easier for solvers to solve, and I’m just ignorant of them? Maybe the fact that compound components get split into multiple components in our library, resulting in a larger library and more location variables to solve for, is the root of the problem? I don’t know.

This performance cliff with larger synthesis problems is my one disappointment with this project. If anyone has ideas about why the synthesizer is falling off this cliff, or how to profile Z3 and dig into why it is struggling with these queries, I’d love to hear from you. Particularly if you have experience with this kind of program synthesis. SMT solvers are very much a black box to me right now.

## Conclusion

Counterexample-guided iterative synthesis is a wonderfully clever way to transform a single problem that is too difficult for SMT solvers to solve, into multiple little problems, each of which off-the-shelf solvers can handle. Inverting the representation of a component-based program into a location mapping, so that it can be wholly represented inside an SMT query, is equally deft.

Program synthesis — writing programs that write other programs — is fun and delightful in a meta, ouroboros-y sense. It also foretells a future where we can mechanically generate large swaths of an optimizing compiler’s backend. A future where compilers emit actually optimal code sequences. I find that lofty promise exciting, but I’m also frustrated by my lack of insight into SMT solver performance. There do exist other approaches to program synthesis that only rely on SMT solvers for verification, and not for finite synthesis. However, it seems premature to give up on this approach when I haven’t yet reproduced the results reported for Brahma and Souper. Nonetheless, I remain optimistic, and I intend to keep digging in until I resolve these discrepancies.

## References

• fitzgen/synth-loop-free-prog: The git repository for the synthesizer described in this post!

• Synthesis of Loop-Free Programs by Gulwani et al: This is the paper that most of this post is based upon! A great paper, and you should read it if you found this interesting!

• Program Synthesis by Gulwani et al: A survey paper summarizing the state of the art for program synthesis. A great way to quickly get an overview of the whole field!

• A Synthesizing Superoptimizer by Sasnauskas et al: This paper describes Souper, a superoptimizer for LLVM that uses program synthesis at its core. Souper’s synthesizer is also based on the Gulwani paper, but extends it in a few different ways. I’ve looked through its sources quite a bit to find optimizations and tricks!

• Automatic Generation of Peephole Superoptimizers by Bansal et al: The first paper describing how to generate a table-driven peephole optimizer offline with a superoptimizer.

• Hacker’s Delight by Henry S. Warren: A book dedicated to concise and non-obvious bit twiddling tricks. It is an incredible book if you’re into that sort of thing! The benchmark problems presented in the Synthesis of Loop-Free Programs paper are taken from this book.

• SyGuS: SyGuS, or Syntax-Guided Synthesis, formalizes a class of program synthesis problems, standardizes a language for describing them, and holds a yearly competition for synthesizers. The aim is to drive innovations in the field with a little friendly competition.