Writing a Test Case Generator for a Programming Language

Maxime Chevalier-Boisvert requested resources for learning about fuzzing programming language implementations on Twitter:

I’d like to learn about fuzzing, specifically fuzzing programming language implementations. Do you have reading materials you would recommend, blog posts, papers, books or even recorded talks?

@Love2Code · August 3, 2020

Maxime received many replies linking to informative papers, blog posts, and lectures. John Regehr suggested writing a simple generative fuzzer for the programming language.

A generative fuzzer combines a test case generator with the system under test (e.g. your compiler), generating new test cases and feeding them into the system:

fn generative_fuzzer() {
    loop {
        // Use the test case generator to create a new
        // input.
        let input = generate_test_case();

        // Feed that input into the system under test.
        let result = run_system_under_test(input);

        // Finally, if the system under test crashed,
        // failed an assertion, etc... then report
        // that!
        if result.is_interesting() {
            report(input);
        }
    }
}

I realized that many people might not know what it takes to write their own generative fuzzer, so this blog post shows one aspect of it: implementing a test case generator.

Our test case generator will generate WebAssembly programs. While WebAssembly has its own quirks — it’s a binary format and is generally a compilation target rather than a source language — it is a small and simple language. The techniques we use when generating WebAssembly should transfer to generating the programming language of your choice.

If you want to skip the exposition and jump head first into the code, here is the repository for our final test case generator.

What is a Test Case Generator?
Getting Set Up
Translating Grammars into Generators
Generating the Type Section
Generating the Import Section
Generating the Code Section
Using the Test Case Generator
Conclusion

What is a Test Case Generator?

Test case generators generate test cases. These test cases are always within the test domain: no cycles are wasted on invalid inputs, such as source text that fails to parse. Compare this to mutation-based fuzzing, where existing seed inputs are mutated to produce new inputs. In general, nothing guarantees that the new, mutated input is still within the test domain: the mutation may have introduced a syntax error. This property, that generated inputs are always within the test domain, is generative fuzzing’s main advantage and the test case generator’s main responsibility.

A test case generator should, additionally, support every feature of its target programming language. You won’t discover a bug in your compiler’s handling of switch statements if the test case generator doesn’t support generating switch statements. Pushing this idea even further, the test case generator should uniformly sample from the test domain. If the test case generator can technically generate switch statements but the probability of doing so is nearly zero, then you likely still won’t find that bug. However, uniformly sampling from the infinite set of all programs that can be written in a particular programming language is nontrivial and an area of active research.

A test case generator should, finally, be fast. The faster we can generate test cases, the faster we will discover bugs. If the generator is too slow, we can blow our time budget, failing to find those bugs at all.

Getting Set Up

First, we create a new crate with cargo. We’ll name this crate wasm-smith, giving a little nod to Csmith, the popular C program generator.

$ cargo new --lib wasm-smith

Second, we add the arbitrary crate as a dependency:

# wasm-smith/Cargo.toml

[dependencies]
arbitrary = { version = "0.4.6", features = ["derive"] }

The arbitrary crate helps us generate structured data from arbitrary bytes. It is typically used in combination with libFuzzer to translate the raw bytes given to use by libFuzzer into something that the system you’re testing can process. For example, a color conversion library might use arbitrary to turn the raw fuzzer-provided bytes into Rgb or Hsl color types. We will use it in a similar way for this project, translating raw bytes given to us by libFuzzer into semantically valid WebAssembly modules.

The arbitrary crate’s main export is the Arbitrary trait:

pub trait Arbitrary: Sized + 'static {
    fn arbitrary(u: &mut Unstructured) -> Result<Self>;

    // Provided methods hidden...
}

It takes an Unstructured, which is a helpful wrapper around a byte slice, and returns an instance of the type for which it is implemented.

For our wasm-smith crate, we define a Module type that represents our pseudo-random WebAssembly modules, and then we implement the Arbitrary trait for it:

use arbitrary::{Arbitrary, Result, Unstructured};

/// A pseudo-random WebAssembly module.
pub struct Module {
    // ...
}

impl Arbitrary for Module {
    fn arbitrary(u: &mut Unstructured) -> Result<Self> {
        todo!()
    }
}

Before we fill in that todo!() lets take a moment to settle on a design for what the implementation will look like.

Translating Grammars into Generators

Writing a generator is remarkably similar to hand-writing a recursive descent parser, so if you’ve done that before, then you should feel right at home. For example, given this grammar production (borrowed and lightly edited from the C++ name mangling grammar):

<class-enum-type> ::= Ts <name>
                    | Tu <name>
                    | Te <name>

A recursive descent parser will, almost mechanically, translate the production into something like this:

impl Parse for ClassEnumType {
    fn parse(p: &mut Parser) -> Result<Self> {
        // Ts <name>
        if p.peek("Ts") {
            p.consume("Ts")?;
            let name = Name::parse(p)?;
            return Ok(ClassEnumType::Ts(name));
        }

        // Tu <name>
        if p.peek("Tu") {
            p.consume("Tu")?;
            let name = Name::parse(p)?;
            return Ok(ClassEnumType::Tu(name));
        }

        // Te <name>
        p.consume("Te")?;
        let name = Name::parse(p)?;
        Ok(ClassEnumType::Te(name))
    }
}

Our generator will do something similar, except instead of peeking at the input string to decide which right-hand side of the production to parse, we will make a pseudo-random choice to generate one of those potential right hand sides.

We could use a random number generator directly to make these choices, but this has two problems:

We give up determinism unless we are careful to control the RNG’s seed and reuse the same RNG everywhere, threading it through all of our functions as a parameter. Determinism is extremely important for reproducing test failures! It’s definitely possible to do these things, but can occasionally be a little annoying.
More importantly, using an RNG precludes a mature fuzzing engine, like libFuzzer, from guiding our test case generation based on code coverage and other insights.

Instead, we use a raw input byte slice given to us by libFuzzer or AFL as a sequence of predetermined choices.⁰ This lets the fuzzer guide our test case generation, and gives us test case reduction “for free” since we can ask the fuzzer to reduce the raw input sequence, rather than write a domain-specific test case reducer. This comes as a relief because writing a reducer that understands WebAssembly is easily as much effort as writing the generator itself.

Here is the same C++ mangling example from above, but translated from a parser into a generator, using Unstructured:

fn arbitrary_class_enum_type(
    u: &mut Unstructured,
    output: &mut String,
) -> Result<()> {
    match u.int_in_range::<u8>(0..=2) {
        // Ts <name>
        0 => {
            output.push_str("Ts");
            arbitrary_name(u, output)?;
            Ok(())
        }
        // Tu <name>
        1 => {
            output.push_str("Tu");
            arbitrary_name(u, output)?;
            Ok(())
        }
        // Te <name>
        2 => {
            output.push_str("Te");
            arbitrary_name(u, output)?;
            Ok(())
        }
        _ => unreachable!(),
    }
}

Once again, this is mostly mechanical.

This pattern will generate syntactically correct test cases that can be parsed successfully but which likely contain a plethora of type errors, calls to undefined functions, etc. We’ve set out to generate semantically correct test cases that pass type checking and will exercise more than just the language implementation’s frontend.

Our final pattern maintains some extra information about the program we’ve generated thus far, so that we can consult that information when generating new forms. This extra information might include which names are in scope, the types of each variable, etc. We consult that information while dynamically building up thunks for every valid option we could generate. Once we have enumerated every option, we ask the Unstructured to choose one of them, and finally we call the chosen thunk to generate the form.

Here is an example of using this pattern for generating integer expressions, where an integer expression is either a constant integer, an arithmetic operation, a use of an integer variable, or a call of a function that returns an integer:

fn arbitrary_int_expr(
    u: &mut Unstructured,
    scope: &mut Scope,
) -> Result<Expr> {
    // We will dynamically build up all of the valid
    // options of what we can generate.
    let mut options: Vec<fn (
        &mut Unstructured,
        &mut Scope,
    ) -> Result<Expr>> = vec![];

    // It is always valid to generate a constant.
    options.push(|u, _| {
        Ok(Expr::Constant(u.arbitrary::<i32>()?))
    });

    // It is always valid to generate an addition.
    options.push(|u, scope| {
        let lhs = arbitrary_int_expr(u, scope)?;
        let rhs = arbitrary_int_expr(u, scope)?;
        Ok(Expr::Add(lhs, rhs))
    });

    // Subtraction, multiplication, division, etc look
    // similar to addition.

    // If there are integer variables in scope, we can
    // generate a use of one of them.
    if scope.has_int_variables() {
        options.push(|u, scope| {
            let var = u.choose(scope.int_variables())?;
            Ok(Expr::Var(var))
        });
    }

    // If there are any functions that return an integer
    // in scope, we can generate a call to one of them.
    if scope.has_int_funcs() {
        options.push(|u, scope| {
            let func = u.choose(scope.int_funcs())?;
            let args = arbitrary_args(u, &func.params)?;
            Ok(Expr::Call(func, args))
        });
    }

    // Choose one of our options and call the function
    // to generate the expression.
    let f = u.choose(&options)?;
    f(u, scope)
}

Finally, it is worth noting that, similar to how parser generators take a grammar and generate a parser, there are tools that will take a grammar and generate a test case generator (and even Rust implementations of those tools).

Generating the Type Section

Now we’re ready to continue generating WebAssembly!

The first section in a WebAssembly module is the type section. It declares the function type signatures used in the module and it has the following grammar:

<typesec> ::= 0x01 <size:u32> <num_funcs:u32> <functype>*

<functype> ::= <num_params:u32>
               <valtype>*
               <num_results:u32>
               <valtype>*

<valtype> ::= 0x7F     # i32
            | 0x7E     # i64
            | 0x7D     # f32
            | 0x7C     # f64

This is not context free: size is the size of the whole section in bytes, num_funcs is the number of following <functype>s, num_params defines how many parameter <valtype>s a function type has, and num_results defines how many result <valtype>s it has. But none of this comes into play until we actually serialize the module into bytes. Until then, a type section is a sequence of function type signatures, and these signatures don’t actually need access to any scope information, so we can just derive the Arbitrary trait directly for them:

#[derive(Arbitrary, Clone, Debug)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

#[derive(Arbitrary, Clone, Debug)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

And since a module has a type section, let’s hook this up into our Module definition and its Arbitrary implementation:

pub struct Module {
    types: Vec<FuncType>,
    // ...
}

impl Arbitrary for Module {
    fn arbitrary(u: &mut Unstructured) -> Result<Self> {
        let mut module = Module::default();
        module.types = u.arbitrary()?;
        // ...
        Ok(module)
    }
}

Generating the Import Section

The import section is the first section where we will need to look at what we’ve previously generated when generating new items. An import brings an external function, table, memory, or global into scope. When importing a function, we need to specify the function’s type via an index into the types section. This means we can only generate a function import if our types section is non-empty.

The structure definitions and Arbitrary implementations for table and global types are straightforward, and nothing different from what we saw with the type section, so I’ve elided them here. The one thing to note is that the largest memory that a Wasm module can declare is 4GiB, so the Arbitrary implementation for memory types must take that into account:

#[derive(Clone, Debug)]
struct Limits {
    min: u32,
    max: Option<u32>,
}

#[derive(Clone, Debug)]
struct MemoryType {
    limits: Limits,
}

impl Arbitrary for MemoryType {
    fn arbitrary(
        u: &mut Unstructured<'_>,
    ) -> Result<Self> {
        // Note: memory sizes are in units of pages,
        // which are 16KiB in Wasm.
        let min = u.int_in_range(0..=65536)?;
        let max = if u.arbitrary().unwrap_or(false) {
            Some(if min == 65536 {
                65536
            } else {
                u.int_in_range(min..=65536)?
            })
        } else {
            None
        };
        Ok(MemoryType {
            limits: Limits { min, max },
        })
    }
}

When generating an arbitrary import, we dynamically build up a list of functions, one for each semantically valid choice we can make. Then we will use the next bit of raw input from Unstructured to choose one of those functions and call it to create an import. We keep creating more imports while the raw input tells us to — we read a bool from the Unstructured and stop adding imports once we read false.

impl Module {
    fn arbitrary_imports(
        &mut self,
        u: &mut Unstructured,
    ) -> Result<()> {
        let mut options: Vec<fn(
            &mut Unstructured,
            &mut Module,
        ) -> Result<Import>> = Vec::with_capacity(4);

        // If the module's type section is not empty, we
        // can define function imports with one of those
        // types.
        if !self.types.is_empty() {
            options.push(|u, m| {
                let max = m.types.len() as u32 - 1;
                let idx = u.int_in_range(0..=max)?;
                Ok(Import::Func(idx))
            });
        }

        options.push(|u, _| {
            Ok(Import::Table(u.arbitrary()?))
        });

        options.push(|u, _| {
            Ok(Import::Memory(u.arbitrary()?))
        });

        options.push(|u, _| {
            Ok(Import::Global(u.arbitrary()?))
        });

        loop {
            let keep_going = u.arbitrary::<bool>()
                .unwrap_or(false);
            if !keep_going {
                return Ok(());
            }

            let f = u.choose(&options)?;
            let import = f(u, self)?;
            self.imports.push(import);
        }
    }
}

Hooking this up to Module and its Arbitrary implementation is, once again, straightforward:

#[derive(Default)]
pub struct Module {
    types: Vec<FuncType>,
    imports: Vec<Import>,    // New!
    // ...
}

impl Arbitrary for Module {
    fn arbitrary(u: &mut Unstructured) -> Result<Self> {
        let mut module = Module::default();
        module.types = u.arbitrary()?;
        module.arbitrary_imports(u)?;     // New!
        // ...
        Ok(module)
    }
}

Generating the Code Section

There are a few more sections before the code section, which contains function bodies, but their implementation is similar to what we’ve already seen with the type and import sections, so we’ll skip them.

WebAssembly is a typed stack-based language, and has structured control flow. There is the operand stack, where values are pushed and popped, and there is the control stack, which keeps track of active control frames (i.e. ifs, loops, and blocks) and their labels that can be jumped to. Whether an instruction is valid depends on the types on the operand stack and, if the instruction is control instruction, the labels on the control stack. Therefore, we will explicitly model these stacks while generating instructions. Choosing which instruction to generate will consult the contents of these stacks, and, once a choice is made, generating a new instruction will update them.

Let’s begin with the basic definitions for value types and control frames:

/// The type of a value.
#[derive(Arbitrary, Clone, Copy, Debug, PartialEq, Eq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

/// A control frame.
#[derive(Debug)]
struct Control {
    kind: ControlKind,
    // Value types that must be on the stack when
    // entering this control frame.
    params: Vec<ValType>,
    // Value types that are left on the stack when
    // exiting this control frame.
    results: Vec<ValType>,
    // How far down the operand stack instructions
    // inside this control frame can reach.
    height: usize,
}

/// The kind of a control frame.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ControlKind {
    Block,
    If,
    Loop,
}

We model the control stack as a vector of Controls, and the operand stack as a vector of Option<ValType>. The Option is introduced because unreachable code produces values of any type, and these are modeled as None.

type OperandStack = Vec<Option<ValType>>;
type ControlStack = Vec<Control>;

In order to generate a function body, we’ll also need to know the function’s parameter and result types, and the local variables available. We’ll collect these things and our stacks in a CodeBuilder type. To avoid thrashing the memory allocator, we factor the operand and control stacks out into a separate structure that is reused across the creation of each function body. This structure also contains the vector of options that we build up every time we generate an instruction.

pub(crate) struct CodeBuilderAllocations {
    controls: ControlStack,
    operands: OperandStack,

    // Dynamic set of instructions that would be
    // valid right now.
    options: Vec<fn(
        &mut Unstructured,
        &Module,
        &mut CodeBuilder,
    ) -> Result<Instruction>>,
}

struct CodeBuilder<'a> {
    func_ty: &'a FuncType,
    locals: &'a Vec<ValType>,
    allocs: &'a mut CodeBuilderAllocations,
}

This is basically the same information that WebAssembly’s validation algorithm requires. Our generator is similar to a WebAssembly validator for the same reasons that a generator that emits syntactically correct test cases is similar to a recursive descent parser.

Because there are so many instructions, we won’t write the validation checks and the thunks that generate the instructions inline like we did for previous sections. Instead we will write pairs of functions: the first checks whether a given instruction would be valid to generate in the current context, and the second generates that instruction and updates the context.

static OPTIONS: &[(
    // Predicate for whether this instruction is valid
    // in the given context, if any. `None` means that
    // the instruction is always valid.
    Option<fn(&Module, &mut CodeBuilder) -> bool>,
    // The function to generate the instruction, given
    // that we've made this choice.
    fn(
        &mut Unstructured,
        &Module,
        &mut CodeBuilder,
    ) -> Result<Instruction>,
)] = &[
    // ...
];

Despite this small change in code organization, we will use these function pairs to accomplish the same task as before: dynamically building up a list of functions, one for each instruction that would be valid in the current context, and then choosing one of them and calling it to generate an instruction.

`i32.add`

Let’s start with an easy example: generating the i32.add instruction. An i32.add is valid when there are two i32s on the operand stack. There are many instructions that are valid to generate when there are two i32s on the operand stack, so rather than naming the predicate function something like i32_add_is_valid, we’ll name it i32_i32_on_stack.

// Predicate for whether two `i32`s are on top of
// the operand stack, and therefore we can generate
// an `i32.add` (and many others).
fn i32_i32_on_stack(
    _: &Module,
    builder: &mut CodeBuilder,
) -> bool {
    builder.types_on_stack(&[
        ValType::I32,
        ValType::I32,
    ])
}

i32.add pops the i32s and pushes their sum. The function that generates an i32.add needs to do the same to our model of the operand stack:

fn i32_add(
    _: &mut Unstructured,
    _: &Module,
    builder: &mut CodeBuilder,
) -> Result<Instruction> {
    builder.pop_operands(&[ValType::I32, ValType::I32]);
    builder.push_operands(&[ValType::I32]);
    Ok(Instruction::I32Add)
}

`f64.const`

Here is another simple example. An f64.const instruction is always valid. It has an f64 immediate and adds that f64 onto the operand stack:

fn f64_const(
    u: &mut Unstructured,
    _: &Module,
    builder: &mut CodeBuilder,
) -> Result<Instruction> {
    let x = u.arbitrary::<f64>()?;
    builder.push_operands(&[ValType::F64]);
    Ok(Instruction::F64Const(x))
}

`if`

Things start getting more interesting when we consider instructions that introduce new control frames. An if instruction requires an i32 on top of the operand stack which determines whether its truth-y or false-y code path is executed. It can also take additional operand stack parameters, making them available to further instructions within its control frame. And while it can take stack parameters, we only require an i32 on the stack because we can always define ifs with empty stack parameters.

fn if_valid(
    _: &Module,
    builder: &mut CodeBuilder,
) -> bool {
    builder.type_on_stack(ValType::I32)
}

fn r#if(
    u: &mut Unstructured,
    module: &Module,
    builder: &mut CodeBuilder,
) -> Result<Instruction> {
    // Pop the `i32` that controls whether we take the
    // truth-y or the false-y code path.
    builder.pop_operands(&[ValType::I32]);

    // Generate an arbitrary block type that is
    // compatible with the current operand stack.
    let block_ty = builder.arbitrary_block_type(
        u,
        module,
    )?;

    // Create the control frame for this `if`.
    let (params, results) = block_ty
        .params_results(module);
    let height = builder.allocs.operands.len()
        - params.len();
    builder.allocs.controls.push(Control {
        kind: ControlKind::If,
        params,
        results,
        height,
    });

    Ok(Instruction::If(block_ty))
}

`else`

An else instruction is only valid when the top control frame was introduced by an if and the types on (the accessible portion of) the operand stack are exactly the top control frame’s result types.

fn else_valid(_: &Module, b: &mut CodeBuilder) -> bool {
    let control = b.top_control();
    control.kind == ControlKind::If
        && b.operands().len() == control.results.len()
        && b.types_on_stack(&control.results)
}

The else instruction resets the operand stack to its state at the moment that the if was introduced. It also changes the control kind to ControlKind::Block so that we don’t generate instruction sequences with two elses for the same if, like if ... else ... else ... end.

fn r#else(
    _: &mut Unstructured,
    _: &Module,
    builder: &mut CodeBuilder,
) -> Result<Instruction> {
    let control = builder.pop_control();
    builder.pop_operands(&control.results);
    builder.push_operands(&control.params);
    builder.allocs.controls.push(Control {
        kind: ControlKind::Block,
        ..control
    });
    Ok(Instruction::Else)
}

`end`

An end instruction finishes a control sequence that was introduced by an if, else, block, or a loop instruction. Similar to an else, it is only valid when we have exactly the current control frame’s result types on the operand stack. The final thing to check is that if the top control frame was introduced by an if, and the if does not leave the stack as it was upon entering (i.e. the parameter and result types are not equal) then we can’t generate an end. This scenario requires that we generate an else first, since not executing any instructions if the if’s truth-y path is not taken would leave the stack in a type mismatched state.

fn end_valid(
    _: &Module,
    builder: &mut CodeBuilder,
) -> bool {
    // Note: first control frame is the function
    // return's control frame, which never has an
    // associated `end`.
    if builder.allocs.controls.len() <= 1 {
        return false;
    }

    let control = builder.top_control();
    builder.operands().len() == control.results.len()
        && builder.types_on_stack(&control.results)
        && !(control.kind == ControlKind::If
             && control.params != control.results)
}

`call`

The final instruction we will examine is call. Whether a call is valid or not depends on the callee function’s type signature, and therefore whether we can generate a call or not depends on whether any of the module’s functions have a type signature compatible with the current operand stack.

fn call_valid(
    module: &Module,
    builder: &mut CodeBuilder,
) -> bool {
    module
        .funcs()
        .any(|(_, ty)| {
            builder.types_on_stack(&ty.params)
        })
}

To generate a call instruction, we choose which function we are calling and update the operand stack accordingly:

fn call(
    u: &mut Unstructured,
    module: &Module,
    builder: &mut CodeBuilder,
) -> Result<Instruction> {
    // Count how many of our module's functions could be
    // called with the current context.
    let n = module
        .funcs()
        .filter(|(_, ty)| {
            builder.types_on_stack(&ty.params)
        })
        .count();
    debug_assert!(n > 0);

    // Choose one of them.
    let i = u.int_in_range(0..=n - 1)?;
    let (func_idx, ty) = module
        .funcs()
        .filter(|(_, ty)| {
            builder.types_on_stack(&ty.params)
        })
        .nth(i)
        .unwrap();

    // Pop its parameters and push its results.
    builder.pop_operands(&ty.params);
    builder.push_operands(&ty.results);

    Ok(Instruction::Call(func_idx as u32))
}

Generating Instruction Sequences

Currently, the core WebAssembly specification defines 189 instructions, and I’ve implemented support for all of them in wasm-smith. That’s too many to reproduce here, but you can check them out if you’re curious. We’ll now turn our attention to using the pairs of functions we’ve defined for generating individual instructions to generate the sequences of instructions that make up a whole function body.

We keep generating new instructions until either when there aren’t any control frames left (meaning that we’ve returned from the function) or when the Unstructured tells us to stop. To generate one instruction, we filter OPTIONS down to just those options that are valid within the current context, ask the Unstructured to choose one of them, and call the chosen function to generate the instruction. This new instruction is added to our function body and we repeat the process.

impl CodeBuilder {
    pub(crate) fn arbitrary(
        mut self,
        u: &mut Unstructured,
        module: &Module,
    ) -> Result<Vec<Instruction>> {
        let mut instructions = vec![];

        while !self.allocs.controls.is_empty() {
            let keep_going =
                u.arbitrary().unwrap_or(false);
            if !keep_going {
                self.end_active_control_frames(
                    &mut instructions,
                );
                break;
            }

            // Filter `OPTIONS` down to just those that
            // are valid within the current context.
            self.allocs.options.clear();
            for (is_valid, option) in OPTIONS {
                if is_valid.map_or(true, |f| {
                    f(module, &mut self)
                }) {
                    self.allocs.options.push(*option);
                }
            }

            // Choose one of them, and call the chosen
            // function to generate the instruction.
            let f = u.choose(&self.allocs.options)?;
            let inst = f(u, module, &mut self)?;
            instructions.push(inst);
        }

        Ok(instructions)
    }
}

If the Unstructured tells us to stop generating instructions before we’ve exited all control frames, then the instruction sequence most likely is not at a valid stopping point. Either there are unfinished loops, ifs, and blocks, or we have the wrong types on the operand stack for a return. The end_active_control_frames method resolves these conflicts. When it can, it leaves the operands on the stack as a control frame’s results or a function’s return values. When the operand stack doesn’t align with the control frame’s results, it inserts an unreachable instruction. This instruction makes the sequence validate, but will trigger a trap at when executed.

impl CodeBuilder {
    fn end_active_control_frames(
        &mut self,
        instructions: &mut Vec<Instruction>,
    ) {
        while !self.allocs.controls.is_empty() {
            let num_operands = self.operands().len();
            let control = self.pop_control();

            // If we don't have the right operands on
            // the stack for this control frame, add an
            // `unreachable` instruction.
            if control.results.len() != num_operands
                || !self.types_on_stack(&control.results)
            {
                self.allocs.operands.push(None);
                instructions.push(
                    Instruction::Unreachable,
                );
            }

            // If this is an `if` that is not stack
            // neutral, then it must have an `else`.
            if control.kind == ControlKind::If
                && control.params != control.results
            {
                instructions.push(Instruction::Else);
                instructions.push(
                    Instruction::Unreachable,
                );
            }

            // The last control frame for the function
            // return does not need an `end` instruction.
            if !self.allocs.controls.is_empty() {
                instructions.push(Instruction::End);
            }

            self.allocs.operands.truncate(
                control.height,
            );
            self.allocs.operands.extend(
                control.results.into_iter().map(Some),
            );
        }
    }
}

The CodeBuilderAllocations are created in the arbitrary_code method, which generates the module’s code section. The code section contains the bodies of the functions locally defined in the module. For each function, it calls arbitrary_func_body, which creates locals for the function and a CodeBuilder that reuses the allocations and generates the instruction sequence for the function.

impl Module {
    /// Generate the module's code section.
    fn arbitrary_code(
        &mut self,
        u: &mut Unstructured,
    ) -> Result<()> {
        // Reserve space for the function bodies we
        // are about to define.
        self.code.reserve(self.funcs.len());

        // Create the allocations that are reused
        // when generating each function body.
        let mut allocs =
            CodeBuilderAllocations::default();

        // For each function defined locally,
        // generate an arbitrary function body.
        for func_ty_idx in &self.funcs {
            let func_ty =
                &self.types[*func_ty_idx as usize];
            let body = self.arbitrary_func_body(
                u,
                func_ty,
                &mut allocs,
            )?;
            self.code.push(body);
        }

        Ok(())
    }

    /// Generate a function body.
    fn arbitrary_func_body(
        &self,
        u: &mut Unstructured,
        func_ty: &FuncType,
        allocs: &mut CodeBuilderAllocations,
    ) -> Result<Code> {
        // Generate arbitrary locals for this function.
        let locals = self.arbitrary_locals(u)?;

        // Create a `CodeBuilder` that reuses the given
        // allocations.
        let builder = allocs.builder(func_ty, &locals);

        // Have the `CodeBuilder` generate an arbitrary
        // instruction sequence.
        let instructions = builder.arbitrary(u, self)?;

        Ok(Code {
            locals,
            instructions,
        })
    }
}

Finally, here is the full Arbitrary implementation for Module, including calls to each of the methods to generate sections whose description we elided in this blog post:

impl Arbitrary for Module {
    fn arbitrary(u: &mut Unstructured) -> Result<Self> {
        let mut module = Module::default();
        module.types = u.arbitrary()?;
        module.arbitrary_imports(u)?;
        module.arbitrary_funcs(u)?;
        if module.table_imports() == 0 {
            module.table = u.arbitrary()?;
        }
        if module.memory_imports() == 0 {
            module.memory = u.arbitrary()?;
        }
        module.arbitrary_globals(u)?;
        module.arbitrary_exports(u)?;
        module.arbitrary_start(u)?;
        module.arbitrary_elems(u)?;
        module.arbitrary_codes(u)?;
        module.arbitrary_data(u)?;
        Ok(module)
    }
}

That’s everything we need to generate arbitrary, valid Wasm modules!

Using the Test Case Generator

Now we are ready to use our shiny, new Wasm test case generator with a fuzzer! We’ll use the libfuzzer-sys crate, and cargo fuzz. Our new fuzz target accepts an arbitrary Module as input, serializes it into bytes, and passes these bytes into the wasmparser crate’s validator. Finally, it asserts that the module is valid. If this assertion fails, then either the test case generator or the wasmparser crate has a bug.

// fuzz/fuzz_targets/validate.rs

#![no_main]

use libfuzzer_sys::fuzz_target;
use wasm_smith::Module;

// Define a fuzz target that accepts arbitrary
// `Module`s as input.
fuzz_target!(|m: Module| {
    // Convert the module into Wasm bytes.
    let bytes = m.to_bytes();

    // Validate the module and assert that it passes
    // validation.
    let mut validator = wasmparser::Validator::new();
    validator.wasm_features(wasmparser::WasmFeatures {
        multi_value: true,
        ..wasmparser::WasmFeatures::default()
    });
    if let Err(e) = validator.validate_all(&bytes) {
        std::fs::write("test.wasm", bytes).unwrap();
        panic!("Invalid module: {}", e);
    }
});

We can run this fuzz target with cargo fuzz:

$ cargo fuzz run validate

Initially, I used this fuzz target to test my generator, making sure that it was actually generating valid Wasm modules. Then it stopped finding bugs in my generator, and soon after that it started finding bugs in wasmparser’s validation implementation. It has already found five unique validation bugs! And it isn’t like wasmparser has particularly low hanging fruit — we already fuzz wasmparser in a variety of different ways 24/7 on OSS-Fuzz.

Conclusion

Writing a test case generator for a programming language isn’t magic. It isn’t very difficult once you know the pattern to follow. And once you have the test case generator written, it can poke deep into your compiler, past the parser, helping you find a bunch of hidden bugs!

wasm-smith is already quite promising and I intend to keep developing and maintaining it. Next I want to define a bunch of fuzz targets that use wasm-smith for Wasmtime and its ecosystem of crates and tools. I also want to add support for swarm testing, toggling various Wasm proposals (such as multi-value and reference types) on and off, and add a mode that guarantees termination of its generated programs.

*Update: support for swarm testing and ensuring termination now exists. Supporting new Wasm proposals is ongoing, but you can already toggle some of them on or off.

Thanks for reading!

⁰ It should be noted that this technique, while it enables using libFuzzer and AFL with our generator, does not require their use. We can still use a purely random approach by filling a buffer with a random number generator and then using that buffer as our input sequence of predetermined choices. This would, for example, let us fuzz with our test case generator on platforms that aren’t supported by AFL or libFuzzer. ↩