Maxime Chevalier-Boisvert requested resources for learning about fuzzing programming language implementations on Twitter:
I’d like to learn about fuzzing, specifically fuzzing programming language implementations. Do you have reading materials you would recommend, blog posts, papers, books or even recorded talks?
Maxime received many replies linking to informative papers, blog posts, and lectures. John Regehr suggested writing a simple generative fuzzer for the programming language.
A generative fuzzer combines a test case generator with the system under test (e.g. your compiler), generating new test cases and feeding them into the system:
I realized that many people might not know what it takes to write their own generative fuzzer, so this blog post shows one aspect of it: implementing a test case generator.
Our test case generator will generate WebAssembly programs. While WebAssembly has its own quirks — it’s a binary format and is generally a compilation target rather than a source language — it is a small and simple language. The techniques we use when generating WebAssembly should transfer to generating the programming language of your choice.
If you want to skip the exposition and jump head first into the code, here is the repository for our final test case generator.
Table of Contents
- What is a Test Case Generator?
- Getting Set Up
- Translating Grammars into Generators
- Generating the Type Section
- Generating the Import Section
- Generating the Code Section
- Using the Test Case Generator
What is a Test Case Generator?
Test case generators generate test cases. These test cases are always within the test domain: no cycles are wasted on invalid inputs, such as source text that fails to parse. Compare this to mutation-based fuzzing, where existing seed inputs are mutated to produce new inputs. In general, nothing guarantees that the new, mutated input is still within the test domain: the mutation may have introduced a syntax error. This property, that generated inputs are always within the test domain, is generative fuzzing’s main advantage and the test case generator’s main responsibility.
A test case generator should, additionally, support every feature of its target programming language. You won’t discover a bug in your compiler’s handling of switch statements if the test case generator doesn’t support generating switch statements. Pushing this idea even further, the test case generator should uniformly sample from the test domain. If the test case generator can generate switch statements but the probability of doing so is nearly zero, then you likely still won’t find that bug. However, uniformly sampling from the infinite set of all programs that can be written in a particular programming language is nontrivial and an area of active research.
A test case generator should, finally, be fast. The faster we can generate test cases, the faster we will discover bugs. If the generator is too slow, we can blow our time budget, failing to find those bugs at all.
Getting Set Up
First, we create a new crate with cargo. We’ll name this crate wasm-smith, giving a little nod to Csmith, the popular C program generator. Second, we add the arbitrary crate as a dependency:
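Adding the dependency might look something like this in Cargo.toml (the version requirement is illustrative; the derive feature lets us derive Arbitrary implementations for our types later):

```toml
[dependencies]
arbitrary = { version = "1", features = ["derive"] }
```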
The arbitrary crate helps us generate structured data from arbitrary bytes. It is typically used in combination with libFuzzer to translate the raw bytes given to us by libFuzzer into something that the system you’re testing can process. For example, a color conversion library might use arbitrary to turn the raw fuzzer-provided bytes into Hsl color types. We will use it in a similar way for this project, translating raw bytes given to us by libFuzzer into semantically valid WebAssembly modules.
The arbitrary crate’s main export is the Arbitrary trait. Its arbitrary method takes an Unstructured, which is a helpful wrapper around a byte slice, and returns an instance of the type for which the trait is implemented. In wasm-smith, we define a Module type that represents our pseudo-random WebAssembly modules, and then we implement the Arbitrary trait for it. Before we fill in that todo!(), let’s take a moment to settle on a design for what the implementation will look like.
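To keep this post self-contained, here is the rough shape of that starting point, with a minimal stand-in for the arbitrary crate’s Unstructured type (the real crate’s API differs in detail, e.g. its methods return Results):

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type:
// a cursor over the raw bytes the fuzzer gives us.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }

    // Take one byte of "choice" from the raw input; default to 0 when empty.
    pub fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
}

// Our pseudo-random WebAssembly module. We will grow fields for each
// section (types, imports, code, ...) as we implement them.
pub struct Module {
    // ...
}

impl Module {
    // The real crate expresses this as `impl Arbitrary for Module`;
    // the shape is the same.
    pub fn arbitrary(_u: &mut Unstructured) -> Module {
        todo!()
    }
}
```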
Translating Grammars into Generators
Writing a generator is remarkably similar to hand-writing a recursive descent parser, so if you’ve done that before, then you should feel right at home. For example, given this grammar production (borrowed and lightly edited from the C++ name mangling grammar):
<class-enum-type> ::= Ts <name> | Tu <name> | Te <name>
A recursive descent parser will, almost mechanically, translate the production into something like this:
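For instance, a hand-written parser for this production might look like the following sketch (the AST type and parse_name helper are hypothetical simplifications; real manglings give &lt;name&gt; its own grammar):

```rust
// Hypothetical AST for the production.
#[derive(Debug, PartialEq)]
pub enum ClassEnumType {
    Struct(String), // "Ts <name>"
    Union(String),  // "Tu <name>"
    Enum(String),   // "Te <name>"
}

// Parse a `<name>`; simplified here to "take the rest of the input".
fn parse_name(input: &str) -> Option<String> {
    if input.is_empty() {
        None
    } else {
        Some(input.to_string())
    }
}

// Peek at the input to decide which right-hand side of the production
// to parse.
pub fn parse_class_enum_type(input: &str) -> Option<ClassEnumType> {
    let rest = input.get(2..)?;
    match input.get(..2)? {
        "Ts" => Some(ClassEnumType::Struct(parse_name(rest)?)),
        "Tu" => Some(ClassEnumType::Union(parse_name(rest)?)),
        "Te" => Some(ClassEnumType::Enum(parse_name(rest)?)),
        _ => None,
    }
}
```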
Our generator will do something similar, except instead of peeking at the input string to decide which right-hand side of the production to parse, we will make a pseudo-random choice to generate one of those potential right hand sides.
We could use a random number generator directly to make these choices, but this has two problems:
We give up determinism unless we are careful to control the RNG’s seed and reuse the same RNG everywhere, threading it through all of our functions as a parameter. Determinism is extremely important for reproducing test failures! It’s definitely possible to do these things, but can occasionally be a little annoying.
More importantly, using an RNG precludes a mature fuzzing engine, like libFuzzer, from guiding our test case generation based on code coverage and other insights.
Instead, we use a raw input byte slice given to us by libFuzzer or AFL as a sequence of predetermined choices.0 This lets the fuzzer guide our test case generation, and gives us test case reduction “for free” since we can ask the fuzzer to reduce the raw input sequence, rather than write a domain-specific test case reducer. This comes as a relief because writing a reducer that understands WebAssembly is easily as much effort as writing the generator itself.
Here is the same C++ mangling example from above, but translated from a parser into a generator, using the Unstructured input to make each choice.
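Here is a sketch of what that might look like (with a minimal stand-in for the arbitrary crate’s Unstructured type, and a hypothetical generate_name helper):

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
}

// Instead of peeking at an input string, make a pseudo-random choice
// among the three right-hand sides of the production.
pub fn generate_class_enum_type(u: &mut Unstructured, out: &mut String) {
    match u.byte() % 3 {
        0 => out.push_str("Ts"),
        1 => out.push_str("Tu"),
        _ => out.push_str("Te"),
    }
    generate_name(u, out);
}

// Hypothetical helper: emit a short, length-prefixed name, as in the
// mangling grammar's source names.
fn generate_name(u: &mut Unstructured, out: &mut String) {
    let c = (b'a' + u.byte() % 26) as char;
    out.push_str("1");
    out.push(c);
}
```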
Once again, this is mostly mechanical.
This pattern will generate syntactically correct test cases that can be parsed successfully, but which likely contain a plethora of type errors, calls to undefined functions, etc. However, we set out to generate semantically correct test cases that pass type checking and will exercise more than just the language implementation’s frontend.
Our final pattern maintains some extra information about the program we’ve
generated thus far, so that we can consult that information when generating new
forms. This extra information might include which names are in scope, the types
of each variable, etc. We consult that information while dynamically building up
thunks for every valid option we could generate. Once we have enumerated every
option, we ask the
Unstructured to choose one of them, and finally we call the
chosen thunk to generate the form.
Here is an example of using this pattern for generating integer expressions, where an integer expression is either a constant integer, an arithmetic operation, a use of an integer variable, or a call of a function that returns an integer:
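Below is a self-contained sketch of this pattern (the IntExpr and Scope types are hypothetical, and a minimal stand-in replaces the arbitrary crate’s Unstructured):

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
    // Choose an index in `0..len`.
    fn choose(&mut self, len: usize) -> usize {
        self.byte() as usize % len
    }
}

// Hypothetical AST for integer expressions.
#[derive(Debug, PartialEq)]
pub enum IntExpr {
    Const(i32),
    Add(Box<IntExpr>, Box<IntExpr>),
    Var(String),
    Call(String),
}

// Scope information gathered while generating the program so far.
pub struct Scope {
    pub int_vars: Vec<String>,
    pub int_funcs: Vec<String>,
}

pub fn generate_int_expr(u: &mut Unstructured, scope: &Scope) -> IntExpr {
    // Dynamically build up a thunk for every valid option...
    let mut options: Vec<fn(&mut Unstructured, &Scope) -> IntExpr> = Vec::new();
    options.push(|u, _| IntExpr::Const(u.byte() as i32));
    options.push(|u, scope| {
        IntExpr::Add(
            Box::new(generate_int_expr(u, scope)),
            Box::new(generate_int_expr(u, scope)),
        )
    });
    // ...consulting the scope so we only use variables and callees that exist...
    if !scope.int_vars.is_empty() {
        options.push(|u, scope| {
            IntExpr::Var(scope.int_vars[u.choose(scope.int_vars.len())].clone())
        });
    }
    if !scope.int_funcs.is_empty() {
        options.push(|u, scope| {
            IntExpr::Call(scope.int_funcs[u.choose(scope.int_funcs.len())].clone())
        });
    }
    // ...then ask the raw input to choose one of them and call the chosen thunk.
    let f = options[u.choose(options.len())];
    f(u, scope)
}
```

Note that when the raw input runs out, every choice defaults to 0, which selects the non-recursive Const option, so generation terminates.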
Finally, it is worth noting that, similar to how parser generators take a grammar and generate a parser, there are tools that will take a grammar and generate a test case generator (and even Rust implementations of those tools).
Generating the Type Section
Now we’re ready to continue generating WebAssembly!
The first section in a WebAssembly module is the type section. It declares the function type signatures used in the module and it has the following grammar:
<typesec>  ::= 0x01 <size:u32> <num_funcs:u32> <functype>*
<functype> ::= <num_params:u32> <valtype>* <num_results:u32> <valtype>*
<valtype>  ::= 0x7F # i32
             | 0x7E # i64
             | 0x7D # f32
             | 0x7C # f64
This is not context free: size is the size of the whole section in bytes, num_funcs is the number of following <functype>s, num_params defines how many parameter <valtype>s a function type has, and num_results defines how many result <valtype>s it has. But none of this comes into play until we actually serialize the module into bytes. Until then, a type section is just a sequence of function type signatures, and these signatures don’t need access to any scope information, so we can derive the Arbitrary implementation directly for them:
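A sketch of what those definitions might look like. The real code simply derives Arbitrary; the hand-written arbitrary methods below (driven by a minimal Unstructured stand-in) only illustrate the idea:

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Debug, PartialEq)]
pub struct FuncType {
    pub params: Vec<ValType>,
    pub results: Vec<ValType>,
}

impl ValType {
    fn arbitrary(u: &mut Unstructured) -> ValType {
        match u.byte() % 4 {
            0 => ValType::I32,
            1 => ValType::I64,
            2 => ValType::F32,
            _ => ValType::F64,
        }
    }
}

// Any sequence of value types is a valid signature: no scope
// information is required.
fn arbitrary_val_types(u: &mut Unstructured) -> Vec<ValType> {
    let len = u.byte() % 4; // keep signatures short for this sketch
    (0..len).map(|_| ValType::arbitrary(u)).collect()
}

impl FuncType {
    pub fn arbitrary(u: &mut Unstructured) -> FuncType {
        FuncType {
            params: arbitrary_val_types(u),
            results: arbitrary_val_types(u),
        }
    }
}
```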
And since a module has a type section, let’s hook this up into our Module definition and its Arbitrary implementation:
Generating the Import Section
The import section is the first section where we will need to look at what we’ve previously generated when generating new items. An import brings an external function, table, memory, or global into scope. When importing a function, we need to specify the function’s type via an index into the types section. This means we can only generate a function import if our types section is non-empty.
The structure definitions and Arbitrary implementations for table and global types are straightforward, and nothing different from what we saw with the type section, so I’ve elided them here. The one thing to note is that the largest memory a Wasm module can declare is 4GiB, so the Arbitrary implementation for memory types must take that into account:
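Here is a sketch of that constraint, again with a minimal Unstructured stand-in. Wasm memories are counted in 64 KiB pages, so 4 GiB corresponds to 65,536 pages:

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
    fn u32(&mut self) -> u32 {
        u32::from_le_bytes([self.byte(), self.byte(), self.byte(), self.byte()])
    }
}

// Wasm memories are counted in 64 KiB pages; 4 GiB is 65,536 pages.
const MAX_PAGES: u32 = 65_536;

#[derive(Debug, PartialEq)]
pub struct MemoryType {
    pub minimum: u32,
    pub maximum: Option<u32>,
}

impl MemoryType {
    pub fn arbitrary(u: &mut Unstructured) -> MemoryType {
        // Clamp the minimum into 0..=MAX_PAGES.
        let minimum = u.u32() % (MAX_PAGES + 1);
        // If a maximum is present, keep it between `minimum` and MAX_PAGES.
        let maximum = if u.byte() & 1 == 1 {
            Some(minimum + u.u32() % (MAX_PAGES - minimum + 1))
        } else {
            None
        };
        MemoryType { minimum, maximum }
    }
}
```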
When generating an arbitrary import, we dynamically build up a list of functions, one for each semantically valid choice we can make. Then we use the next bit of raw input from the Unstructured to choose one of those functions and call it to create an import. We keep creating more imports while the raw input tells us to — we read a bool from the Unstructured and stop adding imports once we read false.
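A simplified, self-contained sketch of this loop, handling only function imports (the Unstructured stand-in and its bool-as-low-bit encoding are illustrative; the real code also offers table, memory, and global imports as choices):

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
    // The real crate reads a `bool` directly; we model it with a byte's
    // low bit, and an exhausted input yields `false`, ending the loop.
    fn bool(&mut self) -> bool {
        self.byte() & 1 == 1
    }
}

#[derive(Debug, PartialEq)]
pub enum Import {
    Func { type_index: u32 },
    // Table, Memory, and Global variants elided for brevity.
}

pub fn arbitrary_imports(u: &mut Unstructured, num_types: u32) -> Vec<Import> {
    let mut imports = Vec::new();
    // Keep creating imports while the raw input tells us to.
    while u.bool() {
        // A function import is only semantically valid if there is a type
        // signature for it to reference.
        if num_types == 0 {
            break;
        }
        let type_index = u.byte() as u32 % num_types;
        imports.push(Import::Func { type_index });
    }
    imports
}
```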
Hooking this up to Module and its Arbitrary implementation is, once again, straightforward.
Generating the Code Section
There are a few more sections before the code section, which contains function bodies, but their implementation is similar to what we’ve already seen with the type and import sections, so we’ll skip them.
WebAssembly is a typed stack-based language with structured control flow. There is the operand stack, where values are pushed and popped, and there is the control stack, which keeps track of active control frames (e.g. blocks) and their labels that can be jumped to. Whether an instruction is valid depends on the types on the operand stack and, if the instruction is a control instruction, the labels on the control stack. Therefore, we will explicitly model these stacks while generating instructions. Choosing which instruction to generate will consult the contents of these stacks, and, once a choice is made, generating the new instruction will update them accordingly.
Let’s begin with the basic definitions for value types and control frames:
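These definitions might look roughly like this (field names are illustrative; the shape mirrors the WebAssembly spec’s validation algorithm):

```rust
// The four WebAssembly value types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

// Which kind of instruction introduced a control frame. We track this
// because, for example, an `else` is only valid when the top frame was
// introduced by an `if`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ControlKind {
    Block,
    If,
    Loop,
}

// One active frame on the control stack.
#[derive(Debug)]
pub struct Control {
    pub kind: ControlKind,
    // The types the frame takes from and leaves on the operand stack.
    pub params: Vec<ValType>,
    pub results: Vec<ValType>,
    // The operand stack's height when this frame was entered.
    pub height: usize,
}
```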
We model the control stack as a vector of Controls, and the operand stack as a vector of optional value types. The Option is introduced because unreachable code can produce values of any type, and these unknown types are modeled as None.
In order to generate a function body, we’ll also need to know the function’s
parameter and result types, and the local variables available. We’ll collect
these things and our stacks in a
CodeBuilder type. To avoid thrashing the
memory allocator, we factor the operand and control stacks out into a separate
structure that is reused across the creation of each function body. This
structure also contains the vector of options that we build up every time we
generate an instruction.
This is basically the same information that WebAssembly’s validation algorithm requires. Our generator is similar to a WebAssembly validator for the same reasons that a generator that emits syntactically correct test cases is similar to a recursive descent parser.
Because there are so many instructions, we won’t write the validation checks and the thunks that generate the instructions inline like we did for previous sections. Instead we will write pairs of functions: the first checks whether a given instruction would be valid to generate in the current context, and the second generates that instruction and updates the context.
Despite this small change in code organization, we will use these function pairs to accomplish the same task as before: dynamically building up a list of functions, one for each instruction that would be valid in the current context, and then choosing one of them and calling it to generate an instruction.
Let’s start with an easy example: generating the i32.add instruction. An i32.add is valid when there are two i32s on the operand stack. There are many instructions that are valid to generate when there are two i32s on the operand stack, so rather than naming the predicate function something like i32_add_is_valid, we give it a name that describes the shared precondition. i32.add pops the two i32s and pushes their sum. The function that generates an i32.add needs to do the same to our model of the operand stack:
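A self-contained sketch of such a pair of functions, with the builder cut down to just an operand stack (the names are hypothetical; None models the unknown types produced by unreachable code):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Debug, PartialEq)]
pub enum Instruction {
    I32Add,
}

// A cut-down builder: just the operand stack.
pub struct CodeBuilder {
    pub operands: Vec<Option<ValType>>,
}

// Valid whenever the top two operands are `i32`s. `None` entries come
// from unreachable code and can stand in for any type.
pub fn two_i32s_on_stack(b: &CodeBuilder) -> bool {
    b.operands.len() >= 2
        && b.operands[b.operands.len() - 2..]
            .iter()
            .all(|ty| *ty == Some(ValType::I32) || ty.is_none())
}

// Pop the two `i32`s and push their sum's type, mirroring what the
// instruction does to the real operand stack at run time.
pub fn i32_add(b: &mut CodeBuilder) -> Instruction {
    b.operands.pop();
    b.operands.pop();
    b.operands.push(Some(ValType::I32));
    Instruction::I32Add
}
```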
Here is another simple example. An f64.const instruction is always valid. It reads an f64 immediate from the raw input and pushes that f64 onto the operand stack:
Things start getting more interesting when we consider instructions that introduce new control frames. An if instruction requires an i32 on top of the operand stack, which determines whether its truth-y or false-y code path is executed. It can also take additional operand stack parameters, making them available to further instructions within its control frame. And while it can take stack parameters, we only require an i32 on the stack, because we can always generate ifs with empty stack parameters.
An else instruction is only valid when the top control frame was introduced by an if and the types on (the accessible portion of) the operand stack are exactly the top control frame’s result types. An else instruction resets the operand stack to its state at the moment that the if was introduced. It also changes the control kind to ControlKind::Block so that we don’t generate instruction sequences with two elses for the same if, like if ... else ... else ... end.
An end instruction finishes a control sequence that was introduced by an if, a block, or a loop instruction. Similar to an else, it is only valid when we have exactly the current control frame’s result types on the operand stack. The final thing to check is that, if the top control frame was introduced by an if, and the if does not leave the stack as it was upon entering (i.e. its parameter and result types are not equal), then we can’t generate an end. This scenario requires that we generate an else first, since otherwise executing no instructions when the if’s truth-y path is not taken would leave the stack in a type-mismatched state.
The final instruction we will examine is
call. Whether a
call is valid or
not depends on the callee function’s type signature, and therefore whether we
can generate a
call or not depends on whether any of the module’s functions
have a type signature compatible with the current operand stack.
To generate a
call instruction, we choose which function we are calling and
update the operand stack accordingly:
Generating Instruction Sequences
Currently, the core WebAssembly specification defines 189 instructions, and I’ve implemented support for all of them in wasm-smith. That’s too many to reproduce here, but you can check them out in the repository if you’re curious. We’ll now turn our attention to using the pairs of functions we’ve defined for generating individual instructions to generate the sequences of instructions that make up a whole function body.
We keep generating new instructions until either there aren’t any control frames left (meaning that we’ve returned from the function) or the Unstructured tells us to stop. To generate one instruction, we filter OPTIONS down to just those options that are valid within the current context, ask the Unstructured to choose one of them, and call the chosen function to generate the instruction. The new instruction is added to our function body and we repeat the process.
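The loop just described can be sketched as follows, with a drastically simplified context and a three-instruction OPTIONS table (all names are illustrative):

```rust
// Minimal stand-in for the `arbitrary` crate's `Unstructured` type.
pub struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }
    fn is_empty(&self) -> bool {
        self.data.is_empty()
    }
}

// A drastically simplified context: we track only how many operands and
// control frames are live, not their types.
pub struct Builder {
    pub operands: usize,
    pub frames: usize,
}

type Check = fn(&Builder) -> bool;
type Gen = fn(&mut Builder) -> &'static str;

fn always(_: &Builder) -> bool { true }
fn two_operands(b: &Builder) -> bool { b.operands >= 2 }
fn in_frame(b: &Builder) -> bool { b.frames > 0 }

fn gen_const(b: &mut Builder) -> &'static str { b.operands += 1; "i32.const" }
fn gen_add(b: &mut Builder) -> &'static str { b.operands -= 1; "i32.add" }
fn gen_end(b: &mut Builder) -> &'static str { b.frames -= 1; "end" }

// One (check, generate) pair per instruction; the real table has hundreds.
const OPTIONS: &[(Check, Gen)] = &[
    (always, gen_const),
    (two_operands, gen_add),
    (in_frame, gen_end),
];

pub fn arbitrary_func_body(u: &mut Unstructured) -> Vec<&'static str> {
    // Start inside the function's own implicit control frame.
    let mut b = Builder { operands: 0, frames: 1 };
    let mut body = Vec::new();
    // Generate until we have left every control frame or the input says stop.
    while b.frames > 0 && !u.is_empty() {
        // Filter OPTIONS down to the choices valid in the current context,
        // let the raw input pick one, and call it to generate an instruction.
        let valid: Vec<&(Check, Gen)> =
            OPTIONS.iter().filter(|(check, _)| check(&b)).collect();
        let (_, gen) = *valid[u.byte() as usize % valid.len()];
        body.push(gen(&mut b));
    }
    // The real code then closes any frames still open when the input runs out.
    body
}
```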
If the Unstructured tells us to stop generating instructions before we’ve exited all control frames, then the instruction sequence most likely is not at a valid stopping point. Either there are unfinished control frames, or we have the wrong types on the operand stack for a return. The end_active_control_frames method resolves these conflicts. When it can, it leaves the operands on the stack as a control frame’s results or a function’s return values. When the operand stack doesn’t align with the control frame’s results, it inserts an unreachable instruction. This instruction makes the sequence validate, but will trigger a trap when executed.
The CodeBuilderAllocations are created in the arbitrary_code method, which generates the module’s code section. The code section contains the bodies of the functions locally defined in the module. For each function, arbitrary_code calls arbitrary_func_body, which creates locals for the function and a CodeBuilder that reuses the allocations and generates the instruction sequence for the function body. Finally, here is the full Arbitrary implementation for Module, including calls to each of the methods that generate the sections whose descriptions we elided in this blog post:
That’s everything we need to generate arbitrary, valid Wasm modules!
Using the Test Case Generator
Now we are ready to use our shiny, new Wasm test case generator with a fuzzer!
We’ll use the libfuzzer-sys crate and cargo fuzz. Our new fuzz target accepts an arbitrary Module as input, serializes it into bytes, and passes those bytes into the wasmparser crate’s validator. Finally, it asserts that the module is valid. If this assertion fails, then either the test case generator or the wasmparser crate has a bug.
We can run this fuzz target with cargo fuzz’s run subcommand.
Initially, I used this fuzz target to test my generator, making sure that it was actually generating valid Wasm modules. Then it stopped finding bugs in my generator, and soon after that it started finding bugs in wasmparser’s validation implementation. It has already found multiple bugs! And it isn’t that wasmparser has particularly low-hanging fruit — we already fuzz wasmparser in a variety of different ways, 24/7.
Writing a test case generator for a programming language isn’t magic. It isn’t very difficult once you know the pattern to follow. And once you have the test case generator written, it can poke deep into your compiler, past the parser, helping you find a bunch of hidden bugs!
wasm-smith is already quite promising and I intend to keep developing and
maintaining it. Next I want to define a bunch of fuzz targets that use
wasm-smith for Wasmtime and its ecosystem of crates and tools. I also want
to add support for swarm
testing, toggling various
Wasm proposals (such as multi-value and reference types) on and off, and add a
mode that guarantees termination of its generated programs.
Thanks for reading!
0 It should be noted that this technique, while it enables using libFuzzer and AFL with our generator, does not require their use. We can still use a purely random approach by filling a buffer with a random number generator and then using that buffer as our input sequence of predetermined choices. This would, for example, let us fuzz with our test case generator on platforms that aren’t supported by AFL or libFuzzer. ↩