Note: I am cross-posting this article from the Bytecode Alliance blog to my personal blog.
A few weeks ago, I finished implementing support for the WebAssembly reference types proposal in Wasmtime. Wasmtime is a standalone, outside-the-Web WebAssembly runtime, and the reference types proposal is WebAssembly’s first foray beyond simple integers and floating point numbers, into the exciting world of garbage-collected references. This article will explain what the reference types proposal enables, what it leaves for future proposals, and how it is implemented in Wasmtime.
What are Reference Types?
Without the reference types proposal, WebAssembly can only manipulate simple integer and floating point values. It can’t take or return references to the host’s objects like, for example, a DOM node on a Web page or an open connection on a server. There are workarounds: for example, you can store the host objects in a side table and refere to them by index, but this adds an extra indirection and implementing the side table requires cooperating glue code on the host side. That glue code, in particular, is annoying because it is outside of the Wasm sandbox, diluting Wasm’s safety guarantees, and it is host-specific. If you want to use your Wasm module on the Web, in a Rust host, and in a Python host, you’ll need three separate glue code implementations. This makes WebAssembly less attractive as a universal binary format.
With the reference types proposal, you don’t need glue code to interact with host references. The proposal has three main parts:
externreftype, representing an opaque, unforgable reference to a host object.
An extension to WebAssembly tables, allowing them to hold
externrefs in addition to function references.
New instructions for manipulating tables and their entries.
With these new capabilities, Wasm modules can talk about host references directly, rather than requiring external glue code running in the host.
externrefs play nice with WebAssembly’s sandboxing properties:
They are opaque: a Wasm module cannot observe an
externrefvalue’s bit pattern. Passing a reference to a host object into a Wasm module doesn’t reveal any information about the host’s address space and the layout of host objects within it.
They are unforgable: a Wasm module can’t create a fake host reference out of thin air. It can only return either a reference you already gave it or the null reference. It cannot pretend like the integer value
0x1bad2badis a valid host reference, return it to you, and trick you into dereferencing this invalid pointer.
Here’s a Wasm module that exports a
hello function which takes, as an
externref parameter, a reference to an open file and writes “Hello, Reference
Types!” to the file. This module imports the
write syscall from a hypothetical
future version of WASI, the WebAssembly System Interface, that leverages
This Wasm module will run in any WebAssembly environment where WASI is available.0 We don’t need glue code that is maintaining a side table, mapping host references to indices.
For example, we can run
hello.wat from a Python host
environment without any glue code:
And we can run the exact same
hello.wat from a Rust host and without any
Rust-specific bindings or glue:
On the Web
There are many different things an “open file” could be modeled by on the Web. For this demo, we’ll use a DOM node: writing to it will append text nodes. This works well because we know our module is only writing text data. If we were working with binary data, we would choose another polyfilling approach, like in-memory array buffers backing the file data.
hello.wat module and polyfill our
You can view a live demo of this code in the iframe below. It should work in
Firefox 79+, or any other browser that supports reference types (at the time of
writing, Chrome has support for an older version of the reference types proposal
--experimental-wasm-anyref flag, and Safari has in-progress support
What Comes After the Reference Types Proposal?
The in-progress WebAssembly type imports proposal builds on
reference types to let you distinguish between different kinds of host objects:
database connections and open files would be separate types. Or, on the Web,
Promises from DOM nodes. It also adds
references that point to types defined by other Wasm modules, not just host
WASI will want to adopt unforgable reference types for file handles, as the
examples above suggest, instead of using integers. It might make sense for WASI
to wait for type imports, however, so it can use separate types for, say, an
open file and a source of entropy. This is a decision for the WASI subgroup of
the WebAssembly Community Group. Either way, when WASI does make this switch, it
will continue to support integer file descriptors, but implemented inside the
Wasm sandbox as indices into an
externref Wasm table.
Even further in the future, full support for garbage-collected objects in WebAssembly will come eventually.
Now we’ll dive into the details of Wasmtime’s reference types implementation.
externrefs, the host gives Wasm limited access to host objects, but the
host also needs to know when Wasm is finished with those objects, so it can
clean them up. That clean up might involve closing a file handle or simply
deallocating memory. Statically running clean up routines once Wasm returns to
the host is attractive, but unfortunately flawed: Wasm can persist an
externref into a table or global so that it outlives the function call. We do
not, therefore, know ahead of time whether Wasm is still holding references to
host objects or not. We require a dynamic method for determining which objects
are still in use and which are not. This typically involves either reference
counting or a tracing collector or a combination of the two.
This is all to say that implementing the reference types proposal introduced a garbage collector in Wasmtime.
Much of WebAssembly’s value rests on its predictable performance and minimal nondeterminism. Garbage collectors are infamous for their unpredictable pauses, and, if they support finalizers or weak references, nondeterministic behavior. Implementing a collector to support reference types required reconciling these contradictions as much as possible, while also balancing implementation complexity.
The design we settled upon is a deferred reference counting collector, with Cranelift, Wasmtime’s just-in-time (JIT) compiler, producing stack maps for precisely identifying roots inside of Wasm frames. This approach balances predictable latency with throughput. It is more complex than conservative stack scanning, but avoids the nondeterminism inherent in that approach. It also has the desirable property that if a Wasm module doesn’t use reference types, then it can’t trigger garbage collection and the runtime will not impose any other related overhead.
We chose reference counting over tracing garbage collection for a few reasons. First, it lends itself to short and predictable GC pauses. Second, adding reference counting to a runtime that wasn’t built from the ground up with a garbage collector in mind is easier than adding a tracing collector.
Excavating exactly what it is that makes reference counting easier to add
post-facto is a worthwhile exercise, because this has implications for the whole
runtime and for any application intending to embed it. Tracing collectors
identify all live objects, starting from known-live roots, and then any object
that wasn’t found to be live is, by definition, no longer in use, and is
therefore safe to reclaim. Reference counting collectors start from known-dead
roots, find all the dead objects reachable only from those roots, and then
reclaim these known-dead objects. Consider what happens, with each type of
collector, if an object
A fails to reveal to the collector that it is
referencing another object
B. In a tracing collector, if no other object is
B, then collector will conclude that
B is not live and that it
is safe to reclaim
B. But now
A can use
B after it has been reclaimed,
leading to use-after-free memory unsafety. On the other hand, with a reference
counting collector, if
A does not reveal that it has the only reference that
B alive, then
B’s reference count will not be decremented, then
the collector never reclaims
B and the object is permanently leaked. While
B is not ideal, a leak is much safer than a dangling
pointer. Reference counting collectors are easier to add to an existing runtime,
and make embedding that runtime in larger applications easier, because they have
a better failure mode than tracing collectors do. By default, reference counting
collectors safely leak, while tracing collectors unsafely allow use-after-free.
Deferred Reference Counting
In normal reference counting, each time a new reference to an object is created, the object’s reference count is incremented, and each time a reference to the object is dropped, its reference count is decremented. When the reference count reaches zero, the object is deallocated. In deferred reference counting, these increments and decrements are not performed immediately, but are deferred until a later time and then processed together in bulk. This assuages one of reference counting’s biggest weaknesses, the overhead of frequently modifying reference counts, by trading prompt deallocation for better throughput.
With naïve reference counting, every single
WebAssembly instruction operating on a reference type needs to manipulate
reference counts. This leads to many increments and decrements, most of which
are redundant. It also requires code (known as landing pads) for decrementing
the reference counts of the
externrefs inside each frame during stack
unwinding — which can happen, for example, because of an out-of-bounds
heap access trap.
By using deferred reference counting for
externrefs inside of Wasm frames, we
don’t increment or decrement the reference counts at all, unless a reference
escapes a call’s stack frame by being stored into a table or global.
Additionally, we don’t need to generate landing pads for unwinding because
deferred reference counting can already tolerate slack between when a reference
to an object is created or dropped and when the object’s reference count is
adjusted to reflect that.
Wasmtime implements deferred reference counting by maintaining an
over-approximation of the set of
externrefs held alive by Wasm stack
frames. We call this the
VMExternRefActivationsTable. When we pass an
externref into Wasm, we insert it into the table. We do not update the table
as Wasm runs, so as execution drops references, the table becomes an
over-approximation of the set of
externrefs actually present on the stack.
Garbage collection is then composed of two phases. In the first phase, we walk
the stack, building a set of
externrefs currently held alive by Wasm stack
frames. This is the precise set that the
approximates. The second phase reconciles the difference between the
over-approximation and the precise set, decrementing the reference count for
each object that is in the over-approximation but not in the precise set. At the
end of this phase, we reset the
VMExternRefActivationsTable to the new precise
If we are not careful with how we schedule the deferred processing of reference
counts, we risk introducing nondeterminism. Using a timer-based GC schedule, for
example, means that we are at the whims of the operating system’s thread
scheduler and a vast variety of other factors that perturb how much we have
executed in a given amount of time. Instead, we trigger GC whenever the
VMExternRefActivationsTable reaches capacity, or whenever GC is explicitly
requested by the embedder application. As long as the embedder triggers GC
deterministically, we maintain deterministic GC scheduling, and execution
remains deterministic even in the face of finalizers.
It is worth noting that outside of Wasm frames, in native VM code implemented in
Rust, we use regular, non-deferred reference counting: cloning increments the
count and dropping decrements it. However, Rust’s moves and borrows let us avoid
some of the associated overhead. Moving an
externref transfers ownership
without affecting the reference count. Borrowing an
externref lets us safely
use the value without incrementing its reference count, or at least delays the
increment until the borrowed reference is cloned.
For the collector to find the GC roots within a stack frame it requires that the
compiler emit stack maps. A stack map records which words in a stack frame
externrefs at each point in the function where GC may occur. The
stacks maps, taken together, logically record a table of the form:
|Instruction Address||Offsets of Live GC References|
|0x12345678||2, 6, 12|
The offsets denoting where live references are stored within a stack frame are relative to the frame’s stack pointer and are expressed in units of words. Since garbage collection can only occur at certain points within a function, the table is sparse and only has entries for instruction addresses where GC is possible.
Because offsets of live GC references are relative to the stack pointer, and
because stack frames grow down from higher addresses to lower addresses, to get
a pointer to a live reference at offset
x within a stack frame, the collector
x to the frame’s stack pointer. For example, to calculate the pointer to
the live GC reference inside “frame 1” below, the collector would compute
frame_1_sp + x:
Stack +-------------------+ | Frame 0 | | | | | | | +-------------------+ <--- Frame 0's SP | | Frame 1 | Grows | | down | | | | Live GC reference | --+-- | | | | | | | | V | | x = offset of live GC ref | | | | | | +-------------------+ --+-- <--- Frame 1's SP | Frame 2 | | ... |
Each individual stack map is associated with just one instruction address within a compiled Wasm function, contains the size of the stack frame, and represents the stack frame as a bitmap. There is one bit per word in the stack frame; if the bit is set, then the word contains a live GC reference.
The actual stack walking functionality is provided by the
crate, which wraps
libunwind on unix-like
platforms and DbgHelp on Windows. To assist in stack walking, Cranelift emits
the platform’s unwind information:
.eh_frame on unix-like
platforms and structured exception handling for Windows.
Inline Fast Paths for GC Barriers
Despite our deferred reference counting scheme, compiled Wasm code must occasionally manipulate reference counts and run snippets of code called GC barriers. By running GC barriers, the program coordinates with the collector, helping it keep track of objects.
Because of our deferred reference counting, Wasmtime does not need barriers for references that stay within a Wasm function’s scope, but barriers are still needed whenever a reference enters or escapes the function’s scope. Recall that a reference can escape the scope when written into a global or table. It can, similarly, enter a Wasm call’s scope when read from a global or table. Therefore, reading and writing to globals and tables requires barriers.
The barriers for writing a reference into a table slot and a global are similar so, for simplicity, I’ll just refer to tables from now on. These write barriers are responsible for ensuring that:
The new object’s reference count is incremented now that the table is holding a reference to it.
The table element’s prior value, if any, has its reference count decremented.
If the prior element’s reference count reaches zero, then its destructor is called and memory block deallocated.
There are two subtleties here. First, steps 1 and 2, although they may seem independent, must be performed in the given order. If their order were reversed, then if the new object assigned to the table slot and old object currently in the table slot are the same, we would:
Decrement the object’s reference count from one to zero.
Run the object’s destructor and deallocate it.
Re-assign a reference to the (now freed) object into the table slot.
Increment the (now freed) object’s reference count.
That’s a use-after-free bug! To avoid it, we must always increment the new object’s reference count before decrementing the old object’s reference count. This way if they are the same object then the reference count never reaches zero.
The second subtlety is that an object’s destructor can do pretty much anything, including touch the table slot we are currently running GC barriers for. If we encounter this kind of reentrancy, we want the destructor to see the new table element. We do not want it to see the half-deinitialized object and let it attempt to resurrect the object.
Most barrier executions operate on non-null references, and most executions don’t decrement a reference count to zero and destroy an object. Therefore, the JIT emits the reference counting operations inline, and only calls out from Wasm to VM code when destroying an object:
The other scenario for which Wasmtime requires GC barriers is when Wasm reads a
reference from a table (or global), causing the reference to enter the scope of
the Wasm call. Its responsibility is ensuring that these references are safely
held alive by the
VMExternRefActivationsTable has a simple bump allocation
chunk to support fast insertions from inline JIT code. We maintain a
finger pointing within that bump chunk, and an
end pointer pointing just after
next finger is where we we will insert the next new entry into the
bump chunk, unless
next is equal to
end which means that the bump chunk is
at full capacity. When that happens we are forced to call an out-of-line slow
path that will trigger GC to free up space.
The reference types proposal is WebAssembly’s first expansion beyond simple integers and floating point numbers, requiring that Wasmtime grow a garbage collector. It also cuts down on the amount of module-specific and host-specific glue code. Future proposals, like interface types and module linking, should completely remove the need for such glue.
Thanks to Alex Crichton for reviewing the
bulk of this work, and exposing reference types in the
wasmtime-go API. Thanks
to Dan Gohman for reviewing the code that
implements the inline fast paths for GC barriers and the
interface changes it required. Thanks to Peter
Huene for exposing reference types in
wasmtime-dotnet. Finally, thanks to Jim
Blandy, Dan Gohman, and Alex Crichton for
reading early drafts of this article and providing valuable feedback.
0 However, since this is a hypothetical
future version of WASI, we will need to temporarily define our own version of
write syscall. We will, furthermore, need to define this
once for each host.
These polyfills are only required until WASI is updated to leverage reference types. ↩