Note: I am cross-posting this to my personal blog from hacks.mozilla.org, where it was originally published.
Tom Tromey and I have replaced the most performance-sensitive portions of the source-map JavaScript library's parser with Rust code that is compiled to WebAssembly. The WebAssembly is up to several times faster than the original JavaScript on benchmarks operating on real world source maps. Performance is also more consistent: relative standard deviations decreased.
- Parsing and Querying Source Maps in Rust
- Conclusions and Future Work
The Source Map Format
The source map format provides a way of mapping locations in code that was emitted by a compiler, minifier, or package bundling tool back to locations in the original sources. Developer tools use source maps to symbolicate backtraces, and to implement source-level stepping in debuggers. Source maps encode debug information similar to that found in native debugging formats.
A source map is a JSON object with a handful of fields. The "mappings" field is a string that makes up the bulk of the source map, and contains the bidirectional mappings between source and object locations. We describe the "mappings" string's grammar in extended Backus-Naur form (EBNF).
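Roughly, the grammar looks like the following sketch, reconstructed from the format's semantics (the nonterminal names are ours):

```ebnf
<mappings>       = [ <generated-line> ] { ";" [ <generated-line> ] } ;
<generated-line> = <mapping> { "," <mapping> } ;
<mapping>        = <generated-column>
                   [ <source> <original-line> <original-column> [ <name> ] ] ;
```

Each of `<generated-column>`, `<source>`, `<original-line>`, `<original-column>`, and `<name>` is a single VLQ-encoded integer, described next.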
Every component of a mapping is a Variable Length Quantity (VLQ) encoded
integer. Filenames and associated names are encoded as indices into side tables
stored in other fields of the source map’s JSON object. Every value is relative
to the last occurrence of its type, i.e. a given
<source> value is the delta
since the previous
<source> value. These deltas tend to be smaller than their
absolute values, which means they are more compact when encoded.
Each character in a VLQ is drawn from a set of 64 printable ASCII characters comprising the upper- and lower-case letters, the decimal digits, and some symbols. Each character represents a particular 6-bit value. The most significant bit is set on all but the last character in the VLQ; the remaining five bits are prepended to the integer value represented.
Rather than attempting to translate this definition into EBNF, we provide pseudocode for parsing and decoding a single VLQ below:
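Sketched in Rust rather than pseudocode, the procedure looks roughly like this (the helper names are ours, not the `vlq` crate's actual API):

```rust
// Base 64 alphabet used by source map VLQs.
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one base 64 digit to its 6-bit value, or None if invalid.
fn digit_value(byte: u8) -> Option<u64> {
    B64.iter().position(|&b| b == byte).map(|i| i as u64)
}

/// Decode a single VLQ from the front of `input`, returning the decoded
/// value and the unconsumed rest of the input.
fn decode_vlq(input: &[u8]) -> Option<(i64, &[u8])> {
    let mut accum: u64 = 0;
    let mut shift = 0;
    let mut rest = input;
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;
        let digit = digit_value(byte)?;
        // The low five bits are prepended (as more significant bits) to
        // the accumulated value.
        accum |= (digit & 0b1_1111) << shift;
        shift += 5;
        // The sixth bit is the continuation bit; stop when it is clear.
        if digit & 0b10_0000 == 0 {
            break;
        }
    }
    // The least significant bit of the decoded quantity is the sign.
    let value = (accum >> 1) as i64;
    Some((if accum & 1 == 1 { -value } else { value }, rest))
}
```

For example, the digit `"C"` (value 2) decodes to +1, and `"D"` (value 3) decodes to -1, because the low bit carries the sign.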
The source-map library is published on npm and maintained by the Firefox Developer Tools team at Mozilla. It is one of the most heavily used source map libraries, and is downloaded 10M times per week. Like many software projects, the source-map library was written first for correctness, and then later modified to improve performance. By now, it has seen a good amount of performance work.
When consuming source maps, the majority of time is spent parsing the "mappings" string and constructing a pair of arrays: one sorted by generated location, the other sorted by original location. Queries are then answered with a binary search on the appropriate array. The parsing and sorting both happen lazily, and don't occur until a particular query requires them. This allows a debugger to list sources, for example, without needing to parse and sort the mappings. Once parsed and sorted, querying tends not to be a performance bottleneck.
The VLQ decoding function takes a string as input, parses and decodes a VLQ from the string, and returns a pair of the value and the rest of the input that was not consumed. This function was originally written in an idiomatic, referentially transparent style, returning the pair as an Object literal with two properties, which meant allocating a new Object on every call.
To remove the allocation, we modified the
procedure to take a second
Object that gets mutated and serves as an out parameter. Rather
than returning a freshly allocated
Object, the out parameter’s properties are
overwritten. This way, we could reuse the same object for every VLQ parse. This
is less idiomatic, but much more performant.
When reaching an unexpected end-of-string or an invalid base 64 character, the VLQ decoding function would throw an Error. We found that the JIT would emit faster code when we changed the single base 64 digit decoding function to return a sentinel value signaling failure instead of throwing an Error. This is less idiomatic, but once again it improved performance.
When profiling with a SpiderMonkey JIT-diagnostics prototype, we found that SpiderMonkey's JIT was using slow-path polymorphic inline caches for Object property gets and sets on our mapping Objects. The JIT was not emitting the desired fast-path code with direct object slot accesses, because it was not detecting a common "shape" (or "hidden class") shared by all mapping Objects. Some properties would be added in a different order, or omitted completely, for example when there was no associated name for a mapping. By creating a constructor for mapping Objects that initialized every property we would ever use, we helped the JIT create a common shape for all mapping Objects. This resulted in another performance improvement.
When sorting the two mapping arrays, we use custom comparator functions. When the source-map library was first written, SpiderMonkey's Array.prototype.sort was implemented in C++ for performance. However, when sort is passed an explicit comparator function and a large array, the sorting code must repeatedly call back from C++ into the JavaScript comparator, and crossing that boundary is expensive. As a result, passing a custom comparator function made the sorts perform poorly.
WebAssembly is a new, low-level, architecture-independent byte code developed for the Web. It is designed for safe and efficient execution and compact binaries, and is developed in the open as a Web standard. It is supported by all major Web browsers.
WebAssembly exposes a stack machine that maps well onto modern processor
architectures. Its instructions operate on a large, linear buffer of memory. It
does not support garbage collection, although extending it to interoperate with garbage collectors is planned for the future. Control flow is
structured, rather than allowing arbitrary jumps and labels. It is designed to
be deterministic and consistent, even where different architectures diverge on
edge cases, such as out-of-range shifts, overflows, and NaN canonicalization.
WebAssembly aims to match or beat native execution speed. It is currently within 1.5x native execution on most benchmarks.
Because of the lack of a garbage collector, languages that target WebAssembly are currently limited to languages without a runtime or garbage collector to speak of, unless the collector and runtime are also compiled to WebAssembly. That doesn’t happen in practice. Right now, the languages developers are actually compiling to WebAssembly are C, C++, and Rust.
Rust is a systems programming language that emphasizes safety and speed. It provides memory safety without relying on a garbage collector. Instead, it statically tracks ownership and borrowing of resources to determine where to emit code to run destructors or free memory.
Rust is in a great position for compiling to WebAssembly. Since Rust does not require garbage collection, Rust authors don’t have to jump through extra hoops to target WebAssembly. Rust has many of the goodies that Web developers have come to expect, and which C and C++ lack:
- Easy building, packaging, publishing, sharing, and documenting of libraries. Rust has cargo and the crates.io ecosystem; C and C++ don't have any widely used equivalent.
- Memory safety, meaning no more chasing down memory corruption in gdb. Rust prevents most of these pitfalls at compile time.
Parsing and Querying Source Maps in Rust
If we were to rewrite just one function in WebAssembly, the natural candidate would have been the VLQ decoding function. However, the VLQ decoding function is invoked between one and four times for every mapping encoded in the "mappings" string, and calling from JavaScript into WebAssembly would mean crossing the boundary that many times during parsing.
Instead, we decided to parse the whole "mappings" string in Rust/WebAssembly, and then keep the parsed data there. The WebAssembly heap owns the parsed data in its queryable form. This means that we never have to copy the data out of the WebAssembly heap, nor cross the JavaScript/WebAssembly boundary more often than necessary. Instead, we only cross that boundary once each way for every query, plus once each way for every mapping that is a result of that query. Queries tend to produce a single result, or perhaps a handful.
To have confidence that our Rust implementation was correct, we not only ensured that the new implementation passed the source-map library's existing test suite, but we also wrote a suite of randomized property tests. These tests construct random "mappings" input strings, parse them, and then assert that various properties hold.
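The idea can be sketched at the VLQ layer with a hand-rolled round-trip property (the real suite presumably uses a property-testing crate; the encode/decode helpers here are illustrative, not the library's API):

```rust
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode `value` as a base 64 VLQ, appending digits to `out`.
fn encode_vlq(value: i64, out: &mut Vec<u8>) {
    // The sign goes in the least significant bit.
    let mut n: u64 = if value < 0 {
        (((-value) as u64) << 1) | 1
    } else {
        (value as u64) << 1
    };
    loop {
        let mut digit = (n & 0b1_1111) as usize;
        n >>= 5;
        if n != 0 {
            digit |= 0b10_0000; // set the continuation bit
        }
        out.push(B64[digit]);
        if n == 0 {
            break;
        }
    }
}

// Decode a single VLQ from the front of `input`.
fn decode_vlq(input: &[u8]) -> Option<(i64, &[u8])> {
    let (mut accum, mut shift, mut rest) = (0u64, 0, input);
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;
        let digit = B64.iter().position(|&b| b == byte)? as u64;
        accum |= (digit & 0b1_1111) << shift;
        shift += 5;
        if digit & 0b10_0000 == 0 {
            break;
        }
    }
    let value = (accum >> 1) as i64;
    Some((if accum & 1 == 1 { -value } else { value }, rest))
}

/// Property: encoding then decoding round-trips, for pseudo-random
/// values drawn from a simple xorshift generator.
fn round_trip_holds(iterations: u32) -> bool {
    let mut seed: u64 = 0x9E37_79B9_7F4A_7C15;
    for _ in 0..iterations {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        let value = (seed as i64) % 1_000_000;
        let mut encoded = Vec::new();
        encode_vlq(value, &mut encoded);
        match decode_vlq(&encoded) {
            Some((decoded, rest)) if decoded == value && rest.is_empty() => {}
            _ => return false,
        }
    }
    true
}
```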
Base 64 Variable Length Quantities
The first step to parsing a source map's mappings is decoding VLQs. We wrote a small vlq library in Rust and published it to crates.io.
The decode64 function decodes a single base 64 digit. It uses pattern matching and idiomatic Result-style error handling. A Result<T, E> is either Ok(v), indicating that the operation succeeded and produced the value v of type T, or Err(error), indicating that the operation failed, with the value error of type E providing the details. The decode64 function's return type, Result<u8, Error>, indicates that it returns a u8 value on success, or a vlq::Error value on failure.
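A minimal sketch of such a decoder (the Error type here is illustrative; the real vlq crate defines its own):

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidBase64(u8),
}

/// Decode a single base 64 digit into its 6-bit value, using pattern
/// matching and Result-style error handling.
fn decode64(input: u8) -> Result<u8, Error> {
    match input {
        b'A'..=b'Z' => Ok(input - b'A'),
        b'a'..=b'z' => Ok(input - b'a' + 26),
        b'0'..=b'9' => Ok(input - b'0' + 52),
        b'+' => Ok(62),
        b'/' => Ok(63),
        otherwise => Err(Error::InvalidBase64(otherwise)),
    }
}
```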
With the decode64 function in hand, we can decode whole VLQ values. The decode function takes a mutable reference to an iterator of bytes as input, consumes as many bytes as it needs to decode the VLQ, and finally returns a Result of the decoded value.
We begin by defining some small helper functions. A separator predicate function returns true if the given byte is a separator between mappings. Next, we define a helper function for reading a single VLQ delta value and adding it to the previous value. The original JavaScript does not factor this logic out into a helper function, and instead repeats this code each time it wants to read a VLQ delta, because JavaScript cannot cheaply pass a reference to a value in memory, but Rust can. In JavaScript, we could either allocate an Object with a number property, or we could close over the number's variable in a local closure function, but both of these solutions have associated runtime costs.
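The mutation half of that helper can be sketched like this (a hypothetical add_delta, not the library's actual helper), showing how Rust hands the helper a &mut u32 at no cost:

```rust
/// Apply a decoded VLQ delta to `previous` in place. The caller passes a
/// plain mutable reference, which costs nothing, where JavaScript would
/// need a wrapper Object or a closure.
fn add_delta(previous: &mut u32, delta: i64) -> Result<(), &'static str> {
    let new = *previous as i64 + delta;
    if new < 0 {
        return Err("mapping component cannot be negative");
    }
    *previous = new as u32;
    Ok(())
}
```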
All in all, the Rust implementation of "mappings" parsing is quite similar to the JavaScript implementation. However, where the JavaScript must allocate an Object for every single parsed mapping, the Rust implementation does not. Much of the Rust implementation's advantage stems from avoiding these allocations and their associated garbage collections.
Finally, we still use a custom Quicksort implementation in the Rust code, and this is the only unidiomatic thing about the Rust code. We found that although the standard library’s built in sorting is much faster when targeting native code, our custom Quicksort is faster when targeting WebAssembly. (This is surprising, but we have not investigated why it is so.)
The WebAssembly foreign function interface (FFI) is constrained to scalar values, so everything else must be communicated through the WebAssembly module's linear memory. To begin parsing, the JavaScript code must ask the Rust library to allocate a buffer with space for the "mappings" string, and then copy the "mappings" string into that buffer, which it can do without crossing the FFI boundary because it has write access to the whole WebAssembly linear memory. Next, the JavaScript calls the exported parse_mappings function and is returned a pointer to the parsed mappings. From then on, the JavaScript queries the WebAssembly API, passing in the pointer to the parsed mappings when doing so. Finally, when the JavaScript is done with the parsed mappings, it asks the WebAssembly to free the parsed mappings.
Exposing WebAssembly APIs from Rust
All the WebAssembly APIs we expose live in a small glue crate that wraps the source-map-mappings crate. This separation is useful because it allows us to test the source-map-mappings crate with our native host target, and only compile the WebAssembly glue when targeting WebAssembly.
In addition to the constraints on what types can cross the FFI boundary, each exported function must satisfy two requirements:

- it has the #[no_mangle] attribute, so that its symbol is not mangled and it can be called by name, and
- it is marked pub extern "C" so that it is publicly exported in the final .wasm binary.
Rust code that interfaces with JavaScript via WebAssembly does, by necessity, frequently use unsafe. Calling unsafe functions and dereferencing pointers received from across an FFI boundary are always unsafe, because the Rust compiler cannot verify the safety of the other side. Ultimately, this is less concerning with WebAssembly than it might be with other targets: the worst we can do is trap (which causes an Error to be thrown on the JavaScript side). Compare that to the worst case scenarios with native binaries, where executable memory is in the same address space as writable memory, and an attacker can trick the program into jumping to memory where they've inserted some shell code.
The simplest function we export is the function to get the last error that occurred in the library. This lets the caller find out, when some operation fails, what kind of failure it was. We always maintain the most recent error in a global, and this function retrieves that error value.
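A sketch of this error plumbing (the error variants and numeric codes here are illustrative, not the crate's actual API):

```rust
use std::cell::Cell;

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Error {
    UnexpectedEof = 1,
    InvalidBase64 = 2,
}

thread_local! {
    // The most recent error, if any. The real glue crate can use a plain
    // global because WebAssembly is single-threaded; a thread local keeps
    // this sketch in safe Rust.
    static LAST_ERROR: Cell<Option<Error>> = Cell::new(None);
}

pub fn record_error(err: Error) {
    LAST_ERROR.with(|last| last.set(Some(err)));
}

/// Exported to the JavaScript side: the last error code, or 0 if none.
#[no_mangle]
pub extern "C" fn get_last_error() -> u32 {
    LAST_ERROR.with(|last| match last.get() {
        None => 0,
        Some(err) => err as u32,
    })
}
```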
Next, we need to allocate space to hold the "mappings" string. We desire an owned, contiguous chunk of u8s, which suggests using a Vec<u8>, but we can only expose a single pointer to the JavaScript. We can either introduce an extra indirection by boxing the vector into a Box<Vec<u8>>, or we can save the extra data needed to reconstruct the vector on the side. We choose the latter approach.
A vector is a triple of
- a pointer to the heap-allocated elements,
- the capacity of that allocation, and
- the length of the initialized elements.
Since we are only exposing the pointer to the heap-allocated elements to the JavaScript, we need some way to recover the capacity and length when reconstructing the Vec on demand. We allocate two extra words of space, store the length and capacity in the first two words of the heap elements, and then hand out a pointer just past those two words.
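The trick can be sketched like this (the function names are ours, and the real exported allocation function is also extern "C"):

```rust
use std::mem;

const HEADER_WORDS: usize = 2;

/// Allocate `len` usable bytes, stashing the Vec's length and capacity in
/// two extra words at the front so that the vector can be reconstructed
/// later from the returned data pointer alone.
fn allocate_mappings_buffer(len: usize) -> *mut u8 {
    let word = mem::size_of::<usize>();
    let total = len + HEADER_WORDS * word;
    let mut v = vec![0u8; total];
    let cap = v.capacity();
    let ptr = v.as_mut_ptr();
    mem::forget(v); // give up ownership without freeing
    unsafe {
        // The allocation is only byte-aligned, so use unaligned writes.
        (ptr as *mut usize).write_unaligned(total);
        (ptr as *mut usize).add(1).write_unaligned(cap);
        ptr.add(HEADER_WORDS * word)
    }
}

/// Recover the Vec from a pointer previously returned by
/// `allocate_mappings_buffer`, so that it can be dropped normally.
unsafe fn recover_buffer(data: *mut u8) -> Vec<u8> {
    unsafe {
        let word = mem::size_of::<usize>();
        let ptr = data.sub(HEADER_WORDS * word);
        let len = (ptr as *mut usize).read_unaligned();
        let cap = (ptr as *mut usize).add(1).read_unaligned();
        Vec::from_raw_parts(ptr, len, cap)
    }
}
```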
After initializing the buffer for the "mappings" string, the JavaScript transfers ownership of the buffer to parse_mappings, which parses the string into a queryable structure. A pointer to the queryable Mappings structure is returned on success, or NULL if there was some kind of parse failure.
The first thing that parse_mappings must do is recover the Vec's length and capacity. Next, it constructs a slice of the "mappings" string data, constrains the lifetime of the slice to the current scope to double check that we don't access the slice again after it is deallocated, and calls into our "mappings" string parsing function. Regardless of whether parsing succeeded or not, we deallocate the buffer holding the "mappings" string, and then either return a pointer to the parsed structure, or save any error that may have occurred and return NULL.
When we run queries, we need a way to translate the results across the FFI boundary. The results of a query are either a single Mapping or a set of many Mappings, and a Mapping can't cross the FFI boundary as-is unless we box it. We do not wish to box the Mappings, since we would also need to provide getters for each of its fields, causing code bloat on top of the costs of allocation and indirection. Our solution is to call an imported function for each Mapping in a query's results. The mapping_callback is an extern function without definition, which the JavaScript must provide when instantiating the WebAssembly module. The mapping_callback takes a Mapping that is exploded into its member parts: each field in a transitively flattened Mapping is translated into a parameter, so that it can cross the FFI boundary. For Option<T> members, we add a bool parameter that serves as a discriminant, determining whether the Option is Some or None, and therefore whether the following T parameter is valid or garbage.
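The flattening can be sketched with a simplified, hypothetical Mapping type:

```rust
// A simplified stand-in for the library's Mapping type.
struct Mapping {
    generated_line: u32,
    generated_column: u32,
    name: Option<u32>, // index into the names table, if any
}

/// Explode a Mapping into scalar arguments for the imported callback.
/// The Option member becomes a bool discriminant plus a value that is
/// garbage (here, zero) when the discriminant is false.
fn explode(m: &Mapping) -> (u32, u32, bool, u32) {
    let (has_name, name) = match m.name {
        Some(n) => (true, n),
        None => (false, 0),
    };
    (m.generated_line, m.generated_column, has_name, name)
}
```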
All of the exported query functions have a similar structure. They begin by converting a raw *mut Mappings pointer into an &mut Mappings mutable reference. The &mut Mappings reference's lifetime is constrained to the current scope, to enforce that it is only used in this function call and not saved away somewhere where it might be used after it is deallocated. Next, the exported query function forwards the query to the corresponding Mappings method. For each Mapping in the query's results, the exported function invokes the mapping_callback. A typical example is the exported all_generated_locations_for query function, which wraps the Mappings::all_generated_locations_for method, and finds all the mappings corresponding to the given original source location.
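A sketch of that shared skeleton, with hypothetical types, and with the imported callback replaced by an out-vector so the sketch stands alone:

```rust
// Hypothetical stand-ins for the real Mappings type and its query methods.
pub struct Mappings {
    data: Vec<u32>,
}

impl Mappings {
    fn all_values(&mut self) -> impl Iterator<Item = u32> + '_ {
        self.data.iter().copied()
    }
}

/// The real exported function is `extern "C"` and invokes the imported
/// mapping_callback for each result; here we push into `out` instead.
pub fn query_all(mappings: *mut Mappings, out: &mut Vec<u32>) {
    // Convert the raw pointer into a reference whose lifetime is confined
    // to this scope, so it cannot be saved and used after deallocation.
    let mappings: &mut Mappings = unsafe { &mut *mappings };
    for value in mappings.all_values() {
        out.push(value); // stands in for calling mapping_callback
    }
}
```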
Finally, once the JavaScript is done with the parsed Mappings, it must deallocate them with the exported free_mappings function.
Compiling Rust into WebAssembly
The addition of the wasm32-unknown-unknown target makes compiling Rust into WebAssembly possible, and rustup makes installing a Rust compiler toolchain with that target easy:
$ rustup update
$ rustup target add wasm32-unknown-unknown
Now that we have a wasm32-unknown-unknown compiler, the only difference between building for our host platform and building for WebAssembly is a --target flag:
$ cargo build --release --target wasm32-unknown-unknown
The resulting .wasm file is in the target/wasm32-unknown-unknown/release directory.
Although we now have a working .wasm file, we are not finished: this file is still much larger than it needs to be. To produce the smallest .wasm file we can, we leverage the following tools:

- wasm-gc, which is like a linker's --gc-sections flag that removes unused object file sections, but for .wasm files instead of ELF, Mach-O, etc. object files. It finds all functions that are not transitively reachable from an exported function and removes them from the .wasm file.
- wasm-snip, which is used to replace a WebAssembly function's body with a single unreachable instruction. This is useful for manually removing functions that will never be called at runtime, but which the compiler and wasm-gc couldn't statically prove were unreachable. Snipping a function can make other functions statically unreachable, so it makes sense to run wasm-gc again afterwards.
- wasm-opt, which runs binaryen's optimization passes over a .wasm file, shrinking its size and improving its runtime performance. Eventually, when LLVM's WebAssembly backend matures, this may no longer be necessary.
Our post-build pipeline runs these tools in turn to produce the final .wasm files. The source-map library is primarily used in three environments:

- The Web
- Node.js
- Inside Firefox's Developer Tools
Different environments can have different methods for loading the .wasm bytes into the WebAssembly runtime. On the Web and inside Firefox, we can use the standard fetch API to make an HTTP request to load the .wasm file. It is the library consumer's responsibility to provide a URL pointing to the .wasm file before parsing any source maps. When used with Node.js, the library uses the fs.readFile API to read the .wasm file from disk. No initialization is required before parsing any source maps in this scenario. We provide a uniform interface, regardless of which environment the library is loaded in, by using feature detection to select the appropriate .wasm loading implementation.
When compiling and instantiating the WebAssembly module, we must provide the
mapping_callback. This callback cannot change across the lifetime of the
instantiated WebAssembly module, but depending on what kind of query we are
performing, we want to do different things with the resulting mappings. So the
mapping_callback we provide is only responsible for translating the
exploded mapping members into a structured object and then trampolining that
result to a closure function that we set depending on what query we are running.
To make setting and unsetting the currentCallback ergonomic, we define a withMappingCallback helper that takes two functions: one to set as the currentCallback, and another to invoke immediately. Once the second function's execution finishes, we reset the currentCallback to null. To parse a "mappings" string, the JavaScript first asks the WebAssembly to allocate space for the "mappings" string, and then copies the string into the allocated buffer. Next, it calls the parse_mappings WebAssembly function and translates any failures into an Error that gets thrown.
The various query methods that call into WebAssembly have a similar structure, just like the exported functions on the Rust side do. They validate query parameters, set up a temporary mappings callback closure with withMappingCallback that aggregates results, call into WebAssembly, and then return the results.
Finally, when finished with a SourceMapConsumer instance, the user must call SourceMapConsumer.prototype.destroy, which then calls into the exported free_mappings WebAssembly function.
All tests were performed on a MacBook Pro from mid 2014, with a 2.8 GHz Intel Core i7 processor and 16 GB of 1600 MHz DDR3 memory. The laptop was plugged into power for every test, and the benchmark web page was refreshed between tests. The tests were performed in Chrome Canary 65.0.3322.0, Firefox Nightly 59.0a1 (2018-01-15), and Safari 11.0.2 (11604.4.7.1.6)3. For each benchmark, to warm up the browser's JIT compiler, we performed five iterations before collecting timings. After warm up, we recorded timings for 100 iterations.
We used a variety of input source maps with our benchmarks. We used three source maps found in the wild, of varying sizes:

- The source-map library's own source map. This source map is created by UglifyJS, and its "mappings" string is 30,081 characters long.
- The latest Angular.JS's minified-to-unminified source map. This source map's "mappings" string is 391,473 characters long.
- The Scala.JS source map, whose "mappings" string is 14,964,446 characters long.
Additionally, we augmented the input set with two more artificially constructed source maps:

- The Angular.JS source map inflated to ten times its original size. This results in a "mappings" string of 3,914,739 characters.
- The Scala.JS source map inflated to twice its original size. This results in a "mappings" string of 29,928,893 characters. For this input source map, we only collected 40 iterations of each benchmark.
Astute readers will have noticed the extra 9 and 1 characters in the inflated source maps' sizes, respectively. These come from the ; separator characters inserted between each copy of the original "mappings" string when gluing the copies together to create the inflated versions.
We will pay particular attention to the Scala.JS source map. It is the largest non-artificial source map we tested. Additionally, it is the largest source map for which we have measurements for all combinations of browser and library implementation. There is no data for the largest input (the twice-inflated Scala.JS source map) in Chrome. With the JavaScript implementation, we were unable to take any measurements with this combination because none of the benchmarks could complete without the tab's content process crashing. With the WebAssembly implementation, Chrome would erroneously report a memory access out of bounds. Using Chrome's debugger, the supposed out-of-bounds access was happening in an instruction sequence that does not exist in the .wasm file. All other browsers' WebAssembly implementations successfully ran the benchmark with this input, so I am inclined to believe it is a bug in Chrome's WebAssembly implementation.
Setting a Breakpoint for the First Time
The first benchmark simulates a stepping debugger setting a breakpoint on some line in the original source for the first time. This requires that the source map's "mappings" string is parsed, and that the parsed mappings are sorted by their original source locations, so we can binary search to the breakpoint's position and then find whether it corresponds to any mapping on the original source line.
Pausing at an Exception for the First Time
Once again, the "mappings" string must be parsed. The parsed mappings must then be sorted by their generated locations, so that we can binary search to the closest mapping, and finally that closest mapping's original source location is reported.
Subsequent Breakpoints and Pauses at Exceptions
The third and fourth benchmarks observe the time it takes to set subsequent
breakpoints after the first one, or to pause at subsequent uncaught exceptions,
or to translate subsequent logged messages’ locations. Historically, these
operations have never been a performance bottleneck: the expensive part is the
"mappings" string parse and construction of queryable data structures
(the sorted arrays).
Nevertheless, we want to make sure it stays that way: we don’t want these operations to suddenly become costly.
Both of these benchmarks measure the time it takes to binary search for the closest mapping to the target location and return that mapping's corresponding location.
These benchmark results should be seasoned with more salt than the other benchmarks' results. Looking at both of the Scala.JS source map input plots, we see clear strata formations. This layering happens because the benchmarked operations run in so little time that the resolution of our timer becomes visible. We can see that Chrome exposes timers with a resolution of tenths of a millisecond, Firefox exposes a resolution of 0.02 milliseconds, and Safari exposes millisecond resolution.
Iterating Over All Mappings
One of the advantages of the wasm32-unknown-unknown target over the wasm32-unknown-emscripten target is that it generates leaner WebAssembly. The wasm32-unknown-emscripten target includes polyfills for most of libc and a file system built on top of IndexedDB, among other things, but wasm32-unknown-unknown does not. With the source-map library, we bundle all of the JavaScript modules together into a single .js file. We look at the effects of using wasm-opt to shrink the .wasm file's size, and of using gzip compression, which is ubiquitously supported on the Web.
We minified the JavaScript with the Closure Compiler at the "simple" optimization level. We used the Closure Compiler because UglifyJS does not support some newer ECMAScript forms that we introduced (for example, arrow functions). We used the "simple" optimization level because the "advanced" level is destructive to code that was not authored with the Closure Compiler in mind.
The bars labeled "JavaScript" are for variations of the original, JavaScript-only source-map library implementation. The bars labeled "WebAssembly" are for variations of the new source-map library implementation that uses WebAssembly for parsing the "mappings" string and querying the parsed mappings. Note that the "WebAssembly" implementation still contains JavaScript: the source-map library has additional features, such as generating source maps, that are still implemented in JavaScript.
At its smallest, the new WebAssembly implementation has a larger total code size than the original JavaScript implementation. However, by using the tools for shrinking .wasm size described above, we brought the size of the WebAssembly down to 0.16x its original size. Both implementations contain a similar amount of JavaScript, for two reasons. First, the WebAssembly implementation needs some JavaScript glue to load and interface with the WebAssembly. Second, and more importantly, some of the JavaScript routines were shared with other portions of the source-map library. Now, although those routines are no longer shared, they are still in use by those other portions of the library.
Let us turn our focus towards what is contributing to the size of the .wasm file. Running wasm-objdump -h gives us the sizes of each section. The Code and Data sections effectively account for the whole .wasm file's size. The Code section contains the encoded WebAssembly instructions that make up function bodies. The Data section consists of static data to be loaded into the WebAssembly module's linear memory space.
Using wasm-objdump to manually inspect the Data section's contents shows that it mostly consists of string fragments used for constructing diagnostic messages if the Rust code were to panic. However, when targeting WebAssembly, Rust panics translate into WebAssembly traps, and traps do not carry extra diagnostic information. We consider it a bug in rustc that these string fragments are even emitted. Unfortunately, wasm-gc cannot currently remove unused Data segments either, so we are stuck with this bloat for the meantime. WebAssembly and its tooling are still relatively immature, and we expect the toolchain to improve in this respect with time.
Next, we post-process wasm-objdump's disassembly output to compute the size of each function body inside the Code section, and group the sizes by Rust crate.
The heaviest crate is dlmalloc, which is used by the alloc crate, which implements Rust's low-level allocation APIs. Together, dlmalloc and alloc clock in at 10,126 bytes, or 50.98% of the total function size. In a sense, this is a relief: the code size of the allocator is a constant that will not grow as we add more features. The sum of code sizes from the crates that we authored (source_map_mappings and source_map_mappings_wasm_api) is 9,320 bytes, or 46.92% of the total function size. That leaves only 417 bytes (2.10%) of space consumed by the other crates. This speaks to the efficacy of wasm-gc and wasm-opt: although crates like std are much larger than our crates, we only use a tiny fraction of their APIs, and we only pay for what we use.
Conclusions and Future Work
That said, there is still more work to do.
The most pressing next step is investigating why the Rust standard library's sorting is not as fast as our custom Quicksort when targeting WebAssembly. This is the sole unidiomatic bit of Rust code in the rewrite. The behavior is surprising, since our Quicksort implementation is naive, while the standard library's sort is a pattern-defeating quicksort that opportunistically uses insertion sort for small and mostly-sorted ranges. In fact, the standard library's sorting routine is faster than ours when targeting native code. We speculate that inlining heuristics change across targets, and that our comparator functions aren't being inlined into the standard library's sorting routine when targeting WebAssembly. It requires further investigation.
We found that size profiling WebAssembly was more difficult than necessary. To get useful information presented meaningfully, we were forced to write our own home-grown script to post-process wasm-objdump's output. The script constructs the call graph and lets us query who the callers of some function were, helping us understand why the function was emitted in the .wasm file, even if we didn't expect it to be. It is pretty hacky, and doesn't expose information about inlined functions. A proper WebAssembly size profiler would have helped, and would be a benefit for anyone following in our tracks.
The relatively large code size footprint of the allocator suggests that writing or adapting an allocator that is focused on tiny code size could provide considerable utility for the WebAssembly ecosystem. At least for our use case, allocator performance is not a concern, and we only make a small handful of dynamic allocations. For an allocator, we would choose small code size over performance in a heartbeat.
The unused segments in our Data section highlight the need for yet another tool: one that detects and removes static data that is never referenced.
These changes also have implications for the library's downstream users. The introduction of WebAssembly in our current implementation requires the introduction of manually freeing the parsed mappings when the user is finished with them. This does not come naturally to most JavaScript programmers, who are used to a garbage collector and do not typically think about the lifetime of any particular object. We could add a SourceMapConsumer.with function that takes a raw, un-parsed source map and an async function. The with function would construct a SourceMapConsumer instance, invoke the async function with it, and then call destroy on the SourceMapConsumer instance once the async function call completes. This is similar to the RAII idiom in C++ and Rust.
Another approach that would feel more natural to JavaScript programmers would be to give every SourceMapConsumer its own instance of the WebAssembly module. Then, because the SourceMapConsumer instance would have the sole GC edge to its WebAssembly module instance, we could let the garbage collector manage the lifetimes of the SourceMapConsumer instance, the WebAssembly module instance, and the module instance's heap. With this strategy, we could have a static mut MAPPINGS: Mappings in the Rust WebAssembly glue code, and the Mappings instance would be implicit in all exported query function calls. There would be no more Box::new(mappings) in the parse_mappings function, and no more passing around *mut Mappings pointers. With some care, we might be able to remove all allocation from the Rust library, which would shrink the emitted WebAssembly to half its current size. Of course, this all depends on creating multiple instances of the same WebAssembly module being a relatively cheap operation, which requires further investigation.
The wasm-bindgen project aims to remove the need to write FFI glue code by hand, and automates interfacing between WebAssembly and JavaScript. With it, we could remove all of our hand-written unsafe pointer manipulation code involved in exporting Rust functions to JavaScript.
In this project, we ported source map parsing and querying to Rust and WebAssembly, but this is only half of the source-map library's functionality. The other half is generating source maps, and it is also performance sensitive. We would like to rewrite the core of building and encoding source maps in Rust and WebAssembly sometime in the future as well. We expect to see similar speed ups to what we observed for parsing source maps.
The pull request adding the WebAssembly implementation to the mozilla/source-map library is in the process of being merged. That pull request contains the benchmarking code, so that results can be reproduced, and we can continue to improve them.
Finally, I'd like to thank Tom Tromey for hacking on this project with me. I'd also like to thank Aaron Crichton, Jeena Lee, Jim Blandy, Lin Clark, Luke Wagner, Michael Cooper, and Till Schneidereit for reading early drafts and providing valuable feedback. This document, our benchmarks, and the source-map library are all better thanks to them.
3 For Firefox and Chrome, we tested with the latest nightly builds. We did not do the same with Safari because the latest Safari Technology Preview requires a newer macOS version than El Capitan, which this laptop was running.