Memory Management in Oxischeme
I've recently been playing with the Rust programming language, and what better way to learn a language than to implement a second language in the language one wishes to learn?! It almost goes without saying that this second language being implemented should be Scheme. Thus, Oxischeme was born.
Why implement Scheme instead of some other language? Scheme is a dialect of LISP and inherits the simple parenthesized list syntax of its LISP-y origins, called s-expressions. Thus, writing a parser for Scheme syntax is rather easy compared to doing the same for most other languages' syntax. Furthermore, Scheme's semantics are also minimal. It is a small language designed for teaching, and writing a metacircular interpreter (ie a Scheme interpreter written in Scheme itself) takes only a few handfuls of lines of code. Finally, Scheme is a beautiful language: its design is rooted in the elegant λ-calculus.
Scheme is not "close to the metal" and doesn't provide direct access to the machine's hardware. Instead, it provides the illusion of infinite memory and its structures are automatically garbage collected rather than explicitly managed by the programmer. When writing a Scheme implementation in Scheme itself, or any other language with garbage collection, one can piggy-back on top of the host language's garbage collector, and use that to manage the Scheme's structures. This is not the situation I found myself in: Rust is close to the metal, and does not have a runtime with garbage collection built in (although it has some other cool ideas regarding lifetimes, ownership, and when it is safe to deallocate an object). Therefore, I had to implement garbage collection myself.
Faced with this task, I had a decision to make: tracing garbage collection or reference counting?
Reference counting is a technique where we keep track of the number of other things holding references to any given object or resource. When new references are taken, the count is incremented. When a reference is dropped, the count is decremented. When the count reaches zero, the resource is deallocated and it decrements the reference count of any other objects it holds a reference to. Reference counting is great because once an object becomes unreachable, it is reclaimed immediately and doesn't sit around consuming valuable memory space while waiting for the garbage collector to clean it up at some later point in time. Additionally, the reclamation process happens incrementally and program execution doesn't halt while every object in the heap is checked for liveness. On the negative side, reference counting runs into trouble when it encounters cycles. Consider the following situation:
A -> B ^ | | v D <- C
A, B, C, and D form a cycle and all have a reference count of one. Nothing from outside of the cycle holds a reference to any of these objects, so it should be safe to collect them. However, because each reference count will never get decremented to zero, none of these objects will be deallocated. In practice, the programmer must explicitly use (potentially unsafe) weak references, or the runtime must provide a means for detecting and reclaiming cycles. The former defeats the general purpose, don't-worry-about-it style of managed memory environments. The latter is equivalent to implementing a tracing collector in addition to the existing reference counting memory management.
Tracing garbage collectors start from a set of roots and recursively traverse object references to discover the set of live objects in the heap graph. Any object that is not an element of the live set cannot be used again in the future, because the program has no way to refer to that object. Therefore, the object is available for reclaiming. This has the advantage of collecting dead cycles, because if the cycle is not reachable from the roots, then it won't be in the live set. The cyclic references don't matter because those edges are never traversed. The disadvantage is that, without a lot of hard work, when the collector is doing its bookkeeping, the program is halted until the collector is finished analyzing the whole heap. This can result in long, unpredictable GC pauses.
Reference counting is to tracing as yin is to yang. The former operates on dead, unreachable objects while the latter operates on live, reachable things. Fun fact: every high performance GC algorithm (such as generational GC or reference counting with "purple" nodes and trial deletion) uses a mixture of both, whether it appears so on the surface or not. "A Unified Theory of Garbage Collection" by Bacon, et all discusses this in depth.
I opted to implement a tracing garbage collector for Oxischeme. In particular, I implemented one of the simplest GC algorithms: stop-the-world mark-and-sweep. The steps are as follows:
- Stop the Scheme program execution.
- Mark phase. Trace the live heap starting from the roots and add every
reachable object to the
- Sweep phase. Iterate over each object
xin the heap:
xis an element of the
xis not an element of the
markedset, reclaim it.
- Resume execution of the Scheme program.
Because the garbage collector needs to trace the complete heap graph, any
structure that holds references to a garbage collected type must participate in
garbage collection by tracing the GC things it is holding alive. In Oxischeme,
this is implemented with the
whose implementation requires a
trace function that returns an iterable of
Note that Oxischeme separates tracing (generic heap graph traversal) from
marking (adding live nodes in the heap graph to a set). This enables using
Trace to implement other graph algorithms on top of the heap graph. Examples
include computing dominator trees and retained sizes of objects, or finding the
set of retaining paths of an object that you expected to be reclaimed by the
collector, but hasn't been.
If we were to introduce a
Trio type that contained three cons cells, we would
implement tracing like this:
What causes a garbage collection? As we allocate GC things, GC pressure increases. Once that pressure crosses a threshold — BAM! — a collection is triggered. Oxischeme's pressure application and threshold are very naive at the moment: every N allocations a collection is triggered, regardless of size of the heap or size of individual allocations.
A root is an object in the heap graph that is known to be live and reachable. When marking, we start tracing from the set of roots. For example, in Oxischeme, the global environment is a GC root.
In addition to permanent GC roots, like the global environment, sometimes it is necessary to temporarily root GC things referenced by pointers on the stack. Garbage collection can be triggered by any allocation, and it isn't always clear which Rust functions (or other functions called by those functions, or even other functions called by those functions called from the first function, and so on) might allocate a GC thing, triggering collection. The situation we want to avoid is a Rust function using a temporary variable that references a GC thing, then calling another function which triggers a collection and collects the GC thing that was referred to by the temporary variable. That results in the temporary variable becoming a dangling pointer. If the Rust function accesses it again, that is Undefined Behavior: it might still get the value it was pointing at, or it might be a segfault, or it might be a freshly allocated value being used by something else! Not good!
There are two possible solutions to this problem. The first is conservative garbage collection, where we walk the native stack and if any value on the stack looks like it might be a pointer and if coerced to a pointer happens to point to a GC thing in the heap, we assume that it is in fact a pointer. Under this assumption, it isn't safe to reclaim the object pointed to, and so we treat that GC thing a root. Note that this strategy is simple and easy to retrofit because it doesn't involve changes in any code other than adding the stack scanning, but it results in false positives. The second solution is precise rooting. With precise rooting, it is the responsibility of the Rust function's author to explicitly root and unroot pointers to GC things used in variables on the stack. The advantage this provides is that there are no false positives: you know exactly which stack values are pointers to GC things. The disadvantage is the requirement of explicitly telling the GC about every pointer to a GC thing you ever reference on the stack.
Almost every modern, high performance tracing collector for managed runtimes uses precise rooting because it is a prerequisite* of a moving collector: a GC that relocates objects while performing collection. Moving collectors are desirable because they can compact the heap, creating a smaller memory footprint for programs and better cache locality. They can also implement pointer bumping allocation, that is both simpler and faster than maintaining a free list. Finally, they can split the heap into generations. Generational GCs gain performance wins from the empirical observation that most allocations are short lived, and those objects that are most recently allocated are most likely to be garbage, so we can focus the GC's efforts on them to get the most bang for our buck. Precise rooting is a requirement for a moving collector because it has to update all references to point to the new address of each moved GC thing. A conservative collector doesn't know for sure if a given value on the stack is a reference to a GC thing or not, and if the value just so happens not to be a reference to a GC thing (it is a false positive), and the collector "helpfully" updates that value to the moved address, then the collector is introducing migraine-inducing bugs into the program execution.
* Technically, there do exist some moving and generational collectors that are "mostly copying" and conservatively mark the stack but precisely mark the heap. These collectors only move objects which are not conservatively reachable.
Oxischeme uses precise rooting, but is not a moving GC (yet). Precise rooting is
implemented with the
oxischeme::heap::Rooted<T> smart pointer
RAII type, which roots its referent upon construction and unroots it
when the smart pointer goes out of scope and is dropped.
Using precise rooting and
Rooted, we can solve the dangling stack pointer
problem like this:
With all of that out of the way, here is the implementation of our mark-and-sweep collector:
Why do we have four calls to
sweep, one for each type that Oxischeme
implements? To explain this, first we need to understand Oxischeme's allocation
Oxischeme does not allocate each individual object directly from the operating
system. In fact, most Scheme "allocations" do not actually perform any
allocation from the operating system (eg, call
Box::new). Oxischeme uses a set of
oxischeme::heap::Arenas, each of
which have a preallocated object pool with each item in the pool either being
used by live GC things, or waiting to be used in a future allocation. We keep
track of an
Arena's available objects with a "free list" of indices into its
When the Scheme program allocates a new object, we remove the first entry from
the free list and return a pointer to the object at that entry's index in the
object pool. If the every
Arena is at capacity (ie, its free list is empty), a
Arena is allocated from the operating system and its object pool is used
for the requested Scheme allocation.
For simplicity, Oxischeme has separate arenas for separate types of
objects. This sidesteps the problem of finding an appropriately sized free block
of memory when allocating different sized objects from the same pool, the
fragmentation that can occur because of that, and lets us use a plain old vector
as the object pool. However, this also means that we need a separate
ArenaSet<T> for each
T object that a Scheme program can allocate and why
oxischeme::heap::Heap::collect_garbage has four calls to
During the sweep phase of Oxischeme's garbage collector, we return the entries
of any dead object back to the free list. If the
Arena is empty (ie, the free
list is full) then we return the
Arena's memory to the operating system. This
prevents retaining the peak amount of memory used for the rest of the program
This concludes our tour of Oxischeme's current implementation of memory
management, allocation, and garbage collection for Scheme programs. In the
future, I plan to make Oxischeme's collector a moving collector, which will pave
the way for a compacting and generational GC. I might also experiment with
incrementalizing marking for lower latency and shorter GC pauses, or making
sweeping lazy. Additionally, I intend to
declare to the rust compiler that operations on un-rooted GC pointers unsafe,
but I haven't settled on an implementation strategy yet. I would also like to
experiment with writing a syntax extension for the Rust compiler so that it can
Trace implementations, and they don't need to be written by hand.
Thanks to Tom Tromey and Zach Carter for reviewing drafts.
A Unified Theory of Garbage Collection by Bacon, et all. Great paper!
Improving Python's Memory Allocator by Evan Jones. This was a nice resource when planning Oxischeme's allocation strategy. Interesting side note: CPython uses reference counting and a cycle detector rather than a tracing collector.