Implementing Weld in Rust

Weld is a runtime and language for high performance data analytics, developed
in the Stanford Infolab. It is implemented
in Rust, a modern take on a fast systems
programming language. In this blog post we provide our experiences
implementing a low-level systems research project in Rust (with no prior
experience with the language). We hope this will help other developers evaluate
Rust when choosing a language for their system.

First, a bit of background on Weld. The Weld
language includes common functional operators like map, filter, reduce,
and groupBy. Writers of analytics libraries who want to improve performance can
make their APIs lazy and emit Weld code for individual functions. Our runtime can then
optimize entire call graphs and produce fast native code without unnecessary
intermediate data materialization. Weld is essentially a just-in-time (JIT)
compilation system for analytics libraries.

In brief, Rust was a joy to work with. Its powerful pattern matching made
writing compiler optimization rules very easy, and we found that code that
compiled usually “just worked” because of the stringent checks Rust’s own
compiler performs on code the programmer writes. Additionally, writing C APIs
in Rust was also very easy; Rust provides good support and documentation for
doing this. While there are a few features the Rust developers are working on
which we hope appear in the stable compiler soon (namely, incremental
compilation and the box keyword), we wholeheartedly reccomend Rust to those
looking for a fun, powerful programming language for their next project.

Stay tuned for another blog post that describes Weld in greater detail. For
now, keep reading for our thoughts on writing Weld in Rust, and join the
discussion on Weld on our Google
Group!

Experiences with Rust

Weld is written in around 12000 lines of Rust code. It includes a compiler with
a tokenizer, parser, the abstract syntax tree (AST) definition, transformations
on the AST, and a code generator targeting LLVM, as well as a C API for
interacting with the compiler. Rust’s excellent LLVM bindings (llvm-sys) made
generating and JIT compiling LLVM code particularly easy.

We used Rust v.1.14 (stable) when developing Weld.

Implementing Rule-Based Optimizations with Pattern Matching

Weld’s optimizer takes an AST and applies a set of rule-based optimizations to it. These optimizations
find a subtree in the AST and replace it with a more efficient subtree.

Rust’s pattern matching makes writing transformations rules easy.
We wrote a transform method for the Expr type which, given a function which
takes an Expr and returns an Option<Expr>, applies the function to each
node in the AST. If the Option is Some(e), the node to which the function
was applied is replaced by the expression e. If the Option is None, the
expression is not substituted. This allows us to express transforms on the AST
elegantly by defining a function which specifies a set of match rules and
then calling transform.

Below, we have a simple match rule which matches on two i32 literals being summed. The expression is replaced by summing the
integers at compile time and replacing the add operation with a single literal:

transform is taking a closure here. The nested if let statements destructure a sub-expression to build the match rule.
If we find a match, we replace the expression with an expression which adds the literal values.

One point of tedium is that we need to use if let statements to destructure each expression
instead of being able to “nest” destructuring structs. For example, a similar implementation in Scala is much more concise since
patterns are nested:

The reason we cannot nest match rules in Rust is that each expression in our implementation is actually a Box<Expr>. Box is Rust’s
type designating heap allocated memory, and because we want to support in-place updates to the AST we want to use this type instead
of just Expr. Unfortunately, Rust doesn’t support nesting patterns for these boxed values, since the pattern would match on a Box
instead of the Expr (in the code above, ref dereferences the box). This causes deeply nested if let statements even for simple match rules.

Rust solves this problem using the box
keyword, which
allows matching on the contents of the Box. Unfortunately, this keyword is
not yet available in the stable version of the Rust compiler.

Handling Errors

Handling errors is a tedious undertaking in a systems project, and
especially in low-level projects written in C and C++. However, Rust’s Result
type makes it very easy. Result<T, U> is an enum with two variants: Ok(T) and
Err(U). These variants specify whether a function call was successful (in
which case the function would return Ok) or caused an error (in which case it would return Err).

Most functions in the Weld implementation return a Result, which enables use
of the try! macro. This macro checks if the
Result is Ok or Err. If it is Err, it exits the function and returns the Result.
If it is Ok, it unwraps the value contained in the Ok. This
makes it easy to propagate errors deep in a call stack up to some place where
they can be presented to the user. As an example, the following code appears in Weld’s parser:

The tokenize method also returns
a Result. If tokenization fails, the first line of the function exits and
returns an error. Otherwise, tokens is set to the value contained in the Ok returned
by tokenize.

In all, Result and try! make error propagation very easy. We found
that Rust did a much nicer job in making error handling code (something every
program should do!) elegant when compared to the errnos and error codes C
programmers are accustomed to.

rustc Compilation

Our first stab at writing a parser for Weld’s IR used a parser-generator
library called lalrpop. Unfortunately, one major
limitation of this library was the amount of code it generated. Because lalrpop (when we used it)
created a parser with over 10000 lines of code, Rust took over a minute to compile it. This was partially caused
by the lack of incremental compilation in stable Rust.
These compile times took a significant toll on productivity especially since we modified the IR’s
parser fairly often. Eventually, we opted to implement our own recursive descent
parser from scratch to reduce the compilation time.

In general, we found that Rust code took some time to compile;
changing a single file and then running cargo build takes roughly 10 seconds
on a standard laptop, while cargo build --release (which enables compiler
optimizations) takes around a minute. However, because of the copious
compile-time guarantees Rust makes in its language, we often found that
compiled code was working code; this certainly isn’t true in most C code we
write!

In addition, Rust has been looking into incremental computation for some time
now; we hope this
feature makes it into the stable compiler soon.

Interfacing with C

Weld interfaces with C by exposing its API as a set of C functions. This allows
Weld’s API to be called by languages such as Python, which have utilities for
calling C functions (e.g., Python’s ctypes module). An initial
concern with Rust was whether it would be easy to support exporting C-style
APIs easily; we quickly found that Rust had good support for this.

In Rust, a C function can be exported by using the extern "C" qualifier on a function, marking
it as public, and using the #[no_mangle] annotation:

#[no_mangle]pubextern"C"fnmy_c_callable_function(){println!("Hello from Rust!");}

The above function will be exported and callable from C. The declaration of
this function in C would look as you might expect:

voidmy_c_callable_function();

Rust also provides wrappers for C data types in the external libc crate. Weld
uses these to return integers and strings back into C:

Things get interesting when we need to pass richer data types from Rust to C.
Weld uses opaque handles (void * with a typedef) in its API to do this. The
void * pointers are heap-allocated Rust values (i.e., values with type
Box<T>). Of course, C doesn’t understand what a “Box” is, so we
need to acquire a raw pointer before passing it out of Rust.

One tricky thing here is that Rust figures out when it should release memory it
allocates. For example, a boxed value is freed when it goes out of scope.
When passing values to C, however, this isn’t exactly what we want; Rust
shouldn’t release a value when a Box goes out of scope, because some C code
might still be using its underlying data.

Rust gets around this with Box’s into_raw function. into_raw
automatically forgets about the heap allocated memory, so it can be passed
into C without fear of Rust deallocating it. When a heap allocated object is
passed back into C, we can use the from_raw function to reconstruct a Box
and tell Rust to treat it as an ordinary object it allocated. Alternatively, we can
just dereference the passed pointer in Rust and do operations on it. Note that both
of these operations are unsafe since Rust can’t make any guarantees about whether the
pointer is valid.

In all, Weld’s API looks something like the follows. For every type in the API,
Rust allocates boxed memory and casts it to a *mut T (a raw pointer) before
returning it. When a C caller passes the pointer back (as a raw pointer), we
use from_raw to reconstruct the Rust object, and then use the object. Below
is an example for the WeldConf type, which wraps a Rust HashMap and holds
configuration options.

pubstructWeldConf{conf_map:HashMap<CString,CString>,// other stuff}#[no_mangle]/// Return a new Weld configuration.pubextern"C"fnweld_conf_new()->*mutWeldConf{// forgets the allocated value, get a raw ptrBox::into_raw(Box::new(WeldConf::new()))}#[no_mangle]/// Return the number of configurations in the conf.pubunsafeextern"C"fnweld_conf_count(ptr:*constWeldConf)->libc::uint64_t{// Dereference the WeldConf and convert it to a // reference.letconf=&*ptr;// do stuff with conf; it's a regular Rust object.conf.conf_map.count()}#[no_mangle]/// Frees a configuration.pubunsafeextern"C"fnweld_conf_free(ptr:*mutWeldConf){// The value is reconstructed and then dropped// (since it goes out of scope).Box::from_raw(ptr);}

In C, the header file declaring the above API looks like this:

// The opaque handle.typedefvoid*weld_conf_t;// Returns a new weld configuration.weld_conf_tweld_conf_new();/// Return the number of configurations in the conf.uint64_tweld_conf_count(weld_conf_tptr);// Frees a configuration.voidweld_conf_free(weld_conf_tptr);

In general, despite its many language features Rust is still able to seamlessly
integrate with existing C programs. While developers need to be cognizant of
when Rust frees memory to prevent dangling pointers and leaks, the Rust docs do
a fantastic job documenting how to avoid these problems.

Summary

To summarize, implementing Weld in Rust was definitely a positive experience. As
other Rust users have reported at length, the language has a larger learning
curve than other languages, but after grappling with the borrow checker for a
few days the advantages of the language begin to outweigh its complexity.

In conclusion:

Pattern matching and Rust’s expressive enums are a great fit for engineering a compiler.

Error handling code is very elegantly expressed using Result.

Compilation times are slow for large files, but the guarantees Rust provides
means bugs such as segfaults are non-existent (barring functions written using
the FFI); generally, code that compiles is very likely to just work.

The lack of the box keyword makes matching on recursive data types such as
ASTs where each node is a Box<T> unwieldy, since programmers are stuck using
nested if let statements. We hope this feature makes it to stable Rust soon.

Interfacing with C is intuitive, though it require some manual memory
management. Nonetheless, found that the relevant libraries are very well documented.

The llvm-sys crate, which provides wrappers around the LLVM libraries, made code generation
in our compiler very easy.

We hope this blog post helps other systems developers in evaluating whether
Rust is a good choice for them, in particular when writing libraries which need
to be called by non-Rust code.