Writing eBPF tracing tools in Rust

tl;dr: I made an experimental Rust repository that lets you write BPF tracing tools from Rust! It’s
at https://github.com/jvns/rust-bcc or https://crates.io/crates/bcc, and has a couple of hopefully
easy to understand examples. It turns out that writing BPF-based tracing tools in Rust is really
easy (in some ways easier than doing the same things in Python). In this post I’ll explain why I
think this is useful/important.

For a long time I’ve been interested in the BPF compiler collection (BCC),
a C -> BPF compiler, C library, and Python bindings to make it easy to write tools like:

and a lot more. The list of available tools in
the /tools directory
is really impressive and I could write a whole blog post about that.
If you’re familiar with dtrace – the idea is that BCC is a little bit like dtrace, and in fact
there’s a dtrace-like language named ply implemented with BPF.

This blog post isn’t about ply or the great BCC tools though – it’s about what tools we need to
build more complicated/powerful BPF-based programs.

What does the BPF compiler collection let you do?

Here’s a quick overview of what BCC lets you do:

compile BPF programs from C into eBPF bytecode.

attach this eBPF bytecode to a userspace function or kernel function (as a “uprobe” / “kprobe”) or install it as XDP

communicate with the eBPF bytecode to get information from it

A basic example of using BCC is this
strlen_count.py program
and I think it’s useful to look at this program to understand how BCC works and how you might be
able to implement more advanced tools.

First, there’s an eBPF program. This program is going to be attached to the strlen function from
libc (the C standard library) – every time we call strlen, this code will be run.

This eBPF program

gets the first argument to the strlen function (the address of a string)

reads the first 80 characters of that string (using bpf_probe_read)

increments a counter in a hashmap (basically counts[str] += 1)

The result is that you can count every call to strlen. Here’s the eBPF program:

After that program is compiled, there’s a Python part which does b.attach_uprobe(name="c", sym="strlen", fn_name="count") –
it tells the Linux kernel to actually attach the compiled BPF to the strlen function so that it
runs every time strlen runs.

The really exciting thing about eBPF is what comes next – there’s no use keeping a hashmap of
string counts if you can’t access it! BPF has a number of data structures that let you share
information between BPF programs (that run in the kernel / in uprobes) and userspace.

So in this case the Python program accesses this counts data structure.
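To make that flow a bit more concrete, here’s a plain userspace Rust sketch that mimics the logic described above: keep at most the first 80 bytes of the argument as the key, bump a counter in a hashmap, and then walk the map from the "userspace" side. It doesn’t touch BPF at all, and the names (count_call, KEY_LEN) are made up for illustration.

```rust
use std::collections::HashMap;

// Assumption for illustration: keys are capped at 80 bytes, as in the description above.
const KEY_LEN: usize = 80;

/// Mimics what the probe does on each call: truncate the argument to
/// KEY_LEN bytes and increment the counter stored under that key.
fn count_call(counts: &mut HashMap<Vec<u8>, u64>, arg: &[u8]) {
    let key = arg[..arg.len().min(KEY_LEN)].to_vec();
    *counts.entry(key).or_insert(0) += 1;
}

fn main() {
    let mut counts = HashMap::new();
    // Pretend these are the strings passed to strlen.
    for s in ["hello", "world", "hello"].iter() {
        count_call(&mut counts, s.as_bytes());
    }
    // "Userspace" side: walk the table and print every key with its count.
    for (key, value) in &counts {
        println!("{:?} {}", String::from_utf8_lossy(key), value);
    }
}
```

In the real thing the map lives in the kernel and userspace reads it through BCC, but the shape of the data (byte-string keys, integer counts) is the same idea.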

BPF data structures: hashmaps, buffers, and more!

There are basically 2 kinds of BPF data structures – data structures suitable for storing
statistics (BPF_HASH, BPF_HISTOGRAM etc), and data structures suitable for storing events
(like BPF_PERF_OUTPUT) where you send a stream of events to a userspace program which then displays
them somehow.

There are a lot of interesting BPF data structures (like a trie!) and I haven’t fully worked out
what all of the possibilities are with them yet :)

What I’m interested in: BPF for profiling & tracing

Okay!! We’re done with the background, let’s talk about why I’m interested in BCC/BPF right now.

I’m interested in using BPF to implement profiling/tracing tools for dynamic programming languages,
specifically tools to do things like “trace all memory allocations in this Ruby program”. I think
it’s exciting that you can say “hey, run this tiny bit of code every time a Ruby object is
allocated” and get data back about ongoing allocations!

Rust: a way to build more powerful BPF-based tools

The issue I see with the Python BPF libraries (which are GREAT, of course) is that while they’re
perfect for building tools like tcplife, which tracks TCP connection lengths, once you want to
start doing more complicated experiments like “stream every memory allocation from this Ruby program,
calculate some metadata about it, query the original process to find out the class name for that
address, and display a useful summary”, Python doesn’t really cut it.

So I decided to spend 4 days trying to build a BCC library for Rust that lets you attach + interact
with BPF programs from Rust!

This table contains a hashmap mapping strings to counts. So we need to iterate over that table and
print out the keys and values. This is pretty simple: it looks like this.

    let iter = table.into_iter();
    for e in iter {
        // key and value are each a Vec<u8> so we need to transform them into a string and
        // a u64 respectively
        let key = get_string(&e.key);
        let value = Cursor::new(e.value).read_u64::<NativeEndian>().unwrap();
        println!("{:?} {:?}", key, value);
    }

Basically all the data that comes out of a BPF program is an opaque Vec<u8> right now, so you need
to figure out how to decode them yourself. Luckily decoding binary data is something that Rust is
quite good at – the byteorder crate lets you easily decode u64s, and translating a vector of
bytes into a String is easy (I wrote a quick get_string helper function to do that).
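Here’s a rough sketch of what that decoding step can look like using only the standard library (so no byteorder). The get_string below is a guess at what such a helper might do (treat the bytes as a NUL-terminated C string), not necessarily what the one in rust-bcc does, and decode_u64 is a name I made up for this example.

```rust
/// Hypothetical helper: treat the buffer as a NUL-terminated C string and
/// convert everything before the first NUL byte (lossily) into a String.
fn get_string(bytes: &[u8]) -> String {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    String::from_utf8_lossy(&bytes[..end]).into_owned()
}

/// Decode a native-endian u64 from the first 8 bytes of the buffer.
/// Returns None if the buffer is shorter than 8 bytes.
fn decode_u64(bytes: &[u8]) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_ne_bytes(buf))
}

fn main() {
    // Simulated table entry: an 80-byte key buffer and an 8-byte counter.
    let mut key = vec![0u8; 80];
    key[..5].copy_from_slice(b"hello");
    let value = 3u64.to_ne_bytes().to_vec();

    println!("{:?} {:?}", get_string(&key), decode_u64(&value));
}
```

Returning an Option instead of unwrapping means a short or malformed value doesn’t take the whole tool down.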

I thought this was really nice because the code for this program in Rust is basically exactly the
same as the corresponding Python version. So it’s pretty approachable to start doing experiments
and seeing what’s possible.

Reading perf events from Rust

The next thing I wanted to do after getting this strlen example to work in Rust was to handle
events!!

Events are a little different / more complicated.
A BCC program streams events by using perf_event_open to create a ring buffer
where the events get stored.

Dealing with events from a perf ring buffer normally is a huge pain because perf has this
complicated data structure. The C BCC library makes this easier for you by letting you specify a C
callback that gets called on every new event, and it handles dealing with perf. This is super
helpful. To make this work with Rust, the rust-bcc library lets you pass in a Rust closure to run
on every event.

Rust example 2: opensnoop.rs (events!!)

To make sure reading BPF events actually
worked, I implemented a basic version of opensnoop.py from the iovisor bcc tools: opensnoop.rs.

I won’t walk through the C code in this case because there’s a lot of it but basically the eBPF
C part generates an event every time a file is opened on the system. I copied the C code verbatim
from opensnoop.py.

The Rust part starts out by compiling BPF code & attaching kprobes (to the open system call in the
kernel, do_sys_open). I won’t paste that code here because it’s basically the same as the strlen
example. What happens next is the new part: we install a callback with a Rust closure
on the events table, and then call perf_map.poll(200) in a loop. The design of the BCC library
is a little confusing to me still, but you need to repeatedly poll the perf reader objects to make
sure that the callbacks you installed actually get called.
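The "nothing happens until you poll" model can be sketched with a tiny stand-in. FakeReader here is entirely hypothetical (it is not the rust-bcc or BCC API); it just queues events and only hands them to the closure when poll() is called, which is roughly the behaviour described above.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for one perf reader: events queue up until polled.
struct FakeReader<F: FnMut(&[u8])> {
    pending: VecDeque<Vec<u8>>,
    callback: F,
}

impl<F: FnMut(&[u8])> FakeReader<F> {
    fn new(callback: F) -> Self {
        FakeReader { pending: VecDeque::new(), callback }
    }

    /// Simulates the kernel writing an event into the ring buffer.
    fn push_event(&mut self, event: Vec<u8>) {
        self.pending.push_back(event);
    }

    /// Drain the queued events, invoking the callback on each one.
    /// Returns how many events were delivered.
    fn poll(&mut self) -> usize {
        let mut delivered = 0;
        while let Some(event) = self.pending.pop_front() {
            (self.callback)(&event);
            delivered += 1;
        }
        delivered
    }
}

fn main() {
    let mut total_bytes = 0usize;
    {
        let mut reader = FakeReader::new(|event: &[u8]| total_bytes += event.len());
        reader.push_event(vec![1, 2, 3]);
        reader.push_event(vec![4, 5]);
        // Nothing has reached the callback yet; delivery happens inside poll().
        let delivered = reader.poll();
        println!("delivered {} events", delivered);
    }
    println!("callback saw {} bytes", total_bytes);
}
```

A real tool would call poll in a loop with a timeout, so the callback keeps running as new events arrive.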

This is the callback code I wrote, which gets called on every event. Again, it takes an opaque Vec<u8>
event and translates it into a data_t struct to print it out. Doing this is kind of annoying (I
actually called libc::memcpy which is Not Encouraged Rust Practice), and I need to figure out a less
gross/unsafe way to do that. The really nice thing is that if you put #[repr(C)] on your Rust
structs it represents them in memory the exact same way C will represent that struct. So it’s quite
easy to share data structures between Rust and C.

You might notice that this is actually a weird function that returns a callback – this is because I
needed to install 4 callbacks (1 per CPU), and in stable Rust you can’t copy closures yet.
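As a sketch of both ideas together (a function that returns a fresh callback, and turning raw bytes into a #[repr(C)] struct), here’s a self-contained example. Everything in it is an assumption for illustration: the Event struct is a made-up, much smaller layout than opensnoop’s real data_t, and it uses std::ptr::read_unaligned in place of libc::memcpy, which is one possible alternative rather than what rust-bcc does.

```rust
use std::cell::RefCell;
use std::mem::size_of;
use std::ptr;
use std::rc::Rc;

// Hypothetical event layout, not the real data_t from opensnoop.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct Event {
    pid: u32,
    ret: i32,
    comm: [u8; 16],
}

/// Copy an Event out of a raw byte buffer, or None if the buffer is too short.
fn parse_event(bytes: &[u8]) -> Option<Event> {
    if bytes.len() < size_of::<Event>() {
        return None;
    }
    // SAFETY: the length was checked above, Event is #[repr(C)] and made only
    // of integers and bytes (so any bit pattern is valid), and read_unaligned
    // does not require the pointer to be aligned.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const Event) })
}

/// Factory: every call builds a brand new boxed closure, so no closure
/// ever needs to be copied.
fn make_callback(seen: Rc<RefCell<Vec<Event>>>) -> Box<dyn FnMut(&[u8])> {
    Box::new(move |bytes: &[u8]| {
        if let Some(event) = parse_event(bytes) {
            seen.borrow_mut().push(event);
        }
    })
}

/// Test helper: build the bytes a (pretend) BPF program would emit.
fn encode_event(pid: u32, ret: i32, comm: &str) -> Vec<u8> {
    let mut name = [0u8; 16];
    let n = comm.len().min(15);
    name[..n].copy_from_slice(&comm.as_bytes()[..n]);
    let mut raw = Vec::with_capacity(size_of::<Event>());
    raw.extend_from_slice(&pid.to_ne_bytes());
    raw.extend_from_slice(&ret.to_ne_bytes());
    raw.extend_from_slice(&name);
    raw
}

fn main() {
    let seen = Rc::new(RefCell::new(Vec::new()));
    // One callback per (pretend) CPU, each built by a separate factory call.
    let mut callbacks: Vec<Box<dyn FnMut(&[u8])>> =
        (0..4).map(|_| make_callback(Rc::clone(&seen))).collect();
    for (cpu, cb) in callbacks.iter_mut().enumerate() {
        let raw = encode_event(1000 + cpu as u32, 3, "vim");
        cb(&raw);
    }
    for event in seen.borrow().iter() {
        println!("{:?}", event);
    }
}
```

The unsafe block is still there, but it’s small and the length check sits right next to it, which feels a bit less gross than a bare memcpy.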

Output

Here’s what the output of that opensnoop program looks like!

This is kind of meta – these are the files that were being opened on my system when I saved this
blog post :). You can see that git is looking at some files, vim is saving a file, and my static
site generator Hugo is opening the changed file so that it can update the site. Neat!

Using rust-bcc to implement Ruby experiments

Now that I have this basic library that I can use to get counts + stream events in Rust, I’m
excited about doing some experiments with making BCC programs in Rust that talk to Ruby programs!

The first experiment (that I blogged about last week) is
count-ruby-allocs.rs
which prints out a live count of current allocation activity. Here’s an example of what it prints
out: (the numbers are counts of the number of objects allocated of that type so far).

Related work

Geoffrey Couprie is interested in building more advanced BPF tracing tools with Rust
too and wrote a great blog post with a cool proof of concept:
Compiling to eBPF from Rust.

I think the idea of not requiring the user to compile the BPF program is exciting, because you could
imagine distributing a statically linked Rust binary (which links in libbcc.so) with a pre-compiled
BPF program that the binary just installs and then uses to do cool stuff.