Spying on a Ruby process's memory allocations with eBPF

Today instead of working on CPU profilers, I took the day to experiment with a totally new idea!

My idea at the beginning of the day was – what if you could take an arbitrary Ruby process’s PID
(that was already running!) and start tracking its memory allocations?

Spoiler: I got something working! Here’s an asciinema demo of what happened. Basically this shows a
live-updating cumulative view of rubocop’s memory allocations over 15 seconds, counted by class. You can see
that rubocop allocated a few thousand Arrays, Strings, and Ranges, some Enumerators, etc.

This demo works without making any code changes to rubocop at all – I just ran bundle exec
rubocop to start it. All the code for this is in https://github.com/jvns/ruby-mem-watcher-demo
(though it’s extremely experimental and likely only works on my machine right now).

how it works part 1: eBPF + uprobes

The way this works fundamentally is relatively simple. On Linux ~4.4+, you have this feature called
“uprobes” which lets you attach code that you write to an arbitrary userspace function. You can do
this from outside the process – you ask the kernel to modify the function while the program is
running and run your code every time the function gets called.

You can’t ask the kernel to run just any code, though (at least not with eBPF) – you ask it to
run “eBPF bytecode”, which is basically compiled C code where you’re restricted in what memory you can access.
And it can’t have loops (newer kernels allow loops only when the verifier can prove they terminate).

So the idea is that I’d run a tiny bit of code every time a new Ruby object was created in
rubocop, and then that code would count memory allocations per class.

This is the function I wanted to instrument (add a uprobe to): newobj_slowpath.
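As a rough illustration (a sketch, not the exact code from the demo repository), counting calls per class with bcc might look like the following. It assumes that the first argument of newobj_slowpath is the class pointer and that the symbol is visible in the Ruby binary or libruby. The names `counts`, `count_alloc`, and `attach` are made up for this example.

```python
# BPF program text (restricted C). bcc compiles it when the BPF object is created.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

// Hash map shared with userspace: class pointer -> number of allocations.
BPF_HASH(counts, u64, u64);

int count_alloc(struct pt_regs *ctx) {
    // ASSUMPTION: the first argument of the probed function is the class pointer.
    u64 klass = PT_REGS_PARM1(ctx);
    counts.increment(klass);
    return 0;
}
"""


def attach(pid: int, binary_path: str, symbol: str = "newobj_slowpath"):
    """Attach the counter to `symbol` in one running process (needs root and bcc)."""
    # Imported lazily so this sketch can be loaded on machines without bcc.
    from bcc import BPF

    bpf = BPF(text=BPF_PROGRAM)
    # binary_path is the file that contains the symbol (the ruby executable or
    # libruby.so, depending on how Ruby was built).
    bpf.attach_uprobe(name=binary_path, sym=symbol, fn_name="count_alloc", pid=pid)
    return bpf
```

Calling `attach(pid, "/path/to/ruby")` returns the BPF object, and `bpf["counts"]` is then the hash that userspace can read.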

Next, here’s the Python part. This is just a while loop that every second reads counts (the same BPF hash
as before, but magically accessible from Python somehow!!), prints out what’s in there, and then clears
it.
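A minimal sketch of that kind of loop, assuming `table` is a bcc table object such as `bpf["counts"]` (the name `counts` is an assumption). bcc tables behave like dictionaries whose keys and values are ctypes integers, so the sketch reads `.value` on both. Keeping a running total in a `Counter` is one way to get a cumulative view even though the table is cleared each round. `drain` and `watch` are hypothetical names.

```python
import time
from collections import Counter


def drain(table) -> Counter:
    """Read every (class pointer, count) pair out of the table, then empty it."""
    snapshot = Counter()
    for key, value in table.items():
        # bcc hands keys and values back as ctypes integers, hence `.value`.
        snapshot[key.value] += value.value
    table.clear()
    return snapshot


def watch(table, seconds: int, interval: float = 1.0) -> Counter:
    """Poll the table `seconds` times and print the running totals each round."""
    totals = Counter()
    for _ in range(seconds):
        time.sleep(interval)
        totals.update(drain(table))
        for class_ptr, count in totals.most_common(10):
            print(f"{class_ptr:#x} {count}")
        print()
    return totals
```

Allocations that happen between reading the table and clearing it can be lost, which is probably fine for a demo but worth knowing about.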

calling rb_class2name

Calling rb_class2name is pretty easy – I just needed to find the address of rb_class2name (which
I already know how to do from rbspy), cast that address to the right kind of function pointer
(extern "C" fn (u64) -> u64), and then call the resulting function!

Of course all of this (copying the memory maps, casting essentially a random address into a function
pointer, calling the resulting function) is unsafe in Rust, but I can still do it!
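To show just the "cast an address to a function pointer and call it" step (this is not the Rust code from the demo), here is a small Python sketch using ctypes. It uses libc's strlen as a stand-in, since rb_class2name only exists inside a Ruby process. `address_of` and `call_at_address` are hypothetical helper names, and the sketch assumes Linux or another platform where `ctypes.CDLL(None)` works.

```python
import ctypes


def address_of(func) -> int:
    """Return the raw address of a ctypes foreign function as a plain integer."""
    return ctypes.cast(func, ctypes.c_void_p).value


def call_at_address(addr: int, arg: bytes) -> int:
    """Treat `addr` as a C function taking a char* and returning size_t, and call it."""
    # Nothing checks that the address really holds a function with this
    # signature. A wrong address or wrong signature crashes the process.
    prototype = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_char_p)
    func = prototype(addr)
    return func(arg)


# Load the symbols already present in this process (includes libc).
libc = ctypes.CDLL(None)
strlen_addr = address_of(libc.strlen)
print(call_at_address(strlen_addr, b"Array"))  # prints 5
```

The same caveat applies as in Rust: the language cannot verify any of this, so a bad address means a segfault.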

When I finally got this to work at like 9pm today I was so delighted.

segfaults

I kept running into segfaults when trying to translate class pointers into names. Instead of
debugging this (I just wanted to get a demo to work!!) I decided to just figure out how to ignore
the segfaults because it wasn’t always segfaulting, just sometimes.

here is what I did (this is silly, but it was fun):

before doing the thing that causes the segfault, fork

in the child process, try to do the potentially segfaulting thing and print out the answer

if the child process segfaults, ignore it and keep going

this worked great.
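The three steps above can be sketched in Python like this. It is an illustration of the fork trick rather than the demo's Rust code: the child sends its answer back through a pipe instead of printing it, and `run_isolated` and `crash` are hypothetical names. It assumes a Unix system where `os.fork` is available.

```python
import os
import signal


def run_isolated(risky):
    """Run `risky()` in a forked child. Return its result as text, or None if the child died."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the dangerous thing and send the answer to the parent.
        status = 1
        try:
            os.close(read_fd)
            result = risky()
            os.write(write_fd, str(result).encode())
            status = 0
        finally:
            # Always leave the child here, without running parent cleanup code.
            os._exit(status)
    # Parent: read whatever the child managed to write, then collect its status.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0:
        return output.decode()
    # The child segfaulted (or failed some other way): ignore it and keep going.
    return None


def crash():
    """Simulate a segfault by sending SIGSEGV to the current process."""
    os.kill(os.getpid(), signal.SIGSEGV)


print(run_isolated(lambda: "Array"))  # prints Array
print(run_isolated(crash))            # prints None
```

The parent never crashes because the dangerous call only ever runs in a process that is allowed to die.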

how the Rust program and the Python program work together

the way the final demo works is:

the Python program is in charge of getting class pointers + counting how many times each of them
has been allocated (with uprobes + BPF)

the Rust program is in charge of mapping class pointers to class names – you call it with a PID and a
list of class pointers as command arguments, and it prints out the mappings to stdout
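A sketch of how the Python side could call such a helper. The binary name, the hexadecimal argument format, and the "one `POINTER NAME` pair per line" output format are all assumptions made for illustration, since the actual command-line interface is not described here.

```python
import subprocess


def build_command(binary: str, pid: int, class_pointers: list[int]) -> list[str]:
    """Build the argument list: the binary, the PID, then each pointer in hex."""
    return [binary, str(pid), *(hex(p) for p in class_pointers)]


def parse_mappings(stdout: str) -> dict[int, str]:
    """Parse lines of the (assumed) form '0x55d0 Array' into {pointer: name}."""
    names = {}
    for line in stdout.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or incomplete lines
        pointer, name = parts
        try:
            names[int(pointer, 16)] = name
        except ValueError:
            continue  # skip lines that do not start with a hex pointer
    return names


def resolve_names(binary: str, pid: int, class_pointers: list[int]) -> dict[int, str]:
    """Run the helper and return whatever mappings it managed to print."""
    result = subprocess.run(
        build_command(binary, pid, class_pointers),
        capture_output=True,
        text=True,
        check=False,  # pointers that could not be resolved are simply missing
    )
    return parse_mappings(result.stdout)
```

For example, `resolve_names("./class-name-resolver", 1234, [0x55D0])` would run `./class-name-resolver 1234 0x55d0`, where the binary name is hypothetical.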

This is of course all a hacky mess but it worked and I got it to work in 1 day which made me super
happy! I think it should be possible to do this all in Rust – as long as I can compile and save
the appropriate BPF program, I should be able to call the right system calls from Rust to insert
that compiled BPF program into the kernel without using bcc. I think.

design principle: magic

The main design principle I’m using right now is – how can I build tools that just feel really
magical? (they should also hopefully be useful, of course :)). But I think that eBPF enables a lot
of really awesome things and I want to figure out how to show that to people!

This idea of streaming you live updates about what memory your Ruby process is
allocating (without having to make any changes in your Ruby program beforehand) feels really magical
and cool. There’s still a lot of work to do to make it useful and it’s not clear how stable I can
make it, but I am delighted by this demo!