Think Big

We recently had an incident where we had a Redis instance blocked on writing to disk, hung inside the kernel.

Through a variety of other circumstances, this meant the only up-to-date copy of business-critical data was in memory on a single machine; multiple independent replicas and backups were unavailable, and that machine could crash at any moment.

The problem was eventually solved without data loss of any kind due to our Techops team being utterly brilliant, but during the “incident” a number of ideas that would otherwise be dismissed as absolutely crazy were discussed.

One of them was built on the simple fact that redis is an in-memory database, and we had access to the machine. The question kept coming up: could we just copy all the memory off of the machine, or at least out of the process, and wrest our data from its grasp?

Although mostly not serious, at least to start with, as options dwindled the idea of taking a core dump and picking it apart looked more and more like a straw worth clutching at.

It ended up not being needed, which is a good thing, but I figured out that it was at least possible, if you are really really really desperate, and have some time on your hands. On the off-chance anyone is really that desperate in the future, I thought I’d at least list out what direction you would need to take, and wish you the best of luck.

(Past this point I’m going to assume an exceptional knowledge of C, a good knowledge of GDB, ELF loading, symbol lookup, the amd64 ABI used by Linux, memory layout and generally how the dynamic linker works. If interested check out the osdev.org wiki, especially on linking, ELF loaders etc and http://www.x86-64.org/documentation/abi.pdf – Or just fake it and read on)

Given you’re at the point of desperation, you’re probably going to be limited in either time or what you can do. At the very least you are going to want a core dump of the redis process, bearing in mind that it has to be written somewhere: if you can’t just do a redis SAVE, there is a chance you can’t write to disk at all.

If you have to write to /tmp on tmpfs, realise that you might run out of memory at some point while writing it, and although the OOM killer probably won’t kick in, be very, very sure first. In a pinch, you might be able to use the remote gdb stub/server to dump the core out over the network rather than locally.

GDB Helping

You can get gdb to attach to a process, dump out a core file using generate-core-file and then detach from the process and continue executing:
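A minimal session looks something like this (the pid lookup and output path are illustrative; note that the process is stopped while gdb is attached, so detach promptly):

```
$ gdb -p $(pidof redis-server)
(gdb) generate-core-file /tmp/redis.core
Saved corefile /tmp/redis.core
(gdb) detach
(gdb) quit
```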

There are other things you want to get at this point if you can. Some of these might not be available, or might not be possible to get off the machine:

A screenshot of the gdb session from when you took the dump! It includes:

Every library that was loaded that gdb found (and didn’t find) symbols for

Actual annotated call frame information with symbols and everything.

A copy of /proc/($pid)/maps for the process

The exact version of redis running, including any distribution-specific patches (the distribution package version should be enough; you want to get an exact copy of the source code, and in a pinch you might need to reproduce the entire distribution build environment)

A copy of the redis binary for the process running in memory. If you have upgraded redis-server and not restarted redis, the binary on disk might not match the one running, but knowing the distribution package version might be enough to get a copy from elsewhere

Get this off of the server ASAP and somewhere you can work on it.
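Assuming the pid is in $PID, the gathering step might look something like this (paths and commands illustrative; on Linux, /proc/$PID/exe lets you read the binary that is actually running, even if the file on disk has since been replaced by an upgrade):

```
PID=$(pidof redis-server)
cp /proc/$PID/maps redis.maps           # memory layout at dump time
cat /proc/$PID/exe > redis-server.bin   # the binary actually running
dpkg -s redis-server | grep Version     # exact distribution package version
```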

GDB Hell

At the time of writing, gdb had a couple of really big frustrating limitations and/or bugs when writing corefiles out.

The biggest is that although gdb, at the time of dumping, was quite happy loading relative symbols from the binaries and libraries and translating them to absolute addresses based on where the ELF loader had put the individual sections at runtime, the corefile ended up with neither the symbol information nor the section load information (or at least not in a way that gdb could read back itself afterwards).

If the core was dumped with a recent version of gdb, you might get some success with running gdb like this:

gdb /usr/bin/redis -c <corefile>

But, this wasn’t working on the version shipping with Debian wheezy.

The reason is that gdb just wasn’t saving the section mappings, or annotating them in full in the coredump. Here is the difference between a recent gdb coredump and the coredump I got from production using the same method:
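One likely difference is visible with readelf: a corefile from a recent gdb carries an NT_FILE note recording which file each mapping came from, while an older gdb’s corefile has no such note. The output below is abridged and illustrative, not taken from the actual incident:

```
$ readelf --notes recent.core | grep -A 3 NT_FILE
  CORE  NT_FILE (mapped files)
    Page size: 4096
    0x0000000000400000  0x00000000004b8000  /usr/bin/redis-server
$ readelf --notes old.core | grep NT_FILE
$
```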

If you don’t have the /proc/($pid)/maps file, you might have to use readelf/objdump or something to disassemble the original binary, disassemble part of your coredump’s text section, and find some code blocks that match.
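The matching step itself is just a substring search over raw bytes. As a trivial standalone sketch (the prologue bytes and offsets here are made up for illustration, and a real core has structure this ignores), in Python:

```python
# Locate a known code sequence from the original binary inside a raw dump,
# to estimate where the binary's .text ended up.

def find_code(dump: bytes, snippet: bytes, snippet_vaddr: int) -> int:
    """Return the estimated load base: where we found the snippet in the
    dump, minus the snippet's known address within the original binary.
    Returns -1 if the snippet isn't present."""
    off = dump.find(snippet)
    if off == -1:
        return -1
    return off - snippet_vaddr

# toy example: a fake dump with a recognisable x86-64 prologue at offset 0x140
prologue = bytes([0x55, 0x48, 0x89, 0xE5])   # push rbp; mov rbp, rsp
dump = b"\x00" * 0x140 + prologue + b"\x00" * 0x40
print(hex(find_code(dump, prologue, 0x40)))  # 0x100
```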

This isn’t quite as bad as it looks, but almost. If you have a chance to use gdb more on the original machine running redis, dump out the symbol location of a well-known function inside redis-server, and use that to figure out the .text section base. e.g.:
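For example, while still attached to the live process (serverCron is a real redis function; the addresses here are illustrative):

```
(gdb) info address serverCron
Symbol "serverCron" is a function at address 0x41f0b0.
(gdb) info files
        ...
        0x000000000040f000 - 0x00000000004a0000 is .text in /usr/bin/redis-server
```

Subtracting the symbol’s offset within the binary (from objdump -t on the binary itself) from its runtime address gives you the .text load base.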

You still have to rewrite the symbol table, or otherwise tell gdb where the .text section for the binary is (there is an option for gdb to do that when loading symbols, but it didn’t actually work for me as documented; gdb is a twisty maze of mostly-working code).

Another option is to use the maint print symbols <filename> and other maint print *symbols commands to dump out gdb’s internal state while attached to the original process. It’s in a human-readable, not machine-readable, format, but it will give you the information.
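For example, dumping to files you can take away with you:

```
(gdb) maint print symbols /tmp/redis.symbols
(gdb) maint print msymbols /tmp/redis.msymbols
```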

This is probably an hour or two’s work, just to start being able to read the corefile working around gdb’s eccentricities. This is going to be the case with almost any version of gdb, made more fun by the fact that the amd64 arch code isn’t quite as mature as the x86 code.

Getting the source

Now we just need to get the source code for that exact version of redis. If you’re using Debian, this is pretty easy if you have the package name and version; just use apt-get source.
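For example (the version shown is illustrative, and you need deb-src lines in your sources.list):

```
$ dpkg -s redis-server | grep Version
Version: 2:2.4.14-1
$ apt-get source redis-server=2:2.4.14-1
```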

You will probably find it much easier past this point to be running the same development environment that the package was built with, but you will absolutely need the same version of GCC; different GCC versions can pack structures differently and optimise differently. You will almost certainly need the same versions of libc, libjemalloc, libpthread etc as were running on the original server.

If you’re really, really lucky, you should at this point be able to build redis-server and produce a binary close to identical to the one that was running on the server (there is actually a reproducible-build patch in Debian’s redis package which I suspect helps with this as well). Moreover, you now have access to both the coredump and the underlying source code and datastructures.

Digging into the corefile

So at this point, hopefully we have gdb at a point where it has not only read the corefile, but also knows where the symbols are in memory. This is absolutely brilliant, because redis.c has the most awesome global variable ever:
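That variable, as declared in redis.c:

```c
/* redis.c */
struct redisServer server; /* server global state */
```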

Because it’s declared as an uninitialised global variable, the linker has put it in the .bss segment, so the ELF loader will zero out the memory for it at load time.

It also means that it has its own symbol table entry, so even from the core dump we know where all our redis server state is.

At this point though, gdb doesn’t know the structure of the data there:

(gdb) p server
$1 = -1589279768
(gdb)

What we need to do is rebuild redis-server from the same version, and then use the binary without the debugging data stripped out to let gdb figure out where the source code is, and from there gdb will be able to go through the redis source and figure out how to introspect the datastructures.

I just used dpkg-buildpackage in the unpacked debian source for the same version as the redis-server I took a coredump from, and then went into the ‘src/’ directory where the built binaries and source code were sitting. Then we need to get gdb to load in the debugging data from the unstripped redis-server binary without disregarding the existing symbol information etc from the coredump.

This is possible, but gdb can’t figure out on its own where the .text section was loaded in the process we coredumped. We can introspect gdb’s state again to get the .text base address, and then tell gdb, when loading in the new redis-server binary, to offset the .text section by that amount. “info files” will show where gdb has loaded each file so far, and where it has mapped each file section.
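Putting that together, it looks something like this (addresses illustrative): read the .text load address out of “info files”, then load the unstripped binary’s symbols at that address:

```
(gdb) info files
        ...
        0x000000000040f000 - 0x00000000004a0000 is .text in /usr/bin/redis-server
(gdb) add-symbol-file src/redis-server 0x40f000
add symbol table from file "src/redis-server" at
        .text_addr = 0x40f000
```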

This is about as far as I got, but it was enough to start introspecting the data in a structured way. Getting past here you have wonderful options such as:

Scripting gdb to just walk the data for you

Dumping out the memory and then writing a wrapper program that uses the redis headers / code itself, in a hack, to walk the datastructures (heck, you might even be able to use the redis SAVE code itself if you are very, very lucky)

Using gdb to start up a new running redis process and somehow overlay all the memory referenced from server.db[0] in the coredump, and then reference it in one of the server.db[n] slots

The second is probably the best option. The third would be pretty cool, but the details would start being really annoying (like having to rewrite all the pointers in the entire db if you can’t map the data in at exactly the same addresses).
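As a sketch of the first option, gdb’s Python scripting can walk the keyspace. Everything here is an untested sketch: the field names follow redis 2.x’s dict.h, keys are assumed to be sds strings (which are plain char* past a small header), and it must be run inside gdb once the symbols have been sorted out:

```python
# Run inside gdb once symbols are loaded: print every key in server.db[0].
import gdb

d = gdb.parse_and_eval("server.db[0].dict")
char_p = gdb.lookup_type("char").pointer()
for h in (0, 1):                       # a dict keeps two hash tables while rehashing
    ht = d["ht"][h]
    if int(ht["table"]) == 0:
        continue
    for i in range(int(ht["size"])):
        entry = ht["table"][i]
        while int(entry) != 0:         # walk the bucket's chain
            print(entry["key"].cast(char_p).string())
            entry = entry["next"]
```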