Belated Codegate 2014 Quals Writeups and Lessons Learned

The local challenges can be grabbed from here and various other writeups are online. I was off on the timing for this one, so I only dove into most the challenges on Sunday morning… right before codegate ended and after it ended. I did pretty terrible rank-wise, but I thought the challenges were fun anyway and solved some after the codegate was over.

I learn new things almost every CTF, even if it’s sometimes just things that make sense intuitively but I’ve just never thought about before or forgotten. Here are some lessons learned from codegate.

stack canaries on linux stay constant and you can get them with an info leak. On windows there is more entropy and this wouldn’t have been as straightforward (see Pwn 250)

ulimit -s pretty much defeats ASLR if you’re local. Also, there are areas you can overwrite in this space, like in the syscall mappings, to control the instruction pointer (See Pwn 350 and Pwn 400)

To control the number of bytes output locally, you can use non-blocking sockets (see Pwn 400)

gdb really can’t find gs: segments very well, but there are usually workarounds (see Pwn 350)

You can’t call openprocess with all access on a process being debugged (i.e. you can’t open two debuggers on the same process, even if it’s been forked) but you can openprocess with PROCESS_VM_READ. (reversing 250 – although I ended up going a different route)

I wrote a Pykd script that can be called on a windbg breakpoint and continue based on python stuff e.g. you could do a conditional break on a regex which seems tough in windbg without writing a script. (see reversing 250)

it looks like they’re doing syscalls for every character written and read

Most of the binary is garbage, but you can see clusters of syscalls where the output and input happen. Looking in IDA, there are four clusters of syscalls. One outputs the fail, one outputs “Enter root password”, one is a huge loop that inputs the password, and one outputs winner.

With gdb, I recorded executions and executed backworkds until I was in the giant input loop. At that point I started searching for the password was as an offset to rbp, since this is how it outputs strings as syscalls. Sure enough, I found it pretty quickly.

Reversing 250 Clone Technique

This one has on the description: “Limited processes will be generated. Which one has the flag?”

We can see in the main function that it will call createProcessW in a loop that lasts 0x190 iterations, so that’s our number of processes.

Each process is called with three args, #randomlookingnumber# #randomlookingnumber# #counter#, where counter is 0,1,2,…

One thing I tried to do throughout this was put a breakpoint on memory access. For example,

ba w4 00409754

The value would change, but the breakpoint is never hit. Crazy! After some investigation with Joe, we eventually figured out this is because ba works by putting a break in one of four registers per processes. In our case, the memory is written to using a call to WriteProcessMemory, and the kernel is writing the memory, so our breakpoint is never hit.

Investigating this led to the cinit function, which is called before main and contains the calls to writeprocessmemory. It starts with code that grabs the command line args if they exist (and if not sets them to the first value. It then pushes them along with this interesting data string to a function, messes with a string it returns, and then 0s out that string. Even without looking at what the decode_func is doing, it looks like a likely place for a key!

My strategy was then to attach windbg to every one of these forked processes by turning childdbg on, changing the filters to control when it breaks, and set a breakpoint right before the rep stosd. I then wrote a python script to see if this is a candidate for the key.

Ok, so we might have to fix the pcap-ng file to view this correctly. I don’t know much about the format, but I opened it in a hex editor and searched for DWORD hex(4270407998), which is \x3e \x41 \x89 \xfe. I replaced this with DWORD 96 and got a similar error. Then I kind of brute forced – set it to 0x30 and got a different error. 0x40 gave a too big error. It took about 10 tries of manual binary searching, and then I got a format I could open in wireshark and follow the tcp stream.

Then just save the pdf part to a file and we get the key

Pwn 250 Angy Doraemon

This was a fun, relatively straightforward challenge. It’s a remote exploit. The bug is in the mouse function. When it reads the answer to “are you sure?” it reads 0x6E bytes into a buffer only a few bytes big.

The hard part is you have to bypass a non-executable stack and a stack canary. This is possible via an info leak. Because we can write over the end of our string, we can see other values on the stack, such as the canary and the fd. One interesting thing I learned is how on Linux the stack canary seems to be constant per process (on windows, as Matt Miller said, Windows cookies are basically an XOR’d combination of the current system time, process identifier, thread identifier, tick count, and performance counter.)

So you leak the canary, then you have to rop a payload. execl is in the .got, and I used some fancy redirecting to send the output right back through the socket. You need the file descriptor, but it’s on the stack and you need it anyway I think to do a read from the socket.

Pwn 350 4stone

This is a binary that plays a game locally, sort of like connect 4. All the interesting things happen if you win the game with 0 time (which you can do with a constant strategy – this took me longer than it should have). If you do win, it grabs some input from stdin.

As you can see from the following snippet, you get an “arbitrary” write (marked in red). There is a catch though. You can write anywhere, as long as it doesn’t start with 0x804, or 0x0b.

The stack is executable, so if we could overwrite anything, then we’d be sitting good. Unfortunately, most everything by default starts at 0x804 or 0x0b.

hmmm, we can write to the heap, but that’s tough. What can we do locally? My initial thought was to make the stack big enough so it wasn’t in the 0x0b range, then potentially overwrite a return pointer. There’s aslr on, but we might be able to brute force it. So I started filling up the stack with an execve call, but hit a limit. Ok, we can set this with ulimit. I tried this, and set it to unlimited, and something interesting happened. The mapping then looks something like this:

And a lot of these addresses, like libc, don’t change, even with ASLR! (apparently this is well known, but this is the first I’ve seen it). So where can we write in one of methods? The only function call after our arbitrary write is to exit, which makes a system call. Maybe we can overwrite that?

We can control edx and ecx with the gadget at 0x40000425, and another interesting thing is we may be able to control ebx indirectly if we can write to unk_8049150

.text:080480B4 lea ebx, unk_8049150 ; start
.text:080480BA int 80h ;

We have almost all we need for a system call, except we don’t have eax anywhere. After some research, one thing that’s promising is both “read” and “write” will return eax to the value of bytes read or written. My first thought was if I could set ecx or edx with the gadget at 0x40000425 then I might be able to use one of the existing reads or writes to control eax. This is close to working, like if they would mov eax, 4 (syswrite) immmediately before the int 80, it would have worked. But unfortunately there’s no flow like that. They all move eax, 4 at the beginning, then set all the args afterward.

I was stuck here, but luckily the good thing about trying this one late is there are solutions posted. This is an excellent writeup, and I used it to cheat: http://mslc.ctf.su/wp/codegate-2014-quals-minibomb-pwn-400/. They set a nonblocking socket to control how much could be written. Cool! I never would’ve though of that.

The sqli is relatively straightforward. You can have a True/false query like this.

password='or(1=2)and'1
password='or(1=1)and'1

But we have only 120 guesses*, and the password is quite long at 30 chars. This returns true

password='or(CHAR_LENGTH(password)=30)and'1

so that means we have only an average of four queries per letter, or 120 guesses for the whole 30 character password. They are lower case at least, so that can reduce our queries by quite a bit, but if we query all the bits, that’s 5 per letter and is too many with a true/false query (which would need 5 per letter). And if you do a straight binary search of the 30 char password, that’s on the order of a few hundred per session*

But, we really have more than true/false – we have an arbitrary number of states. true, false, and timing. We can sleep different amounts of times, etc. For example, we’re only lacking one bit, so we could ask, is the next bit 1 (return true), 0 (return false), or are all the bits 1 (sleep 10 seconds).
The query ended up pretty complicated, and I had to add some code that checked sanity to make sure letters were being decoded correctly.*