Search form

Blog

DEFCON Capture the Flag Qualification Challenge #2

This is my second post in a series on DEFCON 22 CTF Qualifications. Last time I examined a problem called shitsco and gave a short overview of CTF. This week, I’d like to walk you through another DEFCON Qualification problem: “nonameyet” from HJ. This problem was worth 3 points and was opened late in the game. It was solved by 10 teams but, sadly, my team, Samurai, was not one of them. I managed to land this one about an hour after the game ended. It’s a common theme among CTF players that they don’t stop after the game ends. There’s always some measure of personal pride on the line when it comes to solving these problems, regardless of points earned.

The problem description for nonameyet is:

I claim no responsibility for the things posted here. nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me:80

There is no download link provided and the service is running on port 80. We are to assume that this is a web challenge. Browsing to the web application I see that it allows users to upload photos to a /photos directory, hence the disclaimer in the problem description. Whenever a file upload capability is involved in a CTF web challenge, you can bet that it will be a source of a vulnerability. I have yet to see a web application problem in a CTF that provided a counter example.

I am able to view the shadowed password file on the server. So far, so good. Unfortunately, asking for the flag file directly yields an error. Of course, a 3 point problem would never be so easy in this CTF. Let’s turn our attention back to the file upload.

The page with the HTML for the upload form is upfile.html. This is loaded with a “?page=upfile.html” on the end of the URL. Examining the HTML source code on this file shows that our form data is submitted to /cgi-bin/nonameyet.cgi. We can recover this CGI program with a simple wget command:

More interestingly, it is also possible to use the upload form to upload anything at all. This just begs to have a PHP backdoor uploaded to the system. We put a simple PHP file manager on nonameyet_27d88d682935932a8b3618ad3c2772ac.2014.shallweplayaga.me...and used that to look around the directory structure and permissions placed on the files. Specifically, we could see that the /home/nonameyet/flag file was owned by nonameyet:nonameyet. I need to gain execution as this user to retrieve the flag. The web server executing the PHP scripts (including our backdoor) was running as the web server user.

It is important to note that getting a shell on a box provides an opportunity for many new attack vectors. For this problem, it was actually solved by other teams editing the file /home/nonameyet/.bash_aliases to include an alias that would copy the /home/nonameyet/flag file to /tmp with world readable permissions. The next time anyone popped a shell on this box and ran “ls” they would hand the flag over to another team. This was a very clever and devious thing to do—and in some sense, this is what CTF is all about.

I believe that having this file editable was an oversight on the part of the organizers. This file should not have been writeable. It was a great advantage for the teams that realized this mistake because they were free to look at other problems while waiting for someone else to come along and solve it the “legitimate” way. Furthermore, anyone that thought to look in /tmp before the flag was cleaned up could score points too. Lesson learned: Always poke around more and possibly set up some sort of monitoring for these kinds of issues. I wish I had thought of this first!

Binary Analysis

I went straight for the binary in the problem. The binary was not marked SUID so there must be some webserver magic launching the CGI program as the nonameyet user. Indeed, HJ confirmed that he was using a modified version of suexec after the game. I have already run a file command to see that the CGI program is an ELF 32-bit program. My usual next step is to run strings.

$ strings nonameyet.cgi
…

I see imports for C functions related to string parsing and file operations including dangerous APIs like strcpy() and sprintf(). I see a list of the errors the CGI program will return and input variables like photo, time, and date. There are some chunks of HTML and HTTP headers too. So far, it is a fairly typical CGI program. If you try to run it you will get an error 900 printed out to you with HTML tags. A quick strace shows that it is looking for the photos directory. Create this directory and you will move on to the program prompting you for input. Just enter ^D to signal an end of file and you will receive an error 902. Back to the strings. One string that really caught my eye was the “cgilib” string. This is indicative of a cgilib library. There were other strings that pointed to a library as well, such as the “/tmp/cgilibXXXXXX” string.

Cgilib is a “library [that] provides a simple and lightweight interface to the Common Gateway Interface (CGI) for C and C++ programs. Its purpose is to provide an easy to use interface to CGI for fast CGI programs written in the C or C++ programming language.” It is also an open source project. We can see from the output of the file command that the nonameyet.cgi program is dynamically linked, so let’s take a quick peek with ldd to see if cgilib is statically compiled into the binary or dynamically loaded at runtime from our system library.

We do not see cgilib on the list returned form ldd, so the cgilib library is statically linked. That is to say that if the cgilib binary is used in this program, it must have been compiled into the binary, which means that I could have source code for a good chunk of this problem. That would be a great aid in the reverse engineering process. One way to match up statically compiled libraries into CTF binaries is to use the IDA Pro FLAIR tool to generate a FLIRT signature that can be applied to the binary.

Which version of the library should I grab? The reverse lookup on the IP address used for this problem pointed to an Amazon EC2 server. I created an EC2 instance running the latest version of Ubuntu and applied all updates. It is important to mirror the game box as closely as possible. It is even better if we can run from the same ISP. I installed cgilib with this command:

$ sudo apt-get install cgilib

This added a file in /usr/lib/cgilib.a. I pulled this file back to my analysis machine with FLAIR installed and ran:

C:\> pelf -a libcgi.a
C:\> sigmake -n "libcgi" libcgi.pat libcgi.sig

The first command “pelf” will parse the library file and generate patterns for all exported symbols. The output of the command is put into the libcgi.pat file. The next “sigmake” command will read from the libcgi.pat file and create a binary representation that is output in the libcgi.sig file. This sig file can then be copied into the IDA Pro /sig directory and applied to a live database. All of this completely failed. No symbols were applied. I have not identified why. Bummer.

Thankfully, the library is very simple and almost all of the functions contain unique strings. We can download the source code for libcgi, find a function we are interested in, find a string used in that function, then find the same string in IDA Pro. Once we find the string in IDA we can press ‘x’ while the cursor is positioned on that string to find cross-references. If we follow the (hopefully) single cross-reference that exists, we can then name the function referencing that string as it is named in the source code for cgilib. It is a bit slower than FLIRT signatures but we will be able to flag a significant portion of the program as “uninteresting” right away. For example, if we look at the cgiReadFile function in the cgilib source code cgilib-0.7/cgi.c:

We can then find the /tmp/cgilibXXXXXX string in IDA Pro with a “search sequence of bytes”.

This will fail! As it turns out, there is a compiler optimization used on this function causing the string to be loaded as an immediate value on the stack. This is also sometimes used in programs that want to make string analysis more difficult on the reverse engineer. Indeed, if we go back and look at the string output our first clue is there:

~$ strings nonameyet.cgi
…
/tmp
/cgi
libX
XXXXf
…

They are broken up into groups of 4. This is because they are referenced as immediate DWORD values being moved into memory. Let’s repeat the search using a smaller string. If we search for “/tmp” we see exactly one spot in the binary where this appears. Here is how IDA shows the string data being loaded onto the stack:

We can now go to the top of this function and name it (‘n’ key) “cgiReadFile.” If you go through the rest of cgi.c you will end up with the following functions named:

The function named cgi_print (my name, not the cgilib name) is frequently called to output error messages that would be useful for debugging purposes. A quick look at this function reveals that if we set dword_804f0dc (normally 0 in the .bss) to something greater than arg0 (I assume this is a logging level?) we can get debugging output from the binary. In gdb the command to do this is:

When looking at a CTF problem, you should always be asking yourself “What is happening with my input?” Most of the parsing happens right up front in the cgiInit() function. This function will read and parse CGI input and set up the s_cgi structure. This function first checks for the environment variable CONTENT_TYPE. CGI input is usually passed via environment variables and stdin from the webserver. If this environment variable is not set then the program will read variables from stdin.

If the CONTENT_TYPE variable is set to “multipart/form-data” it will parse out a boundary condition from the variable and call off into the cgiReadMultipart() function before returning. If the CONTENT_TYPE variable is anything else, the program then looks for the REQUEST_METHOD and CONTENT_LENGTH environment variables.

For a REQUEST_METHOD of “GET” the environment variable QUERY_STRING is parsed and for a REQUEST_METHOD of “POST” stdin is parsed. If none of these are specified then the cgiReadVariables() function will prompt for input from the command line. This is very handy for quick testing. The cgiInit() function will also parse cookie information. All of this was learned by reading the cgilib source code for cgiInit().

We have five code paths for parsing our input: multipart, get, post, stdin, and cookies. All of these are standard in cgilib. Which code path should we explore first? Let’s start with the simplest form, no environment variables, and data parsed directly from stdin.

Here we just set a variable asdf = asdf and we are returned error 902, the same as if we passed in no input. Looking back to main() we can easily spot where “ERROR: 902” is printed inside of an else block. Look up to the if condition on that else block and we see that this is because photo = cgiGetFile((int)s_cgi, “photo”); returned NULL. Setting the photo variable from stdin also produces the same error. The cgiGetFile() call did not find a variable called photo registered in the s_cgi structure. There is another interesting behavior here if we set the same variable twice:

I should mention that I am using PEDA with GDB. It makes exploit development tasks a lot easier than standard GDB. I encourage you to check it out and explore how it works. Anyway, this is a NULL pointer dereference crash. The register EAX is being dereferenced. EAX is NULL. As a result, the program sends a signal 11 or SIGSEGV and we terminate execution. The buggy code seems to be in cgilib/cgi.c on line 644 when they attempt to do:

644: cgiDebugOutput (1, "%s: %s", result[i]->name, result[i]->value);

It looks to me like they used the incorrect index into the result array. There is another index counter called k used earlier in the code that accounts for duplicate variable name. My guess is that this line was simply copy and pasted from line 630 and the developers did not change ‘i’ to ‘k’. Either way, I am not sure if a web server would ever generate input to a CGI program like this, and unless we can somehow allocate the NULL address space on the remote server, this is not likely to be an interesting crash when solving the CTF problem. Interesting, but ultimately useless.

Back to our problem. The photo variable is NULL. Looking back in cgi.c source code for cgiGetFile() it is easy to spot that this information comes from s_cgi->files. Ok, that makes sense. However, the only code path that sets this information is when we have a CONTENT_TYPE of “multipart/form-data”. This was discovered with a quick grep for “->files” in the cgilib source code to find something that writes to this variable. The one place this happens is in the cgiReadMultipart() function. Let’s jump into feeding this program multipart data.

I used Wireshark to perform a packet capture on the data that was being sent by my browser when submitting a form to nonameyet.cgi. After all, the browser should already generate everything we need. With a quick copy and paste and setting up lines to end with \r\n instead of \n I now have the following setup to get multipart data parsed by the CGI program:

Remember each line ends with \r\n. After I set up the formdata file and my environment variable, let’s see if we can get past that error 902 output. I will also turn on the debug output with the debugger after breaking on main():

That looks pretty good! In truth, it took a bit of playing around to get to this point. Now we have everything specified in our form being read. The file contents were written and parsed and if we look in the /photos directory we see a file named base with the contents test:

$ ls photos/
base
$ cat photos/base
test

Where is the bug? If you look back up in the main() function you will see a subroutine I labeled “interesting”. The only way to get to this function is to have a valid photo returned from cgiGetFile(). Here is the decompiled source code for the interesting function:

There are a few things that jump out at me right away. The first is the use of the alloca() function. The man page for alloca states “The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.” Thus, we are dynamically growing the stack based upon file_name_size. This function call ends up being just a “sub esp” instruction in the assembly code, so don’t expect to see an import to alloca in the ELF header.

The next thing I notice are the case statements looking for 4 character string patterns of: Genr, Time, Date, PixY, and PixX. IDA shows these in little endian (backwards) format. The program checks for % characters in the filename input that are followed by another % character 4 character later. Thus, we are looking for DOS style variables like %Genr%. It turns out all of these variables are passed in as the third argument to the interesting function.

They are built into a structure that is 0x30 bytes long. First the sizes are built with calls to v3 = cgiGetValue((int)s_cgi, “base”); and the like. Then the strings for the variables are built immediately before the sizes. The IDA decompilation of the main function does not identify this as a structure. However, the memset(&v3, 0, 0x30u); and the fact that only v3 is passed into a function that clearly needs all of these variables is a big clue that this is a structure, or an array of structures, instead of 12 individual variables. The v3 variable in main() (or a1 in interesting()) ends up looking like this:

Have you spotted the bug yet? If not, go back to what is happening with our input in the interesting function. We pass this structure into our function, alloca (file_name_size * 2) and then what? We start copying into this array. It’s the qmemcpy calls that are in question here. These are presented in assembly as rep movsb instructions. Ask yourself how much data is being copied and what is the size of the destination buffer? Do you control the data being copied into the buffer? What variables are being updated in the loops to affect the starting offsets of the copy? Study the code and see if you can answer some of these questions. Do it now, I will wait.

Vulnerability Discovery

What you might notice is that after the program takes the length of file_name, doubles it, and allocates that amount of space on the stack, it will then proceed to copy in the values for the other variables from the structure. For example, if I set the filename “foobar” (name=”photo”; filename=”foobar” in my formdata file) and then if I set the Time input to be AAAAAA the CGI program will allocate 14 bytes on the stack (length of “foobar\0” * 2) and then copy in the value of the %Time% variable, which would also be 6 bytes. This will be clearer when looking at the actual input file.

The bug comes in if we make the length of Time larger than the length of file_name while having file_name reference %Time%. There is no check to see if we have enough stack space left. This is a stack overflow. The only issue is that if we try to encode a %Time% variable directly into the file_name then the program never gets to the interesting function! For clarity, this is what the formdata file looks like now for testing:

The %Time% bit does not parse correctly and we miss the check for % in the filename. This is because the variables are being URL decoded. If I set it to %25Time%25 it will decode properly as %Time% (0x25 = ‘%’). The other problem I ran into with this input is that although the %Time% variable is case sensitive when the time pointers and sizes are actually set in the structure it is looked up with lower case only. So, name=”time” and filename=”%25Time%25” will produce the following crash:

Huzzah! We’ve exercised the stack overflow and crashed on a memcpy(). If we can get the function to return we will have control over EIP. We are actually really close to the function return at this point as well. The ret instruction is at 0x0804CFB0, just a short 14 bytes away.

Let’s see if we can get around this crash. The rep movs instruction will move ECX number of bytes from the pointer in ESI to the pointer in EDI. Here, ECX is set to 0x41414141. Clearly we overwrote the size used in this copy. We could look at the stack frame and do the math with the allocas to figure out exactly which offset the counter is coming from, but it is faster to just put in a string pattern in the time variable.

We can go back to our input file and replace the “FFGG” with NULLS so that no copy is executed. My first attempt was to inject raw NULL bytes into this file. I ran the following python script to get the job done. It’s not pretty, but it worked. I could also have used vi with %!xxd and %!xxd –r or any other hex editor to makes these changes.

While the python script properly modified the file, this technique did not work. ECX, instead of NULL, was set to 0x2d2d2d2d or “—-“. This value is coming from our boundary on the multipart data. I assumed that because we used NULL bytes that they must be causing early termination of string parsing routines. What if we URL encode the NULL bytes?

Setting the time variable to “AAAABBBBCCCCDDDDEEEEFF%00%00%00%00GGHHHHIIII” and debugging once again yields:

Well that was a step in the wrong direction! There is no crash now. We are seeing the ERROR: 906 coming back, which is what happens when the photo file being uploaded fails to open. The cookie coming back to us in the HTTP header is the name of this file. The base64 decoding of “AA==“ is 0x00, so it is understandable that that file did not open. I think we are running into similar issues with the string parsing again. This is as far as I got during the actual CTF.

It was not until afterwards that it was pointed out to me that we can double URL encode the NULL values. If URL encoding once makes 0x00 = %00 then URL encoding twice will be 0x00 = %00 = %25%30%30. With my formdata file now looking like this:

Awesome, we got past the rep movs with a NULL ECX and we are now 9 bytes away. The crash is now on the instruction 0x804cfa7: mov DWORD PTR [ebx],eax where EBX is 0x4f4f4e4e. We are writing EAX to this pointer. We can set this to be anywhere in memory that is writeable to avoid this crash. At the offset for “NNOO” let’s put in 0x0804F0EC, which is just past the end of the .BSS section. That address is mapped into our memory space and will be NULL and unused throughout the program. We will need to little endian encode and URL encode this pointer resulting in: %EC%F0%04%08.

Excellent! EIP: 0x4e4e4d4d. I can now control the next instruction that this program executes. Our goal is to send EIP back to a buffer that we control. Let’s find everywhere in memory that our input string exists:

I have three choices for direct execution: heap, mapped, or stack. All of the sections are executable. If I run the binary again and do the same search we can determine if any of these sections are affected by ASLR. They all looked stable between runs to me. Remember this for later.

My preference is to use the mapped section because it looks like it has a complete copy of the data exactly as I sent it in. Other options here are to look for more input vectors, specifically cookies and other variables. Let’s use python again to set the “AAAA” in our input to \xcc\xcc\xcc\xcc so that we might trigger an int 3 debugging break point.

Next, let’s overwrite the “MMNN” offset that was in EIP with the little endian URL encoded address of %08%a1%fd%b7 (0xb7fda108) that should point directly to the start of our data (int 3) in the mapped section. If all goes well, we should expect to see a SIGTRAP.

Great! We have arbitrary code execution now. Unfortunately, the start of our string only affords us 22 bytes of execution before we run into the NULL encoded ECX register from earlier. We now have two options. The first is to make the filename larger than %25Time%25 so that more stack is allocated and our offsets are further into the file. The second option I see is to encode a short relative jump instruction in place of the int 3. Because we are doing this from a flat file and not an exploit script it would be very easy to lose track of shifting offsets, so I opted for the second option.

Currently, the start of the “OOPP” that ends our input string is 105 bytes away. I can encode a jump as %eb%67 to jump +105 bytes forward and land right on my data. After a bit of trial and error building the input file I was able to line everything up just right and gain code execution when running in gdb. I simply replaced the “OOPPPP” with my shellcode to open /home/nonameyet/flag, read it to the stack, and write it to stdout. Note that this shellcode would not trigger the .bash_alias backdoor from earlier!

However, when I run it outside of the debugger I get a segmentation fault. This is a common annoyance when writing exploits. Things can change when they are being debugged. I ran a strace command to see if any of the shellcode was making system calls:

Nope, I never got execution. With si_addr=0xb7fda108 I am at least still jumping to the correct spot. What I notice is that the mmap call is returning 0xb76ff000. This is not what I was seeing as a consistent address for my data on the 0xb7fda000 page. So, that address does not exist and we need to go back to pick one of the other two points where we have code. Let’s pick the heap this time with an address of 0x080501b8 as our new EIP.

After modifying the formdata file again and setting a break point on the return from the interesting function, it looks like the heap address has moved as well. It is now at [heap] : 0x8050318 (“AAAABBBBCCCCDDDDEEEEFF”). I suggest that this changed because our input lengths have changed since I last looked. I’ve added shellcode now after all.

The base address for the heap is still in the same spot: 0x08050000. It is just the offset within that page that has shifted. Let’s put in the new address for EIP and try our luck with yet another run. The other thing that is different about this heap location is that all of our data has already been URL decoded. Thus, we will need to URL encode all of our binary values. This includes the shellcode.

This time it hits a SIGTRAP again and we can redo our relative short jump calculation jump to land on arbitrary shellcode. We are executing at 0x08050318 and we need to jump to 0x08050352, or 58 bytes away, which means we should us an opcode of “\xeb\x38”. Setting this at the start jumps perfectly to our shellcode, which now executes just fine in the debugger. Again.

But, once again, running without the debugger attached produces a crash! It appears that the heap moves as well. This makes logical sense. If the mmap call is moving and the heap is allocated in a similar way, then they both should move with ASLR. We could try the stack location by building in a large NOP (\x90) sled before our shellcode and go about guessing stack addresses despite ASLR, brute forcing the return address used for EIP. I’ve shamefully used this technique in past CTF events with success.

The whole problem here, and the reason I’ve failed to exploit twice, is that GDB has disabled ASLR. Remember when I checked it earlier? I could have saved myself a lot of time if I had realized this back then. While having your debugger turn off ASLR makes debugging easier, it leads to false hope. Let this be a lesson to always run the set disable-randomization off command in GDB when starting exploit development on a binary. I believe this default ASLR disabled state is actually coming from the PEDA GDB init script I am using. I have another idea that should work with ASLR.

Remember that data structure passed into the interesting function? Well, there is no reason why we only have to fill out the “filename” and “time” variables. If we set the “date” variable there will be a pointer to the date value on the stack. We can put our shellcode in there and use a technique called a return sled to get down the stack.

Here is a short debugging session showing the stack at the beginning of the interesting function:

If we replace our original return address with 0x08048945 (the address of a ret instruction) and then immediately following this address place the same address again, the program will return twice and the stack will be incremented by 8. We can do this all the way down the stack until we reach our pointer to the date variable. A little math tells us (0xbffff674 - 0xbffff63c) / 4 that we need to put the pointer to the ret instruction on the stack 14 times to reach the pointer to our shellcode.

One problem. When I go to edit the time variable I see that we have &l;teip> then <bss> address in the time variable. This was required to survive the write from earlier. I will not be able to write to the text section of the binary so I cannot use this address for the ret sled. Because the number of addresses is even, I can point the return sled to a pop ret gadget and have a pop/ret sled instead. There is a pop just one byte before the previous ret address at 0x08048944. I will still need to put this address in 14 times but every other instance will not be executed.

My first attempt at this failed as well! When I looked at the stack, the pointer for the “date” variable was not where it should be. The length was correct but we were returning into NULLs. Looking a little closer, I noticed that the pointer for this variable ended with 0x00. Of course, the time variable was null terminating on the stack. My length was off by one. Since I am already doing a pop/ret sled the pointer immediately before the date pointer is not executed. It could really be anything. I made the time variable one byte shorter and FINALLY gained code execution outside of a debugger. Here is the completed formdata file and execution printing out /home/nonameyet/flag:

The only thing left to do is to send this over a socket to the web server so that I can pull back the flag on the remote system. Too bad the service was offline by the time I completed the challenge.

There are other ways to go about landing this stack overflow. Another public write up is here (in Chinese). It looks like this team made a ROP chain to getenv() that would read in cookie data. The same stack overflow bug was used.

Big thanks to HJ and Legit BS for a fun CTF problem. I spent way too much time playing with it. If you enjoyed this walk through or have questions or comments, you are welcome to email me: svittitoe at endgame.com.