Sunday, August 26, 2012

Gera's Warming Up on Stack #1 - Solutions

This is part 1 of a series of posts that aim to provide an analysis of, and possible solutions for, the vulnerable programs provided by Gera on his Insecure Programming by example page.

Familiarity with exploit mitigation techniques is needed to properly understand the concepts we talk about here. If terms like ASLR, NX, SSP, RELRO, etc. seem unfamiliar, I would suggest reading an earlier post that provides details on these.

We will start with a scenario in which most of the exploit mitigation techniques are turned on through the default GCC compilation command-line. Since these techniques prevent successful exploit attempts, we will incrementally turn them off until successful exploitation is achieved. This setup will allow us to witness how the individual techniques succeed in restricting exploit attempts and how their absence affects exploitation reliability.

The above program accepts user input through the gets function and then looks for a specific value in a local variable named cookie. If this value is equal to a certain pre-defined constant, a printf call is used to show a "you win!\n" message to the user. There is no direct means of modifying the content of the cookie variable. The gets function will keep reading from stdin until it encounters a newline or end-of-file (EOF). Since this reading loop does not honor the size of the destination buffer, a classic buffer overflow vulnerability is introduced in the program. Our aim is to leverage this vulnerability and exploit this program so that it prints the "you win!\n" message to stdout.

Here are a few observations that could be made by looking at the source of the program:

Since it is defined prior to buf, the cookie would be placed at a higher memory address on the program stack, just below the saved registers from the function prologue

The buf character array would be at an offset of at least 80B from cookie

The gets call accepts unbounded user input into the buf array and hence provides a mechanism to alter the call stack contents

To attempt exploitation, proper understanding of a program's memory layout and the positioning of its metadata is very important. We first need to understand the call stack for the stack1 program.

Whenever a function is called, metadata is pushed onto the stack according to the calling convention in use. Upon function termination this information is popped off the stack. The order in which values are pushed and popped is of importance here. On Linux/GCC environments, which use the cdecl calling convention, the caller first pushes any function arguments from right to left onto its stack frame. Then the return address (the saved EIP) is pushed and finally control is transferred to the callee's code in the .text segment. The callee, when initiated, executes the function prologue to set up its own stack frame. As a part of the prologue, the caller's EBP value is pushed onto the stack. Since this is the first operation on the stack after the return address push, the saved EIP and the saved EBP end up at adjacent locations. These two values mark the boundary between the caller's and the callee's stack frames. The location of the saved EIP marks the top of the caller's stack frame and the location of the saved EBP marks the base of the callee's stack frame.

Refer to the stack layouts below for a better understanding. The first layout outlines the call stack for the caller main():

The second layout outlines the call stack for the callee add():

While control is in the callee function, the passed arguments are accessed by using EBP as a base pointer. According to the calling convention, the first parameter is located at EBP+8, the second parameter at EBP+12, and so on. Using this formula we can locate function arguments (EBP+8 in the above layout is 32+8 = 40, which stores the first argument 3, and similarly EBP+12 is 32+12 = 44, which stores the second argument 6). Since the call stack layout described above is used for all programs, we can generalize this formula and use it to find the location of the saved EBP itself and then the location of the saved EIP (EBP+4). The location of the saved EBP is found by adding the size of the first local variable on the stack to its address. Similarly, the saved EIP is located by adding 4 to the location of the saved EBP.

Based on these observations, let's try to visualize the call stack layout for the stack1 program:

NOTE: The stack is assumed to be 4B aligned and we are working on an x86 machine. The addresses in the layouts are for reference only.

While thinking about possible solutions for this program, I came up with the ideas listed below:

Solution #1: Overflow the 4B past buf, where the cookie is stored, with the desired value (0x41424344 in this case)

Solution #2: Overwrite EIP with the address of the printf statement that prints the "you win!\n" message

Solution #3: Inject and execute a shellcode that simulates the second printf statement, through the internal buf character array

Solution #4: Inject and execute a shellcode, that simulates the second printf statement, through an environment variable

All right! Let's start with the test execution of this program. Here is a brief description of the test system:

Below is the GCC command-line to compile the stack1.c source file. The -mpreferred-stack-boundary=2 option is used to align stack entries at DWORD (4B) boundary:

GCC emits a few warnings for the above code, the last of which suggests finding an alternative to the gets function, since it is a "dangerous" function. We are in the process of figuring out just how dangerous gets can be, and hence we can safely ignore this and the earlier warnings for now.

Let's have a peek at the assembly code of the stack1 ELF binary. The command-line below uses the objdump utility to dump the disassembled object code of a program in Intel syntax (remove the -Mintel option from the command-line to view the disassembled code in the default AT&T syntax):

The variable reordering feature of SSP is also in place: for the initial printf call, the first argument to be placed on the stack is &cookie instead of &buf (refer to the cdecl calling convention). This can be concluded from the addresses used to move the arguments onto the stack. &cookie is computed from the location [ebp-0x58] and &buf from [ebp-0x54]. As such, cookie is placed at a distance of 88B (0x58) from EBP and buf is located right above it at a distance of 84B (0x54) from EBP. The additional 4B come from the canary placed just below EBP.

The code that verifies the content of the canary, before control is returned to the caller, has also been added and can be found at address 0x080484f0. If this check fails, the __stack_chk_fail function is called to abort the execution of the program.

NOTE: The SSP feature is enabled by default and hence it was introduced automatically through the vanilla command-line we used to compile stack1.c above. It is, however, advisable to pass explicit command-line arguments when compiling your source files, rather than relying on their default status.

You must have already guessed that the call stack layout we saw earlier is no longer in sync with the compiled binary. We need to recreate it considering the above discussed modifications:

The default GCC command-line might have turned on other mitigation features as well. We need to investigate further before proceeding.

Tobias Klein, the author of A Bug Hunter's Diary, maintains an awesome Bash script called checksec.sh that provides an overview of the security features implemented within
the Linux kernel, ELF binaries and executing processes on a system. Here is a listing of its available options:

Obtain the latest version of this script (1.5 as of this writing). Let's try the --kernel option to see available mitigation features implemented within the kernel itself:

The output above confirms that GCC stack protector support is enabled, and we have already seen it in action earlier. Let's now see what this script has to say about the stack1 ELF binary:

As discussed earlier, the default compilation command-line enabled quite a few mitigation features like Partial RELRO, stack canary, NX and a few others. These features have made significant modifications to the vulnerable program and their presence will prohibit its successful exploitation. From the above output, also note that the printf and gets functions have not been replaced with their safer counterparts. This should have happened through the default command-line. But since the program source did not include the necessary standard header (stdio.h) for these functions, the FORTIFY_SOURCE mitigation feature failed to detect their presence and as such could not replace them. If you recompile the source with the necessary header included, you will encounter the "*** stack smashing detected ***" error message. Still, even in the absence of this feature, the ELF binary is quite difficult to exploit.

We need to print the message to successfully exploit this program. But since cookie has been reordered and placed below buf, we simply have no way to modify it. Additionally, any attempt to overwrite the return address would fail since the canary is placed in between. While overwriting EIP, the canary will also be overwritten and the __stack_chk_fail function will terminate the program before the message is printed:

In the above test run, supplying 81B of input causes the program to crash. Note the addresses of buf and cookie, 0xbf878da4 and 0xbf878da0 respectively. The variable reordering we talked about earlier is in effect here. We are experiencing the influence of exploit mitigation techniques at this stage. For a successful exploit attempt, we will have to disable these features. Let's disable the stack canary mitigation feature first. The screenshot below shows the GCC option -fno-stack-protector, which disables SSP and as such provides a wide playground for our exploit attempts. Additionally, we see how the checksec.sh script correctly identifies the absence of the stack canary and fortify source mitigation features from the program:

The buf is at 0xbfbed9e4 and the cookie at 0xbfbeda34. The variables are now ordered as per our expectation. Let's have a peek at the program assembly to quickly check whether the stack canary is still present:

From here we could proceed to the exploitation phase.

Solution #1:

For this solution we first need to calculate the offset between buf and cookie:

As expected, it came out to be 80B. We craft a perl command-line to overwrite 80B of data to reach past the buf boundary. Once this is done, we're pointing at the cookie, which can then be overwritten with the desired content:

NOTE: The test system is an x86 Intel machine that uses little-endian byte ordering. We take this into account and reorder individual bytes to set the cookie with appropriate value.

Solution #2:

For the second solution, we need to overwrite EIP with the address of the printf statement that prints the required "you win!\n" message. This will ensure that when the program returns from main(), control transfers to stack1's .text segment again, instead of to __libc_start_main(). But first we need to find the address of the printf statement in stack1's assembly code:

The last call instruction is a call to puts. That's right, the stack is prepared for puts and not printf. This is due to a default GCC optimization: since the format string of the second printf call in stack1.c contains no conversion specifiers and ends with a newline, GCC replaces the call with an equivalent call to puts. For our exploit attempts, we can safely ignore the differences between the two functions. Since puts will do the same thing as printf here, we just want its address for proper control transfer. However, we need the address of the instruction just above call puts, because that is where the address of the "you win!\n" message is placed on the stack. From the above output we see that it is 0x08048479.

Now that we have the address to overwrite with, we need the exact offset where we can inject it. For this solution we need to overwrite EIP, whereas in the previous solution we overwrote cookie, i.e. the 4B past buf. The size of buf was the amount of junk data needed to reach cookie. We derived this offset using the variable adjacency property: all local variables are placed adjacent to each other, at decreasing memory addresses in the order in which they are declared in the source program. In the same way we can find the offset of EIP as well.

Referring to the call stack layout we saw earlier, the offset of EIP can be easily calculated: buf 80B + cookie 4B + saved frame pointer 4B = 88B. This is the offset of EIP from the start of the buf array:

We were able to overwrite EIP and redirect control to a desired location. This action helped us to bypass the if condition without actually modifying the contents of the source program.

Solution #3:

We now move on to the third solution for this program. We have found that the program has a buffer in which we can inject junk data and we also have the ability to redirect control to arbitrary locations. These two possibilities, when combined, allow us to execute arbitrary shellcode. We will design a shellcode that simulates the behavior of the puts call and inject it into the program buffer. We will then modify the contents of EIP to point to the buffer where our injected shellcode ends up. If all goes well, this shellcode will be executed and we will have the message printed.

There is, however, one thing we will have to think about before we move ahead. Recall the checksec.sh output above. It tells us that one of the mitigation features, NX, is enabled for the vulnerable stack1 program. This means that when we execute this binary, it will have its stack segment marked as non-executable:

From the above output, the stack is marked as RW for the vulnerable program. As such, even if we can inject shellcode into buf, we cannot execute it. Any attempt to redirect EIP to our shellcode would succeed, but the instant we try to execute the shellcode, an exception would be raised that would eventually terminate the program. So, we'll have to disable this feature for solutions #3 and #4 to work correctly. But I'm not going to disable it for now. As you'll see, our exploit attempts will still work in the presence of NX, and at the end of the post I'll point out the exact reason for this behavior. Till then, read on and try to think about why this might be happening.

First we need to design a shellcode that simulates the puts call. I came up with the following:

The above code uses the standard Linux system calls, write and exit, to print the message and cleanly terminate the program. Using the exit call helps to avoid the segmentation fault we encountered in the previous solution, thus making our exploit much more reliable. Additionally, we use a few shellcode writing tricks to remove NULL bytes from our shellcode, to reduce the shellcode size, and to overcome the addressing problem. Assemble and link the program to create a standalone binary:

Here is the objdump for the resultant printf program:

Extract opcodes to create the required shellcode and calculate its size:

Now we are ready with the shellcode that simulates the puts call. Once we inject it, we will need the address of the buffer where this shellcode lands. From the source and the earlier test executions of the stack1 program, you already know that it prints out the addresses of the buf and cookie variables. But we cannot just use the address from an earlier execution for our exploit. Why is this so? As you may have noticed earlier, both buf and cookie, although adjacent and aligned as expected, had different addresses on each invocation:

You may have already guessed the reason by now. It is due to the ASLR mitigation feature that is active on the test system:

On systems that support brk (heap) randomization, the randomize_va_space file stores a value of 2. On other systems it stores a value of 1 by default to indicate the presence of ASLR. Writing a value of 0 to this file will immediately turn off this feature for all newly spawned processes:

For all the 3 invocations of stack1 program, the locations for buf (0xbffff4c4) and cookie (0xbffff514) remain constant. Since the buf is always placed at a known static address, we could use it for EIP redirection.

Let's proceed to the exploitation phase. Since the shellcode is 38B long and EIP is located at an offset of 88B from the start of buf, we have 50B of junk space. We could use this space to increase the reliability of our exploit by adding a NOP sled in front of our shellcode. This is not strictly required, as we are already aware of the location of our shellcode, but we still have to fill this space with junk bytes to reach the offset of EIP. Let's craft a perl command-line that injects our shellcode and then overwrites EIP with the correct address. However, we were not able to get the shellcode executed:

It did not work. The offset calculation was correct, address for
EIP overwrite also points to our shellcode, and we actually have a
working shellcode that, if executed, should print the winning message.
What could have gone wrong? A GDB analysis could help but this
specific issue could be debugged without using it. Have a look at the
shellcode once again:

The shellcode above is copied into the buf array through the gets function, which treats a newline or EOF as the end of input. Unfortunately, the shellcode we so carefully prepared contains a newline as its last byte. This came in through the "you win!\n" message and it is indeed the culprit here. The earlier exploit command-line breaks at the \x0a byte at offset 87, failing to overwrite further stack locations. The EIP at offset 88 is untouched and we fail to gain successful exploitation.

We could quickly modify the printf.s
program and generate a new shellcode that has the message with no newline
character. However, a quick hack can be to remove the newline from the exploit
command-line and test it:

It did work, although a junk byte was appended to the winning message. We are clear about the exploit technique and that is all that matters. We used the address of buf to jump back to our shellcode, and this is one thing that makes our exploit highly unreliable. There are certain techniques through which you can reliably jump to your shellcode without using memory addresses that could differ between systems. Please refer to the Exploit writing tutorial part 2 : Stack Based Overflows – jumping to shellcode post from corelanc0d3r for more details.

For this solution, we turned off another mitigation feature (ASLR). Even in its presence we were able to gain successful exploitation (using solutions #1 and #2), but that was because we had alternate tricks. However, those were very specific to the vulnerable stack1 program. They won't always work, but you now understand that insight into how things really work can help in designing custom solutions and hacking around limitations that stop you from gaining successful exploitation. This solution also showed how useful addressing information can be for an exploit writer and how successfully the ASLR technique mitigates exploit attempts that use this information.

Solution #4:

Let's now move to the final solution for the stack1 program. First, let's have a quick review of solution #3. We injected a shellcode that simulated the behavior of the printf statement. We redirected control to our shellcode and achieved exploitation. However, a minor modification to our exploit command-line was required, which changed the look and feel of our winning message. The newline character caused the gets copy loop to stop overwriting memory past the terminating character, and as such we had to remove it from our exploit shellcode. Although this issue was easily resolved through a quick and dirty hack, it might pose significant issues in real world exploit attempts. Could there be a better, more elegant solution to this problem?

Okay, no guess work required here. There indeed is one such trick that could help us to overcome the newline issue. The shellcode we injected through the buf
array could be stored within an environment variable and then the EIP
could be overwritten with the address of this variable to get successful
execution. But wait! Where did the idea of environment variable come
from? Why are we using it anyways? How exactly does it help to bypass
the newline filter?

There are a few scenarios in which injecting shellcode through an environment variable is the only viable option. One such scenario is when you encounter a buffer that is too small to fit your desired shellcode. Since an environment variable could be of arbitrary size, we could inject a huge shellcode like the one simulating the Meterpreter payload in Metasploit Framework and get it executed on the target system. In our case, we were lucky enough to have a large buf that could completely hold our printf shellcode. Another scenario is when string termination filters like the newline above are encountered. For solution #3, we hacked around and got the message printed, but that obviously won't work in all cases. In such a scenario, we could inject our shellcode into an environment variable. Since the shellcode is injected independently of the vulnerable program, it bypasses the program's inherent filters. The only challenging part left is redirecting control to the location where this shellcode is placed.

One of the most important reasons to use an environment variable to hold exploit shellcode is its memory placement. These variables are copied into the stack segment of every process, and as such they provide a means of code execution for stack-based exploits.

Let's inject the shellcode we prepared earlier into an environment variable called WINCODE and use its address to overwrite EIP and get code execution. There are a few techniques through which the address of an environment variable can be accurately calculated, and as such we won't need a NOP sled in front of our shellcode. If you have any queries regarding environment variable based exploitation, please refer to section 0x331 Using the Environment from the book Hacking - The Art of Exploitation:

We successfully redirected EIP to a NOP-less shellcode present within an environment variable. And it did work! However the output is not exactly what we had expected. There's no newline at the end. Here is what hexdump has to say about our exploit:

Although the environment variable has a newline at the end, it is not echoed back when the shellcode executes. I made a small change to the original shellcode to include "\x0a\x0d" characters and used it for testing:

This time just the "\x0a" was echoed back and, as expected, it corrects the exploit output. However, I could not work out the reason for this strange behavior. If you have any ideas, please get in touch.

So, we have now successfully exploited the stack1 program through a shellcode injected into an environment variable. Please note that the use of environment variables is only possible for local exploits and as such it is not much used in common exploits that you see in the wild. However, as you have already seen, it is one of the most reliable methods of exploitation.

All these solutions are however not practical. They serve the purpose of understanding how exploits used to work before mitigation features were introduced.