Format String Bugs

It's time for a break from buffer overflows. In this tutorial, we'll be discussing a different sort of vulnerability called the format string bug.
A format string is an array of characters (buffer) used internally by a program to format information. For example, if you were writing a spreadsheet-style
program, it would be nice to format the output in clean rows and columns. Format strings and their associated functions in stdio.h allow programmers
to do this sort of thing in comfort. However, if the coder makes a mistake and allows that format string to fall under user (and thus attacker) control, very
bad things can happen.

Function Arguments

As usual we'll start with a little theory and put it to use later. You should have a pretty good grasp on how stack frames and function calls work
by now. If not, then have another look at the other tutorials and stare at those pretty diagrams a bit more. There is an element of those stack
frames that I haven't talked about much: arguments. Arguments are addresses and values that are fed to a function when it is called. The easiest way
to think about these is that they are pushed onto the top of the stack in right-to-left order (when looking at the call in C). This happens right
before the function is called. Once the function is given control of the program's execution flow, it pops these values/addresses off the
stack and uses them for whatever purpose was intended.

The truth about how this works is just slightly more complicated. Let's say funcA calls funcB. When the code compiles, the compiler examines
all of the function calls made in funcA and notes the call with the most arguments. This is used to determine where the top of the stack should
be once funcA has set up its stack frame. Doing this reserves room for the arguments of any call funcA makes, so the code never has to push or pop
them and move ESP (the pointer to the top of the stack). Instead of being pushed, the values/addresses are simply moved to locations
on the stack relative to ESP. My guess is that doing it this way is less computationally expensive than actually pushing/popping everything
to facilitate function calls. Also, be aware that there will likely be some padding between this argument space and the function's local variables.
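
To make the idea of a function reading arguments from wherever the caller left them a bit more concrete, here is a small sketch in C. The helper name sum_ints is made up for illustration and is not part of today's target. Notice that the function simply trusts the count it is given; it has no way to check how many values the caller really supplied.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` ints passed as extra arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        /* Each va_arg step reads the next argument slot. Nothing here
           can verify that the caller really supplied `count` values. */
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

Calling sum_ints(3, 1, 2, 3) gives 6. Calling it with a count larger than the number of values actually passed is undefined behavior, which is the same kind of blind trust that printf places in its format string.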

The Target

Today's vulnerable program is pretty similar to the one in the first tutorial. First, it generates a random number between 0 and 99 inclusive and
stores it in magicNumber. Next, it asks the user for their name and says hello. Finally, it asks for an "access code", converts what the user types into
an integer, and stores it in the variable userCode. At the end, it compares the user-supplied access code with the random number to see if
they match. An attacker has a 1 in 100 chance of guessing the random number and winning; let's try to do better.

You also may have noticed that this program is a little beefed up in terms of security. The order in which the local variables are declared means
that we can't use that nice buffer to overflow and change magicNumber. Also, the keyboard input function is different: fgets reads the
user's input but never stores more than the buffer size it is given. That means we won't be able to overflow anything here. We'll have to use a different trick to
break this app.
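
For reference, here is a rough sketch of what a program matching that description could look like. This is a reconstruction from the description and not the tutorial's actual listing: the function name run_target, the FILE parameters (used in place of stdin and stdout so the logic can be driven from a test), the prompt text, and the buffer size are all assumptions. Only the names magicNumber, userCode, and localStr and the 0xBBBBBBBB marker come from the text. Where each local ends up on the stack is decided by the compiler.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reconstruction of the target. Returns 1 on a win.
   Seeding rand() is left to the caller in this sketch. */
int run_target(FILE *in, FILE *out)
{
    int userCode = (int)0xBBBBBBBB;   /* marker, easy to spot in a dump */
    int magicNumber = rand() % 100;   /* 0 to 99 inclusive */
    char localStr[100];               /* size is a guess */

    fprintf(out, "What is your name? ");
    if (fgets(localStr, sizeof localStr, in) == NULL)
        return 0;                     /* fgets never writes past the buffer */

    fprintf(out, "Hello ");
    fprintf(out, localStr);           /* BUG: user input is the format string */

    fprintf(out, "Access code? ");
    if (fgets(localStr, sizeof localStr, in) == NULL)
        return 0;
    userCode = atoi(localStr);

    if (userCode == magicNumber) {
        fprintf(out, "You win!\n");
        return 1;
    }
    fprintf(out, "Sorry, wrong code.\n");
    return 0;
}
```

Most compilers will warn about the marked line, since the format string is not a literal. That warning is pointing at exactly the bug this tutorial is about.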

Format Strings

Before we go hacking this code up, we have to go over what format strings are. From now on, I'll be discussing them as they relate to the
printf function. Be aware that other functions use format strings too, but printf is a very common context for them. printf takes at least one
argument: a pointer to the format string. It then uses this string to determine if it needs more arguments. The printf function walks through the
string one character at a time and outputs what it finds to the screen. For example, if the string is "Hello", then it will write that to the screen.
There are some special characters that make the format string what it is. Whenever printf encounters a "%" (percent) symbol, it will treat the following character
non-literally. There are several of these formatting characters and each one performs a different function.
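
The walk described above can be sketched in a few lines of C. The function count_conversions below is a made-up helper, not anything from stdio.h, and it is deliberately simplified: it only counts how many extra arguments a format string would ask for, treats "%%" as a literal percent sign, and ignores details such as "*" widths (which also consume an argument).

```c
#include <stddef.h>

/* Hypothetical helper: counts the conversions in a printf-style
   format string, i.e. how many extra arguments it would request. */
size_t count_conversions(const char *fmt)
{
    size_t count = 0;

    for (size_t i = 0; fmt[i] != '\0'; i++) {
        if (fmt[i] != '%')
            continue;            /* ordinary character: printed literally */
        if (fmt[i + 1] == '\0')
            break;               /* stray '%' at the very end */
        if (fmt[i + 1] == '%') {
            i++;                 /* "%%" prints one literal '%' */
            continue;
        }
        count++;                 /* a conversion that wants an argument */
    }
    return count;
}
```

The point to take away is that the number of arguments printf consumes is decided entirely by the contents of the string, not by how many arguments the caller actually passed.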

For example, "%s" tells printf to pop another argument off the stack as a pointer and print the string found at that address. If it encounters "%i" in the
format string, printf will pop an integer off the stack and print it to the screen. Other characters like "%d" and "%x" pop values off the stack and
display them as decimal or hexadecimal numbers, while "%n" pops an address and writes the number of characters printed so far to it. This function is very handy for debugging and showing the values
stored in variables as well. So if we declare the local variable test as an integer, fill it with the value 42, and call printf("Hello %i", test); the
program will proudly display Hello 42.
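
Here is that last example as a tiny C function. It uses snprintf instead of printf so the result lands in a buffer we can inspect; snprintf follows the same format string rules. The name format_hello is just for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes "Hello <test>" into out and returns
   the number of characters snprintf wanted to write. */
int format_hello(char *out, size_t size, int test)
{
    /* One "%i" in the format string, one matching int argument. */
    return snprintf(out, size, "Hello %i", test);
}
```

With test set to 42, the buffer ends up holding Hello 42.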

An Easy Mistake

Remember that as long as there are no formatting characters, printf will just literally print the characters in the format string. This means that if
the programmer filled a buffer called testStr with Hello World and then called printf(testStr); it would print Hello World
as expected. But just because it works doesn't mean it's secure. What if the user were asked for keyboard input that was used to fill testStr?
Because testStr is being used by the printf function as a format string, the user could insert his own formatting characters with malicious intent.

What if the user typed "Hello %i" when asked for input used to fill the testStr buffer? There are no more arguments in the printf function call, but printf has no way of knowing that.
The printf function will just pop an integer value off the stack and display it! This process can be repeated by adding more "%i" specifiers to the format
string. This means that an attacker can keep reading values on the stack until he runs out of buffer space for his malicious format string. Remember that the
stack frame for a function that calls printf lies directly beyond printf's arguments (and associated padding). This makes a function that calls printf with
a format string under attacker control particularly vulnerable to having its local variables read by that attacker.
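
For contrast, here is a minimal sketch of the same call written so that the user's text is only ever treated as data. The helper name echo_literally is an assumption for illustration, and snprintf stands in for printf so the output can be checked. The format string is a fixed "%s", and testStr is passed as the argument it refers to.

```c
#include <stdio.h>

/* Hypothetical helper: copies user text into out without interpreting it.
   The mistaken version would be: snprintf(out, size, testStr); */
int echo_literally(char *out, size_t size, const char *testStr)
{
    /* The format string is a constant; testStr is just the data for "%s". */
    return snprintf(out, size, "%s", testStr);
}
```

If the user types "Hello %i" here, the output is the literal text Hello %i and nothing is read from the stack.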

Guessing the Access Code

Run the program and, when it asks for your name, enter AAAA followed by a long series of "%08x" specifiers. This will cause the program to say Hello followed by AAAA, of course. Next, the printf function will encounter all those "%08x" formatting
characters. There aren't any arguments left in the printf argument list, so it has no choice but to keep popping values off the stack anyway.
Each time, it pops off 4 bytes and displays them as an 8-digit hexadecimal number. By looking at the program source, you can probably guess that magicNumber
lies right between userCode (value: 0xBBBBBBBB) and localStr (the buffer holding our input) on the stack. So check out the output you generated with
that format string and look for those values. Toward the end of the output, you'll see something like bbbbbbbb 0000000c 41414141. You might know
that 0x41414141 is the hexadecimal representation of "AAAA". This combined with the presence of those 0xBB bytes makes it likely that our random number
this time around is 0x0C. So convert it to decimal (12) and try it as the access code! It is correct, and the program tells us that we win. There are
even more powerful ways to use format strings, but being able to read values from the stack without overflowing anything is a pretty good addition to our
toolbox for now.
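
If you'd rather not pick through the dump by eye, the search can be automated. The sketch below assumes the leaked words are separated by whitespace and that the layout matches this run (the 0xBBBBBBBB marker, then the number, then 0x41414141); the function name find_magic is made up. It returns the word sitting between the two markers, already converted from hex, or -1 if the pattern never shows up.

```c
#include <stdlib.h>

/* Hypothetical helper: scans a dump of hex words for the value that
   sits between 0xBBBBBBBB and 0x41414141. Returns -1 if not found. */
long find_magic(const char *dump)
{
    unsigned long prev2 = 0, prev1 = 0;
    const char *p = dump;

    while (*p != '\0') {
        char *end;
        unsigned long word = strtoul(p, &end, 16);

        if (end == p) {          /* nothing hex here: skip one character */
            p++;
            continue;
        }
        if (prev2 == 0xBBBBBBBBUL && word == 0x41414141UL)
            return (long)prev1;  /* the word between the two markers */
        prev2 = prev1;
        prev1 = word;
        p = end;
    }
    return -1;
}
```

Feeding it the tail of the output from this run, bbbbbbbb 0000000c 41414141, returns 12, which is the access code we typed in.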