2009-10-21

Assembly bomb lab hints

Rather than working online with gdb, I suggest you use offline tools like objdump to study the bomb offline, unless you "like to live dangerously" (there are way too many Austin Powers jokes in this lab).

To produce disassembly of the bomb, including hidden functions phase_1, phase_2, ..., that are not included in bomb.c,

objdump -d ./bomb

You could use shell redirection to save the output to a file:

objdump -d ./bomb > bomb-asm.txt

Note the output is not quite a valid .s file that can be assembled again.

You can dump the section content like this,

objdump -s ./bomb

and you will see output in six columns. The first column is the address, next four the content of this strip of 16-bytes of memory, and the last column the ascii equivalent, which allows you to see the strings visually. Note: each group of 4 bytes from columns 2 through 5 should be read as an array of bytes as they appear in memory, not as an integer with little or big endian byte order.

is to be loaded at memory address 0x08049790, and is the string "Good work! On to the next..." And this 16-byte strip:

80498e0 598e0408 628e0408 698e0408 728e0408 Y...b...i...r...

is at address 0x080498e0, with four little endian integers 0x08048e59, 0x08048e62, 0x08048e69, and 0x08048e72. This appears to be a jump table.

If you're an enterprising hacker, you might have thought about patching the explode_bomb() function in gdb to just return without doing anything. The "ret" instruction has the opcode 0xc3, so you can do something like this:

The line "set {char}explode_bomb = 0xc3" patches the function to just return without exploding. Using this technique, you can fool the program into thinking that the bomb has been defused! But the catch is, the bomb sends us e-mail about the input string you entered, and the solution is verified independently by the auto-grader. Patching the code, while clever, will unfortunately earn you an "invalid" status.

Function call convention

C pushes arguments on stack in reverse order, so 8(%ebp) is the first argument, 12(%ebp) is the second, etc. This allows the callee to potentially have an indefinite number of arguments, like printf(). See figure on the right.

The stdarg.h header provides a way to iterate over an indefinite number of arguments. See print_ints.c attached. The function va_start() initializes the argument pointer ap variable (its type is va_list) to the ... part, right after the given argument; va_arg() is like *ap++, and va_end() to do clean-up. See man stdarg for more information.

On Linux 32-bit x86, when compiled with -O3, the compiler uses %esi for ap, and "leal 12(%ebp), %esi" to initialize ap. The increment by va_arg() is done by addl $4, %esi. See attached print_ints.s.

Notice how in print_ints(), the compiler passes argument to printf() by reusing existing reserved stack space.

Ackermann's function and recursion

Recursive functions we've seen so far are primitive recursion, and can be written using while loops, e.g. factorial, fibonacci. Recursion, however, is much more powerful than that.

Introducing Ackermann's function, ack(m, n), which can grow faster than exponential rate in terms of n, for m >= 4. See ack.c attached below. For this function, we cannot cheat by implementing a while-loop version!

I'm using gcc 4.4.1 (csa2 has gcc 4.1.2 by default) which demonstrate tail call optimization. Comparing the assembly generated by gcc 4.4.1 -O0 (see ack-O0.s below) and -O3 (see ack-O3.s below), notice that for -O3, there is only one call ack in the body of ack(). That's due to tail call optimization. Basically, the statement "return ack(x, y);" is transformed to setting the arguments to x and y, followed by a goto. This saves the effort to create and destroy a stack frame that is not needed.