How Shellcodes Work

It's not an easy task to find a vulnerable service and find an exploit for it. It's also not easy to defend against users who might want to exploit your system, if you are a system administrator. However, writing an exploit by yourself, to convert a news line from bug tracker into a working lockpick, is much more difficult. This article is not a guide on writing exploits, nor an overview of popular vulnerabilities. This is a step-by-step guide on developing a shellcode, a crucial point of any exploit software. Hopefully, learning how they work will help conscientious and respectable developers and system administrators to understand how malefactors think and to defend their systems against them.

How an Exploit Works

Take any exploit downloaded from the internet that promises you an easy root shell on a remote machine, and examine its source code. Find the most unintelligible piece of the code; it will be there, for sure. Most probably, you will find a several lines of strange and unrelated symbols; something like this:

This is shellcode, also sometimes referred to as "bytecode." Its content is not a magic word or random symbols. This is a set of low-level machine commands, the same as are in an executable file. This example shellcode opens port 4444 on a local Linux box and ties a Bourne shell to it with root privileges. With a shellcode, you can also reboot a system, send a file to an email, etc. The main task for an exploit program is therefore to make this shellcode work.

Take, for example, a widely known error-buffer overflow. Developers often check data that has been received as input for functions. A simple example: the developer creates a dynamic array, allocates for it 100 bytes, and does not control the real number of elements. All elements that are out of the bounds of this array will be put into a stack, and a so-called buffer overflow will occur. An exploit's task is to overflow a buffer and, after that, change the return address of system execution to the address of the shellcode. If a shellcode can get control, it will be executed. It's pretty simple.

As I already said, this article is not a guide for writing exploits. There are many repositories with existing shellcodes (shellcode.org, Metasploit); however, it is not always enough. A shellcode is a low-level sequence of machine commands closely tied to a dedicated processor architecture and operating system. This is why understanding how it works can help prevent intrusions into your environment.

What Is It For?

To follow along, I expect you to have at least minimal assembly knowledge. As a platform for experiments, I chose Linux with a 32-bit x86 processor. Most exploits are intended for Unix services; therefore, they are of most interest. You need several additional tools: Netwide Assembler (nasm), ndisasm, and hexdump. Most Linux distributions include these by default.

The Process of Building

Shellcode stubs are usually written in assembler; however, it is easier to explain how one works by building it in C and then rewriting the same code in assembly. This is C code for appending a user into /etc/passwd:

The setreuid(0,0) call attempts to obtain root privileges (if it is possible). execve(const char filename,const char[] argv, const char[] envp) is a main system call that executes any binary file or script. It has three parameters: filename is a full path to an executable file, argv[] is an array of arguments, and envp[] is an array of strings in the format key=value. Both arrays must end with a NULL element.

Now consider how to rewrite the C code given in the first example in assembly. x86 assembly executes system calls with help of a special system interrupt that reads the number of the function from the EAX register and then executes the corresponding function. The function codes are in the file /usr/include/asm/unistd.h. For example, a line in this file, #define __NR_ open 5, means that the function open() has the identification number 5. In a similar way, you can find all other function codes: exit() is 1, close() is 6, setreuid() is 70, and execve() is 11. This knowledge is enough to write a simple working application. The /etc/passwd amendment application code in assembly is:

It's a well-known fact that an assembly program consists of three segments: the data segment, which contains variables; the code segment containing code instructions; and a stack segment, which provides a special memory area for storing data. This example uses only data and code segments. The operators section .data and section .text mark their beginnings. A data segment contains the declaration of two char variables: name and line, consisting of a set of bytes (see the db mark in the definition).

The code segment starts from a declaration of an entry point, global _start. This tells the system that the application code starts at the _start label.

The next steps are easy; to call open(), set the EAX register to the appropriate function code: 5. After that, pass parameters for the function. The most simple way of passing parameters is to use the registers EBX, ECX, and EDX. EBX gets the first function parameter, the address of the beginning of the filename string variable, which contains a full path to a file and a finishing zero char (most system functions operating with strings demand a trailing null). The ECX register gets the second parameter, giving information about file open mode (a constant O_WRONLY|O_APPEND in a numeric format). With all of the parameters set, the code calls interrupt 0x80. It will read the function code from EAX and calls an appropriate function. After completing the call, the application will continue, calling write(), close(), and exit() in exactly the same way.

Running a Root Bourne Shell from Shellcode

That was fun. Now it's time to translate the second program into assembly; one that executes setreuid() and execve() to run a root shell:

Most of this code is similar to the previous example except for the execve() function call. The same program segments are there, and the same execution method works for setreuid(). The second parameter of execve() is an array of two elements. It is reasonable to pass this through the stack, which first needs a zero value (push 0), and then an address for the variable name (push name). This is a stack, so remember to push parameters in reverse order--LIFO, or "last in, first out." When the system call pulls its parameters out, the first will be the name variable address, and then a zero value. A function must also know where to find its parameters. For that, this code uses the enhanced stack pointer (ESP) register, which always points to the top of the stack. The only other work is to copy the contents of the ESP register to ECX, which will be used as a second parameter when calling the 0x80 interrupt.

Eliminating Data Segments

That assembly code works completely. However, it is useless. You can compile it with nasm, execute it, and view the binary file in hex form with hexdump, which is itself a shellcode. The problem is that both programs use their own data segments, which means that they cannot execute inside another application. This means in chain that an exploit will not be able to inject the required code into the stack and execute it.

The next step is to get rid of the data segment. There exists a special technique of moving a data segment into a code segment by using the jmp and call assembly instructions. Both instructions make a jump to a specified place in the code, but the call operation also puts a return address onto the stack. This is necessary for returning to the same place after the called function successfully executes to continue the program's execution. Consider the code:

jmp two
one:
pop ebx
[application code]
two:
call one
db 'string'

At the beginning, the program execution jumps to a two label, attached to a call to the procedure one. There is no such procedure, in fact; however, there is another label with this name, which obtains control. At the moment of this call, the stack receives a return address: the address of the next instruction after call. In this code, the address is that of a byte string: db 'string'. This means that when the instructions located after one label execute, the stack already contains the address of a string. The only thing left to do is to retrieve this string and use it appropriately. Here's that trick in a modified version of the second example, named shell.asm:

As you can see, there are no more segments at all now. The string /bin/sh, which was previously in a data segment, now comes off of the stack and goes into the EBX register. (The code also has a new directive, BITS 32, which enables 32-bit processor optimization.)

Works Now

Compile the program with nasm:

$ nasm shell.asm

And dump its code with hexdump:

$ hexdump -C shell

Figure 1 shows a typical shellcode. The next step is to convert it into a better format by preceding each byte with \x, and then putting all of the code into a byte array. Now check that it works:

Not Yet Working: Eliminating NULL Bytes

Now the shellcode does not use the data segment and even works inside of a C tester program, but it still will not work inside a real exploit. The reason are the numerous NULL bytes (\x00). Most buffer overflow errors are related to C stdlib string functions: strcpy(), sprintf(), strcat(), and so on. All of these functions use the NULL symbol to indicate the end of a string. Therefore, a function will not read shellcode after the first occurring NULL byte.

Thus, the next task is to get rid of all null bytes in the shellcode. The idea is simple: find pieces of code that cause null bytes to appear and change them. A mature developer, in most cases, can say why machine code contains zeroes, but it's easy to use a disassembler to identify such instructions:

Executing this command will give the disassembled code of a program. It will contain three columns. The first column contains the instruction's address in hexadecimal form. It is not very important. The second column contains machine instructions, the same as shown with hexdump. The third column contains an assembly equivalent. This column will give you an idea which instructions contain null bytes in a shellcode.

After a brief review of a dump contents, it becomes evident that most null bytes come from instructions that manage the contents of registers and the stack. This is no surprise; this code works in a 32-bit mode, so the computer allocates a four-byte memory space for each numeric value. Yet the code uses only values for which one byte is enough. For example, the beginning of the program has the instruction mov eax, 70 to put the value 70 into the EAX register. In the shellcode, this instruction looks like B8 46 00 00 00. B8 is the machine code of the instruction mov ax, and 46 00 00 00 is the value 70 in hexadecimal notation, padded with zeroes to the size of four bytes. Many null bytes appear for similar reasons.

The solution for this problem is very simple. It's enough to remember that 32-bit registers (EAX, EBX, and other registers whose names begin with "E," for "enhanced") can be represented by 8-bit and 16-bit registers. It's enough to use a 16-bit register AX instead and even its low and high parts, AL and AH, which are one-byte registers. Just replace the instruction mov eax, 70 with mov al, 70 in all such places.

It's important to be sure that the rest of the EAX register space does not contain any garbage; that is, the code must put a zero value into EAX without using any null bytes. The fastest and most effective way of doing this is with the XOR logical function: xor eax,eax will give the EAX register a zero value.

Even after these modifications, the shellcode still contains zero bytes. The debugger shows that now the jmp instruction causes trouble:

E91500 jmp 0x29 0000 add [bx+si],al

The trick is to use a short jump instruction instead of the usual jmp short. In short programs with simple structure these instructions work in absolutely the same way, and the machine code in this case will not contain zero bytes.

You may now think that this shellcode is ideal, but at the end there is still one remaining zero byte. This zero byte occurs because the string bin/sh has a null byte indicating the end of the string. This is a definite requirement, because otherwise execve() will not work properly. You cannot just remove this byte. You can use one more assembler trick: at the compiling and binding stage, store any other symbol instead of zero, and convert it into zero while processing the program:

After compiling this code, you can now see that it no longer contains null bytes. It's worth mentioning that the problem may arise not only because of null bytes, but because of other special symbols; for example, the end-of-line symbols, in some cases.

How It Works in Exploit

A buffer overflow exploit tries to write beyond a buffer on the stack so that when the function returns, it will jump to some code that most often starts a shell instead of returning to the function that called the current function. To understand how it works, you have to know how the stack works and how functions are called in C. The stack starts somewhere in the top of memory and the stack pointer moves down as the program pushes things onto the stack and back up as the code pops them off again. Given the C function:

void sum(int a,int b) {
int c = a + b;
}

The stack inside of sum() will look like this:

b
a
<return address>
<ebp contents>
c

The computer saves the contents of the EBP register to a stack before calling the sum() function because it will be used inside of the function, so it can be restored from the stack after returning from the function. The goal of an exploit is to change the return address. This is not possible in this case, because no matter what a and b are, the result cannot overflow c into the EBP contents on the stack and the return address. If c were a string instead, it might be possible to write past it. Here is an overflow-exploitable program:

The function copy_string makes a copy of the first command-line parameter of the program into a buffer of a fixed size and then prints it out. This might look stupid, but something like this is quite common for programs that need to perform actions based on external input, either from the command line or a socket connection.

The Perl script above generates a string of 2000 a symbols. Now run the core file through gdb:

% gdb ./overflow core
GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux".
Core was generated by `aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x61616161 in ?? ()

The segmentation fault happened at the address 0x61616161--which is the string aaaa in hexidecimal. This means that the exploit can get the program to jump to an arbitrary address depending on what it receives as a parameter. It would be nice to make it jump to the beginning of the local buffer on the stack--but what is the address of the stack right now? gdb knows:

(gdb) info register esp
esp 0xbffff334 0xbffff334

Now, the only other thing necessary to get the code to execute is the previously written shellcode. You can take the ready shell app and run an overflow victim program from it:

The shellcode[] symbol array contains the shellcode without any null bytes. It may differ slightly, depending on OS conditions. The main() function starts with a buffer that is the size of the local variable (1024 bytes) plus eight bytes for EBP and the return address. As the buffer is longer than the shellcode, the beginning needs a bunch of do-nothing machine code (NOP) operations. Then the function copies in the shellcode, and finally, the address of the beginning of the buffer. Now compile and run it:

% gcc -o exploit exploit.c
% ./exploit
string is <lots of garbage>

Yahoo! A new Bourne shell opened! This is, of course, not much fun as the overflow program runs as yourself, but if it were a SUID root program, then you would now have a root shell. Try that:

That's it! You became a root user on this machine without permission. If the victim machine is a remote one, this will not help. More advanced shellcode creates a listening socket and redirects stdin and stdout to it before calling execve /bin/sh--that way, you don't need a shell account on the machine and can simply direct telnet or nc at the machine and port to get a root shell.

Conclusion

In this article, I have reviewed the most important tricks that will be needed in writing shellcodes and using them in exploit. The key to success is a good understanding of the operating system under which the shellcode will run, as well as assembly programming. There is nothing complicated, though. It's also worth mentioning that you should only use these mentioned techniques for legal purposes and with the knowledge and consent of the machine's owner.