Some days ago we were chatting on the #dclabs channel about this blog, and people asked for some posts about shellcoding…
Well, I’m not an expert on this subject, but I’ll try to write down all I know, and sometimes we’ll be learning together, writing and reading these posts ;).
Nowadays it is pretty unusual to really need to write our own shellcode; we can always go to the Metasploit Framework and grab a fresh one there, well written, reliable and even encoded. Well, where is the fun in that, huh?… I like to know where everything is and why. I promise that by the end of our saga you’ll be able to understand every detail about shellcoding and write your own shellcode, able to exploit a vulnerability that was not exploitable before using Metasploit shellcodes.

1. Introduction
First of all, what is ShellCode? – ShellCode is a magic piece of code within our payload, written directly in opcodes and injected straight into the memory of a vulnerable program; if we’re lucky, it will be executed at the end of our exploitation.
The name “ShellCode” comes from the first exploitation techniques, when at the end of an exploitation we had a shell on the target, proving our ability to execute commands on the target’s machine. These days this is not always the case; sometimes it is better to execute a custom command or even drop some firewall protection on the target. So, why do we keep calling it shellcode? – Well, it’s easier to type than “magic piece of code within our payload, written directly in opcodes and injected straight into memory of a vulnerable program” :). Nowadays the expressions “payload” and “shellcode” are ALMOST always interchangeable.

2. Bad Chars
There is no way to talk about shellcoding and ignore bad characters. As the name says, a “bad char” is a character (or byte) that CAN’T be part of a given shellcode.
I said “given shellcode” because each vulnerability has its own bad chars, but some of them are almost always problematic, for example the NULL BYTE (0x00). The problem with the NULL BYTE is that in the majority of systems it means “end of text”; in other words, the function (or interpreter) reading the input stops when it finds this byte. Let’s see an example:

Now we have another issue, the character “0x0a”. It doesn’t mean “end of text” as the null byte does, but the way bash works, it treats “0x0a” as the point to stop reading a value into a variable (“0x0a” is the character inserted when we press “<RETURN>”; it means “Line Feed”, sometimes called “New line”, and is represented by “\n”).
Further down in that output, we can see that “od” is able to read our “0x0a”, so in this case “0x0a” is a bad char for the bash “read” builtin but not for the “od” command.
It can be even more specific: bad chars depend entirely on what is being done with the input, and on when and where it is being injected.
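The same contrast can be sketched in Python. This is a stand-in for the shell session, not the original capture: a line-oriented reader plays the role of bash’s “read”, a raw reader plays the role of “od”, and a C-style string view shows the NULL BYTE case. The payload bytes are made up for illustration.

```python
import ctypes
import io

# Hypothetical payload: four bytes, a line feed, four more bytes.
payload = b"AAAA\x0aBBBB"

# Line-oriented reader (like bash's `read`): stops at the first 0x0a.
line_view = io.BytesIO(payload).readline()

# Raw reader (like `od`): sees every byte, 0x0a included.
raw_view = io.BytesIO(payload).read()

# C-style string view: everything after the first 0x00 is ignored.
c_buffer = ctypes.create_string_buffer(b"AAAA\x00BBBB")
c_view = c_buffer.value

print(line_view)  # b'AAAA\n'
print(raw_view)   # b'AAAA\nBBBB'
print(c_view)     # b'AAAA'
```

The same byte is harmless for one consumer and fatal for another, which is why the list of bad chars has to be worked out per target.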
Another example could be an HTTP proxy. Let’s suppose we are trying to inject our shellcode as a URL, so we inject our bytes like this:
URL => “ABCDEFG~^^?:;][1234567890!@#$%¨&*()~^^?:;][{}” (We need some effort here to imagine these bytes as our shellcode, so help me ;))
Now the proxy reads our buffer until the NULL BYTE and stores it in memory, but just after reading the URL the proxy runs some validations on it to check whether it’s a valid URL or not, and instead of replying to the user with an “Invalid URL” message, it simply removes the invalid characters. At the end of the validation function, our shellcode becomes:
URL => “ABCDEFG1234567890@#$%&~?:”
Now what? That function totally broke our shellcode…
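A rough sketch of what such a filter does to real opcodes, in Python. The whitelist and the `sanitize_url` helper are assumptions invented for this illustration (no real proxy is being modelled), and the nine-byte payload is just a sample.

```python
import string

# Assumption: the hypothetical proxy keeps only these characters in a URL.
ALLOWED = set((string.ascii_letters + string.digits + "@#$%&~?:/.").encode())

def sanitize_url(raw: bytes) -> bytes:
    """Drop every byte that is not in the proxy's whitelist."""
    return bytes(b for b in raw if b in ALLOWED)

# Sample payload: nine bytes of machine code, most of them outside ASCII.
payload = b"\x31\xc0\x40\x31\xdb\xb3\x45\xcd\x80"

survivors = sanitize_url(payload)
print(survivors)                     # b'1@1E'
print(len(payload), len(survivors))  # 9 4
```

Five of the nine bytes are silently removed, so whatever reaches memory is no longer the program we sent.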

The search for bad characters can be exhausting. What we generally do is send a buffer with all bytes (from 0x01 to 0xFF) and watch it in a debugger, checking how it was read and whether it was modified or not. Then we note the bad chars we found, remove them from the buffer and send it again… We keep doing this until we’ve enumerated all bad chars. There are situations where all characters must be within the ASCII table range (0x01 to 0x7F), or even worse, only UPPERCASE letters…
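The send, compare, remove, resend loop can be sketched in Python. Here `simulated_target` is a made-up stand-in for the vulnerable program plus the debugger inspection (it stops before 0x0a and drops 0x0d); in a real session that step is done by hand, and a target that copies its terminator before stopping would shift the mismatch by one position.

```python
def simulated_target(buf: bytes) -> bytes:
    """Hypothetical vulnerable reader: stops before 0x0a and drops 0x0d."""
    kept = buf.split(b"\x0a", 1)[0]
    return kept.replace(b"\x0d", b"")

def first_mismatch(sent: bytes, observed: bytes):
    """Index of the first byte that did not arrive intact, or None."""
    for i, byte in enumerate(sent):
        if i >= len(observed) or observed[i] != byte:
            return i
    return None

def enumerate_bad_chars(target) -> list:
    """Resend the buffer, removing one bad byte per round, until it survives."""
    candidates = list(range(0x01, 0x100))  # 0x00 is assumed bad from the start
    bad = []
    while True:
        sent = bytes(candidates)
        idx = first_mismatch(sent, target(sent))
        if idx is None:
            return bad
        bad.append(candidates.pop(idx))

print([hex(b) for b in enumerate_bad_chars(simulated_target)])  # ['0xa', '0xd']
```

Each round only trusts the first mismatch, because everything after a truncation or a dropped byte is shifted and tells us nothing yet.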

Let’s begin with some practice.

3. Assembly: First contact
All our texts are going to be written on the following setup:
— Linux x86 (32 bits) -> More precisely a BackTrack 4r2

Well, the idea of this first text is to show the concepts and the procedure for writing a shellcode, so the shellcode itself isn’t really interesting.
The objective of this first shellcode will be to execute the function “exit(69);”.

Shellcodes should always be written directly in assembly (trust me, it sounds more complicated, but it really is easier than the alternatives). For those who don’t know, Linux keeps a list of system calls in the file /usr/include/asm/unistd.h, which actually sends us to another file according to our architecture, in this case /usr/include/asm-i486/unistd.h.
Let’s take a look at the beginning of this file:

Using the “head” command we can already see the syscall we want for this first shellcode: “__NR_exit”, number 1.
Now we need to understand how Linux works with these system calls.

1. The syscall number must be placed in the “eax” register.
2. Each syscall can take up to 6* parameters, which must be placed in the registers “ebx”, “ecx”, “edx”, “esi”, “edi” and “ebp”, respectively.
3. Now we invoke privileged/protected mode (kernel mode – or – ring zero in Linux) to execute our syscall; on x86 Linux this is done with the “int $0x80” software interrupt.

* As you can see, there is a limit on the number of parameters for a given syscall; when a syscall needs more than 6 parameters, we use the “ebx” register to point to a memory area holding an array of all the parameters (here the type, i.e. the size in bytes, of each parameter is what delimits each one). Prior to Kernel 2.4, “ebp” couldn’t be used to pass a parameter to a syscall, thus the limit was 5.
Interesting manpages:
— man 2 syscalls
— man 2 unimplemented
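The register convention above can be written down as a small Python helper. It only computes which value goes into which register; it does not invoke anything. The function name `syscall_registers` is made up for this sketch.

```python
# Order in which syscall arguments are assigned to registers on 32-bit x86 Linux.
ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")

def syscall_registers(number: int, *args: int) -> dict:
    """Return the register values to load before entering the kernel."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("more than 6 arguments: pass a pointer in ebx instead")
    registers = {"eax": number}
    registers.update(zip(ARG_REGISTERS, args))
    return registers

NR_EXIT = 1  # __NR_exit, as listed in unistd.h

print(syscall_registers(NR_EXIT, 69))  # {'eax': 1, 'ebx': 69}
```

For “exit(69);” only two registers matter: “eax” carries the syscall number and “ebx” the exit status.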

1. We’re specifying which memory area we’ll be using: “.text” is the area for code/instructions.
3. Now we declare our “_start” function as global (“_start” in assembly is the same concept as “main” in C; it defines the program’s Entry Point).
5. Here we’re creating a label called “_start” (matching our Entry Point).
The rest is self-explanatory :).

Unlike higher-level code, assembly code doesn’t need to be compiled; it is simply assembled, which converts mnemonic instructions (mov, int …) into CPU opcodes, or “machine language”.

As you can see in our dump, the opcode of the “mov” instruction when used with the “eax” register AND a constant value is “b8”; what comes next is our parameter “$1” in little-endian, using all 32 bits (4 bytes). While assembling, we’re also converting decimal values to hexadecimal: $69 -> $0x45.
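To make the encoding concrete, here is a small Python sketch that builds the same bytes by hand. The helper name `encode_mov_imm32` is made up; it covers only the one-byte-opcode form of “mov” with a 32-bit constant (0xb8 plus the register number).

```python
import struct

# Register numbers used by the one-byte "mov imm32, reg" opcodes (0xb8 + n).
REG_NUMBER = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def encode_mov_imm32(reg: str, value: int) -> bytes:
    """Encode `mov $value, %reg` as opcode byte + 32-bit little-endian immediate."""
    return struct.pack("<BI", 0xB8 + REG_NUMBER[reg], value)

print(encode_mov_imm32("eax", 1).hex(" "))   # b8 01 00 00 00
print(encode_mov_imm32("ebx", 69).hex(" "))  # bb 45 00 00 00
print(hex(69))                               # 0x45
```

Little-endian means the least significant byte comes first, so the value 1 is stored as “01 00 00 00”, and those three trailing zero bytes are exactly the padding of a small constant to 32 bits.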

Once assembled, our program isn’t ready to execute yet; check the memory base address assigned to our Entry Point: “00000000”. To solve this we must “link” our instructions with the “Linux reality”, which we can do using the “ld” command.

Done! Now our program is ready to run, check the new values for our instructions.
Here we go:

waKKu@blog$ ./exitsc
waKKu@blog$ echo $?
69

4. Assembly -> Shellcode
Now that we have our “fully functional” assembly program, it’s time to transform it into shellcode.

1st. Consideration:
– Each instruction (opcode) has a known length determined by its encoding (x86 instructions are not all the same size); this is how the CPU knows how many bytes after “b8” it should grab as part of the “b8” instruction without running into the next instruction.
Our first instruction is 5 bytes long (“b8 01 00 00 00”) and is at address “0x08048074”; the second instruction comes right after the first:
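That address arithmetic can be sketched in Python. The length table below covers only the three opcodes that appear in this program; it is an illustration, not a real x86 decoder.

```python
# Assumption: lengths for just the opcodes used in this program.
INSTRUCTION_LENGTH = {0xB8: 5, 0xBB: 5, 0xCD: 2}

def instruction_addresses(code: bytes, base: int) -> list:
    """Walk the bytes the way the CPU does: read an opcode, skip its length."""
    addresses = []
    offset = 0
    while offset < len(code):
        addresses.append(base + offset)
        offset += INSTRUCTION_LENGTH[code[offset]]
    return addresses

code = bytes.fromhex("b801000000bb45000000cd80")
for address in instruction_addresses(code, 0x08048074):
    print(hex(address))  # 0x8048074, then 0x8048079, then 0x804807e
```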

2nd. Consideration:
The register inside the CPU that drives instruction execution is EIP (Extended Instruction Pointer, also known as the Program Counter); its function is to point to the memory address holding the opcodes to be executed. The CPU has NO IDEA why it is pointing there, what is there, how those bytes got there or even whether those bytes are really instructions or simply text in the wrong place. Whatever exists in memory at that address, the CPU will fetch it, decode it as an instruction and execute it; then EIP is updated to the next address and the cycle starts again.

That said, here comes the idea of messing with the execution flow. If we are able to point EIP to the beginning of our shellcode, the CPU will execute it with no questions asked*. What we need to make sure of is that execution runs from the first byte to the last byte of our shellcode.

We agreed that all opcodes are laid out in memory one after another, and that the CPU is able to “fetch” them one by one because it knows the size of each one. Ok, let’s transform our program’s dump into a sequence of bytes, from the first memory address to the last:
b801000000bb45000000cd80

This seems reasonable to us, but for the CPU each letter/number there is 1 byte, so instead of “b8” being an instruction, “b8” is the byte 0x62 (b) followed by the byte 0x38 (8) (man 7 ascii).
So what we need to do to convert the text “b8” into the byte 0xb8 is to express this value as a hexadecimal escape, using the “\xb8” notation. Now we have:
“\xb8\x01\x00\x00\x00\xbb\x45\x00\x00\x00\xcd\x80”
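The difference between the text and the bytes is easy to check in Python. This is just a sketch of the conversion; the variable names are arbitrary.

```python
dump = "b801000000bb45000000cd80"

as_text = dump.encode("ascii")   # 24 bytes: the characters 'b', '8', '0', ...
as_bytes = bytes.fromhex(dump)   # 12 bytes: 0xb8, 0x01, 0x00, ...

# Build the "\x.." notation from the real byte values.
escaped = "".join(f"\\x{b:02x}" for b in as_bytes)

print(len(as_text), hex(as_text[0]), hex(as_text[1]))  # 24 0x62 0x38
print(len(as_bytes), hex(as_bytes[0]))                 # 12 0xb8
print(escaped)
```

The text form is twice as long and contains none of the opcodes we want; only the 12-byte form is something a CPU could execute.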

From the beginning we knew that 0x00 is a bad char, so we need to find a way to get the same effect using different opcodes.

Using our objdump output, we can enumerate what lines we need to change to remove bad chars:

0: b8 01 00 00 00 mov $0x1,%eax
5: bb 45 00 00 00 mov $0x45,%ebx
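The same check can be automated with a few lines of Python; `bad_char_offsets` is a hypothetical helper name, and the default bad-char list contains only the NULL byte.

```python
def bad_char_offsets(shellcode: bytes, bad_chars: bytes = b"\x00") -> list:
    """Return the offsets of every byte that appears in bad_chars."""
    return [i for i, b in enumerate(shellcode) if b in bad_chars]

original = bytes.fromhex("b801000000bb45000000cd80")
print(bad_char_offsets(original))  # [2, 3, 4, 7, 8, 9]
```

Offsets 2 to 4 belong to the first “mov” and offsets 7 to 9 to the second one, so both instructions have to be replaced.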

What we need to do is find another way to put “$1” into “eax” and “$69” into “ebx” without using NULL Bytes…
The easy way is to use only the low 8 bits of each register, “al” and “bl” respectively for our registers. Here you can see these divisions: CPU registers.

Now we’re NULL Bytes free. It works great for a simple program, but we CAN’T use it for a shellcode…
Why?
– When we’re injecting our code into another program’s memory, we have no idea (or at least should assume we have no idea) about the values already inside the registers, so we have no guarantee that the remaining 24 bits of any register are zeros. We could end up with a totally nonsensical value.
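A tiny Python model of an 8-bit write shows the problem. The “dirty” starting value is invented for the example; `mov_low8` is a made-up name for the model of “mov $value, %al”.

```python
def mov_low8(register: int, value: int) -> int:
    """Model `mov $value, %al`: replace bits 0-7, leave bits 8-31 untouched."""
    return (register & 0xFFFFFF00) | (value & 0xFF)

clean_eax = 0x00000000
dirty_eax = 0xDEADBEEF  # hypothetical leftover from the injected process

print(hex(mov_low8(clean_eax, 1)))  # 0x1
print(hex(mov_low8(dirty_eax, 1)))  # 0xdeadbe01
```

In our own freshly started program the register happens to be clean and the syscall number really is 1; inside someone else’s process the kernel would be asked for syscall 0xdeadbe01 instead.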

I believe the most used trick is to execute a XOR instruction using the same register as both source and destination. So, if we do:
123 XOR 123 == 0
0 XOR 0 == 0
123456789 XOR 123456789 == 0
Thus we can easily zero any register doing “register XOR register”. Now that we are sure all bits are zero, we can proceed with our previous idea.

Done! No NULL Bytes and fully functional… and without any extra effort we’ve also reduced the size of our shellcode from 12 bytes to 9 bytes.
Final Shellcode:
“\x31\xc0\x40\x31\xdb\xb3\x45\xcd\x80”
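As a sanity check, the four instructions before “int $0x80” can be modelled in Python and fed with random starting registers. This is a hand-written model of these specific opcodes, not an emulator, and nothing is actually executed.

```python
import random

FINAL = b"\x31\xc0\x40\x31\xdb\xb3\x45\xcd\x80"

def run_until_int80(eax: int, ebx: int):
    """Model the four instructions before `int $0x80` on 32-bit registers."""
    eax ^= eax                        # 31 c0  xor %eax,%eax
    eax = (eax + 1) & 0xFFFFFFFF      # 40     inc %eax
    ebx ^= ebx                        # 31 db  xor %ebx,%ebx
    ebx = (ebx & 0xFFFFFF00) | 0x45   # b3 45  mov $0x45,%bl
    return eax, ebx

for _ in range(5):
    garbage = (random.getrandbits(32), random.getrandbits(32))
    print(run_until_int80(*garbage))  # always (1, 69)

print(len(FINAL), b"\x00" in FINAL)   # 9 False
```

Whatever was in the registers before, the kernel sees syscall number 1 with 69 as its first argument, and the byte string contains no 0x00.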

6. Exploiting

How could we test our shellcode?
Is assembling, linking and executing enough? – NO!!
We need to consider that the moment our shellcode executes in a “hostile” environment cannot be compared with normal process execution within the operating system. The shellcode has no memory areas of its own such as .bss, .heap, .stack or even our previously declared .text… It is injected whole into some point of memory with specific permissions/restrictions. An example is a stack protected by the NX Bit + Exec Shield, which makes our execution impossible by preventing EIP from fetching instructions from the stack area.

So we need to create a simple program to manipulate the execution flow for us and point it straight at our shellcode, as if it were a function pointer.
