Generates a venetian shellcode alignment stub which can be placed directly before unicode shellcode.
Arguments:
-a <address> : Specify the address where the alignment code will start/be placed
Optional arguments:
-l : Prepend alignment with a null byte compensating nop equivalent
(Use this if the last instruction before the alignment routine 'leaks' a null byte)
-b <reg> : Set the bufferregister, defaults to eax
-t <seconds> : Time in seconds to run heuristics (defaults to 15)
-ebp <value> : Overrule the use of the 'current' value of ebp,
ebp/address will be used to calculate offset to shellcode

Instead of “!mona unicodealign” you can use the short version “!mona ua”. Here’s a short video on how the new feature can be used:

Puh, I know, long title. So as I was going through my corelan training material again, I was trying to exploit the public Xion exploit that can be found on exploit-db (please read the exploit first). As I can’t cover all the basics in this blog posts, here just a short overview of the exploit:

It’s an SEH exploit. So we control one of the exception handlers. It’s Unicode, so if you use corelan’s mona, use things like !mona seh -cp unicode. Unicode exploits are a little bit tricky, you often get null bytes, the presentation of FX about Unicode exploiting helped a lot.

Ok, let’s assume you already did the SEH exploiting and got code execution. So if you want to play from this point, here’s the exploit so far (simply hits our garbage). All the steps starting from this script are explained in the video below (although you might need to know how to change the offset to SEH by injecting a cycling patter, because the .m3u path length matters):

So now we’re at the point where we want to run shellcode. Usual shellcodes can not be used in Unicode environments, therefore we need to use an encoder, for example the Alpha3 encoder. Now the thing with encoders is that they normally use a GetPC routine or a BufferRegister to locate themselves in memory. GetPC is not compatible with Unicode, so we have to use a BufferRegister. A BufferRegister means nothing else than writing the address of the location of the first byte of the shellcode into a register and telling the shellcode which register it is. To achieve this goal, we need to write an alignment code that is Unicode compliant itself.

Now we come to the core part of this post: Writing alignment code. How do we write the correct address into the register? So far this involved some manual work. We have to find Unicode compatible opcodes. We can look at the transformation tables in FX’s presentation, but for a lot of characters this means we inject “\x41” and it will be transformed to “\x41\x00”. So which opcodes of the x86 assembler language have null bytes in it? Even worse, for most of our injections the opcodes must have a null byte every second byte. Or at least we have to get rid of those zero bytes.

I checked on the metasm shell (included in the Metasploit framework under the tools folder) and found out that all ADD operations with 4 byte registers (ah, al, bh, bl, etc.) start with a zero byte:

metasm > add ah,bl
"\x00\xdc"

So because I’m a lazy guy and don’t want to do the manual work, I wrote a program that will write the alignment code for me and hopefully for everybody else in the future.

Here are the steps that have to be done if you want to use my script:

Make sure the first four bytes of the BufferRegister are correct (not part of the script).

Get some reliable values into EAX, ECX, EDX, EBX with as few null bytes in it as possible (not part of the script).

Tell my script the value of the EAX, ECX, EDX, EBX registers when the debugger stops exactly at the position where the alignment code will start. Additionally tell the script the address of the start of the alignment code (the current EIP).

Run the script. It uses a random “heuristic” (well, bruteforcing with random inputs). So far I always got the single best result (with this approach) when I run it at least 1 minute. An alignment code that is a little longer is normally found in a couple of seconds.

Stop the script (Ctrl+C) when you waited long enough and think there will be no shorter result. Copy the best/shortest/last alignment code into the metasm shell to get opcodes in hex.

Remove the null bytes from the alignment code (they get injected in the Unicode transformation).

Inject the alignment code, add your produced Unicode shellcode, pwn!

The idea is to calculate the correct value where our shellcode lies. But if the shellcode is put directly after the alignment code, every additional instruction in the alignment code will increase our “target” address we want to have in our BufferRegister. This means we don’t know the location from the beginning, but have to calculate it. Until now the approach was to chose an address that is enough far away and then jump to that address at the end of the alignment code. This script finds better solutions. The script does nothing else than trying to sum up the different bytes of AH, AL, BH, BL etc. and try to find the correct value, always keeping track on how many instructions are already needed and adjusting the “target” address where the shellcode will live. So far the theory. Some more theory, the math modell behind:

written by floyd - floyd.ch
all rights reserved
http://www.floyd.ch
@floyd_ch
We are talking about a problem in GF(256). In other words the numbers are modulo 256. Or for the
IT people: A byte that wraps around (0xFFFFFFFE + 0x00000002 = 0x00000001).
Let's first discuss a simple example with 8 inputs (a1..a8). We need the more general case,
but so far 8 inputs is the maximum that makes sense. Although it doesn't make any
difference. If we solve the general case with (let's say) 16 Inputs we can
simply set the not needed inputs's to zero and they will be ignored in the model.
The script runs in the general case and can operate in all cases!
Inputs (values): a1, a2, ..., a8 mod 256
Inputs (starts): s1, s2 mod 256
Inputs (targets/goal): g1, g2 mod 256
Outputs: x1, x2, ..., x8, y1, y2, ..., y8 where these are natural numbers (including zero)!
Find (there might be no solution):
s1+a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6+a7*x7+a8*x8-((s2+2*(x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8)+3)/256) = g1 mod 256
s2+a1*y1+a2*y2+a3*y3+a4*y4+a5*y5+a6*y6+a7*y7+a8*y8-2*(x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8)-3 = g2 mod 256
Minimise (sum of outputs):
x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8
Example
{a1, a2, ... a8} = {9, 212, 0, 0, 32, 28, 50, 188}
{s1, s2} = {233, 212}
{g1, g2} = {253, 75}
Btw the +3 and -3 in the formula is because we have to get rid of the last zero byte, we do that by injecting
\x6d, which results in 00 6d 00 (serves as a "NOP"), meaning we need to add 3 to the address.
[Extra constraints!]
1. As we first set AH (or whatever higher byte is in start_is) to the correct value
AH has changed until we want to set AL. Therefore the instruction
add al, ah will NOT return the correct result, because we assume an old value
for the AH register! That's why we have a originals (a1...a8) and
modify them to a21, a22, a23, .. a28 (only for y1...y8, for the x values we're fine)
2. Additionally, the following instructions are not allowed (because it would be a lot more
complicated to calculate in advance the state of the register we are modifying):
add ah, ah
add al, al
Solved by overwriting if random generator produced something like that.
3. This program only works if you already managed to get the first
four bytes of the alignment address right (this program operates
only on the last four bytes!)

I would say that the script already works pretty well, although it is not yet tested with a lot of different situations. Enough talking, here is the script:

As I talked to Peter aka corelanc0d3r he liked the idea that we could implement it into mona, although there are a few things that should be changed/added. It is important that we get static and reliable addresses into EAX, ECX, EDX and EBX before we do the math. So in mona the first two steps from above should be integrated as well. Therefore mona should do the following:

Check if EBP is a stackpointer, if yes go to 2. otherwise go to 3.

Pop EBP into EAX: \x55\x6d\x58\6d

Pop ESP into EBX: \x54\x6d\x5B\6d

Find reliable stack pointers on the stack and pop different ones into EDX, ECX (and EAX if 2. was not executed)

Do the math and suggest to user

In a lot of cases this procedure of checking EBP and find reliable stack pointers is probably not necessary. But to get higher reliability for the automated approach it should be done. As Peter pointed out, one of the registers could be filled with a timestamp or something. For now, if you use my script, check for these things manually.

Another good thing I have to point out about this script: We won’t need to jump to the shellcode and we don’t need to put garbage between the alignment code and the shellcode. The script/math model makes sure that the next instruction after the alignment code is exactly where our BufferRegister is pointing to (and where we can put our Unicode shellcode).

Now check out the video describing all the steps and how to use the script: