x64 spoon

While coding and debugging low-level stuff I sometimes need to write
a little piece of assembly code to check that I'm right. Until now, I was
writing code into a process debugged with OllyDbg and stepping through
it. Pretty ugly, but it works when you want to know what a "smsw eax" is
doing. Last time, I was confronted with the x64 reality: no public
tool like OllyDbg allows you to debug a 64-bit process on Windows. So I
decided to write a little application to see how some 64-bit
instructions behave.

Just for fun I started this project on Linux X86_64, and just
because it's cool the code runs in a "sandboxed"
environment.

Also, I'm using Linux X86_64 but I wanted to run both 64-bit and 32-bit
code snippets without having 2 different processes. As you know, 64-bit
kernels (CONFIG_X86_64) can run native 32-bit applications if they are
compiled with CONFIG_IA32_EMULATION.

A few words about x64 computing. For the CPU this mode is called
IA-32e mode (or long mode). To enable it you need to set the LME bit
(Long Mode Enable) in the IA32_EFER MSR (see Intel manual vol 3A:
Initializing IA-32e Mode). Then, if you want to run 64-bit code, you
must set up a code segment (CS) with bit L (64-bit code segment) set to
1 and bit D (default operand size) set to 0 (Intel manual vol 3A:
Segment Descriptors). So if you want to run both 32-bit and 64-bit code
with the same kernel, you must have at least two code segment
descriptors in your GDT (Global Descriptor Table), or have 2 GDTs. One
entry must have L=1 and D=0 for 64-bit code, and the other L=0 and D=1
for 32-bit code. Linux uses 1 GDT, with 2 entries named
GDT_ENTRY_DEFAULT_USER_CS and GDT_ENTRY_DEFAULT_USER32_CS. For
reference, cs64=0x33 (RPL:3 TABLE:0 INDEX:6) and cs32=0x23 (RPL:3
TABLE:0 INDEX:4).
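To check those two values by hand, here is a small sketch that splits a selector into its fields (bits 0-1 are the RPL, bit 2 is the table indicator, the remaining bits are the index). The helper names decode_selector and encode_selector are made up for this illustration.

```ruby
# Split an x86 segment selector into its three fields.
# Bits 0-1: RPL, bit 2: table indicator (0 = GDT, 1 = LDT), bits 3-15: index.
def decode_selector(selector)
  {
    rpl:   selector & 0x3,
    table: (selector >> 2) & 0x1,
    index: selector >> 3
  }
end

# Inverse operation: build a selector from its fields.
def encode_selector(index:, table: 0, rpl: 3)
  (index << 3) | (table << 2) | rpl
end

[0x33, 0x23].each do |sel|
  f = decode_selector(sel)
  puts format('cs=0x%02x RPL:%d TABLE:%d INDEX:%d', sel, f[:rpl], f[:table], f[:index])
end
```

The same arithmetic applies to a selector pointing into an LDT: only the table bit changes.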

So if you want to run pure 32-bit code in a 64-bit task you "just"
need to update your code segment selector. This can be tricky because
you cannot change CS with a MOV instruction; other ways are provided to
do this.

To perform this hack I chose the famous Metasm framework. So
here is the plan:

Create a subprocess with restrictions: limited access to resources
and SECCOMP mode enabled. The subprocess can communicate with its parent
only via a dedicated pipe.

Run the shellcode in a basic environment: all GPRs (General Purpose
Registers) cleared and a clean stack. Code and data segments are flat
(they address all the memory). FS and GS are not supported.

Read the result from the parent. I'll use the final values of the GPRs
as the result.

Do it yourself

For the first part we will use the Shellcode class from Metasm. With
its assemble method we can easily assemble what we want by specifying
the target CPU. We will use the X86_64 CPU for now, so we only assemble
64-bit code. You can obtain the raw assembled bytes with the
encode_string method on the Shellcode object. For example:
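A minimal sketch of that step, assuming the metasm gem is installed. The helper name assemble64 is made up for this illustration, and it returns nil when Metasm cannot be loaded.

```ruby
begin
  require 'metasm'
rescue LoadError
  # Metasm is not installed; assemble64 will simply return nil.
end

# Assemble 64-bit source text and return the raw bytes as a binary String.
def assemble64(source)
  return nil unless defined?(Metasm)
  Metasm::Shellcode.assemble(Metasm::X86_64.new, source).encode_string
end

code = assemble64("smsw rax\nret")
puts code.unpack1('H*') if code
```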

The next part requires creating a subprocess. For this we just fork()
and after that, in the child, we set up some limits:

CPU user time and memory allocation are limited with setrlimit()
calls, on RLIMIT_CPU and RLIMIT_AS. The setrlimit and fork
methods are directly provided by the core Ruby module Process.
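The fork, limits and pipe can be sketched with nothing but core Ruby. The helper names and the default limit values below are assumptions chosen for the example, not the ones from spoon.rb, and the SECCOMP step is left out.

```ruby
# Lower a resource limit without exceeding the current hard limit.
def lower_limit(resource, wanted)
  _soft, hard = Process.getrlimit(resource)
  Process.setrlimit(resource, [wanted, hard].min)
end

# Run the given block in a forked child with CPU and address-space limits.
# Returns what the child wrote to the pipe, plus its Process::Status.
def run_limited(cpu_seconds: 2, mem_bytes: 4 << 30)
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    lower_limit(Process::RLIMIT_CPU, cpu_seconds)  # CPU time, in seconds
    lower_limit(Process::RLIMIT_AS, mem_bytes)     # virtual memory, in bytes
    writer.write(yield.to_s)
    writer.close
    exit!(0)  # leave without running handlers inherited from the parent
  end
  writer.close
  output = reader.read
  reader.close
  _, status = Process.wait2(pid)
  [output, status]
end

output, status = run_limited { 6 * 7 }
puts "child said #{output.inspect}, success=#{status.success?}"
```

The parent only ever sees what comes through the pipe, which is the property the sandbox relies on.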

To close every file descriptor that might be open (because Ruby's
runtime could have opened some FDs I'm not aware of) I decided to use
the close() syscall, because I could not find a sysclose() method in
Ruby's IO class (even if sysopen() is present ...). But it's impossible
to directly use libc functions from Ruby. Impossible? Not for Metasm! I
used Metasm's DynLdr module, whose role is close to that of Python's
ctypes module. It provides a wrapper for any native library, so you
can wrap a function like memfrob() and invoke it from your Ruby
code.

Moreover, the DynLdr module allows you to compile and run asm or C code
directly in the Ruby process. Pretty useful, isn't it?

This time I didn't use new_func_c but new_api_c: with this one
you can declare an extern C function from its prototype to make it
available from Ruby. Here the second argument (libc.so.6) is not
necessary because all GNU libc exports are already defined in
metasm/os/gnu_exports.rb ; but now the reader knows he can interface
with other libraries :]
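A minimal sketch of what such a binding might look like, assuming the metasm gem is installed and that new_api_c exposes the function as a DynLdr method of the same name. The helper close_inherited_fds and the 4096 cap are hypothetical; it is meant to be called in the child only, and it does nothing when Metasm cannot be loaded.

```ruby
begin
  require 'metasm'
rescue LoadError
  # Metasm is not installed; the helper below becomes a no-op.
end

# Close every descriptor from 3 upwards except those listed in `keep`
# (typically the write end of the pipe). Returns the number of FDs closed.
def close_inherited_fds(keep = [])
  return 0 unless defined?(Metasm)
  # Declare close() from its C prototype; the library name is optional here.
  Metasm::DynLdr.new_api_c('int close(int);', 'libc.so.6')
  soft, = Process.getrlimit(Process::RLIMIT_NOFILE)
  max_fd = [soft, 4096].min  # hypothetical cap to keep the loop short
  (3...max_fd).count do |fd|
    next false if keep.include?(fd)
    Metasm::DynLdr.close(fd).zero?  # close() returns 0 on success
  end
end
```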

Our child process is now able to run our code safely. If we want to dump
the GPRs after the code execution we need to wrap it. For this, I use
new_func_asm to create a new function written in assembly, compile
it and load it inside the Ruby process address space. In a few words,
this function clears the GPRs, calls the code and stores the GPRs into a
buffer given as argument. Then I just have to write the results to the
pipe so that our parent gets them.
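Once the wrapper has filled the buffer, turning it into something printable is plain Ruby. The sketch below assumes the buffer holds eight little-endian 64-bit values in the order listed in GPR_NAMES; that order is hypothetical and must match whatever the assembly wrapper actually stores.

```ruby
# Hypothetical register order; it has to mirror the stores done by the
# assembly wrapper.
GPR_NAMES = %w[rax rbx rcx rdx rsi rdi rbp rsp].freeze

# Turn the raw buffer into a { "rax" => value, ... } hash.
def parse_gprs(buffer)
  values = buffer.unpack('Q<*')  # little-endian unsigned 64-bit integers
  unless values.size == GPR_NAMES.size
    raise ArgumentError, "expected #{GPR_NAMES.size} registers, got #{values.size}"
  end
  GPR_NAMES.zip(values).to_h
end

# One "name=value" line per register, ready to be written to the pipe.
def format_gprs(regs)
  regs.map { |name, value| format('%s=%016x', name, value) }.join("\n")
end

# Fake buffer standing in for the one filled by the wrapper.
sample = [1, 2, 3, 4, 5, 6, 7, 8].pack('Q<*')
puts format_gprs(parse_gprs(sample))
```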

To sum up, Metasm assembled code from the file 'shellcode', forked a
child, set up some restrictions, ran the code and printed the state of
the GPRs after its execution. All of this happened in the Ruby process.
Here, I chose not to print registers R8 to R15.

The child was killed by the kernel because it consumed too much user
CPU time.
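This behaviour is easy to reproduce with core Ruby alone. In the sketch below (the helper name and the 1-second limit are assumptions for the example), a child spins forever and the parent inspects how it died.

```ruby
# Fork a child that burns CPU under a small RLIMIT_CPU and return its
# Process::Status once the kernel has killed it.
def spin_until_killed(cpu_seconds = 1)
  pid = Process.fork do
    Process.setrlimit(Process::RLIMIT_CPU, cpu_seconds)
    loop {}  # consume user CPU time until the limit is hit
  end
  _, status = Process.wait2(pid)
  status
end

status = spin_until_killed
if status.signaled?
  puts "child killed by SIG#{Signal.signame(status.termsig)}"
end
```

Which signal is reported depends on how the soft and hard limits are set, so the parent is better off checking signaled? than a specific signal number.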

Gimme moar power !

In this part we will see how to run 32-bit code in our 64-bit Ruby
process. The plan is :

Run 32-bit code in a 64-bit task.

???

Profit !

As I said at the beginning, I want to run both 32-bit and 64-bit code in
the same process. To achieve that I could reuse the
GDT_ENTRY_DEFAULT_USER32_CS code segment selector, but I decided to
check that Linux behaves properly, so I'll create my own code segment
with the modify_ldt() syscall. Here I'm allocating my own LDT (Local
Descriptor Table), which is the same as a GDT but only for my process.
My own segment descriptors go into this LDT. For example I can have a
dedicated 16-bit stack segment. Of course you cannot do whatever you
want with the new segment descriptors, the kernel validates them first,
but you know sometimes you're doing it wrong !

So basically, I need to:

Map my code and stack at some 32-bit address. We use mmap() with
flag MAP_32BIT for this.

As you can see, Metasm supports structure declarations. You initialize
them with alloc_c_struct.
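For modify_ldt(), the structure the kernel expects is struct user_desc. The sketch below packs one by hand with Array#pack instead of Metasm's alloc_c_struct, just to show the layout: three 32-bit fields followed by a bitfield. Field names follow the Linux headers, while the helper name, the entry number and the values chosen are assumptions for the example.

```ruby
MODIFY_LDT_CONTENTS_CODE = 2  # "contents" value for a code segment

# Pack a Linux x86_64 `struct user_desc` (16 bytes, little-endian).
def pack_user_desc(entry_number:, base_addr:, limit:, seg_32bit:, contents:,
                   read_exec_only: 0, limit_in_pages: 1, seg_not_present: 0,
                   useable: 1, lm: 0)
  # The bitfield starts at the least significant bit.
  flags = seg_32bit |
          (contents << 1) |
          (read_exec_only << 3) |
          (limit_in_pages << 4) |
          (seg_not_present << 5) |
          (useable << 6) |
          (lm << 7)
  [entry_number, base_addr, limit, flags].pack('L<4')
end

# A flat 32-bit code segment in LDT entry 0 (hypothetical choice).
desc = pack_user_desc(entry_number: 0, base_addr: 0, limit: 0xfffff,
                      seg_32bit: 1, contents: MODIFY_LDT_CONTENTS_CODE)

# Selector for that entry: index 0, table indicator 1 (LDT), RPL 3.
ldt_cs = (0 << 3) | (1 << 2) | 3

puts desc.unpack1('H*')
puts format('selector=0x%02x', ldt_cs)
```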

Now the hard part: how do we switch from a 64-bit to a 32-bit code
segment? We can perform a far-call, a far-jmp, even a far-ret ; but I
prefer to use an iret instruction because it allows you to change
CS, EIP, SS and ESP all at the same time. The iret instruction
is allowed for this because we are moving to a conforming code segment
with the same DPL (Descriptor Privilege Level). In short, we stay in
ring3 user-land :]

To call iret we need a proper stack to tell what the CPU state will be.
To be precise we must provide EIP, CS, EFLAGS, ESP and SS. Even if we
still have a 64-bit stack we need to push the arguments as if we were in
32-bit mode, so the stack will look like this:

Lowdw      Highdw
[ eip    | cs  ] <- rsp
[ eflags | esp ]
[ ss     | 0   ]

Then we launch iret and land in our new code segment.
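To make the layout concrete, the sketch below builds the same frame as a byte string, assuming a 32-bit operand-size iret that pops EIP, CS, EFLAGS, ESP and SS as five consecutive dwords. The addresses, the EFLAGS value and the use of 0x2b as the stack selector are hypothetical values picked for the example.

```ruby
# Build the frame consumed by a 32-bit iret: five dwords, padded to
# three qwords so it can be pushed from 64-bit code.
def iret32_frame(eip:, cs:, eflags:, esp:, ss:)
  [eip, cs, eflags, esp, ss, 0].pack('L<6')
end

frame = iret32_frame(
  eip:    0x40000000,  # hypothetical address of the 32-bit code
  cs:     0x23,        # 32-bit user code segment selector
  eflags: 0x202,       # interrupts enabled, reserved bit 1 set
  esp:    0x40002000,  # hypothetical top of the 32-bit stack
  ss:     0x2b         # assumed flat user data selector
)

# Show the frame the way 64-bit code sees it, one qword per line.
frame.unpack('Q<3').each_with_index do |qword, i|
  puts format('[rsp+%2d] low=0x%08x high=0x%08x',
              i * 8, qword & 0xffffffff, qword >> 32)
end
```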

Last question: how do we come back from the 32-bit to the 64-bit code
segment? Remember we moved our code to 32-bit addressable memory, but our
original 64-bit code (the caller) is not necessarily in this range, so
we need a stager in 64-bit code inside the 32-bit addressable memory to
perform a far-jmp back to our caller. At the end of the 32-bit code we
put a far-ret. When we initialized the stack, two things were pushed for
it: the original 64-bit code segment selector
(GDT_ENTRY_DEFAULT_USER_CS if you prefer) and a pointer to code located
just after the stager.

At the end of the 32-bit code execution, we do a far-ret, which brings
us back to 64-bit mode but still in the 32-bit addressable memory, and
from there we can take the far-jmp to our caller.

Last thing, the far-jmp has to know where to find the caller RIP and
RSP.

Time to die pilot !

Now I know what "smsw rax" does in 64-bit mode, and that changed my
life. By the way, while testing this code we noticed a few bugs with
some Ruby x64 packages. The Ruby process is flooded with
rt_sigprocmask() syscalls and you need to recompile your Ruby to
avoid that, as explained here. If you don't, as soon as the child
process enters SECCOMP mode, it is immediately killed ... Well, it was
not designed to be SECCOMP safe =]

There is no public release of an OllyDbg64-like tool for now. But when I
have to deal with x64 assembly this little script is convenient. Of
course you can do the same on Windows, you just have to remove all the
Linux-dependent things :]

Anyway, if you need to do some ASM hacks, Metasm is very convenient. I
know it's Ruby but one day you have to evolve to use some real tools,
and that is a big one. You can find the Ruby source, spoon.rb,
attached to this post. Thanks a lot to @metasm for his help !