X64 Assembly

Now I’ll try to explain the basics of x64 assembly. I assume the reader is already familiar with x86 assembly, otherwise he won’t be able to make heads or tails of this paragraph.
Moreover, since this is just a very (but very) brief guide, you’ll have to look into the AMD64 documentation for more advanced stuff. Some stuff I won’t even mention, you’ll see by yourself that some instructions are no longer in use: for instance, that the lea instruction has completely taken place of the mov offset.

What you’re going to notice at once is that there are some more registers in the x64 syntax:

8 new general-purpose registers (GPRs).

8 new 128-bit XMM registers.

Of course, all general-purpose registers are 64 bits wide. The old ones we already knew are easy to recognize in their 64-bit form: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp (and rip if we want to count the instruction pointer). These old registers can still be accessed in their smaller bit ranges, for instance: rax, eax, ax, ah, al.
The new registers go from r8 to r15, and can be accessed in their various bit ranges like this: r8 (qword), r8d (dword), r8w (word), r8b (low byte).

Here’s a figure taken from the AMD docs:

Applications can still use segments registers as base for addressing, but the 64-bit mode only recognizes three of the old ones (and only two can be used for base address calculations). Here’s another figure:

And now, the most important things. Calling convention and stack. x64 assembly uses FASTCALLs as calling convention, meaning it uses registers to pass the first 4 parameters (and then the stack).
Thus, the stack frame is made of: the stack parameters, the registers parameters, the return address (which I remind you is a qword) and the local variables.
The first parameter is the rcx register, the second one rdx, the third r8 and the fourth r9. Saying that the parameters registers are part of the stack frame, makes it also clear that any function that calls another child function has to initialize the stack providing space for these four registers, even if the parameters passed to the child function are less than four.
The initialization of the stack pointer is done only in the prologue of a function, it has to be large enough to hold all the arguments passed to child functions and it’s always a duty of the caller to clean the stack. Now, the most important thing to understand how the space is provided in the stack frame is that the stack has to be 16-byte aligned.
In fact, the return address has to be aligned to 16 bytes. So, the stack space will always be something like 16n + 8, where n depends on the number of parameters. Here’s a small figure of a stack frame:

Don’t worry if you haven’t completely figured out how it works: now we will see a few code samples, which, in my opinion, always make the theory a lot easier to understand. Let us take for instance a hello-world application like:

As said, the child function takes 7 parameters, making it necessary to provide space for 3 extra parameters on the stack. So, 7 * 8 = 0x38, which aligned to 16byte is 0x40. Providing, then, space for the return address makes it 0x48, our value indeed.
I think you have understood the stack-frames logic by now, it’s actually quite easy to understand it, but it needs a second to revert from the old x86/stdcall logic to this one. But now enough of this, now that we’ve seen how the x64 code works, we’ll try compiling an assembly source by ourselves.

Before we start, I have to make something clear. There are some assemblers over the internet which make the job easier, mainly because the initialize the stack by themselves or they create code that is easy to converto from/to x86.
But I think that is not the point here in this article. In fact, I’m going to use the microsoft assembler (ml64.exe), which requires you to write everything down, just like in the disassembly. Another option could be compiling the with another assembler and then link it with ml64.
I think the reader should really make these decisions on his own. As far as I am concerned, I don’t believe that much code should be written in assembly and avoided whenever it could be done. This new x64 technology is a good opportunity to re-think about these matters.
In the last years I always wrote 64-bit compatible code in C/C++ (I mean unmanaged, of course) and when I had to recompile a project of 70,000 lines of code for x64, I didn’t had to change one single line of code (I’ll talk about the C/C++ programming later). Despite of all the macros an assembler offers, I seriously doubt that people who wrote their whole code in assembly will be able to switch so easily to x64 (remember one day even the IA64 syntax could be adopted). I think in most cases the obvious choice will be not converting to the new technology and stick to x86, but this isn’t always possible, it depends on the software category.

The microsoft assembler is contained in the SDK and in the DDK (WDK for Vista). Right now, I’m using Vista’s WDK, which I freely downloaded from the msdn. The first sample of code I’m going to show you is a simple Hello-World messagebox application.

As you can see, I didn’t bother unwinding the stack, since I call ExitProcess. The syntax is very similar to the old MASM one, although there are a few dissimalirites. The ml64 console output should be something like this:

If the libs are not in the same directory as ml64.exe, you’ll have to provide the path like I did. The entry has to be provided, otherwise you would have to use WinMainCRTStartup as main entry.

The next sample of code I’m going to show you displays a window calling CreateWindowEx. What you’re going to learn through this code is structure alignment and how integrating resources in your projects.
Like I said earlier, I don’t want to encourage you to write your windows in assembly, but I believe that this sort of code is good for learning. Now the code, afterwards the explanation.

As you can see, I tried to stay as low level as I could. The reason why I avoided for other functions other than the main the proc macro is that the ml64 puts a prologue end an epilogue, which I didn’t want, by itself.
Avoiding the macro made it possible to define my own stack frame without any intermission by the compiler. The first thing to notice scrolling this code is the structure:

It requires two paddings which the x86 declaration of the same structure didn’t. The reason, in a few words, is that qword members should be aligned to qword boundaries (this for the first padding).
The additional padding at the end of the structure follows the rule that: every structure should be aligned to its largest member. So, being its largest member a qword, the structure should be aligned to an 8-byte boundary.

test.res is a file I took from a VC++ wizard project, I was too lazy to make on by myself. Anyway, making a resource file is very easy with the VC++, but no one forbids you to use the notepad, it just takes more time.
To compile the resource file all you need to do is to use the command line: “rc test.rc”.

I think the rest of the code is pretty easy to understand. I didn’t cover everything with this paragraph, but now you should have quite a good insight into x64 assembly.