
AsmJit

Important

Most AsmJit development currently happens in the next-wip branch. Use it if you rely on the Assembler or Builder tools, but note that it is still unstable with respect to AsmJit's Compiler. The master branch is considered stable, but it is frozen until next-wip is merged. The next-wip branch is not compatible with master: it moves all classes with the X86 prefix into the x86 namespace and contains many minor breaking changes. The transition is straightforward, and help is available on gitter if you run into something more complicated.

Introduction

AsmJit is a complete JIT and remote assembler for the C++ language. It can generate native code for x86 and x64 architectures and supports the whole x86/x64 instruction set - from legacy MMX to the newest AVX512. It has a type-safe API that allows the C++ compiler to do semantic checks at compile-time, even before the assembled code is generated and/or executed.

AsmJit, as the name implies, started as a project that provided JIT code-generation and execution. However, AsmJit evolved and it now contains features that are far beyond the scope of a simple JIT compilation. To keep the library small and lightweight the functionality not strictly related to JIT is provided by a sister project called asmtk.

Minimal Example

#include <asmjit/asmjit.h>
#include <stdio.h>

using namespace asmjit;

// Signature of the generated function.
typedef int (*Func)(void);

int main(int argc, char* argv[]) {
  JitRuntime rt;                          // Runtime specialized for JIT code execution.
  CodeHolder code;                        // Holds code and relocation information.
  code.init(rt.getCodeInfo());            // Initialize to the same arch as JIT runtime.

  X86Assembler a(&code);                  // Create and attach X86Assembler to `code`.
  a.mov(x86::eax, 1);                     // Move one to 'eax' register.
  a.ret();                                // Return from function.
  // ----> X86Assembler is no longer needed from here and can be destroyed <----

  Func fn;
  Error err = rt.add(&fn, &code);         // Add the generated code to the runtime.
  if (err) return 1;                      // Handle a possible error returned by AsmJit.
  // ----> CodeHolder is no longer needed from here and can be destroyed <----

  int result = fn();                      // Execute the generated code.
  printf("%d\n", result);                 // Print the resulting "1".

  // All classes use RAII, all resources will be released before `main()` returns,
  // the generated function can be, however, released explicitly if you intend to
  // reuse or keep the runtime alive, which you should in a production-ready code.
  rt.release(fn);
  return 0;
}

Configuring & Building

AsmJit is designed to be easily embeddable in any project. However, it depends on some compile-time macros that can be used to build a specific version of AsmJit that includes or excludes certain features. A typical way of building AsmJit is to use cmake, but it's also possible to include AsmJit's source code in your project and build it directly. The easiest way to do that is to add the src directory to your project and define ASMJIT_STATIC or ASMJIT_EMBED. AsmJit can then be updated from time to time without any changes to this integration process. Do not embed AsmJit's /test files in this case, as they are used only for testing.

Build Type:

By default none of these is defined; AsmJit detects the build type from compile-time macros and supports most IDE and compiler settings out of the box.

Build Mode:

ASMJIT_EMBED - Define to embed AsmJit in another project. Embedding means that neither a shared nor a static library is created; AsmJit's source files and the source files of the product that embeds AsmJit are part of the same target. This way of building AsmJit has certain advantages that are beyond the scope of this manual. ASMJIT_EMBED behaves similarly to ASMJIT_STATIC (no API exports).

ASMJIT_STATIC - Define to build AsmJit as a static library. No symbols are exported in such case.

By default AsmJit is configured to be built as a shared library, so neither ASMJIT_EMBED nor ASMJIT_STATIC is defined.

Build Backends:

ASMJIT_BUILD_ARM - Build ARM32 and ARM64 backends (work-in-progress).

ASMJIT_BUILD_X86 - Build X86 and X64 backends.

ASMJIT_BUILD_HOST - Build only the host backend (default).

If none of ASMJIT_BUILD_... is defined, AsmJit falls back to ASMJIT_BUILD_HOST, which detects the target architecture at compile-time. Each backend automatically supports 32-bit and 64-bit targets, so for example AsmJit with X86 support can generate both 32-bit and 64-bit code.

Disabling Features:

ASMJIT_DISABLE_BUILDER - Disables both CodeBuilder and CodeCompiler emitters (only Assembler will be available). Ideal for users who don't use the CodeBuilder concept and want a smaller AsmJit build.

ASMJIT_DISABLE_COMPILER - Disables the CodeCompiler emitter. For users who use CodeBuilder, but not CodeCompiler.

ASMJIT_DISABLE_TEXT - Disables everything that uses text representation and that causes certain strings to be stored in the resulting binary. For example, when this flag is defined, instruction and error names (and related APIs) will not be available. This flag has to be defined together with ASMJIT_DISABLE_LOGGING. This option is suitable for deployment builds or builds that don't want to reveal the use of AsmJit.

NOTE: Please don't disable any features if you plan to build AsmJit as a shared library that will be used by multiple projects that you don't control (for example asmjit in a Linux distribution). The possibility to disable certain features exists mainly for static builds of AsmJit.

Using AsmJit

The AsmJit library uses one global namespace called asmjit that provides the whole functionality. Architecture-specific code is prefixed by the architecture name, and architecture-specific registers and operand builders have their own namespace. For example, the API targeting both X86 and X64 architectures is prefixed with X86, and registers & operand builders are accessible through the x86 namespace. This design is very different from the initial version of AsmJit and now seems to be the most convenient one.

CodeHolder & CodeEmitter

AsmJit provides two classes that are used together for code generation:

CodeHolder - Holds the generated machine code and relocation entries.

CodeEmitter - Provides functionality to emit code into CodeHolder. CodeEmitter is abstract and provides just basic building blocks that are then implemented by Assembler, CodeBuilder, and CodeCompiler.

Code emitters:

Assembler - Emitter designed to emit machine code directly.

CodeBuilder - Emitter designed to emit code into a representation that can be processed. It stores the whole code in a double-linked list consisting of nodes (CBNode aka code-builder node). There are nodes that represent instructions (CBInst), labels (CBLabel), and other building blocks (CBAlign, CBData, ...). Some nodes are used as markers (CBSentinel) and comments (CBComment).

CodeCompiler - High-level code emitter that uses virtual registers and contains high-level function building features. CodeCompiler is based on CodeBuilder, but extends its functionality and introduces new node types starting with CC (CCFunc, CCFuncExit, CCFuncCall). CodeCompiler is the simplest way to start with AsmJit as it abstracts many details required to generate a function in asm language.

Runtime

AsmJit's Runtime is designed for execution and/or linking. The Runtime itself is abstract and defines only how to add() and release() code held by CodeHolder. CodeHolder holds machine code and relocation entries, but should be seen as a temporary object only - after the code in CodeHolder is ready, it should be passed to Runtime or relocated manually. Users interested in inspecting the generated machine code (instead of executing or linking it) can of course keep it in CodeHolder and process it manually.

The only Runtime implementation provided directly by AsmJit is called JitRuntime, which is suitable for storing and executing dynamically generated code. JitRuntime is used in most AsmJit examples as it makes code management easy. It allows adding and releasing dynamically generated functions, so it's suitable for JIT code generators that want to keep many functions alive and release functions that are no longer needed.

Instructions & Operands

Instructions specify operations performed by the CPU, and operands specify the operation's input(s) and output(s). Each AsmJit instruction has its own unique id (X86Inst::Id for example), and platform-specific code emitters always provide a type-safe intrinsic (or multiple overloads) to emit it. There are two ways of emitting an instruction:

Register Type - Unique id that describes each possible register provided by the target architecture - for example X86 backend provides X86Reg::RegType, which defines all variations of general purpose registers (GPB-LO, GPB-HI, GPW, GPD, and GPQ) and all types of other registers like K, MM, BND, XMM, YMM, and ZMM.

Register Kind - Groups multiple register types under a single kind - for example all general-purpose registers (of all sizes) on X86 are X86Reg::kKindGp, all SIMD registers (XMM, YMM, ZMM) are X86Reg::kKindVec, etc.

Register Size - Contains the size of the register in bytes. If the size depends on the mode (32-bit vs 64-bit) then generally the higher size is used (for example RIP register has size 8 by default).

Register ID - Contains physical or virtual id of the register.

Memory Address (Mem) - Used to reference a memory location. Each Mem provides:

Base Register - A base register id (physical or virtual).

Index Register - An index register id (physical or virtual).

Offset - Displacement or absolute address to be referenced (32-bit if base register is used and 64-bit if base register is not used).

Flags that can describe various architecture dependent information (like scale and segment-override on X86).

Immediate Value (Imm) - Immediate values are usually part of instructions (encoded within the instruction itself) or data.

Label - Used to reference a location in code or data. Labels must be created by the CodeEmitter or by CodeHolder. Each label has a unique id per CodeHolder instance.

AsmJit allows operands to be constructed dynamically, stored, and queried for complete information at run-time. Operands are small (always 16 bytes per Operand) and should always be copied by value if you intend to store them (creating operands with the new keyword is not recommended). Operands are safe to memcpy() and memset() if you need to work with arrays of operands.

Some operands have to be created explicitly by CodeEmitter. For example labels must be created by newLabel() before they are used.

Assembler Example

X86Assembler is a code emitter that emits machine code into a CodeBuffer directly. It's capable of targeting both 32-bit and 64-bit instruction sets and it's possible to target both instruction sets within the same code-base. The following example shows how to generate a function that works in both 32-bit and 64-bit modes, and how to use JitRuntime, CodeHolder, and X86Assembler together.

The example handles 3 calling conventions manually just to show how it could be done. AsmJit also contains utilities that can create function prologs and epilogs automatically, but these concepts will be explained later.

The example should be self-explanatory. It shows how to work with labels, how to use operands, and how to emit instructions that can use different registers based on runtime selection. It implements 32-bit CDECL, WIN64, and SysV64 calling conventions and will work on most X86 environments.

More About Memory Addresses

X86 provides a complex memory addressing model that allows encoding addresses that have a BASE register, an INDEX register with an optional scale (left shift), and a displacement (called offset in AsmJit). A memory address can also specify a memory segment (segment-override in X86 terminology), and some instructions (gather / scatter) require INDEX to be a VECTOR register instead of a general-purpose register. AsmJit can encode and work with all forms of addresses mentioned above and implemented by X86. It also allows constructing a 64-bit memory address, which is only allowed in one form of the 'mov' instruction.
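The BASE + INDEX + offset model described above can be illustrated without AsmJit. The following is a minimal sketch that only models the arithmetic the CPU performs; `MemAddress` and `effectiveAddress()` are hypothetical names introduced here and are not part of AsmJit's X86Mem API. Segment overrides and vector indexes are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of an X86 memory operand (not AsmJit's X86Mem).
struct MemAddress {
  uint64_t base;    // Value held by the BASE register.
  uint64_t index;   // Value held by the INDEX register (0 if unused).
  uint32_t shift;   // Scale expressed as a left shift, 0..3 (x1, x2, x4, x8).
  int32_t offset;   // Signed displacement.
};

// Computes BASE + (INDEX << shift) + offset using wrapping 64-bit arithmetic.
inline uint64_t effectiveAddress(const MemAddress& m) {
  assert(m.shift <= 3);
  // Sign-extend the displacement before adding it.
  uint64_t disp = static_cast<uint64_t>(static_cast<int64_t>(m.offset));
  return m.base + (m.index << m.shift) + disp;
}
```

For example, a base of 0x1000, an index of 4 with shift 2, and an offset of 8 resolves to 0x1000 + 16 + 8 = 0x1018.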

You can explore the possibilities by taking a look at base/operand.h and x86/x86operand.h. Always use X86Mem when targeting X86 as it extends the base Mem operand with features provided only by X86.

More About CodeInfo

In the first complete example the CodeInfo is retrieved from JitRuntime. That makes sense, as JitRuntime will always return a CodeInfo that is compatible with the runtime environment. For example, if your application runs in 64-bit mode the CodeInfo will use the ArchInfo::kTypeX64 architecture, in contrast to ArchInfo::kTypeX86, which will be used in 32-bit mode. AsmJit also allows setting up CodeInfo manually and selecting a different architecture when needed. So let's do something else this time: let's always generate 32-bit code and print its binary representation. To do that, we create our own CodeInfo and initialize it to the ArchInfo::kTypeX86 architecture. CodeInfo will populate all basic fields based on the architecture we provide, so it's easy:

#include <asmjit/asmjit.h>
#include <stdio.h>

using namespace asmjit;

int main(int argc, char* argv[]) {
  using namespace asmjit::x86;            // Easier access to x86/x64 registers.

  CodeHolder code;                        // Create a CodeHolder.
  code.init(CodeInfo(ArchInfo::kTypeX86));// Initialize it for a 32-bit X86 target.

  // Generate a 32-bit function that adds 4 floats from `a` to 4 floats from `b`
  // and looks like:
  //   void func(float* dst, const float* a, const float* b)
  X86Assembler a(&code);                  // Create and attach X86Assembler to `code`.
  a.mov(eax, dword_ptr(esp, 4));          // Load the destination pointer.
  a.mov(ecx, dword_ptr(esp, 8));          // Load the first source pointer.
  a.mov(edx, dword_ptr(esp, 12));         // Load the second source pointer.
  a.movups(xmm0, ptr(ecx));               // Load 4 floats from [ecx] to XMM0.
  a.movups(xmm1, ptr(edx));               // Load 4 floats from [edx] to XMM1.
  a.addps(xmm0, xmm1);                    // Add 4 floats in XMM1 to XMM0.
  a.movups(ptr(eax), xmm0);               // Store the result to [eax].
  a.ret();                                // Return from function.

  // Now we have two options if we want to do something with the code held
  // by CodeHolder. In order to use it we must first sync X86Assembler with
  // the CodeHolder as it doesn't do it for every instruction it generates for
  // performance reasons. The options are:
  //
  //   1. Detach X86Assembler from CodeHolder (will automatically sync).
  //   2. Sync explicitly, allows to use X86Assembler again if needed.
  //
  // NOTE: AsmJit always syncs internally when CodeHolder needs to access these
  // buffers and knows that there is an Assembler attached, so you have to sync
  // explicitly only if you bypass CodeHolder and intend to do something on your
  // own.
  code.sync();                            // So let's sync, it's easy.

  // We have no Runtime this time, it's on us what we do with the code.
  // CodeHolder stores code in SectionEntry, which embeds CodeSection
  // and CodeBuffer structures. We are interested in section's CodeBuffer only.
  //
  // NOTE: The first section is always '.text', so it's safe to just use 0 index.
  CodeBuffer& buf = code.getSectionEntry(0)->getBuffer();

  // Print the machine-code generated or do something more interesting with it?
  //   8B4424048B4C24088B54240C0F10010F100A0F58C10F1100C3
  for (size_t i = 0; i < buf.getLength(); i++)
    printf("%02X", buf.getData()[i]);

  return 0;
}

Explicit Code Relocation

CodeInfo contains much more information than just the target architecture. It can be configured to specify a base-address (or a virtual base-address in a linker terminology), which could be static (useful when you know the location of the target's machine code) or dynamic. AsmJit assumes dynamic base-address by default and relocates the code held by CodeHolder to a user-provided address on-demand. To be able to relocate to a user-provided address it needs to store some information about relocations, which is represented by CodeHolder::RelocEntry. Relocation entries are only required if you call external functions from the generated code that cannot be encoded by using a 32-bit displacement (X64 architecture doesn't provide 64-bit encodable displacement) and when a label referenced in one section is bound in another, but this is not really a JIT case and it's more related to AOT (ahead-of-time) compilation.
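The general idea of relocating code to a base address can be sketched in plain C++. This is not how CodeHolder implements relocation; `AbsReloc64` and `relocateTo()` are hypothetical names used only to illustrate the concept of a relocation entry, assuming the simplest case where a 64-bit absolute address has to be written into the code once the final base address is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical relocation record: "write the absolute address of
// (base + targetOffset) as a 64-bit value at `patchOffset`".
struct AbsReloc64 {
  std::size_t patchOffset;  // Where in the code the address must be written.
  uint64_t targetOffset;    // Offset of the referenced location from the base.
};

// Copies `code` and applies all relocations for the given base address.
inline std::vector<uint8_t> relocateTo(uint64_t baseAddress,
                                       const std::vector<uint8_t>& code,
                                       const std::vector<AbsReloc64>& relocs) {
  std::vector<uint8_t> out(code);
  for (const AbsReloc64& r : relocs) {
    assert(r.patchOffset + sizeof(uint64_t) <= out.size());
    uint64_t absolute = baseAddress + r.targetOffset;
    // memcpy stores the value in host byte order (little-endian on X86).
    std::memcpy(out.data() + r.patchOffset, &absolute, sizeof(absolute));
  }
  return out;
}
```

The original buffer is left untouched, so the same code can be relocated to several base addresses, which matches the "dynamic base-address" case described above.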

The next example shows how to use the built-in virtual memory manager VMemMgr instead of JitRuntime (in case you want to do your own memory management) and how to relocate the generated code into your own memory block. You could use your own virtual memory allocator, but that is OS specific and AsmJit already provides one, so the example uses what AsmJit offers.

The following code is similar to the previous one, but implements a function working in both 32-bit and 64-bit environments:

Configure the CodeInfo by calling CodeInfo::setBaseAddress() to initialize it to a user-provided base-address before passing it to CodeHolder:

// Configure CodeInfo.
CodeInfo ci(...);
ci.setBaseAddress(uint64_t(0x1234));
// Then initialize CodeHolder with it.
CodeHolder code;
code.init(ci);
// ... after you emit the machine code it will be relocated to the base address
// provided and stored in the pointer passed to `CodeHolder::relocate()`.

TODO: Maybe CodeHolder::relocate() is not the best name?

Using Native Registers - zax, zbx, zcx, ...

AsmJit's X86 code emitters always provide functions to construct machine-size registers depending on the target. This feature is for people that want to write code targeting both 32-bit and 64-bit at the same time. In AsmJit terminology these registers are named zax, zcx, zdx, zbx, zsp, zbp, zsi, and zdi (they are defined in this exact order by X86). They are accessible through X86Assembler, X86Builder, and X86Compiler. The following example illustrates how to use this feature:

The example just returns 0, but the function generated contains a standard prolog and epilog sequence and the function itself reserves 32 bytes of local stack. The advantage is clear - a single code-base can handle multiple targets easily. If you want to create a register of native size dynamically by specifying its id it's also possible:

Cloning existing registers and changing their IDs is fine in AsmJit; this technique is used internally in many places.
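To make the idea of a machine-size register concrete, here is a small standalone sketch that maps a register id and a target width to the register name a zax-style register would resolve to. `nativeGpName()` is a hypothetical helper written for this illustration and is not an AsmJit function; it only covers ids 0 to 7, in the X86 order listed above.

```cpp
#include <cassert>
#include <string>

// Maps a general-purpose register id (0..7) to its 32-bit or 64-bit name.
// The ids follow the X86 encoding order: ax, cx, dx, bx, sp, bp, si, di.
inline std::string nativeGpName(unsigned id, bool is64Bit) {
  static const char* const kNames[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
  };
  assert(id < 8);
  return std::string(is64Bit ? "r" : "e") + kNames[id];
}
```

With this mapping, id 0 resolves to eax on a 32-bit target and to rax on a 64-bit target, which is the behavior the zax register provides.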

Using Assembler as Code-Patcher

This is an advanced topic that is sometimes unavoidable. By default AsmJit appends the machine code it generates to a CodeBuffer; however, it also allows setting the offset in the CodeBuffer explicitly and overwriting its content. This technique is extremely dangerous for asm beginners because X86 instructions have variable length (see below), so in general you should only patch code to change an instruction's offset or other basic details that were unknown when it was first emitted. A typical scenario that requires code-patching is when you start emitting a function and don't yet know how much stack you want to reserve for it.

Before we go further it's important to introduce instruction options, because they can help with code-patching (and not only patching, but that will be explained in AVX-512 section):

Many general-purpose instructions (especially arithmetic ones) on X86 have multiple encodings - in AsmJit this is usually called 'short form' and 'long form'.

AsmJit always tries to use the 'short form' as it makes the resulting machine code smaller - the majority of assemblers make the same decision.

AsmJit allows overriding the default decision by using the short_() and long_() instruction options to force the short or long form, respectively. The most useful is long_(), as it forces AsmJit to always emit the long form. The short_() option is less useful because the short form is selected automatically (except for jumps to non-bound labels). Note the underscore after each function name, which avoids a collision with built-in C++ types.

To illustrate what short form and long form means in binary let's assume we want to emit add esp, 16 instruction, which has two possible binary encodings:

83C410 - This is a short form aka short add esp, 16 - You can see the opcode byte (0x83), the Mod/RM byte (0xC4) and an 8-bit immediate value representing 16.

81C410000000 - This is a long form aka long add esp, 16 - You can see a different opcode byte (0x81), the same Mod/RM byte (0xC4) and a 32-bit immediate in little-endian representing 16.
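The two encodings above can be reproduced with a few lines of standalone C++. This is only a sketch of the form-selection rule for one instruction; `encodeAddEsp()` and its `forceLong` parameter are hypothetical and merely mimic what the long_() option does, they are not AsmJit code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes `add esp, imm`. Picks the short form (opcode 0x83, 8-bit immediate)
// when the immediate fits in a signed byte, unless `forceLong` is set.
inline std::vector<uint8_t> encodeAddEsp(int32_t imm, bool forceLong = false) {
  const uint8_t modrm = 0xC4;  // mod=11, reg=/0 (ADD), rm=100 (ESP).
  std::vector<uint8_t> out;
  if (!forceLong && imm >= -128 && imm <= 127) {
    out = {0x83, modrm, static_cast<uint8_t>(imm)};
  } else {
    uint32_t u = static_cast<uint32_t>(imm);
    out = {0x81, modrm};
    for (int i = 0; i < 4; i++)
      out.push_back(static_cast<uint8_t>(u >> (i * 8)));  // Little-endian.
  }
  return out;
}
```

`encodeAddEsp(16)` yields 83 C4 10 and `encodeAddEsp(16, true)` yields 81 C4 10 00 00 00. An immediate of 128 no longer fits in a signed byte, so it always produces the 6-byte long form, which is why a 3-byte placeholder could not be patched with it.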

If you generate an instruction in a short form and then patch it in a long form or vice-versa then something really bad will happen when you try to execute such code. The following example illustrates how to patch the code properly (it just extends the previous example):

#include <asmjit/asmjit.h>
#include <stdio.h>

using namespace asmjit;

typedef int (*Func)(void);

int main(int argc, char* argv[]) {
  JitRuntime rt;                          // Create a runtime specialized for JIT.
  CodeHolder code;                        // Create a CodeHolder.
  code.init(rt.getCodeInfo());            // Initialize it to be compatible with `rt`.
  X86Assembler a(&code);                  // Create and attach X86Assembler to `code`.

  // Let's get these registers from X86Assembler.
  X86Gp zbp = a.zbp();
  X86Gp zsp = a.zsp();

  // Function prolog.
  a.push(zbp);
  a.mov(zbp, zsp);

  // This is where we are gonna patch the code later, so let's get the offset
  // (the current location) from the beginning of the code-buffer.
  size_t patchOffset = a.getOffset();
  // Let's just emit 'sub zsp, 0' for now, but don't forget to use LONG form.
  a.long_().sub(zsp, 0);

  // ... emit some code (this just sets return value to zero) ...
  a.xor_(x86::eax, x86::eax);

  // Function epilog and return.
  a.mov(zsp, zbp);
  a.pop(zbp);
  a.ret();

  // Now we know how much stack size we want to reserve. I have chosen 128
  // bytes on purpose as it's encodable only in long form that we have used.
  int stackSize = 128;                    // Number of bytes to reserve on the stack.
  a.setOffset(patchOffset);               // Move the current cursor to `patchOffset`.
  a.long_().sub(zsp, stackSize);          // Patch the code; don't forget to use LONG form.

  // Now the code is ready to be called.
  Func fn;
  Error err = rt.add(&fn, &code);         // Add the generated code to the runtime.
  if (err) return 1;                      // Handle a possible error returned by AsmJit.

  int result = fn();                      // Execute the generated code.
  printf("%d\n", result);                 // Print the resulting "0".

  rt.release(fn);                         // Remove the function from the runtime.
  return 0;
}

If you run the example it will just work. As an experiment you can try removing the long_() option to see what happens when wrong code is generated.

Code Patching and REX Prefix

In 64-bit mode there is one more thing to worry about when patching code - the REX prefix. It's a single-byte prefix designed to address registers with ids from 8 to 15 and to override the default width of an operation from 32 to 64 bits. AsmJit, like other assemblers, only emits the REX prefix when it's necessary. If the patched code only changes the immediate value, as shown in the previous example, then there is nothing to worry about as that doesn't change the logic behind emitting the REX prefix. However, if the patched code changes a register id or overrides the operation width, then it's important to take care of the REX prefix as well.

AsmJit contains another instruction option that controls (forces) REX prefix - rex(). If you use it the instruction emitted will always use REX prefix even when it's encodable without it. The following list contains some instructions and their binary representations to illustrate when it's emitted:
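The rule for when a REX prefix is required can also be sketched independently of AsmJit. The helper below is hypothetical (`encodeMovRR()` is not an AsmJit function, and `forceRex` only mimics the effect of rex()); it handles just register-to-register mov with 32-bit or 64-bit operands and emits REX when a register id is 8 or higher, when the operation is 64-bit, or when it is forced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes register-to-register `mov dst, src` using opcode 0x89 (mov r/m, r).
// `dstId` and `srcId` are register ids 0..15; `is64Bit` selects 64-bit width.
inline std::vector<uint8_t> encodeMovRR(unsigned dstId, unsigned srcId,
                                        bool is64Bit, bool forceRex = false) {
  assert(dstId < 16 && srcId < 16);
  uint8_t rex = 0x40;
  if (is64Bit)   rex |= 0x08;  // REX.W - 64-bit operand size.
  if (srcId & 8) rex |= 0x04;  // REX.R - extends the Mod/RM 'reg' field.
  if (dstId & 8) rex |= 0x01;  // REX.B - extends the Mod/RM 'rm' field.

  std::vector<uint8_t> out;
  if (rex != 0x40 || forceRex)
    out.push_back(rex);        // Emit REX only when needed or forced.
  out.push_back(0x89);
  out.push_back(static_cast<uint8_t>(0xC0 | ((srcId & 7) << 3) | (dstId & 7)));
  return out;
}
```

Under these assumptions, mov eax, ebx encodes as 89 D8, the same instruction with a forced REX prefix as 40 89 D8, mov rax, rbx as 48 89 D8, and mov eax, r9d as 44 89 C8. Patching any of these in place is only safe if the REX decision stays the same.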

Generic Function API

So far all examples shown above handled creating function prologs and epilogs manually. While it's possible to do it that way it's much better to automate such process as function calling conventions vary across architectures and also across operating systems.

AsmJit contains functionality that can be used to define function signatures and to automatically calculate an optimal frame layout that can be used directly by a prolog and epilog inserter. This feature was exclusive to AsmJit's CodeCompiler for a very long time, but was abstracted out and is now available to all users regardless of the CodeEmitter they use. The design of handling function prologs and epilogs generally allows two use cases:

Calculate function layout before the function is generated - this is the only way if you use a pure Assembler emitter and is shown in the next example.

Calculate function layout after the function is generated - this way is generally used by CodeBuilder and CodeCompiler (will be described together with X86Compiler).

The following concepts are used to describe and create functions in AsmJit:

TypeId - TypeId is an 8-bit value that describes a platform independent type. It provides abstractions for most common types like int8_t, uint32_t, uintptr_t, float, double, and all possible vector types to match ISAs up to AVX512. TypeId was introduced originally to be used with CodeCompiler, but is now used by FuncSignature as well.

FuncDetail - Architecture and ABI dependent information that describes CallConv and expanded FuncSignature. Each function argument and return value is represented as FuncDetail::Value that contains the original TypeId enriched by additional information that specifies if the value is passed/returned by register (and which register) or by stack. Each value also contains some other metadata that provide additional information required to handle it properly (for example if a vector value is passed indirectly by a pointer as required by WIN64 calling convention, etc...).

FuncArgsMapper - A helper that can be used to define where each function argument is expected to be. It's architecture and ABI dependent mapping from function arguments described by CallConv and FuncDetail into registers specified by the user.

FuncFrameInfo - Contains information about a function-frame. Holds callout-stack size and alignment (i.e. stack used to call functions), stack-frame size and alignment (the stack required by the function itself), and various attributes that describe how prolog and epilog should be constructed. FuncFrameInfo doesn't know anything about function arguments or returns, it should be seen as a class that describes minimum requirements of the function frame and its attributes before the final FuncFrameLayout is calculated.

FuncFrameLayout - Contains the final function layout that can be passed to FuncUtils::emitProlog() and FuncUtils::emitEpilog(). The content of this class should always be calculated by AsmJit by calling FuncFrameLayout::init(const FuncDetail& detail, const FuncFrameInfo& ffi).

That is a lot of concepts, each representing one step in the function layout calculation. In addition, the whole machinery can also be used to create function calls, not just function prologs and epilogs. The next example shows how AsmJit can be used to create functions for both 32-bit and 64-bit targets and various calling conventions:
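As a simplified, library-independent illustration of the kind of decision FuncDetail has to make, the sketch below reports where an integer or pointer argument is found on function entry for the three calling conventions mentioned in this document. `CallConvId` and `intArgLocation()` are hypothetical names, not AsmJit's API, and the sketch assumes that every argument is an integer or pointer (floating point and vector arguments follow different rules).

```cpp
#include <cassert>
#include <string>

enum class CallConvId { kCDecl32, kWin64, kSysV64 };  // Hypothetical ids.

// Returns the location of the `index`-th integer/pointer argument on entry.
inline std::string intArgLocation(CallConvId cc, unsigned index) {
  static const char* const kWin64Regs[]  = {"rcx", "rdx", "r8", "r9"};
  static const char* const kSysV64Regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
  switch (cc) {
    case CallConvId::kCDecl32:
      // Everything is on the stack, above the 4-byte return address.
      return "[esp+" + std::to_string(4 + 4 * index) + "]";
    case CallConvId::kWin64:
      if (index < 4) return kWin64Regs[index];
      // 8-byte return address plus 32 bytes of shadow space come first.
      return "[rsp+" + std::to_string(8 + 8 * index) + "]";
    case CallConvId::kSysV64:
      if (index < 6) return kSysV64Regs[index];
      // Stack arguments start right above the 8-byte return address.
      return "[rsp+" + std::to_string(8 + 8 * (index - 6)) + "]";
  }
  return std::string();
}
```

The same source-level signature therefore maps to three different sets of locations, which is why automating prologs, epilogs, and argument handling pays off.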

CodeBuilder

Both CodeBuilder and CodeCompiler are emitters that emit everything to a representation that allows further processing. The code stored in such representation is completely safe to be patched, simplified, reordered, obfuscated, removed, injected, analyzed, and 'think-of-anything-else'. Each instruction (or label, directive, ...) is stored as CBNode (Code-Builder Node) and contains all the necessary information to emit machine code out of it later.

There is a difference between CodeBuilder and CodeCompiler:

CodeBuilder (low-level):

Maximum compatibility with Assembler, easy to switch from Assembler to CodeBuilder and vice versa.

Doesn't generate machine code directly, allows to serialize to Assembler when the whole code is ready to be encoded.

CodeCompiler (high-level):

Virtual registers - allows to use unlimited number of virtual registers which are allocated into physical registers by a built-in register allocator.

CBSentinel - A marker that can be used to remember certain position, doesn't affect code generation.

CodeCompiler nodes:

CCFunc - Start of a function.

CCFuncExit - Return from a function.

CCFuncCall - Function call.

NOTE: All nodes that have CB prefix are used by both CodeBuilder and CodeCompiler. Nodes that have CC prefix are exclusive to CodeCompiler and are usually lowered to CBNodes by a CodeBuilder specific pass or treated as one of CB nodes; for example CCFunc inherits CBLabel so it's treated as CBLabel by CodeBuilder and as CCFunc by CodeCompiler.

Using CodeBuilder

CodeBuilder was designed to be used as an Assembler replacement in case that post-processing of the generated code is required. The code can be modified during or after code generation. The post processing can be done manually or through Pass (Code-Builder Pass) object. CodeBuilder stores the emitted code as a double-linked list, which allows O(1) insertion and removal.

The code representation used by CodeBuilder is compatible with everything AsmJit provides. Each instruction is stored as CBInst, which contains instruction id, options, and operands. Each instruction emitted will create a new CBInst instance and add it to the current cursor in the double-linked list of nodes. Since the instruction stream used by CodeBuilder can be manipulated, we can rewrite the SumInts example into the following:

The number of use-cases of X86Builder is not limited and highly depends on your creativity and experience. The previous example can be easily improved to collect all dirty registers inside the function programmatically and to pass them to ffi.setDirtyRegs():

Using X86Assembler or X86Builder through X86Emitter

Even though Assembler and CodeBuilder implement the same interface defined by CodeEmitter, their platform-dependent variants (X86Assembler and X86Builder, respectively) cannot be interchanged or cast to each other by using C++'s static_cast<>. The main reason is that the inheritance graphs of these classes are different and cast-incompatible, as illustrated in the following graph:

The graph basically shows that it's not possible to cast X86Assembler to X86Builder and vice versa. However, since both X86Assembler and X86Builder share the same interface defined by both CodeEmitter and X86EmitterImplicitT, a class called X86Emitter was introduced to make it possible to write a function that can emit to both X86Assembler and X86Builder. Note that X86Emitter cannot be created; it's abstract and has private constructors and destructors. It was only designed to be cast to and used as an interface.

Each X86 emitter implements a member function called asEmitter(), which casts the instance to X86Emitter, as illustrated in the next example:

The example above shows how to create a function that can emit code to either X86Assembler or X86Builder through X86Emitter, which provides emitter-neutral functionality. X86Emitter, however, doesn't provide any functionality that is specific to X86Assembler or X86Builder, such as setCursor().
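The pattern itself, two unrelated concrete emitters that both hand out a common interface, can be shown with a few hypothetical stand-in classes. None of the types below are AsmJit classes, and the sketch uses an ordinary virtual interface, which is a simplification and an assumption rather than a description of how X86Emitter is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; none of these are AsmJit classes.
struct Emitter {                        // Plays the role of X86Emitter.
  virtual void emitByte(uint8_t b) = 0;
protected:
  ~Emitter() = default;                 // Interface only; never destroyed directly.
};

struct DirectEmitter final : Emitter {  // Plays the role of X86Assembler.
  std::vector<uint8_t> buffer;          // Bytes are appended immediately.
  void emitByte(uint8_t b) override { buffer.push_back(b); }
  Emitter* asEmitter() { return this; }
};

struct NodeEmitter final : Emitter {    // Plays the role of X86Builder.
  struct Node { uint8_t value; };
  std::vector<Node> nodes;              // Emitted items are stored as nodes.
  void emitByte(uint8_t b) override { nodes.push_back(Node{b}); }
  Emitter* asEmitter() { return this; }
};

// Emitter-neutral code: works with either concrete emitter.
inline void emitRet(Emitter* e) {
  assert(e != nullptr);
  e->emitByte(0xC3);                    // 0xC3 is the X86 `ret` opcode.
}
```

A function such as `emitRet()` only sees the shared interface, so it cannot reach anything that exists in just one of the concrete emitters, for example a cursor.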

CodeCompiler

CodeCompiler is a high-level code emitter that provides virtual registers and automatically handles function calling conventions. It's still architecture dependent, but makes code generation much easier by offering a built-in register allocator and function builder. Functions are essential; the first step in generating code is to define the signature of the function you want to generate (before generating the function body). Function arguments and return value(s) are handled by assigning virtual registers to them, and function calls are handled the same way.

CodeCompiler also makes use of passes (introduced by CodeBuilder) and automatically adds an architecture-dependent register allocator pass to the list of passes when attached to CodeHolder.

Compiler Basics

The first CodeCompiler example shows how to generate a function that simply returns an integer value. It's an analogy to the very first example:

The addFunc() and endFunc() methods define the body of the function. Both must be called for each function, but the body doesn't have to be generated in sequence. An example of generating two functions will be shown later. The next example shows more complicated code that contains a loop and generates a memcpy32() function:

Recursive Functions

It's possible to create multiple functions with the same X86Compiler instance and to link them together. In such a case it's important to keep the pointer to the CCFunc node. The first example creates a simple Fibonacci function that calls itself recursively:

Stack Management

CodeCompiler manages the function's stack frame, which is used by the register allocator to spill virtual registers. It also provides an interface to allocate a user-defined block of the stack, which can be used as temporary storage by the generated function. In the following example 256 bytes of stack are allocated, filled with the byte values 0 to 255, and then iterated again to sum all the values.
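What the generated function computes can be stated in plain C++. The sketch below is a behavioral reference, not CodeCompiler code; sum_stack_bytes_reference() is a hypothetical name.

```cpp
#include <cstdint>

// Fills a 256-byte local buffer with the values 0..255 and then sums them.
int sum_stack_bytes_reference() {
  std::uint8_t stack[256];
  for (int i = 0; i < 256; i++)
    stack[i] = static_cast<std::uint8_t>(i);

  int sum = 0;
  for (int i = 0; i < 256; i++)
    sum += stack[i];

  return sum;  // 0 + 1 + ... + 255 = 255 * 256 / 2 = 32640
}
```

A generated function that implements the description above should return the same value, 32640.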

Constant Pool

CodeCompiler provides two constant pools for general-purpose code generation - local and global. The local constant pool is tied to a single CCFunc node and is generally flushed after the function body, whereas the global constant pool is flushed at the end of the generated code by CodeCompiler::finalize().

Code Injection

Both CodeBuilder and CodeCompiler emitters store their nodes in a doubly linked list, which makes the nodes easy to manipulate during code generation or after it. Each node is always emitted next to the current cursor, and the cursor then moves to the newly emitted node. The cursor can be explicitly retrieved and assigned by getCursor() and setCursor(), respectively.

The following example shows how to inject code at the beginning of the function by providing an XmmConstInjector helper class.

There are many other applications of code injection. It's usually used to lazily add initialization code, but the possible applications are practically unlimited.
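The cursor semantics described in this section can be modeled with a std::list. The following is a toy sketch, not AsmJit's node API: ToyNodeList and injectAfter() are hypothetical, and nodes are plain strings. It shows that every emit lands right after the cursor and that an injector can save the cursor, emit somewhere else, and restore it.

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical toy builder: nodes live in a doubly linked list and emit()
// inserts right after the cursor, then moves the cursor to the new node.
class ToyNodeList {
public:
  using Cursor = std::list<std::string>::iterator;

  ToyNodeList() : _cursor(_nodes.end()) {}

  Cursor getCursor() const { return _cursor; }
  void setCursor(Cursor c) { _cursor = c; }

  void emit(const std::string& inst) {
    // A cursor equal to end() stands for "before the first node".
    Cursor pos = (_cursor == _nodes.end()) ? _nodes.begin() : std::next(_cursor);
    _cursor = _nodes.insert(pos, inst);
  }

  // Joins all nodes with spaces, in list order.
  std::string dump() const {
    std::string out;
    for (const std::string& n : _nodes) {
      if (!out.empty()) out += ' ';
      out += n;
    }
    return out;
  }

private:
  std::list<std::string> _nodes;  // Must be declared before `_cursor`.
  Cursor _cursor;
};

// Injects `inst` right after `where` and restores the previous cursor, so
// regular emission continues where it left off.
void injectAfter(ToyNodeList& list, ToyNodeList::Cursor where, const std::string& inst) {
  ToyNodeList::Cursor saved = list.getCursor();
  list.setCursor(where);
  list.emit(inst);
  list.setCursor(saved);
}
```

If the cursor is remembered right after the function entry node is emitted, injectAfter() can later place initialization code at the beginning of the function even though the body has already been generated. A linked list fits this job because inserting a node does not invalidate cursors that point to other nodes.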

Advanced Features

Logging

Failures are common, especially when working at the machine-code level. AsmJit already does a good job of using function overloading to prevent semantically incorrect instructions from being emitted, but it can't prevent emitting code that is semantically correct yet contains bugs. Logging has always been an important part of AsmJit's infrastructure, and looking at logs can come in handy when your code doesn't work as expected.

Error Handling

AsmJit uses error codes to represent and return errors. Every function where an error can occur returns Error. Exceptions are never thrown by AsmJit, even in extreme conditions like out-of-memory. Errors should never be ignored; however, checking for errors after each AsmJit API call would overcomplicate the whole code generation. To handle these errors AsmJit provides ErrorHandler, which contains handleError(). An implementation of handleError() can do one of the following:

Return true or false from handleError(). If true is returned it means that the error was handled and AsmJit can continue execution. The error code is still propagated to the caller, but the error origin (CodeEmitter) won't be put into an error state (last-error won't be set and isInErrorState() would return false). Returning false, however, reports to AsmJit that the error cannot be handled - in such a case it stores the error, which can be retrieved later by getLastError(). Returning false is the default behavior when no error handler is provided. To put the assembler into a non-error state again, resetLastError() must be called.

Throw an exception. AsmJit doesn't use exceptions and is completely exception-safe, but you can throw an exception from the error handler if you find that easier or prefer it. Throwing an exception acts virtually the same as returning true - AsmJit won't store the error.

Use plain old C's setjmp() and longjmp(). AsmJit always puts Assembler and Compiler into a consistent state before calling handleError(), so longjmp() can be used without issues to cancel the code generation if an error occurred. This method can be used if exception handling in your project is turned off and you still want some comfort. In most cases it should be safe as AsmJit is based on Zone memory, so no memory leaks will occur if you jump back to a location where CodeHolder still exists.

ErrorHandler is simply attached to CodeHolder and will be used by every emitter attached to it. The first example uses an error handler that just prints the error, but lets AsmJit continue:
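The true/false contract described above can be modeled without AsmJit. This is a toy sketch of the semantics only: ToyErrorHandler, ToyCodeEmitter, reportError() and PrintingHandler are hypothetical names, and the handler is attached directly to the toy emitter rather than to a CodeHolder.

```cpp
#include <cstdio>

typedef unsigned Error;  // Stand-in error type; 0 means "no error".

class ToyCodeEmitter;

// Hypothetical stand-in for ErrorHandler.
class ToyErrorHandler {
public:
  virtual ~ToyErrorHandler() {}
  virtual bool handleError(Error err, const char* message, ToyCodeEmitter* origin) = 0;
};

// Hypothetical stand-in for an emitter that reports errors.
class ToyCodeEmitter {
public:
  ToyErrorHandler* handler = nullptr;
  Error lastError = 0;

  bool isInErrorState() const { return lastError != 0; }
  void resetLastError() { lastError = 0; }

  // The error is always returned to the caller. It is stored only if no
  // handler is attached or the handler returns false.
  Error reportError(Error err, const char* message) {
    bool handled = handler && handler->handleError(err, message, this);
    if (!handled) lastError = err;
    return err;
  }
};

// Prints the error and reports it as handled, so emission can continue.
class PrintingHandler : public ToyErrorHandler {
public:
  bool handleError(Error err, const char* message, ToyCodeEmitter*) override {
    std::fprintf(stderr, "ERROR %u: %s\n", err, message);
    return true;
  }
};
```

With no handler attached, reportError() leaves the toy emitter in an error state until resetLastError() is called. With PrintingHandler attached, the error is printed and returned, but the emitter stays usable.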

If an error happens during instruction emitting / encoding the assembler behaves transactionally - the output buffer won't advance if encoding failed, thus either a fully encoded instruction or nothing is emitted. The error handling shown above is useful, but it's still not the best way of dealing with errors in AsmJit. The following example shows how to use exception handling to handle errors in a more C++ way:
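The throwing approach, together with the transactional behavior, can be sketched without AsmJit. Everything here is hypothetical (ToyJitException, ThrowingToyEmitter, generate()), and an empty string stands in for an instruction that fails to encode.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type that carries the error code.
class ToyJitException : public std::runtime_error {
public:
  unsigned code;
  ToyJitException(unsigned c, const std::string& msg)
    : std::runtime_error(msg), code(c) {}
};

// Toy emitter whose error handler throws. emit() validates before it
// appends, so a failed instruction leaves the buffer untouched.
class ThrowingToyEmitter {
public:
  std::vector<std::string> buffer;

  void emit(const std::string& inst) {
    if (inst.empty())
      handleError(1, "invalid instruction");  // Throws, nothing is appended.
    buffer.push_back(inst);
  }

private:
  [[noreturn]] static void handleError(unsigned err, const char* message) {
    throw ToyJitException(err, message);
  }
};

// Emits a sequence with a single try/catch around the whole generation
// instead of a check after every call. Returns false if anything failed.
bool generate(ThrowingToyEmitter& e, const std::vector<std::string>& insts) {
  try {
    for (const std::string& i : insts) e.emit(i);
    return true;
  } catch (const ToyJitException&) {
    return false;
  }
}
```

The design point is that the generating code contains no per-call error checks; the failure surfaces once, at the catch site, and everything emitted before the failing instruction is still intact.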

If C++ exceptions are not what you like, or your project turns them off completely, there is still a way of reducing the error handling to a minimum by using a standard setjmp/longjmp approach. AsmJit is exception-safe and cleans up everything before calling the ErrorHandler, so any approach is safe. You can simply jump from the error handler without causing any side effects or memory leaks. The following example demonstrates how it could be done:
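The control flow of the setjmp/longjmp approach can be sketched without AsmJit. LongJmpHandler and generateWithLongJmp() are hypothetical, and the "instructions" are just loop iterations. Note that in C++ a longjmp() must not skip over live objects with non-trivial destructors, so the sketch keeps only trivial locals between setjmp() and the jump.

```cpp
#include <csetjmp>

// Hypothetical handler that jumps back to the setjmp() site instead of
// returning to the emitter.
struct LongJmpHandler {
  std::jmp_buf state;
  volatile unsigned err = 0;  // volatile: modified between setjmp and longjmp.

  // Never returns: records the error and cancels code generation.
  void handleError(unsigned e) {
    err = e;
    std::longjmp(state, 1);
  }
};

// Emits `count` fake instructions; the one at index `failAt` fails with
// error code 1. Returns the number of instructions emitted, or the negated
// error code if generation was cancelled.
int generateWithLongJmp(int count, int failAt) {
  LongJmpHandler eh;
  volatile int emitted = 0;

  if (setjmp(eh.state) == 0) {
    for (int i = 0; i < count; i++) {
      if (i == failAt) eh.handleError(1);
      emitted = emitted + 1;
    }
    return emitted;
  }

  // Reached only through longjmp().
  return -static_cast<int>(eh.err);
}
```

As with the exception-based variant, the generating loop contains no per-call checks; a single setjmp() site serves as the recovery point.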