Summarizing the ARM for x86 Programmers

RISC: The ARM architecture is RISC-based, which yields two facts of life: an instruction set that’s easy to decode, at the cost of read-modify-write operations on memory that take several instructions rather than one. While Intel’s more complex x86 architecture lets the programmer increment a memory location with a single machine instruction, ARM requires three explicit steps instead: load-to-register, increment-register, store-from-register. The load-store approach puts the burden on the programmer, rather than the CPU, to accomplish the same result, but the resulting power savings are why ARM is the CPU of choice for battery-powered embedded devices like cell phones.
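The three-step sequence can be modeled in a few lines of Python. This is only a sketch: the `memory` dictionary, the `regs` list, and the helper names `ldr`, `add_imm`, and `str_` are illustrative stand-ins invented here, not part of any real emulator or toolchain. The assembly in the comments shows the usual form of each instruction.

```python
# Toy model of a load-store machine (hypothetical helpers, not a real emulator).
memory = {0x1000: 41}      # one word of "RAM" holding a counter
regs = [0] * 16            # R0..R15

def ldr(rd, rn):
    """ldr rd, [rn] : load the word addressed by rn into rd."""
    regs[rd] = memory[regs[rn]]

def add_imm(rd, rn, imm):
    """add rd, rn, #imm : register-only arithmetic, wrapped to 32 bits."""
    regs[rd] = (regs[rn] + imm) & 0xFFFFFFFF

def str_(rd, rn):
    """str rd, [rn] : store rd to the word addressed by rn."""
    memory[regs[rn]] = regs[rd]

regs[1] = 0x1000           # R1 holds the counter's address

# x86 can do this in one instruction, e.g. "inc dword ptr [counter]".
# ARM spells out each step:
ldr(0, 1)                  # ldr r0, [r1]
add_imm(0, 0, 1)           # add r0, r0, #1
str_(0, 1)                 # str r0, [r1]

print(memory[0x1000])      # prints 42
```

Only `ldr` and `str_` touch `memory`; the arithmetic step sees registers alone, which is the essence of the load-store design.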

Predication: ARM instructions typically contain “1110” (hex E) in their highest four bits, meaning the instruction always executes, but other combinations of those bits indicate conditional execution. For example, high bits of “0000” mean an instruction will execute only if the Zero flag is set, as when a downward counter has just reached zero or a comparison between two registers has turned out equal. On x86, conditional operations are restricted to execution transfers and data movements, but on ARM almost every instruction can be made conditional. Additions, branches, returns, logical operations, even hardware floating-point operations all contain this condition field. Instruction predication is nothing new; Konrad Zuse’s Z22 computer had predication in 1955.
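To make the condition field concrete, here is a minimal Python sketch that extracts the top four bits of a 32-bit ARM instruction word and evaluates them against the N, Z, C, and V flags. The function names `condition_field` and `condition_passes` are made up for this illustration. The two sample words are the encodings of `mov r0, r1` and `moveq r0, r1`, which differ only in those four bits.

```python
# Condition mnemonics indexed by the 4-bit field in bits 31..28.
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "(special)"]

def condition_field(word):
    """Return the 4-bit condition field of a 32-bit ARM instruction word."""
    return (word >> 28) & 0xF

def condition_passes(cond, n, z, c, v):
    """Return True if an instruction with this condition would execute.

    n, z, c, v are the Negative, Zero, Carry and oVerflow flags as booleans.
    """
    if cond == 0xF:
        raise ValueError("0b1111 is not an ordinary condition code")
    checks = [
        z,                    # EQ: equal / zero
        not z,                # NE: not equal
        c,                    # CS: carry set
        not c,                # CC: carry clear
        n,                    # MI: negative
        not n,                # PL: positive or zero
        v,                    # VS: overflow
        not v,                # VC: no overflow
        c and not z,          # HI: unsigned higher
        (not c) or z,         # LS: unsigned lower or same
        n == v,               # GE: signed greater or equal
        n != v,               # LT: signed less than
        (not z) and n == v,   # GT: signed greater than
        z or n != v,          # LE: signed less or equal
        True,                 # AL: always
    ]
    return bool(checks[cond])

MOV_R0_R1 = 0xE1A00001        # mov   r0, r1  (always)
MOVEQ_R0_R1 = 0x01A00001      # moveq r0, r1  (only if Z is set)

print(COND_NAMES[condition_field(MOV_R0_R1)])     # prints AL
print(COND_NAMES[condition_field(MOVEQ_R0_R1)])   # prints EQ
```

With the Zero flag clear, `condition_passes(0x0, n=False, z=False, c=False, v=False)` returns `False`, so the `moveq` would be skipped while the plain `mov` still runs.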

The stack: The ARM may be a RISC processor, but its instruction set is flexible. The ARM has no dedicated stack pointer; instead, the instruction set allows any register except R14 and R15 to be used for memory references in a stack-pointer fashion, and stacks may grow either down or up. Registers are generally transferred to or from the stack in groups. With sixteen registers, a bit-field representing the registers takes half of a 32-bit ARM instruction, which allows the ARM to store or fetch several registers, and adjust the stack pointer appropriately, in a single instruction. The Linux ABI for ARM appoints R13 for this task; the GNU assembler syntax even accepts “push” and “pop” for these instructions, where the ARM documentation uses “stmdb” and “ldmia” for “store multiple, decrement before” and “load multiple, increment after”.
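The sixteen-bit register list is easy to see in the machine code. The sketch below builds the list as a bit mask and combines it with the fixed upper halves of `stmdb sp!, {...}` and `ldmia sp!, {...}` in their always-execute form. The helper names (`register_mask`, `encode_push`, `encode_pop`, `registers_in`) are hypothetical, chosen only for this example.

```python
SP, LR, PC = 13, 14, 15       # conventional roles of R13, R14, R15

def register_mask(reg_numbers):
    """Set bit n for each register Rn in the list (the low 16 instruction bits)."""
    mask = 0
    for r in reg_numbers:
        if not 0 <= r <= 15:
            raise ValueError("register number out of range: %r" % (r,))
        mask |= 1 << r
    return mask

# Upper 16 bits with condition AL, base register R13 and write-back enabled.
STMDB_SP_WRITEBACK = 0xE92D0000   # stmdb sp!, {...}  i.e. "push"
LDMIA_SP_WRITEBACK = 0xE8BD0000   # ldmia sp!, {...}  i.e. "pop"

def encode_push(reg_numbers):
    return STMDB_SP_WRITEBACK | register_mask(reg_numbers)

def encode_pop(reg_numbers):
    return LDMIA_SP_WRITEBACK | register_mask(reg_numbers)

def registers_in(word):
    """Recover the register numbers named in the low 16 bits of a word."""
    return [r for r in range(16) if word & (1 << r)]

print(hex(encode_push([4, LR])))   # prints 0xe92d4010  (push {r4, lr})
print(hex(encode_pop([4, PC])))    # prints 0xe8bd8010  (pop  {r4, pc})
print(registers_in(0xE92D4010))    # prints [4, 14]
```

Because the list is a set of bits rather than a sequence, the order in which registers are written in the assembly source does not change the encoding.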

Branches and links: The ARM program counter is R15, by design. With one exception, it is the only dedicated-purpose register in ARM’s architecture. The one exception is R14, the link register, which receives the return address when a subroutine is called. The branch-and-link approach means subroutine calls don’t automatically incur a memory access to store the return address. Leaf functions (which make no other calls) keep the return address in R14, and returning is simply a matter of “branch to register”. On the ARM, that is “BX R14” (or “MOV R15, R14” on older cores that lack BX), a simple jump with no memory access.

If a subroutine does call another code block, then the link register needs to be saved before the next call overwrites it, typically by pushing it onto the stack along with any other registers the function must preserve. In programming parlance, this setup at the beginning of a function is called the “function prologue.” If the pushed registers include the link register R14, the corresponding “function epilogue” doesn’t need to pop the return address back into R14 and then branch to the link register. Instead, it can pop the return address directly into R15, restoring registers and returning in a single instruction.
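The prologue and epilogue trick can be traced with a toy model of a descending stack. This is a sketch under simplifying assumptions: `stmdb` and `ldmia` here are plain Python functions imitating the write-back forms of the instructions, memory is a dictionary of words, and all addresses and values are invented for the example.

```python
SP, LR, PC = 13, 14, 15

def stmdb(regs, memory, base, reg_list):
    """Store multiple, decrement before: the lowest register lands lowest."""
    addr = regs[base] - 4 * len(reg_list)
    regs[base] = addr                      # write-back of the new stack top
    for r in sorted(reg_list):
        memory[addr] = regs[r]
        addr += 4

def ldmia(regs, memory, base, reg_list):
    """Load multiple, increment after: the mirror image of stmdb."""
    addr = regs[base]
    for r in sorted(reg_list):
        regs[r] = memory[addr]
        addr += 4
    regs[base] = addr                      # write-back past the popped words

regs = [0] * 16
memory = {}
regs[SP] = 0x8000          # made-up initial stack pointer
regs[4] = 0x1234           # a value the caller expects to survive
regs[LR] = 0x10400         # made-up return address left by branch-and-link

stmdb(regs, memory, SP, [4, LR])           # prologue: push {r4, lr}

regs[4] = 0                # the function body reuses r4 ...
regs[LR] = 0xDEAD          # ... and a nested call overwrites the link register

ldmia(regs, memory, SP, [4, PC])           # epilogue: pop {r4, pc}

print(hex(regs[PC]), hex(regs[4]), hex(regs[SP]))   # prints 0x10400 0x1234 0x8000
```

The saved link value is loaded into slot 15 rather than slot 14, so the same transfer that restores `r4` also performs the return, and R14 is left holding whatever the nested call put there.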

Calling the kernel: Where Intel’s x86 processors have 256 software interrupt vectors, the ARM has one, invoked by the “svc” instruction (named “swi” in older ARM documentation; the GNU assembler accepts both). The ARM CPU ignores the low 24 bits of the instruction, and so does Linux under the current EABI, which passes the system call number in R7 instead, so the instruction commonly appears as “svc 0x0” in disassembly.
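The layout of the instruction is simple enough to decode by hand. The sketch below assumes the classic 32-bit ARM encoding: a condition in the top four bits, “1111” in bits 27 to 24, and the 24-bit field described above in the remainder. The function name `decode_svc` is invented for this example.

```python
def decode_svc(word):
    """Return (condition, imm24) if word looks like an SVC instruction, else None."""
    cond = (word >> 28) & 0xF
    if cond == 0xF:
        return None                    # 0b1111 is not an ordinary condition
    if (word >> 24) & 0xF != 0xF:
        return None                    # bits 27..24 must all be set for SVC
    return cond, word & 0x00FFFFFF     # low 24 bits: ignored by the CPU

print(decode_svc(0xEF000000))          # prints (14, 0)   i.e. "svc 0x0"
print(decode_svc(0xE1A00001))          # prints None      (mov r0, r1)
```

Since the processor never looks at the 24-bit field, any use of it is purely a software convention between the caller and the kernel.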