
Atomic locking in Intel® 64 and IA-32 architectures

The Intel® 64 and IA-32 architectures guarantee atomicity if one of the following conditions is met:

Reading or writing a byte

Reading or writing a word aligned on a 16-bit boundary

Reading or writing a double word aligned on a 32-bit boundary
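The word and double word guarantees depend on alignment, so it can be worth checking the address of a lock variable before relying on them. A minimal sketch; the helper name `is_aligned` is hypothetical and not part of the original post:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `ptr` is aligned to `size` bytes, 0 otherwise.
 * `size` must be a power of two (1, 2, 4, ...). */
static int is_aligned(const volatile void *ptr, size_t size)
{
    return ((uintptr_t)ptr & (size - 1)) == 0;
}
```

For example, `is_aligned(&lock, sizeof lock)` tells you whether a hypothetical `lock` variable sits on its natural boundary.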

So what does this mean? It means that if we are clever about it, we can implement a simple lock using atomic reading and writing of a single byte.

The following function uses the XCHG instruction to atomically swap a register with a memory location, so all we have to do is give it the address of our locking variable and check the return value. If the value returned from the function is LOCKED, the thread was unable to acquire the lock. The beauty of this instruction is that when one operand is a general purpose register and the other is a memory address, the processor's locking protocol is invoked automatically, even without a LOCK prefix, which makes the operation atomic.
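A minimal sketch of such a function, assuming GCC or Clang. The name `atomic_xchg`, the `int` lock type, and the values of `LOCKED` and `UNLOCKED` are illustrative assumptions. The `"memory"` clobber is also an addition that the text does not describe: it tells the compiler that the statement changes memory. On non-x86 targets the sketch falls back to the compiler builtin `__atomic_exchange_n` so that it still builds.

```c
#define UNLOCKED 0   /* assumed value */
#define LOCKED   1   /* assumed value */

/* Atomically store `value` in *address and return the previous content. */
static int atomic_xchg(volatile int *address, int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "xchgl %0, (%1)"    /* swap EBX with the int at the address in EAX/RAX */
        : "=b"(value)       /* output 0: previous lock value, read from EBX */
        : "a"(address),     /* input 1: lock address in EAX (RAX on 64-bit) */
          "0"(value)        /* input 2: new value, same location as output 0 */
        : "memory");        /* assumption: added so the compiler does not
                               reorder memory accesses around the swap */
#else
    /* Fallback for other targets, not part of the XCHG technique. */
    value = __atomic_exchange_n(address, value, __ATOMIC_SEQ_CST);
#endif
    return value;
}
```

Calling `atomic_xchg(&lock, LOCKED)` on a free lock returns `UNLOCKED` and leaves the lock taken; calling it again returns `LOCKED`.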

The inline assembly statement might look daunting at first, but it is not really that complicated.

By definition, the structure of such an extended inline assembly statement is like this:
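The general shape, as GCC documents it for extended asm, is outlined in the comment below, followed by a deliberately tiny illustration. The helper `copy_via_asm` is hypothetical and exists only to show where each part goes; on non-x86 targets it falls back to plain C.

```c
/*
 * General shape of a GCC extended asm statement:
 *
 *   asm ( assembler template
 *       : output operands     (optional)
 *       : input operands      (optional)
 *       : clobber list        (optional)
 *       );
 */
static int copy_via_asm(int src)
{
    int dst;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0"   /* template: %0 and %1 name the operands below */
            : "=r"(dst)     /* output operand 0: any register, write-only */
            : "r"(src));    /* input operand 1: any register */
#else
    dst = src;              /* portable fallback so the sketch builds elsewhere */
#endif
    return dst;
}
```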

The operands are optional, and we do not list any clobbered registers in this statement: every register the instruction touches is already declared as an operand, so the compiler knows about it.

As you can see, we provide one output operand and two input operands. The output operand is the variable that receives the previous value of the lock, which the caller then uses to check whether the lock was successfully acquired.

In input and output operands we can specify various constraints:

Look at the “0”(value) operand: the “0” is a matching constraint, telling the compiler that this input operand must occupy the same location as output operand 0, which is the output variable value.

The “a” constraint of address specifies that the value should be stored in the EAX register.

The “=b” constraint of the output operand specifies that this value should be read from EBX; the “=” marks the operand as write-only.

So there we have it. Essentially, this says: “Take the memory address given in the address variable and store the value given in the value variable in it, while putting the previous content of that memory address into the value variable when done.”

When the caller checks the returned value, LOCKED means that the lock was already taken, while UNLOCKED means that the previous value was unlocked and the caller has now taken the lock.
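The caller's side of this check could be sketched as follows. The function names and the values of `LOCKED` and `UNLOCKED` are assumptions for illustration. To keep the sketch short, the GCC/Clang builtin `__atomic_exchange_n` stands in for the XCHG wrapper; it has the same exchange semantics, but it is a swapped-in primitive rather than the inline assembly discussed above.

```c
#define UNLOCKED 0   /* assumed value */
#define LOCKED   1   /* assumed value */

/* Stand-in for the XCHG wrapper: store `value`, return the previous content. */
static int exchange(volatile int *address, int value)
{
    return __atomic_exchange_n(address, value, __ATOMIC_SEQ_CST);
}

/* One attempt: returns 1 if we took the lock, 0 if it was already held. */
static int lock_try_acquire(volatile int *lock)
{
    return exchange(lock, LOCKED) == UNLOCKED;
}

/* Spin until the previous value comes back as UNLOCKED. */
static void lock_acquire(volatile int *lock)
{
    while (exchange(lock, LOCKED) == LOCKED) {
        /* busy-wait: another thread holds the lock */
    }
}

/* Release by atomically writing UNLOCKED back. */
static void lock_release(volatile int *lock)
{
    exchange(lock, UNLOCKED);
}
```

A busy-wait loop like this is only reasonable for very short critical sections; it is shown here because it maps directly onto the LOCKED/UNLOCKED check.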