Reference

Atomic operations are quite useful in concurrent programming, notably in the
implementation of lock-free algorithms. When a lock-based algorithm becomes
the performance bottleneck, atomic operations can often come to the rescue.

Atomic ordering

NotAtomic (regular load and store)

Unordered (to match the memory model of safe languages such as Java)

Monotonic (or memory_order_relaxed)

Acquire (or memory_order_acquire and memory_order_consume)

Release (or memory_order_release)

AcquireRelease (or memory_order_acq_rel)

SequentiallyConsistent (or memory_order_seq_cst)
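
To make the orderings above a little more concrete, here is a minimal C++11 sketch of the Release/Acquire pairing, using the memory_order_* names given in parentheses. The function name publish_and_read and the value 42 are made up for illustration; this is one common pattern, not the only way to use these orderings.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes a plain int, another reads it.
int publish_and_read() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that guards the payload
    int seen = 0;

    std::thread producer([&] {
        payload = 42;                                  // plain store
        // Release: the write to payload above cannot be reordered after
        // this store, so it is visible to whoever acquires the flag.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: once we observe true, the producer's earlier writes
        // are guaranteed to be visible here.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // holds because of the release/acquire pairing
    return seen;
}
```

If both operations used memory_order_relaxed instead, the flag itself would still be updated atomically, but the read of payload would be a data race.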

Platform implementations

X86

All atomic loads generate a MOV. SequentiallyConsistent stores generate an
XCHG; all other stores generate a MOV.
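
A small sketch for checking this yourself: compile the functions below with optimizations and inspect the generated assembly (for example with the compiler's -S flag). The names shared_value, load_acquire and so on are hypothetical, and the instructions noted in the comments are what the text above leads one to expect, not a guarantee; the exact output depends on the compiler and its version, and some compilers may emit MOV followed by MFENCE for the sequentially consistent store instead of XCHG.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_value{0};  // hypothetical global used for inspection

// Loads: expected to compile to a plain MOV on x86, whatever the ordering.
int load_acquire() { return shared_value.load(std::memory_order_acquire); }
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }

// Release store: expected to be a plain MOV.
void store_release(int v) { shared_value.store(v, std::memory_order_release); }

// Sequentially consistent store: expected to be an XCHG (or MOV + MFENCE).
void store_seq_cst(int v) { shared_value.store(v, std::memory_order_seq_cst); }

// Quick single-threaded sanity check that the wrappers behave as expected.
void self_check() {
    store_seq_cst(7);
    assert(load_seq_cst() == 7);
    store_release(9);
    assert(load_acquire() == 9);
}
```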

A Small Benchmark

I wrote a small program to benchmark the performance of atomic operations
against mutexes and spinlocks. It is hosted on GitHub. You can clone the
repository and run make. It requires a modern C++ compiler with C++11
support.
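
For readers who just want the general shape of such a comparison, here is a minimal C++11 sketch. It is not the program from the repository: the helper names (time_threads, count_with_atomic, count_with_mutex, count_with_spinlock) and the spinlock built on std::atomic_flag are assumptions made for illustration. Each function increments a shared counter from several threads, returns the final count, and optionally reports the elapsed time in milliseconds.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Runs `work` on `threads` threads; returns elapsed wall-clock milliseconds.
template <typename Work>
double time_threads(int threads, Work work) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(work);
    for (auto& th : pool) th.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Counter incremented with a relaxed atomic read-modify-write.
long count_with_atomic(int threads, int iters, double* ms = nullptr) {
    std::atomic<long> counter{0};
    double t = time_threads(threads, [&] {
        for (int i = 0; i < iters; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    });
    if (ms) *ms = t;
    return counter.load();
}

// Counter protected by a std::mutex.
long count_with_mutex(int threads, int iters, double* ms = nullptr) {
    long counter = 0;
    std::mutex m;
    double t = time_threads(threads, [&] {
        for (int i = 0; i < iters; ++i) {
            std::lock_guard<std::mutex> guard(m);
            ++counter;
        }
    });
    if (ms) *ms = t;
    return counter;
}

// Counter protected by a simple test-and-set spinlock.
long count_with_spinlock(int threads, int iters, double* ms = nullptr) {
    long counter = 0;
    std::atomic_flag lock = ATOMIC_FLAG_INIT;
    double t = time_threads(threads, [&] {
        for (int i = 0; i < iters; ++i) {
            while (lock.test_and_set(std::memory_order_acquire)) {
                // busy-wait until the flag is cleared
            }
            ++counter;
            lock.clear(std::memory_order_release);
        }
    });
    if (ms) *ms = t;
    return counter;
}
```

Calling each function with the same thread and iteration counts and comparing the reported times gives a rough picture. The numbers vary a lot with core count, contention and compiler flags, so treat them as indicative only.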