On Mon, Aug 29, 2011 at 1:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> This discussion seems to miss the fact that there are two levels of
>> reordering that can happen. First, the compiler can move things
>> around. Second, the CPU can move things around.
>
> Right, I think that's exactly the problem with the previous wording of
> that comment; it doesn't address the two logical levels involved.
> I've rewritten it, see what you think.
>
> * Another caution for users of these macros is that it is the caller's
> * responsibility to ensure that the compiler doesn't re-order accesses
> * to shared memory to precede the actual lock acquisition, or follow the
> * lock release. Typically we handle this by using volatile-qualified
> * pointers to refer to both the spinlock itself and the shared data
> * structure being accessed within the spinlocked critical section.
> * That fixes it because compilers are not allowed to re-order accesses
> * to volatile objects relative to other such accesses.
> *
> * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and
> * S_UNLOCK() macros must further include hardware-level memory fence
> * instructions to prevent similar re-ordering at the hardware level.
> * TAS() and TAS_SPIN() must guarantee that loads and stores issued after
> * the macro are not executed until the lock has been obtained. Conversely,
> * S_UNLOCK() must guarantee that loads and stores issued before the macro
> * have been executed before the lock is released.
That's definitely an improvement.
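The volatile-pointer convention the comment describes can be sketched roughly as below. This is a hypothetical illustration, not PostgreSQL's actual code: the names (mylock, shared_data, increment_counter) are made up, and the volatile stores to the lock word stand in for real S_LOCK()/S_UNLOCK() calls (there is no atomicity here, only the compiler-ordering property the comment is about).

```c
#include <assert.h>

/* Hypothetical sketch of the volatile-qualified-pointer convention.
 * Not a real lock: the plain stores to mylock merely stand in for
 * S_LOCK()/S_UNLOCK() to show the compiler-ordering effect. */
typedef volatile int slock_t;

struct SharedStruct
{
	int			counter;
};

static slock_t mylock = 0;
static struct SharedStruct shared_data = {0};

static void
increment_counter(void)
{
	/* volatile-qualified pointer to the shared data structure */
	volatile struct SharedStruct *shared = &shared_data;

	mylock = 1;				/* stand-in for S_LOCK(): volatile access */
	shared->counter++;		/* compiler may not reorder this volatile
							 * access relative to the lock-word accesses */
	mylock = 0;				/* stand-in for S_UNLOCK(): volatile access */
}
```

Because both the lock word and the data are accessed through volatile-qualified lvalues, the compiler is forbidden from moving the counter update outside the bracketing lock-word stores; that is the compiler-level half of the guarantee, with the hardware-level half left to the fence instructions discussed below.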
I'm actually not convinced that we're entirely consistent here about
what we require the semantics of acquiring and releasing a spinlock to
be. For example, on x86 and x86_64, we acquire the lock using xchgb,
which acts as a full memory barrier. But when we release the lock, we
just zero out the memory address, which is NOT a full memory barrier:
stores can't cross it, but non-dependent loads of different locations
can back up over it. That's pretty close to a full barrier, but not
quite. Now, I don't see why that should really cause any problem, at
least for common cases like LWLockAcquire(). If the CPU prefetches
the data protected by the lwlock once we know we've got the lock, but
before we've actually released the spinlock and returned from
LWLockAcquire(), that should be fine, even good (for performance).
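The x86 asymmetry can be modeled with C11 atomics; this is a hedged sketch, not PostgreSQL's actual s_lock.h code, and the function names (tas, s_unlock) are illustrative. On x86 the seq_cst exchange compiles to a LOCK XCHG (a full barrier), while the release store compiles to a plain MOV: stores can't cross it, but later loads can be executed early, matching the behavior described above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the x86 spinlock semantics discussed above,
 * using C11 atomics rather than PostgreSQL's real macros. */
typedef atomic_int slock_t;

/* Acquire: on x86 this compiles to LOCK XCHG, a full memory barrier.
 * Returns the previous value: 0 means we got the lock. */
static int
tas(slock_t *lock)
{
	return atomic_exchange_explicit(lock, 1, memory_order_seq_cst);
}

/* Release: on x86 this compiles to a plain MOV.  Earlier stores can't
 * cross it, but non-dependent later loads can back up over it, so it
 * is NOT a full barrier. */
static void
s_unlock(slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```

Usage follows the usual spin pattern: call tas() until it returns 0, touch the shared data, then s_unlock().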
The real problem with being squiffy here is that it's not clear how
weak we can make the fence instructions on weakly ordered
architectures that support multiple barrier types. Right now we're
pretty conservative, but I think that may be costing us. I might be
wrong; more research is needed here; but I think that we should at
least start to get our heads around what semantics we actually need.
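For comparison, here is a sketch of the weaker semantics that would satisfy the rewritten comment: acquire ordering on lock acquisition and release ordering on unlock, instead of full barriers. Again this is an illustration in C11 atomics with made-up names, not a proposal for actual code; on weakly ordered hardware (e.g. POWER, ARM) these orderings permit lighter fence instructions than a full barrier would.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical weaker variant: acquire/release instead of full
 * barriers, matching the minimum the comment's wording requires. */
typedef atomic_int slock_t;

/* Acquire ordering: loads and stores issued after the exchange may
 * not be executed until the lock has been obtained. */
static int
tas_acquire(slock_t *lock)
{
	return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Release ordering: loads and stores issued before the store must be
 * executed before the lock is released. */
static void
s_unlock_release(slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```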
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company