Get your system working to a functional level with conventional blocking queues. If you hit a performance issue, you MAY find that you have to look at a different queue class. You're only going to be queueing pointers, yes? You have to try quite hard to get contention when the queue is only locked for the time it takes to push or pop one pointer. Overall, your app's performance will probably be dominated by the CPU spin loops, as others have warned. Likewise the CPU-affinity bodges. Would your time not be better spent on 'real' app code?
–
Martin James Jul 7 '12 at 18:29

2 Answers

With POSIX threads, you only have data coherence between threads if you use mutexes, locks, etc., and that coherence has no well-defined interface with your compiler (and volatile definitely isn't it). Don't do it like that; anything can happen, such as updates of variables being optimized out (here volatile could help) or partial (torn) reads or writes.

C11, the new C standard, has a threading model that includes a data-coherence model, thread-creation functions, and atomic operations. No compiler implements this completely yet, it seems, but gcc or clang on top of POSIX threads implement the features that you need. If you'd like to try this out and be future-proof, P99 implements wrappers for these platforms that let you use the new C11 interfaces.

C11's _Atomic types and operations would be the correct tools to implement lock-free queues that operate between threads.
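
As a sketch of what that looks like, here is the classic Treiber-stack pattern built on `_Atomic` and compare-exchange (the node layout and names are illustrative; a production queue would also need to handle the ABA problem and memory reclamation):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    void *data;
    struct node *next;
};

static _Atomic(struct node *) top = NULL;

void push(struct node *n) {
    n->next = atomic_load_explicit(&top, memory_order_relaxed);
    /* retry until we swing 'top' from n->next to n in one atomic step */
    while (!atomic_compare_exchange_weak_explicit(
               &top, &n->next, n,
               memory_order_release, memory_order_relaxed))
        ;  /* on failure, n->next was reloaded with the current top */
}

struct node *pop(void) {
    struct node *n = atomic_load_explicit(&top, memory_order_acquire);
    while (n && !atomic_compare_exchange_weak_explicit(
                    &top, &n, n->next,
                    memory_order_acquire, memory_order_acquire))
        ;  /* on failure, n was reloaded with the current top */
    return n;  /* NULL when the stack is empty */
}
```

The point is that the ordering guarantees (`memory_order_release`/`acquire`) are part of the language, so the compiler and the hardware are both obliged to honor them.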

In C, the volatile keyword has no defined semantics that apply when a variable is accessed concurrently from multiple threads (and pthreads doesn't add any). So the only way to know whether it's safe is to look at the effects volatile has on particular platforms and compilers, figure out every possible way it could go wrong on those specific hardware platforms, and rule them out.

It's a really bad idea to do this if you have a choice. Portable code tends to be much more reliable. The two big problems are:

1. New platforms do come out. And fragile code can break when a new CPU, compiler, or library is released.

2. It's very hard to think of every way this could go wrong, because you don't really know what you're working with. Mutexes, atomic operations, and the like have precisely-defined semantics for multiple threads, so you know exactly what guarantees you have -- on any platform, with any compiler, with any hardware.

Your reader code is terrible, by the way. On hyper-threaded CPUs, for example, tightly spinning like that will starve the other virtual core. Worse, you could wind up spinning at FSB speed, starving other physical cores. And when you exit the spin loop -- the moment when performance is most critical -- you basically force a mispredicted branch! (The exact effects depend on the specifics of the CPU, which is another reason using this kind of code is bad. You need at least a rep nop.)