Curious benchmarks

Interesting. I'm playing with pthreads a bit, so I've got a simple loop that increments a variable about a billion times, once flat out and once with a lock around each increment. On the Linux laptop (Intel Core Duo T2450 at 2.00GHz): 3.6 seconds without locks, 58.8 seconds with locks. On the Mac Pro (two dual-core Intel Xeons at 2.66GHz): 2.7 seconds without locks, 62.1 seconds with locks.
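
For anyone who wants to try something similar, here is a minimal sketch of that kind of benchmark. It is a guess at the shape of the test, not the original code: the function names (count_unlocked, count_locked, time_run) are made up for illustration, and it assumes a single thread, so the mutex is never contended, plus a volatile counter so the compiler cannot fold the loop away.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Plain loop: one iteration is the baseline unit of work.
   volatile stops the compiler from replacing the loop with one store. */
long count_unlocked(long limit)
{
    volatile long abc = 0;
    while (abc < limit)
        abc++;
    return abc;
}

/* Same loop, taking and releasing a mutex around every increment.
   Assumption: only one thread runs, so the lock is always uncontended. */
long count_locked(long limit)
{
    pthread_mutex_t lock;
    volatile long abc = 0;

    pthread_mutex_init(&lock, NULL);
    while (abc < limit) {
        pthread_mutex_lock(&lock);
        abc++;
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_destroy(&lock);
    return abc;
}

/* Wall-clock seconds taken by a single call to fn(limit). */
double time_run(long (*fn)(long), long limit)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(limit);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_run(count_unlocked, 999999999L) and time_run(count_locked, 999999999L) from a main() and printing the two results gives the pair of timings to compare. Build with the pthread option for your compiler (for example cc -O1 -pthread), and expect the numbers to shift with optimization level and pthread implementation.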

Stupid benchmark, not significant in any way, except that it often feels like the Linux laptop is way snappier than the Mac desktop (and the Mac laptop that's in the shop right now). If the overhead of the OS and system library implementations is chewing up that additional CPU speed, that may explain a lot...

One significant hardware difference: I suspect the dual-processor machine has to get the lock written out to main memory, while the T2450 only has to get it as far as the level 2 cache... or the Linux pthread lock is just faster.

I'd expect that there'd be no particular need for the processor-local cache to get flushed, so I chalked it up to the Linux pthread lock costing roughly 8 units against roughly 11 units on OS X (where a unit is one iteration of the while (abc < 999999999) abc++; loop).
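
One reading that reproduces those figures from the timings quoted earlier: subtract the plain loop time from the locked time, then split what is left evenly between the lock call and the unlock call. The even split is an assumption, not something stated above, and each unit is measured against that machine's own plain loop.

```latex
\[
\text{Linux: } \frac{58.8 - 3.6}{2 \times 3.6} \approx 7.7
\qquad
\text{OS X: } \frac{62.1 - 2.7}{2 \times 2.7} = 11.0
\]
```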

Edit: Huh, duh, I was just being obtuse there. I need to go look at how multi-processor Xeon boxes communicate internal cache dirty status to each other.