On Fri, Jul 06, 2007 at 10:50:30AM -0700, Li, Tong N wrote:
> > Also cache misses in this situation tend to be much more than 48 cycles
> > (even an K8 with integrated memory controller with fastest DIMMs is
> > slower than that) Mathieu probably measured an L2 miss, not a load
                                                   ^^^^^^^

I meant an L2 cache hit, of course.

> > from RAM.
> > Load from RAM can be hundreds of ns in the worst case.
> >
>
> The 48 cycles sounds to me like a memory load in an unloaded system, but
> it is quite low. I wonder how it was measured...

I found that memory latency is difficult to measure in modern x86
CPUs because they have very clever prefetchers that can often
outwit benchmarks.
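
One common way to keep the prefetchers out of the picture is a dependent
pointer chase over a randomly permuted array: each load address comes from
the previous load, so there is no regular stride to predict. The following
is only a minimal sketch of that idea, not necessarily how the 48 cycles
above were measured. It assumes POSIX clock_gettime(), and the names
build_cycle, chase and ns_per_load are made up for this illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Build one random cycle over n slots (Sattolo's algorithm): following
 * next[i] from slot 0 visits every slot exactly once before returning.
 * rand() is used only for brevity; its quality does not affect the
 * single-cycle property, only how random the order is. */
static void build_cycle(size_t *next, size_t n)
{
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` loads. Every load depends on the result of
 * the previous one, so the loads cannot overlap or be predicted by stride. */
static size_t chase(const size_t *next, size_t steps)
{
    size_t i = 0;
    while (steps--)
        i = next[i];
    return i;
}

/* Average time per dependent load in nanoseconds, or -1.0 if the buffer
 * cannot be allocated. n should be chosen so that n * sizeof(size_t) is
 * well beyond the cache level under test. Several slots share a cache
 * line here, so a few loads still hit a line that is already present. */
static double ns_per_load(size_t n, size_t steps)
{
    size_t *next = malloc(n * sizeof *next);
    if (!next)
        return -1.0;
    build_cycle(next, n);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile size_t sink = chase(next, steps);  /* keep the result live */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / (double)steps;
}
```

Calling ns_per_load() with a small n (fits in L1), a medium n (fits in L2)
and a large n (far beyond the last-level cache) should show distinct
plateaus. The timer is read only twice around a long chain, so its own cost
is spread over all the loads.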

Another trap on P4 is that RDTSC is actually quite slow and synchronizes
the CPU; that can add large measurement errors.
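
To get a feel for how much the timer itself costs, one can read it twice
back to back and take the minimum difference over many tries. This is a
rough sketch under a few assumptions: GCC or Clang on x86 for the __rdtsc()
intrinsic (with a clock_gettime() fallback elsewhere), and the helper names
read_timer and timer_overhead are invented for this example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
/* Raw time-stamp counter read; the unit is TSC ticks. */
static uint64_t read_timer(void)
{
    return __rdtsc();
}
#else
/* Fallback for other targets; the unit is nanoseconds. */
static uint64_t read_timer(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Smallest difference between two back-to-back timer reads over `trials`
 * tries. Taking the minimum filters out interrupts and other one-off
 * disturbances, leaving an estimate of the cost of the read itself. */
static uint64_t timer_overhead(uint64_t (*timer)(void), unsigned trials)
{
    assert(timer != NULL && trials > 0);
    uint64_t best = UINT64_MAX;
    for (unsigned t = 0; t < trials; t++) {
        uint64_t a = timer();
        uint64_t b = timer();
        if (b - a < best)
            best = b - a;
    }
    return best;
}
```

If timer_overhead(read_timer, 1000) comes out in the same range as the
interval being measured, a single RDTSC pair around one load says very
little. Timing a long run of loads and dividing by the count is safer.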