Brisbane Performance Issues Demystified: Higher Latencies to Blame

As you'll remember from Part 1, for some reason, our 65nm Athlon 64 X2 5000+ performed slower than our 90nm part. We had contacted AMD before publication of the article but didn't receive a response until after we were well underway with Part 2. AMD's explanation for the reduced performance? Higher memory latencies.

We wanted to investigate exactly how much higher, thus we turned to CPU-Z's latency benchmark to give us a quick indication of how things had changed.

CPU

CPU-Z Latency (8192KB, 128-byte)

AMD Athlon 64 X2 5000+ (65nm)

122 cycles (46.92 ns)

AMD Athlon 64 X2 5000+ (90nm)

121 cycles (46.54 ns)

A single cycle increase in memory access latency, or 0.4ns, is a slight increase but not enough to cause the sort of performance deltas we saw in Quake 4 and Half Life 2, something else was amiss. Luckily it was another metric that CPU-Z's latency test reported that helped us understand the cause of the poor performance: L2 cache access latency.

CPU

CPU-Z L2 Cache Latency

ScienceMark 2.0 L2 Cache Latency

AMD Athlon 64 X2 5000+ (65nm)

20 cycles

20 cycles

AMD Athlon 64 X2 5000+ (90nm)

12 cycles

12 cycles

Updated - 1/5/07: Although AMD previously did not mention any issues with our findings, we were contacted today and informed that the latency information both ScienceMark and CPU-Z produced is incorrect. The Brisbane core's L2 latency should be 14 cycles, up from 12 cycles and not 20 cycles. This would help explain the relatively low impact on application performance that we've seen across the board. We are still waiting to hear back from AMD on a handful of other issues regarding Brisbane and will update you as soon as we have more information.

The original K8 core, in both 130nm and 90nm flavors, had a 12-cycle L2 cache. With Brisbane, as reported by both CPU-Z and ScienceMark, 65nm K8 now has a 20-cycle L2 cache. Generally speaking you move to a higher latency cache if you're planning on introducing a larger cache size, but a quick glance at AMD's roadmaps doesn't show anything larger than a 1MB L2 per core for the next year. The argument for higher clock speeds isn't valid either as the highest clock speed on AMD's roadmaps thus far is only 3.2GHz.

Luckily the performance impact of the higher latency L2 cache isn't noticeable in all applications, thanks to the K8's on-die memory controller, but make no mistake - the new core is slower. We couldn't figure out why AMD made the change and with most of our key AMD contacts on vacation due to the holidays, we still have no official response on the matter. Rest assured that if/when we learn more we will let you know.

Updated: AMD has given us the official confirmation that L2 cache latencies have increased, and that it purposefully did so in order to allow for the possibility of moving to larger cache sizes in future parts. AMD stressed that this wasn't a pre-announcement of larger cache parts to come, but rather a preparation should the need be there to move to a vastly larger L2. Thankfully the performance delta isn't huge, at least in the benchmarks that we saw, so AMD's decision isn't too painful - especially as it comes with the benefit of a cooler running core that draws less power; ideally we'd like the best of all worlds but we'll take what we can get. Note that none of AMD's current roadmaps show any larger L2 parts (other than the usual 2x1MB offerings), which tells us one of two things: either AMD has some larger L2 parts that it's planning on releasing or AMD is being completely honest with the public in saying that the larger L2 parts will only be released if necessary.