AMD Athlon 64 X2 4800+ Anew: AMD Masters 65nm Technology. Page 4

AMD has quietly begun shipping processors based on the 65nm Brisbane core. Does the new core make the company any more competitive against Intel’s Core 2 Duo? Let’s find out.

Slower than Windsor? Brisbane in Synthetic Benchmarks

Same-frequency CPUs on the Brisbane and Windsor cores shouldn’t differ much in performance: they contain the same number of transistors and, according to AMD, are identical in micro-architecture. However, some publications on the Web dispute this. Let’s check.

First, we compared the speed of the computing units of processors on the Brisbane and Windsor cores (the latter had a total of 1MB of L2 cache). For a fair comparison, we set both CPUs to a clock rate of 2.4GHz and ran the CPU benchmarks from the SiSoftware Sandra XI suite.

| Benchmark | Brisbane 2.4GHz | Windsor 2.4GHz |
|---|---|---|
| Sandra XI, Arithmetic ALU | 17489 | 17480 |
| Sandra XI, Arithmetic SSE3 | 14786 | 14788 |
| Sandra XI, Multi-Media Integer MMX/SSE | 44863 | 44897 |
| Sandra XI, Multi-Media Floating-Point SSE2 | 49339 | 49335 |

The main computing units of the two CPUs indeed provide the same performance. The difference fits within the measurement error range.
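As a quick check of that statement, the relative gap between the two cores can be computed directly from the Sandra XI scores listed above. This is only an illustrative sketch; the names `scores` and `relative_gap_percent` are our own and have nothing to do with Sandra itself.

```python
# Sandra XI scores from the table: (Brisbane 2.4GHz, Windsor 2.4GHz)
scores = {
    "Arithmetic ALU": (17489, 17480),
    "Arithmetic SSE3": (14786, 14788),
    "Multi-Media Integer MMX/SSE": (44863, 44897),
    "Multi-Media Floating-Point SSE2": (49339, 49335),
}


def relative_gap_percent(brisbane, windsor):
    """Brisbane's score relative to Windsor's, as a signed percentage."""
    return (brisbane - windsor) / windsor * 100.0


gaps = {name: relative_gap_percent(b, w) for name, (b, w) in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f}%")
```

Every gap stays below a tenth of a percent in magnitude, and the sign flips from test to test, which is what you would expect from measurement noise rather than from a real difference between the cores.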

The CPU benchmarks from SiSoftware Sandra XI do not depend on the speed of the memory subsystem; they indicate the “pure” performance of a CPU. In real-life applications, however, the speed at which the CPU receives data from memory affects performance, too. So we measured the bandwidth and latency of system memory as well as the latency of the L2 cache. We installed dual-channel DDR2-800 memory with timings of 4-4-4-12-1T for this test.

| Benchmark | Brisbane 2.4GHz | Windsor 2.4GHz |
|---|---|---|
| Sandra XI, L2 Cache Latency, clk | 22.7 | 17.7 |
| Sandra XI, Memory Bandwidth, MB/s | 8351 | 8675 |
| Sandra XI, Memory Latency, ns | 107 | 92 |

Here is a surprise. The L2 cache of CPUs on the new 65nm Brisbane core has higher latency, which increases the latency of the memory subsystem as a whole and ultimately reduces the memory bandwidth.
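To put the Sandra bandwidth figures in perspective, they can be compared with the theoretical peak of the memory configuration used in this test. The sketch below assumes the standard figures for dual-channel DDR2-800 (two 64-bit channels at 800 million transfers per second); the variable names are ours.

```python
# Theoretical peak of dual-channel DDR2-800 (standard figures, our assumption
# about how to model the test setup described in the text).
CHANNELS = 2
BYTES_PER_TRANSFER = 8         # one 64-bit channel moves 8 bytes per transfer
TRANSFERS_PER_SECOND_M = 800   # DDR2-800: 800 million transfers per second

peak_mb_s = CHANNELS * BYTES_PER_TRANSFER * TRANSFERS_PER_SECOND_M

# Sandra XI memory bandwidth measured in the article, MB/s
measured_mb_s = {"Brisbane 2.4GHz": 8351, "Windsor 2.4GHz": 8675}

efficiency = {cpu: bw / peak_mb_s for cpu, bw in measured_mb_s.items()}

print(f"Theoretical peak: {peak_mb_s} MB/s")
for cpu, share in efficiency.items():
    print(f"{cpu}: {share:.1%} of peak")
```

By this estimate Windsor reaches roughly 68% of the theoretical peak and Brisbane roughly 65%, so the new core gives up a few percentage points of the bandwidth the memory could deliver.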

The discouraging results produced by SiSoftware Sandra XI are confirmed by other synthetic benchmarks of the memory subsystem.

| Benchmark | Brisbane 2.4GHz | Windsor 2.4GHz |
|---|---|---|
| CPU-Z, L2 Cache Latency, clk | 20 | 12 |
| CPU-Z, Memory Latency, clk | 115 | 108 |
| ScienceMark 2.0, L2 Cache Latency, clk | 20 | 13 |
| ScienceMark 2.0, Memory Latency, clk | 114 | 106 |
| ScienceMark 2.0, Memory Bandwidth, MB/s | 7619 | 8202 |
| EVEREST 2006, Memory Read, MB/s | 7816 | 8044 |
| EVEREST 2006, Memory Write, MB/s | 6833 | 6932 |
| EVEREST 2006, Memory Copy, MB/s | 7914 | 8152 |
| EVEREST 2006, Memory Latency, ns | 51 | 48.7 |

So, there can’t be any doubt: the L2 cache has become slower in the new core for Athlon 64 X2 CPUs. As a result, the new CPUs work slower with data in memory than their 90nm predecessors, which may show up in real-life applications, too. Yet the organization of the L2 cache hasn’t changed: it is still 16-way set-associative with a line length of 64 bytes.

So we have to look elsewhere for the root of the problem. AMD has commented that the L2 cache latency has increased because the engineers left a reserve for enlarging the cache in the future. This doesn’t sound convincing to us, however. First, AMD’s plans don’t mention any enlargement of the L2 cache, even with the transition to the K8L micro-architecture. Second, Windsor-core CPUs with a 2x1MB L2 cache do not differ in cache performance from their counterparts equipped with a 2x512KB L2 cache. So it is not yet clear to us why the speed parameters of the cache have changed.
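The clock counts reported by CPU-Z and ScienceMark 2.0 can be converted to time and compared with each other. The sketch below uses only the figures from the table and the 2.4GHz clock rate of the test; the helper `clocks_to_ns` and the dictionary layout are our own.

```python
CPU_CLOCK_GHZ = 2.4  # both CPUs were set to 2.4GHz for these tests

# Latencies in CPU clocks: (Brisbane, Windsor)
l2_latency_clk = {"CPU-Z": (20, 12), "ScienceMark 2.0": (20, 13)}
mem_latency_clk = {"CPU-Z": (115, 108), "ScienceMark 2.0": (114, 106)}


def clocks_to_ns(clocks, ghz=CPU_CLOCK_GHZ):
    """One clock at f GHz lasts 1/f nanoseconds."""
    return clocks / ghz


# Extra clocks Brisbane needs compared with Windsor: (L2, memory)
extra = {}
for tool in l2_latency_clk:
    l2_extra = l2_latency_clk[tool][0] - l2_latency_clk[tool][1]
    mem_extra = mem_latency_clk[tool][0] - mem_latency_clk[tool][1]
    extra[tool] = (l2_extra, mem_extra)
    print(f"{tool}: L2 +{l2_extra} clk ({clocks_to_ns(l2_extra):.2f} ns), "
          f"memory +{mem_extra} clk ({clocks_to_ns(mem_extra):.2f} ns)")
```

In both tools the extra memory latency is 7 or 8 clocks, about 3 ns at 2.4GHz, which is the same size as the extra L2 latency. That is consistent with the slower L2 cache being the source of the slower memory access, although the numbers alone do not prove it.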

The increased cache latency is not the only thing that can hurt the performance of Brisbane-core CPUs. Another problem stems from the fractional CPU frequency multipliers: because the default CPU frequencies now change in steps of 100MHz, the real memory frequency is reduced in some modes. In CPUs with the K8 micro-architecture the memory frequency is derived from the CPU frequency through an integer divider. We’ve met this problem before, but it has grown worse with the new CPUs. To illustrate our point, here is a table that shows the real memory frequency in the different modes of the memory controller integrated into the CPU.

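| Processor frequency, MHz | 2000 | 2100 | 2200 | 2300 | 2400 | 2500 | 2600 | 2800 |
|---|---|---|---|---|---|---|---|---|
| DDR2-800 | 800 | 700 | 733 | 767 | 800 | 714 | 743 | 800 |
| DDR2-667 | 667 | 600 | 629 | 657 | 600 | 625 | 650 | 622 |
| DDR2-533 | 500 | 525 | 489 | 511 | 533 | 500 | 520 | 509 |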
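The values in this table can be reproduced with a few lines of arithmetic. The sketch below rests on an assumption of ours that the text does not spell out: the memory controller picks the smallest integer divider that keeps the memory clock at or below its nominal value. The function name `real_memory_rate` and the data layout are hypothetical, chosen for illustration only.

```python
from fractions import Fraction
from math import ceil

# Nominal memory clocks in MHz (half the DDR2 rating), kept as exact fractions
NOMINAL_CLOCK_MHZ = {
    "DDR2-800": Fraction(400),
    "DDR2-667": Fraction(1000, 3),
    "DDR2-533": Fraction(800, 3),
}


def real_memory_rate(cpu_mhz, mode):
    """Return (divider, effective DDR2 rate in MHz) for a CPU clock and mode.

    Assumed rule (not stated in the article): use the smallest integer
    divider that does not push the memory clock above its nominal value.
    """
    divider = ceil(Fraction(cpu_mhz) / NOMINAL_CLOCK_MHZ[mode])
    memory_clock = Fraction(cpu_mhz, divider)
    return divider, round(2 * memory_clock)  # DDR: two transfers per clock


cpu_clocks = [2000, 2100, 2200, 2300, 2400, 2500, 2600, 2800]
table = {
    mode: [real_memory_rate(f, mode)[1] for f in cpu_clocks]
    for mode in NOMINAL_CLOCK_MHZ
}

for mode, rates in table.items():
    print(mode, rates)
```

Under that assumption the sketch matches every value in the table. For example, a 2500MHz CPU in DDR2-800 mode gets a divider of 7, so the memory runs at 2500/7, or about 357MHz, which corresponds to DDR2-714 instead of DDR2-800.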

That’s not a catastrophe, of course. CPUs with integer frequency multipliers have some very poor memory modes, too. Still, you should be aware that with CPUs that have fractional multipliers the real memory frequency is often much lower than the nominal one, which has a negative effect on overall system performance.