AMD’s K8L and 4×4 Preview

News from AMD’s Analyst Day

Twice each year, AMD hosts an analyst day. The spring event, which took place at the Sunnyvale headquarters, is more technically oriented and tends to deal with the actual details of the company’s products and technology itself rather than financial performance and metrics. The event itself was relatively low key and included speakers from AMD and a few key partners, such as Sun, VMware, Rackable and video presentations from Microsoft and Alienware nee Dell. There was quite a bit of interesting information presented, but what really seemed to be worth dwelling on were the new revelations about the K8L and the 4×4 gaming systems.

Just as an aside, several architects at AMD expressed puzzlement at the origins of the K8L name. There is an internal engineering code name for the project, but the marketing team is slightly behind and has yet to provide something catchy for the rest of the world. At least one architect at AMD indicated a preference for the name K8++, but it seems unlikely that anyone in marketing or PR would share this point of view.

Do you Have Change for Some Cache?

Previously, Chuck Moore had described several incremental enhancements in the K8L at the Spring Processor Forum. The instruction fetch unit now includes an indirect branch predictor and fetches 32 bytes per cycle. The FP and SSE units have all been widened to 128 bits, as have the memory pipes. The load/store units also have somewhat more flexible execution; they can re-order loads with respect to other loads (although loads cannot move around stores). Physical and virtual addressing is expanded to 48 bits, and the page tables have been augmented as well. The page tables now support nesting for virtualization, and include 1GB pages. On the power side, the cores and system functionality will have separate power planes and independent C and P states. These are not all of the changes, but most of the key elements.

Figure 1 – Floor Plan of K8L

The first significant disclosures regarding the K8L had to do with the cache hierarchy within a single core. Despite an erroneous rumor to the contrary at Daily Tech, the L1D and L1I caches remain at 64KB each, according to a senior architect at AMD. The floor plan of the K8L also tends to confirm that the L1 caches have not decreased in size. The K8L did experience some L2 cache shrinkage and initial parts will feature a 2MB shared L3 cache. Based on the cache sizes, the L2 cache is still exclusive of the L1 contents, and the L3 cache is certainly not inclusive (although this does not mean it is exclusive). Additionally, it is easy to deduce, based on information about the load/store units that the bus between the L1 and L2 caches has been widened to 256 bits. The L3 cache is extensible, and it seems likely that 4MB parts will come out, perhaps as a way to differentiate between low-end parts intended for 1-2 sockets, and the higher-end parts for 4-8 sockets.