Advanced Micro Devices not only took the wraps off its Hammer architecture Monday but also opened up its innards for public examination.

AMD's 64-bit Hammer family uses AMD's x86-64 instruction set, which builds upon the existing 32-bit instructions used in today's x86 CPUs. AMD executives claim that the new instructions will be easier to integrate into operating system software, and argue that this will make the Hammer family the more popular choice.

AMD's Hammer family consists of Clawhammer, a one- and two-way part, and Sledgehammer, geared at higher-end four- and eight-way systems. The chips are scheduled to sample early next year, rolling out to pilot and volume availability over the course of 2002. Over time, the 64-bit Hammer architecture will be used in servers, desktops, and even mobile PCs, according to Fred Weber, AMD's chief technical officer.

"Personally, I can't wait for a 64-bit mobile system," said Kevin Krewell, analyst with MicroDesign Resources Inc., Sunnyvale, Calif., the host firm for the show.

While the Hammer architecture has previously been disclosed, the microarchitecture has not. Weber confirmed that the processors will indeed integrate a DDR memory controller, feeding additional memory bandwidth directly to the processor, as well as integrated HyperTransport I/O to gluelessly create multiprocessor systems. Hammer's integer pipeline is 12 stages long, which is key to driving up clock speed.

The memory controller supports either a 64-bit or 128-bit memory interface to up to 8 DIMMs of up to DDR 333 (PC2700) speed, including support for error-correction code (ECC). The design is deceptively elegant, Weber said: add another CPU, and the number of supported DIMMs doubles, as does the effective memory bandwidth.
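The article gives interface widths and a DDR speed grade but no bandwidth figures. As a rough back-of-the-envelope sketch, assuming "DDR 333" means about 333 million transfers per second and that every added processor brings its own identical controller with 8 DIMM slots, the peak numbers and their scaling work out as below. The function names are illustrative only.

```python
# Back-of-the-envelope peak DDR bandwidth for Hammer-style integrated controllers.
# Assumptions (not stated in the article): "333" is the transfer rate in
# millions of transfers per second, and each CPU adds one identical controller
# with its own 8 DIMM slots.

DDR333_MT_PER_S = 333      # million transfers per second (DDR333 / PC2700)
MAX_DIMMS_PER_CPU = 8      # per-controller DIMM limit quoted in the article


def peak_bandwidth_gb_s(bus_width_bits: int, mt_per_s: int = DDR333_MT_PER_S) -> float:
    """Peak bandwidth of one memory interface in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * mt_per_s * 1e6 / 1e9


def system_totals(num_cpus: int, bus_width_bits: int = 128) -> tuple[int, float]:
    """Return (maximum DIMMs, aggregate peak GB/s) for num_cpus identical processors."""
    return (num_cpus * MAX_DIMMS_PER_CPU,
            num_cpus * peak_bandwidth_gb_s(bus_width_bits))


if __name__ == "__main__":
    for width in (64, 128):
        print(f"{width}-bit DDR333 interface: {peak_bandwidth_gb_s(width):.2f} GB/s peak")
    for cpus in (1, 2, 4):
        dimms, bw = system_totals(cpus)
        print(f"{cpus} CPU(s): up to {dimms} DIMMs, {bw:.2f} GB/s aggregate peak")
```

A 64-bit interface comes out at roughly 2.7 GB/s, which is where the PC2700 label comes from; the 128-bit option doubles that. The linear scaling in `system_totals` mirrors Weber's description, but it is a peak figure: sustained bandwidth in a real multiprocessor also depends on which processor's memory holds the data being read.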

Each processor carries up to a megabyte of two-way set-associative level-2 cache. The penalty to hit the level-1 cache is 2 cycles; the penalty to fetch data from DRAM is 24 cycles, he said. Branch prediction information is actually kept in the level-2 cache, using ECC data "borrowed" from the level-1 cache.
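To illustrate what "two-way set-associative" means for a 1-Mbyte cache, here is a toy model. The 64-byte line size and least-recently-used replacement are assumptions chosen for illustration; the article gives only the capacity and the associativity, and the class is hypothetical, not AMD's design.

```python
from collections import OrderedDict


class TwoWayCache:
    """Toy two-way set-associative cache with LRU replacement.

    Hypothetical parameters: 64-byte lines and LRU eviction are assumptions;
    only the capacity (up to 1 MB) and two-way associativity come from the article.
    """

    def __init__(self, size_bytes: int = 1 << 20, line_bytes: int = 64, ways: int = 2):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)  # 8192 sets for the defaults
        # One OrderedDict per set; keys are tags, ordered least- to most-recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def split(self, addr: int) -> tuple[int, int]:
        """Split a byte address into (tag, set index)."""
        line = addr // self.line_bytes
        return line // self.num_sets, line % self.num_sets

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, index = self.split(addr)
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)      # mark as most recently used
            return True
        if len(ways) >= self.ways:
            ways.popitem(last=False)   # evict the least recently used line
        ways[tag] = None
        return False


if __name__ == "__main__":
    cache = TwoWayCache()
    # Addresses exactly one "stride" apart map to the same set but carry different tags.
    stride = cache.line_bytes * cache.num_sets  # 512 KiB with the default parameters
    print([cache.access(a) for a in (0, stride, 0, 2 * stride, stride)])
    # → [False, False, True, False, False]
```

Two lines that map to the same set can coexist, so the second access to address 0 hits; a third conflicting line forces an eviction, which is why the final access misses. Higher associativity reduces such conflict misses at the cost of more tag comparisons per lookup.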

The Hammer processors can be gluelessly connected to form multiprocessor systems using a crossbar switch. The crossbar switch can route 2 "gigacommands" per second, taking advantage of its large 64-Kbyte internal buffers.

"I think it's fair to say users are more sophisticated than what we give them credit for," Weber said. "They're looking beyond pure speed to more features that add substantially to (the processor)."
