AMD has promoted its “fusion” approach as a strategy for the past few years, but until now the concept has remained somewhat nebulous. There was nothing truly ground-breaking to justify the “the future is fusion” motto, although the same line can of course be read to mean that fusion really was a thing of the future. Arguably, AMD has blazed the trail for some CPU features that are nowadays taken for granted. Take the integrated memory controller, for example, which in the case of Intel’s Timna never made it into production, and which fused the CPU proper with the North Bridge into a single monolithic design.

Needless to say, though, the CPU architecture still followed the x86 concept with the floating point co-processor integrated into a single design, as has prevailed since the days of the 80486 and the K5, regardless of some differences in the ratio of integer to floating point units. In other words, nothing spectacular has come along that would increase the efficiency of the CPU as we know it through architectural or conceptual changes.

One thing we have criticized over the last (I don’t even remember how many) years is AMD’s reluctance to embrace the concept of virtual cores, enabled by duplicating the architectural state at essentially no die size penalty and known from Intel as Hyper-Threading, its implementation of simultaneous multithreading (SMT).

Based on nothing but better thread management through seamless interleaving of workloads, the performance improvements, especially in heterogeneous workloads, were in the double-digit percentages on average. Even if the added overhead caused a small penalty in some scenarios, especially in system configurations with limited physical memory, there has been precious little doubt in our minds that AMD was missing a huge opportunity to narrow the performance gap to Intel’s top processors by stubbornly clinging to a one-thread-per-core architecture.

To give credit where credit is due, though, changes and adjustments of this kind do not come overnight, and it takes a certain vision to figure out the right direction years ahead of production releases. As an example, we dusted off an AMD slide from 2006 illustrating where things were planned to go in 2008 and beyond. By the end of this short article we’ll have a similar graph with just a tad more detail and maturity, and it will be obvious that the predictions were not a dead end, even though one major thing has not changed: the “future” aspect of the architectural concept of fusion.

Back to the subject at hand: a pretty radical departure from the classic CPU configuration, one that shares components within a multi-core architecture to maximize computing efficiency. Instead of relying on multithreading, the new concept is called Core Multi Processing, or CMP. CMP means that, based on modeling, only those portions of the processor that are typically heavily utilized are duplicated, whereas parts that are used less frequently are shared between the "dedicated structures" that make up the "cores" in the new terminology. Essentially, the concept goes back to the days of the '386, when cores were only integer units and floating point operations needed a dedicated 387 co-processor.

Accordingly, only the integer units are duplicated to provide dual "cores" whereas the floating point units are shared between cores. The result is a very minor die size penalty while still doubling the number of cores. The first CPU to follow this design is Bulldozer, geared towards re-conquering the performance crown, particularly relative to power consumption.
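
To make the trade-off of a shared floating point unit a little more concrete, here is a back-of-the-envelope sketch. It is a toy resource-bound model of our own and not AMD's modeling: the names `Thread`, `cycles_shared_fpu` and `cycles_dedicated_fpu` are hypothetical, every operation is assumed to cost one cycle, and integer and floating point work are assumed to overlap perfectly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    """Hypothetical workload: operation counts for one thread."""
    int_ops: int  # work for the thread's own integer core
    fp_ops: int   # work for a floating point unit


def cycles_shared_fpu(a: Thread, b: Thread) -> int:
    # Two dedicated integer cores, one FPU shared by both threads.
    # The busiest resource sets the (idealized) run time.
    return max(a.int_ops, b.int_ops, a.fp_ops + b.fp_ops)


def cycles_dedicated_fpu(a: Thread, b: Thread) -> int:
    # Two full cores, each with a private FPU (assumed baseline).
    return max(a.int_ops, b.int_ops, a.fp_ops, b.fp_ops)


# Integer-heavy pair: the shared FPU is never the bottleneck.
int_heavy = (Thread(int_ops=1000, fp_ops=100), Thread(int_ops=900, fp_ops=150))
# FP-heavy pair: both threads queue up at the single shared FPU.
fp_heavy = (Thread(int_ops=200, fp_ops=800), Thread(int_ops=100, fp_ops=700))

for name, (a, b) in (("int-heavy", int_heavy), ("fp-heavy", fp_heavy)):
    print(name, cycles_shared_fpu(a, b), cycles_dedicated_fpu(a, b))
```

In this simplified picture, sharing costs nothing as long as the combined floating point demand stays below the integer demand of the busier core, which is the bet the design appears to make. Floating-point-heavy thread pairs are where the shared unit would become the limit.
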
In a completely different approach, the second new design, codenamed Bobcat, remains single-threaded and is geared towards ultra-low power consumption and small die size. Both designs are currently in the “design state”, that is, the individual building blocks are available in the form of libraries that can be used to synthesize a complete CPU for a given target market or application, with the highest possible degree of freedom between different embodiments.