Near-threshold voltage computing extends the voltage scaling associated with Moore’s Law and dramatically improves power and energy efficiency. The technology is superb for throughput, at the cost of latency, and is best suited to Intel’s products for HPC and mobile graphics.

New compute efficiency data shows GPUs with a clear edge over CPUs, but the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a throughput CPU is the most energy-efficient processor, offering hope for future architectures. Our data also shows some advantages of AMD’s Bulldozer, as well as the overhead associated with highly scalable server CPUs.

In our Sandy Bridge-EP and Romley platform review, we look at the performance and power efficiency gains for Intel’s latest server microprocessor on industry standard benchmarks including SPECcpu2006 and SPECpower_ssj2008. The results are impressive: Sandy Bridge-EP is clearly the best x86 server processor on the market, and Romley will be the platform of choice for the next 2 years.

Sandy Bridge-EP is the first major overhaul for Intel servers since 2009, and nearly every aspect has been enhanced. The processor pairs 8 cores with a large last level cache, DDR3 memory controller, QPI 1.1, integrated PCI-E and power management. This article provides an overview of the major features, including new I/O optimization and power capping techniques, and discusses the expected impact.

For 4 years, Intel has struggled to move into the market for mobile devices. Conventional wisdom holds that x86 is too inefficient for smart phones. The recently announced 32nm Medfield proves that x86 is a viable option and that Intel can design smart phone products. We explore the Medfield SoC and analyze the impact on Intel’s mobile strategy.

Over a decade, Itanium scaled down to 65nm re-using the same basic design. The new 32nm Poulson architecture moves from static VLIW to a more conventional pipeline. It has a new core with dynamic scheduling, fine-grained multithreading and a shared L3 cache. The net result is a vastly more efficient microprocessor that should achieve 2.5-2.8X higher performance and should power high-end servers for the next 10 years.

Intel’s Sandy Bridge ISSCC paper discusses a number of challenges that will eventually impact most vendors. The novel architectural choices and circuit design solutions that it describes give insight not only into current and future products from Intel, but also into the general direction of the industry. The overarching theme is taking advantage of Moore’s Law at 32nm and beyond, which entails considerable attention to design complexity, process variation, power efficiency and validation.

Sandy Bridge SPECcpu2006 estimates are finally available. The data show per-core performance increased by 30% or more compared to the fastest Westmere design. We analyze the performance numbers for Intel’s newest microarchitecture and estimate gains of 12% for multi-threading on integer workloads. We also show that integer performance is highly sensitive to frequency, while floating point workloads respond much less. Last, we assess the implications for AMD in matching Sandy Bridge’s performance for both throughput and single threaded workloads.

In the last decade, the Itanium architecture has quietly progressed and achieved a measure of success in the high-end server market. Yet it has never lived up to the initial expectations and supplanted x86, leading many to wonder whether Intel would eventually abandon the architecture. The recently released ISSCC 2011 advanced program contains a paper describing Poulson, the next generation Itanium microarchitecture on Intel’s 32nm process. The title of the paper suggests that Poulson is a substantially enhanced design and that Itanium still has many years of life ahead. This article explores two microarchitectural possibilities for the new Poulson core.

At IDF, Intel revealed the future Sandy Bridge microprocessor. It is an entirely new design – a synthesis of Nehalem, ideas from the Pentium 4 and a new Gen 6 graphics architecture. The result is a novel microprocessor, GPU and system infrastructure tightly integrated into a 32nm chip. This report details Sandy Bridge’s microarchitecture including the uop cache, AVX, memory pipelines, ring-based L3 cache and Turbo Boost, concluding with the expected performance relative to AMD’s Bulldozer.