Intel Introduces Xeon E7v2 with Massively Improved Performance

On Tuesday Intel introduced the next iteration of the Xeon E7 family, targeted at the enterprise computing segment. As we discussed in our silicon technology outlook, this chip was already ripe for introduction. We take a detailed look at this family of CPUs in the following paragraphs.

Due to the longer product cycles in this segment, Intel skipped the tock and moved straight to the next tick. That is essentially what the jump from the 32nm Westmere-EX to the 22nm Ivy Bridge-EX amounts to. The chip has also been known as Ivytown since Intel discussed its design at the recent ISSCC conference. As a consequence, the chip brings a whole array of technologies into this segment that were previously only available in client systems and smaller 1-2S servers. These include the Sandy Bridge architectural enhancements such as IPC improvements, the AVX instruction set extension and improved virtualization, to name a few.

The CPU, which will be called the Xeon E7 v2 series from now on, features up to 15 cores, each capable of HyperThreading (except the E7-8857v2 SKU, which is aimed at HPC). The CPUs come with a massive L3 cache of up to 37.5MB and fit in the LGA2011 socket, which is a departure from the previous LGA1567. Intel managed to increase the core count from 10 to 15 for the Xeon E7 line. The maximum TDP of the chips also rises from 130W to 155W. However, this allows them to run at much higher turbo frequencies, which comes on top of the much improved boosting behaviour versus Westmere.

The die is organized in three columns of 5 cores each, with every core sitting next to its 2.5MB L3 cache slice. This organization was designed with lower-core-count variants in mind. The full 15-core silicon consists of 4.31 billion transistors packed into a 541 mm² die. The 10-core variant simply omits the third column including its caches and associated I/O blocks, which reduces it to 2.89 billion transistors on a 341 mm² die. Finally, a separate 6-core variant additionally omits the second and fourth rows of the die, leaving two columns of three cores, and comes in at 1.86 billion transistors occupying 289 mm². Any SKU with a core count in between is made from the next larger die with the surplus cores, cache and I/O blocks fused off.
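To make the die arithmetic easier to follow, here is a minimal Python sketch built only on the figures quoted above. The helper names (l3_cache_mb, density_mtr_per_mm2, die_for_sku) are our own, and the assumption that every active core keeps exactly one 2.5MB slice is a simplification that individual SKUs may not follow.

```python
# Die figures as quoted in the article: core count -> transistors and area.
DIES = {
    15: {"transistors_bn": 4.31, "area_mm2": 541},
    10: {"transistors_bn": 2.89, "area_mm2": 341},
    6: {"transistors_bn": 1.86, "area_mm2": 289},
}

L3_SLICE_MB = 2.5  # one L3 slice per core, per the article


def l3_cache_mb(cores):
    """L3 size, assuming one 2.5MB slice per active core (simplification)."""
    return cores * L3_SLICE_MB


def density_mtr_per_mm2(die_cores):
    """Million transistors per mm² for one of the three physical dies."""
    die = DIES[die_cores]
    return die["transistors_bn"] * 1000 / die["area_mm2"]


def die_for_sku(cores):
    """Smallest physical die that can host a SKU with this many cores."""
    return min(c for c in DIES if c >= cores)


if __name__ == "__main__":
    print(l3_cache_mb(15))   # full die: 15 slices of 2.5MB
    print(die_for_sku(12))   # a 12-core SKU is cut from the 15-core die
    print(die_for_sku(8))    # an 8-core SKU is cut from the 10-core die
```

A 12-core part, for example, would come from the 15-core die with three cores and their cache slices fused off.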

Aside from the new microarchitecture, increased core count and higher clock frequencies, the Ivy Bridge-EX based Xeon E7 line comes with an improved memory subsystem. The chip now supports up to 1.5TB of RAM per socket, which means a 4S configuration tops out at 6TB and an 8S system at 12TB. This is a 3x improvement over the previous generation, enabled by supporting more DIMMs (24 vs 16 per socket) at a larger capacity (64GB vs 32GB). But there is more to the memory subsystem. The memory is connected via a memory extension buffer that supports 2 channels. Combined with the 4 memory channels Ivy Bridge-EX supports, this allows either quad-channel operation at 1600MT/s or octa-channel operation at 1333MT/s (2667MT/s quad-channel on the CPU side).
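The capacity figures follow directly from DIMM count times DIMM size. A short Python sketch of that arithmetic, using only the numbers quoted above (the function names are ours, purely for illustration):

```python
def capacity_gb(dimms_per_socket, dimm_gb, sockets=1):
    """Maximum RAM in GB for a given DIMM count, DIMM size and socket count."""
    return dimms_per_socket * dimm_gb * sockets


def aggregate_mts(channels, rate_mts):
    """Sum of per-channel transfer rates in MT/s (not bytes per second)."""
    return channels * rate_mts


westmere_ex = capacity_gb(16, 32)    # previous generation, per socket
ivy_bridge_ex = capacity_gb(24, 64)  # Xeon E7 v2, per socket

if __name__ == "__main__":
    print(ivy_bridge_ex / 1024)              # TB per socket
    print(ivy_bridge_ex / westmere_ex)       # generational improvement
    print(capacity_gb(24, 64, 4) / 1024)     # TB in a 4S system
    print(capacity_gb(24, 64, 8) / 1024)     # TB in an 8S system
    print(aggregate_mts(4, 1600), aggregate_mts(8, 1333))
```

The last line compares the two memory modes: eight channels at 1333MT/s deliver more aggregate transfers than four channels at 1600MT/s, even though each individual channel runs slower.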

Improved I/O Capabilities

The CPU comes with three QPI links at speeds of up to 8GT/s, whereas Westmere-EX topped out at 6.4GT/s. This allows direct connection of all sockets in a 4S system and a maximum of three hops in an 8S setup. The QPI links are also utilized more efficiently thanks to a home snoop protocol. While the details would go beyond the scope of this article, it basically reduces the number of messages exchanged when a CPU asks for data that is neither in its own cache nor in local RAM. It thus provides better scalability at a minor increase in latency versus the source snoop protocol used in Westmere-EX. This scheme is also more efficient when the sockets are not fully connected, which is the case in the 8S configuration.
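Why three links are enough for 4S but not for 8S comes down to simple counting: a socket needs one link to every other socket to be directly connected to all of them. A minimal sketch of that check follows; the function name is hypothetical, and it says nothing about how a real 8S board is actually wired.

```python
QPI_LINKS_PER_SOCKET = 3  # per the article


def is_fully_connected(sockets, links_per_socket=QPI_LINKS_PER_SOCKET):
    """True if every socket can have a direct link to every other socket."""
    return links_per_socket >= sockets - 1


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, is_fully_connected(n))
```

With three links, four sockets can each reach the other three directly, while an 8S system would need seven links per socket and therefore has to route some traffic over intermediate hops.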

The Xeon E7 v2 also finally introduces PCI Express 3.0 to the enterprise server segment. Each CPU comes with 32 PCIe lanes that can be flexibly configured. For example, the lanes can be used for two full x16 interfaces, four x8 interfaces, or a combination such as two x8 and four x4. That is a substantial improvement in I/O capabilities. Intel also claims that, apart from these brute-force improvements, it has reduced latencies and improved direct PCIe to PCIe bandwidth. Another four PCIe 2.0 lanes are available for the DMI interface connecting the CPU to the chipset. The chipset in question is a variant of Patsburg called C602J, which has been around for a while.
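The example configurations above all add up to the same 32 lanes. A rough Python sketch of that bookkeeping is shown below. It assumes that only x4, x8 and x16 widths are allowed, which is inferred from the examples in this article, and it ignores the per-port bifurcation rules a real platform imposes.

```python
TOTAL_LANES = 32             # PCIe 3.0 lanes per CPU, per the article
ALLOWED_WIDTHS = {4, 8, 16}  # assumption: widths seen in the article's examples


def is_valid_split(widths):
    """True if the link widths are allowed and use exactly all 32 lanes."""
    return all(w in ALLOWED_WIDTHS for w in widths) and sum(widths) == TOTAL_LANES


if __name__ == "__main__":
    print(is_valid_split([16, 16]))            # two x16
    print(is_valid_split([8, 8, 8, 8]))        # four x8
    print(is_valid_split([8, 8, 4, 4, 4, 4]))  # two x8 plus four x4
    print(is_valid_split([16, 16, 4]))         # 36 lanes, too many
```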

Performance

Depending on the application, Intel touts a 1.7x to 2.4x performance increase over the previous generation. Thanks to wider vector instructions (AVX), the difference in pure matrix multiplication is a whopping 3.5x. These figures compare the Xeon E7-4870 with the Xeon E7-4890v2, which has 50% more cores and runs at a slightly higher frequency. The new model also comes at a higher maximum TDP of 155W versus the 130W of the Westmere-generation chip.

RAS Features

A distinguishing feature versus the lower Xeon E5 offerings is not only support for very large memories but also a rich set of RAS (reliability, availability and serviceability) features. This generation introduces Intel RunSure technology, an umbrella moniker covering methods to improve the reliability of RAM and the platform. For example, the CPU is capable of dynamic memory migration, hot plugging of memory boards and memory mirroring as well as execution path recovery. This is combined with extensive diagnostic features to detect potential problems early and react accordingly. These features are important for hitting the 99.99% uptime that is demanded in this segment.
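To put the 99.99% figure into perspective, the following sketch converts an availability percentage into the downtime it permits per year. It assumes a 365-day year, and the function name is our own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability_pct):
    """Downtime per year, in minutes, permitted by an availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 99.99% leaves a little under an hour of downtime per year.
    print(round(allowed_downtime_minutes(99.99), 1))
```

At 99.99% the budget is roughly 53 minutes per year, which is why features such as memory hot plugging and mirroring matter in this segment.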

Target Segments

Intel primarily aims at replacing older x86 systems as well as taking additional market share from RISC machines, which have faced a continuous decline over the last decade. IBM and Oracle seem to make it too easy for Intel, as Intel touts large double-digit performance advantages at prices that are again lower by large double-digit percentages. While we are aware that vendors often cherry-pick numbers to obtain the largest possible percentages, the advantages of the Intel platform are so big they can’t be discounted easily. Do note that these comparisons are based on SPECint_rate_base2006 numbers, which may or may not translate well to other workloads, as well as on whole system prices.