Analysis: Intel Haswell-E, DDR4 and X99 in detail

02 September 2014

By Wesley Fick

Intel’s Haswell-E family launched last week and things have been heating up in discussion threads on the internet and in videos on YouTube. Haswell-E is one of the most anticipated launches for the high-end desktop crowd, and a lot of people who need the extra horsepower on offer are eagerly watching for benchmarks that will finally push them to check out the parts they’ve gathered in their shopping carts. Not only is Intel changing the game as far as core counts are concerned, it’s also offering completely new board logic, new functionality that wasn’t possible before and compatibility with DDR4 memory, itself a brand new addition to the market. Let’s dive into Haswell-E and look at why you’d want it and what it’s going to cost you.

What’s different to Haswell and older E-series chips?

Firstly, Haswell-E is to Haswell as Ivy Bridge-E and Sandy Bridge-E were to their regular desktop counterparts. What essentially goes on here is that Intel ports its desktop architecture over to the new socket and platform, swapping out a few bits here and there to make things work properly. That’s a crude way of looking at it, but it’s fundamentally what changes between the two platforms – the graphics hardware gets cut out and the memory controller gets swapped for a quad-channel DDR4 version. Intel also does a lot of reworking and tweaking of the architecture to accommodate higher core counts, and it makes some very drastic changes to how the chips handle TDPs and ECC memory.

In past years the E-series desktop processors would be derived from the Xeon family, with disabled features, cores and cache to help differentiate them properly from the Xeons. This is the first time that Intel is spinning up a completely separate line for the HEDT platform – from the Core i7-5960X down to the i7-5820K, the chips share a native eight-core die that isn’t based on a ten or twelve-core Xeon chip with features chopped out during the binning process.

This puts Haswell-E into a completely different class of performance. The Core i7-5960X ships with all eight cores enabled with hyper-threading along with a staggering 20MB of L3 cache, clocked at 3.0GHz with a boost speed of 3.5GHz and a TDP of 140W. It’s a monster, but it beats down many other chips that came before it in terms of per-clock performance and thread counts. Let’s get that out of the way before we go further – you’re going to see the i7-5960X perform on average around 5% better than the i7-4960X in many benchmarks (mostly ones that aren’t well-threaded). It has a clock speed deficiency compared to the older Ivy Bridge-E chip and also has to deal with a lot more heat.

But that’s the fun part – all eight cores at 3.0GHz match, at the very least, a six-core, twelve-threaded chip that has a stock speed of 3.6GHz boosting to 4.0GHz. That means that Intel has not only improved IPC, it’s also made Haswell-E massively more efficient at the same tasks with a lower clock speed. Out of the box it has a small performance lead, but once overclocked it’ll have a massive one when it comes to multi-threaded applications.
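As a rough back-of-the-envelope check on that claim, the sketch below multiplies core count by base clock for the two chips. The `core_ghz` helper is a made-up name for illustration, and the metric is deliberately crude: it assumes perfect scaling across cores and ignores IPC, turbo behaviour and memory bandwidth.

```python
# Crude "core-GHz" comparison using the base clocks quoted in the article.
# Assumption: perfect multi-core scaling and identical IPC, which is NOT true
# in practice. Treat the result as a rough guide only.

def core_ghz(cores: int, clock_ghz: float) -> float:
    """Return cores multiplied by clock speed as a naive throughput proxy."""
    return cores * clock_ghz


i7_5960x = core_ghz(8, 3.0)  # eight cores at a 3.0GHz base clock
i7_4960x = core_ghz(6, 3.6)  # six cores at a 3.6GHz base clock

# Relative advantage of the newer chip on this naive measure.
advantage = i7_5960x / i7_4960x - 1.0

print(f"i7-5960X: {i7_5960x:.1f} core-GHz")
print(f"i7-4960X: {i7_4960x:.1f} core-GHz")
print(f"Naive multi-threaded advantage: {advantage:.1%}")
```

On this naive measure the eight-core part has roughly 11% more aggregate clock to work with before any IPC gains are counted, and that only applies to workloads that can keep every core busy.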

Pricing is another consideration that Intel took a hard look at when figuring out how to handle the Haswell-E launch. Although it would be possible to start the pricing for the Core i7-5820K above the Core i7-4820K considering you’re getting a lot more bang for your buck, Intel actually decided to make the X99 platform cheaper to get into. The Core i7-5820K kicks off at US $389 which is pretty good for tray pricing. It starts off with six cores and twelve threads, a base clock of 3.3GHz boosting to 3.6GHz, 15MB of L3 cache, an unlocked multiplier and DDR4 compatibility.

The funny bit about the Core i7-5820K is that if you overclock it only by 300MHz for the base and boost clocks, it’ll be faster than the Core i7-4930K and it’ll even match the Core i7-4960X quite easily. There’s nothing like the smell of a chip that costs under $400 that can easily match the flagship product of yesteryear with just one simple multiplier tweak. You don’t even have to change any voltages to achieve a 300MHz overclock.
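To make the multiplier tweak concrete, here is a minimal sketch of the arithmetic. It assumes a stock 100MHz base clock (BCLK), so each multiplier step is worth 100MHz; the function and variable names are invented for this example and don’t come from any BIOS or tool.

```python
# Multiplier arithmetic for a 300MHz overclock on the Core i7-5820K.
# Assumption: the base clock (BCLK) stays at the stock 100MHz.

BCLK_MHZ = 100


def clock_ghz(multiplier: int, bclk_mhz: int = BCLK_MHZ) -> float:
    """CPU clock in GHz for a given multiplier and base clock."""
    return multiplier * bclk_mhz / 1000


STOCK_BASE_MULT = 33   # 3.3GHz base, per the article
STOCK_BOOST_MULT = 36  # 3.6GHz boost, per the article
EXTRA_STEPS = 3        # three multiplier steps = +300MHz at 100MHz BCLK

oc_base = clock_ghz(STOCK_BASE_MULT + EXTRA_STEPS)
oc_boost = clock_ghz(STOCK_BOOST_MULT + EXTRA_STEPS)

# Clocks quoted in the article for the older six-core Core i7-4960X.
I7_4960X_BASE, I7_4960X_BOOST = 3.6, 4.0

print(f"i7-5820K overclocked: {oc_base:.1f}GHz base, {oc_boost:.1f}GHz boost")
print(f"i7-4960X stock:       {I7_4960X_BASE:.1f}GHz base, {I7_4960X_BOOST:.1f}GHz boost")
```

With the base clocks level at 3.6GHz and both chips at six cores, whatever per-clock improvement Haswell brings is what closes the remaining gap.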

Next up is the Core i7-5930K with six cores, twelve threads, 3.5GHz base clocks boosting to 3.7GHz and the same 15MB of cache. It sells for almost a $200 premium, but has some features and functionality that would be worth consideration for the extra spend. The lineup ends with the $999 Core i7-5960X with all the bells and whistles of the Haswell-E platform. There’s really no denying that it is a monster and there’s nothing, absolutely zilch out there, from AMD that can remotely compete with it. But there are caveats to the platform that need to be considered.

What’s the deal with X99 and the PCI-E segmentation?

Well, that’s where things change the most and people need to pay special attention. The X99 platform comes with the new LGA2011-3 socket, which has pin-out changes compared to the LGA2011 socket it’s based on. There’s no compatibility with older X79 chips, no matter how close the pinouts are to each other. This is not one of Intel’s usual tricks to get you to buy a new board; it’s a change necessitated by the adoption of DDR4.

It’ll be the same for AMD and Intel’s upcoming Skylake platform as well. It’s a quad-channel DDR4 setup, with the option for manufacturers to put up to eight DIMM slots on the motherboard for a total of 128GB of DDR4 memory. The 16GB DDR4 modules are still in the validation phase, but they’re coming for sure.
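The capacity figure, and the bandwidth a quad-channel setup offers on paper, both fall out of simple arithmetic. The sketch below uses the standard peak-bandwidth formula for a 64-bit (8-byte) memory channel and assumes DDR4-2133, the speed Intel validates these chips for. The helper names are illustrative, and real-world throughput will be lower than the theoretical peak.

```python
# Maximum capacity and theoretical peak bandwidth for a quad-channel DDR4 setup.
# Assumption: each channel is 64 bits (8 bytes) wide, as on standard DDR DIMMs.

def max_capacity_gb(dimm_slots: int, module_gb: int) -> int:
    """Total memory if every slot holds a module of the given size."""
    return dimm_slots * module_gb


def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: transfers per second x bytes per transfer x channels."""
    return transfer_rate_mts * bus_bytes * channels / 1000.0


print(f"Capacity:  {max_capacity_gb(8, 16)}GB with eight 16GB modules")
print(f"Bandwidth: {peak_bandwidth_gbs(2133, 4):.1f}GB/s peak, quad-channel DDR4-2133")
print(f"Bandwidth: {peak_bandwidth_gbs(2133, 2):.1f}GB/s peak, same speed in dual-channel")
```

Doubling the channel count doubles the theoretical peak at the same module speed, which is the main thing the quad-channel controller buys you over a dual-channel platform.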

No, what’s really going to be a problem for some people is the PCI-Express layout and Intel’s segmentation of the product line.

With the X58 platform and Nehalem processors, you had 40 lanes of PCI-E 2.0 to play with. In X79 and Ivy Bridge, that was raised to 40 lanes of PCI-E 3.0 even if you had the lowly Core i7-4820K. But if all you were getting was similar CPU performance to the Core i7-4770K and just a few PCI-Express lanes over the Z87/Z97 platform, you may as well just stay with the cheaper offering for dual-GPU setups. Although the X79 platform had and still has a lot to offer to enthusiasts, you had to spend lots more money to see a boost in overall compute power as well.

Now with Haswell-E we get the increased core and thread count from the Core i7-5820K as well as more L3 cache, but there’s a reduction in PCI-E 3.0 lanes from 40 to just 28. That limits buyers to the following scenarios when using more than one GPU:

Triple GPU setups with all cards at x8 speeds, four PCI-E lanes left for other expansion cards

Quad-Crossfire with lanes set at 8x/8x/8x/4x

Which, honestly, is still very flexible. Only AMD’s Crossfire technology allows for the fourth card to be run on four lanes of PCI-E 3.0, while Nvidia disables this capability for SLI. This means that there’s a real, value-based incentive to not only pony up for the Core i7-5820K if you were already looking at the Z97 platform, but there’s also a reason, now a valid one, to move up to the i7-5930K as well.
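The lane budgets above are easy to sanity-check. This is a minimal sketch, with a made-up `spare_lanes` helper, that adds up the slot widths of a layout and reports what’s left over for other devices. The x16/x16/x8 layout for a 40-lane chip is my own example and isn’t taken from any particular motherboard manual.

```python
# Check whether a multi-GPU slot layout fits within a CPU's PCI-E lane budget.
from typing import Sequence


def spare_lanes(total_lanes: int, slot_widths: Sequence[int]) -> int:
    """Return the lanes left over after wiring up the given slot widths.

    Raises ValueError if the layout needs more lanes than the CPU provides.
    """
    used = sum(slot_widths)
    if used > total_lanes:
        raise ValueError(f"layout needs {used} lanes, CPU only has {total_lanes}")
    return total_lanes - used


LANES_5820K = 28  # Core i7-5820K, per the article
LANES_FULL = 40   # chips with the full lane count

print(spare_lanes(LANES_5820K, [8, 8, 8]))     # triple GPU at x8 each
print(spare_lanes(LANES_5820K, [8, 8, 8, 4]))  # quad-Crossfire at 8x/8x/8x/4x
print(spare_lanes(LANES_FULL, [16, 16, 8]))    # hypothetical 40-lane layout
```

Trying `spare_lanes(LANES_5820K, [16, 16])` raises an error, which is the practical cost of the cut: two cards at full x16 speed simply don’t fit in 28 lanes.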

Some attention needs to be paid to how motherboard vendors choose to approach this flexibility, though. Some won’t allow for four AMD GPUs in Crossfire, while others will use the extra four lanes that are available in dual and triple-GPU setups to implement some other interesting features, like on-board M.2 connectors or a built-in Thunderbolt 2 port.

There are even some motherboards that are geared for and expected to work with a chip that has all 40 lanes of PCI-E 3.0 available. You can see in the motherboard manuals available online how the manufacturer approaches the PCI-Express layout so that you’re not short-changed. At the very least, all X99 motherboards should be able to do a triple GPU setup with four lanes left for whatever you want.

One little note about DDR4

Although Intel has validated the Core i7-5820K, i7-5930K and i7-5960X for use with DDR4-2133 memory, it officially supports most DDR4 speeds up to and including DDR4-2666. However, anything higher than that is unofficially supported.

The reason for this is that, with DDR4 still in its infancy, the 3000MHz modules require overclocking the system bus to achieve those frequencies. This means that any system that needs to be completely stable or stock for use with professional applications and such would be better off with DDR4-2666 RAM, because that would be a selectable speed from the list of XMP-set speeds inside the BIOS.

It’s even stated in many motherboard manuals that inserting DDR4-3000 modules will cause the UEFI BIOS to automatically overclock the BCLK to support the memory speed. As time goes by and the memory chips get cheaper and have tighter timings, perhaps the motherboard vendors will abandon this method of ensuring that you get the speed you were promised.
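Here is a rough sketch of why raising the base clock drags the rest of the system along with it. Every number in it is an illustrative assumption rather than something lifted from a board manual: a 100MHz stock base clock, a 125MHz raised one, a memory ratio of 24 and the Core i7-5820K’s 3.3GHz base clock as the CPU target. The function names are invented for the example.

```python
# Memory speed and CPU clock are both derived from the base clock (BCLK),
# so raising BCLK to reach a memory speed also changes the CPU clock.
# All values below are illustrative assumptions, not vendor data.

def memory_rate(bclk_mhz: float, memory_ratio: float) -> float:
    """Effective DDR4 data rate for a given base clock and memory ratio."""
    return bclk_mhz * memory_ratio


def compensating_multiplier(target_cpu_mhz: float, bclk_mhz: float) -> int:
    """Largest whole CPU multiplier that does not exceed the target clock."""
    return int(target_cpu_mhz // bclk_mhz)


STOCK_BCLK = 100.0     # assumed stock base clock
RAISED_BCLK = 125.0    # assumed raised base clock
MEMORY_RATIO = 24      # hypothetical memory ratio
CPU_TARGET_MHZ = 3300  # i7-5820K base clock from the article

new_mult = compensating_multiplier(CPU_TARGET_MHZ, RAISED_BCLK)

print(f"Memory at stock BCLK:  DDR4-{memory_rate(STOCK_BCLK, MEMORY_RATIO):.0f}")
print(f"Memory at raised BCLK: DDR4-{memory_rate(RAISED_BCLK, MEMORY_RATIO):.0f}")
print(f"CPU multiplier drops from 33 to {new_mult}, giving {new_mult * RAISED_BCLK:.0f}MHz")
```

Under these assumptions the CPU can’t land exactly on its stock clock any more, which is the kind of side effect that makes a strictly stock, stability-first build better served by a memory speed that doesn’t need the base clock touched.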

It’s really not bad. With launch prices as they are now it’s going to be a little more expensive to land a good set of DDR4 memory, particularly because this is only the first week and a bit out of launch and there won’t be a lot of stock to go around. The majority of people will be aiming for the Core i7-5820K because it isn’t horribly crippled, together with the MSI X99 SLI Plus, currently the cheapest X99 motherboard available in South Africa.

DDR4 prices seem to be mostly okay for a quad-channel 16GB kit, but I expect these to drop by around R500 to R800 between now and the end of the year. Intel is putting DDR4 in the high-end first because it wants to see increases in DDR4 chip production and improvements in the chips’ performance before it moves on to Skylake, which will be the consumer offering with DDR4 compatibility. So if you’re buying into X99 today you’re a guinea pig for Intel, but that shouldn’t impact performance much.

For the moment, there’s no reason to invest in stupidly high-priced DDR4-3000 modules because the technology is still so new and there’s that system bus overclocking thing you need to keep in mind. I’d recommend that people stick with DDR4-2133 or 2400MHz modules for the moment. These will still be reasonably affordable and you can always upgrade to something better in a year’s time once things are more solid. I’d also bet that many of the DDR4-3000 chips are overclocked from 2400MHz or 2666MHz, so most 2133MHz kits could overclock to higher speeds without much fiddling.

So all in all, the cheapest processor with the cheapest motherboard and reasonably cheap memory sets you back R10,931. Not too shabby.

Gaming performance is there, but not always

One of the difficulties in talking about game performance with any chip more expensive than the Core i7-4790K is that people automatically assume frame rates will just be higher, but there’s more to it than that once you go over four cores or eight threads. There are a lot of games out there that just don’t scale at all with more cores. There are a lot more that scale but do so poorly, or scale because there’s more clock speed and multithreading wasn’t on the mind of the developer when making the game.
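One common way to reason about this kind of scaling is Amdahl’s law, which caps the speedup from extra cores by the share of work that can actually run in parallel. The sketch below is purely illustrative: the 50% parallel fraction is a hypothetical figure I’ve picked for the example, not a measurement of any game, and it ignores clock speed differences entirely.

```python
# Amdahl's law: speedup from N cores when only part of the work is parallel.
# The parallel fraction below is a hypothetical value for illustration only.

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core for the given parallel fraction and core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


PARALLEL_FRACTION = 0.5  # hypothetical: half of each frame's CPU work is parallel

for cores in (4, 6, 8):
    print(f"{cores} cores: {amdahl_speedup(PARALLEL_FRACTION, cores):.2f}x")
```

With half the work stuck on one thread, doubling from four to eight cores is worth only about 11% in this model, so a few hundred extra megahertz on a quad-core can easily matter more.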

For the most part the benefits of the Core i7-5820K over the outgoing Core i7-4820K are lower frame times with multiple GPUs, smoother frame delivery at higher resolutions and better resource allocation for games that can take advantage of more than four cores at the same time.

PC Perspective’s tests, starting with Battlefield 4 (albeit in single-player), show that the game doesn’t really scale well beyond four cores and eight threads, even on triple SLI. A hardware bug prevented them from running their Crossfire tests, which is why AMD wasn’t included in their review. Although the Core i7-5960X does pull ahead very slightly in some parts of the benchmark, in others the low clock speed counts against it.

I’d wager that the very high frame variance is because Battlefield 4 doesn’t play well with triple-GPU setups, a common issue with the Frostbite 3.0-powered multiplayer shooter. They’re pretty much GPU-limited at this point and only more clock speed might help the case of the Haswell-E processors.

Crysis 3 has a similar problem, only this time there’s a big difference in performance. Because the Core i7-5960X has a TDP of 140W and eight cores at 3.0GHz, it can only boost up to 3.5GHz at the most. Compared to the Core i7-3960X, however, it has higher per-clock performance and is able to keep its boost clock relatively high during demanding scenes.

The only explanation for the hiccup closer to the beginning of the benchmark is the Core i7-3960X somehow throttling itself to stay within its 130W TDP.

Crysis 3 really just likes clock speed, which is why the Core i7-4790K walks away with the overall win. That’ll probably be a trend for any other games based on CryEngine, one of which is Chris Roberts’ Star Citizen. Overall, though, the benchmark remains GPU-limited.

GRID 2 is the first out of PC Perspective’s tests to suggest that there’s any reason to go with the Core i7-5960X. Although the difference is small, average and minimum framerates are higher on the Haswell-E chip to the tune of around 10%, which is a good achievement.

That lead will increase in games that can take advantage of more cores, but I doubt there’ll be anything coming out soon that will give a Core i7-5960X reason to struggle.

When a game isn’t GPU-limited all the time, the effects of the lower clock speed and higher TDPs compared to Haswell on the Z97 platform are pronounced. Anandtech’s testing showed that at the high end of the scale, not much separates Bioshock Infinite’s scores across CPUs when the load’s all on the GPU. Even Intel’s Pentium G3258 does well here with a 53W TDP, two Haswell-based cores, 3MB of L3 cache and a hefty overclock to 4.7GHz.

It falls down for Haswell-E and a lot of other high-end chips when the workloads increase, however. Most of the modern chips on LGA2011 or LGA2011-3 struggle to post minimum frame rates higher than 20fps, and all of them get a resounding beating from the Pentium G3258, which is clearly unfettered by its TDP. This will be an issue with any Unreal Engine 3-based title that doesn’t require more than two fast cores, and that includes others like Batman: Arkham Origins and Borderlands: The Pre-Sequel.

Conclusion

There are a few things to take note of here. One is that Haswell-E is the best HEDT platform Intel has ever made. It’s very flexible, comes with a boat-load more features and improvements over X79 and is reasonably affordable to jump into, even if DDR4 is the main culprit in the higher prices. If you need the extra flexibility and don’t want to be limited to just 16 PCI-E 3.0 lanes on mainstream hardware, look no further.

However, there’s noticeable GPU bottlenecking with modern games today and we need faster GPUs to see any benefits from the extra cores. We also need games that stress out the hardware more, but they’re on the way. The Witcher 3 and Star Citizen will certainly be strong candidates for bringing computers to their virtual knees all around the world.

There’s also now a persistent issue across three generations of HEDT chips: a massive drop in performance when the TDP is completely maxed out. That results in big performance dips on the Core i7-5960X because of the lower clock speed that it starts with. When the going gets tough, there’s a very good chance that a lowly Core i3-4330 will deliver better minimum frame rates because it’s not being asked to nearly melt itself while playing a game. Intel needs to work on getting that problem sorted out ASAP.

Outside the scope of gaming, Haswell-E changes the game significantly for Intel and AMD. Having a chip capable of addressing sixteen threads is no laughing matter and previously you’d have to be running two or more Xeon processors to get the same core count. It doesn’t stop there either – the Xeon family goes all the way up to a twelve-core, 24-thread behemoth. That is ludicrous amounts of computational power and it’s something that AMD can’t possibly compete with at this point.

My final recommendation is that if you were thinking of setting up a triple-GPU rig, Haswell-E is the platform for you. It’s more flexible and doesn’t require expensive PLX chips to be added to the BoM for your motherboard. If you’re running quad SLI or Crossfire with two GPUs, Haswell-E is again a better platform because you won’t be limited by CPU performance, but you’ll need to splurge on the Core i7-5930K for that to happen.

When it comes to single-GPU systems and those of you planning to run plain SLI or Crossfire, the regular Haswell and Z97 platform is the place to be. You’d have to go without DDR4, but that’s not much of a drawback for now. Perhaps it might become a necessity in two years’ time, but not now.