The Haswell Review - Intel Core i7-4770K Performance and Architecture

Haswell - A New Architecture

Thanks for stopping by our coverage of the Intel Haswell, 4th Generation Core processor and Z87 chipset release! We have a lot of different stories for you to check out and I wanted to be sure you knew about them all.

This spring has been unusually busy for us here at PC Perspective - with everything from new APU releases from AMD and new graphics cards from NVIDIA to new desktop and mobile processors from Intel. There has never been a better time to be a technology enthusiast, though some would argue that the days of the enthusiast PC builder are on the decline. Looking at the revived GPU wars and the launch of Intel's Haswell architecture and 4th Generation Core processors, we couldn't disagree more.

Built on the same 22nm process technology that Ivy Bridge brought to the world, Haswell is a new architecture from Intel that shifts the company's focus towards a single homogeneous design that can span wide-ranging markets. From tablets to performance workstations, Haswell will soon find its way into just about every crevice of your technology life.

Today we focus on the desktop though - the release of the new Intel Core i7-4770K, fully unlocked, LGA1150 processor built for the Z87 chipset and DIY builders everywhere. In this review we'll discuss the architectural changes Haswell brings, the overclocking capabilities and limitations of the new design, application performance, graphics performance and quite a bit more.

While Sandy Bridge and Ivy Bridge were really derivatives of prior designs and thought processes, the Haswell design is something completely different for the company. Yes, the microarchitecture of Haswell is still very similar to Sandy Bridge (SNB); the differences are more philosophical than technological.

Intel's target is a converged core: a single design that is flexible enough to be utilized in mobile devices like tablets while also scaling to the performance levels required for workstations and servers. Intel retains the majority of the architecture from Sandy Bridge and Ivy Bridge, including the core itself as well as the key features that make Intel's parts unique: HyperThreading, Intel Turbo Boost, and the ring interconnect.

The three pillars that Intel wanted to address with Haswell were performance, modularity, and power innovations. Each of these has its own key goals; for performance, these include improving the performance of existing (legacy) code and extracting greater parallelism with less coding work for developers.

The modularity of Haswell is what gives the processor design its extreme flexibility while providing a consistent optimization path for software developers. The ability for a designer to write an application that can run (though at different feature or performance levels) across the entire array of devices that Haswell will find its way into is powerful.

Haswell (at least in this iteration) will be available in a variety of configurations, including 2-4 processing cores, three different levels of graphics subsystem, and differing idle and active power levels, interconnects, and platforms. This will greatly widen the power and performance ranges of Haswell compared to Ivy Bridge (and Sandy Bridge), and it is enabled by the system agent that acts as the intermediary between all of the components on the SoC.

Intel also claims that Haswell will permit third-party IP integration, and thus will be capable of adding specific features and technologies as the OEMs demand.

Power Management

Power management sees some of the biggest alterations from previous architectures, with changes on Haswell addressing both active (in use) and sleep states. The goal is to lower the power consumption required during CPU load while also decreasing the amount of time it takes for the entire system to enter and leave sleep states. Intel introduces a new S0ix state, borrowed from the ultra-mobile designs of Atom, to get a 20x improvement in low-power states, which in turn improves realizable battery life.

Just as important as the new states themselves is that Intel claims they are completely transparent to "well written" software.

Other changes in the design also address power, including updates to Turbo Boost technology and more granular voltage and frequency "islands" for the CPU to enter. Also changed from SNB and IVB is that the frequency of the cores is decoupled from the ring bus, allowing voltages to scale more gracefully to where the power is actually needed. For example, Ivy Bridge and Sandy Bridge both required power to increase on the CPU cores when the GPU needed more bandwidth on the ring interconnect, which is a waste of valuable power.

While we talked about the idle power changes above, Intel also pointed out that at this point it is the only CPU vendor that has complete control over its manufacturing. Intel can utilize that advantage by tweaking the process in very specific ways to meet any goals that the engineers might have.

Because the majority of Haswell designs will be completely Intel-based platforms, it makes sense for Intel to address power at the platform level as well. You will see new voltage regulators and better power-managed controllers (embedded now) in addition to new IO options like I2C, SDIO and I2S that are traditionally only found in mobile devices. New link power states for traditional IO connections like USB and SATA are being introduced that can drop idle power draw to nearly zero watts.

Haswell Microarchitecture Changes

While the Haswell design is based mainly on the architecture introduced with Sandy Bridge, there are some changes that Intel made to improve performance in the more typical fashion with an eye towards IPC (instructions per clock).

There were no changes in the key pipelines of Haswell, but there were many areas that Intel said are "typical improvement points" for the company. The branch predictor has been improved, as this is usually the best return on time investment from a CPU design standpoint. Intel also increased the size of the buffers in the OOO (out of order) structures to improve the processor's ability to find parallelism and take advantage of it.

Throughput also sees a boost, with 8 total ports on the reservation station, adding another ALU, another branch unit, and a store address unit. This gives Haswell some improved metrics like two branches per cycle and two floating point MADDs per cycle - both improvements over what we saw in Sandy Bridge and Ivy Bridge processors.

New compute instructions expand on AVX, doubling both single precision and double precision FLOPs per core per cycle. Other new instructions accelerate very specific algorithms, with additions for bit extract and deposit, bit manipulation, rotates, etc.
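
As a rough sanity check on that doubling, the peak-throughput arithmetic can be sketched as follows. This is our own back-of-the-envelope model, not an Intel figure: the function name is hypothetical, and we assume 256-bit vectors, one vector multiply plus one vector add per cycle on Sandy Bridge and Ivy Bridge, and two fused multiply-adds per cycle on Haswell (each counted as two FLOPs per lane).

```python
VECTOR_BITS = 256  # assumed AVX vector register width


def peak_flops_per_cycle(element_bits, vector_ops_per_cycle, flops_per_lane):
    """Peak FLOPs per core per cycle for a given element size and issue rate."""
    lanes = VECTOR_BITS // element_bits
    return lanes * vector_ops_per_cycle * flops_per_lane


# Assumed older design: one multiply + one add per cycle, 1 FLOP per lane each.
snb_sp = peak_flops_per_cycle(32, 2, 1)
snb_dp = peak_flops_per_cycle(64, 2, 1)

# Assumed Haswell: two multiply-adds per cycle, 2 FLOPs per lane each.
hsw_sp = peak_flops_per_cycle(32, 2, 2)
hsw_dp = peak_flops_per_cycle(64, 2, 2)

print("single precision:", snb_sp, "->", hsw_sp)
print("double precision:", snb_dp, "->", hsw_dp)
```

Keep in mind these are theoretical peaks that only apply to code that can actually keep both multiply-add units fed every cycle.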

The cache implementation also sees interesting changes with Haswell, including a doubling of the L1 bandwidth to 32 bytes per access and one L2 cache read every cycle. Seeing both L1 and L2 cache bandwidths double in a single generation without changing the organization and size of those structures is impressive, though it needs more explanation as well.
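
To put the cache numbers in perspective, here is a small sketch that converts per-cycle widths into peak bandwidth. The inputs are our assumptions rather than measured values: two L1 loads per cycle at 16 bytes each on the older designs versus 32 bytes each on Haswell, an L2 read of 32 versus 64 bytes per cycle, and a 3.5 GHz core clock. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(bytes_per_access, accesses_per_cycle, clock_ghz):
    """Peak bandwidth in GB/s: bytes moved per cycle times cycles per nanosecond."""
    return bytes_per_access * accesses_per_cycle * clock_ghz


CLOCK_GHZ = 3.5  # assumed core clock

# Assumed L1 load path: two loads per cycle.
older_l1 = peak_bandwidth_gb_s(16, 2, CLOCK_GHZ)
haswell_l1 = peak_bandwidth_gb_s(32, 2, CLOCK_GHZ)

# Assumed L2 read path: one read per cycle.
older_l2 = peak_bandwidth_gb_s(32, 1, CLOCK_GHZ)
haswell_l2 = peak_bandwidth_gb_s(64, 1, CLOCK_GHZ)

print("L1 load GB/s:", older_l1, "->", haswell_l1)
print("L2 read GB/s:", older_l2, "->", haswell_l2)
```

These are per-core theoretical peaks; real applications will see less once misses and memory latency come into play.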

Another big upcoming change is the introduction of Transactional Synchronization Extensions (TSX). TSX is a method to improve concurrency and multithreading with as little work for the programmer as possible. By using these new ISA extensions, a developer can mark the start and end of a critical section of code to indicate that it can be attempted in parallel with other threads without first taking a lock. Hardware then manages the transactional updates and restarts execution if the block was not able to complete without a conflict.
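
The idea of "attempt the work, then commit or restart" is easier to see in code. The sketch below is only a software analogy and does not use TSX at all: real TSX does its conflict detection in hardware, while this toy version uses a version counter. The class and function names are hypothetical and exist purely for illustration.

```python
import threading


class VersionedCell:
    """A shared value plus a version counter used to detect conflicting writers."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()

    def snapshot(self):
        # Read a consistent (value, version) pair.
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        # Commit only if nobody else has committed since our snapshot.
        with self._commit_lock:
            if self.version != expected_version:
                return False  # conflict detected: the transaction aborts
            self.value = new_value
            self.version += 1
            return True


def run_transaction(cell, update):
    """Apply update optimistically, restarting on conflict. Returns the abort count."""
    aborts = 0
    while True:
        value, version = cell.snapshot()
        if cell.try_commit(version, update(value)):
            return aborts
        aborts += 1


# Four threads each increment the shared counter 1000 times.
counter = VersionedCell(0)


def worker():
    for _ in range(1000):
        run_transaction(counter, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final value:", counter.value)
```

The work itself runs without holding a lock for its whole duration; only the short commit step is serialized, and a thread that loses the race simply redoes its update. That is the behavior TSX aims to provide without the programmer writing the retry logic by hand.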

While this might be a pretty specific topic to discuss with our audience, the implications are impressive. Increasing the parallelization of software is one of the key issues holding back innovation on many levels. We have seen the GPU vendors fight this (think CUDA) for years, and Intel's continued push into the MIC (Many Integrated Core) markets will require it as well. If you are interested in this technology, you should check out David Kanter's detailed analysis of it.

Are there any OpenCL benchmarks forthcoming, and are there any gaming engines that will be able to utilize Haswell GPGPU + CPU cores for gaming physics while simultaneously using a discrete GPU for gaming graphics? Also, are any Lucid GPU virtualization software benchmarks going to be available for Haswell within the next few months? For desktop gaming, Haswell CPUs are always going to be paired with a discrete GPU, and being able to utilize the Haswell GPU for extra gaming compute would be a great boost, short of a 6 core Haswell appearing for the desktop!

Does anyone else see the problem with having 6 SATA3 ports?
They have not changed the 20Gbit DMI 2.0 connection between the CPU and chipset, so the performance of all these ports, if they are actually being utilized, is going to be crap. How can you expect to get anywhere close to the 36Gbit that the SATA3 ports should offer? And that's when you're not even taking into account the other IO, such as the extra SATA3 ports that some boards offer from add-on controllers, which likely use some of the PCI Express lanes from the chipset. It's all going to be incredibly bottlenecked by the DMI 2.0 20Gbit bus connecting the CPU to the chipset.

Yawn, what a pathetic showing from Intel.
What was the point of that cringe-worthy denial of stagnation next to an admission of 5% improvement?
Isn't it high time to face the music when the efforts of thousands of brilliant and highly educated people and billions in expenses yield a 5% improvement?

For GPU caps viewer, go to OpenCL tab, select the GPU device, then go to "More OpenCL information". That will display the exact list of OpenCL extensions supported. Your help will be greatly appreciated :)

Appreciate the time to write up the review Ryan, it's just a shame Intel is teasing the desktop market with empty promises and a pointless iGPU that nobody cares about. I have yet to meet someone buying an i5 or i7 for their desktop scream, "Oh man it's got this kick ass iGPU HD 4000 graphics man!"

AMD may be weak in the market, but at least they don't waste their time and effort creating an all-in-one chip with half the die being wasted adding unnecessary heat. They could start pushing 6 core chips instead into the top i5/i7 chips and use that extra space to push 8 core Extreme parts, but they don't.

It is so true: Intel's integrated GPU IP will not, for the foreseeable future, be able to keep up with AMD's offerings, as all AMD would have to do is up the integrated GPU execution resources of its current technology to easily overcome any Haswell gains! AMD's next generation hUMA APUs will leave Intel's marketing spin pros with the hard task of putting so much more lipstick on an overpriced integrated GPU pig! It is no wonder Intel marketing had to come up with the ultrabook form factor to get their Ivy Bridge HD 4000 and Haswell GT3 Crystalwell integrated graphics into products other than Apple laptops. Yes, let's build a form factor so thin that the only way to meet the thermal budget is to use Intel's CPU/(anemic) GPU product. AMD will upstage Intel on this front, at a much lower cost! I am just fine with a regular form factor laptop and a discrete GPU, and would be better served if I could get more CPU cores, as opposed to an overpriced Ultrabook with an overpriced CPU/GPU!

Looking at how AMD leads in the price performance (even the pretty old $100 A10-5800 is a better value than the $350 i7-4770!!!)...

Now we know why Intel CEO Otellini planned to officially jump ship on May 31, 2013: because the Haswell benchmarks would show what a terrible investment it was, with billions of dollars wasted and little to show for it.

Let's not forget Intel's poor graphics driver record, or the terrible driver update issues with the OEM-customized Intel HD graphics drivers from Intel's OEM partners! Paul (Chip Pimp) Otellini is gone after pulling that golden rip cord and bailing out! Intel, like M$, has had too much market share for too long, and this PC/laptop user has had enough of this WINTEL madness! I will stay with my Sandy Bridge and W7 laptop, and look for AMD's HSA offerings and Linux! Ultrabooks without a discrete GPU are an Ultra Joke!

Would like to see an article looking at power consumption compared to an i7 920. There are a few of us out there with the good old 920 overclocked to 4GHz+ burning up a heap of power. I'm wondering if it's worth the upgrade to Haswell to reduce power consumption, and how long it would take to pay off the upgrade.

Stepping up from a C2Q 9550 (same chip I have now) to just an i7 920 would be a huge leap, let alone SB being another sizable jump. With the 5% from both Haswell and IB, I think it's safe to say you will see a major performance boost even with a 1GHz OC on that chip you have now.

I haven't gone out to upgrade myself because I was a believer in the empty Haswell promises that wasted my time, but I work with machines that are SB i5s and they are smoking smooth, quiet, cool, and fast.

I've only heard of a lower TDP 65W model that has the eDRAM onboard (flagship iGPU) that is supposed to be comparable to the i7 4770K, but I really don't see how that is possible.

Anyways, I wouldn't call anything with more "GPU" power to be a top performer on the 4770K lineup because to be quite honest, nobody buying those chips is looking for the integrated GPU component. They'd probably sell better if they took that space and replaced it with 2 extra cores. People would have far less to bitch about and you'd see performance gains that would give Intel another 4 years of this 5% performance boost before people start bitching about monopoly.

Not to sound like a dick, but the first page was just a waste of time. I'm not a design engineer; now, if I had access to the equipment, I'd be more than glad to study the architecture.

I do know what you're talking about, but a newbie or first timer wouldn't have a clue, because you're throwing out words with no meaning and no diagrams showing where things come in, where they go out, and what they're connected to. Long story short, I got bored very fast and just wanted to skip the first page altogether, but didn't.

In the future don't throw up shit like this unless you have some sort of diagram to follow. Tom's Hardware doesn't do this and neither does HardOCP. Keep it simple but still enlightening for the reader, not slideshow screenshots from IDF.

Second page: well, let's just say I didn't pay for a $400 graphics card to be reading about Intel's GT2 architecture and mobile crap. Then again, some people are probably interested in this stuff, but I doubt anyone that reads this website is.

Wow. Sort of cool but barely evolutionary and nothing crazy new. So glad I bought a beefy 2600K and a sick GB Z68. I knew the rumors around Haswell were too good to be true. The bottom line of this review should be: "If you are a PC gamer with a fast GPU and an i7, ignore Haswell altogether." Honestly, did we hit a wall? Is 5GHz on 8 cores good enough for anything? I will wait (probably for a long time) for the CPU that starts to crush my 2600K in gaming FPS. Glad to see my investment still giving me returns despite several new CPU releases.

Truth be told: Sandy Bridge was the big leap in gaming CPUs. Everything since then has been extremely underwhelming and incremental. Great review as always guys.

I have a new rig: an Asus Maximus VI Extreme board and a Core i7-4770K, but it won't give me any display via HDMI. Please help!! The only way I am able to use my desktop now is that I have temporarily installed an HD 7770 and am using its HDMI output for display.

Thanks in advance. My retailer told me that since it's a K processor you need a graphics card for display!!?!

@Ryan! Could you adjust the message signaled interrupt value to one per core per device in the future (if you aren't already), specifically for CPUs with a GPU onboard? Drivers are limited to one interrupt per socket per device. (Yes, it is limited! But MS suggests one MSI per physical CPU, and now each core is a CPU.) I feel it isn't fair for CPUs that include a GPU to ignore this! Why do I ignore normal GPUs? Diminishing returns. I feel this has a more dramatic impact on an APU like Haswell or Jaguar than on a desktop with a GPU like a 7970. Ty Ryan

PS:Drivers can register a single InterruptMessageService routine that handles all possible messages or individual InterruptService routines for each message

i7 920 fanatics: the 920 is a great CPU if you are gaming, doing Photoshop and other light stuff. If you are doing 3D, video editing and compositing and other heavy stuff, the 4770 will wipe the floor with your 920, in performance and power consumption.