Judging by Linux enablement, Broadwell is coming along nicely. Kernel 3.14/3.15 should do the job on the kernel side, and Mesa is also progressing well. H2 2014 should see the Broadwell out-of-the-box experience at a decent level on stable Linux distros.

Right before the Linux kernel reaches 3.15, or right after. All the basics for Broadwell have been added for 3.14 but are partly disabled by default on the GPU side. The PCI IDs were already added.

As mentioned, Haswell-E will use DDR4. (More specifically, the LGA2011-3 spec uses DDR4; it is NOT compatible with the existing LGA2011.)

The main differences between Broadwell and Haswell (that I know of so far) are:

* Die shrink; 14nm process node.

* GT3e (Iris Pro IGP) will be available in more models compared to Haswell. => Expect it in the multiplier-unlocked "K" versions of the Core i5 and i7 desktop processors. => Still uses the Gen8 architecture IGP.

* Both Haswell Refresh and Broadwell will use a different LGA1150 socket variant. (Physically unchanged, but electrically different from the current LGA1150: Intel has changed the electrical specifications, effectively killing compatibility between current Haswell and the upcoming Haswell Refresh/Broadwell processors!)

No, Haswell Refresh is expected to be just a minor clock-speed bump with respect to current Haswell (and a die stepping change at best), and thus compatible with the current LGA1150 socket used by the current Haswell line-up.

As an argument supporting this state of things, Gigabyte recently (two weeks ago) released a new F8 BIOS for my motherboard (Z87X-UD3H) stating "Support New 4th Generation Intel Core Processors", while all existing Haswell models were already supported by previous BIOSes (F7 and earlier).

Perfect is ill-defined; engineering is all about tradeoffs, and one person's perfect processor is not another's. That said, we've enjoyed gains in processor speeds in large part due to fabrication process shrinks, and because of them more cache and transistors thrown at problems. We should start hitting some weird issues after 14nm, so I'll be curious to see what the industry does after that: whether they can keep shrinking, or whether we'll be stuck there for a while with architecture as the only way to improve performance.

Maybe by "perfect" s/he meant either the point where further (noticeable) efficiency is (virtually) impossible ( shrinks aside) or the point where things are "fast enough" for (virtually) any user. Or maybe the point where further efficiency gains can no longer (economically) be made.

I'd really love to see AMD pull a rabbit out of a hat with their next CPU design. If they can't, I really hope ARM can. Haswell is an architectural masterpiece, yet somehow at the desktop/workstation level it fails to do much for performance/watt or overclocking over IB/SB. None of that makes much sense. If the -Y and -U chips see major gains in graphics and general processing, why don't the -K chips too?

Because Intel is optimizing its CPU designs for increasingly low power levels. The sweet spot used to sit between full-power laptop CPUs and mid-range desktops, leaving a decent amount of headroom above the top high-end parts for additional OC. Now the optimum is centered on the low-power laptop parts, so the equivalent of the OC gain from more power we used to get is being used up in the spread between full-power laptop and desktop parts, with very little left for those of us willing to crank the power up even higher.

"the perfect architecture" there's no such thing, even if you refer to something as simple as x264 encoding a UHD-1 video...

for instance "Cisco is predicting a nearly 11-fold increase in global mobile data traffic over the next four years to reach 190 exabytes in 2018, with Asia Pacific leading the way ..."

The fact is, estimating from the new Haswell numbers above, it's clear that you can't real-time encode x264 UHD-1 3840×2160 video with high-quality settings, never mind do the real HEVC Main 10 profile UHD-2 7680×4320 digital broadcast expected 3-6 years from now.
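To put those resolutions in perspective, here is the plain pixel-rate arithmetic (my own illustration; real encode cost also depends heavily on the encoder settings and content): UHD-1 is 4x the pixel rate of 1080p, and UHD-2 is 16x.

```python
# Pixel-rate comparison for the resolutions mentioned above (pure
# arithmetic; actual x264/HEVC encode cost also depends on settings).
def pixel_rate(width, height, fps):
    """Pixels per second for a given resolution and frame rate."""
    return width * height * fps

fhd  = pixel_rate(1920, 1080, 30)   # 1080p baseline
uhd1 = pixel_rate(3840, 2160, 30)   # UHD-1 "4K"
uhd2 = pixel_rate(7680, 4320, 30)   # UHD-2 "8K"

print(f"UHD-1 vs 1080p: {uhd1 // fhd}x the pixel rate")   # 4x
print(f"UHD-2 vs 1080p: {uhd2 // fhd}x the pixel rate")   # 16x
```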

If you're asking what comes next to accommodate this massive data throughput for real-time encoding for the masses, then it's probably going to be Wide I/O 2.5D, then 3D High Bandwidth Memory, and then Si photonics in combination with these options...

Although you just know that the antiquated server providers will try to keep that away from the masses as long as possible to keep their margins high.

Disregard, sorry; for some reason I thought you meant transistor count and not die area. I'm actually a little curious why the transistor counts don't seem to correlate much with die area, despite the fact that they're all the same architecture. Are ULT processors manufactured on a different, power-optimized 22nm process?

That might make sense if ULT 2+2 had a larger GPU than quad-core GT2, but it doesn't. It has the same GPU configuration and half the CPU cores, yet it's approximately the same die area as quad-core GT2.

I think it's interesting that they used the same OPIO circuitry for the eDRAM and the on-package PCH in the ULT models. I was wondering why there wasn't also an on-package PCH with the GT3e models, but I'm now really impressed with the answer. I was also under the impression that the eDRAM was built on the 32nm SoC process, so a lot of good info is presented here.

Is there any news on why Xeon and all server chips are always one node behind? And soon the desktop Haswell Refresh as well? Will Intel simply relegate all desktop and server parts to older nodes while focusing new node development on ultra-low-power laptop and mobile devices?

The Xeon variants remain one node behind due to obligations for extended validation and support of the platforms. Intel can develop a new architecture and get it ready for consumer-level release, making back the R&D money they're putting into it, while simultaneously developing, testing, and validating the Xeon variants. It just makes sense to release Xeon on a mature architecture (i.e., one generation behind), as there is a significant amount of time and money involved in adapting the current generation to the Xeon platform requirements.

Lately, Intel is moving in the direction you describe: targeting mobile (well, laptop) processors first, followed by desktop, and then server/workstation. That makes sense as well, as laptop processor sales are outstripping desktop. As a bit of speculation, it may be easier to target the lowest-power parts first and scale up, rather than trying to do the reverse.

"The 128MB eDRAM is divided among eight 16MB macros. The eDRAM operates at 1.6GHz and connects to the outside world via a 4 x 16-bit wide on-package IO (OPIO) interface capable of up to 6.4GT/s. The OPIO is highly scalable and very area/power efficient.

The Haswell ULT variants use Intel's on-package IO to connect the CPU/GPU island to an on-package PCH. In this configuration the OPIO delivers 4GB/s of bandwidth at 1pJ/bit. When used as an interface to Crystalwell, the interface delivers up to 102GB/s at 1.22pJ/bit. That amounts to a little under 1.07W of power consumed to transmit/receive data at 102GB/s.

By keeping the eDRAM (or PCH) very close to the CPU island (1.5mm), Intel can make the OPIO extremely simple."

Hmm, Anand, care to explain how and why you state "a 4 x 16-bit wide on-package IO (OPIO) interface capable of up to 6.4GT/s"? That's a total of 25.6GB/s (at 4.8GT/s it would provide 19.2GB/s total bandwidth to the processor), and yet you say "the interface delivers up to 102GB/s at 1.22pJ/bit. That amounts to a little under 1.07W of power consumed to transmit/receive data at 102GB/s", implying that it has 4 times the bandwidth in real terms?

So is the "4 x 16-bit wide" being used here to obfuscate the fact that it's really a max data throughput of 6.4GB/s per link x 4, matching the generic QuickPath Interconnect speeds?

4 x 16-bit is a 64-bit interface, and 64 bits at 6.4GT/s is 51.2GB/s raw bandwidth per direction (no overhead accounted for). The 102GB/s total throughput Anand states sounds like the bus is bi-directional, with 51.2GB/s possible in each direction. 51.2GB/s x 2 (both directions at the same time) gives you the "up to 102GB/s" throughput overall.

The 102.5 number appears to come from Intel slide #16 (AT gallery #6). That slide says there are 8 data clusters operating at 6.4GB/s each, with the final x2 apparently coming from the bus supporting simultaneous transfer in each direction.
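The arithmetic behind those two replies can be sketched like this (my reconstruction; the interface width, transfer rate, and pJ/bit figures are from the article excerpt quoted above):

```python
# Reconstructing the OPIO bandwidth/power figures quoted from the article.
# Assumes a 4 x 16-bit interface at 6.4 GT/s, bidirectional, 1.22 pJ/bit.
lanes_bits = 4 * 16                  # 64-bit wide interface
transfer_rate = 6.4e9                # transfers per second (6.4 GT/s)

per_direction_gbs = lanes_bits * transfer_rate / 8 / 1e9   # GB/s one way
total_gbs = per_direction_gbs * 2    # both directions simultaneously

energy_per_bit = 1.22e-12            # joules per bit (1.22 pJ/bit)
power_w = total_gbs * 1e9 * 8 * energy_per_bit

print(f"per direction: {per_direction_gbs:.1f} GB/s")   # 51.2 GB/s
print(f"total:         {total_gbs:.1f} GB/s")           # 102.4 GB/s
print(f"power:         {power_w:.2f} W")                # ~1 W at full rate
```

The computed ~1W is in the same ballpark as the "a little under 1.07W" quoted in the article.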

I'm sorta curious about the potential for CRW, or some future version of it, for CPU performance.

More-dynamic programming languages tend to have largish working sets and lots of indirection. In general, the processor still stalls waiting on RAM a lot across lots of workloads. Maybe an "L4" could be a nontrivial win for server-y workloads if the latency/size were right and they shipped it for servers. It's hard to tell; the fact that they're not talking about CRW as a CPU boost does say something.

Well, Akaz, you can get a Bay Trail-T based processor in an iPad-like format that gets 10+ hrs of battery life and runs full Windows 8.1.

It's not an i5, but it is pretty nice.

Come Airmont/Cherry Trail-T this coming fall, and that should be a rather impressive thing too (though Bay Trail-T is actually pretty decent).

My T100 can easily push 10hrs of battery life in lightweight use and more than 12 just watching movies (about 5-6hrs in some pretty heavy gaming). It's still a bit of a lightweight compared to my i5-3317U based laptop, but it ain't shabby either. Not very noticeable in basic tooling around the OS and web-based stuff. Just noticeable in things like Photoshop and Lightroom (and gaming), but even there it can get the job done, so long as you aren't expecting full laptop/desktop levels of performance.