It's only available in solder-on packages, meaning OEM only. It's meant to replace the lower-end dGPUs in mid-range ultrabooks, not for consumers. So the end cost is about the same once you subtract the cost of the dGPU.

That aspect of it pisses me off. I've been looking forward to building a new desktop to upgrade from my current Bloomfield when Haswell comes out, and so wanted this to be the integrated DRAM version.

The L4 cache would be a huge win for servers as well if they didn't make the stupid thing soldered-only. While you could argue that no one ever upgrades a server CPU, they just replace the server, we do have hundreds of dual-socket HP servers here ordered with only one socket populated because of core-count limits on software licensing.

I'm aware that it's not for consumers. What I was trying to express is that where graphics performance is important, given the choice between a machine using an i7 with its accompanying price premium plus this on top vs. something like an i3/i5 with discrete, I am still going to be looking at the latter.

Yeah, not quite sure what the purpose is. This type of addition would be fantastic on a higher-end i3 or i5, giving me a decent system and decent graphics in something more portable. Sticking it in with an i7, most people will still want a dGPU to power their games, because even with eDRAM, Haswell won't be powerful enough.

Well, the 15" model has a GT 650M, so even better performance than IVB would have with eDRAM. And I've read that it is more a software issue (and single-thread CPU performance issue) than a GPU performance issue. Many people are fine running 1440p/1600p displays off their Ultrabooks in Windows without performance drawbacks in the general UI. :)

The GT3e in the new Haswell chip has to perform 30% better overall compared to the GT 650M in order for Apple to ditch the dGPU. There is still a competitor in Nvidia's new GT 750M, which might just double the GT 650M and, if so, would still be useful for high-end Retinas if Apple so decides. The improvements to the 13-inch Retinas will certainly be iGPU-based, while the highest end would probably get the dGPU (maybe of AMD persuasion?). Kepler would be king for a long while as Intel still struggles to be "good enough" in the GPU arena.

It seems very much as if they were made with each other in mind. Even with all the latest updates, the Retina MacBook Pros drop too many frames during basic UI animations for my liking; Haswell with GT3 (possibly e) would have been great for it.

Although I wouldn't buy an Apple product until they change some of their policies regarding hardware/software separation, offering this GT3e along with a high-end FireGL/Quadro would be nice. Now, I don't know if the infrastructure is there, but imagine having three modes of operation: 1. running the Quadro, 2. running the GT3e, 3. running the GT3 with the DRAM switched off. That's not including the various hybrid modes. I don't know if this is possible yet, but it would make for some interesting possibilities considering how powerful Intel has made their GPUs of late.

As Anand said in the article, GT3e isn't expected to be available in low-power parts suitable for ultrabooks, which is too bad really since that market could benefit from increased IGP performance. Since GT3e is only available in higher power parts where discrete GPU alternatives with the same or better performance are available, its use case seems to be more for BOM/board space simplification rather than directly improving performance constrained situations.

I'd actually be surprised if there isn't a ULV part suitable for ultrabooks down the line. The catch is that CPU performance would likely have to be further reduced to account for the eDRAM. The other option Intel would have is to make the eDRAM strictly a GPU feature and scale the active amount based upon workloads. In other words, while running a word processor, the eDRAM could gate down to 32 MB in size without any L4 cache functionality; while gaming, it activates all 128 MB.

Well, look at the Razer Edge. It includes a discrete GPU. I imagine a few companies will give it a whirl and slap this into something similar that should cost a lot less. In the meantime, I imagine AMD/nVidia will be forced to lower their prices to match the GT3e's new baseline for such systems.

Could be a decent price drop for better integrated/low-end GPUs.

I would also find a Surface Pro based on this chip appealing. Unfortunately, I doubt you'll see a processor with higher-than-ultrabook power draw in a Surface Pro. It sucks, though, as people with a lumberjack build, like me, wouldn't care about an extra kg in batteries and cooling.

Not to be that guy, but has Intel been talking about stepping up their graphics driver development as well? GT 650M-level performance would be enough to run quite a few games at reasonable settings, but without vigilant driver updates it all means nothing.

Exactly. I think the emphasis on the GPU aspect of it is short-sighted --- interesting if it somehow affects your Haswell buying decision, but no more than that.

The larger story here is Intel finally adopting eDRAM as the next step in increasing performance. As always, Intel is being cautious, this time giving us one specialty part with something of an add-on. But the interesting question is: where do we go with this when it's ready for the big time? In particular, do we remove the current L3, give each core a beefed-up L2 (maybe 1 MB or so), and move to an eDRAM L3 of 128 (or 256? or 384?) MB? I guess it all hinges on how close Intel can get that eDRAM to the CPU. Can they manufacture it on the same chip? Or are we finally ready for the sort of chip-to-chip (as opposed to package-to-package) contact solutions people have been talking about for years?

First, as an L4 cache. At my job, we develop an in-memory database that is typically deployed on a machine with 1 or 2 TB of DRAM and multiple CPUs in a NUMA configuration connected by Intel's QPI. So we have near and far memory, and caching far memory in near memory could make a big difference.

But secondly, we have experimented with performing certain calculations (statistics, predictive analytics, etc.) on the GPU. The trouble was moving the data from CPU memory to GPU memory, and then moving the results back. I'm wondering, on a Xeon with an embedded GPU (yeah, I know, doesn't exist today), whether the embedded DRAM would be shared between the CPU and GPU so that the data wouldn't have to be moved.
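
A rough back-of-the-envelope on why those copies hurt (the throughput figures below are assumptions for illustration, not measurements of any particular system):

```python
# Illustrative only: assumed effective throughputs, not measured values.
data_gb = 1.0        # working set shipped to the GPU
pcie_gbs = 6.0       # assumed effective PCIe 2.0 x16 throughput
copy_ms = data_gb / pcie_gbs * 1000  # one-way copy time
round_trip_ms = 2 * copy_ms          # data out, results back
print(f"one-way copy: {copy_ms:.0f} ms, round trip: {round_trip_ms:.0f} ms")
```

Hundreds of milliseconds of pure transfer per gigabyte, before any computation happens. Shared embedded DRAM would remove both transfers entirely, which is why the win scales with the data size rather than with the amount of compute.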

To address the first idea, IBM does use eDRAM in their high end POWER line and their System Z mainframes. The POWER7+ has 80 MB of eDRAM on-die. The mainframes are even beefier with 48 MB of eDRAM L3 cache on die and a massive 384 MB external L4. Those systems are not x86 based but they are fast. Intel going with eDRAM would likely perform similarly. However, it should be noted that Intel has focused on latencies with their caches, not capacities.

Having a common cache between the integrated CPU and GPU would be advantageous for the very reasons you cite. While it didn't get a lot of press, Intel does have a unified address space between the CPU and their HD Graphics in Sandy Bridge/Ivy Bridge. So essentially Haswell GT3e will have that functionality.

With regards to Xeon + discrete GPU though, there will always be overhead by merit of PCIe connectivity, even if the GPU is using the same x86 address space. It'll be higher bandwidth and lower latency than today, but the results will be nothing like the on-die integration we're seeing on the consumer side. Just having everything on-die helps a lot. Then again, both AMD and Intel could remove PCIe and ship discrete GPUs using their own proprietary processor interconnects (HyperTransport and QPI respectively). At that point the discrete GPU would logically be seen as another socket node.

That last bit about connecting a GPU via QPI/HyperTransport is a very interesting proposition. However, what would be the performance gains? It's not even twice the speed of x16 PCIe 3.0, so I guess it's mostly direct memory access and latency, right?

For the most part, yes, lower latency and direct memory access as if they were another socket/core. This idea isn't new either. One of Intel's early slide decks regarding Larrabee had mention of a Larrabee chip that'd drop into a quad socket motherboard.

I'm actually quite surprised that AMD hasn't gone this route or made any mention of it on their roadmaps. They do have new sockets coming out next year and HSA GPUs, so perhaps next year we'll hear something a bit more concrete.

The other thing about using a common socket between a CPU and a GPU is that each would have to support a common memory standard. AMD looks to be going with GDDR5 for mobile parts for bandwidth reasons, which makes sense considering that laptops (and especially ultrabooks) are not designed for upgradability or 24/7 rock-solid stability. More desktop/server-centric sockets would imply support for ECC-protected DIMMs, which would also bring support for huge amounts of memory to the GPU side. Both of those would be big wins for GPUs.

One thing moving to QPI/HyperTransport for GPUs would result in is the eventual removal of nVidia from this space. PCIe would still hang around, but hardware using it would be at a disadvantage.

Thus the focus on silicon photonics that Intel, IBM, and others have been working on. Interconnects such as QPI, HyperTransport, or RapidIO use too much power and/or require much more space for multiple parallel I/O traces. An optical interconnect eliminates many of the constraints imposed by QPI: the optical signal frequency can be orders of magnitude higher than what is possible for electronic signals, without increasing power or thermal load. It is not possible, long term, to continue integrating components onto a single larger and larger piece of silicon (i.e. SoC).

Silicon photonics is a way to connect chip-to-chip at full chip speed, or in other words connect multiple chips together to make a single large virtual chip. Since the optical signal can also maintain this high speed for far further trace distances, it can also be used to make chip-to-chip interconnects even when the chips are on different motherboards in separate cluster nodes. Think a rack of servers that function as a single, very large SoC.

We will first see it used for optical Thunderbolt (i.e. extending the PCIe bus off-chip), but probably for special-purpose chip-to-chip links soon after. For example, a CPU and discrete GPU + eDRAM pair in a two-chip module connected via silicon waveguide.

I'd like to know if the CPU can dip into the eDRAM as an L4 cache of sorts if the GPU is underused or disabled. It would be a shame to waste that huge eDRAM die right beside the processor if the GPU goes unused.

This doesn't make a bit of sense. If the primary purpose is to be L4 cache for the CPU and boost performance that way, then why not make it available in desktop and server chips, which would offer far more plausible benefits than laptops?

And if the primary purpose is to be GPU memory bandwidth, then why 128 MB? I could see big benefits to having the heavily-accessed depth buffer and frame buffer in cache, but at 1080p, those are a tad under 8 MB each. Maybe you want to put extra frame buffers there, for use in post-processing or to have both the front and back frame buffers cached. But that's not going to get you anywhere near 128 MB, and if it's for graphics, you're going to end up using most of that space for lightly accessed textures where it doesn't matter.

Surely they're not planning on moving the really heavily used stuff that doesn't take much space and currently goes in GPU cache to slower eDRAM. That would be as dumb as making an Intel i740 without dedicated video memory because they want to use slower system memory instead.

Do tell how you propose to stick an 8 MB frame buffer in < 1 MB of L2 cache. For comparison, a Radeon HD 7970 has 768 KB of L2 cache, a Radeon HD 7870 has 512 KB, and a GeForce GTX 680 has 512 KB. Older or lower-end cards tend to have even less.

And the L1 and L2 caches are presumably needed for smaller but more frequently accessed data such as program binaries and uniforms that are needed at every single shader invocation throughout the graphics pipeline.

Yes, accessing textures from video memory rather than having to pass them through a PCI Express bus does make a big difference. But if you want to do that, you have to have enough video memory to actually store all of your textures. That's why modern video cards nearly always come with at least 1 GB and often more. 128 MB would let you stick a small fraction of your textures and vertex data in it, but nowhere near all of it except at low texture resolutions or in older games that don't need much video memory.

If textures are the goal, you'd likely see more benefit from adding a third channel of system memory, which lets you use a few GB if you need to. And while hardly cheap, that might well be cheaper than 128 MB of eDRAM.

For modern graphical purposes, I have to agree, I don't see the point of adding 128MB of eDRAM. If it is for textures, any 3d game made in the last decade uses a few hundred MB, if not well over 1GB in some cases, at any reasonable resolution.

I really only see this being useful as a cache for the CPU or for 2D applications.

Also, a unified fast memory shared between the GPU and CPU is exactly what is needed for good GPGPU performance; up until now the bottleneck has been transferring data between GPU memory and CPU memory.

While this could conceivably have some big benefits for certain GPGPU applications, I really doubt that's the primary intended purpose. If you want to do serious GPGPU, you get a desktop or workstation or server or some such so that you can dissipate serious amounts of heat. You definitely don't get a laptop with dinky little integrated graphics that sports a peak GFLOPS rating of not much, which is the only place that GT3e is going to be used.

IIRC, compositing window managers keep a draw buffer for every open window, even if it's (partially) hidden. With a couple of windows open, that's easily tens of megabytes. Is there any reason not to keep those in the eDRAM?

It's a way to make the research into eDRAM pay for itself while it goes on. It's essentially a research chip in this iteration. This means (a) if they can't deliver, no catastrophe, and (b) if the power utilization is higher than the actual benefits for general use cases, again no catastrophe.

I see the fact that they have gone this route rather than giving all the CPUs eDRAM as telling us that eDRAM is actually harder to get right than it looks, and they are being cautious. One might confirm this by noting that AMD has not jumped on eDRAM as a way to goose their sales, even though it would have been an obvious point of differentiation from Intel.

Given how often it's repeated that Apple pushed for the eDRAM and higher-performing integrated graphics in general, it seems like the classes GT3e is put into miss the product it would benefit most, namely the 13" MacBook Pro and the Retina version in particular. It seems only quad-core, 50+ watt TDP processors will get GT3e, and only ultrabooks will get GT3 without the eDRAM, so 13" laptops seem to be stuck with GT2. A 15" laptop generally has enough room for a better dedicated graphics chip, so GT3e seems to be missing what would be a perfect match for it.

I'm confused; it's an integrated GPU. If it's too big for ultrabooks, then what's the point? Anything bigger (thicker) and I'm getting a dedicated GPU with proper driver updates. There will either be ultrabooks that use it, or it will only be Apple using it. On the other hand (I find this nearly impossible), if they can actually hit GT 650M levels in terms of across-the-board performance, I might actually care this time.

It is not nearly impossible, from a hardware viewpoint alone. The currently existing difference between the HD 4000 and a 650M is almost completely covered by the increase in execution units alone. Add some improvements in IPC efficiency, some catching up by the driver development team, and the performance boost from the integrated memory, and it is actually quite likely that GT3e will achieve 650M levels.

On the AMD method: GDDR5 is obviously great for GPUs, but the memory timings are much slower than DDR3, even accounting for the clock speed differences. I know differences in latency don't really affect modern CPUs a whole lot, but this difference would be bigger than the gap between the slowest and the fastest DDR3 latencies; it's many times higher. I wonder how that will turn out.

I'm also curious, IBM integrated eDRAM into its POWER7 processors a long time ago. I wonder what the performance implications of the CPU being able to access the eDRAM in Haswell will be, and if they'll carry that over to the high end server/workstation markets like POWER7.

The comparison for using eDRAM on the POWER7 was similar latencies to having an external SRAM in the same package (like IBM did with the POWER6). Going SRAM for the L3 cache would have been faster in the POWER7, but IBM felt that capacity/density was more important considering the chip's market.

The power draw may be too much for a light mobile device but could be excellent for something like an HTPC. I'm very curious as to what will be offered in the mini-ITX format in a few months. Some good GPU power and a lower overall power envelope than current IB choices would be worth waiting for.

To be honest, I was hoping for (if not expecting) quite a bit more than 64 GB/s; as beneficial as lower latency is, quad-channel DDR3 already gives us 50. Even for notebook graphics, that's far from ground-breaking. 650M performance seems like a stretch, especially considering how this will be relegated to larger laptops, at which point having a dGPU sounds much more feasible.
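
For reference, the "quad channel DDR3 already gives us 50" figure checks out if we assume DDR3-1600 modules (the speed grade is my assumption, not stated above):

```python
# Peak theoretical bandwidth of quad-channel DDR3-1600.
channels = 4
transfers_per_sec = 1600e6  # DDR3-1600: 1600 MT/s
bytes_per_transfer = 8      # each channel is 64 bits wide
peak_gbs = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"peak: {peak_gbs:.1f} GB/s")  # 51.2 GB/s
```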

True, but you don't get quad-channel DDR3 in a laptop, and especially not with RAM soldered to the mobo. This was a perf/watt decision with mobile as a priority.

You get a lot of temporal goodness for games (a streaming texture buffer should fit in the 128 MB nicely) and smaller datasets, and you free up the pipes to main memory so they can keep the L4 full. It's a win-win.

It's only 128 MB. With 64 GB/s, you can already read or overwrite the complete memory in only 2 ms, thus less than a quarter of even a 120 Hz frame. Since most graphics software should not work iteratively, this seems fast enough. And why does 650M performance seem like a stretch to you based on this number, when the 650M itself has exactly the same bandwidth, or even less in the DDR3 version? Even the 675M comes with only 96 GB/s.
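
The 2 ms figure follows directly (using decimal units throughout, as bandwidth specs usually do):

```python
# Time to stream all 128 MB of eDRAM at the quoted 64 GB/s.
edram_bytes = 128e6     # 128 MB, decimal
bandwidth_bps = 64e9    # 64 GB/s
fill_ms = edram_bytes / bandwidth_bps * 1000
frame_ms = 1000 / 120   # one frame at 120 Hz
print(f"full sweep: {fill_ms:.1f} ms vs {frame_ms:.2f} ms per frame")
```

One complete pass over the eDRAM costs a small fraction of a frame, so bandwidth is not the limiter for working sets that fit in it.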

In server environments, I can see embedded DRAM acting as a real boon to multi-core performance.

But 128 MB is not that much more than the already-available 20 MB of L3 cache in a Xeon, while it is much less than the 32 GB (or more) of available RAM. Sounds to me like only a very specific class of software would be able to profit from it. And if you have software that really speeds up with more low-latency memory, does that not mean you're better off running it on a Xeon Phi anyway?

If the previous rumor of a 55W TDP for a chip carrying GT3e, with graphics performance roughly equivalent to a GT 650M, holds true, then I can imagine large-screen (15"+) "ultrabooks" carrying this chip, laptops like the MacBook Pro and the Razer. Without the need for a discrete GPU, other manufacturers would not need premium materials or smart engineering to stay within the thermal limit of their thin designs.

An additional $50 is not much when you look at the price of a mobile i7 chip; it's another 2-4% bump in the overall price of the typical i7-carrying laptop. If the GT3e performs similar to a GT 640M (more realistic), it would easily be a worthwhile upgrade for people who only need/want mid-range GPU performance.

With a 2-channel DDR3 memory controller, Haswell will barely run games even at low settings, and if you change to medium or high details it'll be an unplayable slideshow at 2-10 fps; the integrated 128 MB of memory will not help much, as games nowadays require at least 1 GB of graphics memory or more.

So Haswell will survive only for 4-5 months until AMD's Kaveri appears in October-November. Kaveri will support 4-channel GDDR5 and will have a much better GPU than Haswell, so for gaming Haswell will be a joke compared with Kaveri.

The Sony PlayStation 4 will come with 8 GB of unified 4-channel, 256-bit GDDR5 memory (176 GB/s bandwidth) for both CPU and GPU. The 64 GB/s bandwidth of Haswell's eDRAM is still almost 3 times lower than what AMD did for Sony in the PS4, and AMD's Kaveri will have a similar architecture to the PS4's.
