127 Comments

Yes, being able to use 4 DIMMs to drive a 256-bit wide interface would be attractive for HTPC/Steam boxes. I think it could even help the current 12 CU model quite a bit, as it appears severely bandwidth constrained. Another thing to account for is that you wouldn't have to spring for super fast 2100+ memory; you could likely get away with four DDR3-1600 or 1866 DIMMs and still see substantial gains. Of course, with this configuration you would have to sacrifice ITX form factor motherboards and go with mATX, and I'm sure the power consumption would hurt a little.

P.S. Going by current prices on Newegg there is only a $14 price difference if a person buys 4 x 2GB DDR3-1600 SO-DIMMs ($84) compared to 2 x 4GB DDR3-1600 SO-DIMMs ($70). I'd say that is worth it for those of us who want 8GB RAM.

Added Note: The current Lenovo H530s SFF desktop supports up to an 84 watt Core i7-4770. The old H520s (same chassis) supported up to a 95 watt Core i7 Sandy Bridge according to documentation.

So if AMD is planning on larger desktop APUs I would like to see 95 watts for the top OEM APU (up from the current level of 65 watts) and a TDP higher than that (maybe 125 watts?) for the top enthusiast level K model APU.

I think a better option would be to stick with the same dual-channel DDR3 set-up from a price perspective, but then re-introduce "SidePort" memory, where board manufacturers can bundle a chunk of faster memory on the motherboard (like 512MB-1024MB of GDDR5). Once the GDDR5 is filled it falls back to DDR3. That should help with more size-constrained scenarios too, as a couple of memory chips should take up far less board space than 4x DDR memory slots, especially in ITX...

Let's say the Kinect costs $100, so both the Xbox One and PS4 are the same price. It seems like the PS4's GDDR5 solution beats the Xbox on performance, so is it the right way for an APU to go at this time? Or is Sony just taking a hit financially? I know there are a lot of variables to cost/performance between the two solutions. Either way, AMD should have the chip for it, probably even more so with the GDDR5 solution. Didn't Xbox do a lot of the work themselves on their SoC? I'm more interested in lowering overall power under high load for some on-the-go gaming than in a desktop solution.

From what I have read..."A Jaguar core can issue 6 ops per cycle -- two memory, two int alu, and two fpu/simd ops. However, it can only decode two, and retire two, so the average rate cannot ever go above two."

X1 CPU is customized, not a "stock" Jaguar. "to get all of this processing out of the box" is a challenge too. The new CPU core can do six CPU operations per core per cycle, on an eight-core CPU." -Nick Baker

From the XOne documentation, I don't see any difference from a regular Jaguar core; it can decode and retire only 2 instructions per cycle. Up to 6 µ-ops at once: 1 load, 1 store, 2 ALU and 2 FPU.

From all we know, the Jaguar cores in both XBO and PS4 are 100% stock. This is why these systems have 2 CPU modules with 4 cores in each (both modules having 2MB L2 cache), because a standard module with Jaguar cores doesn't scale further than 4 cores. But certainly there is additional logic _outside_ the CPU cores - in particular the interconnect between the CPU modules and northbridge logic very obviously is different (and more complicated - the CPU modules reportedly are cache-coherent).

I just want to weigh in here since I'm a computer engineer and I can clarify a few things. XB1 and PS4 use the same Jaguar cores. I'm not sure where Gabrielsp85 got the idea that they aren't the same. What I do know is that PS4 uses GDDR5 memory, while XB1 uses DDR3 + eSRAM. Since GDDR5 is more geared towards graphics workloads, Sony likely chose this option to improve graphics performance at the expense of having to spend more time optimizing the CPU side to not be as latency bound to RAM. The XB1, on the other hand, uses the more CPU-centric DDR3, with eSRAM to increase bandwidth for GPU operations.

All that said, I can't be sure of the specifics of the memory buses on these chips. The XB1 memory bus has been listed as 68.3 GB/s which would indicate a quad-channel DDR3 controller with 4 64-bit lanes of DDR3-2133. The PS4 on the other hand uses a GDDR5 bus (guessing a 256-bit width) for both the CPU and GPU and has a peak bandwidth of 176 GB/s. In addition there seems to be 256 MB of DDR3 for CPU "background tasks".
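
For anyone who wants to check those bandwidth figures, here is a rough Python sketch of the usual peak-bandwidth arithmetic (bus width in bytes times transfer rate). The function name is made up for illustration, and the 5500 MT/s GDDR5 rate is an assumption picked because it reproduces the 176 GB/s figure on a 256-bit bus; neither comes from the comment above.

```python
def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_sec):
    """Peak bandwidth in decimal GB/s: bytes per transfer times MT/s, divided by 1000."""
    return bus_width_bits / 8 * mega_transfers_per_sec / 1000

# Four 64-bit channels of DDR3-2133, as guessed in the comment.
xb1_gbs = peak_bandwidth_gbs(4 * 64, 2133)

# Assumption: a 256-bit GDDR5 bus at an effective 5500 MT/s.
ps4_gbs = peak_bandwidth_gbs(256, 5500)

print(f"XB1: {xb1_gbs:.1f} GB/s, PS4: {ps4_gbs:.1f} GB/s")
```

Under those assumptions the XB1 number rounds to the listed 68.3 GB/s, which is at least consistent with a 256-bit DDR3-2133 interface.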

I can tell you right now that the PS4 is listed as having a peak throughput of 1.84 TFLOPS, while the XB1 has a peak throughput of 1.31 TFLOPS. The 1.84 TFLOPS of the PS4 is due to the 6 additional compute units adding 384 shaders, but also indicates that the GPU runs closer to 800 MHz. These numbers of course neglect bandwidth to the GPU. You would need 7.36 TB/s of bandwidth to offload all of those calculations to system memory for the PS4, and 5.28 TB/s for the XB1.
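
A quick sketch of where numbers like these come from, in Python. The 64 shaders per compute unit follows from the "6 additional compute units adding 384 shaders" statement; the CU counts (18 and 12), the 853 MHz XB1 clock, the 2 FLOPs per shader per cycle, and the 4 bytes of memory traffic per FLOP are all assumptions chosen to match the quoted figures, not something stated above.

```python
SHADERS_PER_CU = 64  # implied by the comment: 6 extra CUs add 384 shaders

def peak_tflops(compute_units, clock_mhz, flops_per_shader_per_cycle=2):
    """Peak single-precision TFLOPS, counting a fused multiply-add as 2 FLOPs."""
    shaders = compute_units * SHADERS_PER_CU
    return shaders * flops_per_shader_per_cycle * clock_mhz * 1e6 / 1e12

def bandwidth_needed_tbs(tflops, bytes_per_flop=4):
    """TB/s needed if every FLOP pulled one 4-byte operand from memory (assumed ratio)."""
    return tflops * bytes_per_flop

ps4_tflops = peak_tflops(18, 800)  # assumed: 18 CUs at 800 MHz
xb1_tflops = peak_tflops(12, 853)  # assumed: 12 CUs at 853 MHz

for name, tflops in (("PS4", ps4_tflops), ("XB1", xb1_tflops)):
    print(f"{name}: {tflops:.2f} TFLOPS, {bandwidth_needed_tbs(tflops):.2f} TB/s")
```

With these inputs the PS4 line lands on roughly 1.84 TFLOPS and 7.37 TB/s, while the XB1 works out to about 5.24 TB/s, slightly under the 5.28 TB/s quoted, so the original figure probably started from a TFLOPS value rounded up to 1.32.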

In the end, I would expect the PS4 to run faster because of the higher system memory bandwidth. Especially if a mantle-like API is used to reduce draw-call overhead to the CPU, reducing the CPU's need for bandwidth.

I still reckon the reason for the DDR3/GDDR5 difference between the Xbox and the PS4 is more down to MS starting their R&D and supply contracts 6-12 months or so earlier than Sony and therefore not getting the good deal on GDDR5 at the time of enquiry. Had the timing been different they would probably both be GDDR5 equipped.

No, that's definitely not it. GDDR5 availability is scarce. That's why Microsoft could keep pumping out weekly shipments of Xbox One while PS4 was getting shipped out every 2 weeks with far fewer units.

Also, since the Xbox was built to run 2 OSes at the same time with a hypervisor, onboard cache was needed to prevent constantly having to read/write to main memory. The X1 CPU is more efficient because the majority of its memory bandwidth is served on die at very low power, compared to the PS4's GDDR5 constantly reading and writing with higher latencies and greater power.

Apple does the same thing with the A7, and Intel is doing it with Iris Pro. They take advantage of super low latency, low power caches, therefore making it more efficient. (Note: efficient doesn't mean more powerful; however, MSFT seems to have struck a good balance.)

Microsoft is shipping more units? Shipping data and sales data thus far would contradict that statement, with the PS4 outpacing the Xbone by about 45%. The PS4 is currently having trouble meeting demand.

(For the record, I own neither and plan not to as I believe they are both underpowered, overpriced, and over-DRMed.)

As far as wanting AMD to make a higher end or more potent APU, I can't say that it would interest me at all. Let's assume that I could buy a version for $250 that was an order of magnitude more powerful than the 7850K and was designed around a NUC size. Chances are good that the entire package would be around $500 when all was said and done. Now what's my upgrade path?

A DIY mITX setup in a slightly larger enclosure with a weaker CPU and more potent GPU would likely not only cost about the same (or less) but also provide more options for upgrades - at least providing one more generation of improvements for less than the cost of a new NUC+APU setup.

I'm not saying a NUC-style AMD platform would be a waste, just that the value diminishes when longevity is considered. This seems like a question to ask when (if) AMD ever delivers on their promises.

I think AMD's 45W is still way too hot for a NUC-style platform. I don't think they are really chasing these down. The whole NUC thing is a bit odd; I think ITX size is already small enough for most people to simply plop on a desktop/VESA mount. Going even smaller to NUC size while increasing the price premium and reducing performance (thermal headroom, IO options and memory bandwidth are typically less on a NUC) is quite odd. At the current $500+ NUC price point, an ultrabook for another $50-200 adds a 768p/1080p IPS screen, battery backup, keyboard, plus real portability.

Suffice to say additional L3 or 256-bit wide DDR3 will also add to the socketing requirements, again relegating the whole APU system to a minimum size of mATX (maybe ITX if they opted for L3 on package).

I would imagine then your upgrade path would be as simple as swapping the APU and/or adding memory/L3 (if socketed).

Why do you think NUC is odd? You said it yourself, they're expensive. Now you know why it exists: to boost margins. You can't do that unless you have something "different" to push, in this case primarily size as compared to previous SFF designs.

With that being said, I don't like it. I don't mind SFF in general but I like a greater degree of customization, variety, and competitive pricing. So if I want a SFF machine I look to ITX.

I'm going to have to agree with Anand here: such a thing would indeed interest me greatly. I'd love to be able to have a machine like that just to poke around on. That said, I'm not sure if it would merit a purchase at this point or not. It would make an incredible Steam machine or HTPC, but I don't currently have a need for it since my main desktop is close enough to my home theater to use a 50' HDMI cable to hook it up (small apartments are so fun).

I suppose something like this would tempt me greatly if I was in the situation where I couldn't have my current setup. I would want a 4-module Steamroller core, though. Other than that, putting this in something like the Gigabyte Brix or NUC box for a HTPC or Steam box would be quite awesome indeed, especially with Mantle support (if that pans out like AMD says it will). Then again, the sales of such a chip would be tied directly to the success of SteamOS as well, sooo...

That was a lot of rambling, but basically I'd probably buy one for a Steam machine if it came in at <=$350 for the barebones system (lacking SSD/HDD, RAM, and WiFi like the NUC and Brix systems do).

F that man, by the time you put in an SSD, RAM, a WiFi card, and an external drive, you're in the $700-$900 range. Just try building one of those Brix or NUC systems with reasonable components. You pay a premium for the low power consumption and small desktop footprint.

See, I'd be fine spending that kind of money on something that basically had a 7870 in a box the size of a Gigabyte Brix Pro. It's certainly not going to be high-volume, but I can see a use case for it.

Also, switching that to a HDD for $60 or so, using 8GB of RAM for $80, and forgoing the WiFi and external disc drives (neither is strictly necessary, each adds ~$30) brings it to $490 for the whole system if they price it at $350. That seems reasonable to me given that current consoles aren't far off that, and you get better silicon than they have (Steamroller vs. Jaguar cores mainly).
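
The build cost above is simple enough to tally in a few lines of Python. The prices are the ones quoted in the comment; the dictionary keys are just illustrative labels.

```python
# Prices quoted in the comment (USD); key names are illustrative only.
required = {"barebones_system": 350, "hdd": 60, "ram_8gb": 80}
optional = {"wifi_card": 30, "external_disc_drive": 30}

base_total = sum(required.values())
with_extras = base_total + sum(optional.values())

print(f"Without WiFi/disc drive: ${base_total}")
print(f"With both extras:        ${with_extras}")
```

Adding both optional parts back in puts the build at $550, still under the $700-$900 range mentioned earlier, mostly because of the HDD-for-SSD swap.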

Yes, for a machine with these desktop APUs I would prefer a slightly larger form factor for the cost savings.

Something the size of a Gateway SX, Dell Inspiron 660s, or Lenovo H530s would work. These systems can be found for as low as $350 on sale with a Haswell Core i3, 4GB RAM, 1TB HDD, optical drive, WiFi, and Windows 8.

Unfortunately for AMD I did not see them make any breakthroughs in value desktop for the Richland APUs. In almost every case an A8-6500 APU prebuilt was priced well above what the Core i3 systems were selling for. Hopefully this changes for the A8-7600 as I would like to see it priced more on par with the Haswell Core i3.

Especially since the 7600 proved to be a good performer, whether at 65W or 45W. I think OEMs will like the flexibility this chip offers over its (distinctly 45W or 65W only) Richland counterparts. The higher TDP chips were actually a bit of a disappointment by comparison (considering shader count), at least with the memory configurations they were tested at. I'd really like to see the "K" models tested with faster memory. There's even an AMD branded set of 2400 DDR3 memory available now - that's like "wink wink hint hint it's an unlocked model".

I'd love to be able to have a machine like that just to poke around on. That said, I'm not sure if it would merit a purchase at this point or not.

That's the problem I have with this idea, though. It would be a fun toy, but I don't need one and I certainly wouldn't pay much for it. Even using it as a Steam box would be a stretch, the current offering of mITX-sized Steam boxes would be faster. Why would AMD want to spend a lot of R&D making a low-margin niche product?

If it was in a 35-45 watt laptop it would be a different story of course.

I completely concur that the market for such a small desktop would be by definition a niche. However, if they're already designing such a thing for a laptop, why not allow the clocks to ramp a bit and throw it in a NUC or Brix sized system? I'd wager Intel put out the NUC as part a technology demonstration and part to force OEMs to start critically looking at what you can do with current processors. They certainly didn't put it out to be a high-volume product. They just had the work done and threw it in a case I'd bet. AMD could do something similar, but their pockets aren't quite as deep as Intel's, so they might not.

My thoughts exactly! When I finally digested the Kaveri reviews, I wondered if these 6-8 CUs @ 700MHz are bandwidth starved, and wondered what it would be like if it had L3 with 256MB GDDR5 onboard/on package.

In fact this realization stopped me from buying one (and I really, really, really wanted AMD to stay alive). Worse, the controller seems to top out at ~2400MHz DDR3.

Intel had their strategy right: realizing that the iGPU is not yet able to do AAA games at 1080p, they opted to optimize 720p and lower into playable territory (lower # of 'CUs' with a bandwidth stopgap: Crystalwell).

On the other hand, I think AMD have a good understanding of balanced design. IMHO Temash is very balanced: single channel DDR3 @ 1600, feeding 2 CU @ 300 + 4 LP cores. ~1/3 of Kaveri's bandwidth for 1/6 ~ 1/8 of Kaveri's compute.

AMD would have earned my money if they had come out with quad/triple-channel DDR3 or dual-channel DDR3 + GDDR5 L3.

The quad-channel interface is very likely to be aimed at GDDR5. There's almost no chance of 256-bit DDR3. 35-45W Kaveri with 128 to 256-bit GDDR5 in a thin, light laptop would be very price:performance competitive with Intel. It would even have "Radeon" graphics branding, which is essential in some markets that are conditioned after years of only buying discrete GPUs (even if they are very low end). There's no JEDEC spec for socketed GDDR5 so it will have to be soldered down, which makes PC motherboards out of the question. Even if you think niche mini-ITX would be cool it won't happen because the price would be "astronomical" compared to other boards, and there's limited facility to express that it has memory built in. What BGA options does AMD offer for mobile? Also you'll never see an Xbox-size SoC in retail - those divisions are completely separate and if AMD wants to grow its custom semi-con business it wouldn't push it to gaming PCs that directly compete against Xboxes.

I'm not sure AMD would care if it competes with itself against Xboxes and PS4's, since (as has been stated in every XB1 vs. PS4 argument ever), it's the games that sell the console, not the hardware. Even if such a mini ITX build does pull buyers away from consoles, why should AMD care?

Also, I do agree that a monolithic die that size would never make it to consumer retail, which I personally feel is a damn shame.

I hadn't thought about the laptop aspect, though... That's incredibly insightful and a very interesting proposition. That might actually even work. I was miffed with Intel because they didn't include Iris Pro on any lower wattage (<35W) SKUs because I feel that Iris Pro would've helped something like the retina MacBook Pro quite a bit. If AMD could do a high-bandwidth (relatively) low-ish power SoC first, that'd be killer.

Of course AMD should care, because their customers care. Why would x customer go to AMD to develop and pay for a new chip, if AMD then spins it off for itself competing with said customer? If only games sold consoles no one would even care the slightest that PS4 is more powerful or Wii U is significantly underpowered, yet it's all fanboys yap on about.

RE: Laptops - due to costs though you'll likely only ever see 128-bit GDDR5 used, because AMD has never yet commanded a premium price in notebooks. 256-bit requires more PCB layers, more RAM chips, more dev time; whereas 128-bit is a relatively easy and cheap 4-layer affair. I'm not sure how well GloFo's 28HPM process can scale down in voltage/power, and AMD's current core designs require frequency (and thus voltage) to achieve significant IPC, which is the enemy of low power.

Iris Pro with L4 cache requires a LOT of extra transistors and WideIO packaging tech, which is very cost intensive still. Therefore it must be packaged with a premium chip to make it worthwhile. More EUs are still cheaper to add than L4, so a 40EU+2C/4T chip is the better cost and thermal alternative.

Oh, I'm aware it's incredibly cost prohibitive to do a big L4 like that. But I still maintain that a 30W SKU with 2 cores and Iris Pro would've been nice. You'd save a little money on the die size (admittedly not much though), but doing 2 cores would keep the power manageable.

And the point about not hurting relations with Sony and MSFT is a good one. Why bite the hand that feeds you, etc. From a consumer standpoint, AMD doesn't care much I'd wager because they're still selling chips, but partner relations is another matter I didn't think of.

Iris Pro's lowest TDP is 55W or 45W with 64MB L4 AFAIK. Either way, to hit 30W on 22nm (not sure if the L4 eDRAM, which is a separate die, is also 22nm) would require a very low CPU/GPU clock to compensate or halving/thirding the L4 again, which would not bring benefit.

AMD cares about selling chips - for sure - but ALL tech companies care about IP protection and long term existence. Trust is fundamental to doing tech business; they rely on others to work with them, so IP considerations are a big deal. Visiting TSMC, for example, is like going into a fortified government facility.

Iris Pro is available on 57W and 47W processors. The clock rates change by 400MHz between the Iris Pro and regular SKUs (2.0/2.4, 2.3/2.7, 2.4/2.8 for HD5200/HD4600) for nominal clock rates. The Turbo rates change by 300MHz.

I hadn't actually looked at those before. Much higher than I thought. I'd wonder how much of that is due to differences in configuration between the 40EU (HD5X00) and 20 EU (HD4{6,4}00), and how much is due to the additional power needed by the eDRAM and its accompanying changes (pinout, controller, etc.). I suppose it likely wouldn't be worth it to do eDRAM on a dual core chip if the total cost of the chip would be basically the same as a quad core anyway, just with slightly lower TDP.

In essence, yes, you're correct. I do hope they do a dual core Broadwell SKU with eDRAM, though. If the GPU improvements are even close to what Intel says they are, they're gonna need a higher bandwidth memory solution.

No, 256bit gddr5 would make no sense at all. The chip simply isn't fast enough to be able to benefit from that. 128bit gddr5 or 256bit ddr3 are options which would make sense. That said, it's not just those docs talking about 4 channels which got some discussions going. If you look at the die shots compared to llano and trinity, the ddr3 area seems doubled (on the upper edge for llano/trinity, lower edge for Kaveri) - you can even compare that to quad-channel (xeon) sandy bridge chips and it looks pretty similar too. Which would imply those 4 controllers are really present (but not wired for fm2+ chips). But this is just speculation.

I am not convinced that a big Kaveri is possible without a significant heatsink upgrade, seeing as the current A10-7850K is already at 95W TDP. PS4/XB1 utilize 8x low power and slow Jaguar cores to keep their TDP lower.

It would be interesting if AMD could do a big Kaveri while getting thermals lower. Such an APU would be amazing inside a desktop replacement ultrabook, laptop or NUC like computer.

Get the big Kaveri to fit laptops without throttling and AMD could give i7-4xxxMQ a run for its money.

On the subject of big Kaveri, if AMD cannot push the frequencies high then surely it would make sense to release a 3 or 4-module version for the desktop?

This would have a potentially serious impact - enthusiasts could now get FX-class performance - with the potential of huge gains due to the GPU compute features - from an APU. This would also mean that AMD could EoL the AM3 socket and focus on just FM3 (obvious benefits here), thus migrating all their customers onto APUs. Enthusiasts are enthused, and AMD has a streamlined product line.

A quad channel interface and 6MB L3 added to the current version would have been a great enhancement, along with higher GPU clock rates in the 100W version to best make use of the die. The L3 missing in these APUs is quite puzzling to me, I would have expected the APUs and their memory starvation to have been an ideal candidate for an L3.

Next year they will move to TSMC's 20nm node I imagine, it wouldn't surprise me if AMD went with a larger 896 or 1024 SPU GPU design, added an L3 and a wider DDR4 memory interface to keep the extra cores fed.

Take out the GPU, put in four modules and an enormous L3 cache like you say and have quad channel DDR3 memory. With a discrete video card this will be a great (and competitive) desktop solution. If they have to leave a few CUs in just to support integrated desktop video then fine, but stop trying to game on it. There just isn't enough heat dissipation from the CPU to compete with a dedicated card that has its own cooling system.

Uh, what I would be interested in buying is a GDDR module. It would also be vital that both IGPs and traditional GPUs (even the ones made by other brands) have priority/instant access to the GDDR module, to ensure customizability and hardware endurance.

What would also be cool is that customers would have the opportunity to buy a larger GDDR module if they need it. 4K? Surround 4K? 8K? Surround 8K? No problem... just buy a new module with more gigabytes and be done with it.

Let's say I have an APU that is using my GDDR module, but then I decide to buy a high-end VGA with a high-end GPU inside it, and all of a sudden my GDDR module is worth nothing because the VGA comes with its own GDDR chips soldered onto the PCB. The solution to this problem would be to sell GDDR and GPUs separately, so that when someone decided to buy a high-end VGA, the VGA wouldn't come with GDDR chips soldered onto its PCB; instead the consumer would also have to buy a high-end GDDR module with whatever amount of gigs the consumer sees fit.

I can't emphasize enough how much I would love to see GDDR and GPUs being sold separately... and I strongly think the majority of consumers would love it too. This wouldn't have to be forced down people's throats; it could be a passive addition, meaning that it is optional. One could continue to buy the standard VGAs and choose not to buy any separate GDDR modules, the regular easy choice, while the hardcore, customizable choice would also be available.

(If I'm remembering this right) I remember the first PC I built back in the mid to late 90's had a graphics card that you could stick extra 1 MB chips in. It'd be cool to see something like that now.

Excellent ideas. This was along the lines of my thoughts as well. I would like to see graphics modules up for purchase so I could potentially choose which bandwidth to utilize. That way it is still left up to the consumer and would allow for multiple price points.

Video cards would have to be redesigned to allow proper cooling of the module. Aftermarket GPU cooling would have to be retooled as well. Memory manufacturers could push this through to see higher revenue; just remember we will likely see price premiums on all components go up.

An alternative would be to have a dedicated area of the motherboard allotted for graphics memory modules. Again, we have to look at location, cooling, and direct access to the GPU whether through the chipset or some other silicon.

It sounds good, but the generally accepted problem with that is the massive latency the GPU would have to deal with when accessing the system's RAM over the PCIe bus. That's the entire reason why GPUs have massive frame buffers on the card, they can't afford to go out to main memory.

I'd be most interested in a large L4 cache like Intel did because this seems to be the most affordable and reasonable option. Intel bills the eDRAM at $50. If AMD were to do something similar they could charge a small premium for those who value/need more bandwidth, and those who don't need it can save some money and forego the L4.

AMD most likely recycled blocks from the PS4 SoC. I've been saying for a while that they needed 20nm parts with at least 18 CUs this year (early better than late) but I highly doubt we will see much from AMD anytime soon. They are not eager to jump on new processes or to offer significant perf gains at decent prices if they aren't forced to. As for GDDR or SRAM, it's rather costly. Maybe go with what Nvidia is doing with Volta and put some RAM on an interposer. The problem with integrated GPUs is that they've been chasing 1080p and failed to get quite there. Now we are going 4K and APUs are years behind, so the timing stinks for fast APUs. Ofc matching consoles in perf is not a bad thing, so it wouldn't be a waste of time if it can be done at acceptable prices. It's sad though that AMD can't do fast CPUs anymore. With Intel wasting die area on GPUs we don't use and its grotesque prices for more than 4 cores, there was a market to exploit there. The DIY market lacks energy and enthusiasm because we aren't getting what we want and nobody gives a damn. Hell, I would go ARM and Linux if we had faster chips there. Guess ARM desktops are something nobody is trying to do yet and AMD might not have the resources for such a gamble.

Yes. If Kaveri already has four memory controllers, it badly needs a chipset that can take advantage of that fact. And if quad-channel RAM can support even bigger SKUs without bottlenecking, AMD should go for it. I've never liked the idea of my computer needing extra cards for graphics.

For the desktop I would go for a 4-module Excavator core + 14 CUs, for a total of 8 Excavator CPU cores and 896 GCN GPU cores. The die size would sit between 300-330mm² (4 Excavator modules ~170mm², 14 CUs ~115mm², other SoC logic ~45/30mm²), roughly matching the die size of the FX 8350 at 315mm², on Globalfoundries' 20nm process node. TDP would likely sit at 100W with 3.0-3.4GHz clock speeds on the CPU and 720+ MHz on the GPU. Essentially double the power consumption of an A8 7600 but also more than double the performance. Quad-channel DDR3/4 memory would have just about enough bandwidth to drive the GPU & CPU.
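
The core counts and die-area estimate in that proposal can be added up in a short Python sketch. All area figures are the commenter's own rough guesses; the 2 cores per module and 64 shaders per CU are inferred from the quoted totals (8 cores from 4 modules, 896 GCN cores from 14 CUs), and the variable names are just illustrative.

```python
MODULES = 4
CORES_PER_MODULE = 2   # inferred: 4 modules are quoted as 8 CPU cores
CUS = 14
SHADERS_PER_CU = 64    # inferred: 14 CUs are quoted as 896 GCN cores

# Rough area guesses from the comment, in mm^2.
cpu_area = 170
gpu_area = 115
other_logic_area = (30, 45)  # low and high guess for the remaining SoC logic

cpu_cores = MODULES * CORES_PER_MODULE
gcn_cores = CUS * SHADERS_PER_CU
die_low = cpu_area + gpu_area + min(other_logic_area)
die_high = cpu_area + gpu_area + max(other_logic_area)

print(f"{cpu_cores} CPU cores, {gcn_cores} GCN cores")
print(f"Estimated die area: {die_low}-{die_high} mm^2")
```

The low end of that range equals the 315mm² quoted for the FX 8350, which is where the "roughly matching" comparison comes from.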

Also leveraging the powerful 896 GCN core integrated GPU (R7 260X class) in hybrid-crossfire with AMD's Hawaii GPUs (R9 290/290X) and all future AMD discrete graphics cards that will include a CrossfireXDMA engine would improve game performance so significantly that all the CPU IPC deficits the AMD CPUs have would be completely masked and game performance would be significantly higher than any equivalently priced Intel+GPU combination.

Oh lord, that chip would be a gamer's dream. With HSA this could also become a very successful workstation and server chip. I personally don't think fitting this in notebook form factors would be an issue; we already have 150W+ TDP gaming laptops in very thin and light configurations. After all, the thinnest laptop in the world, the 14" Razer Blade, uses a 37W+75W CPU+GPU combo.

Ahh, I loved the eight-core chips. With the new 28nm SHP process AMD managed to keep 4 cores running at 3.1GHz in a 45W TDP envelope, so they can theoretically give us 8 cores at 3GHz under 90W right now. I would LOVE to get my hands on such a chip, and they can fit the 8 cores in a die smaller than Kaveri, essentially giving you an 8-core Steamroller CPU in a sub-230mm² die area for less than $150. DROOOOOOOOOOOOL

You know what, I would take a six-core Excavator + 7790 in a single processor any day. It doesn't even have to be 8 cores; HSA can handle any parallel workloads, AMD just needs to bring up the per-core performance slightly.

What about cooling though? You'd need a pretty beefy cooler for a 300mm² chip in laptops. Also I think your configuration would be very much possible once AMD introduces die stacking. Put HBM memory on there, 4 serial compute units, a thousand GCN cores, ASICs, and a memory controller, and you have a very nice package.

I don't think cooling is going to be a problem; I mean, look at AMD's 45W 7600. If we essentially double the components and power you're going to get 90W on a 28nm process, and 20nm will consume even less power.

I might vote for a 2-module Steamroller core + 20 CUs with a GDDR5 interface (the memory would likely have to be included). Basically a PS4+. Expandability of course would be an issue with GDDR5, but the performance would be great for the price, and it would not need any high-speed memory on-die.

But either way - DDR3 or GDDR5, a large Kaveri could be *very* interesting; what a great price-performance box that could be as a gaming rig.

Yes, I would very much be interested in that. I built an HTPC using an A10 5800K. Its main purpose is an emulation station (I run hyperspin on it) but I've used it for suitable PC games too. Having a bit more GPU grunt would give me more gaming options (I don't want to use a discrete card).

AMD need to sort out their IPC though. Until they have a chip that can handle full speed Wii/PS2 emulation I won't be upgrading. No matter what the GPU gains.

With the current Kaveri you can start from the A8-7600 and later add a 240/250 in dual graphics mode. I am not considering the 7700/7850K because they are expensive for that job, which is making a low cost entry gaming PC with the possibility of an easy upgrade with a cheap graphics card using dual graphics. If the 7850K had 896 stream processors and the 7700 had 768, on the other hand, things could be very different, assuming of course we didn't have the huge bandwidth limitations we have now. In that case the price would be more than justified to start that entry level gaming PC buying just one of those processors and have the option later to upgrade the 7850K with a 260X or the 7700 with a card like the 7770.

But as long as we have that limitation with memory bandwidth AMD can't do much more than concentrating on the cpu part. Giving a model with just 128 stream processors and 3 or 4 modules on the FM2+ socket with prices comparable to those of 63X0/83X0 would trigger the upgrade path for many people like me, who are still happy with their AM3(+) platforms.

So while you, Anand, would be interested in a chip with a huge GPU on it, I and I think most AMD customers who are on a Thuban, or a 3 or 4 module AM3+, would be more interested in an APU with more Steamroller modules.

" If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?"

What I want from the Kaveri successor is two different products using a unified socket:

The current fp64 ratio of the GPU cores is 1:16. When you work out the total DP Gflops between the GPU and CPU it is around 105 Gflops. An i7 is 224 DP Gflops, CPU only, i.e. there is no point programming a double precision app in HSA; it's far easier to just program in AVX2 only. However, if the fp64 ratio was 1/4 then the total is 243 Gflops, which makes HSA a much more attractive option, especially as later APUs will have more cores.
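
Here is one way the 105 and 243 Gflops totals could have been worked out, as a Python sketch. The inputs are assumptions picked because they reproduce the quoted totals, not figures from the comment: 512 shaders at 720 MHz with 2 FLOPs per cycle for the GPU, and 2 CPU modules at 3.7 GHz doing 8 double-precision FLOPs per module per cycle.

```python
# Assumed inputs (not stated in the comment), chosen to match its totals.
GPU_SHADERS = 512
GPU_CLOCK_GHZ = 0.72
CPU_MODULES = 2
CPU_CLOCK_GHZ = 3.7
CPU_DP_FLOPS_PER_MODULE_PER_CYCLE = 8

gpu_sp_gflops = GPU_SHADERS * 2 * GPU_CLOCK_GHZ  # 2 FLOPs per shader per cycle (FMA)
cpu_dp_gflops = CPU_MODULES * CPU_DP_FLOPS_PER_MODULE_PER_CYCLE * CPU_CLOCK_GHZ

def total_dp_gflops(fp64_ratio):
    """Combined CPU + GPU double-precision GFLOPS for a given GPU fp64:fp32 ratio."""
    return gpu_sp_gflops * fp64_ratio + cpu_dp_gflops

print(f"fp64 at 1/16: {total_dp_gflops(1 / 16):.0f} GFLOPS")
print(f"fp64 at 1/4:  {total_dp_gflops(1 / 4):.0f} GFLOPS")
```

Under these assumptions the GPU contributes less than half of the 1:16 total, which is the point being made: only at a 1/4 ratio does the combined figure pass the 224 Gflops quoted for an i7.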

That's what I'm thinking as well: server chips with DP = 1/4 SP and 4 memory channels. They want to do server APUs anyway. Then make this platform the new high end desktop and vary a bit: 4 modules + moderate GPU etc. Reply

You only need strong graphics when you want to play games. If you want to play games you buy a dedicated graphics card. If you have a dedicated graphics card you don't need strong integrated graphics.

So in my opinion AMD must either:

Improve their non-gaming CPU performance (perhaps even with GPGPU/HSA) so they can compete with Intel's offerings on a performance, energy efficiency, price ... level to be attractive to non-gaming customers.

or

Improve their integrated graphics to a level where even a mid range gaming enthusiast doesn't need a dedicated graphics card.

As an example, my own situation is that I do my gaming on an ultrabook with Geforce 730M graphics, and I don't want to build and maintain a dedicated gaming PC. However, I do have a mini-ITX server with six HDDs as a file dump / personal cloud / backup service / HTPC running on an Intel Celeron. I'd love to drop an Xbox One class APU in there and do something like Steam Big Picture for higher level gaming. Reply

I actually forgot the reference to the article: if they go the second route and improve integrated graphics to serious levels, I believe memory bandwidth is (one of) the biggest issues. Addressing that with lots of DDR3 RAM/bandwidth (i.e. 4 channels) or using GDDR5 RAM seems to me more elegant than just dropping another cache level on the die, which is what Intel does. Ideally they would use GDDR5 sticks, but I don't believe it is good to introduce another standard. How could DDR4 play into this? Would it be a viable alternative to GDDR5 in this scenario? Reply

I would definitely be interested in buying such a thing. Like many others have said: building your own Xbone or PS4 in a PC? Hell yeah! I would specifically use it for a Steam machine (although I am still not 100% fan of SteamOS) and also as a replacement for my NAS/Home backup system. Perfect combo.

According to nordichardware (http://www.nordichardware.se/CPU/Styrkrets/amd-lan... - it is in Swedish, ask Google if you want to read it) there is testing being done on implementing R9 280X graphics into an APU next year. This is a rumor, and should of course be treated as such. But if they manage to solve the memory problem by using 4 memory controllers then perhaps it is viable. I wonder how they are going to keep the thermals down. How would DDR4 stand in this? Would it be possible for them to use that instead of GDDR5 and still have about the same performance? Reply

I sure would be interested in it! As it stands, to me Kaveri tries two things at once (being a good desktop CPU and a good GPU) but fails at both. Anand stated in his launch article that Kaveri fits in a rather tiny niche, that is Steambox/HTPC gaming. I agree, but for that to really take off the GPU needs to be at least as powerful as the XBone and PS4. Currently it isn't, so I might as well just get either console. On the desktop CPU side, Steamroller (the 3rd Bulldozer "revision", mind you) still lacks power. If I build a new office computer I'd choose a Celeron/Pentium/Core i3. Their OpenCL performance is not as good as AMD's, but it's still there, which is something that often gets lost when looking at AMD marketing slides. They make it seem like they are the only ones with OpenCL capability. Not to mention Broadwell is almost right around the corner and we'll have to see how much Intel can improve the GPU part. Also there is Quicksync. I use it on all my Blu-rays. Even a lowly Core i3 can use that and effectively boost Blu-ray encoding compared to AMD APUs. Go to hardcore gamers and they are better served with a Core i5 or AMD FX CPU, and given the prices of the new Kaveri chips it's actually advisable to do so. In my eyes there is simply no selling point for the AMD APU on a desktop. Mobile will probably be fantastic, as the A8-7600 45W review showed, but from a desktop/HTPC perspective? Meh... I'll pass. More cores and GDDR5 or quad channel would have sold me. Reply

Given the current building blocks I would like to see 3-4 Steamroller modules (6-8 cores) with 4 CUs (keeping the GPU for asymmetric coprocessing with Mantle, as announced).

Given the chance to improve Steamroller for a new stepping I would seriously review those caches (they have been getting worse ever since Brisbane) and memory controllers. AMD is now basically accessing the L2 cache at the same latency that Intel accesses main memory, it's ridiculous.

Also, L3 cache seems to be sorely needed in the equation. AMD was a pioneer in this, and they got a big win by implementing it. Now they are neglecting it and suffering the consequences. The current architecture is bandwidth and latency starved; if they fix this then every other improvement can show its true potential. Reply

There is of course the chance that the Kaveri die supports both DDR3 and GDDR5. Of course the FM2+ implementation would only support dual channel DDR3 to maintain backwards compatibility. This does not preclude the possibility of a GDDR5 version that'd be soldered onto a motherboard alongside GDDR5 memory. Ditto for the chances of a quad channel version.

One thing worth noting is that with GDDR5 there would be a memory capacity limitation. 16 GB of GDDR5 is the current maximum on a 256 bit wide bus using multiple and expensive high capacity GDDR5 memory chips (a similar DDR3 setup can reach 64 GB using unbuffered memory).

The other thing about a GDDR5 option is that it is not suitable for ultra mobile devices. GDDR5 memory consumes more power than vanilla DDR3 and far more than LP-DDR3. This means that a GDDR5 version would only have a handful of niche roles: HTPC, HPC server clusters, and embedded applications that need lots of memory bandwidth.

I would love to see a quad channel GDDR5 option though. Such a setup would provide 4 to 6 times as much memory bandwidth as the current DDR3 implementation. It'd effectively remove the memory bandwidth bottleneck. Reply
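The "4 to 6 times" estimate can be sanity-checked with a short sketch. It assumes a 256-bit GDDR5 bus at 4.0 to 6.0 Gbit/s per pin and uses dual-channel (128-bit) DDR3-2133 as the baseline; both the GDDR5 data rates and the choice of baseline are assumptions, not figures from the comment.

```python
# Peak-bandwidth comparison between a 256-bit GDDR5 bus and dual-channel DDR3.
# The GDDR5 per-pin data rates and the DDR3-2133 baseline are ASSUMPTIONS.

def peak_bandwidth_gbs(bus_width_bits, data_rate_gtps):
    """Peak bandwidth in GB/s: bytes per transfer times giga-transfers per second."""
    return bus_width_bits / 8 * data_rate_gtps

def speedup_over_ddr3(gddr5_rate_gtps, ddr3_rate_gtps=2.133):
    # 256-bit GDDR5 versus two 64-bit DDR3 channels.
    return peak_bandwidth_gbs(256, gddr5_rate_gtps) / peak_bandwidth_gbs(128, ddr3_rate_gtps)

for rate in (4.0, 5.0, 5.5, 6.0):
    print(f"GDDR5 at {rate} Gbit/s per pin: "
          f"{peak_bandwidth_gbs(256, rate):.0f} GB/s, "
          f"{speedup_over_ddr3(rate):.2f}x dual-channel DDR3-2133")
```

With these inputs the ratio runs from a little under 4x to a little under 6x, so the commenter's range looks reasonable against fast DDR3; against slower DDR3 the gap would be larger.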

I see three markets for this:

1) The ITX that was already mentioned. However, the market is small.

2) Laptops that were mentioned, but getting people to buy it would be hard and a custom chip for that small of a market is going to lose money.

3) R&D. I think AMD has a real chance here. HSA and GDDR5 would offer a really high performance solution for researchers needing real-time processing.

Combine all three of those markets and you might have enough market to sell the CPU and not lose money. Reply

I fully agree with Anand here: an APU with 20 CUs would be very welcome provided memory bandwidth wasn't an issue. Give me four channel DDR3 or see what DDR4 can bring, as some others in the comments have said. I would love to ditch my discrete card (HD7850) for a single chip solution, but until that kind of performance is reached or surpassed I won't be spending up on a half-baked solution.

I got into the PC market during the Athlon XP days and it pains me to see where AMD is today. I would love to justify a new AMD PC but as it stands I'll be sticking with my Intel CPU and AMD graphics card. Reply

Heck yes I'd be interested in any of those options. I think I recall seeing someone show 5-15% performance bump on Kaveri by just switching from DDR3 1600 to DDR3 2133.

I'm seriously considering a Kaveri chip for a HTPC / light gaming machine, but I'm concerned that it will be a little too light for the games I play. I think more memory bandwidth would be enough to fix that.

If the performance was about equal, I'd prefer a 256-bit interface to regular DDR over the other options, but I'd be happy with any of them. Reply

That high-end APU could be a good option for saving a few dollars on mid-tier gaming PCs. If it performs like a $200 CPU and a $200 GPU and they sold it for $300, I could see buying one. Kitted out with all the right components you could end up with a half-decent gaming PC or Steambox for $500-$600, which would be great for most things.

Some of my less affluent friends could really use something like that, and you don't need to lose out on upgradability because they could easily include a PCI-E slot on the board. Even on an mITX form factor. Reply

I would still prefer a 4 module part if possible, as at least in my scenarios 2 module parts choke when running more than 5 applications at once. Something in the range of 20-28 CUs seems about right for what you could hope to run off a 256-bit DDR3 interface. If such a part existed for say $300.00 or less it would find its way into many of the systems I sell to others at this point, not to mention a couple of my own. Reply

I personally would love to have that option. AMD would have to go the Intel Iris Pro route and sell it pre-soldered on a motherboard. The difference would be that they could sell to OEMs and retailers, unlike Intel. I was actually hoping for a beefier GPU on Kaveri; I want a Project Spark box for building, I have an Xbox One for playing. AMD could have made an ITX motherboard with both DDR3 slots and GDDR5 soldered on the back of the board, and used the ITX case as a heat sink. We have ITX motherboards with mPCIe slots on the back already. If you are wondering, HSA is enabled on/for Hawaii GCN GPUs on PCIe slots, so it already works across different memory types/controllers; mixing both looks like what Kaveri had planned. Iris Pro and the Xbox One SoC already do this; it's just another cache, and GPU and CPU would both be taken care of. Reply

I think those extra mem controllers come into play on the server variant. Or a future revision. As to a high speed variant, I would be interested in a dual processor system with more CPU cores and GPU cores with configurable TDP for each side of the equation (CPU/GPU). It would be great if turbo would automatically figure out on the fly which set of cores would be better served with more TDP headroom, so applications that are processor intensive get the extra juice that they need. 8-12 CPU cores. I had commented on AMD releasing a 200W TDP processor and they did with the FX processor, but I was thinking of a 200W TDP APU, because that would make more sense, as in that case you would just skip discrete cards. Those higher GPU core counts only make sense if you have a CPU that can push the FPS in games. If the CPU is too weak, then that GPU goes to waste. This was something that was apparent in a laptop review that was done here on AnandTech. It was a review with the 8970.

I'd like to see a system that can run a 4K monitor at 30 FPS with settings at medium to high. I think that is realistically possible given AMD's situation. That could support 64GB of RAM. Throw in the GPU virtualization that works in VDI solutions like VMware and Citrix and I would be really interested. Reply

While a one chip solution is elegant and powerful by some metrics, I think now more than ever keeping the GPU and CPU separate is important for high end performance, for the same reason overclocking is dying. As processes get smaller the thermal overhead shrinks, because it's no longer "how much power can you dissipate" but "how much power can you dissipate on a given size die", and there's a difficult reality to accept in the latter question. If I want a powerful GPU it needs a big heatsink and big fans (a la R9 290, 780, etc.). There simply isn't room to put a CPU on GPUs of that size without the chip literally melting. Even if they can, they could always separate them and get more thermal breathing room and higher clocks, so I'd like to see them stay separate. Reply

Answering the question in the post: yes, that is something I would buy (if in the market for a new PC or console). The problem with Kaveri as it is right now is this: what role does it actually fill? It is a launch vehicle for HSA, and is a platform for entry level PC gaming. It does fill that role, but I believe that market is small. There are better options for a traditional CPU + GPU computer (Intel), there are better options for GPU compute (a dedicated GPU), there are better options for midrange gaming (a console), and there are better options for mobile (Intel being more power efficient).

I see three directions that Kaveri would need to go to be more competitive: More powerful CPU - that doesn't look likely to happen; FX hasn't done well recently. Lower power for ultra mobile - that doesn't look likely to happen while Kaveri is at 28nm and Intel will be launching 14nm this year. But a more powerful GPU with greater memory bandwidth - that could happen! That would be a killer chip for professionals working at 4K resolutions, and it would make console level gaming performance more affordable and compact. A "Steam Machine" needs a one chip solution like the consoles, with the graphics performance of the consoles. HSA support and better-than-Jaguar CPU performance is just a plus! Reply

The extra 2 memory controllers would be a real performance enabler, even with hUMA. Bank+row interleave on the AMD APUs has boosted performance in CPU+GPU memory starved scenarios by over 50%. 4 DDR3 controllers leading to 8 DDR3 slots would cheaply max 32GB hUMA, and beat 2-controller GDDR5 in bandwidth and latency.

The current APU design would benefit, but the 4 controllers would also enable higher performing, more efficient parts, with either the CPU or GPU doubled. I expect a 4+ CPU module Opteron APU at some point with 4 memory controllers.

I do not expect AMD to release an APU besting the XB1 or PS4 in gaming performance until at least 2015. Large-scale cooperative efforts like those tend to come with short-term noncompete clauses. This would mean doubling the GPU or offering a 4-controller GDDR5 solution are out of the question for this generation. Reply

As far as I understand it, the x86 instruction set in modern processors is really just a common API for each manufacturer's (AMD/Intel) custom hardware back-ends...if so, why can't AMD use the graphics hardware to offset its FPU deficit? Obviously AMD is expecting software companies to create/modify software to explicitly leverage the APU, but I don't understand why the x86 instruction set "API" doesn't afford them this capability for free; if my standard x86 code is issuing floating point operations, why not simply have the GPU quietly satisfy those operations on the back-end?

If they did that, wouldn't we have an extremely competitive "CPU" product from AMD?

As for memory bandwidth, I'd tend to think that on-package GDDR5 would be a safer bet than on-motherboard; not only does that allow me to take my GDDR5 with me if I buy a new motherboard, but memory bandwidth is always impacted by the length and quality of the traces between it and the memory controller, so on-package ensures those traces won't be a problem. Reply

The thing is that GPUs are only good for massively parallel computations. The SIMD instructions could take advantage of the GPU hardware. However, anything that's not done in parallel will be better off just running on the normal CPU cores. Reply

I kinda figured from slides that I saw of the architecture blocks on other sites that the capability to do quad channel was a given, since it has a 256-bit memory interface. Also the ability to possibly run the IMC in GDDR5 mode was a given, since it's essentially the same hUMA/HSA design that is in the PS4, which uses GDDR5 memory exclusively.

The limiting factor with Kaveri obviously is the FM2+ socket. They obviously decided to eschew the use of quad channel memory, or on-board GDDR5, to maintain backwards compatibility of the FM2+ socket with older gen APUs.

I'd also imagine that may be why they haven't decided to make an FX version of the Steamroller cores, since in order to do so with HSA/hUMA it would require a completely new socket. Remember we're also supposed to be seeing HSA/hUMA enabled GPUs this year as well. Otherwise it'd be stuck with the same memory and system constraints as previous AM3 CPUs, which historically tend to be bandwidth limited, partially by dual channel memory and partially by IMC/L3 cache speeds.

I'd also imagine there is a lack of desire to use quad channel memory; even Intel dropped support for mainstream triple channel memory on its i series after the first gen i7s, since it's a hard sell to convince people to buy RAM in 3 stick kits instead of 2 stick kits. Now you only really see triple channel and quad channel in their server chips, which is where we'll likely see the quad channel IMC come into play for Kaveri: in the server space.

In the end, would I like to see an FX series replacement, even if it's on a new socket, with say 6 Steamroller cores (3 modules), a decent IGP, and the possibility for triple or quad channel memory? Who the hell wouldn't?

Gotta remember with the GDDR5 thing though: AMD isn't in the position to drive use of a new memory standard these days. Back in the days of the Athlon, Athlon XP and Athlon 64 pushing DDR and DDR2, things were different. But with hUMA, putting GDDR5 on a board and allowing the CPU direct access to it would be the easiest workaround for them. Reply

If an OEM picked it up as a LAPTOP design it could be pretty excellent. The adjustable TDPs would be perfect because there could be a high-power mode for plugged in/cooling stand and obviously a lower mode for mobility. GDDR5 is a little pricier but lower voltage, so I'd guess overall pretty good battery life. It all depends on the latest GDDR5 pricing, but luckily it appears good ole Sony has pushed it forward into mass production, so it should be dropping now. Way to go, Sony :) Reply

I'd be very interested in a high-end Kaveri. I've pretty much determined that I'll be building/buying a Steam Machine this year both to act as my HTPC (with XBMC) and for occasional gaming. I'm not a big gamer, don't need the best graphics at the best settings, something moderate will be good for me.

Kaveri right now is nearly there but the graphics performance just falls short. If AMD can sell a version of Kaveri with 12 CUs and 256-bit wide DDR3 for about $200 I think it will be the perfect SoC for my Steam Machine. Reply

I would love a big Kaveri -- I'd happily buy a 4 module / 16 CU / quad-channel APU. I'd be plenty happy if that's what replaced the current FX line even. 8 x86-64 threads, 1024 shaders in the same memory space, and 70+ GB/s between them? Yes, please. I don't care if the TDP is approaching 150W, it'd be worthwhile. Reply

Quad is a little overkill; even a triple-channel memory controller by default for FM2+ would have been the most intelligent option, the best balance between money and performance. As for Kaveri configuration, you always have a split; some people prefer more CPU, a few others prefer more GPU. I think if AMD was to be intelligent with regard to market saturation they would need to cater to both these crowds in the APU design and have some models more CPU focused. What AMD doesn't seem to realize is that their APU fundamentally is nothing more than a pure compromise; it's not excellent at anything, just good at everything. The onboard graphics are still nowhere near where they need to be to forget about discrete solutions. 20 CUs with a better memory controller would definitely change that. HSA is another reason not to beef up the GPU on the APU, because HSA has not taken off; so it makes more sense for AMD to put in more CPU power until there is an HSA bottleneck. They are approaching this from the wrong perspective with regards to desktop. On laptops their strategy is on point, but it's completely the opposite for desktops. Personally, I'm thinking of investing in a G2140 since it's much faster than Kaveri in archiving, encoding etc. I will then possibly buy a mid-range Pirate Islands GPU, depending on whether Denver competes with Mantle; I suspect it will be a complete failure by comparison, because even Denver has to deal with DirectX, and who knows if Nvidia even managed to offload any significant CPU time to Denver or if it is mostly for PhysX. Also AMD APUs are way too pricey, so not only is it a pure compromise, but it's an expensive one too: $200 Canadian, where Richland cost $140-145 Canadian. Reply

Just so I'm clear, what I'm saying is that on the desktop side their flagship APU should be a 6 core, with whatever die space is left used for the GPU. That would have been a hit if they had released it instead of their current 7850K. Reply

If AMD were to release a version of Kaveri with L3 cache or quad-channel memory, would they be able to just tweak it and release it under the same platform with another number (like A10-7870K) or would they have to "re-engineer" the chip (like in Trinity>Richland)? As I understand it option 1 is only possible if the existing chip already has those components albeit disabled (I don't know if that's possible for the memory controller anyway), am I right?

I also think that the APU is a strange proposition (outside laptops). Home office is best served by Intel CPUs without that graphics horsepower, and there is a small margin where an APU makes sense before considering a dGPU for gaming. (For HTPCs I think that between AMD's and Intel's offerings the choices are fairly diverse, which is a good thing.) Although the chip looks great and seems to be a good foundation to build upon, there are still many checkboxes unticked:

1. Good Linux drivers: I use Linux for everything except gaming and some audio software, my family does as well. Intel is hassle-free. Nvidia gets gaming covered (if you game on Linux). I don't think this is more important than other possible improvements, but cheap beefy AMD+free Linux would be something that a lot of enthusiasts would like to see (for HTPC with native Linux gaming maxed out for example).

2. Dual Graphics: I remember when I first read about Dual Graphics in Trinity and was very impressed. Then I saw some benchmarks and I was disappointed. This may be fixed now with Kaveri (is it?). If Dual Graphics is working the chip becomes much more interesting.

3. Mantle and TrueAudio: Everything sounds cool in theory but it's not shipped yet and seems to be quite proprietary now. I hope AMD gets everyone to use this easily and efficiently (easy for developers to add, stable and easy for end users to get performance out of it, and non-problematic for non-radeon users).

4. L3 Cache: It puzzles me why this chip doesn't have it (may be part of a bigger plan), but it would make sense.

5. HSA: This seems to be the reason of existence of this chip. But it's also something the end-user won't notice or care about until real-world applications use it. The technologies that make up HSA seem to be well thought out, well designed, and a very good architecture to build a very wide range of systems. I really hope AMD pushes very hard with it and that other HSA Foundation members release products built on HSA.

This is it, I think, from least to most important (even though I believe that each one is equally important). As everyone says, this whole APU thing is like an eternal promise. In this regard Kaveri feels rushed (I don't mean incomplete or buggy) but necessary. I am very interested to see what AMD's short-term plans for FM2+ and Kaveri are. For the long term I hope they succeed with HSA, like they did with x64 (I think that theoretically HSA benefits would show up quicker than 64 bits did).

I may not be right in everything since I am guessing a lot of things I don't know, but just wanted to give my 2 cents. Reply

Cache is used to keep the CPU fed. Intel may require L3 cache due to lower latency (greater bandwidth, greater buffer requirement). Kaveri has a much higher latency, more or less voiding L3 cache. But on-die DRAM for graphics and HSA would make a lot of sense, though it is expensive. On-die cache is expensive. Reply

If you want to go purely on GPU alone, it would seem the 7850K would need somewhere around 128-bit DDR3-2600 if it were simply a GPU without whatever benefits the CPU cache brings, and of course that is not counting whatever bandwidth the CPU needs. That is also if the design is under full load. Past AMD designs were clearly designed toward certain DDR3 spec speeds (above those listed for the processor spec) and peak bandwidth on simply the GPU, so perhaps that is a somewhat safe formula, assuming they did their homework and that it makes sense for realistic loads for the whole chip counting internal cache.

What a 256-bit bus affords AMD is to take their most efficient GPU design (outside of some aspects of Hawaii) in Bonaire: a 1x16 ROP array with 14 CUs (896 SPs), a GPU that could actually compete with the Xbox/PS4 and play games at decent settings (especially in CrossFire), and run it at realistic clocks with cheap memory... DDR3-2400 kits can be had for cheap, and that is clearly where they should aim.

I've always used the bandwidth formula of roughly 56.25 GB/s per 1 TFLOPS on a GPU for scaling, and up to this point it has always worked to show both tangible results as well as a bottleneck.

Those seem like very realistic clock speeds: even though GF (on 28nm, for instance) is around 10% less efficient per volt than TSMC (the latter of which is roughly 1 V = 1 GHz on average, perhaps a percent or two lower... or take the 1.163 V 7970 GHz, which seems binned to 1150 MHz, or the 7870, which is 1.218 V and 1200 MHz), that still leaves them with 764 MHz at the lowest/most efficient voltage of 0.85 V for HPP at GF... or just around perfect for such a design.

Do I think that is a good idea? Hell yes! Both the low-TDP form and higher-TDP Black Editions (coupled with more voltage, better cooling, and faster memory as it becomes cheaper) could bring AMD back in the game in a big way. Would it not be amusing to say you overclocked your APU to 1029 MHz (1.15-1.175 V... still within the scalability power curve) with 3242 MHz memory (as DDR3-3000 rated/capable kits may become cheaper) and were equal to a PS4?

Not saying that is exactly realistic on 28nm, but rather the core design would be solid for future iterations and there would be useful scalability from stock to overclockers.
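The rule of thumb in this comment (roughly 56.25 GB/s per TFLOPS) can be applied to the configurations mentioned to see how the numbers line up. This is only a sketch: the 720 MHz GPU clock for the 7850K is an assumption not stated in the comment, and the pairing of each clock speed with a memory speed is one reading of the text rather than something the commenter spells out.

```python
# Apply the commenter's rule of thumb: ~56.25 GB/s of memory bandwidth per TFLOPS.
GBS_PER_TFLOPS = 56.25

def gpu_tflops(shaders, clock_mhz):
    # 2 FLOPs (one fused multiply-add) per shader per cycle.
    return shaders * 2 * clock_mhz / 1e6

def needed_bandwidth_gbs(shaders, clock_mhz):
    return gpu_tflops(shaders, clock_mhz) * GBS_PER_TFLOPS

def ddr3_bandwidth_gbs(bus_width_bits, transfers_mtps):
    # Peak bandwidth: bytes per transfer times mega-transfers per second.
    return bus_width_bits / 8 * transfers_mtps / 1000

# (label, shaders, GPU clock in MHz, bus width in bits, memory speed in MT/s)
cases = [
    ("512 SP @ 720 MHz (assumed 7850K clock)", 512, 720, 128, 2600),
    ("896 SP @ 764 MHz", 896, 764, 256, 2400),
    ("896 SP @ 1029 MHz", 896, 1029, 256, 3242),
]

for label, shaders, clock, bus, mtps in cases:
    print(f"{label}: {gpu_tflops(shaders, clock):.2f} TFLOPS, "
          f"needs ~{needed_bandwidth_gbs(shaders, clock):.1f} GB/s, "
          f"{bus}-bit DDR3-{mtps} supplies {ddr3_bandwidth_gbs(bus, mtps):.1f} GB/s")
```

In each case the required and supplied bandwidth come out within about half a GB/s of each other, which suggests the memory speeds in the comment were derived from this same formula.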

4 cores with a 1024b shared FP unit per module (or alternatively dynamic context switching and possibly no FP at all but more integer pipelines...), a deeper pipeline and 20-25 CUs, working at 3.5-4 GHz CPU and 1-1.2 GHz GPU. To power this beast properly, 2-4 GB of on-package stacked RAM, the kind AMD is building with Hynix... this beast could even run without external RAM. Alternatively I would rather go 4/6 channels of DDR4... than either DDR3 or GDDR5... we're talking high end after all, aren't we?

For said apu i'd gladly pay 600€ no questions asked!! I would even go higher if needed!

The behemoth I described would be rather big... probably in the 500 mm2 range using 28nm... but I'd go for broke and use the 14nm GloFo process (they said they're now taping out on it) if viable; that should shrink the die size and also lower power usage to manageable levels... it will be expensive, but such a beast would have around 3 TFLOPS of power, and HSA with preemption would bring the full brunt of the GPU to bear... it would be a valid Knights Landing counter... and a magnificent CPU... Reply

If AMD could actually get some drivers out there to take advantage of the unique features of the APU, then it would interest me much more. On paper there is at least the potential for 2-5 times the framerates vs a discrete GPU of the same specs. But the drivers have to be totally redesigned from the bottom up. I just don't see this rather incompetent AMD laying all this groundwork. I remember back in the day when the SNES was competing against the PC. There was no way you could get SNES type graphics on a PC in 1992. Just look at the boss fights in Super Contra. You simply could not get that from a 1992 PC game, not even close. There was way too much overhead in the GPU drivers and in the OS and just about every other aspect of the platform. It's kind of the same situation today. The entire game engine is designed around a GPU being in its own memory space. It is going to take 100x what AMD has to get developers to dump all they've done and start over. Reply

I am not sure where you have seen the "paper" claiming 2-5 times the framerates vs a traditional GPU. I heard the claim of up to 40% better performance with the Mantle driver. However, first, a 40% better frame rate would simply allow it to catch up with the discrete GPU that Kaveri's APU is based on, and second, we haven't seen Mantle enabled games to confirm this claim yet. Reply

I think AMD will test it internally for workstations; the bandwidth should mainly/only be needed under GPU loads.

VERY INTERESTING

For the folks that are interested: AMD said to COMPUTERBASE.DE: "Dual-Ranked DDR3 Modules are optimal on Kaveri in a 2-Slot-DDR3-Setup !" They tested on their own bench table with single-ranked memory first, after the NDA, because they didn't get a rig from AMD to test. AMD itself shipped only rigs with dual-ranked memory. Please look at the numbers, translated with Google: http://translate.google.de/translate?sl=de&tl=... Reply

Kaveri definitely needs a scale up solution to support proper 2k and 4k gaming without losing the benefit of HSA: Otherwise the HSA ecosystem simply won't develop.

For now that scale up solution might simply have to mean Kaveri modules, which include RAM, pretty much in a dGPU form factor. That could be GDDR5 or "ultra clocked and specially binned" DDR3 depending on the number of CUs on the chip. Beyond 10 CUs there might be diminishing returns without GDDR5, and 16 CUs with a somewhat lower CPU clock might still fit into 150 watts per "blade".

Now I'd just want to be able to put 1-4 of these blades on a back-plane or "motherboard" which is little more than a place to connect the SATA, USB and power cables, to create something that scales to the resolution I need at that specific point in the house. Reply

Ok, just hoping that someone has reached this far ;-) as for me, I reached up to page 7.. and might return back later.

Having had a personal interest for many years in SFF (Small Form Factor) boards for embedded systems (yes, your mobile / cell phone is a computer too...), I would recommend the following options:

Embedded option:

GDDR5 controller on die, 256 bit wide, with 2 - 4 SKUs available from the M/B manufacturer, for on-board memory of 4GB, 8GB or 16GB. The reason behind this is that a lot of embedded systems are not even in general circulation to the average Joe, since they end up in industrial use, data centres or rendering farms. Also increase the power, to more like (under the old scheme) 4, 6 or 8 CPUs with a much better FPU (Floating Point Unit, since it is not as good as Intel's), and have GPU (GCN) power of 512 as a minimum, but preferably an R9: maybe half a Hawaii with 1280/80/32 and a 512 bit bus width, or a Curaçao with a 256 bit bus width, both with a somewhat lower core clock rate so it can stay within a reasonable TDP. It can be a slightly larger size, since it will not be a user replaceable part, as is common, and is soldered directly on board. The board would need to be either 6 or 8 layers depending on the amount of memory, the bus width used and the supporting circuitry.

As for the socketed version, my approach would be similar to a SoC, which could have an R9 with 768:48:16 for example, and since the memory will be on chip and to lower complexity, it could be done 256 bit wide, with the lower end 128 bit wide.

On-chip RAM: have 1GB or 2GB of GDDR5 directly on the chip, or more like 4GB or 8GB if it is going to be combined memory for CPU & GPU access.

Alternatively, AMD should push a gDDR5 socket standard, so M/B manufacturers can add sockets on the M/B or have a 'lower end' version with DDR3/4 so more memory can be added by an end user.

If you had the building blocks and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?

I'd love to see a 6 core Excavator APU with about the same number of CUs as Kaveri (but GCN 2.0) and a quad-channel interface; I'd combine that with a best-bang-for-the-buck (upper) midrange dedicated card.

According to Golem.de, the memory bandwidth should jump from 34.1 to 68.2 GB/sec - just by enabling quad channel on DDR3 modules that are already available today. I really wonder why AMD didn't enable it on Kaveri. Doesn't make much difference if I buy 2x8GB or 4x4GB of DDR3-2133... Reply
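The dual versus quad channel figures follow from simple arithmetic, sketched below. The only assumptions are the standard 64-bit width of a DDR3 channel and a nominal 2133 MT/s transfer rate for DDR3-2133.

```python
# Peak theoretical DDR3 bandwidth as a function of channel count.
# Assumes a 64-bit (8-byte) channel and 2133 MT/s for DDR3-2133.

def ddr3_peak_gbs(channels, transfers_mtps, channel_width_bits=64):
    bytes_per_transfer = channel_width_bits / 8
    return channels * bytes_per_transfer * transfers_mtps / 1000

for channels in (2, 4):
    print(f"{channels} channels of DDR3-2133: "
          f"{ddr3_peak_gbs(channels, 2133):.1f} GB/s")
```

Doubling the channel count doubles the peak figure, giving roughly 34 and 68 GB/s, in line with the numbers quoted from Golem.de. These are theoretical peaks; sustained bandwidth in real workloads would be lower.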