58 Comments

Intel has some cool features, like virtual port assignment, that AMD currently doesn't have (so databases and gigabit network cards go faster on Intel VT hardware in the upper-end Xeons).
Does the new AMD core have any improvements in this area? Reply

I think I recall that AMD expects the GPU to take most of the FPU load in the future, so maybe the APU chip will be an FP op monster when the GPU part isn't being used much. Pair that with an external GPU and a theoretically unused on-die GPU in AM3, and there's actually no problem for AMD to make the chip AM3 compatible.

Anyway, I don't give a damn about DirectCompute, whatever version it is. I want good OpenCL performance and finally decent OpenGL drivers from AMD. And finally, a working, usable Linux driver would be nice. Reply

If there's one thing interesting about the two next-generation architectures (the Bulldozer and Bobcat modules), it's that they both appear to have been designed with far less floating point power than previous AMD CPUs.

This tends to indicate that both next gen modules will likely have a GPU on the same die to shunt floating point operations to.

Traditionally, GPUs completely smash the x86 architecture when it comes to floating point performance, so this will be a good move.

Llano looks to be the current-gen Propus (Athlon II) core paired with an APU to massively boost floating point performance.

Not a big deal, as it looks like they're positioning Llano as their mainstream product, whereas Bulldozer will fit into the same niche the i7 currently occupies. Bobcat is their Atom equivalent, which should beat that handily.

I'll be keenly watching the renewed CPU wars next year. This integration of the GPU has the potential for AMD to leapfrog Intel in all performance segments if they can pull it off. Intel's graphics chips so far have been abysmal.

Larrabee will not help them in this respect, as it's essentially a large grid of dumbed-down x86 cores. As I've previously mentioned, floating point operations have always been x86's weakness.

We pretty much already know from the die shot that the 'APU' is 480 SPs, which implies 8 ROPs and 24 TMUs. A little stronger than a 5670, architecturally speaking.

GF has said 28nm bulk should allow for a 40% increase in clock speed compared to a chip of the same die size and TDP on 40nm bulk, and 28nm is roughly a 10% linear shrink of 32nm. Of course, SOI should have better characteristics than bulk, but that gives you an idea of what to expect. Given the die size of the core minus L2 in the article, we now know Fusion is 13x13mm, or 169mm2, exactly the same size as Propus (Athlon II X4), so the comparison does apply.
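The die-size and shrink arithmetic above can be sanity-checked with a quick sketch; all inputs here are the commenter's figures, not official numbers:

```python
# Back-of-envelope check of the die-size claims above. The 13mm side
# and the 10% linear shrink are the figures quoted in the comment.

def area_mm2(side_mm):
    """Area of a square die from its side length in mm."""
    return side_mm * side_mm

def shrunk_area(area, linear_shrink):
    """A linear shrink of s reduces each dimension, so area scales by (1-s)^2."""
    return area * (1.0 - linear_shrink) ** 2

fusion_area = area_mm2(13.0)           # 169.0 mm^2, same as Propus
print(fusion_area)                     # 169.0
print(shrunk_area(fusion_area, 0.10))  # ~136.9 mm^2 after a 10% linear shrink
```

Note that a 10% linear shrink buys roughly a 19% area reduction, since area scales with the square of the dimension.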

Imagine a very plausible scenario where the GPU is clocked at 1/4 of the CPU clock. This could start at 3.2-3.6GHz (800-900MHz GPU) and creep up to 4GHz with a 1GHz GPU clock, contained in a 95W TDP. Wouldn't be surprised by a fixed ~875MHz clock either.

Add to that the probability of Sideport going to two chips instead of one. When the 900-series chipset(s) launch, GDDR5 will be available in 2Gb form. This means a likely 512MB of decent on-board memory, perhaps of the 7Gbps variety. That's 56GB/s on a 64-bit bus.
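The bandwidth figure works out from the per-pin data rate and bus width; a minimal sketch, assuming the 7Gbps GDDR5 and 64-bit sideport bus mentioned above:

```python
# Effective memory bandwidth: per-pin rate times bus width, divided by
# 8 to convert bits to bytes. Inputs are the comment's assumptions.

def bandwidth_gb_per_s(gbps_per_pin, bus_width_bits):
    return gbps_per_pin * bus_width_bits / 8.0

print(bandwidth_gb_per_s(7.0, 64))   # 56.0 GB/s
```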

What we'll likely end up with is a GPU that's faster than an 8800 GT (sometimes by a lot) but slower than a 5750/GTS 250, likely questioning the usefulness of a 128SP, 64-bit, 28nm Fermi for either platform, and perhaps a CrossFire partner for the smallest Northern Islands part. This also raises the de facto standard for gaming to this level, which is GREAT news, because lots of people own old 8800 GTs or similar-performing cards as hand-me-downs.

On the CPU side, simply compare the Athlon X4 to Clarkdale. I would imagine the 2-core version of Sandy Bridge is essentially Clarkdale with the GPU on die, with similar CPU clock speeds, TDP, and die size to Fusion. That would mesh with the 4-core die shot that's been on the net for 6 months (citing 3-3.8GHz clocks and a 1-1.4GHz GPU). The thing is, AMD will have a CAPABLE GPU.

Personally, I think it's going to be cake-and-eat-it for anyone gaming below 1680x1050 at that point, say a 720/768p HTPC, or a good casual all-rounder with GPGPU perks. Plus, if you look at a 5670, there's plenty of stuff it can run at decent res... This could be up to a third faster in some cases, if such speculation pans out. Reply

Your assumption that the GPU will not have the same clock as the CPU is kind of odd, to say the least... It's exactly the same process as the CPU, so why on earth would you want it artificially downclocked? Especially since a) SOI is much more efficient and dense, and b) you're working on a smaller node with much better technology and much less heat dissipation. The two things aren't even two dies like in the Intel solution... they're on the same friggin silicon. If it features 480 SPs at 3+ GHz, it would behave like a ~1600 SP bulk GPU (erm... remember the HD5870...), so what's the deal? Plus, think of all the voltage/frequency synchronization you'd need between the two cores to keep them in line... Why bother with all that if you can put a GPU+CPU on one piece of silicon that can outrun any current-generation laptop solution at far less power draw?? Reply

I don't think you understand the difference between GPUs and CPUs. Because of the way they're designed, CPUs clock higher; it has to do with the depth of a CPU's pipeline. Ever wonder why the highest-clocked Pentium 4 still clocks higher than the fastest i7? Reply

I suspect that the Zambezi will be 4 Bulldozer modules, each showing up to the OS as 2 cores. I think they will market it as 8, but it will be 4 Bulldozer modules. Not that I'd mind that or anything (My current 4 Cores / 4 Threads aren't exactly being strained). Just want to make sure they aren't trying to slip one past us.

The main thing I want is for Bulldozer to be much faster clock for clock than STARS. Reply

So with the integration of CPU/GPU, who cuts through the muck and produces a CPU that's more like a GPU as far as parallel processing goes? To clarify: it seems GPUs are innovating faster and becoming quicker at the general-purpose functions of mainstream CPUs. So when do they ditch the idea of 2-6 thick-core CISC chips in favor of many thin RISC cores that can emulate CISC instructions quicker than native CISC CPUs? Or am I just way, way off base? Reply

You see, basically you're right... practically you're not. To create an emulator you need extra power. Creating generic wiring for a jack-of-all-trades CPU needs technology that is, in my opinion, at least 10-20 years out. Besides having to manage the entire wiring of small unified processing units, you need to translate the input, and then you need to make the whole thing happen quicker than command -> dedicated wiring -> output (per cycle). That means your jack-of-all-trades needs to be so insanely efficiently engineered that it doesn't waste extra energy or cost you extra clock cycles, and so on and so forth. And this, my friend, is not on anybody's drawing boards at this moment. Even AMD sensed that this is the future back when they decided to go ahead with Fusion (and only now has Intel gotten the idea with Larrabee)... but they just didn't realise how complex the task is... luckily for us, Intel did, and gave us some good CPUs while AMD was in the s(t)inkhole... now they seem to be coming back, so let's welcome this advancement and hope for the best competition in 2010 and beyond. Reply

We already did. The last x86 processor that was entirely CISC (I too may be off base) was the 386. The 486 started using a pipeline (I think), the Pentium went superscalar, and by the time of the Pentium M, micro-op fusion appeared (CISC instructions are broken into RISC-like micro-operations, and the Pentium M could "fuse" micro-ops together to execute them in a single clock cycle, even micro-ops from different x86 instructions).
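The decomposition described above can be illustrated with a toy decoder; the instruction tuples and micro-op names here are invented for clarity, not real x86 encodings:

```python
# Toy sketch of CISC-to-micro-op decoding. A memory-operand ALU
# instruction becomes a load, an ALU op, and a store; a register-only
# instruction maps to a single micro-op.

def decode(instruction):
    op, dst, src = instruction
    if dst.startswith('['):                 # read-modify-write on memory
        addr = dst.strip('[]')
        return [('load', 'tmp', addr),      # fetch the operand from memory
                ('alu', op, 'tmp', src),    # do the arithmetic
                ('store', addr, 'tmp')]     # write the result back
    return [('alu', op, dst, src)]          # register form: one micro-op

print(decode(('add', '[mem]', 'eax')))  # three RISC-like micro-ops
print(decode(('add', 'ebx', 'eax')))    # one micro-op
```

The point is that the backend only ever sees the simple, fixed-format micro-ops, which is what lets a "CISC" chip borrow RISC-style pipelining tricks.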

As for emulating x86 with a different internal architecture, you should look at the Transmeta processors (which were very wide and RISC-like internally, and ran x86 code through some kind of interpreter). Reply

With DX11 support comes support for the latest DirectCompute model. This support is meaningless for games, since these "APU"s will be too weak to play games that use it, but applications like video rendering might use it to accelerate encoding dramatically; we're already seeing this with the Avivo video encoder and third-party apps that use ATI Stream (and CUDA). The support for DX11 tells me this APU will be much more powerful than the 3300/4200. With the launch of the 5450, we saw that adding DX11 support took up a very large amount of die space, which is why the 5450 has the same number of stream units despite using the much smaller 40nm process. I think DX11 (and OpenCL) will play a bigger part in these new CPUs, and an IGP with, say, 120 SPs would accelerate encoding (and general FP performance, I bet) dramatically. All of this makes sense, as Bulldozer will have half as many FP units as integer units; the IGP will have to pick up the slack. Reply

I dunno why not; the completed frames could be sent out on the HT bus. A relatively simple HT-to-video-out device, either standalone or in the southbridge, could unpack them to DVI, or to a DAC if you're going to a VGA out. And the motherboard would still work without a CPU with onboard video; the video outs just wouldn't function. Reply

I don't know how much of the integrated graphics will be on the microprocessor. However, to output a high-definition image (1920 by 1080 at 60 Hz, 16 million colours), one needs to send out about 480 MB/s of data (8 MB for a frame, times 60 frames a second).
Now, the processor has quite a bit of memory bandwidth, and quite a bit of HyperTransport bandwidth, but both will be eaten into by GPU memory accesses (remember that current-generation graphics cards use tens of GB/s of memory bandwidth, and are in some cases limited by it). By contrast, CPUs hardly go over 10 GB/s of memory bandwidth.
That being said, I think the processor will have direct video output, and for this reason it needs some more dedicated pins (nine for VGA, maybe another ten for DVI, and some more of those pins for HDMI, DisplayPort, whatever else). This would make mainboards cheaper and easier to build, would allow the use of existing north/south bridges, and so on. Reply
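The ~480 MB/s estimate in the comment above can be checked with two lines of arithmetic, assuming 4 bytes per pixel (32-bit colour):

```python
# 1080p60 framebuffer output bandwidth at 4 bytes per pixel.

def frame_bytes(width, height, bytes_per_pixel=4):
    return width * height * bytes_per_pixel

def stream_mb_per_s(width, height, fps, bytes_per_pixel=4):
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e6

print(frame_bytes(1920, 1080) / 1e6)    # ~8.3 MB per frame
print(stream_mb_per_s(1920, 1080, 60))  # ~498 MB/s
```

This matches the comment's "8 MB per frame, times 60" to within rounding.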

We try to keep as few pins on the processor as possible. I would imagine there are only 8-10 video-out pins, which will need to be run through a conversion chip of some sort to produce monitor output. The CPU die itself is incapable of handling the +/- 5V needed to drive your DVI connection. Reply

No reason to add piles and piles of extra pins to the CPU to support the current mess of analog and digital interconnects, or to waste die space on DACs, TMDS transmitters, and so on, when all of that belongs much more in the chipset.

Consider AMD's current infatuation with multi-monitor support. Just how MANY pins would you like AMD to add to support 3+ monitors? It'd require dozens. Pack all that shit into the chipset instead and voila, you can support virtually as many monitors as you like. The HyperTransport bus will be virtually unused with on-die graphics (save for some intermittent disk and network I/O, which is marginal compared to the total bandwidth of the interface), so it could support an almost "unlimited" number of outputs, or at least as many as the system has room for on the backplate. :)

Not sure how you figure monitor outputs coming from the CPU would be "cheaper" or "easier to build" than them coming from the chipset. Seems about the same from my perspective. Reply

They could have, but then Socket AM3 processors would not fit into Socket AM2+ boards; AM3 was designed to be backwards compatible with AM2. The question is whether AMD will at some point stop including an IGP on their north bridges after the 800-series chipsets. Reply

It's very likely. When Intel overhauled their chipsets going from LGA 775 to 1156 (late 2009), they took into account the pins necessary for video, whereas when AMD overhauled theirs from 939 to AM2/AM3 (~2006), this idea of Fusion was just that... an idea. Reply

I personally think the new processors will work in AM3 boards, but doing so would deactivate the graphics. AMD has enjoyed some great backwards compatibility of late, and it's a good selling point, so it wouldn't surprise me if they engineered a solution for it.

Plus, if you were going to throw this chip into an AM3 board, chances are you have a Radeon 3xxx or higher IGP anyway, and the GPU on the CPU wouldn't be required in such a case. (Especially considering there's a substantial number of boards with sideport memory; why let that go to waste?)

I guess we have to play the waiting game for a while yet for more details to emerge, but currently I'm running an AM3 Athlon II 620 in an ancient AM2 board (not AM2+) and it works a treat. If a new board is required for me to upgrade that system, I wouldn't lose any sleep over it. :P Reply

What I would like to know is whether there's any advantage for the common user, and whether there's any advantage for the common AT reader. I think the common user is fine with an Atom that can play Hulu, while the AT reader wants games and is interested in GPGPU projects like Folding@home (CUDA and the like). Will this help with that? Reply

Just do the math, man... this is NOT bulk we're talking about, this is SOI/HKMG... that means you'll get it in 3GHz flavours, meaning it can push about 2 TFLOPS, and at that rate it's an equal match for today's discretes. Think 4x the clock speed, 32nm vs 40nm, and a much denser SOI vs bulk process (provided it doesn't have less than a quarter of the HD5870's ROPs). This baby is going to play Crysis like a charm... not to mention new games. All this provided the thing isn't memory starved! So hell yeah, give me four channels of 2GHz DDR3 and I won't need a discrete any longer! This thing will prove itself or fail depending on the I/O it gets... let's pray AMD gets the balance right this time around! And let's pray we get an SOI incarnation of the HD5870 as the next generation... Reply
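Doing the math in the comment above explicitly: a sketch assuming each stream processor issues one multiply-add (2 FLOPs) per clock, as in AMD's VLIW shaders; the 3GHz APU clock is the commenter's guess, not a spec:

```python
# Peak single-precision throughput: SPs x clock x 2 FLOPs (one MAD).

def tflops(stream_processors, clock_ghz, flops_per_sp_per_clock=2):
    return stream_processors * clock_ghz * flops_per_sp_per_clock / 1000.0

print(tflops(480, 3.0))    # 2.88 TFLOPS for a hypothetical 480-SP APU at 3GHz
print(tflops(1600, 0.85))  # ~2.72 TFLOPS for a stock HD 5870 (1600 SPs, 850MHz)
```

So a 480-SP part at CPU-like clocks would indeed land in the same peak-FLOPS ballpark as an HD 5870, which is the equivalence the earlier "480 SPs at 3GHz behaves like a ~1600 SP bulk GPU" comment was driving at.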

If a program is compiled against a standardized API, it won't favor one architecture over another because of the compiler; rather, the architecture with the better implementation will run the code faster. Reply

DX11 has more compute features than DX10. So if a program is written in OpenCL, it will be able to perform more functions using the IGP on the AMD system than on the Intel system. Kind of a reverse of CPU optimizations, where Intel normally got the new instructions first. Reply

As Mr Perfect points out, DirectCompute 5.0 runs on any DX11 card, and I'm sure it will run on future DX12 cards too. Even integrated DX11 solutions from Nvidia/AMD will be able to run DC 5.0 code, and thus accelerate a lot of parallel code. Not to mention that a piece of software can be written with both DC 4.0 and DC 5.0 paths, much like having multiple rendering paths in 3D games. Reply