108 Comments

VERY eye-opening discussion on TMT. Thank you for it.
I've been trying to understand how GPUs can be competitive for scientific applications which require lots of inter-process communication, and "local" memory, and this appears to be an elegant solution for both.

I can identify the weak points of it being hard to program for, as well as requiring many parallel threads to make it practical.

But are there other weak points?
Is there some memory-usage profile, or inter-process data bandwidth, where the trick doesn't work?
Perhaps some other algorithm characteristic which GPUs can't address well?

This shows how many people don't run a dual monitor setup. I would snatch up one of these 260/280s over the GX2s any day, gladly!!

The performance may not be quite as good as an SLI setup, but it will be much better than a single card, which is what a lot of us are stuck with, since you CANNOT run a dual monitor setup with SLI!

Well, what the heck are they doing with 1.4B transistors, on what is becoming the largest die that TSMC has produced so far?
The larger the core, the more likely it is that a single blemish will take out the whole core. As far as I know, didn't Phenom (4 cores on one die) suffer low-yield problems?
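For anyone who wants to put rough numbers on the "bigger die, more likely a blemish kills it" worry, here is a minimal sketch using the textbook Poisson yield model (yield = exp(-area x defect density)). The defect density below is a made-up illustrative value, not a TSMC figure, and the die areas are only examples (576 mm² is the commonly quoted GT200 size, which nobody in this thread has stated).

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Expected fraction of defect-free dies under a simple Poisson model."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-die_area_cm2 * defects_per_cm2)

# Hypothetical defect density, for illustration only.
HYPOTHETICAL_DEFECTS_PER_CM2 = 0.2

# Example die areas in mm^2; 576 is the commonly quoted GT200 figure (assumption).
for area in (100, 300, 576):
    y = poisson_yield(area, HYPOTHETICAL_DEFECTS_PER_CM2)
    print(f"{area} mm^2 -> estimated yield {y:.1%}")
```

The point is only the shape of the curve: under this model, yield falls off exponentially as die area grows, so a very large die is hit much harder by the same defect density than a small one.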

You know, when you consider the price and you look at the benchmarks, you start looking for features and NVIDIA just doesn't have the features going on at all.

COD4 -- Ran perfectly at 1920x1200 with last gen stuff (the HD3870 and 8800GT(S)), so now the benchmarks have to be for outrageous resolutions that only a handful of monitors can handle (and those customers already bought SLI or XFIRE, or a GX2, etc.)

Crysis is a pig of a game, but it's not that great (it is a good technical preview though, I admit), and I don't think even these new cards really satisfy this system hog... so maybe this is a win, but I doubt too many people care... if you had an 8800GT or whatever, you've already played this game "well enough" on medium settings and are plenty tired of it. Though we'll surely fire it up in the future once our video cards "happen to be able to run it on high", very few people are going to go out of their way to spend $500+ for this silly title.

In any case, then you look at ATI, and they have the HDMI audio, the DX 10.1 support and all they have to do at this point is A) Get a good price out the door, B) Make a good profit (make them cheap, which these NVIDIA are expensive to make, no doubt) and C) handily beat the 8800GTS and many of us are going to be sold.

These cards are what I would call a next gen preview. Some overheated prototypes of things to come. I doubt AMD will be as fast, and in fact I hope they aren't just as long as they keep the power consumption in check, the price, and the value (HDMI, DX10.1, etc).

Today's release reminded me that NVIDIA is the underdog, they are the company that released the FX series (desperate technology, like these are). ATI has been around since well before 3DFX made 3D accelerators. They were down for a bit, and we all said it was over for ATI, but this desperate release from NVIDIA makes me think that ATI is going to be quite tough to beat.

Can I go somewhere to find the exact settings used for these benchmarks? I appreciate the tech side of the write up but when it comes to determining whether I want one of these for my gaming machine (I ordered mine at midnight), I find HardOCP's numbers much more useful. Reply

AMD/ATI isn't going to abandon the high end like your article implies. Their plan is to make a really good mid-range chip, and duct-tape two cores together a la the X2s. Nvidia goes from the high end down, ATI from the mid-range up. From the look of it, ATI might have the right idea, at least this time around. I seriously doubt we'll see a two-core version of this monster anytime soon.

Isn't the R700 high-end model going to have a direct link between the two cores? Could be a false rumor, but I would think that would solve a lot of problems with having two GPUs on a single board, since games would see it as 1 chip instead of a Crossfire/SLI setup. And besides, why the heck does it matter what the card looks like under the cooler? If it delivers better performance than Nvidia's offering without driver headaches, I don't think most gamers are going to care.

Since the release of the 8800GTX top end single GPU performance has been a little stagnant... then came the refresh (8800GT/8800GTS-512) better prices came into effect.

Now we've got the new generation, and like in years prior, the new gen single GPU card has near performance of the previous gen in SLI. Price is also similar with when NVIDIA launched the first 8800GTX.

Sure, I wish they came in at a lower price point and at less power draw. (Same complaints that we had with the original 8800GTX). Lower power and lower price will come with a refresh.

Will I be getting one? ... nahh, these cheap 9600GTs, overclocked 8800GTs and 8800GTSs will be the cards I recommend till I see the refresh. But I'm still happy there's progress.

I'm hoping the refresh hits around the same time as Intel's updated quad core. Reply

but i'm not gonna spend that much money for something that doesn't deliver enough value (or even performance) compared to other solutions that are available. you pretty much reflect my own sentiment there: it's another step forward but not one that you're gonna buy.

i think people "don't like it" because of that though. it just isn't worth it right now and that's certainly valid. Reply

1) First and foremost, at the heart of a real gamer ticks the need for good story lines fed by characters you will never forget, held by a gameplay you will fall in love with and finally covered by graphics that will transport you to another world (kinda like when I first played FF VII on my PC).

Within the context of the world we live in today I wonder what is really going through the minds of these people selling $600+ video cards. Kinda like those $10 000+ PCs. Madness. Sure they have their market up there but I shudder to think of how much money has been poured into appeasing a select few. Furthermore for what reason? Glory? I don't know but seeing as how the average gamer is what has made the PC/Gaming scene what it is, where does a $600+ video card fit into the grand scheme of things?

2) The possibilities that these new cards open up certainly seem exciting. The comparison with intel has been justified, but considering the other alternatives out there are much further ahead in development, who is going to bypass intel/amd/etc for a GPU technology based supercomputer?

developers will bypass Intel, AMD, SUN, whoever owns Cray these days, and all other HPC developers when a technology comes along that can speed up their applications by two orders of magnitude immediately on hardware that costs thousands (and in large cases millions) less to build, run and develop for. Reply

LOL that was quite funny but incorrect as well, there are more than 4 Billion people in China, in the future probably nVidia will launch a 4 Billion Transistor GPU hehe. It will require a Nuclear Reactor to turn it on, and two of them to play games :D

4 Billion? Did you just make that up out of thin air? Latest tabs show approximately 1.4 billion (give or take a couple hundred million). The world population is only estimated at 6.6 billion, so unless 60% of the people in the world are living in China, you're clueless.

Firstly I must say I enjoyed reading the whole article written by Anand Lal Shimpi & Derek Wilson. However, what does not make sense to me is the fact that "At most, 105 NVIDIA GT200 die can be produced on a single 300mm 65nm wafer from TSMC", but by looking at the wafer, only 95 full dies can be seen. Is this the wrong die?
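As a rough cross-check on the 105 vs. 95 question, here is a sketch of the standard back-of-the-envelope dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss term). The 576 mm² die area is the commonly quoted GT200 figure and is an assumption here, since it is not stated in this comment; the formula also ignores scribe lines, edge exclusion and the actual die aspect ratio, so treat the result as a ballpark only.

```python
import math

def estimated_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: usable wafer area over die area, minus edge losses."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Partial dies lost around the circumference of the wafer.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

# 300 mm wafer is from the article quote; 576 mm^2 die area is an assumption.
print(estimated_dies_per_wafer(300, 576))
```

With those inputs the estimate lands in the mid-90s, which is closer to the number of full dies counted on the wafer photo than to the quoted 105.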

Also, it is not fair to compare the die of the Penryn against the GTX 280 die, because Penryn's die was made on a 45nm process and the GTX 280's on a 65nm process. Maybe it would be fair to compare it with the Conroe (65nm) die. But well done folks for putting an excellent article together!

FANTASTIC write up on fine-grained TMT. I was unaware of this threading technique and was always thinking of this in class or whenever someone would talk about hyperthreading. This technique was literally in my head for well over a year and I didn't know what it was called or that it even had a name. I always thought there had to be a more elegant way than hyperthreading to do multithreading down at the chip level without doing the OS style time slicing.

i was sitting there wondering how the hell they schedule and run these SPs and then bam, whole page about it

really appreciate the effort that goes into researching the core of these chips. i know not everyone likes it but for guys that are educated and work in the field its really interesting Reply

remember though that this type of fine-grained TMT only has payoffs in systems running millions of threads concurrently.

on an OS you'll see hundreds or even thousands of threads on heavily used systems, but there still wouldn't be enough concurrent action to justify this type of architecture for general purpose computing.

of course, as developers push towards an effort to thread their code as much as possible, who knows what architectures might be worth exploring on the desktop ... Reply

1) Last week at WWDC Apple announced OpenCL as an alternative to CUDA. It's a C99-based HLL for creating compute kernels that can be deployed to GPUs and CPUs. Today Khronos officially announced a working group for this, and NV is a part of the committee. As such, your wish for an industry-standardized compute language similar to CUDA that runs on all platforms and vendors' HW may not be so far off.

2) I believe your interpretation of how multiple threads simultaneously execute in an SM is incorrect. Per thread context switching is not free, and you would never be able to execute a different thread every cycle in the manner described. There is far too much context that needs to be swapped out, and there would be significant power implications for doing that, in addition to the latency. Instead, I believe what NV is claiming is that any given SP executes a single thread. All threads in the SM can all be a single warp, but you can also have multiple threads (one per SP) all executing simultaneously in an SM.

1) I haven't had a good chance to look at OpenCL, but I certainly hope that if it's everything everyone is saying it is in the comments here that it takes off in a bigger way than CUDA :-)

2) it does not context switch per thread -- warps define a context, and you have 32 threads grouped together. these threads all share the same instruction stream, which is why if threads in a warp take different directions on a branch all 32 threads must follow both paths.

NVIDIA has flat out stated that every schedule clock a new warp is scheduled and that it takes 4 clock cycles to process one warp on an SM. For both of these to be true, we conclude that the scheduler alternates scheduling SPs and SFUs on alternating clocks, which means the SPs would be scheduled every 4 clocks relative to themselves.

On 8 SPs per SM, you somehow need to execute 32 threads in 4 clock cycles. This makes sense if you execute 4 threads per SP in some way. The details at this point are fuzzy though.

regardless, if an SP executes 4 different threads from the same warp, there is no need to context switch to execute any of these threads -- again, threads in the same warp share context. Reply
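The arithmetic and the branch behaviour described above can be sketched in a few lines. This is a toy model under stated assumptions, not NVIDIA's actual scheduler: it assumes each SP retires one thread-instruction per clock, and the branch (`double even values, add one to odd values`) and the function names are made up for illustration.

```python
WARP_SIZE = 32   # threads per warp, as stated in the comment above
SPS_PER_SM = 8   # streaming processors per SM, as stated in the comment above

def clocks_per_warp(warp_size=WARP_SIZE, sps=SPS_PER_SM):
    """Clocks to push one warp instruction through the SPs,
    assuming one thread-instruction per SP per clock."""
    return warp_size // sps

def run_divergent_branch(values):
    """Toy warp executing: if v is even, v = v * 2, else v = v + 1.

    All threads share one instruction stream, so each side of the branch
    is issued once if any thread needs it; a per-thread mask decides
    which threads keep the result of each side."""
    takes_if = [v % 2 == 0 for v in values]
    out = list(values)
    paths_issued = 0
    if any(takes_if):            # "if" side issued for the whole warp
        paths_issued += 1
        out = [v * 2 if m else o for v, m, o in zip(values, takes_if, out)]
    if not all(takes_if):        # "else" side issued for the whole warp
        paths_issued += 1
        out = [o if m else v + 1 for v, m, o in zip(values, takes_if, out)]
    return out, paths_issued

print(clocks_per_warp())                          # 32 threads over 8 SPs
print(run_divergent_branch(list(range(32)))[1])   # mixed warp: both paths issued
print(run_divergent_branch([2] * 32)[1])          # uniform warp: one path issued
```

Nothing here needs a per-thread context switch: the mask is the only per-thread state that changes between the two sides of the branch, which is the point being made about threads in a warp sharing context.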

From this conclusion, AMD seems to be the shrewd player: let Nvidia and Intel duke it out over high-voltage, hot, meaninglessly fast GPUs while AMD pulls off something like its first dual core or Athlon 64 for the win.
This new beast from Nvidia will have how many developers making games for it right away? I'm guesstimating maybe 2-4 years down the road we'll see a decent title that takes full advantage of this hardware.
By then AMD will have a midrange part that can more than handle the games.
2 things Nvidia could work on that it already has: the PS3 market, and small graphics devices to improve profits. Shrink the PS3 GPU further so Sony can shrink its machine and sell more.

The GT200 core may be a technical masterpiece in terms of actually making something that big which is fully functional on GTX280 cards, but it seems to me the penalty of fabbing it at 65nm negates much of the benefit of such a wide GPU.

They've had to drop the clock speeds throughout presumably because of the ridiculous amount of heat such a large core generates, which means the ~60% performance advantage in current games over the G80 core at similar clock-speeds is somewhat reduced.

Given that ATI are not producing their 55nm cores in AMD's fabs but instead are getting them churned out reliably elsewhere, nVidia have made a mistake this time around in having their high-end product rely on previous-generation fabrication as it makes it run too hot to allow the clock-speeds needed for it to be the product it should be. There is always a risk in transitioning to a smaller fab technology, and nVidia suffered badly in the past by doing so too early, but with a chip the size of the GT200, they really should have gone to 55nm even if it meant a delay of a month or three, whilst the smaller cut-down derivatives were rolled out first. Reply

Great article, but what about the microstuttering issues present in Nvidia's 9800GX2 cards (both SLI and Quad-SLI)? There is very little discussion on this, but I've seen some benchmarks where the FPS floor is 4fps with the 9800GX2s. Can you add a subjective review of whether or not the actual gameplay is smoother with the GTX280s across these games? Aggregate numbers may say one thing, but I've returned a 9800 GX2 Quad-SLI setup because it was unable to handle the incredible amount of texture loading that was done in Age of Conan (2560x1600 4xAA 'High' settings = 4fps). The 8800 GTX Tri-SLI configuration I am currently using is more resilient to microstuttering with its increased bus and memory capacities, but I'm very curious about the GTX280s and their increased memory and bus on texture-heavy games like Age of Conan. Reply

I find it humorous that nobody discusses the fact that the shrink has already taped out and will likely be out in two months or just after. This humongous chip was only released so that when AMD releases in the next few weeks they will still be behind in single GPU cards. This is basically what Intel does to AMD every time AMD has a better chip. For all intents and purposes this is a PAPER release of what will come in 2-2.5 months (In Intel's case they just show you what will be out 6 months from now, and a large portion of people don't buy an AMD because Intel might be ahead by xmas...LOL - works like a charm every time AMD is ahead). THE DIE SHRUNK CHIP! Most likely with faster speeds. I suspect they'll come with an "ULTRA" version first (and stick it on top of the price heap, so as to not kill all FAT cards in the channel already) and then filter down as these big suckers leave the channel. That's if they even plan to sell more than a few of these to begin with at 65nm. It's only out there so AMD won't look any good in two weeks.

MIND SHARE is everything, which is why Intel is KING of the paper-launch-when-behind strategy. They've even gone to doing it for all chips no matter what now. Nehalem scores 6 months before availability. AMD's marketers have no clue and should be fired. You have to play the same DIRTY game as your enemy or you've already lost. If AMD had half a brain in their head they'd paper launch an ultra or 2x4870 version for the same reason...LOL. Then claim "our 4870x2 makes nvidia look like crap for $600"...ROFL. Who cares when it's available, just say it. Having said that, Nvidia will wipe the floor with them in 2 months anyway on a 2xGTX280 that's die shrunk. Which is all they are doing today...BUYING TIME!

Say what you want about this guy, but this is partially true, which is why AMD/ATI is in the position they have been in. They are slowly climbing out of that hole though. Would have been nice to see the 4870x2 hit the market first. As we know, competition = lower prices for everyone!

I would love to kick you hard in the face, breaking it. Then I'd cut your stomach open with a chainsaw, exposing your intestines. Then I'd cut your windpipe in two with a boxcutter. Then I'd tie you to the back of a pickup truck, and drag you, until your useless fucking corpse was torn to a million fucking useless, bloody, and gory pieces.

Die painfully okay? Prefearbly by getting crushed to death in a garbage compactor, by getting your face cut to ribbons with a pocketknife, your head cracked open with a baseball bat, your stomach sliced open and your entrails spilled out, and your eyeballs ripped out of their sockets. Fucking bitch
Reply

The main benefit from the 280 is the reduced power at idle! If I read the graph right, the 9800 takes ~150W more than the 280 at idle. Since that's where computers spend the majority of their time, depending on how much you game, that can be a significant cost.
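To see what a ~150W idle difference might mean on a power bill, here is a quick sketch. The 150W figure is the reading from the comment above; the idle hours per day and the electricity price are hypothetical placeholders, so plug in your own.

```python
def annual_idle_cost(extra_watts, idle_hours_per_day, price_per_kwh):
    """Yearly cost of drawing `extra_watts` more while the machine sits idle."""
    kwh_per_year = extra_watts / 1000 * idle_hours_per_day * 365
    return kwh_per_year * price_per_kwh

# 150 W comes from the comment; 8 h/day and 0.10 per kWh are assumptions.
cost = annual_idle_cost(extra_watts=150, idle_hours_per_day=8, price_per_kwh=0.10)
print(f"Extra idle cost per year: {cost:.2f}")
```

The cost scales linearly with both idle hours and electricity price, so a machine left on around the clock would roughly triple the figure from the 8-hour assumption.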

Maybe you should look at the GT200 series from the point of view of nvidia's GPGPU customers - the academic researchers, technology companies requiring fast number-crunching available on the desktop, the professionals in graphics-effects and computer animation - not necessarily real-time, but as quick as possible... The CUDA-using crew. The Tesla initiative. This is an explosively-expanding and highly profitable business for nVidia - far more profitable per unit than any home desktop graphics application. An in-depth analysis by Anandtech of what the GT200 architecture brings to these markets over and above the current G8xx/G9xx architecture would be highly appreciated. I have a very strong suspicion that sales of the GT2xx series to the (ultra-rich) home user who has to have the latest and greatest graphics card are just another way of paying the development bills and not the true focus for this particular architecture or product line.

nVidia is strongly rumored to be working on the true 2nd-gen Dx10.x product family, to be introduced early next year. Considering the size of the GTX280 silicon, I would expect them to transition the 65nm GTX280 GPU to either TSMC's 45nm or 55nm process before the end of 2008 to prove out the process with this size of device, then in 2009 introduce their true 2nd-gen GPU/GPGPU family on this latter process. A variant on the Intel "tick-tock" process strategy.

I think these ridiculous prices and lackluster performance are just a way for them to sell more SLI motherboards. Who would buy a $650 GTX 280 when you can buy two 8800GTs with an SLI mobo and get better performance? Especially now that the 8800GTs are approaching around $150.

It's only worth riding the bleeding edge when you can afford to stay there with every release. Otherwise, 12 months down the line, you have no budget left for an upgrade, while everyone else is buying new $200 cards that beat your old $600 card.

So yeah, you can buy an 8800GT or two right now, and you and I should probably do just that! But Richie Rich will be buying 2x GTX 280s, and by the time we could afford even one of those, he'll already have ordered a pair of whatever $600 cards are coming out next.

Nope, the majority of these cards go to Alienware/Falcon/etc. top of the line, overpriced pre-built systems. These are for the people that blow $5k on a system every couple years, don't upgrade, might not even seriously game, they just want the best TODAY.

They are the ones that blindly check the bottom box in every configuration for the "fastest" computer money can buy. Reply

Very few people are Richie Rich and stay at the bleeding edge. People that are very wealthy tend not to be computer geeks and purchase their computers from Dell and whatnot. I'd say at least 96% of gamers out there are value oriented; these $650 cards will not sell much at all. If anything, you'll see people claim to have bought one or two of these in forums and other places, but they're just lying.

At the point where NV has actually managed to position SLI mobos and GPUs so that you actually need that much power to get decent FPS (above 30 average) from games, gaming on the PC will be entirely dead to all but the most esoteric. It would be different if there were any games worth playing, or as many games as the console brethren have. I thought GPUs/cases/power supplies were supposed to become more efficient? E.g. smaller but faster, sort of how the TV industry made TVs bigger yet smaller in footprint with way more features - not towering cases with 1200W PSUs and 2X GTX 280 GPUs? All this in the face of drastically raised gas prices?

Wanna impress me? How about a single GPU with the PCB size of a 7600GT/GS that's 15-25% faster than a 9800GTX and can fit into a SFF case? Needing a small power supply AND able to run passively @ moderate temps. THAT would be impressive. No, Sergeant Tom and his TONKA_TRUCK crew just have to show how beefy his toys can be and yank your wallet chains for said. Hell, everyone needs a Boeing 747 in their case, right? 'Cause that's progress for those 1-2 gaming titles per year that give you 3-4 hours of enjoyable PC gaming.....

nVidia say they're not saying exactly what GT200 can and cannot do to prevent AMD bribing game developers to use DX10.1 features GT200 does not support, but you mention that

"It's useful to point out that, in spite of the fact that NVIDIA doesn't support DX10.1 and DX10 offers no caps bits, NVIDIA does enable developers to query their driver on support for a feature. This is how they can support multisample readback and any other DX10.1 feature that they chose to expose in this manner."

Now whilst it is driver dependent and additional features could be enabled (or disabled) in later drivers, it seems to me that all AMD or anyone else would have to do is go through the whole list of DX10.1 features and query the driver about each one. Voila- an accurate list of what is and isn't supported, at least with that driver. Reply

the problem is that they don't expose all the features they are capable of supporting. they won't mind if AMD gets some devs on board with something that they don't currently support but that they can enable support for if they need to.

what they don't want is for AMD to find out what they are incapable of supporting in any reasonable way. they don't want AMD to know what they won't be able to expose via the driver to developers.

knowing what they already expose to devs is one thing, but knowing what the hardware can actually do is not something nvidia is interested in sharing.

Well, yes and no. The G80 is capable of more than what is implemented in the driver, and also some of the implemented driver features are actually not natively implemented in the hardware. I assume the GT200 is the same. They only implement the bits that are actually being used, and emulate the operations that are not natively supported. If a game comes along that needs a particular feature, and the game is high-profile enough for NV to care, NV will implement it in the driver (either in hardware if it is capable of it, or emulated if it's not).

What they don't want to say is what the hardware is actually capable of. Of course, ATI can still get a reasonably good idea by looking at the pattern of performance anomalies and deducing which operations are emulated, so it's still just stupid paranoia that hurts developers. Reply

Games are tested at 2560x1600 in these benchmarks with the 9800GX2, and some games are even playable.
Now when I do this with my GX2 at this res, a lot of the time even the menu screen is a slide show (often under 10FPS). Especially if any AA is enabled. Some games that do this are Crysis, GRID, UT3, Mass Effect, ET:QW... with older games it does not happen, only newer stuff with higher res textures.

This never happened on my 8800GTX to the same extent. So I put it down to the GX2 not having enough memory bandwidth and enough usable VRAM for such high resolution.

So could you explain how the GX2 is getting 64FPS @ 2560x1600 with 4x AA with ET:Quake Wars? As well as other games at that res + AA.

I know the article is aimed to hit as hard as the product it's introducing us to, but put a little English into your English.

"Mass" and "aggression".

FWIW, the GTX's numbers are unreal. I can appreciate the power-saving capabilities during lesser load, but I agree, GT200 should've been 55nm. (6pin+8pin? There's a motherboard under that SLI setup??) Reply

There is an important precedent that gives Nvidia good reason to not rush to a new smaller process level. Recall when ATI first became a serious player in gaming GPUs with the 9700. It was for its time a big chip pushing the limits of the process level, while Nvidia at the time was concentrating on bleeding edge technology. Nvidia's chips got stomped by ATI's in that generation, in large part because the ATI chip had far better optimization of its transistors. Reply

We can agree the pricing sucks. But the point that seems to be missing is that Nvidia promised a 50% performance improvement and they delivered. The 280 delivers 45FPS vs 32FPS for the 9800GT in Assassin's Creed. That's about 41%, somewhat shy of the full 50% (48FPS), which is still a huge performance increase compared to what we have been getting the past couple of years for a new card. Slap 2x280 on a card and it vaporizes the 9800 GX2 or any SLI/Xfire solution. The 9800 GX2 scales ~63% over the 9800GTX. So if you do that for a 280GX2 (or SLI) you get roughly 73 frames per second. Plus the new cards have more memory to deal with bandwidth and large textures vs the neutered 512MB on the 96/9800s and 8800GT... the reason I have held onto my 8800 GTX with 768MB. Granted I won't be rushing out and buying one tomorrow, but the 280 is the fastest GPU and an x2 will be faster than any other x2 card. It's a little ridiculous to think the single 280 sucks because it's not faster than multiple GPUs like the 9800 GX2 (although when memory counts it is).
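For anyone who wants to check the percentages being thrown around, here is the arithmetic as a small sketch. The frame rates and the ~63% scaling figure are taken from the comment above; the function names are just for illustration, and applying the GX2's scaling to a hypothetical dual-280 card is the commenter's assumption, not a measured result.

```python
def percent_gain(new_fps, old_fps):
    """Relative improvement of new_fps over old_fps, in percent."""
    return (new_fps - old_fps) / old_fps * 100

def scaled_fps(single_fps, scaling_gain_percent):
    """Projected dual-GPU frame rate given a single-GPU rate and a scaling gain."""
    return single_fps * (1 + scaling_gain_percent / 100)

print(f"GTX 280 over 9800GT: {percent_gain(45, 32):.1f}%")
print(f"FPS needed for a full 50% gain: {scaled_fps(32, 50):.0f}")
print(f"Projected dual-280 at ~63% scaling: {scaled_fps(45, 63):.1f} FPS")
```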

I don't think many people in this forum thread understand that nVidia's target is the supercomputer market. I was totally impressed by one post a month ago, where a software engineer managed to put together and use a 3-way SLI system for magnetic resonance rendering. Nvidia and AMD (that's why they acquired ATI) already have significant experience in multiprocessor and parallel calculation. nVidia is ahead though, since they have CUDA becoming more popular for complex calculation. A year ago Intel realized parallel processing from Sun is their biggest danger; now nVidia and ATI come too. Imagine supercomputers built with thousands of G200 chips, and only some Intels used for mapping, instead of thousands of Xeons. nVidia thinks way further ahead than just the mere visual/gaming market. I am very very impressed, and very eager to see what ATI can do. Also, I hope ATI and Havok will be able to offer competition to CUDA, or uniformity? Anyway, from a scientific point of view, recent developments in the graphics market make fundamental science more affordable than any time before.

I am not sure why you are comparing this chip to a Penryn or other general purpose CPU as the comparisons are meaningless. GPU's are designed very differently than CPU's, namely a high level descriptor language is used and the design is then created by a program, which is then hand tweaked by engineers. By contrast, a CPU may use a high level language, but the actual design is almost entirely done by hand, with large teams working on each sub component and literally years of tweaking. It takes Intel between five and ten years to bring a design to market, which is why there is such a push by them to keep adjusting the design and optimizing it to stretch its usefulness out as long as possible to maximize the initial investment. This simply does not happen with a GPU.

GPU's are designed to last 18-24 months as a competitive solution. nVidia and Ati cannot afford to spend even five years designing them. As a result the level of hand optimization is greatly reduced, and inefficiencies with transistors are tolerated. Typically they are produced on equipment that is already paid for by the previous, more optimized products, or contracted out to third parties(TSMC). Since the products are sold for a premium, the wasted die space is not very relevant. It is a diametrically opposed process to what you see with CPU development.

Despite how impressive it may seem to go on about 1.4 billion transistors, truthfully a modern CPU does more with far less than a modern GPU, and honestly neither nVidia nor ATI are in the same league as Intel and AMD, neither at the engineering level nor when comparing the products they put out. To an Intel engineer, this GPU is at least four times larger than it needs to be to get the performance you get out of it.

The maturation of the industry, either due to reaching a point where GPUs can do 90% of what anyone needs, or simply because power budgets get more restrictive, will come when the level of optimization required for a CPU is required for a GPU, and product cycles stretch out to 3-5 years. Then you will have a more direct comparison between the two, since the design parameters will be much more similar.

I am not knocking nV here, btw, I'm simply calling into question why one would even compare a Penryn to a GPU. It makes no sense at all when they were designed from the ground up for different purposes and lifespans, and with different transistor budgets.

I think what this shows is that there is a brute force way of doing something that, while not necessarily pretty, can get you to a goal. Yes, compared to Intel's latest and greatest it is a grotesque abomination of wasted energy/transistors/die size, but the bottom line is it is pretty darn impressive from a CPU/GPU standpoint.

I think many of us long for the days of more than 2 major competitors for each race (CPU/GPU). We've been stuck in a rut with ATI and Nvidia, AMD and Intel. Yes, you have some niche products by other companies, and budget pieces being made by a host of has-beens, but really tier 1 stuff is just not being fought over by more than 2 companies.

What I want to see (complete dreamland here) is a start up from some very savvy disgruntled employees of say AMD/ATI, Intel, IBM, etc. (and don't forget possibly the most important segment, the marketing team) with some clout and a LOT of dough to say, "Screw this, we're going balls to the wall and throw the kitchen sink at the market."

I mean let's be honest here, what's another 100 watts or a billion transistors anymore? I can guarantee you every geek out there would shell out more money for a product that devastates the current competition. I don't care if it's not as frugal with the power, or as small, or as pretty, I want the speed man, gimme the speed!

While I'd normally agree with you, GPUs have been getting pretty complex to design. Much of the shader multiprocessors in G80 and GT200 were designed by hand, and remember that G80 (the original predecessor to GT200) was in development for four years before its launch.

The transistor comparison is a valid one, while Penryn is a very impressive design, it is so for different reasons than GT200. The size of GT200 also helps illustrate fundamental differences in approach to CPU vs. GPU design and really highlights why Intel is building Larrabee.

Because to non-engineers, they're two silicon computer chips, and 1.4 billion of anything is a lot!

It also helps me to visually understand why this thing gets so hot, since it's got so much more surface area packed with transistors.

You're right that CPUs and GPUs are designed for different tasks and shouldn't be considered pure apples to apples, but then you go against your own advice and start saying how CPUs are so much more advanced, and how Intel engineers could do that in 1/4 the size of a chip. So which is it - should they be compared, or should they not be compared?

And the authors did mention how simple it could be for either company to slap the other type of chip right in with their usual type: make an Intel CPU with added GPU capabilities, or make an nVidia GPU with CPU capabilities. So there's another point where they recognize the differences but do try to illustrate the sameness.

You are looking for contradictions where there are none. A chip is a chip, but that does not mean that they are all designed with the same goals, budgets and time constraints. *IF* Intel devoted the resources to a GPU that they devote to a CPU, yes, they could produce a product like this in a fraction of the transistors. That said, the product would take 5-10 years to design, would cost hundreds of millions of dollars to develop, and would need a lifespan of at least 5 years in the market to be worth the effort. Obviously this is not a reasonable approach in a market with such fast product turnover.

My post was not an attempt to diss nV or this product; it was pointing out that the comparison of a GPU to a CPU is inane, as they have completely different design constraints. You may as well compare a CPU to cache memory, or RAM, or a sound processor. All have transistors, right?

It especially bothered me when they implied that nVidia has the transistor budget to toss a general purpose CPU on the die. The fact is that they may have the transistor budget, but they do not have the time or money available to do so, and the product would be obsolete before it ever hit the market as a result of such an attempt. It would be marrying two completely different design philosophies, and this is why the combined CPU/GPU products that are upcoming are not likely to be the strongest performers. Reply

You all seem to be assuming that GPUs will only be used for games. If that's all you care about, then why do you whine when a GPU is made to perform well as a number cruncher (for science, for modeling/simulations)?

It's the best single GPU gaming card.
It's the best widely (?) available GPU number cruncher.
For a whole system gaming GPU solution, it isn't the most cost effective.

If you're all into numbers, then why are you assigning emotions to it? It simply is what it is. Reply

When I looked at that, I assumed it must be a non-native English speaker who put that in the block. I'm still not entirely sure what it was trying to convey other than that the core will need to be fed with lots of vertices to keep it busy. Reply

It's going to take some time to digest it all, but you two have done it again with a massive but highly readable write-up of a complex new microchip. You guys are still the best at what you do, but there are a few points I wanted to make:

1) THANK YOU for the clock-for-clock comparo with G80. I haven't fully digested the results, but I disagree with your high-low increase thresholds depending solely on TMUs and SPs. You don't mention that GT200 has 33% more ROPs as well, which I think was the most important addition to GT200.

2) The SP pipeline discussion was very interesting, I read through 3/4 of it and glanced over the last few paragraphs and it didn't seem like you really concluded the discussion by drawing on the relevance of NV's pipeline design. Is that why NV's SPs are so much better than ATI's, and why they perform so well compared to deep piped traditional CPUs? What I gathered was that NV's pipeline isn't nearly as rigid or static as traditional pipelines, meaning they're more efficient and less dependent on other data in the pipe.

3) I could've lived without the DX10.1 discussion and more hints at some DX10.1 AC/TWIMTBP conspiracy. You hinted at the main reason NV wouldn't include DX10.1 on this generation (ROI), then discounted it in the same breath and made the leap to conspiracy theory. There's no doubt NV is throwing around market share/marketing muscle to make 10.1 irrelevant, but does that come as any surprise if their best interest is maximizing ROI and their current gen parts already outperform the competition without DX10.1?

4) CPU bottlenecking seems to be a major issue in this high-end of GPUs with the X2/SLI solutions and now GT200 single-GPUs. I noticed this in a few of the other reviews where FPS results were flattening out at even 16x12 and 19x12 resolutions with 4GHz C2D/Qs. You'll even see it in a few of your benches at those higher (16/19x12) resolutions in QW:ET and even COD4 and those were with 4x AA. I'm sure the results would be very close to flat without AA.

That's all I can think of for now, but again another great job. I'll be reading/referencing it for the next few days I'm sure. Thanks again! Reply
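The 33% ROP figure in point 1 can be checked with quick arithmetic. This is only a sketch, and it assumes the commonly cited ROP counts of 24 for G80 (8800 GTX) and 32 for GT200 (GTX 280); those counts are not stated in this thread, and the function and variable names are made up for illustration.

```python
def percent_increase(old, new):
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0

# Assumed ROP counts (not given in the thread itself):
g80_rops = 24    # GeForce 8800 GTX
gt200_rops = 32  # GeForce GTX 280

rop_gain = percent_increase(g80_rops, gt200_rops)
print(f"GT200 ROP increase over G80: {rop_gain:.0f}%")
```

Under those assumed counts the increase rounds to 33%, which matches the number quoted in point 1.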

AMD has officially joined Apple's OpenCL initiative under the Khronos Compute Working Group.

Truthfully, with nVidia's statements about working with Apple on CUDA in the days leading up to WWDC, nVidia is probably on board with OpenCL too. It's just that their marketing people probably want to stick with their own CUDA branding for now, especially for the GT200 launch.

Oh, and with AMD's launch of the FireStream 9250, I don't suppose we could see benchmarks of it against the new Tesla? Reply

Tons of people are reading this article and thinking "well, performance per cost, it's underwhelming (as a gaming graphics card)." What people are missing is that GPUs are quickly becoming the new supercomputers. Reply

G92 does not have 6 ROP partitions - only 4 (this is also wrong in the diagram). Only G80 had 6.
And please correct that history rewriting - that the FX failed against the Radeon 9700 had NOTHING to do with the "powerful compute core" vs. the high bandwidth (ok, the high bandwidth did help). In fact, quite the opposite - it was slow because the "powerful compute core" was wimpy compared to the R300 core. It definitely had a lot more flexibility, but the compute throughput simply was more or less nonexistent, unless you used it with pre-PS 2.0 shaders (where it could use its FX12 texture combiners). Reply

Thanks for the heads up, you're right about G92 only having 4 ROP partitions. I've corrected the image and references in the article. I also clarified the GeForce FX statement; it definitely fell behind for more reasons than just memory bandwidth, but the point was that NVIDIA has been trying to go down this path for a while now.

Thanks for correcting. Still, the paragraph about the FX is a bit odd imho. Lack of bandwidth really was the least of its problems. The core was too complicated, with actually lots of texturing power, and it sacrificed raw compute power for more programmability in the compute core (which was its biggest problem). Reply

I appreciate the in-depth look at the architecture, but what really matters to me are graphics performance, heat, and noise. You addressed the card's idle power dissipation but only in full-system terms, which masks a lot. Will it really draw 25W in idle under WinXP?

And this highly detailed review does not even mention noise! That's very disappointing. I'm ready to buy this card, but Tom's finds their samples terribly noisy. I was hoping and expecting Anandtech to talk about this.

I've updated the article with some thoughts on noise. It's definitely loud under load, not GeForce FX loud but the fan does move a lot of air. It's the loudest thing in my office by far once you get the GPU temps high enough.

From the updated article:

"Cooling NVIDIA's hottest card isn't easy and you can definitely hear the beast moving air. At idle, the GPU is as quiet as any other high-end NVIDIA GPU. Under load, as the GTX 280 heats up the fan spins faster and moves much more air, which quickly becomes audible. It's not GeForce FX annoying, but it's not as quiet as other high-end NVIDIA GPUs; then again, there are 1.4 billion transistors switching in there. If you have a silent PC, the GTX 280 will definitely un-silence it and put out enough heat to make the rest of your fans work harder. If you're used to a GeForce 8800 GTX, GTS or GT, the noise will bother you. The problem is that returning to idle from gaming for a couple of hours results in a fan that doesn't want to spin down as low as when you first turned your machine on.

While it's impressive that NVIDIA built this chip on a 65nm process, it desperately needs to move to 55nm." Reply

I agree with what Darkryft said about wanting a card that absolutely, without a doubt, stomps the 8800GTX. So far that hasn't happened, as the GX2 and GT200 hardly do either. The only thing they proved with the G90 and G92 is that they know how to cut costs.

Well thanks for making me feel like such a smart consumer as it's going on 2 years with my 8800GTX and it still owns 90% of the games I play.

P.S. It looks like Nvidia has quietly discontinued the 8800GTX as it's no longer on major retail sites.
Reply

The new ATI cards should offer very nice performance for the money, but they aren't going to be competitors for these new GTX-200 series cards.

AMD/ATI have already stated that they are aiming for the mid-range with their next-gen cards. I expect the new 4850 to perform between the G92 8800 GTS and 8800 GTX. And the 4870 will probably be in the 8800 GTX to 9800 GTX range. Maybe a bit faster. But the big draw for these cards will be the pricing. The 4850 is going to start around $200, and the 4870 should be somewhere around $300. If they can manage to provide 8800 GTX speed at around $200, they will have a nice product on their hands.

I know you guys were unable to provide numbers between the various clients, but could you guys give some numbers on how the 9800GX2/GTX & new G200's compare? They should all be running the same client if I understand correctly. Reply

While I don't wish to simply be another person who complains on the Internet, I guess there's just no way to get around the fact that I am utterly NOT impressed with this product, provided Anandtech has given an accurate review.

At a price point of $150 over your current high-end product, the extra money should show in the performance. From what Anandtech has shown us, this is not the case. Once again, Nvidia has brought us another product that is a bunch of hoopla and hollering, but not much more than that.

In my opinion, for $650, I want to see some f-ing God-like performance. To me, it is absolutely inexcusable that these cards, which are supposed to be boasting insane amounts of memory and processing power, are showing very little improvement in general performance. I want to see something that can stomp the living crap out of my 8800GTX. Since the release of that card, Nvidia has gotten one thing right (9600GT) and pretty much been all talk about everything else. So far, the GTX 280 is more of the same. Reply

They just keep making these cards bigger and bigger. More transistors, more heat, more juice. All for performance. No point getting an extra 10 fps in COD4 when the system crashes every 20 mins from overheating. Reply

Yeah but for the performance of these cards, the price isn't quite right. I mean you can get two 8800GTs for under $400 and they typically outperform both the 260 and the 280. Yes if you want a single card, these aren't too bad a deal. But even the 9800GX2 outperforms the 280 normally.

So really I have to question the pricing on them. High end for a single GPU card, yes. Better price/performance than last generation's cards, no. I just bought two G92 8800GTSs and now I don't feel dumb about it, because my two cards that I paid $170 each for will still outperform the latest and greatest, which costs more. Reply
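The pricing argument above comes down to simple arithmetic. Here is a rough sketch using only prices mentioned in this thread: $170 per G92 8800GTS and the $650 quoted for the GTX 280. The variable names are made up for illustration, and no performance numbers are assumed.

```python
# Prices as quoted by commenters in this thread (USD).
g92_gts_price = 170   # per card, from the comment above
gtx280_price = 650    # figure quoted elsewhere in the thread

gts_pair_cost = 2 * g92_gts_price            # cost of the two-card setup
price_gap = gtx280_price - gts_pair_cost     # extra paid for a single GTX 280
cost_ratio = gtx280_price / gts_pair_cost    # GTX 280 cost relative to the pair

print(f"Two 8800GTS cards: ${gts_pair_cost}")
print(f"GTX 280 premium over the pair: ${price_gap}")
print(f"GTX 280 costs {cost_ratio:.2f}x the pair")
```

So the GTX 280 would have to be nearly twice as fast as the pair just to match it on price/performance, which is the commenter's point.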

No, the reason is the high cost to produce: over a billion transistors, low yields, 512-bit bus ...

Unfortunately, the high cost and the advanced tech don't translate to equally impressive performance at this stage. For example, if the card had much lower power usage under load, it would still have been considered a good step forward: comparable performance to a dual-GPU solution, but running much cooler and with less demanding hardware.

As the review mentions, this card begs for a die shrink. It will make it use less power, be cheaper, run cooler and even have a higher clock. Reply

That competition won't come for another two weeks, but when it does -- rumour has it NV plan to lower their prices. Most preliminary info has HD 4870 at 299-329 and pretty much GTX 260 performance, or if not, then biting at its heels. Reply