GF100 graphics architecture unveiled

I have to admit, long-delayed graphics chips can be kind of fun. Instead of a single, overwhelming burst of information all at once, complete with performance data and hands-on impressions from the actual product, we get to partake in a slow drip-drip-drip of information about a new GPU architecture. That’s certainly been the case with Nvidia’s DirectX 11-class GPU, known variously as Fermi (the overarching GPU architecture) and GF100 (the first chip to implement that architecture). We’ve already had a look at the compute-specific bits of the Fermi architecture, and we’ve engaged in deep, informed speculation on its graphics capabilities, as well. We know exactly what the competition looks like, and gosh darn it, we’d like to lay hands on the GPU itself soon.

The first cards based on the GF100 aren’t quite ready yet, though, and we have one more stop to make before we get that chance. After the Consumer Electronics Show in Las Vegas, Nvidia invited various members of the press, including your humble correspondent, to get a closer look at the particulars of the GF100’s graphics hardware. We now know that a great deal of Rys’s speculation about the GF100’s graphics particulars was correct, but we also know that he was off in a few rather notable places. We’ve filled in quite a few details in surprising ways, as well. Keep reading as we round out our knowledge of the GF100’s graphics architecture and explain why this GPU just might be worth the wait.

A graphics architecture overview

First things first, I suppose. The GF100 is late, and Nvidia made no bones about it in this most recent briefing about the chip. Drew Henry, head of the company’s GeForce business, told us forthrightly that he’d prefer to have a product in the market now, but said of the situation, “It is what it is.” At present, the message about the GF100’s status is equal parts straightforward and cautious: GF100 chips are “in full production at TSMC” and we can expect to see products in “Q1 2010.” If the chip is in production, we can probably assume the main sources of the product delays have been rectified in the latest silicon spin. Beyond that, we have very little: no product names, prices, clock speeds, or more precise guidance on ship dates.

By making the window extend throughout the first quarter of the year, Nvidia has given itself ample leeway. Products could ship as late as March 31st without missing that target. If I were to narrow it down, though, I’d probably expect to see products somewhere around the first of March, give or take a week or two.

Time will tell on that front, but we now have a trove of specifics about the operation of the GPU from Nvidia itself. We’ve covered the computational capabilities of the GF100 quite thoroughly in our two prior pieces on the architecture, so we’ll focus most of our attention here on its graphics features. Let’s begin, as we often do, with a high-altitude overview.

A functional block diagram of GF100. Source: Nvidia.

As GPUs become more complex, these diagrams become ever more difficult to read from this distance. However, much of what you see above is already familiar, including the organization of the GPU’s execution resources into 16 SMs, or shader multiprocessors. Those SMs house an array of execution units capable of executing, at peak, 512 arithmetic operations in a single clock cycle. Nvidia would tell you the GF100 has 512 “CUDA cores,” and in a sense, they might be right. But the more we know about the way this architecture works, the less we’re able to accept that definition, any more than we can say that AMD’s Cypress has 1600 “shader cores.” The “cores” proper are really the SMs, in the case of the GF100, and the SIMD engines, in the case of Cypress. Terminology aside, though, the GF100 does have a tremendous amount of processing power on tap. Also familiar are the six 64-bit GDDR5 memory interfaces, which hold the potential to deliver as much as 50% more bandwidth than Cypress or Nvidia’s prior-generation part, the GT200.

The first hint we have of something new is the presence of four “GPCs,” (or graphics processing clusters, I believe, although I thought that name was taken by Gary Phelps’ Choice, as we used to call our Dean of Students’ preferred smokes back in college). Nvidia Senior VP of GPU Engineering Jonah Alben called the GPCs “almost complete, independent GPUs” when he first described them to us. As you can see, each one has its own rasterization engine, which points toward an intriguing departure from the norm.

Each GPC contains four SMs, and we’ll have to zoom in on a single SM in order to get a closer look at the rest of the GF100’s graphics-focused hardware.

Now we can see that each SM has four texture units associated with it. More unconventionally, each SM also hosts a geometry unit, which Nvidia has creatively dubbed a “polymorph engine.” Since the GF100 has four GPCs and 16 SMs, it has a total of 64 texture units and 16 polymorph engines. The Fermi architecture detailed here is scalable along several lines: variants could be made to have fewer GPCs and fewer numbers of SMs within each GPC. We can surely expect to see smaller chips based on this architecture that have been scaled down in one or both ways.
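Those totals fall straight out of the chip's hierarchy. Here is a minimal sketch of the arithmetic in Python; the `unit_counts` helper and the scaled-down configuration are my own illustrative assumptions, not anything Nvidia has announced.

```python
def unit_counts(gpcs, sms_per_gpc, tex_units_per_sm=4, polymorph_per_sm=1):
    """Total up per-chip units for a Fermi-style layout (illustrative helper)."""
    sms = gpcs * sms_per_gpc
    return {
        "raster_engines": gpcs,                     # one raster engine per GPC
        "sms": sms,
        "texture_units": sms * tex_units_per_sm,    # four per SM
        "polymorph_engines": sms * polymorph_per_sm # one per SM
    }

# GF100 as described: four GPCs, four SMs in each.
gf100 = unit_counts(gpcs=4, sms_per_gpc=4)
print(gf100)

# A hypothetical scaled-down variant, purely to show how the counts shrink.
smaller = unit_counts(gpcs=2, sms_per_gpc=4)
print(smaller)
```

Scaling either knob changes the texture and geometry resources in lockstep, which is the point of the modular design.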

We should also note the GF100’s ROP units, of which there are 48 ringing the L2 cache in the diagram above. With that bit added, we have sketched in full the general outlines of the GF100. What remains is to fill in some detail in several areas, starting, of course, with the curiously quad rasterizers and 16 geometry units.

Polymorph? Yeah, I saw The Crying Game

The biggest surprise of the day is undoubtedly the reshuffling that’s happened in the GF100’s geometry handling resources. When explaining the decision to undertake this reorganization, Alben pointed out that geometry performance hasn’t been a major focus in GPU progress over the years. Between the GeForce FX 5800 Ultra (NV30) and the GeForce GTX 280 (GT200), he estimated pixel shading performance has mushroomed by a factor of 150. During that same span, he said, geometry performance has only tripled.

That’s true in part because the hardware that handles a key part of the graphics pipeline, the setup engine, has simply not been parallelized. Instead, any progress has been supplied by increases in clock rates and in per-clock performance. The GeForce 256, for instance, could process a triangle in eight clock cycles. The GeForce FX could do so in two cycles, and the G80 in (optimally) a single cycle.

Alben and the team saw that growing gap between geometry and shader performance as a problem, and believed that the advent of DirectX 11, with its introduction of hardware-based tessellation and two new programmable pipeline stages for geometry processing, made the moment opportune for a change. Henry Moreton, an Nvidia Distinguished Engineer and geometry processing expert who authored the original rasterization microcode at SGI, characterized the earlier attempts at geometry processing in Direct3D as “train wrecks.” He told us, however, that he believes in DirectX 11, “they got it right.” Nvidia’s response was to build what it believes is the world’s first parallel architecture for geometry processing.

Block diagram of a GF100 polymorph engine. Source: Nvidia.

Each SM in the GF100 has a so-called polymorph engine. This engine facilitates a host of pre-rasterization stages of the Direct3D pipeline, including vertex and hull shaders, tessellation, and domain and geometry shaders. All four of those shader types run in the shader array, of course. Beyond that, Alben told us the block diagram above is in fact architecturally accurate: all five of the functions map to dedicated units on the chip.

DirectX 11’s tessellation support is what enables the GF100’s geometry-intensive focus. The basic concept has been around quite a while: to create complex geometry by combining a low-resolution polygon mesh with a mathematical description of a more complex surface.

Situating tessellation further along in the graphics pipeline has tremendous advantages over simply using more complex geometry, not least of which is a savings of bus bandwidth between the host system and the GPU. Because tessellation happens after the vertex shader stage, animations can be processed (much more efficiently) for the base polygon mesh rather than the more complex final result. The final model will then inherit all of the appropriate movement.

Tessellation isn’t just for smoothing out objects, either. Once a more complex mesh has been created, it can be altered via the use of a displacement map, which imparts new depth information to the object. Displacement maps can be used to generate complex terrain or to add complexity to an in-game object or character model. Unlike currently popular techniques like bump or normal maps, displacement maps really do alter geometry, so object silhouettes are correctly modified, not just the object interiors. Thus, tessellation has the potential to improve the look of games substantially beyond the current standard.

In DX11, tessellation involves two programmable stages, hull shaders and domain shaders, sandwiched around a fixed-function geometry expansion step. Hull shaders run first on the base polygon mesh, and they do the level-of-detail calculations for the subdivision of existing polygons. The tessellators take this input and create new vertices. Domain shaders then evaluate the surfaces created by the tessellation step and can apply displacement maps. So yes, the GF100 has 16 separate, fixed-function units dedicated to tessellation, but their duties are limited to geometry expansion. The rest of the work happens in the shader array.
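To make the division of labor concrete, here is a toy sketch of the three stages in Python. It works on a 2D curve rather than a real surface patch, and every function name, the level-of-detail formula, and the sine-wave "displacement map" are assumptions invented for illustration; none of this is how Direct3D or the GF100 actually expresses these stages.

```python
import math

def hull_shader(control_points, camera_distance):
    """Programmable stage: pick a tessellation factor for the patch.
    The LOD formula is made up; closer patches get more subdivisions."""
    factor = int(round(64.0 / max(camera_distance, 1.0)))
    return max(1, min(64, factor))

def tessellator(factor):
    """Fixed-function stage: emit parametric coordinates only, no positions."""
    return [i / factor for i in range(factor + 1)]

def domain_shader(control_points, t, displacement):
    """Programmable stage: evaluate a quadratic Bezier curve at t, then
    displace the result (along +y for simplicity, not along the normal)."""
    (x0, y0), (x1, y1), (x2, y2) = control_points
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    x = a * x0 + b * x1 + c * x2
    y = a * y0 + b * y1 + c * y2
    return (x, y + displacement(t))

def bumpy(t):
    """Stand-in for a displacement map lookup."""
    return 0.1 * math.sin(8 * math.pi * t)

patch = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]   # the low-resolution base "mesh"
factor = hull_shader(patch, camera_distance=8.0)
vertices = [domain_shader(patch, t, bumpy) for t in tessellator(factor)]
print(factor, len(vertices))
```

Note that only `tessellator` corresponds to dedicated hardware in this picture; the other two functions stand in for work done in the shader array.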

The distribution of the polymorph engines’ various duties to 16 separate units suggests broad parallelization, and so it is. Nvidia claims, for instance, that vertex fetches now happen in parallel, with up to 32 attributes being fetched per cycle across the GPU, four times the capacity of the GT200.

Managing all of the related calculations in parallel for some of the pipeline stages is no trivial task. Moreton cited an example related to tessellation, when a single SM has been given a patch that will generate thousands of triangles and thus potentially spill out of local storage. In such cases, the GF100 will evaluate patches and decompose them into smaller patches for distribution to multiple SMs across the chip. The related data are kept on die and passed to other SMs via their L1 caches. The results must still be output in the appropriate order, which requires careful scheduling and coordination across SMs. Thus, the GF100 employs a sort of coherency protocol to track geometry data at the thread level; a network in the chip distributes this information.
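Nvidia didn't describe the mechanism in more detail than that, so the following Python sketch is purely my own illustration of the general idea: split an oversized patch until each piece fits a budget, farm the pieces out, and put the results back in submission order. The triangle budget, the halving strategy, and the round-robin assignment are all hypothetical.

```python
def split_patch(patch_id, triangles, budget):
    """Recursively halve a patch until each piece fits the per-SM budget."""
    if triangles <= budget:
        return [(patch_id, triangles)]
    half = triangles // 2
    return (split_patch(patch_id + "a", half, budget)
            + split_patch(patch_id + "b", triangles - half, budget))

def distribute(pieces, num_sms):
    """Round-robin pieces across SMs, tagging each with its original order."""
    queues = [[] for _ in range(num_sms)]
    for order, piece in enumerate(pieces):
        queues[order % num_sms].append((order, piece))
    return queues

def collect_in_order(queues):
    """Merge per-SM results back into submission order."""
    done = [item for queue in queues for item in queue]
    return [piece for _, piece in sorted(done)]

# A patch that would expand to 5000 triangles, with a made-up 1024-triangle budget.
pieces = split_patch("p", 5000, budget=1024)
queues = distribute(pieces, num_sms=16)
ordered = collect_in_order(queues)
print(len(pieces), ordered == pieces)
```

The ordering tag is the interesting part: work can finish on any SM at any time, but the output still has to come out as if one unit had processed it serially.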

Simple block diagram of a GF100 raster engine. Source: Nvidia.

Once the polymorph engines have finished their work, the resulting data are forwarded to the GF100’s four raster engines. Optimally, each one of those engines can process a single triangle per clock cycle. The GF100 can thus claim a peak theoretical throughput rate of four polygons per cycle, although Alben called that “the impossible-to-achieve rate,” since other factors will limit throughput in practice. Nvidia tells us that in directed tests, GF100 has averaged as many as 3.2 triangles per clock, which is still quite formidable.

Sharp-eyed readers may recall that AMD claimed it had dual rasterizers upon the launch of the Cypress GPU in the Radeon HD 5870. Based on that, we expected Cypress to be able to exceed the one polygon per cycle limit, but its official specifications instead cite a peak rate of 850 million triangles per second, or one per cycle at its default 850MHz clock speed. We circled back with AMD to better understand the situation, and it’s a little more complex than was originally presented. What Cypress has is dual scan converters, but it doesn’t have the setup or primitive interpolation rates to support more than one triangle per cycle of throughput. As I understand it, the second scan converter is an optimization that allows the GPU to push through more pixels, in cases where the polygons are large enough. The GF100’s approach is quite different and really focused on increasing geometric complexity.
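The per-clock figures in the last two paragraphs are easy to sanity-check. This short Python sketch uses only numbers quoted above; the helper name is my own, and since the GF100's clocks aren't public, the per-second conversion is shown for Cypress only.

```python
RASTER_ENGINES = 4               # one per GPC, per the article
PEAK_TRIS_PER_ENGINE = 1         # best case: one triangle per clock
MEASURED_TRIS_PER_CLOCK = 3.2    # Nvidia's directed-test figure

peak_tris_per_clock = RASTER_ENGINES * PEAK_TRIS_PER_ENGINE
efficiency = MEASURED_TRIS_PER_CLOCK / peak_tris_per_clock

def triangles_per_second(tris_per_clock, clock_mhz):
    """Convert a per-clock triangle rate into triangles per second."""
    return tris_per_clock * clock_mhz * 1e6

# Cypress: one triangle per clock at 850MHz, matching AMD's quoted peak.
cypress_rate = triangles_per_second(1, 850)

print(f"GF100 peak: {peak_tris_per_clock} triangles/clock")
print(f"Directed-test efficiency: {efficiency:.0%}")
print(f"Cypress peak: {cypress_rate / 1e6:.0f} Mtris/s")
```

In other words, Nvidia's own best case lands at roughly four-fifths of the theoretical rate.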

Comparative GF100 tessellation performance. Source: Nvidia.

Nvidia claims the higher setup rates enabled by the combination of the polymorph and raster engines allow the GF100 to achieve up to six times the performance of the Radeon HD 5870 in directed tests.

Source: Nvidia.

The firm also supplied us with frame-by-frame performance results for a selected portion of the Unigine DX11 demo that’s particularly geometry intensive. The GF100 purportedly outperforms the 5870 during this sequence thanks to its superior geometry throughput.

Clearly, Nvidia has gone to great lengths to give the GF100 a parallel geometry processing architecture, and that is the distinctive and defining feature of this chip. If it works as advertised, they will have solved a difficult problem in hardware for perhaps the first time. But make no mistake about it: giving the GF100 these capabilities is a forward-looking play, not an enhancement that will pay off in the short term. You will note that none of the examples above come from a contemporary, or even future, game; the closest we get is the Unigine DX11 technology demo from a third party. In order for the GF100’s geometry processing capabilities to give it a competitive advantage, games will not only have to make use of DX11 tessellation, but they will have to do so to an extreme degree, one unanticipated by existing DX11-class hardware from AMD. In other words, the usage model for GPUs will have to shift rather radically in the direction of additional geometric complexity.

Moreton expressed hope that geometry scaling techniques like dynamic level of detail algorithms could allow developers to use the GF100’s power without overburdening less capable hardware. Whether or not that will happen in the GF100’s lifetime remains to be seen, but Nvidia does appear to have addressed a problem that its competition will need to deal with in future architecture generations. That fact would be even more impressive if the GF100 weren’t so late to the party.

Texturing

Nvidia has reshuffled the GF100’s texturing hardware, as well. In the GT200, three SMs shared a single texture unit capable of sampling and filtering eight texels per clock; each texture unit had an associated texture cache. In the GF100, each SM gets its own dedicated texture unit and texture cache, with no need for sharing, although the units themselves can only sample and filter four texels per cycle. That filtering rate assumes an INT8 texture format. Like the GT200 and Cypress, the GF100 filters FP16 textures at half the usual rate and FP32 textures at a quarter of it.

Add it all up, and the GF100 can sample and filter “only” 64 texels per cycle at peak, whereas the GT200 could do 80. (Rys had guessed 128(!) for the GF100. Thanks for playing!) Nvidia VP of GPU Architecture Emett Kilgariff explained to us that several considerations offset the GF100’s theoretically lower per-clock potential. For one thing, the GF100’s texture cache has been optimized so there are fewer sampling conflicts, allowing the texturing hardware to operate more efficiently. For another, Nvidia has done away with the split between the so-called core and shader clocks familiar from the GT200. Most of the chip now runs at half the speed of the shader clock, including the texture units, polymorph engines, raster engines, schedulers, and caches, as I understand it. That should mean the GF100’s texturing hardware is clocked a little higher than the GT200’s.
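For anyone who wants to check the per-clock math, here is a small Python sketch. The GT200's 30 SMs (and thus ten shared texture units) is a figure I'm supplying from that chip's known configuration rather than from this briefing, and the function name is just for illustration.

```python
def texels_per_clock(texture_units, texels_per_unit, fmt="INT8"):
    """Peak filtered texels per clock; FP16 runs at half rate, FP32 at a quarter."""
    rate_divisor = {"INT8": 1, "FP16": 2, "FP32": 4}[fmt]
    return texture_units * texels_per_unit / rate_divisor

# GF100: one texture unit per SM, 16 SMs, four texels each.
gf100 = texels_per_clock(16, 4)

# GT200: 30 SMs sharing one unit per three SMs (assumed from the known
# GT200 layout), eight texels each.
gt200 = texels_per_clock(30 // 3, 8)

# How much faster the GF100's texture clock must run to break even.
break_even_ratio = gt200 / gf100

print(gf100, gt200, break_even_ratio)
print(texels_per_clock(16, 4, "FP16"), texels_per_clock(16, 4, "FP32"))
```

So the GF100's texturing hardware would need a 25% clock advantage just to tie the GT200 on paper, before any efficiency gains are counted.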

Even so, Nvidia has said the GF100’s theoretical texturing capacity will be lower than the GT200’s, but Kilgariff presented some numbers to underscore the point that the GF100’s true, delivered performance should be as much as 40-70% higher.

By the way, that forecast of lower peak theoretical texturing performance for the GF100 gives us a big hint about likely clock speeds. We’ll revisit this topic shortly.

One mild surprise is that Nvidia hasn’t changed its texture filtering algorithm from the GT200, despite some expectations that the GF100 might bring improved quality in light of the new Radeons’ near perfect angle-invariant aniso. Alben described the output of the algorithm first implemented in G80 as “really beautiful” and said the team thus viewed filtering as “a solved problem.” Hard to argue with that, really.

Of course, the texture hardware now supports the HD texture compression formats introduced in DirectX 11.

An additional DX11 feature AMD touted with the introduction of Cypress was pull-model interpolation, in which yet another duty of the traditional setup engine was handed off to the shader core and made programmable. At the time, AMD said its setup hardware had limited the RV770’s performance in some directed tests of texture filtering, and Cypress was indeed quite a bit faster than two RV770s in such benchmarks. When I asked how they had implemented pull-model interpolation in the GF100, Moreton explained that interpolation had been handled in the shader array since the G80 and that the new Direct3D spec essentially matches the G80’s capabilities. His short answer for how they implemented it, then: “Natively.”

One bit of fanciness Kilgariff pointed out in the GF100’s texture samplers is a robust implementation of DX11’s Gather4 feature. The samplers can pull scalar data from four texel locations simultaneously, and those locations are programmable across a fairly broad area. By varying the sample locations, developers can implement what is essentially a hardware-accelerated jittered sampling routine for softening shadow edges and removing jaggies. Kilgariff said that, by using this technique, they’d measured a 2X increase over non-vectorized sampling on the GF100 and roughly 3.3X over the Radeon HD 5870.

ROPs and antialiasing

Nvidia has traditionally closely associated its ROP hardware, which converts shaded fragments into pixels and writes them to memory, with its L2 cache and memory controllers. With the move to GDDR5 memory, the GF100 promises to have as much as 50% higher memory bandwidth than the GT200, but it now has only six 64-bit memory controllers onboard, down from eight in the prior-gen chip. To keep the right balance of ROP hardware, Nvidia has reworked its ROP partitions: each one now houses eight ROP units, for a total of 48 ROP units across the chip. At peak, then, the GF100 can output 48 pixels per clock in a 32-bit integer format, a straightforward increase of 50% over the GT200 or Cypress. GF100’s ROPs require two cycles to process pixels in FP16 data formats and four for FP32.
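The ROP arithmetic can be sketched in a few lines of Python. The constants come from the paragraph above, plus the 32-ROP count for the GT200 and Cypress that the 50% claim implies; the names are mine.

```python
MEMORY_CONTROLLERS = 6        # 64 bits each
ROPS_PER_PARTITION = 8        # one ROP partition per memory controller
CYCLES_PER_PIXEL = {"INT8": 1, "FP16": 2, "FP32": 4}
PRIOR_ROPS = 32               # GT200 and Cypress

total_rops = MEMORY_CONTROLLERS * ROPS_PER_PARTITION
bus_width_bits = MEMORY_CONTROLLERS * 64

def pixels_per_clock(fmt):
    """Peak pixels written per clock for a given color format."""
    return total_rops / CYCLES_PER_PIXEL[fmt]

print(total_rops, bus_width_bits)
for fmt in CYCLES_PER_PIXEL:
    print(fmt, pixels_per_clock(fmt))
print(f"Increase over 32 ROPs: {total_rops / PRIOR_ROPS - 1:.0%}")
```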

Source: Nvidia.

Not only are the GF100’s ROPs more numerous, but they’ve also been modified to handle 8X multisampled antialiasing without taking a big performance hit, mainly due to improved color compression speed. GeForce GPUs have been at a disadvantage in 8X multisampling performance since the introduction of the Radeon HD 4800 series, but the GF100 should rectify the situation, as indicated by the Nvidia-supplied numbers above. (I should caution, however, that HAWX supports DirectX 10.1, which also accelerates antialiasing performance on newer GPUs. We’ll want to test things ourselves before being fully confident on this point.)

Source: Nvidia.

One saving grace for the GT200’s antialiasing performance has been Nvidia’s coverage sampled AA modes, which store larger numbers of coverage samples than color samples and offer nicely improved edge quality with little performance cost. Now that true 8X multisampling is more comfortable, Nvidia has added a coverage sampled AA mode based on it. The new 32X CSAA mode stores eight full coverage-plus-color samples and an additional 24 coverage-only samples.

Not only will 32X CSAA provide higher fidelity antialiasing on traditional object edges, but Kilgariff pointed out that many games use a technique called alpha-to-coverage to render dense grass or foliage with soft edges, in which alpha test results contribute to a coverage mask. This method produces better results than a simple alpha test, but it relies on coverage samples to work its magic. Sometimes four or eight samples will be insufficient to prevent aliasing. In such cases, 32X CSAA can produce markedly superior results, with a total of 33 levels of transparency. Also, Nvidia’s transparency multisampling mode, a driver feature that promotes simple alpha-test transparency to alpha-to-coverage, should benefit from the additional coverage samples in 32X CSAA.
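Where does 33 come from? With N coverage samples, a pixel can have anywhere from zero to N of them covered, which makes N + 1 distinct levels. The Python sketch below is a deliberately naive model of alpha-to-coverage that I've written for illustration; real hardware chooses which samples to cover with dithered patterns rather than filling them in order.

```python
COLOR_SAMPLES = 8            # full coverage-plus-color samples in 32X CSAA
COVERAGE_ONLY_SAMPLES = 24   # additional coverage-only samples

def coverage_mask(alpha, num_samples):
    """Map an alpha value in [0, 1] to a mask with about alpha * N bits set."""
    covered = int(round(alpha * num_samples))
    return [1] * covered + [0] * (num_samples - covered)

def distinct_levels(num_samples):
    """Zero through N covered samples gives N + 1 transparency levels."""
    return num_samples + 1

total_samples = COLOR_SAMPLES + COVERAGE_ONLY_SAMPLES
print(total_samples, distinct_levels(total_samples))
print(distinct_levels(4), distinct_levels(8))
print(sum(coverage_mask(0.5, total_samples)))
```

With only four or eight samples, a smooth alpha gradient collapses into five or nine bands, which is why foliage edges can still look stair-stepped at lower AA modes.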

What does caching do for graphics?

We’ve already spent ample time on this architecture’s computing capabilities, so I won’t revisit that ground here. One question that we’ve had since hearing about the GF100’s relatively robust cache architecture is what benefits caching might have for graphics, if any.

Most GPUs have a number of special-purpose pools of local storage. The GF100 is similar in that it has an instruction cache and a dedicated 12KB texture cache in each SM. However, each SM also has 64KB of L1 data storage that’s a little bit different: it can be split either 48/16KB or 16/48KB between a local data store (essentially a software-managed cache) and a true L1 cache. For graphics, the GF100 uses the 48KB shared memory/16KB L1 cache configuration, so most of the local storage will be directly managed by Nvidia’s graphics drivers, as it was in the GT200. The small L1 cache in each SM does have a benefit for graphics, though. According to Alben, if an especially long shader fills all of the available register space, registers can spill into this cache. That should avoid some worst-case scenarios that could greatly hamper performance.

More impressive is the GF100’s 768KB L2 cache, which is coherent across the chip and services all requests to read and write memory. This cache’s benefits for computing applications with irregular data access patterns are clear, but how does it help graphics? In several ways, Nvidia claims. Because this cache can store any sort of data, it has multiple uses: it has replaced the 256KB, read-only L2 texture cache and the write-only ROP cache in the GT200 with a single, unified read/write path that naturally maintains proper program order. Since it’s larger, the L2 provides more texture coverage than the GT200’s L2 texture cache, a straightforward benefit. Because it can store any sort of data, and because it may be the only local data store large enough to handle it, the L2 cache will hold the large amounts of geometry data generated during tessellation, too.

So there we have some answers. If it works well, caching should help enable the GF100’s unprecedented levels of geometry throughput and contribute to the architecture’s overall efficiency.

One more shot at likely speeds and feeds

Speaking of efficiency, that will indeed be the big question about the Fermi architecture and especially about the GF100. How efficient is the architecture in its first implementation?

Almost to scale? A GF100 die shot. Source: Nvidia.

The chip isn’t in the wild yet, so no one has measured its exact die size. Nvidia, as a matter of policy, doesn’t disclose die sizes for its GPUs (they are, I believe, the last straggler on this point in the PC market). But we know the transistor count is about three billion, which is, well, hefty. How so large a chip will fare on TSMC’s thus far troubled 40-nm fabrication process remains to be seen, but the signs are mixed at best.

Although we don’t yet have final product specs, Nvidia’s Drew Henry set expectations for the GF100’s power consumption by admitting the chip will draw more power under load than the GT200. That fact by itself isn’t necessarily a bad thing: Intel’s excellent Lynnfield processors consume more power at peak than their Core 2 Quad predecessors, but their total power consumption picture is quite good. Still, any chip this late and this large is going to raise questions, especially with a very capable, much smaller competitor already in the market.

With the new information we have about the GF100’s graphics bits and pieces, we can revise our projections for its theoretical peak capabilities. Sad to say, our earlier projections were too bullish on several fronts, so most of our revisions are in a downward direction.

We don’t have final clock speeds yet, but we do have a few hints. As I pointed out when we were talking about texturing, Nvidia’s suggestion that the GF100’s theoretical texture filtering capacity will be lower than the GT200’s gives us an upper bound on clock speeds. The crossover point where GF100 would match the GeForce GTX 280 in texturing capacity is a 1505MHz shader clock, with the texturing hardware running at half that frequency. We can probably assume the GF100’s clocks will be a little lower than that.
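For the curious, the 1505MHz crossover figure can be reproduced like so. The GeForce GTX 280's 602MHz stock core clock isn't stated in this article, so treat it as an input I'm supplying; the per-clock texel counts are the ones given in the texturing section.

```python
GTX280_CORE_MHZ = 602            # GTX 280 stock core clock (supplied, not from the article)
GT200_TEXELS_PER_CLOCK = 80
GF100_TEXELS_PER_CLOCK = 64

# GTX 280 peak bilinear filtering rate in Mtexels/s.
gtx280_mtexels = GT200_TEXELS_PER_CLOCK * GTX280_CORE_MHZ

# Texture clock at which the GF100 would match it, and the full-speed
# clock implied by the texture units running at half that frequency.
crossover_texture_mhz = gtx280_mtexels / GF100_TEXELS_PER_CLOCK
crossover_full_speed_mhz = 2 * crossover_texture_mhz

print(gtx280_mtexels / 1000, "Gtexels/s")
print(crossover_texture_mhz, "MHz texture clock")
print(crossover_full_speed_mhz, "MHz full-speed clock")
```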

We have another nice hint that running the texturing hardware at half the speed of the shaders rather than on a separate core clock will impart a 12-14% frequency boost. In this case, I’m going to be optimistic, follow a hunch, and assume the basis of comparison is the GT200b chip in the GeForce GTX 285. A clock speed boost in that range would get us somewhere near 725MHz for the half-speed clock and 1450MHz for the shaders. The GF100’s various graphics units running at those speeds would yield the following peak theoretical rates.
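Here is that hunch worked through in Python. The 648MHz baseline is the GeForce GTX 285's core clock, and rounding the result down to 725MHz is my own judgment call, not a number from Nvidia.

```python
GTX285_CORE_MHZ = 648          # GT200b core clock in the GeForce GTX 285
BOOST_RANGE = (0.12, 0.14)     # the hinted 12-14% frequency boost

# Range of half-speed clocks implied by the hint.
low, high = (GTX285_CORE_MHZ * (1 + boost) for boost in BOOST_RANGE)

ASSUMED_HALF_CLOCK_MHZ = 725   # picked near the bottom of the range
assumed_shader_mhz = 2 * ASSUMED_HALF_CLOCK_MHZ

print(f"Implied half-speed clock: {low:.0f}-{high:.0f} MHz")
print(f"Assumed: {ASSUMED_HALF_CLOCK_MHZ} MHz half-speed, {assumed_shader_mhz} MHz shaders")
```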

| | GT200 | GF100 | RV870 |
|---|---|---|---|
| Transistor count | 1.4B | 3.0B | 2.15B |
| Process node | 55 nm @ TSMC | 40 nm @ TSMC | 40 nm @ TSMC |
| Core clock | 648 MHz | 725 MHz | 850 MHz |
| Hot clock | 1476 MHz | 1450 MHz | n/a |
| Memory clock | 2600 MHz | 4200 MHz | 4800 MHz |
| ALUs | 240 | 512 | 1600 |
| SP FMA rate | 0.708 Tflops | 1.49 Tflops | 2.72 Tflops |
| DP FMA rate | 88.5 Gflops | 186 Gflops* | 544 Gflops |
| ROPs | 32 | 48 | 32 |
| Memory bus width | 512 bit | 384 bit | 256 bit |
| Memory bandwidth | 166.4 GB/s | 201.6 GB/s | 153.6 GB/s |
| ROP rate | 21.4 Gpixels/s | 34.8 Gpixels/s | 27.2 Gpixels/s |
| INT8 bilinear texel rate (half rate for FP16) | 51.8 Gtexels/s | 46.4 Gtexels/s | 68.0 Gtexels/s |
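Peak rates like these are just unit counts multiplied by clocks, as this Python sketch shows. The formulas are the conventional ones (two flops per FMA, memory clock treated as the effective per-pin data rate), the `peak_rates` helper is my own, and the figure of 80 texels per clock for RV870 is supplied from that chip's known configuration rather than stated above. The results land on or close to the projected figures for the GF100 and RV870.

```python
def peak_rates(alus, shader_mhz, core_mhz, rops, texels_per_clock,
               bus_bits, mem_mhz):
    """Theoretical peaks from unit counts and clocks (illustrative helper)."""
    return {
        "sp_fma_gflops": alus * 2 * shader_mhz / 1000,   # 2 flops per FMA
        "rop_gpixels": rops * core_mhz / 1000,
        "texel_gtexels": texels_per_clock * core_mhz / 1000,
        "bandwidth_gbs": bus_bits / 8 * mem_mhz / 1000,  # bytes/transfer x rate
    }

# GF100 at the speculative 725/1450 MHz clocks.
gf100 = peak_rates(alus=512, shader_mhz=1450, core_mhz=725, rops=48,
                   texels_per_clock=64, bus_bits=384, mem_mhz=4200)

# RV870 has a single clock domain, so the same clock is used for both.
rv870 = peak_rates(alus=1600, shader_mhz=850, core_mhz=850, rops=32,
                   texels_per_clock=80, bus_bits=256, mem_mhz=4800)

for name, rates in (("GF100", gf100), ("RV870", rv870)):
    print(name, {key: round(value, 1) for key, value in rates.items()})
```

Change the assumed clocks and every GF100 rate moves with them, so these projections are only as good as the clock guess.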

I should pause to explain the asterisk next to the unexpectedly low estimate for the GF100’s double-precision performance. By all rights, in this architecture, double-precision math should happen at half the speed of single-precision, clean and simple. However, Nvidia has made the decision to limit DP performance in the GeForce versions of the GF100 to 64 FMA ops per clock, one fourth of what the chip can do. This is presumably a product positioning decision intended to encourage serious compute customers to purchase a Tesla version of the GPU instead. Double-precision support doesn’t appear to be of any use for real-time graphics, and I doubt many serious GPU-computing customers will want the peak DP rates without the ECC memory that the Tesla cards will provide. But a few poor hackers in Eastern Europe are going to be seriously bummed, and this does mean the Radeon HD 5870 will be substantially faster than any GeForce card at double-precision math, at least in terms of peak rates.
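The effect of that cap is easy to quantify. In this Python sketch, the 1450MHz shader clock is the speculative figure from our projections, the 544 Gflops number for the Radeon HD 5870 comes from the same projections, and the uncapped rate is simply what half-speed double precision would imply on paper.

```python
ALUS = 512
ASSUMED_SHADER_MHZ = 1450          # speculative clock from the projections
GEFORCE_DP_FMA_PER_CLOCK = 64      # the GeForce cap
RADEON_5870_DP_GFLOPS = 544        # projected peak for the Radeon HD 5870

# Half-rate double precision would mean one DP FMA per two ALUs per clock.
uncapped_dp_fma_per_clock = ALUS // 2

def dp_gflops(fma_per_clock, shader_mhz):
    """Peak double-precision rate, counting an FMA as two flops."""
    return fma_per_clock * 2 * shader_mhz / 1000

capped = dp_gflops(GEFORCE_DP_FMA_PER_CLOCK, ASSUMED_SHADER_MHZ)
uncapped = dp_gflops(uncapped_dp_fma_per_clock, ASSUMED_SHADER_MHZ)

print(GEFORCE_DP_FMA_PER_CLOCK / uncapped_dp_fma_per_clock)   # fraction of full rate
print(f"Capped: {capped:.1f} Gflops, uncapped: {uncapped:.1f} Gflops")
print(f"5870 advantage over capped GF100: {RADEON_5870_DP_GFLOPS / capped:.2f}x")
```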

Otherwise, on paper, the GF100 projects to be superior to the Radeon HD 5870 only in terms of ROP rate and memory bandwidth. (Then again, it’s now suddenly notable that we’re not estimating triangle throughput. The GF100 will have a clear edge there.) That fact isn’t necessarily a calamity. The GeForce GTX 280, for example, had just over half the peak shader arithmetic rate of the Radeon HD 4870 in theory, yet the GTX 280’s delivered performance was generally superior. Much hinges on how efficiently the GF100 can perform its duties. What we can say with certainty is that the GF100 will have to achieve a new high-water mark in architectural efficiency in order to outperform the 5870 by a decent margin, something it really needs to do, given that it’s a much larger piece of silicon.

Obviously, the GF100 is a major architectural transition for Nvidia, which helps explain its rather difficult birth. The advances it promises in both GPU computing and geometry processing capabilities are pretty radical and could be well worth the pain Nvidia is now enduring, when all is said and done. The company has tackled problems in this generation of technology that its competition will have to address eventually.

In attempting to handicap the GF100’s prospects, though, I’m struggling to find a successful analog to such a late and relatively large chip. GPUs like the NV30 and R600 come to mind, along with CPUs like Prescott and Barcelona. All were major architectural revamps, and all of them conspicuously ran hot and underperformed once they reached the market. The only positive examples I can summon are perhaps the R520 (the Radeon X1800 XT wasn’t so bad once it arrived, though it wasn’t a paragon of efficiency) and AMD’s K8 processors, which were long delayed but eventually rewrote the rulebook for x86 CPUs. I suppose we’ll find out soon enough where in this spectrum the GF100 will reside.

If a 4850 sells for under $100, considering the relative die sizes, the 5850 is almost certainly profitable under $150.

AMD is milking a very fat cash cow as Nvidia is cutting profit margins to the bone to stem the market share arterial bleeding.

AMD can kneecap Nvidia at will … and come the holidays and the initial 6XXX card releases, they probably will.

sigher

10 years ago

I can see somebody who’s very relaxed about things forgiving the crippling of DP, but not if they have to pay $700+ for a card. At that point, crippling seems like a kick in the face.

iamvincent

10 years ago

The test result they have seems… I don’t know, awkward?
There is just a hunch that the numbers are not right
Well if it is right that will be the best thing for everyone
If it is “littlebit off” like the old 9800, I think people know what to do

I will wait for testing done by techreport.
Then I will go look at how those NV-bias website test it
and see if this thing is worth to purchase

Wintermane

10 years ago

The test results are actually very simple.

1 The thing can tessellate like crazy, so if a future game uses it a lot, the Nvidia card will do rather well.

2 The puppy can still texture rather fast, and even though its theoretical max rate is lower, its real-world rate has gone up. In short, the new hardware does textures better than before.

Wintermane

10 years ago

I see alot of confusion on why ati cards do poorly and STILL do poorly at folding.

One in 5 shaders on ati hardware yes including the 5000 series is FAT as in having all the stuff needed to do everything. All the rest are thin as in having alot cut out to make them much smaller so they could fit alot more shaders on the chip.

Folding needs stuff that simply isnt included with the thin shaders so they cant run it. Thats all folks.

Kougar

10 years ago

“However, Nvidia has made the decision to limit DP performance in the GeForce versions of the GF100 to 64 FMA ops per clock”

thecoldanddarkone

10 years ago

Currently, it won’t affect FAH.

PRIME1

10 years ago

Currently a GTX260 outfolds a HD5870

Kougar

10 years ago

Only due to F@H’s lack of optimizing for 5000 series cards. Once they begin migrating to using OpenCL with their GPU program GPU optimizations should hopefully become less of a problem. Last I heard their current GPU software treats the 5870 as if it was a 4870, hence the lack of any performance improvement.

Kurotetsu

10 years ago

I’m curious about something, did Nvidia work with Stanford to optimize the Folding@Home client for their GPUs? Or did the developers do that themselves? If its the first case, AMD only has itself to blame for not working with the developers to optimize for their GPUs (this is pretty common for AMD though).

Voldenuit

10 years ago

Actually, GPU2 treats a 5870 as a 3870, hence the 320 shader limitation.

And Wintermane, I’m not an expert, but I’m pretty sure the simple shaders can still do MADD, MUL, dot products etc, so are still useful in F@H. Witness the Radeons’ prowess in Milkyway for instance.

GF100 Tesla sounds like it will have amazing double precision performance, but Nvidia’s decision to cripple the consumer GPU is likely to sour the experience for home folders.

Wintermane

10 years ago

If they could, they would; after all, folding doesn’t give a carp about vid card companies. Now it COULD be that it’s more complicated on a simple shader and they just didn’t bother… but I’d suspect that would have been fixed by now.

Voldenuit

10 years ago

I’m pretty sure Stanford has not been optimising their code – the 320 shader limit for the past 2 years (!) is proof enough. Other distributed projects have been able to write clients that automatically scale with new hardware – but not F@H. I think they simply consider optimising their code for ATI’s architecture to be not worth their time, whether that is for practical, technical, or some other reason.

swaaye

10 years ago

Isn’t the ATI FAH code running through Brook+ or some such? Old stuff.

Voldenuit

10 years ago

It is, but there are no hardware or language limitations preventing Stanford from recompiling GPU2 or rewriting it from scratch without implicit limitations on hardware resources.

If anything, the fact that it’s still in Brook+ shows that nobody has cared to do anything with it in the past 2 years. Maybe it was somebody’s thesis or pet project that’s been abandoned?

Actually, it’s partly because they couldn’t do much with Brook+, but they could with CUDA. They only have limited resources and they got far more “bang for the buck” by putting them towards NVIDIA code development. This is why they have always wanted to use platform-independent code as the basis for their program: to take vendor-specific code/optimizations out of the picture so they can focus on writing a single, unified program that functions across multiple platforms. As I understand it, they are planning for OpenCL to do just that.

Theoretical performance is unimportant; actual performance is a bigger deal. And when it comes to distributed computing, both BOINC and F@H, nVidia’s winning for whatever reason.

Sahrin

10 years ago

I’d imagine it’s because nVidia provides significant engineering assistance to them. ATI has never done this, and it is one of the big reasons that nVidia has led over them for so long in terms of unit sales (when they’ve consistently traded blows in performance wins).

ritsu

10 years ago

Pretty good chip for a software company.

marvelous

10 years ago

I have to agree with Scott. This GF100 is built more for future games than for the current crop. The industry has to change to take advantage of GF100, which will take a long time.

And what’s up with 64 TMUs when there are tons of games that are texture-based? Efficiency? Is that enough? Nvidia says 50% better texture throughput than GT200 with fewer TMUs. I’ll believe it when I see it.

I can’t wait for performance numbers, but I think it’ll land a lot closer to Cypress than make a mockery of it. Six months late at that.

Lans

10 years ago

I don’t know if Nvidia is making something intended for future games or not, but I do agree it seems totally weird to have such a small number of texturing units, and the placement of the vertex shader/tessellation hardware looks funny to me. It just seems to me that if you really want to go crazy with geometry, you’ll want to do some texture lookups (displacement mapping et al).

Sahrin

10 years ago

This looks more and more to be exactly the GPU nVidia wanted – which is to say, a GPGPU in sheep’s clothing.

The only ‘exciting’ thing for gamers is the promise of huge geometry power (and it is VERY exciting), but as Scott pointed out – that is at least one game design generation away.

It looks like they are counting on the nVidiots to keep them afloat for one more generation until they can finally spin all of the fixed-function hardware out of the GPU. At that point it’ll really be an unparalleled (pun intended) accelerator, for which there is likely to be a rapacious market.

I wish nVidia well, because they’ve done a lot for the enthusiast – but it’s sad to see them head in the ‘off into the sunset’ direction of design. Maybe next generation there will be enough volume in GPGPU for there to be a split in the product lines, and nVidia to produce something that’s more sensible for the GPU market?

anotherengineer

10 years ago

New architecture yeah!!! Stanford is going to have to make a new program to take advantage of the new design for F@H.

I don’t game much anymore; the new family keeps me busy, so I don’t have any need for this. I will be saving for an SLC SSD or a 24″ IPS monitor two years down the road.

I’m sure there are a ton of Nvidia fanbois just waiting for this, though. At least the Win7 drivers should be well developed by the time it hits the shelves.

Shining Arcanine

10 years ago

Since Nvidia is limiting double precision floating point performance to a fourth of what the card really can do, this will likely not be very attractive for the Folding@Home people, who like double precision arithmetic. However, given that the double precision floating point performance is still around that of a dual-processor Gulftown system, programming for it might still be worth their while.
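
For a rough sense of what that cap means in raw numbers, here is a back-of-the-envelope sketch in Python. It uses the 64 FMA ops per clock figure quoted earlier in the thread and the "a fourth" ratio from this comment. The 1.5 GHz shader clock is purely a placeholder assumption, since Nvidia has not announced clock speeds, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope double-precision throughput estimate.
# Assumptions (not from Nvidia): the shader clock below is a placeholder,
# and one FMA is counted as two floating-point operations.

FLOPS_PER_FMA = 2  # one multiply plus one add


def dp_gflops(fma_per_clock: int, shader_clock_ghz: float) -> float:
    """Peak double-precision GFLOPS for a given FMA rate and clock."""
    return fma_per_clock * FLOPS_PER_FMA * shader_clock_ghz


GEFORCE_FMA_PER_CLOCK = 64                      # figure quoted in the thread
FULL_FMA_PER_CLOCK = GEFORCE_FMA_PER_CLOCK * 4  # "a fourth of what the card can do"
ASSUMED_CLOCK_GHZ = 1.5                         # hypothetical, clocks are unannounced

capped = dp_gflops(GEFORCE_FMA_PER_CLOCK, ASSUMED_CLOCK_GHZ)
full = dp_gflops(FULL_FMA_PER_CLOCK, ASSUMED_CLOCK_GHZ)

print(f"capped: {capped:.0f} GFLOPS, full rate: {full:.0f} GFLOPS")
print(f"capped / full = {capped / full:.2f}")
```

Whatever the real clock turns out to be, it cancels out of the ratio, so the capped part sits at a quarter of the full-rate figure either way.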

OneArmedScissor

10 years ago

I don’t think $600 video cards are generally attractive to people who just leave their computer sitting around to begin with.

wibeasley

10 years ago

F@H does not need double precision. It ran on the 8800s, and those cards had zero double-precision shaders.

WaltC

10 years ago

I see nothing wrong with articles of this sort–indeed, if TR is offered the information, TR is obligated to share it. It’s no different from the pre-shipping stuff about Prescott, Barcelona, R600 (which AMD had the good sense to postpone prior to shipping), and, of course, nV30. Often it’s possible to read between the lines as much as by what a company doesn’t say as by what it does say about its upcoming products.

In this case, nVidia is still saying very little of any specific import, which I find troubling. There’s still too much here that’s theoretical as opposed to practical. This leads me to believe that GF100 product planning is still very fluid at this time.

With nV30, nVidia screwed up enormously by bragging outrageously long before it even knew what sort of nV30 products it could actually ship, but bear in mind that even that would have not detracted from nV30 as much as it did had it not been for the stellar introduction of R300–which just blew the doors off of all nV30 product development regardless of how grandly it was promoted.

This time, lessons learned, nV is playing its hand a bit differently. My own thought is that things are still so theoretical and low key even at this late date so that, if it has to, nVidia can gracefully back away from Fermi with an absolute minimum of egg on the ol’ corporate face.

wira020

10 years ago

Great article, TR… it’s more in depth than most earlier articles out there… it’s a bit late, but it’s worth the wait… I just can’t help but suspect something fishy. I’ve read articles about this from a lot of sites, and they all confidently claim “it’s worth the wait”… I don’t know why, but I can’t help thinking something is off, since there really is not much proof to support that claim so far…

Nvidia could really be losing market share if its mainstream part gets delayed as rumoured… that’d be bad for them and us… still waiting for the price cut… sigh…

ish718

10 years ago

Surely, this will end up as one overpriced desktop product.
ATI/AMD is going to pound Nvidia in the price/performance area once again. History repeats itself?
When AMD drops the HD5870 to $300-$350…

tfp

10 years ago

That will depend on the price Nvidia runs and the gf100 performance. If they can pull off 1 or 2 or 3 of the 4 groupings of gf100 core they will easily be able to segment the product line and cover all of the price points and lower production cost when needed.

If a gpu with 3/4 or 1/2 the core groupings performs near what AMD has now or better they will be in fine shape.

Either way both companies will be here for a long time to come.

ish718

10 years ago

Well, I doubt Nvidia will be able to compete in the high end segment price/performance wise.
AMD will also drop the price of the HD5970, so GTX380 will end up going against that beast. By the time “GTX395” comes out, HD5970 will be a lot cheaper.
Price war?
I think Nvidia will get more aggressive with their TWIMTBP…

wira020

10 years ago

I’d say by the time GTX396 comes out, ATI could be welcoming it with Radeon 6000.. a new architecture at a new node…

BTW, ATI kinda has the advantage with game developers right now, as they are the only one with a DX11 GPU.. DX11 game devs use their GPUs to build games because it is the only choice available… that could change when Nvidia comes out with Fermi, but it could be too late by then…

tfp

10 years ago

New architectures at new nodes have caused delays before and can again; I wouldn’t count on anything until the dates get closer and there is real news.

thecoldanddarkone

10 years ago

Yeah, I’m not quite sure why people are so sure that ATI will release a brand new architecture on time. Heck, the many changes from GT200b are part of the reason why GF100 is late.

tfp

10 years ago

Well, you’re assuming the card doesn’t outperform ATI by a good margin, like they have done with the last few sets of releases by releasing much larger GPUs. The delay has hurt them, but it’s not useful to assume ATI’s next generation will be much better with no details. At least with Nvidia we can see it’s just a lot more of everything they had before, plus improvements. One of Nvidia’s CUDA engines or whatever has always done more work than one of ATI’s compute engines or whatever they want to call them.

Time will tell and the sooner Nvidia gets the card out the door the sooner prices go down. The better the card the better the prices will be because of competition.

I really don’t understand why people don’t want Nvidia to do well…

Sahrin

10 years ago

The poster you are responding to specifically mentioned price/performance as his metric.

I think everyone wants them to do well; I also think that they aren’t helping anyone when they give 1.3x the performance for 2x the cost (something that is well within the realm of possibility if the GTX380 costs $599).

nVidia wants the GPU segment to move upmarket. ATI wants to move it downmarket, to expose more functionality to more users. The catch is, that ATI is able to get 85% or better of the performance at half the cost (or better, considering wholesale).
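
To put those two ratios side by side, here is a quick sketch of the performance-per-dollar arithmetic in Python. The inputs are just the speculative figures from this comment (1.3x the performance at 2x the cost, and 85% of the performance at half the cost), not measurements, and the function name is hypothetical.

```python
# Relative performance per dollar, using the speculative ratios from the
# comment above. These are guesses, not benchmark results.


def perf_per_dollar(relative_perf: float, relative_cost: float) -> float:
    """Value of a card relative to a baseline whose perf and cost are both 1.0."""
    return relative_perf / relative_cost


# nVidia card measured against an ATI baseline: 1.3x the performance at 2x the cost.
pricey = perf_per_dollar(1.3, 2.0)

# ATI card measured against an nVidia baseline: 85% of the performance at half the cost.
cheap = perf_per_dollar(0.85, 0.5)

print(f"1.3x perf at 2x cost    -> {pricey:.2f}x perf per dollar")
print(f"0.85x perf at 0.5x cost -> {cheap:.2f}x perf per dollar")
```

Note that the two figures use opposite baselines, so they describe the same gap from both sides rather than two independent data points.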

I wish nVidia well in their effort to beef up geometry engines, though. I have always been puzzled by the way geometry stalled out with DX9? I know the API emphasizes shader power much more, but ATI took the lead in designing DX9 (with MS, of course) – and ATI has been the Geometry evangelist (until now, that is).

I am really excited by a lot of the changes nVidia is making to the hardware, but at the same time their late launch will probably kill any usefulness, and the fact that nVidia loves to make their GPU compute applications proprietary means only the tiny sliver of users who buy the top-end GF100 cards will ever get the full experience.

That’s why people dislike nVidia… I don’t know that anyone doesn’t want them to be competitive. When they are, we get simply astounding values (like the RV770. Seriously, a top-end graphics card for 180 bucks? Who could possibly be against that, except nVidia?)

swaaye

10 years ago

I wouldn’t say geometry power ever stalled. Every GPU generation has boosted geometry processing capabilities. The move to unified shaders was a particularly large jump because it allowed the entire computational resources of the GPU to be devoted to geometry.

The thing about NVIDIA’s direction is that it’s less about graphics than it used to be. This is going to show up in their GPU’s performance for graphics because the chip is less specialized for graphics. Like GT200 vs. RV770, there will be a loss of performance per die size essentially. The new Radeons aren’t as aggressively after GPGPU from what I can tell.

wira020

10 years ago

Well, would you release a product that is not better than the competitor?

Even if their initial results were not better than Fermi, they’d delay it until they could make it better.. same as what I’m guessing Nvidia’s doing right now… they had a lot of delays in order to make Fermi faster than the current ATI offering…

Prion

10 years ago

FX5800? HD2900? Phenom? It happens. Just put it out and hope you can price it to keep yourself in the market while focusing on the R&D for the refresh or the next big thing.

MadManOriginal

10 years ago

AMD has been executing very well with the release of their video cards, TSMC issues aside, so they do have a decent recent track record.

Voldenuit

10 years ago

nvidia isn’t delaying fermi to ‘make it better’. It’s delayed it because they have no choice.

Sahrin

10 years ago

I will say that nVidia has put a lot of engineering resources and processing horsepower into this chip; I applaud them for being bold and going for the proverbial ‘gold’ in their design. Their competitors have found success in designing more conservative and tailored designs, nVidia really seems to be going all-out to achieve the best performance possible.

That said, I piss on them for doing a ‘paper’ announcement like this to take away steam from their competitors’ products. And I’m more than a little disappointed in TR for buying into the hype and running the story. This is another press release; that’s all. No product, no benchmarks. Just press. I am fascinated by architecture/design information like this, so I am grateful for the update, but shame on TR and even more so nVidia for trying to protect itself from its massive failure of execution by spouting irrelevant nonsense. No product, no talky. A chip in the hand is worth a hundred billion on the white papers.

nVidia deserves to get burned because they couldn’t or wouldn’t get the job done on the same timeline as their competitors.

SomeOtherGeek

10 years ago

I don’t think TR is “buying” into nVidia. They are simply speculating and having fun playing the guessing game. In fact, I like these kinds of reports because at the end, when/if the product even comes out, there are fewer surprises.

Sahrin

10 years ago

They aren’t speculating; nVidia gathered a bunch of journalists together and gave them information about the architecture without giving them information about the product. It’s deceptive; for all we know the product could launch a year from now or never (it has happened before – X1700XT from ATI, among others).

Hattig

10 years ago

It’s always tough to compare the prime optimised current generation chip with the new, future architecture chip. Certainly GF100 will be undercut by RV8xx series chips in terms of size and cost and perf/$ and probably $/watt, but that’s just a side effect of the different generations. We’ll see later this year what AMD’s future architecture chip looks like.

GF100’s main problem is that AMD can release a faster RV870 whenever they want, as 1GHz doesn’t seem to be out of the question now that 40nm is becoming mature. That will reduce the performance difference, whilst leaving AMD with a chip that is 1/2 the cost to manufacture.

Never mind the option of a 2400 shader chip, but I think that’s unlikely.

I wouldn’t be surprised if the GF100 can only keep within PCIe power consumption requirements with 512 shaders by disabling most of the 64-bit support.

If it wasn’t for NVIDIA’s advantage in development/SDK/etc, AMD would be running away with the compute market as well right now.

johndisko

10 years ago

And yet, what is this card’s worth in the hands of an actual gamer ?
We are all playing shitty xbox360 ports these days anyway.
What do we have to look forward to in terms of games ?
Why should we buy a hot, expensive card without having something to actually use it with ?
We are now several generations ahead of the consoles in terms of graphics, yet, nothing comes out that really makes that delta apparent.

It is now clear to me that the only sensible choice is to save money and go for an ATI 5850 or maybe the 5870 if you’re sporting a 30″ monitor. It should last you at least a couple of years and will give you the headroom to see what is going to happen to the games/gfx card market without burning your cash right now for features that might never be used.

I used to be one of those guys that always bought the fastest card, even if it cost me a pretty penny. But what’s the point now ?

Just make sure you buy a card that plays xbox360 ports and Unreal Engine smoothly and everything else is pretty much covered. Perhaps this time I should use my dough of 700 bucks to buy all the consoles at once and get it over with… 🙁

Lazier_Said

10 years ago

Why to run those shitty X360 ports at *[<32X

TurtlePerson2

10 years ago

Maybe DX11 will actually be adopted by some studios that will make some good looking games. We’ve already seen some multiplatform games like Dirt 2 take advantage of some DX11 features.

Hattig

10 years ago

“By making the window extend throughout the first quarter of the year, Nvidia has given itself ample leeway.”

Certainly, as NVIDIA’s financial quarter one ends at the end of April, and it’s been confirmed that is what NVIDIA mean by Q1…

green

10 years ago

i don’t get it
can someone point out how the crippled dp units are going to affect my usage?
(some low-end video / image editing, 3d games, web browsing)

Game_boy

10 years ago

No effect.

wibeasley

10 years ago

In academics many people run CUDA simulations with GeForces. I imagine this is to dissuade them from buying the $250 GeForce and getting them to buy the $1000+ Tesla.

I saw all the graphs and got excited until I realized there are 0 actual benchmarks that reviewers have run themselves.

lycium

10 years ago

i’m not an eastern european hacker, but i am SO bummed to hear about the crippled DP support on the gaming cards 🙁

the whole appeal is that you spend a small amount of money and get a lot of processing power. handing over 10x more money for something you know has been artificially crippled truly stinks, why don’t i just buy a few amd gpus? (we’re doing ok without exception handling and virtual functions so far.)

this dramatically lowers the gigaflops/$ picture for nvidia: a whole order of magnitude! amd’s solution is looking WAY more attractive suddenly…

Voldenuit

10 years ago

Maybe nvidia’s relying on its developer relations to continue to cripple support for ATI cards? :p

I’m sure there’s blame enough to go around here, but 2 years later and GPU2/F@H *still* can’t address more than 320 shader units on an ATI GPU.

Madman

10 years ago

In reply to #4

Not that there is ANY reason to get DX11 card…

Yep, don’t forget to say thanks to consoles for that.

BlackStar

10 years ago

Do we know anything about possible mid-range models based on GF100? The high-end cards sound interesting, but mid-range is where the game’s at ($100-$200).

mesyn191

10 years ago

Not really no, the rumour mill is saying mainstream parts are supposed to be out in volume around May or June.

Game_boy

10 years ago

Well Fudzilla is. Semiaccurate says they are delayed significantly too. Which one was closest about Fermi’s release date?

mesyn191

10 years ago

SA. June/May for mainstream is pretty damn late (the parts supposedly coming out in March/late Feb. are to be the high end stuff, so it’ll be expensive and low volume…), R900 is coming in Q3, so it’d probably be worthwhile to wait for it just to see what it’s like.

BlackStar

10 years ago

May/June? Ugh.

In that case, I think it might be best to hold out until R900. (is it really going to be released this year? Wow!)

khands

10 years ago

Release date difference between the 4000 and 5000 series was 5 quarters, if they can do it again then yeah, this year.

wira020

10 years ago

Well, so far there’s not much info about Radeon’s new architecture.. only what node it will be on is known.. but AMD has had a pretty good run lately.. I’m thinking 70% chance they’ll hit the target…

Game_boy

10 years ago

Do we even know that? It could be 40nm, 32nm SOI like Llano, GF 28nm or TSMC 28nm. 28nm would imply 2011 looking at when the 40nm process was ready vs. when it was announced.

At the rate NV is releasing information, tune in next month, or even next year.

DrDillyBar

10 years ago

I await benchmarks.

SecretMaster

10 years ago

reply fail

PRIME1

10 years ago

Bye bye ATI

Krogoth

10 years ago

Put down the green-shaded glasses and wait for the silicon to come out in retail channels.

sage920

10 years ago

I totally expected these two comments on this story–an anti-ATI comment by prime followed by a shaded glasses comment by krogoth. I love TR.

Jigar

10 years ago

Thank you for playing DX 11 game, oh sorry you don’t have one…

PRIME1

10 years ago

Does anybody? No.

Meadows

10 years ago

Wrong.

derFunkenstein

10 years ago

I have Battleforge, but I haven’t actually played it yet.

ClickClick5

10 years ago

Dirt 2, BattleForge, STALKER: COP…all I know of right now.

NeXus 6

10 years ago

Still not worth buying a DX11 card for.

yogibbear

10 years ago

BF:BC2……..

spigzone

10 years ago

Bye bye … as in Nvidia sitting on the side of the road waving despondently as ATI zooms into the distance?

SomeOtherGeek

10 years ago

Or off the cliff?

Meadows

10 years ago

At this rate nVidia will be bankrupt by March.

SomeOtherGeek

10 years ago

And ATi will buy them out!

khands

10 years ago

If anyone were to buy out a bankrupt Nvidia it would be Intel (or, to a lesser extent, possibly Apple).

wira020

10 years ago

Intel would really want that.. they could surely use Nvidia’s GPU to replace their GMA… i3/i5 with an Nvidia GPU inside could really mean trouble for AMD…

Scrotos

10 years ago

I don’t think Apple would. I mean, maybe, to force them to get decent drivers for OS X, but Apple uses PowerVR for their iPod/iPhone graphics and I think they actually own some of PowerVR. Since it’s just a licensed graphics core and Apple already has their in-house team doing ARM stuff anyway, no real reason to try to transition to Tegra.

Mobile GPUs, maybe? I dunno, seems like more trouble than it’s worth for Apple.

wira020

10 years ago

I can’t see how Apple would benefit from it… if they were serious about GPUs they’d have the 5800 in their Macs by now… I can only imagine Intel rushing to make a bid for it… their Larrabee failed, so Nvidia’s GeForce would really cure their disappointment…

UberGerbil

10 years ago

How do you figure? They’ve got over a billion in liquid assets and their burn rate isn’t huge and may not even be negative (they haven’t reported their most recent quarter, but they were profitable in Q3 last year). I get that people like to exaggerate (Intel was /[

PRIME1

10 years ago

Nvidia has money and is making money, unlike AMD which has not made a dime in something like 2 years and is very, very deep in debt.

Came here to say the same thing. This seems like a really bad move if Nvidia is trying to create a market place for GPGPUs.

Meadows

10 years ago

They’re trying to maintain a reason to buy Quadros.

MadManOriginal

10 years ago

Gee thanks Mr Redundant, as if that wasn’t apparent from the article too. That doesn’t make it any less lame.

Meadows

10 years ago

Well, they have to cut corners somewhere, they can’t come up with a legitimately stronger chip yet for any truly kickass Quadro products.

If this is a hardware modification, then I wonder if a properly picked Quadro card will then perform better than the GeForce in games using gaming drivers.

poulpy

10 years ago

Wouldn’t count on any gaming boost as AFAIK Double Precision floats aren’t likely to be used in games any time soon given that even for mainstream GPGPU they’re overkill.

To quote TechReport:
https://techreport.com/articles.x/17747

Fighterpilot

10 years ago

#11 Under $280 for a HD5850 is an ATi ripoff price?
You do recall it smokes all NVidia cards remotely near that price?
Fail wail.

SecretMaster

10 years ago

I would say the cards are over-priced. Or perhaps I’m still spoiled from the original amazing value propositions of the 4850/4xxx series. I think that set a new level for price/performance when it debuted.

Krogoth

10 years ago

5850 can handle smooth 4Mpixel gaming for under $299.

That was unheard of with the last generation. You had to run a SLI/CF configuration or a dual-chip card to get similar performance. That ran easily into the $499 range.

SecretMaster

10 years ago

I dunno, to me it just doesn’t have the same “wow!” factor as the 4850 did. I imagine once they cut prices in response to Fermi, it’ll be much more attractive. That being said, I am the owner of a 5850 and I cannot be more pleased with the performance of the card.

Voldenuit

10 years ago

Yup, since the 4850 came out at $199 and rewrote the price/performance curve at the time (I pre-ordered).

5850 and 5870 are merely matching nvidia’s (admittedly frantic) efforts to stay price-competitive. The damn 58xx cards aren’t even meeting their MSRP, and have gotten more expensive since launch.

Supply and demand may set prices, but that doesn’t mean I have to be a sheep and buy high.

It’s even worse because graphical demands have plateaued with cross platform development, so there is even less incentive to buy a high end card at the moment.

For those with short memories, the Athlon64 X2s and s939 Opterons also held a significant price premium until Core 2 came out. Just because a company can charge a price for a product and find buyers doesn’t mean you have to think it’s worth it (and you’d be a damned fool to buy it when there is no competition and the prices are jacked up, when the competition is around the corner).

EDIT: clarified definition of ‘competitive’ to pricing.

moriz

10 years ago

it’s what happens when there is no competition. at least for the last generation, nvidia put up a decent fight. this time they don’t even have a whimper. combined with the initial supply issue, it makes perfect sense that the prices rose.

the 4800 series was an anomaly, not the norm.

coldpower27

10 years ago

This is why you have a choice: you’re not forced to buy the 5850s, and for many GTX 275/280/285 owners the 30-40% from the HD 5870 is enough to warrant spending money at this time.

A lot of games run fine at 2560×1600 even on those cards.

ltcommander.data

10 years ago

I’m guessing nVidia is going to be very careful not to repeat GeForce FX situation so I don’t doubt that the top-end Fermi will be faster than the HD5870 on release. However, I think the HD4870 proved that raw speed isn’t always the most important factor in a GPU, especially to a company’s bottom line. If Fermi is big and only available in limited quantities it might be far too expensive to match the value proposition of the HD5800 series once ATI starts their inevitable price slashing. The situation will be even worse if ATI makes a quick turn-around and release of their rumoured Cypress refresh (a la RV790 HD4890) which would cut into Fermi’s performance advantage while still being cheaper.

Shining Arcanine

10 years ago

Given how well Fermi performs in comparison to ATI’s current hardware, I doubt Cypress will change things significantly. How much of an improvement do GPUs usually get from a refresh? 20-30% maybe, and Fermi is well above that.

JustAnEngineer

10 years ago

Maybe, possibly, might be if it actually existed…

ssidbroadcast

10 years ago

q[< (or graphics processing clusters, I believe, although I thought that name was _[

MadManOriginal

10 years ago

q[

ssidbroadcast

10 years ago

Same. Yeah on my 24″ monitor, it was bigger than an m-ITX mobo!

wira020

10 years ago

thats weird, on my 23.6″ it’s not that huge.. are u zooming the page so there’s no room at both side?

didnt get the joke at first, now i’m laughing…

MadManOriginal

10 years ago

Yeah he was exaggerating. It’s not nearly 6″x6″+ which is mITX-sized, I was going to post asking him how he’s enjoying his .75″ pixel pitch monitor but didn’t see the need. But you’ve given me the chance to post it!

MadManOriginal

10 years ago

I don’t care as long as it scales down well and they release lower price derivative chips that are worth a damn – no 8600-style GPUs please. At least that would give us some competition in an area other than the high-end.

It almost seems like NV might need to split their line if they really want to pursue the HPC market with chips that have HPC required functionality which do nothing for graphics.

coldpower27

10 years ago

It doesn’t look like DX11 support is particularly expensive so it shouldn’t be a problem.

7600 -> 8600 was an expensive transition due to adding all the DX10 functionality. That didn’t leave much in the way of performance improvements.

Hopefully we can get something in the 196-216 shader range with a 256Bit Interface on the 40nm node clocked high enough as a performance part to compete with the HD 5700 Series.

Suspenders

10 years ago

They would have done so, I think, if it made any economic sense to do so. For now, the HPC market is probably too small to be profitable with specially designed chips. Graphics derivatives will have to suffice…

JustAnEngineer

10 years ago

When are we scheduled to get the next marketing brochure and powerpoint presentation from NVidia to make another front page article?

I am personally disappointed by NVidia’s repeated delays in bringing DirectX 11 GPUs to market. This product is at least six months late. I planned to upgrade my graphics card this month, since NVidia’s astroturf marketing representative in the forums promised that GF100 would be available in quantity at retail in January.

Instead, AMD has introduced several more DirectX 11 cards and there’s still zero competition from the green team. Now we’re reading speculation about limited availability in March and graphics card prices seven times as high as a quad-core CPU. 🙁

mesyn191

10 years ago

You might actually be better off hanging on to what you’ve currently got and waiting for the new ATI arch. coming out later this year if you’re seriously thinking about waiting that long. Current new GPUs from them are “just” a glorified refresh, the R900 might just blow GF100 out of the water. If nothing else GF100 cards will be cheaper by then, as will R870 based cards.

Freon

10 years ago

Meh, there really isn’t much reason to upgrade at the moment. There’s no killer app for a high performance DX11 card. Hold out for another month or two. If nothing else the GF100 release should put a stop to the 58x0 card “gouging.”

JustAnEngineer

10 years ago

A $600+ GF100 in April (which is part of NVidia’s corporate first quarter) does nothing to push down Radeon HD5850 prices today, and I don’t really see it doing that much when it arrives. When will we hear something about the actual mid-range cards that people will buy rather than the über-expensive halo product?

IMO, GF100’s design philosophy is another “a solution looking for a problem”.

swaaye

10 years ago

I think it looks mostly like NV is aggressively leaving their dedication to rendering graphics behind. This beast is meant for a lot more and it cost them big in efficiency. Jack of all trades, probably master of none, and a major bitch to build.

axeman

10 years ago

I suspect you are right. They’re sharing design details before availability in order to hopefully get (some) consumers to wait instead of buying ATi now. Otherwise, there’s no reason to release this.

Voldenuit

10 years ago

I’m waiting because ATI’s prices are ripoff atm. :p

NarwhaleAu

10 years ago

Hardly. A GPU more powerful than anything from a previous generation for ~$400? They aren’t exactly pillaging the market.

Agreed, their prices are “higher” than I want to pay (5850 = $220-$230 is my sweet spot), but you’re a fool if you think this stuff should be priced lower than it is right now.

Freon

10 years ago

If you have ~$290-$400 to spend, I don’t think there is any logical choice besides the 5850 or 5870. Hard to argue with that.

It’s unfortunate the prices got inflated after release, but a mix of lack of competition and poor yields will do that. Good for AMD and their partners for at least making a profit. Since team green has been ahead in the general case (see Steam hardware surveys), I kinda see it as a good thing for long-term competition that AMD has a decent lead for a while. They need to hold it in the midrange, though, and the 5770 hasn’t been as obvious a win as the 5850/5870, since its performance isn’t exactly mind-blowing in current-gen games.

ClickClick5

10 years ago

They are trying to get a single card to basically do ALL of the computer’s functions... good idea on paper and theoretical profit, but in reality, it can die real fast.

If they price this fella near the $600 range, ATI will kick their prices down, in turn creating a bigger struggle for Nvidia.

(I like it!)

Meadows

10 years ago

Why would they put anything near $600 again, especially when this time they’re barely going to be ahead of the curve by the time they even start shipping?

Game_boy

10 years ago

Same reason most of the GTX285s at Newegg are over $400 when the majority of the 5850s are under $300. Because some people are willing to pay.

coldpower27

10 years ago

They decided not to compete with ATI on the 5850s and 5870s; they ceased production and just let the price of the remaining stock rise. Nvidia was smart enough not to enter a price war with a smaller-die product.

DancingWind

10 years ago

Well... if they price them at 600 bucks, then depending on performance ATI won’t have to lower anything... and that’s BAD 🙂

StashTheVampede

10 years ago

These are exactly my thoughts on the new tech: VERY advanced, SUPER advanced. So far advanced that it won’t ship to “consumers” until a rev 1.1 of it is made so that it can be produced en masse.

For those willing to pay the premium, it will be available (my guess is $599 MSRP), and it will probably go right to high-end OEMs first, with fewer units reaching the retail chain.