Posted by CmdrTaco on Wednesday March 09, 2011 @11:40AM from the my-mom-thinks-i'm-super dept.

RedEaredSlider writes "NVIDIA outlined a plan to become 'the computing company,' moving well beyond its traditional focus on graphics and into high-profile areas such as supercomputing. NVIDIA is making heavy investments in several fields. Its Tegra product will be featured in several mobile devices, including a number of tablets that have either hit the market already or are planned for release this year. Its GeForce lineup is gaming-focused while Quadro is all about computer-aided design workstations. The Tesla product line is at the center of NVIDIA's supercomputing push."

Um, yes, of course. Because they have 292 cores instead of 4/6/8. While the two designs do remarkably different things, the point remains: for the tasks that GPUs are well suited to, you cannot possibly beat them with an Intel/AMD CPU.

Texture units clearly aren't cores; they're largely passive data pipelines. If you look at a GPU more closely you can of course get far more complicated: the AMD architecture at the high end has two control-flow cores with 24 SIMD coprocessors that execute blocks of non-control-flow, non-memory work. It is true that even those are hard to qualify as cores, given their limited capabilities.

Yes... because all algorithms that anyone actually uses are 100% parallel.

Throwing a shitton of SIMD units on a chip isn't that cool anymore; DSPs have been doing it forever. Real workloads require fast sequential code performance, and a GPU will post truly embarrassing results on such workloads.

You're right. The new NVIDIA Teslas (C2070) have 448 cores, not 292.
If you're doing work that a supercomputer needs to be doing, your software is massively parallel. Otherwise, run it on your laptop at home.

If all you're measuring is pure FLOPS, then here are some numbers: Cray-1, 250 MFLOPS. nVidia Fermi: about 1 TFLOPS. ARM Cortex A8: 10 MFLOPS. Of course, that doesn't tell the whole story. Getting 250 MFLOPS out of the Cray required writing everything using huge vectors. Getting a teraflop from Fermi requires using vectors within independent concurrent processing tasks which access memory in a predictable pattern and rarely branch.
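To make that concrete, here's a rough CUDA sketch (the names and sizes are made up for illustration, not taken from NVIDIA's SDK) of the kind of kernel Fermi rewards: one element per thread, neighbouring threads touching neighbouring memory, and no data-dependent branching.

#include <cuda_runtime.h>

// Each thread computes one element of y = a*x + y. Threads in a warp read
// consecutive addresses (coalesced access), and the only branch is the bounds
// check, which every thread in a full warp takes the same way.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;          // illustrative problem size
    float *x, *y;
    cudaMalloc((void **)&x, n * sizeof(float));
    cudaMalloc((void **)&y, n * sizeof(float));
    // (real code would cudaMemcpy input data in and copy results back out)
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}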

GPUs are not magic, they are just optimised for different workloads. CPUs are designed to work well with algorithms that branch frequently (every 7 instructions or so, which is why they devote a lot of die area to branch prediction), have good locality of reference (cache speeds up memory in these cases), and have an integer-heavy workload. GPUs generally lack branch prediction, so a branch causes a pipeline stall (and, on something like Fermi, if threads within the same warp take different branches then you drop to 50% throughput immediately). Their memory architecture is designed to stream large blocks of data in a few different orders (e.g. a texture cube, or pixels in order along any axis). So, depending on your workload, either one may be faster than the other.
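For anyone who hasn't seen what that divergence looks like in code, here's a tiny kernel-only sketch (hypothetical, compiles with nvcc -c) where even and odd lanes of every 32-thread warp take opposite branches, so the hardware runs the two paths back to back and the warp sits at roughly half throughput:

// Even and odd lanes of each warp diverge: while the even lanes execute the
// first branch the odd lanes sit idle, then the roles reverse, so this warp
// takes roughly twice as long as a branch-free equivalent would.
__global__ void divergent(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;
    if (threadIdx.x % 2 == 0)
        out[i] = in[i] * 2.0f;   // path A: even lanes only
    else
        out[i] = in[i] + 1.0f;   // path B: odd lanes only
}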

5 of the top 10 greenest supercomputers use GPUs: Green 500 List [green500.org]

Each GPU is very high performance and therefore high power. Performance per watt is what counts, and here GPUs beat CPUs by 4 to 5 times. This is why so many of the new supercomputers are using GPUs / heterogeneous computing.

They also can do matrix computations up to 40 times faster than a CPU. This is incredibly useful for scientific applications. I would use this if I had an Nvidia card, since several things exist to use CUDA with Matlab; until then I have to teach myself OpenCL.
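For what it's worth, the GPU matrix multiply doesn't have to go through Matlab at all; a bare cuBLAS call already gets you most of the way there. A minimal sketch (the matrix size is a placeholder and the data upload is elided, so treat it as illustrative rather than benchmark code):

#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int N = 1024;                    // placeholder size
    const float alpha = 1.0f, beta = 0.0f;
    float *A, *B, *C;                      // device matrices, column-major
    cudaMalloc((void **)&A, N * N * sizeof(float));
    cudaMalloc((void **)&B, N * N * sizeof(float));
    cudaMalloc((void **)&C, N * N * sizeof(float));
    // (real code would upload A and B with cublasSetMatrix or cudaMemcpy)

    cublasHandle_t handle;
    cublasCreate(&handle);
    // C = alpha * A * B + beta * C, computed entirely on the GPU
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, A, N, B, N, &beta, C, N);
    cublasDestroy(handle);

    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return 0;
}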

Today, parallelism seems mostly limited to "scientific" applications. But I think our computing model may possibly evolve towards more parallelism for lots of new applications that compute more like a brain - that is, massively parallel pattern matching. Of course we'll still use more direct algorithms where applicable, such as in word processors and web browsers, but as computers integrate better with the natural world they'll need many more algorithms rooted in signal processing and pattern matching.

Exactly - CUDA has become a major player in the field of supercomputing, just like IBM's PowerPC/BlueGene systems. With support for floats/doubles, amazingly fast math functions and tons of data in matrices, the only other way to do all that math fast is an FPGA or a PowerPC chip.

Frankly, it's the only thing that has me eyeballing Nvidia cards these days. I need to learn more about OpenCL, as I think it may be a player here shortly once AMD's Fusion line starts getting the better desktop processors. Then we can use the GPU on die to speed up matrix calculations.

My GPU has 1536 shaders and consumes ~220 watts. My i7 has 4 cores/8 threads and consumes ~130 watts. On several of my distributed tasks, the SSE2 version takes about 36 hours on one core, or about 6 hours per task on average if using all 8 threads (assuming an optimistic 50% scaling from hyper-threading). My GPU, on the same work units, takes only 1 min 40 sec. The GPU is about 216 times faster at slightly under twice the power: roughly 70% more power draw, 21600% better performance.
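(Sanity check on those numbers: six hours is 21,600 seconds and 1 min 40 sec is 100 seconds, so 21,600 / 100 = 216, which is exactly where the 216x figure comes from.)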

Figured it out. The ~36 hours was with the non-SSE client. The SSE2 client runs in about half that time, so closer to 100 times faster. With my i7-920 rated near 30 GFLOPS and my ATI card rated near 2.75 TFLOPS, this would be about in line (2750/30 = ~90). Not all GPU clients run this much faster, as not every workload scales perfectly, but this one client really shows off how much faster a GPU can be.

I'm nowhere near as qualified as everyone here, but Nvidia seems to be pushing more for parallel supercomputing with rows of Tegra chips working in unison. They had talked about supercomputing when the Tegra 3 was announced.

The company that was set up by disgruntled Silicon Graphics gfx-division employees, because the SGI gfx tech was suffering from toxic internal politics and the push into Big Iron and Storage, is now moving into 'Supercomputing'. Hope they bring back the Cube logo :)

I doubt it would be truly useful, but I'd like to see a 2-million-core processor. Arrange it in, let's see, a 1920 x 1080 grid. The 8008 used 3500 transistors per core, so even before memory, it'd be a 7-billion-transistor chip.

More practical might be a 128 x 128 core processor, using a modified 386 or 68020 for the cores. That could be less than 5 billion transistors. Each processor is simple and well-known enough that hand-optimized assembly begins to make sense again.
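(Back-of-the-envelope, treating published transistor counts as round numbers: the 1920 x 1080 grid above is about 2.07 million cores, and 2.07 million x 3,500 transistors is roughly 7.3 billion. A 128 x 128 grid is 16,384 cores, and a 386 is around 275,000 transistors, so 16,384 x 275,000 is about 4.5 billion before any memory or interconnect - consistent with the "less than 5 billion" figure.)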

"Supercomputing" almost always means "massive Linux deployment and development." I will spare critics the wikipedia link on the subject, but the numbers reported there almost says "Supercomputing is the exclusive domain of Linux now."

Why am I offended that nVidia would use Linux to do their supercomputing thing? Because their GPU side copulates Linux users in the posterior orifice. So they can take, take, take from the community, and when the community wants something from them, they say "sorry."

nVidia shuns Linux users? They may 'shun' those that cannot have any non-GPL code, but they do make a higher-performing and far more feature-rich driver for their cards for Linux, FreeBSD and Solaris, and keep it (for the most part) up to date. If you don't like it, there are alternatives.

Gotta love the rabid GPL fans. The GPL doesn't mean freedom for everyone to do things the way you think they should be done.

I know they publish a driver for Linux. Trouble is, I can't use it because they won't tell us how to make it work through their "Optimus technology." I had high hopes for my newest machine only to have them dashed to bits with the words "we have no plans to support Optimus under Linux..."

Let's just let the market forces do their thing here. Personally, I tell anybody I hear thinking about buying NVIDIA to buy AMD instead. Sure, you might get a few more fps today, but tomorrow you may find your card unsupported by the manufacturer with no documentation available to end users on how to fix problems they may encounter in the future. NVIDIA dug their grave, let them sleep in it.

I tell anybody I hear thinking about buying NVIDIA to buy AMD instead. Sure, you might get a few more fps today, but tomorrow you may find your card unsupported by the manufacturer with no documentation available to end users on how to fix problems they may encounter in the future.

AMD no longer support my integrated ATI GPU; I had to manually patch the driver wrapper source to make it work after recent kernel changes and I'm guessing that before long it will be too rotted to work at all.

There is an open source driver but it doesn't work with my monitor resolution and performance is awful. So my solution before I discovered I could patch the source was going to be buying the cheapest Nvidia card I could fit into the computer.

I bought an ATI card (HD 3800) and its Linux driver sucks; I can't use it for gaming or 3D art. (If I try to run Blender, it won't display some menu elements and looks totally broken.) It only works decently on Windows. So the funny thing is, I can't use an open-source application (Blender) with a video card that's supposedly open-source friendly on an open-source operating system (Linux; I tried it with several distros). The other funny thing is that only nVidia and Intel have decent drivers for Linux.

Optimus is actually a great tech, if you have Windows. It switches automatically between an integrated graphics card and a discrete nVidia card, saving battery power when you don't need the heavy-duty GPU but giving you the power when you do.
An Optimus-equipped laptop will run Linux. I know, as I am writing this from a Dell XPS 14 running Ubuntu. You will not, unfortunately, be able to use the discrete card. The integrated card, however, works fine. It has more than enough power to run Compiz.

Shouldn't it be easy to have two xorg.confs: one for the integrated graphics and one for the discrete card? You could start the discrete xorg.conf when you want to run a game and the integrated one when you don't. Maybe (I'm not a guru by very far, so I'm going out on a limb here) you could even have them running on different ttys and switch semi-on-the-fly. Would the discrete card be shut down if you are on the integrated tty? Some explanation: I reserved some space on my nettop with ION2; some day I might want to try it.

nVidia will not open-source their driver package. At least not anytime soon.

Performance comes from hardware, but getting the most out of the hardware comes down to good software - or drivers, in this case. When you write a good set of drivers in an already cut-throat industry, those optimizations become trade secrets. Like all closed-source code or a chef's recipe, it would be suicide to reveal what algorithms are being employed and where. Also, some drivers contain cross-licensed technology on which royalties are owed.

One thing I've been really keen to know is what the utilisation is like on those supercomputers. We know they can run LINPACK really fast and more efficiently than CPUs do; that's what you get for having a high ALU density, a few threads per core and wide SIMD structures. The question is: of the algorithms that people actually intend to run on those supercomputers, what level of efficiency are they hitting?

Are they still a net gain over a standard Opteron-based machine? They may be, but I don't know.

I've been working with their GPGPU push for a couple of years now. What I notice is they are very good at data parallelism with highly regular data access patterns and very few branches. While they are technically general purpose, they don't perform well on a large portion of the high-performance tasks that are critical even in scientific computing, which are generally compute-bound. This creates some really annoying bottlenecks that simply cannot be resolved. They can give tremendous speedup to a very limited set of problems.

The number of "stream processors" doesn't necessarily scale as a linear performance metric. As an example (using dated lower midrange hardware, as it's what I still know), a Radeon 3850 sports 320 stream processors. My Geforce 9600GT advertises 64 SPs, yet pulls ahead of the 3850 in many benchmarks. It's not as simple as quoting a number used in marketing material as a universal metric, any more than a 3 GHz Pentium 4 is 50% faster in real-world performance than a 2 GHz Athlon64.

Well, I'm still failing to see nVidia putting their money where their mouth is on that one. The last time I checked their OpenCL implementation, a lot of the demos that were ported over from CUDA ran slower - 10 times slower in the case of the volume rendering example. So this is not how you get to impress people who are solely concerned with performance. Oh, and unlike the CUDA compiler, the in-process OpenCL compiler even segfaulted on me within about 4 hours of playing with nVidia's OpenCL implementation.

NVIDIA is dragging their heels with OpenCL. They have yet to publicly release an OpenCL 1.1 compliant driver, despite the fact they have had a beta version for about 8 months. They are also slow to respond in their forums, and many problems/bugs that were reported at least a year ago still have not been fixed. They are throwing their weight behind CUDA, plain and simple. CUDA 4.0 just came out, and has some phenomenal technologies that make me wonder if OpenCL has a fighting chance.

They chose not to release the necessary specs to allow others to utilize their hardware the way Intel and, to a lesser extent, AMD did, and as the current smartphone trend has shown, being locked in is the same as being locked out.

... when GPGPU was in its infancy and I was lusting to play with that stuff; that's about 5 yrs ago, at most.

Alas, our semiconductor department was so content with its orthodoxy and its cluster running Fortran WTF hairballs... :`( Ah well, no point crying over that spilt milk... it just takes patience and pig-headedness... :>

Eh, it's a logical step. Graphics is, has been, and will always be about parallelism and matrices. Supercomputing is almost always about simulation and high-order computation, which works out to the same thing. Really good graphics hardware, thirty years ago, or now, or thirty years in the future, will always be good science hardware, and supercomputing is driven by science.

Nvidia Linux support is getting fixed by nouveau anyway. They reckon the GTX 5xx/4xx series is already up to the same level as the 2xx/9xxx/8xxx cards for drivers. As more resources are spent implementing OpenGL features in Gallium and less on reverse-engineering the cards, feature parity with the closed drivers will be achieved. I reckon in 1-2 years Nvidia card open-source support will be at near parity with the closed-source drivers.

I think it's a great idea. Intel keeps putting out chipsets with video on-board, and this has to hurt nVidia's core business. If they make inroads into other areas where Intel is now dominant, and can do it without going broke, then that puts them in a nicer position.

Didn't their licenses expire on some bus or other, preventing them from making chipsets for Intel CPUs? The press release I saw said the recent $1.5B deal excluded certain chipsets. They probably aren't too interested in making AMD chipsets these days. Large racks of MIPS/ARM CPU & Fermi GPU systems make sense to me. Top-end graphics cards will die off soon thanks to consoles & Hollywood. Even multi-monitor gaming won't slow that by much. In a generation or two even low-end graphics cards will probably have the power to play 1080p games at full detail.

Exactly, but because most newer games are made with 'console-capable' engines designed to run at 1080p, you'll see fewer and fewer games making use of the extra power PCs have (especially as not everyone has a higher-spec PC). The same is true for console games, with most games being designed for Xbox 360 capabilities and not as much effort put into improving that for PS3 gameplay.

It's not entirely a bad thing, as there is a very small chance some of that effort might be re-channelled into improving gameplay and not just graphics.

In a generation or two even low-end graphics cards will probably have the power to play 1080p games at full detail.

I suspect you are right, and that there will be a race for power efficiency like there is today on tablets/phones. "High end" will still exist, but the definition will change to power/performance rather than just raw performance.

And of course, powerful video cards will always be appreciated in the rendering world.

In a generation or two even low-end graphics cards will probably have the power to play 1080p games at full detail.

They do already, so long as you're playing games from 2003. The reason why you can play many modern games on max settings on mid-range cards is that those games have been crippled for the console market and simply cannot benefit from the power of a high-end card.