Posted
by
kdawson
on Friday October 13, 2006 @05:20PM
from the that's-fast dept.

ludd1t3 writes, "The Folding@Home project has put forth some impressive performance numbers with the GPU client that's designed to work with the ATI X1900. According to the client statistics, there are 448 registered GPUs that produce 29 TFLOPS. Those 448 GPUs outperform the combined 25,050 CPUs registered by the Linux and Mac OS clients. Ouch! Are ASICs really that much better than general-purpose circuits? If so, does that mean that IBM was right all along with their AS/400, iSeries product which makes heavy use of ASICs?"

That's pretty lopsided, but I suppose some of it could be explained away by GPUs not chewing through OS code and having to play nice for memory, so they'd be a bit more efficient. Could be most of those Linux and MacOS s…

And I have a feeling that this is also use-related. There are a lot more things which demand processor power than GPU power. I bet there are tons more spare cycles on a GPU than on a CPU. I mean really - what maxes out a GPU? I'm guessing just a handful of games, while many other things rely on the CPU quite heavily.

I don't really think that's the reason. GPUs have very fast RAM, lots of it, are dedicated to specific tasks, and are very fast at those tasks. CPUs carry lots of circuitry that has nothing to do with what the GPU is designed for, general-purpose logic and whatnot. Dedicated hardware is always way better than general-purpose hardware at its own job.

It has nothing to do with memory bandwidth or use. The ASIC is about 1000 times faster than the CPU because it is dedicated hardware designed to run 3D image processing very fast and in parallel, which is almost exactly the same problem as folding proteins.

Unless you are saying all CPUs are pegged at 99.9% use, or the GPU has memory three orders of magnitude faster, you're just looking at effects that make a few percent difference here and there. The simple fact is the GPU is insanely faster at solving specific problems (3D processing), while it simply could never run an operating system.

i'm sure load difference is part of the picture. but load can't be a dominating factor, because while i'm sure there is a load difference it doesn't seem likely that it even approaches the same 50x difference that the performance gap has (if a non-Folding CPU averaged only 2% of its cycles unused, the GPU would have to be used 0% of the time to get a 50x difference).

i'm guessing that the better parallelization in the GPU together with the fact that the average GPU participating in Folding is much more mo

Look at the first two letters of the acronym: Application Specific. A screwdriver and a swiss army knife will both turn a screw, but the screwdriver is going to be much more efficient at it. GPUs are finely tuned to rip through massive volumes of floating point vectors and not much else. It just so happens that the folding project also fits this description and as such is an excellent use of an otherwise wasted resource.

You can look at the statistics many ways. Here's the GFLOPS per client, categorized by OS:

1. GPU: 65.463
2. Linux: 1.219
3. Windows: 0.948
4. Mac: 0.511
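For a sanity check, the summary's headline numbers reduce to roughly the same per-client figure. This is a back-of-the-envelope sketch using only the 29 TFLOPS and 448 GPUs quoted above:

```python
# Per-client average from the summary's totals (29 TFLOPS over 448 GPUs).
gpu_tflops = 29.0
gpu_clients = 448
per_gpu_gflops = gpu_tflops * 1000 / gpu_clients
print(round(per_gpu_gflops, 1))   # 64.7 -- close to the 65.463 listed

# Ratio against the Linux per-CPU figure in the list above.
linux_gflops = 1.219
print(round(per_gpu_gflops / linux_gflops))  # roughly 53x per client
```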

Of course, a GPU beating the hell out of a CPU in such tests is no surprise. It's pretty much a massively parallel vector engine. I'm more interested in seeing how the PS3 holds up against all the other guys when it comes out. They have a Folding client for the PS3 already.

It'll probably put out some pretty impressive numbers. Except for two problems. One: nobody will be able to buy one (or more importantly, nobody who would run F@H will, since so many are the Sony-hating slashdotter type). Two: most people have their console off when it's not being used for gaming. And I don't expect it would do much number crunching beyond the game if it lives up to the hype.

A modern GPU might have as many as 48 "fragment shader" processors inside it, so there are really around 21,500 processors versus 282,000 CPUs. Then each GPU processor works in full four-way arithmetic parallelism, so it can do arithmetic and data-move operations on four numbers just as fast as one. So with the right mapping of algorithm to processor, you have 86,000 floating point arithmetic units... they only need to be about 3x faster than the CPUs. But these are not general purpose processors - in…
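The arithmetic in that comment checks out; here it is made explicit, using the commenter's own estimates (which are rough figures, not official specs):

```python
# 448 GPUs x 48 fragment shaders x 4-way SIMD = effective FP units.
gpus = 448
shaders_per_gpu = 48   # the commenter's estimate for a high-end GPU
simd_width = 4         # four-way arithmetic per shader

shader_processors = gpus * shaders_per_gpu
fp_units = shader_processors * simd_width
print(shader_processors)   # 21504 -- the "around 21,500 processors"
print(fp_units)            # 86016 -- the "86,000 floating point units"

cpus = 282_000
print(round(cpus / fp_units, 1))  # 3.3 -- each unit need only be ~3x a CPU
```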

This only makes sense if you are going to count all the ALUs and SIMD units separately for the CPUs, too. Your basic CPU can issue at least two floating point calculations in parallel and/or use SIMD units to operate on vectors as large as 128 bits. So the capabilities you ascribe to a GPU are not uncommon in a CPU.

The differences come in the quantity, not the kind. A CPU gives over a lot of transistors to caches and complex logic units. A GPU does not care much about logic and lacks caches.

The reason they use GPUs is that they're very powerful and better-suited to this type of computation than the CPU. The other specialized chips aren't. Maybe some of those new physics accelerators could be used, though I don't know enough about them to know if they'd be useful or not.

Good one... but I also wonder why anyone is throwing around the term "ASIC" in this article. A GPU is obviously not an application-specific circuit, which is clearly shown by the fact that it can be programmed to process graphics, or protein folding, or numerous other tasks. A GPU is a general-purpose processor like a CPU, it just happens to have different numbers and kinds of execution units.

Partially true.
The GPUs of today do have some general-purpose circuits, but they are far from optimized, and the execution unit mix is skewed to the point that these processors would never, ever be able to run, say, an OS with anything approaching efficiency. FAH benefits from the insane amount of floating point power because FAH is basically a pure FP stress test. They had to heavily modify the code to run on these babies, essentially turning the problems into vector information and letting the GPU do its thing. Only a few areas need a CPU-style processor, functionality provided only on these new cards. So please, please realize that even though these cards do not contain a "protein folding circuit", the program was modified to run on what they do have: 4x4 matrix operation units for multiplication and addition.

Okay, this far down the page and nobody mentions it. GPUs are designed to perform floating point operations on 4x4 arrays of floating point numbers. This allows them to do the math required to scale, rotate, and project 3D vectors onto a 2D surface. Follow so far? These circuits not only have fast memory ties and huge parallelism, they are also hard-wired to perform some of the exact same operations required by FAH in only a few clock cycles instead of 44 (on the P4; 14 on the Opteron). Being massively para…
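As a sketch of the operation being described, here is a 4x4 transform applied to a homogeneous 3D point, in plain Python for clarity. The matrix and point values are illustrative, not FAH code; a GPU does this multiply-add pattern in hardware in a few cycles:

```python
# Multiply a 4x4 matrix by a 4-vector -- the core transform operation.
def mat_vec(m, v):
    return [sum(m[row][k] * v[k] for k in range(4)) for row in range(4)]

# Scale by 2 in x/y/z and translate x by +1, combined into one matrix.
transform = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = [1.0, 2.0, 3.0, 1.0]      # homogeneous 3D point (w = 1)
print(mat_vec(transform, point))  # [3.0, 4.0, 6.0, 1.0]
```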

A vehicle can be super efficient when designed to take one person from point A to point B over smooth terrain. When you start adding requirements like carrying a family of people with 50 cubic feet of junk and an attached trailer over both smooth terrain and off road, your efficiency drops tremendously. [/obligatory car metaphor] The more specifically you can narrow down the problem set you're trying to solve, the faster you can solve it. The more specific your tool, the better it will work on that problem.

So, will someone please create a really pretty 3D screensaver representing the folding calculation process? I'd love to see a representation with hi-res lighting and texturing, full transforms, and user-scalable views at 400 million triangles/sec. Thanks.

The folding team has done this, and it will be a free download for the PS3 version. The Cell processor runs the Folding application itself, and the graphical representation of the protein folding calculations will be handled by the GPU with a pretty display.

Until you realize that Asus computers are made in the same Chinese factory as Apples, are cheaper for the same hardware, and aren't made of ugly white plastic. I can stand white plastic in small things like iPods, but for a whole computer it's plain ugly.

So Macintosh hardware can do everything a Mac can do and everything a PC can do, making the Mac the superior hardware choice.
News flash: PC hardware can do everything a Mac can do and everything a PC can do, making crippled Mac operating software the inferior software choice. Please review the thousands of posts to the OSX86 project immediately after Apple released Mac/Intel hardware, and before they tightened down the screws on their software's interface to TPM authentication.

So since we are talking games... The top of the line iMac comes with a 7300GT Nvidia card. You can up it to a mid-range 7600GT card at most. Now with Windows installed, do you think that the 7600GT would drive the 24" monitor as well as, oh say, a dual 7900GT SLI setup? Oh, get a Mac Pro, you say. Well, except that you still have no SLI or Crossfire support and extremely limited choices of video cards. And it costs an arm and a leg. Sure it's cheap for a dual Xeon workstation - but if you just want to play games…

GPUs are, for the most part, highly specialized parallel computers [wikipedia.org]. Virtually all modern CPUs are serial computers: they do essentially one thing at a time. Because of this, most modern programming languages are tailored to serial processing.

Making a general purpose parallel computer is very, very hard. It just so happens that you can use things like shaders for more than just graphics processing, and so via OpenGL and DirectX you can make GPUs do some nifty things.

In theory, and indeed often in practice, parallel computers are much, much faster than their serial counterparts. Hence the reason a GPU that costs $200 can render incredible 3D scenes that a $1000 CPU wouldn't have a prayer trying to render.

Folding is what's known as a ridiculously parallel problem. That is, it can be broken up into small subproblems that can be distributed among many processors with a minimal amount of communication among them. It also benefits from not requiring a lot of branching (if/switch statements and such), which GPUs generally do not handle well.

Many problems, (I'd argue MOST problems) do not cater well to these kinds of restrictions. So, while a GPU is well suited to crunching away on pieces of the folding problem, it's going to be lousy at doing the day-to-day stuff you do with your computer.

I actually installed boinc with seti on several of my machines last night and it worked quite well to heat part of the house (us Canadians need to turn the heater on earlier). Took a bit of time to get started, but it was nice and toasty in the morning.

Does anyone know if this method is less efficient at generating heat than using a space heater? Slower, perhaps... If you're going to use energy by turning on the wall heater anyway, why not use it to crunch some numbers?

P.S. - all electric heaters have the same efficiency, assuming no energy is "wasted" as visible light. The difference between them basically comes down to radiant vs. convection heat. Which is more useful depends on your circumstances. Radiant heat has the advantage of heating you and not the air.

I'm not sure if you meant to exclude heat pumps from this statement, but if not, heat pumps can achieve 3-4 times the efficiency of resistance heaters. Here's a handy link [gsu.edu] that explains it in layman's terms.

I actually installed boinc with seti on several of my machines last night and it worked quite well to heat part of the house (us Canadians need to turn the heater on earlier). Took a bit of time to get started, but it was nice and toasty in the morning. Does anyone know if this method is less efficient at generating heat than using a space heater? Slower, perhaps...

Using your CPU as a space heater is not a bad idea. It is 100% efficient. Every watt it consumes gets turned into heat. Before someone says "but the cooling fans are wasteful" let me remind you that the air moved by those cooling fans will eventually come to a stop (inside your house) as a result of friction, releasing its energy as heat in the process.

Depending on what type of space heater you use, and the construction of your house, your computer can be more efficient than many other electric space heaters. Since none of the energy "consumed" by your CPU/GPU is converted to visible light, none of it has the opportunity to leave your house through your window panes (assuming you have IR reflective glass). Contrast this to quartz and halogen space heaters which produce a fair amount of visible light.

In much the same way, incandescent bulbs match the efficiency of compact fluorescents during the winter months. Every watt "wasted" as heat during the summer is now performing useful work heating your house. (Before someone says "you called a quartz/halogen space heater inefficient because of its waste light, and now an incandescent efficient because of its waste heat!' let me say that the space heater's light is not useful light, while the bulb's heat is useful heat (during the cool months.))

Yes, everything you say is true, but (at least here in Toronto) electricity is one of the more expensive ways to produce heat. For the purposes of heating, natural gas is about 1/3 the price on a watt-for-watt basis. So while you're right that those incandescent lights are not making "waste" heat in the winter months, their heat is 3x more expensive than that produced by your furnace. You will still save money by using more efficient ways of producing light. (And before you tell me that some percentage of my fur…

Houses don't need to be designed for HE furnaces -- they can be installed in ANY house with a forced air heating system. But they do need to be installed *correctly*. (Vented and drained -- they need to suck in outside air, so you need two pipes, and they produce quite a lot of water from their condenser as it extracts the latent heat of condensation from the exhaust gases, composed mainly of CO2 and H2O. This liquid water needs somewhere to go.)

Heat pumps can be effectively more than 100% efficient when heating a home; however, their efficiency goes down as the outside temperature gets colder. Typical numbers are 3 to 5 watts of heat for every watt of power in temperate climates.

The guy just doesn't take into account the fact that heat pumps move calories from point A to point B.

Usually, when such people try to compute the "efficiency" of a heat pump, they take two things: the power draw from the electrical outlet (the energy loss) and the heat that comes in (the energy gain). This pretty much always yields an efficiency > 1 in good conditions (it won't work in Siberia, for example), but the part missing is that heat pumps don't generate heat from the electricity they d…

You are not getting more energy out than you put in; it has to do with moving heat from one area to another when the temperature difference between the two is relatively low, allowing higher efficiency than a straight heater. I probably should have said that the coefficient of performance can be in the 3 to 5 range, making it very competitive with gas heating systems. The COP of an electric heater is always 1. http://en.wikipedia.org/wiki/Heat_pump [wikipedia.org] http://en.wikipedia.org/wiki/Coefficient_of_performance [wikipedia.org]
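In numbers, using the COP range quoted above (the figures are the thread's estimates, not measurements):

```python
# Heat delivered per kW drawn from the wall, for the COP values above.
cop_heat_pump = 3.0    # low end of the quoted 3-5 range
cop_resistance = 1.0   # straight electric heater: COP is always 1

power_in_kw = 1.0
print(power_in_kw * cop_heat_pump)   # 3.0 kW of heat moved into the house
print(power_in_kw * cop_resistance)  # 1.0 kW from the resistance heater

# With electricity around 3x the per-watt price of gas (the Toronto
# figure cited elsewhere in this thread), a COP-3 heat pump is roughly
# at cost parity with a gas furnace.
print(cop_heat_pump / 3.0)           # 1.0 -- break-even with gas
```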

Heat pumps can be effectively more than 100% efficient
Perpetual motion machine anyone?

Key word is effectively. If you only look at the energy you have to pay $$ for, it appears to be more than 100% efficient. (the ambient heat is free. It's part of the system, but you don't care because you aren't paying for it.)

In much the same way, incandescent bulbs match the efficiency of compact fluorescents during the winter months. Every watt "wasted" as heat during the summer is now performing useful work heating your house. (Before someone says "you called a quartz/halogen space heater inefficient because of its waste light, and now an incandescent efficient because of its waste heat!' let me say that the space heater's light is not useful light, while the bulb's heat is useful heat (during the cool months.))

Not true. You aren't taking into account Power Factor at all... Not that I'm surprised, as most people don't understand it.

With switching power supplies, it's common to see PF in the range of 0.4, as opposed to fully-resistive electric space heaters (and incandescent lightbulbs) with a perfect 1.0 PF.

Residential customers are lucky, in that they don't get charged for PF losses by the power company, while companies certainly do. However, it's still highly inefficient, even if you aren't paying for it directly.

And besides that, electric heating is almost always more expensive than conventional heating, like natural gas, or electric heatpumps.

PF "losses" are not losses; the power is in effect returned back to the source. One can simply treat it as power that isn't delivered at all. Therefore the original posting can be considered essentially correct.

PF "losses" are not losses; the power is in effect returned back to the source. One can simply treat it as power that isn't delivered at all.

Electricity isn't water, you can't return it to the source.

With a lower power factor, you're either forcing the power company to install huge banks of capacitors, or making the generators work that much harder for fewer watts actually delivered/used. That's practically the definition of "inefficient".

In the case of computer power supplies that use a rectifier and capacitor combination for AC-to-DC conversion, which is almost all of them, the load does not look inductive or capacitive but has a lower power factor caused by drawing current in pulses instead of a sine wave. The result is a higher RMS current than necessary for the load, which causes increased line losses and requires higher current capacity for a given power. In extreme cases, distribution transformers can go into saturation, causin…
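The pulsed-current effect can be demonstrated with a toy model: compare a resistive (sine) current draw against current that flows only near the voltage peaks, as a rectifier-plus-capacitor input stage does. The threshold and waveform here are illustrative; real supplies draw narrower, peakier pulses and score lower (the ~0.4 mentioned above):

```python
import math

N = 10_000
v = [math.sin(2 * math.pi * k / N) for k in range(N)]  # one AC cycle

# Resistive load: current tracks voltage exactly.
i_res = v[:]
# Rectifier + capacitor: conducts only near the voltage peaks.
i_pulse = [math.copysign(1.0, x) if abs(x) > 0.9 else 0.0 for x in v]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def power_factor(volts, amps):
    real = sum(a * b for a, b in zip(volts, amps)) / len(volts)
    apparent = rms(volts) * rms(amps)
    return real / apparent

print(round(power_factor(v, i_res), 2))    # 1.0 for the resistive load
print(round(power_factor(v, i_pulse), 2))  # well below 1 for the pulses
```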

>Using your CPU as a space heater is not a bad idea. It is 100% efficient.

Not really. Consider exergy [wikipedia.org]. Yes, your CPU is just as efficient as any electric space heater. However, consider that the alternative is probably burning natural gas or oil in a furnace. If you burn fuel for heat, 90%+ of the chemical energy goes to producing heat (the rest is lost as unburnt hydrocarbons in the exhaust). If you burn fuel to spin a turbine at a power plant, only about 40% goes to electrical energy, and unless it's a cogeneration plant which uses the waste heat for industrial purposes, the rest is lost as heat up the smokestacks. So, starting from the fossil-fuel source, electrical heating is less than half as efficient as burning fuel for heat. If you do need to heat using electric power, it's much more efficient to use that electricity to pump heat in from a lower temperature outside than it is to turn that electricity itself into heat.
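The comparison in that comment, worked end-to-end (all percentages are the comment's rough figures, not measured data):

```python
# Fossil-fuel-to-house-heat efficiency along each path.
furnace = 0.90            # burn gas directly in a home furnace
plant = 0.40              # fuel -> electricity at the power plant
heat_pump_cop = 3.0       # heat moved per unit of electricity

resistance_path = plant * 1.0            # plant -> resistance heater
heat_pump_path = plant * heat_pump_cop   # plant -> heat pump

print(resistance_path)           # 0.4 -- under half the furnace's 0.9
print(round(heat_pump_path, 2))  # 1.2 -- beats direct burning overall
```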

If you are stuck with electric (non-heat-pump) heating in your house, however, you are correct: There is absolutely no reason not to run your CPU or any other electrical appliance full tilt.

Using your CPU as a space heater is not a bad idea. It is 100% efficient.

Do you ever think you might not know everything? "100% efficient" doesn't mean anything in this context. If it means anything, then everything that exists is 100% efficient at producing heat, due to the second law of thermodynamics. However, you are not considering the fact that CPUs produce RF and other radiation which escapes from the system you are trying to heat, and thus to get that 100% efficiency you would have to mean that it…

In terms of efficiency, it is always worse to burn fuel to produce heat to do work to generate electricity to produce heat. In terms of price, when natural gas prices rise high enough, electrical heat is already better. I mean it comes and goes, but when heating oil and natural gas prices peaked here (California) it was cheaper to heat with electricity and just a bit up from here there's a Canadian talking about the same thing.

Yeah, but most new power plants these days are natural gas, and if you live in California, then you are just in an energy nightmare separate from the rest of the world, where you regulate, get screwed; deregulate, get screwed far worse; regulate and get screwed.

If my computer idles at 150 W and runs FAH at 100% CPU at 200 W, and I need 20 h to generate one unit, I am spending roughly $.10 for the extra kWh. In the summer I waste money on AC; in the winter I save gas money on heat. If I put my computer in 4-watt S3 standby for 15 of those 20 hours, I can save a lot more. FAH calculations do not depend on "free" "idle" computer power; they depend on users spending money to generate the results.
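Making that electricity math explicit (the wattages and rate are the poster's example figures):

```python
idle_w, folding_w = 150, 200      # watts: idle vs. running FAH flat out
hours_per_unit = 20
rate_per_kwh = 0.10               # dollars per kWh

extra_kwh = (folding_w - idle_w) * hours_per_unit / 1000
print(extra_kwh)                           # 1.0 kWh of extra draw per unit
print(round(extra_kwh * rate_per_kwh, 2))  # 0.1 -- i.e. about ten cents
```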

Q: Are ASICs really that much better than general-purpose circuits?
A: Yes; that's why anyone would bother.
Q: If so, does that mean that IBM was right all along with their AS/400, iSeries product which makes heavy use of ASICs?
A: Yes and no. More relevant is whether Cell will pave the way to good price/performance. The problem with the iSeries line is not so much performance but price/performance. For the cost of one iSeries config you can cluster a bunch of xSeries and beat it through sheer brute force of CPUs. I…

Not completely true. I run both hardware types. xSeries cannot compete in the sheer throughput of grunt data processing - billions and billions of records. Yes, 1000 PCs can process billions of records, but then your cost passes that of one iSeries.

Now when you limit the processing to the type of computer that best handles that type of work:

xSeries is better in single/one-off processes, like a web page request or finding the lat/long of an address, where all the information is a fresh lookup each time. So s…

The article links to the site to download it, and one of their FAQs asks about supporting Nvidia; they said something to the effect that Nvidia doesn't have as many pipelines or something. My Nvidia card (7950, really two 7900s put together) is a pretty nice card. I'm wondering if it really isn't worth it to optimize for Nvidia, or if there is some bias from the guys who wrote this.

The Nvidia GPU supports 64-bit floats that are more accurate than ATI's implementation, and has done so since the GF6x00 series came out. As for the number of pipelines, that's largely irrelevant. CPUs don't have 48 pipelines, and FAH works there. So it's most likely a case of the FAH people being sponsored by ATI, and ATI wanting to advertise their so-called flagship product.

They don't support 64-bit floating point, and never have. And one big reason why it wouldn't fly on Nvidia cards is that the latest ATI cards are MUCH MUCH faster at conditional stuff, which DOES happen in this kind of workload.

The raw throughput is irrelevant if one card slows down an order of magnitude just because it can't handle an "if"...

Take one hundred people with computers, and who have an interest in Folding@Home. Offer them a CPU-driven version of the app, and 100 computers will be running the CPU-driven app, regardless of the age/performance of the machine.

Now, offer them a GPU-driven alternative. For the most part, the only people that will install and run it are those with a fancy-schmancy video card capable of running it, and for the most part, the only people that have a fancy-schmancy video card capable of running it have high-performance computers as well (or at least more recent computers that came with compatible cards.)

So let's say that's ten out of the hundred, and those ten are statistically likely to have had the highest-performing CPUs as well; so you've pulled the top ten performers out of the CPU-client pool and thrown them in the GPU-client pool. Even if you didn't switch those ten people over to the GPU, you could probably isolate those computers' CPU-client performance numbers from the other 90 and find that they're disproportionately faster than a larger number of the slower computers.

There's still more to the story, of course, but you really are taking the highest-performing computers out of the CPU pool and into the GPU pool. The exception would be high-performance servers with lousy/no graphic cards, but those are likely working so hard to perform their normal business that Folding@Home isn't a priority.

Your logic is fine, but you are overestimating the effect you mention if you really think that it "solves the mystery".

500 users out of 25000 means that you have at most taken the 2 percent highest performers out of the CPU pool. If we assume that those 2 percent have computers that are 5 times as powerful as the average computer, then we have lowered the average performance of the CPU pool by roughly 9%.

This 9% systematic effect will lower the reported performance superiority of around 5000% of the GPU vs. the CPU to something like 4500%. I.e., it doesn't change the result at all (which seems to be that GPUs kick ass for these applications).
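The estimate can be reproduced with a quick model using the poster's own assumptions (500 of 25,000 machines at 5x the power of the rest); a slightly different accounting gives their ~9%, but the conclusion is the same either way:

```python
total_machines = 25_000
top_machines = 500            # the ~2% who move to the GPU client
top_power, rest_power = 5.0, 1.0   # relative performance units

before = (top_machines * top_power
          + (total_machines - top_machines) * rest_power) / total_machines
after = rest_power            # only average machines remain in the pool
drop_pct = (1 - after / before) * 100
print(round(drop_pct, 1))     # 7.4 -- in the ballpark of the ~9% quoted
```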

So when are we going to see (x86/64) motherboards with a socket for a standard processor and a socket for a vector processor?
Couldn't we finally have graphics cards that only give output to the screen and separate vector processors with a standardized interface / instruction set?

This is what I also was thinking. If stream processing is so damn useful for so many things other than graphics, why should the graphics people have all the control over the microarchitecture and instruction set architecture? There should be a standard ISA so that the software end is uniform, and I assume that AMD and Intel would be the ones to make standard hardware interfaces for each one's own architecture, so that hardware in the same generation is mostly interchangeable.

Well, I guess we have since Altivec/SSE/3DNow came out. Except you don't need the extra socket because it's built onto your regular processor. Same as how we used to have a socket for a floating point unit. Then they got tired of that and started putting one or more on the main CPU.

Ah, so you want to take the GPU and put it on the motherboard. Why didn't I think of that? Maybe there's some advantage to having a GPU and its own dedicated high-bandwidth memory in the same place? Good luck running your game on an off-board GPU.

We have vector units on CPUs to help with the things that CPUs generally do. We have GPUs on video cards to do the things that video cards do. There are a few applications where the GPU is useful for things other than video, but not many. In fact, the vector u

That does sound impressive, even if, as I imagine, many of the Linux clients aren't exactly top-end CPUs. Usually it seems the top-end GPU is as complex as the top-end CPU of the time. I know the transistor count was close when I built my last complete setup. Surely my 1.6 P4 (soon-to-be-Linux box) gets trashed for complexity/throughput by a new video card by now :(

That, and like others said, a targeted client and screaming memory for the GPU is gonna rock. It would be closer if the CPU client was aimed at a parti…

In short, if you're performing one simple task trillions of times, many very simple, highly optimized processors with dedicated memory do the job better than even a similar number of much more capable processors that have to play nice across a whole system.

And this ignores the number of old couple-hundred-megahertz systems that people don't use anymore, so they hand them over to the task, vs. X1900s being the very high end of ATI's most recent line.

For massively parallel tasks like rendering pixels, folding proteins, compressing frames of a movie, etc., I'd absolutely love large quantities of a simple processor. For most other tasks, given present technology, I'd still side with fewer, more able processors. Either way, comparing 448 of something with 56 processors inside each one to 25,000 single processors and saying "But 448 is SO much less than 25,000!" is an unfair comparison.

I saw the article a few weeks back about having a GPU client for the 1900XT. Decided to try it out on my Mac Pro booting in XP. I must say it may be giving off great numbers, but I'm not sure how it's affecting the card itself. Once the client starts, the fan runs non-stop (which is obviously understandable). Not sure how my vid card will be in a month or so, because I don't think they're designed to be in game-play mode for more than 12 hours or so (can't think of anyone who plays more than 12 hours at…

So they hadn't kept up to date with patches. How exactly is that an IBM/iSeries fault? I've worked with the S36/S38/AS400/iSeries/i5 for decades. The thing has always been rock solid. Not quite in the mainframe realm of uptime, but in 20+ years I have only seen the machine down twice unplanned, both times due to hardware borking - fscking disks.

You can refresh all you like, no problem.

The downside is the lack of a decent GUI; screen scrapers just don't cut it. IBM should make the X protocol or VNC protocol a…

Well, the new ones do, by virtue of PowerPC, but the old CISC AS/400s never will; they don't have a memory management unit and IBM won't give out the specs for them. You write to a virtual machine on the old critters, not the real processor, which is where the reliability and robustness come from. It's really more like an old 8080 in that there's no memory protection or process protection in the real hardware.

Imagine if Macs had stayed with PPC and used a Cell-based version like IBM's servers; it would have 8x the grunt.

And before anyone says the SPE cannot run general code, look at the instruction set: it's as good if not better than the old 68k chips, in fact much better, as it has lots of cool math SIMD-type instructions. What's 90% of C/C++ code? Lots of IFs and variable assignments, and structure memcpys. Now if the whole OS would thread to all SPEs on demand, it would fly.