May 27, 2008

More info about the GPU1 to GPU2 transition

There have been several questions about the GPU1 client and why we
decided to shut it down. I hope I can shed some light here, at least on why we're doing what we're doing, so that even if people disagree with our decisions, they can at least see where we're coming from.

Some people have asked "why shut down the client if it's working?" The bottom line here is that the GPU1 results
are no longer scientifically useful. It's pretty clear now that DirectX (DX) is
not sufficiently reliable for scientific calculations. This was not
known before (and some people wouldn't believe it until we proved
it). With the GPU1 results, we can now show what the limitations are
pretty decisively.

GPU1 also helped us a lot in
developing its successor and in learning what's needed to run GPUs in a
distributed computing fashion. The good news here is that GPU2 is behaving very well, on both ATI and NVIDIA hardware, and this is a direct result of what we've learned from GPU1 WUs. In the end, however, GPU1 will not be
able to help us understand protein misfolding, Alzheimer's Disease, etc.
due to these unresolvable limitations. We could keep GPU1 live, just
crunching away in its current form, but at this point that would be wasting people's
electricity, as we've learned everything we can
from those cards, given what they can do.

In the past, we had a somewhat similar
shutdown situation, when the QMD core projects stopped. In that case, donors were
left hanging because we gave no warning before stopping the QMD
projects. We did try (perhaps unsuccessfully) to handle the GPU1 situation better than QMD. With
QMD, we stopped needing that core and so we stopped the calculation
without warning, not realizing the impact that would have. With GPU1, we gave several months' warning (indeed,
GPU1 is still actively running, so this announcement comes well
in advance of the shutdown). We tried to avoid repeating the QMD situation
by giving advance warning, but it looks like donors would like even
more notice. However, there are limits to how far in advance we
know the situation ourselves.

Indeed, the knowledge that it made sense to end GPU1 came to us fairly recently. We had been working with CAL for a while and it seemed
that CAL might be a solution, but we only knew for sure once we got some
testing "in the wild." DirectX (DX, which GPU1 is based on) works
much better in the lab than in the wild, and it was possible that CAL
would behave the same way. After seeing that CAL behaved well in the wild,
it became clear that the GPU1 path was obsolete. This is a
relatively recent finding, however, and we made the announcement about the
situation relatively shortly thereafter.

It was a tough decision. Some
suggested we just leave GPU1 running, even though people's electricity
really would be going to waste, generating nothing but points. I didn't
think that was a good idea. We knew it would be a tough PR hit, but
when people talk about the history of FAH, I want to make it clear that
we're here to address AD and other diseases, not just to run
calculations for the sake of points and nothing more (which has been
the critique of some other distributed computing projects).

So, what's the right thing to do? I guess it comes to this: would GPU1 donors be
happier if we just kept the GPU1 servers running, doing work with no scientific
value, purely for points? We could do that, at the cost of taking personnel away
from improving existing clients, keeping existing servers going, etc.,
for the sake of keeping GPU1 running. However, that's not what FAH is
for, and I think it's important that FAH not devolve into a big points
game, losing sight of why we're doing what we're doing.

I am not a GPU cruncher and I am interested in advancing research. If you wanted to take less of a PR hit, you could slowly (or quickly) give fewer and fewer points for GPU1 clients until it just doesn't matter. Just a thought. It might be the best of all worlds! Complainers never win and winners never complain! :)

Anyone who uses the higher-performance beta clients knows that they will be superseded at some point - I think this is just a sign of (rapid) progress on the GPU front. I'm sure that most people won't want to crunch WUs just for crunching's sake - but I guess that people who Folded on older hardware (I used to Fold on an X1950XTX) may be a bit lost at not being able to contribute anymore ..... sounds like a great excuse for an upgrade ;-)

I'd agree that part of the problem is probably that certain hardware could run GPU1 but can't run GPU2. If you avoided the hardware obsolescence issue, there might be fewer complaints. That might not be scientifically/computationally feasible, however.

Are there clear scientific criteria for validating the folding results? I am afraid that results from so many other clients may also have to be deemed unreliable. You have probably already established methods for validating folding results; could you point me to a reference? Any links would be fine.

I can answer the question from penguinusaf: "How about letting those who have x1900 cards run GPU2 so GPU1 can be shut off? Aren't they similar?"

ATI provides a software interface called "CAL" for the 2xxx and 3xxx series cards, and it is what GPU2 uses. CAL doesn't support the x1900 cards. The x1900 cards instead used the DirectX interface for GPU1, and that interface does not work reliably.

I think F@H learned with GPU1... that they needed a GPU2. The whole concept of running GPGPU code through DirectX (where nearly anything could come along and blow away the DX context) seemed like pushing it all along.

So, when is the nVidia client coming out, and what sort of performance can we expect? I got an overclocked Radeon 3850 primarily for F@H, but so far it hasn't really satisfied me in games.. it would be great if nVidia's performance is at least as good as ATI's, and hopefully better. Then it would make sense to switch to an nVidia GPU that's great for both folding and games..

Good on those who folded with GPU1 - it appears you've significantly helped Stanford advance their research so that GPU2 could be possible. Without you guys GPU2 wouldn't be where it is, and I think that's a great contribution to the project.

You made the correct decision. Encouraging people to waste electricity on bad science is illogical. On the flip side, maybe some of the x1000 series folks would consider purchasing newer graphics cards in order to help all of humanity.