The GPU3 open beta test for NVIDIA GPUs is going well. A few issues have been uncovered and our team is working on them. However, there haven't been any major showstoppers so far, which is good news.

Here's what lies ahead. We will continue QA in an open beta format for a little while longer, until we resolve the most significant remaining issues. Then the new client will replace the existing GPU client. However, the science/WS switchover will take much longer. In general for FAH, science calculations that were started with one core (e.g. core11) need to be completed with that core. New NVIDIA GPU projects will start up with core15 (although there may be a few projects already in the pipeline that will still use core11), but this switchover to using only core15 could take a while, easily 3 to 6 months, or maybe longer depending on how long it takes to complete the existing core11 projects.

Since Fermi boards must run core15, we are prioritizing core15 assignments to Fermi (with only a few core15 projects, we would run out of Fermi work unless we did this). So, donors without Fermi cards will likely see mostly core11 WUs in the short term. This will change as more core15 projects come online in the coming months.
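To make that preference concrete, here's a rough sketch of the assignment rule described above (Python, for illustration only; the real FAH assignment-server logic is more involved than this):

Code:

# Illustration only -- not the actual FAH assignment-server code.
def pick_core(is_fermi, core15_work_available, core11_work_available):
    if is_fermi:
        # Fermi boards can only run core15 work, so they get it first.
        return "core15" if core15_work_available else None
    # Non-Fermi GPUs can run core11; core15 work is held back for Fermi
    # while core15 projects are still scarce.
    if core11_work_available:
        return "core11"
    return "core15" if core15_work_available else None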

However, there is still a major benefit for GPU donors in running the new client. As we switch over, more and more WUs will be running on core15, so running the old client would eventually lead to WU shortages. There is no need to switch over immediately, as there will be plenty of core11 WUs for quite a while (e.g. on a weeks-to-months timescale).

We are also making a major push for OpenCL on ATI and NVIDIA. We are working closely with NVIDIA and ATI on this and together we are making progress, although this new core still seems to be some time out.

We'll look into it. For now, the updates may be a bit sporadic until we see what's up. Also, it's the weekend and the sysadmins are not in the office, so I don't want to do anything too ambitious in case we need a server restart, etc.

Note that 24x7 support requires roughly $1.5M every year, since 24x7 support means three people for the job normally done by one, the late shifts cost more money, and we have at least 2 FTE spots to fill (one for sysadmin and one for science/FAH infrastructure).
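As a rough sanity check on that figure, here is the back-of-the-envelope arithmetic; the salary and shift-premium numbers below are assumptions for illustration, not actual costs:

Code:

# Assumed numbers for illustration only.
fte_positions = 2          # one sysadmin + one science/FAH infrastructure
people_per_position = 3    # 24x7 coverage needs ~3 people per single-shift job
loaded_cost_per_person = 200_000   # assumed fully loaded cost, USD/year
late_shift_premium = 1.25          # assumed premium for night/weekend shifts

annual_cost = fte_positions * people_per_position * loaded_cost_per_person * late_shift_premium
print(f"~${annual_cost / 1e6:.1f}M per year")  # ~$1.5M with these assumptions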

A redundant failover system would cost significantly less than that.

Unfortunately, it's not that simple. This issue would not have been fixed by a failover system (we have one). Please see my post above about "easy fixes" and consider that we have looked at many such solutions and implemented the ones that make sense.

Finally, please also consider that this is *just* for the stats system. The science/WS servers have been chugging along just fine.

Sorry for the delay on this one. But how about we move on -- it's working again and the 3rd party stats should be updating soon! (and the PS3 servers do seem to be up as well)

This is specific to bigadv WUs. Earlier, the reactivated Project 2682 was released, but it ran into memory problems and was stopped. After some modifications, it has been re-released as Project 2692:

Quote:

Originally Posted by kasson

We've reformulated some proj2682 work units to use the newer, more compact representation. Memory requirements on these should be similar to 2684, etc. These are now being released in an open test as project 2692.

We've just released MemtestCL, a dedicated memory tester for OpenCL-capable GPUs in the same vein as MemtestG80 for CUDA-capable GPUs. In particular, this means that owners of ATI OpenCL-capable GPUs (the Radeon 4000 and up) can test their GPU memory as well. Binaries for Windows and 64-bit Linux are available, as is (LGPL-licensed) source code for those of you interested in doing additional development work (for example, GUI frontends).

Note that you must have an OpenCL-capable driver and runtime installed on your machine for this to work. For Nvidia, this means a 195 driver or newer; for ATI, you need the Cat 9.12 or newer video drivers, AS WELL AS installing the ATI Stream SDK (http://developer.amd.com/gpu/atistreamsdk/) - for some reason the OpenCL runtime does not come with the video driver. Incidentally, by installing the Stream SDK, you can use MemtestCL to test your CPU memory too.
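If you want to confirm that an OpenCL runtime is actually visible before running MemtestCL, a quick check like the one below works. It uses the pyopencl Python package, which is not part of MemtestCL; it's just a convenient way to enumerate platforms and devices:

Code:

# Quick check that an OpenCL runtime and devices are visible.
# Requires the pyopencl package; MemtestCL itself does not use Python.
import pyopencl as cl

try:
    platforms = cl.get_platforms()
except Exception:
    platforms = []

if not platforms:
    print("No OpenCL runtime found -- install the vendor driver/SDK first.")
for plat in platforms:
    print("Platform:", plat.name)
    for dev in plat.get_devices():
        print("  Device:", dev.name)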

MemtestG80 is a software-based tester for "soft errors" in GPU memory or logic on NVIDIA CUDA-enabled GPUs. It uses a variety of proven test patterns (some custom and some based on Memtest86) to verify the correct operation of GPU memory and logic. It is a useful tool for ensuring that a given GPU does not produce "silent errors" which could corrupt the results of a computation without triggering an overt error.

Basically, the idea is that we wanted to put out a tool for testing GPU memory that's roughly equivalent to Memtest on CPUs. If you run FAH heavily on a GPU, it's a good idea to check your GPU memory, just as one would run tests on CPU memory.
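For anyone curious what these test patterns look like, here is a toy "moving inversions" style pass written against host memory with NumPy. The real MemtestG80/MemtestCL kernels run this kind of pattern on GPU memory in CUDA/OpenCL; the buffer size and pattern value below are arbitrary:

Code:

# Toy illustration of a moving-inversions pass on host memory (NumPy).
import numpy as np

def moving_inversions(n_words=1 << 20, pattern=np.uint32(0xAAAAAAAA)):
    buf = np.empty(n_words, dtype=np.uint32)
    errors = 0
    # Pass 1: write the pattern everywhere, then verify it reads back.
    buf[:] = pattern
    errors += int(np.count_nonzero(buf != pattern))
    # Pass 2: write the bitwise inverse and verify again, so every bit
    # is exercised in both the 0 and 1 state.
    buf[:] = ~pattern
    errors += int(np.count_nonzero(buf != ~pattern))
    return errors

print("errors:", moving_inversions())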

We are a bit tight on SMP servers at the moment, as we are waiting for a new big server to come back online. Our admins have been working on this problem and think they have found the issue with the RAID on that machine (after consulting the hardware manufacturer, they're upgrading the firmware of the drives). We don't have an ETA on this, but I'm hoping it will be relatively soon.

After the new machine comes online, we should have a lot more SMP power. Moreover, we have other servers waiting in the wings, and our plan is to bring several of them online for SMP in the coming weeks with new WUs and SMP projects (although note that it takes a while for new projects to go through our QA process).

All of this is in anticipation of the v7 client maturing and going into open beta in a few months. The v7 client should make it a lot easier to run SMP. However, it's also important to note that the v6.30 client (already released and on our high performance client download page) already makes SMP pretty easy to run.

Also, server problems with SMP projects will hopefully come to an end soon:

Quote:

Originally Posted by kasson

We've brought some projects online on a different SMP server. Essentially the problem is that some of the servers are low on jobs and another server has been up and down for RAID maintenance, so this one got overloaded. The server's been up throughout--the transaction threads just get full. We're working on some additional load-balancing.

There isn't any way to force the download of a particular project. Initially there were only X servers for SMP2 WUs, but they were getting hit pretty hard due to the sheer number of SMP clients people were running. The load on the servers was so high that they were refusing further connections. (To be specific, the net load was 200, which is pretty high. More details in this thread.) Now there are X+1 SMP servers. The following steps are being taken:

1) To combat this situation, they have brought a new SMP server online to help ease the load, and hopefully more servers will be dedicated to SMP WUs in the future.

2) A new project was released so that SMP clients can continue to work.

What I meant by downloading was that if there aren't any Project X WUs on the server, you will hopefully be redirected to another server with Project Y WUs, which you can download and fold.
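Conceptually, the redirect works something like the sketch below (made-up server names and WU IDs; this is not the real FAH assignment protocol):

Code:

# Made-up sketch of the "redirect to a server that has work" idea.
work_queues = {
    "server_a": [],                      # out of Project X WUs
    "server_b": ["project_Y_wu_0001"],   # still has Project Y WUs
}

def assign_wu(preferred, queues):
    # Try the preferred server first, then fall back to any server with work.
    order = [preferred] + [s for s in queues if s != preferred]
    for server in order:
        if queues[server]:
            return server, queues[server].pop(0)
    return None, None

print(assign_wu("server_a", work_queues))  # -> ('server_b', 'project_Y_wu_0001')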

There will be new projects 10641-10692 for the open beta release of the GPU3 core. These beta tests are to evaluate the performance of a new core (openmm-gromacs) on GPUs with the Generalized Born (GB) model used as an implicit solvent. Different force fields and different inner dielectric constants are used for this set of simulations.

HOWEVER, there seems to be a PPD problem with these new projects. Right now, they are on hold until further notice. This is the official response:

Quote:

Originally Posted by yslin

Dear GPU3 OpenMM donors,

I've received reports on PPD drops of the new projects 10641-10692 from the previous 10628-10633. The new projects have been benchmarked using the same machine but it seems different cards have different scaling factors for system size (p10628-10633 have 582 atoms while p10641-10692 have only 264 atoms and in general should run faster). I have put the new projects 10641-10692 on hold and please report your performance (ns/day) or PPD here. Thank you so much!

We have two papers (one that just came out in PNAS and one that's about to come out in Physical Review Letters -- papers #74 and #75 on our papers list) that we're particularly excited about. They represent some key insights we've gained by examining many results from Folding@home. The resulting picture of how proteins fold is fairly different from the prevailing view, so it will be interesting to see what experiments tell us about the specific predictions made within. We're excited to see where this goes!

We've been making steady progress on our complete rewrite of our code for OpenCL. However, it's not just about porting code; it often means redesigning algorithms to run efficiently on ATI hardware. That's the challenging part.

New GPU3 projects 10927-10978 and 11214-11265 are entering the advanced stage of testing (restricted to Fermi boards).

Projects 10927-10978 are the Protein-G peptide simulations and projects 11214-11265 are the Fs peptide simulations. These are to evaluate the performance of a new core (openmm-gromacs) on Fermi boards with the Generalized Born (GB) model used as an implicit solvent. Different force fields and different inner dielectric constants are used for this set of simulations.
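For readers wondering what a GB implicit-solvent setup looks like in OpenMM itself, here is a minimal sketch using the openmm Python application layer. This is only an illustration of the simulation type: the FAH core is a compiled GPU core, not this script, and the input file, force field choice, and run length below are placeholders:

Code:

# Minimal OpenMM sketch of a GB (OBC) implicit-solvent simulation.
# "protein.pdb" is a placeholder input; the FAH projects vary the force
# field and the inner (solute) dielectric across the project range.
from openmm.app import PDBFile, ForceField, Simulation, NoCutoff, HBonds
from openmm import LangevinIntegrator
from openmm.unit import kelvin, picosecond, femtoseconds

pdb = PDBFile("protein.pdb")
forcefield = ForceField("amber99sb.xml", "amber99_obc.xml")  # AMBER99SB + OBC GB solvent
system = forcefield.createSystem(pdb.topology, nonbondedMethod=NoCutoff,
                                 constraints=HBonds)
integrator = LangevinIntegrator(300 * kelvin, 1 / picosecond, 2 * femtoseconds)

sim = Simulation(pdb.topology, system, integrator)
sim.context.setPositions(pdb.positions)
sim.minimizeEnergy()
sim.step(1000)  # short demo run; FAH work units run far longer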