Introduction

The performance of the Folding@home (FAH) software is critical to the success of the Folding@home project. To study many of the problems of interest (especially those related to protein misfolding and aggregation, such as in Alzheimer’s disease), we need not only many participating computers but also results returned quickly enough that we can simulate trajectories of sufficient length. When FAH first started, we achieved this by running simulations for many months or even years (indeed, our first Alzheimer’s disease simulations ran for almost two years straight). However, we want to tackle problems that could take even longer, and those projects wouldn’t be practical if we had to wait many years for all the results to come back. We therefore need methods that can perform the simulations even faster.

Since 2006 we have been looking at methods to produce major advances in our capabilities. One of the technologies we pursued was multi-core CPUs in modern computers. SMP stands for “Symmetric Multi-Processing”, a term that generally refers to a computer with more than one processor core. It’s now very common for computers to contain multiple CPU cores, which allow the computer to process multiple sets of information in parallel. Most computers contain dual- or quad-core CPUs, and higher-end machines can contain eight, sixteen, or more cores. With the cores working together, SMP gives us considerably longer trajectories in the same wall clock time, turning what used to take years to simulate, even on FAH, into a few weeks to months. For example, a quad-core CPU can complete Work Units nearly four times faster than a single-core CPU. Initially, it was a challenge to scale the GROMACS core (the highly optimized software that performs the actual protein folding simulations behind the scenes) to fully utilize multiple CPU cores, but our methods are now quite efficient at using them.

This has allowed us to address questions previously considered impossible to tackle computationally, and to make even greater impacts on our knowledge of folding and folding-related diseases. Our goal is to apply our simulations to further our knowledge of protein folding, misfolding, and related diseases, including Alzheimer’s disease, Huntington’s disease, and certain forms of cancer. By joining together hundreds of thousands of PCs throughout the world, we have made calculations which were previously considered impossible routine. Thanks to your help and these new technologies, Folding@home has remained one of the world’s most powerful computing systems. FAH has targeted the study of protein folding and protein folding diseases, and numerous scientific advances have come from the project. We’re very excited about what multi-core processors have been able to do so far. One of our papers (#53 on our Papers page) would have been impossible without multi-core processors and represents a landmark calculation in the simulation of protein folding. We’re looking forward to more exciting results like that in the years to come!

How can I run FAH on my multi-core CPU?

The SMP method of folding is now standard in V7 — the latest generation of our software — so your computer can automatically be configured to run SMP work units. Please see the V7 FAQ and the Installation Guides for more information.

What operating systems are supported by SMP?

We support Windows, Mac OS X, and Linux. Please see the Installation Guides for more specific requirements.

How do you determine points for this platform?

Before releasing any new project (series of Work Units), it is benchmarked on a dedicated machine and assigned a points value (please see the New Points FAQ for specific details). The points for your system are relative to this benchmark machine; a faster system will get proportionately more points. This determines the “base points” awarded at the completion of the Work Unit. This base points system is consistent across all of our platforms.

Points from the Quick Return Bonus (QRB) are often added on top of the base points. A major part of the scientific benefit is dependent on rapid turnaround of Work Units, so we need to have results returned quickly and reliably. Thus, we assign short deadlines for SMP work units, and we’ve implemented a non-linear function into the QRB. This gives an extra award to donors who rapidly return Work Units on a consistent basis. To qualify for this bonus, you need a passkey, complete ten Work Units, and then maintain an 80% successful return rate. See the New Points FAQ and the Passkey FAQ for more information.
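The qualification rules above can be sketched as a small check. This is only an illustration: the function name and its parameters are hypothetical and not part of the FAH software, treating the 80% threshold as inclusive is an assumption, and the bonus formula itself is not reproduced here (see the New Points FAQ).

```python
MIN_COMPLETED_WUS = 10        # from the text: complete ten Work Units
MIN_RETURN_RATE_PERCENT = 80  # from the text: 80% successful return rate


def qualifies_for_qrb(has_passkey: bool, wus_assigned: int, wus_returned_ok: int) -> bool:
    """Return True if a donor would meet the three QRB conditions described above.

    Hypothetical helper for illustration; the real eligibility check runs on
    the project's servers and may differ in detail.
    """
    if wus_returned_ok < 0 or wus_assigned < wus_returned_ok:
        raise ValueError("wus_returned_ok must be between 0 and wus_assigned")
    if not has_passkey:
        return False
    if wus_returned_ok < MIN_COMPLETED_WUS:
        return False
    # Integer comparison avoids floating-point rounding at exactly 80%.
    return wus_returned_ok * 100 >= MIN_RETURN_RATE_PERCENT * wus_assigned
```

For example, a donor with a passkey who has successfully returned 16 of 20 assigned Work Units sits exactly at 80% and would qualify under this sketch, while 15 of 20 would not.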

How many cores do I need?

FAH can run on any PC CPU but it’s the SMP methodology which allows multiple cores to process a single job cooperatively and more productively. There is no upper limit to the number of CPU cores that can be used for SMP.

We have seen reports of occasional problems during processing when FAH uses an odd number of processors greater than five. One and three are usually fine, but problems sometimes occur with seven, nine, eleven, and higher odd core counts. The problem relates to how Gromacs splits up the work and allocates it to each processor. We are working to resolve this unusual issue, which affects very few users. Changing the number of utilized cores is typically an advanced/expert option.
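As a rough sketch of the rule just described, the check below flags core counts that fall into the reported problem range. Both function names are hypothetical, and dropping to the next lower even count is an assumed workaround for illustration, not an official recommendation.

```python
def may_hit_odd_core_issue(cores: int) -> bool:
    """True for odd core counts greater than five (7, 9, 11, ...)."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return cores > 5 and cores % 2 == 1


def suggested_core_count(cores: int) -> int:
    """Assumed workaround: use one core fewer when the count is in the problem range."""
    return cores - 1 if may_hit_odd_core_issue(cores) else cores
```

Under these assumptions, a seven-core setting would be reduced to six, while one, three, and any even count are left unchanged.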

What about hyperthreading?

Hyperthreading is a technology used on Intel CPUs to improve parallelization of computations. A dual-core CPU with hyperthreading enabled appears to the operating system as a four-core CPU. Because each physical core is shared by two logical cores, hyperthreading does not actually double the CPU’s performance. Hyperthreading is usually enabled in the BIOS by default, and we recommend that it stay enabled, as the SMP cores can use it to process Work Units faster.
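The relationship between physical and logical cores can be illustrated with a short sketch. The helper name is hypothetical. Python's standard `os.cpu_count()` reports logical CPUs (or `None` if the count cannot be determined), which is why a hyperthreaded dual-core machine typically shows four.

```python
import os

LOGICAL_PER_PHYSICAL_WITH_HT = 2  # from the text: each physical core powers two logical cores


def logical_core_count(physical_cores: int, hyperthreading: bool) -> int:
    """Number of cores the operating system would see for a given CPU."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be a positive integer")
    if hyperthreading:
        return physical_cores * LOGICAL_PER_PHYSICAL_WITH_HT
    return physical_cores


if __name__ == "__main__":
    # A dual-core CPU with hyperthreading appears as four cores.
    print(logical_core_count(2, hyperthreading=True))
    # Logical CPU count of the machine running this script (may be None).
    print(os.cpu_count())
```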

What about using both SMP and GPUs?

GPUs can also be used to process a separate FAH assignment while running SMP (see the GPU FAQs), but some CPU processing is required to support that GPU. If you also process with a GPU from AMD, that GPU support function consumes one CPU core, so fewer cores are available for SMP processing. You can expect higher SMP performance by dedicating fewer CPU cores to SMP to avoid resource conflicts. NVIDIA GPUs do not currently require a full CPU core’s resources.
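The core budgeting described above can be sketched as simple arithmetic. The function name is hypothetical, and two simplifications are assumed: each AMD GPU reserves exactly one CPU core, and NVIDIA GPUs reserve none (the text only says they need less than a full core).

```python
def smp_cores_available(total_cores: int, amd_gpus: int = 0) -> int:
    """Cores left for SMP after reserving one core per AMD GPU (assumed per-GPU cost)."""
    if total_cores < 1 or amd_gpus < 0:
        raise ValueError("total_cores must be >= 1 and amd_gpus must be >= 0")
    # Never report a negative number of cores.
    return max(total_cores - amd_gpus, 0)
```

For instance, an eight-core machine folding on one AMD GPU would leave seven cores for SMP under these assumptions.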

What are bigadv Work Units?

Big Advanced (bigadv or BA) is an experimental type of Folding@home WU intended for the most powerful CPUs in Folding@home. Our goal is to use bigadv to work on projects that are particularly large (in memory utilization and upload/download requirements) and require a large amount of computation. We are all fortunate in that processors get faster over time, so the highest-performing tier of donor machines also gets faster over time. We have a lot of exciting science being enabled by FAH donors, and it takes place at all levels of computational requirement and performance sensitivity.

These units have extremely tight deadlines and require a minimum of sixteen physical CPU cores. Some systems, especially those with hyperthreaded CPUs, may not be able to complete the units in time, so this core count is not an absolute requirement. Bigadv Work Units can at times consume approximately 750 MB of RAM per CPU core, so a sixteen-core machine may need about 12 GB of RAM as well. They are also larger WUs and take longer to upload. In return for these requirements we add an additional 20% of points to bigadv Work Units, on top of the Quick Return Bonus awarded to all SMP WUs. We recognize that donors work hard to optimize their setups, but please keep in mind that BA is very much experimental and that future changes are not just possible but likely.
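The figures above can be turned into a quick back-of-the-envelope check. This is a sketch under stated assumptions: the function names are hypothetical, RAM is counted in decimal megabytes, and the 20% bonus is applied to whatever points value is passed in, since the text does not spell out the exact order of calculation.

```python
BIGADV_MIN_PHYSICAL_CORES = 16  # from the text
BIGADV_RAM_PER_CORE_MB = 750    # approximate peak usage per core, from the text
BIGADV_BONUS_PERCENT = 20       # additional points, from the text


def estimated_bigadv_ram_mb(cores: int) -> int:
    """Approximate peak RAM use for a bigadv Work Unit on the given core count."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return cores * BIGADV_RAM_PER_CORE_MB


def meets_bigadv_guidelines(physical_cores: int, installed_ram_mb: int) -> bool:
    """Check the core and RAM guidelines; meeting them does not guarantee the deadline."""
    return (
        physical_cores >= BIGADV_MIN_PHYSICAL_CORES
        and installed_ram_mb >= estimated_bigadv_ram_mb(physical_cores)
    )


def apply_bigadv_bonus(points: float) -> float:
    """Add the 20% bigadv bonus to a points value (assumed to already include QRB)."""
    return points * (100 + BIGADV_BONUS_PERCENT) / 100
```

With sixteen cores the estimate comes to 16 × 750 MB = 12,000 MB, which matches the roughly 12 GB quoted above.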

Please see the Configuration FAQ for more information on how to get bigadv WUs. The Folding Support Forum is frequented by many hardware experts who may be able to help answer your questions about specific hardware and setups.

Does SMP work on the same Work Units as single-core CPUs?

Typically they work on the same WUs. FahCore A4, our most common SMP core, can use any number of CPU cores to process WUs. FahCore 78, an older core, can only use a single CPU core. Thus it depends on the simulation core being used.

What scientific cores are used by SMP?

Gromacs is the current scientific code used by SMP, and is behind FahCores 78 and A1-A4.

Can I run SMP on a multi-computer cluster?

Unfortunately, no. Communication between nodes is often too slow, and we do not support this setup, nor do we distribute Work Units to installations on clusters. Instead, we recommend that you install the FAH software on each node individually and let each computer download, process, and upload WUs separately.

History

SMP1

July 2005 We have two approaches to solving the SMP problem and are experimenting with them in parallel. Both are pretty early stage, but appear to work reasonably well in the lab. The biggest challenge right now appears to be server side.

August 2005 Abhay has been making steady progress here. We are trying two approaches: a pure SMP approach as well as an approach which would also work on computer clusters. There are pros and cons of each and having both programs allows us to take the best version.

September 2005 Abhay has found that we need to change the Gromacs code base used. We’re working on the switch with Prof. Peck’s group.

November 2005 We have had some snags with the code. Prof. Peck will be coming out to Stanford in early 2006 and that should help push this through.

January 2006 We have been talking with the Gromacs developers about a threads-based solution as well, which would have many benefits over an MPI solution for multi-core CPUs/SMP.

July 2006 Discussions with Gromacs developers suggest that their code development is going well, but a bit delayed. The good news is that the delay is due to added functionality, so when it does ship, we should be in good shape.

October 2006 We have had some good success with a new direction for the SMP client. We are optimistic that this will be reasonable to release. Once the GPU client is a little further along, we will put more attention into this direction.

November 2006 The SMP client is now looking good enough that we are starting a broader beta test outside of Stanford. If that looks good, we will move to a completely open beta test of this new client. The SMP client supports OSX/Intel natively (which means a major points boost for OSX donors) as well as 64-bit Linux (with 32-bit Linux hopefully to come soon). Windows support will come much later, as this is a very different architecture for porting than OSX & Linux.

March 2007 We have released a version of the SMP client for 32-bit Windows. There are some unique quirks due to the nature of running MPI on Windows, but it’s a natural choice for donors who need or want to run Windows.

April 2008 We have continued to update our SMP core, most notably including our A2 core, which has much better scalability.

Why did SMP1 use MPI instead of threads?

Originally, both Gromacs and AMBER could only utilize multi-core processors by using MPI, but that made installation of SMP very onerous on Windows systems. At that time, none of our simulation engines were designed to safely use threads. We worked with the Gromacs developers to resolve this, but it took time to convert Gromacs from MPI to threads. Until that complex task was completed, MPI was our only option.

SMP2

SMP2, introduced in early 2010, uses a purely thread-based system, which fixed many of the problems associated with SMP1. MPI is no longer required, and we deprecated the original MPI-based v5 SMP client in favor of SMP2 since it is so much easier for donors to use. SMP2 also brings a new points scheme, which is summarized in the New Points FAQ.

SMP2 is now the basis for our current SMP methods, which are commonly used by default in the V7 client software.