The protein-folding problem, one of the major challenges of molecular biology in the 1990s, could be thought of as a version of cryptography. Scientists like Peter Kollman are the code breakers, trying to uncover a set of rules somehow embedded in a protein's sequence of amino acids. This chemical alphabet of 20 characters, strung like beads on a chain along the peptide backbone of a newborn protein, carries a blueprint that specifies the protein's mature folded shape.

Proteins are the action superheroes of the body. As enzymes, they make reactions go a million times faster. As versatile transport vehicles, they carry oxygen and antibodies to fight disease. They do a thousand different jobs, and with no complaint. But before a protein can go to work, it must fold into the right shape.

Within a matter of seconds, or less, after rolling off the protein assembly line (in the cellular ribosome), the stretched-out chain wraps into a bundle, with twists and turns, helices, sheets and other 3D features. For a protein, function follows from form: the grooves and crevices of its complex folds are what allow it to latch onto other molecules and carry out its biological role.

But what are the rules? How is it that a particular sequence of amino acids results in a particular folded shape? "The protein-folding problem is still the single most exciting problem in computational biochemistry," says Kollman, professor of pharmaceutical chemistry at the University of California, San Francisco, and a leader in using a research tool called "molecular dynamics," a method of computational simulation that tracks the minute shifts of a protein's structure over time. "To be able to predict the structure of the protein from just the amino-acid sequence would have tremendous impact in all of biotechnology and drug design."

In 1997, Kollman and his coworkers Yong Duan and Lu Wang used the CRAY T3D at Pittsburgh Supercomputing Center to develop molecular dynamics software that exploits parallel systems like the T3D and CRAY T3E much more effectively than before. Using this improved software on the T3D and, later, on a T3E at Cray Research, the researchers tracked the folding of a small protein in water for a full microsecond, as much as 100 times longer than previous simulations. The result is a more complete view of how one protein folds, in effect a look at what hasn't been seen before, and it offers precious new insight into the folding process.

A snapshot from simulations of the villin headpiece subdomain, by Peter Kollman, Yong Duan and Lu Wang.

Exploiting Massive Parallelism

Simulating a millionth of a second of protein movement may sound less than impressive, until you realize that the longest prior simulations of similar proteins extended only 10 to 20 nanoseconds (billionths of a second). The limitation holding back this critical work has been the tremendous computational demand of the simulations, which must account for interactions between each atom in a protein and all the other atoms and surrounding water molecules. To capture protein movement at a useful level of detail, the full set of these interactions must be recalculated every femtosecond (a millionth of a nanosecond) of protein time. Even with the most advanced systems, it's a daunting, costly computational challenge.
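To put the figures above in perspective, here is a minimal arithmetic sketch of how many force recalculations a trajectory requires. It uses only the numbers quoted in the text (a one-femtosecond update interval, 20 nanoseconds for the longest prior runs, one microsecond for the new run); the function and variable names are illustrative, not taken from any simulation package.

```python
FS_PER_NS = 1_000_000  # one nanosecond is a million femtoseconds


def force_evaluations(simulated_ns: int, timestep_fs: int = 1) -> int:
    """Count the force recalculations needed to cover simulated_ns of protein time."""
    return simulated_ns * FS_PER_NS // timestep_fs


prior_steps = force_evaluations(20)          # longest prior runs: 20 million steps
microsecond_steps = force_evaluations(1000)  # one microsecond: 1 billion steps
ratio = microsecond_steps // prior_steps     # how many times more steps

print(prior_steps, microsecond_steps, ratio)
```

At one step per femtosecond, a microsecond trajectory needs a billion recalculations of the interactions, which is why run length, rather than system size alone, dominates the cost.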

The Kollman team's recent effort focused on a small, 36-amino-acid protein called the villin headpiece subdomain. With surrounding water molecules, the computation involved about 12,000 atoms. Capturing the first 200 nanoseconds of folding took 40 days of dedicated computing using all 256 processors of the T3D. With a 256-processor T3E, four times faster, the next 800 nanoseconds took about another two months.
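The run times quoted above can be turned into a rough throughput estimate. The sketch below is simple arithmetic on the article's figures; the all-pairs count is an upper bound included only to convey scale (how the actual software limits the pairs it evaluates is not described here), and every name in the code is illustrative.

```python
def ns_per_day(simulated_ns: float, wall_days: float) -> float:
    """Simulated nanoseconds produced per day of wall-clock time."""
    return simulated_ns / wall_days


def all_pairs(n_atoms: int) -> int:
    """Number of distinct atom pairs if every atom interacts with every other."""
    return n_atoms * (n_atoms - 1) // 2


t3d_rate = ns_per_day(200, 40)     # T3D: 200 ns in 40 days
ideal_t3e_rate = 4 * t3d_rate      # assumes the full fourfold speedup is realized
ideal_t3e_days = 800 / ideal_t3e_rate
pairs = all_pairs(12_000)

print(t3d_rate, ideal_t3e_rate, ideal_t3e_days, pairs)
```

The idealized T3E estimate comes out at 40 days, somewhat less than the roughly two months reported; the source does not explain the difference, so the fourfold figure is best read as approximate.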

This big leap in folding simulation was made possible by Duan's ability to exploit the parallelism of 256 processors running simultaneously. Using Pittsburgh's T3D, he devised and tested software-doctoring manipulations to the molecular dynamics part of AMBER (a widely used package for modeling proteins and DNA, developed by Kollman's research group). The changes boosted single-processor performance about 70% and greatly improved the "load balancing" and communication among processors, resulting overall in 256-processor performance six times faster than before.

"The real challenge for molecular dynamics," says Kollman, "is to effectively use massively parallel machines. If you have 256 processors, you have to divide the computation so each processor does the same amount of work, which is difficult because each particle has to keep track of every other." Juggling information back and forth between processors and memory involves a high communication overhead, which has seriously limited the usefulness of parallel processing for these calculations. "Virtually all other molecular dynamics codes in the literature level off at 40, 50 or 60 processors," says Kollman. "In other words, even if you use all the processors, you don't get any faster because the communication among processors is rate limiting."

Duan broke key parts of the task (distribution of updated position coordinates for the atoms and "force collection" for the atom-to-atom interactions) into small pieces that each processor does on an as-needed basis. In earlier versions of AMBER, each processor kept a complete set of coordinates and forces. He also implemented a "spatial decomposition" scheme, breaking the entire protein and water system into rectangular blocks, each block assigned to a processor. This reduces redundant communications and "latency," the time required to open communication pathways between processors.
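The idea of spatial decomposition can be illustrated with a toy sketch. This is not AMBER code and does not reproduce Duan's implementation; it only shows, under assumed inputs (a rectangular simulation box, a processor grid, and a hypothetical `assign_blocks` helper), how atoms might be mapped to the processor that owns the block they sit in.

```python
from collections import defaultdict


def assign_blocks(positions, box, grid):
    """Map each atom index to the rank of the processor owning its block.

    positions: list of (x, y, z) coordinates
    box:       (Lx, Ly, Lz) edge lengths of the simulation box
    grid:      (nx, ny, nz) number of blocks along each axis
    """
    owners = defaultdict(list)
    for atom, pos in enumerate(positions):
        cell = []
        for coord, length, n in zip(pos, box, grid):
            # Wrap the coordinate into the box, then find its block index.
            idx = int((coord % length) / length * n)
            cell.append(min(idx, n - 1))  # guard against rounding at the edge
        ix, iy, iz = cell
        rank = (ix * grid[1] + iy) * grid[2] + iz  # flatten 3D index to a rank
        owners[rank].append(atom)
    return dict(owners)


# Four atoms in a 10 x 10 x 10 box split into 2 x 2 x 2 blocks (8 processors).
atoms = [(1.0, 1.0, 1.0), (9.0, 9.0, 9.0), (1.0, 9.0, 1.0), (1.5, 1.2, 0.5)]
owners = assign_blocks(atoms, box=(10.0, 10.0, 10.0), grid=(2, 2, 2))
print(owners)
```

Because nearby atoms land on the same or adjacent processors, each processor mainly needs coordinates from its neighbors rather than from the whole system. A grid such as 8 x 8 x 4 would give 256 blocks, one per processor, though the block layout actually used is not stated in the article.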

With these changes and others, the software showed significantly improved parallel "scaling": it now runs 170 times faster on 256 processors than on one alone. "This was a tour-de-force of parallel programming," says Kollman, "and it wouldn't have been possible except for Pittsburgh making the T3D available to us."
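The scaling figure can be expressed as a parallel efficiency. The sketch below uses only the reported speedup of 170 on 256 processors. The second function applies Amdahl's law to back out an effective non-parallel fraction; that is a simplifying assumption made here for illustration, since it lumps communication overhead and load imbalance into one number, and it is not an analysis from the article.

```python
def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_procs


def amdahl_serial_fraction(speedup: float, n_procs: int) -> float:
    """Effective serial fraction f implied by Amdahl's law,
    speedup = 1 / (f + (1 - f) / n_procs), solved for f."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


efficiency = parallel_efficiency(170, 256)
serial_fraction = amdahl_serial_fraction(170, 256)

print(f"efficiency: {efficiency:.1%}")
print(f"effective serial fraction: {serial_fraction:.2%}")
```

A speedup of 170 on 256 processors corresponds to about two-thirds of ideal linear scaling, which under the Amdahl assumption means only about 0.2 percent of the work behaves as if it were not parallel.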

"The dedicated T3D allowed me to conduct extensive tests on a variety of plausible schemes," says Duan, who cites training at a PSC parallel programming workshop and discussions with PSC scientist Michael Crowley as also being instrumental to his work on this project.

Radius of Gyration over Time
The vertical axis shows the protein's radius of gyration (in angstroms) as it changes over time. The "quiet period" occurs from about 250 to 400 nanoseconds.

Quiet Time for Protein Folding

What did the researchers learn from viewing a full simulated microsecond of protein folding? A burst of folding in the first 20 nanoseconds quickly collapses the unfolded structure, suggesting that initiation of folding for a small protein can occur within the first 100 nanoseconds. Over the first 200 nanoseconds, the protein moves back and forth between compact states and more unfolded forms. The researchers capture this behavior by plotting the protein's radius of gyration (how much the structure spreads out from its center) as a function of time. "If you look at those curves," notes Kollman, "they're very noisy; the structure is moving, wiggling and jiggling a lot."
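For readers unfamiliar with the measure, the radius of gyration is the mass-weighted root-mean-square distance of the atoms from the structure's center of mass. The following is a minimal sketch of that standard definition, with a hypothetical `radius_of_gyration` helper and made-up coordinates; it is not the analysis code used by the researchers.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of atoms from their center of mass.

    coords: list of (x, y, z) positions, e.g. in angstroms
    masses: optional list of atomic masses; equal masses if omitted
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass along each axis.
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(m * sum((c[k] - center[k]) ** 2 for k in range(3))
                      for m, c in zip(masses, coords))
    return math.sqrt(weighted_sq / total)


# Made-up example: the same four "atoms" stretched out versus bunched together.
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
rg_extended = radius_of_gyration(extended)
rg_compact = radius_of_gyration(compact)
print(rg_extended, rg_compact)
```

Computed for every saved frame of a trajectory, this single number falls when the chain collapses and rises when it unfolds, which is what makes it a convenient trace of folding over time.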

The folded structures, often called "molten globules," have 3D features, such as partially formed helices loosely packed together, that bear resemblance to the final folded form. They are only marginally stable, notes Kollman, and unfold again before settling into other folded structures.

This ribbon representation shows the protein's simulated structure (blue) during the marginally stable period (about 300 nanoseconds) in comparison to its native structure (red) as determined from NMR.

The next 800 nanoseconds reveal an intriguing "quiet period" in the folding. From about 250 nanoseconds until 400 nanoseconds, the fluctuating movement back and forth between globules and unfolding virtually ceases. "For this period in the later part of the trajectory," says Kollman, "everything becomes quiet. And that's where the structure gets closest to the native state. It's quite happy there for a while, then it eventually drifts off again for the rest of the period out to a microsecond."

For Kollman, this behavior suggests that folding may be characterized as a searching process. "It's a tantalizing idea: that the mechanism of protein folding is to bounce around until it finds something close, and stay there for a period, but if it isn't good enough it eventually leaves and keeps searching. It might have 10 of these quiet periods before it arrives at a period where enough of the amino-acid sidechains are in a good enough environment that it locks into the final structure."

Although only a partial glimpse (even the fastest proteins need 10 to 100 microseconds to fully fold), these results represent a major step forward in protein-folding simulation. New experimental methods are also providing more detailed looks at the process, offering the possibility of direct comparison between experiment and simulation, which will further advance understanding.

The challenge of the protein-folding problem is to predict protein structure more accurately. For the pharmaceutical industry, this holds the prospect of greatly reducing the cost of developing new therapeutic drugs. Recent research, furthermore, suggests that certain diseases, such as "mad cow disease" and possibly Alzheimer's, can be understood as malfunctions in protein folding.

Ultimately, for Kollman and others using molecular dynamics, the goal is to follow the entire folding process. With the promise of more powerful computing and higher level parallelism, Kollman sees that goal as within reach. "We're getting some new insights that weren't available before because the simulations weren't in the right time-scale. Being able to visualize the folding process of even a small protein in a realistic environment has been a goal of many researchers. We believe our work marks the beginning of a new era of the active participation of full-scale simulations in helping to understand the mechanism of protein folding."