Papers

Recent Results and Research Papers from Folding@home

INTRODUCTION

Here are our peer-reviewed results from Folding@home. For summaries of these methods and papers, as well as the scientific background behind Folding@home, please see the Folding@home article on Wikipedia.

For all of the papers from the Pande Lab (not just those from Folding@home), please see our group papers page. Note that it can take quite a while to go from a result to a published peer-reviewed article (often as much as a year). These papers represent our progress to date that’s publicly available, with lots more on the way.

The distribution rules for published papers vary by the publication in which the paper appears. Due to these rules, a public web source of each paper may not be immediately available. If the full version is not linked below or available elsewhere on the Internet (Google Scholar can be helpful for this), most, if not all, of these publications are freely available at a local municipal or college library. Note that these articles are written for fellow scientists, so the contents are fairly technical.

Biophys J. 2015 Oct 20;109(8):1528-32. doi: 10.1016/j.bpj.2015.08.015.
McGibbon RT, Beauchamp KA, Harrigan MP, Klein C, Swails JM, Hernández CX, Schwantes CR, Wang LP, Lane TJ, Pande VS
As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python.

Proc Natl Acad Sci U S A. 2015 Aug 18;112(33):10377-82. doi: 10.1073/pnas.1501804112. Epub 2015 Aug 3.
Weber JK, Shukla D, Pande VS
Life is fundamentally a nonequilibrium phenomenon. At the expense of dissipated energy, living things perform irreversible processes that allow them to propagate and reproduce. Within cells, evolution has designed nanoscale machines to do meaningful work with energy harnessed from a continuous flux of heat and particles. As dictated by the Second Law of Thermodynamics and its fluctuation theorem corollaries, irreversibility in nonequilibrium processes can be quantified in terms of how much entropy such dynamics produce. In this work, we seek to address a fundamental question linking biology and nonequilibrium physics: can the evolved dissipative pathways that facilitate biomolecular function be identified by their extent of entropy production in general relaxation processes? We here synthesize massive molecular dynamics simulations, Markov state models (MSMs), and nonequilibrium statistical mechanical theory to probe dissipation in two key classes of signaling proteins: kinases and G-protein-coupled receptors (GPCRs). Applying machinery from large deviation theory, we use MSMs constructed from protein simulations to generate dynamics conforming to positive levels of entropy production. We note the emergence of an array of peaks in the dynamical response (transient analogs of phase transitions) that draw the proteins between distinct levels of dissipation, and we see that the binding of ATP and agonist molecules modifies the observed dissipative landscapes. Overall, we find that dissipation is tightly coupled to activation in these signaling systems: dominant entropy-producing trajectories become localized near important barriers along known biological activation pathways. We go on to classify an array of equilibrium and nonequilibrium molecular switches that harmonize to promote functional dynamics.

J Chem Phys. 2015 Jul 21;143(3):034109. doi: 10.1063/1.4926516.
McGibbon RT, Pande VS.
Continuous-time Markov processes over finite state-spaces are widely used to model dynamical processes in many fields of natural and social science. Here, we introduce a maximum likelihood estimator for constructing such models from data observed at a finite time interval. This estimator is dramatically more efficient than prior approaches, enables the calculation of deterministic confidence intervals in all model parameters, and can easily enforce important physical constraints on the models such as detailed balance. We demonstrate and discuss the advantages of these models over existing discrete-time Markov models for the analysis of molecular dynamics simulations.
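As an illustration of the relationship the abstract above relies on, the sketch below builds a small continuous-time rate matrix K that satisfies detailed balance and converts it to a discrete-time transition matrix T(τ) = exp(Kτ). It is not the maximum likelihood estimator from the paper. The three-state system, the populations `pi`, the symmetric fluxes `sym_flux`, and all function names are hypothetical choices made for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20, squarings=8):
    """Approximate exp(m) with a Taylor series plus scaling and squaring."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms):
        # term now holds scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(m) = exp(m / 2**s) ** (2**s)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def reversible_rate_matrix(pi, sym_flux):
    """Build K with K[i][j] = sym_flux[i][j] / pi[i] for i != j.

    Because sym_flux is symmetric, pi[i] * K[i][j] == pi[j] * K[j][i],
    which is the detailed balance condition. Diagonals make rows sum to zero.
    """
    n = len(pi)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i][j] = sym_flux[i][j] / pi[i]
        K[i][i] = -sum(K[i])
    return K


def transition_matrix(K, lag_time):
    """Transition probabilities over one lag time: T = exp(K * lag_time)."""
    return mat_exp([[k * lag_time for k in row] for row in K])


# Hypothetical equilibrium populations and symmetric fluxes (assumed values).
pi = [0.5, 0.3, 0.2]
sym_flux = [[0.0, 0.06, 0.02],
            [0.06, 0.0, 0.03],
            [0.02, 0.03, 0.0]]
K = reversible_rate_matrix(pi, sym_flux)
T = transition_matrix(K, lag_time=2.0)
for row in T:
    print([round(x, 4) for x in row])
```

Each row of T sums to one, and T inherits detailed balance from K, which is one reason constraining the rate matrix directly can be convenient.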

We report the development of a united AMOEBA (uAMOEBA) polarizable water model, which is computationally 3-5 times more efficient than the three-site AMOEBA03 model in molecular dynamics simulations while providing comparable accuracy for gas-phase and liquid properties. In this coarse-grained polarizable water model, both electrostatic (permanent and induced) and van der Waals representations have been reduced to a single site located at the oxygen atom. The permanent charge distribution is described via the molecular dipole and quadrupole moments and the many-body polarization via an isotropic molecular polarizability, all located at the oxygen center. Similarly, a single van der Waals interaction site is used for each water molecule. Hydrogen atoms are retained only for the purpose of defining local frames for the molecular multipole moments and intramolecular vibrational modes. The parameters have been derived based on a combination of ab initio quantum mechanical and experimental data set containing gas-phase cluster structures and energies, and liquid thermodynamic properties. For validation, additional properties including dimer interaction energy, liquid structures, self-diffusion coefficient, and shear viscosity have been evaluated. The results demonstrate good transferability from the gas to the liquid phase over a wide range of temperatures, and from nonpolar to polar environments, due to the presence of molecular polarizability. The water coordination, hydrogen-bonding structure, and dynamic properties given by uAMOEBA are similar to those derived from the all-atom AMOEBA03 model and experiments. Thus, the current model is an accurate and efficient alternative for modeling water.

Comput Sci Eng. 2015 Jul 1;12(4):34-39.
Eastman P, Pande VS

The wide diversity of computer architectures today requires a new approach to software development. OpenMM is a framework for molecular mechanics simulations, allowing a single program to run efficiently on a variety of hardware platforms.

Recent successes in simulating protein structure and folding dynamics have demonstrated the power of molecular dynamics to predict the long timescale behaviour of proteins. Here, we extend and improve these methods to predict molecular switches that characterize conformational change pathways between the active and inactive state of nitrogen regulatory protein C (NtrC). By employing unbiased Markov state model-based molecular dynamics simulations, we construct a dynamic picture of the activation pathways of this key bacterial signalling protein that is consistent with experimental observations and predicts new mutants that could be used for validation of the mechanism. Moreover, these results suggest a novel mechanistic paradigm for conformational switching.

G-protein-coupled receptors (GPCRs) are a versatile family of membrane-bound signaling proteins. Despite the recent successes in obtaining crystal structures of GPCRs, much needs to be learned about the conformational changes associated with their activation. Furthermore, the mechanism by which ligands modulate the activation of GPCRs has remained elusive. Molecular simulations provide a way of obtaining a detailed atomistic description of GPCR activation dynamics. However, simulating GPCR activation is challenging due to the long timescales involved and the associated challenge of gaining insights from the “Big” simulation datasets. Here, we demonstrate how cloud-computing approaches have been used to tackle these challenges and obtain insights into the activation mechanism of GPCRs. In particular, we review the use of Markov state model (MSM)-based sampling algorithms for sampling milliseconds of dynamics of a major drug target, the G-protein-coupled receptor β2-AR. MSMs of agonist and inverse agonist-bound β2-AR reveal multiple activation pathways and how ligands function via modulation of the ensemble of activation pathways. We target this ensemble of conformations with computer-aided drug design approaches, with the goal of designing drugs that interact more closely with diverse receptor states, for overall increased efficacy and specificity. We conclude by discussing how cloud-based approaches present a powerful and broadly available tool for routinely studying complex biological systems.

Markov state models are a widely used method for approximating the eigenspectrum of the molecular dynamics propagator, yielding insight into the long-timescale statistical kinetics and slow dynamical modes of biomolecular systems. However, the lack of a unified theoretical framework for choosing between alternative models has hampered progress, especially for non-experts applying these methods to novel biological systems. Here, we consider cross-validation with a new objective function for estimators of these slow dynamical modes, a generalized matrix Rayleigh quotient (GMRQ), which measures the ability of a rank-m projection operator to capture the slow subspace of the system. It is shown that a variational theorem bounds the GMRQ from above by the sum of the first m eigenvalues of the system’s propagator, but that this bound can be violated when the requisite matrix elements are estimated subject to statistical uncertainty. This overfitting can be detected and avoided through cross-validation. These results make it possible to construct Markov state models for protein dynamics in a way that appropriately captures the tradeoff between systematic and statistical errors.
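For readers who want the bound in symbols, one common way to write a generalized matrix Rayleigh quotient is sketched below. The notation is an assumption made here rather than taken from the abstract: A is an n × m matrix whose columns are trial functions expanded in an n-function basis, C is the time-lagged correlation matrix in that basis, S is the overlap matrix, and λ_1 ≥ λ_2 ≥ … are the propagator eigenvalues.

```latex
\mathcal{R}(A) \;=\; \operatorname{Tr}\!\left[ A^{\top} C A \left( A^{\top} S A \right)^{-1} \right],
\qquad
\mathcal{R}(A) \;\le\; \sum_{i=1}^{m} \lambda_i .
```

The inequality holds when C and S are known exactly. When they are estimated from finite data, a model scored on its own training data can appear to exceed the bound, which is the overfitting that cross-validation is meant to detect.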

A set of improved parameters for the AMOEBA polarizable atomic multipole water model is developed. An automated procedure, ForceBalance, is used to adjust model parameters to enforce agreement with ab initio-derived results for water clusters and experimental data for a variety of liquid phase properties across a broad temperature range. The values reported here for the new AMOEBA14 water model represent a substantial improvement over the previous AMOEBA03 model. The AMOEBA14 model accurately predicts the temperature of maximum density and qualitatively matches the experimental density curve across temperatures from 249 to 373 K. Excellent agreement is observed for the AMOEBA14 model in comparison to experimental properties as a function of temperature, including the second virial coefficient, enthalpy of vaporization, isothermal compressibility, thermal expansion coefficient, and dielectric constant. The viscosity, self-diffusion constant, and surface tension are also well reproduced. In comparison to high-level ab initio results for clusters of 2-20 water molecules, the AMOEBA14 model yields results similar to AMOEBA03 and the direct polarization iAMOEBA models. With advances in computing power, calibration data, and optimization techniques, we recommend the use of the AMOEBA14 water model for future studies employing a polarizable water model.

Protein function is inextricably linked to protein dynamics. As we move from a static structural picture to a dynamic ensemble view of protein structure and function, novel computational paradigms are required for observing and understanding conformational dynamics of proteins and its functional implications. In principle, molecular dynamics simulations can provide the time evolution of atomistic models of proteins, but the long time scales associated with functional dynamics make it difficult to observe rare dynamical transitions. The issue of extracting essential functional components of protein dynamics from noisy simulation data presents another set of challenges in obtaining an unbiased understanding of protein motions. Therefore, a methodology that provides a statistical framework for efficient sampling and a human-readable view of the key aspects of functional dynamics from data analysis is required. The Markov state model (MSM), which has recently become popular worldwide for studying protein dynamics, is an example of such a framework.
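To make the idea of a Markov state model concrete, here is a minimal sketch that estimates a transition matrix from a single discretized trajectory by counting transitions at a fixed lag and normalizing each row. It assumes the conformations have already been clustered into integer state labels. The toy trajectory and the function names are hypothetical, and the naive row normalization used here does not enforce detailed balance as production MSM estimators typically do.

```python
def count_transitions(trajectory, n_states, lag=1):
    """Count observed i -> j transitions separated by `lag` frames (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(trajectory[:-lag], trajectory[lag:]):
        counts[i][j] += 1
    return counts


def row_normalize(counts):
    """Turn a count matrix into transition probabilities, row by row."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no observed outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(T, n_iter=1000):
    """Estimate equilibrium populations by repeatedly applying p <- p T."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p


# Toy state sequence (assumed data): each entry is the state at one frame.
trajectory = [0, 0, 1, 1, 1, 0, 0, 2, 2, 1, 0, 0]
counts = count_transitions(trajectory, n_states=3, lag=1)
T = row_normalize(counts)
populations = stationary_distribution(T)
print(counts)
print([round(p, 4) for p in populations])
```

In practice many short trajectories are pooled into one count matrix, which is what lets independent simulations run in parallel and still contribute to a single model.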

Lawrenz M, Shukla D, Pande VS. Sci Rep. 2015; 5: 7918.

We describe an innovative protocol for ab initio prediction of ligand crystallographic binding poses and highly effective analysis of large datasets generated for protein-ligand dynamics. We include a procedure for setup and performance of distributed molecular dynamics simulations on cloud computing architectures, a model for efficient analysis of simulation data, and a metric for evaluation of model convergence. We give accurate binding pose predictions for five ligands ranging in affinity from 7 nM to > 200 μM for the immunophilin protein FKBP12, for expedited results in cases where experimental structures are difficult to produce. Our approach goes beyond single, low energy ligand poses to give quantitative kinetic information that can inform protein engineering and ligand design.

D. Shukla, Y. L. Meng, B. Roux, and V. S. Pande. Nat Commun 5 (2014)

Unregulated activation of Src kinases leads to aberrant signalling, uncontrolled growth and differentiation of cancerous cells. Reaching a complete mechanistic understanding of large-scale conformational transformations underlying the activation of kinases could greatly help in the development of therapeutic drugs for the treatment of these pathologies. In principle, the nature of conformational transition could be modelled in silico via atomistic molecular dynamics simulations, although this is very challenging because of the long activation timescales. Here we employ a computational paradigm that couples transition pathway techniques and Markov state model-based massively distributed simulations for mapping the conformational landscape of c-src tyrosine kinase. The computations provide the thermodynamics and kinetics of kinase activation for the first time, and help identify key structural intermediates. Furthermore, the presence of a novel allosteric site in an intermediate state of c-src that could be potentially used for drug design is predicted.

Markov state models provide a powerful framework for the analysis of biomolecular conformation dynamics in terms of their metastable states and transition rates. These models provide both a quantitative and comprehensible description of the long-time scale dynamics of large molecular dynamics with a Master equation and have been successfully used to study protein folding, protein conformational change, and protein-ligand binding. However, to achieve satisfactory performance, existing methodologies often require expert intervention when defining the model’s discrete state space. While standard model selection methodologies focus on the minimization of systematic bias and disregard statistical error, we show that by consideration of the states’ conditional distribution over conformations, both sources of error can be balanced evenhandedly. Application of techniques that consider both systematic bias and statistical error on two 100 μs molecular dynamics trajectories of the Fip35 WW domain shows agreement with existing techniques based on self-consistency of the model’s relaxation time scales with more suitable results in regimes in which those time scale-based techniques encourage overfitting. By removing the need for expert tuning, these methods should reduce modeling bias and lower the barriers to entry in Markov state model construction.

Developing an understanding of protein misfolding processes presents a crucial challenge for unlocking the mysteries of human disease. In this article, we present our observations of β-sheet-rich misfolded states on a number of protein dynamical landscapes investigated through molecular dynamics simulation and Markov state models. We employ a nonequilibrium statistical mechanical theory to identify the glassy states in a protein’s dynamics, and we discuss the nonnative, β-sheet-rich states that play a distinct role in the slowest dynamics within seven protein folding systems. We highlight the fundamental similarity between these states and the amyloid structures responsible for many neurodegenerative diseases, and we discuss potential consequences for mechanisms of protein aggregation and intermolecular amyloid formation.

The folding mechanism of the N-terminal domain of ribosomal protein L9 (NTL9(1–39)) is studied using temperature-jump (T-jump) amide I′ two-dimensional infrared (2D IR) spectroscopy in combination with spectral simulations based on a Markov state model (MSM) built from millisecond-long molecular dynamics trajectories. The results provide evidence for a compact well-structured folded state and a heterogeneous fast-exchanging denatured state ensemble exhibiting residual secondary structure. The folding rate of 26.4 μs−1 (at 80°C), extracted from the T-jump response of NTL9(1–39), compares favorably with the 18 μs−1 obtained from the MSM. Structural decomposition of the MSM and analysis along the folding coordinate indicates that helix-formation nucleates the global folding. Simulated difference spectra, corresponding to the global folding transition of the MSM, are in qualitative agreement with measured T-jump 2D IR spectra. The experiments demonstrate the use of T-jump 2D IR spectroscopy as a valuable tool for studying protein folding, with direct connections to simulations. The results suggest that in addition to predicting the correct native structure and folding time constant, molecular dynamics simulations carried out with modern force fields provide an accurate description of folding mechanisms in small proteins.

We applied methods developed and honed on Folding@home to Google Exacycle (Google’s massively parallel cloud resource, a lot like running Folding@home behind their firewall). The resources that Google donated to PG/Folding@home were pretty massive, allowing us to tackle a really significant and challenging problem in biology and drug design.

Specifically, we were able to study the protein dynamics of B2AR, a G-Protein Coupled Receptor (GPCR) important in many medical and biological processes (especially asthma and heart disease). What’s even more exciting to us is that this opens the door for FAH and our methods to be much more broadly applied. In fact, this is just the first of several papers in the pipeline using FAH and FAH-like methods to tackle challenging biomedical problems.

Building Markov State Models (MSMs) is at the heart of how FAH can take advantage of hundreds of thousands of processors in a very efficient way. This paper describes a new scheme for building MSMs more accurately, especially for conformational change, which is relevant to cancer and other areas of drug design.

Building Markov State Models (MSMs) is at the heart of how FAH can take advantage of hundreds of thousands of processors in a very efficient way. This paper describes a new scheme for building MSMs more accurately, especially when we have to take water into account. Structured water molecules in the binding sites of proteins are believed to be very relevant for drug design applications, and we can now naturally integrate them into our MSMs.

The third paper on improved MSM design, this paper shows how modern MSM approaches can now pick up delicate effects that previous MSM approaches could not see.

J. K. Weber, R. Jack, and V. S. Pande. JACS, in press (2013)

With all of our work in protein folding, we can step back and ask questions of what’s common, especially for protein misfolding. This paper is a start in this direction, using a new method to probe our models. This is just the tip of the iceberg as in later papers (under review), we will show how this approach can give insight into protein misfolding, giving new hypotheses for how protein misfolding (relevant in many diseases, such as Alzheimer’s and Parkinson’s) could arise.

FIGURE.
The folding times accessible by simulation have increased exponentially over the past decade. Shown are all protein folding simulations conducted using unbiased, all-atom MD in empirical force-fields reported in the literature. Some folding times for the same protein differ, due to various mutations. FAH results are in blue; results from Shaw’s Anton supercomputer are in red.

SUMMARY.
This is a review of protein folding achievements from Folding@home and other researchers. Our findings demonstrate that Folding@home is capable of simulating large, complex, and slow-folding proteins, beyond the capabilities of other systems, including the specialized hardware in the supercomputer from David Shaw’s DESRES group.

ABSTRACT.
Quantitatively accurate all-atom molecular dynamics (MD) simulations of protein folding have long been considered a holy grail of computational biology. Due to the large system sizes and long timescales involved, such a pursuit was for many years computationally intractable. Further, sufficiently accurate forcefields needed to be developed in order to realistically model folding. This decade, however, saw the first reports of folding simulations describing kinetics on the order of milliseconds, placing many proteins firmly within reach of these methods. Progress in sampling and forcefield accuracy, however, presents a new challenge: how to turn huge MD datasets into scientific understanding. Here, we review recent progress in MD simulation techniques and show how the vast datasets generated by such techniques present new challenges for analysis. We critically discuss the state of the art, including reaction coordinate and Markov state model (MSM) methods, and provide a perspective for the future.

SUMMARY.
This paper discusses OpenMM 4, a powerful and adaptable library for molecular dynamics that is one of the key components behind our current GPU FahCores. A more recent release, OpenMM 5, powers the upcoming FahCore 17 (Zeta). We are looking forward to OpenMM 5.1, which should be a big release from the user’s perspective and offers a lot of exciting scientific features.

ABSTRACT.
OpenMM is a software toolkit for performing molecular simulations on a range of high performance computing architectures. It is based on a layered architecture: the lower layers function as a reusable library that can be invoked by any application, while the upper layers form a complete environment for running molecular simulations. The library API hides all hardware-specific dependencies and optimizations from the users and developers of simulation programs: they can be run without modification on any hardware on which the API has been implemented. The current implementations of OpenMM include support for graphics processing units using the OpenCL and CUDA frameworks. In addition, OpenMM was designed to be extensible, so new hardware architectures can be accommodated and new functionality (e.g., energy terms and integrators) can be easily added.

ABSTRACT.
Markov state models (MSMs) for the study of biomolecule folding simulations have emerged as a powerful tool for computational study of folding dynamics. MSMExplorer is a visualization application purpose-built to visualize these MSMs with an aim to increase the efficacy and reach of MSM science.

SUMMARY.
In this paper we have initiated a study to build Markov state models for molecular dynamical systems with solvent degrees of freedom. The methods we described should also be broadly applicable to a wide range of biomolecular simulation analyses.

ABSTRACT.
Markov state models have been widely used to study conformational changes of biological macromolecules. These models are built from short timescale simulations and then propagated to extract long timescale dynamics. However, the solvent information in molecular simulations is often ignored in current methods, because of the large number of solvent molecules in a system and the indistinguishability of solvent molecules upon their exchange.

We present a solvent signature that compactly summarizes the solvent distribution in the high-dimensional data, and then define a distance metric between different configurations using this signature. We next incorporate the solvent information into the construction of Markov state models and present a fast geometric clustering algorithm which combines both the solute-based and solvent-based distances.

SUMMARY.
This paper details some new results from Folding@home on Alzheimer’s disease.

ABSTRACT.
Amyloid beta (Aβ) peptide plays an important role in Alzheimer’s disease. A number of mutations in the Aβ sequence lead to familial Alzheimer’s disease, congophilic amyloid angiopathy, or hereditary cerebral hemorrhage with amyloid. Using molecular dynamics simulations of ∼200 μs for each system, we characterize and contrast the consequences of four pathogenic mutations (Italian, Dutch, Arctic, and Iowa) for the structural ensemble of the Aβ monomer. The four familial mutations are found to have distinct consequences for the monomer structure.

SUMMARY.
This is a continuation of paper #100, “Exploiting a natural conformational switch to engineer an interleukin-2 ‘superkine’”. Our results provide new insights for the development of new specific therapeutics based on interleukin.

ABSTRACT.
Interleukin 15 (IL-15) and IL-2 have distinct immunological functions even though both signal through the receptor subunit IL-2Rβ and the common γ-chain (γ(c)). Here we found that in the structure of the IL-15-IL-15Rα-IL-2Rβ-γ(c) quaternary complex, IL-15 binds to IL-2Rβ and γ(c) in a heterodimer nearly indistinguishable from that of the IL-2-IL-2Rα-IL-2Rβ-γ(c) complex, despite their different receptor-binding chemistries. IL-15Rα substantially increased the affinity of IL-15 for IL-2Rβ, and this allostery was required for IL-15 trans signaling. Consistent with their identical IL-2Rβ-γ(c) dimer geometries, IL-2 and IL-15 showed similar signaling properties in lymphocytes, with any differences resulting from disparate receptor affinities. Thus, IL-15 and IL-2 induced similar signals, and the cytokine specificity of IL-2Rα versus IL-15Rα determined cellular responsiveness. Our results provide new insights for the development of specific immunotherapeutics based on IL-15 or IL-2.

SUMMARY.
Here we use molecular dynamics and chemical informatics methods to better understand sodium channels. This is important as it has applications to developing pain medicines, whose functionality relies on interacting with these channels.

ABSTRACT.
Human nociceptive voltage-gated sodium channel (Na(v)1.7), a target of significant interest for the development of antinociceptive agents, is blocked by low nanomolar concentrations of (-)-tetrodotoxin (TTX) but not (+)-saxitoxin (STX) and (+)-gonyautoxin-III (GTX-III). These findings question the long-accepted view that the 1.7 isoform is both tetrodotoxin- and saxitoxin-sensitive and identify the outer pore region of the channel as a possible target for the design of Na(v)1.7-selective inhibitors. Single- and double-point amino acid mutagenesis studies along with whole-cell electrophysiology recordings establish two domain III residues (T1398 and I1399), which occur as methionine and aspartate in other Na(v) isoforms, as critical determinants of STX and gonyautoxin-III binding affinity. An advanced homology model of the Na(v) pore region is used to provide a structural rationalization for these surprising results.

SUMMARY.
Markov State Models (MSMs) have been very helpful in analyzing protein folding. Here they yield new insight into simplified models of protein folding.

ABSTRACT.
Markov state models constructed from molecular dynamics simulations have recently shown success at modeling protein folding kinetics. Here we introduce two methods, flux PCCA+ (FPCCA+) and sliding constraint rate estimation (SCRE), that allow accurate rate models from protein folding simulations. We apply these techniques to fourteen massive simulation datasets generated by Anton and Folding@home. Our protocol quantitatively identifies the suitability of describing each system using two-state kinetics and predicts experimentally detectable deviations from two-state behavior. An analysis of the villin headpiece and FiP35 WW domain detects multiple native substates that are consistent with experimental data. Applying the same protocol to GTT, NTL9, and protein G suggests that some beta containing proteins can form long-lived native-like states with small register shifts. Even the simplest protein systems show folding and functional dynamics involving three or more states.

SUMMARY.
This is a major, major paper on protein folding. Acyl-CoA binding protein (ACBP) is the current benchmark for the slowest folding protein (approximately 10 milliseconds) and the most complex (contains 86 amino acids). We worked closely with experimentalists to test and connect our work to their experiments.

ABSTRACT.
Protein folding is a fundamental process in biology, key to understanding many human diseases. Experimentally, proteins often appear to fold via simple two- or three-state mechanisms involving mainly native-state interactions, yet recent network models built from atomistic simulations of small proteins suggest the existence of many possible metastable states and folding pathways. We reconcile these two pictures in a combined experimental and simulation study of acyl-coenzyme A binding protein (ACBP), a two-state folder (folding time ~10 ms) exhibiting residual unfolded-state structure, and a putative early folding intermediate. Using single-molecule FRET in conjunction with side-chain mutagenesis, we first demonstrate that the denatured state of ACBP at near-zero denaturant is unusually compact and enriched in long-range structure that can be perturbed by discrete hydrophobic core mutations. We then employ ultrafast laminar-flow mixing experiments to study the folding kinetics of ACBP on the microsecond time scale. These studies, along with Trp-Cys quenching measurements of unfolded-state dynamics, suggest that unfolded-state structure forms on a surprisingly slow (~100 μs) time scale, and that sequence mutations strikingly perturb both time-resolved and equilibrium smFRET measurements in a similar way. A Markov state model (MSM) of the ACBP folding reaction, constructed from over 30 ms of molecular dynamics trajectory data, predicts a complex network of metastable states, residual unfolded-state structure, and kinetics consistent with experiment but no well-defined intermediate preceding the main folding barrier. Taken together, these experimental and simulation results suggest that the previously characterized fast kinetic phase is not due to formation of a barrier-limited intermediate but rather to a more heterogeneous and slow acquisition of unfolded-state structure.

SUMMARY.
It has long been known that a protein called IL-2 can help stimulate an immune response to fight AIDS or cancer, so in theory giving people with diseases like immune deficiencies IL-2 could be tremendously helpful. In practice, however, giving them IL-2 often leads to severe heart and lung problems. To find a better solution, collaborators at Stanford designed a variant of IL-2 that can stimulate an immune response without causing any side effects. However, they couldn’t understand how it worked because the two proteins had almost identical structures! Using Folding@home, we showed that IL-2 is a relatively floppy protein while our collaborators’ variant is locked into a structure that is poised to stimulate an immune response.

ABSTRACT.
The immunostimulatory cytokine interleukin-2 (IL-2) is a growth factor for a wide range of leukocytes, including T cells and natural killer (NK) cells. Considerable effort has been invested in using IL-2 as a therapeutic agent for a variety of immune disorders ranging from AIDS to cancer. However, adverse effects have limited its use in the clinic. On activated T cells, IL-2 signals through a quaternary ‘high affinity’ receptor complex consisting of IL-2, IL-2Rα (termed CD25), IL-2Rβ and IL-2Rγ. Naive T cells express only a low density of IL-2Rβ and IL-2Rγ, and are therefore relatively insensitive to IL-2, but acquire sensitivity after CD25 expression, which captures the cytokine and presents it to IL-2Rβ and IL-2Rγ. Here, using in vitro evolution, we eliminated the functional requirement of IL-2 for CD25 expression by engineering an IL-2 ‘superkine’ (also called super-2) with increased binding affinity for IL-2Rβ. Crystal structures of the IL-2 superkine in free and receptor-bound forms showed that the evolved mutations are principally in the core of the cytokine, and molecular dynamics simulations indicated that the evolved mutations stabilized IL-2, reducing the flexibility of a helix in the IL-2Rβ binding site, into an optimized receptor-binding conformation resembling that when bound to CD25. The evolved mutations in the IL-2 superkine recapitulated the functional role of CD25 by eliciting potent phosphorylation of STAT5 and vigorous proliferation of T cells irrespective of CD25 expression. Compared to IL-2, the IL-2 superkine induced superior expansion of cytotoxic T cells, leading to improved antitumour responses in vivo, and elicited proportionally less expansion of T regulatory cells and reduced pulmonary oedema. Collectively, we show that in vitro evolution has mimicked the functional role of CD25 in enhancing IL-2 potency and regulating target cell specificity, which has implications for immunotherapy.

ABSTRACT.
The aggregation of amyloid beta (Aβ) peptides plays an important role in the development of Alzheimer’s disease. Despite extensive effort, it has been difficult to characterize the secondary and tertiary structure of the Aβ monomer, the starting point for aggregation, due to its hydrophobicity and high aggregation propensity. Here, we employ extensive molecular dynamics simulations with atomistic protein and water models to determine structural ensembles for Aβ(42), Aβ(40), and Aβ(42)-E22K (the Italian mutant) monomers in solution. Sampling of a total of >700 microseconds in all-atom detail with explicit solvent enables us to observe the effects of peptide length and a pathogenic mutation on the disordered Aβ monomer structural ensemble. Aβ(42) and Aβ(40) have crudely similar characteristics but reducing the peptide length from 42 to 40 residues reduces β-hairpin formation near the C-terminus. The pathogenic Italian E22K mutation induces helix formation in the region of residues 20-24. This structural alteration may increase helix-helix interactions between monomers, resulting in altered mechanism and kinetics of Aβ oligomerization.

ABSTRACT.
A simple model is presented that describes general features of protein folding, in good agreement with experimental results and detailed all-atom simulations. Starting from microscopic physics, and with no free parameters, this model predicts that protein folding occurs remarkably quickly because native-like states are kinetic hubs. A hub-like network arises naturally out of microscopic physical concerns, specifically the kinetic longevity of native contacts during a search of globular conformations. The model predicts folding times scaling as τ_f ∼ e^(ξN) in the number of residues N, but because the model shows ξ is small, the folding times are much faster than Levinthal’s approximation. Importantly, the folding time scale is found to be small due to the topology and structure of the network. We show explicitly how our model agrees with generic experimental features of the folding process, including the scaling of τ_f with N, two-state thermodynamics, a sharp peak in C_V, and native-state fluctuations.
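To make the scaling argument concrete, the sketch below compares an exponential folding time with a small exponent against a Levinthal-style exhaustive search, which is also exponential in N but with a much larger exponent (ln 3 ≈ 1.1 if each residue has three conformations). The value of ξ, the three conformations per residue, the unit prefactor, and the function names are all illustrative assumptions, not numbers or code from the paper.

```python
import math


def exponential_folding_time(n_residues, xi, prefactor=1.0):
    """Folding time under the scaling tau_f ~ prefactor * exp(xi * N)."""
    return prefactor * math.exp(xi * n_residues)


def levinthal_time(n_residues, states_per_residue=3, prefactor=1.0):
    """Exhaustive-search estimate: each residue independently samples
    `states_per_residue` conformations, giving k**N conformations in total."""
    return prefactor * states_per_residue ** n_residues


if __name__ == "__main__":
    xi = 0.1  # hypothetical small exponent, chosen only for illustration
    for n in (20, 50, 100):
        ratio = levinthal_time(n) / exponential_folding_time(n, xi)
        print(f"N={n}: Levinthal estimate / model estimate = {ratio:.3e}")
```

Both estimates grow exponentially with chain length; the point of the comparison is that a small ξ keeps the growth modest over the range of N relevant to real proteins.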

SUMMARY.
These results have been a long time in coming and in many ways represent a major achievement for Folding@home (FAH) in general, demonstrating that the approach we started 10 years ago can make significant steps forward in our long term goals.

Specifically, our long term goals have been to 1) develop new methods to tackle the computational challenges of simulating protein folding; 2) apply these methods to gain new insights into protein folding; 3) use these methods and new insights to simulate Aβ protein misfolding, a key process in the toxicity of Alzheimer’s Disease (AD); and finally 4) use those simulations to develop new small molecule drug candidates for AD. In the early years of FAH, we concentrated on the first two goals above. In the last 5-7 years, we have worked to accomplish the third goal. I’m now very excited to report our progress on the last goal: using FAH for the development of new therapeutic strategies for AD.

The next steps, now underway in our lab, are to take this lead compound and help push it towards a viable drug. It’s too early to report on our preliminary results there (I like to only talk publicly about work after it’s passed through peer review), but I’m very excited that the directions set out in this paper do appear to be bearing fruit in terms of a viable drug (not just a drug candidate).

ABSTRACT.
Drug design studies targeting one of the primary toxic agents in Alzheimer’s disease, soluble oligomers of amyloid β-protein (Aβ), have been complicated by the rapid, heterogeneous aggregation of Aβ and the resulting difficulty to structurally characterize the peptide. To address this, we have developed [Nle35, d-Pro37]Aβ42, a substituted peptide inspired from molecular dynamics simulations which forms structures stable enough to be analyzed by NMR. We report herein that [Nle35, d-Pro37]Aβ42 stabilizes the trimer and prevents mature fibril and β-sheet formation. Further, [Nle35, d-Pro37]Aβ42 interacts with WT Aβ42 and reduces aggregation levels and fibril formation in mixtures. Using ligand-based drug design based on [Nle35, d-Pro37]Aβ42, a lead compound was identified with effects on inhibition similar to the peptide. The ability of [Nle35, d-Pro37]Aβ42 and the compound to inhibit the aggregation of Aβ42 provides a novel tool to study the structure of Aβ oligomers. More broadly, our data demonstrate how molecular dynamics simulation can guide experiment for further research into AD.

SUMMARY.
It is believed that Alzheimer’s Disease results from the misfolding of the Abeta peptide. Understanding how Abeta misfolds could give us some key insights into how to cure Alzheimer’s Disease. This paper experimentally tests a key prediction made in an earlier paper (paper #58: “Simulating oligomerization at experimental concentrations and long timescales: A Markov state model approach” by Nicholas W. Kelley, V. Vishal, Grant A. Krafft, and Vijay S. Pande. J. Chem. Phys. 129, 214707 (2008); DOI:10.1063/1.3010881). In this paper, we show experimentally that there appears to be a beta turn in Abeta, as predicted. This leads to a very stable form of misfolded Abeta which could be used as a starting point for a new Alzheimer’s therapy. We are heavily pursuing this research direction at the moment.

ABSTRACT.
Enhanced production of a 42-residue beta amyloid peptide (Aβ42) in affected parts of the brain has been suggested to be the main causative factor for the development of Alzheimer’s Disease (AD). The severity of the disease depends not only on the amount of the peptide but also its conformational transition leading to the formation of oligomeric amyloid-derived diffusible ligands (ADDLs) in the brain of AD patients. Despite being significant to the understanding of AD mechanism, no atomic-resolution structures are available for these species due to the evanescent nature of ADDLs that hinders most structural biophysical investigations. Based on our molecular modeling and computational studies, we have designed Met35Nle and G37p mutations in the Aβ42 peptide (Aβ42Nle35p37) that appear to organize Aβ42 into stable oligomers. 2D NMR on the Aβ42Nle35p37 peptide revealed the occurrence of two β-turns in the V24-N27 and V36-V39 stretches that could be the possible cause for the oligomer stability. We did not observe corresponding NOEs for the V24-N27 turn in the Aβ21–43Nle35p37 fragment suggesting the need for the longer length amyloid peptide to form the stable oligomer promoting conformation. Because of the presence of two turns in the mutant peptide which were absent in solid state NMR structures for the fibrils, we propose, fibril formation might be hindered. The biophysical information obtained in this work could aid in the development of structural models for toxic oligomer formation that could facilitate the development of therapeutic approaches to AD.

Biomolecular simulation is a core application on supercomputers, but it is exceptionally difficult to achieve the strong scaling necessary to reach biologically relevant timescales. Here, we present a new paradigm for parallel adaptive molecular dynamics and a publicly available implementation: Copernicus. This framework combines performance-leading molecular dynamics parallelized on three levels (SIMD, threads, and message-passing) with kinetic clustering, statistical model building and real-time result monitoring.

Copernicus enables execution as single parallel jobs with automatic resource allocation. Even for a small protein such as villin (9,864 atoms), Copernicus exhibits near-linear strong scaling from 1 to 5,376 AMD cores. Starting from extended chains we observe structures 0.6 Å from the native state within 30 h, and achieve sufficient sampling to predict the native state without a priori knowledge after 80-90 h. To match Copernicus’ efficiency, a classical simulation would have to exceed 50 μs per day, currently infeasible even with custom hardware designed for simulations.

SUMMARY.
We are constantly honing our methods to improve Folding@home’s ability to predict the behavior of proteins. This paper demonstrates the current state of the art in terms of both sampling and analysis. When compared to detailed TTET experiments, we show that our methods can tease out even fairly detailed aspects of folding. However, we also see the ways in which our models are not perfect, suggesting how we can improve our methods even further.

ABSTRACT.
As the fastest folding protein, the villin headpiece (HP35) serves as an important bridge between simulation and experimental studies of protein folding. Despite the simplicity of this system, experiments continue to reveal a number of surprises, including structure in the unfolded state and complex equilibrium dynamics near the native state. Using 2.5 ms of molecular dynamics and Markov state models, we connect to current experimental results in three ways. First, we present and validate a novel method for the quantitative prediction of triplet–triplet energy transfer experiments. Second, we construct a many-state model for HP35 that is consistent with previous experiments. Finally, we predict contact-formation time traces for all 1,225 possible triplet–triplet energy transfer experiments on HP35.

ABSTRACT.
A common theme of studies using molecular simulation is a necessary compromise between computational efficiency and resolution of the force field that is used. Significant efforts have been directed at combining multiple levels of granularity within a single simulation in order to maintain the efficiency of coarse-grained models, while using finer resolution in regions where such details are expected to play an important role. A specific example of this paradigm is the development of hybrid solvent models, which explicitly sample the solvent degrees of freedom within a specified domain while utilizing a continuum description elsewhere. Unfortunately, these models are complicated by the presence of structural artifacts at or near the explicit/implicit boundary. The presence of these artifacts significantly complicates the use of such models, both undermining the accuracy obtained and necessitating the parameterization of effective potentials to counteract the artificial interactions. In this work, we introduce a novel hybrid solvent model that employs a smoothly decoupled particle interface (SDPI), a switching region that gradually transitions from fully interacting particles to a continuum solvent. The resulting SDPI model allows for the use of an implicit solvent model based on a simple theory that needs to only reproduce the behavior of bulk solvent rather than the more complex features of local interactions. In this study, the SDPI model is tested on spherical hybrid domains using a coarse-grained representation of water that includes only Lennard-Jones interactions. The results demonstrate that this model is capable of reproducing solvent configurations absent of boundary artifacts, as if they were taken from full explicit simulations.
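The idea of a switching region can be illustrated with a smooth scaling factor applied to a Lennard-Jones pair energy. This is only a sketch under stated assumptions: the cubic smoothstep form, the radii `r_inner` and `r_outer`, the choice to scale a pair by the product of the two particles' factors, and all function names are hypothetical and are not taken from the paper.

```python
def switching_factor(r, r_inner, r_outer):
    """Coupling strength for a particle at distance r from the domain center:
    1 inside r_inner, 0 beyond r_outer, and a smooth cubic ramp in between."""
    if r <= r_inner:
        return 1.0
    if r >= r_outer:
        return 0.0
    x = (r - r_inner) / (r_outer - r_inner)
    return 1.0 - x * x * (3.0 - 2.0 * x)


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def decoupled_pair_energy(r_pair, dist_i, dist_j, r_inner, r_outer):
    """Pair energy scaled by how strongly each particle is still coupled.
    Using the product of the two factors is an assumption made here."""
    s_i = switching_factor(dist_i, r_inner, r_outer)
    s_j = switching_factor(dist_j, r_inner, r_outer)
    return s_i * s_j * lennard_jones(r_pair)
```

Because the factor and its first derivative are continuous at both ends of the ramp, particles fade out gradually rather than meeting a hard wall, which is the qualitative behavior the abstract attributes to the switching region.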

ABSTRACT.
Protein folding is an important problem in structural biology with significant medical implications, particularly for misfolding disorders like Alzheimer’s disease. Solving the folding problem will ultimately require a combination of theory and experiment, with theoretical models providing a comprehensive view of folding and experiments grounding these models in reality. Here we review progress towards this goal over the past decade, with an emphasis on recent theoretical advances that are empowering chemically detailed models of folding and the new results these technologies are providing. In particular, we discuss new insights made possible by Markov state models (MSMs), including the role of non-native contacts and the hub-like character of protein folded states.

ABSTRACT.
Protein folding is a classic grand challenge that is relevant to numerous human diseases, such as protein misfolding diseases like Alzheimer’s disease. Solving the folding problem will ultimately require a combination of theory, simulation, and experiment, with theory and simulation providing an atomically detailed picture of both the thermodynamics and kinetics of folding and experimental tests grounding these models in reality. However, theory and simulation generally fall orders of magnitude short of biologically relevant time scales. Here we report significant progress toward closing this gap: an atomistic model of the folding of an 80-residue fragment of the λ repressor protein with explicit solvent that captures dynamics on a 10 milliseconds time scale. In addition, we provide a number of predictions that warrant further experimental investigation. For example, our model’s native state is a kinetic hub, and biexponential kinetics arises from the presence of many free-energy basins separated by barriers of different heights rather than a single low barrier along one reaction coordinate (the previously proposed incipient downhill folding scenario).

ABSTRACT.
As nascent proteins are synthesized by the ribosome, they depart via an exit tunnel running through the center of the large subunit. The exit tunnel likely plays an important part in various aspects of translation. Although water plays a key role in many bio-molecular processes, the nature of water confined to the exit tunnel has remained unknown. Furthermore, solvent in biological cavities has traditionally been characterized as either a continuous dielectric fluid, or a discrete tightly bound molecule. Using atomistic molecular dynamics simulations, we predict that the thermodynamic and kinetic properties of water confined within the ribosome exit tunnel are quite different from this simple two-state model. We find that the tunnel creates a complex microenvironment for the solvent resulting in perturbed rotational dynamics and heterogeneous dielectric behavior. This gives rise to a very rugged solvation landscape and significantly retarded solvent diffusion. We discuss how this non-bulk-like solvent is likely to affect important biophysical processes such as sequence dependent stalling, co-translational folding, and antibiotic binding. We conclude with a discussion of the general applicability of these results to other biological cavities.

SUMMARY.
GPU computing has great potential, but consumer GPUs lack some key features that are present in CPUs, especially error checking RAM. Do memory issues on GPUs cause problems? We used a small fraction of the large number of GPUs on Folding@home to test this. Understanding these issues was key to reliable large-scale GPU computing in Folding@home.

ABSTRACT.
Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a hindrance to the acceptance of GPUs as high performance coprocessors, but the impact of this design has not been previously quantified. In this article we present MemtestG80, our software for assessing memory error rates on NVIDIA G80 and GT200-architecture-based graphics cards. Furthermore, we present the results of a large-scale assessment of GPU error rate, conducted by running MemtestG80 on over 20,000 hosts on the Folding@home distributed computing network. Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home finds that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.
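The general shape of a pattern-based memory test can be sketched in a few lines: write a known pattern, read it back, and count the bits that differ. The code below runs on an ordinary host-side `bytearray` and simulates a soft error with a hypothetical `corrupt` callback; it is not MemtestG80's algorithm or API, only an illustration of the write/verify idea.

```python
def count_bit_errors(buffer, pattern):
    """Count the bits in `buffer` that differ from the repeating byte `pattern`."""
    errors = 0
    for index, value in enumerate(buffer):
        expected = pattern[index % len(pattern)]
        errors += bin(value ^ expected).count("1")
    return errors


def run_pattern_test(size, pattern, corrupt=None):
    """Fill a buffer with `pattern`, optionally let `corrupt` flip bits to
    simulate soft errors, then verify the contents and return the error count."""
    buffer = bytearray(pattern[i % len(pattern)] for i in range(size))
    if corrupt is not None:
        corrupt(buffer)  # hypothetical fault injection, for demonstration only
    return count_bit_errors(buffer, pattern)


if __name__ == "__main__":
    def flip_one_bit(buf):
        buf[10] ^= 0x01

    print(run_pattern_test(1024, b"\xaa\x55"))                # clean run
    print(run_pattern_test(1024, b"\xaa\x55", flip_one_bit))  # one injected error
```

Running the same check with several different patterns is what makes such a test sensitive to pattern-dependent faults, since some errors only appear for particular bit arrangements.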

ABSTRACT.
Simulating protein folding has been a challenging problem for decades due to the long timescales involved (compared with what is possible to simulate) and the challenges of gaining insight from the complex nature of the resulting simulation data. Markov State Models (MSMs) present a means to tackle both of these challenges, yielding simulations on experimentally relevant timescales, statistical significance, and coarse grained representations that are readily humanly understandable. Here, we review this method with the intended audience of non-experts, in order to introduce the method to a broader audience. We review the motivations, methods, and caveats of MSMs, as well as some recent highlights of applications of the method. We conclude by discussing how this approach is part of a paradigm shift in how one uses simulations, away from anecdotal single-trajectory approaches to a more comprehensive statistical approach.
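For readers new to the method, the core bookkeeping behind an MSM can be sketched as follows: count transitions between discrete states at a chosen lag time, normalize each row to obtain transition probabilities, and iterate to find equilibrium populations. This is a minimal sketch that assumes the trajectories have already been assigned to integer state labels; the function names are illustrative, and real MSM construction involves clustering, lag-time validation, and more careful estimators than the simple row normalization used here.

```python
from collections import Counter


def count_transitions(trajectories, lag=1):
    """Count observed i -> j transitions at the given lag (in frames)."""
    counts = Counter()
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[(a, b)] += 1
    return counts


def transition_matrix(counts, n_states):
    """Row-normalize the counts into transition probabilities."""
    matrix = []
    for i in range(n_states):
        row = [counts[(i, j)] for j in range(n_states)]
        total = sum(row)
        if total == 0:
            # State never left in the data: treat it as absorbing.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Estimate equilibrium populations by repeatedly applying the matrix."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


if __name__ == "__main__":
    trajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0]]  # toy state assignments
    T = transition_matrix(count_transitions(trajs), n_states=2)
    print(T, stationary_distribution(T))
```

Because the model is built from counts pooled over many short trajectories, no single trajectory needs to span the full process, which is what lets this approach reach long timescales from distributed simulations.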

Proceedings of the National Academy of Sciences, USA 107 10890-10895 (2010)

ABSTRACT.
Understanding molecular kinetics, and particularly protein folding, is a classic grand challenge in molecular biophysics. Network models, such as Markov state models (MSMs), are one potential solution to this problem. MSMs have recently yielded quantitative agreement with experimentally derived structures and folding rates for specific systems, leaving them positioned to potentially provide a deeper understanding of molecular kinetics that can lead to experimentally testable hypotheses. Here we use existing MSMs for the villin headpiece and NTL9, which were constructed from atomistic simulations, to accomplish this goal. In addition, we provide simpler, humanly comprehensible networks that capture the essence of molecular kinetics and reproduce qualitative phenomena like the apparent two-state folding often seen in experiments. Together, these models show that protein dynamics are dominated by stochastic jumps between numerous metastable states and that proteins have heterogeneous unfolded states (many unfolded basins that interconvert more rapidly with the native state than with one another) yet often still appear two-state. Most importantly, we find that protein native states are hubs that can be reached quickly from any other state. However, metastability and a web of nonnative states slow the average folding rate. Experimental tests for these findings and their implications for other fields, like protein design, are also discussed.

ABSTRACT.
Computer simulations can complement experiments by providing insight into molecular kinetics with atomic resolution. Unfortunately, even the most powerful supercomputers can only simulate small systems for short time scales, leaving modeling of most biologically relevant systems and time scales intractable. In this work, however, we show that molecular simulations driven by adaptive sampling of networks called Markov State Models (MSMs) can yield tremendous time and resource savings, allowing previously intractable calculations to be performed on a routine basis on existing hardware. We also introduce a distance metric (based on the relative entropy) for comparing MSMs. We primarily employ this metric to judge the convergence of various sampling schemes but it could also be employed to assess the effects of perturbations to a system (e.g., determining how changing the temperature or making a mutation changes a system’s dynamics).
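One common way to build a relative-entropy measure between two MSMs defined on the same states is to weight the row-wise Kullback-Leibler divergence of their transition probabilities by the reference model's equilibrium populations. The sketch below assumes that form; the paper's exact definition may differ, and the function name and argument layout are illustrative.

```python
import math


def msm_relative_entropy(pi_ref, t_ref, t_test):
    """Population-weighted KL divergence between two transition matrices.

    pi_ref : equilibrium populations of the reference model
    t_ref  : reference transition matrix (list of rows)
    t_test : transition matrix being compared against the reference
    """
    total = 0.0
    for i, p_i in enumerate(pi_ref):
        if p_i == 0.0:
            continue  # unpopulated states contribute nothing
        for j, p_ij in enumerate(t_ref[i]):
            if p_ij == 0.0:
                continue  # 0 * log(0) is taken as 0
            q_ij = t_test[i][j]
            if q_ij == 0.0:
                return math.inf  # test model forbids an observed transition
            total += p_i * p_ij * math.log(p_ij / q_ij)
    return total


if __name__ == "__main__":
    pi = [0.5, 0.5]
    reference = [[0.9, 0.1], [0.1, 0.9]]
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(msm_relative_entropy(pi, reference, reference))
    print(msm_relative_entropy(pi, reference, uniform))
```

A value near zero means the test model reproduces the reference kinetics; tracking this quantity as more data are added is one way to judge whether a sampling scheme is converging. Note that the measure is not symmetric in its two models.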

SUMMARY.
OpenMM is the key library which powers GPU computing in Folding@home. This paper discusses some key aspects of how OpenMM works.

ABSTRACT.
The wide diversity of computer architectures today requires a new approach to software development. OpenMM is an abstraction layer for molecular mechanics simulations, allowing a single program to run efficiently on a variety of hardware platforms.

ABSTRACT.
Molecular force fields have been approaching a generational transition over the past several years, moving away from well-established and well-tuned, but intrinsically limited, fixed point charge models toward more intricate and expensive polarizable models that should allow more accurate description of molecular properties. The recently introduced AMOEBA force field is a leading publicly available example of this next generation of theoretical model, but to date, it has only received relatively limited validation, which we address here. We show that the AMOEBA force field is in fact a significant improvement over fixed charge models for small molecule structural and thermodynamic observables in particular, although further fine-tuning is necessary to describe solvation free energies of drug-like small molecules, dynamical properties away from ambient conditions, and possible improvements in aromatic interactions. State of the art electronic structure calculations reveal generally very good agreement with AMOEBA for demanding problems such as relative conformational energies of the alanine tetrapeptide and isomers of water sulfate complexes. AMOEBA is shown to be especially successful on protein−ligand binding and computational X-ray crystallography where polarization and accurate electrostatics are critical.

ABSTRACT.
While several experimental techniques now exist for characterizing protein unfolded states, all-atom simulation of unfolded states has been challenging due to the long time scales and conformational sampling required. We address this problem by using a combination of accelerated calculations on graphics processor units and distributed computing to simulate tens of thousands of molecular dynamics trajectories each up to 10 μs (for a total aggregate simulation time of 127 ms). We used this approach in conjunction with Trp-Cys contact quenching experiments to characterize the unfolded structure and dynamics of protein L. We employed a polymer theory method to make quantitative comparisons between high-temperature simulated and chemically denatured experimental ensembles and find that reaction-limited quenching rates calculated from simulation agree remarkably well with experiment. In both experiment and simulation, we find that unfolded-state intramolecular diffusion rates are very slow compared to highly denatured chains and that a single-residue mutation can significantly alter unfolded-state dynamics and structure. This work suggests a view of the unfolded state in which surprisingly low diffusion rates could limit folding and opens the door for all-atom molecular simulation to be a useful predictive tool for characterizing protein unfolded states along with experiments that directly measure intramolecular diffusion.

ABSTRACT.
Molecular kinetics underlies all biological phenomena and, like many other biological processes, may best be understood in terms of networks. These networks, called Markov state models (MSMs), are typically built from physical simulations. Thus, they are capable of quantitative prediction of experiments and can also provide an intuition for complex conformational changes. Their primary application has been to protein folding; however, these technologies and the insights they yield are transferable. For example, MSMs have already proved useful in understanding human diseases, such as protein misfolding and aggregation in Alzheimer’s disease.

SUMMARY.
Building on our previous simulations of membrane fusion, we have used the power of Folding@home to systematically analyze the fusion reaction between two vesicles and the molecular nature of water in this reaction. For purposes of computational tractability, many approaches neglect the detailed structure of water for large membrane simulations. In this case, we show that this detailed structure affects both the thermodynamics and the dynamics of the fusion reaction. These results have important implications for how we should perform vesicle fusion simulations; they also give a new example of structured water between two flexible hydrophilic interfaces. This water structure may be important in a number of cell-cell interactions.

ABSTRACT.
Membrane interfaces are critical to many cellular functions, yet the vast array of molecular components involved make the fundamental physics of interaction difficult to define. Water has been shown to play an important role in the dynamics of small biological systems, for example when trapped in hydrophobic regions, but the molecular details of water have generally been thought dispensable when considering large membrane interfaces. Nevertheless, spectroscopic data indicate that water has distinct, ordered behavior near membrane surfaces. While coarse-grained simulations have achieved success recently in aiding understanding the dynamics of membrane assemblies, it is natural to ask, does the missing chemical nature of water play an important role? We have therefore performed atomic-resolution simulations of vesicle fusion to understand the role of chemical detail, particularly the molecular structure of water, in membrane fusion and at membrane interfaces more generally. These membrane interfaces present a form of hydrophilic confinement, yielding surprising, non-bulk-like water behavior.

SUMMARY.
We continue to study a small but ubiquitous RNA structural motif known as the GNRA tetraloop. This structure plays a role in the formation of larger RNAs and is also of great interest due to its statistical overabundance in RNA structure. Our study demonstrates the highly flexible and dynamic properties of this structure, and also highlights the ability of this sequence to take on a number of non-native configurations in order to interact with adjacent RNA strands, suggesting that conformational entropy acts to stabilize this loop when not in its native conformation.

ABSTRACT.
Conformational equilibrium within the ubiquitous GNRA tetraloop motif was simulated at the ensemble level, including 10,000 independent all atom molecular dynamics trajectories totaling over 110 microseconds of simulation time. This robust sampling reveals a highly dynamic structure comprised of 15 conformational microstates. We assemble a Markov model that includes transitions ranging from the nanosecond to microsecond timescales and is dominated by six key loop conformations that contribute to fluctuations around the native state. Mining of the Protein Data Bank provides an abundance of structures in which GNRA tetraloops participate in tertiary contact formation. Most predominantly observed in the experimental data are interactions of the native loop structure within the minor groove of adjacent helical regions. Additionally, a second trend is observed in which the tetraloop assumes non-native conformations while participating in multiple tertiary contacts, in some cases involving multiple possible loop conformations. This tetraloop flexibility can act to counterbalance the energetic penalty associated with assuming non-native loop structures in forming tertiary contacts. The GNRA motif has thus evolved not only to readily participate in simple tertiary interactions involving native loop structure, but also to easily adapt tetraloop secondary conformation in order to participate in larger, more complex tertiary interactions.

SUMMARY.
Our assessment of biophysical force fields as applied to helical peptides and proteins continues with the comparison of “next generation” AMBER-03 and AMBER-99SB to our previous results, particularly with respect to our AMBER variant, AMBER-99phi. Here we also incorporate simulations of a flexible and largely helical protein in order to assess the ability of these molecular models to adequately stabilize such structures.

ABSTRACT.
Multiple variants of the AMBER all-atom force field were quantitatively evaluated with respect to their ability to accurately characterize helix-coil equilibria in explicit solvent simulations. Using a global distributed computing network, absolute conformational convergence was achieved for large ensembles of the capped A21 and Fs helical peptides. Further assessment of these AMBER variants was conducted via simulations of a flexible 164-residue five-helix-bundle protein, apolipophorin-III, on the 100 ns timescale. Of the contemporary potentials that had not been assessed previously, the AMBER-99SB force field showed significant helix-destabilizing tendencies, with beta bridge formation occurring in helical peptides, and unfolding of apolipophorin-III occurring on the tens of nanoseconds timescale. The AMBER-03 force field, while showing adequate helical propensities for both peptides and stabilizing apolipophorin-III, (i) predicts an unexpected decrease in helicity with ALA to ARG+ substitution, (ii) lacks experimentally observed 3-10 helical content, and (iii) deviates strongly from average apolipophorin-III NMR structural properties. As is observed for AMBER-99SB, AMBER-03 significantly overweighs the contribution of extended and polyproline backbone configurations to the conformational equilibrium. In contrast, the AMBER-99phi force field, which was previously shown to best reproduce experimental measurements of the helix-coil transition in model helical peptides, adequately stabilizes apolipophorin-III and yields both an average gyration radius and polar solvent exposed surface area that are in excellent agreement with the NMR ensemble.

SUMMARY.
Recent work from detailed Folding@home simulations of protein folding has suggested some surprises and radical changes in how one conceptualizes protein folding kinetics. One of the more unusual aspects found in these simulations is the role of the native state as a kinetic hub (see paper #74). Here, we propose a new theory of protein folding that uses structural information in its kinetic equations and gives a much richer picture than previous theories. One key result is a prediction for what would cause the native state to be a kinetic hub and when one would see this effect (and in particular why it was not seen in simpler simulation studies previously).

ABSTRACT.
We present a simple model of protein folding dynamics that captures key qualitative elements recently seen in all-atom simulations. The goals of this theory are to serve as a simple formalism for gaining deeper insight into the physical properties seen in detailed simulations as well as to serve as a model to easily compare why these simulations suggest a different kinetic mechanism than previous simple models. Specifically, we find that non-native contacts play a key role in determining the mechanism, which can shift dramatically as the energetic strength of non-native interactions is changed. For protein-like non-native interactions, our model finds that the native state is a kinetic hub, connecting the strength of relevant interactions directly to the nature of folding kinetics.

G. R. Bowman and V. S. Pande.
Proceedings of the National Academy of Sciences, USA (2010).

SUMMARY.
By analyzing recent results from Folding@home, we have found a set of general properties emerging regarding how proteins fold. In particular, one of them comes as a surprise compared to previous models: the native state is a kinetic hub. This has implications for how we think about protein folding in general as well as applications of protein folding in biology and disease.

ABSTRACT.
Understanding molecular kinetics, and particularly protein folding, is a classic grand challenge in molecular biophysics. Network models, such as Markov state models (MSMs), are one potential solution to this problem. MSMs have recently yielded quantitative agreement with experimentally derived structures and folding rates for specific systems, leaving them positioned to potentially provide a deeper understanding of molecular kinetics that can lead to experimentally testable hypotheses. Here we use existing MSMs for the villin headpiece and NTL9, which were constructed from atomistic simulations, to accomplish this goal. In addition, we provide simpler, humanly comprehensible networks that capture the essence of molecular kinetics and reproduce qualitative phenomena like the apparent two-state folding often seen in experiments. Together, these models show that protein dynamics are dominated by stochastic jumps between numerous metastable states and that proteins have heterogeneous unfolded states (many unfolded basins that interconvert more rapidly with the native state than with one another) yet often still appear two-state. Most importantly, we find that protein native states are hubs that can be reached quickly from any other state. However, metastability and a web of nonnative states slow the average folding rate. Experimental tests for these findings and their implications for other fields, like protein design, are also discussed.

SUMMARY.
Membrane fusion is a common underlying process critical to neurotransmitter release, cellular trafficking, and infection by many viruses. Proteins have been identified that catalyze fusion, and mutations to these proteins have yielded important information on how fusion occurs. However, the precise mechanism by which membrane fusion begins is the subject of active investigation. We have used atomic-resolution simulations to model the process of vesicle fusion and to identify a transition state for the formation of an initial fusion stalk. Doing so required substantial technical advances in combining high-performance simulation and distributed computing to analyze the transition state of a complex reaction in a large system. The transition state we identify in our simulations involves specific structural changes by a few lipid molecules. We also simulate fusion peptides from influenza hemagglutinin and show that they promote the same structural changes as are required for fusion in our model. We therefore hypothesize that these changes to individual lipid molecules may explain a portion of the catalytic activity of fusion proteins such as influenza hemagglutinin.

ABSTRACT.
Membrane fusion is essential to both cellular vesicle trafficking and infection by enveloped viruses. While the fusion protein assemblies that catalyze fusion are readily identifiable, the specific activities of the proteins involved and nature of the membrane changes they induce remain unknown. Here, we use many atomic-resolution simulations of vesicle fusion to examine the molecular mechanisms for fusion in detail. We employ committor analysis for these million-atom vesicle fusion simulations to identify a transition state for fusion stalk formation. In our simulations, this transition state occurs when the bulk properties of each lipid bilayer remain in a lamellar state but a few hydrophobic tails bulge into the hydrophilic interface layer and make contact to nucleate a stalk. Additional simulations of influenza fusion peptides in lipid bilayers show that the peptides promote similar local protrusion of lipid tails. Comparing these two sets of simulations, we obtain a common set of structural changes between the transition state for stalk formation and the local environment of peptides known to catalyze fusion. Our results thus suggest that the specific molecular properties of individual lipids are highly important to vesicle fusion and yield an explicit structural model that could help explain the mechanism of catalysis by fusion proteins.
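
As a rough illustration of the committor idea mentioned above, the sketch below computes committor values on a toy five-state Markov chain. This is not the procedure used in the paper, which estimates committors from atomistic vesicle simulations; the transition matrix, the state labels, and the function name `committor` are all assumptions made for illustration. The committor of a state is the probability of reaching the product set before the reactant set, and a transition state is a configuration whose committor is close to 0.5.

```python
def committor(T, source, sink, n_iter=10_000, tol=1e-12):
    """Committor q[i]: probability of reaching `sink` before `source` from state i.

    T is a row-stochastic transition matrix given as a list of lists.
    Boundary conditions: q = 0 on source states, q = 1 on sink states.
    Elsewhere q[i] = sum_j T[i][j] * q[j], solved here by Gauss-Seidel sweeps.
    """
    n = len(T)
    q = [1.0 if i in sink else 0.0 for i in range(n)]
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(n):
            if i in source or i in sink:
                continue  # boundary values stay fixed
            new = sum(T[i][j] * q[j] for j in range(n))
            max_change = max(max_change, abs(new - q[i]))
            q[i] = new
        if max_change < tol:
            break
    return q


# Hypothetical toy system: an unbiased random walk along a 5-state chain.
# State 0 plays the role of "unfused" and state 4 the role of "stalk formed".
T = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]
q = committor(T, source={0}, sink={4})

# The state whose committor is closest to 0.5 is the transition-state candidate.
transition_state = min(range(len(q)), key=lambda i: abs(q[i] - 0.5))
```

In this toy chain the committor rises linearly from 0 to 1, so the middle state is the transition-state candidate. In a real system the committor of each configuration would instead be estimated by launching many short simulations from it and counting how often they commit to the product.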

SUMMARY.
Simulating protein folding on the millisecond timescale has been a major challenge for many years. When we started Folding@home, our first goal was to break the microsecond barrier. The millisecond barrier is 1000x harder, and breaking it represents a major step forward in molecular simulation. Specifically, in a recent paper (http://pubs.acs.org/doi/abs/10.1021/ja9090353), Folding@home researchers Vincent Voelz, Greg Bowman, Kyle Beauchamp, and Vijay Pande have broken this barrier. The movie below is of one of the trajectories that folded (i.e. started unfolded and ended up in the folded state). From simulations like these, we have found some new surprises in how proteins fold. Please see the paper (url above) for more details.

Why is this important? Protein misfolding occurs on long timescales, and this first simulation of protein folding on the millisecond timescale demonstrates that our new Markov State Model (MSM) technology can successfully simulate long timescales. It makes sense to go after protein folding first, since there is a wealth of experimental data against which to test our simulations. While this paper on protein folding has just come out, we have already been using this MSM technology to study protein misfolding in Alzheimer’s Disease, following up from our 2008 paper. While our previous paper (#58 below) was able to get to long enough timescales to see small molecular weight oligomers, this new methodology gives us hope to push further with our simulations of Alzheimer’s, making more direct connections to larger, more complex Abeta oligomers than we were previously able to do.

This is a pretty exciting moment for us in terms of what we can now do with simulations, and we’re looking forward to new applications of this technology.

ABSTRACT.
To date, the slowest-folding proteins folded ab initio by all-atom molecular dynamics simulations have had folding times in the range of nanoseconds to microseconds. We report simulations of several folding trajectories of NTL9(1−39), a protein which has a folding time of 1.5 ms. Distributed molecular dynamics simulations in implicit solvent on GPU processors were used to generate ensembles of trajectories out to 40 μs for several temperatures and starting states. At a temperature less than the melting point of the force field, we observe a small number of productive folding events, consistent with predictions from a model of parallel uncoupled two-state simulations. The posterior distribution of the folding rate predicted from the data agrees well with the experimental folding rate (640/s). Markov State Models (MSMs) built from the data show a gap in the implied time scales indicative of two-state folding and heterogeneous pathways connecting diffuse mesoscopic substates. Structural analysis of the 14 out of 2000 macrostates transited by the top 10 folding pathways reveals that native-like pairing between strands 1 and 2 only occurs for macrostates with pfold > 0.5, suggesting β12 hairpin formation may be rate-limiting. We believe that using simulation data such as these to seed adaptive resampling simulations will be a promising new method for achieving statistically converged descriptions of folding landscapes at longer time scales than ever before.
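
To see why only a small number of folding events is expected, here is a back-of-the-envelope sketch of the parallel uncoupled two-state picture named in the abstract. It assumes single-exponential kinetics and independent trajectories; the rate (640/s) and trajectory length (40 μs) are taken from the abstract, while the ensemble size of 1000 trajectories is a made-up number used only for illustration, as are the function names.

```python
import math


def fold_probability(traj_length_s, rate_per_s):
    """Probability that one two-state trajectory folds at least once
    within traj_length_s, assuming first-order kinetics with the given rate."""
    return 1.0 - math.exp(-rate_per_s * traj_length_s)


def expected_folding_events(n_traj, traj_length_s, rate_per_s):
    """Expected number of folded trajectories among n_traj independent runs."""
    return n_traj * fold_probability(traj_length_s, rate_per_s)


# Values from the abstract: experimental rate 640/s, trajectories out to 40 microseconds.
p_fold = fold_probability(40e-6, 640.0)

# Hypothetical ensemble size (not taken from the paper).
n_expected = expected_folding_events(1000, 40e-6, 640.0)
```

Under these assumptions each 40 μs trajectory has roughly a 2.5% chance of folding, so a large ensemble of short trajectories can still capture a handful of complete folding events even though no single trajectory approaches the 1.5 ms folding time.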

SUMMARY.
Markov State Models (MSMs) are one of the most common ways to analyze Folding@home simulations. This paper introduces a new validation method, which could play an important role in automating their construction.

ABSTRACT
Discrete-space Markov models are a convenient way of describing the kinetics of biomolecules. The most common strategies used to validate these models employ statistics from simulation data, such as the eigenvalue spectrum of the inferred rate matrix, which are often associated with large uncertainties. Here, we propose a Bayesian approach, which makes it possible to differentiate between models at a fixed lag time making use of short trajectories. The hierarchical definition of the models allows one to compare instances with any number of states. We apply a conjugate prior for reversible Markov chains, which was recently introduced in the statistics literature. The method is tested in two different systems, a Monte Carlo dynamics simulation of a two-dimensional model system a

SUMMARY.
The influenza virus infects people and animals by binding to complex sugar molecules on the surface of the respiratory tract. Bird viruses bind most strongly to bird cell-surface sugars and human viruses bind most strongly to human cell-surface sugars. As the recent swine-origin influenza virus has demonstrated, there is considerable overlap between the binding ability of human and pig viruses to cells of the other host. Changes to this binding affinity are one key component for viruses to make a jump between species, and it is difficult to predict the necessary mutations ahead of time. We would like to predict high-risk mutations to enable better surveillance and early control of potential inter-species transmission events. This work represents a first step in that direction, as we examine mutations to H5N1 avian influenza that alter ligand binding. We use Folding@home as a powerful computational screen to evaluate mutations that will eventually require experimental testing to verify.

ABSTRACT.
Influenza virus attaches to and infects target cells via binding of cell-surface glycans by the viral hemagglutinin. This binding specificity is considered a major reason why avian influenza is typically poorly transmitted between humans, while swine influenza is better transmitted due to glycan similarity between the human and swine upper respiratory tract. Predicting mutations that control glycan binding is thus important to continued surveillance against new pandemic influenza strains. We have designed a molecular-dynamics approach for scoring potential mutants with predictive power for both receptor-binding-domain and allosteric mutations similar to those identified from clinical isolates of avian influenza. We have performed thousands of simulations of 17 different hemagglutinin mutants totaling >1 ms in length and employ a Bayesian model to rank mutations that disrupt the stability of the hemagglutinin−ligand complex. Based on our simulations, we predict a significantly increased koff for seven of these mutants. This means of using molecular dynamics analysis to make experimentally verifiable predictions offers a potentially general method to identify ligand-binding mutants, particularly allosteric ones. Our analysis of ligand dissociation provides a means to evaluate mutants prior to experimental mutagenesis and testing and constitutes an important step toward understanding the determinants of ligand binding by H5N1 influenza.

ABSTRACT.
Here we continue our efforts to use methods developed in the folding mechanism community to both better understand and improve structure prediction. Our previous work demonstrated that Rosetta’s coarse-grained potentials may actually impede accurate structure prediction at full-atom resolution. Based on this work we postulated that it may be time to work completely at full-atom resolution but that doing so may require more careful attention to the kinetics of convergence.

ABSTRACT
Part of understanding a molecule’s conformational dynamics is mapping out the dominant metastable, or long lived, states that it occupies. Once identified, the rates for transitioning between these states may then be determined in order to create a complete model of the system’s conformational dynamics. Here we describe the use of the MSMBuilder package (now available at http://simtk.org/home/msmbuilder/) to build Markov State Models (MSMs) to identify the metastable states from Generalized Ensemble (GE) simulations, as well as other simulation datasets. Besides building MSMs, the code also includes tools for model evaluation and visualization.
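
The core step in building an MSM can be sketched in a few lines. The example below is not the MSMBuilder API; the function names and the toy trajectories are assumptions for illustration. It assumes the conformations have already been clustered into discrete states, counts transitions observed at a chosen lag time, and row-normalizes the counts into transition probabilities.

```python
def count_matrix(trajs, n_states, lag=1):
    """Count observed transitions i -> j separated by `lag` frames.

    trajs is a list of discrete trajectories, each a list of state indices.
    Transitions are never counted across trajectory boundaries.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into a maximum-likelihood transition matrix
    (without enforcing detailed balance). Unvisited states get a zero row."""
    T = []
    for row in counts:
        total = sum(row)
        T.append([c / total if total else 0.0 for c in row])
    return T


# Two short hypothetical trajectories over two states (0 and 1).
trajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0]]
C = count_matrix(trajs, n_states=2, lag=1)
T = transition_matrix(C)
```

Production tools add steps this sketch omits, such as clustering, lag-time selection, and estimators that enforce reversibility, so treat it only as a picture of what the counting step does.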

ABSTRACT
Accurate simulation of biophysical processes requires vast computing resources. Folding@home is a distributed computing system first released in 2000 to provide such resources needed to simulate protein folding and other biomolecular phenomena. Now operating in the range of 5 PetaFLOPS sustained, it provides more computing power than can typically be gathered and operated locally due to cost, physical space, and electrical/cooling load. This paper describes the architecture and operation of Folding@home, along with some lessons learned over the lifetime of the project.

ABSTRACT
Recently a temperature-jump FTIR study of a designed three-stranded sheet showing a fast relaxation time of ~140 ± 20 ns was published. We performed massively parallel molecular dynamics simulations in explicit solvent to probe the structural events involved in this relaxation. While our simulations produce similar relaxation rates, the structural ensemble is broad. We observe the formation of turn structure, but only very weak interaction in the strand regions, which is consistent with the lack of strong backbone-backbone NOEs in previous structural NMR studies. These results suggest that either DPDP-II folds at time scales longer than 240 ns, or that DPDP-II is not a well-defined three-stranded β-sheet. This work also provides an opportunity to compare the performance of several popular force field models against one another.

ABSTRACT
We present a new multiscale method that combines all-atom molecular dynamics with coarse-grained sampling, towards the aim of bridging two levels of physiology: the atomic scale of protein side chains and small molecules, and the huge scale of macromolecular complexes like the ribosome. Our approach uses all-atom simulations of peptide (or other ligand) fragments to calculate local 3D spatial potentials of mean force (PMF). The individual fragment PMFs are then used as a potential for a coarse-grained chain representation of the entire molecule. Conformational space and sequence space are sampled efficiently using generalized ensemble Monte Carlo. Here, we apply this method to the study of nascent polypeptides inside the cavity of the ribosome exit tunnel. We show how the method can be used to explore the accessible conformational and sequence space of nascent polypeptide chains near the ribosome peptidyl transfer center (PTC), with the eventual aim of understanding the basis of specificity for co-translational regulation. The method has many potential applications to predicting binding specificity and design, and is sufficiently general to allow even greater separation of scales in future work.

ABSTRACT.
We describe molecular dynamics simulations resulting in the folding of the Fip35 Hpin1 WW domain. The simulations were run on a distributed set of graphics processors, which are capable of providing up to two orders of magnitude faster computation than conventional processors. Using the Folding@home distributed computing system, we generated thousands of independent trajectories in an implicit solvent model, totaling over 2.73 ms of simulations. A small number of these trajectories folded; the folding proceeded along several distinct routes and the system folded into two distinct three-stranded beta-sheet conformations, showing that the folding mechanism of this system is distinctly heterogeneous.

SUMMARY. This paper describes the code behind the Folding@home GPU clients, detailing how they work, how we achieved such a significant speed up on GPUs, and other implementation details.

ABSTRACT. We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core.

SUMMARY. The aggregation of the Huntingtin (Htt) protein has been implicated as the cause of Huntington’s disease. However, how this aggregation occurs and why it can happen so quickly is still largely unknown. Inspired by recent experimental results in the Frydman lab at Stanford, we have investigated a new possible mechanism for the aggregation of the Htt protein, with implications for better understanding how it aggregates.

SUMMARY.
The influenza hemagglutinin protein performs several important functions, including attaching the virus to cells it will infect and releasing the viral genome into the interior of the cell. Most protective antibodies against influenza also bind to the hemagglutinin protein. We wish to understand how mutations to hemagglutinin affect viral function, including what keeps avian influenza (“bird flu”) from being readily transmissible between humans. In this paper, we have applied a technique from information theory known as mutual information to genetic sequence data to predict important mutation sites on the hemagglutinin protein. In follow-up work, we are combining this technique with other methods to refine these predictions and test some of them using Folding@home.

ABSTRACT.
Influenza hemagglutinin mediates both cell-surface binding and cell entry by the virus. Mutations to hemagglutinin are thus critical in determining host species specificity and viral infectivity. Previous approaches have primarily considered point mutations and sequence conservation; here we develop a complementary approach using mutual information to examine concerted mutations. For hemagglutinin, several overlapping selective pressures can cause such concerted mutations, including the host immune response, ligand recognition and host specificity, and functional requirements for pH-induced activation and membrane fusion. Using sequence mutual information as a metric, we extracted clusters of concerted mutation sites and analyzed them in the context of crystallographic data. Comparison of influenza isolates from two subtypes—human H3N2 strains and human and avian H5N1 strains—yielded substantial differences in spatial localization of the clustered residues. We hypothesize that the clusters on the globular head of H3N2 hemagglutinin may relate to antibody recognition (as many protective antibodies are known to bind in that region), while the clusters in common to H3N2 and H5N1 hemagglutinin may indicate shared functional roles. We propose that these shared sites may be particularly fruitful for mutagenesis studies in understanding the infectivity of this common human pathogen. The combination of sequence mutual information and structural analysis thus helps generate novel functional hypotheses that would not be apparent via either method alone.
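
For readers unfamiliar with sequence mutual information, the sketch below shows the basic quantity for two columns of a multiple sequence alignment. It is a minimal illustration and not the paper's pipeline: the four-sequence alignment is invented, the function name is hypothetical, and corrections for finite sample size or phylogenetic bias are omitted.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    col_x and col_y are equal-length sequences of residue letters, one entry
    per aligned sequence. Probabilities are plain observed frequencies.
    """
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Invented toy alignment: positions 0 and 1 always mutate together,
# while position 2 is fully conserved.
alignment = ["AKL", "AKL", "GEL", "GEL"]
columns = list(zip(*alignment))

mi_01 = mutual_information(columns[0], columns[1])  # concerted sites
mi_02 = mutual_information(columns[0], columns[2])  # conserved site carries no signal
```

Sites that change together score high, while a fully conserved site scores zero even though it may be functionally important, which is why the abstract describes mutual information as complementary to conservation-based approaches.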

SUMMARY. In this paper, we detail how we were able to get great speed increases for Folding@home (and actually certain molecular dynamics calculations in general) on the PS3. This is our first paper using the PS3, laying out the “how does it work,” with a follow up paper in the works describing the results obtained in FAH from PS3 clients. It is also worth noting that this paper is a collaboration between FAH team members (Luttmann, Ensign, Vaidyanathan, Houston [now at AMD], Jayachandran, Friedrichs, and Pande) with developers at Sony (Rimon and Øland and their coworkers).

ABSTRACT. Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3’s Cell processor over more traditional processors.

SUMMARY The ribosome is a fascinating molecular machine, responsible for the synthesis of proteins. For this reason it is of fundamental importance to protein folding (as the last step in the central dogma of biology) as well as to human health (since the ribosome is the target of a very large fraction of antibiotics). One of the questions revolving around ribosome function is why there is a large tunnel inside the ribosome, through which proteins exit after being synthesized. In this paper, we used “bigWU” classic clients (clients which allow larger systems to run) since the ribosome is so huge that it would not run on regular classic clients. The primary goal of this paper was to analyze the surface of the ribosome tunnel. Understanding the nature of this surface would be useful for understanding both the fundamental nature of protein synthesis and how key antibiotics interact with the ribosome. An interesting related discovery was the identification of a potential “ribosome gate” which can open and close selectively, based on what is interacting with the gate. This suggests novel hypotheses for several aspects of ribosome function as well as interesting new directions for work on studying the ribosome and for new routes for antibiotics.

ABSTRACT The ribosome is a large complex catalyst responsible for the synthesis of new proteins, an essential function for life. New proteins emerge from the ribosome through an exit tunnel as nascent polypeptide chains. Recent findings indicate that tunnel interactions with the nascent polypeptide chain might be relevant for the regulation of translation. However, the specific ribosomal structural features that mediate this process are unknown. Performing molecular dynamics simulations, we are studying the interactions between components of the ribosome exit tunnel and different chemical probes (specifically different amino acid side chains or monovalent inorganic ions). Our free-energy maps describe the physicochemical environment of the tunnel, revealing binding crevices and free-energy barriers for single amino acids and ions. Our simulations indicate that transport out of the tunnel could be different for diverse amino acid species. In addition, our results predict a notable protein–RNA interaction between a flexible 23S rRNA tetraloop (gate) and ribosomal protein L39 (latch) that could potentially obstruct the tunnel’s exit. By relating our simulation data to earlier biochemical studies, we propose that ribosomal features at the exit of the tunnel can play a role in the regulation of nascent chain exit and ion flux. Moreover, our free-energy maps may provide a context for interpreting sequence-dependent nascent chain phenomenology.

SUMMARY. Abeta misfolding and aggregation is believed to be the cause of Alzheimer’s Disease. Simulations, like Folding@home, are a natural way to understand this process. However, there are several key challenges for simulating the key step — oligomerization.
This work represents a new way to simulate Abeta oligomerization, with a key advance of being able to simulate experimentally relevant timescales and concentrations, using a novel method. We use this new method and the power provided by Folding@home donors to simulate oligomerization in all-atom detail. This has led to specific predictions about the process, which we are now testing experimentally. In many ways, this paper is the “tip of the iceberg” for the Folding@home activities in AD, with a lot more interesting results to come, especially in terms of experimental tests of our predictions and interesting new possibilities for new drugs and AD therapeutics.

This work ran exclusively on classic clients. For the follow up simulations, we are using a mixture of GPU, SMP, and classic clients. Due to the large number of classic clients, they allow us to perform calculations not possible on the other platforms. However, the raw speed (but smaller number) of the GPU and SMP clients allows us to get a good rough idea quickly, refining later with classic clients.

ABSTRACT. Here, we present a novel computational approach for describing the formation of oligomeric assemblies at experimental concentrations and timescales. We propose an extension to the Markovian state model approach, where one includes low concentration oligomeric states analytically. This allows simulation on long timescales (seconds timescale) and at arbitrarily low concentrations (e.g., the micromolar concentrations found in experiments), while still using an all-atom model for protein and solvent. As a proof of concept, we apply this methodology to the oligomerization of an Abeta peptide fragment (Abeta 21–43). Abeta oligomers are now widely recognized as the primary neurotoxic structures leading to Alzheimer’s disease. Our computational methods predict that Abeta trimers form at micromolar concentrations in 10 ms, while tetramers form 1000 times more slowly. Moreover, the simulation results predict specific intermonomer contacts present in the oligomer ensemble as well as putative structures for small molecular weight oligomers. Based on our simulations and statistical models, we propose a novel mutation to stabilize the trimeric form of Abeta in an experimentally verifiable manner.

ABSTRACT. Hairpins are a ubiquitous secondary structure motif in RNA molecules. Despite their simple structure, there is some debate over whether they fold in a two-state or multi-state manner. We have studied the folding of a small tetraloop hairpin using a serial version of replica exchange molecular dynamics on a distributed computing environment. On the basis of these simulations, we have identified a number of intermediates that are consistent with experimental results. We also find that folding is not simply the reverse of high-temperature unfolding and suggest that this may be a general feature of biomolecular folding.

ABSTRACT. We have implemented the serial replica exchange method (SREM) and simulated tempering (ST) enhanced sampling algorithms in a global distributed computing environment. Here we examine the helix-coil transition of a 21 residue alpha-helical peptide in explicit solvent. For ST, we demonstrate the efficacy of a new method for determining initial weights allowing the system to perform a random walk in temperature space based on short trial simulations. These weights are updated throughout the production simulation by an adaptive weighting method. We give a detailed comparison of SREM, ST, as well as standard MD and find that SREM and ST give equivalent results in reasonable agreement with experimental data. In addition, we find that both enhanced sampling methods are much more efficient than standard MD simulations. The melting temperature of the Fs peptide with the AMBER99phi potential was calculated to be about 310 K, which is in reasonable agreement with the experimental value of 334 K. We also discuss other temperature dependent properties of the helix-coil transition. Although ST has certain advantages over SREM, both SREM and ST are shown to be powerful methods via distributed computing and will be applied extensively in future studies of complex biomolecular systems.
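
The role of the weights in simulated tempering can be seen from the standard Metropolis rule for a temperature move, sketched below. This is the textbook acceptance criterion rather than code from the paper; the function name is hypothetical, the weights are passed in as given numbers (the paper's methods for choosing and adapting them are not reproduced), and the example values are in reduced units chosen for illustration.

```python
import math


def st_acceptance(energy, beta_old, beta_new, weight_old, weight_new):
    """Metropolis acceptance probability for a simulated tempering move.

    The extended ensemble samples (configuration, temperature index m) with
    probability proportional to exp(-beta_m * E + g_m), where g_m is the weight
    of temperature m. A move from the old to the new temperature at fixed
    energy E is accepted with probability min(1, exp(log_ratio)).
    """
    log_ratio = -(beta_new - beta_old) * energy + (weight_new - weight_old)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


# Illustrative reduced units: a low-energy configuration attempting to move
# to a higher temperature (smaller beta).
p_unweighted = st_acceptance(-100.0, 1.0, 0.9, 0.0, 0.0)

# With a suitable weight difference the same move is accepted readily.
p_weighted = st_acceptance(-100.0, 1.0, 0.9, 0.0, 12.0)
```

Without weights the move upward in temperature is almost never accepted, so the system would stay stuck near one temperature. Well-chosen weights compensate for the free energy difference between temperatures, which is what permits the random walk in temperature space described in the abstract.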

SUMMARY. This paper details our first efforts with GPUs for molecular dynamics. This work led to the GPU1 FAH core. We have other papers in the works describing the successor to the GPU1 core as well as the PS3 core.

ABSTRACT. Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this poster we show how graphics processors can be used for N-body simulations to obtain large improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is 25x that of an Intel Pentium 4, and 2x that of specialized hardware such as GRAPE-6A, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@home.
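
To make the O(N^2) cost concrete, here is a plain CPU reference sketch of an all-pairs gravitational force loop. It is not the GPU algorithm from the poster; the function name, the unit choice G = 1, and the optional softening parameter are assumptions for illustration. Every pair of bodies is visited once, so the work grows quadratically with the number of bodies.

```python
def pairwise_forces(positions, masses, G=1.0, softening=0.0):
    """All-pairs gravitational forces on N bodies.

    positions: list of [x, y, z]; masses: list of floats.
    Returns a list of force vectors, one per body. Each pair (i, j) is
    visited once and Newton's third law fills in the opposite force.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Vector from body i to body j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening ** 2
            scale = G * masses[i] * masses[j] * r2 ** -1.5
            for k in range(3):
                forces[i][k] += scale * dx[k]  # i is pulled toward j
                forces[j][k] -= scale * dx[k]  # equal and opposite on j
    return forces


# Two unit masses separated by a distance of 2 along x.
forces = pairwise_forces([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 1.0])
```

Each pair interaction is independent of the others, which is why this kind of calculation maps well onto the many parallel arithmetic units of a GPU. Parallel implementations often recompute both directions of each pair instead of using the third-law shortcut shown here, since that avoids write conflicts between threads.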

SUMMARY. This paper lays out how one can revamp FAH calculations to make them considerably more efficient, perhaps by as much as a 1000x reduction in the needed computer time. The basic idea is that we use FAH to build a model of the problem in question (a so-called Markovian state model or MSM) and then use the MSM to predict experimental quantities. When using an MSM to make predictions, the question is usually whether we have done enough computation to make a sufficiently good (precise) prediction. By calculating the uncertainty (precision) on the fly, we can now send FAH clients to the parts of the problem which are uncertainty limiting. We show that this approach can be considerably more efficient (1000x) than just running with even sampling. This approach is being incorporated into the FAH server code. One exciting ramification of this work concerns the original motivation for MSMs: they were formulated as a means to use a large distributed cluster (like Folding@home with 300,000 processors) to try to reproduce what a single, hypothetical machine which is 300,000x faster (which doesn’t exist) could do. However, even if that 300,000x faster machine did exist, we show that our approach would be more efficient than a single, long trajectory, suggesting that MSM-based methods should be useful for a very broad set of computer hardware, not just distributed computing platforms.

SUMMARY: This paper describes the first set of results generated using the SMP clients. The main advantage of using SMP for these sorts of calculations is that the amount of computation that one client can do is several times larger than the traditional clients. This means that our simulations can get many times longer than before; in fact, this has allowed us to generate several hundred folding trajectories of the fastest-folding protein known, the HP35-NleNle variant of the villin headpiece subdomain. In this paper, because our simulation time scales compare well to the 700-nanosecond experimental folding time of this protein, AND we’ve generated enough trajectories to get good statistics, we can shed some light on the experimental results. To summarize the result, the first helix of the protein was thought to be highly structured in the unfolded state of the protein; we’ve suggested that structure in this part of the molecule is not enough to lead to fast folding, and that longer time scales than the 700-ns mark may be present in this system.

Check out the movie: it shows some simulation we did for this work, although watching one trajectory is emphatically NOT statistically significant! Some more visualizations of villin from our earlier work can be found on this page.

We have also made the raw data available to researchers on a SimTk.org page. This site includes the raw data, as well as scripts to automate the process and a VMD plugin to allow for browsing of the data. Please contact simbiosfeedback@stanford.edu if you need help with doing this.

ABSTRACT: We have performed molecular dynamics simulations on a set of nine unfolded conformations of the fastest-folding protein yet discovered, a variant of the villin headpiece subdomain (HP-35 NleNle). The simulations were generated using a new distributed computing method, yielding hundreds of trajectories each on a time scale comparable to the experimental folding time, despite the large (10,000 atom) size of the simulation system. This strategy eliminates the need to assume a two-state kinetic model or to build a Markov state model. The relaxation to the folded state at 300 K from the unfolded configurations (generated by simulation at 373 K) was monitored by a method intended to reflect the experimental observable (quenching of tryptophan by histidine). We also monitored the relaxation to the native state by directly comparing structural snapshots with the native state. The rate of relaxation to the native state and the number of resolvable kinetic time scales both depend upon starting structure. Moreover, starting structures with folding rates most similar to experiment show some native-like structure in the N-terminal helix (helix 1) and the phenylalanine residues constituting the hydrophobic core, suggesting that these elements may exist in the experimentally relevant unfolded state. Our large-scale simulation data reveal kinetic complexity not resolved in the experimental data. Based on these findings, we propose additional experiments to further probe the kinetics of villin folding.

SUMMARY: Here, we use molecular-dynamics simulations of lipid vesicle fusion under different lipid compositions to generate a more detailed explanation for how composition controls membrane fusion. We predict that lipid composition affects both the initial process of forming a contact stalk between two vesicles and the formation of a metastable hemifused intermediate. These two roles act in concert to change both the rate of fusion and the level of detectable fusion intermediates. We also present initial results on fusion of vesicles at different membrane curvatures. Recent experimental results suggest that the creation of highly curved membranes is important to fusion of synaptic vesicles. Our simulations cover a curvature regime similar to these experimental systems. In combination with previous results, we predict that the effect of lipid composition on fusion is general across different membrane curvatures, but that the rate of fusion is controlled by both composition and curvature.

ABSTRACT: Membrane fusion is critical to biological processes such as viral infection, endocrine hormone secretion, and neurotransmission, yet the precise mechanistic details of the fusion process remain unknown. Current experimental and computational model systems approximate the complex physiological membrane environment for fusion using one or a few protein and lipid species. Here, we report results of a computational model system for fusion in which the ratio of lipid components was systematically varied, using thousands of simulations of up to a microsecond in length to predict the effects of lipid composition on both fusion kinetics and mechanism. In our simulations, increased phosphatidylcholine content in vesicles causes increased activation energies for formation of the initial stalk-like intermediate for fusion and of hemifusion intermediates, in accordance with previous continuum-mechanics theoretical treatments. We also use our large simulation dataset to quantitatively compare the mechanism by which vesicles fuse at different lipid compositions, showing a significant difference in fusion kinetics and mechanism at different compositions simulated. As physiological membranes have different compositions in the inner and outer leaflets, we examine the effect of such asymmetry, as well as the effect of membrane curvature on fusion. These predicted effects of lipid composition on fusion mechanism both underscore the way in which experimental model system construction may affect the observed mechanism of fusion and illustrate a potential mechanism for cellular regulation of the fusion process by altering membrane composition.

SUMMARY: One challenge in analyzing membrane fusion pathways is simply characterizing the structural intermediates involved. This paper describes use of methods from computational topology and geometry to better measure changes in vesicle structure relevant to fusion.

ABSTRACT:
MOTIVATION: Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. Current experimental assays for fusion have thus far been unable to resolve early fusion events in fine structural detail. We have previously used molecular dynamics simulations to develop mechanistic models of fusion by small lipid vesicles. Here, we introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids.
RESULTS: Persistent voids calculations enable systematic measurement of structural changes in vesicle fusion by assessing fusion stalk widths. They also constitute a generally applicable technique for assessing lipid topological change. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. We predict that a tightly coordinated process of hemifusion neck expansion and pore formation is responsible for the rapid vesicle fusion mechanism, while isolated enlargement of the hemifusion diaphragm leads to the formation of a metastable hemifused intermediate. These findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a fully expanded one.

SUMMARY: When proteins fold inside a cell, they are frequently subjected to various amounts of spatial confinement. Specifically, misfolded or unfolded proteins can be encapsulated inside a helper molecule called a chaperonin. These chaperonins help proteins fold inside cells. Here we investigate how confinement affects protein folding using a simple model: a fast-folding mini-protein confined to a nanopore. We find that if we confine the protein, but allow the surrounding water molecules to pass freely in and out of the nanopore, the protein is more likely to reach the folded state. On the other hand, if we make the nanopore water-tight, we find that the protein is less likely to fold. Specifically, it is pushed into a small non-native globule. This suggests that when thinking of folding inside a confined space (like a chaperonin), it is important to remember that both protein and water are confined, and that this confined water can have an effect on protein folding.

ABSTRACT: Although most experimental and theoretical studies of protein folding involve proteins in vitro, the effects of spatial confinement may complicate protein folding in vivo. In this study, we examine the folding dynamics of villin (a small fast folding protein) with explicit solvent confined to an inert nanopore. We have calculated the probability of folding before unfolding (Pfold) under various confinement regimes. Using Pfold correlation techniques, we observed two competing effects. Confining protein alone promotes folding by destabilizing the unfolded state. In contrast, confining both protein and solvent gives rise to a solvent-mediated effect that destabilizes the native state. When both protein and solvent are confined we see unfolding to a compact unfolded state different from the unfolded state seen in bulk. Thus, we demonstrate that the confinement of solvent has a significant impact on protein kinetics and thermodynamics. We conclude with a discussion of the implications of these results for folding in confined environments such as the chaperonin cavity in vivo.
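As a rough illustration of how a Pfold value might be estimated from simulation data, the sketch below counts how many trial trajectories launched from one conformation reached the folded state before the unfolded state. The function name `pfold_estimate` and the list of boolean outcomes are assumptions made for this example, not the code used in the paper; the error bar is the ordinary binomial standard error.

```python
import math


def pfold_estimate(outcomes):
    """Estimate Pfold from trial trajectories started at one conformation.

    outcomes: list of booleans, True if that trial reached the folded
    state before the unfolded state (hypothetical input format).
    Returns (pfold, standard_error).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trial trajectory")
    folded_first = sum(1 for reached_folded in outcomes if reached_folded)
    p = folded_first / n
    # Binomial standard error of a proportion estimated from n trials.
    stderr = math.sqrt(p * (1.0 - p) / n)
    return p, stderr


# Example: 100 trials, half of which fold before unfolding.
example_p, example_err = pfold_estimate([True] * 50 + [False] * 50)
```

A conformation whose estimate sits near 0.5 is equally likely to fold or unfold, which is why such values are commonly used to flag transition-state-like structures.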

SUMMARY: In order to break up calculations to run on Folding@home and then piece them back together so that they act like a single, very, very, very fast computer, we need special algorithms. We are constantly trying to improve our methods in this area, and this paper represents our latest state of the art.

ABSTRACT: To meet the challenge of modeling the conformational dynamics of biological macromolecules over long timescales, much recent effort has been devoted to constructing stochastic kinetic models, often in the form of discrete-state Markov models, from short molecular dynamics simulations. To construct useful models that faithfully represent dynamics at the timescales of interest, it is necessary to decompose configuration space into a set of kinetically metastable states. Previous attempts to define these states have relied upon either prior knowledge of the slow degrees of freedom or on the application of conformational clustering techniques which assume that conformationally distinct clusters are also kinetically distinct. Here, we present a first version of an automatic algorithm for the discovery of kinetically metastable states that is generally applicable to solvated macromolecules. Given molecular dynamics trajectories initiated from a well-defined starting distribution, the algorithm discovers long-lived, kinetically metastable states through successive iterations of partitioning and aggregating conformation space into kinetically related regions. We apply this method to three peptides in explicit solvent (terminally blocked alanine, the engineered 12-residue beta-hairpin trpzip2, and the 21-residue helical Fs peptide) to assess its ability to generate physically meaningful states and faithful kinetic models.

SUMMARY: Storage@home is a distributed storage infrastructure developed to solve the problem of backing up and sharing petabytes of scientific results using a distributed model of volunteer-managed hosts. Data is maintained by a mixture of replication and monitoring, with repairs done as needed.

SUMMARY: We have been applying Folding@home to study the nature of key proteins involved in how flu (the influenza virus) gains access into host cells. This paper reflects our first work in this direction.

ABSTRACT: Massively parallel all-atom, explicit solvent molecular dynamics simulations were used to explore the formation and existence of local structure in two small alpha-helical proteins, the villin headpiece and the helical fragment B of protein A. We report on the existence of transient helices and combinations of helices in the unfolded ensemble, and on the order of formation of helices, which appears to largely agree with previous experimental results. Transient local structure is observed even in the absence of overall native structure. We also calculate sets of residue-residue pairs that are statistically predictive of the formation of given local structures in our simulations.

Some more visualizations of villin from our earlier work can be found on this page.

ABSTRACT: Using distributed molecular dynamics simulations we located 4 distinct folding transitions for a 39 residue beta-beta-alpha-beta protein fold. We introduce and sequentially determine the transmission probability, Ptrans, of 500 conformations along each free energy barrier at room temperature, and determined which conformations were transition state ensemble members (Ptrans ≈ 0.5). We ran similar simulations at 82°C, determined the change in Ptrans with temperature for all 2,000 conformations, and observed Hammond behavior directly using Ptrans correlation. The polymer temperature increase only slightly perturbed the transition probabilities. We propose that diffusion along Ptrans may provide the configurational diffusion rate at the top of the barrier. Specifically, given a transition state conformation x0 with estimated Ptrans = 0.5, we selected a large set of subsequent conformations from independent trajectories, each exactly a small time δt after x0 (250ps). Then we calculated Ptrans for each of the new trial conformations. The P(Ptrans|δt=250ps) distribution reflects diffusion along an ideal kinetic reaction coordinate. This approach provides a novel perspective on the nature of a protein folding transition, and provides a framework for quantitative study of activated relaxation kinetics.

ABSTRACT: We present a technique for biomolecular free energy calculations that exploits highly parallelized sampling to significantly reduce the time to results. The technique combines free energies for multiple, nonoverlapping configurational macrostates and is naturally suited to distributed computing. We describe a methodology that uses this technique with docking, molecular dynamics, and free energy perturbation to compute absolute free energies of binding quickly compared to previous methods. The method does not require a priori knowledge of the binding pose as long as the docking technique used can generate reasonable binding modes. We demonstrate the method on the protein FKBP12 and eight of its inhibitors.
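One common way to combine free energies computed separately for non-overlapping macrostates is to add their Boltzmann weights, ΔG = -kT ln Σ_i exp(-ΔG_i / kT). The sketch below assumes that form; the function name, the example values, and the value of kT (about 0.593 kcal/mol near 298 K) are illustrative and are not taken from the paper.

```python
import math

KT_KCAL_PER_MOL = 0.593  # assumed: kT near 298 K, in kcal/mol


def combine_macrostate_free_energies(dg_list, kt=KT_KCAL_PER_MOL):
    """Combine per-macrostate free energies into one overall free energy.

    Implements dG = -kT * ln(sum_i exp(-dG_i / kT)), shifting by the
    minimum value first so the exponentials cannot overflow.
    """
    if not dg_list:
        raise ValueError("need at least one macrostate free energy")
    dg_min = min(dg_list)
    weight_sum = sum(math.exp(-(dg - dg_min) / kt) for dg in dg_list)
    return dg_min - kt * math.log(weight_sum)


# Example: two hypothetical binding modes with different free energies.
example_total = combine_macrostate_free_energies([-8.0, -6.5])
```

Because each macrostate contributes an independent term to the sum, the per-macrostate calculations can run on separate machines and be combined only at the end, which is what makes this kind of decomposition a natural fit for distributed computing.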

SUMMARY: We have developed a new method which greatly extends Folding@home’s ability to simulate long timescales. This new method, the Markovian state model (MSM), will be applied to essentially all new Folding@home projects. This paper demonstrates MSMs applied to a challenging target: the villin headpiece.

ABSTRACT: We report on the use of large-scale distributed computing simulation and novel analysis techniques for examining the dynamics of a small protein. Matters addressed include folding rate, very long timescale kinetics, ensemble properties, and interaction with water. The target system for the study, the villin headpiece, has been of great interest to experimentalists and theorists both. Sampling totaled nearly 500 μs, the most extensive published to date for a system of villin’s size in explicit solvent with all atom detail, and was in the form of tens of thousands of independent molecular dynamics trajectories, each several tens of nanoseconds in length. We report on kinetics sensitivity analyses that, using a set of short simulations, probed the role of water in villin’s folding and sensitivity to the simulation’s electrostatics treatment. By constructing Markovian state models from the collected data, we were able to propagate dynamics to times far beyond those directly simulated and to rapidly compute mean first passage times, long time kinetics (tens of microseconds), and evolution of ensemble property distributions over long times, otherwise currently impossible. We also tested our MSM by using it to predict the structure of villin de novo.
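To make "propagating dynamics" and "mean first passage times" concrete, here is a minimal sketch that assumes a row-stochastic transition matrix T has already been estimated at some lag time. The function names and the two-state toy matrix are illustrative assumptions; a real model would have many more states and would typically use a linear solver rather than the simple fixed-point iteration shown here.

```python
def propagate(populations, T, steps):
    """Advance state populations by `steps` lag times: p(t + tau) = p(t) T."""
    n = len(populations)
    p = list(populations)
    for _ in range(steps):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p


def mfpt_to(T, target, lag_time=1.0, tol=1e-12, max_iter=100000):
    """Mean first passage time from every state to `target`.

    Solves m_i = lag_time + sum_{j != target} T[i][j] * m_j with m_target = 0
    by repeated substitution until the values stop changing.
    """
    n = len(T)
    m = [0.0] * n
    for _ in range(max_iter):
        new = [
            0.0 if i == target
            else lag_time + sum(T[i][j] * m[j] for j in range(n) if j != target)
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    raise RuntimeError("MFPT iteration did not converge")


# Toy two-state model (hypothetical numbers): state 0 unfolded, state 1 folded.
T_toy = [[0.9, 0.1],
         [0.2, 0.8]]
long_time_populations = propagate([1.0, 0.0], T_toy, 200)
folding_mfpt = mfpt_to(T_toy, target=1)
```

Once the matrix is known, reaching very long times only costs repeated matrix-vector products, which is far cheaper than simulating those times directly.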

SUMMARY: These first results describe work we’ve been doing to study membrane fusion, the process by which two lipid membranes become one. This process is critical to proper functioning of the cell and also phenomena such as neurotransmission and infection by many viruses. We are seeking to understand how membrane fusion works so that we can eventually manipulate it. We hope such an understanding will lead to the development of new and more effective drugs to combat viral infection and treat neurologic diseases.

ABSTRACT: Lipid membrane fusion is critical to cellular transport and signaling processes such as constitutive secretion, neurotransmitter release, and infection by enveloped viruses. Here, we introduce a powerful computational methodology for simulating membrane fusion from a starting configuration designed to approximate activated prefusion assemblies from neuronal and viral fusion, producing results on a time scale and degree of mechanistic detail not previously possible to our knowledge. We use an approach to the long time scale simulation of fusion by constructing a Markovian state model with large-scale distributed computing, yielding an understanding of fusion mechanisms on time scales previously impossible to simulate to our knowledge. Our simulation data suggest a branched pathway for fusion, in which a common stalk-like intermediate can either rapidly form a fusion pore or remain in a metastable hemifused state that slowly forms fully fused vesicles. This branched reaction pathway provides a mechanistic explanation both for the biphasic fusion kinetics and the stable hemifused intermediates previously observed experimentally. Our distributed computing and Markovian state model approaches provide sufficient sampling to detect rare transitions, a systematic process for analyzing reaction pathways, and the ability to develop quantitative approximations of reaction kinetics for fusion.

SUMMARY: The ability to quantitatively predict electric fields in proteins has remained a great challenge. In this paper, we combine new experimental methods with new theoretical methods made possible by Folding@home distributed computing to greatly push the boundary of what one could previously predict. In particular, we see that a single structure is insufficient to make accurate predictions, suggesting that the ensemble approaches inherent to Folding@home may be important in predicting electrostatics in proteins.

ABSTRACT: The electric fields produced in folded proteins influence nearly every aspect of protein function. We present a vibrational spectroscopy technique that measures changes in electric field at a specific site of a protein as shifts in frequency (Stark shifts) of a calibrated nitrile vibration. A nitrile-containing inhibitor is used to deliver a unique probe vibration to the active site of human aldose reductase, and the response of the nitrile stretch frequency is measured for a series of mutations in the enzyme active site. These shifts yield quantitative information on electric fields that can be directly compared with electrostatics calculations. We show that extensive molecular dynamics simulations and ensemble averaging are required to reproduce the observed changes in field.

SUMMARY: Roughly half of all known cancers involve a mutation in a single protein: p53. The p53 protein serves to protect us from getting cancer; when p53 fails, one often gets cancer. We have developed a new method for predicting how mutations in p53 would affect the protein. This new method is naturally suited for distributed computing and identifies several of the deleterious mutations found to date.

ABSTRACT: We have developed a novel computational alanine scanning approach that involves analysis of ensemble unfolding kinetics at high temperature to identify residues that are critical for the stability of a given protein. This approach has been applied to dimerization of the oligomerization domain (residues 326-355) of tumor suppressor p53. As validated by experimental results, our approach has reasonable success in identifying deleterious mutations, including mutations that have been linked to cancer. We discuss a method for determining the effect of mutations on the location of the dimerization transition state.

SUMMARY: Markov State Models (MSMs) have become a major part of how Folding@home calculations are performed. In particular, the MSM technique is at the heart of how one can divide complex calculations like protein folding or lipid vesicle dynamics across 10,000 to 100,000 CPUs, i.e., how distributed computing can tackle complex problems. This paper presents a new way to test the validity of the MSMs we generate, to make sure that the models are suitable and self-consistent.

ABSTRACT: Markov state models are kinetic models built from the dynamics of molecular simulation trajectories by grouping similar configurations into states and examining the transition probabilities between states. Here we present a procedure for validating the underlying Markov assumption in Markov state models based on information theory using Shannon’s entropy. This entropy method is applied to a simple system and is compared with the previous eigenvalue method. The entropy method also provides a way to identify states that are least Markovian, which can then be divided into finer states to improve the model.
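The first sentence of this abstract describes the basic construction: assign each snapshot to a state, then examine the transition probabilities between states. A minimal sketch of that counting step follows, assuming the trajectory has already been reduced to a list of integer state labels. The function name and the toy trajectory are hypothetical, and the entropy-based validation test from the paper is not reproduced here.

```python
def transition_matrix(state_sequence, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from one discrete trajectory.

    state_sequence: list of integer state labels, one per saved snapshot.
    lag: number of snapshots between the start and end of each counted
    transition (must be at least 1).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    counts = [[0] * n_states for _ in range(n_states)]
    for start, end in zip(state_sequence[:-lag], state_sequence[lag:]):
        counts[start][end] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that were never visited are left as all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy trajectory over two states (hypothetical data).
T_example = transition_matrix([0, 0, 1, 1, 0, 1], n_states=2)
```

Rebuilding the matrix at several lag times and checking whether the resulting models agree is one simple way to probe the Markov assumption that the paper sets out to validate more rigorously.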

SUMMARY: How important are local chemical features of proteins during the folding process? We assess protein folding models with varying degrees of chemical detail to gain an understanding of how they perform relative to some of today’s most sophisticated models.

ABSTRACT: Is an all-atom representation for protein and solvent necessary for simulating protein folding kinetics or can simpler models reproduce the results of more complex models? This question is relevant not just for simulation methodology, but also for the general understanding of the chemical details relevant for protein dynamics. With recent advances in computational methodology, it is now possible to simulate the folding kinetics of small proteins in all-atom detail. Therefore, with both detailed and simplified models of folding in hand, the outstanding questions are what the differences in these models are for the description of protein folding dynamics, and how we can quantitatively compare the folding mechanisms found in the models. To address the outstanding problem of how to determine the differences between folding mechanisms in a sensitive and quantitative manner, we suggest a new method to quantify the non-linear correlation in folding commitment probability (Pfold) values. We use this method to probe the differences between a wide range of models for folding simulations, ranging from coarse grained Go models to all-atom models with implicit or explicit solvation. While the differences between less-detailed models (Go and implicit solvation models) and explicit solvation models are large, the differences within various explicit solvation models appear to be small, suggesting that the discrete nature of water may play a role in folding kinetics.

ABSTRACT: In striking contrast to simple polymer physics theory, which does not account for solvent effects, we find that physical confinement of solvated biopolymers decreases solvent entropy, which in turn leads to a reduction in the organized structural content of the polymer. Since our theory is based on a fundamental property of water-protein statistical mechanics, we expect it to have broad implications in many biological and material science contexts.

SUMMARY: How complicated is a helix, and how is the complexity of helical structure affected by the solvent? Here we show, through a novel “computational hydrophobic titration” experiment, that many features of helices can be rationalized and/or explained by considering the interactions along the peptide-solvent interface.

TECHNICAL ABSTRACT: The 21-residue polyalanine-based Fs peptide was studied using thousands of long, explicit solvent, atomistic molecular dynamics simulations which reached equilibrium at the ensemble level. Peptide conformational preference as a function of hydrophobicity was examined using a spectrum of explicit solvent models, and the peptide length dependence of the hydrophilic and hydrophobic components of solvent-accessible surface area for several ideal conformational types was also considered. Our results demonstrate how the character of the solvation interface induces several conformational preferences, including a decrease in mean helical content with increased hydrophilicity, which occurs predominantly through reduced nucleation tendency and, to a lesser extent, destabilization of helical propagation. Interestingly, an opposing effect occurs through increased propensity for 3₁₀-helix conformations, as well as increased polyproline structure. Our observations provide a framework for understanding previous reports of conformational preferences in polyalanine-based peptides including (i) terminal 3₁₀-helix prominence, (ii) low π-helix propensity, (iii) increased polyproline conformations in short and unfolded peptides, and (iv) membrane helix stability in the presence and absence of water. These observations lend physical insight into the role of water in peptide conformational equilibria at the atomic level, and expand our view of the complexity of even the most “simple” of biopolymers. Whereas previous studies have focused predominantly on hydrophobic effects with respect to tertiary structure, this report highlights the need for consideration of such effects on the secondary structural level.

SUMMARY: In allosteric regulation, protein activity is altered when ligand binding (or unbinding) causes changes in the protein conformation. Little is known about which aspects of the protein architecture are responsible for allosteric regulation; however, most of these changes involve collective displacements of atoms (domain and hinge-bending motions), which are likely to occur on the microsecond timescale. Normal mode analysis (NMA) decouples the complex motions and fluctuations of proteins into a linear combination of orthogonal basis vectors, each representing an independent concerted harmonic motion with a characteristic frequency. In principle, it would be a natural basis in which to represent conformational change that involves collective motions of atoms. This paper addresses the limitations of NMA, namely how many normal modes are necessary to achieve a certain degree of accuracy in the representation.

TECHNICAL ABSTRACT: We suggest a simple method to assess how many normal modes are needed to map a conformational change. By projecting the conformational change onto a subspace of the normal mode vectors and, using RMSD as a test of accuracy, we find that the first 20 modes only contribute 50% or less of the total conformational change in four test cases (myosin, calmodulin, NtrC, and hemoglobin). In some allosteric systems, like the molecular switch NtrC, the conformational change is localized to a limited number of residues. We find that many more modes are necessary to accurately map this collective displacement. In addition, the normal mode spectra can provide useful information about the details of the conformational change, especially when comparing structures with different bound ligands, in this case, calmodulin. Indeed, this approach presents normal mode analysis as a useful basis in which to capture the mechanism of conformational change, and shows that the number of normal modes needed to capture the essential collective motions of atoms should be chosen according to the required accuracy.
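The projection test described here can be sketched in a few lines. The example below assumes the normal modes are supplied as orthonormal vectors in the same flattened 3N coordinate layout as the displacement between the two structures; the function names and the tiny one-atom example are hypothetical and only show how the residual RMSD shrinks as modes are added.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def residual_rmsd(displacement, modes, k, n_atoms):
    """RMSD left unexplained after projecting onto the first k modes.

    displacement: flattened 3N vector (end structure minus start structure).
    modes: list of orthonormal 3N vectors, lowest frequency first (assumed).
    """
    reconstruction = [0.0] * len(displacement)
    for mode in modes[:k]:
        coefficient = dot(displacement, mode)
        reconstruction = [r + coefficient * m
                          for r, m in zip(reconstruction, mode)]
    squared_error = sum((d - r) ** 2
                        for d, r in zip(displacement, reconstruction))
    return math.sqrt(squared_error / n_atoms)


# One-atom toy case with Cartesian axes standing in for normal modes.
toy_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_displacement = [3.0, 4.0, 0.0]
residuals = [residual_rmsd(toy_displacement, toy_modes, k, n_atoms=1)
             for k in range(4)]
```

Plotting the residual against k gives a direct picture of how many modes are needed for a chosen accuracy.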

SUMMARY: Direct comparisons are made between Folding@home simulations and experimental measurements (SAXS) to determine molecular size of helical peptides of varying length, revealing the compact nature of such helical peptides.

TECHNICAL ABSTRACT: Using synchrotron radiation and the small-angle X-ray scattering technique we have measured the radii of gyration of a series of alanine-based α-helix-forming peptides of the composition Ace-(AAKAA)n-GY-NH2, n = 2-7, in aqueous solvent at 10°C. In contrast to other techniques typically used to study α-helices in isolation (such as nuclear magnetic resonance and circular dichroism), small-angle X-ray scattering reports on the global structure of a molecule and, as such, provides complementary information to these other, more sequence-local measuring techniques. The radii of gyration that we measure are, except for the 12-mer, lower than the radii of gyration of ideal α-helices or helices with frayed ends of the equivalent sequence-length. For example, the measured radius of gyration of the 37-mer is 14.2 Å, which is to be compared with the radius of gyration of an ideal 37-mer α-helix of 17.6 Å. Attempts are made to analyze the origin of this discrepancy in terms of the analytical Zimm-Bragg-Nagai (ZBN) theory, as well as distributed computing explicit solvent molecular dynamics simulations using two variants of the AMBER force-field. The ZBN theory, which treats helices as cylinders connected by random walk segments, predicts markedly larger radii of gyration than those measured. This is true even when the persistence length of the random walk parts is taken to be extremely short (about one residue). Similarly, the molecular dynamics simulations, at the level of sampling available to us, give inaccurate values of the radii of gyration of the molecules (by overestimating them by around 25% for longer peptides) and/or their helical content. We conclude that even at the short sequences examined here (≤37 amino acid residues), these α-helical peptides behave as fluctuating semi-broken rods rather than straight cylinders with frayed ends.
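For readers unfamiliar with the quantity being compared, the radius of gyration of a set of atoms is the root-mean-square distance of the atoms from their centroid. The sketch below computes the unweighted geometric version from Cartesian coordinates; the function name is an assumption for this example. A value measured by SAXS also reflects electron density and the hydration layer, so this is only a simplified stand-in for the comparison made in the paper.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    # Centroid of the point set.
    center = [sum(c[axis] for c in coords) / n for axis in range(3)]
    # Mean squared distance of the points from the centroid.
    mean_sq = sum(
        sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for c in coords
    ) / n
    return math.sqrt(mean_sq)


# Two points 2 units apart: each lies 1 unit from the centroid.
rg_example = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
```

Averaging this value over the snapshots of a simulation gives a number that can be set beside the measured one, subject to the caveats above.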

SUMMARY: We validate the new Markovian State Model (MSM) for describing protein dynamics, and show how to efficiently calculate how accurate these models are. We also describe how to start new FAH simulations to best improve the accuracy of the model.

TECHNICAL ABSTRACT: In previous work, we described a Markovian state model (MSM) for analyzing molecular-dynamics trajectories, which involved grouping conformations into states and estimating the transition probabilities between states. In this paper, we analyze the errors in this model caused by finite sampling. We give different methods with various approximations to determine the precision of the reported mean first passage times. These approximations are validated on an 87 state toy Markovian system. In addition, we propose an efficient and practical sampling algorithm that uses these error calculations to build an MSM that has the same precision in mean first passage time values but requires an order of magnitude fewer samples. We also show how these methods can be scaled to large systems using sparse matrix methods.
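The idea of steering new simulations toward poorly determined parts of the model can be illustrated with a deliberately simplified rule. The sketch below is not the algorithm from the paper, which ranks states by their contribution to the uncertainty in the mean first passage time. Instead it assumes a Dirichlet posterior for each row of transition counts and picks the state whose outgoing probabilities have the largest total variance. All names, the prior, and the example counts are hypothetical.

```python
def row_uncertainty(counts, prior=1.0):
    """Total Dirichlet posterior variance of one row of transition counts.

    For Dirichlet parameters alpha_i = counts_i + prior with sum alpha_0,
    Var(p_i) = alpha_i * (alpha_0 - alpha_i) / (alpha_0**2 * (alpha_0 + 1)).
    """
    alpha = [c + prior for c in counts]
    alpha_0 = sum(alpha)
    numerator = sum(a * (alpha_0 - a) for a in alpha)
    return numerator / (alpha_0 * alpha_0 * (alpha_0 + 1.0))


def next_state_to_sample(count_matrix, prior=1.0):
    """Index of the state whose outgoing transitions are least certain."""
    scores = [row_uncertainty(row, prior) for row in count_matrix]
    return max(range(len(scores)), key=scores.__getitem__)


# Observed transition counts for three states (hypothetical data).
observed_counts = [
    [90, 10, 0],
    [5, 2, 3],
    [0, 40, 60],
]
state_to_start_from = next_state_to_sample(observed_counts)
```

Here the second state has seen only ten transitions, so its row is the least certain and would be the starting point for the next batch of simulations under this simplified rule.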

SUMMARY: Drug design calculations are generally very difficult. Here we show that calculations made previously on the Folding@home network are possible on a much smaller supercomputer system without loss of numerical precision.

TECHNICAL ABSTRACT: Direct calculations of the absolute binding free energies for eight FKBP ligands were performed using the Fujitsu BioServer massively parallel computer. Using the latest version of the general AMBER force field (GAFF) for ligand model parameters and the Bennett acceptance ratio for computing free energy differences, we obtained an excellent linear fit between the calculated and experimental binding free energies. The RMS error from a linear fit is 0.4 kcal/mol for eight ligand complexes. In comparison with a previous study of the binding energies of these same eight ligand complexes, these results suggest that the use of improved model parameters can lead to more predictive binding estimates, and that these estimates can be obtained with significantly less computer time than previously thought. These findings make such direct methods more attractive for use in rational drug design.

SUMMARY: Simulation of the collagen triple helix has been given less attention than more common protein “folds.” Here we present newly derived parameters for such simulations to gain better agreement with experimental data, thereby offering insight into the stability of the triple helix structure.

TECHNICAL ABSTRACT: Recently, the importance of proline ring pucker conformations in collagen has been suggested in the context of hydroxylation of prolines. The previous molecular mechanics parameters for hydroxyproline, however, do not reproduce the correct pucker preference. We have developed a new set of parameters that reproduces the correct pucker preference. Our molecular dynamics simulations of proline and hydroxyproline monomers as well as collagen-like peptides, using the new parameters, support the theory that the role of hydroxylation in collagen is to stabilize the triple helix by adjusting to the right pucker conformation (and thus the right φ angle) in the Y position.

SUMMARY: We test new methods for free energy calculations — relevant for our computational drug design methodology. We find that the BAR method we previously investigated is significantly better than methods commonly employed. We have already gotten a lot of positive feedback about this work from others in the field, as they have been starting to use the results of this work to improve their calculations as well.

TECHNICAL ABSTRACT: Recent work has demonstrated the Bennett acceptance ratio method is the best asymptotically unbiased method for determining the equilibrium free energy between two end states given work distributions collected from either equilibrium or non-equilibrium data. However, it is still not clear what the practical advantage of this acceptance ratio method is over other common methods in atomistic simulations. In this study, we first review theoretical estimates of the bias and variance of exponential averaging (EXP), thermodynamic integration (TI), and the Bennett acceptance ratio (BAR). In the process, we present a new simple scheme for computing the variance and bias of many estimators, and demonstrate the connections between BAR and the weighted histogram analysis method. Next, a series of analytically solvable toy problems is examined to shed more light on the relative performance in terms of the bias and efficiency of these three methods. Interestingly, it is impossible to conclusively identify a best method for calculating the free energy, as each of the three methods performs more efficiently than the others in at least one situation examined in these toy problems. Finally, sample problems of the insertion/deletion of both a Lennard-Jones particle and a much larger molecule in TIP3P water are examined by these three methods. In all tests of atomistic systems, free energies obtained with BAR have significantly lower bias and smaller variance than when using EXP or TI, especially when the overlap in phase space between end states is small. For example, BAR can extract as much information from multiple fast, far-from-equilibrium simulations as from fewer simulations near equilibrium, which EXP cannot. Although TI and sometimes even EXP can be somewhat more efficient in idealized toy problems, in the realistic atomistic situations tested in this paper, BAR is significantly more efficient than all other methods.
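
As a rough illustration of two of the estimators compared in this abstract, the sketch below computes a free energy difference from forward work values with exponential averaging (EXP) and from forward and reverse work values with the Bennett acceptance ratio (BAR). The function names, the bisection solver, and the synthetic Gaussian work values are assumptions made for illustration; they are not taken from the paper, and the toy data are not the systems studied there.

```python
import math
import random


def fermi(x):
    """Numerically stable Fermi function 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def exp_estimate(w_forward, beta=1.0):
    """Exponential averaging: dF = -ln< exp(-beta * W) > / beta."""
    shift = min(w_forward)  # factor out the largest term to avoid overflow
    mean = sum(math.exp(-beta * (w - shift)) for w in w_forward) / len(w_forward)
    return shift - math.log(mean) / beta


def bar_estimate(w_forward, w_reverse, beta=1.0, tol=1e-10):
    """Solve the BAR self-consistency equation for dF by bisection."""
    m = math.log(len(w_forward) / len(w_reverse))

    def imbalance(df):
        # Increases monotonically with df, so it has a single root.
        lhs = sum(fermi(m + beta * (w - df)) for w in w_forward)
        rhs = sum(fermi(-m + beta * (w + df)) for w in w_reverse)
        return lhs - rhs

    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:  # widen the bracket until it contains the root
        lo *= 2.0
    while imbalance(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Synthetic Gaussian work values consistent with a true dF of 2 kT.
    rng = random.Random(0)
    true_df, sigma, n = 2.0, 2.0, 2000
    w_f = [rng.gauss(true_df + 0.5 * sigma**2, sigma) for _ in range(n)]
    w_r = [rng.gauss(-true_df + 0.5 * sigma**2, sigma) for _ in range(n)]
    print("EXP:", exp_estimate(w_f))
    print("BAR:", bar_estimate(w_f, w_r))
```

EXP uses only one direction of the process and is dominated by rare low-work samples, whereas BAR combines both directions, which is one way to see why it can behave better when the end states overlap poorly.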

SUMMARY: This paper is a test of our methods for free energy calculation — critical to our computational drug design methodology. We achieve a higher level of accuracy and precision than before. Moreover, our recent research in computational efficiency of free energy methods allows us to perform simulations on a local cluster that previously required large scale distributed computing, performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.

TECHNICAL ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). In order to examine the accuracy of a range of common water models used for protein simulation for their solute/solvent properties, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from the OPLS-AA parameter set with the TIP3P, TIP4P, SPC, SPC/E, TIP3P-MOD, and TIP4P-Ew water models. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.06 kcal/mol, equivalent to that obtained in experimental hydration free energy measurements of the same molecules. We find that TIP3P-MOD, a model designed to give improved free energy of hydration for methane, gives uniformly the closest match to experiment; we also find that the ability to accurately model pure water properties does not necessarily predict ability to predict solute/solvent behavior. We also evaluate the free energies of a number of novel modifications of TIP3P designed as a proof of concept that it is possible to obtain much better solute/solvent free energetic behavior without substantially negatively affecting pure water properties. We decrease the average error to zero while reducing the rms error below that of any of the published water models, with measured liquid water properties remaining almost constant with respect to our perturbations. This demonstrates there is still both room for improvement within current fixed-charge biomolecular force fields and significant parameter flexibility to make these improvements. 
Recent research in computational efficiency of free energy methods allows us to perform simulations on a local cluster that previously required large scale distributed computing, performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.

SUMMARY: Here, we lay out some of the first applications of a new method for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.

TECHNICAL ABSTRACT: In this article, we analyze the folding dynamics of an all-atom model of a polyphenylacetylene (pPA) 12-mer in explicit solvent for four common organic and aqueous solvents: acetonitrile, chloroform, methanol, and water. The solvent quality has a dramatic effect on the time scales in which pPA 12-mers fold. Acetonitrile was found to manifest ideal folding conditions as suggested by optimal folding times on the order of ~100-200 ns, depending on temperature. In contrast, chloroform and water were observed to hinder the folding of the pPA 12-mer due to extreme solvation conditions relative to acetonitrile; chloroform denatures the oligomer, whereas water promotes aggregation and traps. The pPA 12-mer in a pure methanol solution folded in ~400 ns at 300 K, compared to the experimental 12-mer folding time of ~160 ns measured in a 1:1 v/v THF/methanol solution. Requisite in drawing the aforementioned conclusions, analysis techniques based on Markov state models are applied to multiple short independent trajectories to extrapolate the long-time scale dynamics of the 12-mer in each respective solvent. We review the theory of Markov chains and derive a method to impose detailed balance on a transition probability matrix computed from simulation data.
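
To make the idea of imposing detailed balance concrete, here is a minimal sketch of one simple and widely used approach: symmetrize the matrix of observed transition counts, then normalize each row. This may well differ from the method derived in the paper; the function name and the example counts are hypothetical.

```python
def reversible_transition_matrix(counts):
    """Build a transition matrix that satisfies detailed balance.

    counts[i][j] is the number of observed transitions from state i to
    state j. Symmetrizing the counts before row-normalizing guarantees
    pi[i] * T[i][j] == pi[j] * T[j][i] for the returned pi and T.
    Every state must have at least one observed transition.
    """
    n = len(counts)
    sym = [[0.5 * (counts[i][j] + counts[j][i]) for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sym]
    total = sum(row_sums)
    transition = [[sym[i][j] / row_sums[i] for j in range(n)]
                  for i in range(n)]
    stationary = [r / total for r in row_sums]
    return transition, stationary


if __name__ == "__main__":
    # Hypothetical counts from short trajectories over three states.
    counts = [[90, 10, 0],
              [4, 80, 16],
              [0, 8, 92]]
    T, pi = reversible_transition_matrix(counts)
    for row in T:
        print(row)
    print("stationary distribution:", pi)
```

Symmetrizing treats every observed transition as if its reverse had been observed equally often, which is reasonable only when the underlying dynamics are at equilibrium.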

SUMMARY: Here, we lay out some new methodology for simulation for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.

TECHNICAL ABSTRACT: The structural landscape of poly-phenylacetylene (pPA), otherwise known as m-phenylene ethynylene oligomers, has been shown to consist of a very diverse set of conformations, including helices, turns, and knots. Defining a state space decomposition to classify these conformations into easily identifiable states is an important step in understanding the dynamics in relation to Markov state models. We define the state decomposition of pPA oligomers in terms of the sequence of discretized dihedral angles between adjacent phenyl rings along the oligomer backbone. Furthermore, we derive in mathematical detail an approach to further reduce the number of states by grouping symmetrically equivalent states into a single parent state. A more challenging problem requires a formal definition for knotted states in the structural landscape. Assuming that the oligomer chain can only cross the ideal helix path once, we propose a technique to define a knotted state derived from a helical state determined by the position along the helical nucleus where the chain crosses the ideal helix path. Several examples of helical states and knotted states from the pPA 12-mer illustrate the principles outlined in this article.

SUMMARY: This study probes the structural character of a small peptide using experiment and simulation. It highlights the differences between global and local structural information, suggesting a new model for PPII conformational character, which is thought to be dominant in the unfolded state of proteins.

TECHNICAL ABSTRACT: Polyproline type II (PPII) helix has emerged recently as the dominant paradigm for describing the conformation of unfolded polypeptides. However, most experimental observables used to characterize unfolded proteins typically provide only short-range, sequence-local structural information that is both time- and ensemble-averaged, giving limited detail about the long-range structure of the chain. Here, we report a study of a long-range property: the radius of gyration of an alanine-based peptide, Ace-(diaminobutyric acid)2-(Ala)7-(ornithine)2-NH2. This molecule has previously been studied as a model for the unfolded state of proteins under folding conditions and is believed to adopt a PPII fold based on short-range techniques such as NMR and CD. By using synchrotron radiation and small-angle x-ray scattering, we have determined the radius of gyration of this peptide to be 7.4 +/- 0.5 Å, which is significantly less than the value expected from an ideal PPII helix in solution (13.1 Å). To further study this contradiction, we have used molecular dynamics simulations using six variants of the AMBER force field and the GROMOS 53A6 force field. However, in all cases, the simulated ensembles underestimate the PPII content while overestimating the experimental radius of gyration. The conformational model that we propose, based on our small angle x-ray scattering results and what is known about this molecule from before, is that of a very flexible, fluctuating structure that on the level of individual residues explores a wide basin around the ideal PPII geometry but is never, or only rarely, in the ideal extended PPII helical conformation.

SUMMARY: Rather than reporting new data from the Folding@home project, this review article offers an in-depth look at the current state-of-the-art in simulation-based prediction. This includes work by our group and others in the field, including many computational models and methods of extracting information that can be directly compared to experiment.

TECHNICAL ABSTRACT: Simulation of protein folding has come a long way in five years. Notably, new quantitative comparisons with experiments for small, rapidly folding proteins have become possible. As the only way to validate simulation methodology, this achievement marks a significant advance. Here, we detail these recent achievements and ask whether simulations have indeed rendered quantitative predictions in several areas, including protein folding kinetics, thermodynamics, and physics-based methods for structure prediction. We conclude by looking to the future of such comparisons between simulations and experiments.

SUMMARY: How do the results of peptide simulations change with slight variations to the models employed? Here we answer this question with respect to very local changes in the energetics of the polymer, demonstrating the sensitivity of simulated bulk (i.e. ensemble averaged) structural equilibrium on the parameters of the model.

TECHNICAL ABSTRACT: The kinetic and thermodynamic aspects of the helix-coil transition in polyalanine-based peptides have been studied at the ensemble level using a distributed computing network. This study builds on a previous report, which critically assessed the performance of several contemporary force fields in reproducing experimental measurements and elucidated the complex nature of helix-coil systems. Here we consider the effects of modifying backbone torsions and the scaling of noncovalent interactions. Although these elements determine the potential of mean force between atoms separated by three covalent bonds (and thus largely determine the local conformational distributions observed in simulation), we demonstrate that the interplay between these factors is both complex and force field dependent. We quantitatively assess the heliophilicity of several helix-stabilizing potentials as well as the changes in heliophilicity resulting from such modifications, which can “make or break” the accuracy of a given force field, and our findings suggest that future force field development may need to better consider effects that vary with peptide length. This report also serves as an example of the utility of distributed computing in analyzing and improving upon contemporary force fields at the level of absolute ensemble equilibrium, the next step in force field development.

SUMMARY: How good are our models for folding? This question is important to address in order to understand the usefulness of our work, as well as the work of everyone in the atomistic simulation field in general. Here, we’ve done extremely extensive tests of models used in folding to show their strengths and weaknesses. Based on their weaknesses, we have proposed a new model which appears to have a much stronger agreement with experiment.

TECHNICAL ABSTRACT: The ensemble folding of two 21-residue α-helical peptides has been studied using all-atom simulations under several variants of the AMBER potential in explicit solvent using a global distributed computing network. Our extensive sampling, orders of magnitude greater than the experimental folding time, results in complete convergence to ensemble equilibrium. This allows for a quantitative assessment of these potentials, including a new variant of the AMBER-99 force field, denoted AMBER-99φ, which shows improved agreement with experimental kinetic and thermodynamic measurements. From bulk analysis of the simulated AMBER-99φ equilibrium, we find that the folding landscape is pseudo-two-state, with complexity arising from the broad, shallow character of the ‘native’ and ‘unfolded’ regions of the phase space. Each of these macrostates allows for configurational diffusion among a diverse ensemble of conformational microstates with greatly varying helical content and molecular size. Indeed, the observed structural dynamics are better represented as a conformational diffusion than as a simple exponential process, and equilibrium transition rates spanning several orders of magnitude are reported. After multiple nucleation steps, on average, helix formation proceeds via a kinetic “alignment” phase in which two or more short, low-entropy helical segments form a more ideal, single-helix structure.

SUMMARY: While previous studies on the folding of nucleic acid hairpins have employed simplified models of either the nucleic acid or the solvent, this paper reports the first such study using an explicit treatment of the surrounding water and counterions. We show that accounting for water molecules in this manner is necessary to most accurately characterize the energetics of hairpin folding, whereas monovalent ions appear to play only a background role.

TECHNICAL ABSTRACT: Nucleic acid structure and dynamics are known to be closely coupled to local environmental conditions and, in particular, to the ionic character of the solvent. Here we consider what role the discrete properties of water and ions play in the collapse and folding of small nucleic acids. We study the folding of an experimentally well-characterized RNA hairpin-loop motif (sequence 5′-GGGC[GCAA]GCCU-3′) via ensemble molecular dynamics simulation and, with nearly 500 µs of aggregate simulation time using an explicit representation of the ionic solvent, report successful ensemble folding simulations, with a predicted folding time of 8.8(2.0) µs, in agreement with experimental measurements of ~10 µs. Comparing our results to previous folding simulations using the GB/SA continuum solvent model shows that accounting for water-mediated interactions is necessary to accurately characterize the free energy surface and stochastic nature of folding. The formation of secondary structure appears to be more rapid than the fastest ionic degrees of freedom, and counterions do not participate discretely in observed folding events. We find that hydrophobic collapse follows a predominantly expulsive mechanism in which a diffusion-search of early structural compaction is followed by final formation of native structure that occurs in tandem with solvent evacuation.

SUMMARY: Roughly half of all known cancers result from mutations in p53. Our first work in the cancer area examines the tetramerization domain of p53. We predict how p53 folds and, in doing so, can predict which amino acid mutations would be relevant. Our predictions agree with experiment and give a new interpretation of existing data.

TECHNICAL ABSTRACT: Dimerization of the p53 oligomerization domain involves coupled folding and binding of monomers. To examine the dimerization, we have performed molecular dynamics (MD) simulations of dimer folding from the rate-limiting transition state ensemble (TSE). Among 799 putative transition state structures that were selected from a large ensemble of high-temperature unfolding trajectories, 129 were identified as members of the TSE via calculation of a 50% transmission coefficient from at least 20 room-temperature simulations. This study is the first to examine the refolding of a protein dimer using MD simulations in explicit water, revealing a folding nucleus for dimerization. Our atomistic simulations are consistent with experiment and offer insight that was previously unobtainable.

SUMMARY: How can Folding@home use thousands to millions of CPUs to efficiently simulate long timescale biomolecular dynamics? This paper outlines the “Markovian State Model” method which is the foundation of how most new Folding@home calculations are performed. The MSM method allows for a very efficient use of uncoupled simulations, as one would easily get from distributed computing.

TECHNICAL ABSTRACT: We propose an efficient method for the prediction of protein folding rate constants and mechanisms. We use molecular dynamics simulation data to build Markovian state models (MSMs), discrete representations of the pathways sampled. Using these MSMs, we can quickly calculate the folding probability (Pfold) and mean first passage time of all the sampled points. In addition, we provide techniques for evaluating these values under perturbed conditions without expensive recomputations. To demonstrate this method on a challenging system, we apply these techniques to a two-dimensional model energy landscape and the folding of a tryptophan zipper beta hairpin.
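
The two quantities named in this abstract can be illustrated on a toy model. The sketch below takes a small, hypothetical four-state transition matrix and computes the folding probability (Pfold) and the mean first passage time to the folded state by fixed-point iteration. The matrix, the state labels, and the solver are assumptions for illustration, not the paper's implementation.

```python
def solve_fixed_point(T, fixed, source, tol=1e-12, max_iter=1_000_000):
    """Iterate x[i] = source + sum_j T[i][j] * x[j] for states not in `fixed`.

    `fixed` maps state index -> boundary value that is held constant.
    """
    n = len(T)
    x = [fixed.get(i, 0.0) for i in range(n)]
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            if i in fixed:
                continue
            new = source + sum(T[i][j] * x[j] for j in range(n))
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < tol:
            return x
    raise RuntimeError("fixed-point iteration did not converge")


def pfold(T, unfolded, folded):
    """Probability of reaching `folded` before `unfolded` from each state."""
    return solve_fixed_point(T, {unfolded: 0.0, folded: 1.0}, source=0.0)


def mean_first_passage_time(T, folded, lag_time=1.0):
    """Mean time to first reach `folded`, in units of the lag time."""
    return solve_fixed_point(T, {folded: 0.0}, source=lag_time)


# Hypothetical model: state 0 is unfolded, state 3 is folded.
T = [
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print("Pfold:", pfold(T, unfolded=0, folded=3))
    print("MFPT :", mean_first_passage_time(T, folded=3))
```

For this matrix the equations can also be solved by hand, giving Pfold values of 1/3 and 2/3 for the two intermediate states and a mean first passage time of 45 lag times from the unfolded state.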

ABSTRACT: There are many unresolved questions regarding the role of water in protein folding. Does water merely induce hydrophobic forces, or does the discrete nature of water play a structural role in folding? Are the nonadditive aspects of water important in determining the folding mechanism? To help to address these questions, we have performed simulations of the folding of a model protein (BBA5) in explicit solvent. Starting 10,000 independent trajectories from a fully unfolded conformation, we have observed numerous folding events, making this work a comprehensive study of the kinetics of protein folding starting from the unfolded state and reaching the folded state and with an explicit solvation model and experimentally validated rates. Indeed, both the raw TIP3P folding rate (4.5 +/- 2.5 µs) and the diffusion-constant corrected rate (7.5 +/- 4.2 µs) are in strong agreement with the experimentally observed rate of 7.5 +/- 3.5 µs. To address the role of water in folding, the mechanism is compared with that predicted from implicit solvation simulations. An examination of solvent density near hydrophobic groups during folding suggests that in the case of BBA5, there are water-induced effects not captured by implicit solvation models, including signs of a concurrent mechanism of core collapse and desolvation.

ABSTRACT: We studied the microsecond folding dynamics of three hairpins (Trp zippers 1-3, TZ1-TZ3) by using temperature-jump fluorescence and atomistic molecular dynamics in implicit solvent. In addition, we studied TZ2 by using time-resolved IR spectroscopy. By using distributed computing, we obtained an aggregate simulation time of 22 ms. The simulations included 150, 212, and 48 folding events at room temperature for TZ1, TZ2, and TZ3, respectively. The all-atom optimized potentials for liquid simulations (OPLSaa) potential set predicted TZ1 and TZ2 properties well; the estimated folding rates agreed with the experimentally determined folding rates and native conformations were the global potential-energy minimum. The simulations also predicted reasonable unfolding activation enthalpies. This work, directly comparing large simulated folding ensembles with multiple spectroscopic probes, revealed both the surprising predictive ability of current models as well as their shortcomings. Specifically, for TZ1-TZ3, OPLS for united atom models had a nonnative free-energy minimum, and the folding rate for OPLSaa TZ3 was sensitive to the initial conformation. Finally, we characterized the transition state; all TZs fold by means of similar, native-like transition-state conformations.

ABSTRACT: Recent studies in protein folding suggest that native state topology plays a dominant role in determining the folding mechanism, yet an analogous statement has not been made for RNA, most likely due to the strong coupling between the ionic environment and conformational energetics that make RNA folding more complex than protein folding. Applying a distributed computing architecture to sample nearly 5000 complete tRNA folding events using a minimalist, atomistic model, we have characterized the role of native topology in tRNA folding dynamics: the simulated bulk folding behavior predicts well the experimentally observed folding mechanism. In contrast, single-molecule folding events display multiple discrete folding transitions and compose a largely diverse, heterogeneous dynamic ensemble. This both supports an emerging view of heterogeneous folding dynamics at the microscopic level and highlights the need for single-molecule experiments and both single-molecule and bulk simulations in interpreting bulk experimental measurements.

ABSTRACT: Recently, we have proposed that, on average, the structure of the unfolded state of small, mostly alpha-helical proteins may be similar to the native structure (the ‘mean-structure’ hypothesis). After examining thousands of simulations of both the folded and the unfolded states of five polypeptides in atomistic detail at room temperature, we report here a result that seems at odds with the mean-structure hypothesis. Specifically, the average inter-residue distances in the collapsed unfolded structures agree well with the statistics of the ideal random-flight chain with link length of 3.8 Å (the length of one amino acid). A possible resolution of this apparent contradiction is offered by the observation that the inter-residue distances in a typical alpha-helix over short stretches are close to the average distances in an ideal random-flight chain.
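
The closing observation can be checked with a few lines of arithmetic. The sketch below compares the root-mean-square end-to-end distance of an ideal random-flight chain with a 3.8 Å link length against the distance between residues in an idealized alpha-helix. The helix geometry (1.5 Å rise per residue, 2.3 Å radius, 100 degrees per residue) uses common textbook values and is an assumption, as are the function names; the paper's own analysis may use a different distance measure.

```python
import math

LINK_LENGTH = 3.8  # Angstrom, the link length quoted in the abstract


def random_flight_rms(n_links, link_length=LINK_LENGTH):
    """RMS end-to-end distance of an ideal random-flight chain."""
    return link_length * math.sqrt(n_links)


def ideal_helix_distance(n_residues, rise=1.5, radius=2.3, turn_deg=100.0):
    """Distance between residues i and i + n on an ideal alpha-helix.

    The default geometry is an assumed textbook approximation.
    """
    angle = math.radians(turn_deg * n_residues)
    chord = 2.0 * radius * math.sin(angle / 2.0)  # separation across the axis
    return math.sqrt((rise * n_residues) ** 2 + chord ** 2)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, round(random_flight_rms(n), 2), round(ideal_helix_distance(n), 2))
```

With these assumed parameters the two distances nearly coincide for separations of one or two residues and then drift apart, which is consistent with the abstract's wording "over short stretches".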

ABSTRACT: We present a maximum likelihood argument for the Bennett acceptance ratio method, and derive a simple formula for the variance of free energy estimates generated using this method. This derivation of the acceptance ratio method, using a form of logistic regression, a common statistical technique, allows us to shed additional light on the underlying physical and statistical properties of the method. For example, we demonstrate that the acceptance ratio method yields the lowest variance for any estimator of the free energy which is unbiased in the limit of large numbers of measurements.

ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). We use large-scale distributed computing to access sufficient computational resources to extensively sample molecular systems and thus reduce statistical uncertainty of measured free energies. In order to examine the accuracy of a range of common models used for protein simulation, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from recent versions of the OPLS-AA, CHARMM, and AMBER parameter sets in TIP3P water using thermodynamic integration. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.05 kcal/mol, which are in general an order of magnitude smaller than those found in other studies. Notably, this level of precision is comparable to that obtained in experimental hydration free energy measurements of the same molecules. Root mean square differences from experiment over the set of molecules examined using AMBER-, CHARMM-, and OPLS-AA-derived parameters were 1.35 kcal/mol, 1.31 kcal/mol, and 0.85 kcal/mol, respectively. Under the simulation conditions used, these force fields tend to uniformly underestimate solubility of all the side chain analogs. The relative free energies of hydration between amino acid side chain analogs were closer to experiment but still exhibited significant deviations. Although extensive computational resources may be needed for large numbers of molecules, sufficient computational resources to calculate precise free energy calculations for small molecules are accessible to most researchers.

ABSTRACT: By using distributed computing techniques and a supercluster of more than 20,000 processors we simulated folding of a 20-residue Trp Cage miniprotein in atomistic detail with implicit GB/SA solvent at a variety of solvent viscosities (η). This allowed us to analyze the dependence of folding rates on viscosity. In particular, we focused on the low-viscosity regime (values below the viscosity of water). In accordance with Kramers’ theory, we observe an approximately linear dependence of the folding rate on 1/η for viscosities between 10^(-1) and 1 times that of water. However, for the regime between 10^(-4) and 10^(-1) times that of water we observe power-law dependence of the form k ~ η^(-1/5). These results suggest that estimating folding rates from molecular simulations run at low viscosity under the assumption of linear dependence of rate on inverse viscosity may lead to erroneous results.
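
One way to tell the two regimes apart is to fit the slope of log(rate) against log(viscosity): Kramers-like behavior gives a slope near -1, while the power law reported here gives a slope near -1/5. The sketch below does this with an ordinary least-squares fit. The data are synthetic values generated from the stated functional forms, not results from the paper, and the function name is hypothetical.

```python
import math


def loglog_slope(viscosities, rates):
    """Least-squares slope of ln(rate) versus ln(viscosity)."""
    xs = [math.log(v) for v in viscosities]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Viscosities relative to water; rates are synthetic, in arbitrary units.
    low_viscosity = [1e-4, 1e-3, 1e-2, 1e-1]
    power_law_rates = [2.0 * v ** (-0.2) for v in low_viscosity]
    print("low-viscosity slope:", loglog_slope(low_viscosity, power_law_rates))

    near_water = [0.1, 0.2, 0.5, 1.0]
    kramers_rates = [3.0 / v for v in near_water]
    print("near-water slope:", loglog_slope(near_water, kramers_rates))
```

Extrapolating a rate measured at very low viscosity back to water with a slope of -1 when the true slope is closer to -1/5 would overestimate the slowdown by orders of magnitude, which is the risk the abstract warns about.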

ABSTRACT: The helical hairpin is one of the most ubiquitous and elementary secondary structural motifs in nucleic acids, capable of serving functional roles and participating in long-range tertiary contacts. Yet the self-assembly of these structures has not been well-characterized at the atomic level. With this in mind, the dynamics of nucleic acid hairpin formation and disruption have been studied using a novel computational tool: large-scale, parallel, atomistic molecular dynamics simulation employing an inhomogeneous distributed computer consisting of more than 40,000 processors. Using multiple methodologies, over 500 µs of atomistic simulation time has been collected for a large ensemble of hairpins (sequence 5′-GGGC[GCAA]GCCU-3′), allowing characterization of rare events not previously observable in simulation. From uncoupled ensemble dynamics simulations in unperturbed folding conditions, we report on 1) competing pathways between the folded and unfolded regions of the conformational space; 2) observed non-native stacking and basepairing traps; and 3) a helix unwinding-rewinding mode that is differentiated from the unfolding and folding dynamics. A heterogeneous transition state ensemble is characterized structurally through calculations of conformer-specific folding probabilities and a multiplexed replica exchange stochastic dynamics algorithm is used to derive an approximate folding landscape. A comparison between the observed folding mechanism and that of a peptide β-hairpin analog suggests that although native topology defines the character of the folding landscape, the statistical weighting of potential folding pathways is determined by the chemical nature of the polymer.

ABSTRACT: Simulating protein folding thermodynamics starting purely from a protein sequence is a grand challenge of computational biology. Here, we present an algorithm to calculate a canonical distribution from molecular dynamics simulation of protein folding. This algorithm is based on the replica exchange method where the kinetic trapping problem is overcome by exchanging noninteracting replicas simulated at different temperatures. Our algorithm uses multiplexed-replicas with a number of independent molecular dynamics runs at each temperature. Exchanges of configurations between these multiplexed-replicas are also tried, rendering the algorithm applicable to large-scale distributed computing (i.e., highly heterogeneous parallel computers with processors having different computational power). We demonstrate the enhanced sampling of this algorithm by simulating the folding thermodynamics of a 23 amino acid miniprotein. We show that better convergence is achieved compared to constant temperature molecular dynamics simulation, with an efficient scaling to large numbers of computer processors. Indeed, this enhanced sampling results in (to our knowledge) the first example of a replica exchange algorithm that samples a folded structure starting from a completely unfolded state.
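
For readers unfamiliar with replica exchange, the core step is a Metropolis test that decides whether two replicas at different temperatures swap configurations. The sketch below shows the standard acceptance rule for temperature replica exchange; it does not reproduce the multiplexed scheduling described in the paper. The function names, the example energies, and the choice of kcal/mol units are assumptions for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=K_B):
    """Metropolis probability of exchanging configurations i and j.

    energy_i is the potential energy of the configuration currently at
    temp_i, and energy_j that of the configuration currently at temp_j.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    if exponent >= 0.0:
        return 1.0
    return math.exp(exponent)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) for replicas at 300 K and 330 K.
    print(swap_probability(-100.0, -95.0, 300.0, 330.0))
    print(swap_probability(-95.0, -100.0, 300.0, 330.0))
```

A swap that moves the lower-energy configuration to the lower temperature is always accepted; the reverse is accepted only with a Boltzmann-weighted probability, which is what preserves the canonical distribution at each temperature.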

ABSTRACT: A number of rapidly folding proteins have been characterized in recent years. These small proteins can provide the first direct comparisons between simulated and experimental protein folding kinetics and pathways. Proteins have been characterized through thermodynamic sampling methods, unfolding simulations, and folding simulations using simple potentials. Here, as described recently, we use several thousand stochastic dynamics simulations in a generalized-Born implicit solvent (in atomic detail) to simulate the folding dynamics of the Trp cage mini-protein under experimental conditions (27 °C with full solvent viscosity, γ = 91 ps^(-1)). The Folding@home distributed computing project was used to generate an aggregate simulation time of ~100 µs (~250 CPU years). First we capture the rapid relaxation from an extended starting condition to a relaxed unfolded state ensemble of thousands of conformations. With continued simulation, a small fraction of these simulations reach the folded state. Furthermore, the topology of the collapsed unfolded state closely resembles the native state.

ABSTRACT: Protein folding is difficult to simulate with classical molecular dynamics. Secondary structure motifs such as α-helices and β-hairpins can form in 0.1-10 µs (ref. 1), whereas small proteins have been shown to fold completely in tens of microseconds. The longest folding simulation to date is a single 1-µs simulation of the villin headpiece; however, such single runs may miss many features of the folding process as it is a heterogeneous reaction involving an ensemble of transition states. Here, we have used a distributed computing implementation to produce tens of thousands of 5-20-ns trajectories (700 µs) to simulate mutants of the designed mini-protein BBA5. The fast relaxation dynamics these predict were compared with the results of laser temperature-jump experiments. Our computational predictions are in excellent agreement with the experimentally determined mean folding times and equilibrium constants. The rapid folding of BBA5 is due to the swift formation of secondary structure. The convergence of experimentally and computationally accessible timescales will allow the comparison of absolute quantities characterizing in vitro and in silico (computed) protein folding.

ABSTRACT: The nature of the unfolded state plays a great role in our understanding of proteins. However, accurately studying the unfolded state with computer simulation is difficult, due to its complexity and the great deal of sampling required. Using a supercluster of over 10,000 processors we have performed close to 800 µs of molecular dynamics simulation in atomistic detail of the folded and unfolded states of three polypeptides from a range of structural classes: the all-alpha villin headpiece molecule, the beta hairpin tryptophan zipper, and a designed alpha-beta zinc finger mimic. A comparison between the folded and the unfolded ensembles reveals that, even though virtually none of the individual members of the unfolded ensemble exhibits native-like features, the mean unfolded structure (averaged over the entire unfolded ensemble) has a native-like geometry. This suggests several novel implications for protein folding and structure prediction as well as new interpretations for experiments which find structure in ensemble-averaged measurements.

ABSTRACT: By employing thousands of PCs and new worldwide-distributed computing techniques, we have simulated in atomistic detail the folding of a fast-folding 36-residue α-helical protein from the villin headpiece. The total simulated time exceeds 300 µs, orders of magnitude more than previous simulations of a molecule of this size. Starting from an extended state, we obtained an ensemble of folded structures, which is on average 1.7 Å and 1.9 Å away from the native state in Cα distance-based root-mean-square deviation (dRMS) and Cβ dRMS sense, respectively. The folding mechanism of villin is most consistent with the hydrophobic collapse view of folding: the molecule collapses non-specifically very quickly (20 ns), which greatly reduces the size of the conformational space that needs to be explored in search of the native state. The conformational search in the collapsed state appears to be rate-limited by the formation of the aromatic core: in a significant fraction of our simulations, the C-terminal phenylalanine residue packs improperly with the rest of the hydrophobic core. We suggest that the breaking of this interaction may be the rate-determining step in the course of folding. On the basis of our simulations we estimate the folding rate of villin to be approximately 5 µs. By analyzing the average features of the folded ensemble obtained by simulation, we see that the mean folded structure is more similar to the native fold than any individual folded structure. This finding highlights the need for simulating ensembles of molecules and averaging the results in an experiment-like fashion if meaningful comparison between simulation and experiment is to be attempted. Moreover, our results demonstrate that (1) the computational methodology exists to simulate the multi-microsecond regime using distributed computing and (2) that potential sets used to describe interatomic interactions may be sufficiently accurate to reach the folded state, at least for small proteins. We conclude with a comparison between our results and current protein-folding theory.
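
The dRMS measure and the ensemble-averaging idea mentioned in this abstract can be sketched briefly. The code assumes a common definition of dRMS (the root-mean-square difference between corresponding intramolecular pair distances of two structures) and assumes that averaging is done over pair distances; the paper's exact definitions may differ. All function names are hypothetical.

```python
import math
from itertools import combinations

def pair_distances(coords):
    """Distances between all atom pairs (i < j) of one structure."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def drms_from_distances(dist_a, dist_b):
    """Root-mean-square difference between two lists of pair distances."""
    if len(dist_a) != len(dist_b) or not dist_a:
        raise ValueError("need two non-empty distance lists of equal length")
    squared = sum((x - y) ** 2 for x, y in zip(dist_a, dist_b))
    return math.sqrt(squared / len(dist_a))

def drms(coords_a, coords_b):
    """Distance-based RMS deviation; no structural superposition is needed."""
    return drms_from_distances(pair_distances(coords_a),
                               pair_distances(coords_b))

def mean_pair_distances(ensemble):
    """Average each pair distance over an ensemble of structures."""
    per_structure = [pair_distances(coords) for coords in ensemble]
    return [sum(column) / len(column) for column in zip(*per_structure)]
```

In a toy two-atom case with a native distance of 2.0, one structure with distance 1.0 and another with distance 3.0 each have a dRMS of 1.0 from the native state, while their ensemble-averaged distance matches it exactly. This shows how an ensemble mean can sit closer to the native geometry than any individual member.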

ABSTRACT: For decades, researchers have been applying computer simulation to address problems in biology. However, many of these grand challenges in computational biology, such as simulating how proteins fold, remained unsolved due to their great complexity. Indeed, even to simulate the fastest folding protein would require decades on the fastest modern CPUs. Here, we review novel methods to fundamentally speed up such previously intractable problems using a new computational paradigm: distributed computing. By efficiently harnessing tens of thousands of computers throughout the world, we have been able to break previous computational barriers. However, distributed computing brings new challenges, such as how to efficiently divide a complex calculation among many PCs that are connected by relatively slow networking. Moreover, even if the challenge of accurately reproducing reality can be conquered, a new challenge emerges: how can we take the results of these simulations (typically tens to hundreds of gigabytes of raw data) and gain some insight into the questions at hand? This challenge of the analysis of the sea of data resulting from large-scale simulation will likely remain for decades to come.

ABSTRACT: Atomistic simulations of protein folding have the potential to be a great complement to experimental studies, but have been severely limited by the time scales accessible with current computer hardware and algorithms. By employing a worldwide distributed computing network of tens of thousands of PCs and algorithms designed to efficiently utilize this new many-processor, highly heterogeneous, loosely coupled distributed computing paradigm, we have been able to simulate hundreds of microseconds of atomistic molecular dynamics. This has allowed us to directly simulate the folding mechanism and to accurately predict the folding rate of several fast-folding proteins and polymers, including a nonbiological helix, polypeptide α-helices, a β-hairpin, and a three-helix bundle protein from the villin headpiece. Our results demonstrate that one can reach the time scales needed to simulate fast folding using distributed computing, and that potential sets used to describe interatomic interactions are sufficiently accurate to reach the folded state with experimentally validated rates, at least for small proteins.

ABSTRACT: We have used distributed computing techniques and a supercluster of thousands of computer processors to study folding of the C-terminal β-hairpin from protein G in atomistic detail using the GB/SA implicit solvent model at 300 K. We have simulated a total of nearly 38 µs of folding time and obtained eight complete and independent folding trajectories. Starting from an extended state, we observe relaxation to an unfolded state characterized by non-specific, temporary hydrogen bonding. This is followed by the appearance of interactions between hydrophobic residues that stabilize a bent intermediate. Final formation of the complete hydrophobic core occurs cooperatively at the same time that the final hydrogen bonding pattern appears. The folded hairpin structures we observe all contain a closely packed hydrophobic core and proper β-sheet backbone dihedral angles, but they differ in backbone hydrogen bonding pattern. We show that this is consistent with the existing experimental data on the hairpin alone in solution. Our analysis also reveals short-lived semi-helical intermediates which denote a thermodynamic trap. Our results are consistent with a three-state mechanism with a single rate-limiting step in which a varying final hydrogen bond pattern is apparent, and semi-helical off-pathway intermediates may appear early in the folding process. We include details of the ensemble dynamics methodology and a discussion of our achievements using this new computational device for studying dynamics at the atomic level.

ABSTRACT: A set of parallel replicas of a single simulation can be statistically coupled to closely approximate long trajectories. In many cases, this produces nearly linear speedup over a single simulation (M times faster with M simulations), rendering previously intractable problems within reach of large computer clusters. Interestingly, by varying the coupling of the parallel simulations, it is possible in some systems to obtain greater than linear speedup. The methods are generalizable to any search algorithm with long residence times in intermediate states.
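
The near-linear speedup described in this abstract can be illustrated with a minimal sketch. It assumes that escape from an intermediate state is a single-exponential (Poisson) process with rate k, which is a simplifying assumption and not the paper's full analysis. With M independent replicas, the first escape among them arrives at rate M times k, so the mean wall-clock wait drops by about a factor of M. Function names and parameter values are hypothetical.

```python
import random

def mean_first_escape_time(rate, n_replicas, n_trials=20000, seed=0):
    """Average wall-clock time until the first of n_replicas escapes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Each replica waits an independent exponential time; the ensemble
        # has found the transition as soon as the fastest replica escapes.
        total += min(rng.expovariate(rate) for _ in range(n_replicas))
    return total / n_trials

def speedup(rate, n_replicas, **kwargs):
    """Ratio of single-simulation wait to the parallel-replica wait."""
    single = mean_first_escape_time(rate, 1, **kwargs)
    parallel = mean_first_escape_time(rate, n_replicas, **kwargs)
    return single / parallel
```

Calling `speedup(1.0, 10)` should give a value close to 10 under these assumptions. The greater-than-linear speedups mentioned in the abstract depend on how the replicas are coupled and are outside the scope of this sketch.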

Summary: Is distributed computing a fundamental advance or simply fashionable computing? In this brief letter, we show how distributed computing can be used to tackle problems which make even supercomputers quake. Indeed, we show how distributed computing has the ability to create a supercomputer thousands of times more powerful than any existing machine, due to the large number of processors on the internet (hundreds of millions) and the relatively small number of computer processors in supercomputers (thousands).