Computing Levinthal’s Paradox: Protein Folding, Part 2

Author: Sarah Kearns

Editors: David Mertz, Zuleirys Santana Rodriguez, and Scott Barolo

In a previous post, we discussed how proteins fold into unique shapes that allow them to perform their biological functions. Through many physical and chemical properties, like hydrogen bonding and hydrophobicity, proteins are able to fold correctly. However, proteins can fold improperly, and sometimes these malformed peptides aggregate, leading to diseases like Alzheimer’s.

How can we figure out when the folding process goes wrong? Can we use computers to figure out the folding/misfolding process and develop methods to prevent or undo the damage done by protein aggregates?

Levinthal’s Paradox

In the late 1960s, a scientist named Cyrus Levinthal noted that protein folding is different from ordinary chemical reactions. A chemical reaction proceeds from reactant to product via a set pathway of structures and intermediates. A folding protein does not: rather than passing through one intermediate shape, it can potentially adopt millions. Levinthal reasoned that if a new protein had to search through all of these intermediate structures at random, it would take an enormously long time to find its final native state.

To understand the vast number of conformational possibilities, consider a polypeptide of 101 amino acids. It has 100 bonds connecting its amino acids, and each bond can adopt three possible conformations (see Figure 1). This means that a protein of 101 amino acids has 3^100, or about 5 × 10^47, configurations. And some proteins are five or ten times longer!

Figure 1. One polypeptide bond exists between yellow and aqua amino acids. The bonds within the amino acids can rotate and account for three conformations. Even for this short sequence, there are 81 (3^4) different options.

Even if our 101-amino-acid protein could sample 10^13 conformations per second, it would still need about 10^27 years to try every possible shape. In reality, however, a protein finds its native conformation in seconds, not eons. This raises a big question: can humans predict how proteins will fold? Even computers, which can test a single shape in microseconds, could never exhaustively try every conformation of even one protein.
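As a quick check on these numbers, the short Python sketch below recomputes them. It assumes, as the text does, 100 bonds with three conformations each, plus a 365.25-day year; the two sampling rates are the ones quoted in the paragraph, and the function names are illustrative only.

```python
# Assumptions taken from the text: 100 inter-residue bonds, 3 conformations each.
N_BONDS = 100
STATES_PER_BOND = 3
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def conformation_count(n_bonds: int, states: int) -> int:
    """Exact number of configurations: states ** n_bonds (Python ints don't overflow)."""
    return states ** n_bonds


def years_to_enumerate(n_conformations: int, rate_per_second: float) -> float:
    """Years needed to visit every conformation at a fixed sampling rate."""
    return n_conformations / rate_per_second / SECONDS_PER_YEAR


total = conformation_count(N_BONDS, STATES_PER_BOND)
print(f"3^100 = {total:.2e}")  # 5.15e+47
print(f"at 1e13 per second:   {years_to_enumerate(total, 1e13):.1e} years")
print(f"at 1 per microsecond: {years_to_enumerate(total, 1e6):.1e} years")
```

The same function reproduces the Figure 1 count: `conformation_count(4, 3)` gives 81. The count itself uses exact integer arithmetic; only the time estimates are floating-point.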

Simplifying Structure Prediction

The interactions that hold a protein's structure together, such as hydrogen bonds, ionic bonds, and hydrophobic interactions, are difficult to predict rationally from the amino acid sequence alone. Instead, the Protein Data Bank, a database of protein structures determined largely by X-ray crystallography, has been more helpful in working out the rules of protein folding. Still, determining protein structures accurately is difficult and time-consuming. Some computational shortcuts have made the process simpler, but the predicted folds are still not exact.

The biggest simplifications come from assuming a lattice structure or using a coarse-grained representation. A lattice model takes a globular protein, whose residues are normally separated by varying bond lengths, and places each residue on a point of a 3D grid with uniform bond lengths, which limits the possible placements of each amino acid. A coarse-grained model simplifies a protein structure by representing each amino acid as a single point (see Figure 2).

Figure 2. The coarse-grained model reduces a whole amino acid to one point that is either hydrophobic or hydrophilic. This reduces the number of objects to calculate, greatly reducing computing time.
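Lattice and coarse-grained ideas are often combined in the classic HP model, in which each residue is a single bead labeled H (hydrophobic) or P (polar) and placed on a grid. The sketch below is a minimal illustration under stated assumptions: it uses a 2D square lattice rather than the 3D grid described above to keep it short, a hypothetical example sequence, and the simple HP scoring rule of -1 per non-bonded H-H contact. It finds the lowest-energy fold by brute-force enumeration, which is only feasible for very short chains.

```python
from itertools import product

# Hypothetical example sequence: H = hydrophobic, P = polar (hydrophilic).
SEQUENCE = "HPHPPHPH"

# Unit steps on a 2D square lattice (a 3D cubic lattice would add two more).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def fold_coordinates(moves):
    """Turn a sequence of lattice steps into residue positions, or return None
    if the chain runs into itself (two residues cannot share a lattice site)."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for dx, dy in moves:
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(sequence, coords):
    """HP-model energy: -1 for each pair of H residues that are lattice
    neighbours but not adjacent along the chain."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # Requiring j > i + 1 skips chain neighbours and counts each contact once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def best_fold(sequence):
    """Enumerate every self-avoiding fold and keep the first lowest-energy one.
    Returns (energy, coordinates, number of valid folds examined)."""
    best_energy, best_coords, n_valid = 0, None, 0
    for moves in product(MOVES, repeat=len(sequence) - 1):
        coords = fold_coordinates(moves)
        if coords is None:
            continue
        n_valid += 1
        e = hp_energy(sequence, coords)
        if best_coords is None or e < best_energy:
            best_energy, best_coords = e, coords
    return best_energy, best_coords, n_valid


if __name__ == "__main__":
    energy, coords, n_valid = best_fold(SEQUENCE)
    print(f"{n_valid} self-avoiding folds; lowest energy {energy}")
    print(coords)
```

Even on a 2D lattice, the number of self-avoiding folds grows exponentially with chain length, so practical lattice studies replace exhaustive enumeration with search strategies such as Monte Carlo sampling.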

So far, computational prediction of protein structures is limited to these simpler models because more realistic all-atom energy calculations are too complex and computationally heavy. Our protein of 101 amino acids contains close to 2000 atoms, which would have to be moved around in 3^100 configurations. Quantum computing may eventually make such problems easier to solve, but for now, predictions still rely on coarse-grained representations.
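To see why the all-atom picture is so much heavier, count the particle pairs that a simple pairwise energy function would have to score in a single evaluation. This sketch assumes the rough figures from the text (about 2000 atoms versus 101 coarse-grained beads) and ignores the distance cutoffs that real force fields use to prune distant pairs; `pair_count` is an illustrative name.

```python
def pair_count(n: int) -> int:
    """Number of distinct particle pairs: n choose 2."""
    return n * (n - 1) // 2


all_atom = pair_count(2000)  # 1,999,000 pairs (assumed ~2000 atoms)
coarse = pair_count(101)     # 5,050 pairs (one bead per residue)
print(f"all-atom: {all_atom:,} pairs, coarse-grained: {coarse:,} pairs")
print(f"about {all_atom / coarse:.0f}x more pairs per energy evaluation")
```

That factor applies to every single energy evaluation, and a folding search needs an astronomical number of them.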

How Your PC Can Help Mine Data

Some researchers have turned such computational problems into citizen science projects. Perhaps the most famous of these is Foldit, developed by the Center for Game Science and the Department of Biochemistry at the University of Washington. Foldit is an online game in which players compete to create accurate protein structures by moving around the backbone chain, amino acid residues, and domains. Players score points by packing the protein tightly, burying hydrophobic residues, and clearing clashes between side chains, all of which minimize the energy of the overall structure. The lowest-energy conformations from the game are then collected and analyzed to improve real-life folding algorithms.

A less hands-on folding project is Folding@home from Stanford University, which borrows idle processing power on volunteers' personal computers to run folding simulations. While users check their email or listen to music, or even while the screensaver runs, their computers simulate structures and run energy minimizations.

All this data has gone toward the goal of figuring out both how malformed proteins aggregate and how to design drugs that prevent misfolding. Foldit players have already helped solve the structure of a retroviral protease that is being used to help design HIV inhibitors. One of the labs behind Foldit has been focusing on proteins involved in cancer, AIDS, and other diseases. The Folding@home project has produced about 130 peer-reviewed papers describing its work simulating not only protein folding but also other molecular dynamics, which help determine how well drugs can bind.

Knowing what a protein does and where it does it, without expensive crystallography (to determine its structure) or high-throughput screening (to find its substrates), saves both time and resources when developing a drug. More work has to be done before computational predictions perfectly line up with crystal structures. But when that day comes, we will be much closer to understanding how proteins work, and how to cure diseases of protein folding and function.

About the author

Sarah Kearns is a first-year student in the Chemical Biology Doctoral Program at the University of Michigan. She is currently doing research rotations to find a long-term lab, ideally one focusing on enzyme structure-function relationships for drug development applications. Before attending UMich, Sarah earned her BS in biochemistry, with a minor in mathematics, at the Rochester Institute of Technology, where she modeled the biophysical stresses on the extracellular matrix. Her interests outside of science include open access development, philosophy, baking bread, and hiking. You can find her on Twitter (@annotated_sci), LinkedIn, or at her website, Annotated Science.