Our review paper is an attempt to summarize recent advances in 1) how mutations affect the biophysical properties of proteins and 2) how models of evolution can be greatly enhanced by incorporating biophysical principles, especially an explicit representation of the protein as a molecular chain. First of all, what do we mean by our title “Biophysics of protein evolution and evolutionary protein biophysics”?

“Biophysics of protein evolution”

What are the biophysical limits that constrain the evolution of proteins? How can we use biophysical principles to enhance the methods we use to study the evolution of proteins?

Example: Evolutionary biologists often try to estimate the speed at which evolution occurs. They look at a gene that, when compared between different species, shows differences caused by mutations. The question is which positions along the gene sequence change very frequently and which change very slowly. In the case of a virus, this can tell us which parts of the gene (and therefore of the protein) are under selection to adapt rapidly as the host immune system catches up, and which parts are conserved because they are functionally too important to be changed. The problem is that the probability of a change at a given position depends on where in the 3D structure of the protein the corresponding residue is located. Mutations at the surface of the protein are more likely to be tolerated than those buried inside the structure. Using this biophysical information can help us find out whether a given position evolves more rapidly than others because it is under selection pressure or simply because it happens to be located at the surface of the structure.

Co-evolving residues and sectors in rat trypsin.

“Evolutionary protein biophysics”

How can we improve our understanding of protein structures by looking at evolutionary relationships to similar proteins? What clues does evolutionary information give us about functionally important biophysical interactions in a protein structure?

Example: A newly discovered gene likely encodes a protein, but the protein's structure is unknown. Comparing the gene to all similar genes in the databases identifies positions along the sequence that statistically “co-evolve”, meaning that a change (mutation) at one position is often accompanied by a change at a specific second position. Such correlated positions are likely to interact physically within the 3D protein structure. Therefore, protein structure prediction (usually the domain of biophysicists) can be improved by using evolutionary information.
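The simplest way to score co-evolution is the mutual information (MI) between two columns of a multiple sequence alignment: it is zero when the residues at the two positions vary independently and large when knowing one tells you the other. The Python sketch below computes plain MI for every pair of columns and ranks the pairs. It is only an illustration under simplifying assumptions: practical contact-prediction methods add corrections (for example sequence weighting, pseudocounts, removal of phylogenetic and background signal, or global statistical models) that are omitted here, and the toy alignment and function names are hypothetical.

```python
"""Rank pairs of alignment columns by mutual information (MI).

Plain MI only: no sequence weighting, pseudocounts, or background correction.
"""
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(col_a: list[str], col_b: list[str]) -> float:
    """MI (in nats) between two aligned columns of equal length."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def ranked_pairs(msa: list[str]) -> list[tuple[tuple[int, int], float]]:
    """All column pairs (i, j) with i < j, sorted by decreasing MI."""
    columns = [[seq[k] for seq in msa] for k in range(len(msa[0]))]
    scores = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: positions 1 and 3 swap K/E together, as a
# charge-swapped salt bridge might; position 2 is fully conserved.
toy_msa = ["AKGEL", "AKGEV", "AEGKL", "AEGKI", "SKGEL", "AEGKV"]

if __name__ == "__main__":
    for (i, j), score in ranked_pairs(toy_msa)[:3]:
        print(f"positions {i} and {j}: MI = {score:.3f}")
```

In the toy alignment the pair (1, 3) comes out on top with MI = ln 2, the maximum possible for two columns that each contain two residue types in equal proportion, while the conserved position 2 carries no co-variation signal at all.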

Schematics of some of the possible effects of mutations on protein folding and interaction.

What do mutations do to a protein?

We outline several ways a mutation can alter the performance of a protein. These include effects on protein folding (i.e. the transition from an unfolded amino acid chain to the native folded structure) and the (de)stabilization of non-native intermediate structures, with potentially detrimental outcomes such as misfolding and aggregation. Even in the absence of stable non-native structures, mutations can destabilize the native structure and thereby impair function. Apart from folding, binding is the second category of biophysical properties altered by mutations, although, come to think of it, folding is really a special case of the protein chain binding to itself. Since proteins are constantly surrounded by other molecules, folding and binding can be seen as overlapping processes. Natural selection has scrutinized intra- as well as inter-molecular binding interactions so as to produce as few ‘unwanted’ interactions as possible while making the main functional properties as robust as possible. Of course, mutations do not only cause problems; they can also lead to new traits that increase the fitness of an organism. This probably happens through a combination of neutral and adaptive events. Many mutations are neutral and have no dramatic effect, owing to the mutational robustness of proteins. This robustness is in turn a consequence of having a compact folded structure with a hydrophobic core (and therefore different rules should apply to the evolution of intrinsically disordered proteins). Neutral mutations are tolerated throughout evolution, and many of them lead to weakly active, latent traits that at first arise by mere chance (or as a by-product of optimizing some other trait) but may later come under natural selection and enable adaptive mutations.
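One common way to make the link between stability and robustness quantitative, not taken from the review itself, is the two-state folding model, in which each molecule is either fully folded or fully unfolded. If ΔG is the free energy of folding (negative for a stable protein) and a mutation shifts it by ΔΔG (positive = destabilizing), the fraction of molecules in the native state is f_folded = 1 / (1 + exp((ΔG + ΔΔG) / RT)). The Python sketch below evaluates this for two hypothetical proteins, a stable one and a marginally stable one, hit by the same destabilizing mutation. The ΔG and ΔΔG values are illustrative assumptions, not measurements.

```python
"""Fraction of folded molecules in a two-state model, before and after a mutation.

Assumes two-state folding and hypothetical free energies in kcal/mol.
"""
from math import exp

R = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_fold: float, ddg: float = 0.0, temp: float = 298.0) -> float:
    """Two-state fraction folded: 1 / (1 + exp((dG + ddG) / RT)).

    dg_fold: folding free energy (negative = stable native state).
    ddg: change caused by a mutation (positive = destabilizing).
    temp: temperature in kelvin.
    """
    return 1.0 / (1.0 + exp((dg_fold + ddg) / (R * temp)))


if __name__ == "__main__":
    mutation_ddg = 2.0  # hypothetical destabilizing mutation
    for label, dg in [("stable", -5.0), ("marginal", -1.0)]:
        before = fraction_folded(dg)
        after = fraction_folded(dg, mutation_ddg)
        print(f"{label}: {before:.3f} folded before, {after:.3f} after")
```

With these numbers the stable protein stays almost entirely folded after the mutation, while the marginally stable one drops from roughly 84% to roughly 16% folded: the same ΔΔG is nearly neutral in one background and highly deleterious in the other, which is one way to picture how a stability margin buffers mutations.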

The space of foldable HP sequences gives a glimpse of what real protein sequence space might look like.
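The figure above refers to the HP model, a classic exact lattice model in which each residue is either hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a lattice, and every non-bonded H-H contact contributes −1 to the energy, so that folding is driven by hydrophobic collapse. Because short chains can be enumerated exhaustively, their complete sequence-to-structure map is known. The Python sketch below assumes a 2D square lattice and calls a sequence "foldable" if it has a single lowest-energy conformation; the lattice, the chain length, this definition, and the function names are assumptions for illustration and may differ from those behind the figure. Conformations related by rotation or reflection are counted only once.

```python
"""Exhaustive enumeration of a 2D HP lattice model (hypothetical sketch)."""
from itertools import product

STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def conformations(n: int) -> list[tuple[tuple[int, int], ...]]:
    """Self-avoiding walks of n residues on the square lattice.

    Symmetry is removed by fixing the first bond along +x and requiring the
    first bend (if any) to turn towards +y.
    """
    walks = []

    def extend(path, occupied, bent):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # self-avoidance
            if len(path) == 1 and (dx, dy) != (1, 0):
                continue  # first bond must point along +x
            if not bent and dy == -1:
                continue  # first bend must go up, not down
            path.append(nxt)
            occupied.add(nxt)
            extend(path, occupied, bent or dy != 0)
            path.pop()
            occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)}, False)
    return walks


def contacts(conf) -> list[tuple[int, int]]:
    """Pairs (i, j), j > i + 1, that are lattice neighbours but not chain neighbours."""
    index = {pos: i for i, pos in enumerate(conf)}
    pairs = []
    for i, (x, y) in enumerate(conf):
        for dx, dy in STEPS:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                pairs.append((i, j))
    return pairs


def foldable_sequences(n: int) -> dict[str, int]:
    """Map each HP sequence with a unique ground state to that conformation's index."""
    contact_lists = [contacts(c) for c in conformations(n)]
    foldable = {}
    for seq in product("HP", repeat=n):
        # Energy = -1 per non-bonded H-H contact.
        energies = [-sum(seq[i] == "H" and seq[j] == "H" for i, j in cl)
                    for cl in contact_lists]
        e_min = min(energies)
        if energies.count(e_min) == 1:
            foldable["".join(seq)] = energies.index(e_min)
    return foldable


if __name__ == "__main__":
    n = 8
    designs = foldable_sequences(n)
    print(f"{len(designs)} of {2 ** n} sequences of length {n} are foldable")
```

Counting how many sequences are foldable, and how many of them share each ground-state conformation, gives the kind of sequence-space picture the figure alludes to. The number of conformations grows exponentially with chain length, which is why such complete maps are limited to short chains.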

Evolution in a biophysical sandbox

Computer models are a great tool for testing new ideas, especially those that are impossible or impractical to test otherwise. This is particularly true for evolution, which is inherently a long-term process and for this reason often escapes direct experimentation. Models and simulations are also well suited to combining biophysical principles with evolution. As outlined above, mutations affect proteins in many and often subtle ways that can only be fully understood by explicitly representing the folding and binding of a chain-like molecule. Doing so yields insights into the genotype-phenotype mapping required for drawing detailed evolutionary landscapes. For actual proteins, such complete landscapes are not (yet) obtainable.

A call for collaboration

We would like to see a continuing fusion between the fields, allowing close collaboration between multiple disciplines ranging from biology, physics, and chemistry to mathematics and computer science. It is also important to emphasize that there needs to be close interaction between theoretical, computational, and experimental researchers. Theories and computer simulations can only be meaningful when they are based on previous experimental data and confirmed by subsequent experiments. Experimentalists can also draw on previous experimental data, but experiments are often, by their very nature, designed only to confirm or refute an expected outcome. Correctly interpreting unexpected results, or looking for previously ignored properties of the experimental system, often requires a change in theory.

Mutational paths can be guided by selection for hidden conformational states.

My favourite example of such an interdisciplinary effort between theoreticians and experimentalists is the use of simple exact lattice models to explore the genotype-phenotype mapping of proteins based on the general biophysical principle of hydrophobic collapse during protein folding. Despite the simplicity of the model, many of its predictions have found a corresponding experimental match or, in some cases, are still awaiting such experiments. One example is the discovery of promiscuous enzymatic reactions and of how important they can be for evolutionary innovation. Likewise, our simple lattice proteins predict that there are latent or “promiscuous” structural states (which may or may not coincide with biochemical functions) to which evolution could respond without jeopardizing the existing native structure and its associated functions. Such ‘exaptations’ (the subsequent selection of traits that originally arose accidentally or for a different function) are a powerful contributor to evolutionary novelty.

Future

Ultimately, what we strive for is a complete understanding of the relationship between sequence, structure, and function. Despite many advances, structure prediction based on sequence alone is still an unsolved scientific problem. Likewise, predictions of the biophysical effects of mutations on structure will remain fundamentally flawed until we incorporate the entire range of structural and functional states of a protein molecule. Evolutionary themes such as epistasis or the interpretation of evolutionary rates have already benefited from the biophysical/structural perspective. Eventually, all evolutionary phenomena will be traced down to the molecular level. We have focused only on proteins in our review, because this class of molecules has been studied most extensively. Nevertheless, important biophysical interactions exist between proteins, DNA, RNA, lipids, and any other molecule in the cell (and even outside of the cell).
All these interactions may be affected by mutations. Furthermore, we have only discussed the effects of point mutations leading to single amino acid substitutions. Genomes, however, are in a dynamic state of constant rearrangement, owing for the most part to prolific mobile genetic elements (transposons, etc.) that lead to complex patterns of deletion and duplication of entire segments of DNA. Proteins also evolve through the recombination of long stretches of amino acids and of cohesive structural units (fragments, sectors, domains, …). Elucidating all these complex relationships at the molecular level, and how they evolve, will certainly keep us busy in the decades to come.