First Fully Automatic Design of a Protein Achieved by Caltech Scientists

PASADENA—Caltech scientists have found the Holy Grail of protein design. In fact, they've snatched it out of a giant pile of 1.9 x 10^27 other chalices.

In the October 3 issue of the journal Science, Stephen L. Mayo, an Assistant Professor of Biology and a Howard Hughes Medical Institute Assistant Investigator, and chemistry graduate student Bassil I. Dahiyat report on their success in constructing a protein of their choice from scratch.

Researchers for some time have been able to create proteins in the lab by stringing together amino acids, but this has been a very hit-and-miss process because of the vast number of ways that the 20 amino acids found in nature can go together.

The number 1.9 x 10^27, in fact, is the number of slightly different chains that 28 amino acids can form. And because slight differences in the geometry of protein chains are responsible for biological functions, total control over how a chain forms is necessary to create new biological materials of choice.

By using a Silicon Graphics supercomputer to sort through all possible combinations for a selected protein, Mayo and Dahiyat have identified the target protein's best possible amino acid sequence. They have then used this knowledge to create the protein in the lab with existing techniques.

This is a first, says Mayo. "Our goal has been to design brand-new proteins that do what we want them to do. This new result is the first major step in that direction. Moreover, it shows that a computer program is the way to go in creating biological materials."

Proteins are the molecular building blocks of all living organisms. Composed of various combinations of the 20 amino acids, protein molecules can each comprise just a few hundred atoms, or literally millions of atoms. Most proteins involved in life processes have at least 100 amino acids, Mayo says.

Mayo and Dahiyat, who have been working on this research for five years, have developed a system that automatically determines the string of amino acids that will fold to most nearly duplicate the 3-D shape of a target structure. The system calculates a sequence's 3-D shape and evaluates how closely this matches the 3-D structure of the target protein.

One problem the researchers face is the sheer number of combinations that must be considered to design a protein of choice. The protein that is the subject of this week's Science paper is a fragment of a fairly inconspicuous molecule involved in gene expression, and as such has only 28 amino acids. Even this small number takes a prodigious amount of computational power. A more desirable protein might involve 100 amino acids, which would allow a staggering 10^130 possible amino acid sequences.

Because this number is larger than the number of atoms in the universe, the researchers have had to find clever computational strategies to circumvent the impossible task of grinding out all the calculations.
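As a rough check on these figures, the short Python sketch below counts sequences under the simplest assumption, that any of the 20 amino acids may occupy any position in the chain. The function names and the figure of roughly 10^80 atoms in the observable universe (a commonly quoted estimate) are illustrative assumptions and do not come from the researchers' work.

```python
AMINO_ACIDS = 20

# Assumption: a commonly quoted rough estimate, used only for comparison.
ATOMS_IN_UNIVERSE_EXPONENT = 80


def sequence_count(length):
    """Number of chains of the given length if every position is unrestricted."""
    return AMINO_ACIDS ** length


def order_of_magnitude(n):
    """Power of ten of a positive integer, e.g. 1234 -> 3."""
    return len(str(n)) - 1


exponent = order_of_magnitude(sequence_count(100))
print(f"20^100 is about 10^{exponent}")
print("Exceeds the atom estimate:", exponent > ATOMS_IN_UNIVERSE_EXPONENT)
```

Under the same unrestricted assumption, a 28-residue chain would allow about 2.7 x 10^36 sequences, which is more than the 28-residue figure quoted earlier. That figure may therefore reflect a narrower choice of amino acids at some positions, but this is only an inference; the release does not say.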

In this case, the fastest way to the answer is by working backward. Starting with all the amino acid sequences possible for the protein, the computer program finds arrangements of amino acids that are a bad fit to the target structure. By repeatedly searching for, and eliminating, poorly matching amino acid combinations, the system rapidly converges on the best possible sequence for the target.

Subsequently, the simulation can be used to find other sequences that are nearly as good a fit as the best one.
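The elimination strategy described above can be illustrated with a toy sketch. The release does not name the algorithm or the scoring function used, so the code below substitutes one well-known elimination rule of this kind (often called dead-end elimination) and an entirely made-up energy model: `PREFERRED`, `self_energy`, and `pair_energy` are hypothetical stand-ins, with lower energy meaning a better fit to the target. A candidate is discarded at a position when even its best-case score is worse than the worst-case score of some rival at that position.

```python
from itertools import product

# Hypothetical toy problem: 4 positions, the same 4 candidate residues at each.
RESIDUES = ["A", "L", "K", "E"]
PREFERRED = "ALKE"  # made-up per-position preference standing in for a target fit


def self_energy(i, a):
    """Toy score for residue a at position i (lower is better)."""
    return 0.0 if a == PREFERRED[i] else 2.0


def pair_energy(i, a, j, b):
    """Toy symmetric penalty for identical residues at neighboring positions."""
    return 1.0 if abs(i - j) == 1 and a == b else 0.0


def total_energy(seq):
    n = len(seq)
    energy = sum(self_energy(i, seq[i]) for i in range(n))
    energy += sum(pair_energy(i, seq[i], j, seq[j])
                  for i in range(n) for j in range(i + 1, n))
    return energy


def eliminate(candidates):
    """Repeatedly drop residues that cannot be part of the best sequence."""
    candidates = [list(c) for c in candidates]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i, options in enumerate(candidates):
            others = [j for j in range(len(candidates)) if j != i]

            def best_case(a):
                return self_energy(i, a) + sum(
                    min(pair_energy(i, a, j, x) for x in candidates[j])
                    for j in others)

            def worst_case(a):
                return self_energy(i, a) + sum(
                    max(pair_energy(i, a, j, x) for x in candidates[j])
                    for j in others)

            for a in list(options):
                # a is a dead end if some rival b always scores better.
                if any(b != a and best_case(a) > worst_case(b) for b in options):
                    options.remove(a)
                    changed = True
    return candidates


def best_sequence(candidates):
    """Exhaustive search over whatever candidates remain."""
    return min(product(*candidates), key=total_energy)


full = [list(RESIDUES) for _ in PREFERRED]
pruned = eliminate(full)
best = "".join(best_sequence(pruned))
print("Remaining choices per position:", [len(c) for c in pruned])
print("Best sequence:", best)
```

Because the rule only removes residues that are provably worse than a rival, the exhaustive search over the survivors returns the same answer as a search over the full space, while examining far fewer sequences.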

This process has been honed by designing sequences for several different proteins, synthesizing them in the laboratory, and testing their actual properties.

With their innovative strategy, Mayo and Dahiyat are now reproducing proteins that are very similar to the target molecules. (The accompanying illustration shows how closely the protein they have formulated matches the target protein.)

But the goal is not just to create the proteins that already exist in nature. The researchers can actually improve on nature in certain circumstances. By making subtle changes in the amino acid sequence of a protein, for example, they are able to make a molecule more stable in harsh chemicals or hot environments (proteins tend to change irreversibly with a bit of heat, as anyone who has cooked an egg can attest).

"Our technology can actually change the proteins so that they behave a lot better," said Dahiyat, who recently finished his Caltech doctorate in chemistry and will now head Xencor, a start-up company established to commercialize the technology. The ability to create new proteins, and to adapt existing proteins to different environments and functions, could have profound implications for a number of emerging fields in biotechnology.

And, of course, it could help further the understanding of living processes.

"Paraphrasing Richard Feynman, if you can build it, you can understand it," says Mayo. "We think we can soon achieve a better understanding of proteins by going into a little dark room and building them to do exactly what we want them to do."