Capturing Protein Interactions

Inside cells, proteins do the work that keeps life humming. Proteins are the building blocks of all cellular structures, for one. They attack invaders, tote other molecules from one spot to another, convey important messages and enact vital chemical reactions. But proteins rarely act alone. Capturing interactions between thousands of proteins operating on a cellular scale has been difficult. Last fall, four laboratories led by a Harvard Medical School team filled in many of the blanks. After five years of study, they produced the most detailed map yet of protein interaction in a multicellular organism, the fruit fly Drosophila melanogaster. Spyros Artavanis-Tsakonas, the Harvard cell biology professor who led the project, explained the map and its usefulness to American Scientist senior editor Catherine Clabby.

A. S. To start with, why is there so much enthusiasm for better understanding proteins in Drosophila melanogaster?

S. A.-T. The existence of so much conservation of the genome and its basic functions across species has been perhaps the most profound lesson learned from genomic analysis over the past few decades. Studies by numerous groups have established the fruit fly as an excellent model system for human biology. The map provides a blueprint for protein interactions in higher organisms. We can use it to examine how, for example, protein relationships may be altered under different biological conditions, including genetic ones, or when cells are treated with drugs. Our laboratory, for example, uses Drosophila to study and gain insights into devastating human neurodegenerative diseases such as spinal muscular atrophy, Parkinson’s, ALS (Lou Gehrig’s disease) and CADASIL, a syndrome involved in ischemic strokes and vascular dementia. Our goal is to dissect the cellular pathways that these diseases affect in the hope of identifying therapeutic targets.

A. S. Why is it vital to understand how proteins interact?

S. A.-T. From genome sequencing we knew the number of proteins and their molecular sequences, but not their interconnections and their specific functions. Yet the way that proteins interact to form protein complexes must be determined to understand their roles in cellular function. Our mapping study establishes a basis for understanding all the functional and structural connections between proteins in the organism. Our large-scale analysis also ties together many complexes that had previously been thought to be independent. Analysis of first- and second-degree neighbors of each of the proteins in the map established connections not only between proteins but also between the functional clusters that they are a part of.
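The idea of first- and second-degree neighbors can be illustrated with a minimal Python sketch. It treats the interaction map as an undirected graph. The protein names and the helper functions `build_graph` and `neighbors_by_degree` are hypothetical and chosen only for illustration; this is not the consortium's own analysis code.

```python
from collections import defaultdict

def build_graph(pairs):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def neighbors_by_degree(graph, protein):
    """Return (first-degree, second-degree) neighbor sets of a protein."""
    first = set(graph.get(protein, set()))
    second = set()
    for partner in first:
        second |= graph.get(partner, set())
    # Second-degree neighbors exclude direct partners and the protein itself.
    second -= first
    second.discard(protein)
    return first, second

# Toy interaction list (made-up names, not data from the map).
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
graph = build_graph(interactions)
first, second = neighbors_by_degree(graph, "A")
```

Here protein A interacts directly with B and is linked to C only through B, so C is a second-degree neighbor. Connections of this kind are how otherwise separate clusters can turn out to be related.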

A. S. What surprised you most about the protein interactions that you captured?

S. A.-T. Three things. First was the number of stable interactions involving proteins that had not been studied before. Our map provides the first empirical evidence for their functions and places them in an interaction network. Second, we were pleasantly surprised at how well we recapitulated some of the well-studied protein complexes. That gave us an indication of the quality of the map. Finally, at a more technical level, we were surprised to see how many membrane proteins we were able to recover. Membrane proteins are notoriously difficult to work with, but our methods were surprisingly robust at retrieving them.

A. S. How did protein interaction maps evolve?

S. A.-T. Traditionally, most studies examining protein interactions focused on a small number of proteins. But in unicellular model organisms, such as bacteria and yeast, multidisciplinary teams were assembled to attack the problem on a large scale. The first comprehensive map of interactions for thousands of proteins was for the common baker’s yeast, published in two parts, in 2002 and 2006.

Before beginning any “big science” endeavor, it is always necessary to produce a lot of tools and reagents. In our case, we had to wait for the Drosophila genome sequence to be completed and for thousands of genes to be cloned. Just as important, our analysis, which tracked tagged proteins, relied heavily on a technology called liquid chromatography–tandem mass spectrometry, which became affordable and robust enough for this scale of project only in the past five years.

A. S. You and your colleagues shared the data used to build this map. Why?

S. A.-T. As a large consortium receiving substantial National Institutes of Health funding for this project, we felt an obligation to disclose the results as soon as they passed our quality control measures. So we set up a routine at our web site (https://interfly.med.harvard.edu), along with the central research repository FlyBase, to make the data available to researchers at frequent intervals, before regular publication. The NIH has been encouraging scientists to disseminate data as they are produced rather than waiting until a project ends. We feel that this is important for such a large research project.

A. S. What statistical tools did you use?

S. A.-T. We had to develop a number of bioinformatics and statistical approaches throughout the work. The most important were the ways that we vetted our data to ensure that the final analysis was performed on only the highest quality subset of results. We used these statistical tools to sort the interactions detected in complex protein mixtures into real versus false ones. We developed a novel statistical metric called the HGScore, which gave us an unbiased way to rank more than 200,000 observed interactions and to identify the 11,000 high-confidence interactions shown in the map. These were clustered to find the higher-order relationships that define potential protein complexes, each performing a particular biological function in the cell. This also allowed us to compare our results with what is known from other biological systems.
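The interview does not give the formula for the HGScore, so the following Python sketch is not that metric. It only illustrates the general idea of ranking candidate interactions with a statistic, here a plain hypergeometric tail probability for how often two proteins turn up in the same purification. The function names, the toy purification data and the choice of a negative log p-value as the score are all assumptions made for this example.

```python
from itertools import combinations
from math import comb, log10

def hypergeom_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric draw: n items drawn from N, K of them marked."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def cooccurrence_score(runs, a, b):
    """Score a protein pair by how unexpectedly often it co-occurs across runs."""
    N = len(runs)                                    # total purifications
    K = sum(1 for r in runs if a in r)               # runs containing a
    n = sum(1 for r in runs if b in r)               # runs containing b
    k = sum(1 for r in runs if a in r and b in r)    # runs containing both
    return -log10(hypergeom_tail(k, N, K, n))        # higher = less likely by chance

# Toy purification results (hypothetical protein names).
runs = [{"P1", "P2"}, {"P1", "P2", "P3"}, {"P1", "P2"},
        {"P3", "P4"}, {"P4"}, {"P3"}]
proteins = sorted(set().union(*runs))

# Rank every candidate pair from most to least surprising co-occurrence.
ranked = sorted(combinations(proteins, 2),
                key=lambda pair: cooccurrence_score(runs, *pair),
                reverse=True)
```

In this toy data, P1 and P2 appear together in every run that contains either of them, so that pair ranks first. A real pipeline would then apply a cutoff to such a ranking to keep only high-confidence interactions.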