Better genomics through chemistry

There’s been a little flurry of papers from UCSF recently about using chemical and environmental perturbations to ask when and why you need the function of a particular gene. I originally thought I might try to write about all of them at once, but no: there’s more here than I can do justice to in a single post. So I picked Nichols et al. 2011 (Phenotypic landscape of a bacterial cell, Cell PMID: 21185072), partly because it most clearly describes the approach, but mostly because a graduate student (Rupinder Sayal from MSU) sent it to me and suggested that I write about it. (Thanks, Rupinder!)

Biologists owe quite a debt to chemistry. It’s probable that one of your favorite proteins was discovered because it was the target of a drug. Target of rapamycin (Tor) is one example where the discovery process is immortalized right there in the name of the protein, but there are lots of others. Tubulin was discovered as the target of colchicine. I could go on, or you could ask Tim Mitchison, who can be eloquent on this subject if roused. Discoveries such as these opened up whole new areas of biology. Now that we have genomics tools, though, can drugs tell us even more? (You know I wouldn’t be asking this question if the answer were not at least partly yes. Indulge me.)

Nichols et al. collected a set of ~4000 E. coli mutant strains, each with a specific gene deletion or impairment, and asked, in essence, what makes them unhappy. They chose 324 different conditions representing 114 different kinds of stresses, mostly drug-based, and looked for differences in the growth phenotypes of the mutant strains versus the wild type. If a mutant strain grows faster or slower than the wild type in one of the conditions, that is a sign that the deleted or impaired gene is somehow involved in responding to that stress.

That’s not the clever part, though it’s useful enough. The clever part is that the set of phenotypes an individual strain shows across the 324 different conditions constitutes a profile — the authors call it a “phenotypic signature” — that can then be used to compare one strain with another, or one stress with another. Mutations in genes that we know have similar functions have similar phenotypic signatures; and this, in principle anyway, offers a new way to find clues to the functions of genes whose activities we don’t understand.
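To make the idea of comparing signatures a bit more concrete, here is a minimal sketch in Python. It assumes each strain's signature is just a list of fitness scores, one per condition, and it uses Pearson correlation as the similarity measure. That is one common choice for comparing profiles, not necessarily the exact statistic the authors used, and the gene names and numbers are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of fitness scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical fitness scores (negative = grows worse than wild type),
# one value per condition. A real signature would have 324 entries.
signatures = {
    "geneA": [-3.0, 0.1, -2.5, 0.0, 0.2],
    "geneB": [-2.8, 0.0, -2.9, 0.1, 0.3],  # sick in the same conditions as geneA
    "geneC": [0.1, -3.1, 0.2, 0.0, -2.7],  # sick in different conditions
}

sim_ab = pearson(signatures["geneA"], signatures["geneB"])
sim_ac = pearson(signatures["geneA"], signatures["geneC"])
print(f"geneA vs geneB: {sim_ab:.2f}")
print(f"geneA vs geneC: {sim_ac:.2f}")
```

In this toy data geneA and geneB come out strongly correlated, while geneA and geneC do not. Comparing one stress with another works the same way: take each condition's column of scores across all the strains instead of each strain's row across all the conditions.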

Another way of thinking about this approach is that it’s a way of getting around the limitations of working in the lab. Many genes look non-essential to us because we grow our bacteria on defined media, in controlled conditions. But probably a lot of what E. coli has sitting in its genome evolved to deal with different kinds of challenges: competition from a neighboring bacterium, the sudden shock of going from sitting on a piece of grass to traversing a cow’s gastrointestinal tract, the fact that the temperature just dropped to –18°C. Exposing bacteria to a range of stresses is a way to ask what’s essential, or useful, in conditions that more closely approximate what those bacteria have had to survive over evolutionary time. And, if we think about performing similar analyses on pathogens, some of these may be genes that are needed for the bacterium to survive and thrive in the human body.

So the first interesting result in this study is that almost half of the genes have at least one phenotype, i.e. there was at least one condition in which the function of the gene was important. Most genes were important in only a few of the conditions. A few, though, were needed much more often: some genes were logged as having >30 phenotypes. These must be the generalist genes, the plumbers and electricians, the genes that always get called on to fix a problem. And yet, even those are active only in something like 10% of the conditions tried.
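If you want to picture how "number of phenotypes per gene" might be tallied, here is a rough sketch. It assumes a phenotype is called whenever the absolute fitness score in a condition exceeds some cutoff. The cutoff of 2.0, the gene names, and the scores are all made up for illustration, and the paper's actual significance criteria may well differ.

```python
# Hypothetical cutoff: a score this far from zero counts as a phenotype.
PHENOTYPE_THRESHOLD = 2.0

# Invented fitness scores, one list per mutant strain, one value per condition.
fitness = {
    "geneA": [-3.0, 0.1, -2.5, 0.0, 0.2, 2.4],   # a "generalist"
    "geneB": [0.3, -0.2, 0.1, 0.0, -0.4, 0.5],   # no phenotype anywhere
    "geneC": [0.1, -3.1, 0.2, 0.0, -0.7, 0.1],   # needed in one condition
    "geneD": [0.0, 0.4, -0.3, 0.2, 0.1, -0.1],   # no phenotype anywhere
}

def count_phenotypes(scores, threshold=PHENOTYPE_THRESHOLD):
    """Count conditions where the strain grows clearly better or worse."""
    return sum(1 for s in scores if abs(s) >= threshold)

phenotype_counts = {gene: count_phenotypes(s) for gene, s in fitness.items()}

# Genes with at least one phenotype, and their share of all genes tested.
responsive = [gene for gene, n in phenotype_counts.items() if n > 0]
fraction_responsive = len(responsive) / len(fitness)

print(phenotype_counts)
print(f"{fraction_responsive:.0%} of genes have at least one phenotype")
```

Note that both faster and slower growth count here, since either one suggests the gene matters in that condition.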

Second interesting point: genes whose functions we think we know (annotated genes) are much more likely to have multiple phenotypes than so-called orphan genes, whose functions are a complete mystery. This makes sense (to me) because if a gene is important in many conditions (the plumber gene) we’re more likely to have tripped over its function in random experimentation than if it’s rarely called on (the Latin translation gene; I still use mine occasionally). A good fraction of orphan genes, when mutated, have phenotype profiles that are similar to those of mutated annotated genes. So now we have a hypothesis to pursue: the orphan genes have similar functions to the annotated genes whose phenotype profiles they match. Or, if the orphan gene shows an opposite behavior to the annotated gene, perhaps it has an opposing function to the annotated gene. The authors identify marB, an orphan gene in the multiple antibiotic resistance operon, as a gene that anti-correlates with an annotated gene (marA, a transcriptional activator) and suggest that it may act as a transcriptional repressor.
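Here is a small sketch of how that matching step could work, again assuming Pearson correlation as the similarity measure (my assumption, not a description of the authors' pipeline). For each orphan it picks the annotated gene with the largest absolute correlation and uses the sign to suggest a similar or an opposing function. The gene names and scores are hypothetical, and in practice you would also want a minimum correlation before trusting a match.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of fitness scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def best_match(orphan_sig, annotated):
    """Return (gene, r) for the annotated gene with the largest |r|."""
    best_gene, best_r = None, 0.0
    for gene, sig in annotated.items():
        r = pearson(orphan_sig, sig)
        if abs(r) > abs(best_r):
            best_gene, best_r = gene, r
    return best_gene, best_r

# Invented signatures: one fitness score per condition.
annotated = {
    "activator": [-2.5, 0.1, -3.0, 0.2, 0.0],
    "transporter": [0.0, -2.8, 0.1, -3.1, 0.2],
}
orphans = {
    "orphan1": [2.2, 0.0, 2.9, -0.1, 0.1],   # mirror image of "activator"
    "orphan2": [0.1, -2.5, 0.0, -2.9, 0.3],  # resembles "transporter"
}

hypotheses = {}
for name, sig in orphans.items():
    gene, r = best_match(sig, annotated)
    relation = "similar" if r > 0 else "opposing"
    hypotheses[name] = (gene, relation)
    print(f"{name}: {relation} function to {gene} (r = {r:.2f})")
```

The anti-correlated case is the one that corresponds to the marB and marA story: a mutant that thrives exactly where the other one suffers.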

What about the orphan genes that have multiple phenotypes? Why don’t we know what they do? It turns out that this subset of orphans is rather narrowly distributed in phylogenetic space: most of them are restricted to fairly close relatives of E. coli. These might be genes that are very useful in certain niches, but have relatively specialized functions. (A plumber who doesn’t deal with frozen pipes, but specializes in installing Japanese toilets, perhaps.)

For those of us who are interested in drug-drug interactions, there is an interesting vignette here about the interaction between two classes of drugs, sulfa drugs and trimethoprim. These drugs are highly synergistic, so much so that they’re almost always given together. They both target the E. coli pathway that produces tetrahydrofolate, and they’re known to target different members of the pathway. In textbooks, the pathway is represented as linear, with sulfa drugs targeting an upstream step and trimethoprim targeting a downstream step. On that picture they shouldn’t be synergistic: they’re both partially blocking the flow through a single linear pipeline. Based on the gene interactions Nichols et al. identified, however, in addition to their known effects sulfa drugs and trimethoprim seem to have different effects on downstream steps in the pathway, and in particular on the two routes by which tetrahydrofolate can be converted to 5,10-methylene-THF. It’s not clear why: do the drugs have unanticipated effects on other enzymes in the pathway, or is the effect mediated by the buildup of different metabolites when different steps in the tetrahydrofolate synthesis pathway are blocked? But what is clear is that blocking two different branches of a pathway is a very good rationalization for synergy.

This paper offers up several new hypotheses for drug mechanisms and gene functions, none of which are yet tested. So it’s not yet clear how large the overall impact of this work will be. (Although another prediction panned out and was published separately, which is encouraging. If you want to check the dataset for information about your own favorite gene or drug, the whole set of fitness scores is available as Supplementary Table 2.) It’s also not clear how easy it’ll be to do this for other organisms: the whole approach rests on the availability of well-characterized gene deletion libraries. But I’m definitely intrigued by the potential for identifying functions for genes that are “special” to the niche of the organism.