A computational biologist discusses the technical, biological, and philosophical aspects of evolutionary biology.

Saturday, September 8, 2012

The genome as Detroit - my reaction to ENCODE

Ten years ago we saw the culmination of two decades' worth of work in the publication of the human genome, a bloated mess of As, Cs, Gs, and Ts half-full of seemingly repetitive nonsense, obtained at the price of a dollar per precious base pair. Then the media fed the public's collective chagrin by announcing the paltry figure of 1% deemed "functional" at the time. That word - functional - only encompassed those portions of the genome known or predicted to be translated into proteins or transcribed into the limited suite of RNAs with well-defined catalytic roles. Over the last decade - using a set of model systems from the weedy thale cress to the diminutive fruit fly - that word has evolved to encompass an alphabet soup of specialized RNAs, regulatory binding sites, and activity-modifying marks that have yielded ever-increasing insight into the dynamics of the eukaryotic genome. Of course, each small step in that pursuit hardly merited the front cover of multiple high-profile journals. And so the ENCODE project bid the scientific community to hold its collective breath until this past Wednesday, when the fleet of ENCODE publications sailed forth into public view with a large "80% functional" above each masthead.

The wet dream of comparative genomics

Yes, 80% of the human genome was found to either produce some RNA, or sometimes bind to a regulatory protein, or contain marks indicative of transcriptionally active regions. The authors also demonstrated that, on average, most of these regions are less variable than expected if selection were not acting to constrain them. This does imply that certain subsets of these elements have biochemical roles with enough impact on reproductive success to overwhelm the stochastic fluctuation of variant frequencies from generation to generation, and that the data sets were large enough to detect these subsets. Yet the 80% figure implies, at least to the average person, that the overwhelming majority of our genome cannot sustain mutation without a non-negligible impact on fitness. That would be extraordinary if it were true, but for now the implication is disingenuous.
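To put rough numbers on what it means for selection to "overwhelm" drift, here is a minimal sketch using Kimura's classic diffusion approximation for the fixation probability of a new mutation. It is not anything taken from the ENCODE papers. The function name, the effective population size of 10,000 (a commonly quoted ballpark for humans), and the selection coefficients are all my own illustrative assumptions, and I am treating the census and effective population sizes as equal.

```python
import math

def fixation_probability(s, n):
    """Kimura's diffusion approximation for a new mutation in a diploid
    population of size n (assumed equal to the effective size), starting
    at frequency 1/(2n), with selection coefficient s."""
    if abs(s) < 1e-12:
        # Neutral limit: fixation happens by drift alone.
        return 1.0 / (2 * n)
    # expm1 keeps precision when s is tiny; the two minus signs cancel.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

if __name__ == "__main__":
    n = 10_000  # assumed effective population size, for illustration only
    neutral = fixation_probability(0.0, n)
    for s in (-1e-3, -1e-4, -1e-5, 0.0, 1e-5, 1e-4, 1e-3):
        p = fixation_probability(s, n)
        # Report each probability relative to the purely neutral case.
        print(f"s = {s:+.0e}  P(fix) = {p:.3e}  ratio to neutral = {p / neutral:.3g}")
```

The point of the ratio column is that a variant only behaves differently from a neutral one once the product of population size and selection coefficient is appreciably different from zero. Below that threshold a mutation in a biochemically "active" region is effectively invisible to selection, which is why activity by itself says little about conservation.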

The New York Times used a Google Maps analogy for ENCODE, and I thought this could fit quite well. I could envision the human genome as a major metropolis, Detroit perhaps. The downtown still contains the bastions of yesterday's economic glory, without which the city might completely fall into shambles. As someone viewing a Google map of Detroit, I could easily posit that these buildings still serve valuable economic and governmental functions, similar to how the ENCODE elements conserved across class Mammalia likely encode important developmental and housekeeping functions within humans. However, as I pan over the extremities of the city, I would find houses in various states of disrepair. Certainly, from my vantage point many would have all the characteristics of functional domiciles - roofs, driveways, fences - that distinguish them from rural areas of the country without human presence. Yet if I were to explore those areas at ground level, I would find many of the homes abandoned. Granted, this wouldn't stop the occasional transient homeless person from squatting for certain periods of time; but that hardly meets the usual definition of functional.

The bleaker reality of population genetics

This is how I suspect much of the human genome works. Most of it is capable of binding the occasional transcription factor, transcribing the odd RNA, and accepting contextual epigenetic markings. On rare occasions the insertion of a duplicated gene or the local rearrangement of the DNA may provide opportunities for sections with previously transient but useless biochemical activity to take on regulatory roles that have non-negligible effects on fitness and eventually become conserved, much like how the introduction of a successful business into a dying community can drive the revitalization of existing infrastructure. However, like the successful business and its surrounding region, the conserved genomic elements wink in and out of existence on a longer timescale.

I am excited by the genomic "Google map" that the ENCODE project has provided us, and I am sure it will lend considerable insight into human disease when combined with the theoretical power of population genetics. I just don't think the authors should imply that everything with the characteristics of function is presently important.