Thursday, September 06, 2012

The ENCODE Data Dump and the Responsibility of Science Journalists

ENCODE (ENcyclopedia Of DNA Elements) is a massive consortium of scientists dedicated to finding out what's in the human genome.

They published the results of a pilot study back in July 2007 (ENCODE, 2007) in which they analyzed a specific 1% of the human genome. That result suggested that much of our genome is transcribed at some time or another or in some cell type (pervasive transcription). The consortium also showed that the genome was littered with DNA binding sites that were frequently occupied by DNA binding proteins.

All of this suggested strongly that most of our genome has a function. However, in the actual paper the group was careful not to draw any firm conclusions.

... we also uncovered some surprises that challenge the current dogma on biological mechanisms. The generation of numerous intercalated transcripts spanning the majority of the genome has been repeatedly suggested, but this phenomenon has been met with mixed opinions about the biological importance of these transcripts. Our analyses of numerous orthogonal data sets firmly establish the presence of these transcripts, and thus the simple view of the genome as having a defined set of isolated loci transcribed independently does not seem to be accurate. Perhaps the genome encodes a network of transcripts, many of which are linked to protein-coding transcripts and to the majority of which we cannot (yet) assign a biological role. Our perspective of transcription and genes may have to evolve and also poses some interesting mechanistic questions. For example, how are splicing signals coordinated and used when there are so many overlapping primary transcripts? Similarly, to what extent does this reflect neutral turnover of reproducible transcripts with no biological role?

This didn't stop the hype. The results were widely interpreted as proof that most of our genome has a function and the result featured prominently in the creationist literature.

I don't blame science journalists for this. Lots of scientists also used the ENCODE result in 2007 to attack junk DNA. They honestly felt at the time that if a sequence was transcribed, no matter how rarely, it must have a function. They honestly felt that if a DNA binding protein bound to a piece of DNA then that site had a function.

Other scientists expressed skepticism over the interpretation of the ENCODE pilot project result. Some of them even disputed the data by showing that different techniques gave a different result on the pervasiveness of transcription. The most famous of these papers is the one from my colleagues here at the University of Toronto, Ben Blencowe and Tim Hughes (van Bakel et al. 2010). There was lots of activity in the blogosphere as well [Pervasive Transcription].

I'm not saying the issue is settled, although I strongly favor the idea that most of our genome is junk. What I'm saying is that in spite of the hype in 2007 the supporters of junk DNA have made a good case and this is still a legitimate scientific controversy.

We have also pointed out that just because a site is occupied by a DNA binding protein does not mean that it is functional. In fact, once you understand how DNA binding proteins work you expect many of them to be sitting nonproductively at sites that resemble the actual functional binding site [DNA Binding Proteins] [Slip Slidin' Along - How DNA Binding Proteins Find Their Target]. It has been widely known since 1976 that the problem with large genomes is that they soak up DNA binding proteins that are binding nonspecifically to DNA (Yamamoto and Alberts, 1976). This is not controversial, if you know what you're talking about.

Now comes the followup ENCODE study extended to cover (almost) the entire genome. The results are published in 30 papers, several of them in a single issue of Nature (Sept. 6, 2012) [Nature ENCODE: Research Papers]. I haven't read all the papers but my first impression is that there's not much that's new except that the dataset is now more complete. Here's what the consortium members say in the abstract [An integrated encyclopedia of DNA elements in the human genome].

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

This is 2012. A simple Google search will reveal that the concept of junk DNA is still alive and well. A search like that will also reveal the problems with interpreting the ENCODE result since we've had years of debate over the initial pilot study. There's no excuse for this kind of sloppy journalism.

Science journalists have been badly burned several times in the past few years. Surely they should know by now that a single paper on a new fossil won't overthrow our understanding of human evolution [Good Science? Bad Science Journalism?] nor will a single paper on arsenic in DNA make me rewrite my textbook. Science doesn't work that way. A single study won't cause us to entirely re-think our concept of the genome even if it's in thirty papers in Nature.

Responsible science journalists should have dug deeper to find out whether the new ENCODE data was any better than the earlier data and whether the consortium's interpretation of the results is being widely accepted in the scientific community. They don't have an excuse this time.

[The scientists who wrote the paper and the scientists who reviewed it will get theirs in a separate post.]

The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 799-816. [doi:10.1038/nature05874]

34 comments:

Larry, the real world of information storage systems guarantees that there will be an accumulation of 'junk' information, be it DVDs, magnetic tape or DNA. Natural selection, however, will tend to impose a net deletional bias on trash DNA because of the energy/fitness costs in carrying it around and transcribing it. It would be surprising, therefore, if it was the norm for various life forms to carry around huge amounts of trash code in their DNA. Occasional examples as anomalies, maybe, but we would predict that it should not be the norm. Add to that what we are learning about the regulatory part of the genome. It is looking more and more like a gigantic piece of software that runs numerous sections in parallel, each module providing inputs to the other sections so that the whole output is guided by many constantly changing variables. I think Venter has started to move 19th century Darwinian biology out of the dark ages of the 20th century and into the 21st century when he recently stated, "All living cells that we know of on this planet are 'DNA software'-driven biological machines comprised of hundreds of thousands of protein robots, coded for by the DNA, that carry out precise functions," (New Scientist, July 13, 2012). My point is that in a complex software package like what is encoded into the human genome, it should not be at all surprising if the trend is to discover that more and more of it is actually used at some point, even if it is only a redundant back up system and the 'junk' portion is smaller than we thought, albeit there will always be errors and junk in real life. The trend in science is that there is a lot less junk than we thought and, maybe, more use for seemingly useless stretches of DNA than we had surmised. True, this is a prediction of those scientists who suspect that intelligence was involved in the programming of life, but diminishing 'junk' is also a prediction of natural selection. So .... why fight it?

Cleaning Lady: "It [the genome] is looking more and more like a gigantic piece of software that runs numerous sections in parallel, each module providing inputs to the other sections so that the whole output is guided by many constantly changing variables."

That's begging the question. The genome certainly does not look like "a gigantic piece of software that runs numerous sections in parallel." That is exactly what you need to prove. You're assuming what you need to prove. The ENCODE results have not moved us much closer than that-- they found more regulatory elements, so that might be 8 or 9% of the genome.

At no point have you addressed any of the POSITIVE arguments for junk DNA, which have been around for decades and which Larry has repeated over and over.

Cleaning Lady: "The trend in science is that there is a lot less junk than we thought..."

"The trend." Like the trend that North America is approaching China via plate tectonics, at about the same speed.

It is looking more and more like a gigantic piece of software that runs numerous sections in parallel, each module providing inputs to the other sections so that the whole output is guided by many constantly changing variables.

Not at all, in fact the results point in exactly the opposite direction - the genome is a giant stochastic mess and it will be extremely difficult to understand it in terms of computer analogies.

The presence or absence of junk DNA is not something that you can simply deduce because you want our genome to look like a manufactured storage system. Nor can you deduce what our genome should look like by expressing a faulty understanding of evolution.

There's real science that you have to deal with. When would you like to start?

@Cleaning lady: you are correct in that this is not about science vs ID. It's about which of adaptation and drift are dominant at the molecular level. A key unknown quantity is what percentage of molecular changes that rise to fixation are adaptive - we expect the amount of junk DNA to be small if, and only if, this percentage is high.

"The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research."

http://www.nature.com/nature/journal/v489/n7414/pdf/nature11247.pdf

The other week you called me a liar about this if the data vindicates me tell me prof am I still the liar?

The other week you called me a liar about this if the data vindicates me tell me prof am I still the liar?

If the interpretation of the data is correct then the implications are enormous and I was totally wrong about the amount of junk DNA in our genome.

We really would have to re-write the textbooks because the new interpretation tells us that most of the DEFECTIVE transposons in our genome have a function and most of the sequences in introns are functional. The interpretation also tells us that a typical gene requires about 10,000 bp of regulatory sequences to control its expression.

Unfortunately for you, science is not done by quoting what the authors of some paper said, but rather by looking at the data and checking if such data supports the quoted stuff. Creationists don't know this. IDiots don't know this. Scientists, if they do not know, they should know.

Are you prepared to weigh up the pros and cons of the various arguments regarding the use of 'function' in this paper? You seem mightily eager to accept this report unquestioningly, in a way that I'd be betting you would not for papers that pointed the other way.

Many people are skeptical of the interpretation of these results - not because 'junk' buttresses a worldview, but because there are good reasons for supposing that a) junk - non-essential sequence - is a real category; b) particularly in eukaryotes, it is a substantial fraction.

So ... victory-dances are premature.

The problem remains of explaining the wide variation across the eukaryotes in the proportion of noncoding to coding DNA - even within the same genus or species. If it all has a 'function' (by virtue of showing up in an assay), why do some need so much more of it than others?

Only about 5% of the human genome shows a lower-than-neutral substitution rate. This suggests that, whatever its 'function', the other 95% does not depend on sequence.

Although the opening up of chromatin by transcription of whatever lies within may truly be a sequence-independent 'function', with genuine phenotypic consequences, one really has to explain why some species devote 95% of their genome to this function, others a mere 30% or less. It's like saying the 'function' of the 10,000 feet of cabling I have used to attach my headphones to the iPod in my pocket is to enable me to listen to the music!

What if you were to start doing your job as a biologist and try to find out what DNA does, instead of declaring it junk?

What if the authors of the ENCODE project started doing their jobs as biologists and tried to find out what all that transcribed RNA does, and what all those transcription factor binding sites do, instead of just ASSUMING they are non-junk?

The comments by Diogenes, Georgi Marinov, and Moran are classic responses from people who have their heads firmly planted in the sand and refuse to go where the science points. Time to pull those heads out of those dark holes and either retire, or move into the 21st century. Your Archie Bunker 'it's junk DNA so no need to worry about it' approach is a real science-stopper. Kinda like a stone-age tribesman finding a cellphone and, since it fits no function so far as hunting and fishing goes, he chucks it into the river and continues to forage for grubs.

Your Archie Bunker 'it's junk DNA so no need to worry about it' approach is a real science-stopper.

Back that up with evidence, or shut your lie-hole. Idiot can't prove that with any evidence. Egomaniac asshole complimenting herself for her supposed superior intelligence. Fuck you, egomaniac narcissist.

It's like the Inquisition of Galileo calling heliocentrism a "science-stopper" and complimenting themselves on their superior intelligence. Narcissistic egomaniacs.

Name ten nucleotides of non-coding DNA (out of 3 billion in the human genome) with a novel function discovered by creationists or ID proponents. Name just 10.

Lots of functional regions have been discovered by evolutionists using evolutionary assumptions. Name ten nucleotides of non-coding DNA (out of 3 billion in the human genome) with a novel function discovered by anti-evolutionists using anti-evolutionary assumptions.

So what's the real science-stopper, you stupid fuck?

Cleaning Lady: "It [the genome] is looking more and more like a gigantic piece of software that runs numerous sections in parallel, each module providing inputs to the other sections so that the whole output is guided by many constantly changing variables."

The genome certainly does not look like "a gigantic piece of software that runs numerous sections in parallel." That is exactly what you need to prove. You're assuming what you need to prove. The ENCODE results have not moved us much closer than that-- they found more regulatory elements, so that might be 8 or 9% of the genome.

Cleaning Lady: "The trend in science is that there is a lot less junk than we thought..."

"The trend." Like the trend that North America is approaching China via plate tectonics, at about the same speed.

"A lot less." Way to be quantitative. What next? Casey Luskin's "Much DNA"?

Can you quantify your "a lot less"? You're the advanced intellect here, right? Scientists are just cavemen compared to a "21st century" advanced intellect like you.

So please, o great "21st century" advanced intellect, please enlighten us scientist-cavemen by telling us what fraction of nucleotides in the human genome are biochemically constrained in sequence by their function.

Larry gave a specific number. Since you're a "21st century" advanced intellect and we scientists are just cavemen to your great 21st century brain, why don't you provide a number too, with references?

Please provide a counter-argument for any of the POSITIVE arguments for junk DNA, which have been around for decades and which Larry has repeated over and over.

If you don't answer the above simple, simple, simple, undergrad simple questions, if you evade these questions, then we have the right to call you an egomaniacal stupid fuck.

How much of this is really about semantics? I admit I haven't even skimmed all (or read more than two thoroughly) the papers that were published yesterday, but isn't this just a disagreement about the word "functional" and whether it should be used for all nucleotides that are transcribed whether they appear to affect any function in a cell or not? Or are the authors really claiming that by biochemically active this 80% of the genome affects phenotype? Could we as biologists agree to completely toss the term "junk DNA" (though I also thought we'd abandoned it) and rephrase it as something else that better describes its presence in a state that appears to have no current effect on phenotype? (But leave open the possibility that a function might exist but has not been identified yet . . as to not shut off avenues of research - though again I think the cleaning lady's argument on this point is spurious). I have a suspicion that the phrase "God didn't make no junk" is at the root of some of this vitriol - that some people just can't stand the idea that there is "junk" in the genome. Maybe if it were called something else .. . ? In the meantime, I'm still trying to think of relatively simple ways of explaining the presence of nonfunctional DNA to my students in the face of this media explosion.

No, it doesn't make any sense at all to use the word "functional" to mean "may or may not have a function". This is _not_ about semantics.

The term "junk DNA" conveys exactly what it denotes - sequence that is there and may be co-opted for some function in future, but is not functional right now. Here's Sydney Brenner on junk vs trash: "Everyone knows that you throw away trash. But junk we keep in the attic until there may be some need for it." Why would we want to replace such a clear term with something else that's less widely used?

You have a point, but I guess I am still uncertain as to whether the ENCODE authors really are definitely claiming phenotype-affecting function for that 80%. I'm mostly bothered by the word functional and how nonbiologists will interpret it. As to the quote about junk/trash... junk might never be useful, it might just take too much energy to take it to the dump.

Diogenes, your statement, "Back that up with evidence, or shut your lie-hole. Idiot can't prove that with any evidence. Egomaniac asshole complimenting herself for her supposed superior intelligence. Fuck you, egomaniac narcissist", is truly awesome. I am impressed by how you express yourself when you find yourself over your head! (Primate researchers, take note of a possible research subject) Do you, perchance, have a part time job as a stevedore?

Anonymous: Genetic Loading is a biological example of something that can be observed and modeled in computer science, the corruption and/or loss of information in a large computer program (say, a copy of MS Office). All aspects of genetic load can be modeled computationally, which, in so doing, gives us a better understanding of what is going on in biology and shows that our genome contains something a lot closer to software than what the knuckle-draggers are wont to admit. Furthermore, we can computationally model the future of an organism or population focusing on the effects of genetic loading.

In general to all: if you take the entire sequenced DNA, and treat it as a giant software package which is becoming increasingly corrupted, then factor in the effects of genetic drift and natural selection on such a package, you will take a giant leap forward in understanding what is going on in biology on the long term .... and you might want to find yourself quietly leaving Moran's sinking ship.

All aspects of genetic load can be modeled computationally, which, in so doing, gives us a better understanding of what is going on in biology and shows that our genome contains something a lot closer to software than what the knuckle-draggers are wont to admit.

Funny... my software doesn't mutate with each subsequent install.

And re: your answer - based on your vague walk around, you have no idea what you are talking about, correct? You didn't even answer the question.

Laurence A. Moran

Larry Moran is a Professor in the Department of Biochemistry at the University of Toronto. You can contact him by looking up his email address on the University of Toronto website.

Sandwalk

The Sandwalk is the path behind the home of Charles Darwin where he used to walk every day, thinking about science.
