Friday, September 27, 2013

Dark Matter Is Real, Not Just Noise or Junk

UPDATE: The title is facetious. I don't believe for one second that most so-called "dark matter" has a function. In fact, there's no such thing as "dark matter." Most of our genome is junk. I mention this because one of the well-known junk DNA kooks is severely irony-impaired and thought that I had changed my mind.

It seems pretty clear to me that Pugh (and probably Venters) actually think they are on to something. Here's part of the press release quoting Franklin "Frank" Pugh, a Professor in the Department of Molecular Biology at Penn State.

The remaining 150,000 initiation machines -- those Pugh and Venters did not find right at genes -- remained somewhat mysterious. "These initiation machines that were not associated with genes were clearly active since they were making RNA and aligned with fragments of RNA discovered by other scientists," Pugh said. "In the early days, these fragments of RNA were generally dismissed as irrelevant since they did not code for proteins." Pugh added that it was easy to dismiss these fragments because they lacked a feature called polyadenylation -- a long string of genetic material, adenosine bases -- that protects the RNA from being destroyed. Pugh and Venters further validated their surprising findings by determining that these non-coding initiation machines recognized the same DNA sequences as the ones at coding genes, indicating that they have a specific origin and that their production is regulated, just like it is at coding genes.

"These non-coding RNAs have been called the 'dark matter' of the genome because, just like the dark matter of the universe, they are massive in terms of coverage -- making up over 95 percent of the human genome. However, they are difficult to detect and no one knows exactly what they all are doing or why they are there," Pugh said. "Now at least we know that they are real, and not just 'noise' or 'junk.' Of course, the next step is to answer the question, 'what, in fact, do they do?'"

Pugh added that the implications of this research could represent one step towards solving the problem of "missing heritability" -- a concept that describes how most traits, including many diseases, cannot be accounted for by individual genes and seem to have their origins in regions of the genome that do not code for proteins. "It is difficult to pin down the source of a disease when the mutation maps to a region of the genome with no known function," Pugh said. "However, if such regions produce RNA then we are one step closer to understanding that disease."

I'm puzzled by such statements. It's been one year since the ENCODE publicity fiasco and there have been all kinds of blogs and published papers pointing out the importance of junk DNA and the distinct possibility that most pervasive transcription is, in fact, noise.

It's possible that Pugh and his postdoc are not aware of the controversy. That would be shocking. It's also possible that they are aware of the controversy but decided to ignore it and not reference any of the papers that discuss alternate explanations of their data. That would be even more shocking (and unethical).

Are there any other possibilities that you can think of?

And while we're at it, what excuse can you imagine that lets the editors of Nature off the hook?

22 comments

It is truly frustrating to have gone ahead and bothered to educate myself on junk DNA and molecular biology, only to discover time and again that there are professional scientists who aren't even aware of the developments in their own field, and who continue to get their facts wrong. Something has gone completely wrong somewhere in these people's educations. They're completely oblivious to the history and the science behind junk.

Considering how I'm only a lab technician and have only recently come to understand the subject, I can barely imagine how frustrating it must be for someone like Larry.

Think of how frustrating it is for people in the functional genomics field who do understand the basics of molecular evolution. Over the last few days, Larry has repeatedly, in effect, dismissed the whole field as a pointless exercise in technical wizardry that contributes nothing to our understanding of biology. And he's not alone in that view.

That is wrong, and it deeply hurts those who are trying to put the power of the technology to good use.

This is science, criticism is an intrinsic property of the game. If they want to avoid harsher criticism, maybe they should take some time to accurately report their findings instead of stooping to unsupportable sensationalism.

I'm perfectly willing to admit that all those expensive mapping studies might contribute to our understanding of genomes and gene expression. I'm sure they have. I'm sure there are lots of solid conclusions that are going to make it into the next editions of the textbooks.

I apologize for forgetting about the important insights that have come from mapping transcription factor binding sites and methylation sites. Could you just briefly remind me what they are?

Also, these studies are not nearly as expensive as they used to be. At this point, if you want to profile a few dozen sites by ChIP-qPCR, the cost of the primers and reagents can easily approach the cost of going genome-wide. You might as well do the whole genome and generalize your results. Whole-genome bisulphite sequencing is, of course, expensive, but it will become cheaper.

It's just a tool that has become a normal part of doing science in the field. I don't know why you are under the impression that there is a separation between the people who do "expensive mappings" and those who do traditional biology. There isn't; the latter group simply doesn't exist. Everybody has adopted functional genomic methods because they have a lot of advantages.

Now, there is one thing I have repeatedly pointed out myself: the laboratories that have well-defined biological problems and are using genomics to address them in a focused way have indeed contributed more "important insights" (at least what I would call an important insight) than the large-scale consortia. But the large-scale consortia weren't really set up to provide biological insights, their goal is primarily to produce data. And they drove the development of the technology, which, I would expect you to agree, is a positive thing. Otherwise we could just go back 30 years and say that there was no point in developing genome sequencing because it's just a technical advance that won't provide any "important insights."

Finally, none of this conversation would be happening if there were enough funding for both the large-scale efforts and the individual labs, so that they weren't competing for a limited resource. But that's a different topic.

I'll accept anything you want to offer. For example, have the studies clarified the number of genes in the human genome? Have they led to any new (true) insights on the regulation of gene expression?

As you know, many of your colleagues in the field have CLAIMED that they have answers to those questions. They claim that their data shows that most of the genome is functional and they claim that their data shows that each gene is regulated by a host of transcription factors and noncoding RNAs.

Do you agree with them? If not, what is the most important contribution in your opinion?

If you don't consider that a fundamental insight, you must have impossible standards, which if we are to follow we would be better off if we just stopped doing science altogether.

As far as who claimed what, that's completely orthogonal to the question of whether the technological advances are useful. It's more of a sociology-of-science question. My position has been made clear many times.

First, "expensive" includes the time and effort of graduate students and postdocs who could be doing something more important.

Second, I agree with you when you say, "But the large-scale consortia weren't really set up to provide biological insights, their goal is primarily to produce data." That being the case, the consortia should stop pretending that they have discovered any biological insights.

Maybe it's time to stop collecting more data and actually try to use it to gain some biological insights. If I were in charge of one of those consortia, I would definitely be investigating the binding of various factors to random DNA in order to try to sort out function from accident. I'd also put some of my people on projects to look at specific "promoters" and "enhancers" to try to find out if they have a biological function.

The last thing I'd do is waste them on pumping out a dozen more data sets. Enough is enough, already. The questions are there. It's time to start answering them.

The last thing I'd do is waste them on pumping out a dozen more data sets. Enough is enough, already. The questions are there. It's time to start answering them.

Well, that's not exactly true. Eventually every transcription factor will have to be ChIP-ped if we are to understand its biology. We're not talking fundamental principles here, just characterizing the transcription factors (from which something interesting might pop out, who knows, but that's not necessarily the expectation).

There is indeed a tendency to say "Let's go get more data" rather than to actually dig deep into the data that already exist. It's easier. I've made that mistake myself. And we indeed already have more data than we can fully make sense of. But that statement is not incompatible with the statement that we don't have all the data we need.

If you don't consider that a fundamental insight, you must have impossible standards, which if we are to follow we would be better off if we just stopped doing science altogether.

Well, I'm not an expert on Piwi RNAs and I haven't studied all the examples of small regulatory RNAs and anti-sense RNAs that have been discovered in the past decade.

However, I've been teaching students about small regulatory RNAs and antisense RNAs since 1980 and I put several examples in my first textbook published in 1987.

We have known for almost 40 years that transcription factors and RNA polymerases bind to random sequences of DNA. We've known for all that time that there MUST be spurious binding sites in a large genome. What's the point of mapping all those spurious binding sites for every known transcription factor?

The important question is not whether there are fortuitous binding sites (there are) but how many of them are real, biologically functional promoters and enhancers. Isn't it about time to start addressing that question?

Well, I'm not an expert on Piwi RNAs and I haven't studied all the examples of small regulatory RNAs and anti-sense RNAs that have been discovered in the past decade.

However, I've been teaching students about small regulatory RNAs and antisense RNAs since 1980 and I put several examples in my first textbook published in 1987.

Nobody disputes that.

But we did not know that there is a dedicated small RNA-based system for silencing transposons, or that it is so important. I consider that a major discovery.

We have known for almost 40 years that transcription factors and RNA polymerases bind to random sequences of DNA. We've known for all that time that there MUST be spurious binding sites in a large genome. What's the point of mapping all those spurious binding sites for every known transcription factor?

The important question is not whether there are fortuitous binding sites (there are) but how many of them are real, biologically functional promoters and enhancers. Isn't it about time to start addressing that question?

I think we have a slightly different definition of what a spurious binding site is. The binding sites we see are quite robust and reproducible. Why does the fact that there are tens of thousands of them have to bother you? There are at least 20K functional binding sites for the general transcription factors. I don't think you would dispute that. And I don't think you would dispute that there are thousands of enhancers in the genome. And when I say that, please note what I also said above about the claims in the paper that's the topic of this thread.

Of course that doesn't mean that all of them are functional. But it's not true that everyone is assuming they are and that nobody is testing them. It's just that it's a lot easier to map TF binding sites than to test their function. But it is being worked on, and results will be coming out over the next few years. We've only had (relatively) easy-to-use genome editing for less than a year now.

I think we have a slightly different definition of what a spurious binding site is.

Apparently we do. My definition is that a spurious binding site is where a transcription factor binds by accident because a random piece of junk DNA just happens to have a sequence that resembles the binding site consensus sequence. If the binding site is six (6) nucleotides, then there will be about one million of them in the genome. Only a handful are functional.
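The "about one million" figure follows from simple probability. Here is a minimal sketch in Python, assuming a random sequence with equal frequencies of A, C, G and T, exact matches only, and a haploid human genome of roughly 3.1 billion bp (the function name and the genome-size constant are illustrative, not taken from the discussion):

```python
def expected_random_sites(motif_length: int, genome_size: int, both_strands: bool = True) -> float:
    """Expected exact matches to one specific motif in a random, uniform-composition genome."""
    # Chance that a given start position matches a specific k-nt motif: (1/4)^k
    p_match = 0.25 ** motif_length
    # Number of positions where a k-mer can start on a single strand
    positions = genome_size - motif_length + 1
    # Count both strands unless told otherwise (palindromic motifs get counted twice)
    strands = 2 if both_strands else 1
    return strands * positions * p_match


if __name__ == "__main__":
    HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate haploid genome size
    for k in (6, 8, 10, 12):
        print(k, round(expected_random_sites(k, HUMAN_GENOME_BP)))
```

Under these assumptions a specific 6-mer is expected about 1.5 million times counting both strands (roughly 760,000 on one strand), the same order of magnitude as the one million cited. Each extra nucleotide in the motif cuts the expected count by a factor of four. Real genomes are not uniform in base composition, so treat these as rough estimates.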

What's your definition?

The binding sites we see are quite robust and reproducible.

Of course they are. That's what I expect.

Why does the fact that there are tens of thousands of them have to bother you?

Like I said, it doesn't bother me in the least that there are tens of thousands of spurious binding sites in the human genome. What bothers me is those workers who interpret them as functional enhancers.

There are at least 20K functional binding sites for the general transcription factors. I don't think you would dispute that.

Do you mean that there's at least one functional binding site for each gene? If so, I agree.

And I don't think you would dispute that there are thousands of enhancers in the genome.

I would not dispute that. There are 25,000 genes (approximately) and almost all of them will have an enhancer of some sort. Many will have several enhancers.

As you yourself pointed out, TF motifs are not very information-rich in metazoans, but you do not see all of them occupied; i.e., there is more than just the motif that makes a binding site, whether it's chromatin structure, cofactors, etc. That's why I don't like the word "spurious" to refer to those, while, once again, I stress that I am not in the "If you see it, it's biologically meaningful" camp.

When I said 20K sites for GTFs, I meant TBP and the TFII factors. Each gene has a promoter, after all.

Are there any other possibilities that you can think of?

Well, if they're truly unaware of the controversy, they're probably also falling victim to the selectionist fallacy.

If they are aware of it, they have seemingly chosen to ignore it completely, possibly because they've decided which side of the fence they come down on, and might even think the matter is settled (however horrifying that possibility is).

I think that there is another error in the quote. It approvingly quotes Pugh and Venters, who manage to get the definition of "dark matter of the genome" wrong. I always thought that when people use the phrase "dark matter of the genome" they refer to whatever accounts for the "missing heritability." The large amount of the genome that changes without making any difference to the phenotype is fascinating, but it then cannot be that "dark matter." It may be a lot of DNA that has no known function, but that does not make it account for the missing heritability.

So when Pugh and Venters refer to noncoding RNAs that do not do anything as the "dark matter of the genome," they are changing the definition of that phrase.

I think the missing heritability quote is the most egregious aspect of the press release. It seems that people use these terms without properly understanding why they were coined, or what they actually mean. I agree that non-coding RNA has no significance in terms of missing heritability.

Someone, I can't recall who, once lamented that molecular biology was unlicensed biochemistry. It seems that for many people genomics is unlicensed molecular evolution.

Laurence A. Moran

Larry Moran is a Professor Emeritus in the Department of Biochemistry at the University of Toronto. You can contact him by looking up his email address on the University of Toronto website.

Sandwalk

The Sandwalk is the path behind the home of Charles Darwin where he used to walk every day, thinking about science. You can see the path in the woods in the upper left-hand corner of this image.

Disclaimer

Some readers of this blog may be under the impression that my personal opinions represent the official position of Canada, the Province of Ontario, the City of Toronto, the University of Toronto, the Faculty of Medicine, or the Department of Biochemistry. All of these institutions, plus every single one of my colleagues, students, friends, and relatives, want you to know that I do not speak for them. You should also know that they don't speak for me.


Quotations

The old argument of design in nature, as given by Paley, which formerly seemed to me to be so conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a door by man. There seems to be no more design in the variability of organic beings and in the action of natural selection, than in the course which the wind blows.
Charles Darwin (c1880)
Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. It is so easy to hide our ignorance under such expressions as "plan of creation," "unity of design," etc., and to think that we give an explanation when we only restate a fact. Any one whose disposition leads him to attach more weight to unexplained difficulties than to the explanation of a certain number of facts will certainly reject the theory.

Charles Darwin (1859)
Science reveals where religion conceals. Where religion purports to explain, it actually resorts to tautology. To assert that "God did it" is no more than an admission of ignorance dressed deceitfully as an explanation...


The world is not inhabited exclusively by fools, and when a subject arouses intense interest, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)
I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory.
Stephen Jay Gould (2002) p.1339
The essence of Darwinism lies in its claim that natural selection creates the fit. Variation is ubiquitous and random in direction. It supplies raw material only. Natural selection directs the course of evolutionary change.
Stephen Jay Gould (1977)
Rudyard Kipling asked how the leopard got its spots, the rhino its wrinkled skin. He called his answers "just-so stories." When evolutionists try to explain form and behavior, they also tell just-so stories—and the agent is natural selection. Virtuosity in invention replaces testability as the criterion for acceptance.
Stephen Jay Gould (1980)
Since 'change of gene frequencies in populations' is the 'official' definition of evolution, randomness has transgressed Darwin's border and asserted itself as an agent of evolutionary change.
Stephen Jay Gould (1983) p.335
The first commandment for all versions of NOMA might be summarized by stating: "Thou shalt not mix the magisteria by claiming that God directly ordains important events in the history of nature by special interference knowable only through revelation and not accessible to science." In common parlance, we refer to such special interference as "miracle"—operationally defined as a unique and temporary suspension of natural law to reorder the facts of nature by divine fiat.
Stephen Jay Gould (1999) p.84


My own view is that conclusions about the evolution of human behavior should be based on research at least as rigorous as that used in studying nonhuman animals. And if you read the animal behavior journals, you'll see that this requirement sets the bar pretty high, so that many assertions about evolutionary psychology sink without a trace.

Jerry Coyne
Why Evolution Is True
I once made the remark that two things disappeared in 1990: one was communism, the other was biochemistry and that only one of them should be allowed to come back.

Sydney Brenner
TIBS Dec. 2000
It is naïve to think that if a species' environment changes the species must adapt or else become extinct.... Just as a changed environment need not set in motion selection for new adaptations, new adaptations may evolve in an unchanging environment if new mutations arise that are superior to any pre-existing variations

Douglas Futuyma
One of the most frightening things in the Western world, and in this country in particular, is the number of people who believe in things that are scientifically false. If someone tells me that the earth is less than 10,000 years old, in my opinion he should see a psychiatrist.

Francis Crick
There will be no difficulty in computers being adapted to biology. There will be luddites. But they will be buried.

Sydney Brenner
An atheist before Darwin could have said, following Hume: 'I have no explanation for complex biological design. All I know is that God isn't a good explanation, so we must wait and hope that somebody comes up with a better one.' I can't help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist

Richard Dawkins
Another curious aspect of the theory of evolution is that everybody thinks he understands it. I mean philosophers, social scientists, and so on. While in fact very few people understand it, actually as it stands, even as it stood when Darwin expressed it, and even less as we now may be able to understand it in biology.

Jacques Monod
The false view of evolution as a process of global optimizing has been applied literally by engineers who, taken in by a mistaken metaphor, have attempted to find globally optimal solutions to design problems by writing programs that model evolution by natural selection.