Wednesday, June 25, 2014

The Function Wars: Part I

This is Part I of the "Function Wars" posts. The second one is on The ENCODE legacy.1

Quibbling about the meaning of the word "function"

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)

The ENCODE Consortium tried to redefine the word “function” to include any biological activity that they could detect using their genome-wide assays. This was not helpful since it included a huge number of sites and sequences that result from spurious (nonfunctional) binding of transcription factors or accidental transcription of random DNA sequences to make junk RNA [see What did the ENCODE Consortium say in 2012?].

I believe that this strange way of redefining biological function was a deliberate attempt to discredit junk DNA. It was quite successful since much of the popular press interpreted the ENCODE results as refuting or disproving junk DNA. I believe that the leaders of the ENCODE Consortium knew what they were doing when they decided to hype their results by announcing that 80% of the human genome is functional [see The Story of You: Encode and the human genome – video, Science Writes Eulogy for Junk DNA].

The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.

It’s unfortunate that one of the consequences of the ENCODE Consortium publicity campaign is an ongoing debate about the exact meaning of the word “function.” This debate has drawn in several philosophers as well as biologists. In some cases this has led to pointless quibbles that do nothing to settle the controversy over junk DNA. These debates also have the unfortunate consequence of seeming to justify the decision of the ENCODE Consortium leaders. I agree with Sean Eddy when he says (Eddy, 2013) ...

Attention focused on the squabbling more than the substance, and probably led some to wonder whether the arguments were just quibbling over the semantics of the word ‘function’.

Trying to conceptualize the forces that act on genome evolution is not just a matter of semantics.

(This is from the commentary in Current Biology where Eddy criticized Dan Graur’s paper (Graur et al., 2013) as “angry, dogmatic, scattershot, sometimes inaccurate.” I strongly disagree with Sean Eddy on that point even though I am sympathetic to the point he makes about quibbling over the meaning of “function” being a distraction.)

Although I am going to quibble about the word “function” in this lengthy post, my main point is that the function wars are, for the most part, distracting and unproductive. We’re interested in the big picture—whether most of our genome is junk—and that’s not going to be resolved by settling on a definition of “function.” We have enough experience in biology to know that very few terms can be defined unambiguously (e.g. “gene,” “species”).

Biomedically Relevant Function

Let’s look at an example of quibbling over the meaning of “function.” A recent paper by Germain et al. (2014) points out that the purpose of the ENCODE project was to discover functional sequences in the human genome. They are correct to say this even though the ENCODE leaders are now pretending that looking for function was not a very important part of the ENCODE project. According to the latest revisionist account, the most important contribution was just collecting massive amounts of data (Kellis et al. 2014).2

Germain et al. then go on to say that ....

ENCODE’s controversial claim of functionality should be interpreted as saying that 80% of the genome is engaging in relevant biochemical activities and that are likely to have causal roles in phenomena deemed relevant to biomedical research.

This seems to echo the view of the ENCODE Consortium since in their latest attempt at backtracking (Kellis et al. 2014) they emphasize this same point about medical relevance. After pointing out that only 1% of the genome encodes protein, the ENCODE leaders say ...

More recently, genome-wide association studies have indicated that a majority of trait-associated loci, including ones that contribute to human diseases and susceptibility, also lie outside protein-coding regions. These findings suggest that the noncoding regions of the human genome harbor a rich array of functionally significant elements with diverse gene regulatory and other functions.

They are suggesting that there’s a “rich array” of mysterious sequences that affect genetic diseases. I doubt very much that this is true. The mutations that produce genetic defects in humans will almost certainly turn out to be in well-understood parts of the gene or its closely associated regulatory sequences. There’s no reason to assume that mapping of genetic disease mutations in humans is likely to uncover a huge number of new regulatory elements that escaped detection by geneticists, biochemists, and molecular biologists.

The focus on putative functions that are biomedically relevant is just another way of describing the original claim of the ENCODE Consortium and it does nothing to advance our understanding of “function.” The correct way of expressing this idea is to say that 80% of the human genome might possibly have something to do with biomedical research. Using that kind of logic, one is forced to conclude that the most important result of ENCODE is to narrow the target by showing that 20% of the genome has nothing to do with biomedical research. But even that’s not true because most of the ENCODE leaders won’t rule out undiscovered functions in the remaining 20% of the genome.

It’s easy, and correct, to talk about “biochemical activities” as “putative function” or “potential function” and if that’s all that the ENCODE Consortium did then there would have been no headlines about the death of junk DNA. But even saying that 80% of the genome has a “putative function” is misleading since we know for a fact that one of the fundamental properties of DNA binding proteins is nonspecific (nonfunctional) binding and that these nonspecific sites must outnumber specific (functional) sites in a large genome (Yamamoto and Alberts, 1976) [see Slip Slidin' Along - How DNA Binding Proteins Find Their Target, DNA Binding Proteins].
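A back-of-the-envelope calculation makes the Yamamoto and Alberts point concrete. The sketch below is an illustration under stated assumptions, not anything from the ENCODE papers: it assumes a random genome with equal, independent base frequencies and a single exact recognition sequence. Real DNA binding proteins also tolerate mismatches, which only increases the number of chance sites.

```python
def expected_chance_sites(genome_length: int, motif_length: int, both_strands: bool = True) -> float:
    """Expected exact matches of one specific motif in a random sequence.

    Assumes equiprobable, independent bases (a simplification).
    Each of the (genome_length - motif_length + 1) positions matches
    with probability (1/4) ** motif_length; a non-palindromic motif can
    also be found on the opposite strand, which doubles the count.
    """
    positions = genome_length - motif_length + 1
    per_strand = positions * 0.25 ** motif_length
    return 2 * per_strand if both_strands else per_strand


# Approximate haploid human genome size (~3.1 billion bp, rounded; an assumption).
HUMAN_GENOME_BP = 3_100_000_000

if __name__ == "__main__":
    for k in (6, 8, 10, 12):
        print(f"{k}-bp motif: ~{expected_chance_sites(HUMAN_GENOME_BP, k):,.0f} chance sites")
```

Under these assumptions a 6-bp site is expected about 1.5 million times by chance, and even a 12-bp site a few hundred times, far more than the number of genes any single transcription factor could plausibly regulate.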

Similarly, a great deal of the pervasive transcription detected by ENCODE is confined to a small number of cell types and occurs at very low abundance, a fact only reluctantly admitted eighteen months after the original papers were published (Kellis et al., 2014). What this means is that much of that pervasive transcription cannot be functional. So, we know for a fact that most of this “putative function” has to be nonfunctional (Struhl, 2007; van Bakel et al., 2010). Incidentally, one of the best ways to prove that accidental binding and spurious transcription are significant is to employ a negative control like the Random Genome Project (Eddy, 2013).
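The logic of a random-genome negative control can be imitated, very loosely, in silico: generate DNA that has never been under selection and count how often a motif "hits" it anyway. This is a hypothetical sketch only. The motif GATA, the sequence length, the seed, and the uniform base composition are arbitrary choices, and counting exact string matches is a crude stand-in for an experimental binding assay.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact matches on both strands."""
    # Using a set means a palindromic motif is counted once per position, not twice.
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def random_genome(length: int, seed: int = 0) -> str:
    """Uniform random DNA: by construction it has no selected function."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


if __name__ == "__main__":
    length, motif = 1_000_000, "GATA"
    observed = count_matches(random_genome(length), motif)
    expected = 2 * (length - len(motif) + 1) / 4 ** len(motif)
    print(f"{motif}: {observed} hits in random DNA (about {expected:.0f} expected by chance)")
```

Every hit in the random sequence is, by definition, nonfunctional, so the count sets a baseline that any claim of "biochemical function" in real DNA would have to exceed.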

The best way to express this scientifically is not the statement that Germain et al. propose but something like: “The various sites identified by the ENCODE assays cover as much as 80% of the genome. Most of these sites will not have a biological function by any reasonable definition of ‘function’ but a small percentage of them have important, and well-understood, biological functions. It’s quite possible that an even smaller fraction of these sites have functions that we do not yet know about.” Somehow, that doesn’t seem quite as catchy as saying that 80% of the genome is functional.

In fairness, Germain et al. seem to recognize the limitations of their argument when they admit that “this 80% cannot strictly speaking be called ... functional as ENCODE claimed.” However, they reveal their bias when they say that it is very likely to be functional. But this is the heart of the dispute. I, and many others, claim that most of this 80% is almost certainly nonfunctional and we have evidence and arguments to back up that claim, evidence that Germain et al. seem to ignore.

Other philosophers believe that “function” can have different meanings depending on one’s interests. In Elliott et al. (2014), for example, the authors3 point out that medicine uses different definitions, just as Germain et al. (2014) suggest. The example used by Elliott et al. is a mutation that causes cancer; this could be an oncogenic genome rearrangement, for example. Physicians could legitimately say that the mutation functions in causing cancer. This is not helpful.

The Many Meanings of “Function”

The issue of whether a large part of our genome is junk is not just a philosophical debate about the meaning of “function” but a large part of the Germain et al. paper is devoted to just that. The authors discuss two philosophical definitions called the causal role account of function and the selected-effect account. I find their discussion tedious and almost incomprehensible. The distinction between the two definitions is explained much better in Doolittle (2013) and Doolittle et al. (2014) but both discussions suffer from the over-emphasis of a false premise; namely, that it’s possible to define “function” in an unambiguous way that sheds light on the junk DNA debate.

The paper by Graur et al. (2013) suffers from the same problem. Those authors come down firmly on the side of selected-effect functions although they recognize that, “Estimates of functionality based on conservation are likely to be, well, conservative.” The best way to define function, according to Graur et al. is in terms of whether losing it has consequences. This is the best working definition, in my opinion: a sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny. This is the definition I’ve been using for almost two decades [see Junk DNA Poll].

Strictly speaking, this definition does not correspond to either the causal-role definition or the selected-effect definition because it can include functional DNA whose sequence is not conserved. This is the same definition used by Niu and Jiang (2013) for the same reason.

Dan Graur has expanded on this point in: "What is function?" A Section from a Future Textbook Chapter (would greatly appreciate comments) but now he seems to focus exclusively on functions that can be destroyed by mutation. This implies that a functional part of the genome has to have a specific sequence that is required for the function. This rules out spacer DNA and any of the bulk DNA hypotheses that are used by opponents of junk DNA. Even worse, this test of function fits the causal-role (CR) definition (not the selected-effect (SE) definition) according to some philosophers (Elliott et al., 2014).

As I mentioned above, Elliott et al. argue that different branches of biology use the word “function” in different ways. They also argue that biologists who criticize ENCODE often appeal to the distinction between causal-role (CR) function and selected-effect (SE) function but “they do so in a way that many philosophers would find problematic.” It’s worth pointing out that philosophers are sometimes guilty of writing about biology in ways that many biologists would find problematic. The real question here is whether the debate about the amount of junk DNA in our genome is a biological problem or a philosophical problem.

For the record, here’s the position adopted by Elliott et al. (2014):

Today, perhaps the closest thing to a consensus among philosophers of biology is that each function concept is associated with a distinct type of explanatory goal. On this view, the SE-function concept is appropriate for developing evolutionary or ultimate explanations, while the CR concept is appropriate for explaining proximate mechanisms.

I don’t know about you, but what this tells me is that philosophers aren’t going to make much of a contribution to the debate over junk DNA but they are going to be active participants in the function wars.

It is absolutely safe to say that if you meet somebody who claims not to believe in evolution, that person is ignorant, stupid or insane (or wicked, but I’d rather not consider that).

Richard Dawkins

I don’t think it’s possible to define biological function in a way that can satisfy everyone. This isn’t unusual in biology since there are many important words that resist airtight definitions. I’m thinking of “gene” and “species” but there are many more. I agree with Doolittle et al. (2014) and Graur et al. (2013) in one sense; namely, that defining “function” in terms of evolution and conservation (selected-effect) is vastly superior to defining biological function in terms of whatever a sequence merely happens to do (causal-role). I also agree with all critics of the ENCODE Consortium that their attempt to use a causal-role definition of function was just plain silly. (Or, possibly wicked, but I’d rather not consider that.)

The ENCODE leaders now (2014) take a slightly different approach to defining function. They refer to three approaches to the problem: genetic, biochemical, and evolutionary (Kellis et al., 2014).

The genetic approach relies on identifying function by recognizing stretches of DNA where mutations have an observable effect. This is a pretty good way of recognizing function. I prefer to think of the genetic approach in terms of whether or not a given sequence can be deleted without causing any significant effect but the basic idea is the same. Kellis et al. point out the technical limitations of the genetic approach but that’s not very relevant when we’re talking about ways of defining function.

The evolutionary approach looks at sequence conservation as the hallmark of functional regions of the genome. This is a tried-and-true method of recognizing functional regions of the genome but there are some limitations (see discussion below). There can, in theory, be large regions of the genome that are functional but not conserved in terms of sequence. There is no evidence that this possibility is correct although we know for a fact that there are small regions of the genome that fall into this category.

The ENCODE leaders want you to know that it’s not always easy to recognize short conserved (functional) regions of the genome because multiple sequence alignments are a “substantial challenge.” They remind us that secondary structures in RNA might be conserved even though the sequence can change and that you can have substitutions in binding sites that still allow significant binding. (Nevertheless, scientists have been successful at identifying consensus sequences for over three decades.) The ENCODE leaders also want you to know that new functional sequences that have arisen specifically in the human lineage cannot be detected by the evolutionary approach. While true, this is likely to be trivial as far as I’m concerned, but there are a surprising number of scientists who actually believe that a large fraction of the genome could have evolved new essential functions since humans diverged from chimpanzees. That’s why they keep mentioning this possibility.

The biochemical approach looks at molecules and sequences to determine what they do. It’s an excellent experimental method of determining whether a given DNA sequence has a function. The only limitation is that you have to understand biochemistry and that means understanding that just because you detect a biochemical effect of some sort does not mean that you have identified a function. For example, human transcription factors will bind to millions of sites in plant genomes but this activity doesn’t mean that they have a function in plants. Similarly, human transcription factors MUST bind to junk DNA, if it exists, because that’s the nature of DNA binding proteins. That’s a biochemical fact that’s described in all the textbooks.

The problem, as I see it, is that while biological function can most often be associated with conservation and selection, it isn’t a sufficient definition and it sometimes misidentifies sequences that don’t really have a significant biological function. In other words, there are both false positives and false negatives.

A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants. Conversely, if the DNA can be removed without consequences then it is probably junk. These are not rigorous definitions because there are all kinds of cases where a gene with a known function can be deleted without harm to the organism.

For example, think of our primitive ancestor who just acquired a mutation in the gene for making vitamin C. That sequence is now junk because it can no longer encode an enzyme but was it junk or was it functional just before it acquired an inactivating mutation? I think we would want to say that the DNA sequence encoding the enzyme (L-gulono-γ-lactone oxidase) has a biological function even if we know that deleting it will have no effect.

An even better example is the gene for the enzyme N-acetylgalactosaminyltransferase. This is the gene that controls ABO blood types. People with O-type blood are homozygous for alleles that make the gene nonfunctional and no enzyme is produced [Online Mendelian Inheritance in Man (OMIM) 110300]. As a consequence, the protein on the surface of red blood cells is not glycosylated as it is in people with A-type, B-type, and AB-type blood.

There is no evidence that people with the defective gene and O-type blood are any worse off than people that have the glycosylated protein. Does that mean that the ABO gene is junk even though it has a well-defined function? I don’t think that makes a lot of sense. This is a functional gene even though it meets our working definition of junk DNA.

Given examples like these, the working definition of junk DNA is not an airtight, unambiguous, way to identify junk DNA because it includes some DNA that has a clear biological function. Conversely, it may be possible to delete fairly large regions of the genome without immediate consequences as was done in the now-famous mouse genome deletion experiment (Nobrega et al., 2004) but opponents of junk DNA will not accept this as proof that the DNA was junk because they can imagine functions that might go undetected under laboratory conditions. Furthermore, there are those who argue that if we were to delete all the putative junk DNA from our genome there would probably be consequences. Cells might be smaller and cell divisions might be more frequent so that humans with very little junk DNA might look very different. This could be true but it doesn’t mean that the extra DNA in our genome is actually functional. It’s still junk.

What this means is that defining junk DNA as DNA that can be deleted without consequences will always be contested by quibbling. Nevertheless, it’s the best definition we have and it works quite well as long as you ignore the nitpicking and think about the big picture. About 90% of our genome is junk according to the best available biological evidence. Quibbling about the meaning of “function” (or “junk”) isn’t going to change that very much. The gray area, where a given sequence could be “junk” or “functional,” represents only a few percent of the genome. (Although it probably takes up 90% of the published literature.)

What about identifying function by relying on sequence conservation? This is an evolutionary definition. It seems to be a pretty good way of identifying functional regions of the genome (Doolittle et al., 2014; Graur et al., 2013) and it’s slightly different from a definition that identifies function by saying that the DNA can’t be deleted without consequences. Looking for sequence conservation is a positive way of recognizing functional regions of the genome, at least in theory. It has worked pretty well in the past 50 years or so.

I agree with most biologists that conserved DNA is a pretty good proxy for functional DNA and that nonconserved DNA is most likely junk. However, even this definition is neither inclusive nor exclusive. There are examples of conserved DNA that look like junk and examples of nonconserved DNA that has a function.

As mentioned above, two large regions of the mouse genome were deleted without effect (Nobrega et al., 2004). Together, those regions covered 1,243 segments of DNA that were 70% identical in mice and humans (100-bp window). This tells us that sequence conservation is not a reliable indication of function.
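The "70% identical over a 100-bp window" criterion can be pictured as a window sliding along a mouse/human alignment. A minimal sketch, assuming two already aligned, gap-free sequences of equal length; real comparative pipelines must handle gaps and alignment uncertainty, and the window size and threshold below are simply the values quoted above, used as hypothetical defaults.

```python
from __future__ import annotations


def window_identity(a: str, b: str, window: int = 100) -> list[float]:
    """Fraction of identical positions in each window of two aligned sequences.

    Assumes a gap-free alignment: equal lengths, with position i of `a`
    aligned to position i of `b`.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [x == y for x, y in zip(a, b)]
    # Prefix sums let each window's match count be computed in constant time.
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(a) - window + 1)]


def conserved_windows(a: str, b: str, window: int = 100, threshold: float = 0.70) -> list[int]:
    """Start positions of windows whose identity meets the threshold."""
    return [i for i, ident in enumerate(window_identity(a, b, window)) if ident >= threshold]
```

For example, a 100-bp pair with 70 matching positions yields a single window at 0.7 identity, so `conserved_windows` returns `[0]`. A window passing a test like this is "conserved" in exactly the sense used by Nobrega et al., which is why the deletion result shows that passing it does not guarantee function.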

Similarly, Ahituv et al. (2007) studied four “ultraconserved” regions of the mouse genome that had been shown to function as enhancers in a mouse transgenic assay. Deleting these regions from the mouse genome yielded viable, fertile mice that were indistinguishable from mice whose genomes contained the ultraconserved regions. The regions were conserved and potentially functional but they appear to be junk DNA.

We also have examples of pseudogenes whose sequences are relatively conserved in closely related species but they are, nevertheless, junk. Bits and pieces of defective transposons are important examples in this discussion since they represent a significant portion of the genome that is conserved between, say, humans and chimpanzees. They are conserved because they descend from an active transposon that inserted into that locus in the common ancestor of chimpanzees and humans. But, today, those sequences are junk.

Speaking of transposons, active transposons have enhancers, a promoter, and at least one open reading frame encoding reverse transcriptase or transposase, depending on the type of transposon. The gene is functional and so are the regulatory regions. Under the right circumstances the gene will be transcribed and the transposon can move to a new location in the genome. Are active transposons junk DNA or are they part of the functional portion of the genome?

The question is analogous to asking whether an integrated copy of bacteriophage lambda in the E. coli genome (prophage) is functional or not. I think we would want to say that it IS functional and so are active transposons. These are not true examples of junk DNA. (Active transposons make up only a tiny proportion of the mammalian genome so the resolution of this semantic problem has no effect on the big picture debate.)

Questions like this can be of immense interest to philosophers and to those interested in the philosophy of biology. The previously mentioned paper by Elliott et al. (2014) addresses just this point: Conceptual and Empirical Challenges of Ascribing Functions to Transposable Elements. They talk about distinguishing between different levels of function such as the organismal level and the transposon level. It’s not clear whether they consider transposons functional at the transposon level and junk at the organismal level because much of the discussion is about whether transposons can affect the survival of the organism. That paper (Elliott et al., 2014) is a good example of the difficulties one can get into when the emphasis is on semantics (or philosophy) rather than the real question of how much of our genome is junk.

So conservation doesn’t necessarily mean that the DNA is functional. But are there examples of nonconserved sequences that are functional? Yes, there are. The best examples are spacer DNAs that separate DNA binding sites that have to form a loop when bound by their respective factors. The classic example is binding of lac repressor to two operator sites upstream of the promoter for the lac operon (Krämer et al. 1987; Krämer et al. 1988). You need the spacer but its sequence is unimportant. It has a function. Similarly, there’s a minimal size of intron because the assembly of the spliceosome requires an RNA loop [Junk in Your Genome: Protein-Encoding Genes] [Junk in Your Genome: Intron Size and Distribution].

These particular exceptions aren’t going to make much of a difference because they don’t involve a large percentage of the genome. That’s why sequence conservation is a good approximation of function and lack of conservation is still a fairly reliable indicator of junk DNA.

However, there are some possible “exceptions” to the rule that may be more important. One of them concerns a different kind of “spacer” DNA based on our understanding of chromosome bands and puffs in Drosophila polytene chromosomes and lampbrush chromosomes in vertebrate oocytes (especially amphibians). The idea is that genes are arranged on long loops of DNA that form compact higher order chromatin structures when the genes are silent but large extended loops when they are active. Emile Zuckerkandl suggested back in 1976 that a certain amount of spacer DNA was necessary to keep genes apart on these loops and to form the complex heterochromatic state required for gene silencing. If more complex species needed more spacer DNA (larger loops), this would explain the C-value paradox (Zuckerkandl, 1976). A similar idea was suggested by Gall (1981).

There’s no evidence to support this hypothesis so it has been ignored in recent years. I mention it only to show that there are “spacer DNA” explanations that can account for a large percentage of the genome. This is DNA that cannot be identified by sequence conservation.

In addition, some people think that bulk DNA serves an important function in protecting against mutation, or in regulating the size of the nucleus. (There are other possibilities.) The point is that these bulk DNA hypotheses, like the one mentioned above, do not require sequence conservation but they do postulate that a lot of DNA has a function—it is not junk. If any of these hypotheses are correct then sequence conservation is not a reliable proxy for function. Fortunately, none of the bulk DNA hypotheses make any sense, so the point is moot.

So, we can adopt a working definition of function and junk based on whether or not deleting the DNA in question affects the survivability of the organism or its descendants, keeping in mind that there are minor exceptions.

1. Alex Palazzo suggested that we call these the “function wars.” Thanks, Alex.

113 comments

Two entries you refer to do not seem to be available (Slip Slidin' Along - How DNA Binding Proteins Find Their Target, DNA Binding Proteins). I get the message "Sorry, the page you were looking for in this blog does not exist." for both.

There is really way too much quibbling about things that really don't matter that much in the whole discussion.

The important questions are:

1) Is most of the genome junk? More specifically, and this is really the heart of the "debate": is most of the content of the large genomes we observe in certain lineages there because it is necessary for specifying their generally (but by no means always) higher organismal complexity?

2) Are the genomes in question shaped largely by adaptive or nonadaptive evolutionary forces?

3) In the case of the human genome, what is the precise identity and function of the segments of the genome that are relevant to the phenotype?

1 & 2 are important because they affect how we think about the genome and ultimately, ourselves. 3 is important in practical biomedical terms.

A lot of the quibbling about what exactly is junk and how we define function is really deep in the weeds and does not really affect the answers to questions 1 & 2, and with respect to 3, it has to be recognized that "function" is not a binary attribute that a piece of DNA either has or does not have, but is more of a continuously distributed variable - sure, there are examples of DNA that suddenly became functional or (more often) nonfunctional, but IMO the slow drift into and out of functionality and latent functionality is more common, especially for regulatory elements.

But didn't your name just appear on a PNAS paper that devoted three pages to discussing different ways of defining "function"? Correct me if I'm wrong but isn't this what you wrote? ....

Quest to Identify Functional Elements in the Human Genome

... the scale of the ENCODE Project survey of biochemical activity (across many more cell types and assays) led to a significant increase in genome coverage and thus accentuated the discrepancy between biochemical and evolutionary estimates. This discrepancy led to much debate both in the scientific literature and in online forums, resulting in a renewed need to clarify the challenges of defining function in the human genome and to understand the sources of the discrepancy.

To address this need and provide a perspective by ENCODE scientists, we review genetic, evolutionary, and biochemical lines of evidence, discuss their strengths and limitations, and examine apparent discrepancies between the conclusions emanating from the different approaches.

Laurence A. Moran asked Georgi Marinov: “But didn't your name just appear on a PNAS paper that devoted three pages to discussing different ways of defining "function"? Correct me if I'm wrong but isn't this what you wrote? ....”

“Are you saying that over a period of five or six years the majority of members of the ENCODE Consortium were just interested in data collection and storage and didn't think much about the implications or whether they were actually cataloguing sites that had biological significance?”

“Are you suggesting that during group meetings nobody wondered whether the pervasive transcription they were recording was real or just spurious transcripts as many had already suggested in the published literature?”

“Are you telling us that nobody in those labs raised any questions about nonspecific binding of transcription factors as described in the textbooks?”

“Is it true that none of the PIs, postdocs, or graduate students gave journal club presentations on the junk DNA controversy and how it impacted the work they were doing on the characterization of the human genome?”

“Is it really true that the people in your lab never talked about whether the transcription factor binding sites they were analyzing were really regulatory sites or artifacts?”

“Did you never discuss ways of distinguishing functional sites from nonfunctional sites or was the goal just to publish the locations of all the sites and let someone else try and figure out which ones were real?”

“Are you saying that it's true that the members of the ENCODE Consortium weren't very interested in making sense of their results and trying to understand the biological functions of the genome?”

I also ask him the following questions, which might be relevant for all ENCODE scientists:

1. Considering that the C-value paradox (also referred to as C-value enigma) has been one of the most fundamental concepts in genome biology for decades, how is it possible to design and conduct a huge project on the human genome, such as ENCODE, without having this fundamental concept at the center of it?

2. Apparently, you just finished graduate school, focusing I presume on studying genome biology; in your studies on genome biology and evolution have you learned about the C-value paradox? More specifically, have you studied the articles written by the scholars in the field such as, for example, those written by our host Larry Moran or Ryan Gregory?

I have a few more questions for Georgi (please don’t take it personally, but at least here at Sandwalk you represent the ENCODE scientists) regarding the PNAS paper by Kellis et al. ( http://www.ncbi.nlm.nih.gov/pubmed/24753594):

As first author of Elliott et al. 2014, my official position is many TEs are functional at their level (selfish), but not functional at the level of the host. In most vertebrate genomes the vast majority of TE sequences are inactive and fragmented, and no longer capable of selfish replication and are best thought of as junk until sufficient evidence to the contrary comes to light.

I think it's important to remember that the genome is composed of multiple levels of evolution (selection, drift, etc.) and that it's not necessarily easy to give a single, simple answer to a given question.

I think it's a shame that you didn't come right out and say that active transposons are functional at the TE-level but junk at the organismal-level. By not saying that, you contributed to the very "lack of clarity" that you were trying to dispel.

Also, as John points out, to declare that the word "function" (and "junk") depends on context and the interest of the researcher is sort of like begging the question.

Finally, your statement that there are multiple mechanisms of evolution (e.g. natural selection and random genetic drift) doesn't make any sense. What did you mean by that?

I can only speak for myself, but I distinguish between junk DNA and selfish DNA. Inactive, dead TEs for which we have no evidence of host-level or element-level function are junk until proven otherwise. I dislike using the term junk for active TEs because I find it ignores the fact that there are multiple levels operating in the genome and we need to take that into account.

Sorry, I didn't mean to say that selection, drift, and other evolutionary forces are different levels. I meant it in the sense of levels in the hierarchy (TEs, cells, organisms, populations, etc.) at which those forces can act. People often say levels of selection but that is not the only force going on at multiple levels.

I think that because we have this situation of a multi-level entity such as the genome, we do have to keep context (levels) in mind. I see this fact ignored in the TE literature all the time, and I think it creates problems in the interpretation of the information people gather.

I agree with Tyler above that:

i) Junk DNA isn't a good term to describe TEs in general. Active TEs that can replicate or move within the genome are better accounted as "selfish", while dead TEs that can no longer replicate and are likely subject to neutral mutation within the genome at this point are best described as "junk".

ii) It's best to think of the genome in terms of multi-level selection. As he mentioned, different forces may be occurring depending on how you look at the genome. Active TEs may be "functional" as defined in Elliott et al. (2014) at their own selfish level, but may not be considered functional for the organism itself (this depends on where they insert in the genome, and may be deleterious, neutral or beneficial). However, the fact that TEs may be functional at their own level and non-functional at the host level does not make them "junk".

i) Junk DNA isn't a good term to describe TEs in general. Active TEs that can replicate or move within the genome are better accounted as "selfish" while dead TEs that can no longer replicate and are likely subject to neutral mutation within the genome at this point are best described as "junk"

Maybe I'm missing something, but a TE being active or not is irrelevant for the organism itself and has (presumably) no fitness impact on the organism, so both active and inactive TEs will be subjected to neutral mutations. There is no selection for TEs since selection is acting on the organism. They're all neutral, active or not. That's why most of them are inactive: there's no selection acting on them, and their fate is sealed.

Maybe I didn't explain myself correctly. My point was only that if one states that dead TEs are subject to neutral mutations (as the above quote refers), then it would be implied that active TEs are not. But as far as we know, they both are, so that has no relevance to one being junk and the other not.

As for active TEs being seen as junk or not, I don't think anyone will be "right" or "wrong" here, since it depends on seeing functionality from the point of view of usefulness to the organism or simply on whether the genes in TEs are themselves functional or not. I know you see them as not being junk DNA, but others consider them so, and I don't think any position is necessarily correct.

Whether a TE is active or not can have fitness consequences for the host organisms, since active elements can jump into functionally relevant regions of the genome and cause deleterious mutations. Like any other mutations new TE insertions can be neutral, deleterious or beneficial, so some TE insertions are subject to positive selection at the host level, some are subject to negative selection at the host level and most probably just fluctuate in frequency due to drift because they are neutral at the host level.

BUT these forces also occur at the level of the elements, whereby in the element population those elements that are better at surviving and reproducing in the genome will be favoured by intra-genomic selection and will increase in copy number. This force will often be opposed by negative selection at the host level.

In the human genome most TE sequences are dead and degraded copies; I suspect the vast majority of these are of no functional relevance to the host organism and can be considered junk. Some inactive TE sequences happen to be in a beneficial spot or have a beneficial sequence for the host, and then they are subject to positive selection at the host level, but this is the minority, just as it is with mutations in general.

Whether one sees active, selfish TEs as junk or not for the organism seems to be a matter of perspective. Do many active TEs contribute beneficially to the individual organism? I highly doubt it. There are examples where they do, such as telomere-maintaining TEs in Drosophilids and some other taxa, but these examples appear to be the exception and not the rule.

I distinguish between junk DNA and selfish DNA because I think it can cause us to ignore the fact that TEs are their own level and that they are interesting and important entities to study in their own right. But I can see how some people would want to label active TEs as junk, because they usually aren't functional for the host. I just think we need to take a more nuanced approach and be careful about which level we are speaking about when discussing function.

I agree with what you said, generally speaking. As I stated before, it amounts to a matter of perspective, and there are no right or wrong answers here. My perspective is that whatever future benefit comes out of TE activity can't be used to discriminate between junk and non-junk, since under that criterion any part of the genome could potentially supply something useful down the line and junk as a concept would make no sense. All genomes would be effectively 100% "functional" just because of that future potential.

Do many active TEs contribute beneficially to the individual organism? I highly doubt it. There are examples where they do, such as telomere-maintaining TEs in Drosophilids and some other taxa, but these examples appear to be the exception and not the rule.

At this point, those TEs are not "junk" anymore, and I don't think anyone would claim they aren't functional.

I distinguish between junk DNA and selfish DNA because I think it can cause us to ignore the fact that TEs are their own level and that they are interesting and important entities to study in their own right. But I can see how some people would want to label active TEs as junk, because they usually aren't functional for the host.

I understand your point. I happen to somewhat favor the other side of the fence, but quite honestly, as long as everyone understands what the real point is, this should be of no practical relevance. Sometimes this seems to me as useful as deciding if viruses are alive or not.

"A sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny."

By this definition then, wouldn't all of our human DNA be functional since we need 90% of our DNA to be non-sequence specific in order to serve as a buffer against mutations. Without this, it would affect the survivability of our progeny.

Surely your definition also needs to include sequence specificity as well?

"Fortunately, none of the bulk DNA hypotheses make any sense, so the point is moot."

I'd appreciate it if anybody could help me understand this from a genetic load perspective. If we accumulate 100 new mutations per generation and about 10% of our genome is functional, does that not mean that each new generation accumulates 10 additional deleterious mutations?

By this definition then, wouldn't all of our human DNA be functional since we need 90% of our DNA to be non-sequence specific in order to serve as a buffer against mutations. Without this, it would affect the survivability of our progeny.

If it were true that this extra DNA was actually selected for its ability to protect against mutations then it would not be junk by my definition.

However, I have never seen a rational defense of such a claim. Would you like to be the first one to explain how all that extra DNA in various species of onion has the function of protecting the genes (and other sequences) against mutation?

I'd appreciate it if anybody could help me understand this from a genetic load perspective. If we accumulate 100 new mutations per generation and about 10% of our genome is functional, does that not mean that each new generation accumulates 10 additional deleterious mutations?

No, it does not mean that.

It means that the functional part of our genome acquires about 10 new mutations every generation. Some of these might be lethal so they never show up in newborn babies. Many of them are effectively neutral. Only a couple of them could be deleterious and we can tolerate that without going extinct.
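A back-of-the-envelope sketch of the arithmetic in this exchange may help. It takes the two figures from the question (about 100 new mutations per generation, about 10% of the genome functional) and assumes mutations land uniformly across the genome. The split of the functional hits into lethal, effectively neutral and deleterious classes uses hypothetical fractions chosen only to echo the reply above; they are not measured values.

```python
import math

# Hypothetical split of new mutations that land in functional DNA.
# These fractions are illustrative placeholders, not measurements.
ASSUMED_CLASSES = {"effectively neutral": 0.75, "deleterious": 0.20, "lethal": 0.05}


def partition_new_mutations(total_new, functional_fraction, class_fractions):
    """Split the expected functional hits into effect classes.

    Assumes mutations fall uniformly across the genome, so the expected
    number hitting functional DNA is total_new * functional_fraction.
    """
    assert math.isclose(sum(class_fractions.values()), 1.0)
    hits = total_new * functional_fraction
    by_class = {label: hits * frac for label, frac in class_fractions.items()}
    return hits, by_class


hits, by_class = partition_new_mutations(100, 0.10, ASSUMED_CLASSES)
print(f"new mutations in functional DNA: {hits:.1f}")
for label, count in by_class.items():
    print(f"  {label}: {count:.1f}")
```

The only point is that "10 mutations in functional DNA" and "10 deleterious mutations" are different quantities; how many of the ten are actually deleterious depends on fractions that have to be measured, not assumed.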

With regards to the onion test, I agree that the vast majority of the onion genome will be junk.

Elsewhere on this blog you have a running calculation for the amount of functional DNA in the human genome. Currently the total is sitting at 8.7% - this agrees with the fact that about 9% of our genome is conserved.

What fraction of mutations within this functional region would you say are neutral?

By this definition then, wouldn't all of our human DNA be functional since we need 90% of our DNA to be non-sequence specific in order to serve as a buffer against mutations. Without this, it would affect the survivability of our progeny.

If we had less junk DNA but the same amount of "functional" DNA, the number of mutations per generation in the functional region would be roughly the same (and the total number of mutations, harmful, neutral or advantageous, would be smaller in a smaller genome). I hope I don't misunderstand genetic load completely, but it seems to me it would only increase if the functional region were larger in absolute terms. It doesn't really matter what fraction of the whole genome it amounts to (which is precisely why the junk fraction varies so wildly from taxon to taxon). Removing junk should not in principle increase the load.
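The point about absolute versus relative size can be checked with a few lines of arithmetic. The sketch assumes a per-base-pair rate back-calculated from the thread's "100 new mutations per generation" spread over a diploid 2 x 3.2 Gb genome, and a functional region of 0.32 Gb (about 10% of the human genome, as in the question). The 1.6 Gb genome is hypothetical: same functional DNA, less junk.

```python
HAPLOID_GENOME_BP = 3.2e9
# Assumption: per-bp rate implied by ~100 new mutations per diploid genome per generation.
MU_PER_BP = 100 / (2 * HAPLOID_GENOME_BP)
FUNCTIONAL_BP = 0.32e9  # ~10% of 3.2 Gb, held fixed in both genomes


def new_mutations(genome_bp, functional_bp, mu=MU_PER_BP):
    """Expected new mutations per diploid genome per generation: (total, in functional DNA)."""
    total = 2 * genome_bp * mu
    functional = 2 * functional_bp * mu
    return total, functional


results = {}
for label, size_bp in [("3.2 Gb genome", 3.2e9), ("hypothetical 1.6 Gb genome", 1.6e9)]:
    results[label] = new_mutations(size_bp, FUNCTIONAL_BP)
    total, functional = results[label]
    print(f"{label}: {total:.1f} new mutations, {functional:.1f} in functional DNA")
```

Under these assumptions, halving the junk halves the total number of new mutations but leaves the number hitting functional DNA, the part that matters for load, unchanged.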

That makes sense. After looking into the genetic load argument for junk DNA by A. Palazzo and Ryan Gregory (http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1004351), I was mistakenly led to believe that we need a certain ratio of junk to non-junk in order to buffer against mutations, but I can see now that that doesn't follow.

What I don't understand is how they can both say "only 1% of the nucleotides in the genome are essential for viability in a strict sequence-specific way" and then also say: "at most 10% of the human genome exhibits detectable organism-level function"

I do have a problem with the following definition, but at a different level:

"A sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny."

Since we all agree that junk can potentially, by molecular evolutionary means, produce genetic novelty and turn into something useful at some point down a lineage, wouldn't including the "or its progeny" part automatically make the genome 100% functional? Not all of it would become functional down the line, obviously, but since any part of it has the potential to become functional, we cannot at any point say it's junk, because it could affect "the progeny". That would rule out the existence of junk under that definition. It seems to me that leaving out the "progeny" part of the definition would be better, or at least saying "direct progeny" instead.

There could be a more subtle example of function along the lines of aceofspades above and following Larry's definition. If a particular TF had 50 functional binding sites and 950 non-functional semi-canonical sites, and you went in and somehow managed to genetically engineer all of the non-functional sites out, it would probably have a deleterious effect on the organism. The reason is that the expression level of the TF protein and its binding strength to its functional sites have been 'tuned' to the fact that most of it will be soaked up by non-functional sites. Removing them is the equivalent of whopping over-expression. But this is function in only a roundabout sense.
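The "sponge" effect described here can be illustrated with a simple equilibrium-binding model. This is a hypothetical sketch, not a model of any real transcription factor: it assumes every site (functional or not) binds the TF with the same dissociation constant, no cooperativity, and quantities in arbitrary units within one volume. The 50 functional and 950 non-functional sites come from the comment; the TF amount and Kd are made-up values.

```python
def free_tf(total_tf, n_sites, kd, iterations=200):
    """Solve total_tf = free + n_sites * free / (free + kd) for free TF by bisection.

    The left side increases monotonically with free, so the root lies in [0, total_tf].
    """
    lo, hi = 0.0, total_tf
    for _ in range(iterations):
        mid = (lo + hi) / 2
        bound = n_sites * mid / (mid + kd)
        if mid + bound > total_tf:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def functional_site_occupancy(total_tf, n_functional, n_decoy, kd):
    """Fraction of time each functional site is occupied at equilibrium."""
    free = free_tf(total_tf, n_functional + n_decoy, kd)
    return free / (free + kd)


TOTAL_TF, KD = 500.0, 1.0  # hypothetical values, arbitrary units
with_decoys = functional_site_occupancy(TOTAL_TF, 50, 950, KD)
without_decoys = functional_site_occupancy(TOTAL_TF, 50, 0, KD)
print(f"occupancy with 950 decoy sites: {with_decoys:.2f}")
print(f"occupancy with decoys deleted:  {without_decoys:.2f}")
```

With these made-up numbers, deleting the decoys takes functional-site occupancy from roughly one half to nearly saturated, which is the same effect you would get by strongly over-expressing the TF.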

But this is function in only a roundabout sense.

Why? It fits all of the definitions given above. Organisms aren't constructed; they evolve. It is only to be expected that sometimes an organism comes to depend on a certain DNA-sequence by chance. My only doubt is whether Larry's definition allows you to remove all of these sites at once. Removal of any individual site will have negligible effect.

I think this is the kind of 'function' that philosophers would get very excited about but biologists would note in passing. To the extent that this discussion on junk is meant to deprive creationists of the argument that a mostly functional genome is consistent with design, this type of function is certainly not consistent with a designer.

The reason is that the expression level of the TF protein and its binding strength to its functional sites have been 'tuned' to the fact that most of it will be soaked up by non-functional sites. Removing them is the equivalent of whopping over-expression. But this is function in only a roundabout sense.

That's a very interesting commentary, thanks for sharing. I still think that wouldn't affect what we consider to be genomic junk in any sensible sense, but it does illustrate how perfect definitions won't be forthcoming. By the way, has anyone compared expression levels among closely related organisms with wildly varying amounts of junk (in the famous "onions" spirit)?

Similarly, Ahituv et al. (2007) detected four “ultraconserved” regions of the mouse genome that were shown to function as enhancers in vitro. Deleting these regions from the mouse genome yielded viable, fertile mice that were indistinguishable from mice whose genomes contained the ultraconserved regions.

... then I wonder, indistinguishable by whom? If there were a 1% difference in fitness between those mice and their peers, that would be very strong natural selection, strong enough to keep those regions conserved. But in the lab, we would never notice it unless we did massive breeding experiments.
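To put "massive breeding experiments" in perspective, here is a standard sample-size estimate for a two-sided, two-proportion z-test (normal approximation). The scenario is hypothetical: suppose 50% of control mice but only 49.5% of mice lacking the ultraconserved region leave offspring, a 1% relative difference in fitness. The significance level (0.05) and power (80%) are conventional choices, not values from the thread.

```python
import math
from statistics import NormalDist


def mice_per_group(p_control, p_deletion, alpha=0.05, power=0.80):
    """Approximate animals needed per group to detect p_control != p_deletion.

    Uses the normal-approximation formula for a two-sided two-proportion z-test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_deletion * (1 - p_deletion)
    effect = p_control - p_deletion
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


n = mice_per_group(0.50, 0.495)
print(f"mice needed per group: {n:,}")
```

Under these assumptions the answer is on the order of 150,000 mice per group, far beyond a typical knockout study, even though a selection coefficient of 0.01 is large by population-genetic standards.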

Molecular biologists often make that mistake -- Benjamin Lewin, editor of the journal Cell, once declared that it is a problem that so many genes can be deleted without noticeable effect. He forgot that Benjamin Lewin's ability to "notice" is much poorer than nature's. Fitness is not just either 0 or 1, but there are values in between.

I understand your argument. You can use it to challenge any attempt to demonstrate that a given stretch of DNA is junk (has no biological function). What you are doing is putting the onus on junk DNA proponents to prove that the DNA is functionless.

However, your argument doesn't work when you try to use it to look at forests instead of trees. If all of the excess DNA has a very tiny evolutionary function then the genetic load would be intolerable, wouldn't it? Also, I don't think your argument passes the onion test when applied at the level of the entire genome and it certainly doesn't explain the C-Value Paradox the way junk DNA does.

I don't think Joe was intending a defense of the idea that the entire genome is functional. Or claiming that we can't tell whether most of the genome is junk. He's saying quite the opposite, that when we have other good evidence for function -- they're ultra conserved, for FSMsake -- the inability to detect selection in the lab isn't good evidence of non-function. Are you sure you don't agree?

I also interpreted Joe's comment the same way as John. It seems to me a very valid point. Lab conditions are far from "real world" conditions, and failing to detect effects depends on the function of the gene and associated phenotype, which may or may not be obvious under lab conditions.

Nevertheless, it can be used as an argument to use functionality as the null hypothesis, although I think it would be unreasonable to do so given all the rest. But it is important to keep those issues in mind.

Laurence A. Moran: If all of the excess DNA has a very tiny evolutionary function then the genetic load would be intolerable, wouldn't it?

Larry,

I’m sure you know that genomic sequences can have informational functions (iDNA) or non-informational functions (niDNA) and that the ‘genetic load’ only applies to iDNA; you might want to correct your statement.

Joe Felsenstein: I am very far from thinking that most of the genome is "functional" in any meaningful sense.

Joe,

Did you have the chance to read my paper “On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.” (http://biorxiv.org/content/early/2013/11/18/000588) in which I present additional evidence and arguments for a putative biological function (protection against insertion mutagenesis) of so called “junk DNA” (jDNA)?

True, DNA with non-informational function does not directly suffer from point mutations (one would imagine it would suffer from indels though, but let's ignore that right now)

However, I have asked you repeatedly to show us how genomes would end up with such DNA through adaptive means in the first place, and you have never bothered to reply. You know, as Joe Felsenstein once said here, "Show me the selection coefficients", that kind of stuff.

It is not enough to provide verbal arguments about what function something might be playing, you need to have the population genetics to back it up too. And none of the papers promoting such views that I have ever seen have even touched the subject.

“However, I have asked you repeatedly to show us how genomes would end up with such DNA through adaptive means in the first place, and you have never bothered to reply."

To my recollection, I always ‘bother’ to reply; otherwise the discussion is meaningless or one-sided. I would suggest that if you make this type of statement, you provide a link to the questions you asked, just like I did in my comment above (see my comment on Sunday, June 29, 2014 11:45:00 AM).

“It is not enough to provide verbal arguments about what function something might be playing, you need to have the population genetics to back it up too. And none of the papers promoting such views that I have ever seen have even touched the subject.”

“But there are things that are absolutely necessary and they just don't even exist -- I am about to officially get my PhD in two weeks and in neither my undergraduate nor my graduate institution did I even have the option to take a serious evolution class...”

I want you to show me a convincing population-genetics and molecular-biology-based argument for how the non-informational function might have been selected for, i.e. how a small, compact genome became a big, apparently bloated genome in an adaptive manner, with the extra DNA playing a non-informational role as you propose.

You can find this information in my paper “On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.” ( http://biorxiv.org/content/early/2013/11/18/000588). Please read it and if you have specific questions, I’ll be happy to address them.

That paper contains nothing of the sort; all it has is verbal arguments that assume the existence of a big genome, but do not tell us anything about how getting so big in the first place was advantageous according to your hypothesis. I did that exercise for you in detail some time ago, and you ignored it. I won't bother to do it again, but in short, even if we assume that extra DNA is conferring a protective advantage against mutation, there is no way it could have been selected for, because even the largest individual insertions would only increase that protection by a tiny amount that would be invisible to selection in a population with low Ne.

So we are back to the much more reasonable explanation: that DNA is there because of the balance between mutational processes and selection strength. You can then take that and claim "But once it's there, it is playing a protective role and is maintained by selection", but:

1) There is no reason for that additional layer of explanatory complexity.

2) The argument that that extra DNA is maintained by selection is not convincing at all because, by the reverse form of the same reasoning I mentioned above, individual small deletions would be invisible to selection and would eventually drift to fixation and shrink the genome (if, of course, the indel balance was in favor of deletions).

3) You still fail the onion test (in the general form).

It is hard to understand your narrative. For example, what do you mean when you say: “So we are back to the much more reasonable explanation of that DNA being there because of the balance between mutational processes and selection strength.” What selection are you talking about?

So you agree that the genome size, like most if not all organismal features, is the result of the balance between mutational processes and natural selection strength.

Let’s consider, for the sake of this discussion, that the so called ‘junk DNA’ (jDNA) in species with high C-value, such as humans, is the result of other evolutionary forces such as genetic drift and neutral evolution.

So, in this hypothetical example, does this jDNA (e.g. 90% of the genome) confer a protective mechanism against insertion mutagenesis by inserting elements, such as retroviruses, or not?

You do talk about balance between the different mutational processes, which is a good first step, and you do mention drift, but you seem unable to get past that good initial start (BTW, "neutral evolution" is not an evolutionary force).

Let me summarize it once again:

Why some genomes are big is sufficiently well explained by the following:

1) An overall balance of individual mutations (small indels and large insertions of TEs) that is in the direction of expanding these genomes.

2) Low Ne, which means that the slightly negative selection coefficients of each of these individual mutations are mostly invisible to selection in these lineages, so the mutations are free to drift to fixation.

Your hypothesis is:

Having a lot of noncoding DNA protects against insertional mutagenesis so the trait of having a lot of noncoding DNA is maintained by selection.

There are multiple severe problems with that hypothesis:

1) It does not at all explain the phylogenetic distribution of genome size values (i.e. the onion test)

2) The population genetics does not work. It does not work because there is no "large genome" allele: the genome is large because of a very large number of insertions of noncoding DNA, each very small compared to the whole genome, and it is those alleles that evolution works with. That means that:

3) There was no plausible way for "protection against insertional mutagenesis" to be selected for and for the genome to grow as a result of it, as I have explained to you in the past (and that is without even going into the negative effects on fitness of those insertions).

4) There is no plausible way for it to be maintained by selection. By the reverse argument, if the indel balance was in the direction of deletions, the genome would shrink over time, because each individual small deletion would decrease the effectiveness of protection against insertional mutagenesis by such a small amount that you would need a very large Ne for it to be selected against (but Ne is low in these lineages).

That's what you need to work out to make your hypothesis worth taking seriously.

I don’t know why you keep bringing up ‘selection’, when I just said in my previous comment to consider that the so called ‘junk DNA’ (jDNA) is the result of genetic drift and neutral evolution, not selection.

Considering that in humans approximately 90% of the genome consists of the so called jDNA, does it provide a protective mechanism against insertion mutagenesis by inserting elements, such as retroviruses, or not?

The answer to your question is irrelevant. Here's an example to show you why: Do random sequences bind transcription factors or not? Well, yes they do, but does random sequence therefore have the function of binding transcription factors? No.

It's completely irrelevant whether it protects or does not protect against insertion mutagenesis (a question that can be answered only if all else is equal, and all else is not equal) if what you are claiming is that there is purifying selection maintaining all that extra DNA, and that claim makes no sense. See point 4) above.

Regarding Point 3), you wrote this yourself:

Notably, this model couples the mechanisms and the selective forces responsible for the origin of jDNA with its putative protective biological function, which represents a classic example of ‘fighting fire with fire.’ One of the key tenets of this theory is that in humans and many other species, jDNAs serves as a protective mechanism against insertional oncogenic transformation. As an adaptive defense mechanism, the amount of protective DNA varies from one species to another based on the rate of its origin, insertional mutagenesis activity, and evolutionary constraints on genome size.

You cannot claim at the same time that:

1) junk DNA arose as a result of the nonadaptive interaction between mutational balances and the population genetics environment of the lineage;

2) it is an adaptive mechanism guarding against insertional mutagenesis;

3) it is currently maintained by purifying selection due to this "function" it has.

These cannot all be true at the same time.

Finally, once again, explain to me how the process works at the level of an individual transposable element insertion, 6 kb in size, within a 3 Gb genome.

Do these random sequences therefore have the function of binding transcription factors? Yes, those random sequences that bind the right transcription factors and are located at the right position in the genome in order to regulate the expression of a functional gene in a beneficial way for the organism are indeed functional. The other random sequences are not; they are ‘dysfunctional’ :-) for the organism.

BTW, you did not hesitate to answer your ‘irrelevant question.’ Why, then, do you hesitate to answer the question that I’m asking:

Does the so called ‘junk DNA’, which represents approximately 90% of human genome, constitute a protective mechanism against insertion mutagenesis by inserting elements, or not?

Whether jDNA has been evolutionarily maintained simply because of a mutational imbalance, favoring amplification of parasitic DNA versus deletion, or because jDNA is under host positive selection (whatever this selection might be), the protective function of jDNA in humans and other eukaryal organisms against insertional mutagenesis by endogenous and exogenous mobile genetic elements, such as retroviruses, is a bona fide fact.

Again, for whatever reason, you keep bringing up *selection*. Selection is not essential for my theory that jDNA provides a protective function against insertion mutagenesis by endogenous and exogenous mobile genetic elements, such as retroviruses.

“Whether jDNA has been evolutionarily maintained simply because of a mutational imbalance, favoring amplification of parasitic DNA versus deletion, or because jDNA is under host positive selection (whatever this selection might be), the protective function of jDNA in humans and other eukaryal organisms against insertional mutagenesis by endogenous and exogenous mobile genetic elements, such as retroviruses, is a bona fide fact.”

Just to clarify, in the phrase below, “adaptive defense mechanism” is used as a conceptual analog to the CRISPR/Cas *adaptive* defense system against viral elements in prokaryotes.

“As an *adaptive* defense mechanism, the amount of protective DNA varies from one species to another based on the rate of its origin, insertional mutagenesis activity, and evolutionary constraints on genome size.”

Claudiu, I'm afraid your blinders are working too well. My irrelevant question was a direct analogy to your irrelevant question and showed why it was irrelevant. You answered my question in a weaselly way that let you say "yes" and mean "no", thus avoiding having to think about what "function" means.

2) I say that the so called jDNA, which I increasingly refer to as symbiotic DNA (sDNA), plays a critical function. As a commentator here at Sandwalk, you know that natural selection is not the only force of evolution; just ask Larry :-)

3) You say that because “all else is not equal” (whatever that means) your answer to my question is ‘No.” That’s fine with me.

I answered your question according to the existing data: indeed, very few random sequences inserted in the genome can bind transcription factors, and even fewer can serve as productive promoter elements. That means that the vast majority of inserted genomic sequences *do not* play a role in gene regulation. If the ENCODE scientists had had the same appreciation of the biology and evolution of the human genome, it would have been great, don't you think?

So, my answer to your question is clear: with few exceptions the random sequences *cannot* be functional as promoter elements.

Now, do you have the courtesy and the ‘courage’ to answer my question?

2) I say that the so called jDNA, which I increasingly refer to as symbiotic DNA (sDNA), plays a critical function. As a commentator here at Sandwalk, you know that natural selection is not the only force of evolution; just ask Larry :-)

If it plays a critical function and is important to the organism's fitness, then it is subject to purifying selection by definition. Not purifying selection at the sequence level, but on its length.

So clearly selection has to be involved in maintaining it, otherwise your claims do not even begin to make sense.

However, as I repeatedly point out to you, there is no mechanism to maintain that extra DNA in a situation in which Ne is low and the indel balance is in the direction of deletions. As you yourself wrote, the protective capability of that extra DNA is directly proportional to its quantity. But if I delete a 2.8 kb piece of such DNA from a 3.2 Gb genome that has 200-400 Mb of sequence-constrained DNA, I am decreasing that capacity by about 1x10^(-6). Even greatly inflating the fitness effect by assuming a selection coefficient for that mutation equal to the negative of that number, there is no way such a deletion could be selected against when Ne is around 10^4. So the genome would be shrinking. Note that it would be shrinking too if you had a lot of small individual deletions, of say 3 bp in size, even if Ne was 10^8; I just used the ~3 kb region for the sake of the argument.
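The arithmetic in this comment can be reproduced in a few lines. Following the comment, the sketch takes the magnitude of the selection coefficient to equal the fraction of "protective" DNA lost (a deliberately generous assumption), uses 0.3 Gb as the midpoint of the 200-400 Mb of constrained DNA, and applies the common rule of thumb that a mutation is effectively neutral when 4Ne|s| is well below 1. The threshold and the Ne values are illustrative.

```python
GENOME_BP = 3.2e9
CONSTRAINED_BP = 0.3e9  # assumption: midpoint of the 200-400 Mb quoted in the comment


def scaled_selection(deletion_bp, ne, genome_bp=GENOME_BP, constrained_bp=CONSTRAINED_BP):
    """Return 4 * Ne * |s| for a deletion of unconstrained DNA.

    Assumes |s| equals the fractional loss of "protective" (unconstrained) DNA,
    which likely overstates the real fitness effect.
    """
    protective_bp = genome_bp - constrained_bp
    s = deletion_bp / protective_bp
    return 4 * ne * s


for deletion_bp, ne in [(2800, 1e4), (3, 1e8), (2800, 1e8)]:
    x = scaled_selection(deletion_bp, ne)
    verdict = "effectively neutral" if x < 1 else "visible to selection"
    print(f"{deletion_bp} bp deletion, Ne = {ne:.0e}: 4Ne|s| = {x:.3g} ({verdict})")
```

Under these assumptions a 2.8 kb deletion at Ne = 10^4 gives 4Ne|s| of about 0.04, and even a 3 bp deletion at Ne = 10^8 stays below 1; only large deletions in very large populations reach the range where selection could see them.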

So there is no plausible way that extra DNA is maintained by selection.

Now the indel balances are more often than not in the direction of expansion, but if that is the case, then your hypothesis is not needed to explain anything - indel balance and effective population size explain why the genome is so big without any need for postulating a causative role of the proposed protective role of the extra DNA. We are looking for minimal fully explanatory models, after all.

Obviously, the so called jDNA is present in our genome and many other species. As I wrote in the conclusion of my paper, it is possible that this jDNA “has been evolutionary maintained simply because of a mutational imbalance, favoring amplification of parasitic DNA versus deletion”. However, regardless of the forces behind the origin and maintenance of this jDNA, it serves as a protective mechanism against insertional mutagenesis and, therefore, it would make sense to refer to it as protective, symbiotic DNA (sDNA).

BTW, Georgi, what percentage of the human genome do you think is junk DNA?

Claudiu Bandea may well be right that junk DNA "serves as a protective mechanism against insertional mutagenesis", but before I would rename it I would want evidence that it is maintained by natural selection for this.

The difficulty in concluding that lies in the selection coefficients involved. I often give Larry a hard time here when he thinks that selection coefficients such as 0.0001 indicate neutrality. But if we try to calculate selection coefficients favoring retaining a particular 100-base piece of jDNA as a result of this protective effect, it will be far smaller. Won't we get a number so small even I will agree that the deletion of that piece of junk DNA is neutral?
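Felsenstein's comparison can be made concrete with the scaled selection coefficient 4Ne|s|; by one common rule of thumb for diploids, selection overpowers drift only when this is well above 1. The sketch below is illustrative only: it assumes Ne = 10^4 and, generously, that deleting a 100-base piece lowers fitness by the fraction of roughly 2.9 Gb of non-constrained DNA that the piece represents. Both numbers and the function name are assumptions, not values from the thread.

```python
NE = 10_000              # assumed long-term effective population size of humans
PROTECTIVE_BP = 2.9e9    # assumed: ~3.2 Gb genome minus ~300 Mb of constrained DNA


def scaled_selection(s: float, ne: int = NE) -> float:
    """Return 4*Ne*|s|; values much smaller than 1 mean drift dominates."""
    return 4 * ne * abs(s)


cases = {
    "s = 0.0001": 1e-4,
    # Generous upper bound: fitness loss proportional to the DNA removed.
    "100-base piece of jDNA": 100 / PROTECTIVE_BP,
}

for label, s in cases.items():
    regime = "selection matters" if scaled_selection(s) > 1 else "effectively neutral"
    print(f"{label}: |s| = {s:.2e}, 4*Ne*|s| = {scaled_selection(s):.2e} ({regime})")
```

With these assumptions s = 0.0001 gives 4Ne|s| of about 4, so it is not neutral, while the 100-base piece gives about 0.0014, far below the threshold.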

These are highly relevant questions. However, I addressed these issues in my papers, so either you did not read them, or did not read them carefully enough.

Please read my paper “On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.” (http://biorxiv.org/content/early/2013/11/18/000588), including the material in the *Data Supplements*, which is accessible at the same link.

You might also want to read my post entitled “Junk DNA is bunk, but not as suggested by ENCODE or Doolittle” (http://www.ncbi.nlm.nih.gov/pubmed/23479647#cm23479647_1429), in which I outline my perspective on the *nucleoskeletal* and *nucleotypic* theories, which have dominated the thinking on genome size evolution over the last few decades, and which have been embraced by Doolittle as pillars for his theoretical framework on genome size evolution and biology.

If after reading this material you still want to ask these 3 questions, or if you have additional ones, I’ll be happy to address them.

You might want to become familiar with these so called “DNA-bulk theories” for another reason. As stated in my comment (see first comment) on your post *The Function Wars: Part II* (http://sandwalk.blogspot.com/2014/07/the-function-wars-part-ii.html), “The idea that most of the genome in species with high C-value, such as humans, has informational roles has been dismissed decades ago for well rationalized reasons, including the C-value paradox, mutational load, and the evolutionary origin of most genomic sequences from transposable elements.” Therefore, in the context of this reality, I suggested that “the major question remaining in the field of genome evolution and biology is whether most of the human genome and that of other organisms with relatively high C-value has *non-informational* functions or not”.

Apparently, you agree with this: “I agree with you that bulk DNA speculations are the only way to avoid the conclusions that most of our genome is junk”. So, it would make sense to write some posts on these theories, and becoming familiar with them would help.

I am perfectly familiar with all the bulk DNA speculations. Just because I agreed with you that they are possible ways of explaining large genomes does not mean that they are correct. In fact, most of them are just plain silly and the others are so vague and imprecise that it's difficult to tell what the authors actually mean.

Claudiu, your speculations fall mostly into the first category (silly) but they are also confusing and vague. It's because your ideas are so confusing that you are getting questions whenever you post comments.

Your refusal to answer those questions is becoming both boring and annoying. Many of us have read your manuscripts and we still don't know what you mean. Answer the questions or shut up.

Thanks for keeping an open mind regarding my theory that the so called ‘junk DNA’ (jDNA) serves as a protective mechanism against deleterious insertion mutagenesis.

As stated in my original and more recent papers, as well as on this thread (please see above), it is possible that the so called ‘junk DNA’ (jDNA) is simply the result of a mutational imbalance, favoring its amplification versus deletion (parenthetically, this might be a genuine example of Masatoshi Nei’s ‘mutation driven evolution’). Therefore, selection is not essential to explain the protective function of the so called jDNA.

That being said, I think there is plenty of selection involved in shaping genome evolution and the accumulation of the so called jDNA. First of all, as emphasized in my model, there is very strong selection on the location where transposable elements can insert and the overall quantity of so called jDNA.

Evidently, most insertions in the informational DNA (iDNA) are deleterious and, therefore, the hosts carrying them are eliminated by selection. And, the same happens with most insertions in specific non-informational genomic DNA sequences (niDNA) that have constraints on length or overall sequence composition.

However, insertions in niDNA that have no such constraints [i.e. those acting as protective or symbiotic DNA (sDNA)] can survive over evolutionary time along with their hosts. Nevertheless, as discussed in these papers, there are limits to how much of this sDNA can accumulate.

For example, in host organisms with high ‘metabolic and energetic constraints,’ such as bacteria, only individuals with limited amounts of jDNA survive over evolutionary time; in these organisms, the high selective pressure imposed by insertional mutagenesis has led to the co-evolution of highly efficient protective mechanisms in the form of site-specific integration. However, in many eukaryal organisms, including most multicellular species, the costs of maintaining these sequences are small compared to those associated with other organismal features, so purifying selection against the accumulation of jDNA is relatively weak, at least up to a certain quantity.

Joe Felsenstein: “But if we try to calculate selection coefficients favoring retaining a particular 100-base piece of jDNA as a result of this protective effect, it will be far smaller. Won't we get a number so small even I will agree that the deletion of that piece of junk DNA is neutral?”

“Sorry, Larry -- you know I appreciate your blog posts, but it's clear that you didn't understand the paper(s) you criticize.”

However, as promised, here are the answers to your 3 questions:

1. Why don't bacteria bulk up their genomes to get protection?

Due to the high metabolic and energetic constraints associated with increasing their genome size, in the context of high reproductive rates and large populations, bacteria have co-evolved other protective mechanisms against deleterious insertional mutagenesis, such as specific integration sites, which consist of relatively short sequences.

2. How does your speculation fare in The Onion Test?

The amount of protective or symbiotic DNA (usually referred to as ‘junk DNA’) as an adaptive defense system (as in *adaptive immunity*; e.g. see the CRISPR/Cas *adaptive immunity* system in bacteria) varies from one species to another (including various species of onions) based on the rate of its origin and deletion, insertional activity, and evolutionary constraints on genome size. More specifically, if a certain species of onion is exposed to high insertional activity by endogenous or exogenous viral elements as compared to other species, including other species of onions, then its genome would increase in size until the insertional activity levels off or until the genome size becomes a metabolic or physiological burden.

3. How does creating a genome with dozens of active transposons that survive a million years of evolution count as "protection"?

Well, you exist don’t you?

I doubt that you agree with these answers, which is fine with me. But I hope you’ll have the confidence to specifically address the following points:

Your stance on the so called ‘junk DNA’ is as clear as it can be: the so called ‘junk DNA’ is the result of genetic drift and neutral evolution, and natural selection has nothing to do with it. (yea or nay?)

You are also clear that the products of genetic drift and neutral evolution can have biological functions, can’t they?

So, if we consider that jDNA is the product of genetic drift and neutral evolution, why can’t it provide a protective biological function against deleterious insertional mutagenesis?

As stated in my original and more recent papers, as well as on this thread (please see above), it is possible that the so called ‘junk DNA’ (jDNA) is simply the result of a mutational imbalance, favoring its amplification versus deletion (parenthetically, this might be a genuine example of Masatoshi Nei’s ‘mutation driven evolution’). Therefore, selection is not essential to explain the protective function of the so called jDNA.

If something has a function then it is subject to selection, most of the time purifying.

That being said, I think there is plenty of selection involved in shaping genome evolution and the accumulation of the so called jDNA. First of all, as emphasized in my model, there is very strong selection on the location where transposable elements can insert and the overall quantity of so called jDNA.

You keep repeating this, without backing it up with anything, and I keep showing you (with numbers) how it is nonsense. And then you repeat it again....

Evidently, most insertions in the informational DNA (iDNA) are deleterious and, therefore, the hosts carrying them are eliminated by selection. And, the same happens with most insertions in specific non-informational genomic DNA sequences (niDNA) that have constraints on length or overall sequence composition.

How many examples of intergenic "non-informational" DNA with very tight constraints on its length can you cite? It's not enough to just posit the existence of a phenomenon that is very important for the theory you like and then use it in support of that theory; it has to actually exist.

Joe Felsenstein: “But if we try to calculate selection coefficients favoring retaining a particular 100-base piece of jDNA as a result of this protective effect, it will be far smaller. Won't we get a number so small even I will agree that the deletion of that piece of junk DNA is neutral?”

So, if we consider that jDNA is the product of genetic drift and neutral evolution, why can’t it provide a protective biological function against deleterious insertional mutagenesis?

Let's repeat this one once again too.

If the existence of junk DNA can be explained entirely by drift and neutral evolution, then there is no need to add an additional explanatory layer of complexity.

the products of genetic drift and neutral evolution can have biological functions

There is something called constructive neutral evolution. It happens and it happens a lot. But the products of constructive neutral evolution are locked in their irreducibly complex state and maintained by purifying selection.

We have nothing of the sort here, because there is neither a conceivable mechanism through which purifying selection could maintain the size of the human genome nor is there a need to invoke one.

According to my model on genome evolution in organisms with relatively high C-values, such as humans, the so called “junk DNA” provides a protective mechanism against insertional mutagenesis and, therefore, it fulfils the definition of symbiotic DNA (sDNA). As I said before, if you don’t agree with this theory, that’s fine.

However, since you are a member of the ENCODE project, it would be interesting to know: how much of the human genome do you believe is junk DNA?

And why is my paradigm that the vast majority (approximately 90%) of the human genome, which I label symbiotic DNA (sDNA), provides a defense mechanism against deleterious insertional mutagenesis by endogenous and exogenous inserting elements, such as retroviruses, *confusing and vague*?

Is it because you and some of the other commenters here (see above) cannot conceive that sDNA could have originated and been maintained not by natural selection but by other evolutionary forces, such as genetic drift?

Why do you think that an organismal feature that has originated and is being maintained by genetic drift and neutral evolution cannot have a biological function?

There is no evidence that people with the defective gene and O-type blood are any worse off than people that have the glycosylated protein. Does that mean that the ABO gene is junk even though it has a well-defined function?

I don't think your ABO example was a very good one - if anything, it really highlights the difficulty in defining "function".

There is pretty good evidence that ABO blood types evolved in response to infectious burdens. Non-B blood types appear to have evolved in regions with high levels of malaria, and emerged at roughly the same time as the species of malaria which infect humans. Both A and O type individuals are less likely to suffer severe malarial consequences, and if infected when pregnant, have better pregnancy outcomes (as a rule O's do much better than A's & B's, but both 'O's and 'B's do better than 'A's). The mechanism appears to be the evolution of reduced cytoadherence between infected RBCs (which, when they occlude vessels, causes ischemia, which is responsible for a lot of the pathophysiology of malaria).

So in this example A & B have an obvious biochemical function (glycosylation) but oddly fail your definition of function: you can delete them, à la the 'O' allele, without reduced fitness. In fact, in malaria-prone regions fitness increases.

The 'O' allele has no biochemical function but has an evolutionary one - in the sense that the 'O' allele provides a survival benefit when the carrier encounters what is probably the most serious pathogen humans have encountered in their recent evolutionary history.

The 'O' allele has no biochemical function but has an evolutionary one - in the sense that the 'O' allele provides a survival benefit when the carrier encounters what is probably the most serious pathogen humans have encountered in their recent evolutionary history.

So, a gene has a function when it makes a protein/enzyme but when you knock out that function it also has a function because it doesn't make a protein?

Do you really think that's a helpful contribution to the function wars? :-)

In case you can't access it but would like to know what our major arguments are, here is the Abstract:

"Media attention and the subsequent scientific backlash engendered by the claim by spokespeople for the Encyclopedia of DNA Elements (ENCODE) project that 80% of the human genome has a biochemical function highlight the need for a clearer understanding of function concepts in biology. This article provides an overview of two major function concepts that have been developed in the philosophy of science—the causal role concept and the selected effects concept—and their relevance to ENCODE. Unlike in some previous critiques, the ENCODE project is not considered problematic here because it employed a causal role definition of function (which is relatively common in genetics) but because of how this concept was misused. In addition, several unique challenges that arise when dealing with transposable elements (TEs) but that were ignored by ENCODE are highlighted. These include issues surrounding TE level versus organism-level selection, the origins versus the persistence of elements, and accidental versus functional organism-level benefits. Finally, some key questions are presented that should be addressed in any study aiming to ascribe functions to major portions of large eukaryotic genomes, the majorities of which are made up of transposable elements."

And the Concluding Remarks:

"The possibility that the majority of noncoding DNA plays an important functional role at the organism level has been actively discussed for many decades. While it is not true that most of the genome was simply dismissed as useless junk, there have long been legitimate debates regarding the percentage of DNA that is biologically important in large eukaryotic genomes. This is a question that will require both empirical data and conceptual clarification to resolve.

For example, the recent claims by the ENCODE project leadership that 80% of the human genome can be assigned a “biochemical function” are highly misleading because of the way in which the concept of “function” was employed. The issue is not simply that ENCODE made use of a causal role definition of function rather than a selected effects definition, as the CR definition is relatively common in genetics. Rather, it is because ENCODE misapplied this definition of function by using criteria that were far too broad. Equivocation between this loose concept of CR function and phenotypically relevant biological functions exacerbated the confusion surrounding the ENCODE results.

As described in this article, ascribing functions to specific components of the genome is uniquely challenging when the sequences involved are transposable elements. Their capacity for autonomous replication creates several major complications that confound the use of functional assessments typically implemented in studies of genes or regulatory regions. These unique challenges were ignored by ENCODE because the entire human genome was treated in the same way, despite the fact that it is made up primarily of TEs. Future work that aims to provide an estimate of the percentage of DNA in the human genome with a biologically meaningful function at the organism level will therefore require a much more sophisticated approach that takes these issues into account."

Ryan and I are discussing this on his Facebook page. facebook.com/tryangregory

He thinks my post is extraordinarily muddled about function concepts. He says, "... it's clear that you didn't understand the paper(s) you criticize."

His recent comment is ...

As I said, I don't think you understood the point or content of the paper, so we'll just have to agree to disagree about the usefulness of our discussion. I also don't see why you are so focused on which TEs count as junk; junk is an even more nebulous concept than function. Meanwhile, if you're not interested in this topic, just don't read the literature about it. It doesn't mean no one else should care about or work on these issues.

I'm trying to understand where he thinks I've failed to understand his paper. If I find out, I'll post an update with the correct interpretation according to Ryan Gregory.

I think it's possible that I understand Ryan's paper but that I disagree with parts of it. Ryan might be seeing this as a lack of understanding rather than a legitimate difference of opinion. I've re-read the paper and my blog post and I still don't see a problem.

BTW, Ryan confirms that active transposons are junk DNA by his way of defining terms. I wish he and the other authors had specifically stated this in the paper because it would be an example of a coding region that is junk and that's a significant point that should have merited further discussion in the paper. It would mean that junk DNA is not confined to noncoding DNA as most people believe.

I'm also a bit confused about his claim that "junk" is more nebulous than "function." It seems to me that those are the only two choices so that if you define one then you define the other. Maybe there's a third option that I'm not familiar with?

Larry, by your own supposedly clearer definition of "function", active TEs would be non-functional if they don't contribute to organism survival. And we did not say that all active TEs are necessarily non-functional *at the organism level*. We said, and I told you again in the discussion, that some may be functional, some may have beneficial side-effects for the host, and many are probably not functional *for the organism*. You don't seem to acknowledge that we're discussing this from a multi-level perspective.

I also completely disagree with your opinion that working out concepts of function is mere "quibbling" and is not productive. On the contrary, I consider it a fundamental part of the debate and a necessary step for focusing future research. Just because there isn't a single, easy definition (though you seem to want to present one nonetheless) doesn't mean it is a conceptual free-for-all. That's how we got the equivocation from ENCODE to begin with. It's fine if you have particular views on definitions, but you are not simply offering a different set of ideas, you're suggesting that other people should not even bother working on this issue.

Again, I will simply direct your readers to the actual papers and leave it at that.

Larry, by your own supposedly clearer definition of "function", active TEs would be non-functional if they don't contribute to organism survival.

As I point out in my blog post, active TEs don't fit my definition. They can be deleted without effect but I still don't want to call them junk because they have a clear biological function. That's why NO definition works in all cases and it's a waste of time to try and come up with foolproof definitions.

And we did not say that all active TEs are necessarily non-functional *at the organism level*. We said, and I told you again in the discussion, that some may be functional, some may have beneficial side-effects for the host, and many are probably not functional *for the organism*. You don't seem to acknowledge that we're discussing this from a multi-level perspective.

I understand the distinction you are trying to make about having different definitions of "function" for different (ill-defined) levels. That does not mean I have to agree with it.

I also understand that you discuss abstract theoretical cases where transposons can have a "function" at the organismal level that's different from their function at the TE-level. What I was looking for in the paper was a clear statement on whether active TEs are junk DNA in the absence of any of these theoretical functions. You have answered on Facebook. I understand that the answer is "yes," active TEs are junk by your definition unless they have a new and different function unrelated to their selfish DNA function. My opinion is not the same as yours. That does not make you wrong but it also doesn't make you right. Your paper would have benefited enormously from a clear statement on this question along with a discussion about why some people legitimately disagree with you.

As I pointed out on Facebook, such a statement would have made it clear that you consider some coding regions (genes for transposase and reverse transcriptase) to be junk DNA and that would have been a contribution to clarity since many scientists (including you) frequently assume that junk DNA is confined to noncoding DNA.

But Larry, obviously you *do* care about the issue of definitions. You keep saying that you wish we had given a more specific indication of what counts as junk, and you present your own definition of function in this post. (You don't justify it or actually deal with the glaring exceptions, but that's beside the point).

In any case, I invite you to make a contribution to the primary literature in which you clearly lay out, develop, and defend your ideas about how to define "junk" and "function" -- or indeed, why they should not or can not be defined, if that is your position.

I also completely disagree with your opinion that working out concepts of function is mere "quibbling" and is not productive.

I discuss several papers in my blog and there's more to come. How do you think the "productive" thingy is working out so far? Do you think that everyone reading this blog, and the papers, now has a much clearer understanding of the word "function"? I don't.

The Germain et al. paper was a very philosophical approach to the issue and they defended the ENCODE definition. You think that was "productive"?

Just because there isn't a single, easy definition (though you seem to want to present one nonetheless) doesn't mean it is a conceptual free-for-all.

From my perspective, it certainly looks like a conceptual free-for-all where everyone offers their own opinion on the precise meaning of "function" and "junk." Your own definition of "junk" has evolved considerably over the past decade but 90% of the genome is still junk.

It's fine if you have particular views on definitions, but you are not simply offering a different set of ideas, you're suggesting that other people should not even bother working on this issue.

Yes, that's exactly what I'm suggesting. I'm not looking forward to a plethora of papers from scientists and philosophers arguing about the nuances of causal-role and selected-effect definitions and discussing whether DNA can be junk if you look at it from one perspective but not from another.

The net effect of all that will be to lend credence to the position taken by the ENCODE leaders. After all, if scientists and philosophers can't agree on a definition then maybe the ENCODE definition is okay after all.

Ryan, I would rather not have these debates over the precise meaning of "function" and "junk" but you, and others, have chosen to engage in this quibbling exercise. Having made that choice, don't be surprised if people quibble about what you wrote in your paper.

...and you present your own definition of function in this post. (You don't justify it or actually deal with the glaring exceptions, but that's beside the point).

Hmmm ... I called it a "working definition" in order to highlight the fact that it was not intended to be a philosophically defensible definition that would withstand all criticism. We need something. If you have a better "working definition" then please let me know.

Right after presenting this definition as a possibility, I add ...

These are not rigorous definitions because there are all kinds of cases where a gene with a known function can be deleted without harm to the organism.

I then go on to describe some cases where my "working definition" doesn't work. I can find exceptions to every single definition I've ever seen. Why don't we just admit that quibbling about CR and SE isn't getting us anywhere and just settle on some reasonable definition? Then we can get on with the task of looking at real biological data to decide whether most of our genome is junk?

In any case, I invite you to make a contribution to the primary literature in which you clearly lay out, develop, and defend your ideas about how to define "junk" and "function" -- or indeed, why they should not or can not be defined, if that is your position.

I'm doing that on my blog. It's faster and a lot cheaper. (I don't have several thousand dollars to spend on publishing papers in science journals.) You reference your own blog in your "primary literature" publications so now you can reference mine as well! :-)

I'm doing that on my blog. It's faster and a lot cheaper. (I don't have several thousand dollars to spend on publishing papers in science journals.) You reference your own blog in your "primary literature" publications so now you can reference mine as well! :-)

Yes, but first, based on my personal observations, you are the exception among people of your generation, who generally do not read blogs; and second, publications in the scholarly literature still carry significantly more weight in people's minds than blog postings. It would be to everyone's benefit to have a more official paper trail of people's opinions. I don't think the cost is such an impediment; not all journals charge thousands of dollars for publication.

You probably think the cost is unimportant because you don't have to pay for it out of your own personal bank account. :-)

The most important reasons for preferring blogging over publishing are: (1) instant feedback via comments - I love the debate, (2) the ability to make corrections and updates, (3) photos and images, (4) direct links to other blog posts and publications, (5) speed, I don't have the patience to wait six months before expressing my opinion, (6) ego, I don't have to edit my opinion based on criticisms from reviewers (admittedly, about 5% of reviewers turn out to be helpful), and possibly (7) more people will read a blog post than a published paper.

It is not a question of one approach or the other, they are complementary and both are necessary in this case.

Also, there are cheap options like PeerJ (which is $100 per paper although it does not publish opinion and perspective pieces), and now there is bioRxiv too. And if someone invites you to write a perspective on the subject, I would imagine that would be free.

I don't see what is so laughable. I mentioned PeerJ because it is cheap and it is PubMed-indexed so it is part of the "official literature". Unfortunately, it does not publish perspectives, so it's obviously not useful in this case, but I used it as an example to illustrate the point that not all journals have exorbitant publication fees.

bioRxiv is not PubMed-indexed, but at least it is officially citable as articles there receive a doi.

1) Are the proteins that they encode functional? This is an open question that is being researched but it seems unlikely. There have been other examples of novel proteins found in humans but their expression levels are so low that they are unlikely to be functional.

2) This is in yeast and has yet to be confirmed whether these are found in mammals.

3) How large is the subset of these lncRNAs that is engaged by the translation machinery and so can produce protein products? It seems like it would probably be insignificant.

How about distinguishing the *definition* of function from the *criteria* or *methods* for determining whether that definition applies? Function can be defined as the role of a part of a system in the production of an 'organized' activity, process, ability or property of that system. Biological function can be defined as the role in an organism's ability to maintain the living state. In this view, Larry's genetic, evolutionary and biochemical approaches aren't different ways to *define* function, but different ways to determine whether a part has a function and what that function is.

Dear Prof. Moran,

I hope that despite being late I might get an answer... I'll try to be concise and avoid "tedious and almost incomprehensible" discussions.

You propose that:

"A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants. Conversely, if the DNA can be removed without consequences then it is probably junk."

Your working definition is ahistorical, whereas most evolutionary biologists would probably agree that junk DNA is first and foremost a historical concept, i.e. that this DNA is there NOT because of natural selection (at the level of the organism) acting on it, but for other reasons. But let's leave that aside for the moment. Could you please provide a working definition of "affecting the survival of human beings or their descendants"?

The issue we tried to emphasize in our paper is that any difference in genotype makes a difference to the organism, and that although many such differences are irrelevant, fitness is not a panacea for sorting them out. A clear example is that there are several diseases that do not significantly affect our lifespan, nor our capacity to reproduce. Of course you could claim that the medicalized environment we live in is not our "normal environment", but unless you can provide non-historical reasons for this, there are plenty of potential environments to look at (e.g. does this genetic variation affect the capacity of 70-year-old men and women to climb Kilimanjaro?). What matters, we thought, were differences we actually care about, in our current environment.

So to get to the core of the issue: If we deleted the 50% of the human genome you're most certain is non-functional, are you sure you would notice no difference in any phenomena we care about, from drug side effects to aging or risk for ASD? I think we don't know, and that's the substantive part of this debate.

Now, of course none of this has to do with junk DNA historically understood, for which many authors, such as Gregory, have made a powerful claim. And I think it's precisely a problem of this debate that the two questions systematically tend to get conflated.

The concept of function might be doing more harm than good in biology.

Thank you for responding. I'm hoping to get back to blogging within a few days and the next post on the Function Wars will address my concerns with your paper.

My point is that quibbling about the exact meanings of terms like "function" or "junk" is unlikely to be productive. I'm making this point by quibbling. :-)

Similarly, when we use terms like "affecting the survival of an organism or its descendants" I'm hoping that people will appreciate the sense of the phrase rather than insist on a precise meaning. Few terms in biology can stand up to quibbling. I doubt very much that you misunderstood my meaning.

The issue we tried to emphasize in our paper is that any difference in genotype makes a difference to the organism, and that although many such differences are irrelevant, fitness is not a panacea for sorting them out.

It cannot possibly be true that "any difference in genotype makes a difference to the organism." At least, not true in any biologically relevant way.

It's also true, in my opinion, that fitness is not a panacea for determining function. I gave some examples in my posts. Nevertheless, the value of a working definition is that it applies in the majority of cases and exceptions are just that, ... exceptions.

That's the best we can do. By the way, you and your colleagues focus on diseases but genetic diseases are not a normal function of the genome. They are mutations.

If we deleted the 50% of the human genome you're most certain is non-functional, are you sure you would notice no difference in any phenomena we care about, from drug side effects to aging or risk for ASD?

Yes, I'm pretty sure that there would be no effect on drug side effects or risk for ASD. But the real issue concerns the burden of proof.

I think we don't know, and that's the substantive part of this debate.

There's a sense in which "we don't know" is meaningful but this isn't one of those cases. The evidence for junk is not something that you can ignore in this debate. The burden of proof is on those who claim that most of our genome is functional. Part of that burden requires that you refute the evidence. Here's a post on Five Things You Should Know if You Want to Participate in the Junk DNA Debate.

Thanks for your answer, and I'm very much looking forward to more feedback.

Our aim with the paper was twofold: a) to show that the selected-effect (SE) account is highly problematic and that the causal-role (CR) account shouldn't be dismissed so quickly, and b) to show that beyond semantics there's also an empirical disagreement at play.

Regarding a), which is probably what you call quibbling (granted, contemporary "professional" philosophy has brought quibbling to quite a level, but in my opinion philosophy should be more a critical than a positive enterprise), I think there are at least three important values in such quibbling:

1) from a didactic point of view, a controversy such as this one is an excellent way for students and scientists to think about a range of questions and considerations;

2) challenging concepts and arguments often leads to their improvement (for instance, I think a little quibbling could improve your interesting "five-things-you-should-know");

3) it reminds us that some of the concepts we're working with (and your positions on functions are roughly the orthodoxy) are just that: perhaps useful but problematic working concepts (about functions, I even have doubts about the "useful" bit).

Concerning b), I'm referring to the question I asked you about deleting. I'd be very curious to see a poll about this, because I'm not sure what proportion of biologists share your intuition. You may be right in saying that the burden of proof shouldn't be equally distributed, but the allegedly overwhelming evidence you refer to doesn't actually settle that question, because it's tied to fitness (and my question wasn't). After a few cycles of quibbling-and-reformulation, I wouldn't deny the core of your "five-things-you-should-know", and yet it doesn't answer my question. Genetic load, for instance, tells us that only a small percentage of our genome contains critical information, but does it tell us whether the rest can make a small difference in, say, old age?

I think it would be constructive to consider for a moment that ENCODE was claiming to provide evidence that much of the rest of the DNA did make such a difference, and to assess this claim.

Pierre-Luc

but does it tell us whether the rest can make a small difference in, say, old age?

These hypothetical sequences wouldn't be retained in the genome on the basis of making a small difference in old age. Any beneficial effect invisible to selection would be accidental, as would any deleterious effect invisible to selection. How interested are we in whether we have some small number of such "happy accidents" to comfort us in our old age (I cannot imagine there would be a lot of them), that will pass randomly out of the genome since selection won't act to retain them?

Larry will, I hope, answer for himself. But as an evolutionary biologist I would almost endorse his definition, with one exception. I would call a sequence functional if deleting it reduces the fitness of the organism. Note that

1. This means a transposon insertion may be deemed nonfunctional if deleting it increases the fitness of the organism. If I weld a bunch of junk to the front of my car, it may reduce the speed of the car, and deleting it would increase the speed of my car. Nevertheless that junk is nonfunctional.

2. There seems to be a consensus that conserved sequence is the gold standard for "function". In such cases deleting the sequence would reduce fitness, but not necessarily noticeably. In evolution, a mutation that reduces fitness can be effectively selected against if its selection coefficient is greater than roughly 1/N, where N is the (effective) population size. For most organisms that is so small a number that we would not notice the change in the laboratory. Nature does a longer and bigger experiment than we can.

3. Notice that I wrote "fitness", not "survival". A sequence could affect fertility but not viability, and that seems to have escaped attention here.

4. If you delete all those transposon copies that have a negative effect on fitness, the resulting increase in fitness might be noticeable. But deleting each one individually might not lead to a noticeable improvement.
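Points 2 and 4 rest on simple arithmetic. The sketch below is a hypothetical illustration, not taken from the comment: it uses the commenter's rough rule that selection is effective when the selection coefficient s exceeds 1/N, and it assumes that the fitness effects of separate deletions multiply independently (no epistasis). The value of s, the population sizes and the number of transposon copies are made-up numbers chosen only to show the scale of the effects.

```python
"""Hypothetical illustration of points 2 and 4 (all numbers are made up)."""


def selection_is_effective(s: float, n: int) -> bool:
    """Rough rule of thumb from the comment: selection acts on a mutation
    whose selection coefficient s satisfies |s| > 1/N. Textbook treatments
    usually phrase this in terms of the effective population size; this
    sketch simply uses N."""
    return abs(s) > 1.0 / n


def combined_fitness(s_each: float, copies: int) -> float:
    """Relative fitness after deleting `copies` sequences that each change
    fitness by the factor (1 + s_each), assuming the effects multiply
    independently (no epistasis)."""
    return (1.0 + s_each) ** copies


if __name__ == "__main__":
    s = 1e-6  # hypothetical benefit of deleting one harmful transposon copy

    # Point 2: the same tiny effect (harmful or beneficial) is invisible or
    # visible to selection depending on population size.
    for n in (10_000, 10_000_000):
        print(f"N = {n:>10,}: selection effective? {selection_is_effective(s, n)}")

    # Point 4: one deletion is unmeasurable, many deletions together are not.
    print(f"1 copy deleted: relative fitness {combined_fitness(s, 1):.6f}")
    print(f"100,000 copies deleted: relative fitness {combined_fitness(s, 100_000):.3f}")
```

With these assumed numbers, an effect of one part in a million is invisible to selection in a population of 10,000 but visible in a population of ten million, and removing 100,000 such copies at once raises relative fitness by roughly 10% (about e^0.1), even though removing any single copy changes it by only 0.0001%.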

It is doubtful whether the definition of function that I am backing is of much use, since we will not easily be able to assess it.

Laurence A. Moran

Larry Moran is a Professor Emeritus in the Department of Biochemistry at the University of Toronto. You can contact him by looking up his email address on the University of Toronto website.

Sandwalk

The Sandwalk is the path behind the home of Charles Darwin where he used to walk every day, thinking about science. You can see the path in the woods in the upper left-hand corner of this image.

Disclaimer

Some readers of this blog may be under the impression that my personal opinions represent the official position of Canada, the Province of Ontario, the City of Toronto, the University of Toronto, the Faculty of Medicine, or the Department of Biochemistry. All of these institutions, plus every single one of my colleagues, students, friends, and relatives, want you to know that I do not speak for them. You should also know that they don't speak for me.


Quotations

The old argument of design in nature, as given by Paley, which formerly seemed to me to be so conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a door by man. There seems to be no more design in the variability of organic beings and in the action of natural selection, than in the course which the wind blows.

Charles Darwin (c1880)

Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. It is so easy to hide our ignorance under such expressions as "plan of creation," "unity of design," etc., and to think that we give an explanation when we only restate a fact. Any one whose disposition leads him to attach more weight to unexplained difficulties than to the explanation of a certain number of facts will certainly reject the theory.

Charles Darwin (1859)

Science reveals where religion conceals. Where religion purports to explain, it actually resorts to tautology. To assert that "God did it" is no more than an admission of ignorance dressed deceitfully as an explanation...


The world is not inhabited exclusively by fools, and when a subject arouses intense interest, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)
I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory.
Stephen Jay Gould (2002) p.1339
The essence of Darwinism lies in its claim that natural selection creates the fit. Variation is ubiquitous and random in direction. It supplies raw material only. Natural selection directs the course of evolutionary change.
Stephen Jay Gould (1977)
Rudyard Kipling asked how the leopard got its spots, the rhino its wrinkled skin. He called his answers "just-so stories." When evolutionists try to explain form and behavior, they also tell just-so stories—and the agent is natural selection. Virtuosity in invention replaces testability as the criterion for acceptance.
Stephen Jay Gould (1980)
Since 'change of gene frequencies in populations' is the 'official' definition of evolution, randomness has transgressed Darwin's border and asserted itself as an agent of evolutionary change.
Stephen Jay Gould (1983) p.335
The first commandment for all versions of NOMA might be summarized by stating: "Thou shalt not mix the magisteria by claiming that God directly ordains important events in the history of nature by special interference knowable only through revelation and not accessible to science." In common parlance, we refer to such special interference as "miracle"—operationally defined as a unique and temporary suspension of natural law to reorder the facts of nature by divine fiat.
Stephen Jay Gould (1999) p.84


My own view is that conclusions about the evolution of human behavior should be based on research at least as rigorous as that used in studying nonhuman animals. And if you read the animal behavior journals, you'll see that this requirement sets the bar pretty high, so that many assertions about evolutionary psychology sink without a trace.

Jerry Coyne
Why Evolution Is True

I once made the remark that two things disappeared in 1990: one was communism, the other was biochemistry and that only one of them should be allowed to come back.

Sydney Brenner
TIBS Dec. 2000
It is naïve to think that if a species' environment changes the species must adapt or else become extinct.... Just as a changed environment need not set in motion selection for new adaptations, new adaptations may evolve in an unchanging environment if new mutations arise that are superior to any pre-existing variations.

Douglas Futuyma
One of the most frightening things in the Western world, and in this country in particular, is the number of people who believe in things that are scientifically false. If someone tells me that the earth is less than 10,000 years old, in my opinion he should see a psychiatrist.

Francis Crick
There will be no difficulty in computers being adapted to biology. There will be luddites. But they will be buried.

Sydney Brenner
An atheist before Darwin could have said, following Hume: 'I have no explanation for complex biological design. All I know is that God isn't a good explanation, so we must wait and hope that somebody comes up with a better one.' I can't help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist.

Richard Dawkins
Another curious aspect of the theory of evolution is that everybody thinks he understands it. I mean philosophers, social scientists, and so on. While in fact very few people understand it, actually as it stands, even as it stood when Darwin expressed it, and even less as we now may be able to understand it in biology.

Jacques Monod
The false view of evolution as a process of global optimizing has been applied literally by engineers who, taken in by a mistaken metaphor, have attempted to find globally optimal solutions to design problems by writing programs that model evolution by natural selection.