This week, the ENCODE project released the results of its latest attempt to catalog all the activities associated with the human genome. Although we've had the sequence of bases that comprise the genome for over a decade, there were still many questions about what a lot of those bases do when inside a cell. ENCODE is a large consortium of labs dedicated to helping sort that out by identifying everything they can about the genome: what proteins stick to it and where, which pieces interact, what bases pick up chemical modifications, and so on. What the studies can't generally do, however, is figure out the biological consequences of these activities, which will require additional work.

Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors.

This was more than a matter of semantics. Many press reports that resulted painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish). But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work—and, in some cases, the scientists themselves.

To understand why, we'll need a bit of biology and a bit of history before we can turn back to the latest results and the public response to them.

What we know about DNA, and when we knew it

Among other things, DNA has at least two key functions. First, it codes for the proteins that perform most of a cell's functions. Second, it has control sequences that don't encode anything, but determine when and where the coding sequences are active. We've had some indication that non-coding DNA played key regulatory roles since the 1960s, when the Lac operon was described and won its discoverers the Nobel Prize.

The Lac operon is present in bacterial genomes, which are under extreme pressure to carry as little DNA as possible. The typical bacterial genome is over 85 percent protein-coding DNA, leaving just a small fraction for regulatory purposes. But that isn't generally true of vertebrates.

The coding portions of vertebrate genes turned out to be interrupted by noncoding regions, called introns. Some of these are huge—roughly a third the size of some of the smaller bacterial genomes. Vertebrate genomes also appeared to be littered with old and disabled viruses and mobile genetic parasites called transposons. Even some of the coding portions seemed a bit useless—near exact duplicates of genes were common, as were mutated and disabled copies. Many of these apparently useless pieces of DNA continued to carry sites for regulatory DNA binding proteins and continued to make RNA.

(To give you an idea of how mainstream all this was, I spent some time working on a mouse gene that was thought to be superfluous because it was a near-exact copy of a gene used by the immune system. But the copy was only expressed in males because a mobile genetic element's regulatory sequences had been inserted nearby. And I knew all this as an undergrad in the late 1980s).

By the time we sequenced the human genome, we discovered that this seemingly useless stuff was the majority. Over half the genome was built from the remains of viruses and transposons. Introns accounted for another large fraction. And all of it seemed to be an evolutionary accident. One fish, the fugu, lacks a lot of this DNA, and seems to get along fine, while many salamanders have ten times the DNA per cell that humans do. And if you looked at the DNA of different mammals, the vast majority of it (about 95 percent) wasn't shared by different species.

These findings seemed to support a model that was first proposed back in the 1970s, which picked up the (possibly unfortunate) moniker junk DNA. Genomic accidents—duplicating genes, picking up a virus—happen at a steady rate. Individually, these don't cause an appreciable cost in terms of fitness, so species aren't under a strong selective pressure to get rid of it, and pieces could linger in the genome for millions of years. But the typical bit of junk doesn't do anything positive for the animals that carry it.

Regulatory, junk, and non-coding DNA are all partly overlapping categories, which helps foster confusion. (Circles not to scale.)

You could even consider the idea of junk DNA to be a scientific hypothesis. It notes that animal genomes experience several processes that produce superfluous bits of DNA, predicts that these will not cause enough harm to be selected for elimination, and proposes an outcome: genomes littered with random bits of history that have no impact on an organism's fitness.

Junk dies a thousand deaths

For decades we've known a few things: some pieces of non-coding DNA were critically important, since they controlled when and where the coding pieces were used; but there was a lot of other non-coding DNA and a good hypothesis, junk DNA, to explain why it was there.

Unfortunately, things like well-established facts make for a lousy story. So instead, the press has often turned to myths, aided and abetted by the university press offices and scientists that should have been helping to make sure they produced an accurate story.

Discovery of new regulatory DNA isn't usually surprising, given that we've known it's out there for decades. Yet there has been a steady stream of press releases that act as if finding a function for non-coding DNA is a complete surprise. And many of these are accompanied by quotes from scientists that support this false narrative.

The same thing goes for junk DNA. We've known for decades that some individual pieces of junk DNA do something useful. Introns can regulate gene expression. Bits of former virus or transposon have been found incorporated into genes or used to regulate their expression. So some junk DNA can be useful, in much the same way that a junk yard can be a valuable source of spare parts.

But it's important to keep these in perspective. Even if a function is assigned to a piece of junk that's 1,000 base pairs long, that only accounts for about 1/2,250,000 of the total junk that is estimated to reside in the human genome. Put another way, it's important not to fall into the logical fallacy that finding a use for one piece of junk must mean that all of it is useful.
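To make that proportion concrete, here is a minimal back-of-the-envelope sketch. It uses only the two figures quoted above (a 1,000 base pair piece and a share of about 1/2,250,000); the variable names are illustrative, and the "half of the junk" threshold is an arbitrary assumption chosen just to show the scale.

```python
# Back-of-the-envelope check of the fraction quoted above.
# Both inputs come from the text; nothing here is a measured value.
PIECE_BP = 1_000                # length of one "useful" piece of junk
QUOTED_DENOMINATOR = 2_250_000  # the text's "about 1/2,250,000"

# Total junk implied by the quoted fraction, in base pairs.
implied_junk_bp = PIECE_BP * QUOTED_DENOMINATOR

# Assumption for illustration: how many equally sized, non-overlapping
# useful pieces it would take to cover half of that junk.
pieces_for_half = implied_junk_bp // (2 * PIECE_BP)

print(f"Implied junk total: {implied_junk_bp:,} bp")
print(f"Pieces needed to cover half of it: {pieces_for_half:,}")
```

In other words, under these assumptions it would take over a million separate discoveries of that size before function had been assigned to even half of the junk.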

Despite that, many new findings in this area are accompanied by some variation on the declaration that junk is dead. Both press officers and scientists have presented a single useful piece of virus as definitively establishing that every virus, transposon, and dead gene in the human genome is essential for our collective health and survival.

63 Reader Comments

I find more and more of the mainstream press coverage is designed to "steal hits back from the bloggers", rather than actually, you know, cover the news thoroughly and accurately. The quality of science reporting in our local (major market) papers is appalling… unfortunately, it makes my job of "cleaning up the mess" later for my high-school students a massive one. I'll be referring my kids here when we hit DNA / RNA in a couple of weeks!

Yes, I've read a number of these articles in my journals, and in the popular press. There is a lot of confusion. But, as you say, there is the definition of "functional" to think about.

Generally, "functional" means that something biologically required is being accomplished. But if we think instead of "to function," we can see that as long as something is being done, then it's functional. So encoding RNA that has no part in the life of the cell isn't exactly functional, but that bit is functioning, in the sense that it is doing something.

I wouldn't go with "terrible headline," although I would agree that the headline seems to say that everything you've ever read was wrong. Perhaps the author should have slipped the ENCODE project name into the title, although I can see its applicability to other science headlines as well.

That having been said, perhaps it's time for the scientists to come up with some sort of formal, defined titles for the various "parts" of DNA as we currently understand them. The only problem I see is that the process is very much likely to be as easy as herding 15,000 cats across the United States on foot.

This is an unfortunately common occurrence, and the highest profile journals (Nature, Science) seem to be more guilty in this regard than other, more specialist, journals.

That said, I'm not sure that it's fair to blame the press offices for this as they are often laymen who can't be expected to have the expertise to critically examine the claims of scientists. It's the scientists and the institutions as a whole (university departments, and the journal's ed. boards) that are responsible for the confusion and hyperbole in press releases.

It's clear that there is a spectrum of activity, with about 80% of DNA having some level of activity. The problem, I think, lies in the semantic labeling of parts of that continuum of DNA activity. It's really an impossible task.

But I think the biggest problem lies with the term 'junk', which carries a lot of semantic baggage, and not with the term 'functional'.

Clearly John has a special interest in this topic. Perhaps he is still defensive over his thesis work being labeled junk. But the idea that the vast majority of the genome is transcribed and interacts with binding proteins really is significant, if for no other reason than that it buffers the availability of transcription and processing factors, changing their availability for interacting with the "more important" sites (acknowledging that our ability to measure importance is very poor). In that sense, DNA is functional even if it has no intrinsic activity of its own. As noted above, the term junk is too emotionally laden. ENCODE is right to say that most of the DNA is functional to some degree. But John is right to say that this is not entirely news. Reports that 80% of the human genome was transcribed in certain cells are now three years old.

Well, the scientists try to describe a sensational story with great conclusions so that they can get into a high-impact journal. The journals want citations, so they try to make it sound as impressive as possible, and the media want to create clickbait that people read.

Choosing to report the 80% number was a poor decision by the ENCODE scientists, especially considering the actual number of ~20% is still quite large compared to 1% for coding regions. This seems like a clear ploy to garner attention and thus more funding. In a way, that's a valid goal in and of itself, since more funding will allow ENCODE scientists to catalog more DNA interactions, which could lead to more medical applications in the future. I just wish they could have done that in a way that didn't feed into the intelligent design propaganda machine.

On the other hand, getting some coverage by mainstream press is a Good Thing. Maybe it will inspire some more interest and funding for basic (and advanced) science. Those of us who went to school for, or take a special interest in, molecular biology may realize this is more than an argument about semantics, but most laypeople won't. It seems they're presenting this kinda how I like my physics: big ideas that I can wrap my head around (even if they're somewhat simplified and not True with a capital T) instead of acres of equations.

On the other hand, getting some coverage by mainstream press is a Good Thing. Maybe it will inspire some more interest and funding for basic (and advanced) science. Those of us who went to school for, or take a special interest in, molecular biology may realize this is more than an argument about semantics, but most laypeople won't.

Most laypeople will come away thinking something else. "And they told us most of the DNA didn't do anything! They said it was junk! Science is wrong again, just like when they told us the Earth was flat!"

It all comes down to funding, too, people. The better the claim you can make (yes, even in a formal paper), the better it looks on grant applications.

As funding pressures increase, we are gradually selecting for scientists who are good at writing "elevator pitches". Who cares about accuracy and precision? We just took one (teeny, weeny) step to curing cancer! Hey... just edit out the "teeny, weeny" part for more "punch"...

The single biggest problem I see with reporters/journalists is that they are experts on nothing. They don't have an in-depth understanding of the stuff they report on, except perhaps in the rare cases where they may have additional education/experience in the subject, or perhaps are more involved in it as a personal hobby.

EDIT: I just realized I may have misunderstood the article here, but I think my comment is still somewhat relevant. I imagine PR people are just as guilty of not being experts in the material that they are responsible for disseminating to the public, and that compounds the problem.

I still wonder how much “junk” is “junk” and whether it’s not important in circumstances that we haven’t yet elucidated. For example, “one person's junk is another person's treasure”. An extreme case of this is the mutation in hemoglobin that causes sickle cell – pretty troublesome for many, but maybe a lifesaver when it confers resistance to malaria. Or even, “one cell’s junk is another cell’s treasure”: It’s becoming apparent that as cells differentiate and follow specific lineages, they tend to lose DNA that’s not being used, whereas DNA that is actively transcribed and valuable to the cell in that state of differentiation is retained – i.e. the selection pressure to retain it is strong. In the late 1960s, Lord Florey began to describe how the endothelium was considerably more than “a sheet of nucleated cellophane”…

For example, “one person's junk is another person's treasure”. An extreme case of this is the mutation in hemoglobin that causes sickle cell – pretty troublesome for many, but maybe a lifesaver when it confers resistance to malaria

That's a different issue entirely, that's a mutation in a gene that's actively in use causing a protein to fold incorrectly. What's termed here as 'junk' is DNA that isn't actually transcribed into proteins or otherwise used for regulatory purposes at all, with or without a mutation.

On the other hand, getting some coverage by mainstream press is a Good Thing. Maybe it will inspire some more interest and funding for basic (and advanced) science. Those of us who went to school for, or take a special interest in, molecular biology may realize this is more than an argument about semantics, but most laypeople won't.

Most laypeople will come away thinking something else. "And they told us most of the DNA didn't do anything! They said it was junk! Science is wrong again, just like when they told us the Earth was flat!"

Sadly this is all too true. Someone on Facebook posted a link to a NYT article when this "news" originally came out, stating that it was "another nail in the coffin for evolution."

Generally, "functional" means that something biologically required is being accomplished. But if we think instead of "to function," we can see that as long as something is being done, then it's functional. So encoding RNA that has no part in the life of the cell isn't exactly functional, but that bit is functioning, in the sense that it is doing something.

So I can understand the confusion here.

Yes. Obviously there are junk DNA elements which, if expressed, could have a very large 'functional impact' - cancer, old retroviral infections, etc. From the point of view of the virus or cancer cell those genes are doing a great job, though the rest of the organism might differ.

In their defence, ENCODE were careful to define the term 'functional element' quite clearly at the start of their paper. They just omitted to tell the press that much (maybe most) of the 'functional elements' they found were still junk that we could do without.

For example, “one person's junk is another person's treasure”. An extreme case of this is the mutation in hemoglobin that causes sickle cell – pretty troublesome for many, but maybe a lifesaver when it confers resistance to malaria

That's a different issue entirely, that's a mutation in a gene that's actively in use causing a protein to fold incorrectly. What's termed here as 'junk' is DNA that isn't actually transcribed into proteins or otherwise used for regulatory purposes at all, with or without a mutation.

I agree with you, it is a different issue, and perhaps I haven't made my point very well. Which is that we're gradually learning how junk DNA/junk blobs of cells/pointless organs/bugs living on our skin & in our hair aren't at all junky, but are doing something more subtle than we anticipated. To drag things out into another junk-related example, it was recently shown that the whopping population of bugs living in our intestines changes during pregnancy, and has a significant effect on insulin & glucose metabolism. So the bacteria aren't just poo helping us digest, they are potent regulators of our own metabolism.

So back to that 80% of "non-junk". Maybe most of it is relics from historical/evolutionary battles of "our" genome with retrotransposons/LINEs, retroviruses, maybe redundant miRNAs, lncRNAs from our past lives. Or maybe some confer a useful (not essential) regulatory role in a subset of paneth cells when faced with an onslaught of dysentery. And discovering those more esoteric functions is going to be really hard and less amenable to brute force genomic approaches. Disclaimer: I do like these brute force approaches, but the volume of data generated does scare me.

This article should be applied to global warming. I'm sick to death of reading articles that claim that global warming is real but they point to absolutely no scientific, scholarly or peer reviewed sources. They simply regurgitate some other poorly written article in an attempt to be taken seriously or as factual. Now I'm waiting to be flamed by the global warming "herd" who will reply back to this post with equally unscientific responses that will amount to nothing but name-calling since they're either too lazy to engage in a discussion of the facts or too uninformed to know what "peer reviewed", "scholarly" or "scientific" really means.

This article should be applied to global warming. I'm sick to death of reading articles that claim that global warming is real but they point to absolutely no scientific, scholarly or peer reviewed sources.

That wasn't actually the problem here. All this work was peer reviewed - it was just promoted in a misleading way.

Also, are you complaining about articles that appear on Ars? Because, far and away, our primary source is the peer reviewed literature.

ferrels wrote:

Now I'm waiting to be flamed by the global warming "herd" who will reply back to this post with equally unscientific responses that will amount to nothing but name-calling since they're either too lazy to engage in a discussion of the facts or too uninformed to know what "peer reviewed", "scholarly" or "scientific" really means.

No, if you get flamed, it's probably going to be because you seem to have written a wildly off-topic post that has nothing to do with what's being discussed here.

Poe-d? I can't tell.

Question: Did ENCODE's definition of "functional" amount to "DNA we could actually more or less DO something (possibly interesting) with"?

Thought this was an allusion to Firesign Theater's "Everything You Know Is Wrong"; if so, bravo. After all, Ben Franklin was the best president that was never president.

Competition for grant $ is vicious in biology with ever larger grants going to increasingly fewer investigators. It is no surprise that there is a tendency for profs to engage in rampant self-promotion to help rack up publications to maintain their position in the academic arms race.

Unfortunately, with ENCODE bringing together so many peer-reviewers at so many institutions under one tent, there is increased likelihood that they are all just peer-reviewing each other's mediocre research. No surprise that they crown this love fest with a less-than-honest PR campaign.

Thank-you John for your expert perspective identifying the spin and taking the press officers and some of your fellow journalists to the woodshed. And as you cautiously suggested, the scientists are not blameless in allowing their work to be hyped. Your story today embodies what science journalism should be.

This article should be applied to global warming. ... Now I'm waiting to be flamed by the global warming "herd" who will reply back to this post with ...

Nothing to do with being flamed for not being part of the herd. In fact, the opposite is true. Despite all the evidence supporting climate change AND that it is presently significantly increased by human activity, in fact actually caused by it, you decide to ignore those facts just to be part of the denier cult. And in addition, you decide to talk about something completely unrelated to the article in this space, which, in conjunction with all your other "first time poster" comments in other threads, makes you a troll.

Now on to the subject at hand, while I am not an expert on these things, I did stay at my favorite motel last night, so I have to ask, why is absence of proof being used as proof of absence in the case of "junk" DNA? After reading the article I still don't understand why we "know" that this so called junk DNA is in fact JUNK, I understand the idea of a model being used to predict an outcome, and therefore it being useful in science but if we have occasions where this prediction is shown to be wrong, doesn't that debunk the hypothesis/model?

Unfortunately, with ENCODE bringing together so many peer-reviewers at so many institutions under one tent, there is increased likelihood that they are all just peer-reviewing each other's mediocre research. No surprise that they crown this love fest with a less-than-honest PR campaign.

There's no reason to think that peer-review had any problems here. The problem has nothing to do with the research itself, or their findings. The problem is in how their work is being promoted to the public at just about every level.

But the idea that the vast majority of the genome is transcribed, and interacts with binding proteins really is significant. If for no other reason than it buffers the availability of transcription and processing factors changing their availability for interacting with the "more important" sites (acknowledging that our ability to measure importance is very poor). In that sense, DNA is functional even if it has no intrinsic activity of its own.

It gets into the definition of "functional" and "hereditary information".

DNA is a carrier of the genome, but such buffering capacity isn't inherited under strictly selective control. It is part of the gene's environment, and as such is inherited like the rest of the cellular machinery that is passed on through eggs and asexual clones back to the first protocells.

Now on to the subject at hand, while I am not an expert on these things, I did stay at my favorite motel last night, so I have to ask, why is absence of proof being used as proof of absence in the case of "junk" DNA? After reading the article I still don't understand why we "know" that this so called junk DNA is in fact JUNK, I understand the idea of a model being used to predict an outcome, and therefore it being useful in science but if we have occasions where this prediction is shown to be wrong, doesn't that debunk the hypothesis/model?

The majority of our "junk DNA," is definitely not regulatory and it's not coding for proteins. There is definitely some non-coding DNA that's regulatory, but not all of it. We know that some of it has no effect if it's deleted or copied more times than usual. We even know where some of it came from, viruses that infected our ancestors millions of years ago. We're likely to find more regulatory DNA as we investigate the non-coding DNA, but chances are that a lot of it will still wind up being junk that doesn't do anything. That's probably why a "higher" life form like puffer fish can get by with 10% of our DNA, while a "lower" life form like onions can be onions with 7x more DNA than us (in fact, the difference in amount of DNA between some onion species can be larger than our entire genome). There's just no reason to think that all DNA does something. There's very good reason to think that it doesn't. We don't know all of which is which, but the fact that some of it is junk will almost certainly be borne out in the end.

This article should be applied to global warming. I'm sick to death of reading articles that claim that global warming is real but they point to absolutely no scientific, scholarly or peer reviewed sources.

Poe bait or troll bait? Without "absolutely no scientific, scholarly or peer reviewed sources" for the claim of such articles, we can't properly discuss them.

But the situation is hardly analogous.

AGW is well-established consensus, and articles don't necessarily need to refer to that any more than articles about gravity have to refer back to Newton's or Einstein's work.

ENCODE, by contrast, is affecting the consensus with aspects of its new information*, so it needs to be referred to in that capacity.

*For one, it may open up for observing more of the RNA world and the possibility that we retain more of it than earlier believed, because it shows how selective pressures on RNA are lower than on DNA.

Now on to the subject at hand, while I am not an expert on these things, I did stay at my favorite motel last night, so I have to ask, why is absence of proof being used as proof of absence in the case of "junk" DNA?

I'll add to what WOC discussed: when we have a predictive theory, "absence of proof [can be] used as proof of absence," or rather, the absence of an observation that the theory or the null hypothesis would lead us to expect can be used for testing.

To claim otherwise is to replace empirical science with rejected ideas from philosophy. (Here rejected by the existence of statistical hypothesis testing that contradicts the claim.) I find it especially egregious because the ideas of induction (and its "black swan" problem) are promoted from a basis of theological precepts, from the religious use of Aristotelianism in the Middle Ages.

The hypothesis that genome dark matter is evolutionarily non-functional is suggested by observation and is tested by new observation. It seems to work well, while the competing hypotheses have not had much joy in tests such as ENCODE.

Now on to the subject at hand, while I am not an expert on these things, I did stay at my favorite motel last night, so I have to ask, why is absence of proof being used as proof of absence in the case of "junk" DNA? After reading the article I still don't understand why we "know" that this so called junk DNA is in fact JUNK, I understand the idea of a model being used to predict an outcome, and therefore it being useful in science but if we have occasions where this prediction is shown to be wrong, doesn't that debunk the hypothesis/model?

The majority of our "junk DNA," is definitely not regulatory and it's not coding for proteins. There is definitely some non-coding DNA that's regulatory, but not all of it. We know that some of it has no effect if it's deleted or copied more times than usual. We even know where some of it came from, viruses that infected our ancestors millions of years ago. We're likely to find more regulatory DNA as we investigate the non-coding DNA, but chances are that a lot of it will still wind up being junk that doesn't do anything. That's probably why a "higher" life form like puffer fish can get by with 10% of our DNA, while a "lower" life form like onions can be onions with 7x more DNA than us (in fact, the difference in amount of DNA between some onion species can be larger than our entire genome). There's just no reason to think that all DNA does something. There's very good reason to think that it doesn't. We don't know all of which is which, but the fact that some of it is junk will almost certainly be borne out in the end.

Two follow-up questions: 1.) About viral DNA (as an example): we know that many bacteria are prolific "exchangers" of genetic material, and that this is in fact how they gain resistance to many antibiotics. Could such a situation occur for us, where the material our genome acquires could be used in a specific case to confer some advantage that at this point is just not commonly present in our environment today? And would we call this junk DNA at this point? 2.) Are there cases where multiple copies of a sequence confer extra capability, for example as a backup or as a reinforcing command?

third question is how do you go about testing for either the inclusion of junk dna, (considering that maybe the now inactive dna could be activated in an edge scenario that we don't normally experience), or the exclusion of such dna and therefore the invalidation of the hypothesis/model. and is the problem then one of definition only, (for example, is dna that isn't used commonly, but ONLY in edge scenarios that don't happen often considered junk, or is it something that can't be activated at all [and how do you prove that?]).

I am willing to give that we will find some DNA that are either transcription errors or stuff that got added to the sequences, like the aforementioned viruses. therefore I am willing to accept that assertion that there is such "junk" dna and that we almost certainly will find that some of our dna does nothing. But I am not willing to give that most of our DNA fits that when we find that unlike the onions you talked about, there seems to be very close correlation of genome size in the human genome between the human population. This is why the idea of junk dna (in huge % of our genome) seems counter intuitive to me. Possibly my problem is that since I don't have a good basis in Biology, I don't have a good understanding of the regulatory processes for passing on genes, chromosomes and their relevant size to each other. Sorry for the ignorance but honestly, without some of those questions being at least partially answered or the foundation to them being relatively solid (both in myself and in the general public), I can understand the difficulty of reporting about these things and do so in a manner that won't drive a Biologist or geneticist crazy.

1.) about viral dna (as an example), we know that many bacteria are prolific "exchangers" of genetic material, that in fact that is how they gain resistance to many antibiotics, could such a situation occur for us, where the material our genome acquires could be used in a specific case to confer some advantage that at this point is just not commonly present in our environment today? and would we call this at this point junk dna?

There are IIRC known examples of genes that have been pieced together from various bits.

As well as repeated use among mammals of retroviral genes that have been inserted into a germ line of cells to manipulate the female immune system during pregnancy. Some viral genes lower the immune system response, and we, as well as sheep et cetera, use them to modulate response at the barrier between uterus and placenta. As far as I know, it is unlikely that the insertion, prompted by eukaryote viral inactivation, and the fixation, prompted by use, happened simultaneously.

hpmmfs wrote:

2.) and are there cases when the multiple copies of a sequence confer extra capability? for example as a backup or as a reinforcing command?

Gene duplication is a major evolutionary mechanism. Such proteins diverge in use, for example if they are enzymes they specialize in function (which can be generic) or change template.

hpmmfs wrote:

third question is how do you go about testing for either the inclusion of junk dna

Both pseudogenes (inactivated, but still recognizable promoters, ends et cetera) and viral genes/pseudogenes (gene lineages) can be recognized by sequencing.

Sequencing works with self-replicating gene parasites (LINEs, SINEs et cetera) too, they are repeated all over the place and can be seen to increase after cell division.

hpmmfs wrote:

But I am not willing to give that most of our DNA fits that when we find that unlike the onions you talked about, there seems to be very close correlation of genome size in the human genome between the human population.

Copying during cell division keeps the genome size fairly faithfully. Chromosome doubling can happen, but survivability and fecundity is a problem. It seems some lineages like plants and rodents can do it a lot (maybe correlated with short generation cycles and many species, so many attempts), others not so much.

Somewhere among our ancestors we glued together two chromosomes that chimps retain. When that happened there was a rather large genome size change because of the rearrangement:

"The results of the chimpanzee genome project suggest that when ancestral chromosomes 2A and 2B fused to produce human chromosome 2, no genes were lost from the fused ends of 2A and 2B. At the site of fusion, there are approximately 150,000 base pairs of sequence not found in chimpanzee chromosomes 2A and 2B. Additional linked copies of the PGML/FOXD/CBWD genes exist elsewhere in the human genome, particularly near the p end of chromosome 9. This suggests that a copy of these genes may have been added to the end of the ancestral 2A or 2B prior to the fusion event.* It remains to be determined if these inserted genes confer a selective advantage." [ http://en.wikipedia.org/wiki/Chimpanzee_genome_project ]

It seems the only way to answer whether it's junk or not is to eliminate it from the genome and test viability. (Venter could do it with a mouse and claim he had created an artificial biological mouse.)