Tuesday, December 8, 2009

A De Novo Gene: Unlikely and Very Unlikely

If you scramble about 90% of a protein sequence—randomly replacing amino acids with different ones—would the protein still work? That is what evolutionists are implying in order to make sense of their theory. The problem is that evolution’s explanations for de novo genes are unlikely and very unlikely. In the case of the T-urf13 de novo gene, the two choices seem to be (i) a one in ten million shot that protein coding sequences just happened to be lying around waiting for use or (ii) only about 10% of the T-urf13 sequence really matters and you can scramble the rest with no effect.

Background

An obvious problem with evolution is that it calls for vast banks of biological programs to arise on their own. One example of this is the protein coding genes within DNA. Evolutionists usually say that these resulted from the reuse of existing protein coding genes. For instance, we are able to see in color because the photocells in our retina contain different proteins that are sensitive to different colors of light.

And how did the genes for these different proteins arise? Easy, take one such gene, duplicate it and throw in a few mutations to modify the color sensitivity. Of course there are massive problems with this narrative which evolutionists fail to recognize, but that’s another story.

Also, there is the question of where the first such gene came from. If new genes come from pre-existing genes, then where did the first gene come from? Ever since David Hume, evolutionists have argued against an infinite regress of causation so they have to have a starting point. But they have no explanation for such massive complexity beyond vague speculation which amounts to “See, poof, it happened.”

In an effort to bridge this enormous gap, evolutionists have constructed a new narrative based on de novo genes. These are genes that were not predicted but now, amazingly, evolutionists are using them as proof that evolution can indeed create new protein coding genes.

That is the argument evolutionists use for the de novo gene T-urf13 which was found in the mitochondrial genome of certain varieties of corn. The problem is that T-urf13 provides no such evidence. Indeed, if anything, it is yet another de novo gene that contradicts evolutionary theory. Let’s have a look.

Two choices: Unlikely and very unlikely

The T-urf13 gene sequence appears to come from two separate sequences already residing in the mitochondrial genome. The two sequences are in, and flanking, an RNA gene. In other words, it appears that two sequences came together, along with a short unidentified segment, to form this new gene.

But the story is more complicated than the mere reuse of pre-existing coding sequences. Under the theory of evolution, the RNA and flanking sequences are not designed to have a role in coding for proteins. Evolution does not have the foresight, for instance, to embed secondary functions for future use in the DNA information.

Evolutionists therefore cannot say the T-urf13 gene arose from the duplication of an existing protein gene. They could say that T-urf13 is a lucky strike—that the RNA and flanking sequences just happened to have protein coding properties even though they were not designed or used as such. As explained elsewhere this is unlikely (probably far worse than a one in ten million shot).

Or evolutionists can agree that, yes, the RNA and flanking sequences were not originally protein-coding segments, but mutations evolved them into a protein-coding sequence. The problem here is that we don’t find very many mutations at work. This is a difficult argument for evolutionists to make because so little sequence information was added. What we find is a couple dozen point mutations out of about 340 nucleotides (about 93% of the nucleotides are conserved), along with several insertions and deletions.

This second option is probably worse than the first option. For evolutionists would have to say that a sequence that has no protein-coding properties—that was not designed or selected for such information and therefore is no better than a random sequence insofar as protein-coding is concerned—can be converted into a protein-coding gene by swapping only a relatively few nucleotides. The resulting protein would have only a few percent of the amino acids modified, along with some insertions and deletions.

One way to test this evolutionary hypothesis would be to introduce mutations at those T-urf13 nucleotide sites that share identity with the original RNA and flanking sequences. In other words, scramble the majority of the T-urf13 gene. While we cannot know for sure, certainly our current knowledge suggests the resulting gene would be junk. You cannot scramble ninety percent of a gene and reasonably expect a folding, functioning, fitness-adding protein.

And if the mutated gene is junk, then we would conclude that T-urf13 owes its protein-coding abilities, probably in large part, to those original RNA and flanking sequences and that the evolutionary hypothesis makes little sense.

Summary

Evolution is not well supported by the scientific evidence. Yet evolutionists continue to reinterpret the evidence in creative ways to prop up the theory. In the case of the T-urf13 gene evolutionists have claimed that, in spite of the science, the gene is a result of a routine evolutionary capability to produce new genes.

Cornelius said: "But they have no explanation for such massive complexity beyond vague speculation which amounts to “See, poof, it happened.”"

He also said on Uncommon Descent (in response to somebody who had queried whether ID is science): "So Newtonian physics was not science because it had no mechanism for gravity"

So apparently Newton gets a pass and Cornelius's science endorsement, but Darwin and evolution do not, because evolution does not have an answer for everything or has not yet figured out abiogenesis.

And furthermore, we still don't get a single word from Cornelius as to what he thinks a viable explanation is for T-urf13. Nada. None. Zilch. Given that he is a member of the DI, I suppose one would assume he is a supporter of Intelligent Design in some shape or form, but given how he actually has no explanation or any actual new ideas on anything, I even wonder that at times.

Besides, isn't "poof, it happened" a rather good description of ID? All the IDer can say is that something was done by a designer - but they can provide no specifics, don't know when, don't know what, don't know how. And in the case of Cornelius, he's not even going to think or speculate on any of these questions. But apparently this is still doing science, although evolution is not.

Hey Nick, you took that as an insult? It was a joke. Sorry if I offended you. BTW, do you realize you are starting out with a religious implication that I am a creationist? You seem to start out with a bias before you do your research. Hmmmmmmmmmmmm.

Are we sure that T-urf13 evolved recently? Somewhere, someone made the comment that there are selective reasons for male sterility among plants in the wild. Could it be that T-urf13 has been around a lot longer than we think?

I ask, because it certainly looks as though the T-urf13 complex was meant to be a gate in the mitochondrial membrane, one that opened when particular proteins bound to URF13's outer binding sites. It could be that it was after the fungus evolved proteins in its toxin that could attack this gate that T-urf13 was selected against, and eventually degraded. Perhaps a dormant copy of it remained in the nuclear genome. Or perhaps copies of it remained in heteroplasmic mitochondrial genomes.

And are we sure that T-urf13 evolved from the RNA sequence and the surrounding non-coding sequences? Could it be that the T-urf13 gene devolved into them?

It's interesting that you are so taken with T-urf13 - I'm rather delighted to have struck this nerve. You seem to be having trouble with the fact that this protein shows how populated with function protein sequence space is (much more so than IDists seem to be able to accept). You also seem to be fixated on the one-in-a-trillion number, but (as has been explained to you), this number is actually very modest. (You don't even need a calculator - 1 million pollen grains per corn plant, 1 thousand plants per acre, 1 thousand acres, and 1 mitochondrial recombination event per pollen grain = a trillion events such as gave rise to T-urf13.) But I don't know why you fixate on a trillion, when other studies show that function can be had for the paltry price of less than 100 million.
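The back-of-the-envelope count in the paragraph above can be multiplied out directly. This is only a sketch using the round figures quoted in the comment; the variable names are illustrative, and the figures are the commenter's estimates rather than measured values.

```python
# Round figures quoted in the comment (the commenter's estimates, not measurements).
pollen_grains_per_plant = 10**6
plants_per_acre = 10**3
acres = 10**3
recombination_events_per_grain = 1

# Total number of recombination "trials" implied by those figures.
total_events = (pollen_grains_per_plant * plants_per_acre
                * acres * recombination_events_per_grain)

print(f"{total_events:.0e} recombination events")
```

With these inputs the product is 10^12, the "trillion" the comment refers to.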

Your closing remarks make no sense to me. Of course T-urf13 came from sequences that had nothing to do with proteins. This is just a manifestation of the zero-CSI nature of proteins (a nature that is supported by a host of direct experimental measurement and some very interesting theoretical considerations).

Aside to Bilbo - why don't you ask me these questions, somewhere where I can answer with the appropriate tools (links and the like)? You seem to be avoiding answers. Why is this?

One more thing - I apologize for not keeping up my end of this discussion. It serves me right for jumping into this at the end of the semester. I'll try to be more responsive, but I can't make any promises, what with the holidays and the January NSF deadlines looming.

"But I don't know why you fixate on a trillion, when other studies show that function can be had for the paltry price of less than 100 million."

Actually those studies don't show that--unless you're thinking of different studies than what I'm thinking of. Perhaps you could cite them. In any case you still have an unlikely story.

You have a large effective population size. That can argue that the merger of the two pre existing sequences is reasonable. It can also argue that the mutation count is reasonable.

But that's not the problem. I'm taking it as a given that T-urf13 arose from those two pre existing sequences, and then some mutations occurred.

The problem that arises is that we should not expect the result to be a functioning protein. This has nothing to do with population size.

As I discussed in the blog, you need the functioning protein to arise from some combination of the information content in (i) the pre existing sequences and/or (ii) the mutations that are added.

But in neither case is that likely, as I explained.

Your response seems to be (i) I've got a large effective population size and (ii) proteins have low CSI.

Your first point is not relevant and your second point is not empirical. You yourself just stated above the 100 million figure. That is a large number. All you have to work with is 500 kb in the genome from which your two sequences must come, and the few mutations.

"The problem that arises is that we should not expect the result to be a functioning protein"

On what basis? There are plenty of theoretical and wet-bench results that tell us exactly the opposite.

" ... your second point is not empirical"

My second point, that proteins are zero-CSI entities, is quite the empirical observation. The study that showed that 1 in 10^12 random sequences have function is a measurement of "SI". Many others have been done to boot. In all of these cases, if the ratio of functional to total sequences (Durston uses the term Nf/N) is greater than 1 in 10^150, then there is no CSI. This "line in the sand" was laid down by Dembski. The facts are that these many measurements, that yield values between 1 in 10^7 and one in 10^12 (give or take - I'm typing off the top of my head), all give the same result re: CSI in proteins. This result is that proteins have no CSI.
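The comparison described above is a simple threshold test, and it can be written out as a sketch. The function name and constant below are made up for illustration; the 1-in-10^150 line and the 10^7 to 10^12 range are the figures quoted in the comment, and exact fractions are used because 10^-150 is awkwardly small for ordinary floating point.

```python
from fractions import Fraction

# Threshold attributed to Dembski in the comment: 1 in 10^150.
UNIVERSAL_BOUND = Fraction(1, 10**150)

def exceeds_bound(functional_fraction: Fraction) -> bool:
    """Return True when Nf/N is larger than the 1-in-10^150 line,
    which is the commenter's criterion for 'no CSI'."""
    return functional_fraction > UNIVERSAL_BOUND

# The two ends of the range of measurements mentioned in the comment.
for exponent in (7, 12):
    ratio = Fraction(1, 10**exponent)
    print(f"1 in 10^{exponent}: above the bound? {exceeds_bound(ratio)}")
```

Both quoted values sit far above the threshold, which is the commenter's point; whether the threshold is the right test is the matter under dispute in the thread.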

These measurements also show that, for all practical purposes, the probability of deriving a new functional sequence (for any function) in the biosphere is essentially one. (Contrary to your claims, Cornelius, population size is most certainly a factor in this.)

The study from which the value of 1 in 40 million is taken - Kjaergaard et al., Appl Environ Microbiol 67, 5467-5473, 2001.

"Mitochondrial genomes are models for recombination dynamics. There are hundreds of copies per cell, and, as such, they can undergo a wide range of recombination events through direct and indirect repeats. Hence, mitochondrial DNA is essentially a polyploid genome, sustaining deletions and duplications, with little consequence to the general viability of the organism."

Consequently, it's not clear how relevant evolution of proteins in mitochondrial genomes is to understanding the evolution of proteins in nuclear genomes.

But Dr. Hunter, I haven't been able to find anything in the literature that explains how the URF13 complex functions as a gate ion channel apart from the pathotoxins. Are you sure that was its function?

With some of the comments, it looks like there is confusion over where the one in ten million comes from, and why population size is irrelevant to the number. Let me see if I can clarify this successfully:

One in ten million (10^7) comes from 10^12 (number of sequences before we find one functional protein) divided by 10^5 (number of T-urf13 sized genes that could possibly fit in the corn mitochondrial genome). Hunter's point is that it is extremely unlikely that the 26S ribosomal RNA gene or flanking sequence (or any RNA gene within the genome for that matter) from which the T-urf13 gene borrows contains a protein coding sequence, since “evolution does not have the foresight” to select for a protein coding sequence that is unused, and therefore the odds of the 26S ribosomal RNA gene (or the flanking sequence) containing said protein coding sequences is essentially the same as any random sequence.

Clearly, the 26S ribosomal RNA gene and flanking sequence is for the most part the same across the population (indeed, this is how we can track the origins), so the size of the population is irrelevant, as is the number of generations. Hunter is talking about a starting point in this case, which is the same for all individuals in the population. Think of it like a race, where every racer starts at the same point, and that point is randomly chosen to be anywhere from 0 to 10^12 meters away from the finish line. Hunter is saying that it is extremely unlikely that the start line is 0 meters away from finish under this scenario – and therefore the number of racers is irrelevant.
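The one-in-ten-million figure explained above is a single division, sketched here with the two round numbers from the comment. The variable names are illustrative only, and both inputs are taken as given from the discussion rather than independently checked.

```python
# Figures quoted in the comment above.
sequences_per_functional_protein = 10**12  # random sequences tried per functional hit
candidate_genes_in_genome = 10**5          # T-urf13-sized stretches that fit in the genome

# Odds that at least roughly one of the genome's stretches is already functional.
odds_against = sequences_per_functional_protein // candidate_genes_in_genome

print(f"about 1 in {odds_against:,}")
```

Note that population size does not appear anywhere in this calculation, which is the point the comment is making about the starting sequence.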

statemachinist (nice handle, by the way), those calculations are for constructing T-urf13 by randomly mutating the mitochondrial DNA until one of the iterations produces T-urf13.

Since T-urf13 was pretty obviously made by combining chunks of two pre-existing pseudo-genes and a little random garbage, you have to wonder why the calculation was made in the first place, since the new gene wasn't constructed that way.

I would guess that not understanding the problem and the well-known ID fascination with huge, but inapplicable, odds is responsible. See anything written by Dembski for examples.

You said: "therefore the odds of the 26S ribosomal RNA gene (or the flanking sequence) containing said protein coding sequences is essentially the same as any random sequence."

This is my point exactly. One can shuffle the mt genome to get any number of different, essentially random sequences. If the numbers (= population) are large enough, then the probability of hitting on a functional one (T-urf13) are pretty good.

Thanks, I wanted to use "StateMachine" as my WordPress handle, but thankfully that was taken and forced me to come up with a more "clever" handle.

Anyhow, Hunter's calculations are certainly NOT the probability of "constructing T-urf13 by randomly mutating the mitochondrial DNA until one of the iterations produces T-urf13". This is what I was trying to point out. He is calculating the probability that the protein-coding sequence "just happened to be lying around waiting for use"...that is, one of the sequences common to the corn mitochondrial genome ALREADY contained the protein coding sequence prior to T-urf13 appearing.

Indeed, he makes no attempt to try to calculate the odds of it being constructed via point mutations and reshuffling, because he considers it far too improbable, and I assume because there are far too many scenarios to consider.

On my blog (which I created solely for this topic), I recently attempted to estimate the probability for four different scenarios, including one in which 90% of the necessary sequence was "borrowed" from a pseudo-gene.

aghunt:

"One can shuffle the mt genome to get any number of different, essentially random sequences. If the numbers (= population) are large enough, then the probability of hitting on a functional one (T-urf13) are pretty good."

This is where I take issue, because it assumes a point mutation rate of 1 per nucleotide...or that a child's genome is completely independent of the parent's. Of course, we know from genetics that a child does not have access to any random sequence, that in fact there is a much higher probability of getting a sequence that is only one mutation away (whether it be reshuffling from another location or simply a point substitution) from the parent genome than a sequence that is several mutations away. Since we can expect that a random starting sequence (indeed, we are bound by our starting location!) will be several mutations away from our target sequence, the probability of getting that target sequence is much less likely than if we randomly selected a new sequence each time (because, once again, many of the same “wrong” sequences will occur at high frequency). This is why mutation rates are taken into account in the first place.

"But I don't know why you fixate on a trillion, when other studies show that function can be had for the paltry price of less than 100 million. [...] The study from which the value of 1 in 40 million is taken - Kjaergaard et al., Appl Environ Microbiol 67, 5467-5473, 2001."

First, perhaps you misread the paper. They found an identical insert (9-residue sequence in plasmids pKKJ106 and pKKJ116). There are, roughly, half a million-million such sequences. I suspect they tested more than that many sequences. I don't think your number of a hundred million sequences makes sense. That would make it quite an unlikely finding (a 20,000 to 1 shot).
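The "half a million-million" figure for 9-residue inserts can be checked with a one-line count, assuming the 20 standard amino acids at each position (an assumption; the comment does not spell out how the number was obtained).

```python
AMINO_ACIDS = 20   # assumed: the 20 standard amino acids
INSERT_LENGTH = 9  # residues in the insert discussed above

# Number of distinct 9-residue peptide sequences.
possible_inserts = AMINO_ACIDS ** INSERT_LENGTH

print(f"{possible_inserts:,} possible {INSERT_LENGTH}-residue sequences")
```

That is 5.12 x 10^11, or roughly half of 10^12.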

Second, they screened for zinc binding--simpler than ATP binding in the Keefe study. And both are simpler than complete protein function.

"But Dr. Hunter, I haven't been able to find anything in the literature that explains how the URF13 complex functions as a gate ion channel apart from the pathotoxins. Are you sure that was its function?"

No, I'm not sure. Perhaps there is more to it, but for now I'm merely responding to AG's claims that the gene demonstrates evolution at work, which is clearly problematic.

"These measurements also show that, for all practical purposes, the probability of deriving a new functional sequence (for any function) in the biosphere is essentially one. (Contrary to your claims, Cornelius, population size is most certainly a factor in this.)"

No, population size is not a factor because we're already granting you all the sequence variations you want, within the observed number of mutations. More below...

"This is my point exactly. One can shuffle the mt genome to get any number of different, essentially random sequences. If the numbers (= population) are large enough, then the probability of hitting on a functional one (T-urf13) are pretty good."

That sounds like quite a mutational load. But whether or not we *can* shuffle the mt genome to get 10^xx random sequences is not terribly relevant to T-urf13 because it is not an entirely new sequence with no similar sequences to be found. It is not a random sequence that happened to be functional. Instead, it comes from two pre existing sequences. Those two sequences were not created by your massive shuffling (for example, they are found in other species).

Now after they merge you do have a few mutations you can add into the sequence but not many. So here are your choices:

1. You can say the pre existing sequences, though not protein-coding, just happened by luck to have protein-coding information. That's a long shot because the corn mt genome isn't very big, and the chance of such protein-coding information luckily showing up is small. Population size is irrelevant: those sequences have been around; they didn't just show up out of nowhere due to massive mt genome shuffling.

2. You can say the pre existing sequences don't help much and have the mutations do the heavy lifting. Again, population size is not relevant. You can mutate those sites all you want. But would you ever find a functional protein? We certainly cannot say this is impossible. Perhaps it is true, but it would be surprising. And it could be tested. Just randomly shuffle the other ~90% of the gene and see if URF13 still functions.

The point is not that we have certainty, but that the evolutionary conclusion that the evolution of T-urf13 is obvious is not a very good conclusion.

Statemachinist, I think the odds of a correct sequence just happening to be lying around are 1/the number of possible permutations of the corn genome or close to it.

Dr. Hunter, why do you think getting a usable protein is so unlikely? URF-13 is made from chunks of either one or two broken RNA genes (not sure without looking it up). The fact that they're RNA genes means the codons are already arranged in groups of three, just like protein genes. The fact that the original RNA gene worked means that most of the trash sequences such as CCCCCCCCCCCCC and such have already been weeded out because they don't code for usable RNA. Finally, you can probably change a very large portion of the URF-13 protein without losing function because of the way proteins work. Given the huge number of corn plants and the huge number of cells in each corn plant and the large number of mitochondria in each corn cell, it's not surprising that a functional protein was produced.

And then there's the problem of a supposedly intelligently designed protein whose main function seems to be making the corn succumb to an invading microorganism that normal corn plants are immune to and, frankly, I'll vote for evolution every time.

"Dr. Hunter, why do you think getting a usable protein is so unlikely?"

Because we know this from experiments. Even the simplest of functions (mere ATP binding) requires a million million random sequences to find. And ATP binding does not make for a fully functioning protein.

"URF-13 is made from chunks of either one or two broken RNA genes (not sure without looking it up)."

It is mainly made from two chunks, the smaller one from an RNA gene (not broken), the other from a flanking sequence of that same gene.

"The fact that they're RNA genes means the codons are already arranged in groups of three, just like protein genes."

No, RNA genes are not arranged in codons. The information content of an RNA gene and a protein-coding gene are, as far as we understand, completely unrelated. That is one of the facts that should make us suspicious. How curious it would be for evolution to have constructed such a long shot.

"Given the huge number of corn plants and the huge number of cells in each corn plant and the large number of mitochondria in each corn cell, it's not surprising that a functional protein was produced."

No, it would be a big mutational load if mt genomes were mutating fast enough to discover new proteins that quickly. Furthermore, the large numbers to which you refer aren't relevant anyway because these pre existing sequences were not created in this line of corn.

"And then there's the problem of a supposedly intelligently designed protein whose main function seems to be making the corn succumb to an invading microorganism that normal corn plants are immune to and, frankly, I'll vote for evolution every time."

I wouldn't call that the "main" function. Certainly that is the function that catches our attention. In any case, this is a good illustration of evolutionary thought. The science is dubious, but the metaphysics is certain. Your point hinges on the assumption that God would not design anything that could have material weaknesses, inefficiencies, etc.

We can use science to elaborate on the quality of designs (weaknesses, inefficiencies, etc) but we can't use science to decide which designs would and would not be implemented.

It should be noted that, contrary to Cornelius' implication here, "mere" ATP binding is more than an adequate proxy for protein function (for that matter, so is Zn binding). This is because "full" protein function - catalytic activity - is a matter of binding. Specifically, it is one of binding of the transition state between substrate and product. There is no reason to think that transition state binding is going to be so much more rare than, say, ATP binding. So this particular objection is really a non-issue.

As for Zn binding, I invite Cornelius to count the number of enzymes that use divalent metal ions in their catalytic mechanism. As is the case with ATP binding, Zn binding is actually an excellent proxy for divalent metal coordination centers, domains that possess numerous chemical catalytic activities.

The real facts, derived from decades of empirical experimental research, tell us precisely the opposite of what Cornelius is asserting here.

"It should be noted that, contrary to Cornelius' implication here, 'mere' ATP binding is more than an adequate proxy for protein function (for that matter, so is Zn binding). This is because 'full' protein function - catalytic activity - is a matter of binding. Specifically, it is one of binding of the transition state between substrate and product. There is no reason to think that transition state binding is going to be so much more rare than, say, ATP binding. So this particular objection is really a non-issue."

Catalytic activity is irrelevant for URF13. And in general, a great many proteins do more than a single binding. Usually a protein has multiple binding sites. This claim that a single ATP or Zn binding site is "more than an adequate proxy for protein function" is false, but a great illustration of how evolution influences science.

"As for Zn binding, I invite Cornelius to count the number of enzymes that use divalent metal ions in their catalytic mechanism. As is the case with ATP binding, Zn binding is actually an excellent proxy for divalent metal coordination centers, domains that possess numerous chemical catalytic activities."

But that was not the issue. Of course Zn binding is a legitimate example of an enzyme using a divalent metal. But that is not the enzyme's function. The binding of the divalent metal is only one step in helping the enzyme perform its function. The enzyme also needs other binding sites.

"The real facts, derived from decades of empirical experimental research, tell us precisely the opposite of what Cornelius is asserting here."

Actually I've said nothing that is even controversial, let alone in opposition to "real facts." Again, this is a good example of how evolution influences science.

You wrote: "the information content of an RNA gene and a protein-coding gene are, as far as we understand, completely unrelated" If you think rRNA and protein-coding RNA are that different, I suggest you read this. It shows that many RNAs serve dual roles as coding and non-coding RNA, and that in many cases it is very difficult to tell the difference between coding and non-coding RNA.

But as far as we know, information required for a functional string of amino acids, via the genetic code, is unrelated to information required for a functional RNA molecule. We have no theoretical reason to think this information content is related.

OTOH, there is quite a bit of degeneracy, so there could be some overlap between those two different requirements. The T-urf13 gene, as well as the paper you cite, demonstrate such overlap.

So are we to believe that evolution first happened to evolve RNA and DNA structures and information, useful for functional RNA molecules, and then it evolved the genetic code, and protein translation--a completely different way of making molecular machines--and yet the information needed for these new kind of machines just happened sometimes to overlap with the information needed for the RNA molecules?

I hope the point is obvious. How curious is this overlap--why should there be any overlap? I'll post more on this, thanks.

Cornelius, my point was that, because of the overlap between rRNA and protein-coding RNA, the odds that a functional sequence was just lying around in the corn rRNA are much lower than you have posited. Do you agree with this point?

No, I don't agree. If you hit the jackpot more than once, does that make the first time more probable? No, it doesn't.

Since (i) functional coding and non coding DNA sequences are rare, (ii) the pathways they take to produce functional molecules are completely different such that there is no known physical reason why there should be an overlap, and (iii) evolution has no foresight, then there is no reason to expect significant overlap.

IOW, we have no reason to think that the probability of a random sequence being a functional coding sequence is dependent on the probability of that sequence being a functional non coding sequence. All indications are that these two probabilities are independent.

And the probability of two rare, independent events both occurring is much smaller than the probability of just one of them. That is, the probability of a random sequence being both functional coding and non coding is much smaller, if they are independent events as we would think they are.

OTOH, if they are dependent, then the probability isn't so much smaller. Consider the extreme case where they are completely dependent -- if a sequence codes for a functional protein, then it necessarily also codes for a functional RNA.

We know this is not the case, but it gives us a bounding value. In this case, then you still have two rare events, but the probability is just the probability of one of the events.
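The two bounding cases described above can be made concrete with a small sketch. The function name is hypothetical, and the input probabilities are placeholders chosen purely for illustration (the 10^-12 figure echoes the number used elsewhere in this thread; nothing here measures the real values).

```python
from fractions import Fraction

def joint_probability_bounds(p_coding: Fraction, p_noncoding: Fraction):
    """Return (independent, fully_dependent) probabilities that one random
    sequence is both a functional coding and a functional non-coding sequence."""
    # Independent events: the probabilities multiply.
    independent = p_coding * p_noncoding
    # Extreme dependence as described in the comment: every functional coding
    # sequence is also a functional RNA, so the joint probability is p_coding.
    fully_dependent = p_coding
    return independent, fully_dependent

# Placeholder values for illustration only.
p = Fraction(1, 10**12)
independent, dependent = joint_probability_bounds(p, p)
print(f"independent:     1 in {independent.denominator:.0e}")
print(f"fully dependent: 1 in {dependent.denominator:.0e}")
```

Under independence the joint event is far rarer than either single event; under complete dependence it is only as rare as one of them, which is the bounding value the comment describes.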

So the probability has not improved (increased) merely because we discover dependence in these two events. It just didn't get worse. What's more, if there is dependence, your avoidance of the lower probability comes at the cost of a suspicious cleverness built into the system.

Remember, the coding and non coding sequences take completely different pathways to create their respective macro molecules (eg, one uses the genetic code and amino acids, the other doesn't). But yet, somehow, we discover dependence, such that a sequence that provides a functional molecule using the one path, will also tend to produce a functional molecule using the other path? That would be very suspicious.

None of this accords well with evolution. You either have very rare sequences formed, or you have rare sequences formed along with a rigged system.

Cornelius, the probability that the ancestors of the t-urf 13 gene were rRNA is 1.0. So that probability is irrelevant. however, because there is an overlap between functional sequences and rDNA sequences (maybe there is no theoretical reason why there should be (I'll have to look into it), but there is anyway), the odds that a functional sequence would be found within an rRNA sequence is higher than it would be in a random sequence. so the odds you have been trumpeting based on an analysis of random sequences are not realistic, bc the ancestors of t-urf 13 were not random. Can you explain why you disagree with this point without getting into the probability of the rRNA molecules themselves forming? that really isn't relevant to how the turf-13 formed from them after they were already there.

"because there is an overlap between functional sequences and rDNA sequences (maybe there is no theoretical reason why there should be (I'll have to look into it), but there is anyway), the odds that a functional sequence would be found within an rRNA sequence is higher than it would be in a random sequence."

I believe aghunt disagrees with this statement (that there is this overlap), based on his response to me above (Dec 17, 5:39). Why would this overlap be unsettling for the evolutionist? Because...

1) If there is an overlap with no theoretical or physical reason, it would be much more in line with something akin to a "front-loading" hypothesis.

2) Hunt used this gene as a counter-example to Behe's EoE. But if the sequence was only one or two point mutations away from containing the protein-coding sequence, this would hardly contradict Behe's "edge".

"Can you explain why you disagree with this point without getting into the probability of the rRNA molecules themselves forming? that really isn't relevant to how the turf-13 formed from them after they were already there."

Yes, if there is a strong overlap to begin with, you greatly increase your odds. But I don't believe evolutionists want to concede this overlap (for the two reasons above), so the argument is that only a small random sequence was borrowed from the 26S ribosomal RNA gene, and that when randomly reshuffling the flanking sequence and adding in a few point mutations, THAT's when you get your resulting t-urf gene.

You state, "Yes, if there is a strong overlap to begin with, you greatly increase your odds." Thank you for agreeing with me. Have you read the PLoS paper I cited? If so, you can decide for yourself whether there is overlap and not let Art or Cornelius make your decision for you. It strikes me that there is solid evidence of some overlap. We can discuss that first, and how it affects Art's argument against Behe later.

"the probability that the ancestors of the t-urf 13 gene were rRNA is 1.0."

Just to be clear on this, the bigger chunk came from a flanking sequence of the rRNA gene. The smaller chunk came from the rRNA gene. I don't think that changes your point though.

"maybe there is no theoretical reason why there should be (I'll have to look into it)"

It will be difficult because the signals are weak. Make sure to run aggregate tests over a large set of genes to boost the signal. The biggest signal in protein sequences is the anti clustering of the hydrophobics. So you might start by testing for anti clustering of the hydrophobic codons in RNA genes, flanking sequences, etc. A sliding window of 9 codons (ie, 27 nucleotides long) should be good for starters. Of course make sure to test all 6 reading frames. Use the binomial distribution to construct the null hypothesis (random) distribution of the number of hydrophobic codons per window (which ranges from 0 to 9), with the codon probabilities taken from the observed, aggregate frequencies of all the codons. Then compare it with the observed window counts to see if there is a statistically significant difference (can use Chi^2 test for example). For instance, you could divide all the codons into 2 groups: hydrophobic coding and not. Also, you might consider the structure of the genetic code. If thymine is absent from both the 1st and 2nd positions of the codon then it is not a hydrophobic.
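A minimal sketch of that test in Python might look like the following. Everything here is an assumption made for illustration: the function names are hypothetical, the input is taken to be plain A/C/G/T strings, and a codon is counted as hydrophobic when its second base is T (which covers Phe, Leu, Ile, Met and Val), a simpler rule than a full hydrophobicity scale.

```python
from math import comb

WINDOW = 9  # codons per sliding window (27 nucleotides)

def reverse_complement(seq):
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))

def six_frames(seq):
    """Codon lists for the 3 forward and 3 reverse-complement reading frames."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append([strand[i:i + 3]
                           for i in range(offset, len(strand) - 2, 3)])
    return frames

def is_hydrophobic(codon):
    # Assumption: second base T => Phe, Leu, Ile, Met or Val.
    return codon[1] == "T"

def window_histogram(flags, window=WINDOW):
    """hist[k] = number of windows containing exactly k hydrophobic codons."""
    hist = [0] * (window + 1)
    for start in range(len(flags) - window + 1):
        hist[sum(flags[start:start + window])] += 1
    return hist

def binomial_expected(n_windows, p, window=WINDOW):
    """Expected window counts under the binomial null hypothesis."""
    return [n_windows * comb(window, k) * p**k * (1 - p)**(window - k)
            for k in range(window + 1)]

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def anti_clustering_statistic(sequences):
    """Aggregate over all sequences and all six frames."""
    observed = [0] * (WINDOW + 1)
    n_hydrophobic = n_codons = 0
    for seq in sequences:
        for codons in six_frames(seq.upper()):
            flags = [is_hydrophobic(c) for c in codons]
            n_hydrophobic += sum(flags)
            n_codons += len(flags)
            for k, count in enumerate(window_histogram(flags)):
                observed[k] += count
    p = n_hydrophobic / n_codons  # observed aggregate frequency
    expected = binomial_expected(sum(observed), p)
    return observed, expected, chi_square(observed, expected)
```

Note that overlapping windows are not independent, so the Chi^2 value from a sketch like this is only a rough guide; stepping the window by 9 codons, or comparing against shuffled sequences, would be a safer way to get a significance level. Anti clustering would show up as a window distribution that is narrower than the binomial one.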

"the odds that a functional sequence would be found within an rRNA sequence is higher than it would be in a random sequence. so the odds you have been trumpeting based on an analysis of random sequences are not realistic"

OK, yes. Given the assumptions that RNA genes and flanking sequences have some tendency to contain protein coding segments, then the corn mt genome in question here will have higher probability of containing protein coding segments that could be used to assemble a new protein such as T-urf13, and the probabilities I used, based on the assumption of random sequences, are too pessimistic (too low). But, I hope it is obvious that all we have done is shifted the heroics elsewhere. IOW, we pay a price for our increased probabilities. We increase our probabilities at the cost of the pre existence of information bearing sequences. When I turn up the thermostat the probability of warm air coming out of the wall is high, but that doesn't mean it just happens simply because the probability is high. Likewise, the probability of the T-urf13 gene is higher only because that coding information has been implemented at an earlier stage.

nanobot, I am afraid that the paper you pointed to isn't particularly relevant to T-urf13. ncRNAs and rRNA are two completely different beasts. This is more so for nuclear ncRNAs and mitochondrial rRNAs.

Cornelius, as for the adequacy of ATP/Zn binding as proxies for protein function, your assertions to the contrary are unsupported by any sort of experimental data. I don't think you can come up with a single protein whose function cannot be reduced to binding sites that are akin to one or both of these*. Moreover, the expected appeal to combinatorial unlikelihood falls flat, once you properly calculate probabilities (something you have yet to do in this discussion).

The facts remain simple and obvious - random shuffling of the maize mitochondrial genome yielded an IC system consisting of at least three "CCC's", and a paltry number of events was needed. These simple facts tell us plainly that protein function is much more "probable" than IDists claim.

* - I can do this. But the examples I have in mind are far more damaging to ID theory than the fascinating system we are discussing.

aghunt, if ncRNA and rRNA are two different beasts, then why do the authors include rRNA as a class of ncRNA? Why is it also included as such in Wikipedia?

Also, correct me if I'm wrong, but are you suggesting that the mitochondrial genome randomly shuffles itself every generation? There is obviously some mutation/recombination, but that kind of error rate is not sustainable.

The ncRNAs that Mattick talks about as being multifunctional (being mRNAs and structural RNAs) do not include stable RNAs such as rRNAs, tRNAs, snRNAs, snoRNAs, etc. These latter classes are technically non-coding RNAs, but they differ in many important aspects from the RNAs that you are using as a possible paradigm to explain T-urf13. Stable RNAs have different 5' and 3' modifications, they are transcribed by different RNA polymerases, they are processed differently, etc., etc., etc.

As far as plant mitochondrial genomes and their gymnastics, it has been known for decades that these genomes in fact perform remarkable feats of rearrangement. You state that the accompanying error rate would be unsustainable - I wonder if you know of any experimental data that would support your claim.

"Cornelius, as for the adequacy of ATP/Zn binding as proxies for protein function, your assertions to the contrary are unsupported by any sort of experimental data."

Of course they are. Proteins have all kinds of functional needs beyond a single binding site. I've already mentioned multiple binding sites. And as for ion channels (relevant to this discussion), the channel residues must provide the appropriate free passage to the desired chemicals and discrimination against the undesired ones. Furthermore, the exterior residues must provide the stability to reside in the membrane. Your assertion is an example of how incredibly damaging evolution is to basic scientific thinking.

"Moreover, the expected appeal to combinatorial unlikelihood falls flat, once you properly calculate probabilities (something you have yet to do in this discussion)."

You expressed concern that the assumption that 10^12 (a million million) random sequences are required to find a single functional protein is too conservative. You said the value should be close to 5 or 6 orders of magnitude smaller, like ~10^7. I explained that your concern is off base (the paper you cited does not support your concern, and it would be a bizarre finding given what we know from other experiments).

nanobot, as if on perfect cue, an interesting review came out recently in the Journal of Experimental Botany - Pubmed should call it up with a search for Woloszynska.

Cornelius, I fear we are just going round in circles here. For some reason, you seem to think that the probabilistic resources present in a few acres of corn cannot approach the 10^12 number you are fixated on, even though some simple math tells us that you are wrong. Moreover, you are asserting, without any reference to experiments that support your claims, that the expected "probability" for the origination of a gated ion channel is far, far lower than the data tell us. The numbers are clear (and I have spelled this out on my blog) - the ID claims for information in proteins are not consistent with what we know about the history and functioning of T-urf13.

Of course, you are skirting the issue here. You want to say that something other than random shuffling of the mt genome is responsible for the origination of T-urf13, but you are unwilling to explicitly say what, or how, or who, or why. Until you can come up with something tangible (that would be repeatable and controlled experiments) that argues against the facts as we know them (these include the history of the protein, the details of how it works, the larger body of work that shows that protein function is not impossibly rare, the known mechanisms of mt genome rearrangements, just to rattle off a few items), then we are probably at the end of the discussion.

"For some reason, you seem to think that the probabilistic resources present in a few acres of corn cannot approach the 10^12 number you are fixated on, even though some simple math tells us that you are wrong."

Though this is not directed at me, it is related to the point I made earlier (Dec 17th, 11:40) and a similar point that nanobot was alluding to -- that each new reshuffled/recombined/mutated genome is in fact NOT completely random, but is in some way contingent upon the genomes of the parents. If it were completely random (a mutation rate of 1 per nucleotide, for example), there'd be no such thing as inheritance. Since it isn't, a simple calculation that the number of mt genomes in all available pollen grains is greater than 10^12 is insufficient to suggest a high likelihood of hitting a "correct" sequence. After all, many of the sequences that have high overlap with the parents are going to occur at a high frequency, while those that are several mutations away will occur at much lower frequency.

Consider a simplified example of a 20 nucleotide sequence in the genome, with a per nucleotide mutation (substitution) rate of 10^-8, and this type of mutation is the only type of change. Under this scenario about 99.99998% of new genomes will contain the exact same 20-letter sequence, whereas only 1 out of every 10^21 or so will differ from the parent by 3 or more letters. According to your blog post, your estimate that you claim is on the high side is 10^19 different mt genomes available in all the pollen grains. In this amount, given the above scenario, we would not even expect to see 1 of the genomes differ by 3 or more letters in the specified sequence -- hardly a dent in the 10^12 different possible sequences.
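For anyone who wants to check the arithmetic, here is a small Python sketch of that simplified scenario. It assumes each of the 20 sites mutates independently at the stated rate, so the number of changed sites is binomial; the function and variable names are just illustrative.

```python
from math import comb

def prob_exact(n, k, p):
    """Binomial probability of exactly k changed sites out of n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(n, k_min, p):
    # Summed term by term rather than as 1 - P(fewer), because the
    # result is far smaller than floating-point rounding error near 1.
    return sum(prob_exact(n, k, p) for k in range(k_min, n + 1))

SITES = 20      # length of the sequence in nucleotides
RATE = 1e-8     # per-nucleotide substitution rate per generation
GENOMES = 1e19  # the "high side" estimate of mt genomes in the pollen

p_unchanged = prob_exact(SITES, 0, RATE)
p_three_plus = prob_at_least(SITES, 3, RATE)
expected_hits = GENOMES * p_three_plus

print(f"P(sequence unchanged)       = {p_unchanged:.9f}")
print(f"P(3 or more sites differ)   = {p_three_plus:.3e}")
print(f"expected genomes with >= 3  = {expected_hits:.4f}")
```

The tail is dominated by the exactly-3 term, C(20,3) x 10^-24, or about 1.1 x 10^-21, so among 10^19 genomes the expected number with 3 or more differences is on the order of 0.01.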

Now, granted, there are many other forms of recombination than simple point substitutions. The paper you mention -- if I'm looking at the abstract for the correct one -- suggests that heteroplasmy is used to compensate for the lack of sexual recombination, among other things. Interestingly, in another paper from Molecular Biology and Evolution, 2007 (search on Barr to find it), I found the following quote in the abstract: "Contrary to recent studies, we found unconvincing evidence of recombination in the mitochondrial genome, and generally confirm the standard model of plant mitochondria characterized by low substitution rates and no recombination." Granted, I may be misinterpreting the quote, or perhaps these results have since been proven false. Regardless, we know that child genomes are not completely random, because of the fact that a) genomes within a species have a high level of similarity to one another and b) in the case of t-urf13, large chunks of two sequences are intact from "ancestor" genomes, and in this way we are able to determine the origin of the gene.

Because of this, we know that some sequences will occur with an extremely high frequency, taking up the vast majority of the "probabilistic resources" from which we can get our "correct" sequence. Thus, showing that 10^19 >> 10^12 is insufficient.

"Of course, you are skirting the issue here. You want to say that something other than random shuffling of the mt genome is responsible for the origination of T-urf13, but you are unwilling to explicitly say what, or how, or who, or why. Until you can come up with something tangible (that would be repeatable and controlled experiments) that argues against the facts as we know them (these include the history of the protein, the details of how it works, the larger body of work that shows that protein function is not impossibly rare, the known mechanisms of mt genome rearrangements, just to rattle off a few items), then we are probably at the end of the discussion."

This is, unfortunately, a common mode of argument amongst evolutionists. They claim their theory is a fact (or some similar non scientific claim) without backing it up, and when this is pointed out they blame the skeptic for not providing a solution. Sorry, you're the one making the claim, and it doesn't make sense.

You said T-urf13 is an example of how evolution produces de novo genes. Your supporting argument consists of a calculation showing that there is an astronomical number of mitochondria in this line of corn (and therefore a huge number of mitochondrial genome mutations) at play. What you did not do is spell out how this is supposed to prove your point.

Did one lucky mitochondrion just happen to have an incredible string of mutations that produced T-urf13, which just happened to have two segments that are highly similar to pre existing non coding sequences in the genome?

No, you agree that T-urf13 arose from those two pre existing sequences, with a few dozen mutations added to the mix. So you agree that the number of mutations, while not trivial, is not huge. This means that your astronomical number of mitochondria in particular, and population size in general, are not relevant to this problem. That the mutations occurred is not the issue.

Of course I pointed this out several times and all you do is remain in denial. You write:

"For some reason, you seem to think that the probabilistic resources present in a few acres of corn cannot approach the 10^12 number you are fixated on, even though some simple math tells us that you are wrong."

I have pointed out several times, but you remain conveniently oblivious, that no one is arguing that the probabilistic resources could not have generated the observed mutations. The problem is NOT the mutations, but that the preexisting non coding sequences just happen to code for a protein. As I explained several times, but you failed to respond to, evolution does not expect this because there is no planning ahead with evolution. You're not going to have protein coding sequences serendipitously lying around. You then write:

"the ID claims for information in proteins are not consistent with what we know about the history and functioning of T-urf13."

This is completely irrelevant. The 10^12 value is not at all controversial. Now it is, of course, only a rough estimate for a complicated question. We simply don't have the knowledge to support a precise calculation. But even an outer-bound, conservative calculation shows that the evolutionary scenario is several orders of magnitude short. It is *extremely* unlikely. And that is using the empirically observed 10^12 number. You then erroneously questioned the 10^12 value with a study that showed no such thing.

Now perhaps there was a flaw in the experiment that produced the 10^12 number, and perhaps the evolutionary scenario, in spite of what we know, actually has high probability. Who knows? But until this is demonstrated with some reasonable, empirically-based, assumptions and calculations, we can't in good conscience tell the readers this is so. As scientists we have an important duty to serve the public trust.