Sunday, April 1, 2012

Here is a Completely Different Way of Doing Science

That new UCSF paper is yet another example of the intricacies of DNA, and it gives us a teaching moment on how science can be done in a completely different way. Consider the blood protein hemoglobin, found throughout the vertebrates and illustrated here to show its four protein chains and their many helices. Each of the four chains has about 140 amino acids, which are joined together in a long line and then fold up into a globule. Those 140 amino acids are encoded by the corresponding DNA gene, in which three DNA nucleotides code for each amino acid. For instance, the figure below shows a short segment of a human and a horse gene, both of which code for a hemoglobin chain. The letters “ACGCT …” represent the nucleotides, the small molecules that make up the DNA strand. The letters A, C, G and T stand for adenine, cytosine, guanine and thymine, respectively.

As you can see, these two genes are quite similar. Of the 21 nucleotides shown in this short segment, only three are different between the human and horse. As shown, three nucleotides at a time code for a single amino acid and the table below shows the codes. For instance, as highlighted in both figures, guanine-adenine-guanine (GAG) codes for the amino acid Glu, or glutamic acid.

Notice that one of the differences between the human and horse genes results in a different amino acid (tyrosine in the human and phenylalanine in the horse), but the other results in no change (glycine in both). Different nucleotide sequences can code for the same amino acid because the DNA code is redundant. In this case, the glycine amino acid has four different codes, which you can see at the bottom right of the table above (GGT, GGC, GGA and GGG).
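The synonymous/nonsynonymous distinction can be made concrete with a short sketch. The two nine-nucleotide sequences below are hypothetical stand-ins for the human and horse segments (the actual hemoglobin sequences are not reproduced here), chosen so that one difference changes the amino acid (Tyr versus Phe) and one is silent (Gly in both):

```python
# Sketch: classify codon differences between two aligned coding
# sequences as synonymous or nonsynonymous.

BASES = "TCAG"
# Standard genetic code in TCAG nested order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def classify_differences(gene_a, gene_b):
    """Compare two aligned coding sequences codon by codon."""
    results = []
    for i in range(0, len(gene_a), 3):
        codon_a, codon_b = gene_a[i:i+3], gene_b[i:i+3]
        if codon_a == codon_b:
            continue
        aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
        kind = "synonymous" if aa_a == aa_b else "nonsynonymous"
        results.append((codon_a, codon_b, aa_a, aa_b, kind))
    return results

# Hypothetical segments: codon 2 differs (Tyr vs Phe, nonsynonymous),
# codon 3 differs silently (Gly in both, synonymous).
human_like = "GAGTATGGT"
horse_like = "GAGTTTGGC"
for row in classify_differences(human_like, horse_like):
    print(row)
```

The classification depends only on the codon table, so the same sketch works for any pair of aligned coding sequences.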

The DNA code is redundant because it has 64 different codons (the three letter words) that code for only 20 amino acids and a stop signal. There are more codons available than amino acids to code for.
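This redundancy is easy to verify by tallying the standard genetic code table itself, as in this minimal sketch:

```python
# Sketch: tally the standard genetic code to show its redundancy:
# 64 codons map onto only 20 amino acids plus a stop signal.
from collections import Counter

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

degeneracy = Counter(CODON_TABLE.values())

print(len(CODONS))          # 64 codons in total
print(len(degeneracy) - 1)  # 20 amino acids (excluding the stop '*')
print(degeneracy["G"])      # glycine has 4 codons: GGT, GGC, GGA, GGG
```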

For evolutionists this redundancy was just another biological kludge revealing nature’s dysteleology. Their natural expectation was that mutations that produced no change in the amino acid sequence—the so-called synonymous mutations—would be worthless and discarded by evolution. The massive change required by evolution would come about by altering the amino acid sequences of proteins, and so the gene comparisons between species would mostly reveal mutations that did produce different amino acids—the so-called nonsynonymous mutations.

It was yet another in a long line of failed expectations. In fact, gene comparisons between different species, as exemplified by the human and horse hemoglobin genes above, revealed that nonsynonymous sites are disproportionately more conserved than synonymous sites, sometimes by an order of magnitude or more.

This finding became an impetus for the theory of neutral molecular evolution, which held that useful mutations are exceedingly rare and that evolution proceeds mainly via mutations that are essentially neutral.

Decades earlier, evolutionists had expected that mutations could produce all kinds of new designs, but laboratory experiments suggested otherwise. Mutations were rarely very useful and often harmful, so it was fitting that this finding was incorporated into the neutral theory.

But how then could evolution proceed if mutations were merely neutral? The idea was that neutral mutations would accrue until an earthquake, comet, volcano or some such event caused a major environmental shift that could suddenly make use of all those neutral mutations. Suddenly, those old mutations went from goat to hero, providing just the designs that were needed to cope with the new environmental challenge. It was another example of the incredible serendipity that evolutionists call upon.

Too good to be true? Not for evolutionists. The neutral theory became quite popular in the literature. The idea that mutations were not brimming with cool innovations but were mostly bad or at best neutral went, for some, from anathema to orthodoxy. And the idea that those neutral mutations would later magically provide the needed innovations became another evolutionary just-so story, told with conviction as though it were a scientific finding.

Another problem with the theory of neutral molecular evolution is that it made even more obvious the awkward question of where these genes came from in the first place.

Mutation rate deviations

The discovery that DNA mutations can be synonymous or nonsynonymous also raised questions about the mutation rate. Since the neutral theory viewed synonymous mutations as having little or no effect, they were interpreted as an indicator of the underlying mutation rate. That is, when the hemoglobin genes of two different species were compared, the synonymous differences were viewed as mutations that had occurred and had not been rejected by evolution, since they were not harmful.

But if so, then evolution’s mutation rate would have to vary wildly, because those synonymous genetic differences were inconsistent between different species, and even within the genome of a given species.

Hence, evolutionists speak of substantial deviations in the basic mutation rate between different species, different genes, different locations within the genome and different locations within the cell (nuclear genes versus organelle genes). Evolutionists constructed all kinds of just-so stories to explain these wild variations.

There are many other problems when we use evolution to try to understand molecular biology. For instance, a few years ago researchers found that, when comparing genes such as hemoglobin genes between different species, the number of synonymous differences grew disproportionately large as the number of nonsynonymous differences grew.

From an evolutionary perspective, it meant that genes tended to undergo stronger selection when the mutation rate was higher. Huh? That made little sense. It was yet another finding that suggested perhaps there is a better way to do science.

A completely different way of doing science

Let’s step back from the mess for a moment and rethink the problem. What do we know and what do we not know? We know that when comparing protein-coding genes, such as hemoglobin genes, between different species, there are many differences. And we know that these differences can be divided into two broad categories: synonymous and nonsynonymous.

The synonymous differences have no effect on the resulting protein’s amino acid sequence, whereas the nonsynonymous differences do affect the protein sequence.

We also know that a protein-coding gene sequence has many more functional effects beyond merely determining the protein sequence. These effects can be divided into two broad categories: pre-translational and post-translational. That is, the effects can take place before the protein is constructed or afterwards.

In the pre-translational category the gene sequence can have many different influences. For instance, the sequence can influence the structure of the DNA and how it interacts with protein machinery, the stability of the DNA copy (the so-called mRNA strand), and the mRNA’s interactions with proteins such as editing machinery.

And that new UCSF paper elucidates another pre-translational effect: the DNA sequence can control the rate at which the protein is produced.

In the post-translational category the gene sequence can also have many different influences. These include the protein folding process and the three-dimensional protein structure, the stability of that structure, the function of the protein, interactions of the protein with other proteins, instructions for transport, and so forth.

In principle the pre-translational effects can be influenced by both synonymous and nonsynonymous DNA sequence changes, but the post-translational effects are, for the most part, influenced only by nonsynonymous sequence changes. This is illustrated in the table below:

So a protein-coding gene can have a wide variety of effects, and as listed above, these effects can be divided into pre-translational and post-translational effects. Or, simply put, a gene can carry several different “messages.” This is analogous to a satellite communication link that can carry many different messages simultaneously.

And with each of these messages there is some “wiggle room.” For instance, as we saw above in the DNA code, there are four different ways to code for a glycine amino acid. Likewise, the speed of production of a protein can probably be controlled using different variations of a message.

And these different messages in a DNA gene often can use different parts of the gene, such that the messages are orthogonal. On the other hand, messages can overlap if necessary. For instance, the same DNA nucleotide can help to code for a particular amino acid while also helping to control the speed of production of the protein.

So a gene carries several different messages but (i) none of the messages require the full capacity, or bandwidth, of the gene, (ii) many of the messages can be expressed in more than one way, and (iii) many of the messages can be expressed without interfering with the other messages though they can also overlap.
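One way to picture these properties is a sketch in which the amino acid sequence is the primary message and a second message rides on the choice among synonymous codons. The GC-content preference below is a purely hypothetical second channel, standing in for something like a production-speed signal, not a claim about any real regulatory code:

```python
# Sketch of the "wiggle room" idea: the amino acid message fixes which
# codons are allowed, while a second message is carried by the choice
# among the synonymous options.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group the synonymous codons for each amino acid.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def encode(protein, second_channel):
    """Pick one codon per residue; the pick carries the second message."""
    return "".join(second_channel(SYNONYMS[aa]) for aa in protein)

def decode(dna):
    """The primary (amino acid) message is unaffected by the pick."""
    return "".join(CODON_TABLE[dna[i:i+3]] for i in range(0, len(dna), 3))

gc = lambda codon: sum(base in "GC" for base in codon)

# Two different DNA encodings of the same two-residue protein "GE":
high_gc = encode("GE", lambda options: max(options, key=gc))
low_gc = encode("GE", lambda options: min(options, key=gc))
print(high_gc, low_gc)                # different DNA sequences
print(decode(high_gc), decode(low_gc))  # same protein either way
```

The point of the sketch is that the two DNA strings differ while both decode to the same protein, so the difference between them is free capacity that a second message could occupy.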

All of this makes for a complicated messaging strategy. How can the different genetic messages be overlaid onto the same gene without sacrificing any of the particular messages? What kinds of codes best support this requirement? Of course we have yet to learn all the different kinds of messages, as indicated by the new UCSF paper, which revealed yet another type of pre-translational genetic message.

In researching these and other such questions it is crucial to have a large number of solutions to look at. As Bacon advocated, we need to look at as many experiments as possible and isolate a system’s parameters to learn how nature works. Fortunately we have a great number of proteins, and variations on those proteins, in nature.

For instance, the hemoglobin protein has a relatively high sequence variation whereas histone IV has a low sequence variation. Evolutionists say hemoglobin’s high sequence variation is due to its low sensitivity to change, and histone IV’s low sequence variation is due to its extreme sensitivity to change. But this raises the question of how the protein could have arisen in the first place. Also, experiments have not revealed such extreme sensitivity.

But now, on this new view, we can see the high sequence variation as an instance where there is a great deal of message variation. Instead of being driven by the mandate that evolution is true, and therefore that the sequence variation arose from random mutations, we can view each hemoglobin sequence as a different solution to the messaging problem in each particular species. In the case of hemoglobin, with its high sequence variation, the “channels” are being heavily used.

On the other hand, in the case of histone IV with its low sequence variation, it could be that the different signals in the different species are the same, or it could mean that some of the signals are simply not being used.

So whereas an evolutionist sees high sequence variation as a sign of a higher mutation rate and less sensitivity to mutational change, in this new view a high sequence variation is a sign of more information content. Either more channels are being used, or there is greater variation in the way that those channels are used in the different species, because different solution paths are being used.

And whereas an evolutionist sees low sequence variation as a sign of a lower mutation rate and greater sensitivity to mutational change due to functional constraints and purifying selection, in this new view a low sequence variation is a sign of less information content. Either fewer channels are being used, or there is less variation in the way that those channels are used in the different species, because the same solution path is being used.

Consider the case of the paper discussed above, which found that the number of synonymous differences grows disproportionately large as the number of nonsynonymous differences grows. For evolutionists this was confusing because it meant that genes tended to undergo stronger selection when the mutation rate was higher.

But this new view has a completely different interpretation of the findings. Instead of force-fitting the results onto the highly unlikely evolutionary framework, we now understand these results to mean that the ratio of the normalized nonsynonymous-to-synonymous gene sequence changes, between any two species, tends to increase with overall gene variability.

In other words, gene families such as hemoglobin that exhibit a wide range of sequence variation across species are observed to have a higher proportion of nonsynonymous DNA sequence differences.

Very simply put, what this means is that as the overall information content in a gene increases, a greater proportion of that information content is found in the nonsynonymous changes, which are primarily the ones that can cause the post-translational effects.

This could be explained, for instance, by the hypothesis that the pre-translational sequence effects, and their associated signals, are widespread, such that most genes carry these signals.

Then some genes, such as hemoglobin, have more post-translational sequence effects and their associated signals, so for those genes one observes both higher signal content (which an evolutionist reads as a higher mutation rate in the gene) and most of that content residing in the nonsynonymous DNA differences.
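The normalized comparison described above can be illustrated with a sketch that computes per-site nonsynonymous (pN) and synonymous (pS) difference rates for two aligned sequences, loosely in the spirit of simple pN/pS counting. It is deliberately simplified (codons with more than one difference are skipped, and alternative mutational paths are not averaged), and the input sequences are hypothetical:

```python
# Rough sketch of normalized nonsynonymous vs. synonymous difference
# rates (pN and pS) between two aligned coding sequences.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def site_counts(codon):
    """Expected synonymous and nonsynonymous sites in one codon."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1 / 3  # each position offers 3 possible changes
    return syn, 3 - syn

def pn_ps(gene_a, gene_b):
    """Per-site nonsynonymous (pN) and synonymous (pS) difference rates."""
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(gene_a), 3):
        ca, cb = gene_a[i:i + 3], gene_b[i:i + 3]
        syn, nonsyn = site_counts(ca)
        syn_sites += syn
        nonsyn_sites += nonsyn
        if sum(x != y for x, y in zip(ca, cb)) != 1:
            continue  # simplification: score only single-difference codons
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn_diffs += 1
        else:
            nonsyn_diffs += 1
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites

pN, pS = pn_ps("GAGTATGGT", "GAGTTTGGC")  # hypothetical aligned segments
print(pN, pS, pN / pS)
```

Because synonymous sites are far fewer than nonsynonymous sites, one raw difference of each kind yields a much higher per-site synonymous rate, which is why the normalization matters when comparing genes.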

It is a completely different way of doing science, one that tries to figure out how nature works rather than solving difficult metaphysical conundrums and constraining science to an incredibly unlikely, religiously driven creation narrative.

Has anyone bothered to check with the authors of the paper to find out if they are design theorists trying out revolutionary new scientific practices or evolutionary biologists using the same old methodology that has served so well in the past?

It's not clear how this represents "A completely different way of doing science."

For example, before one could even begin to speak in terms of pre-translational and post-translational, we must first have conjectured the theory that some genes are translational while others are not, and then tested this theory via observations. Right? If not, what other approach should we have taken?

Nor does the lack of an assumption of design necessarily result in concluding a lack of function, as we assumed these translated genes had a function despite never assuming they were designed specifically for that purpose beforehand.

As such, it's unclear why the research you cite is any different, or how assuming design wouldn't also represent extrapolating observations using an explanatory framework.

More importantly, the results of research are meaningless without some sort of explanatory framework to interpret the resulting data. If we do not have an explanation for what non-translating genes do, then how can we devise experiments to test an unconceived explanation?

The vague assumption they should "do something" isn't an explanation we can test.

Furthermore, if we are truly objective about a supposed designer, one could have intentionally "designed" the genome to contain non-functional genes, or even designed a process that would result in non-functional genes by design.

For example, when you delete a file off of a computer, its bytes are not zeroed out. Rather, the sequence of bytes the file resided at is merely marked as no longer reserved. We do this because, in most cases, actually erasing the pre-existing data isn't necessary, and doing so would have a negative impact on performance. However, this means that new files are written over the bytes of old files. And if they are smaller than the originals, this can result in fragmentation. So, here we have a concrete example of designing a process that intentionally results in non-functional sections of data.
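The deletion analogy can be sketched in a few lines. The toy model below (all names are illustrative, not any real filesystem's API) keeps an allocation table separate from the data blocks, so deleting a file only frees its table entries, and a smaller new file leaves stale data behind:

```python
# Toy model of the file-deletion analogy: deletion only marks blocks
# as free; the old bytes are never zeroed out.

class ToyDisk:
    def __init__(self, n_blocks):
        self.blocks = ["" for _ in range(n_blocks)]  # the actual data
        self.free = list(range(n_blocks))            # the allocation table
        self.files = {}

    def write(self, name, data_blocks):
        used = [self.free.pop(0) for _ in data_blocks]
        for idx, data in zip(used, data_blocks):
            self.blocks[idx] = data
        self.files[name] = used

    def delete(self, name):
        # Return the blocks to the free list WITHOUT erasing their data.
        self.free = sorted(self.free + self.files.pop(name))

disk = ToyDisk(4)
disk.write("old.txt", ["OLD-1", "OLD-2", "OLD-3"])
disk.delete("old.txt")
disk.write("new.txt", ["NEW-1"])  # smaller file reuses only one block
print(disk.blocks)  # stale "OLD-2" and "OLD-3" survive as dead data
```

After the final write, the disk still holds two blocks of the deleted file's data even though no file refers to them, which is the "non-functional sections by design" point above.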

I'd also note that when an error occurs while updating the boundary locations of these files, sections that were intended to be "non-functional" end up interpreted as "functional" parts of a file. So, even if we assume these bits were "designed" when written, what we read out doesn't always end up as intended.

So, it's unclear how you can say that the biosphere we observe actually represents the specific purpose and intention the designer had in mind. Not to mention that there could have been multiple designers, each with different intentions, in which case what ends up being written is a compromise that represents what none of them wanted in the first place.

In other words, the assumption that the biosphere did end up as a designer intended would also represent extrapolating observations using some sort of vague explanatory framework, which clearly isn't "science neutral," if such a thing is possible in the first place.

It's unclear why you're assuming that mutations that affect how genes are translated must necessarily be non-neutral in regard to an organism's fitness.

For example, if both protein A and protein B are required for function Z, some sort of mutation that causes A to appear earlier would still be neutral, as protein A just "waits around" until protein B appears to perform Z. The net result is the same.

And if the specific appearance of Z occurs earlier than actually necessary to have an impact on an organism's fitness, then mutations that cause B to appear later would not necessarily have a net effect either.

Perhaps you've reached this conclusion because you think the expression of genes was intentionally designed to express proteins at specific times in the first place, and that this design has no tolerance for variation in how they could be expressed. As such, any change would have an impact on the organism's fitness.

However, this would represent extrapolating observations using an explanatory framework, right?