The original paper to which Conservapedia is responding, Blount, Borland, and Lenski (2008), "Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli", can be found here.

On June 9, 2008, the New Scientist website published an article describing preliminary results of a long-running experiment started by Richard Lenski.[2] Lenski and his team had taken a single strain of the bacterium E. coli, separated its descendants into twelve populations,[3] and proceeded to observe their mutations over the course of twenty years (a process discussed on Lenski's website). At one point, one of the populations demonstrated a dramatic change: it evolved the capability to utilize citrate, a carbon source in its flasks that E. coli cannot normally use. Thus, evolution had been directly observed, with a meticulous record of evidence establishing the timeline along the way. The paper also highlighted the role of historical contingency in evolution and the role of potentiating mutations.

Naturally, this news item was posted to Conservapedia, bringing it to the attention of one Andrew Schlafly, BSE, JD. Schlafly is a creationist, so this obviously flew in the face of his views and could not be tolerated. After a discussion in which he expressed skepticism, he sent Dr. Lenski an email requesting further data and set up a page on his blog, titled "Lenski dialog." An amusing exchange of correspondence followed, resulting in the Lenski affair, in which Schlafly was humiliated.

Consequently he changed tack and now wishes to invent flaws in the study. This article examines those supposed flaws.

Richard Lenski rejected a request to release his bacteria mutation data to the public,[4] but the following serious flaws are emerging about his work[5] even without a full disclosure of the data. Note that the peer review on Lenski's paper took somewhere between 0 (non-existent) and at most 14 days (including administrative time), and Lenski himself does not have any obvious expertise in statistics. In fact, Richard Lenski admits in his paper that he based his statistical conclusions on use of a website called "statistics101".

The length of the peer review process says nothing about the quality of the paper. For example, Watson and Crick's paper, "A structure for Deoxyribose Nucleic Acid", received no peer review at all but is considered correct and is fundamental to all genetics research. Furthermore, in the information age, publication is generally faster, not slower - on important projects, the "overhead" time of communication can be reduced to almost zero.

In addition, most scientific journals, including PNAS, use the following dating system. First, a paper is submitted. The paper is reviewed and returned with comments to the author(s), who make changes and resubmit. If the resubmitted paper is accepted in this round, then the date of resubmission is used as the "submitted" date. If further revisions are required, the paper is returned to the author(s) again. So, for example, a paper may be submitted in January, reviewed and returned to the authors in March, fixed and resubmitted in July, then printed in August. The paper will be published with the submission date listed as July, not January.

Schlafly also uses an argument bordering on ad hominem by insinuating, without evidence, that Lenski is deficient in statistical expertise. Schlafly himself, however, has been noted for his questionable statistical abilities. (For a recurring example, refer to Mystery:Young Hollywood Breast Cancer Victims. The extensive talk page threads detail his complete failure to understand even the most basic statistical concepts, such as the importance of random sampling or avoiding selection bias.)

In reference to "Statistics101": this is not a website but a freely available statistics package that is also used by professionals.[6] The package includes tests and simulations useful for this study's analysis. The hypothesis testing is standard and can be found in any textbook. The use of Monte Carlo simulations for Bayesian analysis would not be familiar to someone without strong statistical expertise, so if Lenski did not do this himself, someone who does know a lot of statistics did.
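For readers unfamiliar with resampling, here is a minimal sketch in Python of the kind of Monte Carlo significance test such a package performs. The clone generations and Cit+ counts below are invented toy numbers, not the paper's data:

```python
import random

# Toy data (NOT the paper's replay results): the generation of origin of
# each clone tested, and which of those clones yielded Cit+ mutants.
generations = [2000, 5000, 10000, 15000, 20000,
               25000, 27000, 30000, 31500, 32500]
cit_plus = [27000, 30000, 31500, 32500]  # clones that yielded Cit+

observed_mean = sum(cit_plus) / len(cit_plus)

# Null hypothesis: any clone is equally likely to yield Cit+, so the mean
# generation of Cit+-yielding clones is just a random draw from all clones.
random.seed(0)
trials = 100_000
extreme = sum(
    1 for _ in range(trials)
    if sum(random.sample(generations, len(cit_plus))) / len(cit_plus) >= observed_mean
)
p_value = extreme / trials  # one-tailed Monte Carlo p-value
print(f"Monte Carlo p = {p_value:.4f}")
```

Under the null hypothesis, Cit+-yielding clones would be scattered uniformly across generations; a small p-value indicates that the observed clustering in late generations is unlikely to be due to chance.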

Finally, despite Schlafly's unsupported claim of Lenski's lack of statistical expertise, his own correspondence with Lenski strongly implies that he considers himself competent to critique Lenski's work in microbiology — even though Schlafly, of course, has no expertise in that field.

Lenski's "historical contingency" hypothesis, as specifically depicted in Figure 3, is contradicted by the data presented in the Third Experiment in Table 1 of his paper. Figure 3 proposes a step-up in mutation rate to Cit+ due to a historical contingency (potentiating mutation) occurring at about the 31,000th generation, yet the Third (and largest) Experiment in Table 1 shows Cit+ arising just as often before the 31,000th generation as after. The abstract, in further contradiction with Figure 3, suggests that the historical contingency (potentiating mutation) occurred prior to the 20,000th generation.

This is the result of an understandable confusion, given that Blount et al. are really hypothesizing three mutation events. They state: "The replay experiments indicate an even more complex picture that must involve, at a minimum, three important genetic events. At least one mutation in the LTEE was necessary to produce a genetic background with the potential to generate Cit+ variants, while the distribution and dynamics of Cit+ mutants in fluctuation tests indicate at least two additional mutations are involved." It is true that the specific time course depicted in Fig. 3 is inconsistent with the data derived from the replay experiments, which indicate that the potentiating mutation had to have happened by generation 20,000, as stated in the abstract. Figure 3 appears to be a graphical representation of the "two-step" hypothesis, formulated before the replay experiments were conducted, that some other mutation had to have occurred prior to the appearance of the Cit+ variants around generation 33,000. The precise length of time between the first mutation and the appearance of the Cit+ phenotype is not essential to the historical contingency hypothesis, so Schlafly's objection here is irrelevant.

Richard Lenski incorrectly included generations of the E. coli already known to contain Cit+ variants in his experiments.[7] Once these generations are removed from the analysis, the data disprove Lenski's hypothesis.

The paper states that clones, that is, genetically identical colonies, from the various generations were used in the replay experiments. It is implied in the paper, and described explicitly in the paper's supplement on the PNAS web site, that the colonies chosen from all time points were Cit- at the time the replay experiments started. Therefore the colonies used in the replay experiments had to evolve the Cit+ phenotype independently. If a clone were Cit+, essentially all of its daughter cultures would be expected to be Cit+, but of the 2,800 cultures in the third replay experiment, only 7 gave rise to any Cit+ populations and, interestingly, the only clone to give rise to two Cit+ mutants was from generation 20,000. In the first replay experiment, the first Cit+ arose at generation 750, and many did not arise until the experiment was halted around generation 3,700. Clearly, none of the cultures were begun with Cit+ cells.

Lenski's largest experiment (Third Experiment) failed to support his hypothesis with statistical significance. Even though this largest experiment was nearly ten times the size of his other experiments, Richard Lenski did not weight this largest experiment correctly in combining his results.

For experiment 3, we can only reject the null hypothesis with a 0.0823 probability of Type I error (rejecting the null hypothesis when it is true), instead of 0.05 or less. This is what we call a strong trend towards statistical significance, understanding that the 0.05 cutoff value is rather arbitrary. The statement in the paper is clearly correct that the two Cit+ cultures from generation 20,000 resulted in the p value not being less than 0.05. Looking at the three experiments together, though, the pattern is quite clear, and it supports the stated hypothesis.

Blount et al. combined their three experiments using a method set out in Whitlock's paper "Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach". As the title indicates, the weighted Z-method is a superior method of combining independent experiments. Schlafly's claim that Lenski did not weight the largest experiment correctly is farcical. Assuming Lenski used the method laid out in that paper, the weight would have been "the inverse of the squared standard error of the effect size estimate for each study". The effect size estimate would have been the sample "mean generation of those clones that yielded Cit+ variants", and the standard error would be calculated from this. The weighting is not subjective. In fact, Whitlock gives no preference to the weighted or unweighted Z-method, and Blount et al. state that both were performed and both gave statistically significant results.
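For the curious, the weighted Z-method is straightforward to sketch. The following Python snippet uses hypothetical p-values and weights (not the paper's actual figures) purely to illustrate the mechanics:

```python
from statistics import NormalDist

def weighted_z(pvalues, weights):
    """Whitlock's weighted Z-method (weighted Stouffer) for combining
    one-tailed p-values from independent experiments."""
    nd = NormalDist()
    zs = [nd.inv_cdf(1 - p) for p in pvalues]  # per-experiment Z-scores
    # Z_w = sum(w_i * Z_i) / sqrt(sum(w_i^2))
    z = sum(w * s for w, s in zip(weights, zs)) / sum(w * w for w in weights) ** 0.5
    return 1 - nd.cdf(z)                       # combined p-value

# Hypothetical p-values and weights (NOT the paper's numbers): three
# experiments, the third largest and therefore weighted most heavily.
p_combined = weighted_z([0.04, 0.03, 0.08], [1.0, 1.0, 3.0])
print(f"combined p = {p_combined:.4f}")
```

Note that an individually non-significant experiment (p = 0.08 here) can still strengthen the combined result when all experiments point the same way, which is exactly the situation with the three replay experiments.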

To date, Schlafly has produced no mathematical arguments to substantiate his claim.

Lenski's two alternative hypotheses suggest a fixed mutation rate, but the failure of the mutations in his experiments to increase based on scale (number of samples) tends to disprove both of Lenski's alternative hypotheses. Yet Lenski's paper fails to address adequately this obvious flaw in the paper.

Actually, the hypothesis did not require that the mutation rate to Cit+ be constant under different experimental conditions. The question Blount et al. asked was whether Cit+ mutants would arise only from clones taken from later generations -- which would argue that a potentiating mutation had to arise first -- or whether Cit+ mutants would appear in a random distribution from clones derived from all generations. If the latter pattern had been observed, it would have supported the alternate hypothesis of a single, extremely rare mutation that could arise at any time in the experiment. The first replay experiment involved continuous liquid subculture, and the latter two experiments used plating on solid agar media. There is no simple way to determine how the total number of Cit+ mutants recovered in the first set should compare to the last two sets. One might expect the third replay experiment to produce significantly more Cit+ mutants than the second, but even there, the two experiments were not performed identically. It is an interesting question, and one that the authors raised in the supplemental information, but it is not crucial to the hypothesis and conclusion. It is not a 'flaw' but an observation.

The significant finding in the replay experiments is that the Cit+ mutations reproducibly arose from clones taken from later generations. With three runs performed under different experimental conditions, this relationship held and argues for the hypothesis that a potentiating mutation first arose in the population and increased the likelihood of the Cit+ mutation(s).

In the last set of experiments, Blount et al. performed fluctuation tests to estimate the upper and/or lower bounds of the mutation rates. Of course, these rates must be evaluated within the context of the experimental conditions used. These numbers allowed the authors to speculate about the possible type and nature of the mutations leading to the Cit+ phenotype.
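To illustrate how a fluctuation test bounds a mutation rate, here is a Python sketch of the classic Luria–Delbrück "P0 method". The culture counts echo the replay-three figures mentioned above, but the cells-per-culture value is an assumption for illustration only, not the paper's:

```python
import math

# Luria–Delbrück "P0 method" sketch: if a fraction P0 of parallel cultures
# shows NO mutants, the expected number of mutation events per culture is
# m = -ln(P0), and the per-cell rate is m divided by the cells per culture.
def p0_mutation_rate(cultures_total, cultures_with_mutants, cells_per_culture):
    p0 = (cultures_total - cultures_with_mutants) / cultures_total
    m = -math.log(p0)             # expected mutation events per culture
    return m / cells_per_culture  # mutation rate per cell

# 7 of 2,800 cultures yielded Cit+ (as in the third replay experiment);
# the ~1e8 cells-per-culture figure is assumed for illustration.
rate = p0_mutation_rate(2800, 7, 1e8)
print(f"estimated rate = {rate:.1e} per cell")
```

The estimate scales inversely with the assumed culture size, which is why such rates are only meaningful within the context of the experimental conditions, as the paragraph above notes.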

The authors did measure a total mutation rate of 2.3 × 10^-10 per cell per generation for both potentiated and non-potentiated clones. But it is clear from the data that the development of the Cit+ mutant depends on at least two exceedingly unlikely events occurring in a particular order.

Lenski's paper is not clear in explaining how the results of his largest experiment (Third Experiment) failed to confirm his hypothesis with statistical significance, even with the incorrect inclusion of the Cit+ variant generations. Instead, his paper refers to his largest experiment as "marginally ... significant," which serves to obscure its statistical insignificance. Other works published in PNAS are clear in defining statistical significance in the traditional way, which Lenski's Third Experiment (even with incorrect inclusion of the above-referenced generations) failed to satisfy.[8]

On the contrary, the paper is quite clear on this: "Although the third experiment was the largest, it was the least significant, owing primarily to the production of two Cit+ mutants by a 20,000-generation clone." Schlafly again validates Lenski's prediction that he apparently cannot be bothered to actually read the whole paper. The statistics Schlafly demands are in fact reported in Table 3 of the paper, under the heading "Monte Carlo P value".

This point is no different from Point 3, except that Schlafly waves a paper around. The paper cited, "Cholera toxin induces malignant glioma cell differentiation via the PKA/CREB pathway" by Li, Yin, Wang, Zhu, Huang, and Yan, does indeed contain three experiments with a statistical significance of 0.05 or less. As far as can be ascertained, the authors did not combine their experiments, and they used Student's t-test rather than Monte Carlo simulations. Bayesian statistics are a fairly recent development in applied statistics, and the U.S. is practically the only country where they have taken off into wide use.[citation needed] Li et al. probably did not use this method for two reasons: (1) they were not as familiar with it, because the Student's t-test is far more common, and (2) it was not going to benefit them anyway, because they had a large data set to begin with.

The long lag time (over 12,000 generations) between the historical contingency (potentiating mutation) in the largest experiment disproves Lenski's implicit assumption that the potentiating mutation likely occurred in proximity with the occurrence of the Cit+ variant, and that the first occurrence of the Cit+ variant in the Third Experiment at the 20,000th generation somehow implies that a potentiating mutation occurred in its proximity.

First, "potentiating mutation" is not equivalent to "historical contingency"; Schlafly made the same error twice in Point 1. The preposition 'between' customarily takes TWO objects, as in "...wouldn't know the difference BETWEEN his ass AND a hole in the ground." Schlafly is now crossing the line from attempting to disprove the authors' explicit statements and conclusions over to questioning their implicit assumptions, or at least his perception of them. The only evidence of an implicit assumption is in the labeling of Fig. 3, which does suggest that the researchers initially supposed that the potentiating mutation occurred shortly before generation 33,000, while their data ultimately showed that it must have occurred significantly earlier. As noted previously, the time course here is not central to Lenski's hypothesis. Schlafly is also unclear on whether he means temporal proximity or possibly chromosomal proximity, owing to his tenuous grasp of the English language. With the fractured sentence structure it is hard to tell, but Schlafly seems to suggest that a long delay between event A and event B disproves that A had to precede B. Like, say, the discovery of fire and the industrial revolution?

The mutation rate to Cit+, even in the potentiated clones, remains very small. Note that only a minority of the individual cultures from the later generations gave rise to Cit+ mutants even after 3,700 generations in the first replay experiment. Thus, it is not unexpected that in the original culture of the Long-Term Evolution Experiment (LTEE) it may take 10,000 generations or more for a Cit+ mutant to develop after the potentiating mutation occurs.
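A back-of-envelope Poisson calculation in Python, using the paper's estimated Cit+ rate (~4 × 10^-13 per cell per generation) and effective population size (~3 × 10^7 cells), shows why a lag of more than 10,000 generations is unremarkable:

```python
import math

rate = 4e-13   # Cit+ mutation rate per cell per generation (paper's estimate)
pop = 3e7      # effective population size (paper's estimate)
per_gen = rate * pop  # expected Cit+ mutation events per generation

# Probability that NO Cit+ mutant appears within 12,000 generations
# (Poisson: P(0 events) = exp(-expected number of events)):
p_none = math.exp(-per_gen * 12_000)
print(f"P(no Cit+ within 12,000 generations) = {p_none:.2f}")
```

Even after 12,000 generations, the chance that no Cit+ mutant has yet appeared is roughly 87%, so a long lag after the potentiating mutation is exactly what these rates predict.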

Lenski's paper claims that "During [30,000 generations], each population experienced billions of mutations,[9] far more than the number of possible point mutations in the [approximately] 4.6-million-bp genome. This ratio implies, to a first approximation, that each population tried every typical one-step mutation many times." Lenski's conclusion is nonsensical because it assumes that the mutations are completely random and that each mutation has a roughly equal probability.

Schlafly does not appear to understand the phrase "to a first approximation". Blount et al. are well aware that mutation rates can vary across the E. coli chromosome. Still, DNA replication has a finite fidelity, on the order of 10^-10 mutations per base pair per replication at the low end. The authors are simply pointing out that, given the number of cells involved (an effective population size of 3×10^7 cells per generation) and the number of generations in the experiment (~30,000), the E. coli chromosome was well-peppered with point mutations many times over (3×10^7 × 3×10^4 × 10^-10 = 90×).
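Spelling that arithmetic out as a trivial check, using the figures quoted above:

```python
pop = 3e7           # effective population size per generation
generations = 3e4   # ~30,000 generations
rate = 1e-10        # mutations per base pair per replication (low-end estimate)

coverage = pop * generations * rate  # expected mutations per base pair
print(coverage)  # ~90: each site is hit many times over, on average
```

Since the E. coli genome is only about 4.6 million base pairs, ~90 expected hits per site means every typical one-step mutation was sampled many times, which is precisely the paper's point.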

The paper's authors found that isolation of Ara+ revertants, which occurred in the range of 2×10^-10 to 3×10^-10 revertants per cell per generation, is typical of point mutations. The very low conversion rate to Cit+, on the order of 4×10^-13 per cell per generation, suggests multiple mutations and/or a mutation event that is far rarer than a simple point mutation, by a factor of several hundred. Based on these results the authors are well justified in concluding that the Cit+ phenotype is the product of very specific mutational events and not single point mutations.

One wonders what Schlafly's expectations of point mutations are, if they should be neither completely random nor of roughly equal probability. This is the part of evolutionary theory that creationists usually accept. Perhaps he does not realize that these mutations happen in an individual organism and are usually not propagated through the overall population.

The paper Schlafly mentions in his complaint but does not cite is: Lenski, R. E. 2004. "Phenotypic and genomic evolution during a 20,000-generation experiment with the bacterium Escherichia coli". Plant Breeding Reviews 24:225-265. That reference describes the effective population size for cultures grown in the experiment.

1) First, look at the cited publications. Publishing data does not necessarily mean that the author sends you the data files.

2) If you doubt a scientific publication's content on a scientific basis, write a comment in the appropriate form (title, abstract, and length depending on the journal). Send it to the editor of the original article. It will be screened for form, content, and style (if you don't write it in a calmer mood than your rants here, it will not be accepted). A comment is a standard way of forcing somebody to reply, because the editor will ask the authors to respond, and both comment and response are published together. Be specific (you aren't). A request like "In Fig. (x) the error bars are missing, and we believe these are necessary" has a much better chance of being acted upon than "which data? where? how certain?". Make explicitly clear in the following paragraph what your problem is, e.g.: "The region (x) in Fig. (y) is not displayed at a resolution high enough to exclude (z)." Make it clear that you are not the only one who believes this is necessary by citing other works in which the authors do capture a certain point at higher resolution. Don't ask for unreasonable things (which would essentially require screening a collection of 10,000 photos taken over twenty years).

3) If you suspect scientific misbehavior (such as falsifying data or faking results), report it to the responsible person in the research organization the other person works for. People can lose their jobs and titles for such things. Refrain from blaming the referees for not hunting down scientific misbehavior. The function of a referee is NOT to search for scientific misbehavior, but to check the conciseness, consistency, and completeness of the things presented in the paper. (Detecting genuinely faked data is impossible without spending a long time in the lab, and some people fake data in very tricky ways.)

If you suspect both, do both. However, if you have nothing more than your scientifically worthless comments about articles which are, as far as they are read and understood, written to the highest scientific standards, spare your readers your whining. Sadly, the highly statistical nature of the experiment makes it difficult in principle to reproduce (which in the past has tempted some scientists into faking data; see the "Schön affair").