Well, the postdoc and I continue to struggle with our revisions to his manuscript about the sequence bias of the Haemophilus influenzae DNA uptake machinery. Quite a bit of the struggle is with each other, as we each try to clarify what we think.

One issue that's just come up is how interactions between bases at different positions of the preferred sequence motif will affect what sequences accumulate in the genome.

The top part of the figure below is a drawing of a double helix of DNA, with a specific sequence drawn on it, and below that are two 'sequence logos'. The first one is the pattern derived from the uptake sequences in the genome, and below that is the pattern derived from the sequences that were preferentially taken up by the cells' uptake machinery. The overall difference in height of the two logos isn't significant (they use sequences derived in very different ways), but the differences in the relative heights of the individual positions are. For example, in the genomic logo all of the Gs on the left are about the same height, but in the uptake logo the first G is much smaller than the others.

One issue our paper needs to address is the reasons that these two logos are so different.

Both of these logos are derived by considering only how frequent each base (A, G, C or T) is at each position in the set of sequences being analyzed. The analysis doesn't consider the actual sequences. For example, the two sets of sequences in the figure below (made using WebLogo) give the same logo. But the two sets of sequences are different; in the left one we have only strings of six As or six Ts, whereas in the second the As and Ts are often interspersed or in strings of different lengths.
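Here's a minimal sketch of why two different sets of sequences can give the same logo. The sequences are hypothetical stand-ins for the two sets in the figure (not the actual ones), and the column height is computed as 2 bits minus the Shannon entropy, ignoring any small-sample correction a logo program may apply.

```python
from collections import Counter
from math import log2

def position_frequencies(seqs):
    """Per-position base frequencies: the only information a logo uses."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({base: counts.get(base, 0) / len(seqs) for base in "ACGT"})
    return freqs

def information_content(column):
    """Height of one logo column in bits (2 minus the Shannon entropy)."""
    entropy = -sum(p * log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

# Hypothetical example sets (assumptions, not the sequences in the figure).
set1 = ["AAAAAA", "TTTTTT", "AAAAAA", "TTTTTT"]  # only runs of six As or six Ts
set2 = ["ATATAT", "TATATA", "AATTAT", "TTAATA"]  # As and Ts interspersed

# Both sets are 50% A and 50% T at every position, so their logos match.
print(position_frequencies(set1) == position_frequencies(set2))
print([information_content(col) for col in position_frequencies(set1)])
```

Because the calculation only ever looks at one column at a time, nothing about which bases occur together in the same sequence can show up in the logo.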

The postdoc has done a detailed analysis of the actual sequences taken up by the cells (see figure in this post), to find out the importance to uptake of the interaction effects that the logo analysis doesn't consider. We were both thinking that these interaction effects might be responsible for at least part of the difference between the uptake-bias logo and the genomic logo.

But one of the reviewers of the version we originally submitted said that we were wrong: "If the consensus in the genome reflects only the incoming DNA and the filtering at the outer membrane (as the authors state) then the two consensus should be similar with or without interaction effects because the genomic consensus is the simple result of the initial consensus." I've thought about this today, and I now think the reviewer is correct.

Let's consider two simple situations for an imaginary uptake machinery whose preferred sequences gave the A&T logo above. In Situation 1, the actual sequences were those in Set 1, and we would conclude that there were strong interaction effects between the positions because the machinery preferred a sequence where six Ts in one strand were basepaired with six As in the other strand. In Situation 2, the actual sequences were those in Set 2, and we would conclude that the uptake machinery preferred a string of six A:T basepairs but didn't care which base was in which strand at any position.

Now let's imagine that species exist with each of these uptake biases, and that each uptake bias is causing its preferred sequences to accumulate in its species' genome (because these sequences come in as part of longer DNA fragments that often replace homologous sequences in the genome by recombination - this is our molecular drive model). In Situation 1 the genome will accumulate strings of six As on one strand paired with six Ts on the other. In Situation 2 the genome will accumulate strings of six A:T pairs in various orders.

Now we sequence the evolved genomes, collecting sets of the overrepresented sequences in each, and make logos of the sequences. Both logos will look like the logo above. To see how the interaction effects in the uptake bias affected the accumulated sequences in the genome, we'd have to do an interaction analysis of the genomic sequences.
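One common way to look for interactions between positions is to calculate the mutual information between pairs of columns. The sketch below uses that measure as an assumption for illustration; it isn't necessarily the analysis the postdoc used, and the two sets of sequences are hypothetical stand-ins for Set 1 and Set 2.

```python
from collections import Counter
from math import log2

def mutual_information(seqs, i, j):
    """Mutual information (bits) between positions i and j of aligned sequences."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )

# Hypothetical example sets with identical per-position base frequencies.
set1 = ["AAAAAA", "TTTTTT", "AAAAAA", "TTTTTT"]  # base at one position predicts the next
set2 = ["ATATAT", "TATATA", "AATTAT", "TTAATA"]  # first two positions vary independently

print(mutual_information(set1, 0, 1))  # high: the positions are fully linked
print(mutual_information(set2, 0, 1))  # zero: the positions are independent
```

With real data the sets would be much larger, and small samples inflate mutual information, so any real analysis would need a correction or a shuffled control.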

Years ago we did an interaction analysis of the genome sequences; you can see the results in the last figure in this post from 2006. It found only weak interactions, and only between adjacent or near-neighbour positions, very different from the interactions the postdoc has identified in the uptake bias. More recently he applied his interaction analysis to the set of genomic uptake sequences, and he's now repeating it (that's easier than digging through his notes to find what it showed).

Although such a difference in mutation rates might indeed be beneficial, since most non-neutral mutations are harmful, the result seems very improbable because we don't know of any mechanism by which the processes that cause mutations could adjust their activities according to the function of particular DNA sequences. The authors don't know of any such mechanism either but they postulate that one must exist.

This is very reminiscent of the 'directed mutation' controversy that arose about 15 years ago, in response to work by Jim Shapiro and John Cairns showing that selection for ability to use a sugar was much more effective if the sugar was present in the environment. That phenomenon has been shown to not be due to changes in the mutation rate (considered per base pair), but to initially unsuspected cryptic growth on the sugar and changes in the number of copies of the gene under selection.

Mutation rates are tricky to measure directly because mutations are identified by examining the phenotypes or DNA sequences of bacterial cultures many generations after the mutations would have happened. This means that there has been plenty of time for confounding forces to also act on the mutations - we find only the mutations present in surviving cells, not all the mutations that happened. The most important confounding force is thought to be natural selection acting on any phenotypic changes the mutations cause, but lots of other factors are known or suspected.

On first reading, I think that the authors of this paper did a good job of controlling for these factors. But, given what we know about the processes that cause and prevent mutation, their results are so improbable that I suspect they have missed other factors we don't know about yet. So I predict that, like the directed mutation controversy, the long-term outcome of this work will be identification of additional confounding factors in the analysis of mutation rates rather than of a clever risk management strategy in the bacteria.

Here's a quick outline of what the authors did: They started by comparing the genome sequences of 34 E. coli isolates; I think these were sequences available in GenBank, not ones they determined themselves. Even very closely related bacteria like these have a lot of variation in which genes are present, so the authors first identified a set of 3420 genes, each of which was present in at least 75% of these genomes. They then carefully compared the DNA sequences of these genes to find all the differences, which must have arisen by mutations accumulating over the many millions of years since these genes shared a common ancestor.

They then filtered out all the differences whose accumulation might have been confounded by natural selection. First they eliminated from consideration all the differences that changed an amino acid encoded by the DNA. Then they corrected for effects of E. coli's known codon biases, because mutations that don't change the specified amino acid may still change how efficiently that amino acid is incorporated into the specified protein. They also corrected for suspected effects of RNA folding by trimming off the ends of the gene sequences (I'm not sure how effective this would be...).

This analysis produced estimated gene-specific mutation rates that differed by as much as ten-fold (look at the jagged line and two examples below). The mutation rates of nearby genes were strongly correlated over distances of 10-20 kb, especially for genes that were assigned the same 'function' and the same direction of transcription; these are likely to be mostly genes in the same operon.

One factor I wanted more information about is the functional classification scheme used. This was something I hadn't heard of - the Multifun classification for E. coli, developed by Monica Riley and M. H. Serres. It looks good, certainly better for E. coli genes than the usual COG analysis (clusters of orthologous groups).

Another issue important for their conclusions is how they assigned functional importance to each gene. They estimated the strength of selection on each gene using the number of changes that did change the encoded amino acids (the info they had discarded in estimating mutation rates). By this measure, genes in subsets with higher mutation rates tended to have weaker evidence of selection. Genes in the low-mutation-rate subsets were also enriched for genes known to be essential for survival in lab culture in rich medium, and they were, on average, expressed as mRNA at higher levels.

The authors then examined how other confounding effects might alter the results, by examining the sequences for evidence that natural selection had acted on them, by checking the possible sizes of other confounding effects (transcription-coupled DNA repair, base composition, homologous recombination), and by using computer simulations to estimate the sizes of possible effects. These analyses revealed only effects that would be much too small to explain the big differences in estimated mutation rates they found.

Bottom line: This appears to be a very well done piece of work. (The Supplementary Materials file is enormous and dense with relevant information and analyses.) Nevertheless I'm very skeptical of their conclusion that cells have evolved a mechanism to mark important genes and protect them from mutation. That's both because we don't know of any way cells could do this, and because I think natural selection on such 'evolvability' traits is likely to be many orders of magnitude weaker than as-yet-unidentified direct effects on mutation accumulation.

I'm just back from EVO-WIBO, a small conference for evolutionary biologists in the Pacific Northwest (WIBO = Washington, Idaho, BC and Oregon). The quality of the talks and the science was very high, but a few experiences got me thinking that I should write a post about how to handle social interactions at conferences. So here goes.

On the conference bus: Maybe you're sitting next to someone you don't know, and maybe they're too nerdy or shy or intimidated or self-centered to start a conversation. Don't just sit there, ignoring each other. Say 'Hi, my name is Sandra. I work on axolotl toenail proteins in Joe Blow's lab. What do you do?' Or 'Hi, I'm Sam. What did you think of that last talk?'

At the first-night mixer: You and a friend (or a new acquaintance) are chatting with each other, when a complete stranger walks over and stands near you, looking like maybe they'd like to join the conversation. Don't just ignore them! Say 'Hi, we were just chatting about the snacks. Do you think this could be real caviar?' Or 'Oh, sorry, we're having a bit of a private conversation. We'll go talk in the corner where it's quieter.'

At meals: If your conference includes meals, try to sit at a table with people you don't already know. If you're already seated and talking with someone when another person sits down, smile and say 'Hi, we're talking about the weird last slide in Susan Smith's talk.' Then turn a bit so they feel included in the conversation. If more people show up, start a round of introductions. If you're planning a free-time side trip to the swimming hole or the farmer's market, ask your lunch companions if they'd like to come along.

If everyone has to find their own lunch and you're on your own, try to strike up a post-talk conversation with someone. You can then say 'I'm going to look for some lunch, like to join me?' If they're on their own too, they'll appreciate the invitation. If they already have lunch plans, maybe they'll invite you along. If you're the one who already has lunch plans, consider inviting someone who might otherwise be on their own.

At your poster: Maybe you're explaining your poster to someone, thankful that it's attracted at least a bit of interest, when a second person walks up. Don't ignore them until the first visitor walks away! Make eye contact, smile, say 'Hi, I'm just explaining how we collected our data. If you can wait a minute I'll be able to talk about our goals.' Then continue your original conversation, but make it easy for the new person to join in or ask questions.

In the question period after your talk: Try to choose questioners who aren't Mr. Big in the field, and who aren't your friends or labmates. Make it easy for junior researchers to be heard.

You get the picture. One of the big reasons we come to conferences is to talk with other researchers in our field. Do what you can to help this along. Many of the people at conferences are junior scientists, are there for the first time, don't know anyone. Make them feel welcome and included. If you're one of these people, you should expect to be welcomed as a new colleague. If someone instead treats you as an interloper, go talk to someone with better social skills.

We originally submitted our manuscript to Science at the end of January, and posted a copy of it on the arXiv server, asking for comments/critiques from readers. We received a few of these, and on March 16 we received three detailed reviews from Science, and a provisional acceptance. On April 13 we submitted the revised version, and we're waiting with fingers crossed for final acceptance.

I've just posted the revised manuscript on arXiv, replacing the original version. Here's the link. We tried to incorporate suggestions from blog comments too.

Below I've pasted the text of our 'Response to Reviews' letter. We didn't do a detailed response to the reviews because the Editor had clearly indicated the changes she thought important.

Thank you for giving us the opportunity to improve our manuscript. We are now submitting the revised version.

We have closely followed the suggestions in the pre-edited copy of the manuscript that you provided. We felt that the most important request of the reviewers was to directly measure the phosphate concentration in our basal AML60 medium. To this end, we conducted ICP-MS, obtaining a concentration of 0.5 µM, in close agreement with our prior estimates based on cell growth. This new measurement fully supports our conclusion that the growth of GFAJ-1 in the hands of Wolfe-Simon et al. was due to residual phosphate in their putative -P conditions. When combined with shortening other text as indicated in the pre-edits, this has reduced the manuscript’s length from 2193 words to 1577 words.

We have retained a few sentences discussing explanations for the discrepancies between our results and those of Wolfe-Simon et al. Since these discrepancies are the point of our paper we feel that possible explanations for them should be considered even when they cannot be directly tested.

We would prefer to retain our original title, as we feel that the word ‘negligible’ puts undue emphasis on the trace of arsenate present in the DNA. Two alternative titles we would be happy with are ‘Absence of detectable arsenate in DNA from arsenate-grown GFAJ-1 cells’ and ‘No covalently bound arsenate in DNA from arsenate-grown GFAJ-1 cells’.

Changes in response to points raised by the reviewers:

The ‘-P’ and ‘+P’ growth conditions we used are now clarified in both the Methods and the legend to Figure 1.

We now explain how cell numbers were determined.

We now explicitly say that we obtained strain GFAJ-1 from the authors of the Wolfe-Simon et al. paper.

The discrepancy in glutamate concentrations and the incorrect formulae have been corrected.

The ingredients of AML60 medium are now given in the Methods.

Reviewer 3 was concerned about our statement (in Methods) that cells were pre-grown in phosphate-limited medium containing 40 mM arsenate. We now explain that the cells were thoroughly washed to remove the arsenate before being frozen, and that the purpose of this pregrowth was to deplete cellular reserves of phosphate and to replicate the standard growth conditions used by Wolfe-Simon et al.

All of the reviewers, but especially Reviewer 3, would have liked more information about GFAJ-1’s growth properties and metabolism. Unfortunately, characterizing these in depth is beyond the scope of this work. We do not know why GFAJ-1 cells need glutamate or another amino acid for growth in our AML60 medium independent of phosphate supplementation, why they reproducibly grew to a higher density in AML60 medium with 70 µM phosphate than with 250 or 1500 µM phosphate, nor why they did not grow in Wolfe-Simon’s low-phosphate AML60 medium unless arsenate was provided.

We now have almost all the data in place for our paper about the roles of all the genes in Haemophilus influenzae's competence regulon. We (really the RA) created deletion mutations of all 26 genes except ssb, which is essential; these deletions remove almost all of each gene's coding sequence. One set of mutations contains spectinomycin cassettes inserted at the site of the deletion; these are very useful because they let us select for each mutation by the SpcR phenotype it causes. The other set is 'unmarked', and these clean deletions are 'in-frame', preventing disruptions of translation that could interfere with expression of downstream genes in the same operon ('polarity').

For each unmarked mutant we've examined (1) its growth using the Bioscreen incubator/recorder, (2) its survival after transfer to the MIV starvation medium that induces competence, (3) its MIV-inducible ability to take up radiolabeled DNA and (4) its ability to be transformed by genetically marked chromosomal DNA. For all but one of the genes these phenotypes are at least roughly consistent with what we expected from the phenotypes of known mutations in H. influenzae or other bacteria and from the predicted properties of the encoded proteins.

But one gene's phenotypes surprised us. HI0659 is predicted to be a small cytoplasmic protein, and it has a predicted helix-turn-helix that would be expected to bind to DNA, probably at a specific sequence. Its mRNA is induced about 20-fold on MIV treatment. We expected it to either play no role in DNA uptake and transformation or to have normal uptake but reduced transformation. But our unmarked mutant (∆HI0659) doesn't take up any detectable DNA and doesn't transform at all, which suggests that it is required either for assembly/function of the uptake machinery or for continued expression of the competence regulon after initial induction by Sxy and CRP. That's of course very interesting, and we've thought of lots of cool experiments we could eventually do to find out how it acts.

But there's one wrinkle that needs to be cleared up before we publish this result. The phenotype of the marked (SpcR) HI0659 mutant (∆HI0659::spc) is not the same as that of the unmarked mutant - its transformation frequency is much higher, though still substantially lower than that of wildtype cells. (I don't know if its DNA uptake has been tested.) This is unexpected and suggests that there's a problem with the structure of either the marked or the unmarked mutation.

The structure of the unmarked mutation has already been carefully checked by PCR and it appears exactly as it should, so we suspect a problem with the marked mutation. The RA has now created new versions of the marked mutation, and yesterday I made four of these MIV-competent and transformed them. I'll learn the results of this test later today - if they don't transform at all we'll conclude that all is well with our mutants.

But if the new marked mutants do transform, we'll have to suspect that something is instead wrong with the unmarked mutant. The most likely problem is that this strain accidentally acquired a mutation elsewhere in its chromosome that prevents DNA uptake. Testing for this is a bit tricky, but here's my plan (diagrammed below).

HI0659 is in the same operon as HI0660, whose mutants both transform normally. If the new ∆HI0659::spc mutants transform, I'm going to transform the marked HI0660 mutant (∆HI0660::spc; SpcR) with a PCR fragment containing the normal HI0660 allele and the unmarked version of HI0659. I have frozen ∆HI0660::spc competent cells ready to use, and the RA is making the PCR fragment for me using primers she already used for another experiment. I'll plate the transformation mix without selection, using a control transformation with chromosomal NovR DNA to confirm that the cells were competent. Then I'll screen the colonies for loss of SpcR by picking them onto plain and Spc agar plates. Colonies that grow on the plain plate but not on the Spc plate will be ones that have lost their spc cassette by recombination with the ∆HI0659 fragment. I'll test these for transformability - if those that have acquired the ∆HI0659 deletion (checked by PCR) have lost the ability to transform then we can be reasonably confident that the ∆HI0659 deletion prevents transformation. If not then something is probably wrong with the ∆HI0659 mutant.

This would be a fair amount of work, probably too much to get done before the RA goes on a few months' leave at the end of April, so I very much hope that the transformations I did yesterday give no transformants.

Later: All the new ∆HI0659::spc mutants are nontransformable, just like the ∆HI0659 mutant. Because I actually tested deletion mutants created in two independent experiments, this means we can be extra confident that the deletion is responsible for the loss of competence.

We finally (after two months) got the reviews back for the postdoc's manuscript about DNA uptake bias. It's a rejection - the reviews were quite negative. The first reviewer was very unfair; they didn't find any fault with the methods or data or analysis, but they attacked our brief discussion of the functional evolutionary context of uptake bias. This is all too common for my papers. The reviewer is so hostile to the idea that bacteria might take up DNA for food that they don't focus on the science. Because the paper was rejected we don't get to do an official response to the reviews, so I'm relieving my frustration by responding to them here.

Reviewer #1:

Suitable Quality?: No

Sufficient General Interest?: No

Conclusions Justified?: No

Clearly Written?: No

Procedures Described?: Yes

Comments:

The compelling topic of DNA uptake mediated by uptake signal sequences (USS) in Haemophilus influenzae transformation is addressed. Mell et al utilize Illumina-based deep sequencing of DNA recovered after uptake in transformation to study the uptake specificity of a Haemophilus influenzae strain. They re-confirm previous reports (Maughan, 2010), documenting the importance of the GCGG core in the USS, by using a new method. The experimental data is sound and the analysis of sequencing reads and degenerate USS is solid. New data are represented by the detection of interaction effects between individual USS positions, although this part constitutes only a small part of the manuscript presented. The Authors then attempt to inform ongoing debates on the function and evolution of the DNA uptake machinery, making suggestions which are not supported by their data. This is a particular concern due to the extensive self-referencing and simultaneous exclusion of references central to the field.

All of this is peripheral to the main focus of the paper, which is the nature of the uptake bias, not the function of DNA uptake. The reviewer thinks the data and analysis are just fine, but wants the paper to be rejected anyway.

General comments:

The paper by Mell et al contains ample general statements beyond the scope of the study, which are not supported by the data.

Yeah, in the Discussion we try to put the results in their evolutionary and functional context.

Many of these statements are based on old models for DNA uptake in transformation and for the evolution of USS, that never were documented.

??? Old models for DNA uptake in transformation? Meaning for the mechanism? There aren't any old models, and ours is the only rigorous model that's been presented. The old 'model' for the evolution of USS, that cells take up DNA for sex, is just hand-waving.

The extensive referencing of own publications (17% of references), particularly in regard to molecular drive (see below), and the lack of reference to reports in the field conveying contradicting views, weaken the validity of the manuscript.

We cite 8 of our own papers and 35 papers from other groups. That doesn't seem unreasonable, especially since we're the only group with recent papers on the topic.

We could indeed have cited these guys, and will in the revised version.

P3 L16-20 The Authors fail to mention/acknowledge that if "not evolved by natural selection for optimizing gene expression" (which is obvious since they are agents for DNA uptake), USS may have evolved by natural selection for being beneficial in securing uptake of homologous DNA.

Here's what we wrote: Unlike transcription factor binding sites, uptake sequences do not evolve by natural selection for optimizing gene expression, but instead are thought to accumulate as an indirect consequence of uptake bias because they replace chromosomal sequences by homologous recombination (4). We have addressed the issue of whether USS evolve by selection for uptake benefits in ref 4 and elsewhere. Nobody else has presented any solid arguments against this.

Competent bacterial species have evolved several adaptations to favour homologous DNA in transformation and only overlooking substantial contributions in the literature on the subject (e.g. the entire bibliography on pneumoococci) allows for an illusive sequence of logic.

Does this refer to the pneumococcal work on mismatch repair in recombination? On the bizarre notion that cells kill themselves to provide DNA for their neighbours? The reviewer simply asserts the existence of 'adaptations to favour homologous DNA in transformation' - we're the only group that thinks such assertions should be treated as testable hypotheses.

The Authors then go on to advocate molecular drive (or rather some unspecified form of molecular drive) as the mechanism behind the evolution of USS.

Not 'unspecified'; we've published a rigorous and detailed model.

This is highly controversial. The concept of molecular drive in this context may only make sense in its neglect of the influence of natural selection as the evolutionary mechanism responsible for the evolution of transformation. Since molecular drive in general represents a downsized view on evolution, this theory has had its rise and fall in popularity and is today a largely outdated concept (as seen in the publication record referring to the subject).

What? Molecular drive only makes sense in the neglect of the influence of natural selection? We have shown that molecular drive is the null hypothesis, able to explain uptake sequences without any need to invoke natural selection. The onus is now on others to provide evidence that (i) natural selection is needed to explain the observations, and (ii) natural selection is able to explain the observations.

P3 L21-23: References 1 and 5 do not demonstrate that "Sequence specificity acts at the initial steps of DNA uptake, when DNA fragments are bound and transported across the Gram-negative outer membrane, pulled through type II secretin pores by the retraction of type IV pseudopili", neither does reference 37.

Aarrghh! This old canard! It's absolutely clear from the experiments in this paper and many previous ones that uptake specificity acts at transport of DNA across the outer membrane. In these experiments cells are given radioactively labelled DNA fragments containing either a USS or a control sequence. The USS-containing fragments become protected from added DNase and pellet with the cells, and the control fragments stay in the culture medium. This stupid assertion keeps coming up in reviews of our papers, probably from the same reviewer every time...

P4 L18-19: "Since USS and DUS are thought to have accumulated due to biased uptake,.." has never been shown.

We've shown that biased uptake plus recombination is sufficient to drive USS and DUS into genomes, using modeling.
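The logic of the claim can be illustrated with a toy calculation. This is only a sketch under strong simplifying assumptions, not our published model: a single site with two variants, a fixed fold-preference `bias` of the uptake machinery for the USS variant, a per-generation probability `recomb` that a genome's copy is replaced by an allele taken up from the population's DNA, and no mutation or selection at all. All names and parameter values are hypothetical.

```python
def drive_step(p, bias, recomb):
    """One generation of biased uptake plus recombination.

    p      : current frequency of the preferred (USS) variant in genomes
    bias   : fold-preference of uptake for the preferred variant (assumed)
    recomb : fraction of genomes whose site is replaced by taken-up DNA (assumed)
    """
    # Frequency of the preferred variant among DNA fragments that are taken up.
    taken_up = bias * p / (bias * p + (1.0 - p))
    # Recombinants carry the taken-up allele; the rest keep their own.
    return (1.0 - recomb) * p + recomb * taken_up

def simulate(p0=0.01, bias=5.0, recomb=0.01, generations=2000):
    """Iterate drive_step and return the frequency trajectory."""
    history = [p0]
    for _ in range(generations):
        history.append(drive_step(history[-1], bias, recomb))
    return history

with_bias = simulate()
no_bias = simulate(bias=1.0)
print(with_bias[-1])  # the preferred variant approaches fixation
print(no_bias[-1])    # without uptake bias the frequency does not move
```

In this toy version no cell gains anything from carrying the preferred variant; its frequency rises only because it is over-represented in the DNA that gets taken up and recombined.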

Perhaps we could someday do a demonstration experiment: 1. Find a place in the chromosome where there's a mediocre match to the uptake consensus (a poor USS). 2. Synthesize a degenerate pool of fragments containing better and worse versions of this USS. 3. Ligate this into a long fragment with the flanking chromosomal DNA, so we have a pool of long fragments, all identical except for the USS degeneracy. 4. Incubate competent cells with this pool. 5. Sequence this segment of the genomes of these cells, to show that they have become enriched for better versions of this USS. We'd want to do this without selecting for acquisition of the fragment. Maybe do several cycles of incubation and recovery? We could use sxy-1 or murE749 cells to get high frequencies of transformation. If I genuinely thought that this experiment would convince our critics, I'd do it. But they would probably say that the outcome was obvious, as it indeed is.

P5 L 3 and P 15 L16: Statements such as "..DNA's intrinsic stiffness, charge and length." , "..pulling stiff charged DNA molecules through the narrow secretin pore."and "..physical constraints imposed by stiff highly charged DNA" attempt to describe restrictions in the uptake of DNA by making suggestions which are not supported by their data.

Yes, this is the Introduction and Discussion, not the Results.

P5 L6-9: This statement does not comply with molecular drive since adaptations per definition cannot evolve by that mechanism, and the Authors must make up their mind: Natural selection or molecular drive?

Huh? That's not what we said. We said that preferential uptake is widely assumed to be an adaptation, and contrasted that assumption with our model.

Of the recovered DNA fragments of course, as spelled out in the previous paragraph.

P6 L9 Reference to Bakkali PNAS 2004 is missing

OK

P 14 L20 and on: Why are the nice data presented in Results not systematically discussed here? Instead, a general discussion on uptake-specificity systems and their (co-)evolution is presented, which is way beyond the main scope of the study.

We'll include a bit more discussion of the data in the revised version, but it's so self-evident that there's not really much to discuss.

P14 L21-23 and on. Why are multiple speculations not supported by the data, presented here? The evolutionary concept presented is flawed.

This is the Discussion...

P14 L23-P15 L2: The Authors do not seem to appreciate that the breaking up of genetic associations itself can be beneficial and hence subject to natural selection (for review, see: Otto and Lonormand, Nature 2002; Otto 2008.). It is also a misunderstanding that it is the amount of USS that is directly favourable as stated, but rather the sequence quality of the surrounding allele(s).

The authors certainly do appreciate this, since we have published several mathematical modeling papers investigating whether selection for recombination benefits can be strong enough to select for genes causing natural transformation. Bottom line: it probably can't. We rarely mention this work in our experimental papers because we don't expect the reviewers to understand it.

P15 L5-13: The question of the evolutionary benefit of being a picky eater is still causing a problem for the reasons mentioned above. The entire evolutionary constellation of molecular drive and USS not being an adaptation is entirely built on the weak and over-interpreted analyses in reference 4, which is cited more than 10 times in the manuscript, in order to attempt to present a consistent view on evolution. The Authors aim at separating the evolution of USS from that of transformation itself, which proves difficult since the advantages of acquiring homologous DNA in nature are extremely well documented.

Sure. The advantages of winning the lottery are extremely well documented too, but that doesn't mean that buying lottery tickets is a good investment.

P15 L 19-21: No data presented refers to the claims regarding deformation and kinking of DNA.

This is the Discussion. In the revision we'll mention DNA's persistence length and give a reference.

Reviewer #2:

Suitable Quality?: No

Sufficient General Interest?: No

Conclusions Justified?: No

Clearly Written?: Yes

Procedures Described?: Yes

Comments:

The prokaryote Haemophilus influenzae is naturally competent and the transport of DNA across the membrane is done through a secretion system derived from T2SS/T4P which requires a sequence signal known as the uptake signal sequence (USS). The manuscript describes an analysis of this uptake signal sequence. This has been done several times before. First, it was done experimentally following pioneering works in 70's. This was later redone by using genome sequences following the sequencing of the genome in 1995.

No, as we take pains to point out, the analysis of genome sequences does not tell us about the bias of the uptake machinery. One of the goals of our paper is to test the hypothesis that the genomic sequences accurately reflect the uptake bias.

Here, the authors have used a combination of the experimental approach and mass-sequencing to re-analyze the question. They generate a large pool of degenerated USS and compare this input pool with the one found in the periplasm of the cell. The difference between the two should provide the information on the bias of the secretion system for the USS. The scale of this experimental approach is novel for this problem, yet the results are not very different. The only systematic difference between the USS definition in the previous and this work relates with the average preference of the positions that are outside of the core of the signal. Overall, the authors find fewer preferences than expected, i.e. weaker signal for USS. Hence, genomic scans might have over-inflated frequencies of the consensus at these positions or this work may have done the inverse.

The uptake machinery is indeed related to Type 2 secretion systems, but it seems silly to call it a secretion system... It's hard to see how our experiments could have over-inflated the uptake bias, given that what we did was directly measure the effect of uptake.

1) This paper is often opinionated in an odd way. The difference between the USS obtained by the experiment and the ones observed from genome scans can be due to a number of issues, notably: other biases in subsequent steps, genome constraints and non-linear effects at transport.

The first is dismissed in one single sentence " The discrepancy is unlikely to be caused by undetected sequence biases at later steps of natural transformation (translocation of ssDNA to the cytoplasm and recombination into the chromosome). Such biases no doubt exist, but they are unlikely to amplify the specific biases of the uptake machinery. "

There's nothing wrong with using a single clear sentence to explain why a possible explanation doesn't apply.

The second is dismissed in the same way: " A similar argument applies to constraints acting at the level of genome evolution; natural selection certainly will have acted on uptake sequences that arose in coding regions or in positions where they could act as transcriptional terminators, but this is unlikely to have specifically strengthened the apparently weak uptake biases of the outer core and T-tracts."

So, for both arguments the authors sustain that these effects certainly exist but needn't be taken into account. In fact, there are a number of published reasons why these effects must be taken into account. It is well known that USSs accumulate in certain regions of the genome more than in others (for example, Smith, Res Mic, 99). In particular, they tend to accumulate more than expected in intergenic regions (three times more than expected in Haemophilus) and be part of rho-independent terminators. This means the genomic scans will fetch the bias associated with USS, but also the bias of intergenic regions (AT richness) and sequences flanking the core structure of the rho-independent terminator (stretches of T after the terminator and A before the terminator if it is a bi-directional terminator). This is the exact difference between figs 4 A and fig4 B and could thus explain the discrepancies. Therefore, the argument of the authors that the differences between USS in this work and genomic scans are not due to any of the two first causes is not convincing. It must be seriously sustained in some way.

This is a valid point. Years ago I did some analysis of how terminator function and coding functions affect the genomic USS motif (see this post especially). This analysis never got published, but the results are very significant in the context of the uptake analysis and we might want to include them in our revised manuscript.

Finally, the author's argument is that there are interactions between bases and this means the consensus needs not be as strong as thought. Characteristically, the sentence is " A better explanation may be that the uptake motif model described above is compromised by its assumption that each position contributes independently to uptake, i.e. that interaction effects between positions make no contribution. ". The reasons why this is a "better explanation" are not stated in the text. And I'm afraid I don't understand this point. If the consensus in the genome reflects only the incoming DNA and the filtering at the outer membrane (as the authors state) then the two consensus should be similar with or without interaction effects because the genomic consensus is the simple result of the initial consensus.

Doesn't this contradict the previous concern that other factors might contribute to the genomic consensus?

This is of utmost importance for the discussion of this article, because the deviation between the genomic USS motif and the one identified by the authors is the only biological novelty presented.

Not true.

2) All positions in the motif are less biased in the current experiment than in the genome scans. This also includes the core positions that are well known to be important biologically. Can't this discrepancy be simply the results of experimental error in DNA extraction? If there is contamination between extracellular (or membrane-bound) DNA and periplasmic DNA one expects to have a mixture of OM-filtered and non-OM-filtered sequences and therefore a weaker signal. Exactly, the observed result. Also, standard population genetics predicts that the genomic patterns should be *weaker* not stronger than import biases in a selfish model because selection occurs at the entry point. Genomic sequences will endure drift and thus USS should be weaker not stronger in genomes relative to OM-filtered sequenced. That motifs are stronger in genomes suggests selection for the best motifs for the bacterial benefit, not a selfish drive.

The reviewer is mistakenly assuming that the absolute information contents (in bits) of the two motifs are comparable. They're not because the input sequences were derived in very different ways. We had cut from the manuscript the sentence pointing this out; clearly we need to restore it.

3) The manuscript has a couple of errors that make its reading difficult.

In the legend of Fig 4 the panels A and B seem inverted relative to the description given in the text.

Oops, yes.

The first formula in page 19 should have + not -. The original formula is log2(N)-(-sum(s*log2(s))) (see Crooks, Genome Res, 04) which makes log2(N)+sum(s*log2(s)), thus sum(s*log2(s*N)). My calculations suggest the calculations of the authors are correct and that just the formula is wrong, but this should be checked.

No, the formula and calculations are both correct.
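For anyone who wants to check the algebra, here is a minimal Python sketch of the two forms the reviewer writes out, log2(N)+sum(s*log2(s)) and sum(s*log2(s*N)). They agree because the base frequencies at a position sum to 1. The function name and the example frequencies are made up for illustration, the manuscript's own formula isn't reproduced here, and any small-sample correction is left out.

```python
import math

def info_content_two_forms(freqs):
    """Information content (bits) of one motif position, computed two ways."""
    n = len(freqs)  # alphabet size N (4 for DNA)
    # Form 1: log2(N) minus the Shannon entropy, i.e. log2(N) + sum(s * log2(s)).
    form1 = math.log2(n) + sum(s * math.log2(s) for s in freqs if s > 0)
    # Form 2: sum(s * log2(s * N)); equals form 1 when the frequencies sum to 1.
    form2 = sum(s * math.log2(s * n) for s in freqs if s > 0)
    return form1, form2

# Hypothetical frequencies of A, C, G and T at a single position.
example = [0.7, 0.1, 0.1, 0.1]
form1, form2 = info_content_two_forms(example)
print(round(form1, 4), round(form2, 4))
```

A position with equal base frequencies gives 0 bits and a position with only one base gives 2 bits, in either form.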

In page 15, l16 the authors indicate that it might be an important mechanistic problem to pass the DNA through the narrow secretin pore. No reference is given for this. The family of Secretins is known to be extremely flexible. Secretin pores can transport folded proteins and even entire phage particles. Some evidence should be given (at least a reference) that pore size would be a problem to transport DNA.

Here we'll point out the persistence length, which greatly exceeds any reasonable estimate of the pore flexibility.

4) The analysis of interactions seems to neglect the effect that these positions are not independent within the design of the experiment. Because the experiment aims at defining regions with a certain degeneracy this should mean that if one position matches the consensus the other position under comparison is *less* likely to match the consensus simply by the design of the method (because degeneracy must be in some positions and it is not at the focal position). The significance of this effect should be checked since interactions are not very strong.

No. We didn't force every fragment to have the same number of mismatches (nor could we have). Our supplementary data (analysis of the sequencing of the input DNA fragments) shows that mismatches were randomly distributed among the input fragments.
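The reviewer's worry can be put into numbers with a little arithmetic. This is only a sketch: the fragment length n and mismatch count k below are hypothetical, not the values from our experiment. If every fragment were forced to carry exactly k mismatches, knowing that one position matches the consensus would make a match at another position slightly less likely. If mismatches are placed independently at each position, the two probabilities are identical.

```python
from fractions import Fraction

def marginal_match(n, k):
    """P(a position matches the consensus) when a fragment has k mismatches in n positions."""
    return Fraction(n - k, n)

def conditional_match_fixed_count(n, k):
    """P(position j matches | position i matches) if every fragment has EXACTLY k mismatches."""
    # Position i is known to match, so all k mismatches fall among the other n - 1 positions.
    return Fraction(n - 1 - k, n - 1)

# Hypothetical design: 30 degenerate positions, 6 mismatches per fragment.
n, k = 30, 6
p_marginal = marginal_match(n, k)
p_fixed = conditional_match_fixed_count(n, k)
# With independent mismatches the conditional probability is just the marginal one.
p_independent = p_marginal
print(float(p_marginal), float(p_fixed), float(p_independent))
```

The gap between the fixed-count and independent cases shrinks as n grows, so even under the reviewer's scenario the artifact would be small for long fragments.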

5) The exact differences between the two USS are difficult to assess by the use entropic measures. The problem would be much more appropriately analyzed by using classical population genetics of selective processes, because that's exactly the process at hand if you replace natural selection by transport system selection.

Huh? I don't see any way to treat this as a population genetics problem. We've certainly done lots of population genetics in other papers, on other aspects of transformation and USS evolution, but I don't think it can be applied here.

We're using the lab next door's Bioscreen incubator to generate growth curves for our H. influenzae competence mutants. This machine collects optical density (OD) data from cultures growing in wells in special 100-well plates; you can control the temperature and the shaking parameters and the frequency of the OD readings. I used it a few months ago for my GFAJ-1 analysis, but now we're using it to see if any of our knockout mutants have altered growth properties. That would be interesting because it would suggest that the missing protein does something useful for the cell outside of competence, and might help us understand why some proteins in the competence regulon don't play any role in DNA uptake.

I first did two controls. The first one was to compare H. influenzae cultures started in several different ways - from fresh and day-old colonies, and at various dilutions, and to see how close the growth of replicate cultures is. I used 10 replicate wells for each culture condition, and the results were excellent - all of the growth curves could be superimposed, with enough resolution to even see a tiny 'diauxic' pause in the growth, where the cells were switching from an exhausted nutrient to one they hadn't used yet.

The second control really should have been done first - a test for 'edge effects' and other inconsistencies in growth between the different wells in the culture plate. I was concerned that different wells might experience slightly different growth conditions (temperature, oxygen, whatever). In my previous test I had semi-randomized the arrangement of the different cultures in the wells, to average out any effects. For this test I put aliquots of the same culture in all the wells. The results were worse than I'd expected.

First, here are the superimposed growth curves from all 100 wells in this experiment. The X-axis is time and the Y-axis is OD. Although we don't see any growth for the first 300 minutes, the cells are growing; their density is just so low that they don't cause detectable changes in the OD readings. Again we see the little diauxic pause, here at OD = 1.05. The cultures peak with a tiny growth spurt at OD 1.4 and then the OD slowly declines.

But here's what was going on in the individual wells. Each line in the graph below shows the readings from all the wells at one time point; if all the cultures were growing identically it would be a flat line. Instead we see regular rises and dips with a 10-well period, showing that cultures in some of the edge wells are growing more slowly than those in the internal wells. I don't know why we also see peaks in the green lines at the top - these suggest that some wells' cultures grow to a higher final density. I don't think this is due to slightly different volumes in different wells. If it were, the effect should go away when I subtract off the initial reading for each well.
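The subtract-the-initial-reading check is easy to sketch in Python. The data structure (a dict from well number to a list of OD readings) and the numbers are made up for illustration; this is not the Bioscreen's export format. A constant offset between wells, such as one caused by a slightly different volume, disappears after the subtraction, while a real difference in growth remains.

```python
def subtract_baseline(readings):
    """Subtract each well's first OD reading from all of that well's readings."""
    return {well: [od - ods[0] for od in ods] for well, ods in readings.items()}

def profile_at(readings, t_index):
    """OD of every well at one time point, ordered by well number."""
    return [readings[well][t_index] for well in sorted(readings)]

# Made-up readings: well 2 has a constant offset, well 3 really grows more slowly.
readings = {
    1: [0.10, 0.30, 0.90],
    2: [0.12, 0.32, 0.92],
    3: [0.10, 0.25, 0.70],
}
corrected = subtract_baseline(readings)
# One "line" of the per-well plot: all wells at the final time point.
print([round(od, 2) for od in profile_at(corrected, -1)])
```

After the correction, wells 1 and 2 have the same final value and only well 3 still stands out.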

A look at the innards of the Bioscreen suggests that these wells are probably a bit cooler than the others. The postdoc used his R awesomeness to make a heatmap showing the differences in growth rates for the time point indicated by the red line above, and I massaged it onto the schematic of the well layout, shown below. The wells on the right side aren't cooler, because the tray that holds the plates has space for a second tray on that side.
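The postdoc's heatmap was made in R. The Python sketch below shows only the underlying bookkeeping: map each well number to a grid position, then compare edge wells with interior wells. It assumes a 10 x 10 layout with wells numbered down each column, which may not match the real plate, and it treats all four edges alike even though the right side apparently isn't cooler.

```python
ROWS, COLS = 10, 10  # assumed 10 x 10 layout of the 100-well plate

def well_position(well):
    """Map a well number (1-100) to (row, col), assuming numbering runs down each column."""
    index = well - 1
    return index % ROWS, index // ROWS

def is_edge(well):
    """True if the well sits in the outermost row or column of the grid."""
    row, col = well_position(well)
    return row in (0, ROWS - 1) or col in (0, COLS - 1)

def edge_vs_interior(values):
    """values: dict of well number -> measurement. Returns (edge mean, interior mean)."""
    edge = [v for w, v in values.items() if is_edge(w)]
    inner = [v for w, v in values.items() if not is_edge(w)]
    return sum(edge) / len(edge), sum(inner) / len(inner)

# Made-up ODs at one time point: edge wells lag slightly behind interior wells.
od_at_t = {w: (0.80 if is_edge(w) else 0.90) for w in range(1, 101)}
edge_mean, inner_mean = edge_vs_interior(od_at_t)
print(round(edge_mean, 2), round(inner_mean, 2))
```

With this numbering the top and bottom rows come up once every ten wells, the kind of 10-well period seen in the per-well plot.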

So now I'm doing 8 replicate runs for each of the competence mutants, and I'm using the wells that are orange or red in the heat map as control wells, putting plain medium with no cells into them. So far the results are very boring - all the mutants grow at the same rate.

I was just reading about the Wellcome Trust's open-access policy; only about 55% of grantees are complying with it. One of the commenters suggested making academic institutions (e.g. universities) responsible for the compliance of their researchers, specifically having each institution provide an open-access repository for its researchers' papers.

I know that UBC's librarians are very much in favour of open access, and I think there's some kind of repository. But I have no idea how it works, or how easy it would be for other researchers to find my papers if I put them there. So I started trying to find out.

First step: Google "UBC open access archive":

OK, a five-year-old blog post, a link to the 2009 Open Access Week page, a library-science course, a blog post last summer about Open Access Week, a wiki about a digital access system...

A page called Scholarly Communication at UBC looks promising.

I clicked on the Authors link, which got me general information about open access and the following:

OK, the Open Access resource page got me this:

OK, it looks like ciRcle might be UBC's open-access repository.

OK. Reading various ciRcle pages answers some of my questions. Materials deposited there are indexed by Google, Google Scholar, etc, so I guess they would show up when colleagues search for my papers. But putting stuff there might be a big pain. First, I have to belong to a 'Community'.

Hmm. The Faculty of Science is a Community on ciRcle, but my department (Zoology) isn't. My granting agency CIHR has Communities (or maybe they're Collections) for Research Outputs from 2008, 2009, and 2010 (to allow compliance with their open access policy), but nothing for 2011 or 2012. OK, now I'll try to deposit something to either the Faculty of Science Community or the 2010 CIHR Community. Then I can see how quickly it shows up in Google.

Well, it looks like I first have to register:

Registering was easy - it just wanted my name, email address and a phone number. But now I'm at step 6, which says I'm supposed to send them an email. OK, the email link goes to one of the librarians, so I'm asking her what I have to do next.

Later: Registering was pretty easy, and the librarian quickly set up a '2011 CIHR' community for my submission. And within a few days the submission was showing up on Google Scholar. This particular paper is open access anyway, so readers can get it at the source (PLoS Pathogens).

Now I just have to figure out what else I want to post here. Should I post all the same pdfs we've put on the 'What we've done' page of our web site? That might make them easier for readers to find. But will this be a more blatant form of copyright infringement, for papers that aren't open access?