The USS-Perl project becomes the USS-drive paper

I think I'm going to start using the label "USS-drive" for the manuscript that describes our "USS-Perl" computer simulation model. That's because the focus of the manuscript will be on understanding the nature of the molecular drive process that we think is responsible for the accumulation of uptake sequences in genomes.

The plan is to combine the modeling analysis with our unpublished results on the bias of the uptake machinery and on the nature of the motif that has accumulated in the genome.

The broad outline will be as follows:

We want to understand how the bias of the uptake machinery can affect the evolution of sequences in the genome, assuming that cells sometimes take up and recombine homologous sequences from closely related cells. And we want to examine this in the context of what is seen in real cells: the real biases of the DNA uptake machineries and the real patterns of uptake-related sequences in genomes.

So we will begin by properly characterizing the genome sequences using the Gibbs Motif Sampler. I've already done this analysis for all of the genomes that have uptake sequences. And we've done Gibbs analysis on different subsets of the H. influenzae genome (coding, non-coding, potential terminators, different reading frames), and looked for evidence of covariation between different positions.

We will also collate the published uptake data for H. influenzae, N. meningitidis and N. gonorrhoeae, adding our unpublished H. influenzae data.

And then we will present the model as a way to investigate the relationship between uptake bias and accumulation of uptake sequences in the genome. A key feature of the model is that it represents uptake bias with a position-weight matrix, which treats uptake sequences as motifs rather than as elements. That is, the matrix specifies a value for each base at each position of the motif. This means that we can evaluate both the uptake-bias data and the genome-frequency data as inputs into the model. The uptake-bias data isn't really good enough for this, so I anticipate that the main focus will be on using the genome-frequency data to specify uptake bias in the model.
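
To make the "motif rather than element" idea concrete, here is a minimal Python sketch of how a position-weight matrix could score a DNA fragment. It is not the USS-Perl code. The function names, the toy 4-position matrix and its values, and the choice to multiply the per-position values are all assumptions made for illustration.

```python
# Toy position-weight matrix: one dict per motif position, one value per base.
# These values are invented for illustration; they are not measured biases.
TOY_MATRIX = [
    {"A": 10.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 10.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 1.0, "C": 1.0, "G": 10.0, "T": 1.0},
    {"A": 1.0, "C": 1.0, "G": 1.0, "T": 10.0},
]


def window_score(window, matrix):
    """Score one window by multiplying the matrix value of each base.

    Multiplying is an assumption of this sketch; a model could combine
    the per-position values in some other way (for example by adding them).
    """
    score = 1.0
    for base, weights in zip(window, matrix):
        score *= weights[base]
    return score


def best_score(fragment, matrix):
    """Return the highest window score found anywhere in the fragment."""
    width = len(matrix)
    if len(fragment) < width:
        return 0.0  # fragment too short to contain the motif
    return max(
        window_score(fragment[i:i + width], matrix)
        for i in range(len(fragment) - width + 1)
    )


if __name__ == "__main__":
    # A perfect match scores highest; one mismatch lowers the score
    # without reducing it to zero, which is the point of a motif.
    print(window_score("AAGT", TOY_MATRIX))
    print(window_score("AAGC", TOY_MATRIX))
    print(best_score("CCAAGTCC", TOY_MATRIX))
```

Because the matrix is just a list of per-position values, it can have any number of positions, and an imperfect match still gets a graded score rather than being rejected outright.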

Because the model allows the matrix to be of any length, we can use it with the full-length H. influenzae motif (30 bp), not just the core. And because the model lets us specify base composition, we can also use it for the Neisseria DUS.
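
One way the genome-frequency data might be turned into a matrix is sketched below in Python. This is an assumption about a possible approach, not a description of what USS-Perl does: the function name, the pseudocount, the example sites and the background frequencies are all hypothetical. Each matrix value is the observed frequency of a base at a position divided by its background frequency, so the same procedure could be applied to genomes with different base compositions.

```python
from collections import Counter


def matrix_from_sites(sites, background, pseudocount=1.0):
    """Build a position-weight matrix from aligned motif occurrences.

    sites       -- equal-length aligned sequences (hypothetical input,
                   e.g. occurrences reported by a motif search)
    background  -- dict of genome-wide base frequencies (A, C, G, T)
    pseudocount -- added to every count so unseen bases do not get zero

    Each value is (frequency at this position) / (background frequency).
    This ratio is an assumption of the sketch, not the model's definition.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        row = {}
        for base in "ACGT":
            freq = (counts[base] + pseudocount) / total
            row[base] = freq / background[base]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Illustrative inputs only: three made-up sites and an AT-rich background.
    example_sites = ["AAGT", "AAGT", "AAGC"]
    at_rich = {"A": 0.31, "C": 0.19, "G": 0.19, "T": 0.31}
    for row in matrix_from_sites(example_sites, at_rich):
        print({base: round(value, 2) for base, value in row.items()})
```

The matrix has as many rows as the aligned sites have positions, so a full-length motif and a short core would be handled by the same code.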