leading and lagging DUSs

We're adding data about the uptake sequences in the Neisseria meningitidis genome (called DUSs) to our previous analysis of Haemophilus influenzae uptake sequences (USSs). But my initial analysis of how these DUSs are distributed turned up something unexpected.

For H. influenzae, we've known for a long time that USSs are distributed quite randomly around the genome. Well, the spacing is a bit more even than perfectly random would be, and USSs are somewhat less common in coding sequences, but these qualifications aren't relevant to this unexpected result. In particular, the two possible orientations of the USSs occur quite randomly around the chromosome. (I've described this analysis in a previous post, though not the results.)

So I did the same analysis for Neisseria DUSs. I separated the forward and reverse-complement genome sequences into 'replicated as leading strand' and 'replicated as lagging strand' halves. This was relatively simple because the origin of replication had been assigned to position #1. Nobody has mapped the terminus of replication, so I just picked the midpoint (replication is expected to proceed bidirectionally from the origin, so the terminus should lie roughly opposite it). I wrestled with Word to find this midpoint, but I'll redo this using our 'splitsequence.pl' perl script to divide each sequence exactly in two. Then I used the Gibbs motif sampler (now running happily on our desktop Macs) to find all the forward-orientation DUSs in the 'leading' and 'lagging' sequences.
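Here is a minimal Python sketch of the splitting step, assuming (as above) that the origin is at position 1 and the terminus at the midpoint. Two things are my substitutions, not what we actually ran: exact string matching stands in for the Gibbs motif sampler, and the motif is assumed to be the commonly cited 10-bp core DUS, GCCGTCTGAA. The function names are hypothetical. In the half running from the origin to the midpoint, the fork moves toward higher coordinates, so the top strand is the leading strand. In the other half, the roles swap.

```python
# Minimal sketch, not our actual pipeline: exact string matching stands in
# for the Gibbs motif sampler, the origin is assumed to be at position 1
# (index 0), and the terminus is assumed to be at the midpoint.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Assumed 10-bp core DUS; substitute whatever motif definition you use.
DUS = "GCCGTCTGAA"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_replichores(genome: str) -> dict[str, list[str]]:
    """Split a circular genome (top strand, 5'->3') into leading/lagging pieces.

    Origin -> midpoint: the fork moves toward higher coordinates, so the top
    strand is leading and the bottom strand (reverse complement) is lagging.
    Midpoint -> end: the fork moves toward lower coordinates, so roles swap.
    Pieces are kept separate so no spurious match is created at a junction.
    """
    mid = len(genome) // 2
    right, left = genome[:mid], genome[mid:]
    return {
        "leading": [right, revcomp(left)],
        "lagging": [revcomp(right), left],
    }


def count_motif(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) exact matches of motif in seq."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def forward_counts(genome: str, motif: str = DUS) -> dict[str, int]:
    """Forward-orientation motif counts in the leading and lagging pieces."""
    pieces_by_strand = split_replichores(genome.upper())
    return {
        strand: sum(count_motif(piece, motif) for piece in pieces)
        for strand, pieces in pieces_by_strand.items()
    }
```

Called on the whole genome as one string, `forward_counts(genome)` returns a dictionary of the form `{'leading': n, 'lagging': m}`. Any DUS that straddles the midpoint (or the position-1 junction of the circle) is lost by this split, just as it would be when the sequence is cut in two for the motif sampler.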

The surprise was that it found twice as many DUSs in the lagging strand as in the leading strand. At first I mistakenly thought this could be because there were more genes on the leading strand (irrelevant, because genes, and DUSs, occupy both strands). Then I decided that it must be because DUSs are oriented differently around the genome, mostly pointing the same way relative to the direction in which the replication fork passes through them. So I did a quick analysis of the locations of forward-pointing DUSs around the chromosome, expecting to find that they were more frequent near one end than the other. But they appeared to be evenly distributed, which would leave only one other explanation: I've screwed up.
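One way to check for a screw-up is to recover the leading and lagging counts from a single scan of the top strand. Tally the DUS matches by orientation and by chromosome half. Under the same origin and midpoint assumptions, the forward-DUS count on the 'leading' sequence should equal the forward matches in the first half plus the reverse-complement matches in the second half. The 'lagging' count should be the other two cells. This Python sketch again uses exact matching in place of the Gibbs sampler, and all helper names are hypothetical.

```python
# Minimal sketch (hypothetical helper names): tally exact motif matches on the
# top strand by orientation and by chromosome half, assuming the origin is at
# position 1 (index 0) and the terminus at the midpoint. Assumes a
# non-palindromic motif, so forward and reverse matches are distinct.
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) exact matches."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def tally_by_half(genome: str, motif: str) -> Counter:
    """Count matches keyed by (orientation, half); 'span' crosses the midpoint."""
    mid, k = len(genome) // 2, len(motif)
    tally = Counter()
    for orientation, pattern in (("fwd", motif), ("rev", revcomp(motif))):
        for pos in find_all(genome, pattern):
            if pos + k <= mid:
                half = "first"   # match lies wholly before the midpoint
            elif pos >= mid:
                half = "second"  # match lies wholly after the midpoint
            else:
                half = "span"    # match straddles the assumed terminus
            tally[orientation, half] += 1
    return tally


def implied_strand_counts(tally: Counter) -> dict[str, int]:
    """Leading/lagging forward-motif counts implied by the per-half tally."""
    return {
        "leading": tally["fwd", "first"] + tally["rev", "second"],
        "lagging": tally["rev", "first"] + tally["fwd", "second"],
    }
```

If the counts from the split sequences differ from `implied_strand_counts(tally_by_half(genome, motif))`, the disagreement should be accounted for by the 'span' cells. Otherwise the error lies in how the halves were built or labelled.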