Order of Results

We're still working out the order in which we present the different results in our draft manuscript. The first parts of the Results are now the analysis of how uptake sequence accumulation has changed the frequencies of different tripeptides in the the proteome, and of why some reading frames are preferred.

The next decision is how we present the results that use alignments of uptake-sequence-encoded proteins with homologs from three 'control' genomes that do not have uptake sequences. We had tentatively planned to first describe the finding that genes that don't have uptake sequences are less likely to have homologs in the control genomes (call this analysis A), then move to the genes that do have homologs in all three control genomes, first considering the overall degree of similarity (call this analysis B) and then considering the differences between the places where the uptake sequences are and the rest of the proteins (call this analysis C).

But I was having a hard time seeing how to motivate the first analysis - it didn't easily connect with what came before. Now I think a better plan is to first describe analysis B, which shows that genes with more uptake sequences have lower similarity scores (actually identity scores - my collaborator and I disagree about the relative value of using similarity and identity scores).

By describing analysis B first, we can more logically introduce the strategy of using alignments to homologs from a standard set of control genomes, before discussing the genes that lack homologs. And the results of analysis B then motivate analysis A, looking at the genes that had to be omitted from analysis B because they didn't have all three homologs.