The abundance of all tetra- and pentanucleotide sequences is calculated for a set of DNA sequence data comprising 767,393 nucleotides of the E. coli K-12 genome. Observed frequencies are compared to those expected from a Markov chain prediction algorithm. Systematic and extreme non-random representations are found for special sets of sequences. These are interpreted as arising from incorporation of a 2'-deoxyguanosine residue opposite thymidine during replication which, in special sequence…
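
The observed-versus-expected comparison described above can be illustrated with a minimal sketch. The abstract does not state the order of the Markov chain or the exact estimator, so the code assumes one common choice, the maximal-order estimate, in which the expected count of a word w1..wk is N(w1..wk-1) × N(w2..wk) / N(w2..wk-1). The function names `count_kmers` and `observed_vs_expected` are hypothetical and only a single strand is counted; this is not presented as the authors' implementation.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer on the given strand of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_vs_expected(seq, k):
    """Return {word: (observed, expected)} for every k-mer present in seq.

    ASSUMPTION: expected counts use a maximal-order Markov estimate,
        E(w) = N(prefix) * N(suffix) / N(core),
    where prefix and suffix are the (k-1)-mers at either end of w and
    core is the shared (k-2)-mer in the middle. Requires k >= 3.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for this estimator")
    observed = count_kmers(seq, k)
    sub1 = count_kmers(seq, k - 1)  # (k-1)-mer counts
    sub2 = count_kmers(seq, k - 2)  # (k-2)-mer counts
    result = {}
    for word, n_obs in observed.items():
        # core is nonzero whenever the full word occurs in seq
        core = sub2[word[1:-1]]
        n_exp = sub1[word[:-1]] * sub1[word[1:]] / core
        result[word] = (n_obs, n_exp)
    return result


if __name__ == "__main__":
    # Toy sequence for illustration only; not E. coli data.
    toy = "ACGTACGTACGT"
    for word, (n_obs, n_exp) in sorted(observed_vs_expected(toy, 4).items()):
        print(word, n_obs, round(n_exp, 2))
```

Calling `observed_vs_expected(genome, 4)` and `observed_vs_expected(genome, 5)` on a real sequence would give the tetra- and pentanucleotide tables; words whose observed count departs strongly from the expected count are the candidates for non-random representation.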