Approaches to DNA motif discovery

Count how often each k-mer (a subsequence of length k) occurs in a DNA sequence, then sort the k-mers from most frequent to least frequent. When tested on several DNA sequences, this approach was not very accurate because it could not differentiate between “standard” biological motifs and specialized motifs.
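The counting-and-sorting step can be sketched in a few lines of Python. This is only an illustration, not the original implementation: the function names `count_kmers` and `ranked_kmers` are hypothetical, and the sketch assumes that overlapping k-mers are counted and that the sequence is a plain string of A, C, G, and T.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def ranked_kmers(sequence, k):
    """Return (k-mer, count) pairs sorted from highest to lowest count."""
    return count_kmers(sequence, k).most_common()
```

For example, `ranked_kmers("ATATAT", 2)` returns `[("AT", 3), ("TA", 2)]`. A ranking like this only says which k-mers are common, not whether they are more common than expected, which is the gap the comparisons against a second sequence try to close.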

Compare a biological sequence with a randomly generated sequence of the same length. The number of motifs found in the biological sequence is compared against the number found in the random sequence to test for statistical significance. When tested on sequences containing known motifs, this approach performed better than the first, but it was still not reliable. The frequencies of the nitrogenous bases (A, C, G, and T) in the random sequence did not reflect those generally found in biological sequences.
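A minimal sketch of this comparison follows. It assumes the random sequence is drawn with each base equally likely, which the description above does not state; the helper names `random_dna`, `count_occurrences`, and `base_frequencies` are hypothetical.

```python
import random
from collections import Counter


def random_dna(length, rng):
    """Generate a random sequence, assuming each base is equally likely."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def count_occurrences(sequence, kmer):
    """Count overlapping occurrences of kmer in sequence."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == kmer)


def base_frequencies(sequence):
    """Return the fraction of the sequence made up by each base."""
    counts = Counter(sequence)
    return {base: counts[base] / len(sequence) for base in "ACGT"}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    biological = "ATATTTAAATATATTTTAAAGC"  # made-up, AT-rich example
    background = random_dna(len(biological), rng)
    print(count_occurrences(biological, "ATAT"),
          count_occurrences(background, "ATAT"))
    print(base_frequencies(biological))
    print(base_frequencies(background))
```

Printing the base frequencies side by side shows the weakness: the made-up AT-rich sequence has a very different composition from a background in which every base has probability 0.25, so a difference in motif counts may reflect composition alone.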

Compare a biological sequence with a randomly shuffled copy of itself. Shuffling guarantees that the random sequence has the same letter frequencies as the biological sequence, which leads to a much more accurate analysis of the motifs.
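The shuffled comparison might look like the sketch below. The empirical p-value over many shuffles is an assumption added for illustration, since the text does not say how significance was computed; `shuffled`, `count_occurrences`, and `empirical_p_value` are hypothetical names.

```python
import random


def shuffled(sequence, rng):
    """Return a copy of the sequence with its letters in random order."""
    letters = list(sequence)
    rng.shuffle(letters)  # permutes in place, so base counts are preserved
    return "".join(letters)


def count_occurrences(sequence, kmer):
    """Count overlapping occurrences of kmer in sequence."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == kmer)


def empirical_p_value(sequence, kmer, n_shuffles=1000, seed=0):
    """Estimate how often a shuffled copy contains kmer at least as often
    as the original sequence does (assumed significance test)."""
    rng = random.Random(seed)
    observed = count_occurrences(sequence, kmer)
    hits = sum(
        1 for _ in range(n_shuffles)
        if count_occurrences(shuffled(sequence, rng), kmer) >= observed
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)
```

Because a shuffle only reorders the letters, the background has exactly the same base counts as the biological sequence. A small value from `empirical_p_value` then suggests that the motif's frequency depends on the order of the bases rather than on their overall composition.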