You can write your own program for this task and contribute to the benchmarks game by following these general instructions.

More specifically:

diff program output for this 250KB input file (generated with the fasta program N = 25000) with this output file to check your program is correct before contributing.

We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.

We use FASTA files generated by the fasta benchmark as input for this benchmark. Note: the file may include both lowercase and uppercase codes.

define a procedure/function to update a hashtable of k-nucleotide keys and count values, for a particular reading-frame — even though we'll combine k-nucleotide counts for all reading-frames (grow the hashtable from a small default size)

use that procedure/function and hashtable to

count all the 1-nucleotide and 2-nucleotide sequences, and write the code and percentage frequency, sorted by descending frequency and then ascending k-nucleotide key

count all the 3- 4- 6- 12- and 18-nucleotide sequences, and write the count and code for the specific sequences GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT