Bash Statistics

November 10, 2014

10 Nov 2014 - Redwood City

I had a case where I needed to generate some simple statistics from an indel file that one of our bioinformaticians had procured for me. These files are huge, but instead of asking for a parsed version, I thought it might be fun to do this in bash. I’ll walk through an example.

Let’s say I have a text file named filename with the following content:

red apple
green apple
green apple
orange
orange
orange
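If you want to follow along, one way to create this sample file is with a here-document. This is a minimal sketch: it assumes you are working in a scratch directory, because it overwrites any existing file called filename.

```shell
# Write the six sample lines to a file literally named "filename".
# Quoting the 'EOF' delimiter stops the shell from expanding anything in the body.
cat > filename <<'EOF'
red apple
green apple
green apple
orange
orange
orange
EOF
```

Running `cat filename` afterwards should print the same six lines shown above.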

I’m looking for some histogram data that yields a multidimensional array of items and their frequencies: