Larry Wall, the creator of the programming language Perl, laid out in his book, Programming Perl, the virtues of a programmer. The first in this list of three was laziness.

I am currently working on a method to assess the statistical power provided by genetic loci, in an attempt to select the most efficient set of markers possible to address a specific question about particular populations. The general method involves estimating a few locus-specific population genetics parameters and then using these values to rank loci in terms of informativeness. These different ranks are then tested to assess their ability to predict statistical power (i.e. does one parameter provide a better power curve than another?).

The problem I have been having is that the power analysis requires hundreds of different input files. I started out optimistically by manually creating these files with the help of MS Excel. My optimism didn't last past coffee break. In the three hours or so before coffee, I only managed to create 42 input files out of a required 320. I calculated my completion time to be around 23 hours. I've never done anything for 23 hours in my life, let alone something as tedious as this. I decided that this was a job for my computer to do on its own, so I set about writing some R code that would take less than 23 hours to write and run!!! As a novice programmer with little experience, this might be a challenge...

Well, I overestimated the problem. It took me about 30 minutes to write the following and a further 2 minutes or so to run:

POWSIM.create.R

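    powsim.file.create <- function(infile){
      x = infile
      # NOTE: the right-hand sides of the assignments did not survive
      # in this post; the readLines() and file() calls below are
      # assumed reconstructions based on the comments
      # read the fixed header lines (1:14)
      hdr <- readLines(x, n = 14)
      # count the total number of lines
      nlines <- length(readLines(x))
      # define the data as the total number of lines minus
      # the first 14 fixed header lines
      data <- readLines(x)[15:nlines]
      # create the sequential input files
      for(i in 1:(nlines-14)){
        # open a file connection with a specific name
        # (this naming pattern is a placeholder, not the original)
        out <- file(paste("powsim_", i, ".txt", sep = ""), "w")
        for(j in 1:14){
          # this is an element in the header which
          # needs to change as it defines the number
          # of loci present in the file
          if (j == 7){
            cat(i,"\n",file=out,sep="")
          }else{
            cat(hdr[j],"\n",file=out,sep="")
          }
        }
        for(z in 1:i){
          cat(data[z],"\n",file=out,sep="")
        }
        close(out)
      }
    }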

The code isn't necessarily the most efficient way to do what I want to do, but it works for me. It can produce 42 input files in less than 2 seconds, giving me a new completion time of about 16 seconds rather than 23 hours. In fact, I have already finished the analyses, 2 days ahead of schedule!!!!!!!!

The moral of the story is: use R (or any other language you like). It gives you so much time to waste doing other things, like blogging instead of working.
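For anyone curious what a more compact version might look like, here is a minimal sketch of the same idea. It reads the input once and writes each file with a single `writeLines()` call. The function name `powsim_files_sketch`, the `outdir`, `n_hdr` and `loci_line` arguments, and the output file-naming pattern are all illustrative assumptions, not part of the script above.

```r
# Sketch only: names, arguments and the output file pattern are
# illustrative assumptions, not taken from the original script.
powsim_files_sketch <- function(infile, outdir = ".",
                                n_hdr = 14, loci_line = 7) {
  # read the whole input file once
  lines <- readLines(infile)
  # the first n_hdr lines are the fixed header, the rest are locus data
  hdr  <- lines[seq_len(n_hdr)]
  loci <- lines[seq_along(lines) > n_hdr]

  outfiles <- character(length(loci))
  for (i in seq_along(loci)) {
    # the header line that records the number of loci in the file
    hdr[loci_line] <- as.character(i)
    # hypothetical naming scheme: powsim_001.txt, powsim_002.txt, ...
    outfiles[i] <- file.path(outdir, sprintf("powsim_%03d.txt", i))
    # header followed by the first i loci, one element per line
    writeLines(c(hdr, loci[seq_len(i)]), outfiles[i])
  }
  # return the paths invisibly so the caller can check what was written
  invisible(outfiles)
}
```

Calling something like `powsim_files_sketch("my_input.txt", outdir = "powsim_inputs")` (both names hypothetical, and the output directory must already exist) should write one file per locus, each containing the header plus the first `i` loci.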

About the authors

Kevin Keenan is currently working towards a PhD in population and evolutionary genetics. His general research interests include small-scale intraspecific genetic divergence, speciation and phylogeography.

Uncle Mick is currently reading genetics at the University of Manchester. His interests include the role of non-coding RNA in regulatory processes, molecular processes associated with sequencing technology, and the interactions between genotype and phenotype.