Thursday, July 9, 2009

Combining data from independent simulation runs using a bash script

Today I came across a problem that I have solved several times before. From my simulations, I generate a bunch of files called stat1, stat2, ... statN, which contain the following data:

$ cat stat1
567.20 0.88
45.29 3.08
296.58 21.50
0.33 0.14

The first column holds the properties measured in a particular simulation run, and the second column holds the corresponding standard errors. The N different "stat" files come from N independent simulation runs. In my final report, I like to give the average properties and their associated standard errors. The following shell script, DataAgg.sh, creates a new file, TotalProp, which contains exactly that.

Note that I don't need to know how many "stat" files there are, or how many rows each of them has. The only preconditions are that I know the common prefix ("stat") of my data files, and that those files contain only the two numerical columns described above.
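For readers who want something to start from, here is a minimal sketch of one way to do this kind of aggregation with bash and awk. It is not necessarily how DataAgg.sh itself works. The function name aggregate_stats and the glob pattern are my own illustrative choices. It also assumes that the standard error of the averaged value follows the usual propagation rule for N independent runs, sqrt(se_1^2 + ... + se_N^2) / N; a different convention (for example, using the spread between runs) would need a different END block.

```bash
#!/usr/bin/env bash
# Sketch only; not the original DataAgg.sh.
# Assumes every file matching <prefix>[0-9]* has the same number of rows,
# each row being "<value> <standard error>".

# aggregate_stats is a hypothetical helper name.
# Usage: aggregate_stats <prefix> <outfile>
aggregate_stats() {
    local prefix=$1 outfile=$2
    # FNR is the row number within the current file, so row i of every file
    # accumulates into the same slot no matter how many files or rows exist.
    awk '
        NF == 2 {
            sum[FNR]  += $1        # running sum of the property values
            var[FNR]  += $2 * $2   # running sum of squared standard errors
            runs[FNR] += 1         # number of runs contributing to this row
            if (FNR > rows) rows = FNR
        }
        END {
            for (i = 1; i <= rows; i++)
                if (runs[i])
                    # Assumed rule: error of the mean of independent runs.
                    printf "%.2f %.2f\n", sum[i] / runs[i], sqrt(var[i]) / runs[i]
        }
    ' "$prefix"[0-9]* > "$outfile"
}
```

Calling aggregate_stats stat TotalProp in the directory holding the data files would then write one averaged row per property to TotalProp. Because the shell glob finds the files and awk counts the rows, neither number has to be known in advance.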