never underestimate the power of a genome pig

Modern genome assembly and analysis ingests and processes very large datasets. My team runs, manages and patches literally petabytes of storage, but sometimes we need less disk and much more thought. Juggling this amount of DNA on a regular basis is genuinely hard: we push the same data through hundreds of algorithms and heuristics on any given day, and every step is just another headache for us all.

However, as a community we also tend to use compression as much as practically possible, and I often forget that gzip can be run in parallel thanks to the most awesome pigz, written by Mark Adler. As a member of JPL (I visited their offices, totally amazing!) he has really cared about data compression: Mars is a long way away, and compression matters even more on space missions; remember how little disk we had back in the day. Mark also helped PNG become part of our culture. Wonderful algorithms!

Now we compress that fairly hefty 30G file, using -p24 to make the "pig" run 24-way parallel. Gzip compression is close to embarrassingly parallel: pigz splits the input into independent blocks and compresses each one on its own worker thread, so it scales well across cores:
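A minimal invocation might look like this (the filename is hypothetical; the flags are standard pigz options):

```shell
# Compress with 24 threads; pigz splits the input into 128 KB blocks
# and deflates each block on its own thread. The output is a plain
# gzip stream, readable by ordinary gzip/zcat.
pigz -p24 reads.fastq        # produces reads.fastq.gz, removes the original

# Keep the original file, then verify the archive integrity:
pigz -k -p24 reads.fastq
pigz -t reads.fastq.gz

# Decompression is mostly single-threaded (inflate itself can't be
# split), though pigz still uses helper threads for read, write and
# checksum calculation:
pigz -d reads.fastq.gz
```

Because the output is a standard gzip stream, downstream tools that already read .gz files need no changes at all.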