We've had several successful little projects doing de novo assembly of phage genomes with 454, but in one case all we got was host contaminant and what looked like human mitochondria. Moral: do more QC on the sample before sequencing. Otherwise you can waste your sequencing money & some analysis time.

Semi-anonymous user names may discourage posts though. I'm sure people here could share horror stories of colleagues coming to them with "We've just done some sequencing, could you assemble it for us please" with no idea of the scale of the problem or how much analysis time they should have budgeted for. Probably the best warnings are saved for off-the-record conversations at the pub/bar at conferences!

Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.

Edit: To make my point more explicit (thanks Simon), the point is you should be diligent in your record keeping (electronic lab book or whatever works for you) and include the version number of key packages and datasets/databases since this can sometimes make a surprising difference to the results. This goes beyond high throughput sequencing, and applies to Bioinformatics as a whole.
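To show what that record keeping might look like in practice, here is a minimal sketch of dumping a provenance record next to an analysis run. The tool names, version strings, and output filename are all hypothetical examples, not any particular pipeline's convention.

```python
# Minimal provenance sketch: record the date, Python version, and the
# version of each key tool/database alongside the analysis outputs.
# Tool names and versions below are made-up placeholders.
import json
import platform
import datetime

def record_versions(tools, outfile="analysis_versions.json"):
    """Write a small JSON provenance record for this analysis run."""
    record = {
        "date": datetime.date.today().isoformat(),
        "python": platform.python_version(),
        "tools": tools,  # e.g. {"aligner": "1.2.3", "database": "release 42"}
    }
    with open(outfile, "w") as fh:
        json.dump(record, fh, indent=2)
    return record

rec = record_versions({"aligner": "1.2.3", "database": "release 42"})
```

Even something this small is enough to answer "which version of tool X did we run last March?" when results change after an update.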


To try to make a wider point - this is why we advocate getting our users to visualise and explore their data. Running a tool, however good it may be, tends to make people too trusting of the results produced. If you can actually view those results in a number of different ways then you get a much better feel for how much confidence you can have in the hits you see.

For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.


Excellent advice. Another related point is to avoid pre-determined E-values as thresholds when they can alter radically based on things like database size (e.g. BLAST matches - whereas the bit score is stable). i.e. an E-value that is discriminatory for one dataset can be quite inappropriate for another.
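To make the database-size dependence concrete: in BLAST statistics the E-value scales with the search space, roughly E = m × n × 2^(−bitscore) for query length m and database length n, so the same bit score gives a much larger E-value against a bigger database. The query and database lengths below are illustrative numbers, not from any real search.

```python
# Why a fixed E-value cutoff isn't portable across databases:
# E = m * n * 2**(-bitscore), so the same bit score (same alignment
# quality) yields an E-value proportional to database size.

def evalue(bitscore, query_len, db_len):
    return query_len * db_len * 2.0 ** (-bitscore)

bit = 50.0          # identical alignment quality in both searches
e_small = evalue(bit, 300, 1e6)   # ~1 Mb database
e_large = evalue(bit, 300, 1e9)   # ~1 Gb database

# e_large is 1000x e_small: a cutoff like E < 1e-5 would accept this hit
# against the small database and reject the identical hit against the
# large one, while the bit score is unchanged.
print(e_small, e_large)
```

This is exactly why the bit score is the more stable quantity to threshold on if you must pick a single number across datasets.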
