bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

A pedantic look at the cost of sequencing

A quick post to take a deeper look at the cost of sequencing, after Neil Hall wrote an excellent commentary on this (and other issues) in Genome Biology (“After the gold rush”).

We’ve all seen the graph, published by the NHGRI, on the historical cost of sequencing; in fact, we’ve seen it too often, in presentations, posters, grant applications and on the side of buses. It’s a major part of genomics bingo.

My first major gripe with this graph is that the y-axis tops out at $10k, whereas the biggest value in the spreadsheet is $5292.39. The graph above is misleading, and makes the casual observer think the price used to be close to $10k per Mb (this is of course an illusion of the log scale). Whilst I appreciate that many graphs with log scales do increment in powers of 10, this is needless in this instance. Here is the graph with a slightly more informative y-axis:

The above graph of course tells the same story.

The second issue I have is with Neil’s assertion that the price went up. Of course, it did – from 6.5 cents per Mb to 7.3 cents per Mb, an increase of 0.8 cents per Mb. This was indeed an increase of around 12% in relative terms, but not a huge increase in absolute terms.
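To make the arithmetic explicit, here is a minimal sketch that recomputes both figures from the two per-Mb prices quoted above. The variable names are mine, not anything from the NHGRI spreadsheet.

```python
# Per-Mb costs quoted in the post, in US cents.
old_cents_per_mb = 6.5
new_cents_per_mb = 7.3

# Absolute change: how many cents per Mb the price moved.
absolute_increase = new_cents_per_mb - old_cents_per_mb

# Relative change: the same move as a proportion of the starting price.
relative_increase = absolute_increase / old_cents_per_mb

print(f"absolute: {absolute_increase:.1f} cents per Mb")  # absolute: 0.8 cents per Mb
print(f"relative: {relative_increase:.1%}")               # relative: 12.3%
```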

What the NHGRI spreadsheet doesn’t tell you is how they convert the “per Mb” cost into the “per genome” cost. Of course, they use a multiplier: from 2001 to 2007, that multiplier is 18000 (equivalent to 6X coverage of a human genome); for one data point, in January 2008, the multiplier is 30000 (equivalent to 10X coverage); and for the rest of the data, the “next-gen” period, the multiplier is 90000 (equivalent to 30X coverage).
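As a rough sketch of that conversion, the snippet below applies the three multipliers to a per-Mb cost. It assumes a human genome of about 3,000 Mb (which is what 18000 = 6 x 3000 implies), and the exact date boundaries are my reading of the description above, not something taken from the spreadsheet. The function names and the flat $1 per Mb price are made up for illustration.

```python
from datetime import date

# Assumption: ~3,000 Mb per human genome, implied by 18000 = 6 x 3000.
HUMAN_GENOME_MB = 3000


def coverage_for(when: date) -> int:
    """Fold-coverage implied by the multipliers described in the post."""
    if when < date(2008, 1, 1):
        return 6    # 2001 to 2007: multiplier 18000
    if when < date(2008, 2, 1):
        return 10   # the single January 2008 data point: multiplier 30000
    return 30       # the "next-gen" period: multiplier 90000


def cost_per_genome(cost_per_mb: float, when: date) -> float:
    """Convert a per-Mb cost into a per-genome cost for a given date."""
    return cost_per_mb * coverage_for(when) * HUMAN_GENOME_MB


# Hypothetical flat price of $1 per Mb, to show the multiplier alone.
for when in (date(2005, 6, 1), date(2008, 1, 1), date(2010, 6, 1)):
    print(when.isoformat(), cost_per_genome(1.0, when))
```

With an identical per-Mb price, the per-genome figure differs five-fold between the first and last periods, which is why it matters that the multipliers live outside the data.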

These values aren’t in the spreadsheet, but they are mentioned in the web-page that refers to the data. This is not Maths, this is Biomaths; that is “Maths done in a spreadsheet with estimates from the real world that we put somewhere other than the data we’re calculating on”.

Looking at the rate of change

I want to finish by looking at the rate of change, that is, the change in price expressed as a proportion of the original price. This is what we see:

This graph may or may not tell a different story. The story is that yes, sequencing costs are coming down; but since late 2007 or early 2008, the rate of change of that reduction has been following an upwards trend, i.e. over time, the reduction in cost from one period to the next has been increasing.

This could be an interesting pattern; or arguably, we should see 2007/2008 as an outlier and this is just the “proportional reduction” returning to the value range it was at before that outlier.
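For anyone wanting to reproduce this kind of rate-of-change series, here is a minimal sketch of the calculation as described: the change in price divided by the original price, for each consecutive pair of data points. The prices below are invented purely to show the mechanics; they are not NHGRI values, and the function name is my own.

```python
def proportional_changes(prices):
    """Change between consecutive prices, as a proportion of the earlier price."""
    return [(new - old) / old for old, new in zip(prices, prices[1:])]


# Hypothetical prices (not real data): each drop is proportionally larger.
hypothetical_prices = [100.0, 80.0, 40.0, 10.0]
changes = proportional_changes(hypothetical_prices)

print(changes)  # [-0.2, -0.5, -0.75]
```

Negative values are reductions; a series that grows more negative over time is the "reduction has been increasing" pattern described above.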

GA vs HiSeq

It is mildly interesting that introducing the Genome Analyzer in 2008 had a greater impact than introducing the HiSeq 2000 in 2010 and the V3 upgrades in 2011. That is, until you realise that the comparator for the former was “Sanger” and the comparator for the latter was “GA”: the GA was more of a revolution relative to Sanger than the HiSeq was relative to the GA.