I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.

Monday, December 10, 2012

Sharing makes Glacier economics even better

A more detailed analysis of the economics of Glacier sharing the same infrastructure as S3 than I posted here makes the picture look even better from Amazon's point of view. The point I missed is that the infrastructure is shared. Follow me below the fold for the details.

S3's data is, in the jargon, hot. It has to be available with low latency. Internally, Amazon has a lot of data with even shorter latency requirements. They could store this data in flash, but that is costly. Before the advent of flash, the only way to provide low latency for hot data on disk was to "short-stroke" the drives. Using only a small range of the tracks on the disk meant that the seek time between accesses to the data was minimized. But it was expensive.

Glacier's data is cold; Amazon is prepared for it to take several hours to access. Suppose the disk is shared between a small amount, say 15%, of hot S3 data, and a large amount, say 85%, of cold Glacier data. S3 data generates at least 5.5c/GB/mo, or $660/TB/yr. Glacier data generates 1c/GB/mo, or $120/TB/yr. Consider a group of 3 3TB drives in service for 4 years. They will generate:

Total cost is $520, for a gross profit of $473/yr. Sharing the infrastructure is a good idea; the small amount of hot data gets good performance and the large amount of cold data is subsidized by the more expensive hot data.
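The arithmetic can be checked with a minimal Python sketch. It rests on assumptions that are not stated above: that the three drives hold three replicas of the same 3TB of data (so 3TB is billable), that the $520 cost covers the whole 4 years of service, and that 1TB is 1000GB. The function and parameter names are hypothetical, chosen only for illustration. Under those assumptions the numbers reproduce the $473/yr gross profit.

```python
# Prices quoted in the post, in US cents per GB per month.
S3_CENTS_PER_GB_MONTH = 5.5
GLACIER_CENTS_PER_GB_MONTH = 1.0


def dollars_per_tb_year(cents_per_gb_month, gb_per_tb=1000):
    """Convert a price in cents/GB/month to dollars/TB/year."""
    return cents_per_gb_month / 100 * gb_per_tb * 12


def yearly_figures(usable_tb=3.0, hot_fraction=0.15,
                   total_cost=520.0, service_years=4):
    """Yearly income, cost and gross profit for one shared drive group.

    ASSUMPTIONS (not from the post): usable_tb is the billable capacity
    after 3-way replication across the three 3TB drives, and total_cost
    is spread evenly over service_years.
    """
    s3_income = (usable_tb * hot_fraction
                 * dollars_per_tb_year(S3_CENTS_PER_GB_MONTH))
    glacier_income = (usable_tb * (1 - hot_fraction)
                      * dollars_per_tb_year(GLACIER_CENTS_PER_GB_MONTH))
    cost_per_year = total_cost / service_years
    income = s3_income + glacier_income
    return {
        "s3_income": round(s3_income, 2),
        "glacier_income": round(glacier_income, 2),
        "income": round(income, 2),
        "cost_per_year": round(cost_per_year, 2),
        "gross_profit": round(income - cost_per_year, 2),
    }


if __name__ == "__main__":
    for name, value in yearly_figures().items():
        print(f"{name}: ${value:.2f}/yr")
```

Changing `hot_fraction` shows how sensitive the result is to the mix: each TB moved from cold to hot adds $540/yr of income at these prices.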