I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.

Wednesday, December 28, 2011

Adding cloud storage to the economic model

The next stage in building the economic model of long-term storage is to add the ability to model cloud storage, and to use it to investigate the circumstances under which it is cheaper than local storage. The obvious first step is to collect historical data on cloud storage, to compare how rapidly it is decreasing against the Kryder's Law decrease in disk cost. The somewhat surprising results from looking at Amazon S3's price history are below the fold. I'd be grateful if anyone could save me the trouble of getting equivalent price histories for other cloud storage providers.

When Amazon launched S3 in March 2006 they charged $0.15 per GB per month. Nearly 5 years later, S3 charges $0.14 per GB per month for the first TB. For the first TB this is a price drop of less than 1.5%/yr. For the first month of the first TB of storage, you will pay $140. Even after the impact of the Thai floods, a 1TB Western Digital Green drive is $100 at Fry's. If we continue to assume that the media represent 1/3 of the 3-year cost of ownership, S3 would cost over $4800 for a TB over 3 years where a raw local disk would cost $300, a factor of 16 difference.
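The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration using the figures quoted above; the function names are mine, and I am assuming 1 TB = 1000 GB, a flat first-TB price for all 36 months, and compound (rather than simple) annual decline.

```python
def annual_decline(start_price, end_price, years):
    """Compound annual rate of decline between two prices, as a fraction."""
    return 1 - (end_price / start_price) ** (1 / years)


def cloud_cost(price_per_gb_month, gigabytes, months):
    """Total rental cost at a flat per-GB monthly price (assumes no price cuts)."""
    return price_per_gb_month * gigabytes * months


def local_cost(media_price, media_share):
    """Total cost of ownership if media are `media_share` of that total."""
    return media_price / media_share


decline = annual_decline(0.15, 0.14, 5)  # S3 first-TB price, launch vs. now
s3_3yr = cloud_cost(0.14, 1000, 36)      # 1 TB taken as 1000 GB, for 36 months
disk_3yr = local_cost(100, 1 / 3)        # $100 drive, media assumed 1/3 of cost
ratio = s3_3yr / disk_3yr

print(f"S3 price decline: {decline:.2%}/yr")
print(f"S3, 3 years:      ${s3_3yr:,.0f}")
print(f"Local, 3 years:   ${disk_3yr:,.0f}")
print(f"Ratio:            {ratio:.1f}")
```

With these inputs the S3 figure comes to about $5040 against $300 for local disk, a ratio a little under 17, consistent with the "over $4800" and "factor of 16" above.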

What Amazon seems to have been doing is using the drop in storage prices to keep the price of a small amount of storage stable and introducing new, cheaper tiers for large amounts. At launch, a PB would have cost $150K/month. At current prices, it would cost about $103K/month, a drop of 31% over nearly 5 years, or about 10%/yr. Above 5PB the cost is now $0.055 per GB per month, only 27% of the launch price. Nevertheless, over the next 3 years a PB would cost about $3.36M versus about $300K assuming current inflated 1-off retail prices for local disk and the same assumption about other costs.

We can draw two conclusions from this quick look at S3 pricing. S3 is competitive with local storage over the medium term only if:

extremely large demands for storage can be aggregated

and either Amazon starts decreasing the cost of a given tier rather than simply adding lower cost tiers, or the Kryder's Law decrease in disk costs slows dramatically.

Amazon is pricing against value for smaller users, and pricing somewhat closer to cost for large users. Most S3 customers obviously value things other than cost of ownership.

Services such as Duracloud that act as brokers between customers and cloud storage providers thus depend in the medium term on aggregating very large and rapidly increasing amounts of storage, and are assuming that cloud storage providers' pricing policies will change to more closely reflect media costs. Storing a TB in Duracloud for 3 years would cost $21K, a factor of 70 over the cost of raw local storage. Storing a PB in Duracloud for 3 years would cost over $3M, suggesting that they have negotiated favorable pricing with Amazon, or are using cheaper providers, or are using their current pricing as a loss leader to attract enough demand to get themselves into the cheapest tiers.

Of course, cloud storage providers such as S3 provide replication to enhance reliability, and brokers such as Duracloud or Oxygen Cloud layer additional services on top. We should expect them to cost several times the cost of raw local disk. But the factors are large, and at least for S3 appear to increase significantly as the price of disk decreases.

I agree, as I said in the last paragraph, that we should expect services such as S3 to cost some factor more than raw storage due to replication and other services. The exact factor will range from less than 2 for N-of-M services such as Cleversafe to say 7. But what I see so far is significantly larger factors. Perhaps other cloud providers are much cheaper?

I just don't think the $100/TB@Frys number should be used anywhere as a comparison, unless it's a home PC. Then we end up with people throwing around $100K/PB. :) Why leave the replication part to the last paragraph?

I say all of the above having been in meetings where your name is used in relation to the 7 copies number, so with all due respect. :)

The reason for leaving the issue of replication to the last paragraph is that it is irrelevant in comparison to the difference between the Kryder's Law decrease in disk cost and the observed decrease in S3's cost through time. If disk cost decreases 20%/yr and cloud cost decreases 1.5%/yr for an extended period, the growing gap will quickly swamp a factor of 2-5.
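To make "quickly" concrete, here is a rough sketch of how fast the cloud-to-disk cost ratio grows under those two rates. It assumes both rates hold constant and compound annually, which is a simplification; the function names are just for illustration.

```python
import math


def gap_growth(disk_decline, cloud_decline, years):
    """Factor by which the cloud/disk cost ratio grows after `years`."""
    return ((1 - cloud_decline) / (1 - disk_decline)) ** years


def years_to_exceed(factor, disk_decline, cloud_decline):
    """Years until the cloud/disk cost ratio has grown by `factor`."""
    yearly = (1 - cloud_decline) / (1 - disk_decline)
    return math.log(factor) / math.log(yearly)


for factor in (2, 5):
    years = years_to_exceed(factor, disk_decline=0.20, cloud_decline=0.015)
    print(f"A replication factor of {factor} is swamped after {years:.1f} years")
```

Under these assumptions the ratio grows by about 23% a year, so a factor of 2 is overtaken in under 4 years and a factor of 5 in under 8.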

The point of this post is that at least S3's cost has not decreased anything like as fast as people expect storage costs to.

Note also that the $100 1TB drive at Fry's is retail, quantity 1. If you are buying disks in volume you should be paying much less than $100/drive. The factor over raw disk costs I use above should be conservative.

We need to establish a better cost model for DuraCloud, as noted in your post. The cost for the first TB is $7k per year (due to the subscription fee); however, the second TB is only $1k additional, so the cost averages $4k per year per TB. The 3rd incremental TB brings the average cost down even more, and so on. We are working with the team to figure out a better model. Any suggestions welcome!

Your post highlights a flaw in the DuraCloud pricing that favors larger users. The first TB is $7k per year, due to the subscription fee; however, the 2nd TB is only $1k additional per year, lowering the cost per TB to $4k per year. We need to fix this so the subscription fee is incremental to the storage and does not penalize users with fewer TB. Suggestions welcome!
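The pricing described in these comments can be sketched as follows. This is a hypothetical model built only from the numbers quoted here ($7k/yr for the first TB, $1k/yr for the second); it assumes every further TB also costs $1k/yr, which the comments imply but do not state, and it is not DuraCloud's actual price list.

```python
def annual_cost(terabytes, first_tb=7000, extra_tb=1000):
    """Yearly cost in dollars: subscription-laden first TB plus flat extras.

    The $1k rate for every TB beyond the second is an assumption.
    """
    if terabytes < 1:
        raise ValueError("model only covers 1 TB or more")
    return first_tb + extra_tb * (terabytes - 1)


def average_per_tb(terabytes):
    """Average yearly cost per TB under the model above."""
    return annual_cost(terabytes) / terabytes


for tb in (1, 2, 3, 10, 100):
    print(f"{tb:>3} TB: ${annual_cost(tb):>7,}/yr, ${average_per_tb(tb):,.0f}/TB/yr")
```

The average falls from $7k at 1 TB toward the $1k marginal rate as volume grows, which is the sense in which the fixed subscription fee penalizes small users. Three years of a single TB comes to the $21K quoted in the post.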