But retrieving data takes three to five hours, and is potentially costly.

Amazon Web Services has always been about delivering IT on demand. Spin up a virtual server, or a few thousand, anytime you'd like. Store and access as much data as your heart desires.

But even in a Web-driven world, there is a need for services that don't offer instant results, but will be around for eternity (or as close as possible). So today, Amazon introduced Glacier, a data archival service that will store data for one penny per gigabyte per month. As befits its name, Glacier is designed to last for a long time, but is slow: accessing data will take three to five hours. Amazon hasn't detailed exactly what technology is storing the data, but massive tape libraries are a good bet given the lengthy retrieval windows. A ZDNet article interprets one Amazon statement to mean that the company is actually using "a multitude of high-capacity, low-cost discs," but this has not been confirmed. An Amazon statement sent to Ars says only that "Glacier is built from inexpensive commodity hardware components," and is "designed to be hardware-agnostic, so that savings can be captured as Amazon continues to drive down infrastructure costs."

We also don't know exactly how Amazon measures the reliability of its storage, but the company is promising 11 nines of annual durability (99.999999999 percent) for each item, with data stored "in multiple facilities and on multiple devices within each facility."

While Amazon says "Glacier can sustain the concurrent loss of data in two facilities," there is still risk data could be lost forever. If you store 1TB, Amazon's promised durability rate suggests you can expect to lose an average of 10 bytes per year. Amazon is betting that will be an acceptable risk for the service's low price.

Amazon CTO Werner Vogels described the new service in his blog, saying, "Building and managing archive storage that needs to remain operational for decades if not centuries is a major challenge for most organizations. From selecting the right technology, to maintaining multisite facilities, to dealing with exponential and often unpredictable growth, to ensuring long-term digital integrity, digital archiving can be a major headache. It requires substantial upfront capital investments in cold data storage systems such as tape robots and tape libraries, then there’s the expensive support contracts—and don’t forget the ongoing operational expenditures such as rent and power."

As mentioned, pricing is one cent per gigabyte per month, although that can go up to a whopping 1.1 cents if you store in Europe rather than the US, and up to 1.2 cents for storage in Japan. There is no cost to transfer data into the service over the Internet, but some customers transferring large amounts of data may end up paying for Amazon's import/export service, which involves portable storage devices shipped from the customer to Amazon.
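In concrete terms, here is what those rates look like for a terabyte. This is a quick sketch using decimal gigabytes and the per-gigabyte prices quoted above, so an actual AWS bill (which prorates storage by the GB-month) could differ slightly.

```python
# Monthly and yearly storage cost for 1 TB at the per-GB rates quoted above.
rates_per_gb_month = {"US": 0.010, "Europe": 0.011, "Japan": 0.012}
stored_gb = 1000  # 1 TB, decimal
for region, rate in rates_per_gb_month.items():
    monthly = stored_gb * rate
    print(f"{region}: ${monthly:.2f}/month (${monthly * 12:.2f}/year)")
```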

Retrieving data is free as long as you grab no more than 5 percent of your stored data per month, an Amazon announcement notes. After that, data transfer fees start at 1 cent per gigabyte but vary widely depending on which region you're in. Take Amazon's East Coast region as an example.

For accessing 10TB there, the fees work out to $1,200 after you've exhausted the free allotment. Transfer prices go up significantly in the Asia-Pacific region and hit their highest point in South America.
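For reference, here is the arithmetic behind that East Coast example. Amazon's regional pricing tables aren't reproduced here, so the per-gigabyte figure below is simply backed out of the $1,200 total rather than quoted from Amazon.

```python
# Back out the per-GB transfer rate implied by the 10 TB example above.
retrieved_gb = 10 * 1000     # 10 TB, decimal
total_cost = 1200.0          # dollars, from the East Coast example
print(f"implied rate: ${total_cost / retrieved_gb:.2f}/GB")   # $0.12/GB
```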

Complicating matters, the amount customers pay also takes into account hourly retrieval rates, as detailed on a Glacier FAQ.

For data that must be retrieved quickly, Amazon has long offered its Simple Storage Service (S3). Because of how the two services are priced, Amazon said that S3 will in many cases be the more cost-effective option "for data that you’ll need to retrieve in greater volume more frequently."

Glacier is really for the data you can't delete (perhaps for legal and regulatory reasons) but will hardly ever need. In that sense, Amazon is trying to displace the giant tape libraries enterprises build, or offsite archival vendors. While the service has quite a different purpose than Amazon's traditional cloud businesses, Glacier can be managed from the same console as S3 and Amazon's database services. Sometime "in the coming months" Amazon customers will be able to automatically move data between S3 and Glacier based on data life-cycle policies, much like enterprise storage systems automatically move infrequently accessed data to cheaper tiers of storage.
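As a purely forward-looking sketch of what such a life-cycle rule might look like once the feature ships, here is an illustrative example written against the boto3 Python SDK. The bucket name, object prefix, and 90-day threshold are hypothetical, and the API Amazon ultimately offers may differ.

```python
import boto3

# Illustrative sketch only: a life-cycle rule that would move objects under a
# hypothetical "raw-images/" prefix to Glacier 90 days after creation. The
# S3-to-Glacier transition described above had not shipped at press time.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-raw-images",
            "Filter": {"Prefix": "raw-images/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```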

While Vogels' blog said Amazon will be able to meet enterprises' regulatory needs (in part with AES-256 encryption), the service also caters to small businesses without a good archiving plan, historical and research organizations, and people who work in digital media.

"Although archiving is often associated with established enterprises," because of the high upfront costs and ongoing maintenance, Vogels wrote, "many SMBs and startups have similar archival needs, but dedicated archiving solutions have been out of their reach."

Promoted Comments

This looks very interesting. I'm currently using JungleDisk/Amazon S3 to do offsite backup of my RAW images, paying $0.125/GB/month. I don't need fast access to the data, as (hopefully) the only time I'd need to access it would be if there was catastrophic failure of my main storage and local backups (fire/flood/theft). I'd love to see this either offered by JungleDisk natively, or another app geared toward offsite storage of Aperture/Lightroom libraries.

Looks like a sensible expansion of Amazon's existing investments in data storage, and should increase their attractiveness as a solution for a wide range of different problems. The pricing looks competitive, although I'm slightly surprised they're purely offering it by the month (yearly contracts seem more appropriate for something like this). The "Amazon Import/Export" mentioned in the article (and the blog post) is also very nice to see:

Quote:

*Amazon Import/Export – for those datasets that are too large to transmit via the network AWS offers the ability to up- and download data from disks that can be shipped.

CrashPlan offers a (now outdated and overpriced) option for this, and it's something I'd love to see catch on more widely. Someday maybe more countries will actually invest in network infrastructure, and with gigabit links everywhere this sort of thing will be obsolete. Until that happy day arrives, though, a lot of people, or even major businesses, are stuck on relatively crappy links, and while they might be able to handle daily deltas, the initial seeding of terabytes of data can range from painful to effectively infeasible. "A delivery truck full of hard drives" might have just a touch of latency, but the bandwidth is pretty good.

This sounds like the most feasible answer. It doesn't seem profitable enough to build a completely new system of infrastructure to support Glacier, but it does look like a potential solution for any extra space Amazon already has.

I suspect the monthly versus yearly payment is to try and meet every level of demand, to see who picks it up. Even for like 3 months of 100 GB, this looks to be an affordable process.

"Building and managing archive storage that needs to remain operational for decades if not centuries is a major challenge for most organizations. From selecting the right technology, to maintaining multisite facilities, to dealing with exponential and often unpredictable growth, to ensuring long-term digital integrity, digital archiving can be a major headache. It requires substantial upfront capital investments in cold data storage systems such as tape robots and tape libraries, then there’s the expensive support contracts—and don’t forget the ongoing operational expenditures such as rent and power."

Is this a common business problem? "Would like to have" is not the same as "need to have".

In what context would a business need to retrieve data that's 25 years old that wouldn't already be converted to the format/technology of its time? Even health records aren't universally timeless, and the stuff that does tend to be timeless tends to be small. (Vaccination records, etc.)

So we're already using the word "traditional" when describing cloud services? Shouldn't we wait a few more years, at least?

And as for this being a really long term storage, I don't see as how it could be with losing 10 bytes per year per TB. That sounds low, until you realize that with some info, just the loss of a few bytes can render the entire thing useless.

At that price point and speed, I doubt it. Centeras are plenty faster than that, and considerably more expensive even at volume pricing. I'm guessing tape arrays, as the author suggests, are unlikely as well because it doesn't align well with their other service offerings. I would guess they have extra disk capacity available in their massive server infrastructure and are using this as a low tier of disk or something interesting along those lines.

That sounds low, until you realize that with some info, just the loss of a few bytes can render the entire thing useless.

I had the same thought. Especially with multiple redundant sites and possibly parity too -- I don't see why this shouldn't be 0. I wonder how frequently they will compare the same data stored in different places to check for errors.

So we're already using the word "traditional" when describing cloud services? Shouldn't we wait a few more years, at least?

And as for this being a really long term storage, I don't see as how it could be with losing 10 bytes per year per TB. That sounds low, until you realize that with some info, just the loss of a few bytes can render the entire thing useless.

I wondered about that figure, too. I don't think it's entropy over time or whatever, but instead it's an average incorrectly applied to a large dataset that would be more accurate stated as a probability of data loss, rather than an assertion of some fixed amount of data loss per TB stored. I.e. data loss will be "this 1GB file is absolutely toast!" in a big heap of petabytes rather than "I lost 10 bytes on my TB archive!"

If you're right, though, the math isn't particularly terrible. You could construct all possible combinations of bits in a 10 byte space in a fraction of a second, and then apply each possible configuration to your missing sections (assuming you can ID them) to see if you end up with a functioning artifact. I can't imagine it'd be much different than modern spinning disks doing their on-the-fly calculations. (Rather than zeroes and ones, you've got probabilities of this bit being zero or one. (Particularly as densities increase.))

The access delay is probably artificial and arbitrary, primarily to discourage potential S3 customers from switching to Glacier. It may also be used to decrease the mechanical wear rate on the (presumably) cheap commodity drives. If you only power up the drives for an hour a day they can last a very long time.

First person to write a nice simple OS X client for backing up my photos and videos to this gets my money.

iCloud?

iCloud maxes out at 50GB and that's $100/year. That buys 800+ GB of storage in Glacier.

I already have a pretty sound backup strategy (local Time Machine + a backup drive I keep at work that comes home for updates quarterly). But this seems perfect for an affordable failsafe that I would never expect to actually use (and as such don't care about slow retrieval speeds).

This sounds like a great option for me as a photographer and designer -- I just need safe off-site storage as a secondary backup in case of catastrophic backup failure at home. Data I would only need to access in event of a house fire or something, and way cheaper than Crashplan or a similar service. Sign me up.

Amazon CTO Werner Vogels described the new service in his blog, saying, "Building and managing archive storage that needs to remain operational for decades if not centuries is a major challenge for most organizations. From selecting the right technology, to maintaining multisite facilities, to dealing with exponential and often unpredictable growth, to ensuring long-term digital integrity, digital archiving can be a major headache. It requires substantial upfront capital investments in cold data storage systems such as tape robots and tape libraries, then there’s the expensive support contracts—and don’t forget the ongoing operational expenditures such as rent and power."

In the time frames that this article talks about, the bottleneck is not any of the above; it's the lifespan of not just the business that owns the media, but the legal and regulatory framework that helps DEFINE the business.

If, 30 yrs from now, Amazon is defunct and their assets sold off without a clearly defined chain of responsibility, none of the above matters.

First person to write a nice simple OS X client for backing up my photos and videos to this gets my money.

Exactly. I'll definitely be looking into this to back up my photos/videos/old documents. I had Mozy for a year but found it hard to justify the price. But local backup doesn't protect you if your house burns down. This is a great option for that stuff.

Is this a common business problem? "Would like to have" is not the same as "need to have".

In what context would a business need to retrieve data that's 25 years old that wouldn't already be converted to the format/technology of its time? Even health records aren't universally timeless, and the stuff that does tend to be timeless tends to be small. (Vaccination records, etc.)

Would IBM ever need to query a business dataset that's 50 years old?

I don't know about IBM, but engineering design drawings and often calculations are generally required to be retained in perpetuity. When the next bridge collapses, the original construction documents need to be available.

When you build structures that are supposed to last for centuries, the documentation should too.

I'm with Puddleglumm though not interested in MacOS X... a plug-in utility for my Synology Diskstation to do this in the background would be ideal. This sounds like an ideal service for last-chance disaster recovery when your own backups fail.

First person to write a nice simple OS X client for backing up my photos and videos to this gets my money.

This already exists (and has for a while). It's called CrashPlan+ (or one of the other similar services). As well as being cheaper the use profile fits better. While you could use this for what you describe, it's not really the right tool for the job. It's more aimed at replacing in-house tape archival systems or similar, where data needs are massive.

bk0 wrote:

The access delay is probably artificial and arbitrary, primarily to discourage potential S3 customers from switching to Glacier.

Doubtful. If they are using tape in any part of the system, then it will be robotic and there will be a significant physical delay, as well as limits on how many requests can be handled in a given unit of time. Even without tape, in order to hit ultra-cheap price points it would be entirely reasonable for Amazon to skimp heavily on network infrastructure, caching, and all the other stuff necessary to make a thick, low-latency pipe to a data pool.

xoa wrote:

"A delivery truck full of hard drives" might have just a touch of latency, but the bandwidth is pretty good.

Just because I hadn't checked for a while, quoting myself with a bit of back-of-the-napkin math. Assuming a seven day delivery time (cheap and slow driving) with a standard semitrailer packed with 3TB drives (50% volumes added for packing), the "bandwidth" would be the equivalent of a 97 Tb/s link. Whee!

What does it mean that it will take 3-5 hours to access the data? Is there an Amazon employee that has to locate the medium your data is stored on in a cave and plug it in?

If it's tape or offline disk, there is manual (or robotic) effort and there could be many requests that have to be gotten to before yours.

If tapes are involved, robotic tape picker and/or tape drive availability would presumably be the bottleneck. Whether you're restoring 10KB of data or 10GB, the robot still has to physically move and pick up the tape from the library shelf, and then move it to an available tape drive and load it. Then the tape drive has to wind the tape to the location where your data is written. LTO-5 tapes are 846m long, according to Wikipedia. That's a lot of time just positioning the tape. Who knows how many data retrieval requests they will get in a day? (New data can probably be cached to disk and written to tape when things are slower.)

And as for this being a really long term storage, I don't see as how it could be with losing 10 bytes per year per TB. That sounds low, until you realize that with some info, just the loss of a few bytes can render the entire thing useless.

This is a case where averaging it out is kind of silly. It's more like this: years will go by with no data loss at all. Then that incredibly rare triple-media-failure event occurs, and Glacier loses a few terabytes. If you average it out across all customers, it may well come to 10 bytes out of every TB. But really, one or two unlucky customers will lose data - the ones whose data happened to be on those tapes/disks/whatever.

This does look really great for backing up all of my RAW files & such. The only issue I can see is you pay $0.05 per 1K requests, which includes uploads. So, if you have boatloads of small files, it'll cost a decent bit to store them all individually.

Hmm... just ran the numbers: for 107K files, that's something like $5.35. A one-time cost, and hardly a huge fee. NM, a simple client that stores each file individually will work just fine. Plus, if you do it right, you can keep every single version of your XMP/Catalog file so you can even go back in time easily. Granted, it's some additional space, but if you keep only 6 months backed up, it's not too bad. I might just have to hop on this bandwagon on top of my current CrashPlan. Of course... maybe I should wait 6 months to see what screw-ups happen.
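(For anyone checking that arithmetic, here's the calculation at the $0.05-per-1,000-requests rate quoted above; actual request pricing may vary by region.)

```python
# Upload-request cost for a large photo library at the rate quoted above.
files = 107_000
cost_per_1k_requests = 0.05   # dollars per 1,000 requests, as quoted
print(f"${files / 1000 * cost_per_1k_requests:.2f}")   # $5.35
```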

It seems to me that the sort of application this is ideally suited for - medical and legal record keeping - is exactly the type of thing that can't take advantage of an online service like this because of privacy laws. Case in point - I'm currently involved in the digitization of a dental records archive stretching back to the 1960s. Because they can be used for identification by the coroner, we are required to keep them for 75 years. But we are also legally prohibited from storing them in another country, and especially in the US. Unless Amazon is willing to provide iron-clad guarantees of where the data will be kept, this service is completely useless to us.

Does Amazon have a CrashPlan/Mozy-style auto uploader for backups, or is it strictly manual data set selection? For my home movies, photos, and documents I can easily live with hours for restore, and the price is so much less than Mozy, which I use now. Though if I have to remember to send files to Amazon... it loses the benefit.

First person to write a nice simple OS X client for backing up my photos and videos to this gets my money.

This already exists (and has for a while). It's called CrashPlan+ (or one of the other similar services). As well as being cheaper the use profile fits better. While you could use this for what you describe, it's not really the right tool for the job. It's more aimed at replacing in-house tape archival systems or similar, where data needs are massive.

I've got CP+, works great, but I could easily see this as a secondary backup & archival system. It's very low cost, and stuff like RAW files never changes, just the XMP/LR Catalog. Heck, I could keep only 2 years' worth of images locally and archive the rest up to Glacier. Just keep around smaller JPG thumbnails. If I need the files, I just pay whatever it costs to retrieve them if it happens to work out to more than the 5%. If I'm really paranoid, create 2 Vaults in different regions with the same files plus CP+. Add in an uploader that adds proper CRC & ECC codes to each file that it uploads, and Glacier is looking pretty great for super long-term storage.
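(In the spirit of that last suggestion, here is a minimal sketch of recording a per-file checksum in a local manifest before upload, so anything retrieved years later can be verified. The folder path is hypothetical, and real error-correcting codes would need an erasure-coding library on top of this.)

```python
import hashlib, json, pathlib

# Sketch: record a SHA-256 digest for every file in a (hypothetical) photo
# folder so future retrievals can be verified against the manifest.
photo_dir = pathlib.Path("~/Pictures/raw-archive").expanduser()
manifest = {}
for path in photo_dir.rglob("*"):
    if path.is_file():
        manifest[str(path.relative_to(photo_dir))] = hashlib.sha256(
            path.read_bytes()).hexdigest()
pathlib.Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```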

First person to write a nice simple OS X client for backing up my photos and videos to this gets my money.

I'm in on this too.

If you guys can line up $10,000 I'll do it.

Seems to me that this would be a perfect project to put up on one of those crowdsourcing sites (Quirky, etc.).

I would bet that Amazon would even consider funding the development if they knew that they were getting the long-term commitment from the clients. They are targeting the big boys, but there may be a market for the home user, and in fact, in some cases, this could play very nicely as a parallel to Dropbox - stuff you need right away stays in Dropbox, stuff you can wait for stays on ice.

Taking the thought further...perhaps a unified client for OSX and Windows (Dropbox and Glacier)?