Highly scalable, simple, cheap, distributed storage for the cloud

InfoWorld | Apr 5, 2013

By the end of 2012, 1.3 trillion objects were stored in Amazon S3, the world's largest and most widely known object storage system. At the time, that number was growing by more than 1 billion objects per day, so the 2 trillion mark is right around the corner.

Object storage is vastly more scalable than traditional file system storage because it's much simpler. Instead of organizing files in a directory hierarchy, object storage systems store files in a flat organization of containers (called "buckets" in Amazon S3) and use unique IDs (called "keys" in S3) to retrieve them. The upshot is that object storage systems need less metadata than file systems to store and access files, and they reduce the overhead of managing that metadata by storing it with the object itself. This means object storage can be scaled out almost endlessly by adding nodes.
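To make the flat namespace concrete, here is a minimal in-memory sketch in Python. The class and method names (`FlatObjectStore`, `put`, `get`, `list_keys`) are hypothetical and do not correspond to any real product's API; the point is only that a bucket is a map from opaque key strings to objects, with each object's metadata kept right next to its data.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    # Metadata travels with the object instead of living in a separate
    # directory tree or inode table.
    metadata: dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy object store: each bucket maps opaque keys directly to objects."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, StoredObject]] = {}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, data: bytes, **metadata: str) -> None:
        # The key is just a string; "photos/2013/cat.jpg" creates no folders.
        self._buckets[bucket][key] = StoredObject(data, dict(metadata))

    def get(self, bucket: str, key: str) -> StoredObject:
        # A single lookup by (bucket, key); there is no path to traverse.
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        # "Directories" are emulated by filtering keys on a shared prefix.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))
```

A key such as `photos/2013/cat.jpg` only looks like a path: no folders exist, and "listing a directory" is simply a prefix filter over the keys in a bucket.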

Reliability is achieved on ordinary hardware and disk drives by replicating objects across multiple servers and locations. If you set up your own solution, such as with OpenStack Swift, you can configure the number of storage zones and replicas to suit your needs. (OpenStack recommends at least five nodes for a production system.) Amazon promises eleven 9s (99.999999999 percent) of annual "durability" for standard Amazon S3, which translates into an expected loss of one object in 100 billion per year. If your data protection needs are not that extreme, you can save a few pennies with the Reduced Redundancy Storage option (four 9s, or 99.99 percent, durability).
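The sketch below illustrates two ideas from this paragraph under simplifying assumptions. `place_replicas` spreads the copies of an object across distinct zones using rendezvous (highest random weight) hashing; this is a swapped-in technique chosen for brevity, not the partitioned ring that OpenStack Swift actually uses. `expected_annual_losses` turns a durability figure into expected losses, assuming "n nines" means a per-object annual loss probability of 10^-n. All names and the node/zone layout are hypothetical.

```python
import hashlib


def _score(key: str, node: str) -> int:
    # Deterministic pseudo-random weight for a (key, node) pair.
    digest = hashlib.sha256(f"{node}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(key: str, nodes: dict[str, str], replicas: int) -> list[str]:
    """Pick `replicas` nodes for `key`, each in a different zone.

    `nodes` maps node name -> zone name. Every caller ranks nodes the same
    way for a given key, so placement needs no central lookup table.
    """
    ranked = sorted(nodes, key=lambda n: _score(key, n), reverse=True)
    chosen: list[str] = []
    used_zones: set[str] = set()
    for node in ranked:
        # Skip nodes whose zone already holds a copy of this object.
        if nodes[node] not in used_zones:
            chosen.append(node)
            used_zones.add(nodes[node])
        if len(chosen) == replicas:
            break
    if len(chosen) < replicas:
        raise ValueError("not enough zones for the requested replica count")
    return chosen


def expected_annual_losses(num_objects: int, nines: int) -> float:
    # "n nines" of durability = annual loss probability of 10**-n per object,
    # so expected losses grow linearly with the number of objects stored.
    return num_objects * 10.0 ** -nines
```

For example, with 10 million objects, eleven 9s (the level that matches a loss rate of one object in 100 billion) works out to about 0.0001 expected losses per year, or one object every 10,000 years on average, while a service offering four 9s would lose about 1,000 objects per year.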

The features you get in an object storage system are typically minimal. You can store, retrieve, copy, and delete files, as well as control which users can do what, and that's about it. If you want search or a central repository of object metadata that other applications can draw on, you'll generally have to implement it yourself. Amazon S3 and other object storage systems provide REST APIs that allow programmers to work with the containers and objects. SoftLayer is a rare example of a public cloud that lets users search its object storage.
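As a rough illustration of how a REST interface maps onto this short list of operations, here is an in-memory dispatcher in Python that translates HTTP verbs on `/bucket/key` paths into store, retrieve, copy, and delete. It is a hypothetical sketch, not Amazon's API: there is no networking, authentication, or access control, and the `x-copy-source` header is an invented stand-in (S3 itself signals a server-side copy with an `x-amz-copy-source` header on a PUT).

```python
from http import HTTPStatus
from typing import Optional


class RestObjectStore:
    """Maps HTTP verbs on /bucket/key paths to object operations (in memory)."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], bytes] = {}

    @staticmethod
    def _split(path: str) -> tuple[str, str]:
        # "/objectstorage1/object_storage.rtf" -> ("objectstorage1", "object_storage.rtf")
        bucket, _, key = path.lstrip("/").partition("/")
        return bucket, key

    def handle(self, method: str, path: str, body: bytes = b"",
               headers: Optional[dict[str, str]] = None) -> tuple[int, bytes]:
        headers = headers or {}
        bucket, key = self._split(path)
        if not bucket or not key:
            return HTTPStatus.BAD_REQUEST, b""
        if method == "PUT":
            # Hypothetical copy header: copy server-side instead of uploading.
            source = headers.get("x-copy-source")
            if source is not None:
                src = self._split(source)
                if src not in self.objects:
                    return HTTPStatus.NOT_FOUND, b""
                body = self.objects[src]
            self.objects[(bucket, key)] = body
            return HTTPStatus.OK, b""
        if method == "GET":
            if (bucket, key) not in self.objects:
                return HTTPStatus.NOT_FOUND, b""
            return HTTPStatus.OK, self.objects[(bucket, key)]
        if method == "DELETE":
            self.objects.pop((bucket, key), None)
            return HTTPStatus.NO_CONTENT, b""
        # Anything else (search, for instance) is simply not part of the API.
        return HTTPStatus.METHOD_NOT_ALLOWED, b""
```

A real service would sit behind an HTTP server and check each caller's permissions before dispatching; anything beyond these verbs, such as search, has to be layered on top.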

Finally, the HTTP interface to object storage systems makes files easy for users to reach from anywhere in the world. (For example, every file in Amazon S3 has a unique URL built from the Amazon region endpoint, the name of the bucket, and the name of the file: https://s3-us-west-1.amazonaws.com/objectstorage1/object_storage.rtf.) You'll wait longer than you would when accessing a file on a NAS, of course, but you can't beat the convenience.
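The URL scheme in the example is regular enough to build and parse with a few lines of Python. The helpers below are hypothetical and simply follow the `https://s3-<region>.amazonaws.com/<bucket>/<key>` pattern of the example; they do not cover every region's endpoint naming (the US Standard region, for instance, used `s3.amazonaws.com`) or S3's full key-encoding rules.

```python
from urllib.parse import quote, unquote, urlsplit


def object_url(region: str, bucket: str, key: str) -> str:
    """Build a path-style object URL (hypothetical helper)."""
    # Keep "/" so path-like keys stay readable; percent-escape spaces etc.
    # Bucket names are assumed to be URL-safe already.
    return f"https://s3-{region}.amazonaws.com/{bucket}/{quote(key, safe='/')}"


def parse_object_url(url: str) -> tuple[str, str]:
    """Split a path-style URL back into (bucket, key)."""
    # The first path segment is the bucket; everything after it is the key.
    path = urlsplit(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    return bucket, unquote(key)
```

Calling `object_url("us-west-1", "objectstorage1", "object_storage.rtf")` reproduces the URL quoted in the example.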

In addition to significantly slower throughput than a traditional file system, the other big drawback of object storage is that it guarantees only eventual consistency. When you update a file, you may have to wait until the change has propagated to all of the replicas before every request returns the latest version. This makes object storage unsuitable for data that changes frequently. But it's a great fit for data that rarely changes, such as backups, archives, video and audio files, and virtual machine images.
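The toy model below shows why a read can return stale data. It assumes, purely for illustration, that a write is acknowledged once a single replica has it and that the other replicas are updated later from a queue; real systems differ in how they propagate and reconcile updates. `EventuallyConsistentStore` and its methods are hypothetical.

```python
from collections import deque
from typing import Optional


class EventuallyConsistentStore:
    """Toy model: writes land on one replica; the others catch up later."""

    def __init__(self, num_replicas: int = 3) -> None:
        self.replicas: list[dict[str, bytes]] = [{} for _ in range(num_replicas)]
        # Queued (replica_index, key, value) updates not yet applied.
        self.pending: deque = deque()

    def put(self, key: str, value: bytes) -> None:
        # Acknowledge as soon as the first replica accepts the write...
        self.replicas[0][key] = value
        # ...and queue propagation to every other replica.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def get(self, key: str, replica: int) -> Optional[bytes]:
        # A read served by a lagging replica may return an old version.
        return self.replicas[replica].get(key)

    def propagate(self, steps: Optional[int] = None) -> None:
        # Apply queued updates in order; steps=None drains the whole queue.
        n = len(self.pending) if steps is None else min(steps, len(self.pending))
        for _ in range(n):
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value
```

After `put("k", b"v2")` and before the final `propagate()`, a reader routed to replica 2 still sees the previous version. That window is exactly what makes frequently changing data a poor fit, while write-once data such as backups never notices it.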