A Tale of Two Object Stores

New storage systems have joined the ranks of object stores, but they're really more general purpose storage devices that use some object storage concepts.

As the volume of unstructured data they need to store has grown over the past few years, organizations have discovered that their data is pushing up to, or over, the limitations of classic block- and file-based storage systems. Object storage systems, such as Amplidata’s AmpliStor, DataDirect Networks' WOS, and Amazon’s S3, provide the ability to store huge numbers of objects across exabytes of disk.

More recently, the designers of new storage systems are using object storage concepts on the back end of their systems while actually providing more traditional block or file access.

Traditional object stores -- although it seems a bit strange to call object stores traditional -- present their data through RESTful, HTTP-based Get/Put APIs. Relieved of the overhead of maintaining a hierarchical directory structure and of supporting in-place data updates, with all the locking overhead that implies, object stores can more easily scale out to enormous dimensions.
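To make the Get/Put model concrete, here is a minimal in-memory sketch in Python. It is not any vendor's API; the class name FlatObjectStore and its methods are hypothetical, chosen only to show a flat namespace where a key is an opaque string and a put replaces the whole object rather than updating it in place.

```python
class FlatObjectStore:
    """Hypothetical flat-namespace store: whole-object put/get, no directories."""

    def __init__(self):
        self._objects = {}  # key -> immutable bytes; the key is an opaque string

    def put(self, key: str, data: bytes) -> None:
        # A put always replaces the entire object. There is no partial,
        # in-place update, so no byte-range locking is needed.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = ""):
        # "Folders" are only a naming convention: listing is a prefix filter.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("photos/2013/cat.jpg", b"version 1")
store.put("photos/2013/cat.jpg", b"version 2")  # whole-object replacement
```

The slashes in the key look like a path, but the store never builds a directory tree, which is one reason this model spreads across many nodes so easily.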

No current object store is as pure as Seagate’s new Kinetic drives, which use a native key-value store, actually maintaining data on the disk drive sequentially by key. Some of the best known object stores, such as OpenStack Swift and EMC's ViPR, run on top of much more conventional file systems, scaling beyond the limitations of file systems by spreading the objects across many of them. Others translate objects to block IDs before writing them to local SATA drives.

The new class of object storage systems isn't out to create hugely scalable systems with RESTful interfaces but to take advantage of the power and scalability of object storage to build block- and/or file-based systems. Rather than map an incoming file or object to a single object in their back end, storage systems from vendors like Exablox, SolidFire and Coho Data break incoming logical volumes and/or files into smaller objects and then store those.

Several of these new-age storage systems break the data into fixed-size objects of 4KB-64KB, calculate a hash for each object and then use the hash value as the URI for the data chunk, turning their back end into CAS (content addressable storage). Each node, and ultimately each disk drive in the system, is responsible for storing the objects that fall within a range of hash values.
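A rough Python sketch of that chunk-and-hash step follows. The 4KB chunk size, the choice of SHA-256, the four-node cluster and the function names are all assumptions for illustration; the vendors named above don't necessarily use any of them.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed; the systems described use 4KB-64KB
NUM_NODES = 4      # hypothetical cluster size


def chunk_keys(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and yield (hex_digest, chunk) pairs."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def owner_node(hex_digest: str, num_nodes: int = NUM_NODES) -> int:
    """Divide the hash space into equal contiguous ranges, one per node."""
    prefix = int(hex_digest[:8], 16)  # top 32 bits of the hash
    return prefix * num_nodes // 2**32


def store(data: bytes, nodes: list) -> list:
    """Write each chunk to the node owning its hash range; return the key list."""
    recipe = []
    for key, chunk in chunk_keys(data):
        nodes[owner_node(key)][key] = chunk
        recipe.append(key)
    return recipe


nodes = [dict() for _ in range(NUM_NODES)]
payload = b"a" * 4096 + b"b" * 4096 + b"a" * 100
recipe = store(payload, nodes)
rebuilt = b"".join(nodes[owner_node(k)][k] for k in recipe)
```

The ordered list of keys (called recipe here) is all the front end needs to keep in order to reassemble the volume or file later.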

Data protection is typically provided by assigning two or three disk drives, in separate nodes, to hold each range of hash values and replicating the object across them. As the more observant reader will have already figured out, systems using this type of small object CAS get data deduplication as a side benefit of the architecture.
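For the less patient reader, the sketch below shows why deduplication falls out for free. It assumes SHA-256 keys, two replicas and a simple "next node over" placement rule; the Cluster and Node classes are hypothetical and stand in for whatever placement logic a real system uses.

```python
import hashlib


class Node:
    def __init__(self, name: str):
        self.name = name
        self.objects = {}  # hash key -> chunk


class Cluster:
    """Hypothetical CAS cluster: each hash range lives on `replicas` nodes."""

    def __init__(self, node_count: int = 4, replicas: int = 2):
        self.nodes = [Node(f"node{i}") for i in range(node_count)]
        self.replicas = replicas

    def owners(self, key: str):
        # Primary owner by hash range, then the following node(s) as replicas.
        first = int(key[:8], 16) * len(self.nodes) // 2**32
        return [self.nodes[(first + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, chunk: bytes) -> str:
        key = hashlib.sha256(chunk).hexdigest()
        for node in self.owners(key):
            # Identical content hashes to the same key, so a second write
            # of the same chunk stores nothing new: that is the dedupe.
            node.objects.setdefault(key, chunk)
        return key

    def get(self, key: str, failed=()) -> bytes:
        for node in self.owners(key):
            if node.name not in failed and key in node.objects:
                return node.objects[key]
        raise KeyError(key)

    def stored_copies(self) -> int:
        return sum(len(node.objects) for node in self.nodes)


cluster = Cluster()
k1 = cluster.put(b"x" * 4096)
k2 = cluster.put(b"x" * 4096)  # duplicate data, same key, no new copies
```

Because the key is derived from the content, the system never has to go looking for duplicates; they simply collide on the same address.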

Like a more traditional object store, the object back end doesn’t modify objects in place. When a volume or file is updated, new objects are created to store the new data and the volume’s or file’s metadata is modified to point to the new objects. Logical volumes and files are defined by their metadata, which makes snapshots and file versioning essentially free in terms of both performance and capacity consumed.
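The following Python sketch illustrates the idea under simplifying assumptions: a volume is just a list of chunk keys, and a snapshot is a frozen copy of that list. The Volume class, its method names and the use of SHA-256 are hypothetical, not taken from any of the products mentioned.

```python
import hashlib


class Volume:
    """Hypothetical logical volume defined entirely by a map of chunk keys."""

    def __init__(self, block_count: int, block_size: int = 4096):
        self.block_size = block_size
        self.store = {}  # shared, never-modified objects: key -> data
        zero_key = self._put(bytes(block_size))
        self.block_map = [zero_key] * block_count  # the volume's metadata
        self.snapshots = {}

    def _put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.store.setdefault(key, data)
        return key

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        # No in-place update: store a new object and repoint the metadata.
        self.block_map[index] = self._put(data)

    def read_block(self, index: int, snapshot=None) -> bytes:
        block_map = self.block_map if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[index]]

    def snapshot(self, name: str) -> None:
        # A snapshot copies only the key list; no data objects are duplicated.
        self.snapshots[name] = tuple(self.block_map)


vol = Volume(block_count=4)
vol.write_block(0, b"a" * 4096)
objects_before = len(vol.store)
vol.snapshot("monday")
objects_after = len(vol.store)
vol.write_block(0, b"b" * 4096)  # the snapshot still points at the old object
```

A real system would also need reference counting or garbage collection to reclaim objects no longer referenced by any volume or snapshot, which this sketch leaves out.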

The real question is whether the vendors of these systems should call them object stores. From a technical point of view, they do use object technology but when I hear object store, I think of flat name spaces, RESTful interfaces and essentially unlimited scaling, all with limited random access.

These new systems, all of which use a significant amount of flash, are more scalable general purpose storage systems that just happen to use object storage as underlying technology. Do I care? Sure, but I’m not convinced we should lump them in with the RESTful crowd.

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...