Open Source Project Takes on Hadoop Storage Scalability Issues

Necessity is often the mother of invention, so when it comes to managing Big Data these days, it shouldn’t be surprising to discover that a lot of organizations are relying more on their own initiative.

A perfect example of that initiative is an open source project that makes a new file system available for Hadoop. Quantcast, a provider of an analytics service for measuring Web traffic, created the Quantcast File System (QFS) to address storage scalability issues in Hadoop environments. According to Jim Kelly, vice president of research and development at Quantcast, QFS is a more efficient approach to managing storage than the Hadoop Distributed File System (HDFS) that comes native with the Apache distribution of Hadoop. QFS is a derivative of the open source Kosmos File System (KFS), which is also known as CloudStore.

Kelly says the key differences between QFS and HDFS are that Quantcast rebuilt the sorter in Hadoop and added a more accessible application programming interface. QFS also runs more tasks in parallel and implements an error recovery mechanism based on Reed-Solomon error correction, which protects data with less redundant storage than keeping full copies of every block. According to Kelly, that has enabled Quantcast to reclaim as much as half the disk space in its Hadoop cluster, which reduces not only storage costs, but also the physical space required in the data center and the amount of energy consumed.
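To make the "half the disk space" figure concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from Quantcast's code. It assumes the HDFS default of three full replicas per block and a Reed-Solomon layout of six data stripes plus three parity stripes, which is the scheme the QFS documentation describes; neither number appears in this article. The function names are made up for illustration, and the 40 TB input is simply the daily ingest figure cited in this article.

```python
# Back-of-the-envelope comparison of raw disk consumption.
# Assumptions (not from the article): 3x replication for HDFS,
# Reed-Solomon 6 data + 3 parity stripes for QFS.

def raw_tb_replicated(logical_tb, replicas=3):
    """Raw storage needed when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_tb_reed_solomon(logical_tb, data_stripes=6, parity_stripes=3):
    """Raw storage needed when each group of data stripes carries parity stripes."""
    return logical_tb * (data_stripes + parity_stripes) / data_stripes


logical_tb = 40.0  # one day of new data, per the article

replicated = raw_tb_replicated(logical_tb)
encoded = raw_tb_reed_solomon(logical_tb)
savings = 1 - encoded / replicated

print(f"3x replication:      {replicated:.0f} TB raw")
print(f"Reed-Solomon 6+3:    {encoded:.0f} TB raw")
print(f"Raw storage avoided: {savings:.0%}")
```

Under those assumptions, both layouts can lose any two copies or stripes of a block without losing data, but the encoded layout writes 1.5 bytes of raw storage per logical byte instead of 3, which is consistent with the savings Kelly describes.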

Quantcast developed QFS to deal with the more than 40TB of data the company is pumping into its Hadoop cluster every day. While most organizations are still piloting their Hadoop projects, Kelly says it’s only a matter of time before they encounter the same scalability limitations of HDFS that Quantcast did.

Because storage vendors don’t have a real financial motivation to come up with technologies that reduce storage consumption, Kelly says Quantcast felt the time had come to build a larger community to support the continuing development of QFS.

The question this move by Quantcast raises is whether organizations can really trust storage vendors to address one of the most vexing and costly challenges in all of IT. In theory, competition between storage vendors should result in more efficient systems that help rein in storage costs. What’s not apparent is whether vendors are delivering only incremental improvements that address part of an increasingly critical storage management problem without solving the whole of it.

In fact, it’s that slow pace of innovation across the industry that appears to be giving birth to a multitude of open source projects that are not only less expensive to adopt, but are increasingly becoming the driving force for software innovation across the entire IT industry. In other words, there are growing signs that IT organizations are losing patience with waiting for vendors to solve their problems.