Cloudera Innovators Program Fosters Hadoop API Adoption

APIs tend to go wherever the data is, so with more data than ever showing up in Hadoop, it stands to reason that APIs would soon follow. At the Strata Conference and Hadoop World 2013 event today, Cloudera announced Cloudera Connect: Innovators, a program that invites developers to build applications that invoke an API Cloudera has created, in partnership with DataBricks, for its distribution of Hadoop.

According to Cloudera chairman and chief strategy officer Mike Olson, because storing data in Hadoop costs less than storing it in a traditional data warehouse, it’s only natural that Hadoop is emerging as the primary data management platform where data first shows up in the enterprise. The challenge now is finding a way to expose that data to developers who want to consume the information in real time.

That’s problematic, however, because Hadoop is based on a batch processing engine that developers invoke directly via MapReduce or SQL. To solve that problem, DataBricks stores data from Cloudera’s implementation of Hadoop in memory, which developers can then call via a standard API.

Hadoop is clearly emerging as one of the most important new sources of data for developers in recent memory. The issue is finding a way to use Hadoop as a data repository capable of meeting the demands of real-time applications. Fortunately, advances in in-memory computing are making it cost effective to put subsets of Hadoop data in memory, which in turn can update the Cloudera distribution of Hadoop that relies on traditional magnetic disk drives for primary storage.
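The general pattern described here, a subset of data held in memory that answers reads quickly and pushes updates back to disk-based primary storage, can be sketched in a few lines of Python. This is only an illustration of the idea and not the API covered in the announcement; the class names SlowBatchStore and InMemorySubset, and all of their methods, are hypothetical.

```python
class SlowBatchStore:
    """Hypothetical stand-in for disk-backed primary storage."""

    def __init__(self, records):
        self._records = dict(records)
        self.reads = 0  # counts how often the slow path is used

    def read(self, key):
        self.reads += 1
        return self._records[key]

    def write(self, key, value):
        self._records[key] = value


class InMemorySubset:
    """Hypothetical layer that keeps a chosen subset of keys in memory."""

    def __init__(self, store, hot_keys):
        self._store = store
        # Load the chosen subset from primary storage once, up front.
        self._cache = {key: store.read(key) for key in hot_keys}

    def get(self, key):
        # Serve from memory when possible; otherwise fall back to the store.
        if key in self._cache:
            return self._cache[key]
        return self._store.read(key)

    def put(self, key, value):
        # Update the in-memory copy (if held) and write through to the store.
        if key in self._cache:
            self._cache[key] = value
        self._store.write(key, value)


store = SlowBatchStore({"a": 1, "b": 2, "c": 3})
subset = InMemorySubset(store, ["a", "b"])
subset.put("a", 10)       # updates memory and primary storage
value = subset.get("a")   # answered from memory, no extra store read
```

Reads for keys in the subset never touch the slow store after the initial load, while every write still lands in primary storage, so the disk-based copy remains the system of record.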

As part of an effort to extend that strategy to the cloud, Cloudera today also announced Cloudera Connect: Cloud, a program under which Cloudera is working with cloud service providers to make Cloudera available as a cloud service.

Hadoop represents a massive opportunity to gain access to more data than most developers ever dreamed would be possible. The trouble is that the amount of data can easily overwhelm any application. The DataBricks API represents a way to not only tame all that data, but also gain access to it in a way that makes Hadoop data relevant in the here and now, versus the hours that are traditionally associated with anything involving batch processing.

Whether developers can create applications that need that level of real-time interaction over an API is another matter altogether. But given that time is still money even in an API economy, chances are that time, more than ever, is of the essence.