Hadoop as a cloud service

When tasked with crafting a company-wide strategy on whether to process big data in-house or outsource it to the public cloud, you might rightly ask “Whose cloud are we talking about?”

It struck me that every major vendor, whether it be Amazon, Microsoft, Google, IBM or Rackspace, appears to have (or plans to have) a Hadoop/big-data offering in the cloud.

Take Amazon, for instance. Amazon offers “Elastic MapReduce” as a cloud service that runs Apache (or MapR) Hadoop on its Elastic Compute Cloud (EC2) platform, backed by Simple Storage Service (S3) storage. This is interesting to companies that already store corporate data on S3: they can now pull this data into a Hadoop Distributed File System (HDFS) running on EC2 and process it using MapReduce. However, if you don’t use S3 today, you might wonder what alternatives exist.

Microsoft appears ready to roll out a service called “HDInsight” that runs Hortonworks Hadoop on the Azure platform, with storage such as Azure blob storage underneath. While Microsoft has been successful in getting big-name wins like the US Environmental Protection Agency (EPA), Toyota and California’s Santa Clara County for its Office 365 service, it is not clear how many customers will want to run Hadoop in Azure. However, the Hive ODBC driver from Microsoft and Hortonworks opens up a world of possibilities for Excel and PowerPivot users, who could use these familiar tools to query data within Hadoop running in Azure.

Smaller providers like Rackspace talk of planning to roll out a Hadoop service over OpenStack using their existing EMC and NetApp storage. Rackspace would use its support offerings as a way to differentiate itself from other cloud providers.

I was surprised to notice that IBM has “up-leveled” the conversation to typical use cases. Rather than focus on how IBM BigInsights would run on IBM SmartCloud, on x86 servers running Linux with IBM SONAS and Storwize V7000 technology powering all this, they’ve chosen to hide most of this tech trivia and focus on use cases instead. One example they provide is of a telecom service provider using IBM BigInsights to identify fraud and prevent customer churn. The fact that this involves using IBM’s templates for InfoSphere Streams, which is an add-on in the BigInsights service, isn’t of interest to customers. What a CIO at a telecom provider would want to know is “How does this solution in the cloud help me improve my company’s bottom line by reducing customer churn?”

I think the year ahead promises to be an interesting one for enterprise customers as cloud providers refine their Hadoop-as-a-cloud-service offerings and make it easier for customers to derive meaningful insights from their big data in the cloud.