Is Amazon cooking up cloudy big data service?

More than Elastic MapReduce

There's speculation and scuttlebutt that cloud computing juggernaut Amazon is pondering fluffing up a big data crunching service as part of its Amazon Web Services subsidiary.

A blog at the New York Times suggests that the retailing giant, which has some of the best analytics in the world to run its online operation, might want to apply that expertise to an AWS service, much as it sells raw infrastructure and various computing, storage, networking, and database services today.

The idea certainly makes sense, particularly for the hundreds of thousands of businesses that use AWS for all or part of their IT infrastructure rather than using machines in their own data centers.

Amazon has a number of unique assets that can be brought to bear should it decide to offer a big data service. First and foremost, Amazon has a wealth of data that it gathers from its own operations and its affiliates, and makes these figures available on a very select basis to its largest retailers in a service called Amazon Retail Analytics (ARA) Premium.

This is chock full of online shopper buying patterns that Amazon makes available to help retail affiliates better peddle their products. Amazon, of course, dices and slices this data to drive its own Amazon.com online store.

If you run your own website on AWS, using the EC2 compute, S3 storage and RDS database services, you also have the advantage of having all of the big data you might want to chew on inside the Amazon firewall and on the internal AWS network. That would make it somewhat easier to bring the clickstream and log file data from your AWS services into one place to feed to more Amazon machinery.

Amazon has been peddling the Elastic MapReduce service, an implementation of the open source Hadoop data muncher, on its EC2 and S3 services since April 2009, and big webby app providers including Yelp, Foursquare, Etsy and Razorfish are all using the service. Karmasphere has even created a graphical tool that plugs into Eclipse IDEs to do queries against data stored in the Elastic MapReduce service and to manage the virtual clusters running Hadoop.

The NYT speculates that Amazon might mix in payment security, fraud detection and product recommendation services as part of this hypothetical Amazon big data service. These are big data services that Amazon has developed over its history that would no doubt be useful to its partners – and maybe even its competitors.

Speaking of competitors, there is absolutely nothing that would stop any of the current suppliers of data warehousing and big data analytics tools from setting up shop on AWS and offering such services, whether or not Amazon itself wraps up such tools as an uber-service.

Such big data vendors would not, of course, have access to the ARA data, unless Amazon decided to sell it. IBM and Oracle would seem to be more inclined to run their various data warehousing and analytics software on their own clouds. But SAS Institute, Teradata and the handful of commercial distributors of Hadoop stacks (Cloudera, Hortonworks, MapR and EMC are in there alongside IBM and Oracle) might be tempted to offer their wares as a service on AWS.

El Reg contacted Amazon for comment on this speculation, but the company was not available at the time of publication. ®