Analytics

Today at Spark Summit, Databricks CEO Ion Stoica announced the company's first product, Databricks Cloud. This was one of two important announcements from Databricks today. The other was that Databricks raised a $33M Series B round of funding, with Andreessen Horowitz among the investors. But IMO the availability of Databricks Cloud is the more significant of the two.

A few months back, I had the opportunity to have a teleconference with Ion. During the discussion, while talking about the business model, he put more emphasis on Databricks’s Spark certification service. He was (rightly) vague about other developments and the business model. After the call, I discussed with a colleague a possible product built around an IDE for programmers and data scientists and monitoring of Spark clusters. But what we saw today was much better.

Databricks Cloud puts Apache Spark on the cloud. Big data on the cloud is not a new thing. However, what Databricks has provided is an interactive, SQL-based web tool for data scientists to play with data and see the output visually in different forms such as trends, charts, etc. It also provides a powerful WYSIWYG dashboard builder.

By making Spark and Spark Streaming available on the cloud, Databricks joins Google and Amazon, both of which have streaming services with an analytics stack available on the cloud for building real-time analytics and dashboards. However, the key difference is that Amazon and Google built those services for programmers writing streaming applications. In contrast, Databricks Cloud is more suitable for data scientists.

Databricks Cloud has a web-based interactive tool called Databricks Notebook. Though details of the technology it is built on are not yet available (in fact, the Databricks website is still silent on the announcement), the concept and the look and feel are astonishingly similar to IPython’s Notebook. And the name is similar too!! Is it reusing the rich client from IPython’s Notebook? Of course, it also seems to differ in places. A few big differences: the Databricks Notebook heavily demonstrates the power of using SQL on data, and it also makes the power of machine learning available to data scientists. Anyway, I am looking forward to seeing more details about the service, and its pricing.

Anyway, for the last couple of months, I had been exploring the business viability of Spark Analytics as a Service on the cloud. It just got killed! Good that it happened sooner rather than later 🙂

In his keynote at Oracle OpenWorld 2011, Mark Hurd announced the new Exalytics analytics appliance, which is geared to run OLAP and MOLAP workloads, that is, online analytical processing and multidimensional online analytical processing, for deriving business intelligence. Cloud and big data are among the key themes at this year’s Oracle OpenWorld. Oracle’s co-president Safra Catz declared, “We are big data. We are also the cloud.” The push on cloud is all the more significant against the background of last year’s statements by Larry, who first ridiculed the use of the term cloud and then claimed that Oracle was already providing cloud. But this year brings the real delivery of cloud BI, cloud-based apps, etc. Fitting its vision of e2e in a box, in addition to Exadata and Exalytics, Oracle announced the Big Data Appliance. The Oracle Big Data Appliance integrates Apache Hadoop, open source R, Oracle’s NoSQL Database, the ODI adapter for Hadoop and Oracle Loader for Hadoop, on Linux and the Oracle Java VM, in one big box. This combination provides a good platform for big data processing of unstructured / structured data. For more on the Big Data Appliance: https://texploration.wordpress.com/2011/10/04/oracles-big-data-appliance-puts-hadoop-nosql-r-in-a-box/

With the advent of NoSQL databases and MapReduce infrastructures, I had already thought that Oracle could not afford to be left behind by the NoSQL train. Hadoop is gaining significant traction in batch-oriented applications such as unstructured data processing, warehousing, etc. Hadoop provides a way to distribute data and processing logic across the nodes of a server cluster. It takes the processing logic close to the data. Hadoop, which grew up at Yahoo, is based on the MapReduce architecture introduced by Google. Anyway, I predict that the usage of Hadoop in the Oracle stack will go beyond adding it to the Big Data Appliance. Oracle may make some acquisitions in this space.
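
For readers new to the MapReduce idea mentioned above, here is a minimal single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample data are my own assumptions for illustration. Each inner list stands in for the block of data held by one node: the map step runs separately over each block (the "logic close to the data" part), and the reduce step combines the grouped results.

```python
from collections import defaultdict


def map_phase(partition):
    """Map step: emit a (word, 1) pair for every word in one partition."""
    for line in partition:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    """Run map over each partition independently, then shuffle and reduce."""
    mapped = []
    for partition in partitions:
        # On a real cluster each partition would be mapped on the node that
        # stores it; here the partitions are simply processed one by one.
        mapped.extend(map_phase(partition))
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: two "nodes", each holding a few lines of text.
partitions = [
    ["big data on cloud", "hadoop and spark"],
    ["spark on cloud"],
]
counts = word_count(partitions)
print(counts)
```

On a real cluster the framework also handles scheduling, moving data between the map and reduce steps over the network, and recovering from node failures, none of which this toy version attempts.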