Spark

Today at Spark Summit, Databricks CEO Ion Stoica announced the company's first product, Databricks Cloud. It was one of two important announcements from Databricks today. The other was that Databricks raised a $33M Series B round of funding from Andreessen Horowitz. But IMO the availability of Databricks Cloud is the more significant of the two.

A few months back, I had the opportunity to have a teleconference with Ion. While we were discussing the business model, he put the most emphasis on Databricks’s Spark certification service. He was (rightly) vague about other developments and the rest of the business model. After the call, a colleague and I discussed what a possible product might look like: an IDE for programmers and data scientists, plus monitoring of Spark clusters. But what we saw today was much better.

Databricks Cloud puts Apache Spark on the cloud. Big data on the cloud is not a new thing. However, what Databricks has provided is an interactive, SQL-based web tool that lets data scientists play with data and see the output visually in different forms, such as trends and charts. It also provides a powerful WYSIWYG dashboard builder.

By making Spark and Spark Streaming available on the cloud, Databricks joins Google and Amazon, both of which offer streaming services with an analytics stack on the cloud for building real-time analytics and dashboards. However, the key difference is that Amazon and Google built those services for programmers writing streaming applications. In contrast, Databricks Cloud is better suited to data scientists.

Databricks Cloud has a web-based interactive tool called Databricks Notebook. Though details of the technology it is built on are not yet available (in fact, the Databricks website is still silent on the announcement), the concept and the look and feel are astonishingly similar to the IPython Notebook. And the name is similar too! Is it reusing the rich client from the IPython Notebook? Of course, it also seems to differ in places. A few big differences: the Databricks Notebook heavily demonstrates the power of using SQL on data, and it also makes the power of machine learning available to data scientists. Anyway, I am looking forward to seeing more details about the service and its pricing.

For the last couple of months, I had been exploring the business viability of Spark Analytics as a Service on the cloud. It just got killed! Good that it happened sooner rather than later 🙂