The Schwartz Report

Microsoft and Apache Spark creator Databricks are building a globally distributed streaming analytics service natively integrated with Azure for machine learning, graph processing and AI-based applications.

The new Databricks Spark-as-a-service offering was introduced at Microsoft's annual Connect developer conference, which kicked off today in New York. The service, available in preview, is among an extensive list of announcements covering Microsoft's various SQL and NoSQL database products and services; productivity, cross-platform and language improvements to the Visual Studio and VSCode developer tools; and new DevOps capabilities and machine learning, AI and IoT tooling.

During the opening keynote, Scott Guthrie, Microsoft's executive VP for Cloud and Enterprise, emphasized that Databricks is the creator and steward of Apache Spark. He said the new service will enable organizations to build modern data warehouses that support self-service analytics and machine learning using all data types in a secure and compliant architecture.

Databricks has engineered a first-party Spark-as-a-service platform for Azure. "It allows you to quickly launch and scale up the Spark service inside the cloud on Azure," Guthrie said. "It includes an incredibly rich, interactive workspace that makes it easy to build Spark-based workflows, and it integrates deeply across our other Azure services."

Databricks customers have been pushing the company to build its Spark platform as a native Azure service, said Ali Ghodsi, the company's cofounder and CEO, who joined Guthrie on stage. "We've been hearing overwhelming demand from our customer base that they want the security, they want the compliance and they want the scalability of Azure," Ghodsi said. "We think it can make AI and big data much simpler."

In addition to integrating with the various Azure services, it's designed to let those who want to create new data models do so. According to Databricks, a user can target data regardless of size or create projects with various analytics services including Power BI, SQL, Streaming, MLlib and Graph. "Once you manage data at scale in the cloud, you open up massive possibilities for predictive analytics, AI, and real-time applications," according to a technical overview of the Azure Databricks service. "Over the past five years, the platform of choice for building these applications has been Apache Spark. With a massive community at thousands of enterprises worldwide, Spark makes it possible to run powerful analytics algorithms at scale and in real time to drive business insights."

However, deploying, managing and securing Spark at scale has remained a challenge, and Databricks believes that difficulty is what will make the managed Azure service compelling.

Internally, Databricks is using Azure Container Services to run the Azure Databricks control and data planes in containers, according to the company's technical primer. It's also using accelerated networking services to improve performance on the latest Azure hardware specs.