What is Apache Spark?

Apache Spark is an open-source processing framework that runs large-scale data analytics applications. Spark is built on an in-memory compute engine, which enables high-performance querying on big data. It uses a parallel data processing framework that persists data in memory and spills to disk if needed. This allows Spark to deliver up to 100x faster performance and a common execution model for tasks such as extract, transform, load (ETL), batch, interactive queries, and others on data in an Apache Hadoop Distributed File System (HDFS). Azure makes Apache Spark easy and cost-effective to deploy with no hardware to buy, no software to configure, a full notebook experience to author compelling narratives, and integration with partner business intelligence tools.

In-memory processing for interactive scenarios

Customers today expect quick answers to their questions, instead of waiting minutes, hours, or days. Apache Spark delivers by persisting data in memory to get up to 100x faster queries while processing large datasets in Hadoop. This makes Spark for Azure HDInsight ideal for speeding up intensive big data applications.
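The idea of persisting data in memory can be illustrated with a small sketch. This is plain Python, not Spark's API: the `CachedDataset` class, the `load_events` function, and the sample rows are all hypothetical names invented for illustration. It only shows why repeated queries get cheaper once the data has been read into memory a single time.

```python
class CachedDataset:
    """Toy stand-in for a dataset that is persisted in memory after first use."""

    def __init__(self, loader):
        self._loader = loader  # function that reads the source (the slow path)
        self._rows = None      # in-memory copy, filled on first access
        self.loads = 0         # number of times the source was actually read

    def rows(self):
        # Read the source only if nothing is held in memory yet.
        if self._rows is None:
            self._rows = list(self._loader())
            self.loads += 1
        return self._rows

    def query(self, predicate):
        # Every query runs against the in-memory copy.
        return [row for row in self.rows() if predicate(row)]


def load_events():
    # Hypothetical source; a real job would read from HDFS or similar storage.
    return ({"device": i % 3, "value": i} for i in range(10))


events = CachedDataset(load_events)
high_values = events.query(lambda row: row["value"] >= 7)
device_zero = events.query(lambda row: row["device"] == 0)
```

Both queries are answered from the same in-memory copy, so the source is read once rather than once per query.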

Use IntelliJ IDEA for native developer experiences and remote debugging

To make development on Spark easier, we introduced deep integration with IntelliJ IDEA to allow you to code with native authoring support for Scala and Java. You can do remote debugging, which gives you flexibility in your development lifecycle and the ability to submit the application to Azure when ready. Spark for HDInsight clusters also come pre-loaded with the most popular Python libraries (Anaconda) for machine learning.

Take advantage of BI tools to interactively analyze big data

For business analysts, we offer integration with Power BI alongside other business intelligence tools like Tableau, SAP BusinessObjects Lumira, and QlikView. This lets you build interactive visualizations over data of any size. In addition to the traditional dashboards, Power BI gives you a streaming connector that integrates with Spark, which allows you to publish real-time events from Spark Streaming directly to Power BI.

Out-of-the-box notebook experience

Unlike other Spark offerings, which require you to install your own notebooks or use proprietary ones, Spark for HDInsight has out-of-the-box integration with Jupyter (IPython), the most popular open-source notebook on the market. This allows you to create narratives that combine code, statistical equations, and visualizations that tell a story about the data. To make integration easier for you, we worked with the Jupyter community to enhance the kernel and allow Spark execution through a REST endpoint, which gives a compelling experience for data scientists.

Use Spark for Azure HDInsight as an engine to run R Server, which has a large parallel analytics and machine learning library built to work with the open-source R language. This lets you combine the familiarity of R with the enterprise scale of R Server running on Spark. Multithreaded math libraries and transparent parallelization in R Server, combined with Spark, mean you can handle up to 1000x more data at up to 50x faster speeds than open-source R, which helps you train more accurate models for better predictions than before.

Highest availability for business continuity

To run Spark at the highest scale, Microsoft gives you the industry’s highest availability SLA at 99.9% to ensure your business continuity and protection against catastrophic events. Together with Cloudera, we co-led the Livy project to create an open-source, Apache-licensed REST web service for managing long-running Spark contexts and submitting Spark jobs. This capability is designed to make Spark a more robust back end for running interactive notebooks and to allow other applications to take advantage of Spark for their interactive workloads.
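As a rough illustration of submitting a Spark job through Livy's REST interface, the sketch below builds (but does not send) a `POST /batches` request using only the Python standard library. The `file`, `className`, and `args` fields follow Livy's batch API. The base URL, JAR path, class name, and the helper name `build_livy_batch_request` are placeholders assumed for this example, and a real cluster would also require authentication, which is omitted here.

```python
import json
import urllib.request


def build_livy_batch_request(base_url, file, class_name=None, args=None):
    """Build a POST /batches request for a Livy server without sending it."""
    payload = {"file": file}  # path to the application JAR or Python file
    if class_name is not None:
        payload["className"] = class_name  # main class for JVM applications
    if args:
        payload["args"] = list(args)  # command-line arguments for the job
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only; substitute your own endpoint and job.
request = build_livy_batch_request(
    "https://example-cluster.example.net/livy",
    file="wasb:///example/jars/spark-examples.jar",
    class_name="org.apache.spark.examples.SparkPi",
    args=["10"],
)
body = json.loads(request.data)
```

Passing the request to `urllib.request.urlopen` would submit the job. Building it separately keeps the payload easy to inspect before anything is sent to a cluster.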

Analyze any data of any size without changes as data grows

To make sure Spark runs at scale, we integrated Spark with Azure Data Lake Store. This integration is uniquely available from Microsoft and allows Spark to store and process data that scales to any size, without forcing changes to your application as data grows. Through this integration, you can implement role-based data access controls at the storage level.

Real-time processing for real-time scenarios

Today’s connected world is defined by big data that arrives in real time. Spark Streaming for HDInsight is ideal for challenging real-time scenarios. It enables a range of opportunities, including Internet of Things (IoT) scenarios, real-time remote management and monitoring, and getting insights from devices like mobile phones or connected cars.

Easy setup, fast results

There’s no time-consuming installation or setup with Spark for HDInsight. Azure does it for you. You’ll be up and running in minutes, and you can deploy Spark without buying new hardware or paying other up-front costs.

Elastic capacity for big data

Spark for HDInsight takes advantage of the power of Azure, which makes it easier for you to create clusters of any size to process any amount of data on demand. You only pay for the compute and storage that you use.