Azure Databricks

The best destination for big data analytics and AI with Apache Spark

Unlock insights from all your data and build artificial intelligence (AI) solutions with Azure Databricks. Set up your Apache Spark™ environment in minutes, autoscale, and collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn.

Start quickly with an optimized Apache Spark environment

Azure Databricks provides the latest versions of Apache Spark and allows you to seamlessly integrate with open source libraries. Spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure. Clusters are set up, configured, and fine-tuned to ensure reliability and performance without the need for monitoring. Take advantage of autoscaling and auto-termination to improve total cost of ownership (TCO).
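The autoscaling and auto-termination settings mentioned above are ordinary cluster properties. As one hedged illustration, the sketch below builds a cluster-create payload in the shape accepted by the Databricks Clusters API 2.0 (`POST /api/2.0/clusters/create`); the runtime version and VM size shown are placeholders, not recommendations.

```python
import json

# Sketch of a cluster spec for the Databricks Clusters API 2.0
# (POST /api/2.0/clusters/create). Field names follow the public API;
# the concrete values (node type, Spark version) are placeholders.
def build_cluster_spec(name, min_workers=2, max_workers=8, idle_minutes=30):
    """Return a cluster-create payload with autoscaling and auto-termination."""
    return {
        "cluster_name": name,
        "spark_version": "7.3.x-scala2.12",   # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
        "autoscale": {
            "min_workers": min_workers,       # scale down toward this when idle
            "max_workers": max_workers,       # scale up to this under load
        },
        "autotermination_minutes": idle_minutes,  # shut down after this idle period
    }

spec = build_cluster_spec("etl-cluster")
print(json.dumps(spec, indent=2))
```

Because the cluster terminates itself after the idle window and scales within the worker range, you pay only for capacity you actually use, which is where the TCO benefit comes from.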

Boost productivity with a shared workspace and common languages

Collaborate effectively on shared projects using the interactive workspace and notebook experience, whether you’re a data engineer, data scientist, or business analyst. Build with your choice of language, including Python, Scala, R, and SQL. Get easy version control of notebooks with GitHub and Azure DevOps.

Get high-performance modern data warehousing

Modernize your data warehouse in the cloud for unmatched levels of performance and scalability. Combine data at any scale, and get insights through analytical dashboards and operational reports. Automate data movement using Azure Data Factory, load data into Azure Data Lake Storage, transform and clean it using Azure Databricks, and then make it available for visualization using Azure SQL Data Warehouse.
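In the pipeline above, the "transform and clean" stage would normally be a PySpark job in Azure Databricks reading from Azure Data Lake Storage. As a small local stand-in, the plain-Python sketch below shows the shape of that step: drop incomplete records, normalize types, and deduplicate before the data is loaded for reporting. The field names are hypothetical.

```python
# A minimal, local stand-in for the "transform and clean" step described
# above. In Azure Databricks this stage would typically run as PySpark
# against Azure Data Lake Storage; plain Python illustrates the same shape.
def clean_rows(rows):
    """Drop rows with missing fields, normalize types, and deduplicate by id."""
    seen = set()
    cleaned = []
    for row in rows:
        if row.get("id") is None or row.get("amount") is None:
            continue                      # drop incomplete records
        if row["id"] in seen:
            continue                      # drop duplicate ids
        seen.add(row["id"])
        cleaned.append({"id": row["id"], "amount": float(row["amount"])})
    return cleaned

raw = [
    {"id": 1, "amount": "10.5"},
    {"id": 1, "amount": "10.5"},   # duplicate
    {"id": 2, "amount": None},     # incomplete
    {"id": 3, "amount": "7"},
]
print(clean_rows(raw))  # two rows survive: ids 1 and 3
```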


Frequently asked questions about Azure Databricks

What is the Azure Databricks SLA?

The Azure Databricks SLA guarantees 99.95 percent availability.

What is a Databricks unit (DBU)?

A Databricks unit, or DBU, is a unit of processing capability per hour, billed on per-second usage.
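Since DBUs are a per-hour capability billed per second, cost is simply DBUs consumed prorated to the second. The worked example below uses a hypothetical price, not a published Azure rate.

```python
# Worked example of per-second DBU billing. The price below is
# hypothetical; the formula is DBU-hours consumed * price per DBU-hour,
# prorated to the second.
def dbu_cost(dbu_per_hour, seconds, price_per_dbu_hour):
    """Cost of a workload billed per second of DBU usage."""
    dbu_hours = dbu_per_hour * (seconds / 3600)
    return dbu_hours * price_per_dbu_hour

# e.g. a 4-DBU cluster running for 90 seconds at a hypothetical $0.40/DBU-hour
cost = dbu_cost(dbu_per_hour=4, seconds=90, price_per_dbu_hour=0.40)
print(f"${cost:.4f}")  # 4 * (90/3600) * 0.40 = $0.0400
```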

What is the difference between a data engineering workload and a data analytics workload?

A data engineering workload is a job that automatically starts and terminates the cluster on which it runs. For example, a workload may be triggered by the Azure Databricks job scheduler, which launches an Apache Spark cluster solely for the job and automatically terminates it after the job is complete.

A data analytics workload, by contrast, is not automated. For example, commands in Azure Databricks notebooks run on an Apache Spark cluster until it is manually terminated. Multiple users can share a cluster to analyze data collaboratively.
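The data engineering pattern above corresponds to a job definition that supplies a new cluster spec instead of an existing cluster id, so the scheduler creates a cluster for the run and terminates it afterward. The sketch below builds such a payload in the shape accepted by the Databricks Jobs API 2.0 (`POST /api/2.0/jobs/create`); the notebook path and cluster sizing are placeholders.

```python
import json

# Sketch of a job definition for the Databricks Jobs API 2.0
# (POST /api/2.0/jobs/create). Supplying "new_cluster" rather than an
# existing cluster id means the scheduler launches a cluster for the run
# and terminates it when the job completes; that is the data engineering
# workload pattern. Paths and sizes below are placeholders.
def build_job_spec(job_name, notebook_path):
    """Return a job-create payload that runs a notebook on an ephemeral cluster."""
    return {
        "name": job_name,
        "new_cluster": {                       # created per run, then terminated
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 4,
        },
        "notebook_task": {"notebook_path": notebook_path},
    }

job = build_job_spec("nightly-etl", "/Repos/etl/nightly")
print(json.dumps(job, indent=2))
```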