Posts by Srinath Shankar

Databricks has introduced a new feature, Library Utilities for Notebooks, as part of Databricks Runtime version 5.1. It allows you to install and manage Python dependencies from within a notebook. This provides several important benefits: Install libraries when and where they're needed, from within a notebook. This eliminates the need to globally install libraries on...

Big data workloads require access to disk space for a variety of operations, generally when intermediate results will not fit in memory. When the required disk space is not available, the jobs fail. To avoid job failures, data engineers and scientists typically waste time trying to estimate the necessary amount of disk via trial and...

As the amount of data in an organization grows, more and more engineers, analysts and data scientists need to analyze this data using tools like Apache Spark. Today, IT teams constantly struggle to find a way to allocate big data infrastructure, budget among different users, and optimize performance. End-users like data scientists and analysts also...

In another blog post published today, we showed the top five reasons for choosing S3 over HDFS. With the dominance of simple and effective cloud storage systems such as Amazon S3, the assumptions of on-premise systems like Apache Hadoop are becoming, sometimes painfully, clear. Apache Spark users require both fast and transactionally correct writes to...

At Databricks, our users range from SQL Analysts who explore data through JDBC connections and SQL Notebooks to Data Engineers orchestrating large scale ETL jobs. While this is great for data democratization, one challenge associated with exploratory data analysis is handling rogue queries that appear as if they will finish, but never actually will. These...