In July 2016, we conducted our Apache Spark Survey to learn how organizations are using Spark and to highlight growth trends since our 2015 Spark Survey. The 2016 results reflect answers from 1,615 respondents at 900 distinct organizations, most of whom were Apache Spark users.

The results show that the Spark community is still growing fast: the number of meetup members worldwide has tripled, and the number of contributors to the project has grown by 67% since last year. In addition, users are building diverse applications, with significant growth in machine learning and streaming.

The results reveal that Spark has moved well beyond the early-adopter phase at high-tech companies and is now mainstream in large data-driven enterprises in sectors such as banking and health.
With the rise of public cloud computing, the survey findings also reflect users’ affinity for deploying Spark in the public cloud.

Report Highlights

Spark community growth and adoption accelerate

Over the past year, we have seen growth in the number of contributors and meetup members: code contributors almost doubled, and Apache Spark Meetup members tripled, from 66K to 225K. Also, since the release of DataFrames in 2015, their usage has more than doubled, from 15% to 38%, and the share of Windows users jumped from 23% to 32%. All of this indicates a diverse, thriving community and growing adoption of Spark.

Spark Streaming and Machine Learning usage surges

Interest in developing real-time applications and advanced analytics is on the rise. More than half (51%) of the respondents in this survey consider Spark Streaming an essential component for building real-time streaming use cases, and 82% of respondents say the same for advanced analytics. This year, production use of Spark Streaming jumped from 14% (in 2015) to 22% (in 2016), and production use of Machine Learning rose from 13% (in 2015) to 18% (in 2016).

Spark’s deployment in the public cloud rises

Cloud computing is rising rapidly across the tech industry. We observed this trend in the survey results, as many respondents chose to deploy Spark in the public cloud and reap its benefits. Spark deployments in the public cloud stand at 61% this year, up from 51% last year. By contrast, Spark deployments using on-premises cluster managers fell by an average of 5%.

Spark’s usage increases in production

Overall, the use of Spark components in production has gone up. Moreover, Spark developers often combine multiple Spark components to build sophisticated applications. Seventy-four percent of respondents use two or more components, while 64% use three or more in production. Along with Spark Streaming and Machine Learning, 38% use DataFrames, while 40% use Spark SQL in production.

Conclusion

As Apache Spark becomes easier, faster, and smarter, a new audience across diverse industries is adopting it. The 2016 survey results gave us a glimpse into growth and trends: who is using Spark, how they are using it, what matters to them, which new features they use, and what they are using it for.

All of this feedback will help us and the community as we move forward with the development of Spark, just as feedback from surveys has done over the past few years. Thank you to everyone who participated in our Apache Spark Survey 2016.