What you'll learn

Learn how Pure Storage uses Spark for both streaming and batch jobs, helping engineers understand the state of its continuous integration pipeline

Description

Continuous integration (CI) pipelines generate massive amounts of messy log data. Pure Storage engineering runs over 70,000 tests per day, creating a triage problem so large it would require at least 20 triage engineers. Instead, Spark’s flexible computing platform allows the company to write a single application for both streaming and batch jobs, so a team of only three triage engineers can understand the state of the company’s CI pipeline. Spark indexes log data for real-time reporting (streaming), uses machine learning for performance modeling and prediction (batch job), and reindexes old data against newly encoded patterns (batch job). Ivan Jibaja discusses the use case for big data analytics technologies, the architecture of the solution, and lessons learned.
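To make the indexing idea concrete, here is a minimal sketch of the core pattern-matching step. All names and patterns below are hypothetical illustrations, not Pure Storage's actual code; in production this logic would run inside a Spark streaming or batch job, but the matching itself can be shown in plain Python:

```python
import re

# Hypothetical failure patterns a triage team might encode. When a new
# pattern is added, archived logs are re-scanned against it -- that is
# the "reindex old data" batch job described above.
PATTERNS = {
    "oom": re.compile(r"Out of memory|OOMKilled"),
    "timeout": re.compile(r"timed out after \d+s"),
}

def classify(line: str) -> list[str]:
    """Return the names of all encoded patterns matching a log line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]

def index_logs(lines):
    """Tag each log line with its matching patterns.

    In the streaming case, each micro-batch of fresh log lines would
    pass through here; in the reindex batch job, old archived logs do.
    """
    return [(line, classify(line)) for line in lines]

logs = [
    "test_foo: process OOMKilled by kernel",
    "test_bar: timed out after 300s",
    "test_baz: passed",
]
for line, tags in index_logs(logs):
    print(line, "->", tags)
```

The same `index_logs` function serving both the streaming path and the reindexing path mirrors the single-application design the talk describes.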

This session is sponsored by Pure Storage.

Ivan Jibaja

Pure Storage

Ivan Jibaja is a tech lead for the big data analytics team at Pure Storage. Previously, he was part of the core development team that built the FlashBlade from the ground up. Ivan holds a PhD in computer science, with a focus on systems and compilers, from the University of Texas at Austin.