Presentation: Using Luigi to build data pipelines that won’t wake you at 3am

Location:

Roebling / Gleason

Day of week:

Friday

1:30pm - 2:20pm

Datadog collects hundreds of billions of data points from our customers’ infrastructure every single day. In addition to our realtime systems, we run a significant number of offline batch jobs to crunch this data. These jobs form complicated dependency graphs that run across multiple distributed systems.

In this environment, failures can and do happen, often in the middle of the night. To prevent (most) failures from waking up humans, Datadog uses Luigi, a framework for crafting complex batch data pipelines.

In this talk, we’ll discuss:

How to craft data pipelines with Luigi

How to make pipelines idempotent for easy restart and failure recovery

And plenty of examples of how this works for us in practice
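To make the idempotency point concrete, here is a minimal sketch using only the Python standard library. It is not Luigi's API and not Datadog's code. It only illustrates the idea Luigi builds on: a task counts as complete when its output exists, and output is published atomically so a crash never leaves a half-written file behind. The names `run_task` and `compute` are hypothetical, chosen for this illustration.

```python
import os
import tempfile


def run_task(output_path, compute):
    """Run compute() only if output_path does not exist yet.

    Hypothetical helper for illustration, not part of Luigi.
    Returns True if the work ran, False if it was skipped.
    """
    # Completion check: an existing output means the task already ran,
    # so a restart of the whole pipeline skips it.
    if os.path.exists(output_path):
        return False

    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(compute())
        # Publish in one step: readers see either no file or the full file.
        os.replace(tmp_path, output_path)
    except BaseException:
        # A failed run leaves no output, so the next attempt starts clean.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

With this shape, a job that fails at 3am can simply be rerun by a scheduler: finished steps are skipped and the failed step starts over from scratch, with no manual cleanup.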

Speaker: Matthew Williams

DevOps Evangelist @ Datadog

Matt Williams is the DevOps Evangelist at Datadog. He is passionate about the power of monitoring and metrics to make large-scale systems stable and manageable, so he tours the country speaking and writing about monitoring with Datadog. When he's not on the road, he's coding.