We present the architecture and implementation of a Node.js/DTrace-based distributed platform for analyzing the performance of cloud applications in real time. We will give an overview of the system’s architecture, paying specific attention to the design tradeoffs that allow it to operate in real time and on a production cloud. We will then discuss the novel data visualizations the system uses to convey the behavior of complex distributed systems. Finally, we will demonstrate the system on a real, internet-facing cloud and cover some of the interesting performance pathologies that it has helped us understand.

David Pacheco

Joyent

David Pacheco is the lead engineer of Joyent’s Introspection Team, which develops Cloud Analytics and other tools for observing software in the cloud. Previously a member of Sun’s Fishworks team, David worked on several areas of the Sun Storage 7000 series of appliances, including remote replication, fault management, and flash device support.

Brendan Gregg

Netflix

Brendan Gregg is a senior performance architect at Netflix, where he does large-scale computer performance design, analysis, and tuning. He is the author of the book “Systems Performance” and the recipient of the USENIX 2013 LISA Award for Outstanding Achievement in System Administration. He has previously worked as a performance and kernel engineer, and has created performance analysis tools included in multiple operating systems, as well as visualizations and methodologies.