Apache Kafka is used at a growing number of companies, including LinkedIn, Netflix, and Uber. I will first describe a common pattern in how these companies use Kafka: all data, including business metrics, operational metrics, logs, and database records, is collected as structured data into Kafka in real time. This data is then fed into batch processing systems such as Hadoop and data warehouses, as well as various real-time systems such as search indexes, stream processing frameworks, graph libraries, and monitoring engines. Next, I will explain some of the underlying technologies in Kafka that enable this common usage pattern. In particular, I will cover (1) the scale-out architecture of Kafka; (2) how Kafka achieves high throughput for both real-time and non-real-time consumption; and (3) how Kafka provides durability and availability.
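The durability and availability in point (3) come largely from topic replication and producer acknowledgments. As a rough illustration (the settings below are standard Kafka configuration keys, not something taken from the talk itself), a producer tuned for durability rather than minimum latency might be configured like this:

```properties
# Producer settings (standard Kafka Java client) favoring durability
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
acks=all        # wait for all in-sync replicas to acknowledge each write
retries=5       # retry on transient broker failures

# Durability also depends on topic-level settings, e.g. a topic created with
#   replication.factor=3 and min.insync.replicas=2
# so that a write survives the loss of one broker.
```

With acks=all, a message is only acknowledged once every in-sync replica has it, which is the trade-off Kafka makes between write latency and the availability guarantees the talk describes.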

Jun Rao is currently a co-founder of Confluent, a company that provides a stream data platform on top of Apache Kafka. Before Confluent, Jun Rao was a senior staff engineer at LinkedIn, where he led the development of Kafka. Before LinkedIn, Jun Rao was a researcher at IBM's Almaden...