Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | September 2016

It is September and it’s evident that everyone is back from their summer vacation! We released Apache Kafka 0.10.0.1, which fixes bugs found in the 0.10.0 release. In our last meeting we agreed to give time-based releases a try and immediately started planning Apache Kafka 0.10.1.0.

We agreed to try a time-based release plan, aiming for three Apache Kafka releases a year (one every four months) and guaranteeing rolling upgrades for a duration of two years.

We started planning the next Apache Kafka release, which will have the version 0.10.1.0. Many thanks to Jason Gustafson, Kafka’s newest committer, for volunteering to drive the release. As usual, the community is encouraged to participate. Take a look at the release plan to learn how.

KIP-62 has been merged and will be included in Apache Kafka 0.10.1.0 and Confluent Platform 3.1.0. This KIP adds a background thread to the Kafka consumer that sends heartbeats independently of polling, so a consumer that spends a long time between calls to poll stays alive in its group. This should make it much easier to write consumers, especially consumers that need to process large amounts of data between iterations.
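To make the separation of heartbeats from polling concrete, here is a minimal Python sketch of the liveness rule as we understand it. It is a toy model, not the consumer's actual code: the function and class names are hypothetical, and the two fields simply mirror the consumer settings session.timeout.ms and max.poll.interval.ms.

```python
from dataclasses import dataclass


@dataclass
class ConsumerTimeouts:
    # Mirrors session.timeout.ms: how long the group tolerates missing heartbeats.
    session_timeout_ms: int
    # Mirrors max.poll.interval.ms: upper bound on the time between poll calls.
    max_poll_interval_ms: int


def still_in_group(now_ms, last_heartbeat_ms, last_poll_ms, timeouts):
    """Toy liveness check (hypothetical helper, not a Kafka API).

    The background thread keeps last_heartbeat_ms fresh while the
    application is busy processing, so slow processing only becomes a
    problem once the poll interval bound is exceeded.
    """
    heartbeat_ok = now_ms - last_heartbeat_ms <= timeouts.session_timeout_ms
    poll_ok = now_ms - last_poll_ms <= timeouts.max_poll_interval_ms
    return heartbeat_ok and poll_ok
```

With a 10 second session timeout and a 5 minute poll interval, a consumer that has been processing one batch for a full minute is still considered a member, provided its background thread sent a heartbeat within the last 10 seconds.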

KIP-63, a proposal for improving caching in the Streams API in Kafka, was approved. This is a significant performance optimization that coalesces processing updates before sending them downstream, which reduces the load on Kafka clusters and on downstream external systems. It also paves the way for implementing new “trigger” behaviors.
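The coalescing idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name and the forward callback are hypothetical, and flushing the whole cache once it grows past a fixed size is a stand-in for whatever eviction and flush policy the Streams API actually uses.

```python
class CoalescingCache:
    """Toy write cache that keeps only the latest value per key."""

    def __init__(self, max_entries, forward):
        self.max_entries = max_entries
        self.forward = forward  # callback that sends a record downstream
        self.dirty = {}

    def put(self, key, value):
        # A newer update for the same key overwrites the pending one.
        self.dirty[key] = value
        if len(self.dirty) > self.max_entries:
            self.flush()

    def flush(self):
        # Forward one record per key, then start over with an empty cache.
        for key, value in self.dirty.items():
            self.forward(key, value)
        self.dirty.clear()


sent = []
cache = CoalescingCache(max_entries=100, forward=lambda k, v: sent.append((k, v)))
for count in (1, 2, 3):
    cache.put("alice", count)
cache.put("bob", 1)
cache.flush()
# Four updates went in, two records came out:
# sent == [("alice", 3), ("bob", 1)]
```

The trade-off is latency for volume: downstream consumers see fewer records, but intermediate values such as ("alice", 1) and ("alice", 2) are never emitted.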

KIP-71 was approved, allowing messages in topics to be both compacted and deleted. This will allow admins to impose disk constraints on compacted topics by removing records, even for compacted keys, once they are older than the time limit or once the topic exceeds its disk space limit.
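As a rough illustration of how the two policies combine, the following Python sketch first compacts a log to the latest record per key and then applies a time limit. It is a simplified model with hypothetical names: it works on individual records held in a list, whereas a real broker works on log segments, and it ignores the size-based limit.

```python
def compact_and_delete(records, now_ms, retention_ms):
    """Toy model of a topic that is both compacted and deleted.

    records: list of (offset, key, timestamp_ms, value) tuples in offset order.
    """
    latest = {}
    for record in records:
        key = record[1]
        latest[key] = record  # later offsets overwrite earlier ones
    compacted = sorted(latest.values(), key=lambda r: r[0])
    # Deletion step: drop even the latest record for a key once it is too old.
    return [r for r in compacted if now_ms - r[2] <= retention_ms]
```

With compaction alone, the last value for every key would be kept forever. Adding the time limit means a key that has not been updated within the retention window eventually disappears, which is what bounds disk usage.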

KIP-73 was approved, adding replication quotas or throttling to Apache Kafka. This feature is especially useful when reassigning replicas to brokers, allowing admins to limit the resources used by the reassignment process and therefore reducing the risk in reassignment. Replica reassignment has long been a difficult process in Apache Kafka, and we are excited about this improvement.
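When picking a throttle, a back-of-the-envelope estimate of how long a reassignment will take can help. The Python sketch below is our own rough model, not something specified by the KIP: it assumes the moving replica must copy a fixed backlog while also keeping up with new writes, so only the headroom above the write rate goes toward catching up. The function name and parameters are hypothetical.

```python
def estimated_catch_up_seconds(bytes_to_move, throttle_bytes_per_sec,
                               write_bytes_per_sec=0.0):
    """Rough estimate of how long a throttled replica move takes.

    Assumes a constant write rate and a throttle that is fully used.
    """
    headroom = throttle_bytes_per_sec - write_bytes_per_sec
    if headroom <= 0:
        # The replica would never catch up with the leader.
        raise ValueError("throttle must exceed the incoming write rate")
    return bytes_to_move / headroom
```

For example, moving 600 MB with a 10 MB/s throttle while producers write 4 MB/s leaves 6 MB/s of headroom, so the estimate is about 100 seconds. A throttle set at or below the write rate would mean the reassignment never completes.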

KIP-79, a proposal to evolve the Apache Kafka protocol to allow requesting offsets by timestamp (using the new timestamp indexes), is under active discussion. You are invited to take a look and share your feedback with the Kafka community.
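Since the proposal is still being discussed, the exact semantics may change, but the basic idea of a timestamp lookup can be sketched in Python. This toy version assumes the lookup returns the earliest offset whose timestamp is at or after the requested time, and that the index is a list sorted by timestamp. Both are assumptions made for illustration, and the function name is hypothetical.

```python
from bisect import bisect_left


def offset_for_timestamp(index, target_ms):
    """Toy timestamp-to-offset lookup.

    index: list of (timestamp_ms, offset) pairs sorted by timestamp.
    Returns the offset of the first entry at or after target_ms,
    or None if every entry is older than target_ms.
    """
    timestamps = [ts for ts, _ in index]
    position = bisect_left(timestamps, target_ms)
    if position == len(index):
        return None
    return index[position][1]
```

A consumer could use a lookup like this to rewind to "everything since 9:00 this morning" without knowing in advance which offsets that corresponds to.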

A technology community is made up of people. Without people writing code, writing tutorials, welcoming newcomers, giving presentations, and answering questions, what we have is not a community, but just ...