Nov 30, 2014

A new company behind Kafka development and a fascinating new high-throughput messaging framework making waves inspired me to go look at the major ideas in Kafka's design. Its overall strategy is to maximize throughput through non-blocking, sequential access patterns combined with decentralized decision making.

A partition corresponds to a logical file which is implemented as a sequence of approximately same-size physical files (segments).

New messages are appended to the current physical file.

Each message is addressed by its global offset in the logical file.
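The segment layout above can be sketched as a sorted map from each segment's base offset (the global offset of its first message) to the file: a floor lookup resolves a global offset to its segment, and subtraction gives the position inside it. This is a minimal illustration with made-up names, not Kafka's actual index code.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Maps each segment's base offset to its file name; a floor lookup
// finds the segment containing any global offset.
public class SegmentIndex {
    private final NavigableMap<Long, String> segments = new TreeMap<>();

    public void addSegment(long baseOffset, String fileName) {
        segments.put(baseOffset, fileName);
    }

    // The segment file holding the given global offset.
    public String segmentFor(long globalOffset) {
        Long base = segments.floorKey(globalOffset);
        if (base == null) throw new IllegalArgumentException("offset below log start");
        return segments.get(base);
    }

    // The offset relative to the start of that segment file.
    public long relativeOffset(long globalOffset) {
        return globalOffset - segments.floorKey(globalOffset);
    }
}
```

New messages extend the segment with the highest base offset, so appends never touch the older files.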

A consumer reads from a partition file sequentially (pre-fetching chunks in the background).

No application level cache. Kafka relies on the file system page cache.

A highly efficient Unix API (sendfile, available in Java via FileChannel::transferTo()) is used to minimize data copying between application and kernel buffers.
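With transferTo the kernel moves bytes from the page cache straight to the destination channel, skipping the usual read-into-user-buffer / write-back-out round trip. A small self-contained sketch of sending a byte range of a log file to an arbitrary channel (the helper name and retry loop are illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // Sends bytes [position, position + count) of the file to the target
    // channel. transferTo may move fewer bytes than requested, so loop.
    public static long send(Path file, long position, long count,
                            WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                long n = src.transferTo(position + sent, count - sent, target);
                if (n <= 0) break; // end of file or saturated target
                sent += n;
            }
            return sent;
        }
    }
}
```

When the target is a socket, this is sendfile(2) on Linux: the payload never enters the JVM heap at all.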

A consumer is responsible for remembering how much it has consumed, which avoids broker-side bookkeeping.

A message is deleted by the broker after a configured timeout (typically a few days) for the same reason.
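Because the broker never tracks who has consumed what, retention reduces to a purely local decision: periodically drop whole segment files older than the configured window. A minimal sketch of that check (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Retention {
    // Returns the segment files whose last modification is older than
    // the retention window; the broker would then delete these outright.
    public static List<String> expiredSegments(Map<String, Long> lastModifiedMs,
                                               long nowMs, long retentionMs) {
        List<String> doomed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastModifiedMs.entrySet()) {
            if (nowMs - e.getValue() > retentionMs) doomed.add(e.getKey());
        }
        return doomed;
    }
}
```

Deleting at whole-segment granularity keeps this O(number of files) rather than O(number of messages).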

A partition is consumed by a single consumer (from each consumer group) to avoid synchronization.

Brokers, consumers and partition ownership are registered in ZooKeeper (ZK).

Broker and consumer membership changes trigger a rebalancing process.

The rebalancing algorithm is decentralized and relies on typical ZK idioms.

Consumers periodically update the ZK registry with the last consumed offset.

Kafka provides at-least-once delivery semantics. When a consumer crashes, the last committed offset in ZK may lag behind what the consumer actually processed, so the consumer that takes over re-delivers the events from that window.