A recent blog article from Jaime Alquiza talks about load testing (vs. benchmarking) Kafka clusters that are being hosted on AWS / EC2. Alquiza opens up the article with some words on what exactly the difference is between load testing and benchmarking, and how load testing doesn't necessarily characterize the same workload that your system will otherwise be handling. This ends up being an important factor as he moves forward, and he conveys pretty clearly the need to keep in mind that your a testing relative and varied workloads--and that while you'll certainly learn about cluster performance, it won't be exactly informative of your standard workload.

He identifies some of the goals of load testing in his mind:

But what performance characteristics should I expect in my particular configuration? What should I consider a bad latency spike? What's the impact of dropping a broker? The reason I'm interested in load testing is to form a collection of expectations so these questions are better understood in the reality of production infrastructure.

Alquiza created a tool for his load testing purposes that he calls Sangrenel, the purpose of which is to generate random messages of varying size and bombard the Kafka clusters and "spit out" throughput latency. In the rest of his article he sets up the testing tools and scenario, and includes a breakdown of Kafka performance, including the performance of:

Message size (throughput and latency)

Number of partitions

Replication cost

Consumer position impact

Scaling up broker resources

Varying cluster size

Small message trans rate maxout

Here wraps up with some closing thoughts and concludes:

I basically spend 2/3 of my work time torture testing and operationalizing distributed systems in production. There's some that I'm not so pleased with (posts pending in draft forever) and some that have attributes that I really love. Kafka is one of those systems that I pretty much enjoy every bit of, and the fact that it performs predictably well is only a symptom of the reason and not the reason itself: the authors really know what they're doing. Nothing about this software is an accident. Performance, everything in this post, is only a fraction of what's important to me and what matters when you run these systems for real. Kafka represents everything I think good distributed systems are about: that thorough and explicit design decisions win.