Summary

(Charts: Throughput Overview, Bandwidth Overview, Traffic Overview, Percentage Overview)

| codec  | throughput% | bandwidth% | traffic% |
|--------|-------------|------------|----------|
| gzip   | 47.51       | 5.16       | 10.87    |
| snappy | 116.33      | 64.11      | 55.03    |
| lz4    | 188.87      | 34.53      | 18.27    |

Consumer Benchmark

Testing the consumer is much easier than testing the producer. Before the tests, I send the same nginx.log to Kafka with each compression codec: none, gzip, snappy, and lz4. Then I use kafka-console-consumer.sh to consume a fixed number of messages, 5,000,000 in these tests. What I need to measure is how long the procedure takes, and from that we can derive the throughput.
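The procedure above can be sketched roughly as follows. The broker address and the per-codec topic names (nginx-none, nginx-gzip, ...) are illustrative assumptions, not details from the original setup:

```shell
#!/bin/sh
# Sketch of the consumer benchmark. Broker address and topic names are
# illustrative; the codec is applied on the producer side.
if command -v kafka-console-producer.sh >/dev/null 2>&1; then
  # Load the same log file once per codec.
  for codec in none gzip snappy lz4; do
    kafka-console-producer.sh --broker-list localhost:9092 \
      --topic "nginx-$codec" --compression-codec "$codec" < nginx.log
  done
  # Time the consumption of a fixed number of messages.
  time kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic nginx-gzip --from-beginning --max-messages 5000000 > /dev/null
fi
# Throughput = messages / elapsed seconds; e.g. 5,000,000 msgs in 43 s:
echo "5000000 43" | awk '{printf "%.0f msg/s\n", $1/$2}'
```

The elapsed time reported by `time` is the only number needed; dividing the fixed message count by it gives the throughput for each codec.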

Detail Metrics

Capacity Benchmark

In the previous blog I did not run tests for this section. The pressure each codec puts on the CPU is another important factor to consider, so in this section I run some simple benchmarks of disk space and CPU usage with dstat.

Disk Usage

Although Kafka has its own retention policies, and they work well, disk space can still be a concern for engineers, especially in a large Kafka cluster. In the previous section I sent nginx.log to Kafka with different codecs, and I measured the disk space each topic used. The numbers can be obtained simply with du -sh in the Kafka logs directory.
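The measurement is a one-liner per topic. The log directory path below is an assumption (check `log.dirs` in server.properties), and the final line shows how the percentage column is derived, using the gzip and none figures as an example:

```shell
#!/bin/sh
# Per-topic on-disk size; the log directory path is an assumption.
if [ -d /tmp/kafka-logs ]; then
  du -sh /tmp/kafka-logs/nginx-*
fi
# Percentage relative to the uncompressed topic, e.g. gzip vs none:
echo "140.18 1329.53" | awk '{printf "%.2f\n", $1/$2*100}'
```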

| codec  | Disk Space (MB) | Percentage (%) |
|--------|-----------------|----------------|
| none   | 1329.53         | 100            |
| gzip   | 140.18          | 10.54          |
| snappy | 679.81          | 51.13          |
| lz4    | 222.58          | 16.74          |

CPU Usage

Compression and decompression mainly consume CPU, so I record usr, sys, wait, and their total to measure how much CPU each codec uses. This data also comes from dstat. Note that my Docker host has only 4 CPUs, and these tests are meant only to compare the codecs, not to dig into the absolute numbers, which would differ from box to box.

The test is simple: I use dstat to record the system metrics I want, and meanwhile run kafka-console-producer.sh or kafka-console-consumer.sh in another container (not the Kafka container) to send data to or consume data from Kafka.
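A minimal sketch of the recording step, with illustrative file names. dstat writes one CSV row per second until it is killed, and the per-codec figures in the tables below are averages over those rows:

```shell
#!/bin/sh
# Record CPU metrics to CSV while the load runs; file name is illustrative.
if command -v dstat >/dev/null 2>&1; then
  dstat --cpu --output cpu-gzip.csv 1 > /dev/null 2>&1 &
  DSTAT_PID=$!
  # ... run the producer or consumer test here ...
  sleep 1
  kill "$DSTAT_PID" 2>/dev/null || true
fi
# Average a CSV column afterwards; shown here on inline sample usr values:
printf '30.0,1.0\n40.0,2.0\n' | awk -F, '{s+=$1; n++} END {printf "%.2f\n", s/n}'
```

In practice the awk filter should also skip dstat's CSV header rows before averaging.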

I record metrics on both the client (running the console tool) and the server (running the Kafka broker).

Producer CPU Usage

Server Side

Metrics

| codec  | usr   | sys  | wait | total |
|--------|-------|------|------|-------|
| none   | 41.08 | 6.56 | 7.08 | 54.72 |
| gzip   | 31.28 | 1.70 | 0.45 | 33.43 |
| snappy | 36.89 | 4.12 | 4.13 | 45.14 |
| lz4    | 33.72 | 2.77 | 1.37 | 37.86 |

Chart

Client Side

Metrics

| codec  | usr   | sys  | wait | total |
|--------|-------|------|------|-------|
| none   | 41.00 | 6.52 | 7.09 | 54.61 |
| gzip   | 31.28 | 1.70 | 0.44 | 33.42 |
| snappy | 36.65 | 4.13 | 4.07 | 44.85 |
| lz4    | 33.70 | 2.76 | 1.36 | 37.82 |

Chart

Consumer CPU Usage

Server Side

Metrics

| codec  | usr   | sys  | wait | total |
|--------|-------|------|------|-------|
| none   | 19.41 | 5.05 | 7.74 | 32.20 |
| gzip   | 25.63 | 1.43 | 0.47 | 27.53 |
| snappy | 23.92 | 4.23 | 3.90 | 32.05 |
| lz4    | 24.13 | 2.35 | 0.90 | 27.38 |

Chart

Client Side

Metrics

| codec  | usr   | sys  | wait | total |
|--------|-------|------|------|-------|
| none   | 19.47 | 5.05 | 7.77 | 32.29 |
| gzip   | 18.84 | 1.16 | 0.41 | 20.41 |
| snappy | 23.82 | 4.36 | 3.86 | 32.04 |
| lz4    | 24.13 | 2.35 | 0.91 | 27.39 |

Chart

Conclusion

GZIP has the best compression ratio but the lowest performance, and LZ4 has the best performance. On the capacity side, NONE and Snappy cause some wait, and I do not know the reason for that. The CPU usage of each codec is actually almost the same, so I do not think CPU capacity will be the main reason to choose one codec over another.

To summarize the benchmarks briefly: use GZIP if you need to spend less bandwidth and disk space, and use LZ4 to maximize performance.
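For reference, the codec is chosen on the producer side. In a producer configuration file the setting is a single line; the lz4 choice here is just the example from the summary above:

```
# producer.properties: one of none, gzip, snappy, lz4
compression.type=lz4
```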

There is also one problem this benchmark has not covered: how much CPU would the Kafka server use when there is a huge number of clients? Would the server-side CPU usage grow linearly with the number of clients? I have not run this test because I only have two containers.