10 Tradeoffs
- Not optimized for millisecond latencies
- Have not beaten CAP
- Simple messaging system, no processing
- Zookeeper becomes a bottleneck when using too many topics/partitions (>>10000)
- Not designed for very large payloads (full HD movie etc.)
- Helps to know your data in advance

13 Log based queue (Simplified model)
[Diagram: producers (Producer1, Producer2) append messages to Topic1 and Topic2 on the broker; Consumer1, Consumer2, and Consumer3 in ConsumerGroup1 read them in order. Batching, compression, and serialization happen on the producer side.]
- Producer API is used directly by the application or through one of the contributed implementations, e.g. a log4j/logback appender
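The model in the diagram can be sketched in a few lines of plain Python (an illustrative toy, not Kafka's actual implementation): each topic is an append-only log, and each consumer group tracks its own read offset per topic, so groups consume the same messages independently and nothing is deleted on read.

```python
# Minimal in-memory sketch of a log-based queue. All names here
# (Broker, produce, consume) are illustrative, not Kafka's API.

class Broker:
    def __init__(self):
        self.topics = {}    # topic name -> append-only message log
        self.offsets = {}   # (group, topic) -> next offset to read

    def produce(self, topic, message):
        # Producers only ever append; the log is immutable history.
        self.topics.setdefault(topic, []).append(message)

    def consume(self, group, topic, max_messages=10):
        # Each consumer group advances its own offset independently.
        log = self.topics.get(topic, [])
        offset = self.offsets.get((group, topic), 0)
        batch = log[offset:offset + max_messages]
        self.offsets[(group, topic)] = offset + len(batch)
        return batch


broker = Broker()
for i in range(1, 6):
    broker.produce("Topic1", f"Message{i}")

print(broker.consume("ConsumerGroup1", "Topic1", 2))  # ['Message1', 'Message2']
print(broker.consume("ConsumerGroup1", "Topic1", 2))  # ['Message3', 'Message4']
print(broker.consume("ConsumerGroup2", "Topic1", 2))  # ['Message1', 'Message2']
```

Note that ConsumerGroup2 re-reads the log from the start: consuming is just moving a per-group offset forward.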

14 Partitioning
[Diagram: two brokers, each hosting partitions of Topic1 and Topic2; producers write to the partitions, and consumer groups (ConsumerGroup1, ConsumerGroup2, ConsumerGroup3) divide the partitions among their consumers. A group with more consumers than partitions leaves some consumers idle: "no partition for this guy".]
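The "no partition for this guy" effect falls out of the assignment rule: within a group, each partition is owned by exactly one consumer. A simplified round-robin assignment (Kafka's real assignors differ in detail) shows why extra consumers sit idle:

```python
# Simplified round-robin partition assignment within one consumer group.
# Illustrative only; not Kafka's actual assignor.

def assign_partitions(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        # Each partition goes to exactly one consumer in the group.
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = ["Topic1-p0", "Topic1-p1", "Topic1-p2"]
consumers = ["c1", "c2", "c3", "c4"]   # one more consumer than partitions
print(assign_partitions(partitions, consumers))
# {'c1': ['Topic1-p0'], 'c2': ['Topic1-p1'], 'c3': ['Topic1-p2'], 'c4': []}
```

With three partitions and four consumers, `c4` gets an empty assignment, so the partition count caps the useful parallelism of a group.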

21 Broker Protips
- Keep the number of partitions reasonable – it affects performance
- Keep the number of topics reasonable – it affects performance
- Performance decreases with larger Zookeeper ensembles
- Tune the disk flush rate settings
- message.max.bytes – maximum accepted message size; should be smaller than the heap
- socket.request.max.bytes – maximum fetch size; should be smaller than the heap
- log.retention.bytes – you don't want to run out of disk space…
- Keep Zookeeper logs under control for the same reason
- Kafka brokers have been tested on Linux and Solaris
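The settings above live in the broker's server.properties. A sketch with illustrative values (defaults and available options vary by Kafka version; check the broker configuration docs for yours):

```properties
# Size limits: keep both well under the broker heap
message.max.bytes=1000000
socket.request.max.bytes=104857600

# Retention: bound per-partition log size to avoid filling the disk
log.retention.bytes=1073741824

# Disk flush rate: flush after this many messages or this much time
log.flush.interval.messages=10000
log.flush.interval.ms=1000
```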