Raquel Campuzano Godoy

If you are searching for a distributed streaming platform to process streams of data in real time, chances are that you have selected Kafka as the perfect solution for you. Kafka is commonly used to build real-time streaming pipelines that move data between systems or applications, and to build applications that transform or react to those streams of data.

If your environment has grown, your needs for security, scalability, high availability, and failover will have grown with it. That means it's time to move to a multi-tier deployment. Bitnami offers a Kafka cluster that is ready to move from development to highly demanding production environments. It is kept completely up to date and configured to provide high performance and fault tolerance.

Kafka: the jewel in the Big Data crown

Apache Kafka is well known for its ability to digest trillions of events in real time, which ultimately makes it a crucial element in the big data landscape. Besides processing large streams of data, it can be used as a message broker, website activity tracker, source for operational monitoring data, or as a replacement for a log aggregation solution.

Kafka can be integrated with other open source data-oriented solutions such as Apache Hadoop, Spark, Storm, or HBase for real-time analysis and rendering of streaming data.

For further information about Kafka architecture and features, see the Kafka official documentation. Now, let's see how the Bitnami Kafka Multi-Tier solution can help you to deploy your Kafka cluster in just a few clicks.

Why are Bitnami Multi-Tier solutions the best choice for production environments?

All Bitnami solutions feature the following benefits, and the Bitnami Kafka cluster is no exception:

Up-to-date: As soon as a new release of Kafka is launched, Bitnami publishes immediate updates, making the latest versions of Kafka available to its users.

Secure: If serious security issues are discovered, Bitnami provides new versions of the application, often within hours of a fix becoming available.

Consistent: All Bitnami applications are packaged and configured consistently across operating systems.

Free of charge: You only have to pay for the resources you consume from your cloud provider. You can deploy any Bitnami solution at no additional cost.

Apart from this, every Bitnami Multi-Tier solution is configured by default to guarantee high performance and availability, replication and security. We follow industry standards and incorporate all the components required to run an application or framework, independently of the underlying operating system or deployment platform.

But what are the specific adjustments made by Bitnami that make the Kafka cluster so versatile and simple to use?

Bitnami Kafka cluster specific configuration

Bitnami builds its packages using the latest version published by the software provider, fixes bugs where necessary, and adds improvements (including performance enhancements) to make it extremely simple to run any application on any platform.

Let's take a look at the enhancements made on the Bitnami Kafka cloud image:

Disk configuration

Depending on the cloud platform you use to deploy Kafka, the machine and disk configuration will differ. The values below are the minimum requirements; you can adjust them according to your needs.

Google Cloud Platform: Default (1vCPU, 3.75 GB RAM)

Microsoft Azure: Default (D1-V2 machine type: 1vCPU, 3.5 GB RAM)

The disk space also depends on the cloud platform. By default, Bitnami configures a 30 GB SSD disk for Google Cloud Platform and a 50 GB SSD disk for Microsoft Azure.

Cluster specific features

The Bitnami Kafka cluster has been configured as a multi-broker cluster with several Kafka brokers and Zookeeper nodes. As a high-availability cluster, the default configuration of Bitnami Kafka is fully customizable depending on your replication needs.

This configuration ensures you can replicate topics from the start: if one of the brokers goes down, a copy of the data remains available on the other brokers. Replication is done at the topic level. When you create a topic, you choose the number of replicas (the replication factor) you want, and you can adjust this according to your needs. Remember that the replication factor must be less than or equal to the total number of brokers available in the cluster.
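For example, a replicated topic can be created with the standard kafka-topics.sh tool shipped with Kafka. In this sketch, the Bitnami installation prefix /opt/bitnami/kafka, the Zookeeper address localhost:2181, and the topic name are illustrative assumptions; adapt them to your deployment:

```shell
# Create a topic with a replication factor of 2 and 3 partitions
# (assumes a Zookeeper node reachable at localhost:2181)
/opt/bitnami/kafka/bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --replication-factor 2 \
  --partitions 3 \
  --topic my-replicated-topic

# Show which brokers hold the leader and replicas for each partition
/opt/bitnami/kafka/bin/kafka-topics.sh --describe \
  --zookeeper localhost:2181 \
  --topic my-replicated-topic
```

If you request a replication factor greater than the number of available brokers, the command fails with an InvalidReplicationFactorException, which is Kafka enforcing the constraint described above.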

Security

Network and access ports

For security reasons, the Bitnami Kafka cluster is not reachable from external networks. The only port open by default is 22, for SSH connections. However, Bitnami enables a public IP address on the first node of the cluster for debugging purposes. We recommend that you always connect to the Kafka server from within the same network using a Kafka client. If you need to connect to the cluster remotely from a different network, you can peer both virtual networks, or use a secure channel such as a VPN or an SSH tunnel.
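As a sketch of the SSH tunnel option, local port forwarding can expose the broker's client port on your workstation. The IP address, key path, and user name below are placeholders, and 9092 is Kafka's default client port:

```shell
# Forward local port 9092 to the Kafka broker through the node's public IP
# (replace the key path and bitnami@203.0.113.10 with your own values)
ssh -i ~/.ssh/my-kafka-key.pem -N \
  -L 9092:localhost:9092 \
  bitnami@203.0.113.10

# While the tunnel is open, a Kafka client on your machine can
# connect to localhost:9092 as if it were on the cluster's network
```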

Authentication

The Bitnami Kafka Multi-Tier solution configures several security enhancements by default: inter-broker, client-broker, and broker-Zookeeper connections are authenticated using the SASL/PLAIN mechanism. This means that every agent involved in Kafka-Zookeeper communication must be declared in a JAAS authentication file. You can find the configuration parameters in the /opt/bitnami/kafka/conf/kafka_jaas.conf file.
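A JAAS file for SASL/PLAIN typically looks like the sketch below. The KafkaServer and KafkaClient context names follow Kafka's standard JAAS conventions; the user names and passwords are placeholders, and the exact contents of Bitnami's generated file may differ:

```
KafkaServer {
   org.apache.kafka.common.security.plain.PlainLoginModule required
   username="admin"
   password="broker-secret"
   user_admin="broker-secret"
   user_client="client-secret";
};

KafkaClient {
   org.apache.kafka.common.security.plain.PlainLoginModule required
   username="client"
   password="client-secret";
};
```

The user_<name>="<password>" entries in the KafkaServer section define the credentials the broker accepts from connecting clients and peer brokers.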

What can I do to boost high-availability and failover in my Kafka cluster?

The default configuration can be changed to further improve the high availability of your cluster. We recommend running a Zookeeper ensemble with three or more nodes to avoid a single point of failure. In an ensemble, one node acts as the "leader" and the rest as "followers"; if the leader goes down, one of the followers is promoted to leader, keeping the whole cluster running correctly.

If you want to increase the number of nodes in the Kafka cluster, you just need to connect the new Kafka nodes to the Zookeeper ensemble. To do so, configure each new Kafka node to connect to the ensemble by adding the IP addresses of the Zookeeper nodes to the main Kafka configuration file. Through Zookeeper, the new nodes become known to the existing brokers and join the Kafka cluster.
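As an illustration, the relevant setting lives in Kafka's server.properties file. The file path and IP addresses below are placeholders for your deployment, and 2181 is Zookeeper's default client port:

```
# /opt/bitnami/kafka/config/server.properties (path may vary)

# Unique ID for this broker within the cluster
broker.id=3

# Comma-separated list of all Zookeeper ensemble members
zookeeper.connect=10.0.0.2:2181,10.0.0.3:2181,10.0.0.4:2181
```

Listing every ensemble member in zookeeper.connect lets the broker keep working even if one Zookeeper node is temporarily unavailable.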