Apache Kafka Tutorial

Streaming data is of growing interest to many organizations, and many applications use a producer-consumer model to ingest and process data in real time. Many messaging solutions exist on the market today, but few of them were built to handle the challenges of modern deployments: IoT, large web-based applications, and the big data projects that accompany them.

Apache Kafka was built at LinkedIn to solve these challenges and has since been deployed on many projects. Apache Kafka is a fast, scalable, durable, and distributed messaging system.

The goal of this article is to use an end-to-end example and sample code to show you how to:

Install, configure and start Kafka

Create new topics

Write and run a Java producer to post messages to topics

Write and run a Java consumer to read and process messages from the topics

Credits

This content is based in part on the documentation provided by the Apache Kafka project.

We have added short, realistic sample programs that illustrate how real programs are written using Kafka.
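Before the sample programs can run, the server stack must be started. With the scripts shipped in the Kafka distribution, Zookeeper comes up first; the command below is a sketch, assuming the stock Kafka 0.9 quickstart layout and the default config files, run from the Kafka installation directory:

```shell
# Start Zookeeper in the background using the default quickstart config.
# Assumes you are in the Kafka installation directory.
bin/zookeeper-server-start.sh config/zookeeper.properties &
```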

Note that this starts Zookeeper in the background. To stop Zookeeper, bring it back to the foreground and press Ctrl+C, or find the process and kill it. You can now start the Kafka server itself:
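A sketch of the broker start command, assuming the stock Kafka 0.9 quickstart layout and the default broker configuration:

```shell
# Start the Kafka broker in the background with the default config.
# Assumes you are in the Kafka installation directory.
bin/kafka-server-start.sh config/server.properties &
```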

As with Zookeeper, this runs the Kafka broker in the background. To stop Kafka, bring it back to the foreground, or find the process and kill it explicitly with kill.

Step 3: Create the topics for the example programs

Messages are organized into topics, to which producers post messages and from which consumers read them. Our sample application uses two topics: fast-messages and summary-markers. The following commands create the topics:
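The topic-creation commands can be sketched as follows, assuming a broker with the default Kafka 0.9 quickstart settings (Zookeeper on localhost:2181):

```shell
# Create the two topics used by the sample programs.
# Assumes the default quickstart settings (Zookeeper on localhost:2181).
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic fast-messages
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic summary-markers
```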

You will see log messages from the Kafka process when you run Kafka commands. You can switch to a different window if these are distracting.

Note: The broker can be configured to auto-create new topics as they are first mentioned by a client application, but that is often considered dangerous, because a misspelled topic name silently creates a new topic instead of causing a failure.
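For completeness, auto-creation is controlled by a broker property; a sketch of the relevant line in config/server.properties:

```properties
# Allow the broker to create topics on first reference (use with care).
auto.create.topics.enable=true
```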

Run your First Kafka Application

At this point, you should have a working Kafka broker running on your machine. The next steps are to compile the example programs and play around with the way that they work.

1- Compile and package up the example programs

Clone and compile the repository using the following commands:
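A sketch of the build steps, assuming the sample code lives in a Git repository and builds with Maven; the repository URL below is a placeholder, not the article's actual repository:

```shell
# Hypothetical repository URL -- substitute the tutorial's actual repo.
git clone https://github.com/example/kafka-sample-programs.git
cd kafka-sample-programs
# Compile and package the example programs with Maven.
mvn clean package
```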

Note that there are other ProducerRecord constructors that let you set additional parameters, such as a message key or a partition number, but those parameters are not used in this simple tutorial.
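For reference, the main constructor variants in the Kafka 0.9 client look like the following sketch (it needs the kafka-clients jar on the classpath; the variable names are illustrative):

```java
// Sketch of the ProducerRecord constructor variants in the Kafka 0.9 client.
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordVariants {
    public static void main(String[] args) {
        // Topic and value only -- the form this tutorial uses.
        ProducerRecord<String, String> simple =
                new ProducerRecord<>("fast-messages", "{\"type\": \"test\"}");

        // Adding a key: records with the same key land on the same partition.
        ProducerRecord<String, String> keyed =
                new ProducerRecord<>("fast-messages", "my-key", "{\"type\": \"test\"}");

        // Pinning an explicit partition number as well.
        ProducerRecord<String, String> pinned =
                new ProducerRecord<>("fast-messages", 0, "my-key", "{\"type\": \"test\"}");
    }
}
```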

3- Producer End

Once you are done with the producer, call the producer.close() method, which blocks until all outstanding messages have been sent to the server. This call is made in a finally block to guarantee that it runs. A Kafka producer can also be used in a try-with-resources construct.

...
} finally {
    producer.close();
}
...
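Because KafkaProducer implements java.io.Closeable, try-with-resources calls close() automatically when the block exits, even on an exception. A minimal stdlib-only sketch of the pattern, using a hypothetical stand-in class in place of KafkaProducer:

```java
// Minimal stdlib-only sketch of try-with-resources. FakeProducer is a
// hypothetical stand-in for KafkaProducer, which implements Closeable
// and therefore behaves the same way.
public class TryWithResourcesDemo {
    static class FakeProducer implements AutoCloseable {
        void send(String message) {
            System.out.println("sent: " + message);
        }
        @Override
        public void close() {
            // KafkaProducer.close() blocks until buffered messages are sent.
            System.out.println("closed");
        }
    }

    public static void main(String[] args) {
        // close() runs automatically when the block exits,
        // even if send() throws.
        try (FakeProducer producer = new FakeProducer()) {
            producer.send("hello");
        }
    }
}
```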

4- Producer execution

As mentioned earlier, the producer is a simple Java class; in this example application, the Producer is started from the Run Application as follows:

...
Producer.main(args);
...

Now that you know how to send messages to the Kafka server let’s look at the consumer.

Consumer

The Consumer class, like the Producer, is a simple Java class with a main method.

This sample consumer uses the HdrHistogram library to record and analyze the messages received from the fast-messages topic, and Jackson to parse JSON messages.

The poll method is called repeatedly in a loop. On each call, the consumer reads records from the topic and tracks the offsets so that the next call resumes at the proper message. The poll method takes a timeout in milliseconds and waits up to that long if no data is available.

The object returned by the poll method is an Iterable containing the received records, so you just need to loop over the records to process them. The code to process the messages in the consumer looks like:
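The loop can be sketched roughly as follows, assuming the Kafka 0.9 consumer API (the method and variable names here are illustrative, not the article's actual code):

```java
// Sketch of the consumer poll loop, assuming the Kafka 0.9 consumer API.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    static void pollLoop(KafkaConsumer<String, String> consumer) {
        while (true) {
            // Wait up to 200 ms for new records; the consumer tracks
            // offsets so the next poll resumes at the right position.
            ConsumerRecords<String, String> records = consumer.poll(200);
            for (ConsumerRecord<String, String> record : records) {
                switch (record.topic()) {
                    case "fast-messages":
                        // parse the JSON payload and update statistics here
                        break;
                    case "summary-markers":
                        // marker handling here
                        break;
                    default:
                        throw new IllegalStateException(
                                "Unexpected topic: " + record.topic());
                }
            }
        }
    }
}
```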

In the sample application, the consumer only processes messages from the fast-messages topic, with the following logic, which relies on the fact that messages are consumed in the order in which they were sent by the producer:
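A rough reconstruction of that idea, assuming Jackson for parsing and HdrHistogram for latency statistics as the article states; the JSON field names and variables here are hypothetical, not the sample's actual code:

```java
// Hypothetical sketch of per-record processing: parse the JSON payload with
// Jackson, record latency in an HdrHistogram, and print stats on a marker.
JsonNode message = mapper.readTree(record.value());
switch (message.get("type").asText()) {
    case "test":
        // Because records arrive in send order, latency can be computed
        // from a send timestamp embedded in the message (field name assumed).
        long latencyNanos = (long)
                ((System.nanoTime() / 1e9 - message.get("t").asDouble()) * 1e9);
        stats.recordValue(latencyNanos);
        break;
    case "marker":
        // On a marker message, report accumulated statistics and reset.
        System.out.printf("%d messages, 99th percentile latency = %.3f ms%n",
                stats.getTotalCount(),
                stats.getValueAtPercentile(99) / 1e6);
        stats.reset();
        break;
}
```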

4- Consumer end

Once you are done with the consumer, call the consumer.close() method to free resources. This is especially important in a multithreaded application. The sample consumer does not call this method, because it is stopped with a Ctrl+C, which stops the whole JVM.

5- Consumer execution

As mentioned earlier, the Consumer is a simple Java class; in this example application, the consumer is started from the Run Application as follows:

...
Consumer.main(args);
...

Conclusion

In this article you have learned how to create a simple Kafka 0.9.x application using:

a producer that publishes JSON messages into multiple topics

a consumer that receives JSON messages and calculates statistics from message content.

This application is very simple, and you can extend it to test some other interesting features of Kafka:

add new consumers, using different groups, to process the data differently, for example to save data into a NoSQL database like HBase or MapR-DB

add new partitions and consumers for the topics to provide high availability to your application.

Finally, while this example is based on Apache Kafka, the same code will run directly on a MapR cluster using MapR Streams, an integrated messaging system compatible with the Kafka 0.9.0 API. With MapR Streams, you simplify the production deployment of your application: it is integrated into the MapR data platform, so you have a single cluster to deploy and manage. Watch ESG Lab's video review after testing MapR Streams. Using MapR Streams for messaging also provides the additional scalability, security, and big data services associated with the MapR Data Platform.