Today, I would like to show you how to use Hazelcast Jet to stream data from a Hazelcast IMDG IMap to Apache Kafka. IMap is a distributed implementation of java.util.Map; it's super-easy to use and probably the most popular Hazelcast IMDG data structure.

This post assumes you have an existing application that uses an IMap inside an embedded Hazelcast member, and you would like to capture all data inserted into the IMap and push it into a Kafka topic. Obviously, this won't be a trivial one-off job: we want a continuous stream of changes from the IMap to Kafka. This pattern is often referred to as Change Data Capture. One of the goals is to minimize the impact on the existing application. And all of this should not take more than five to ten minutes.

The sample application is very simple: it just starts a new Hazelcast IMDG instance, creates an IMap, and keeps inserting UUID entries. It has nothing to do with Hazelcast Jet or Apache Kafka, and its only dependency is Hazelcast IMDG. This is how your pom.xml could look:
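A minimal dependency section could be as short as this; the version shown is only an example, so use whichever current Hazelcast IMDG release you prefer:

```xml
<dependencies>
    <dependency>
        <groupId>com.hazelcast</groupId>
        <artifactId>hazelcast</artifactId>
        <version>3.10</version>
    </dependency>
</dependencies>
```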

Nothing new or exciting so far: just the usual Hazelcast IMDG experience, a few lines of code to start a new cluster, no server installation needed, and no opinionated framework that wants to control your application. Let's move on.
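As a sketch, the application described above might look like this; the class name and the map name `myMap` are my own choices, not anything prescribed by Hazelcast:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import java.util.UUID;

public class SampleApplication {
    public static void main(String[] args) throws InterruptedException {
        // Start an embedded Hazelcast IMDG member
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Obtain a distributed map and keep inserting random UUID entries
        IMap<String, String> map = hz.getMap("myMap");
        for (;;) {
            map.put(UUID.randomUUID().toString(), UUID.randomUUID().toString());
            Thread.sleep(1000);
        }
    }
}
```

Running this class forms a single-member cluster and produces a steady stream of new entries for the rest of the tutorial to work with.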

In the next step, we are going to download and start Apache Kafka. Once you have downloaded and extracted the Kafka archive, start Apache ZooKeeper:

$ ./bin/zookeeper-server-start.sh ./config/zookeeper.properties

Once ZooKeeper is running, you can start Apache Kafka:

$ ./bin/kafka-server-start.sh ./config/server.properties

And finally, you have to create a new topic that Hazelcast Jet will push entries into:
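With the tools shipped in the Kafka distribution, creating the topic could look like this; the topic name `myTopic` is just my example, and older Kafka versions such as this one register topics via ZooKeeper:

```shell
$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic myTopic
```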

Now comes the fun part: using Hazelcast Jet to connect your existing Hazelcast IMDG cluster with Apache Kafka. This can be a separate application, and in the best Hazelcast tradition, it's extremely simple! Check out the code:
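The only new build dependency you need is the Jet Kafka connector; a Maven snippet could look like this (again, the version is only illustrative):

```xml
<dependency>
    <groupId>com.hazelcast.jet</groupId>
    <artifactId>hazelcast-jet-kafka</artifactId>
    <version>0.6</version>
</dependency>
```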

It also transitively fetches the Hazelcast Jet core and Kafka client libraries.
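The pipeline itself could be sketched like this. It follows the Jet Pipeline API of that era (exact signatures vary between Jet versions), and the cluster address, map name, and topic name are my own example values:

```java
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.kafka.KafkaSinks;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sources;

import java.util.Properties;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class IMapToKafka {
    public static void main(String[] args) {
        // Client configuration pointing at the existing IMDG cluster
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress("127.0.0.1:5701");

        // Producer properties for the Kafka sink
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        Pipeline p = Pipeline.create();
        // The convenience journal source passes only ADDED and UPDATED
        // events and projects each of them into a Map.Entry<key, newValue>
        p.drawFrom(Sources.<String, String>remoteMapJournal(
                        "myMap", clientConfig, START_FROM_OLDEST))
         .drainTo(KafkaSinks.kafka(props, "myTopic"));

        // Start an embedded Jet member and run the streaming job
        JetInstance jet = Jet.newJetInstance();
        jet.newJob(p).join();
    }
}
```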

The Java code above:

1. Creates a new Jet pipeline.
2. Uses a Hazelcast IMDG IMap as a source.
3. Applies a filter, as we are only interested in newly added or modified entries.
4. Uses Kafka as a sink.

At this point, you are almost done. There is just one last bit: you have to tell the existing application to maintain a log of IMap mutation events so the Jet pipeline can read from it.

This is the only change you have to make in the existing application, and it turns out to be a trivial, configuration-only change. Add this snippet to your IMDG configuration XML (hazelcast.xml):

<event-journal enabled="true">
    <mapName>myMap</mapName>
</event-journal>

Now you can restart your existing application to apply the new configuration, start the Jet pipeline, and you are done! Hazelcast Jet will read mutation events from the remote IMap's event journal and push them into Apache Kafka. You can easily validate that it's working with the console consumer distributed along with Apache Kafka:
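For example, assuming the topic is named `myTopic` as above, you should see a stream of UUID values scrolling by:

```shell
$ ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic myTopic --from-beginning
```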

And that's all, folks. It took just a few minutes to push events from your existing application into Apache Kafka. This shows the simplicity of Hazelcast Jet: it can be easily embedded inside your application and does not require you to maintain a separate server. It also offers an easy-to-learn API and, as always with Hazelcast, it's simple to scale out!

In the next post, I'll show how to implement a similar pipeline, but reversed. We'll be using Hazelcast Jet to push events from Apache Kafka to an IMap. While waiting for the next post, you can have a look at Hazelcast Jet code samples and demos. Happy Hazelcasting and stay tuned!