I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

1. Introduction

In this tutorial, we’ll learn about Hazelcast Jet. It’s a distributed data processing engine provided by Hazelcast, Inc. and is built on top of Hazelcast IMDG

If you want to learn about Hazelcast IMDG, here is an article for getting started.

2. What is Hazelcast Jet?

Hazelcast Jet is a distributed data processing engine that treats data as streams. It can process data that is stored in a database or files as well as the data that is streamed by a Kafka server.

It can perform aggregate functions over infinite data streams by dividing the streams into subsets and applying aggregation over each subset. This concept is known as windowing in the Jet terminology.

We can deploy Jet in a cluster of machines and then submit our data processing jobs to it. Jet will make all the members of the cluster automatically process the data. Each member of the cluster consumes a part of the data, and that makes it easy to scale up to any level of throughput.

Here are the typical use cases for Hazelcast Jet:

Real-Time Stream Processing

Fast Batch Processing

Processing Java 8 Streams in a distributed way

Data processing in Microservices

3. Setup

To setup Hazelcast Jet in our environment, we just need to add a single Maven dependency to our pom.xml.

4. Sample Application

In order to learn more about Hazelcast Jet, we’ll create a sample application that takes an input of sentences and a word to find in those sentences and returns the count of the specified word in those sentences.

4.1. The Pipeline

A Pipeline forms the basic construct for a Jet application. Processing within a pipeline follows these steps:

draw data from a source

transform the data

drain the data into a sink

For our application, the pipeline will draw from a distributed List, apply the transformation of grouping and aggregation and finally drain to a distributed Map.

We create a Jet instance first in order to create our job and use the pipeline. Next, we copy the input List to a distributed list so that it’s available over all the instances.

We then submit a job using the pipeline that we have built above. The method newJob() returns an executable job that is started by Jet asynchronously. The join method waits for the job to complete and throws an exception if the job is completed with an error.

When the job completes the results are retrieved in a distributed Map, as we defined in our pipeline. So, we get the Map from the Jet instance and get the counts of the word against it.

Lastly, we shut down the Jet instance. It is important to shut it down after our execution has ended, as Jet instance starts its own threads. Otherwise, our Java process will still be alive even after our method has exited.

Here is a unit test that tests the code we have written for Jet:

@Test
public void whenGivenSentencesAndWord_ThenReturnCountOfWord() {
List<String> sentences = new ArrayList<>();
sentences.add("The first second was alright, but the second second was tough.");
WordCounter wordCounter = new WordCounter();
long countSecond = wordCounter.countWord(sentences, "second");
assertTrue(countSecond == 3);
}

5. Conclusion

In this article, we’ve learned about Hazelcast Jet. To learn more about it and its features, refer to the manual.

As usual, the code for the example used in this article can be found over on Github.

Persistence bottom

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2: