Join semantics in Kafka Streams

With the recent adoption of Confluent and Kafka Streams, organizations have seen significantly improved system stability from a real-time processing framework, as well as better scalability and lower maintenance costs.
The focus of this webinar is:
~The different join operators in Kafka Streams.
~Options in Kafka Streams for joining data, both with and without shared keys.
~How to put the application owner in control by leveraging a simplified, app-centric architecture.

If you have any queries, contact Himani by email at himani.arora@knoldus.in.

The records in a KStream either come directly from a topic or are the result of some transformation. For example, the filter method takes a predicate and returns another KStream that contains only the records that satisfy the predicate.
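The filter step described above can be modeled in plain Java as a minimal sketch. This is not the Kafka Streams API: a hypothetical KeyValueRecord class stands in for a Kafka record and a java.util.List stands in for the stream. In Kafka Streams itself the corresponding call is stream.filter((key, value) -> ...), which returns a new KStream rather than a list.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

public class FilterSketch {
    // Hypothetical stand-in for a Kafka record: a key and a value.
    static final class KeyValueRecord {
        final String key;
        final long value;

        KeyValueRecord(String key, long value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Mirrors the idea of KStream.filter: keep only the records that satisfy
    // the predicate and return them as a new, derived sequence.
    static List<KeyValueRecord> filter(List<KeyValueRecord> stream,
                                       BiPredicate<String, Long> predicate) {
        return stream.stream()
                .filter(r -> predicate.test(r.key, r.value))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<KeyValueRecord> input = List.of(
                new KeyValueRecord("alice", 5),
                new KeyValueRecord("bob", 42),
                new KeyValueRecord("carol", 17));
        // Keep only records whose value exceeds 10: prints [bob=42, carol=17]
        System.out.println(filter(input, (key, value) -> value > 10));
    }
}
```

The input list is left untouched; as with a KStream, filtering produces a new sequence instead of modifying the original.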

Introduction
● Apache Kafka is a distributed streaming platform where producers send
messages (key-value pairs) to topics, which in turn are polled and read by
consumers. Each topic is partitioned, and the partitions are distributed among
brokers.
● It has four core APIs:
○ Producer API
○ Consumer API
○ Streams API
○ Connector API

Streams API
● Kafka Streams is a client library for processing and analyzing data stored in Kafka.
● There are two main abstractions in the Streams API:
○ A KStream is a stream of key-value pairs, where each record is an independent event.
KStreams are stateless, but they can be aggregated (for example with groupByKey()
followed by count()), which turns them into the other core abstraction.
○ A KTable is often described as a “changelog stream.” It holds the latest value
for each message key and is updated automatically as new messages arrive.
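To make the stream/table distinction concrete, here is a minimal plain-Java model. It is an illustration only, not the Kafka Streams API, and the class and method names are hypothetical. The same sequence of keyed messages is read two ways: as a stream, where every record is kept, and as a table, where a newer record for a key overwrites the older value. Counting records per key shows how aggregating a stream yields a table, similar in spirit to groupByKey().count().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamVsTable {
    // Each message is a {key, value} pair, in arrival order.
    static final String[][] MESSAGES = {
            {"alice", "Paris"}, {"bob", "Rome"}, {"alice", "Berlin"}
    };

    // Stream view: every message is an independent event, so all are kept.
    static List<String> asStream(String[][] messages) {
        List<String> events = new ArrayList<>();
        for (String[] m : messages) {
            events.add(m[0] + "=" + m[1]);
        }
        return events;
    }

    // Table view: a later message for the same key replaces the earlier value,
    // so only the latest value per key survives.
    static Map<String, String> asTable(String[][] messages) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] m : messages) {
            table.put(m[0], m[1]);
        }
        return table;
    }

    // Aggregating the stream (a count per key) produces a table.
    static Map<String, Long> countByKey(String[][] messages) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String[] m : messages) {
            counts.merge(m[0], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(asStream(MESSAGES));   // [alice=Paris, bob=Rome, alice=Berlin]
        System.out.println(asTable(MESSAGES));    // {alice=Berlin, bob=Rome}
        System.out.println(countByKey(MESSAGES)); // {alice=2, bob=1}
    }
}
```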

How do you install the Streams API?
● No installation is needed: Build Apps, Not Clusters!
● It is a library and can be added to your app like any other library.
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>1.1.0</version>
</dependency>

Joins
Kafka Streams supports three types of joins:
● Inner Joins
○ Produces an output only when both input sources have records with the same key.
● Left Joins
○ Produces an output for each record in the left (primary) input source. If the other source does not
have a value for a given key, that value is set to null.
● Outer Joins
○ Produces an output for each record in either input source. If only one source contains a key, the
value from the other source is null.
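The three join flavors can be compared on two small keyed inputs. The sketch below is a plain-Java model of the semantics only; it is not how Kafka Streams implements joins, and the names are hypothetical. Each input is a map from key to value, and a missing value on either side is passed through as null, as described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class JoinFlavors {
    enum JoinType { INNER, LEFT, OUTER }

    // Joins two keyed inputs and renders each result as "key: left=..., right=...".
    static List<String> join(Map<String, String> left, Map<String, String> right, JoinType type) {
        // Candidate keys: the left keys for INNER/LEFT, the union of both sides for OUTER.
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        if (type == JoinType.OUTER) {
            keys.addAll(right.keySet());
        }
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            String l = left.get(key);   // null if the key is absent on the left
            String r = right.get(key);  // null if the key is absent on the right
            boolean emit;
            if (type == JoinType.INNER) {
                emit = l != null && r != null;  // both sides must have the key
            } else if (type == JoinType.LEFT) {
                emit = l != null;               // every left record produces output
            } else {
                emit = true;                    // OUTER: key present on either side
            }
            if (emit) {
                results.add(key + ": left=" + l + ", right=" + r);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted so the output order is deterministic.
        Map<String, String> left = new TreeMap<>(Map.of("a", "1", "b", "2"));
        Map<String, String> right = new TreeMap<>(Map.of("b", "x", "c", "y"));
        for (JoinType type : JoinType.values()) {
            System.out.println(type + " " + join(left, right, type));
        }
    }
}
```

With these inputs, the inner join yields only key b, the left join adds a with right=null, and the outer join adds c with left=null as well.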

KStream-KStream Join
● This is a sliding window join: records from the two streams that share a key are joined when they are
close to each other in time, that is, when their timestamps differ by no more than the window size.
● These joins are always windowed because otherwise the size of the internal state store used
to perform the join would grow indefinitely.
● Since a KStream-KStream join is always windowed, we must provide a join window.
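// Assumes left is a KStream<String, Long> and right is a KStream<String, Double>,
// matching the serdes passed below.
KStream<String, String> joined = left.join(right,
    (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
    JoinWindows.of(TimeUnit.MINUTES.toMillis(5)), /* join records up to 5 minutes apart */
    Serdes.String(), /* key */
    Serdes.Long(),   /* left value */
    Serdes.Double()  /* right value */
);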
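The windowing rule can be modeled directly: two records with the same key are joined when their timestamps are at most the window size apart, in either direction, which is the bound that JoinWindows.of(...) specifies. The sketch below is a plain-Java, in-memory model of an inner windowed join for illustration. The Event class and method names are hypothetical, and unlike Kafka Streams it simply compares all pairs instead of keeping windowed state stores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class WindowedJoinSketch {
    // Hypothetical timestamped record.
    static final class Event {
        final String key;
        final String value;
        final long timestampMs;

        Event(String key, String value, long timestampMs) {
            this.key = key;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    // Inner windowed join: same key and |tLeft - tRight| <= windowMs.
    static List<String> windowedJoin(List<Event> left, List<Event> right, long windowMs) {
        List<String> joined = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean withinWindow = Math.abs(l.timestampMs - r.timestampMs) <= windowMs;
                if (sameKey && withinWindow) {
                    // Same output format as the ValueJoiner on the slide.
                    joined.add("left=" + l.value + ", right=" + r.value);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        long window = TimeUnit.MINUTES.toMillis(5);
        long minute = TimeUnit.MINUTES.toMillis(1);
        List<Event> left = List.of(new Event("k1", "A", 0), new Event("k1", "B", 20 * minute));
        List<Event> right = List.of(new Event("k1", "X", 3 * minute), new Event("k2", "Y", minute));
        // Only A (t=0) and X (t=3 min) share a key and lie within 5 minutes: [left=A, right=X]
        System.out.println(windowedJoin(left, right, window));
    }
}
```

Because records outside the window can never join, a real implementation only has to retain about one window's worth of records per stream, which is why the state store stays bounded.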

KTable-KTable Join
● KTable-KTable joins are designed to be consistent with their counterparts in relational databases.
They are always non-windowed joins.
● The changelog streams of the KTables are materialized into local state stores that represent the latest
snapshot of their tables. The join result is a new KTable representing the changelog stream of the join
operation.
KTable<String, String> joined = left.join(right,
(leftValue, rightValue) -> "left=" + leftValue + ", right=" +
rightValue /* ValueJoiner */
);
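Because both sides are tables, an update on either side can change the join result for its key. The following minimal plain-Java model of an inner table-table join is illustrative only: the names are hypothetical, and Kafka Streams does this with its materialized state stores. It keeps the latest value per key for each side and, after every update, appends the new joined value for that key to a changelog, or a null entry when a previously joined key stops matching on both sides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableTableJoinSketch {
    private final Map<String, String> leftTable = new HashMap<>();
    private final Map<String, String> rightTable = new HashMap<>();
    // The join result is itself a table: key -> joined value.
    private final Map<String, String> joinedTable = new HashMap<>();
    // Changelog of the join result, in the order updates were produced.
    final List<String> changelog = new ArrayList<>();

    void updateLeft(String key, String value) {
        apply(leftTable, key, value);
    }

    void updateRight(String key, String value) {
        apply(rightTable, key, value);
    }

    // A null value models a deletion, as in a changelog stream.
    private void apply(Map<String, String> table, String key, String value) {
        if (value == null) {
            table.remove(key);
        } else {
            table.put(key, value);
        }
        recompute(key);
    }

    // Symmetric: an update on either side re-evaluates the join for that key.
    private void recompute(String key) {
        String l = leftTable.get(key);
        String r = rightTable.get(key);
        if (l != null && r != null) {
            String joined = "left=" + l + ", right=" + r;
            joinedTable.put(key, joined);
            changelog.add(key + " -> " + joined);
        } else if (joinedTable.remove(key) != null) {
            // The key no longer matches on both sides: retract the old result.
            changelog.add(key + " -> null");
        }
    }

    public static void main(String[] args) {
        TableTableJoinSketch join = new TableTableJoinSketch();
        join.updateLeft("k", "1");    // no match yet, nothing emitted
        join.updateRight("k", "x");   // k -> left=1, right=x
        join.updateLeft("k", "2");    // k -> left=2, right=x
        join.updateRight("k", null);  // k -> null (right side deleted)
        join.changelog.forEach(System.out::println);
    }
}
```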

KStream-KTable Join
● KStream-KTable joins are asymmetric, non-windowed joins. They allow you to perform a table lookup
against a KTable every time a new record is received from the KStream.
● In contrast to stream-stream and table-table joins, which are both symmetric, a stream-table join is
asymmetric: only new records on the KStream side trigger the join, while KTable updates only change the
state used for later lookups.
KStream<String, String> joined = left.join(right,
(leftValue, rightValue) -> "left=" + leftValue + ", right=" +
rightValue, /* ValueJoiner */
Serdes.String(), /* key */
Serdes.Long() /* left value */
);
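The asymmetry shows up in a small plain-Java model (hypothetical names; not the Kafka Streams implementation). Table updates only change the lookup state, while each arriving stream record looks up the table's current value for its key. A stream record whose key is not yet in the table produces no output in this inner-join model, and a later table update does not retroactively join with stream records that have already been processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamTableJoinSketch {
    private final Map<String, String> table = new HashMap<>();
    final List<String> output = new ArrayList<>();

    // Table side: update the lookup state only; never emits output.
    void onTableUpdate(String key, String value) {
        table.put(key, value);
    }

    // Stream side: each record looks up the table's current value for its key.
    void onStreamRecord(String key, String value) {
        String tableValue = table.get(key);
        if (tableValue != null) { // inner join: no match, no output
            output.add("left=" + value + ", right=" + tableValue);
        }
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.onStreamRecord("k", "A");  // table empty: no output
        join.onTableUpdate("k", "x");   // table change alone: no output
        join.onStreamRecord("k", "B");  // left=B, right=x
        join.onTableUpdate("k", "y");
        join.onStreamRecord("k", "C");  // left=C, right=y
        System.out.println(join.output);
    }
}
```

The design point is that the KTable acts purely as a lookup structure here, which is what makes this join asymmetric compared with the table-table case, where either side can trigger output.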