Introduction

1. Overview

1.1. Apache Kafka

Kafka is a scalable, high-performance distributed messaging engine.
Its low-latency, high-throughput messaging capability, combined with fault tolerance, has made Kafka a popular
messaging service as well as a powerful streaming platform for processing real-time streams of events.

Besides the Producer and Consumer APIs, Kafka provides two further APIs:

Connector API to pull data from existing data storage systems to Kafka or push data from Kafka topics to other data systems

Streams API for transforming and analyzing real-time streams of events published to Kafka

1.2. Project Reactor

Reactor is a highly optimized reactive library for building efficient, non-blocking
applications on the JVM based on the Reactive Streams Specification.
Reactor-based applications can sustain very high message throughput and operate with a very low memory footprint,
which makes Reactor suitable for building efficient event-driven applications using the microservices architecture.

Reactor implements two publishers, Flux<T> and Mono<T>, both of which support non-blocking back-pressure.
This enables exchange of data between threads with well-defined memory usage, avoiding unnecessary intermediate buffering or blocking.

1.3. Reactive API for Kafka

Reactor Kafka is a reactive API for Kafka based on Reactor and the Kafka Producer/Consumer API.
Reactor Kafka API enables messages to be published to Kafka and consumed from Kafka using functional APIs
with non-blocking back-pressure and very low overheads. This enables applications using Reactor to use
Kafka as a message bus or streaming platform and integrate with other systems to provide an end-to-end reactive pipeline.

2. Motivation

2.1. Functional interface for Kafka

Reactor Kafka is a functional Java API for Kafka. For applications that are written in functional style,
this API enables Kafka interactions to be integrated easily without requiring non-functional
asynchronous produce or consume APIs to be incorporated into the application logic.

2.2. Non-blocking Back-pressure

The Reactor Kafka API benefits from non-blocking back-pressure provided by Reactor. For example, in a pipeline where
messages received from an external source (e.g. an HTTP proxy) are published to Kafka, back-pressure can be applied easily to the
whole pipeline, limiting the number of messages in-flight and controlling memory usage. Messages flow through
the pipeline as they become available, with Reactor taking care of limiting the flow rate to avoid overflow,
keeping application logic simple.

2.3. End-to-end Reactive Pipeline

The value proposition for Reactor Kafka is the efficient utilization of resources in applications with multiple
external interactions where Kafka is one of the external systems. End-to-end reactive pipelines benefit from
non-blocking back-pressure and efficient use of threads, enabling a large number of concurrent requests to be
processed efficiently. The optimizations provided by Project Reactor enable development of reactive applications
with very low overheads and predictable capacity planning to deliver low-latency, high-throughput pipelines.

2.4. Comparisons with other Kafka APIs

Reactor Kafka is not intended to replace any of the existing Kafka APIs. Instead, it is aimed at providing
an alternative API for reactive event-driven applications.

2.4.1. Kafka Producer and Consumer APIs

Applications that use the Kafka Producer and Consumer APIs to treat Kafka as a message bus may consider switching
to Reactor Kafka if the application is implemented in a functional style.

2.4.2. Kafka Connect API

Kafka Connect provides a simple interface to migrate messages
from an external data system (e.g. a database) to one or more Kafka topics. Using existing connectors,
this migration can be performed without writing any new code.

Applications using the connector API may consider using Reactor Kafka if a reactive API is available for
the external data system and transformations are required for the data. When transformations involve
other I/O (e.g. to obtain additional information from another database), a reactive pipeline
benefits from end-to-end non-blocking back-pressure provided by Reactor. Messages from/to different Kafka
partitions can be processed in parallel, improving throughput by avoiding blocking for I/O.
The pull model in Reactor controls the pace of messages flowing through the pipeline, enabling efficient
use of threads and memory without the need for overflow handling in the application.

2.4.3. Kafka Streams API

Kafka Streams provides lightweight APIs to build stream processing
applications that process data stored in Kafka using standard streaming concepts and transformation primitives.
Using a simple threading model, the streams API avoids the need for back-pressure. This model works well in cases
where transformations do not involve external interactions.

Reactor Kafka is useful for streams applications which process data from Kafka and use external interactions
(e.g. get additional data for records from a database) for transformations. In this case, Reactor can provide end-to-end
non-blocking back-pressure combined with better utilization of resources if all external interactions use the reactive model.

3. Getting Started

3.1. Requirements

You need Java JRE installed (Java 8 or later).

You need Apache Kafka installed (1.0.0 or later). Kafka can be downloaded
from kafka.apache.org/downloads.html. Note that the Apache Kafka client library used with
Reactor Kafka should be 2.0.0 or later and the broker version should be 1.0.0 or higher.

3.2. Quick Start

This quick start tutorial sets up a single-node Zookeeper and Kafka and runs the sample reactive producer and
consumer. Instructions to set up multi-broker clusters are available in the Apache Kafka documentation.

3.2.1. Start Kafka

If you haven’t yet downloaded Kafka, download Kafka version 2.0.0 or higher.

Unzip the release and set KAFKA_DIR to the installation directory.

The sample producer sends 20 messages to the Kafka topic demo-topic using the default partitioner. The partition
and offset of each published message are written to the console. The order of
results may be different from the order of messages published: results are delivered in order for each partition,
but results from different partitions may be interleaved. In the sample, the message index is included as
correlation metadata to match each result to its corresponding message.

The sample consumer consumes messages from the topic demo-topic and writes them to the console. The 20 messages
published by the producer sample should appear on the console. Messages are consumed
in order for each partition, but messages from different partitions may be interleaved.

3.2.3. Building Reactor Kafka Applications

To build your own application using the Reactor Kafka API, you need to include a dependency on Reactor Kafka.

6.2. Reactive Kafka Sender

Outbound messages are sent to Kafka using reactor.kafka.sender.KafkaSender. Senders are thread-safe and can be shared
across multiple threads to improve throughput. A KafkaSender is associated with one KafkaProducer that is used
to transport messages to Kafka.

A KafkaSender is created with an instance of sender configuration options reactor.kafka.sender.SenderOptions.
Changes made to SenderOptions after the creation of KafkaSender will not be used by the KafkaSender.
The properties of SenderOptions such as a list of bootstrap Kafka brokers and serializers are passed down
to the underlying KafkaProducer. The properties may be configured on the SenderOptions instance at creation time
or by using the setter SenderOptions#producerProperty. Other configuration options for the reactive KafkaSender like
the maximum number of in-flight messages can also be configured before the KafkaSender instance is created.

The generic types of SenderOptions<K, V> and KafkaSender<K, V> are the key and value types of the producer records
published using the KafkaSender. The corresponding serializers must be set on the SenderOptions instance before
the KafkaSender is created.

The KafkaSender is now ready to send messages to Kafka.
The underlying KafkaProducer instance is created lazily when the first message is ready to be sent.
At this point, a KafkaSender instance has been created, but no connections to Kafka have been made yet.

Let’s now create a sequence of messages to send to Kafka. Each outbound message to be sent to Kafka
is represented as a SenderRecord. A SenderRecord is a Kafka ProducerRecord with additional correlation metadata
for matching send results to records. A ProducerRecord consists of a key/value pair to send to Kafka and the name
of the Kafka topic to send the message to. Producer records may also optionally specify a partition to send
the message to; otherwise the configured partitioner chooses a partition. A timestamp may also be specified
in the record; if it is not specified, the current timestamp is assigned by the producer.
The additional correlation metadata included in SenderRecord is not sent to Kafka, but is included in the
SenderResult generated for the record when the send operation completes or fails. Since results of sends to
different partitions may be interleaved, the correlation metadata enables results to be matched to their corresponding record.

The code segment above creates a sequence of messages to send to Kafka, using the message index as
correlation metadata in each SenderRecord. The outbound Flux can now be sent to Kafka using the
KafkaSender created earlier.

The code segment below sends the records to Kafka and prints out the response metadata received from Kafka
and the correlation metadata for each record. The final subscribe() in the code block
requests upstream to send the records to Kafka, and the response metadata received from Kafka flows downstream.
As each result is received, the record metadata from Kafka along with the correlation metadata identifying the
record is printed to the console by the onNext handler. The response from Kafka includes the partition to which
the record was sent as well as the offset at which the record was appended, if available.
When records are sent to multiple partitions, responses arrive in order
for each partition, but responses from different partitions may be interleaved.
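
The role of correlation metadata with interleaved responses can be illustrated without a broker. The following is a minimal plain-Java sketch, not Reactor Kafka code: the Result class and the describe method are hypothetical stand-ins for a send result and for application logic, and the integer correlation id mirrors the message index used in this guide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical stand-ins, not Reactor Kafka classes.
final class CorrelationSketch {

    /** Hypothetical send result: partition and offset from the broker plus correlation metadata. */
    static final class Result {
        final int partition;
        final long offset;
        final int correlationId;

        Result(int partition, long offset, int correlationId) {
            this.partition = partition;
            this.offset = offset;
            this.correlationId = correlationId;
        }
    }

    /** Looks up the payload sent under each result's correlation id, in arrival order. */
    static List<String> describe(Map<Integer, String> sentById, List<Result> results) {
        List<String> lines = new ArrayList<>();
        for (Result r : results) {
            lines.add(sentById.get(r.correlationId)
                    + " -> partition " + r.partition + ", offset " + r.offset);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<Integer, String> sent = new HashMap<>();
        sent.put(1, "Message_1");
        sent.put(2, "Message_2");
        sent.put(3, "Message_3");
        // Results from partitions 0 and 1 arrive interleaved, each in order within its partition.
        List<Result> results = Arrays.asList(
                new Result(1, 0L, 2), new Result(0, 0L, 1), new Result(1, 1L, 3));
        describe(sent, results).forEach(System.out::println);
    }
}
```

Because each result carries the id of the record it belongs to, the application does not need to rely on arrival order to pair results with records.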

6.2.1. Error handling

public SenderOptions<K, V> stopOnError(boolean stopOnError);

SenderOptions#stopOnError() specifies whether each send sequence should fail immediately if one
record could not be delivered to Kafka after the configured number of retries or wait until all records
have been processed. This can be used along with ProducerConfig#ACKS_CONFIG
and ProducerConfig#RETRIES_CONFIG to configure the required quality of service.

If stopOnError is false, a success or error response is returned for each outgoing record.
For error responses, the exception from Kafka indicating the reason for send failure is set on SenderResult
and can be retrieved using SenderResult#exception(). The Flux fails with an error after attempting to send
all records published on outboundRecords. If outboundRecords is a non-terminating Flux, send continues to send
records published on this Flux until the result Flux is explicitly cancelled by the user.

If stopOnError is true, a response is returned for the first failed send and the result Flux is terminated
immediately with an error. Since multiple outbound messages may be in-flight at any time, it is possible that
some messages are delivered successfully to Kafka after the first failure is detected. The SenderOptions#maxInFlight()
option may be configured to limit the number of messages in-flight at any time.
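
The difference between the two settings can be sketched as a small plain-Java simulation. This is a toy model under simplifying assumptions, not the library's implementation: the class and method names are hypothetical, sends are treated as strictly sequential (so in-flight messages are ignored), and each boolean stands for whether a record was delivered after retries.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, sequential sends, no in-flight messages.
final class StopOnErrorSketch {

    /** Returns the sequence of per-record results followed by the terminal signal. */
    static List<String> simulate(boolean[] delivered, boolean stopOnError) {
        List<String> events = new ArrayList<>();
        boolean anyFailed = false;
        for (int i = 0; i < delivered.length; i++) {
            if (delivered[i]) {
                events.add("result " + i + ": ok");
            } else {
                anyFailed = true;
                events.add("result " + i + ": error");
                if (stopOnError) {
                    break; // terminate on the first failed send
                }
            }
        }
        // With stopOnError=false the error is signalled only after all records were attempted.
        events.add(anyFailed ? "terminated with error" : "completed");
        return events;
    }

    public static void main(String[] args) {
        boolean[] outcomes = {true, false, true};
        System.out.println("stopOnError=false: " + simulate(outcomes, false));
        System.out.println("stopOnError=true:  " + simulate(outcomes, true));
    }
}
```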

6.2.2. Send without result metadata

If individual results are not required for each send request, ProducerRecord can be sent to Kafka
without providing correlation metadata using the KafkaOutbound interface. KafkaOutbound is a fluent
interface that enables sends to be chained together.

The send sequence is initiated by subscribing to the Mono obtained from KafkaOutbound#then().
The returned Mono completes successfully if all the outbound records are delivered successfully. The Mono
terminates on the first send failure. If outboundRecords is a non-terminating Flux, records continue to
be sent to Kafka unless a send fails or the returned Mono is cancelled.

When the Mono completes successfully, all records were published; individual partitions and offsets are not returned.
Subscribing to the Mono requests the actual sends.

Multiple sends can be chained together using a sequence of sends on KafkaOutbound.
When the Mono returned from KafkaOutbound#then() is subscribed to, the sends are invoked
in sequence in the declaration order. The sequence is cancelled if any of the sends fail
after the configured number of retries.

An error indicates a failure to send one or more records from any of the sends in the chain.
Success indicates that all records from the whole chain were sent successfully.
Subscribing to the Mono initiates the sequence of sends in the chain.

Note that in all cases the retries configured for the KafkaProducer are attempted and failures returned by
the reactive KafkaSender indicate a failure to send after the configured number of retry attempts. Retries
can result in messages being delivered out of order. The producer property
ProducerConfig#MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION may be set to one to avoid re-ordering.

6.2.3. Threading model

KafkaProducer uses a separate network thread for sending requests and processing responses. To ensure
that the producer network thread is never blocked by applications while processing results, KafkaSender
delivers responses to applications on a separate scheduler. By default, this is a single threaded
pooled scheduler that is freed when no longer required. The scheduler can be overridden if required, for instance,
to use a parallel scheduler when the Kafka sends are part of a larger pipeline. This is done on the SenderOptions
instance before the KafkaSender instance is created using:

public SenderOptions<K, V> scheduler(Scheduler scheduler);

6.2.4. Non-blocking back-pressure

The number of in-flight sends can be controlled using the maxInFlight option. Requests for more elements from
upstream are limited by the configured maxInFlight to ensure that the total number of requests for which
responses are pending at any time is limited. Along with the buffer.memory and max.block.ms options on KafkaProducer,
maxInFlight enables control of memory and thread usage when KafkaSender is used in a reactive pipeline. This option
can be configured on SenderOptions before the KafkaSender is created. The default value is 256. For small messages,
a higher value can improve throughput.

public SenderOptions<K, V> maxInFlight(int maxInFlight);
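
The effect of a bound on in-flight sends can be sketched with a simple counter. This is a plain-Java illustration under stated assumptions, not how KafkaSender is implemented: InFlightLimiterSketch and its methods are hypothetical names, and the model is single-threaded.

```java
// Toy model only: a single-threaded counter standing in for the maxInFlight bound.
final class InFlightLimiterSketch {
    private final int maxInFlight;
    private int inFlight;

    InFlightLimiterSketch(int maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    /** Returns true and counts the send if capacity is available; false means "do not request more yet". */
    boolean tryStartSend() {
        if (inFlight >= maxInFlight) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Called when a response (success or error) arrives for an in-flight send. */
    void onResponse() {
        if (inFlight == 0) {
            throw new IllegalStateException("no send in flight");
        }
        inFlight--;
    }

    int inFlight() {
        return inFlight;
    }

    public static void main(String[] args) {
        InFlightLimiterSketch limiter = new InFlightLimiterSketch(2);
        for (int i = 0; i < 4; i++) {
            System.out.println("send " + i + " started: " + limiter.tryStartSend());
        }
        limiter.onResponse();
        System.out.println("after one response, next send started: " + limiter.tryStartSend());
    }
}
```

In a reactive pipeline the "false" case corresponds to not requesting further elements from upstream until a pending response frees capacity, which is what keeps memory usage bounded.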

6.2.5. Closing the KafkaSender

When the KafkaSender is no longer required, the KafkaSender instance can be closed. The underlying KafkaProducer is closed,
closing all client connections and freeing all memory used by the producer.

sender.close();

6.2.6. Access to the underlying KafkaProducer

Reactive applications may sometimes require access to the underlying producer instance to perform actions that are not
exposed by the KafkaSender interface. For example, an application might need to know the number of partitions in a topic
in order to choose the partition to send a record to. Operations that, unlike send, are not provided directly by KafkaSender
can be run on the underlying KafkaProducer using KafkaSender#doOnProducer.

User provided methods are executed asynchronously.
A Mono is returned by doOnProducer which completes with the value returned by the user-provided function.

6.3. Reactive Kafka Receiver

Messages stored in Kafka topics are consumed using the reactive receiver reactor.kafka.receiver.KafkaReceiver.
Each instance of KafkaReceiver is associated with a single instance of KafkaConsumer. KafkaReceiver is not thread-safe
since the underlying KafkaConsumer cannot be accessed concurrently by multiple threads.

A receiver is created with an instance of receiver configuration options reactor.kafka.receiver.ReceiverOptions.
Changes made to ReceiverOptions after the creation of the receiver instance will not be used by the KafkaReceiver.
The properties of ReceiverOptions such as a list of bootstrap Kafka brokers and de-serializers are passed down
to the underlying KafkaConsumer. These properties may be configured on the ReceiverOptions instance at creation time
or by using the setter ReceiverOptions#consumerProperty. Other configuration options for the reactive
KafkaReceiver including subscription topics must be added to options before the KafkaReceiver instance is created.

The generic types of ReceiverOptions<K, V> and KafkaReceiver<K, V> are the key and value types of the consumer records
consumed using the receiver. The corresponding de-serializers must be set on the ReceiverOptions instance before
the KafkaReceiver is created.

Once the required configuration options have been configured on the options instance, a new KafkaReceiver instance
can be created with these options to consume inbound messages.
The code block below creates a receiver instance and creates an inbound Flux for the receiver.
The underlying KafkaConsumer instance is created lazily later when the inbound Flux is subscribed to.

The inbound Kafka Flux is ready to be consumed. Each inbound message delivered by the Flux is represented
as a ReceiverRecord. Each receiver record is a ConsumerRecord returned by KafkaConsumer along with
a committable ReceiverOffset instance. The offset must be acknowledged
after the message is processed since unacknowledged offsets will not be committed.
If commit interval or commit batch size are configured, acknowledged offsets will be committed periodically.
Offsets may also be committed manually using ReceiverOffset#commit() if finer grained control of commit
operations is required.

Calling ReceiverOffset#acknowledge() acknowledges that the record has been processed so that the offset may be committed.

6.3.1. Subscribing to wildcard patterns

The example above subscribed to a single Kafka topic. The same API can be used to subscribe to
more than one topic by specifying multiple topics in the collection provided to ReceiverOptions#subscription().
Subscription can also be made to a wildcard pattern by specifying a pattern to subscribe to. Group
management in KafkaConsumer dynamically updates topic assignment when topics matching the pattern
are created or deleted and assigns partitions of matching topics to available consumer instances.

6.3.2. Manual assignment of topic partitions

Existing subscriptions and assignments on the options instance are deleted when a new assignment
is specified. Every receiver created from this options instance with manual assignment consumes messages
from all the specified partitions.

6.3.3. Controlling commit frequency

Commit frequency can be controlled using a combination of commit interval
and commit batch size. Commits are performed when either the interval or batch size is reached. One or both
of these options may be set on ReceiverOptions before the receiver instance is created. If commit interval
is configured, at least one commit is scheduled within that interval if any records were
consumed. If commit batch size is configured, a commit is scheduled when the configured number of records
has been consumed and acknowledged.
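
The "whichever comes first" rule can be sketched as a small state machine. This is a plain-Java toy model, not the receiver's implementation: CommitTriggerSketch and its methods are hypothetical, time is passed in explicitly as milliseconds, and a value of 0 is assumed here to mean that the corresponding option is disabled.

```java
// Toy model only: decides when a commit would be scheduled, given a batch size and an interval.
final class CommitTriggerSketch {
    private final int batchSize;       // 0 disables the batch-size trigger (assumption of this sketch)
    private final long intervalMillis; // 0 disables the interval trigger (assumption of this sketch)
    private int acknowledged;
    private long lastCommitMillis;

    CommitTriggerSketch(int batchSize, long intervalMillis, long nowMillis) {
        this.batchSize = batchSize;
        this.intervalMillis = intervalMillis;
        this.lastCommitMillis = nowMillis;
    }

    /** Records one acknowledgement; returns true if a commit should be scheduled now. */
    boolean onAcknowledge(long nowMillis) {
        acknowledged++;
        return maybeCommit(nowMillis);
    }

    /** Periodic timer tick; returns true if the interval elapsed with acknowledged records pending. */
    boolean onTick(long nowMillis) {
        return maybeCommit(nowMillis);
    }

    private boolean maybeCommit(long nowMillis) {
        boolean batchReached = batchSize > 0 && acknowledged >= batchSize;
        boolean intervalReached = intervalMillis > 0 && acknowledged > 0
                && nowMillis - lastCommitMillis >= intervalMillis;
        if (batchReached || intervalReached) {
            acknowledged = 0;
            lastCommitMillis = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CommitTriggerSketch trigger = new CommitTriggerSketch(3, 1000L, 0L);
        for (long t = 10; t <= 30; t += 10) {
            System.out.println("ack at " + t + " ms, commit: " + trigger.onAcknowledge(t));
        }
        System.out.println("ack at 40 ms, commit: " + trigger.onAcknowledge(40L));
        System.out.println("tick at 1030 ms, commit: " + trigger.onTick(1030L));
    }
}
```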

Manual acknowledgement of consumed records after processing along with automatic commits based on
the configured commit frequency provides at-least-once delivery semantics. Messages are re-delivered
if the consuming application crashes after a message was dispatched but before it was processed and
acknowledged. Only offsets explicitly acknowledged using ReceiverOffset#acknowledge() are committed.
Note that acknowledging an offset acknowledges all previous offsets on the same partition. All
acknowledged offsets are committed when partitions are revoked during rebalance and when the receive
Flux is terminated.
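
The statement that acknowledging an offset acknowledges all previous offsets on the same partition amounts to tracking a high-water mark per partition. A minimal plain-Java sketch follows; AckTrackerSketch is a hypothetical name, partitions are identified by a bare integer for brevity, and the committed value follows the Kafka convention that a committed offset is the offset of the next record to consume.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: per-partition high-water mark of acknowledged offsets.
final class AckTrackerSketch {
    private final Map<Integer, Long> highestAcked = new HashMap<>();

    /** Acknowledging an offset implicitly acknowledges every earlier offset on that partition. */
    void acknowledge(int partition, long offset) {
        highestAcked.merge(partition, offset, Math::max);
    }

    /** Offsets that a commit would write: highest acknowledged offset + 1 for each partition. */
    Map<Integer, Long> committable() {
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : highestAcked.entrySet()) {
            result.put(e.getKey(), e.getValue() + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        AckTrackerSketch tracker = new AckTrackerSketch();
        tracker.acknowledge(0, 5L);
        tracker.acknowledge(0, 3L); // already covered by the acknowledgement of offset 5
        tracker.acknowledge(1, 7L);
        System.out.println(tracker.committable());
    }
}
```

Partitions with no acknowledged records do not appear in the result, which matches the rule that only explicitly acknowledged offsets are committed.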

Applications which require fine-grained control over the timing of commit operations
can disable periodic commits and explicitly invoke ReceiverOffset#commit() when required to trigger
a commit. This commit is asynchronous by default, but the application may invoke Mono#block()
on the returned Mono to implement synchronous commits. Applications may batch commits by acknowledging
messages as they are consumed and invoking commit() periodically to commit acknowledged offsets.

Note that committing an offset acknowledges and commits all previous offsets on that partition.

6.3.4. Auto-acknowledgement of batches of records

KafkaReceiver#receiveAutoAck returns a Flux of batches of records returned by each KafkaConsumer#poll().
The records in each batch are automatically acknowledged when the Flux corresponding to the batch terminates.

The maximum number of records in each batch can be controlled using the KafkaConsumer property
max.poll.records (ConsumerConfig#MAX_POLL_RECORDS_CONFIG). This is used together with the fetch size and wait times configured on the
KafkaConsumer to control the amount of data fetched from Kafka brokers in each poll. Each batch is
returned as a Flux that is acknowledged after the Flux terminates. Acknowledged records are committed periodically
based on the configured commit interval and batch size. This mode is simple to use since applications
do not need to perform any acknowledge or commit actions. It is efficient as well and can be used
for at-least-once delivery of messages.

6.3.5. Disabling automatic commits

Applications which don’t require offset commits to Kafka may disable automatic commits by not acknowledging
any records consumed using KafkaReceiver#receive().

6.3.6. At-most-once delivery

Applications may disable automatic commits to avoid re-delivery of records. ConsumerConfig#AUTO_OFFSET_RESET_CONFIG
can be configured to "latest" to consume only new records. But this could mean that an unpredictable
number of records are not consumed if an application fails and restarts.

KafkaReceiver#receiveAtmostOnce can be used to consume records with at-most-once semantics with a configurable
number of records per partition that may be lost if the application fails or crashes. Offsets are committed
synchronously before the corresponding record is dispatched. Records are guaranteed not to be re-delivered
even if the consuming application fails, but some records may not be processed if an application fails
after the commit but before the records could be processed.

This mode is expensive since each record is committed individually and records are not delivered until
the commit operation succeeds. ReceiverOptions#atmostOnceCommitAheadSize may be configured
to reduce the cost of commits and avoid blocking before dispatch if the offset of the record has already
been committed. By default, commit-ahead is disabled and at most one record is lost per partition if
an application crashes. If commit-ahead is configured, the maximum number of records that may be
lost per partition is ReceiverOptions#atmostOnceCommitAheadSize + 1.
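
The "commit-ahead size + 1" bound can be checked with a small model of a single partition. This is a plain-Java sketch under stated assumptions rather than the receiver's actual algorithm: CommitAheadSketch and its methods are hypothetical, offsets are dispatched in increasing order, and a commit is modelled as instantaneous.

```java
// Toy model only: commit-before-dispatch on one partition, with optional commit-ahead.
final class CommitAheadSketch {
    private final int commitAheadSize;
    private long committed; // next offset a restarted consumer would read
    private int commits;

    CommitAheadSketch(int commitAheadSize) {
        this.commitAheadSize = commitAheadSize;
    }

    /** Called before a record is dispatched; commits only if the offset is not yet covered. */
    void beforeDispatch(long offset) {
        if (offset >= committed) {
            committed = offset + 1 + commitAheadSize;
            commits++;
        }
    }

    /** Records skipped after a restart if the application crashes while processing this offset. */
    long lostIfCrashWhileProcessing(long offset) {
        return committed - offset;
    }

    long committed() {
        return committed;
    }

    int commits() {
        return commits;
    }

    public static void main(String[] args) {
        CommitAheadSketch receiver = new CommitAheadSketch(2);
        for (long offset = 0; offset < 6; offset++) {
            receiver.beforeDispatch(offset);
            System.out.println("offset " + offset + ": committed=" + receiver.committed()
                    + ", commits so far=" + receiver.commits());
        }
    }
}
```

In this model a larger commit-ahead size trades fewer commits for a larger window of records that may be skipped after a crash.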

Each consumer record is then processed by the application; a record is not re-delivered if its processing fails.

6.3.7. Partition assignment and revocation listeners

Applications can enable assignment and revocation listeners to perform any actions when
partitions are assigned or revoked from a consumer.

When group management is used, assignment listeners are invoked whenever partitions are assigned
to the consumer after a rebalance operation. When manual assignment is used, assignment listeners
are invoked when the consumer is started. Assignment listeners can be used to seek to particular offsets
in the assigned partitions so that messages are consumed from the specified offset.

When group management is used, revocation listeners are invoked whenever partitions are revoked
from a consumer after a rebalance operation. When manual assignment is used, revocation listeners
are invoked before the consumer is closed. Revocation listeners can be used to commit processed
offsets when manual commits are used. Acknowledged offsets are automatically committed on revocation
if automatic commits are enabled.

6.3.8. Controlling start offsets for consuming records

By default, receivers start consuming records from the last committed offset of each assigned partition.
If a committed offset is not available, the offset reset strategy ConsumerConfig#AUTO_OFFSET_RESET_CONFIG
configured for the KafkaConsumer is used to set the start offset to the earliest or latest offset on the partition.
Applications can override offsets by seeking to new offsets in an assignment listener. Methods are provided on
ReceiverPartition to seek to the earliest, latest or a specific offset in the partition.

void seekToBeginning();
void seekToEnd();
void seek(long offset);
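
The precedence between an explicit seek, the last committed offset, and the reset strategy can be written as a single function. This is a plain-Java illustration of the rules described above, not library code: StartOffsetSketch, ResetPolicy and resolve are hypothetical names, null stands for "not available", and only the earliest and latest strategies are modelled.

```java
// Toy model only: which offset a partition starts from, following the rules in this section.
final class StartOffsetSketch {

    /** Simplified reset strategy; other settings of the consumer property are not modelled. */
    enum ResetPolicy { EARLIEST, LATEST }

    /**
     * @param committed  last committed offset, or null if none is available
     * @param seekTarget offset chosen in an assignment listener, or null if no seek was made
     */
    static long resolve(Long committed, ResetPolicy policy, long earliest, long latest, Long seekTarget) {
        if (seekTarget != null) {
            return seekTarget; // an explicit seek overrides everything else
        }
        if (committed != null) {
            return committed; // default: resume from the last committed offset
        }
        return policy == ResetPolicy.EARLIEST ? earliest : latest;
    }

    public static void main(String[] args) {
        System.out.println(resolve(42L, ResetPolicy.EARLIEST, 0L, 100L, null));
        System.out.println(resolve(null, ResetPolicy.LATEST, 0L, 100L, null));
        System.out.println(resolve(42L, ResetPolicy.EARLIEST, 0L, 100L, 7L));
    }
}
```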

For example, the following code block starts consuming messages from the latest offset.

6.3.9. Consumer lifecycle

Each KafkaReceiver instance is associated with a KafkaConsumer that is created when the inbound
Flux returned by one of the receive methods in KafkaReceiver is subscribed to. The consumer is kept alive until
the Flux completes. When the Flux completes, all acknowledged offsets are committed and the
underlying consumer is closed.

Only one receive operation may be active in a KafkaReceiver at any one time. Any of the receive
methods can be invoked after the receive Flux corresponding to the last receive is terminated.

7. Sample Scenarios

This section shows sample code segments for typical scenarios where the Reactor Kafka API
may be used. Full code listings for these scenarios are included in the
samples sub-project.

7.1. Sending records to Kafka

See section 6.2 (Reactive Kafka Sender) for details on the KafkaSender API for sending outbound records
to Kafka. The following code segment creates a simple pipeline that sends records to Kafka and
processes the responses. The outbound flow is triggered when the returned Flux is subscribed to.

7.2. Replaying records from Kafka topics

See section 6.3 (Reactive Kafka Receiver) for details on the KafkaReceiver API for consuming records
from Kafka topics. The following code segment creates a Flux that replays all records on a topic
and commits offsets after processing the messages. Manual acknowledgement provides
at-least-once delivery semantics.

In this scenario, consumption starts from the first available offset on each partition if committed offsets are not available,
a commit is performed every 10 acknowledged messages, and the topics to consume from are specified.
Each consumer record from Kafka is processed and then acknowledged as consumed.

7.3. Reactive pipeline with Kafka sink

The code segment below consumes messages from an external source, performs some transformation
and stores the output records in Kafka. A large number of retry attempts is configured
on the Kafka producer so that transient failures don’t impact the pipeline. Source commits are
performed only after records are successfully written to Kafka.

In this pipeline, a failed send indicates a catastrophic error and fails the whole pipeline.
The correlation metadata in the sender record is used to commit the source record.

7.4. Reactive pipeline with Kafka source

The code segment below consumes records from Kafka topics, transforms the record
and sends the output to an external sink. Kafka consumer offsets are committed after
records are successfully output to sink.

7.5. Reactive pipeline with Kafka source and sink

The code segment below consumes messages from a Kafka topic, performs some transformation
on the incoming messages and stores the result in some Kafka topics. Manual acknowledgement
mode provides at-least-once semantics with messages acknowledged after the output records
are delivered to Kafka. Acknowledged offsets are committed periodically based on the
configured commit interval.

Each incoming record is transformed into an outbound record with the transformed data in the payload and the inbound offset as correlation metadata.
After the outbound record is delivered to Kafka, the inbound offset is acknowledged using the offset instance in the correlation metadata.

7.6. At-most-once delivery

The code segment below demonstrates a flow with at-most-once delivery. The producer does not wait for acks and
does not perform any retries. Messages that cannot be delivered to Kafka on the first attempt
are dropped. KafkaReceiver commits offsets before delivery to the application to ensure that if the consumer
restarts, messages are not redelivered. With replication factor 1 for topic partitions,
this code can be used for at-most-once delivery.

A send with acks=0 completes when the message is buffered locally, before it is delivered to the Kafka broker.
No retries are configured in the producer, and any send error is ignored so that the remaining records continue to be sent.
Records are consumed using an at-most-once receive.

7.7. Fan-out with Multiple Streams

The code segment below demonstrates fan-out with the same records processed in multiple independent
streams. Each stream is processed on a different thread, transforms the input record,
and stores the output in a Kafka topic.

Reactor’s EmitterProcessor is used to broadcast the input records from Kafka to multiple subscribers.

Records are consumed on a scheduler, processed, and turned into output records to send to Kafka.
Another processor for the same input data is added on a different scheduler.
The streams are then merged and subscribed to, which starts the flow.

7.8. Concurrent Processing with Partition-Based Ordering

The code segment below demonstrates a flow where messages are consumed from a Kafka topic, processed
by multiple threads and the results stored in another Kafka topic. Messages are grouped
by partition to guarantee ordering in message processing and commit operations. Messages
from each partition are processed on a single thread.
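
The grouping step can be illustrated independently of Kafka. The following plain-Java sketch is a toy model, not the sample's code: PartitionGroupingSketch is a hypothetical name, and records are reduced to parallel arrays of partition numbers and offsets. It shows that grouping by partition preserves the arrival order within each partition, which is what allows each group to be handled by a single thread without reordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: groups record offsets by partition, keeping arrival order inside each group.
final class PartitionGroupingSketch {

    /** partitions[i] and offsets[i] together describe the i-th record in arrival order. */
    static Map<Integer, List<Long>> groupByPartition(int[] partitions, long[] offsets) {
        if (partitions.length != offsets.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        Map<Integer, List<Long>> groups = new LinkedHashMap<>();
        for (int i = 0; i < partitions.length; i++) {
            groups.computeIfAbsent(partitions[i], p -> new ArrayList<>()).add(offsets[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Records from partitions 0 and 1 arrive interleaved.
        int[] partitions = {0, 1, 0, 1, 0};
        long[] offsets = {0L, 0L, 1L, 1L, 2L};
        System.out.println(groupByPartition(partitions, offsets));
    }
}
```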

Send multiple records generated from each source record within a transaction

7.10. Exactly-once delivery

The code segment below demonstrates a flow with exactly-once delivery. Source records
received from a Kafka topic are transformed and sent to Kafka. Each batch of records
is delivered to the application in a new transaction. Offsets of the source records
of each batch are automatically committed within its transaction. Each transaction
is committed by the application after the transformed records of the batch are
successfully delivered to the destination topic. The next batch of records is delivered
to the application in a new transaction after the current transaction is committed.