The tutorial is based on a small microservices ecosystem, showcasing an order management workflow, such as one might find in retail and online shopping.
It is built using Kafka Streams, whereby business events that describe the order management workflow propagate through this ecosystem.
The blog post Building a Microservices Ecosystem with Kafka Streams and KSQL outlines the approach used.

Note: this is demo code, not a production system, and certain elements are left for further work.

In this example, the system centers on an Orders Service which exposes a REST interface to POST and GET Orders.
Posting an Order creates an event in Kafka that is recorded in the topic orders.
This is picked up by different validation engines (Fraud Service, Inventory Service and Order Details Service), which validate the order in parallel, emitting a PASS or FAIL based on whether each validation succeeds.

The result of each validation is pushed through a separate topic, Order Validations, so that we retain the _single writer_ status of the Orders Service -> Orders Topic (Ben Stopford's book discusses several options for managing consistency in event collaboration).
The results of the various validation checks are aggregated in the Validation Aggregator Service, which then moves the order to a Validated or Failed state, based on the combined result.
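
The aggregation rule described above can be sketched in plain Java. This is a minimal model and not the Validation Aggregator Service itself: the class name, the `Result` and `State` enums, and the assumption that exactly three results are expected (one per validation engine) are all illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ValidationAggregationSketch {
    enum Result { PASS, FAIL }
    enum State { VALIDATED, FAILED }

    // Assumption: one result each from the fraud, inventory, and order details checks.
    static final int EXPECTED_CHECKS = 3;

    // Returns the new order state, or empty while results are still outstanding.
    static Optional<State> aggregate(List<Result> results) {
        if (results.contains(Result.FAIL)) {
            return Optional.of(State.FAILED);     // any failed check fails the order
        }
        if (results.size() >= EXPECTED_CHECKS) {
            return Optional.of(State.VALIDATED);  // every expected check passed
        }
        return Optional.empty();                  // keep waiting for more results
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(Result.PASS, Result.PASS, Result.PASS)));
        System.out.println(aggregate(Arrays.asList(Result.PASS, Result.FAIL)));
        System.out.println(aggregate(Arrays.asList(Result.PASS)));
    }
}
```

Because each validator only writes its own result and the aggregator only reads them, no validator needs to write to the orders topic directly.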

To allow users to GET any order, the Orders Service creates a queryable materialized view (embedded inside the Orders Service), using a state store in each instance of the service, so that any Order can be requested historically. Note also that the Orders Service can be scaled out over a number of nodes, in which case GET requests must be routed to the correct node to get a certain key. This is handled automatically using the interactive queries functionality in Kafka Streams.

The Orders Service also includes a blocking HTTP GET so that clients can read their own writes. In this way, we bridge the synchronous, blocking paradigm of a RESTful interface with the asynchronous, non-blocking processing performed server-side.

There is a simple service that sends emails, and another that collates orders and makes them available in a search index using Elasticsearch.

Finally, KSQL runs persistent queries that enrich streams and check for fraudulent behavior.

Here is a diagram of the microservices and the related Kafka topics.

All the services are client applications written in Java, and they use the Kafka Streams API.
The Java source code for these microservices is in the kafka-streams-examples repo.

You will get more out of this tutorial if you first learn the concepts it builds on.
To learn how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems, we recommend:

Then run the full end-to-end working solution, which requires no code development, to see a customer-representative deployment of a streaming application.
This provides context for each of the exercises in which you will develop pieces of the microservices.

Exercise 0: Run end-to-end demo

After you have successfully run the full solution, go through the exercises in the tutorial to better understand the basic principles of streaming applications:

Exercise 1: Persist events

Exercise 2: Event-driven applications

Exercise 3: Enriching streams with joins

Exercise 4: Filtering and branching

Exercise 5: Stateful operations

Exercise 6: State stores

Exercise 7: Enrichment with KSQL

For each exercise:

Read the description to understand the focus area for the exercise

Edit the file specified in each exercise and fill in the missing code

Copy the file to the project, then compile the project and run the test for the service to ensure it works

An _event_ is simply a thing that happened or occurred.
An event in a business is some fact that occurred, such as a sale, an invoice, a trade, a customer experience, etc., and it is the source of truth.
In event-oriented architectures, events are first-class citizens that constantly push data into applications.
Client applications can then react to these streams of events in real time and decide what to do next.

In this exercise, you will persist events into Kafka by producing records that represent customer orders.
This event happens in the Orders Service, which provides a REST interface to POST and GET Orders.
Posting an Order is essentially a REST call, and it creates the event in Kafka.

Service-based architectures are often designed to be request-driven, in which services send commands to other services to tell them what to do, await a response, or send queries to get the resulting state.
Building services on a protocol of requests and responses forces a complicated web of synchronous dependencies that bind services together.

In contrast, in an event-driven design, the event stream is the inter-service communication that enables services to cross deployment boundaries and avoids synchronous execution.
When and how downstream services respond to those events is within their control, which reduces the coupling between services and enables an architecture with more pluggability.
Read more on Build Services on a Backbone of Events.

In this exercise, you will write a service that validates customer orders.
Instead of using a series of synchronous calls to submit and validate orders, the order event itself triggers the OrderDetailsService.
When a new order is created, it is written to the topic orders, which a consumer in OrderDetailsService polls for new records.

Streams can be enriched with data from other streams or tables through joins.
A join enriches data by performing lookups in a streaming context where data is updated continuously and concurrently.
For example, applications backing an online retail store might enrich new data records with information from multiple databases.
In this scenario, it may be that a stream of customer transactions is enriched with sales price, inventory, customer information, etc.
These lookups can be performed at very large scale and with a low processing latency.

A stateful streaming service that joins two streams at runtime (source)

A popular pattern is to make the information in the databases available in Kafka through so-called change data capture (CDC), together with Kafka’s Connect API to pull in the data from the database.
Once the data is in Kafka, client applications can perform very fast and efficient joins of such tables and streams, rather than requiring the application to make a query to a remote database over the network for each record.
Read more on an overview of distributed, real-time joins and implementing joins in Kafka Streams.

In this exercise, you will write a service that joins streaming order information with streaming payment information and data from a customer database.
First, the payment stream must be rekeyed so that its key matches the key of the order stream before the two streams are joined.
The resulting stream is then joined with the customer information that was read into Kafka by a JDBC source from a customer database.
Additionally, this service performs dynamic routing: an enriched order record is written to a topic that is determined from the value of the level field of the corresponding customer.
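
The enrichment and dynamic routing steps can be modeled in plain Java as a rough sketch. It does not use the Kafka Streams API and leaves out the payment join and rekeying. The record types, the field names, and the `orders-` topic prefix are hypothetical and chosen only for illustration; the customer table is modeled as an in-memory map to show why a local lookup avoids a remote database query per record.

```java
import java.util.HashMap;
import java.util.Map;

final class EnrichAndRouteSketch {
    // Hypothetical, simplified record types; the tutorial's real schemas differ.
    static final class Customer {
        final String name;
        final String level;
        Customer(String name, String level) { this.name = name; this.level = level; }
    }

    static final class Order {
        final String orderId;
        final long customerId;
        Order(String orderId, long customerId) { this.orderId = orderId; this.customerId = customerId; }
    }

    static final class EnrichedOrder {
        final Order order;
        final Customer customer;
        EnrichedOrder(Order order, Customer customer) { this.order = order; this.customer = customer; }
    }

    // Stream-table join modeled as a local lookup keyed by customer id.
    // Inner-join behavior: an order with no matching customer yields null (dropped).
    static EnrichedOrder enrich(Order order, Map<Long, Customer> customersById) {
        Customer customer = customersById.get(order.customerId);
        return customer == null ? null : new EnrichedOrder(order, customer);
    }

    // Dynamic routing: the output topic is derived from the record itself.
    // The "orders-" prefix is a made-up naming scheme for this sketch.
    static String topicFor(EnrichedOrder enriched) {
        return "orders-" + enriched.customer.level;
    }

    public static void main(String[] args) {
        Map<Long, Customer> customers = new HashMap<>();
        customers.put(15L, new Customer("Pat", "gold"));
        EnrichedOrder enriched = enrich(new Order("o-1", 15L), customers);
        System.out.println(topicFor(enriched));
    }
}
```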

A stream of events can be captured in a Kafka topic.
Client applications can then manipulate this stream based on some user-defined criteria, even creating new streams of data that they can act on or downstream services can act on.
Such manipulations help create new streams with more logically consistent data.
In some cases, the application may need to filter events from an input stream that match certain criteria, which results in a new stream with just a subset of records from the original stream.
In other cases, the application may need to branch events, whereby each event is tested against a predicate and then routed to a stream that matches, which results in multiple new streams split from the original stream.
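
The difference between the two operations can be illustrated with a small plain-Java model. This is a sketch over in-memory lists and not Kafka Streams code. The class and method names are hypothetical, and the rule that an event goes to the first matching predicate (and is dropped if none match) is an assumption of this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FilterBranchSketch {
    // Filter: one input, one output holding only the events that match.
    static <T> List<T> filter(List<T> events, Predicate<T> keep) {
        return events.stream().filter(keep).collect(Collectors.toList());
    }

    // Branch: one input, one output per predicate. Each event goes to the
    // first predicate it satisfies; events that match none are dropped.
    static <T> List<List<T>> branch(List<T> events, List<Predicate<T>> predicates) {
        List<List<T>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.size(); i++) {
            branches.add(new ArrayList<>());
        }
        for (T event : events) {
            for (int i = 0; i < predicates.size(); i++) {
                if (predicates.get(i).test(event)) {
                    branches.get(i).add(event);
                    break;
                }
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        List<Integer> amounts = Arrays.asList(5, 120, 40, 300);
        Predicate<Integer> large = amount -> amount >= 100;
        Predicate<Integer> everythingElse = amount -> true;

        System.out.println(filter(amounts, amount -> amount >= 10));
        System.out.println(branch(amounts, Arrays.asList(large, everythingElse)));
    }
}
```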

In this exercise, you will define one set of criteria to filter records in a stream.
Then you will define another set of criteria to branch records into two different streams.

An aggregation operation takes one input stream or table, and yields a new table by combining multiple input records into a single output record.
Examples of aggregations are computing count or sum, because they combine current record values with previous record values.
These are stateful operations because they maintain data during processing.
Aggregations are always key-based operations, and Kafka’s Streams API ensures that records for the same key are always routed to the same stream processing task.
Oftentimes, these are combined with windowing capabilities in order to run computations in real time over a window of time.

In this exercise, you will create a session window to define five-minute windows for processing.
Additionally, you will use the stateful operation reduce to collapse duplicate records in a stream.
Before running reduce, you will group the records to repartition the data, which is generally required before using an aggregation operator.
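
To show what session windowing and a duplicate-collapsing reduce do, here is a minimal plain-Java model for the records of a single key. It is not the exercise solution and does not use the Kafka Streams API. The class name is hypothetical, and treating a gap of exactly five minutes as still belonging to the same session is an assumption of this sketch.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SessionWindowSketch {
    // Inactivity gap that closes a session (five minutes, as in the exercise).
    static final long GAP_MS = Duration.ofMinutes(5).toMillis();

    // Splits one key's timestamps (ascending, in milliseconds) into sessions.
    // A new session starts when the gap since the previous event exceeds GAP_MS.
    static List<List<Long>> sessions(List<Long> sortedTimestamps) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = null;
        long previous = 0L;
        for (long timestamp : sortedTimestamps) {
            if (current == null || timestamp - previous > GAP_MS) {
                current = new ArrayList<>();
                result.add(current);
            }
            current.add(timestamp);
            previous = timestamp;
        }
        return result;
    }

    // Collapses each session to a single record by keeping the first one seen.
    static List<Long> firstPerSession(List<Long> sortedTimestamps) {
        List<Long> firsts = new ArrayList<>();
        for (List<Long> session : sessions(sortedTimestamps)) {
            // Sessions are never empty, so the reduced value is always present.
            firsts.add(session.stream().reduce((kept, duplicate) -> kept).get());
        }
        return firsts;
    }

    public static void main(String[] args) {
        // Three events within two minutes, then one event eight minutes later.
        List<Long> timestamps = Arrays.asList(0L, 60_000L, 120_000L, 600_000L);
        System.out.println(sessions(timestamps));
        System.out.println(firstPerSession(timestamps));
    }
}
```

Grouping by key comes first in a real topology because both the session boundaries and the reduce are computed per key.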

Kafka Streams provides so-called state stores, which are disk-resident hash tables, held inside the API for the client application.
The state store can be used within stream processing applications to store and query data, an important capability when implementing stateful operations.
It can be used to remember recently received input records, to track rolling aggregates, to de-duplicate input records, etc.

State stores in Kafka Streams can be used to create use-case-specific views right inside the service (source)

A state store is also backed by a Kafka topic and comes with all the Kafka guarantees.
Consequently, other applications can also interactively query another application's state store.
Querying state stores is always read-only to guarantee that the underlying state stores will never be mutated out-of-band (i.e., you cannot add new entries).

In this exercise, you will create a state store for the Inventory Service.
This state store is initialized with data from a Kafka topic before the service starts processing, and then it is updated as new orders are created.
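
The lifecycle described here (load initial data, then update as orders arrive) can be modeled with an in-memory map. This is a hedged sketch and not the Inventory Service: it uses no Kafka Streams state store API, the class and method names are hypothetical, and the rule that an order reserves stock only when enough is available is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class InventoryStoreSketch {
    // Stand-in for the state store: product name -> units available.
    private final Map<String, Integer> stockByProduct = new HashMap<>();

    // Bootstrap step: load the initial stock levels before any order is processed.
    InventoryStoreSketch(Map<String, Integer> initialStock) {
        stockByProduct.putAll(initialStock);
    }

    // Update step: called for each new order. Reserves stock and returns true
    // if enough units are available; otherwise leaves the store unchanged.
    boolean reserve(String product, int quantity) {
        int available = stockByProduct.getOrDefault(product, 0);
        if (available < quantity) {
            return false;
        }
        stockByProduct.put(product, available - quantity);
        return true;
    }

    // Read-only query, analogous to looking up a key in the store.
    int available(String product) {
        return stockByProduct.getOrDefault(product, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> initial = new HashMap<>();
        initial.put("widget", 5);
        InventoryStoreSketch store = new InventoryStoreSketch(initial);
        System.out.println(store.reserve("widget", 3));
        System.out.println(store.available("widget"));
        System.out.println(store.reserve("widget", 3));
    }
}
```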

Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka.
It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without requiring you to write code in a programming language such as Java or Python.
KSQL is scalable, elastic, fault tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.

You can use KSQL to merge streams of data in real time by using a SQL-like join syntax.
A KSQL join and a relational database join are similar in that they both combine data from two sources based on common values.
The result of a KSQL join is a new stream or table that’s populated with the column values that you specify in a SELECT statement.
KSQL also supports several aggregate functions, like COUNT and SUM.
You can use these to build stateful aggregates on streaming data.

In this exercise, you will create one persistent query that enriches the orders stream with customer information using a stream-table join.
You will create another persistent query that detects fraudulent behavior by counting the number of orders in a given window.

If you are running on a local install, then type ksql to get to the KSQL CLI prompt.
If you are running on Docker, then type docker-compose exec ksql-cli ksql http://ksql-server:8088 to get to the KSQL CLI prompt.

Assume you already have a KSQL stream of orders called orders and a KSQL table of customers called customers_table.
From the KSQL CLI prompt, type DESCRIBE orders; and DESCRIBE customers_table; to see the respective schemas.
Then create the following persistent queries:

TODO 7.1: create a new KSQL stream that does a stream-table join between orders and customers_table based on customer id.

TODO 7.2: create a new KSQL table that counts if a customer submits more than 2 orders in a 30-second time window.
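
The counting logic behind TODO 7.2 can be reasoned about with a plain-Java model, shown here instead of KSQL so the exercise itself is left to you. It assumes tumbling (fixed, non-overlapping) 30-second windows, which the text above does not specify. The class name and the `{customerId, timestampMs}` event encoding are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class WindowedCountSketch {
    static final long WINDOW_MS = 30_000L; // 30-second window
    static final int THRESHOLD = 2;        // flag when the count exceeds this

    // Each event is {customerId, timestampMs}. Returns the ids of customers
    // with more than THRESHOLD orders inside any single window.
    static Set<Long> suspiciousCustomers(long[][] events) {
        Map<String, Integer> countsByCustomerAndWindow = new HashMap<>();
        Set<Long> flagged = new TreeSet<>();
        for (long[] event : events) {
            long customerId = event[0];
            // Align the timestamp to the start of its tumbling window.
            long windowStart = event[1] - (event[1] % WINDOW_MS);
            String key = customerId + "@" + windowStart;
            int count = countsByCustomerAndWindow.merge(key, 1, Integer::sum);
            if (count > THRESHOLD) {
                flagged.add(customerId);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        long[][] events = {
            {1, 1_000}, {1, 5_000}, {1, 20_000},   // three orders in one window
            {2, 10_000}, {2, 29_000}, {2, 31_000}  // two orders, then a new window
        };
        System.out.println(suspiciousCustomers(events));
    }
}
```

Customer 2 also submits three orders within 30 seconds of each other, but they straddle a window boundary, so a tumbling window does not flag them. That trade-off is worth keeping in mind when you choose a window type.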