When we think of modern data processing, we often think of batch-oriented ecosystems like Hadoop, including processing engines like Spark. However, the sooner we can extract useful information from our data, the better, which is driving an evolution towards stream processing or “fast data”. Many of the legacy tools, including Spark, provide various levels of support for stream processing, but deeper architectural changes are emerging.

We’ll work through code examples that use Akka Streams and Kafka Streams with Kafka to implement a machine-learning example in which the model is updated periodically, simulating the problem of periodic retraining and serving of ML models in a streaming context. In particular, if you periodically retrain the model using one tool chain, for example, once a day, how do you incorporate the updated model into a running pipeline for scoring without restarting the pipeline?
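One common way to approach this question is to treat model updates as just another stream that is merged with the data stream, so the scoring stage swaps in a new model whenever one arrives. The following is a minimal, library-free sketch of that idea in plain Python, not the Akka Streams or Kafka Streams API. The names `ModelUpdate`, `Record`, and `score_stream` are hypothetical and chosen only for illustration, and the "models" are stand-in functions rather than trained models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple, Union


@dataclass(frozen=True)
class ModelUpdate:
    """A newly (re)trained model arriving on the model stream (hypothetical type)."""
    version: int
    predict: Callable[[float], float]


@dataclass(frozen=True)
class Record:
    """A data record to be scored (hypothetical type)."""
    value: float


Event = Union[ModelUpdate, Record]
Scored = Tuple[float, Optional[int], Optional[float]]


def score_stream(events: Iterable[Event]) -> Iterator[Scored]:
    """Score records with whichever model is current, swapping models in flight.

    The input is assumed to be a single merged stream of model updates and
    data records, in arrival order. Yields (value, model_version, score).
    """
    current: Optional[ModelUpdate] = None
    for event in events:
        if isinstance(event, ModelUpdate):
            # A new model arrived: replace the old one without stopping the loop.
            current = event
        elif current is None:
            # No model has arrived yet, so the record passes through unscored.
            yield (event.value, None, None)
        else:
            yield (event.value, current.version, current.predict(event.value))


# Two model versions arrive while records keep flowing.
events = [
    Record(1.0),
    ModelUpdate(1, lambda x: 2 * x),
    Record(2.0),
    ModelUpdate(2, lambda x: x + 10),
    Record(3.0),
]
results = list(score_stream(events))
for row in results:
    print(row)
```

Each record is scored by the most recent model seen before it, and the loop never stops to pick up a new model. In a real pipeline the two inputs would typically be separate Kafka topics, and the current model would be held as state inside the stream-processing stage.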

Outline/Structure of the Workshop

In this hands-on workshop, we’ll start with a brief overview of the characteristics of streaming architectures:

Use cases driving this evolution to streaming architectures.

Kafka (or emerging alternatives) as the data backplane, to capture data streams as logs between producers and consumers.

When you should use the feature-rich and highly scalable processing engines, like Spark and Flink.

When you should use the more flexible and lower-latency data processing libraries, like Kafka Streams and Akka Streams, inside microservices.

Target Audience

Developers working on applications that can be modelled using state machines (e.g. web applications) who want to test more complex properties of their software.