

How to implement a solution using Kafka as a distributed database and Kafka Streams as the glue between different services, and how to apply some Domain-Driven Design concepts to ensure data integrity and define the boundaries of each service.

3.
WHAT TO EXPECT?
● To meet ScoutWorks :)
● Tales about business requirements
● A brief introduction to some Kafka & Kafka Streams conventions
● See how we designed our architecture
● Talk about resilience in a functional architecture

5.
OUR DOMAIN
● The core of our domain is listings
● Images are one of the main sources of information in a listing
● Dealers want to export those listings to other marketplaces

6.
OUR PRODUCT
A system able to export dealers’ high-quality listings
to other marketplaces to improve their visibility on the market.

7.
BUSINESS REQUIREMENTS
● A dealer can enable and disable the export process
● All active listings of a dealer will be exported
● Exported listings that become inactive or are deleted should be hidden
on external marketplaces

8.
MORE BUSINESS REQUIREMENTS
● It’s acceptable not to have the latest listing information exported in real time,
but it should eventually be updated
● It’s important to have all listings on external marketplaces ASAP to ensure
visibility
● The listing data format is dynamic, so it should be possible to reprocess a
listing and export it again

12.
WHAT IS KAFKA?
● Distributed streaming platform
● Records are published to topics, which are made up of partitions
● Each partition is an append-only (*) structured commit log
● A record consists of a partition key, a value, and a timestamp, and is assigned
an offset, i.e. the position of the record in the log
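The bullet points above can be sketched as a tiny model of one partition. This is not the Kafka client API, just an illustration of how an append-only log assigns each record the next offset:

```python
# Illustrative sketch only (not the Kafka client API): a topic partition
# modeled as an append-only log that assigns each record an offset,
# i.e. its position in the log.
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str          # partition key
    value: str
    timestamp: int
    offset: int = -1  # assigned by the log on append


@dataclass
class Partition:
    records: list = field(default_factory=list)

    def append(self, record: Record) -> int:
        # Append-only: records are never updated in place, only added,
        # so the offset is simply the current length of the log.
        record.offset = len(self.records)
        self.records.append(record)
        return record.offset


p = Partition()
first = p.append(Record(key="listing-42", value="created", timestamp=1))
second = p.append(Record(key="listing-42", value="updated", timestamp=2))
print(first, second)  # 0 1
```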

14.
WHY KAFKA?
Kafka is often used for building real-time streaming applications
that transform or react to the streams of data.

15.
WHY KAFKA?
● Listing change propagation fits the Kafka streaming mindset very well
● Possibility to go back in time and reprocess records if needed
● Enables developers to design by thinking in terms of a composition of small functions

20.
FUNCTIONS ARE
● Atomic: functions run once and completely; they cannot be
interrupted
● Composable: functions can be chained, generating more abstract
and business-related algebras
● State-ignorant: state is shared as a parameter, avoiding mutable
state between functions
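The three properties can be illustrated with a short sketch. The listing transformations here are hypothetical, not taken from the actual system: each step is a pure function that takes its state as a parameter (state-ignorant), runs to completion (atomic), and is chained into a larger business-level pipeline (composable):

```python
# Hypothetical listing transformations illustrating atomic, composable,
# state-ignorant functions.
from functools import reduce


def normalize(listing: dict) -> dict:
    # Runs once, completely; returns a new dict instead of mutating input.
    return {**listing, "title": listing["title"].strip().lower()}


def enrich(listing: dict) -> dict:
    # State arrives as a parameter; nothing is shared mutably.
    return {**listing, "exportable": listing.get("active", False)}


def compose(*fns):
    # Chain small functions into a more abstract, business-related one.
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)


prepare_for_export = compose(normalize, enrich)
result = prepare_for_export({"title": "  BMW 320d ", "active": True})
print(result)  # {'title': 'bmw 320d', 'active': True, 'exportable': True}
```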

21.
CONSISTENCY BOUNDARIES
● Can only be ensured within a single partition
● Are degraded when repartitioning

22.
AGGREGATE ROOT
● Is the boundary of consistency
● Is a set of records in a single topic with the same partition key
● Represents a single business object (for example, a Listing)
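The key-to-partition mapping is what makes this work. A hedged sketch of the idea (Kafka's default partitioner hashes the key with murmur2; plain `hash()` here stands in for illustration): all records sharing a key, such as one Listing's events, land in the same partition and are therefore totally ordered relative to each other.

```python
# Illustration of key-based partitioning (Kafka really uses a murmur2
# hash; Python's hash() is a stand-in for the idea).
def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic: the same key always maps to the same partition.
    return hash(key) % num_partitions


events = ["created", "price_changed", "deactivated"]
partitions = {partition_for("listing-42", 6) for _ in events}
print(len(partitions))  # 1 — every event of the aggregate shares a partition
```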

29.
KAFKA
For every topic with a replication factor of N,
Kafka tolerates failures of up to N − 1 nodes.
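The arithmetic behind the claim: with replication factor N there are N copies of each partition, so up to N − 1 of the brokers holding those replicas can fail and the data is still served from a survivor.

```python
# Replication factor N means N copies of each partition, so the data
# survives as long as at least one replica remains.
def tolerated_failures(replication_factor: int) -> int:
    return replication_factor - 1


print(tolerated_failures(3))  # 2
```

Note that write availability can be stricter: with `acks=all`, producers need `min.insync.replicas` brokers alive, which may be fewer failures than N − 1.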

30.
KAFKA STREAMS
● One-node setup: after coming back up, it picks up where processing stopped
● Multi-node setup: other nodes take over, but…
○ Stateless processor: continues working as soon as nodes are rebalanced
○ Stateful processor, simple setup: can take a while until the state is rebuilt
○ Stateful processor, hot-standby setup: local state is being built up, but records are
not actually processed until failover happens

31.
LEARNINGS
● A function signature should be unique (only one function should be
responsible for a single transformation)
● Functions, by design, should not stay within a single domain, but
map between two domains
● The consistency boundary is a partition (or a single aggregate root)
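The "map between two domains" learning can be sketched as a single, uniquely named transformation from the listing domain to the marketplace domain. The type shapes and the external-id scheme here are assumptions for illustration, not the real system's model:

```python
# Hypothetical domain types: one function, and only this function, is
# responsible for mapping the listing domain to the marketplace domain.
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:            # our domain
    listing_id: str
    title: str
    active: bool


@dataclass(frozen=True)
class MarketplaceAd:      # external marketplace domain (assumed shape)
    external_id: str
    headline: str
    visible: bool


def listing_to_ad(listing: Listing) -> MarketplaceAd:
    # Inactive listings become hidden ads rather than being dropped,
    # matching the "hide on external marketplaces" requirement.
    return MarketplaceAd(
        external_id=f"export-{listing.listing_id}",  # id scheme is made up
        headline=listing.title,
        visible=listing.active,
    )


ad = listing_to_ad(Listing("42", "BMW 320d", active=False))
print(ad.visible)  # False
```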

32.
LEARNINGS
● A system can be seen as a composition of functions, but data needs
to be managed by an external system.
● As with functions, we should test transformations, not side-effects.
● Adding a correlation id on data sources is really useful for tracing, but
boundaries should be chosen carefully.
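"Test transformations, not side-effects" might look like the sketch below: the pure mapping is unit-tested directly on input and output, while the Kafka producer/consumer wiring around it stays out of the test. The function and field names are hypothetical:

```python
# Hypothetical pure transformation: hiding listings that are no longer
# active, per the business requirements.
def hide_inactive(listing: dict) -> dict:
    return {**listing, "visible": listing.get("status") == "active"}


# The test exercises only input -> output: no broker, no HTTP, no state.
assert hide_inactive({"id": "42", "status": "active"})["visible"] is True
assert hide_inactive({"id": "42", "status": "deleted"})["visible"] is False
print("ok")
```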

33.
LEARNINGS
● Kafka Streams should not be used for external I/O. For example, if
you need a service that makes HTTP requests, use another streaming
engine for that (we used Akka Streams).
● Kafka Streams’ learning curve is really steep.
● Kafka Streams and Kafka, by default, are not there yet for medium-sized
messages (like ~50 KB). You will need to tweak and optimize the
configuration.
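The deck does not say which settings were tuned; the sketch below collects the knobs that commonly matter for ~50 KB records. The keys are real Kafka configuration names, but the values are purely illustrative:

```python
# Real Kafka configuration keys; values are illustrative only, not a
# recommendation — the right numbers depend on the workload.
medium_message_tuning = {
    # Broker side: the maximum accepted record size (~1 MB by default).
    "message.max.bytes": 1_048_576,
    # Producer side: requests and batches must fit the larger records.
    "max.request.size": 1_048_576,
    "batch.size": 262_144,
    "compression.type": "lz4",          # shrink ~50 KB payloads on the wire
    # Consumer side: allow fetches large enough to make progress.
    "max.partition.fetch.bytes": 1_048_576,
}
print(sorted(medium_message_tuning))
```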

34.
LEARNINGS
● Backpressure is a natural fit, as functions are pull-based.
● Single-direction data flow is a mindset that needs to be learned and
practiced.