Basic Event Sourcing in Clojure

Today I am going to talk about basic event sourcing without all the buzz and
fuss. You don’t need Kafka, opaque containers, external providers, or a
fancy distributed, fault-tolerant cluster setup to do it.

In Clojure and ClojureScript, it’s very common to store app state in a
single atom. What do you do when you want persistence? The default is to start
integrating with some SQL database. This requires additional software, you
have to set up schemas, and the data model is different from the native
Clojure one.

What else can you do? For things that run on a single server, we can use files.
Files are great and are generally underestimated. Hacker News runs on files, for
example. One approach is to persist the entire atom to file whenever we change
it. It might look something like this:

Basic atom db persistence
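A minimal sketch of this idea (the add-cookie transition here is illustrative, matching the example output below):

```clojure
(def db (atom {}))

(defn add-cookie [state email]
  (update-in state [email :cookies] (fnil inc 0)))

;; update the in-memory state, then persist the whole value to disk
(swap! db add-cookie "foo@bar.com")
(spit "app.db" (pr-str @db))
```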

We would probably want to do the swap! and spit in some form of transaction
function. We can verify that the db has been persisted correctly:

$ cat app.db
{"foo@bar.com" {:cookies 5}}

On startup we simply run something like:

;; load db into memory
(reset! db (read-string (slurp "app.db")))

One thing that is missing so far is atomicity, ensuring that the file doesn’t
get corrupted. We can solve this by writing to a temporary file and then
renaming it once we know the data is on disk. See Brandon Bloom’s post
Slurp and Spit.
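One way to sketch the write-then-rename trick (a more careful version would also flush/fsync before renaming):

```clojure
(import '[java.io File]
        '[java.nio.file Files StandardCopyOption])

(defn spit-atomic
  "Write content to a temp file in the same directory, then rename it
  over path, so readers never observe a half-written file."
  [path content]
  (let [target (File. path)
        dir    (or (.getParentFile target) (File. "."))
        tmp    (File/createTempFile "app-db" ".tmp" dir)]
    (spit tmp content)
    (Files/move (.toPath tmp) (.toPath target)
                (into-array java.nio.file.CopyOption
                            [StandardCopyOption/ATOMIC_MOVE]))))

(spit-atomic "app.db" (pr-str {"foo@bar.com" {:cookies 5}}))
```

The temp file is created in the same directory as the target so the rename stays on one filesystem, which is what makes the move atomic.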

Event sourcing

Event sourcing simply means that the source of truth consists of a series of
events. We apply these events using some function that creates an aggregate view
of what we are interested in. In Clojure code:
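A sketch of what this could look like (the event shapes and the :remove-cookie event are illustrative):

```clojure
;; the source of truth: an ordered log of events
(def events
  [{:type :add-cookie    :email "foo@bar.com"}
   {:type :add-cookie    :email "foo@bar.com"}
   {:type :remove-cookie :email "foo@bar.com"}])

(defn add-cookie [state {:keys [email]}]
  (update-in state [email :cookies] (fnil inc 0)))

(defn remove-cookie [state {:keys [email]}]
  (update-in state [email :cookies] (fnil dec 0)))

(defn apply-event [state event]
  (case (:type event)
    :add-cookie    (add-cookie state event)
    :remove-cookie (remove-cookie state event)))

;; the aggregate view is just a fold over the log
(reduce apply-event {} events)
;; => {"foo@bar.com" {:cookies 1}}
```

The aggregate only shows a net count of 1, but the log still records that two cookies were added and one removed.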

This information would’ve been lost if we just kept mutating the database.
Another feature this enables is that we can change our schema easily. Let’s
say we want to put all the user data under a :users key. All we have to do is
change add-cookie:
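Sketching that change (assuming, as before, that events are maps with an :email key):

```clojure
;; the only change: user data now lives under :users
(defn add-cookie [state {:keys [email]}]
  (update-in state [:users email :cookies] (fnil inc 0)))
```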

and re-build our aggregate state. We don’t risk losing any data because we never
touch our real source of truth, events. In fact, you can have multiple aggregate
states for multiple purposes, if you so wish.

Trade-offs

With this approach you have to be careful with how you define your events so
you can always read them. This is easier if you define schemas and write your
state transition functions so that they can still read old events, for
example by using default values.
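For instance, if newer :add-cookie events gained an :amount field (hypothetical here), a destructuring default lets old events still replay:

```clojure
;; old events have no :amount; default it to 1 so they still apply
(defn add-cookie [state {:keys [email amount] :or {amount 1}}]
  (update-in state [email :cookies] (fnil + 0) amount))

(add-cookie {} {:email "foo@bar.com"})            ;; old-style event
;; => {"foo@bar.com" {:cookies 1}}
(add-cookie {} {:email "foo@bar.com" :amount 3})  ;; new-style event
;; => {"foo@bar.com" {:cookies 3}}
```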

Writing the state transition functions can get hairy and requires some
discipline. Threading functions for state is your friend.
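For example, a hypothetical transition that touches several parts of the state reads much better threaded than nested:

```clojure
;; thread the state through each step instead of nesting updates
(defn place-order [state {:keys [email order-id]}]
  (-> state
      (update-in [email :orders] (fnil conj []) order-id)
      (update-in [email :cookies] (fnil dec 0))))

(place-order {"foo@bar.com" {:cookies 5}}
             {:email "foo@bar.com" :order-id 1})
```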

There are some edge cases to think about when updating the event log and the
in-memory database at the same time, so you don’t end up with an inconsistent
state. Validate your data.
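One possible ordering, sketched below (names are illustrative): append the event to the log first, then fold it into the in-memory aggregate, so an event is never visible in memory without being on disk.

```clojure
(def db (atom {}))

(defn apply-event [state {:keys [email]}]
  (update-in state [email :cookies] (fnil inc 0)))

;; append to the log first; only then update the in-memory aggregate
(defn transact-event! [event]
  (spit "events.log" (str (pr-str event) "\n") :append true)
  (swap! db apply-event event))

(transact-event! {:type :add-cookie :email "foo@bar.com"})
```

The inverse failure mode remains: if the process dies between the spit and the swap!, the log is ahead of memory, which a replay on startup repairs.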

Not being in standard SQL makes it harder to use external tools.

Configure your own backups. Unlike, say, RDS, you have to keep your own
backups.

Doesn’t work for multiple servers. You can write your own event server
and do it that way, but it gets hairy quickly. This is when you might want to
look into something more elaborate.

Not great for a lot of data. If you can’t keep your data in memory you are
going to run into issues. Hint: if you haven’t tried, your data most likely
fits in RAM.

Slow startup time. If you have a lot of events, loading the data might be
slow. This can be solved by snapshotting state.
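One way to sketch snapshotting (names are illustrative; apply-event stands in for your real transition function): persist the aggregate together with how many events it has absorbed, then replay only the newer events on startup.

```clojure
;; a minimal apply-event, standing in for the real transition function
(defn apply-event [state {:keys [email]}]
  (update-in state [email :cookies] (fnil inc 0)))

;; persist the aggregate plus the number of events folded into it
(defn snapshot! [state events-applied]
  (spit "snapshot.db" (pr-str {:state state :events-applied events-applied})))

;; on startup, restore the snapshot and replay only the newer events
(defn restore [events]
  (let [{:keys [state events-applied]} (read-string (slurp "snapshot.db"))]
    (reduce apply-event state (drop events-applied events))))
```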

On the other hand, you get something quick to work with, you keep the entire
history, you don’t need additional integrations, you can easily grep your data,
and it’s very flexible.