Event Sourcing in a Pinch

Let’s talk about Event Sourcing. Perhaps you’ve heard of it, but haven’t found the time to attend a conference talk or read one of the older, larger books that describe it. It’s one of those topics I wish I’d known about sooner, and today I’m going to describe it to you the way I understand it.

Most of this code can be found on GitHub. I’ve tested it using PHP 7.1.

I’ve chosen this title for a few reasons. Firstly, I don’t consider myself an expert on the topic. For that, you’d be hard pressed to find a better tutor than the authors of those books, or someone like Mathias Verraes. What I’m about to tell you is only the tip of the iceberg. A pinch of salt, if you will.

Event sourcing is also part of a larger, broader set of topics, loosely defined as Domain Driven Design. Event sourcing is one design pattern amongst many, and you’d do well to learn about the other patterns associated with DDD. In fact, it’s often not a good idea to pluck just Event Sourcing out of the DDD toolbox without understanding the benefits of the other patterns.

Still, I think it’s a fascinating and fun exercise, and few people cover it well. It’s especially suited for those developers who have yet to dip their toes in the pool of DDD. So, if you find yourself needing something like Event Sourcing, but don’t know or understand the rest of DDD, I hope this post helps you. In a pinch.

Common Language

One of the strongest themes of Domain Driven Design is the need for a common language. When your client decides they need a new application, they are thinking about how it will affect their ice-cream sales. They’re concerned about how their patrons will find their favorite flavor of ice-cream, and how that will affect foot-traffic at their ice-cream stand.

You may think in terms of website users and geolocated outlets, but those words don’t necessarily mean anything to your client. Though it may take some time initially, your communication with your client will be greatly improved if you both use the same words when talking about the same things.

You’ll also find that modeling the entire system in the words your client understands gives you a bit of a safety net against scope changes. It’s much easier to say “You initially asked for customers to purchase ice-cream before the invoice is sent (shown here in code and email), but now you’re asking for the invoice to be sent first…” than it is to describe the changes they’re asking for in language/code only you understand.

That’s not to say all your code needs to be understood by the client, or that you have to use something like Behat for your integration testing. But, at the very least, you should call entities and actions the same thing as your client does.

An added benefit of this is that future developers will be able to understand the intent of the code (and how it applies to the business process), without as much help from the client or project manager.

I’m waffling a bit, but this point will be important when we start to write code.

Storing State vs. Storing Behavior

Most of the websites I’ve built have had some form of CRUD (Create, Read, Update, and Delete) database functionality. These operations are intentionally generic, as they have traditionally mapped directly onto the INSERT, SELECT, UPDATE, and DELETE statements of the underlying relational database.

This is enough for the most basic presentation of ice-cream information on the client’s website. It’s how we’ve been building websites for ages. But it has a significant weakness — we don’t know what happened to get us here.
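To make that weakness concrete, here is a small sketch in which a plain PHP array stands in for a products table (the columns and the updateProduct function are hypothetical). A typical update overwrites the old value, and the history goes with it:

```php
<?php

// A hypothetical products table: rows keyed by ID, prices in cents.
$products = [
    1 => ['name' => 'Chocolate', 'cents' => 450, 'stock' => 20],
];

// A typical CRUD update: merge the new values over the old ones.
function updateProduct(array $products, int $id, array $changes): array
{
    $products[$id] = array_merge($products[$id], $changes);

    return $products;
}

$products = updateProduct($products, 1, ['stock' => 12]);

// We know the stock is now 12, but nothing records why it dropped from 20:
// sales, giveaways, and transfers to another outlet all look the same.
echo $products[1]['stock'], PHP_EOL; // 12
```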

Let’s think of some things which could influence how the data got to this point:

When did we start selling “Chocolate”? Many Object-Relational Mapper (ORM) libraries will add fields like created_at and updated_at, but those only go so far in telling us what we want to know.

How did we get that much stock? Did we get a delivery? Did we give some away?

What happens to our analytics when we no longer want to sell “Chocolate”, or when we want to move all stock to another outlet? Do we add a boolean field (to the products table) to indicate that the product is no longer sold, but should remain in the analytics? Or perhaps we should add a timestamp, so we know when that all happened…

Storing Behavior

The weakness is that we only know what the data is like now. Our data is like a photo, when what we want is a video. What if we tried something different?
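Rather than overwriting rows, we could record each change as its own entry and derive the current numbers from those entries. A rough sketch with plain arrays (the event names and fields are hypothetical):

```php
<?php

// Each entry records something that happened, in the order it happened.
$events = [
    ['type' => 'product-invented', 'product' => 'Chocolate', 'date' => '2017-01-01'],
    ['type' => 'outlet-stocked', 'product' => 'Chocolate', 'amount' => 20, 'date' => '2017-01-02'],
    ['type' => 'stock-given-away', 'product' => 'Chocolate', 'amount' => 3, 'date' => '2017-01-03'],
    ['type' => 'product-sold', 'product' => 'Chocolate', 'amount' => 5, 'date' => '2017-01-04'],
];

// The current stock level is derived by replaying every event.
function stockLevel(array $events, string $product): int
{
    $stock = 0;

    foreach ($events as $event) {
        if ($event['product'] !== $product) {
            continue;
        }

        if ($event['type'] === 'outlet-stocked') {
            $stock += $event['amount'];
        } elseif (in_array($event['type'], ['stock-given-away', 'product-sold'], true)) {
            $stock -= $event['amount'];
        }
    }

    return $stock;
}

// A question the products table couldn't answer: how much did we give away?
function givenAway(array $events, string $product): int
{
    $total = 0;

    foreach ($events as $event) {
        if ($event['type'] === 'stock-given-away' && $event['product'] === $product) {
            $total += $event['amount'];
        }
    }

    return $total;
}

echo stockLevel($events, 'Chocolate'), PHP_EOL; // 12
echo givenAway($events, 'Chocolate'), PHP_EOL;  // 3
```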

If we recorded each of these changes as it happened, we could answer all of those questions without any extra boolean/timestamp fields. We could even come back to already-stored data and create a new kind of report. That’s so valuable!

So Which Is It?

Event Sourcing is both of these things. It’s about capturing every event (which you can think of as every change in application data) as a self-contained, repeatable thing. It’s about storing these events in the same time-order they happened, so that we can journey to any point in time at will.

It’s about understanding how to interface this architecture with other systems that aren’t built in the same way, which means having a way to represent just the latest application data state.

The events are append-only, which means we never delete any of them from the database. And, if we’re doing things right, they describe (in their names and properties) what they mean to the business and customer they relate to.
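In code, append-only can be as simple as giving the log no way to change or remove what it already holds. A minimal in-memory sketch; the EventLog class is hypothetical, and a database-backed store would likewise only ever INSERT:

```php
<?php

// A hypothetical, in-memory, append-only event log.
final class EventLog
{
    /** @var array[] events, in the order they were recorded */
    private $events = [];

    // The only way to change the log is to add to the end of it.
    public function append(string $type, array $payload): void
    {
        $this->events[] = [
            'type' => $type,
            'payload' => $payload,
            'sequence' => count($this->events) + 1,
        ];
    }

    // Arrays are copied by value, so callers get a copy and can't rewrite history.
    public function all(): array
    {
        return $this->events;
    }
}

$log = new EventLog();
$log->append('product-invented', ['name' => 'Chocolate']);
$log->append('product-priced', ['name' => 'Chocolate', 'cents' => 450]);

$copy = $log->all();
$copy[0]['payload']['name'] = 'Vanilla'; // only changes the local copy

echo $log->all()[0]['payload']['name'], PHP_EOL; // Chocolate
```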

Making Events

We’re going to use classes to describe events. They’re useful, simple containers we can define; and they’ll help us validate the data we put in and the data we get out for each event.

Those experienced in Event Sourcing may be itching to hear how I describe things like aggregates. I’m intentionally avoiding jargon, in much the same way as I’d avoid differentiating between mocks, doubles, stubs, and fakes if I were teaching someone their first bit of testing. It’s the idea that is important, and the idea behind Event Sourcing is recording behavior.

It’s really important (in my opinion) that event classes are simple. Using PHP 7 type hints, we can validate the data we use to define events. A handful of simple accessors will help us get the important data out again.
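A minimal sketch of such a base class and one event built on it. The class names (Event, ProductPriced) and the date format are assumptions, not necessarily the article’s exact code:

```php
<?php

// A minimal abstract event.
abstract class Event
{
    /** @var DateTimeImmutable */
    private $date;

    public function __construct()
    {
        // The base class alone decides when an event is dated.
        $this->date = new DateTimeImmutable();
    }

    // The only way to read the date, for subclasses and storage code alike.
    public function date(): string
    {
        return $this->date->format('Y-m-d H:i:s');
    }

    // Event-specific data, as an associative array.
    abstract public function payload(): array;
}

// A hypothetical event: type hints validate the data going in,
// and simple accessors get it back out.
final class ProductPriced extends Event
{
    private $name;
    private $cents;

    public function __construct(string $name, int $cents)
    {
        parent::__construct();
        $this->name = $name;
        $this->cents = $cents;
    }

    public function name(): string
    {
        return $this->name;
    }

    public function cents(): int
    {
        return $this->cents;
    }

    public function payload(): array
    {
        return ['name' => $this->name, 'cents' => $this->cents];
    }
}

try {
    new ProductPriced('Chocolate', 'cheap'); // not an integer
} catch (TypeError $error) {
    echo 'Rejected: ', get_class($error), PHP_EOL;
}
```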

On top of this class, we can define the real event types we want to record:
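A sketch of a few such event types. The names ProductInvented, OutletOpened, and OutletStocked are hypothetical, and the base class is repeated so the example runs on its own:

```php
<?php

// The base class, repeated so this example runs on its own.
abstract class Event
{
    private $date;

    public function __construct()
    {
        $this->date = new DateTimeImmutable();
    }

    public function date(): string
    {
        return $this->date->format('Y-m-d H:i:s');
    }

    abstract public function payload(): array;
}

// Each event type is final, so it can't be subclassed and grow complicated.
final class ProductInvented extends Event
{
    private $name;

    public function __construct(string $name)
    {
        parent::__construct();
        $this->name = $name;
    }

    public function payload(): array
    {
        return ['name' => $this->name];
    }
}

final class OutletOpened extends Event
{
    private $name;

    public function __construct(string $name)
    {
        parent::__construct();
        $this->name = $name;
    }

    public function payload(): array
    {
        return ['name' => $this->name];
    }
}

final class OutletStocked extends Event
{
    private $outlet;
    private $product;
    private $amount;

    public function __construct(string $outlet, string $product, int $amount)
    {
        parent::__construct();
        $this->outlet = $outlet;
        $this->product = $product;
        $this->amount = $amount;
    }

    public function payload(): array
    {
        return ['outlet' => $this->outlet, 'product' => $this->product, 'amount' => $this->amount];
    }
}

$events = [
    new ProductInvented('Chocolate'),
    new OutletOpened('Pier Stand'),
    new OutletStocked('Pier Stand', 'Chocolate', 20),
];

foreach ($events as $event) {
    echo get_class($event), ' ', json_encode($event->payload()), PHP_EOL;
}
```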

Notice how we’ve made each of these final? We have to fight to keep the events simple, and they wouldn’t continue to be simple if another developer could come along and subclass them (for whatever reason).

I also find it interesting how we can isolate the definition, format, and accessibility of the event dates: by defining $date as private and requiring subclasses to access it through the date method. This is perhaps a tad too defensive, but it obeys the Law of Demeter in that the concrete events need not know how the date is defined or formatted, in order to use it.

With this isolation, we can change the entire system’s timezone, or change to using UNIX timestamps, and we’d only need to change a single line of code.
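A sketch of how that isolation might look: the format lives in one private constant on the base class, and every date is normalized to one timezone in the constructor. The DATE_FORMAT constant and the optional $date parameter (added here only so the output is predictable) are assumptions:

```php
<?php

abstract class Event
{
    // The one line to change to switch formats, e.g. 'U' for UNIX timestamps.
    private const DATE_FORMAT = 'Y-m-d H:i:s';

    private $date;

    public function __construct(?DateTimeImmutable $date = null)
    {
        $date = $date ?? new DateTimeImmutable();

        // Every date is normalized to one timezone, chosen in this one place.
        $this->date = $date->setTimezone(new DateTimeZone('UTC'));
    }

    // Concrete events call date() and never see how it is stored or formatted.
    public function date(): string
    {
        return $this->date->format(self::DATE_FORMAT);
    }

    abstract public function payload(): array;
}

final class OutletOpened extends Event
{
    private $name;

    public function __construct(string $name, ?DateTimeImmutable $date = null)
    {
        parent::__construct($date);
        $this->name = $name;
    }

    public function payload(): array
    {
        return ['name' => $this->name];
    }
}

$event = new OutletOpened(
    'Pier Stand',
    new DateTimeImmutable('2017-03-01 12:00:00', new DateTimeZone('UTC'))
);

echo $event->date(), PHP_EOL; // 2017-03-01 12:00:00
```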

We could omit these classes, but then we’d either have to check associative arrays at runtime (sacrificing performance) or give up type safety.
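For comparison, a sketch of the class-free alternative: events as associative arrays, checked by hand at runtime every time they are used. The validateEvent function and the schema format are hypothetical:

```php
<?php

// Expected keys and gettype() names for each (hypothetical) event type.
const EVENT_SCHEMAS = [
    'product-invented' => ['name' => 'string'],
    'outlet-stocked' => ['outlet' => 'string', 'product' => 'string', 'amount' => 'integer'],
];

// Throws if the array doesn't match its schema; runs on every event, every time.
function validateEvent(array $event): void
{
    $type = $event['type'] ?? null;

    if (!is_string($type) || !array_key_exists($type, EVENT_SCHEMAS)) {
        throw new InvalidArgumentException('Unknown event type');
    }

    foreach (EVENT_SCHEMAS[$type] as $key => $expected) {
        if (!array_key_exists($key, $event)) {
            throw new InvalidArgumentException("Missing key: {$key}");
        }

        if (gettype($event[$key]) !== $expected) {
            throw new InvalidArgumentException("Key {$key} should be {$expected}");
        }
    }
}

validateEvent(['type' => 'product-invented', 'name' => 'Chocolate']); // passes

try {
    validateEvent(['type' => 'outlet-stocked', 'outlet' => 'Pier', 'product' => 'Chocolate', 'amount' => '20']);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Key amount should be integer
}
```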

Storing Events

Let’s store these events in a SQLite database. We could use an ORM for that, but perhaps this is a good opportunity to recap how PDO works.

Using PDO

The first bit of code, for connecting to any supported database through PDO, is:
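A minimal version of that connection code, assuming the pdo_sqlite extension is installed:

```php
<?php

// Connect to an in-memory SQLite database via its DSN.
$connection = new PDO('sqlite::memory:');

// Throw exceptions on SQL errors instead of failing quietly.
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

echo $connection->getAttribute(PDO::ATTR_DRIVER_NAME), PHP_EOL; // sqlite
```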

PDO connections are typically made using a Data Source Name (DSN). Here we define the database type as sqlite, and the location as an in-memory database. This means the database will disappear as soon as the script finishes.

It’s also a good idea to set the error mode to throw exceptions when a SQL error occurs. That way we’ll get immediate feedback on our mistakes.

We’ll need a few tables. One of them is going to be where we generate and store unique product identifiers. The exact syntax of CREATE TABLE differs slightly between database types, and you’d typically find more columns in a table.
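In SQLite, such an ID table can get by with a single column. A sketch, with product and outlet as assumed table names:

```php
<?php

$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// In SQLite, "INTEGER PRIMARY KEY" makes the column an alias for the rowid,
// so a unique value is generated on every insert. AUTOINCREMENT also stops
// the IDs of deleted rows from being reused.
$connection->exec('CREATE TABLE product (id INTEGER PRIMARY KEY AUTOINCREMENT)');
$connection->exec('CREATE TABLE outlet (id INTEGER PRIMARY KEY AUTOINCREMENT)');

// For comparison, a MySQL version of the same column:
// CREATE TABLE product (id INT AUTO_INCREMENT PRIMARY KEY)
```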

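A great way to learn how your database creates tables is to make a table through a GUI, and then ask the database for the statement that would recreate it. In MySQL, that’s SHOW CREATE TABLE my_new_table. Not every database PDO supports has that command, though: PostgreSQL and SQLite, for example, don’t. SQLite instead keeps each table’s CREATE TABLE statement in the sql column of its sqlite_master table.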
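For example, this sketch creates a table and then reads its stored CREATE TABLE statement back out of SQLite’s sqlite_master table:

```php
<?php

$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$connection->exec('CREATE TABLE my_new_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');

// SQLite's stand-in for SHOW CREATE TABLE: the schema table keeps the SQL text.
$statement = $connection->prepare("SELECT sql FROM sqlite_master WHERE type = 'table' AND name = :name");
$statement->execute(['name' => 'my_new_table']);

$sql = $statement->fetchColumn();
echo $sql, PHP_EOL;
```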

Prepared statements (using prepare and execute) are the recommended way of executing SQL queries. They are even more useful when you need to pass query parameters:
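A sketch of a parameterized insert, using a hypothetical event_product_priced table. The values travel separately from the SQL, and the same prepared statement is reused for a second row:

```php
<?php

$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->exec('CREATE TABLE event_product_priced (product INTEGER, cents INTEGER, date TEXT)');

// Named placeholders keep values out of the SQL string, so they can't be
// used for SQL injection.
$statement = $connection->prepare('
    INSERT INTO event_product_priced (product, cents, date)
    VALUES (:product, :cents, :date)
');

$statement->execute(['product' => 1, 'cents' => 450, 'date' => '2017-03-01 12:00:00']);

// The same prepared statement, executed again with different values.
$statement->execute(['product' => 1, 'cents' => 500, 'date' => '2017-03-02 12:00:00']);

$rows = $connection
    ->query('SELECT cents FROM event_product_priced ORDER BY date')
    ->fetchAll(PDO::FETCH_COLUMN);

print_r($rows);
```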

In addition to a table for each event, I’ve also added tables to store and generate product and outlet IDs. Each event table has a date field, the value of which is generated by the abstract Event class.

The store function is just a convenience. PHP has no concept of typed arrays, so we could add runtime checking, or use the signature of storeOne to validate that we’re only trying to store Event subclass instances.
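A sketch of what store and storeOne could look like. It assumes one table per event type (the table names are hypothetical), trims the event classes down to a date plus a payload array, and stores names directly rather than the foreign keys the article uses:

```php
<?php

// Trimmed-down events: the base class stamps the date and holds the payload.
abstract class Event
{
    private $date;
    private $payload;

    protected function __construct(array $payload)
    {
        $this->date = new DateTimeImmutable();
        $this->payload = $payload;
    }

    public function date(): string
    {
        return $this->date->format('Y-m-d H:i:s');
    }

    public function payload(): array
    {
        return $this->payload;
    }
}

final class ProductInvented extends Event
{
    public function __construct(string $name)
    {
        parent::__construct(['name' => $name]);
    }
}

final class ProductPriced extends Event
{
    public function __construct(string $name, int $cents)
    {
        parent::__construct(['name' => $name, 'cents' => $cents]);
    }
}

// One (hypothetical) table per event type.
const EVENT_TABLES = [
    ProductInvented::class => 'event_product_invented',
    ProductPriced::class => 'event_product_priced',
];

// Stores a single event. The Event type hint rejects anything else.
function storeOne(PDO $connection, Event $event): void
{
    $table = EVENT_TABLES[get_class($event)];
    $row = $event->payload() + ['date' => $event->date()];

    // Table and column names come from our own code, never from user input;
    // the values themselves are bound as parameters.
    $columns = implode(', ', array_keys($row));
    $placeholders = ':' . implode(', :', array_keys($row));

    $connection
        ->prepare("INSERT INTO {$table} ({$columns}) VALUES ({$placeholders})")
        ->execute($row);
}

// A convenience for storing many events; storeOne checks each one's type.
function store(PDO $connection, array $events): void
{
    foreach ($events as $event) {
        storeOne($connection, $event);
    }
}

$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->exec('CREATE TABLE event_product_invented (name TEXT, date TEXT)');
$connection->exec('CREATE TABLE event_product_priced (name TEXT, cents INTEGER, date TEXT)');

store($connection, [
    new ProductInvented('Chocolate'),
    new ProductPriced('Chocolate', 450),
]);
```

If you’d rather have PHP check every element up front, a variadic signature such as `function store(PDO $connection, Event ...$events)` type-checks each argument; callers would then write `store($connection, ...$events)`.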

We can get specific event data via the payload method. This data will differ based on the event class being stored, so we should only assume keys after we’re sure which event type we’re dealing with.
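Because payload() returns different keys for different events, it is safest to narrow the type before reading any keys. A sketch using instanceof; the base class is trimmed down to a payload holder, and the event classes are hypothetical:

```php
<?php

abstract class Event
{
    private $payload;

    protected function __construct(array $payload)
    {
        $this->payload = $payload;
    }

    public function payload(): array
    {
        return $this->payload;
    }
}

final class ProductInvented extends Event
{
    public function __construct(string $name)
    {
        parent::__construct(['name' => $name]);
    }
}

final class ProductPriced extends Event
{
    public function __construct(string $name, int $cents)
    {
        parent::__construct(['name' => $name, 'cents' => $cents]);
    }
}

// Only read payload keys once we know which event type we have.
function describe(Event $event): string
{
    $payload = $event->payload();

    if ($event instanceof ProductInvented) {
        return "Invented {$payload['name']}";
    }

    if ($event instanceof ProductPriced) {
        return sprintf('Priced %s at %d cents', $payload['name'], $payload['cents']);
    }

    // Unknown event types are reported rather than guessed at.
    throw new InvalidArgumentException('Unsupported event: ' . get_class($event));
}

echo describe(new ProductInvented('Chocolate')), PHP_EOL;    // Invented Chocolate
echo describe(new ProductPriced('Chocolate', 450)), PHP_EOL; // Priced Chocolate at 450 cents
```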

We’re also using some product and outlet helper methods. Here’s what they look like:
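A sketch of the two ID helpers, assuming product and outlet tables that hold nothing but an auto-incrementing id column, and helpers that take the connection as a parameter (the article’s own signatures may differ). DEFAULT VALUES is SQLite syntax for inserting a row made entirely of defaults:

```php
<?php

$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->exec('CREATE TABLE product (id INTEGER PRIMARY KEY AUTOINCREMENT)');
$connection->exec('CREATE TABLE outlet (id INTEGER PRIMARY KEY AUTOINCREMENT)');

// Insert an empty row so the database generates a unique ID, then return it.
// lastInsertId() returns a string, so we cast it.
function newProductId(PDO $connection): int
{
    $connection->exec('INSERT INTO product DEFAULT VALUES');

    return (int) $connection->lastInsertId();
}

function newOutletId(PDO $connection): int
{
    $connection->exec('INSERT INTO outlet DEFAULT VALUES');

    return (int) $connection->lastInsertId();
}

$first = newProductId($connection);
$second = newProductId($connection);
$outlet = newOutletId($connection); // separate table, so its IDs start at 1

echo "{$first} {$second} {$outlet}", PHP_EOL; // 1 2 1
```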

inventProduct, priceProduct, openOutlet, and stockOutlet are all pretty self-explanatory. In order to get the IDs they refer to, we need the newProductId and newOutletId functions. These insert empty rows so that unique identifiers will be generated and can be returned (using the $connection->lastInsertId() method).

You do not have to follow this same naming pattern. In fact, it’s better to use names and patterns that you and your client agree define the core concepts of the product, as far as DDD is concerned.

Projecting Events

As we’ve seen, storing behavior gives us a view of the entire history of our data. It’s not very good for rendering views, though. As I mentioned, we also need a way to interface an Event Sourcing architecture with other systems that are not built in the same way.

That means we need to be able to tell the outside world what the most recent state of the application is, as if we were storing it like that in the database. This is often called projection, because we sort through all the events to display a final state for everyone else to see. It isn’t forecasting, though: a projection is derived entirely from events that have already happened.

I believe one of the biggest hurdles Event Sourcing newcomers face is not knowing how to realistically apply it to their situation. It doesn’t help to talk about the theory of Event Sourcing if we don’t also talk about how to use it well!

We need these *NameFromId and *IdFromName functions because we want to create and present the events using entity names, but store them as foreign keys in the database. That’s just a personal preference of mine, and you’re free to define, present, and store them however makes sense to you.

This code is similar to the code we use to store events. For each type of event, we modify the array of entities. After all the events have been projected, we should have the latest state. Here’s what the other projector methods look like:
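A self-contained sketch of projector functions in that style. It uses hypothetical event classes and refers to entities by name rather than by ID, so no *NameFromId lookups are needed; each projector takes the current $entities array and one event, and returns the updated array:

```php
<?php

abstract class Event
{
    private $payload;

    protected function __construct(array $payload)
    {
        $this->payload = $payload;
    }

    public function payload(): array
    {
        return $this->payload;
    }
}

final class ProductInvented extends Event
{
    public function __construct(string $name)
    {
        parent::__construct(['name' => $name]);
    }
}

final class ProductPriced extends Event
{
    public function __construct(string $name, int $cents)
    {
        parent::__construct(['name' => $name, 'cents' => $cents]);
    }
}

final class OutletStocked extends Event
{
    public function __construct(string $outlet, string $product, int $amount)
    {
        parent::__construct(['outlet' => $outlet, 'product' => $product, 'amount' => $amount]);
    }
}

// Each projector makes its mark on the $entities array for one event type.
function projectProductInvented(array $entities, ProductInvented $event): array
{
    $name = $event->payload()['name'];
    $entities['products'][$name] = ['name' => $name, 'cents' => null];

    return $entities;
}

function projectProductPriced(array $entities, ProductPriced $event): array
{
    $payload = $event->payload();
    $entities['products'][$payload['name']]['cents'] = $payload['cents'];

    return $entities;
}

function projectOutletStocked(array $entities, OutletStocked $event): array
{
    $payload = $event->payload();
    $current = $entities['outlets'][$payload['outlet']][$payload['product']] ?? 0;
    $entities['outlets'][$payload['outlet']][$payload['product']] = $current + $payload['amount'];

    return $entities;
}

// Apply every event, in order, to reach the latest state.
function project(array $events, array $entities = ['products' => [], 'outlets' => []]): array
{
    foreach ($events as $event) {
        if ($event instanceof ProductInvented) {
            $entities = projectProductInvented($entities, $event);
        } elseif ($event instanceof ProductPriced) {
            $entities = projectProductPriced($entities, $event);
        } elseif ($event instanceof OutletStocked) {
            $entities = projectOutletStocked($entities, $event);
        }
    }

    return $entities;
}

$entities = project([
    new ProductInvented('Chocolate'),
    new ProductPriced('Chocolate', 450),
    new OutletStocked('Pier Stand', 'Chocolate', 20),
    new OutletStocked('Pier Stand', 'Chocolate', 5),
]);

print_r($entities);
```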

Each of these projection methods accepts a type of event, and sorts through the event payload to make its mark on the $entities array.

At this point, we can use the structure we’ve created to populate a website. Since our projectors accept events, we can even generate an initial projection (at the beginning of a request) and then apply any new events to it as they happen.
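A sketch of that idea, using a hypothetical StockProjection class and a hypothetical ScoopSold event: the projection is built once from the stored history, then each new event is applied to it directly, without replaying everything:

```php
<?php

abstract class Event
{
    private $payload;

    protected function __construct(array $payload)
    {
        $this->payload = $payload;
    }

    public function payload(): array
    {
        return $this->payload;
    }
}

final class OutletStocked extends Event
{
    public function __construct(string $outlet, string $product, int $amount)
    {
        parent::__construct(['outlet' => $outlet, 'product' => $product, 'amount' => $amount]);
    }
}

final class ScoopSold extends Event
{
    public function __construct(string $outlet, string $product)
    {
        parent::__construct(['outlet' => $outlet, 'product' => $product]);
    }
}

// Holds the latest stock levels and accepts events one at a time.
final class StockProjection
{
    private $stock = [];

    public function apply(Event $event): void
    {
        if ($event instanceof OutletStocked) {
            $payload = $event->payload();
            $this->adjust($payload['outlet'], $payload['product'], $payload['amount']);
        } elseif ($event instanceof ScoopSold) {
            $payload = $event->payload();
            $this->adjust($payload['outlet'], $payload['product'], -1);
        }
        // Other event types don't affect stock, so they're ignored.
    }

    public function stock(string $outlet, string $product): int
    {
        return $this->stock[$outlet . '/' . $product] ?? 0;
    }

    private function adjust(string $outlet, string $product, int $change): void
    {
        $key = $outlet . '/' . $product;
        $this->stock[$key] = ($this->stock[$key] ?? 0) + $change;
    }
}

// At the start of a request: build the projection from stored history.
$history = [
    new OutletStocked('Pier Stand', 'Chocolate', 20),
    new ScoopSold('Pier Stand', 'Chocolate'),
];

$projection = new StockProjection();
foreach ($history as $event) {
    $projection->apply($event);
}

// Later in the request a new event happens: after storing it (not shown),
// we apply just that one event.
$projection->apply(new ScoopSold('Pier Stand', 'Chocolate'));

echo $projection->stock('Pier Stand', 'Chocolate'), PHP_EOL; // 18
```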

As you can probably guess, this isn’t the most efficient way to query a database, just to render a website. If you need the projections a lot of the time, it might be better to periodically project your events and then store the resulting structure in denormalized database tables.
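One way to do that, sketched with a hypothetical outlet_stock table: rebuild the denormalized table from a fresh projection inside a transaction. Only the read table is cleared and refilled; the append-only event tables are never modified:

```php
<?php

$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// A denormalized read table: one row per outlet and product.
$connection->exec('
    CREATE TABLE outlet_stock (
        outlet TEXT NOT NULL,
        product TEXT NOT NULL,
        stock INTEGER NOT NULL,
        PRIMARY KEY (outlet, product)
    )
');

// Replace the stored projection with a freshly computed one, all or nothing.
function saveProjection(PDO $connection, array $stock): void
{
    $connection->beginTransaction();

    try {
        $connection->exec('DELETE FROM outlet_stock');
        $insert = $connection->prepare(
            'INSERT INTO outlet_stock (outlet, product, stock) VALUES (:outlet, :product, :stock)'
        );

        foreach ($stock as $outlet => $products) {
            foreach ($products as $product => $amount) {
                $insert->execute(['outlet' => $outlet, 'product' => $product, 'stock' => $amount]);
            }
        }

        $connection->commit();
    } catch (Throwable $e) {
        $connection->rollBack();
        throw $e;
    }
}

// Sample projected data, hard-coded for this example.
saveProjection($connection, [
    'Pier Stand' => ['Chocolate' => 25, 'Vanilla' => 10],
]);

// Views can now use a plain, fast query.
$rows = $connection
    ->query('SELECT product, stock FROM outlet_stock ORDER BY product')
    ->fetchAll(PDO::FETCH_ASSOC);

print_r($rows);
```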

That way, you can capture events (through things like API requests or form posts), and still query “normal” database tables, when displaying data in an application.

Projecting Specifically

So far, we’ve seen how we can describe, store, and project (to the latest state). Projecting to a specific point in time is just a matter of adjusting the projection functions, so that they apply events up to, or after, a certain timestamp.

We’ve covered way more than I originally planned to, so I’ll leave that last bit as an exercise for you. Think of how you could model the draft/working and published versions of content that are traditional in CMS applications.

Summary

If you’ve managed to get this far; well done! It’s been a long journey, but worth it I feel. Let us know what you like or don’t like about this design pattern. If you want to learn more about Event Sourcing (or DDD in general), definitely check out the books linked at the start.