Domain Driven Design

I can’t remember exactly how I came across CQRS and Event Sourcing, but it was somewhere around 2010. I spent a year or so really digging into it and getting familiar with the concepts before I eventually had the opportunity (or balls) to try it on a real project. At that time I, along with probably every man and his dog, was still using ORMs like NHibernate. Though I was acutely aware of the pain and cost it incurred, I didn’t really know how to do it any differently. Discovering the concepts in CQRS and Event Sourcing was the beginning of a journey that, bit by bit, changed my mindset to the point where even if I had to work on a project that for all intents and purposes had no real relationship to such ideas, I would build things differently in order to at least make these projects somewhat maintainable, and a bit less painful to work on.

On this very blog, about four years ago, I described in a series of articles the idea of messages flowing through code, an idea that had (and still has) a massive impact on the structure of your typical CRUD application. Even now, somewhat refined, I still use those techniques because they add value both to me and to anyone else having to read my code. The result is far fewer layers, more vertical slices (or end-to-end feature folders) and the ability to delete code without having to worry about the impact on the rest of the application. This one ability alone has the greatest payoff because it means I can rapidly rewrite features and the rest of the code won’t know or care.

Fast-forward to 2017. For me, these ideas and the way they materialised were a good way to move on from the old, painful world of CRUD and ORM mapping, but the large majority of developers stuck in their boring old Enterprise jobs are still churning out heavily layered CRUD apps, cookie-cutter style. Talk to most of them about CQRS and/or Event Sourcing and their eyes will glaze over. They’ll look at you like you’re speaking a foreign language. After seven years or so, I feel like I have a pretty good grasp of what it’s all about, so for those who don’t but want a quick high-level overview, here’s what I think you should know:

1. Event sourcing is not a technical alternative to mapping DTOs with an ORM

When you first talk to a developer about event sourcing they usually think about it in technical terms. Saving events vs saving DTOs but this is completely missing the point. Event sourcing is about capturing user intent/actions that are IMPORTANT to the business even if they don’t know HOW that data is important just yet. When they figure it out you’ll be ready to help them dig into it by building projections off that historical event data so they can answer questions right here, right now. This leads me on to the next point.
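To make that a little more concrete, here’s a minimal sketch in Python (not the language of any project discussed here). It assumes a simple in-memory list standing in for the event store, and `CustomerCancelledOrder` and `cancellations_per_product` are hypothetical names invented for the example. The point is only that the projection can be written long after the events were captured.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerPlacedOrder:
    customer_id: str
    product: str


@dataclass(frozen=True)
class CustomerCancelledOrder:  # hypothetical event, for illustration only
    customer_id: str
    product: str


# Append-only history of what users actually did (an in-memory stand-in
# for a real event store).
event_log = [
    CustomerPlacedOrder("c1", "kettle"),
    CustomerPlacedOrder("c2", "toaster"),
    CustomerCancelledOrder("c1", "kettle"),
    CustomerPlacedOrder("c1", "toaster"),
]


def cancellations_per_product(events):
    """A projection added later: which products get cancelled most?"""
    counts = Counter()
    for event in events:
        if isinstance(event, CustomerCancelledOrder):
            counts[event.product] += 1
    return counts


print(cancellations_per_product(event_log))
```

Nobody had to decide up front that cancellations mattered. The question arrived later and the history was already there to answer it.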

2. Data structure vs behaviour

When your ORM or your database is at the centre of your world it’s hard to think about anything but structure and how you’re going to map the data back into objects, and we all know about the object-relational impedance mismatch, right? Remember those books by Bruce Eckel – Thinking in Java, Thinking in C++, etc? What you actually need to do is start “Thinking in Behaviours” (come on Bruce, a great book title right there!). Events are the results of actions being performed in the domain you’re working in. When working with domain experts you can use language that both sides understand, because events such as CustomerPlacedOrder are plain English and let you get to the heart of what a system should DO, not what it should store. Thinking in behaviours leads to breakthroughs in understanding.

3. The view of the world changes over time

How often, working on a large codebase are you required to add in some extra functionality because a requirement has arisen that nobody foresaw? Now it means you’re probably going to have to jump through hoops to make it all work, you’ll struggle trying to introduce tests for it, and you’ll probably have just made the codebase that little bit dirtier. You’re fighting a losing battle with software entropy. Systems that prioritise structure and state are brittle. They do not cope easily with changes over time. However, with CQRS/ES this is fairly trivial. Working with the domain expert we probably arrive at some new behaviour that results in a bunch of new events, probably from a newly discovered aggregate. Combining these with some existing events we can write a new denormaliser to project those events into new reports, new screens, allowing the read side of the system to evolve independently.
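As a rough sketch of what such a denormaliser might look like (Python, in-memory, with hypothetical names throughout): `OrderDispatched` plays the part of the newly discovered event, and `AwaitingDispatchReport` is a new read model that combines it with the existing `CustomerPlacedOrder` events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerPlacedOrder:  # event the system was already recording
    order_id: str


@dataclass(frozen=True)
class OrderDispatched:  # hypothetical newly discovered event
    order_id: str


class AwaitingDispatchReport:
    """New read model: orders that were placed but not yet dispatched."""

    def __init__(self):
        self.order_ids = set()
        self._handlers = {
            CustomerPlacedOrder: self._on_placed,
            OrderDispatched: self._on_dispatched,
        }

    def apply(self, event):
        handler = self._handlers.get(type(event))
        if handler is not None:  # events this report ignores fall through
            handler(event)

    def _on_placed(self, event):
        self.order_ids.add(event.order_id)

    def _on_dispatched(self, event):
        self.order_ids.discard(event.order_id)


history = [
    CustomerPlacedOrder("o1"),
    CustomerPlacedOrder("o2"),
    OrderDispatched("o1"),
]

report = AwaitingDispatchReport()
for event in history:
    report.apply(event)

print(report.order_ids)
```

The write side is untouched by this. The report is just another consumer of the event stream, so it can be added, changed or thrown away on its own.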

4. DDD does not apply to monoliths

Remember when you first heard the term agile (no, not the marketing crap that it’s become)? I’m talking about how the idea that the waterfall approach that everyone knows is bad if applied over say six months, makes a whole load of sense if applied over a week, possibly two at most? Basically, we squeeze the tried and trusted steps of analysis -> design -> implementation -> test -> deployment down to a much smaller time frame. The same is true of Domain Driven Design. Developers learning DDD will try to apply the ideas at the macro level meaning everywhere across the entire system and before they know it they’re well on their way to another big mess that someone else will have to pick up and work with. Domain Driven Design should be applied at a smaller scale, in a bounded context (one or more but not everywhere). The determining factor being a context where the business derives great value. DDD is costly and even more so if applied indiscriminately. The great thing about DDD though is that it really does fit like a glove with the idea of Aggregates and Event Sourcing. Both encourage thinking about behaviour, and language. This leads me to point 5.

5. Get your granularity right

I like to think of the pieces of a properly designed system at different levels of granularity ranging from the micro to the macro. So for instance, at the micro level we have:

* Aggregates.

Aggregates represent a transactional boundary and are responsible for enforcing invariants and not allowing themselves to be bypassed. This is why in a properly designed Aggregate you will be unlikely to find state exposed via properties. Instead only behaviours which in turn emit our events. What does it mean to enforce invariants in a transactional boundary? As an example, imagine our aggregate (representing a customer, say) has behaviours that are only allowed if the customer is over 18. Let’s say we need to correct the customer’s date of birth. This has implications for those behaviours. If when we correct the birth date it turns out the customer is no longer allowed to perform those operations we need that state within the aggregate (e.g. IsAdult) to be consistent with the date of birth. What this means from a technical point of view is that the two should never get out of sync. We never allow the IsAdult state to be eventually consistent, it must always be immediately consistent.

* Services

Zooming out a bit from the view of aggregates we arrive at services, which in DDD parlance represent a bounded context. So here we can apply all the goodness of Domain Driven Design within this small, narrowly focused service if, of course, it is rich enough to be of business value. In this bounded context the idea of the ubiquitous language will apply. If you find a particular term changing meaning depending on who you speak to, then that’s probably a sign that your service is too large and you should look to see how it can be split into two different bounded contexts. Without getting into the whole service vs micro-service argument, the point is that these contexts should not be so large that they can’t easily be torn down and rewritten or replaced within a very small period of time. Some of these services/contexts will be ideal candidates for event sourcing. Others will be a bit more mundane and can be implemented in a much more trivial manner. Don’t sweat the details here. They’re small enough that you can replace, refine or rewrite them. They don’t need to be perfect. They just need to be stable with a consistent API.

* Top level architecture

Zooming out again we end up looking at how these services/bounded contexts communicate. This will likely be via messages in what is known as an EDA or Event-Driven Architecture. At this level of granularity CQRS and/or ES do not apply. At this point all communication is async and you are totally in the realms of eventually consistent views of the world. When you get here you’re in a position to take advantage of even bigger views of the world, i.e. the cloud, containers (e.g. Docker), and the ability to scale up and down as required.

Summary

As I get older, working on large codebases becomes an exercise in frustration because my brain seems less able to understand the big (massive) picture of the whole system. I can’t keep it all in my head! Breaking a system down and having knowledge of business events as opposed to technical mappings is a much stronger glue for system understanding in my opinion. Whilst there are likely many more parts to the whole, each one is manageable and digestible at least from a coding point of view (orchestration is another matter). Division of labour among teams is much easier without falling over each other in source control systems too.

One idea I really like and think of as a benefit of Event Sourcing is the occasionally connected client such as apps on phones and tablets. Even using ES here could reap real rewards. Capturing events in an app allows the user to continue working even when they lose signal. At some point later on when they reconnect, those events can be transmitted and the system brought up to date.
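A minimal sketch of that idea in Python, assuming a hypothetical `send` callable that raises `ConnectionError` when there is no signal (`OfflineClient` and `FakeServer` are invented for the example):

```python
class OfflineClient:
    """Records events locally and forwards them whenever it can."""

    def __init__(self, send):
        self._send = send  # assumed to raise ConnectionError when offline
        self._pending = []

    @property
    def pending_count(self):
        return len(self._pending)

    def record(self, event):
        self._pending.append(event)  # the user keeps working regardless
        self.sync()

    def sync(self):
        """Send queued events in order; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False
            self._pending.pop(0)  # only dropped once it has been sent
        return True


class FakeServer:
    def __init__(self):
        self.online = False
        self.received = []

    def receive(self, event):
        if not self.online:
            raise ConnectionError("no signal")
        self.received.append(event)


server = FakeServer()
client = OfflineClient(server.receive)
client.record("StockCounted")   # queued, no signal
client.record("PhotoAttached")  # queued, still no signal
server.online = True
client.sync()
print(server.received)
```

An event is only removed from the queue after the send returns, so a real system built this way could deliver the same event twice (for example if the acknowledgement is lost) and the receiving side would need to cope with duplicates.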

CQRS and Event Sourcing are not new ideas. They weren’t new when I first came across them, yet even now they are not well known among the majority of developers. I understand they can look complex from the outside, but like anything you look at for the first time, there’s a learning curve. You just need to get stuck in and start playing with the ideas and it’ll start to click.

The one thing in common though with all these ideas is that it requires you to THINK and that’s probably the hardest thing of all to learn.

I’ve kind of neglected my blog just recently as I’ve been so damn busy, and too tired to do anything else, but I’m trying to put it right, starting with this post. The culprit for occupying so much of my time and thoughts is a new system we’ve been developing for our Procurement department who, believe it or not, here in the 21st Century, run it all off Excel. That needed to change, so we set off on a journey to understand the often complex world they inhabit, apply some DDD principles, and try to deliver a system that meets their needs. Despite some hiccups along the way we’re nearly there and we should be going live shortly. This post is a sort of look back at what we’ve done and how we did it.

The last six or seven months or so have been interesting to say the least. From the start I felt CQRS and Event Sourcing would be a good fit for this project (I’ve been learning and prototyping the concepts for the last couple of years) and I set about attempting to sell the idea to the rest of the team. I read a tweet by Eric Evans recently that said it’s been five years since Greg Young first started talking about CQRS and Event Sourcing. It’s been around quite a while now, but to most people, including those I needed to convince, it’s something alien and a little bit frightening. However, fair play to them, it was adopted with relatively few concerns, and after a while people started to see progress and became more comfortable with the ideas. We’ve had to change course a few times as our understanding developed, and one change in particular was quite drastic, but the chosen architecture meant we had very little friction when doing so.

There’s something beautiful to me about the CQRS + ES style. Everywhere I look throughout the codebase, I get a very strong picture of how the application is held together. It’s very obvious what the responsibilities are of each class and namespace i.e. Commands, CommandHandlers, Events, EventHandlers, Projections, etc and it’s easy to picture the flow of the messages through the system. This is in stark contrast to the typical n-tier, multi-layered approach where to be honest I’ve never seen anything that hasn’t resembled a big ball of mud to some degree. And behind that elegant structure sits a database that stores events, an append-only log of all the actions that were performed in the system. This too has benefits. Apart from being an audit log that you can rely on as to what actually happened in your system, it reduces the cognitive load on your brain because you only need to know that your database contains events. You don’t need to think in terms of entities sitting in one table, joined to entities sat in another and the relationships between them and what the data actually means. You don’t need to worry about efficient fetching strategies in an ORM to get the data for a query (because you don’t query it!). For aggregates, you simply load up their events, and replay them to get back to current state, and you’re done. On the read side, for view models, it’s an entirely different choice. For us they’re stored as documents in RavenDB and again, simple to retrieve and bind to the screen.
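The load-and-replay step can be sketched in a few lines of Python. This is an in-memory illustration only, not how our system or RavenDB does it, and `Requisition`, `ItemAdded` and `ItemRemoved` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemAdded:
    sku: str
    quantity: int


@dataclass(frozen=True)
class ItemRemoved:
    sku: str


class InMemoryEventStore:
    """Append-only streams of events, keyed by aggregate id."""

    def __init__(self):
        self._streams = defaultdict(list)

    def append(self, aggregate_id, events):
        self._streams[aggregate_id].extend(events)

    def load(self, aggregate_id):
        return list(self._streams[aggregate_id])


class Requisition:
    def __init__(self):
        # Exposed here only so the example can show the rebuilt state.
        self.lines = {}

    @classmethod
    def from_history(cls, events):
        aggregate = cls()
        for event in events:  # replay in the order the events happened
            aggregate._when(event)
        return aggregate

    def _when(self, event):
        if isinstance(event, ItemAdded):
            self.lines[event.sku] = self.lines.get(event.sku, 0) + event.quantity
        elif isinstance(event, ItemRemoved):
            self.lines.pop(event.sku, None)


store = InMemoryEventStore()
store.append("r1", [ItemAdded("pens", 10), ItemAdded("paper", 5)])
store.append("r1", [ItemAdded("pens", 5), ItemRemoved("paper")])

requisition = Requisition.from_history(store.load("r1"))
print(requisition.lines)
```

There are no joins and no fetching strategy to tune: the only read the write side ever performs is “give me the events for this id”.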

Whilst most CQRS reading material tends to focus on asynchronous, highly scalable distributed systems where transactions are a no-no, you don’t have to do it that way. There is nothing prescriptive about CQRS. Trade-offs are everywhere depending on your needs. For us, going with transactions was one such trade-off. Our target audience for the Procurement application is fairly small, somewhere around 8 users or so to begin with. Taking on board the complexity that comes with every command being asynchronous was too high a price to pay in our situation. We felt it best to keep things simple and familiar as we started down our path, so we went with the familiarity of transactions and synchronous requests and, of course, the sky hasn’t fallen on our heads; it works just fine. So we lose the ability to scale (or suffer the DTC – Ugh!) but I don’t think we’re going to have that problem with this application. And of course, the first law of distributed computing is don’t distribute ;). I know too that, should the need ever arise to go async with our commands, it’s not going to be some massive refuctoring of the application.

So, contrary to the name of this blog, in terms of our view models we are immediately consistent, but for other parts of the system we do take advantage of eventual consistency. All our events are published on a Bus using MSMQ. Subscribing to those events are a reporting service, an email service, and an SLA service. The email service is as simple as you’d expect. All it does is sit and listen for particular events and then send an email on our application’s behalf. The SLA service is about the users acting upon particular events within a given timeframe. We use Sagas to track the passage of time between events and, again, send emails out as necessary. The report service is a little more interesting. It outputs the event data into denormalised SQL Server tables to serve traditional business reporting needs. Again this is another benefit of the CQRS approach. Instead of one RDBMS trying to be both an OLTP and OLAP system, as you will often find, we are now able to use the right tool for the job at hand. This allowed us to shape the reporting database in an OLAP-style star schema because its only responsibility is to serve reports. What really makes the Event Sourcing part of the project come alive is when you get to the point where you’re able to replay all your captured events back from the beginning of time and push them through the event handlers to see your reporting database rebuilt. Very cool.
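To give a flavour of the SLA idea, here’s a deliberately simplified Python sketch. It is not our saga implementation: real saga frameworks usually schedule timeout messages, whereas this one is simply polled with the current time, and the event names and the two-day window are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RequestSubmitted:  # hypothetical event
    request_id: str
    at: datetime


@dataclass(frozen=True)
class RequestApproved:  # hypothetical event
    request_id: str
    at: datetime


class ApprovalSlaSaga:
    """Tracks how long each request has been waiting for approval."""

    def __init__(self, allowed=timedelta(days=2)):
        self._allowed = allowed
        self._deadlines = {}

    def handle(self, event):
        if isinstance(event, RequestSubmitted):
            self._deadlines[event.request_id] = event.at + self._allowed
        elif isinstance(event, RequestApproved):
            self._deadlines.pop(event.request_id, None)  # SLA met, stop tracking

    def overdue(self, now):
        """Requests whose deadline has passed without an approval."""
        return sorted(
            request_id
            for request_id, deadline in self._deadlines.items()
            if now > deadline
        )


saga = ApprovalSlaSaga()
saga.handle(RequestSubmitted("r1", datetime(2024, 1, 1, 9, 0)))
saga.handle(RequestSubmitted("r2", datetime(2024, 1, 1, 9, 0)))
saga.handle(RequestApproved("r1", datetime(2024, 1, 2, 9, 0)))

# Whatever sends the reminder emails would ask this question periodically.
print(saga.overdue(datetime(2024, 1, 4, 9, 0)))
```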

We’re using a single database, RavenDB, for both our events and view models simply because we don’t need to scale our reads. RavenDB is a great choice and, when it comes to NoSQL databases, is one of only a few that support transactions. Working with it has shown me just how much more productive a person can be when they don’t have to fight the object-relational impedance mismatch. The more I use it the more I discover just how useful it really is. For instance, whilst finishing up the last few remaining parts of the system, I began to think about how, going forward, we would migrate our events as and when they change to support new features of the application. Truth be told this was one of those areas I had little experience in, even though I was comfortable with what was required. My first thought was that I would version my events and make the aggregate handle the new event as well as the old one. They’d look something like this.
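What follows is only an illustrative sketch in Python, not code from the project: `SupplierAdded`, `SupplierAddedV2` and `SupplierList` are hypothetical names, and the extra `country` field is an invented example of a change that would force a new version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplierAdded:  # original event, still sitting in the event store
    name: str


@dataclass(frozen=True)
class SupplierAddedV2:  # new version carrying an extra field
    name: str
    country: str


class SupplierList:
    def __init__(self):
        self.suppliers = {}
        self.uncommitted_events = []

    @classmethod
    def from_history(cls, events):
        aggregate = cls()
        for event in events:
            aggregate._when(event)
        return aggregate

    def add_supplier(self, name, country):
        # Going forward, only the new version is ever raised.
        event = SupplierAddedV2(name, country)
        self._when(event)
        self.uncommitted_events.append(event)

    def _when(self, event):
        # One handler per version has to be kept around.
        if isinstance(event, SupplierAdded):
            self.suppliers[event.name] = "unknown"
        elif isinstance(event, SupplierAddedV2):
            self.suppliers[event.name] = event.country


history = [SupplierAdded("Acme"), SupplierAddedV2("Globex", "GB")]
suppliers = SupplierList.from_history(history)
suppliers.add_supplier("Initech", "US")
print(suppliers.suppliers)
```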

This would mean that during the loading of events into the aggregate the old handler would be invoked for the old events in the database, but the application, going forward, would only ever raise the new event. The problem with this approach is that the aggregate can get quite bloated if you end up versioning your events frequently, as you have to keep all the handlers around for all the different versions. An alternative would be some kind of in-place upgrade on the fly as and when the system encounters the old events, but with RavenDB there is yet another option, and that’s the Patching API. This allows you to make set-based changes to your JSON documents as a one-off operation. It will allow us to just go ahead and modify the class of our old event to its new form without having to introduce any new event types or fill our aggregates with new handlers. Originally, I had thought I would need to write a utility that would take my changes and call that API, but now with RavenDB v2.0 this patching can be done within the Raven Studio UI itself. There is now a Patch tab that lets you write your patch to upgrade a document and even test it so you can see the result without actually applying the change. You can change existing properties, add new ones, delete old ones, etc. When you’re ready you can choose to apply the patch to all the applicable documents in one operation. Having only just discovered this functionality over the last couple of days, I am excited by how simple the process now looks, and I’m almost certain that this is the approach we’ll be taking. We will change our events as necessary, patch the existing documents to match the new event and then deploy. Simples.
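The shape of that patch-then-deploy idea can be shown without RavenDB at all. The sketch below is plain Python over a list of dictionaries and is emphatically not the RavenDB Patching API: `patch_documents`, the document layout and the `country` field are all assumptions for illustration. The `dry_run` flag mimics the idea of testing a patch before applying it.

```python
import copy


def patch_documents(documents, applies_to, patch, dry_run=False):
    """Apply `patch` to every document matching `applies_to`.

    With dry_run=True the originals are left untouched and patched copies
    are returned, so the result can be inspected first.
    """
    target = copy.deepcopy(documents) if dry_run else documents
    changed = []
    for document in target:
        if applies_to(document):
            patch(document)
            changed.append(document)
    return changed


# Stored events as JSON-like documents (hypothetical layout).
stored_events = [
    {"type": "SupplierAdded", "name": "Acme"},
    {"type": "SupplierAdded", "name": "Globex", "country": "GB"},
    {"type": "OrderRaised", "number": 42},
]


def needs_country(document):
    return document["type"] == "SupplierAdded" and "country" not in document


def add_country(document):
    document["country"] = "unknown"


preview = patch_documents(stored_events, needs_country, add_country, dry_run=True)
print("would change:", preview)

patch_documents(stored_events, needs_country, add_country)
print("after patch:", stored_events)
```

After a patch like this, every stored event of that type has the new shape, so the aggregate only needs the one handler.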

In retrospect, whilst I’m happy overall, there are some decisions we took that I’m not too enamoured with. One of them is that we chose to write this as an ASP.NET MVC application. As this app is mainly for internal use I think we should have made it a traditional desktop application, whether WinForms or WPF. Just like the often repeated message within the community to get people to stop and consider their choice of database rather than blindly going with an RDBMS, I think the same consideration should be given to the application itself. Anyone who knows me knows I’m not really a fan of writing web front ends. I personally don’t get a lot of enjoyment out of writing reams of HTML and JavaScript to create a rich UI, with all the fudges involved regarding different browsers, etc. I think desktop applications still have a place in the world, and applications for internal use are one such scenario. Having said that, I don’t think it would be a big job to put a rich client UI on the application if we needed or wanted to, but it’s unlikely that will ever happen now. Lesson definitely learned though.

The other decision is potentially more serious and is one of coupling to another bounded context. Basically at some point the user submits data to another application, an existing enterprise RDBMS that deals with orders for the company. In my opinion, this should have been done asynchronously through messaging. There is absolutely no need for our new application to have any knowledge of the other system whatsoever, and should the interface to that system ever change it will mean we will have to update and redeploy our new system too, but, alas, we’ve done it, and we’ll have to live with that decision, at least for the time being.

Finally, just as in any application with a degree of domain complexity, discovering what the true aggregates are for this application was quite a lot of work, and took some time to get right. Understanding the difference between an aggregate and an object graph is essential in order to ensure your transactional boundaries are correct, and for that I have to thank the work done by Vaughn Vernon in his Effective Aggregate Design essays. They’re well worth reading (and re-reading).

All the architecture patterns in the world can’t help you if you don’t capture the things important to the people who will be using your application. This is essentially what caused us to take such a drastic turn in the middle of the project. We basically had one of those “breakthrough” moments when all became a lot clearer. The result of that breakthrough was that we threw away a lot of the code we’d already written but the clear separation gained from going down the CQRS route meant we had relatively little trouble in adapting and changing course. It also helped that we have a large suite of unit and integration tests to help keep us on the straight and narrow.

Overall, the combination of CQRS, Event Sourcing, and RavenDB has made this probably the most enjoyable project I’ve ever worked on. To take an idea from start to finish using these architectural principles and modern database technologies has been a fantastic ride and I’ve learned so much. Combined with some of the messaging techniques I’ve picked up over the last two years or so, the way I write code has changed becoming more functional in nature and it’s allowed me to visualise ways of writing simpler, more composable, and testable code. Some of those ideas I was able to apply to this project to good effect and it’s something I plan to blog about more in the near future. Would I do it again? Absolutely, given a domain with enough complexity. Yes, it requires effort and you certainly have to think more compared to a traditional CRUD application, but the result I feel is worth it given that you end up with a more maintainable and flexible application.