Richard Drizin

Old dogs can learn new tricks indeed

Product Extensibility - Domain Objects and CRUD

This is the second part in a series of posts looking at Product Extensibility in .NET Framework. In the first part of this series, I argued that it’s a viable business idea to run a highly customizable SaaS product where, instead of developing a full-fledged PaaS for customers to do their own customizations (which wouldn’t pay off unless you have tens of thousands of customers), you develop and maintain the customizations yourself. To do that, you follow some architecture principles so that you don’t end up with a completely orphaned codebase for each customer (which would be a costly maintenance hell).

In this second part I’ll introduce some architectural foundations and tools that allow us to extend our Domain Objects (and their CRUD) with both new properties and new behavior, while keeping our core product upgradable. I’ll also stress the difference between the real architecture problems that we need to solve and the nice-to-have concepts that we don’t need to pursue, because they would imply major maintenance efforts.

Disclaimer

I love software architecture (and I have worked exclusively on it for some time), but I always try to be a pragmatic developer rather than an architecture astronaut. In that spirit, I try to reject any design that doesn’t add obvious value to my code. In the following posts I’ll make many decisions based on my own experience, trying to be pragmatic, and to the best of my knowledge I’ll always give the key reasons for them (with lots of links to opinions of others who are much better developers than me). I believe that developers frequently forget to focus on business value and get lost in over-engineered systems with complex abstractions that turn a simple “Hello World” into a mission that only rockstar developers with plenty of spare time can accomplish. So I’ll keep the number of layers and abstractions to a minimum, just enough to reach the goals of this article. After all, as Alan Kay said, simple things should be simple, and complex things should be possible.

This initial article discusses a LOT of concepts, so I’ve tried to organize the text into short sections and to introduce the concepts in the right order.

Background

I’m doing consultancy work for a client who has a core product that is forked for a few thousand customers, and each fork is extended (customized) according to very different customer requirements. The extensions are developed in the open-box model, which Wikipedia describes as “the most flexible form of extensibility (…) [in which] changes are performed invasively in the original source code”. The problem with this model is that maintenance (bug fixes, upgrades, etc.) is difficult unless you have a well-designed architecture. Most of my decisions and designs are based on this client and his requirements.

Requirement – Modular Extensions

We have a core product, we have customer-specific extensions/customizations, and we also have some generic modules that can be installed for our clients as needed. This leads me to a plugin (modular) architecture, where all my extensions should be loaded automatically (without requiring explicit calls to register or invoke each one).
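Here is a minimal sketch of what “loaded automatically” could look like in C#, assuming modules are discovered by reflection. Every concrete class in the already-loaded assemblies that implements a marker interface is instantiated and initialized, with no explicit registration call. `IModule`, `ModuleLoader` and `LoyaltyModule` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical contract that every extension module implements.
public interface IModule
{
    string Name { get; }
    void Initialize();
}

// Hypothetical sample module; in a real solution it would live in its own assembly.
public class LoyaltyModule : IModule
{
    public string Name => "Loyalty";
    public bool Initialized { get; private set; }
    public void Initialize() => Initialized = true;
}

public static class ModuleLoader
{
    // Finds every concrete IModule with a parameterless constructor in the loaded
    // assemblies, creates one instance of each and initializes it.
    public static IReadOnlyList<IModule> LoadAll()
    {
        List<IModule> modules = AppDomain.CurrentDomain.GetAssemblies()
            .Where(a => !a.IsDynamic)
            .SelectMany(a => SafeGetTypes(a))
            .Where(t => t.IsClass && !t.IsAbstract
                        && typeof(IModule).IsAssignableFrom(t)
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => (IModule)Activator.CreateInstance(t))
            .ToList();

        foreach (IModule module in modules)
            module.Initialize();

        return modules;
    }

    // Some assemblies cannot be fully loaded; keep whatever types could be resolved.
    private static IEnumerable<Type> SafeGetTypes(Assembly assembly)
    {
        try { return assembly.GetTypes(); }
        catch (ReflectionTypeLoadException ex) { return ex.Types.Where(t => t != null); }
    }
}
```

`AppDomain.GetAssemblies()` only sees assemblies that are already loaded. Module assemblies must therefore be referenced by the host, or loaded explicitly (for instance with `Assembly.LoadFrom` from a plugins folder), before `LoadAll()` runs.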

Unless specified otherwise, each module is independent from any other module.

Assumption – Independent Codebases

Now that you’ve just read about “modular plugin architecture”, please forget what you know about plugin architectures. We’re not developing plugins for Excel or for Photoshop. We’re not developing ABAP scripts for SAP. We DO have the source code for “our Excel”, and we don’t need to be so strict about the “holy product”. We do want modular extensions, but we don’t need a product that is untouchable and in which customizations become difficult, expensive or even impossible. We also don’t want a Guardians Team who will block you from touching the core product.
We want to isolate common code and customer-specific extensions as best we can, but we don’t need those components to be totally decoupled. What we want is a well-structured codebase for our product. That codebase can be forked (or branched) for each customer and can take whatever adjustment our customers request, without the base code changing so much that merging a product upgrade becomes hard. In other words, it’s totally acceptable to modify ANY part of the forked common code, as long as it’s still possible to later merge updates to the common code without having to manually rewrite/review all customer extensions. We want to stick to the open-box model (where we can modify the original source); we just want to make it easier for ourselves.

Rebuilding code should be easy and it’s not a problem. Maintaining individual codebases for each customer should be easy and it’s not a problem either (I’ll cover this in future posts, but if you’re not using Git or some other DVCS, you’re forking the hard way). Actually, making some allowances in your architecture will save you tons of problems that shouldn’t exist in the first place. Being pragmatic is the key.

If you still don’t get the idea, please refer back to the first part of this series, where I stress a lot that you’re not Microsoft.

Definition – CRUD

In this series of posts, I’ll use CRUD mostly to refer to SQL code that is used as part of your transactional business operations, in other words I’m mostly talking about INSERTs/UPDATEs/DELETEs used over your Domain Objects, but also about simple SELECTs used for automatically loading those entities.
On the other hand, when I talk about SELECTs that are used for reports and displayable content (grids/listings/etc.), I’ll refer to them as Reporting Queries. And as you know, we should use the right tool for the right job, so don’t expect the same foundations that I use for CRUD to be used for Reporting Queries.

Additionally, I assume that you understand the risk of SQL Injection (Bobby Tables says hi!), and that you know parameterized queries (SqlParameters) are the best way to avoid it. They also help performance, since SQL Server can cache and reuse the execution plan. I also assume you know that writing SqlParameters by hand is boring and error-prone.
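For reference, this is roughly what hand-written parameters look like in plain ADO.NET. The sketch is written against the provider-agnostic `IDbConnection`/`IDbCommand` interfaces, so it should run the same with a `SqlConnection`. The `Product` table, its columns and the `ProductSql` class are hypothetical.

```csharp
using System.Data;

public static class ProductSql
{
    // Hand-written parameters: every value needs its own parameter object,
    // with name, type and value set explicitly.
    public static int InsertProduct(IDbConnection connection, string name, decimal listPrice)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText =
                "INSERT INTO Product (Name, ListPrice) VALUES (@Name, @ListPrice)";

            IDbDataParameter nameParameter = command.CreateParameter();
            nameParameter.ParameterName = "@Name";
            nameParameter.DbType = DbType.String;
            nameParameter.Value = name;
            command.Parameters.Add(nameParameter);

            IDbDataParameter priceParameter = command.CreateParameter();
            priceParameter.ParameterName = "@ListPrice";
            priceParameter.DbType = DbType.Decimal;
            priceParameter.Value = listPrice;
            command.Parameters.Add(priceParameter);

            // Values are sent separately from the SQL text, so a name such as
            // "Robert'); DROP TABLE Product;--" is stored as data, never executed.
            return command.ExecuteNonQuery();
        }
    }
}
```

Two parameters already take a dozen lines. Multiply that by every column of every table and the boilerplate (and the chance of a typo in a parameter name) grows quickly.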

Last, at the risk of sounding like Captain Obvious: the only way to achieve a maintainable codebase is to keep your common code and your customization code isolated. You obviously can’t isolate common code from customizations if they both share the same line of code. At worst you could keep custom code immediately after (or before) common code, but not on the same line. That means you can’t use hand-written CRUD, because your extensions (custom columns, for example) would end up on the same lines as your common code, and no code comparison/merging tool could save you from a maintenance hell.

Assumption – Home-grown ORM

Since you won’t write your own CRUD, you should obviously use an ORM. And just as writing CRUD by hand is stealing from someone, writing your own ORM is probably not a good idea either; Ayende has some very good arguments on why it’s harder than you think.
As someone wisely said on Hacker News: “If you’re not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living. It’s no different from people who “don’t need a web framework”, and then go on to re-implement half of a framework. Learning any framework at a professional level is a serious time investment, and many student beginners or quasi-professional cowboys don’t want to do that. So they act like their hand-rolled crap is a badge of honor.”
Ok. Enough about reinventing the square wheel.

Technology – Entity Framework 6 and Dapper

The most well-established ORM for .NET is Entity Framework (developed and endorsed by Microsoft itself), and the most well-established Micro-ORM for .NET is Dapper (developed by the Stack Overflow team). Entity Framework (abbreviated as EF) needs no introduction: it’s powerful, full-fledged, very well documented, and very consistent.

Dapper, on the other hand, is a lightweight library with a very specific objective, but it’s an extremely useful tool. It’s mostly targeted at mapping SQL query results to POCOs, and at mapping CLR objects to SqlParameters in an easy way. [If for any reason you don’t use a full-fledged ORM like EF and still hand-write CRUD, Dapper can save you thousands of lines of boilerplate code without any tradeoff at all.]
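Here is the same kind of work with Dapper, as a minimal sketch. `Execute` and `Query<T>` are Dapper’s extension methods on `IDbConnection`. The properties of the anonymous object become the SQL parameters, and result columns are mapped to POCO properties by name. The `Product` POCO, its table and `ProductQueries` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Dapper;

public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
}

public static class ProductQueries
{
    // Each property of the anonymous object becomes a parameter: @Name and @ListPrice.
    public static int Insert(IDbConnection connection, string name, decimal listPrice) =>
        connection.Execute(
            "INSERT INTO Product (Name, ListPrice) VALUES (@Name, @ListPrice)",
            new { Name = name, ListPrice = listPrice });

    // Result columns are mapped to the Product properties with matching names.
    public static List<Product> CheaperThan(IDbConnection connection, decimal maxPrice) =>
        connection.Query<Product>(
            "SELECT ProductID, Name, ListPrice FROM Product WHERE ListPrice < @MaxPrice",
            new { MaxPrice = maxPrice }).ToList();
}
```

No parameter objects are created by hand, yet the values are still sent as real parameters. This is as safe against SQL injection as hand-written SqlParameters.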
Both are very useful tools, and I like to use the right tool for the right job:

EF6 will automatically generate CRUD for my entities, and it has amazing support for relationships (eager loading, lazy loading, saving a graph of objects in the correct order, detecting concurrency conflicts, etc.). It’s strongly typed, which helps us be more productive (thanks to the best IDE and IntelliSense) and also helps catch errors during the build. I’m pragmatic enough to know that you won’t have 100% code coverage in your tests, especially because you shouldn’t be unit testing problems that have already been solved by someone else [or in this case by some other library].
Dapper also has some good extensions for generating CRUD and for working with relationships, but it’s not as mature as EF.
For these reasons, I like to use EF6 for complex entity updates (with the benefits of type checking), lazy loading, etc.

Dapper makes it very straightforward to hand-write SQL queries and pass parameters from C# to the SqlCommand. It also has good support for multi-mapping (allowing me to manually write efficient queries) and stays closer to “bare-metal” SQL. Additionally, Dapper saves us from using EF for complex queries and bulk operations, where EF is known for having performance issues. Last, Dapper can return dynamic types (while EF doesn’t) and expands list parameters into WHERE IN clauses.
For these reasons, I like to use Dapper for Reporting Queries (where I may not need a DTO), batch operations, and very simple operations where I don’t need the benefits of the ORM’s strong typing.
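Here is a sketch of the three Dapper features mentioned in the previous paragraph:

- multi-mapping a join into two POCOs with `Query<TFirst, TSecond, TReturn>` and `splitOn`;
- passing a list parameter to `WHERE ... IN @Ids`, which Dapper expands into one parameter per element;
- returning `dynamic` rows from the non-generic `Query` for a report that doesn’t deserve a DTO.

The `Orders`/`Customers` tables and the classes are hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Dapper;

public class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public Customer Customer { get; set; }
}

public static class OrderQueries
{
    // Multi-mapping: each row is split at the CustomerID column into an Order and a
    // Customer, and the lambda wires them together.
    // WHERE IN: Dapper expands @Ids into one parameter per element of the list.
    public static List<Order> WithCustomers(IDbConnection connection, IEnumerable<int> orderIds) =>
        connection.Query<Order, Customer, Order>(
            @"SELECT o.OrderID, c.CustomerID, c.Name
              FROM Orders o
              JOIN Customers c ON c.CustomerID = o.CustomerID
              WHERE o.OrderID IN @Ids",
            (order, customer) => { order.Customer = customer; return order; },
            new { Ids = orderIds },
            splitOn: "CustomerID").ToList();

    // Reporting query without a DTO: each row comes back as a dynamic object.
    public static List<dynamic> OrderCountByCustomer(IDbConnection connection) =>
        connection.Query(
            @"SELECT c.Name, COUNT(*) AS OrderCount
              FROM Orders o
              JOIN Customers c ON c.CustomerID = o.CustomerID
              GROUP BY c.Name").ToList();
}
```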

I must confess that in the past I’ve made the mistake of using EF where it was a bad decision and Dapper would have been a better fit. My aversion to plain ADO.NET blinded me to the point of using EF for complex queries where I was obviously hammering screws. (In my defense, this was a large Silverlight project that required RIA Services, and EF was the obvious choice for that.)

Assumption – Database is the King

I must confess that I’m old-school. I learned database design from a senior DBA / Data Admin whose favorite part of the week was printing out a full 20-page database diagram (4x5 pages), sticking that ER diagram on the wall, and explaining all recent changes in the data model to the developers. In other words, I’m still much more comfortable with database-first design than with code-first design.

This means that I usually think first in terms of Tables, and, paraphrasing Jon Smith, my philosophy is that Database is King. I usually start with my tables and take a bottom-up approach, reflecting my tables in the Business Layer as Persistent Entities (any class that can be persisted to the database). And although in my posts you’ll find some references to non-persistent classes (like ViewModels or other DTOs), when I say “Entities” (or Domain Entities) I’m probably talking about Persistent Entities that are directly mapped to database tables.

Technology – EF Reverse POCO Code First Generator

I believe that plain C# code is much friendlier than XML configuration (especially for version control reviews/compares). So when I want to automatically extract my data model (my entities) from the database, I prefer the “Code First from existing Database” option over the regular “Database first” option. The latter would generate my model inside an EDMX XML file, which is ugly and hard to adjust or keep under version control.

However, the default EF “Code First from existing Database” wizard is not configurable, can’t be automated (I have hundreds of legacy codebases to upgrade to my architecture), and isn’t actively maintained. More than that, I’m not a big fan of Convention over Configuration: I prefer to see my raw configuration/mapping explicitly in code. If I wanted magic conventions (like automatically pluralizing my table names) I probably wouldn’t be using C#. So instead of Data Annotations, I really prefer the Fluent API, which is also much more powerful.
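To illustrate the explicit style, here is a hand-written sketch of an EF6 Fluent API mapping in the shape that reverse-engineering templates typically produce. Table, schema, key and column facets are all spelled out, and nothing depends on naming conventions. The `Product` entity, the `Production.Product` table and the column facets are assumptions loosely modeled on AdventureWorks, not output copied from the generator.

```csharp
using System.Data.Entity;
using System.Data.Entity.ModelConfiguration;

public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
}

// Explicit Fluent API mapping: table, schema, key and column facets are all spelled out,
// so nothing depends on EF naming conventions (no pluralization, no guessing).
public class ProductConfiguration : EntityTypeConfiguration<Product>
{
    public ProductConfiguration()
    {
        ToTable("Product", "Production");
        HasKey(p => p.ProductID);
        Property(p => p.ProductID).HasColumnName("ProductID");
        Property(p => p.Name).HasMaxLength(50).IsRequired().HasColumnName("Name");
        Property(p => p.ListPrice).HasPrecision(19, 4).HasColumnName("ListPrice");
    }
}

public class ProductionContext : DbContext
{
    public DbSet<Product> Products { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Register the explicit mapping instead of relying on conventions.
        modelBuilder.Configurations.Add(new ProductConfiguration());
    }
}
```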

With all those requirements in mind (pure C# code instead of XML, reverse-engineering the model from the database, and Fluent API), I found these great T4 templates, EntityFramework-Reverse-POCO-Code-First-Generator. They are fully customizable to our needs, and I had the pleasure of making a few contributions to them.

I like the principles of OOP. I think they make my code clear and concise, and, as I’ll show later, inheritance is one of the best ways to provide a “common vs custom” architecture. Well, maybe DTOs vs POCOs (data-only classes vs classes with behavior) is a matter of personal taste, but I’m glad to share a view similar to Ayende’s and Martin Fowler’s on this (Fowler describes the Anemic Domain Model anti-pattern).

Definition - Business Layer

My entities (as POCOs with state and behavior) belong to what I call the Business Layer, like we used to do in the good old three-tier model. OK, call me old school again, I don’t care. You can call it Services Layer, or whatever you prefer, but I believe people confuse “Services” with SOA or Web Services (REST or whatever). So I prefer the good old “Business” name, to make it clear that all business rules belong there.

Similar to DDD (where services are for operations that don’t properly belong to any aggregate root), I also keep the Services that can’t fit into any specific entity in this same layer. That’s why I see my POCOs and BLL as something between a traditional layered architecture and DDD.

Last, in this same layer I also add Repository Queries (different filters to load my entities according to some criteria), both for EF (as extensions to IQueryable<T>) and for Dapper.
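Here is a minimal sketch of Repository Queries as `IQueryable<T>` extension methods, assuming a hypothetical `Product` entity. Each filter is composable. When applied to an EF `DbSet<Product>`, the predicates are translated into the SQL `WHERE` clause; the same code also works on in-memory data through `AsQueryable()`.

```csharp
using System;
using System.Linq;

public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
    public int StockQuantity { get; set; }
    public DateTime? DiscontinuedDate { get; set; }
}

// Repository Queries as composable extension methods on IQueryable<T>.
// Against an EF DbSet the predicates become part of the SQL WHERE clause;
// against AsQueryable() over a list they run in memory.
public static class ProductRepositoryQueries
{
    public static IQueryable<Product> Active(this IQueryable<Product> products) =>
        products.Where(p => p.DiscontinuedDate == null);

    public static IQueryable<Product> InStock(this IQueryable<Product> products, int minQuantity = 1) =>
        products.Where(p => p.StockQuantity >= minQuantity);
}
```

In a business method this would read as `db.Products.Active().InStock(10).ToList()`, which should translate into a single SQL query.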

In summary: my Business Layer will be composed of POCOs (or Entities; I use the two terms interchangeably, although they are slightly different) with state and behavior (methods). Each instance of those Entities is what I’ll call a Domain Object, or just an “instance”.

Repository and Unit of Work Patterns

Design patterns are a pretty good way of communicating software design concepts, but they are often misunderstood and misused. Part of this is probably due to new technologies that emerged and matured in the past 20 years (since the GoF book was first published), and the other part is due to people repeating patterns without really understanding the reasons behind them.

When developers use Entity Framework, it’s common that they misuse two design patterns: the Repository and the Unit of Work.
The principle of the Repository is that your POCOs should be persistence ignorant, and that the Repository acts like an in-memory collection of objects, hiding the details of data access from the business layer. This is exactly what EF provides you as DbSet<T>, with methods for adding, removing, finding entities, etc.

The principle of the Unit of Work is that it keeps track of your objects, resolves the order of inserts, manages transactions and applies changes. This is exactly what EF provides you as the DbContext.

In other words, you don’t need to implement Repository or Unit of Work, because EF already does that for you. As Ayende explains, adding an abstraction over another abstraction doesn’t actually give you anything. He also explains that “Getting data from the database is a common operation, and should be treated as such. Adding additional layers of abstractions usually only make it hard”. So please stop adding abstractions over abstractions. Start with a direct and ‘naive’ architecture and evolve it over time. If you’re still inclined to write an abstraction over EF, read this nice and pragmatic opinion. Additionally, if you write abstractions to “protect” your developers, stop it immediately and educate them instead.

More than that, the whole concept of isolating your DAL from your BLL (from the three-tier model) is also outdated, since Entity Framework (and NHibernate, and many other ORMs and Micro ORMs) are already persistence agnostic. Your DAL is Entity Framework, and your BLL is your POCOs (or DTOs and Services/Managers, if you prefer). I know this sounds obvious, but I have seen countless projects with completely empty BLLs and DALs. They just proxied calls to the underlying EF or other ORM, or, even worse, called stored procedures that were just another repetition of the DAL and of the BLL. People seriously misunderstand design patterns.

In summary: The Business Layer uses EF directly – there is no Data Access Layer because EF handles that for us.
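Here is a sketch of what this looks like in practice, assuming a hypothetical `SalesContext` and `Customer` entity. A Business Layer service receives the `DbContext`, uses `DbSet<T>` as its repository, and calls `SaveChanges()` as the unit-of-work commit, with no wrapper layers in between.

```csharp
using System;
using System.Data.Entity;

public class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

public class SalesContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

// Business Layer service using EF directly: no IRepository<T>, no IUnitOfWork.
public class CustomerService
{
    private readonly SalesContext _db;

    public CustomerService(SalesContext db)
    {
        _db = db;
    }

    // Deactivates an existing customer and registers its replacement atomically.
    public Customer Replace(int oldCustomerId, string newName)
    {
        // DbSet<T> plays the Repository role: lookup by primary key.
        Customer old = _db.Customers.Find(oldCustomerId)
            ?? throw new InvalidOperationException($"Customer {oldCustomerId} not found.");
        old.IsActive = false;                    // change is tracked by the context

        var replacement = new Customer { Name = newName, IsActive = true };
        _db.Customers.Add(replacement);          // scheduled INSERT

        // DbContext plays the Unit of Work role: the UPDATE and the INSERT
        // are sent together inside one transaction.
        _db.SaveChanges();
        return replacement;
    }
}
```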

Same POCOs for EF and Dapper

I don’t think it makes sense to have different POCOs for EF and Dapper (let alone both a POCO and a DTO for every entity). So I use the same POCOs for both EF and Dapper, and I use the same entities for transferring data whenever possible, to avoid creating DTOs for existing entities.

The advantage of using the same POCOs for both EF and Dapper is that my business methods can work with entities loaded by either. The drawback is that a developer may try to navigate relationships in POCOs that were loaded by Dapper. That would throw a NullReferenceException, since navigation properties are only lazy-loaded by EF proxies. Additionally, we may face some CLR type mismatches between what EF and Dapper expect, but I haven’t faced that problem yet and I don’t think it would happen for common types.

Requirement: POCOs Inheritance for Behavior

Since POCOs have behavior, standard application behavior will live in the POCOs. However, I want to rely on OOP for my extensions, so I want to be able to use INHERITED POCOs, where OOP features let me override default behavior with my own. In other words, if my application has a class called Product with a method bool HasStock(int quantity), I want to be able to override that method with my own implementation (which may or may not use the base implementation). Class inheritance (with method overriding) is the most elegant way to do this.
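Here is a plain C# sketch of this requirement, before any EF concerns. `Product.HasStock` holds the default rule, and a derived class overrides it while still reusing the base implementation. The safety-margin rule in `MyCustomProduct` is a hypothetical customer requirement.

```csharp
public class Product
{
    public int StockQuantity { get; set; }

    // Default application behavior.
    public virtual bool HasStock(int quantity) => StockQuantity >= quantity;
}

// Hypothetical customer-specific rule: always keep 5 units in reserve,
// reusing the base implementation instead of rewriting it.
public class MyCustomProduct : Product
{
    public override bool HasStock(int quantity) => base.HasStock(quantity + 5);
}
```

Code that only knows about `Product` (for example `product.HasStock(8)`) automatically runs the customer rule whenever the instance is a `MyCustomProduct`.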

Challenge: EF Entities Loading

When I create a new instance (Domain Object) for use with EF, I can choose its constructor. When I want to create a Product and add it to the DbSet<Product>, it’s easy to construct a derived class MyCustomProduct (with custom behavior; I’m not yet discussing custom data properties) instead of the regular Product class. That would work. However, when instances are loaded directly from the database, I can’t choose the constructor, either for the entity I’m loading or for related entities (whether eager-loaded or lazy-loaded). In other words, if we call new DatabaseContext().Products.First() or lazy-load order.Products.First(), EF will not materialize the derived class. So we must somehow tell Entity Framework that, even though our model is defined using the Product class, it should always use MyCustomProduct instead.

To the rescue: we can use EF inheritance as a hack to get behavior inheritance.
When EF tries to load an entity which is defined as an abstract class, this is what happens:

If the entity has NO derived concrete classes, we would get this error:

The abstract type 'ExtensibleAdventureWorks.Business.Entities.Sales_Store' has no mapped descendants and so cannot be mapped. Either remove 'ExtensibleAdventureWorks.Business.Entities.Sales_Store' from the model or add one or more types deriving from 'ExtensibleAdventureWorks.Business.Entities.Sales_Store' to the model.

If the entity has MORE THAN ONE derived concrete class, we get an error saying that the discriminator column is missing:

Invalid column name 'Discriminator'.

If the entity has ONLY ONE derived concrete class, EF will find that child class and use it instead of the abstract class.

By using this behavior, I can make all my entity classes abstract, create a single non-abstract child class for each one, and get BEHAVIOR overriding in my child classes. The parent abstract class defines the default behavior for my application, but when my entities are loaded they will always be instances of the child classes, so the overridden methods will be used.
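Here is a minimal EF6 sketch of this trick, relying on the behavior described above (not verified here against every EF6 version). The model is declared with the abstract `Product`, and `MyCustomProduct` is the only concrete type in the hierarchy, so EF should materialize it for every query and lazy load. Registering the subclass explicitly with `modelBuilder.Entity<MyCustomProduct>()` is an assumption added here to make sure it is part of the model. The class and table names are hypothetical.

```csharp
using System.Data.Entity;

// Generated entity, declared abstract: EF maps it, and the default behavior lives here.
public abstract class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
}

// The only concrete class in the hierarchy: EF materializes this type for every
// query and lazy load, so behavior overridden here is the behavior that runs.
public class MyCustomProduct : Product
{
}

public class ShopContext : DbContext
{
    // The model (and the DbSet) still refer to the abstract base type.
    public DbSet<Product> Products { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Product>().ToTable("Product");
        // Assumption: register the single subclass explicitly so it is part of the model.
        modelBuilder.Entity<MyCustomProduct>();
    }
}
```

With this in place, `Product p = db.Products.First();` should return a `MyCustomProduct` at run time, so any method that class overrides is the one that runs.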

My Class Hierarchy for Default vs Custom Behavior

Taking advantage of that EF behavior, I generate an abstract class for each of my database tables, plus an empty concrete child class for each abstract class.

At first I planned to use the same name for both base and derived classes, keeping them in different namespaces. Unfortunately that doesn’t work, because EF6 doesn’t allow two entity classes with the same name, even when they are in different namespaces [this was fixed in EF Core]. Because of that I gave up on separate namespaces and kept both the base (abstract) classes and the derived (concrete) classes in the same namespace, differing only by a leading underscore in the names of the abstract classes.

Another idea was to have 3 levels in my hierarchy. The first level would be an abstract class with only Data Members (properties and relationships), the second level would also be abstract but would add the default behavior, and the third level would be a concrete class with the custom behavior. That works, but I decided it was overkill. Remember, I don’t want to over-engineer anything. You can use 3 (or more) levels if you prefer, and the first level could even be used as plain DTOs. Just remember to wear your heavy astronaut boots so that you don’t float too far into the galaxies of useless abstraction (myth busted).

To sum up, each POCO will have just a base abstract class and a derived custom concrete class, both in the same namespace, differing only by a leading underscore in their names.

This two-level design implies a few important points:

When reading an entity we can refer to the base type, even though we will get the derived type.

Creating the classes directly from the DbSet<T> won’t work because T is an abstract class:

var store = db.Sales_Stores.Create(); // does not work: the DbSet's element type is abstract

I’m not sure whether the templates could be changed so that the context and all relationships use the concrete types instead of the base types. It might work, but EF plumbing is complex.

In brief, you should always instantiate the concrete type, even if you’re writing a method in an abstract class (of the same entity or of another entity). This creates a dependency, since our base classes depend on the concrete classes. I don’t think this is a problem, since they are all in the same layer, but maybe Dependency Injection could decouple them. I don’t think it’s worth it, though, because we would lose the ability to call the regular class constructors, which I think are a good way of forcing mandatory parameters on object initialization. In other words, I’d rather have a well-designed and consistent API (and Domain Model) than a base class decoupled from the concrete classes.

Most Business/Services code won’t ever need to reference the base classes and can always use the concrete entities instead. I can’t think of a scenario where one would have to use the base classes, so always casting to the derived type is probably a good idea.

As with any class, if no constructor is defined for the concrete classes, the compiler will automatically create a public parameterless constructor. However, if any constructor is defined (as in my example of forcing mandatory members to be passed in the constructor), we must also create a parameterless constructor for EF. A private one is enough for the entity to be loaded, but it should be at least protected to allow lazy loading of relationships.
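Here is a sketch of the points above using the underscore convention:

- the abstract `_Sales_Store` instantiates the concrete `Sales_Store` when it needs a new object;
- the concrete class forces mandatory members through its public constructor;
- a protected parameterless constructor is left for EF.

The properties are loosely modeled on the AdventureWorks `Sales.Store` table, and `OpenBranch` is a hypothetical business method.

```csharp
using System;

// Generated base class: abstract, holds the data members and default behavior.
public abstract partial class _Sales_Store
{
    public int BusinessEntityID { get; set; }
    public string Name { get; set; }
    public DateTime ModifiedDate { get; set; }

    // Default behavior that needs a new object must instantiate the concrete type,
    // since the abstract type cannot be instantiated (hypothetical business method).
    public Sales_Store OpenBranch(string branchName) =>
        new Sales_Store(Name + " - " + branchName);
}

// Concrete class: the type EF actually materializes for this table.
public partial class Sales_Store : _Sales_Store
{
    // Public constructor that forces the mandatory members.
    public Sales_Store(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("A store needs a name.", nameof(name));
        Name = name;
        ModifiedDate = DateTime.UtcNow;
    }

    // Parameterless constructor for EF: private would be enough to load the entity,
    // protected also allows EF to create lazy-loading proxies.
    protected Sales_Store() { }
}
```

New stores are then added with `db.Sales_Stores.Add(new Sales_Store("Bike World"));`, and `_Sales_Store store = db.Sales_Stores.First();` still returns a `Sales_Store`. EF6 also offers a `DbSet<TEntity>.Create<TDerivedEntity>()` overload that could create the concrete type (or its proxy), though that option is untested here.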

For achieving the abstract/concrete inheritance described above, we make the following changes to the T4 templates:

Both the abstract classes and the derived classes are generated as partial classes in a single file. If developers need to add default application behavior, they should create a new partial for the abstract entity; if they need custom behavior, they should create a new partial for the derived class. Another option would be to generate an individual file with T4 for each class (both abstract and concrete). However, that would pollute my sample with many empty classes, and it would also make it risky to rerun the T4 and overwrite your uncommitted customizations.
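Here is a sketch of how the partials could be split, shown in one listing with comments marking which file each part would live in. The file names, the `_Product`/`Product` members and the reorder rules are hypothetical. Rerunning the T4 only rewrites the generated part, while the two hand-written parts keep the default and the custom behavior apart.

```csharp
// ---- Entities.generated.cs (T4 output, never edited by hand) ----
public abstract partial class _Product
{
    public int ProductID { get; set; }
    public int ReorderPoint { get; set; }
    public int SafetyStockLevel { get; set; }
}

public partial class Product : _Product
{
}

// ---- _Product.cs (hand-written, common code): default application behavior ----
public abstract partial class _Product
{
    public virtual bool NeedsReorder(int quantityOnHand) => quantityOnHand <= ReorderPoint;
}

// ---- Product.cs (hand-written, customer fork): custom behavior ----
public partial class Product
{
    // Hypothetical customer rule: reorder earlier, keeping the safety stock on top.
    public override bool NeedsReorder(int quantityOnHand) =>
        quantityOnHand <= ReorderPoint + SafetyStockLevel;
}
```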

Requirement: Extend existing Entities and Add new Entities

We want to be able to extend our Data Model with new entities (new tables), and extend existing entities with new properties (new columns). We don’t want to modify our core product; we only want to add these extensions to the instance of a single customer. We want to be able to upgrade the product later, adding new entities/properties that have been added to our master codebase, without breaking our customer-specific extensions.

Dynamically Loading EF Model Extensions

In my first attempt, my objective was to keep all customizations isolated from the product code, including model extensions, which would be contained in an isolated module.

Entity Framework does not allow more than one EntityTypeConfiguration for the same entity. So, for extending existing entities (with new properties and relationships), I created a generic interface IModelExtension<T>, to be implemented by any class that needs to map new properties onto entity T. Then I created an extension method ConfigureExtensions<T>(this EntityTypeConfiguration<T> config) that loads all classes implementing IModelExtension<T>, and I called it (this.ConfigureExtensions()) at the end of each configuration class in my T4 template. In other words, I created this plugin architecture so that I could keep customizations for entity T across multiple modules, each one mapping NEW PROPERTIES to entity T.

For extending the model with NEW entities, I created an interface IModelExtension to be implemented by any class that needs to map NEW ENTITIES to the model. Then I created an extension method LoadEntityFrameworkExtensions, which loads all those extensions and is called from DbContext.OnModelCreating.
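For the curious, here is a minimal sketch of the first mechanism (mapping new properties onto an existing entity). `IModelExtension<T>` and `ConfigureExtensions` follow the names in the text. Their bodies, the `FindExtensionTypes` helper and the sample `LoyaltyProductExtension` are reconstructions for illustration, not the code from the EFDynamicModel branch.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity.ModelConfiguration;
using System.Linq;
using System.Reflection;

public class Product
{
    public int ProductID { get; set; }
    public string LoyaltyCode { get; set; }
}

// Implemented by any module that maps NEW properties onto an existing entity T.
public interface IModelExtension<T> where T : class
{
    void Configure(EntityTypeConfiguration<T> config);
}

// Hypothetical customer module mapping a custom column onto Product.
public class LoyaltyProductExtension : IModelExtension<Product>
{
    public void Configure(EntityTypeConfiguration<Product> config) =>
        config.Property(p => p.LoyaltyCode).HasMaxLength(20).HasColumnName("LoyaltyCode");
}

public static class ModelExtensionLoader
{
    // Finds every concrete class implementing IModelExtension<T> in the loaded assemblies.
    public static IEnumerable<Type> FindExtensionTypes<T>() where T : class =>
        AppDomain.CurrentDomain.GetAssemblies()
            .Where(a => !a.IsDynamic)
            .SelectMany(a => SafeGetTypes(a))
            .Where(t => t.IsClass && !t.IsAbstract && typeof(IModelExtension<T>).IsAssignableFrom(t));

    // Meant to be called at the end of each generated configuration: this.ConfigureExtensions();
    public static void ConfigureExtensions<T>(this EntityTypeConfiguration<T> config) where T : class
    {
        foreach (Type type in FindExtensionTypes<T>())
            ((IModelExtension<T>)Activator.CreateInstance(type)).Configure(config);
    }

    // Some assemblies cannot be fully loaded; keep whatever types could be resolved.
    private static IEnumerable<Type> SafeGetTypes(Assembly assembly)
    {
        try { return assembly.GetTypes(); }
        catch (ReflectionTypeLoadException ex) { return ex.Types.Where(t => t != null); }
    }
}
```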

… and then I realized that I had spent 4 hours on something stupid.

Why on earth would I need to keep customizations isolated from common code if this is all generated code? Go back to the Independent Codebases assumption: we don’t need the product and the extensions to be totally decoupled. Since our Data Model is automatically generated from the database, we don’t need to worry about keeping the common part and the extensions isolated in POCO column definitions and POCO mappings.

Dynamically loading EF Model Extensions is totally possible(*), but it’s not useful for our goals. We can just apply the DDL scripts for each module and each customization, and rely on good old code generation to update our data model. So we don’t need to isolate POCO properties, POCO mappings, etc. We (obviously) only need to isolate generated code from hand-written code (which can be either standard application behavior or custom behavior/extensions).

Move on… let’s look for real problems.

(*) If you’re interested in Dynamically Loading EF Model Extensions, please check my branch EFDynamicModel, and I can write about it in a future post. If for any reason you don’t want to use code generation, that approach may be useful.

Model Extensions and the Inner-Platform Effect

The most popular customization in any application is probably adding new fields to your entities (in other words, the customer wants to add a new field to a form). If we have code generation and we pass objects across our layers (instead of scalar values), that shouldn’t be hard. It should take only a few minutes and a rebuild, and our customer would have the new columns. (I’ll discuss the UI in future posts; for now I’m only talking about POCOs/BLL.) Software vendors could simply do this themselves as a customization, adding the columns and deploying an update, which would make the new field a first-class citizen that can be used in any business rule. Instead, they usually create a metadata-based feature that lets users create the new tables/columns/data themselves. In other words, instead of doing something that should be simple (and easily isolated from the core product, so that it doesn’t block future upgrades), we delegate it to the user, as if this were empowering them. Actually, we’re just leaving custom data disconnected from the rest of the application: since those fields are not part of our POCOs, they can’t be used in programmatic business rules. More than that, it usually doesn’t pay off. The effort of creating and maintaining user-defined fields is usually much higher than it would be if you just created the fields yourself whenever the customer needs something new.
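Here is a small sketch of what “first-class citizen” means. Once the vendor adds the column and regenerates the POCO, the new field is an ordinary typed property that business rules can use and the compiler can check. `Customer`, `LoyaltyTier` and the credit rule are hypothetical. A field defined only in user-managed metadata would have no property here for the rule to use.

```csharp
// Customization done by the vendor: a real column, regenerated into the POCO.
public class Customer
{
    public decimal CreditLimit { get; set; }

    // Hypothetical column added to the database for one customer and regenerated here.
    public string LoyaltyTier { get; set; }

    // A first-class field takes part in business rules and is checked by the compiler.
    public bool CanOrder(decimal orderTotal) =>
        LoyaltyTier == "Gold" ? orderTotal <= CreditLimit * 2 : orderTotal <= CreditLimit;
}
```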

And there you have the Inner-Platform Effect, which happens when you design a system to be so customizable that it ends up becoming a poor replica of the platform it was built with. Sometimes doing this “customization” with the inner platform becomes so complicated that only a programmer (or a consultant) is able to do it, instead of the end user who was supposed to use it in the first place. As someone said, if your customer needs something totally flexible, you should ship him a C# compiler.

Off the top of my head I can remember a lot of “configuration tools” that were developed for end users and ended up as half-baked tools that are, at best, a badly designed subset of their underlying platform. Sometimes these tools take the form of software; sometimes they are just a complex and unreadable XML/JSON file, a bunch of parameters stored in the database, a scripting language, etc.

As I explained at the beginning of this post, creating a full-fledged PaaS is expensive and doesn’t pay off unless you have a very large user base, and creating an abstraction over an abstraction usually only makes things harder. So my point is: instead of letting the users create the fields themselves, you (as the SaaS vendor) should be the one creating those fields. You’ll be creating them on your development platform (.NET/SQL Server/etc.), not on some half-baked framework that you developed while pretending you’re Borland.

Summary

This second post (the first technical one) required me to introduce some concepts before going further into programming examples. With so many different paradigms, I thought it was important to reinforce my technical goals and explain some technical decisions.

I can’t stress enough that we’re focused on solving real problems. Given my goals, some unreal problems would be:

- decoupling default behavior from custom behavior;
- isolating product-generated code from customization-generated code;
- having a product codebase that can’t be modified;
- having a single codebase shared among all customers, etc.

I don’t care about those issues; they don’t affect my goals. I also don’t care about having a Domain Model that maps one-to-one to my database.

I also gave good reasons to explain some architectural decisions like using database-first, using both EF and Dapper, using POCOs instead of DTOs, putting all my Business Rules + Entities + Services into a single Business Layer (without DAL or extra Repository/UnitOfWork patterns).

Then I described a class design that allows me to have default behavior and custom behavior using plain OOP. Last, I described how adding new properties and new entities shouldn’t be a hard problem when you rely on code generation, and how forked repositories (and the open-box model) are a good alternative for offering your customers an unmatched level of customization.

Next Steps

In the next parts I’ll go into more technical examples and discuss new topics, including:

Enums Extensibility, using Type-Safe Enums, to which we’ll develop custom mappings both for EF and for Dapper.