5 Aspects of Effective Persistence of Entity Framework Models

About 10 years ago, Eric Evans coined the term domain-driven design (DDD) to refer to a new approach to software design that he was pushing. DDD came out at a time in which SQL Server–based software design was the leading model. For the most part, architects were building systems starting from an optimized (relational) data model. Everything else—most notably, the business logic—was organized on top of the entities identified in the data model.

The strongest selling point of DDD was that it moved the primary focus of a project to the business domain and its logic, and away from storage details. The DDD vision was revolutionary at the time, and a lot of developers still sneer at the idea of treating the database as mere infrastructure.

As usual, the world split into two strongly opinionated camps: those who push a code-first vision of the model and those who push a database-first vision. Both camps have good reasons to take their stand, but people in both groups are dead wrong if they insist on blindly ignoring each other.

Because of its complexity, the software world requires that both domain logic and persistence be managed effectively through patterns and tools. DDD is one of these patterns, and Entity Framework is one of these tools. But can they coexist?

Entity Framework has never been clear about its vision of the domain and persistence. It has always made it simple for developers to pursue whatever design approach they wanted, and easy to mix the domain layer with infrastructure. As a result, Entity Framework can be effectively bent to perform DDD-oriented design (via Code-First modeling), as well as classic SQL Server–based design (via Database-First modeling). In this article, I discuss five aspects of Entity Framework's persistence model: Code-First and Database-First modeling, complex types, enum types, array types, and Table-per-Hierarchy.

Code-First and Database-First Modeling

The biggest change brought by DDD is modeling the domain through objects. Architects gain a deep and profound knowledge of the business domain and model it through a web of related objects. The resulting domain honors all of the rules and functions in the domain and backs up all of the processes being implemented in the specific context. The focus is on how things work.

The resulting model must function just like things in the real world. Strictly speaking, you don't need a SQL Server database for this. You most likely need some kind of persistence layer for the model, but that layer isn't necessarily a SQL Server database. It can also be a NoSQL store, an event store, a combination of different stores, a layer of remote and opaque web services, or perhaps an in-memory database.

Entity Framework Code-First simply pushes a domain-driven vision of the project. You create your classes, and from the perspective of those classes database concerns just don't exist.

Entity Framework Database-First, in contrast, takes the opposite approach. You take an existing SQL Server database and infer a model of classes from it. The Entity Framework tools can do this for you via a wizard.

Which approach is better, and when is one method preferable over the other? As usual, it depends.

If you're designing a system on top of an existing database, and there's no chance you can alter the persistence layer even to some degree, then Database-First is the most reasonable way to start. The key point of Entity Framework, though, is having you reason in terms of objects and their methods, not data rows and stored procedures. You can infer the model from an existing database, meaning that the persistence layer is invariant, but you should try to add business logic to your objects so that they look like real-world entities. The Database-First approach isn't just a way to use objects instead of data rows. If you end up working with classes that have only strongly typed properties, your code will surely be easier to read than with plain ADO.NET, but you won't get any serious benefit from changing your coding paradigm.
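As a rough illustration of adding behavior to an inferred class, the sketch below splits a hypothetical Order entity into two partial class declarations. The first mimics what a Database-First tool might generate; the second holds hand-written logic. The Order type, its properties, and the Ship method are all assumptions made up for this example.

```csharp
using System;

// Part 1: the shape a Database-First tool might generate (data only).
public partial class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public DateTime? ShippedOn { get; set; }
}

// Part 2: hand-written behavior, typically kept in a separate file so that
// regenerating the model does not overwrite it.
public partial class Order
{
    // Read-only, computed from persisted data.
    public bool IsShipped
    {
        get { return ShippedOn.HasValue; }
    }

    // A small business rule that lives with the entity.
    public void Ship(DateTime when)
    {
        if (IsShipped)
            throw new InvalidOperationException("Order already shipped.");
        ShippedOn = when;
    }
}
```

Calling code then asks the order to ship itself rather than setting ShippedOn directly, which keeps the rule in one place.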

Code-First is a different matter. It's about modeling the domain with code, without considering actual persistence until a later time. You should opt for Code-First when you're able to take some liberty in altering the structure of the actual database. Choosing Code-First doesn't mean you ignore the database or that the database has to adapt to the model. You focus on the model first, and you ensure that it can be effectively persisted later.

Complex Types

One of the pillars of DDD is the use of types that better describe the entities in play. This means reducing the use of primitive types to the bare minimum and grouping multiple properties together to compose aggregated types. The canonical example is representing the address of a person or a customer. In a Database-First approach, you have four (or more) distinct columns such as address, zip code, city, and country. In a Code-First scenario, you might want to have an Address type that results from the union of the aforementioned properties. But should you really take this route?

If you choose Code-First, your purpose is modeling the domain as faithfully as possible. A Customer entity, therefore, has an address and the address is made up of street, zip, city, and so on. Furthermore, an address isn't simply a string or two. An address can be associated with specific small pieces of logic such as formatting the components or calculating latitude and longitude. Methods on the class are the ideal place to have these chunks of logic.

A complex type isn't an entity. An entity has identity, whereas a complex type in the DDD vision is a mere aggregation of standalone data. The complex type isn't even responsible for its own persistence; it gets saved and read via the service of its parent entity. If you don't much like the idea of having complex types in your model, the alternative isn't having an entity such as Address—instead, it's just having direct properties for the constituent parts of the address.
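A minimal sketch of the idea, assuming hypothetical Customer and Address classes: Address has no key of its own and carries a small formatting method, while Customer is the entity that owns it. The property names and the output format are illustrative choices only.

```csharp
// Complex type: no key and no identity of its own.
public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
    public string Country { get; set; }

    // A small piece of address-specific logic kept with the data.
    public string Format()
    {
        return string.Format("{0}, {1} {2}, {3}", Street, Zip, City, Country);
    }
}

// Entity: has identity (Id) and owns the address.
public class Customer
{
    public Customer()
    {
        // The complex property is initialized so that it is never null.
        Address = new Address();
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}
```

In Entity Framework 6 Code-First, a class like Address can also be registered explicitly as a complex type by calling modelBuilder.ComplexType<Address>() in OnModelCreating.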

You should note that complex types aren't subject to lazy loading. Any time you load an instance of an entity, the properties of the complex type are loaded as if they were scalar properties.

Enum Types

Starting with Entity Framework 5, you can define your own enum types and use them for properties in the model. From a DDD perspective, enums are essential because they let you clearly specify which values are valid for a given property. When modeling, you sometimes find that a given property of a particular entity takes integers, and so you declare it as Int32. Very rarely, though, do the allowed values span the entire range of integers. More often than not, the property takes only a small set of integers. You can certainly filter values in a validation layer, but isn't a validation layer an unnecessary complication in this case? If you model the property correctly, you need no extra validation. This is the way to go, but it's also a precise example of where the theory of DDD clashes with the practice of Entity Framework.

Entity Framework supports enums starting with version 5, but it requires that you compile the application against .NET 4.5. The situation improved a bit with Entity Framework 6 in the sense that as long as you use Code-First, you can easily use enum properties in the model.

Consider a complex type whose properties are all plain .NET enums. When it gets saved to SQL Server, the integers behind the enums are saved. On the way back, the integers are turned into enum members.
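A complex type built entirely from enum properties might look like the sketch below. The Surface and Weather enums and the MatchConditions class are hypothetical names chosen for illustration; explicit integer values are assigned so that the stored numbers stay stable if members are reordered.

```csharp
// Hypothetical enums; the integer values are what would end up in the columns.
public enum Surface
{
    Grass = 0,
    Clay = 1,
    Hard = 2
}

public enum Weather
{
    Sunny = 0,
    Cloudy = 1,
    Rainy = 2
}

// A complex type whose properties are all plain .NET enums.
public class MatchConditions
{
    public Surface Surface { get; set; }
    public Weather Weather { get; set; }
}
```

With these declarations, only the listed members are meaningful values for each property, which is the modeling benefit described above.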

What if, for whatever reason, you can't afford .NET 4.5 or Code-First? In this case, you can employ a trick to use enums. This is the same trick you can use to handle arrays.

Array Types

Imagine that one of the classes in your model needs a property that's an array of some other type. For example, consider a class that represents a match of some sport (e.g., basketball) where only a limited number of faults per player are allowed, and that exposes the faults of the first team through a FaultsTeam1 property.

The property FaultsTeam1 is an array of type Fault where each element of the array matches a player on the team. You can't just expose a plain get/set property in the model. It would work in the model, but it can hardly be persisted by Entity Framework. The workaround consists of declaring an additional property—in the example, it's called InternalFaultsTeam1—of type String. This property is a plain property that Entity Framework knows how to handle.

public String InternalFaultsTeam1 { get; set; }

The getter and setter of FaultsTeam1 read and write the InternalFaultsTeam1 property every time the official API for faults is used. The Fault type can contain information about the player and details about the fault. For simplicity, the sample code assumes that all you know about a Fault is an integer that represents the number of faults for a player. Players are identified by index, so the first element in the list is the first player on the team, and so forth. Serialized, the list of faults is a comma-separated string. While mapping the model to the Entity Framework persistence engine, you map InternalFaultsTeam1 but tell Entity Framework to ignore FaultsTeam1. You can do the mapping via either the fluent API or data annotations.
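The workaround can be sketched as follows, assuming a hypothetical Match class in which each fault is reduced to an integer count, so the array is declared as int[] rather than Fault[]. The [NotMapped] data annotation marks the array property as excluded from mapping; the namespace shown is the one used in .NET 4.5 and later.

```csharp
using System.ComponentModel.DataAnnotations.Schema;
using System.Linq;

public class Match
{
    public int Id { get; set; }

    // Plain string property that Entity Framework knows how to persist.
    public string InternalFaultsTeam1 { get; set; }

    // Array-shaped API for the rest of the code; excluded from mapping.
    [NotMapped]
    public int[] FaultsTeam1
    {
        get
        {
            // No stored value yet: expose an empty array.
            if (string.IsNullOrEmpty(InternalFaultsTeam1))
                return new int[0];

            // Parse the comma-separated string back into integers.
            return InternalFaultsTeam1
                .Split(',')
                .Select(s => int.Parse(s))
                .ToArray();
        }
        set
        {
            // Serialize the array as a comma-separated string.
            InternalFaultsTeam1 = value == null
                ? null
                : string.Join(",", value.Select(f => f.ToString()));
        }
    }
}
```

With the fluent API, the same exclusion would be expressed in OnModelCreating as modelBuilder.Entity<Match>().Ignore(m => m.FaultsTeam1). Note that in this sketch the getter returns a fresh array on every call, so changing an element of the returned array does not update the stored string; the whole array has to be assigned back through the setter.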

In this way, a column named InternalFaultsTeam1 is created. It contains the serialized string built by the setter. When the entity is read back, the serialized string is parsed into the array for the rest of the code to consume as necessary.

This is the same trick you can employ in .NET 4.0 to use enum types with a version of Entity Framework that doesn't natively support enum types. You define an internal integer property for actual storage and expose an unmapped enum property whose getter/setter converts to/from the internal buffer.
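A minimal sketch of the enum variant of the trick, using a hypothetical Game class and GameStatus enum. The assembly and namespace that supply [NotMapped] depend on the framework and Entity Framework versions in use; the using directive below is the one for .NET 4.5 and later and is an assumption for any other setup.

```csharp
using System.ComponentModel.DataAnnotations.Schema;

public enum GameStatus
{
    Scheduled = 0,
    InProgress = 1,
    Finished = 2
}

public class Game
{
    public int Id { get; set; }

    // Plain integer that Entity Framework maps to a column.
    public int InternalStatus { get; set; }

    // Enum-typed API for the rest of the code; excluded from mapping.
    [NotMapped]
    public GameStatus Status
    {
        get { return (GameStatus)InternalStatus; }
        set { InternalStatus = (int)value; }
    }
}
```

The rest of the code works with Status only, while the database sees a single integer column.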

Table-per-Hierarchy

The major benefit of Code-First is that you produce a hierarchy of related classes and, with the exception of arrays and in some cases enums, Entity Framework can save it to a database table. For adding common things such as keys, constraints, and relationships, you have a bunch of conventions and explicit rules to set either via attributes or a fluent API. You can find some videos and other core documentation about mapping classes to database structures at Microsoft's Entity Framework (EF) Documentation web page.

By default, Entity Framework tends to give each entity its own table. You mark a class as an entity using the fluent API or, more simply, declaring types in object sets within the context class.

If entities are bound together via inheritance, then a few strategies can be applied. The default strategy, Table-per-Hierarchy (TPH), maps all classes in a hierarchy to the same table. Alternatively, you can have distinct tables, one per type. This strategy is known as Table-per-Type (TPT) and requires some fluent code that maps each type to a table of its own.
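A TPT mapping could be sketched as below. The example assumes the Entity Framework 6 package is referenced; the Person, Athlete, and Coach classes, the SportsContext class, and the table names are hypothetical.

```csharp
using System.Data.Entity;

public abstract class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Athlete : Person
{
    public int JerseyNumber { get; set; }
}

public class Coach : Person
{
    public int YearsOfExperience { get; set; }
}

public class SportsContext : DbContext
{
    // Declaring the base type in an object set registers the hierarchy.
    public DbSet<Person> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // TPT: each type in the hierarchy is mapped to its own table.
        modelBuilder.Entity<Person>().ToTable("People");
        modelBuilder.Entity<Athlete>().ToTable("Athletes");
        modelBuilder.Entity<Coach>().ToTable("Coaches");
    }
}
```

Removing the three ToTable calls would leave the hierarchy on the default TPH mapping, with all three types stored in a single table.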

Yet another approach is having one physical table per concrete (i.e., non-abstract) type in the hierarchy, known as Table-per-Concrete-Type (TPC) inheritance. This approach requires the following explicit fluent API code in which you import inherited properties from the base class and then create a table for the derived type.

// Ranking is an assumed entity name; the source shows only the lambda body.
modelBuilder.Entity<Ranking>().Map(m =>
{
m.MapInheritedProperties();
m.ToTable("Rankings");
});

In a Code-First scenario, mapping code goes in the OnModelCreating overridable method defined on the custom context class.

TPH, TPT, and TPC are the three mapping modes supported by Entity Framework specifically for situations in which inheritance between classes is used within the domain model. TPH is the default strategy, which is no coincidence. It's the strategy that imposes the lowest cost on the query side because it reduces the number of JOINs.

You've Gotta Have Faith

The primary driver for DDD, and for a design no longer centered on databases, is the increased complexity of business domains. If your model is essentially little more than CRUD, you might not want to embark on a DDD path if you don't feel confident. However, DDD is about faithful models, and even CRUD can be modeled more faithfully with DDD.