A Brief History in ENTITIES (... or what the Heck is this EF Thing?) - part I

So, it seemed like it would be useful to talk a bit about why we built the EF and the EDM, what we think it is and where we think it is going.

A lot of docs exist on the EF and EDM, and one can find many opinions on both all over the web. I wanted to take time to talk about them from my perspective as a person who has been on the team since its inception. This is merely my perspective and rationale, not intended to be spin. Hopefully it will clarify some things and promote more candid feedback and debate.

Starting at the beginning (kind of) -> the making of the EDM

Back in the days when we were working on the Whidbey release (VS 2005), I was on a team called MBF (Microsoft Business Framework). We were building an application framework for ISVs focused on writing LOB applications. At the same time, the ObjectSpaces team and the WinFS team were working on technologies that shared one common overlap: MBF, ObjectSpaces and WinFS all had some form of object persistence.

There was an attempt to rationalize these technologies (it is probably worthwhile to add that none of these technologies ended up shipping). One of the artifacts of the rationalization effort, however, was this thing called the Entity Data Model. The Entity Data Model (previously referred to as the Common Data Model) was an attempt to align the WinFS Item Data Model, the MBF data model and requests from a series of partners (internal and external). The end goal of the Common Data Model was to unify the data models that were emerging across the company. For example, just within the SQL Server division there were data models for Reporting Services (SMDL), Analysis Services (UDM), System Management (SMO) and WinFS (IDM). These data models all shared one characteristic: they tended to be one level higher than the logical model that people were defining for the store. Consider DBAs who were building logical models and deploying them (resulting in a particular physical data model). The new data models tended to be somewhat higher level and represented a more domain specific transformation of the shapes defined in the "authoritative" logical model.

The desire was to provide a common data model that could unify the core concepts of these different data models, so that the sets of services and our investments in tooling could all be based on a single, higher level model. Furthermore, there was an expectation that if one could define a common data model, then other services like ETL (via SQL Server Integration Services) and Sync (via our Sync Framework) could operate against it. In fact, you could start building out a horizontal platform for data programmability in terms of this common data model.

To be clear, one misconception is that we were trying to deliver on the old promise of canonical schemas. When we said one common model, we meant a single set of concepts with which one could describe the nouns in an application (say Customer). Specifically, what we mean is a common representation or metamodel through which one can reason about the various concepts. For example, if an ecosystem of services (say Reporting Services, Sync, Analysis Services, Integration Services, Workflow) knew how to reason about a common representation for an entity and a relationship, then custom apps, packaged apps and database instances that exposed the metadata about their concepts in terms of this metamodel could participate in the tooling, integration and basic offerings of these services. The canonical example that we use is the idea of something like MS CRM or SharePoint exposing their metadata as EDM. If they could do so, and if there was a mechanism for accessing these stores in terms of the EDM (say an ADO.NET provider like Entity Client), then one could get a reporting, synchronization and ETL experience over these solutions that would be consistent with what one could get over a SQL Server database.

So now you have the desire for a common data model -> shouldn't that be the CLR type system?

The most common question that was asked was why there needed to be a new data model: why couldn't you just model everything in the CLR? The objections to using the CLR ended up along the following lines:

Not all services discussed would be for the managed (.NET) platform

Most of the data models that were being aligned were relational models

Many of the teams did not need an object persistence solution... they needed the ability to expose metadata about their storage in a common representation

The concrete models that were defined were typically projections of a relational model and preserved relational semantics with a more domain specific shape

OK, so you don't want to use the CLR as the core model; why did you have to introduce things like first class relationships?

The existence of first class relationships was a hotly contested topic during the formative days of the EDM; many of the architects behind the EDM were rooted in the relational world. At the same time, as we looked at the models that we were trying to align, they all had notions of first class relationships that carried particular semantics and to which one could ascribe constraints. When one "binds to the CLR", for example by using the Entity Framework to retrieve objects from a persistent store, the existence of relationships should not matter. Relationships can be surfaced in the CLR realization of an EDM model as navigation properties or collections on a class, and thus one gets the experience of references and collections.

EDM Overview

From a developer’s perspective the Entity Data Model can be thought of as a way to define a model for a given application or system. For example, if one were building an application for a video library, the domain model may look something like:

In the above case, these items look much like the “nouns” in the system. Many application developers would build such a domain model using UML or, in Visual Studio, some may start with the Class Diagram tool. Other developers like to start by designing the database or leveraging an existing database and then building a data access layer that surfaces the data up to their application in whatever shape they desire.

Note that both models represent the same concepts, but in very different ways. The class diagram reflects an application developer’s view of the abstractions; the database model represents a model that is better suited to data persistence.

Even though the shapes of the models are different, the basic concepts used to define them are the same. There is some notion of a “Thing” (class, table) and a notion of a relationship between “things” (associations). The Entity Data Model builds on these concepts to allow a developer to define a domain model that can map to classes and tables and that can be rationalized with other models. The basic building blocks of the Entity Data Model are Entity Types (analogous to the “thing”) and Relationships (which relate Entity Types). In the next few sections we will work through the various concepts in the Entity Data Model. The Entity Data Model, in version 1.0 of the product, can be represented in an XML form or can be edited using the Entity Framework Tools. For the purpose of this discussion we will show both forms.

Entity Types

Entity Types represent the first class nouns in the data model. Entity Types have the following characteristics:

Members:

An Entity Type has two types of members: properties and navigation properties.

Properties

Properties are first class members of the type. Consider the example of the Video type from the video library:

Type: Video {ID, Title, Description, PublishDate, Actors*}

ID, Title, Description and PublishDate are all properties of the type. These properties can be primitive types or complex types (inline types analogous to structs) but cannot be other Entity Types or collections of primitive or complex types.

Navigation Properties

The Actors member on the Video type represents a collection of related Actor entities. In the Entity Framework, this set of related Actor entities is defined by a relationship. The Navigation Property “Actors” allows someone to reason about the relationship from the perspective of the enclosing type.

Keys:

Much like primary keys in a database, Entity Types have a distinct identity which is represented by members of the type. The identity of an Entity Type is represented by an EntityKey, which identifies the properties that make up the identity. In the case of the Video type, the identity is represented by the ID property.

<Key>
  <PropertyRef Name="ID" />
</Key>

The entire specification for the video entity type can be represented as follows:
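As a rough sketch, a full definition of Video in the XML form might look like the following. The property types, the Nullable and MaxLength facets, and the relationship name VideoLibModel.VideoActor with its role names are assumptions made for illustration, not taken from an actual model.

```xml
<EntityType Name="Video">
  <!-- The key names the properties that make up the entity's identity -->
  <Key>
    <PropertyRef Name="ID" />
  </Key>
  <!-- Properties: primitive types here; the facets are assumed values -->
  <Property Name="ID" Type="Int32" Nullable="false" />
  <Property Name="Title" Type="String" Nullable="false" MaxLength="100" />
  <Property Name="Description" Type="String" />
  <Property Name="PublishDate" Type="DateTime" />
  <!-- Navigation property: a view of a relationship from Video's side.
       The relationship and role names are hypothetical. -->
  <NavigationProperty Name="Actors"
                      Relationship="VideoLibModel.VideoActor"
                      FromRole="Video" ToRole="Actor" />
</EntityType>
```

Note that Actors carries no type of its own in this sketch. What it returns is determined by the relationship it points at, which is what makes the relationship, rather than the property, the real model concept.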

Relationships

One of the primary concepts of the Entity Data Model is the notion of first class relationships. Whereas one can surface the relationship between two types as a NavigationProperty, the actual model concept is a relationship. In the first version of the Entity Data Model, the only type of relationship that can be defined is an Association. Consider the relationship between Video and Actor. One could express this as:

Association{Video[*]:Actor[*]}

This is an Association between two Entity Types, Video and Actor, each of which has a multiplicity of many; hence there is a many to many relationship between Actor and Video.

The EDM supports the following multiplicities on each side of the relationship:

Zero or One (0..1)

Exactly One (1)

Zero or More (*)
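In the XML form, the Video and Actor association and its multiplicities could be sketched roughly as follows. The association name VideoActor, the role names and the VideoLibModel.Actor type name are assumptions chosen for illustration.

```xml
<!-- Many to many: each end declares a multiplicity of "*" -->
<Association Name="VideoActor">
  <End Role="Video" Type="VideoLibModel.Video" Multiplicity="*" />
  <End Role="Actor" Type="VideoLibModel.Actor" Multiplicity="*" />
</Association>
```

Changing the Multiplicity value on one end to "1" or "0..1" would turn the same declaration into a one to many or an optional relationship.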

Complex Types

Complex types provide a named structural representation much like an Entity Type. The difference between a Complex Type and an Entity Type is principally that Complex Types do not have an explicit identity, cannot reference instances of EntityTypes and are only “reachable” via dereference from an EntityType.
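A minimal sketch of a complex type and its use might look like this. The Address and Person types and all of their properties are hypothetical and do not come from the video library model.

```xml
<!-- A complex type: named structure, but no Key element and therefore no identity -->
<ComplexType Name="Address">
  <Property Name="Street" Type="String" />
  <Property Name="City" Type="String" />
</ComplexType>

<!-- The complex type is only reachable through a property of an entity type -->
<EntityType Name="Person">
  <Key>
    <PropertyRef Name="ID" />
  </Key>
  <Property Name="ID" Type="Int32" Nullable="false" />
  <Property Name="HomeAddress" Type="VideoLibModel.Address" Nullable="false" />
</EntityType>
```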

Sets

Defining the core types is the first step. In order to take these concepts and actually build a model where one can reason about the storage of instances, we need to introduce the concept of sets. Once one has a concept of sets, it is often useful to define some construct that describes the closure of meaningful sets. Within the EDM these concepts are Entity Sets, Relationship Sets and the Entity Container.

Entity Sets, as the name implies, define the storage for instances of entity types. Entity Sets are nominal and typed; in other words, an Entity Set has a name and declares the type of instances that can be contained within the set. Entity Sets are also polymorphic, which means that a given Entity Set can store instances of its declared type and of any derived types.

<EntitySet Name="Videos" EntityType="VideoLibModel.Video"/>

The above statement declares an Entity Set with the name Videos and of Type VideoLibModel.Video… if Video had any derived types these would be legal members of this EntitySet.

In the EDM there is no single Entity Set for instances of a type. If one desired, one could create multiple Entity Sets where one would store different instances. So, for example if one wanted to create an Entity Set for Cartoons and an Entity Set for Dramas one could do so:
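A minimal sketch of that idea, assuming both sets are declared over the same Video type (the set names Cartoons and Dramas are illustrative):

```xml
<!-- Two sets over the same type; each Video instance lives in exactly one of them -->
<EntitySet Name="Cartoons" EntityType="VideoLibModel.Video" />
<EntitySet Name="Dramas" EntityType="VideoLibModel.Video" />
```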

As relationships are first class concepts in the EDM, one must declare a relationship set for a relationship. The relationship set is the declaration which associates instances of a given set with instances of another set… it is done purely in terms of the storage (sets) as opposed to the types:
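A sketch of such a declaration, using the AssociationSet element of the XML form and assuming an association named VideoLibModel.VideoActor with roles Video and Actor (these names, and the set name VideoActors, are illustrative):

```xml
<AssociationSet Name="VideoActors" Association="VideoLibModel.VideoActor">
  <!-- Each end binds a role of the association to an entity set, not to a type -->
  <End Role="Video" EntitySet="Videos" />
  <End Role="Actor" EntitySet="People" />
</AssociationSet>
```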

The above is the definition of the relationship set corresponding to the relationship between Actors and Videos. Note this presupposes that there exists an Entity Set called People and an Entity Set called Videos.

The construct that demarcates the closure around Entity Sets and Relationship Sets is the Entity Container. An Entity Container is merely a named “thing” through which one can reason about or dereference a group of Entity Sets and Relationship Sets.
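Putting the pieces together, a container for the video library could be sketched as follows. The container name VideoLibContainer, the VideoLibModel.Person type and the VideoLibModel.VideoActor association are assumptions for illustration.

```xml
<EntityContainer Name="VideoLibContainer">
  <!-- Entity sets: named, typed storage for entity instances -->
  <EntitySet Name="Videos" EntityType="VideoLibModel.Video" />
  <EntitySet Name="People" EntityType="VideoLibModel.Person" />
  <!-- Relationship set: ties the two entity sets together -->
  <AssociationSet Name="VideoActors" Association="VideoLibModel.VideoActor">
    <End Role="Video" EntitySet="Videos" />
    <End Role="Actor" EntitySet="People" />
  </AssociationSet>
</EntityContainer>
```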

On the use of XML

The examples are in XML because version 1.0 of the Entity Framework uses XML as its basis for representing the EDM. As we move forward with the EDM and the Entity Framework, we expect that there will be different representations of the EDM. For example, one should be able to describe an EDM model in the CLR by convention and extend or specialize it with configuration (attributes or external). We also expect that particular partners will maintain their models in metadata repositories.