Other Links

Tag: EF Code First

Nhibernate supports lazy loading for single properties and this solves perfectly the situation where a table has on or more field with a lot of data and you absolutely need to avoid loading all that data each time you load the entity. Basically with NH you can define Lazy Properties and the content of that property is loaded from the database only if you access it.

Entity framework does not supports this feature, but it can solve this scenario with Table Splitting, as described by Julie Lerman. The problem is that the example by Julie is based on EF designer, but I want to use Code First. In this post I’ll describe you the steps I did to move from a standard Entity mapped on a single table to Two entity mapped to the same table with Table Splitting with a Code First approach.

My scenario is: I have a table with a column that contains really lot of text data that is used rarely. The entity mapped on that table is used basically on every action of most of the controllers in a asp.NET web application. The overall performance is degraded, because each time more than 1 MB of data is loaded into memory, only to read a bunch of other values of that table. The database should not be refactored and we should find a solution to load the big text data only in the rare situation where it is really needed.

This is the original EntityA properties: Id, Name, Surname, BigData properties, where BigData contains lots of text rarely needed.

The solution to the problem is Table Splitting or mapping two distinct entities on the same table. The “master” entity maps all standard fields of the table, the other one maps data that are rarely needed (BigData in this example). These are the final entities.

The key points are: they have both same Id property, EntityA has a reference to the EntityAPayloadMap to estabilish a 1:1 relation and finally all properties are virtual to enable Lazy Loading. It can sound weird, but the 1:1 relation is mandatory and also is clearer to comprehend. If you map two unrelated entities on the very same table, EF will gave you an error telling that you should establish a relation. This limitation is not a problem for me, because I do not like the “unrelated entities” approach, after all the Payload is part of EntityA, it has no sense without an EntityA, and I only want the loading to be done only when needed. Mapping those two is really simple.

You should have the same Id, this means that the two class should have an Id property mapping to the same column of the table and they should point to the same table. Then it is mandatory that EntityA has a 1:1 relationship with the payload, estabilished by the HasRequired().WithRequiredPrincipal. This is everything you need to do to enable Table Splitting, but you should be aware of a couple of issues that gave me a little headache.

You are not able to save an instance of EntityA if it has Payload property set to null, the relationship is mandatory and you cannot avoid populating it. This lead naturally to a simple solution: populate the Payload property on the constructor (or lazy initialize in the getter) so you will never have an EntityA without a Payload. The net result is that you cannot load the payload anymore. Suppose you initialize Payload in the constructor to make sure that each EntityA always have a Payload.

public EntityA()
{
_payload = new EntityAPayload();
}

If you try to load EntityA with an include to Payload, you will get: Multiplicity constraint violated. The role ‘EntityA_Payload_Target’ of the relationship ‘EfTableSplitting.Models.EntityA_Payload’ has multiplicity 1 or 0..1.

Figure 1: Error happens if you automatically initialize _payload in the constructor

Here is basically what happens inside EF: after issuing the query to the database, when he construct an instance of EntityA, it got the Payload property already populated with an Id of 0 and EF complains because he got different id from the database, so he is thinking that Entity2 refers two distinct PayloadEntity when the relation is 1:1 (they should share the very same id). This supposition is enforced because if you normally load an instance of EntityA, if you access Payload.BigData property it is always null. This happens because you are creating an empty instance of Payload in the constructor and EF does not populate Payload property with the Proxy object that will trigger lazy load.

This is usually a no-problem, because in DDD or more generally in OOP you should create Factory Methods to create entities in valid state. The best approach is make the constructor protected and create a factory method to create an instance in a valid state.

EF is able to use the protected constructor when it rebuild entities from database data, this will make everything working as usual. The usage of EntityA is simple, thanks to factory methods you should never worry of forgetting populating the Payload because it will happens automatically. Since all the properties are virtual, EF can enable Lazy load, and you are able to write the following code.

If you use EfProfiler or a standard Sql profiler, you can verify that this piece of code actually issues two distinct queries to the database. The first one is used to load all properties of EntityA, and only if you access the Payload object LazyLoad will load the BigData for you. This approach is more complex of Nhibernate Lazy Properties, but it gives you the very same result: Lazy loading one or more columns of a table.

This is the first query issued by EF

SELECT TOP (2) [Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Surname] AS [Surname]
FROM [dbo].[EntityA] AS [Extent1]
WHERE [Extent1].[Id] = 76 /* @p__linq__0 */

When you access the Payload property EF will issue another query to load properties mapped in the Payload.

I find this approach really appealing because it explicitly suggest to the user of the Entity that the Payload is something that is somewhat separate from the standard entity. While Nhibernate property lazy loading is much more simpler to use (just declare the property lazy), it completely hides from the user of the entity what is happening behind the scenes. EF approach create a structure that is much clearer IMHO. Thanks to this clearer structure, a user can ask EF to eager-fetch the Payload if he knows in advance that the code will use it.

You can in fact use the .Include as for any other navigation property, ignoring that the underling database structure actually stores everything in the same table. Here is generated query.

SELECT [Limit1].[Id] AS [Id],
[Limit1].[Name] AS [Name],
[Limit1].[Surname] AS [Surname],
[Extent2].[Id] AS [Id1],
[Extent2].[BigData] AS [BigData]
FROM (SELECT TOP (2) [Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Surname] AS [Surname]
FROM [dbo].[EntityA] AS [Extent1]
WHERE [Extent1].[Id] = 76 /* @p__linq__0 */) AS [Limit1]
LEFT OUTER JOIN [dbo].[EntityA] AS [Extent2]
ON [Limit1].[Id] = [Extent2].[Id]

This is where EF fails a little, because it generates the query as if the two entities would be mapped on different tables instead of issuing a standard query on the EntityA table. This is a minor issue, but I think that it should be addressed in future versions. The important aspect is that I’m able to load everything with a simple roundtrip to the database when needed.

My final conclusion is: Table Splitting in Entity Framework is a really good technique especially for legacy databases, where you cannot change the schema. Thanks to it, you can split columns of a single database table on two distinct entities, loading the less-used data only when needed. If you are not working with legacy database you should consider if a standard two tables approach is preferable, especially because you can also store the less-used table in other filegroup placed on slower disks. With the emerging trend of SSD, it is not uncommon to have server instances with 250 Gb of high Speed SSD and much more space on standard mechanical disks (much slower). Being able to decide where to place the data is a key factor in system where the speed is critical.

In the first three parts of this little tutorial I showed how easy is to save objects to database with EF 4.1 code first approach, now it is time to understand how to retrieve objects from the database.

Querying objects from database is really easy in EF, because you have full LINQ support; as an example suppose you want to retrieve all warriors with a name that contains a particular search string.

1:using (var db = new BattlefieldContext())

2: {

3: List<Warrior> searchResult = db.Warriors

4: .Where(w => w.Name.Contains("am"))

5: .ToList();

6:foreach (var warrior in searchResult)

7: {

8: Console.WriteLine("Found warrior:" + warrior.Name);

9: }

10: }

Ad you can see you are expressing the query against the object model, you are in fact asking for all warriors whose name Contains the string (â€œamâ€). If you analyze the LINQ query you can notice that I used the method System.String::Contains but Entity Framework Profiler shows the real query issued to the database.

This is the real power of LINQ: the original query expressed in terms of Object model gets analyzed from the EF LINQ provider and translated to the equivalent SQL syntax, another interesting feature is that the query returns objects, not a DataReader or some Database related structure. Now look at this other piece of code.

1:using (var db = new BattlefieldContext())

2: {

3: IQueryable<Warrior> searchResult = db.Warriors

4: .Where(w => w.Name.Contains("am"));

5:

6:foreach (var warrior in searchResult)

7: Console.WriteLine("Found warrior:" + warrior.Name);

8:

9:foreach (var warrior in searchResult)

10: Console.WriteLine("Found warrior:" + warrior.Id);

11: }

This code is really similar to the previous one, but with the only difference that Iâ€™m not using ToList() method in the query, instead the result is stored into a IQueryable<Warrior> object. Once the query is created I iterate two times the IQueryable object and the result is that the same query is issued two times to the database. To understand why you need to understand the difference from a Deferred and Non-Deferred operator in LINQ.

In the above example the Where operator, used to specify the criteria for object retrieval is a Deferred Operator, this means that the operator gets executed only when we iterate through its content. This is why each time you iterate the IQueryable<T> object another query gets issued to the database. To avoid this you can call ToList()non-deferred operator, that executes immediately the LINQ query, returning all objects inside a List<T> object; now objects are in memory and you can iterate the list how many time you want without issuing further queries to the database.

This is a really interesting concept that you can use for your advantage, like in the following example.

This piece of code creates a IQueryable<T> that filters all warrior whose name contains the â€œamâ€ string, then print the number of warriors that satisfy that criteria (line 6), and then iterate through all records to print details. This piece of code actually creates two queries to the database.

Figure 2: The two queries issued by the previous snippet of code

The interesting aspect is that the first query actually issue a select COUNT to the database, it does not load all objects. Suppose that you need to do server side pagination, loading three warriors for each iteration.

1:using (var db = new BattlefieldContext())

2: {

3: IQueryable<Warrior> searchResult = db.Warriors

4: .Where(w => w.Name.Length > 1)

5: .OrderBy(w => w.Name);

6:

7:int count = searchResult.Count();

8: Console.WriteLine("Found NÂ°:" + count + " warriors.");

9:

10: Console.WriteLine("Printing in page of three");

11: Int32 pageNumber = 0;

12: Int32 pageSize = 3;

13:while (pageNumber * pageSize < count)

14: {

15: Console.WriteLine("Printing page " + (pageNumber + 1));

16:foreach (var warrior in searchResult

17: .Skip(pageNumber * pageSize)

18: .Take(pageSize))

19: Console.WriteLine("Found warrior:" + warrior.Name);

20: pageNumber++;

21: }

22:

23: }

This time the query on line 3 has an order clause because you cannot use server side paging without ordering, then I issue a Count()query (line 7) to know all records that satisfy the condition, then I begin a cycle to load a page at a time, since I know the total number of records. Thanks to Skip and Take operators loading objects a page at a time is really super simple. Here is the output of the above snippet.

1: Found NÂ°:5 warriors.

2: Printing in page of three

3: Printing page 1

4: Found warrior:Alhana

5: Found warrior:Caramon

6: Found warrior:Dalamar

7: Printing page 2

8: Found warrior:Raistlin

9: Found warrior:Tanis

Being able to load object with server side pagination can tremendously increase the speed of programs; in this example I load all objects a page at a time, but in real situation you load the first page, show the result to the user and shows other pages only when the user ask explicitly to view data for another page.

The above piece of code issues three queries to the database, the first one is a simple count, the other two are more complex, because they should implement server-side pagination.

Figure 3: The query that loads the second page of the records

Actually you could create a more readable query if you hand-code server side pagination, but being able to do this with Skip() and Take(), makes possible even for a Sql Server novice developer to write efficient, but simple code.

Relations between objects are not limited to an object that keep a reference to another object (as seen in part 2), but I can have an object that contains a list of other objects. This is the classic RDBMS rough equivalent of the One-To-Many relation. I decided that my Warrior can carry a certain number of items, so I defined an Item class and a property on Warrior class to hold reference to a list of Items.

1:private IList<Item> _items;

2:public IList<Item> Items

3: {

4: get { return _items ?? (_items = new List<Item>()); }

5: set { _items = value; }

6: }

This is a standard implementation of a property that permits me to establish that a Warrior can have a list of Items. In the Getter part I use lazy initialization, so I can simply add an IList<Item> from external code, or I can simply let the object auto-initialize a standard list for me. This is the code that actually adds some item to a Warrior.

Again I want to point out that I did not make any other changes to the BattlefieldContext or to something related to persistence, I just defined the Item class and the Items property on the Warrior class, but Iâ€™m able to save everything to the database without writing any other line of code. Here is the query generated by EF.

Figure 1: Generated query to insert a warrior with two items.

This is the Database Schema that EF created to persist my objects.

Figure 2: The database schema to persist Warrior and Items

As you can see, EF created the Warrior_id column on the Items table to be able to keep the relation between items and warriors.

I want to strongly point out that this is nor DDD nor Domain Modeling, Iâ€™m simply using EF4.1 like a Super Dataset, to avoid writing CRUD. My primary reason for this little tutorial is moving people from HandWritten SQL code or from old style Dataset to something more flexible and more object oriented.

In previous post I showed how you can persist an object to the database thanks to EF41 and few lines of code, now it is time to understand how to manage relations. First of all create another class called weapon and then create a property of type Weapon into the Warrior class.

Figure 1: New model, the warrior has a property called ActiveWeapon of type Weapon

Now I can write the following code to save a warrior to the database with an associated weapon.

But if I ran previously the code of the first part of the sample, I got this exception because the database was already created.

The model backing the ‘BattlefieldContext’ context has changed since the database was created. Either manually delete/update the database, or call Database.SetInitializer with an IDatabaseInitializer instance. For example, the DropCreateDatabaseIfModelChanges strategy will automatically delete and recreate the database, and optionally seed it with new data.

This happens because the database already exists, and the schema is not compatible with this new version of the classes. So we need to add this line of code in the project to make EF update the db schema to reflect the change in object model.

1: Database.SetInitializer<BattlefieldContext>(

2:new DropCreateDatabaseIfModelChanges<BattlefieldContext>());

This single line of code tells EF to recreate the database if the model has changed and the schema of the db is outdated. This will actually Drop and recreate the database, so if you need to maintain the data you should manage the update with other tool as a DatabaseProject or some third party tool. If you intercept the queries done to the db you should see something like this.

Figure 2: The query issued to verify if the actual database structure is good for the current model

This query simply check if the database structure is up to date. Now back to the code to add the warrior with the weapon, as you can see I create a weapon and assigned to the ActiveWeapon property of the Warrior object. This is a really important concept, I do not care about ForeignKey or Id, I simply assign an object to another object and EF takes care for me of this. The query issued to insert the object into the database is the following one.

Figure 3: Inserting a Warrior with a Weapon into the database

As you can see we have two distinct queries, the first one is used to insert the Weapon into the database and retrieve the generated identity (the id of the object), immediately followed by an INSERT into the Weapon table, where the ActiveWeapon_id is set to the value of the id of the weapon generated in the previous query.

This is the advantage of using an ORM tool, you design object using standard object relations in mind, and the ORM has the duty of persisting this model into a database. As you can verify from Figure 4, the Warrior table has a column ActiveWeapon_id to store the foreign key to the Weapon table.

The cool part is that I did not write any single line of code related to the persistence of the Weapon class, I just added the class, added the relation to the Warrior object and all the rest is managed by EF41.

EF 4.1 is now in RC phase and as a NHibernate user Iâ€™m curious to check the fluent API to map entities to database. One of the feature that I and Andrea miss most is the possibility to map private properties with fluent interface. It seems strange to map private properties at once, but it can be useful in DDD. Suppose you have these simple classes.

Figure 1: A really simple domain

We have two things to notice, the first is that the Category class has a private property called PrivateDetails, and the other is that the Products collection is protected, and you can add products from the outside thanks to the AddProduct() method.

1:publicclass Category

2: {

3:public Int32 CategoryId { get; set; }

4:publicstring Name { get; set; }

5:private String PrivateDetails { get; set; }

6:

7:private ICollection<Product> _products;

8:protectedvirtual ICollection<Product> Products

9: {

10: get { return _products ?? (_products = new HashSet<Product>()); }

11: set { _products = value; }

12: }

13:

14:publicvoid AddProduct(Product p)

15: {

16: Products.Add(p);

17: }

18:

19:publicvoid SetDetails(String details)

20: {

21: PrivateDetails = details;

22: }

23: }

The idea behind this is that you should access only the AGGREGATE roots, not manipulating directly the collection of Products, this forces the user of the class to use specific methods. Now a problem arise, how we can map this class with EF 4.1 fluent interface? The problem is generated from the Fluent interface, that permits only to specify properties with Lambda.

Figure 2: The HasMany() method accepts an Expression

As you can see if I specify that CategoryId is mapped to an Identity database column with the instruction

Property(c => c.CategoryId)

this technique is known as static reflection and is really useful in such scenarios, but â€¦ now I could not use the HasMany() methods to map a protected property.

Figure 3: How can I map a protected property if I could not use in a lambda?

This problem derives only from the Fluent Interface, because EF is internally capable to map private members of objects, then we can use a little trick. I want to be able to write code like this

1:this.HasMany<Category, Product>("Products");

This would solve all our problems, because with this statement Iâ€™m asking EF to map a collectino called Products. Fortunately writing such an extension method is quite simple, it is just a bunch of Expressions

This code seems complex but it is rather simple. It creates a parameter expression of the same type of the object, then grab a reference to the property from its name with reflection and with the PropertyInfo creates an Expression.Property. This expression (created in line 13) is the equivalent of c => c.Products lambda and it can be passed to the Expression.Lambda to create an Expression<Func<T, ICollection<U>>> object, expected from the HasMany() method.

With the same technique I can write an extension method that maps a private string property.

Thanks to those two methods now Iâ€™m able to write this mapping for the category class.

1:publicclass CategoryMapping : EntityTypeConfiguration<Category>

2: {

3:public CategoryMapping()

4: {

5: Property(c => c.CategoryId).HasDatabaseGeneratedOption(

6: DatabaseGeneratedOption.Identity);

7:

8:this.PropertyStr("PrivateDetails").HasColumnName("Details");

9:

10:this.HasMany<Category, Product>("Products");

11: ToTable("Category");

12: }

13: }

As you can see Iâ€™m able to map the PrivateDetails property and I can choose column name, and the Products property with no problem. Now I can use my model

1: var food = new Category { Name = "Foods" };

2: food.SetDetails("Details");

3: db.Categories.Add(food);

4: Product p = new Product() {Name = "Beer"};

5: food.AddProduct(p);

6:int recordsAffected = db.SaveChanges();

As you can ses Iâ€™m able to add product without the need to directly access the collection, and I can set a private property through a method (not so useful technique, but just to show that mapping private properties works). Running the sample I got

Figure 4: The fact that three entities were saved confirmed me that the mapping of the protected collection works.