A blog about anything a software engineer might encounter during his programming efforts.

22 November 2011

Entity Framework: duplicate rows in resultset from a view

After including a view I have in my database in an Entity Framework data model, I noticed that for some strange reason, the result was showing duplicates. When a set should have for instance 10 different rows, it might contain only 4 different rows, some of which were duplicated so it was still a total of 10 rows.
The view itself showed the correct results, but when called from code using the Entity Framework, it once again showed the wrong results.
After some testing I found out that the rows that were duplicated, were rows that shared certain values with the rows that were replaced by it. After checking this in my Entity Model, I soon found at that the Entity Model has a strange way of handling rows with equal primary key values.

I'll try and explain my findings:
If multiple rows share the same value in the primary column(s), only the first of these seems to be retrieved from the database and copied into these other rows. For instance, if a set has 2 columns, with the first being marked as the primary column, and the correct result should be:

A, 1A, 2B, 4B, 6B, 7C, 5

The actual result in code with Entity Framework would become:

A, 1A, 1B, 4B, 4B, 4C, 5

When including a view in your Entity Model, the model seems to simply use the first not-nullable columns as primary key (as all columns used in the primary key should be non-nullable).To fix the problem, make sure that the primary key columns are chosen correctly. If you cannot create the correct primary key because of null-values or simply not having set of columns that differs for each row, try adding a column to your view that always contains a unique value for each row. After you add it, make sure it is set as the primary key in your Entity Model.

If I would set both rows to be the Primary Columns in the above example, I would get the desired result.The following solution was given by Lee Boozer (thank you, Lee):

That's one way of adding a column with a unique value for each row. I prefer adding a PK column of one of the tables in the view, since nothing has to be generated. But if that's not possible, your solution is a reliable option.

I think entity calls the primary key the "entity key", but it's basically the same thing.

Select datename(yyyy,SH.CreatedDateTime)AS Year,datename(mm,SH.CreatedDateTime) AS Month, SH.CreatedDateTime as Date,SH.TransactionName as Type,SH.SalesHeaderID as No,Customer.CustomerName as Customer,CustomerGroup.CustomerGroupName as Customer_Group, SH.Reference as Memo,Item.ItemName as Item ,SD.LineDescription as Item_Description, Item.ItemType as Item_Type,Item.UOM,ItemGroup.ItemGroupName as Item_Group, cast (SD.Quantity as int)as Quantity, cast(SD.Amount as money)as Amount,SD.Price, SD.Discount,SH.ExchangeRate as Exchange_Rate,Currency.CurrencyDescription as Currency, SD.ClassID as Class_ID,SD.SalesTaxID as SalesTax_ID, SalesTaxGroup.SalesTaxGroupName as Tax_Group, Employee.EmployeeName as Salesperson,ShippingMethod.ShippingMethodName as Shipping_Method, PaymentTerm.PaymentTermName as Payment_Term, PaymentMethod.PaymentMethodName as Payment_Method

from

SalesHeader SH left Outer join SalesDetail SD on SH.SalesHeaderID=SD.SalesHeaderID left outer join Item on SD.ItemID=Item.ItemID Left Outer join ItemGroup on Item.ItemGroupId =ItemGroup.ItemGroupID,Customer Left outer join CustomerGroup on Customer.CustomerGroupId=CustomerGroup.CustomerGroupID Left Outer join Employee on Customer.EmployeeID=Employee.EmployeeID Left outer join Currency on Customer.CurrencyID=Currency.CurrencyID Left Outer join SalesTaxGroup on Customer.SalesTaxGroupID=SalesTaxGroup.SalesTaxGroupID Left Outer join PaymentTerm on Customer.PaymentTermID=PaymentTerm.PaymentTermID Left Outer join ShippingMethod on Customer.ShippingMethodID=ShippingMethod.ShippingMethodID Left Outer join PaymentMethod on Customer.PaymentMethodID=PaymentMethod.PaymentMethodID where SH.CustomerID=Customer.CustomerID and(SH.TransactionName<>'SalesOrder' and Sh.TransactionName<>'Quote')

It depends, what does make a unique identifier for a row in this view? Depending on how all your tables are related, I think you need a combination of an ID for the sale and and ID for each line in the sale. So I think combining SalesHeaderID and ItemID would be a good unique identifier, but it all depends on your data model, really.

Just try to set these as a combined Entity Key and see if it solves your problem.

If you don't actually need unique records (for update purposes), by far the simplest solution is to add .AsNoTracking() to your Linq query. When you do this, EF returns the data as is, completely ignoring key definitions.