Entity Framework Feature Suggestions

TPT (Table-Per-Type) Inheritance Performance

Problem Description:
For TPT inheritance, the more subclasses you add, the time it takes to generate SQL, and the complexity of the SQL query itself, become unmanageable. With a simple base class (5 or 6 fields), and around 30 simple subclasses (2 or 3 fields a piece), it takes almost 2 minutes to generate the SQL (ObjectQuery.ToTraceString()) and execute it on the server. EF generates almost 8000 lines of SQL for a simple select query on the base class. This is because the SQL generated is a mess of crazy subselects, joins, and unions. Even with empty tables (so that a query would return no data), it takes that long to generate and execute the SQL.

Doing a simple "flat" query (no subselects), left joining each subclass table, could produce the same results, but with infinitely better performance. You wouldn't need any unions either. As it stands, even with just 3 or 4 subclasses, EF TPT inheritance can not be used in any kind of production environment.

I have written an article that describes this problem in more detail, where I have modeled and graphed the performance degradation as the number of subclasses go up.

I come from Oracle where dealing with 100 millions of records with hundreds columns (aka big cardinality) is a performance hell unleashed. Index stops working since every query is a full-table-scan query due to large data retrieval. Table-Per-Type (TPT) is a way to divide this huge dataset into smaller records with smaller cardinality.

I can see a "performance" reason of why TPT is slower in EF over Table-Per-Hierarchy (TPH). It is because EF is programmatically generating the SQL statements, which would include unnecessary joins. (Please correct me if I am wrong)

I am still learning so far, and Code First is a very convenient approach. However, this TPH recommendation over TPT makes me cautious to apply MVC in data intensive production environment. Ideally speaking, MVC should be neutral on whether it is TPT or TPH. Its should only be dependent upon the actual size of dataset.

Maybe as I am learning this MVC code first approach, I found a way to custom override the generated SQL, thus giving programmers more control over the SQL statement.

Huge bummer. We thought we just didnt know what we were doing with EF until we realized that everyone else has this problem too... Wish I had prototyped my project using 30 subclasses rather than just 3. We would have discovered this right off the bat rather than 6 months into the project.

I think part of the problem comes from the EF's tendency to always use polymorphic queries even when the application does not require it. For example when I write this to retrieve a single Employee object:

objectSet<Person>.OfType<Employee>().Where(o => o.Id == id).Single();

EF still builds the query so that it can return an Employee or anything derived from it. I would be great to have a way to tell EF that I'm only expecting an Employee object so don't bother trying to query for its subclasses. I think this would reduce the complexity of the generated SQL considerably.