Closed

[Performance] Amount of associations in model affects materialization times

description

While analyzing bug 1781, it became apparent that a basic performance limitation of EF requires additional investigation. The time it takes EF to materialize data is a function of the number of associations (foreign keys) in the entire model, even when those associations are not part of the query in question.

As an example, using the AdventureWorks database generate an EDMX model, then have the following query:

This will materialize 31465 SalesOrderHeader entities, but the time it takes to do so depends on the size of the model. The test machine runs Windows 8.1 64-bit and .NET 4.5.1 with the high performance power setting, and has 12 GB RAM and an Intel Xeon W3530 CPU at 2.8 GHz.

If the EDMX has only one entity type (SalesOrderHeader), materialization takes 840 milliseconds (median of 10 runs). When the EDMX contains a richer model, for example one with 67 entity types and 92 associations, the same test takes 7246 milliseconds (median of 10 runs).

The additional CPU time is spent in code that is affected by the number of foreign keys in the model. The biggest difference seems to come from EntityEntry.TakeSnapshot(): according to Visual Studio 2013 sampling profiles, it went from 173 samples to 29547 samples due to the cost of the call to TakeSnapshotOfForeignKeys().

The second biggest difference comes from ObjectStateManager.FixupReferencesByForeignKeys() which went from 5 samples to 20172 samples.

1: Entity Framework v6.0.0.0 (v6.1.21218.0) : 4.524,00ms.

2: Entity Framework v6.0.0.0 (v6.1.21218.0) : 4.577,50ms.

It is known that this issue is not addressed in 6.1 alpha 1. We are currently working on improving this aspect of performance, and yes, the plan is for those improvements we are working on to make it into the EF 6.1 release.

Commit a640bcb005a06faf2894e9bd932eaf0e5318f4d5 improves perf by roughly 30-35% in the scenario described above. On my machine the 31465 SalesOrderHeader entities are materialized in about 2300ms (Mini-6) and 530ms (Mini-6-Single). A higher number in the
Mini-6 vs the Mini-6-Single sample is expected because the entity type has 11 additional navigation properties.

On my more modest machine the improvement seems a bit higher than reported by Emil.

I compared EF from early December 2013 with EF from late January 2014 using the attached test. On my development machine, the model with multiple associations showed a 69% improvement. Here are the numbers:

EF early December 2013
Model with no associations: 839ms
Model with multiple associations: 5874.8ms

EF late January 2014
Model with no associations: 823.2ms
Model with multiple associations: 3479.7ms

There's not much of a difference in the model with no associations, as expected. But the difference in the model with multiple associations is very satisfying.
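
The 69% figure is easiest to reproduce as a throughput gain rather than as a reduction in elapsed time. Below is a minimal sketch of both calculations in Python, using the numbers posted above; the function names are illustrative and are not part of the attached test.

```python
def throughput_gain_percent(before_ms, after_ms):
    """How much more work gets done per unit of time, in percent."""
    return (before_ms / after_ms - 1.0) * 100.0


def time_saved_percent(before_ms, after_ms):
    """How much shorter the elapsed time is, in percent."""
    return (1.0 - after_ms / before_ms) * 100.0


# Timings in milliseconds from the comparison above: (December 2013, January 2014).
no_associations = (839.0, 823.2)
multiple_associations = (5874.8, 3479.7)

for label, (before, after) in [
    ("no associations", no_associations),
    ("multiple associations", multiple_associations),
]:
    gain = throughput_gain_percent(before, after)
    saved = time_saved_percent(before, after)
    print(f"{label}: {gain:.1f}% throughput gain, {saved:.1f}% less time")
```

Read as throughput, the multi-association case comes out at roughly 69%; read as elapsed time saved, the same numbers give roughly 41%. Both describe the same measurement.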

Is this fix available in the source repo here on CodePlex already? If I build the source as a release build (the latest master branch 35e16b80e6e5c0dd09d06ff80b46ecdf81434655) and run the benchmarks (https://github.com/FransBouma/RawDataAccessBencher), I see barely any improvement (3034ms for 6.0.2 and 2995ms for 6.1.0). Fetches were done over the network, not locally.

We made a check-in today that increases the performance of scenarios similar to Frans' current benchmark (by between 15% and 20% in our in-house testing). Unfortunately this won't make it in time for 6.1 beta, but it is checked in for the next milestone.
We also came to the conclusion that we cannot make this part of materialization much faster without a more profound refactoring, something that would be too risky to do at this stage of the release.

On second thought: David Obando mentioned a model without associations, but that is not the same thing as a model without FK fields. The model I used does have associations; in that case, the one without FK fields is faster than the one with FK fields.

If I am reading the nightly build history correctly, the change to improve the case without FK fields is not included in v6.1.30207.0. The first build that has it is v6.1.30208.0 (the current nightly build is v6.1.30211.0). The improvement should be a bit more than 15% (not the 5% suggested by the numbers above), unless I did something wrong while measuring.

Using database/model first, this takes approx. +1.600ms to execute
Using codefirst, this takes approx. +3.100ms to execute
Using plain old sql (SqlCommand), this takes approx. 70ms to execute

The latter is included just to show that there's nothing wrong with the I/O times on the system. The measured values are a median of about 5 runs, in debug mode in Visual Studio 2013 Update 2, initialized through a unit test.

We know cold start is not fast. Model initialization and validation require time to complete before the actual queries are executed. This process, however, happens only once per app domain. Once the model is "warm" then queries are leaner.

Executing a unit test in debug mode is not ideal if you're trying to measure the performance of a system. We recommend writing performance tests that more closely resemble your production environment: use release bits and set your machine's power settings to "high performance". For faster performance, also ngen EntityFramework.dll and run in a controlled environment where the performance test is not competing for critical resources. We also recommend measuring different dimensions, such as the cost of cold versus warm queries and the cost of model building.
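
A minimal, language-agnostic sketch of that measurement approach, written in Python: time the first (cold) call separately, then report the median of several warm runs. The names `measure` and `fake_query` are hypothetical, and `fake_query` is only a stand-in for the real EF query, which would be timed from .NET against a release build.

```python
import statistics
import time


def measure(workload, runs=10):
    """Return (cold_ms, warm_median_ms) for a zero-argument callable."""
    # First call: includes any one-time initialization cost (cold).
    start = time.perf_counter()
    workload()
    cold_ms = (time.perf_counter() - start) * 1000.0

    # Subsequent calls: one-time setup has already happened (warm).
    warm_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        warm_ms.append((time.perf_counter() - start) * 1000.0)

    return cold_ms, statistics.median(warm_ms)


def fake_query():
    """Hypothetical stand-in for the query under test."""
    return sum(range(100_000))


cold, warm = measure(fake_query, runs=10)
print(f"cold: {cold:.1f} ms, warm median: {warm:.1f} ms")
```

Reporting the cold time and the warm median separately keeps one-time costs such as model building from being averaged into the per-query number.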

We noticed that in your case there's a large cost associated with using Code First instead of an EDMX. This additional cost comes from the model building process, in which EF takes your code and builds the corresponding model in memory (the equivalent of an EDMX), and then continues validating and using the model in the same way as models with an EDMX file. Depending on the size of your Code First model, the process of model building might be lengthy. The only way to reduce this cost is by having an EDMX file that EF can use to load the model instead of inferring it.

Judging by the time taken by your Code First path, we are assuming your model is mid to large sized. Can you confirm this is the case? Also, can you share with us a profile of the execution of this test so we can understand where EF is spending its time?

Hi,
I used to use EF4, I never used EF5 and began to migrate from VS2010 to VS2013 in order to use more modern technologies.

I made 2 little Code First models :

one with 3 base abstract classes inherited by 12 (4x3) table-classes with no FK

one with 5 table-classes with only 4 FKs

Whatever the model, it always takes about 8 seconds to run a cold query. I gained about 1 second with pre-generated views and another second with NGEN.
My computer is only about 4 years old, and my customers may be using older computers.

Fortunately, I use them with a little database used in a WinForms application and I can keep a reference during the whole application lifetime without having memory issues.

Unfortunately, this application needs to apply changes to about 150 different databases using the same schema (model 1).
Disposing and re-creating a DbContext for each database is very costly.
I'll try to avoid using() { } statements, but I'm afraid that I'll need to rewrite everything with the good old SqlConnection and SqlCommand objects.

Maybe you could allow us to create an EDMX file at runtime, save it as a file, and reuse it for later DbContext instances, and also add some kind of EDMX updater that works with Code First migrations.

In conclusion, I'm disappointed by the performance of EF 6.1.1, but it is simpler to use and easier to make generic than EF 4.0 and its ObjectContext.
I prefer to use MS products, but this performance issue is huge for such small models, and I will certainly give NHibernate a try.

Could you create a new work item and include a repro? What you are describing seems completely unrelated to #1829. We would also like to understand what is making your models so slow, and whether you are following the necessary steps to cache the model correctly for each separate database schema.

I second Diego: we'd like to hear more from you. If you could create a bug describing the models you are using, or better yet, attaching a repro of the bug for our analysis, that would be very helpful.