Tuesday, February 19, 2013

In this blog post I will describe the steps I took in order to find out why the title above was the case, and how it could be fixed.

On Stackoverflow the general opinion was that the reported slowness was “by design” and could not be fixed, but looking at recent tests posted on Stackoverflow pointed to the fact that something was not done right.

The test code is a simple Code First DbContext model. For each run I start with a new blank database, and creates it before doing SaveChanges, so that part of the process can be timed individually. The 2 options on lines 17 and 18 are there to ensure that the for loop runs quickly, without these option the loop adding objects takes much longer (test for yourself).

In order to find out where time was spent during SaveChanges, I ran a Visual Studio Performance Analysis. It turned out that all the time was spent in sqlceqp40.dll, the SQL Server Compact 4.0 unmanaged query processor – so something was amiss.

As described in my earlier blogpost, the SQL statements generated in order to return the server generated id (the IDENTITY value), looked like the following:

SELECT [Id] FROM [Students] WHERE [Id] = @@IDENTITY

So using the SQL Server Compact Toolbox, I coulde analyze the 2 statements:

And got the following result:

So for every INSERT, a table scan was performed, as for some reason, the SQL Server Compact query processor could not figure out to do an Index Seek. And the more rows to be scanned, the worse the performance got. And all the time for the operation was spent doing this.

In order to avoid this, I decided that the goal of the statement executed should be to avoid table scans, but return a value with the exact same shape as the previous statement executed, that is; it should have the name of the IDENTITY column, and be of the correct type (only bigint and int are supported as IDENTITY types with SQL Server Compact).

The return value of @@IDENTITY is numeric, so simply using “SELECT @@IDENTITY AS [Id]” would not work. So the statement could be:

SELECT [Id] FROM [Students] WHERE [Id] = CAST(@@IDENTITY as int)

The type could then be either int or bigint and the column alias should of course be the correct column name.

I could then analyze the modified statement:

And no table scan, instead an index seek, consuming only 19% of the total query cost instead of 100%.

And so this is what I have implemented in my fix, that I now need to figure out how to “submit a pull request” for.

8 comments:

Andre: I think the team will take the fix seriously, but I just need to submit it, and then they will revei/polish and include in the master branch - after taht it will be available in the source and all future builds.

Anthony: Thanks, that was very useful information, I will do some testing with rowversion in my model. The only place this can be fixed as I see it is in the EF provider, as SQL Compact and LINQ to SQL are both eol.

Anthony: Thanks very much, your proposal is the "correct" fix. And LINQ to SQL suffers from the same issue, yes, but has much less overhead when it comes to database ceation and adding objects (both less than 500 ms)