I haven’t seen this many fake ID’s since I was in college.
Back on April 26, Grant Fritchey (of execution plan fame) reported a strange bug in the newly released version of SQL Server, in which the values of certain identity columns are suddenly incremented by about 1,000. Since then, 36 other users have confirmed the problem and 15 have posted in the thread at Microsoft Connect titled Failover or Restart Results in Reseed of Identity. The circumstances under which the bug occurs are still a bit murky; at first, it was believed to occur only in situations where the new AlwaysOn availability feature was in use, but it has since become clear that it happens sometimes after server restarts. User Leo G. Miller speculated in that thread that it may be related to the implementation of the new Sequence feature in SQL Server 2012, but as of yet there is no confirmation of this from Microsoft’s end. There isn’t a fix yet either, although a Microsoft employee asked for more information about the circumstances of the error on Oct. 4 and added that the company is “trying to quickly assess the impact of this issue.” One user in that discussion reported that it was a potentially serious error, without indicating if data loss was involved. I can speak from experience, however, that this “identity crisis” can cause significant data integrity issues, to the point where a patch is urgently needed. Until one is released, the SQL Server community will have to make do with the three workarounds posted in that thread. While in the midst of writing the series of self-tutorials on SQL Server Data Mining (SSDM) I have been talking about for months, I had to take time out to deal with this error so often that I ended up developing my own duct-tape level solution. In lieu of my perpetually postponed SSDM series, I figured I’d post my observations and workaround here, in the hopes that they can be of use to others who might be suffering significant accuracy problems as a result – perhaps without even knowing it.
This can turn into a serious data integrity issue under the wrong circumstances, as I found out first-hand. Some of the single-user .Net applications I maintain suffered significant data loss when their datasets generated new identity values that were consecutive, as expected, which put them out of synch with the ones SQL Server had accidentally incremented by 1,000. When this occurs, the INSERT statements for the new records will succeed, but UPDATE statements to them will fail without warning if they are dependent on primary keys that use these faulty identity values. A user may create a new record in a front-end application and immediately begin changing values in it, but the changes won’t be recorded as expected, resulting in data loss that continues to accumulate until the .Net applications are restarted. For example, a user may create records with primary keys based on identity values that are reported as 55, 56, and 57 and so on by the front end app, but are assigned the erroneously incremented values of 1,055 through 1,057 and so on in SQL Server. Until the front-end app restarts or refreshes its data, any updates made to values 55 through 57 may fail without error because those primary keys don’t correspond to any record on the server. So in essence, the user may type away for days on end without having any of their changes actually recorded to the database. This probably will not be as much of an issue for OLTP systems, where developers expect the front-end apps to generate identity values that can conflict with each other and code accordingly to retrieve the actual values generated by the server. Single-user applications may not have these kinds of protections built in because concurrency issues with generated identity values are not expected. I have several such applications that worked well for years, until I upgraded the databases behind them to SQL Server 2012. Until this issue is addressed, upgrading to 2012 can be considered a breaking change for applications of that kind.
I can only contribute a few observations on the circumstances behind the error, which in my case may have occurred in between server restarts. I have not seen the bug crop up between restarts since it came to my attention, but I vaguely remember a few identical episodes of data loss happening in the midst of periods when these single-user .Net apps had been in use for many hours consecutively. Yet thanks to the stored procedure I developed to deal with the problem, I can say for certain that the only tables that I am seeing the auto-increment problem on are ones that updates or inserts prior to the last restart. Inactive tables or ones that are merely queried with SELECTs don’t seem to be affected, at least in the cases I’ve seen to date. I was able to determine this by running my procedure, dbo.ReturnIdentityInfoSP, which returns a wealth of identity-related information on every user table in your database. As depicted below, the tables and their identity columns are identified by the SchemaName, TableName, ColumnName and object_id columns returned by the procedure. Count represents the total number of rows in the respective table, while last_value represents the last_value column of the sys.identity_columns system information view. LastUsedIDValue is generated by running the ident_current function and MaxID is the maximum value for the identity column retrieved from directly from the table itself. CountDifference, LastIDDifference and MaxIDDifference equal last_value minus Count, LastUsedIDValue and MaxID respectively. If you see any of these three columns hovering around values of about 1,000 it may be a good idea to the run the CHECKIDENT code that is automatically created for that table, in the GeneratedDBCCCode column, after manually inspecting the data of course. In the illustration below, Table3 appears to have fallen victim to the Identity Crisis bug (click the link to expand this wide table to its full size):

You can of course add ORDER BY clauses at the end of the stored procedure to sort by CountDifference, LastIDDifference and MaxIDDifference to identify the tables with the biggest identity gaps. One interesting thing I have noticed each time I have run the procedure is that the last_value and LastUsedIDValue (i.e., ident_current) always report the new, false auto-increment value as having been used already, whereas MaxID shows that it has not. The last_value column of sys.identity_columns and the ident_current function are supposed to report only values that have actually been used by the column, so either they are consistently reporting the numbers wrong, or worse yet, the server is actually inserting and deleting records up to that point at a server restart. On some occasions, you may see the bug produce a last_value, LastUsedIDValue and MaxID that are roughly equivalent in conjunction with a CountDifference that is about 1,000 records off where you expect it to be. In these cases, the bug has probably false incremented the identity seed by about 1,000 in the past, but records were successfully added to the table afterwards at some point using these new fake ID’s. You may not experience further data loss in these cases, but there will be a permanent gap in your identity values of about a thousand records unless you take corrective action.

Before getting to the text of the procedure, some credits and notes are in order:

At first IDENT_CURRENT kept returning NULL, which can often be solved by granting permissions on the table being viewed to the user running the query; in this case, however, I initially failed to perform the necessary step of qualifying the table name with the schema name.

For the sake of legibility, I did not add any brackets to handle object names that may have spaces. I never include spaces (or underscores, for that matter) in my object names, so it’s not an issue for me, but as the saying goes, your mileage may vary. There are probably other undiscovered kinks to work out in the stored procedure that I have yet to encounter myself. I haven’t added any error checking either.

Take care not to run this procedure when your servers are under heavy loads. Even without cursors, the execution plans have more branches than any I’ve ever seen before (since they’re performing complicated queries on each user table in your database) so they’ll probably have some kind of performance impact.

If you run the generated DBCC commands to reseed the tables, you will have to insert the records with false identity values back into the table, then delete them. Otherwise, once you have added another thousand rows to the table, you may inadvertently end up inserting a duplicate primary key, causing the failure of your INSERT statements.

I am particularly proud of this solution, since it shines a bit more light on the problem. Most of all, however, itdoesn’t use cursors. I was diagnosed with a severe allergy to cursors six or seven years ago and am happy to report that I have never used a single one since; by hook or crook, I have been able to find a way around them in every bit of T-SQL code I’ve written since SQL Server 2005. I thought I would have to break down and actually type the word F*tch for the first time in years, but with a bit of perseverance I was able to lash together a table variable, WHILE loop and sp_executesql to accomplish what I needed. I essentially came up with a seemingly unique way to retrieve aggregates from all user tables in the same query, which isn’t possible with any other cursor-free approach I have ever seen posted on the Internet, like user functions, sp_msforeachtable and about a half-dozen others. Hopefully this will impress the first companies apply to, whenever I’m ready to start looking for my first DBA job. In the meantime, I can use all the constructive feedback I get from my ReturnIdentityInfoSP stored procedure (a.k.a, my Fake ID Detector):