Some Simple Code To Show The Difference Between Newid And Newsequentialid

Some Simple Code To Show The Difference Between Newid And Newsequentialid

In SQL Server 2000 we got the uniqueidentifier data type and the newid() function

Lots of people thought that newid() would be very handy to create some unique values across all databases.Newid() is nice but it has a little side effect; it causes terrible page splits because it is a random value. SQL Server 2005 introduced newsequentialid() where each value generated by the function is always greater than the previous value.

Wow the first table uses 21 pages while the second one uses 12.AvgerageFreeBytes is 3524 bytes per page in the first table and only 96 bytes per page in the second table. Obviously newsequentialid is the better choice.

Running the code below you will see that newid is still slower than newsequentialid by about 16% or so for batch 2

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))

FROM TestGuid1

GROUPBY batchNumber

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))

FROM TestGuid2

GROUPBY batchNumber

Now we will insert 10000 rows and then look at freespace and duration again

SETNOCOUNTON

INSERT TestGuid1 (SomeDate,batchNumber)VALUES(GETDATE(),3)

go 10000

SETNOCOUNTON

INSERT TestGuid2 (SomeDate,batchNumber)VALUES(GETDATE(),3)

go 10000

DBCC showcontig ('TestGuid1')WITH tableresults

DBCC showcontig ('TestGuid2')WITH tableresults

The first table uses 117 pages while the second one uses 80.AvgerageFreeBytes is 2574 bytes per page in the first table and only 21 bytes per page in the second table.

Running the code below you will see that newid is still slower than newsequentialid by about 40% or so for batch 3

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))

FROM TestGuid1

GROUPBY batchNumber

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))

FROM TestGuid2

GROUPBY batchNumber

Clean upDROPTABLE TestGuid2,TestGuid1

So that is all, do you use newid or newsequentialid and if you do use newid did you experience fragmentation or performance problems because of it?

Comments

Maybe I'm missing something, but whats the difference between an identity column, and a column that defaults to newsequentialid(), except for the fact that you can easily insert your own value if you wish?

"If I create a uniqueidentifier in my DB and you create a uniqueidentifier in your DB then these two values should never collide"

Are you sure? There is no guarantee that they will never collide, even in the same table. It may be a very, very small possibility, but it's still a possibility. The things are more or less random, after all.

I don't often see a good reason to use GUIDs instead of integers as surrogate keys but when I used to work with some C++ apps some of the developers liked GUIDs because they can be related back to object IDs. Another good reason would be certain forms of replication, e.g. Merge.

"So that is all, do you use newid or newsequentialid and if you do use newid did you experience fragmentation or performance problems because of it?"

I'll say. I learned a cautionary lesson about using COTs (commercial off-the-shelf) desktop databases in mission-critical healthcare apps. As preface; if memory serves, uniqueidentifier/newid() was introduced in SQL Server 7. Initially, it was generated using a call to a now-deprecated Win32 API which generated a monotonically increasing (sequential) guid based on the MAC address and time-stamp. The underlying platform api call changed in Windows XP and, as a result, entirely new behavior resulted. Imagine my rather nasty surprise, having characterized and tuned performance on Windows NT/SQL 7, to see this uber-fragmentation in our installed base of medical devices that where field upgraded to Windows XP/SQL 2000. I had been off the project and did not authorize the field upgrade. I was hastily yanked back and asked to find out what was going on.

As I understand the newsequentialid, it uses a MAC address on the system to get a shot at global uniqueness--looks like a return to the old method. (FWIW, last I knew, MAC's were not guaranteed unique, just supposed to be. In any case, dup's were not very likely.) I've used something like this to roll my own--just put the "unique" system ID in one field and an identity in another & put them together for a "world" unique key.

The same machine will never produce an identical GUID; ever. If Ethernet on your farm isn't messing around (MAC addresses are not colliding) and you have a Microsoft network: the machines on the network will also never produce an identical GUID.

I am not sure how the sequential IDs work; it might just be tweaking the bit position of the Time/TZ and the behavior of the PRNG; which would make these potentially collide with normal GUIDs.

June 9, 2010 2:35 AM

Leave a Comment

About Denis Gobo

I was born in Croatia in 1970, when I was one I moved to Amsterdam (and yes Ajax is THE team in Holland) and finally in 1993 I came to the US. I have lived in New York City for a bunch of years and currently live in Princeton, New Jersey with my wife and 3 kids. I work for Dow Jones as a Database architect in the indexes department, one drawback: since our data goes back all the way to May 1896 I cannot use smalldates ;-( I have been working with SQL server since version 6.5 and compared to all the other bloggers here I am a n00b. Some of you might know me from http://sqlservercode.blogspot.com/ or even from some of the newsgroups where I go by the name Denis the SQL Menace If you are a Tek-Tips user then you might know me by the name SQLDenis, I am one of the guys answering SQL Questions in the SQL Programming forum.