In a typical scenario, an object that needs to be persisted in the database will be created in application code without an identifier, and later saved to a table in the database. Some database construct, such as an identity column, will generate a new identifier, typically an integer that's unique for all of the rows of that table. This identifier can then be returned to application code and then treated as a key.

This second way provides the following advantage: the GUID can be generated in application code, which can then be used as a key for application caching/referencing purposes before the object is ever saved. However, there is a cost associated with using GUIDs in size and, especially look-up costs. These may be trivial when the table is small, but can become a major factor when the table is massive.

Are there any best practices, or techniques for the cases where an identifier unique to the object type needs to be generated in application code, but cannot be a GUID for whatever reason?

One thought was to have a another table generating identifier wrapped by an identifier service which application code could call.

What about using a single sequence? Does your RDBMS support sequences?
–
FalconSep 21 '11 at 19:35

You'll have more chances to have answers if you ask this question on dba.stackexchange.com Also, don't forget to precise the SQL server you use. Different servers behave differently, especially when it comes to performance.
–
MainMaSep 21 '11 at 19:36

We are using SQL Server 2008R2, and there is a probable migration to SQL Azure in the future.
–
afeyginSep 21 '11 at 19:50

1

@MainMa - actually, as it's a design question it's also on topic here. However, if afeygin wants we can migrate to dba.
–
ChrisF♦Sep 21 '11 at 19:54

1

When you say you cant use a GUID, do you mean you cant use an MS style GUID, or that you cant use any form of UUID? Can you elaborate on why you think there's a meaningful lookup cost when using GUID's/UUID's as keys? Because I'm not sure I buy your premise.
–
GrandmasterBSep 21 '11 at 20:42

A primary key is implemented via a unique index. For best performance, you should choose the smallest possible column length and if possible use numeric value. This has a performance effect in sorting and finding data.

The datatype int is usually large enough for most applications, but it may not be in your case.

A disadvantages identity fields is that if you use strictly sequential numbers, the indexes don't get built in an efficient way, however, this is really a minor issue. The other silly thing, is that you can't display the key value before you have completed the insert. This may sometime add some complexity to your GUI design. Also, you need to use SCOPE_IDENTITY() to capture the last inserted id which is an extra bit of work for the developer.

GUID is almost universally unique, so they are more generic than identity columns. With GUID, the key is text and its larger than the int type.
It is difficult to test with and difficult to provide to end user as part of a business application. For example, when used for customer number, invoice number, etc. it becomes impractical and causes problems for the end user. Also GUIDs are not 100% unique. I don't know the exact probability of duplication. The only nice thing about them is that they can be generated at the client without the round-trip to the server.

I know of a technique called Guid.Comb (see NHibernate) which has the advantages of a GUID but has a few tweaks to avoid database problems. The problem with GUIDs is that database indexes get fragmented due to their random nature. This is a severe problem, especially as GUIDs are often used as primary keys. But be aware that not every GUID algorithm suffers from that problem, as the database vendors are already aware of it. Not all GUIDs are inherently bad to look up.

What you basically do is creating an identifier that eliminates some of the randomness to get an ordering into the GUIDs. Look at this article here to get an idea of how it can be done.

I can see no reason for not using GUIDs if you won't run into space problems. For large data volumes this can be a very, very serious problem with huge impacts.

When you really can't have a GUID as PK, you can use a GUID as a secondary key, as Steven A. Lowe already stated.

Either one is OK depending on your requirements. With an integer identity column there is value little maintenance that will need to be done with the column over time because the data on the index is sequential. You can insert 1000s and 1000s of row and the index on the column will hardly fragment.

With a GUID, the index will fragment over time and periodic maintenance (rebuilding the index, etc.) will have to be done to maintain the index on the key column. So, as long as you are aware of this, I beleive either are OK.

The integers take up less space but are easier to guess, guids take up more and are harder to guess. As you said GUIDs can be generated in code, although SQL Server can do an identity column or newID() column as well. I also believe there is something called a sequential guid which helps in the fragmentation department.

If you ever had to merge two databases together, guids would be better as row IDs would probably collide between the two databases.

Not sure how a sequential GUID will help. If I have to generate it on the DB-side, I might as well use an integer identifier. If I generate it on the application server, it's not guaranteed to be unique across multiple servers. If I create a GUID service, I might as well have the DB generate me an integer identifier. Merging is not a consideration at this point, although it maybe be in the future... still I'd stick with the YAGNI principle.
–
afeyginSep 21 '11 at 21:48

If you want to use a persistence layer like NHibernate, then using a Guid has one significant advantage over an identity column: Guid's can be generated on the client side as soon as the entity object is created in memory, without hitting the database to do an insert. Using an incrementing identity column, you have to do the insert to get your value.

This is a specific benefit of the general benefit of being able to generate Id's in a distributed manner that Guids allow. If you plan to have any kind of mobile devices, or distributed databases, or just the ability to batch-import rows, it can be very helpful to generate new Ids without hitting the database.