You might have noticed that I don’t like Guids all that much. Guids seem like a great solution when you need to generate an id for something. And then reality intervenes, and you have a system problem that no one can make sense of.

Leaving aside the size of a Guid and the fact that it is not sequential (two pretty major issues for an identifier), the real problem is that it is pretty much opaque to users.

This was recently thrown in my face again as part of a question in the RavenDB mailing list. Take a look at the following documents. Do you think that those two documents belong to the same category or not?

One of the problems that we discovered was that the user was searching for category 4bf58dd8d48988d1c5941735, while the document’s category was 4bf58dd8d48988d14e941735. It drove everyone crazy trying to figure out why this wasn’t working.

Here are those Guids again:

4bf58dd8d48988d1c5941735

4bf58dd8d48988d14e941735

Do you see it? I’m going to put in some visual space and then show you the difference.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Here they are. The only difference is the two characters right after d1 (c5 versus 4e):

4bf58dd8d48988d1c5941735

4bf58dd8d48988d14e941735
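
A machine, of course, spots this instantly. Here is a minimal Python sketch of the comparison that nobody managed to do by eye. The helper names `diff_positions` and `mark_differences` are made up for illustration, and the ids are treated as plain strings:

```python
def diff_positions(a: str, b: str) -> list[int]:
    """Return the zero-based indexes at which two equal-length ids differ."""
    if len(a) != len(b):
        raise ValueError("ids must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def mark_differences(a: str, b: str) -> str:
    """Build a marker line with '^' under every differing character."""
    differing = set(diff_positions(a, b))
    return "".join("^" if i in differing else " " for i in range(len(a)))


searched = "4bf58dd8d48988d1c5941735"
stored = "4bf58dd8d48988d14e941735"

# Print both ids with a marker line underneath them.
print(searched)
print(stored)
print(mark_differences(searched, stored))
```

The marker line puts carets under indexes 16 and 17, which is exactly the part a human eye slides over.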

And if that isn’t enough for you to despise Guids, feel free to read them to someone else over the phone, or try to find them in a log file. Especially when you have to deal with several of those dastardly things.

I have a cloud machine dedicated to generating and disposing of Guids. I hope that in a few thousand years, I can kill them all.

Comments

The problem is not the Guid itself, but its encoding and presentation. You can choose another encoding, like this one: http://stackoverflow.com/questions/2827627/what-is-the-most-efficient-way-to-encode-an-arbitrary-guid-into-readable-ascii
Besides this, there are other generators, like Snowflake from Twitter, which takes half the bits needed for a Guid.
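
To illustrate the re-encoding idea from the comment above, here is a rough Python sketch that stores the same 128 bits as 22 URL-safe Base64 characters instead of the usual 36-character hex form. The function names are hypothetical, and Base64 is only one possible choice; it is shorter, though not necessarily easier to read aloud:

```python
import base64
import uuid


def to_short_text(value: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe Base64 without the '==' padding."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def from_short_text(text: str) -> uuid.UUID:
    """Reverse to_short_text by restoring the padding before decoding."""
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


original = uuid.uuid4()
short = to_short_text(original)
print(original, "->", short)
```

The round trip is lossless, so the short form can be used for display while the full value is what gets stored.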

I find it extremely likely that the reason for those GUIDs being so close is that you haven't used standard GUIDs at all, but instead some home-brewed scheme of sequential GUIDs or something like that... you know, to handle the second major issue: "the fact that it is not sequential". I normally search for the first four digits in a GUID and I seldom see collisions.

The thing I like best there is that you copied one of the GUIDs wrong. The screenshot has 4bf58dd8d48988d16d941735 but you've written 4bf58dd8d48988d14e941735. You'd be hard-pushed to find a better example of why they're hideous for any kind of identifier that is supposed to be readable.

Sure, similar GUIDs look similar, obviously... but how often do you need to manually compare them? or read them over the phone? When I need to send an ID to a colleague, I use instant messaging for that, not the phone... even for relatively small integer identifiers (7 digits).

How do you do replication without GUIDs? You need a way to handle ID conflicts, in that case.

How do you create an object and give it an identifier before checking with a central authority whether that identifier is in use? You might want to do this if you're generating a large collection of objects which all need IDs, e.g. in a UI where you're building something new - which you'll then send to a server for persistence. GUIDs solve this. Just generate as you go.
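
A small Python sketch of what "generate as you go" could look like on the client. The `Order` and `OrderLine` classes are invented for this example; the only point is that every object has its id before anything is sent to a server:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    product: str
    # Hypothetical field: the id is assigned locally when the object is created.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Order:
    lines: list[OrderLine] = field(default_factory=list)
    id: uuid.UUID = field(default_factory=uuid.uuid4)


# Build the whole object graph offline; no central authority is consulted.
order = Order()
order.lines.append(OrderLine("keyboard"))
order.lines.append(OrderLine("mouse"))

print(order.id, [line.id for line in order.lines])
```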

They are clunky, yes, but I think tooling support can fix this. I had a go a long time ago here: http://www.rikkus.info/guids-in-colour

To make this truly useful, it would need to ensure that the colours were very different even when the guids had only one different byte. If anyone would like to have a go at that, please do!
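
One possible starting point for that, sketched in Python: derive the colour from a hash of the id rather than from its raw bytes, so that a one-byte change scrambles the whole result. The function name `guid_colour` is an assumption, and this gives no guarantee that two colours are perceptually far apart, only that they are very unlikely to be identical:

```python
import hashlib


def guid_colour(guid_text: str) -> str:
    """Map an id string to a '#rrggbb' colour taken from its SHA-256 digest."""
    digest = hashlib.sha256(guid_text.lower().encode("ascii")).digest()
    # The first three digest bytes become the red, green and blue channels.
    return "#" + digest[:3].hex()


first = "4bf58dd8d48988d1c5941735"
second = "4bf58dd8d48988d14e941735"

print(first, guid_colour(first))
print(second, guid_colour(second))
```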

What I'd also like is better tooling support. Currently copying and pasting GUIDs is painful because they're represented as hex strings in most places I see them. They should be first class objects.

"Replication can work just fine without Guids. In RavenDB, we do it just like that" You don't specify what 'that' is... I'd be interested to know. SQL Server seems to demand GUIDs for replication, which is what forced us to use them initially.

HiLo looks like you tell a client which range it's allowed to generate in and it sticks to that. That's fair enough, but I think Guid.NewGuid() is less code ;)
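
For comparison, here is roughly how much code that range-based idea takes. This is a simplified Python sketch of HiLo as described in the comment above, not the actual implementation used by RavenDB or any other library. The class name, the `reserve_next_hi` callback and the range size of 32 are all assumptions:

```python
import itertools
import threading


class HiLoGenerator:
    """Hands out integer ids from ranges that are reserved in bulk."""

    def __init__(self, reserve_next_hi, range_size: int = 32):
        # reserve_next_hi: callable returning the next unused "hi" value
        # from whatever shared store coordinates the clients.
        self._reserve_next_hi = reserve_next_hi
        self._range_size = range_size
        self._lock = threading.Lock()
        self._current = 0
        self._limit = 0  # exclusive upper bound; 0 forces a reservation on first use

    def next_id(self) -> int:
        with self._lock:
            if self._current >= self._limit:
                # Range exhausted: this is the only step that needs the shared store.
                hi = self._reserve_next_hi()
                self._current = hi * self._range_size + 1
                self._limit = self._current + self._range_size
            value = self._current
            self._current += 1
            return value


# Stand-in for the shared store: an in-process counter (assumption for the demo).
shared_hi = itertools.count()

client_a = HiLoGenerator(lambda: next(shared_hi))
client_b = HiLoGenerator(lambda: next(shared_hi))

a_ids = [client_a.next_id() for _ in range(3)]
b_ids = [client_b.next_id() for _ in range(3)]
print(a_ids, b_ids)
```

Each client only talks to the shared store once per range, and the ids stay short, sequential within a client, and readable.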

GUIDs suck for performance. They do however provide protection from a certain class of developer brain-melts: They make it impossible to accidentally use IDs out of context.

Let's say you're like everyone else these days and are building some multi-user online product, with all customers' data in the one database. Let's say a bug is introduced where an Order is queried where OrderID = CustomerID (instead of CustomerID = CustomerID). Using int sequences for the primary key means that it's very likely there is an order with the same ID as a customer, so you've just shown a customer somebody else's order. If both OrderID and CustomerID were GUIDs, there would be no possibility of a collision.

You do not need GUIDs for replication in SQL Server; you can use HiLo to create identifiers, which is the best approach. You should never generate IDs in the database itself. That's the worst thing you can let SQL Server do.

If users are directly searching for GUIDs, that's a UX problem, not a problem with the data type.

NEWSEQUENTIALID() provides a decent solution for overcoming the sequential issue (without the constraints of monotonically-increasing numeric IDs). Another option is to use a COMB, which gives you a timestamp for "free."
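
A rough Python sketch of the COMB idea mentioned above: keep most of the value random, but overwrite part of it with a timestamp. The layout here (10 random bytes followed by a 48-bit millisecond timestamp) and the function names are assumptions for illustration; which position sorts well depends on how a given database orders these values, and the result does not carry standard UUID version bits:

```python
import os
import time
import uuid
from typing import Optional


def new_comb(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a 16-byte id: 10 random bytes followed by a 48-bit millisecond timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Keep only the low 48 bits so the timestamp always fits in 6 bytes.
    timestamp = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=os.urandom(10) + timestamp)


def comb_timestamp_ms(value: uuid.UUID) -> int:
    """Read the embedded millisecond timestamp back out of the last 6 bytes."""
    return int.from_bytes(value.bytes[-6:], "big")


comb = new_comb()
print(comb, comb_timestamp_ms(comb))
```

The timestamp is the "free" part: it can be read back from the id without a separate created-at column.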

128 bits just isn't that big a deal for modern computers to store or compare, so while I use GUIDs heavily as surrogate primary keys, I don't notice a performance issue compared to the old days when I used 32-bit integers.

Granted, using GUIDs by default is just silly. Using them for Category IDs seems like overkill, there can't be that many "categories."