Humble Blog

Friday, June 3, 2011

Don't bikeshed

Let's not bikeshed during development. If you see some systems that uses GUID, it has a rationale why it is chosen. Don't outright dismiss/demean the design just because the system uses GUID as primary key. Try to grok why certain decisions are made as such.

I can hazard a guess that those Microsoft developers(Microsoft employees) doesn't bikeshed and didn't spent an inordinate amount of time on what field type to use for primary keys when they are designing Microsoft Dynamics ERP, Microsoft CRM, Microsoft Sharepoint. All those mentioned products uses GUID for primary keys. They have made a decision that the advantages far outweighs the disadvantages. And if Microsoft uses GUID for primary keys on an ERP-type of product is not confidence-inducing enough to those who are reluctant to use GUID as primary key, I don't know what could.

Another bikeshedding is debating whether LastModifiedDate is nullable or not. Trivial choices such as this won't go wrong either way. And if populating the non-nullable date column is a hassle on initialization(i.e. INSERT), just put GETDATE() as the default value on that column. Debating this decision rigorously have reminded me of the quote: "The minute you start to strategize too much, the more you start to think you're in control of your own fate. And you're not, really." Which reminded me of another technology, ah.. the list of things that can be blogged is endless :-) Do things in moderation.

And suggesting to use an integer type instead of GUID, this might lead you to fall into a dangerous trap that you will naturally assume things are slow just because the system uses particular data types (e.g. GUID), then you will be reluctant to know the real deep reason why things are slow.

And to prove that the choice of field type is not the major factor on causes of system bottleneck, here's an anecdote, I optimized a query that took 222 seconds; then optimizing it again, it became 71 milliseconds; then tried another approach but suprisingly is slower, 399 milliseconds; then tried another approach, the sweet spot, 37 milliseconds. I used varchar for GUID, Postgresql prior version 8.4 don't have a native(uses 16 bytes) GUID then, I used GUID to roll my own replication, I had an unenviable job to implement a system to make a selective replication based on business rules, and using GUID as primary keys for tables perfectly fits the bill. I digress. Here's the optimization I've done on a system that uses varchar'd GUID. http://stackoverflow.com/questions/3177038/why-the-most-natural-queryi-e-using-inner-join-instead-of-left-join-is-very