Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

From various posts, I have summarized about using number type instead of char/varchar for PK:

If you choose char/varchar type for PK and your values are short (just some alpha numberic codes) then it doesn't matter what type will you use. As long as the size (in bytes) of string value does not exceed the size of NUMBER type then performance should be the same. However, for larger tables with a wider variation among the key, this can slower the performance, for example, using e-mail address/user name from a user table, for example. If the table consists of a few million users and some of those users have long names or e-mail addresses then index will definitely be larger than just some numbers, so both the insert and search will be slower when compared to numbers. Any time you need to join this table using that key it becomes much more expensive. Also, some implementations might use hash of the actual value for indexing purposes, but even so, and even if it's a relatively short hash (say, 10 byte), it's still more work to compare two 10 byte hashes than two 4 byte integers.

Actually, I'm expecting to hear any argues on that :) Today had an argue with my colleague. My point of view was, it's better to use number PK for performance, while his arguments were that it doesn't matter (especially for selects when using index).
–
CenturionSep 14 '12 at 16:03

1 Answer
1

One of the biggest performance hits is updating the string data as it changes and you have to update all the child records (which depending on the application could be in the millions!)

But in general yes an int is the fastest thing to join on although the difference may be negligible if the string is small. If the stringvalues are large, performance will be slower will be slower and indexes will be greatly expanded as the PK is in all other indexes and thus a slower PK could slow down index use for other things as well.

Making child tables wider due to using a natural key as the FK (especially if it is a long one or a compound key) can also cause performance issues if it makes the table becomes too wide when using natural key as the Fk instead of the surrogate int key.

There is an argument that there is a lot of time saved from not having to make joins for selecting, but the performance hit for updating is usually not considered by those people. Since most string values are subject to be being changed, you have to account for that in your planning. Further, the addition of the natural key will only help some select queries, you will often still need to make the join to get all the field data that you need. And if the FKs are properly indexed, you often don't save that much time anyway. Saving developer time writing queries is, of course, irrelevant when discussing database performance.

So of a string key is both small and unchanging and if you will often be querying where the only thing you want from the parent table is the string value, then yes a string PK is fine. If you don't have those conditions, I personally would use the surrogate key with a unique index on the natural key.

What sane person would design an app where you update a primary key? One of the main reasons for using integer PKs is so that any associated string/GUID/etc data can be changed without having to update the actual PK.
–
PhilSep 14 '12 at 21:24

1

@Phil, in my experience most application developers who design databases don't consider maintaining the data over time and are more concerned about saving themselves some steps than database performance. That's why data specialists get big bucks to come in and fix their messes. Be better to hire data specialists to do the design, but....
–
HLGEMSep 14 '12 at 21:57

I guess it depend who is designing a database (application developer or database developer) and the size of database. In small systems (CMS or apps) databases are mostly designed by application developers who are lacking database knowledge, so they save their time by using simpler solutions but those solutions might be quite messy. I agree with @Phil, it's very unprofessional to design an app where you update a primary key. So the argument "it depends, use what makes more sense and works with your schema" is not acceptable for me.
–
CenturionSep 16 '12 at 16:52