I brought this to the attention of a coworker who claimed that not using a built in auto incrementing field is a perfectly reasonable approach to database design and in some circles a best practice. The coworker's rational is that it gives you more control over the values and allows you to manipulate the data more easily.

I could only counter by pointing out that it just seems silly to do implement your own version of a feature that is supported by the server. And it is extremely error prone should someone decide to use a different method to insert a record.

Are there really any good reasons not to use an Identity column when all you need is an artificial key on a table?

You don't need to "debate" this. You need to benchmark it. The SELECT MAX() is often be very, very slow and gets slower as the table gets bigger. There's little to discuss once you have measurements.
–
S.LottJul 8 '11 at 13:59

Your coworker has probably been characterized by Oracle's lack of auto-generated IDs, where you need a sequence and a trigger for almost every surrogate id. Assigning IDs like this is not a good idea for the reasons S.Lott stated.
–
FalconJul 8 '11 at 14:23

+1 for the race condition. This occurs far more frequently than might be expected, particularly in a web environment, and can be a nightmare to correct once it becomes an issue.
–
BeofettJul 8 '11 at 14:45

+1 for the race condition. I'd give it a day before it throws its first duplicate key error.
–
QwerkyJul 8 '11 at 14:52

In this case, since the value is (presumably) the PK of the record and therefore almost certainly indexed, the MAX(ID) is probably nearly a constant-time operation. I doubt if it's faster than using an autoincrementing field, but it should be largely invariant across table sizes. The race condition is a deal-breaker, though!
–
TMNJul 8 '11 at 15:19

@S.Lott: Yep, it would depend on how smart the optimizer is. The DBs I'm most familiar with (SQL Server and Oracle) have decent optimizers, but I'm guessing they've got man-centuries of development time in them.
–
TMNJul 8 '11 at 16:26

The burden of proof of neccessity of new code should be on the one who implements it.

When you say

I could only counter by pointing out
that it just seems silly to do
implement your own version of a
feature that is supported by the
server. And it is extremely error
prone should someone decide to use a
different method to insert a record.

i would ask, why only? In my opinion, he has to demonstrate that the assumend gain of 'control and ease of manipulation' outweighs your point of simplicity. What control does he need? What manipulations get easier?

If he does not need the control right now, it's a case of "You ain't gonna need it (yet)"

You can also run into data races with this kind of sequence number generation. It only takes a little code between the SELECT MAX(ID) and the INSERT to allow a second instance of a program to select the same "max" ID number and use it, ruining everything.

That kind of data race can also be very confusing, as people believe that it runs single-threaded. The belief overrides the reality.

If you need the identity column to be just that, a way for the 'database' to uniquely identify records, there is no reason to do this yourself and is a bad choice. However, there are several questions on stackoverflow where some sort of additional information is attached to these values.

One example was wanting to reset the identity if a record was deleted to prevent gaps in the sequence. That is why it is better to have a specific column to fill this need. If you want to resequence the values, you can do that with your own code and not mess up the identity. There are some db and/or developer limitations where they want to know what the identity will be before it is actually created. They want to have some sort of quote on their paperwork with this number and they may or may not actually enter the record. I don't understand the need for this business rule, but again you can have a specific column & code that accomodates it without interfering with an built-in identity field or primary key setup.

Maybe under these circumstances this is a common setup. There are those that feel a need to build intelligence into their keys. I've never had a need for that. Plenty of ways to control unique values and convoluted numbering systems.

Even if theres a business rule that you have to implement some kind of subsequent numbers (in germany, invoices have to be numbered this way) i would prefer two columns, one for the internal id, the other for the number. That way, the inevitable stupid requests ('we have to change the number', 'these two documents are an exception and need to have the same number') wont interfere with database integrity or algorithms.
–
kepplaJul 8 '11 at 14:42

This looks like a case of SELECT COUNT(1) FROM Foo is faster than SELECT COUNT(*) FROM Foo. Unless you have real numbers to back it up, this is just an urban legend, or a personal preference. SELECT MAX() is actually optimized, so there might actually be no performance difference between using that and a Oracle Sequence or something similar.