The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Alternatives to auto_increment

[Warning: I'm long-winded. Actual question is in bold, a tad below.]

Thanks to past rants found in a miriad of threads at Sitepointforums, I came to develop a new way of looking at 'auto_increment' column attributes to generate unique keys. I'd now like to try generating my own primary keys.

Aside from the many reasons found in threads like this one, I need to make sure that the application I currently develop will be portable and compatible with a handful of RDBMS. 'auto_increment' is a MySQL implementation, and while it's quite a good feature, I'm a control freak. I want the database system to obey me. I therefore want unique ID that mean something to me. I also want to make sure that the DB and my app will work under the pressure of a insert-intensive environment. While I'm not claiming that my site will get millions of visitors a day and each of them writing 100 messages an hour, I prefer coding for strenght and with an open mind, since when my app matures I see a good future for it (and me... and my Swiss banker... )

Having looked around a bit, I have experimented using a monotonic key generation by basing myself on the way PHPLIB does with it's 'nextid' function. For those unfamiliar with this method, a common table named 'db_sequence' is created with 2 columns (seq_name, nextid). For each table that an autonumber is needed, a row is added to 'db_sequence' with the table name (seq_name) and a blank start-up id (nextid). Now, every time a record is added to a table that requires an autonumber, we first fetch the nextid for that seq_name in the db_sequence table, then update db_sequence with nextid+1, and finally use the nextid for whatever row insertion we wish to do elsewhere. Nice, sweet, elegant and intuitive... but requires table locking.

But to tell the truth, I've been seduced by this Sybase paper on Surrogate Primary Key Generation, mostly aroused (...intellectually) by the Recycled Series Approach. In short, and from what I understand, the Recycled Series Approach allows for better concurrent use of the db_sequence table when under stress and would keep the keys small. As the doc says :

In the recycled series approach, each concurrent process has its own conceptual next-key table.
For example, three concurrent processes can acquire next key values as follows: One process can work on the sequential series from 1 to 1,000,000, the second can work on the series from 1,000,001 to 2,000,000, and the third from 2,000,001 to 3,000,000.

If a process dies, another picks up the slack instead of opening a new series. And no need to do table-locking since each process is responsible for it's own turf and will not simultaneously do two jobs at once (no collision).

...and then there's the Composite Key Approach, which basically means to tag the server's user_id to the precise date and to some other random number. The PHP function uniqid does exactly that I believe. This way we can do away with the extra queries associated with using and maintaining a sequence table. The downside seems to be larger keys, and a slight possibility of creating identical keys when on a multi-processor machine... or when someone readjusts the computer clock.

Now I'm perplexed as I am not sure which way to go.

Question:I'm looking for a rock-solid way to generate unique ids to use in tables that require primary keys. Any thoughts?
Requirements for proper key generation would be the following:

Must withstand and perform well under low-to-very-high levels of stress (in load balanced environment or whatever);

Should always create unique keys;

Keep overhead and impact low;

Set and forget, because once in a production environment it would be nightmarish to replace all generated ids to a new system.

Has anyone experimented with such systems and would like to comment on their experience? Does anyone know how do banks and telcos approach this problem? Should I simply give up and keep on usong auto_increment?

Last edited by Ramses; Aug 24, 2001 at 13:03.

SciFlicks.com -- Science Fiction Cinema
No, I'm not as old as Professor Farnsworth.

Having worked on Bank software in the mid-90's, I can say they use sequential number generators (a.k.a Auto-Number Fields). However Account numbers are made up from a couple of different fields depending on the bank, hence the dashes. These other fields include regions, branch numbers, check numbers etc.

You can search on the Internet for algorithms that create unique numbers however unless you have some other fanatical reason not to, I would use the AutoNumber version of MySQL.

The Auto_Increment is MySQL's terminology. Every database allows an easy way to create unique keys (and since they have referential integrity can actually enforce it). In Sybase and MS-SQL you have Indentity fields. In Oracle you use Sequences (these are powerful in that you can create the forum by which it increments by). Even Access allows you to create AutoNumber fields for this purpose.

As long as your code is written properly you shouldn't have a problem.

I'd go along with Wayne's advice.. The only time you'd ever see problems with an auto_increment (or the like) would be on very insert-heavy applications (e.g. 30+ inserts per second). I doubt MySQL would even be used in this sort of application (table lock, etc.). If you're really interested then try your hand at some of the Sybase paper's suggestions (they are all very cool; I have had a PDF of that document stored in my 'neat things' folder on my hard drive for quite some time ).

Thanks W.Luke and MattR, I guess I'll stop being paranoid and use current features where available! I'll probably end up using the next_id approach for RDBMS where no LAST_INSERT_ID() feature exist and no unique identifiers can be used to pull the latest entry number back and collisions might occur. But for everyday uses, autonumber will still be my best pal...

While the following is more like a rethorical question, and while I understand MySQL doesn't scale too well, I assume that exotic id number generation might be useful in instances where multiple DB servers run in tandem and where each machine is handling part of the data instead of replicated copies. In any case, I'm nowhere near that bridge yet.

SciFlicks.com -- Science Fiction Cinema
No, I'm not as old as Professor Farnsworth.