The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Count the number of scripts with reference number

Hi,

Currently I am dealing with a sys that track manuscripts. For every manuscript submitted, an auto increment id will assigned. If manuscript is accepted to be published, a reference number will be given.

How do I count the the number of manuscripts(in the database) already with reference number(accepted scripts) and then add one to this number.

I need to get the current largest reference number, so that when I insert the next manuscript, I can add one to the maximum reference. Increment in reference number, basically. But reference number starting at 10000.

Id and reference number not the same because id is for all submitted manuscripts while reference numbers are for the manuscripts selected to be published.

That'll give you the next ID to insert. How you get it with your script is up to you However, by the sounds of things your tables are very poorly designed. You might want to think about having a little reshuffle and normalize them.

junk, i could get into a big long discussion, but you may not have the time (nor the interest)

let me just caution you that having a separate auto_incrementing id together with a reference number that you assign by adding 1 to the highest existing number is a strategy with more processing overhead than necessary and a huge risk of losing data integrity (which risk might, however, be acceptable if you add only a manuscript or two a year, because then chances are you will not get what's called a "race condition")

here's a suggestion: start your auto_increment at 10050 and increment it in steps of 175

this will generate a nice-looking number, and then when you decide a particular submitted manuscript is going to be published, you update a field (called, perhaps, isPublished or Accepted) for that manuscript

kills both birds with one stone, and trust me, any solution which avoids code altogether is faster than a solution which involves code, no matter how efficient (and code to avoid a race condition is never efficient)

1) The extra overhead of running a refernce id and an auto incrementing id shouldn't add *considerably* to the processing overhead of the site.

2) I think there is a golden rule in db design that you have broken in suggesting that the auto-inc_id and the reference_id be combined. That rule is that the auto-inc_id should not be part of the business data. In the present application, I suspect the reference_id may be used for more than just linking tables in the database, it may well end up on the inside cover of a printed copy, or have other uses. I have been thinking of it like an ISBN number on books. One shouldn't use the ISBN of a book as the ID field in a database.

Your comment on the "race condition" is well made, though.

Without knowing the full details of the application, I would be tempted to use two tables to keep track of the manuscript. One for manuscript submissions and one to keep data about the published manuscripts. When a manuscript gets published the data can simply be transferred from one table to the other with some PHP to create the new reference id based on whatever business rules are applicable.

I don't think the zero-processing solution is good or better I come from an embedded engineering background where every processor cycle must be accounted for. Code for the sake of code is not what I'm suggesting.

I'm aware of the discussions on keys. I think the problem comes when (usually) some marketing bod decides that the reference ids are now going to be alpha-numeric or based on the phases of the 3rd moon of saturn or something else unrelated. Then having to change all the ids in that table and all related tables is a pain.

Re: one table is better ... I'm not sure. I'm thinking in terms of a large database with lots of scripts submitted and a very small percentage published. (I understand this is represents real-life in the book world.) To get data on published books means trawling through a large table with the vast majority of rows being completely irrelevant.

I think if you change your mindset from the reference number being an id to being some other string related to data outside of the database, then it becomes acceptable. As we don't know the business logic behind this application we are somewhat guessing at the best way forward, however I would suggest that my solution is less rigid and therefore offers more flexibilty in the future

... reference ids are now going to be alpha-numeric or based on the phases of the 3rd moon of saturn or something else unrelated. Then having to change all the ids in that table and all related tables is a pain.

that's not a design problem, that's a management problem

if the public identifier datatype has to change, it's easy to change

but the guy who suggests it is on shaky ground, e.g. suggesting a different reference id instead of the ISBN

... means trawling through a large table with the vast majority of rows being completely irrelevant.

Seriously, though. It's like using someones name as the foriegn key. They get married or divorced and you have to update all the tables that key is used in. I think that relationships between the tables should not be done using application/business data. Only data that makes sense inside the database, like autoinc fields, should be used.

comparing this example's auto id versus incremented reference number design question to the idea of using people's names as primary key is an apples and oranges comparison, and the oranges are sour

the main problem with people's names is that they're not unique

both the auto id and the reference number are unique, but the onus for this in the MAX()+1 strategy is clearly on the coder

both identify the same row, so both are acceptable candidate keys

my point is that it's better to have just one key here instead of two

mine's neater-looking that yours, and mine is an auto_increment id, i'm just saying it's also a natural key in exactly the same way as the inventor of the ISBN can feel a sense of pride in the widespread acceptance of the ISBN as a natural key

both the auto id and the reference number are unique, but the onus for this in the MAX()+1 strategy is clearly on the coder

This is true and admittedly this is a big but.

mine's neater-looking that yours, and mine is an auto_increment id, i'm just saying it's also a natural key in exactly the same way as the inventor of the ISBN can feel a sense of pride in the widespread acceptance of the ISBN as a natural key

Your solution has a lot going for it: neatness, reference number generation within the insert (so no concurancy issues), easy external code so less bugs and less maintenance.

I'm trying hard to think why I am resisting it

The only thing in my favour is that there is more flexibility in the form that the reference number could take. The original post indicated an incrementing number was required, which your solution satifies (albeit with big jumps). I added a suggestion that alternatives to incrementing numbers may be required in the future which was not even hinted previously. If they want it differnt they can pay to have it changed!

So, I admit your solution comes out on top.

I don't think the natural key argument swung it for me, though. Being clear on this - it is not that I'm against natural data as indexes, it's just natural data used as a primary link between multiple tables ... specifically where that natural data may be subject to change.

i have enjoyed this too, thanks for helping me carry this along (i sometimes have these types of conversations only with myself, so it's nice to bounce ideas around with someone who understands)

regarding primary keys that change -- if it's individual values that must occasionally change, this is what ON UPDATE CASCADE in the foreign key is for

if it's a wholesale domain change, well, that would be as disruptive as any domain change -- with or without surrogate keys -- so the conversion costs would be clearly need to be made known to management

more specifically, no manager (or, how did you phrase it earlier, "marketing bod") would decide that the reference ids are going to change without a pretty good reason, and my comment on this issue is that management should invest the few minutes now to fully understand the purpose of a reference number outside the database

I for one also enjoyed the exchange of ideas. Glad I read the thread. We need more of these and less repetitive questions where folks couldn't be bothered to search for similar question/answers if you ask me. ;-)