
Well, the thing is that I've only placed a part of the application in front of you guys; you don't know the whole workings of it, and it would be too much for me to explain it all. It's all inter-connected, and all hell will break loose if I change the structure, that is, if I even manage to change it!
You might think of me as someone silly, and I don't mind that a bit, but the truth is that I'm not the project lead here, so I can only give my suggestions; I don't have the authority to have them implemented! That's why I know I won't be able to pull this 'change the structure' off. So please accept that and cease to suggest it.

Originally Posted by TheRedDevil

Basically, depending on the solution you choose (and you have a lot of solutions for this specific problem), you can load the current IDs into a new table which only links each ID to its user, then delete the field from the database. Use either the real id field or create a new one.
Then update the script so it first checks for the new ID type; if it doesn't find any match there, it checks the new "linking" table to see if it matches the old ID type.

Quite a good alternative, but as I've said before, no changes to the DB structure; I'm not the only one working on this stuff!

Originally Posted by TheRedDevil

No offence meant, but to me it seems like you're dropping all the "easiest" and most "effective" solutions because you think/feel they won't work, instead of seeing the most obvious solution to fix the problem/mess (whatever you want to call it).

Quite true, and I hate to admit it, but I don't have a choice in this; I have to drop the "easiest" and most "effective" solutions!

Originally Posted by TheRedDevil

Actually, if you sit down and consider this solution for two minutes, I'm sure you will notice how stupid you sound in that answer...

"Stupid"? I'll admit to being a third-rate idiot if admitting that helps things!

Originally Posted by TheRedDevil

On the other hand, as has been mentioned, I don't really think you've got a problem at all; just improve your query and ID generator slightly and you will be good to go.

Just rewrite your generator and you're ready to go; in 98% of cases the query will only be run once, with twenty matches in it. And of course the function can be optimized even more, and more IDs can be added to the array. Btw, please note that I have not tested the function below, so there might be a syntax error.

Hmm, well, this has certainly given me an idea for improving the ID generation and minimising repeat IDs; not exactly as you suggest, but something similar! I'll need to check that out.
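TheRedDevil's actual function didn't survive in this archive, so here is a hedged sketch of the batched approach he describes: generate twenty candidate IDs in the ABCD1234 format, eliminate the taken ones with a single lookup, and return the first survivor. The helper names and the callable standing in for the real `SELECT ... WHERE id IN (...)` query are assumptions, not his code.

```php
<?php
// Sketch of the batched approach (TheRedDevil's original function is not
// in the archive): make twenty ABCD1234-style candidates, eliminate the
// taken ones with ONE lookup, and return the first survivor.
function makeCandidate(): string
{
    $id = '';
    for ($i = 0; $i < 4; $i++) {
        $id .= chr(mt_rand(65, 90));   // A-Z
    }
    for ($i = 0; $i < 4; $i++) {
        $id .= mt_rand(0, 9);          // 0-9
    }
    return $id;
}

function getUniqueIdBatched(callable $fetchTaken): ?string
{
    $candidates = [];
    while (count($candidates) < 20) {
        $candidates[makeCandidate()] = true;  // array keys de-duplicate
    }
    $candidates = array_keys($candidates);
    // In the real script this would be a single
    // SELECT id FROM users WHERE id IN ('...', ...) query; here a
    // callable stands in so the sketch is self-contained.
    $taken = $fetchTaken($candidates);
    $free  = array_diff($candidates, $taken);
    return $free ? reset($free) : null;       // null => retry with a new batch
}

// Demo: pretend the DB already holds the first two of our candidates.
$id = getUniqueIdBatched(fn(array $c) => array_slice($c, 0, 2));
echo $id, "\n";
```

If all twenty candidates were somehow taken, the null return signals a retry with a fresh batch, so the DB is still hit only once per twenty candidates instead of once per candidate.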

Originally Posted by TheRedDevil

And just a side note: if your script manages to time out on those few connections, you might want to beat up whoever made the database and did the optimization, or the guy who wrote the PHP script.

I'll add that to my to-do list with high priority, but seeing what I've seen, I don't think that person is the one to be blamed, not 100%, since he must've done what he was told to do, like I'm doing, without much choice in the matter!

If it were me, before I went rewriting functions or redesigning my database, I would make sure I knew where the problem is. For me, that would mean actually logging how many times duplicate IDs are generated each time getUniqueID is run.

Precisely, you appear to be wiser than the rest of them (no offense to anyone) who suggest changes without thinking them through! That's how I think too, find the point where the problem is, but I don't think I've gone about it in the right way!
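Since getUniqueID's internals aren't shown in the thread, here is a hypothetical sketch of the instrumentation being suggested: count how many duplicate candidates each call burns through before it finds a free ID, so you can log the number and see whether the generator really is the bottleneck.

```php
<?php
// Hypothetical sketch: instrument the ID loop so each run reports how many
// duplicate candidates it produced before finding a free one. $exists is
// whatever check the real script uses (a DB lookup in practice); here it is
// injected as a callable so the sketch is self-contained.
function getUniqueIDInstrumented(callable $exists, callable $generate): array
{
    $collisions = 0;
    do {
        $id = $generate();
        if (!$exists($id)) {
            // In the real script you'd error_log($collisions) here.
            return [$id, $collisions];
        }
        $collisions++;
    } while (true);
}

// Demo with a fake "database" of taken IDs and a deterministic generator.
$taken      = ['ABCD1234' => true, 'ABCD1235' => true];
$candidates = ['ABCD1234', 'ABCD1235', 'WXYZ9876'];
$i = 0;
[$id, $collisions] = getUniqueIDInstrumented(
    fn(string $id) => isset($taken[$id]),
    function () use (&$i, $candidates) { return $candidates[$i++]; }
);
echo "$id after $collisions collisions\n"; // WXYZ9876 after 2 collisions
```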

Originally Posted by Snortles

You say it's being run 10,000 times each time you try to create a new ID but I don't see how that's possible. If it is only running once or twice (like I think it should be) then the problem isn't createUniqueID, it's somewhere else.

Did I say that? Where? No, I didn't say that; I merely expressed my thought that it might be running in a loop for some higher number of times, say maybe 7000-8000 or possibly 10,000, but like I admitted, I didn't check exactly how many times (at maximum).

Originally Posted by ldivinag

no one's brought up:

is that column indexed?

if no, then index the table on that column. It won't have to search every row, even if the value isn't in the table...

well, I believe that I said

Originally Posted by asp_funda

..... the field has just a PrimaryKey Index on it, I dunno if another Index(Full-Text??) would help, .....

Originally Posted by ldivinag

you can find out how many rows are being "looked at" by using the EXPLAIN command -- well at least in MySQL.

Don't know about the other DBs out there.

The discussion so far suggests that it's MySQL, no? Yes indeed, it is MySQL.

I had a similar issue. Searches took like 20 seconds; really unacceptable. So I decided to EXPLAIN the SQL statement. Sure enough, it went through the entire table.

Then I looked at the table schema. No index on the column I was searching on.

Made a simple ALTER TABLE...

Searches dropped to 2-3 secs...

doh...

Well, there's a different problem here; indexes are on every field that's required in a search! D'oh!

Originally Posted by Etnu

Change it. There's really very little reason why you can't. The things listed here will help

I know!!

Originally Posted by Etnu

ultimately you're going to wind up dealing with this in other areas for some time to come. I'd even advocate changing it if it was 3rd party software -- this is just bad design. You don't make your primary key a field that you can't guarantee to be unique!

I'm not sure if you've solved your problem yet, but I would definitely say that if you are getting a large number of repeating IDs being generated, then there is a problem with your random ID generator script. I can't remember exactly what I was working on, but once I was working with random numbers and I was using the basic rand() function to generate numbers (can't remember the range). Although I had read that the rand() function only generates numbers between 0 and 32768, I found out that when I was generating random numbers over and over, the pattern was actually repeating every 32768 entries. I couldn't believe how non-random the rand() function was. I switched to mt_rand(), and I think I may have added some kind of salt, and that fixed the problem completely.

[Sarcasm]Sounds like you have the perfect client to work with[/Sarcasm]

Hardly; if it had been a client, I would have dealt with it somehow. However, it's my boss, you see! And you can't overrule the Chief Software Architect, eh! Not unless you're the CTO or something!

Originally Posted by mwolfe

I'm not sure if you've solved your problem yet,

No, I haven't solved it yet!

Originally Posted by mwolfe

I can't remember exactly what I was working on, but once I was working with random numbers and I was using the basic rand() function to generate numbers (can't remember the range). Although I had read that the rand() function only generates numbers between 0 and 32768, I found out that when I was generating random numbers over and over, the pattern was actually repeating every 32768 entries. I couldn't believe how non-random the rand() function was. I switched to mt_rand(), and I think I may have added some kind of salt, and that fixed the problem completely.

Well, I doubt that your suggestion will work as-is; I'm not using the random numbers directly, I'm using them in a different way, to specify the character range (65 to 90) for obtaining uppercase roman letters!
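For anyone following along, the conversion being described is just PHP's chr() over the ASCII codes 65-90:

```php
<?php
// 65..90 are the ASCII codes for 'A'..'Z'; rand()/mt_rand() only picks the
// code, and chr() turns that code into the letter.
$letter = chr(mt_rand(65, 90));
var_dump(chr(65), chr(90)); // the two endpoints: "A" and "Z"
```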


Well, I'm guessing you understand this already, but possibly you don't. rand() only generates numbers, not characters or any other type. In some languages, such as Java, it produces a floating-point number between 0 and 1, and you can simply multiply by a scale and add an offset to get a specific range. I don't believe it works the same way in PHP, but it only generates random numbers. In your case, you were just converting that random number to a character. On a Windows machine, according to the manual, RAND_MAX is set to 32768 automatically. I found that rand() doesn't give random numbers at all; they completely repeat the same pattern after 32,768 iterations. I'm not exactly sure what this means in the case of your code, but it could easily mean that the numbers being generated aren't very random. Try mt_rand() instead. I'll run your script, though, if I have time, and see if I see anything odd. Btw, is this being run on a Windows or *nix box? I'm actually using Linux now, so I don't know if I'll experience the same problem I had before.


Oh, I see what you mean!!

Originally Posted by mwolfe

is this being run on a Windows or *nix box?

The script is being run on a Linux box, but the problems I had came up on a Windows server where I tested it!

Maybe I should try this on the live server and take execution-time measurements as well! Thanks for the tip!

I ran the script on my box. Here is what I did:
I created a new DB called random, with one table named test, with two fields in it: an auto-increment field named id and a field named random_id. I put a unique key on the random_id field.

Here is the basic script I ran; try it out on your Windows box and see if you get any interesting results.

This loops 100,000 times in a single click of the generate button. On my Athlon XP 1800 it took about 20 seconds to run, I believe (I didn't time it, but it never hit the set_time_limit, which I believe is either 30 seconds or a minute).
I did not experience anything overly odd or non-random with the results. I typically had 1 non-unique key in the first pass, and maybe 1 or 2 the second time around. After about 4 passes I would get maybe 5 or so duplicates, but by that time there were 400,000-500,000 entries. I tested it about 10 times altogether, and I truncated the table several times to see if any pattern occurred. I didn't find any; however, this is running Linux, and a newer version of PHP than I was running on Windows when I had the problem before.

I don't remember all the calculations for expected values, but you could probably calculate the expected number of times you would get a match in 100,000 iterations. I would guess that it's less than 1, but I'm not sure.
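That hunch can be checked with the standard birthday approximation: for n random draws from N equally likely IDs, the expected number of duplicates is about n(n-1)/2N. With the ABCD1234 space, that comes out just over 1 for 100,000 draws, close to what was observed above. A quick sketch:

```php
<?php
// Birthday approximation for expected duplicates: n(n-1) / (2N).
$n = 100000;                 // IDs generated in the test above
$N = (26 ** 4) * (10 ** 4);  // 456,976 * 10,000 = 4,569,760,000 possible IDs
$expected = $n * ($n - 1) / (2 * $N);
printf("space: %.0f, expected duplicates: %.2f\n", $N, $expected); // ~1.09
```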

Each time it took 120 to 129 seconds. Each time, the first 4096 iterations produced a usable ID; after 4096, every single iteration produced a duplicate.

So I changed rand() to mt_rand() and ran it four more times, each time making sure the table was empty. The execution times were the same (approx. 2 mins), but the number of duplicates dropped to 0, 1, or 2.

After timing it, I found the first iteration (with an empty table) took less than 25 seconds, but it took longer and longer each time (which makes sense, because it has to check that the values don't exist already before it inserts them).
I'm running PHP 5.0.4, I believe, and MySQL 4.1,
on an AMD 1800 with 512 MB RAM, on Ubuntu Linux (I don't have a ton of extras, but I haven't really optimized it).
Maybe your PHP version is slower; I wouldn't know why. Or perhaps it's just winblows, lol.

The in_array() vs. DB lookup comparison isn't necessarily true. The DB can take advantage of faster indexing; when I feed it an ID that begins with the letter A, it can immediately skip comparisons with anything that doesn't begin with A, immediately cutting the lookup time required by as much as 95%.

If you can't change the ID to an int, can you at least expand it to be more characters? You could then use PHP's uniqid() function to generate an ID that stays unique even across 2 requests in the same microsecond.
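For reference, a minimal look at what uniqid() actually returns (note it is time-based and hex-like, so it cannot fit the fixed ABCD1234 format discussed in this thread):

```php
<?php
// uniqid() is derived from the current time in microseconds; the
// more-entropy flag appends extra digits from a combined LCG, which is
// what guards against two requests landing in the same microsecond.
$a = uniqid();          // 13 characters by default
$b = uniqid('', true);  // longer, with a '.' and extra entropy digits
echo $a, "\n", $b, "\n";
```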

Originally Posted by Etnu

The in_array() vs. DB lookup comparison isn't necessarily true. The DB can take advantage of faster indexing; when I feed it an ID that begins with the letter A, it can immediately skip comparisons with anything that doesn't begin with A, immediately cutting the lookup time required by as much as 95%.

Yes indeed, a faster index will result in lower execution times; that's why I asked whether a Full-Text index on the id column would help.

Originally Posted by Etnu

If you can't change the ID to an int, can you at least expand it to be more characters? You could then use PHP's uniqid() function to generate an ID that stays unique even across 2 requests in the same microsecond.

The ID structure can't be changed in any way; it has to be 4 letters and 4 numbers in that order, as in ABCD1234, otherwise I'd have changed it before now and increased the length to 12 or 14 characters in total.

I've changed the rand() function to mt_rand(), so are there any other optimisations that don't change the structure? Like I said, the id field just has a PrimaryKey index on it, so would a Full-Text or a normal index on it help?

I've just come in on this and had a quick scan through the many posts.

Your structure of ABCD1234 has over 4,500 million combinations and you're hitting trouble at 50,000, so you've bought some time with your changes, but for how long? I guess that depends on how many users you expect to get on the system and in what timeframe.

Why does a user ID need to be random? Presumably this is coupled with another form of security, like a password; wouldn't it be worth considering just stepping through the combinations as people sign up?

Just a thought

A Little Knowledge Is A Very Dangerous Thing.......That Makes Me A Lethal Weapon !!!!!!!!

..........but since he started that way and he can't redo everyones id's, he has to keep it that way.........

I was thinking more along the lines of: given the probable spacing between used IDs at the moment, he could start at AAAA0000 with the next signup; each one is incremented and checked against the database for an existing match, and already-used IDs are jumped over.


Originally Posted by Mandes

Your structure of ABCD1234 has over 4,500 million combinations and you're hitting trouble at 50,000, so you've bought some time with your changes, but for how long? I guess that depends on how many users you expect to get on the system and in what timeframe.

Yeah, many combinations are possible, and it's true the switch from rand() to mt_rand() has just pushed the problem from the present into the near future; for how long, I don't know! The management and I are expecting a total of about 150,000 users (a margin of 50,000 is kept in case there are more signups than expected, so at most a total of 200,000; we are already at about 56,000-57,000, so another 150,000 or so to go). At the rate we are getting signups, that figure will probably be reached by about April-May 2006, so that's about 8-9 months, and it's quite possible this problem might come up again before then!

Originally Posted by Mandes

Why does a user ID need to be random? Presumably this is coupled with another form of security, like a password; wouldn't it be worth considering just stepping through the combinations as people sign up?

Well, the reason I know of for it being random is just so that people can't guess the next ID, and doing it randomly might've looked easier when the system started, you know, with many millions of combinations possible.

Originally Posted by mwolfe

mandes, it probably doesn't have to be random, but since he started that way and he can't redo everyone's IDs, he has to keep it that way or he will overwrite already-existing IDs.

Yeah, too late to question the random factor!

Originally Posted by Mandes

I was thinking more along the lines of: given the probable spacing between used IDs at the moment, he could start at AAAA0000 with the next signup; each one is incremented and checked against the database for an existing match, and already-used IDs are jumped over.

After reading this, I looked over the DB. I didn't peruse every one of the 50,000+ records, but I got to the point I suspected: the IDs are spread over different series, so that would be quite a headache to look up and keep incrementing. It's possible, of course, and it's slightly better as well, but I'd like to keep it as the last option, if there's nothing else that will help!

Guys, no one has answered what I've asked many times: will a Full-Text index on the ID column help with record retrieval, or will the difference be negligible? This index will of course require quite some disk space, but I don't give a damn about that; over 2GB of disk space is available and more can be obtained, no problem there!

I need something that can keep this problem away from me for some more time; I'm not asking for anything permanent, just something that will stay put for another 10-11 months and see through another 150,000 IDs.

Originally Posted by asp_funda

Guys, no one has answered what I've asked many times: will a Full-Text index on the ID column help with record retrieval, or will the difference be negligible? This index will of course require quite some disk space, but I don't give a damn about that; over 2GB of disk space is available and more can be obtained, no problem there!

I'm no expert in SQL, but I'd have thought that with 50K+ records it must have a benefit. I'd probably post on the SQL forum, though, to make sure.


A "full text" index will not help in your case. That one is used to index text fields to allow a "full text search" to be indexed, and since you've only got one "word" in that field, it wouldn't help you.

Yeah, that was something I was wondering about but not sure of; I'm not proficient enough in PHP & MySQL yet!

Originally Posted by TheRedDevil

You mentioned earlier in the thread that a lot of the fields in that table were indexed? Have you tried removing the index on some of them? Too many indexes on a table will slow it down too.

Well, not a lot of indexes; only the required fields like username, referrer, email, etc. have indices, as they are all used in searches, and the search feature is used quite a bit, so removing them would certainly cause some problems. None of the indices are unimportant; it's the bare minimum!

Hey, I thought of something that might be of interest. I won't code it myself, but how about this:
Alright, if you can write the script, and then get some 'down time' in which users can't sign up, like maybe late at night for several hours, you may be able to pull something off along these lines...

Take that random ID generator function you have, and use it to generate random IDs that have not been chosen; have it generate maybe 100,000-200,000 different IDs. Now, I'm not really sure which way would be the most efficient, but you could either put all those IDs in an array and then write them to a text file, one per line, or you could just write them to the text file each time you get an unused ID.
Then when a user signs up, grab the top ID off the text file and truncate that line. I'm not totally sure of the code for doing that, but I think this could very well be a possible solution, and it shouldn't be too difficult.
If having 100,000 IDs in one file makes the file too large and bulky to work with, maybe you could somehow automate this task so it only puts in maybe 10,000 or so, but automatically runs again when no IDs are left in the file.
If I have time I'll try to test out this method and see how resource-heavy it is.
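A minimal sketch of the text-file pool described above; the file path and IDs are placeholders, and a production version would need flock() around the read-and-rewrite so two concurrent signups can't grab the same line.

```php
<?php
// Sketch of the text-file ID pool. The path and the seed IDs are
// placeholder assumptions; in production, flock() the file while popping.
$pool = sys_get_temp_dir() . '/id_pool.txt';

// One-off step (run during downtime): write pre-checked unique IDs,
// one per line.
file_put_contents($pool, implode("\n", ['ABCD1234', 'WXYZ9876', 'QQQQ0001']) . "\n");

// At signup: take the top ID and rewrite the file without it.
function popId(string $file): ?string
{
    $ids = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if (!$ids) {
        return null;  // pool exhausted: time to regenerate the file
    }
    $id = array_shift($ids);
    file_put_contents($file, $ids ? implode("\n", $ids) . "\n" : '');
    return $id;
}

echo popId($pool), "\n"; // ABCD1234
echo popId($pool), "\n"; // WXYZ9876
```

Because every ID in the file was checked against the DB when the pool was built, signup itself no longer needs to query the users table at all.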

Why does a user ID need to be random? Presumably this is coupled with another form of security, like a password; wouldn't it be worth considering just stepping through the combinations as people sign up?

This is originally what Mandes said, and that's what I was referring to. He clarified that the IDs would be checked against the DB to see whether they already existed. But I'm wondering, isn't he trying not to have to query the database of 50K records each time a user signs up? This may be slightly better, as you won't have to deal with the un-randomness of random number generators (if that makes any sense),
but think about it: the odds of an ID matching a previously given ID are pretty low, so the odds of two matches in a row are very, very low, just as there would be lots of numbers sitting right next to each other sequentially...

I'm still not really sure why there is a problem to begin with. But the text-file idea may work. It may require a bit more work, but it should be faster and keep the "randomness" of your IDs. Just make sure users can't get to this file or edit it, as that would be very, very bad.

Start at AAAA0000. Check to see if that exists. If it does, get the next one AAAA0001. Check that.

Keep track of where you ended, so next time someone requests an ID, you start there.
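The stepping scheme above needs an "odometer" increment that respects the four-letters-then-four-digits format. A self-contained sketch (no DB access; the existence check and the saved resume position would wrap around it):

```php
<?php
// Odometer-style increment for the fixed ABCD1234 format:
// digits roll 0-9, letters roll A-Z, carrying leftwards.
function nextId(string $id): ?string
{
    $chars = str_split($id);
    for ($i = 7; $i >= 0; $i--) {
        $isDigit = $i >= 4;               // positions 4-7 are the digits
        $last    = $isDigit ? '9' : 'Z';
        $first   = $isDigit ? '0' : 'A';
        if ($chars[$i] !== $last) {
            $chars[$i] = chr(ord($chars[$i]) + 1);
            return implode('', $chars);
        }
        $chars[$i] = $first;              // roll over, carry leftwards
    }
    return null;                          // ZZZZ9999: the space is exhausted
}

echo nextId('AAAA0000'), "\n"; // AAAA0001
echo nextId('AAAA9999'), "\n"; // AAAB0000
echo nextId('AAAZ9999'), "\n"; // AABA0000
```

Each signup then calls nextId() from the last-issued position, skipping any value that already exists in the table.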

Yeah, well, the thing is that this is no different in terms of what's going on currently; the DB will be hit with a similar impact as now (multiple times) just to check the ID, and I don't think there's an improved chance of finding more unused IDs, not compared with mt_rand(), which generated a duplicate well after 37,000-38,000 iterations.

Originally Posted by mwolfe

Take that random ID generator function you have, and use it to generate random IDs that have not been chosen; have it generate maybe 100,000-200,000 different IDs. Now, I'm not really sure which way would be the most efficient, but you could either put all those IDs in an array and then write them to a text file, one per line, or you could just write them to the text file each time you get an unused ID.
Then when a user signs up, grab the top ID off the text file and truncate that line. I'm not totally sure of the code for doing that, but I think this could very well be a possible solution, and it shouldn't be too difficult.
If having 100,000 IDs in one file makes the file too large and bulky to work with, maybe you could somehow automate this task so it only puts in maybe 10,000 or so, but automatically runs again when no IDs are left in the file.
If I have time I'll try to test out this method and see how resource-heavy it is.

Yeah, well, I have this idea standing by, to be implemented if nothing else works, though I thought about generating 50,000 unique IDs and storing them in a separate table, then getting one ID from it on each signup and deleting that ID from the table. This will take some work from me (generating the unique IDs) but will work quite well, and signup will become faster too.
So if there's no other way, I'll implement this!

Originally Posted by mwolfe

But I'm wondering, isn't he trying not to have to query the database of 50K records each time a user signs up?

I thought the problem was that he was hitting the DB multiple times because used IDs kept coming up. Anyway, I suspect that the first thing the script does when someone signs in is search for some details about them (name, preferences, etc.), in which case you're still searching the full database at some point.
