The regexp solution benefits from the very efficient regexp engine. But it is a solution that is built upon a big-O polynomial algorithm. If we expand the problem to finding uniqueness in strings consisting of three-character-wide groups of alphabetical characters, that gives us a lot of room for dataset growth while maintaining a string of unique groups. The hash solution grows at O(n) since each hash insert occurs at an average of O(1). I can't quite figure out how bad the regular expression approach gets as the string grows, but it's probably something like O(n^2) or worse.

For short test strings the raw speed of the regexp engine wins over the complexity of the hashing algorithm. But for longer strings, there's literally no comparison. Here's some test code:

At first I thought my eyes were decieving me. 1.84e-002 iterations per second? That's horrible. But then I realized that the regexp solution was so slow that Benchmark switched to showing seconds per iteration. So it takes 1.15 seconds per iteration for the regexp approach in my test example, and a blink of an eye (1.84e-002) for the hash approach with a test string of 1353 groups. Try testing 'aaa' .. 'faa'. You'll have to increase the testing time about a minute to even get reliable results out of Benchmark at that point because the regexp approach becomes so sluggish.

Of course this is a contrived example, but aren't they all? ;) And I did have to modify the RE a little so that it would maintain proper framing. But the discussion caught my attention and I just had to prove to myself what I already suspected.

The (?=.*?0) asserts that $digits contains a '0'. If that fails, then the regex fails. If a '0' is found, then (since that was a look-ahead assertion), we start over at the beginning of $digits and (?=.*?1) asserts that $digits contains a '1'. etc.

Sometimes verbose code is faster. This beat out the above regexp examples in all the cases I threw at it. (NOTE: the "Note unique" example is a best case scenario for this example, but the worst case of "All unique" still performs better").

You have a point, though if I were going the verbose route, I would use the transliterate function rather than an index-substr. The back reference method come out slightly faster on my computer than index-substr with transliterate sightly ahead of both. It is moot however, tye's solution above (u5) pretty much spanks all of them.

Hi,
Unfortunately at this moment i can't test the speed of my little idea about your problem. Well, i think that a simple solution is to add all digits and test if the result is 45, infact the sum of all digits in a ten character string is 45 only where every digit from the set 0-9 occurs exactly once, according to the Gauss' formula (N*(N*1)/2). I hope that, in spite of my ugly english, this little idea can help someone.

A combination of all ten digits uniquely is a permutation which can be generated using Algorithm::FastPermute.

So given that the code has to execute for thousands of candidate combinations, it seems it might be better to store all the permutations as the keys of a hash (with value 1 throughout) only once and start accepting candidates to check just by looking up in the hash. If using Linux, an additional optimisation could be to store the pre-calculated set of permutations in a Storable on the /shm (persistent shared memory) device, so that the hash of valid permutations doesn't even need to be calculated between runs of the program and can be loaded directly from persistent shared memory at the beginning of any given run.

There will be 10! entries in the the hash. Extrapolating the size it takes to store 10!/10, 10!/5 and 10!/2 10 character strings in a hash, I estimate the needed size to be around 170Mb. Trying to stuff 10! of those strings in a hash cause all the machines I tried it on to swap - with top showing a memory usage of around half a Gb of the running program.

This would reduce the storage by a factor of more than a hundred, while requiring that the last half of the digits would then need to be compared with the array being also pre-stored at each outer node of the hash, but with the same 120x performance for those last digits as compared with processing 10 without prestored results -- total performance improvement should be around 80x in this case (but PLUS the fixed overheads if you don't have linux shared memory to pre-store the solution)

(pseudocode because my perl has become rusty and i would probably get the syntax wrong)
use an array $d[0] to d9, set all elements to 0;
loop over digits in string:
if ( $d$_ ) return 0;
$d$_ = 1; # should be faster then adding 1
return 1;

If you're gonna get this nuts about speed, you may as well fire up Inline C or XS. If you know XS, this solution is actually easier to grok than some of the more esoteric solutions above, as well as faster: