Update: I now know why the problem didn't show up in my version of the script. As I mentioned, I'm using a custom shuffle routine, which shuffles in place. The List::Util version of shuffle is not in-place, hence the last line of the for loop should be @in = shuffle @in; for use with List::Util, not shuffle @in; as for my version. I've updated the code below.
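To make the fix concrete, here's a minimal sketch of the difference (the @in name matches the snippet above; the rest is just illustration):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @in = (1 .. 10);

# List::Util's shuffle returns a shuffled *copy*; it never modifies its
# arguments. Called in void context, the result is simply thrown away:
shuffle @in;          # @in is unchanged -- this was the bug

# The fix is to assign the result back:
@in = shuffle @in;    # @in now holds the shuffled list

print "@in\n";
```

With a genuinely in-place shuffle routine, the bare `shuffle @in;` call would have been correct, which is why the bug stayed hidden in my version.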

I apologise. It is a legit bug. I'm amazed it hasn't shown up before and confused as to why it doesn't show up in my lightly modified version of the same script you posted, but...this fixes it. Till the next one....

It's working so far on a relatively small number of arrays (1000). I had already found one script that gives me no errors on tens of thousands of arrays. You can get it from the conversation; it was posted by 'tilly'.

For now I think I'll stick with that one. But after two days of trying everything I found here, I believe I must try yours alongside tilly's and see if both of you have reached the same nirvana. Just for the heck of being able to thank everyone who helped, and post a couple of scripts that work undeniably well.

If you have N random integers of size 0-M, worst-case memory use and running time of my algorithm should both scale like O(N*N*M). In some cases memory use might scale a little better than that.

About a reproducible random shuffle: why not just call srand with some specific value, and then shuffle using the fact that rand is now a known, deterministic function? For testing purposes you can try different combinations of seeds and datasets.
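A quick sketch of the idea (the seed value is arbitrary; any fixed value works):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Seeding the generator makes rand() -- and therefore shuffle() -- a
# deterministic function of the seed, so a failing run can be replayed
# exactly by reusing the same seed.
srand(12345);
my @first = shuffle 1 .. 10;

srand(12345);
my @second = shuffle 1 .. 10;

print "@first\n";
print "@second\n";    # identical ordering to @first
```

The exact ordering will differ between perl builds, but within any one build the same seed always reproduces the same shuffle, which is all that matters for replaying a failure.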

However, you have a decision to make. It's a trade-off between consistently optimal results and consistently predictable timings. tilly's exhaustive search suffers from similar problems to my own attempt at one. Whilst it will eventually find an optimum solution, sometimes quite quickly, quite frequently it will spend an inordinate amount of time finding it. And an inordinate amount of memory too!

Below, you will see a comparison of tilly's (latest) and my (latest) code run against the same random data sets with 1e1 thru 1e6 elements. Notice how with 1e3 and 1e5 elements, tilly's code was timed out after a full five minutes, whereas the semi-random approach never takes more than ~10 seconds. For the 1e3 case, tilly's code uses over 1GB of RAM. For the 1e5 case, it came perilously close to exhausting my 1.5GB of physical memory.

The memory consumption may not be a problem if your data sets are always smallish, but occasionally, even with a relatively small dataset of 100 elements, it will take longer than 60 seconds to find a solution, and yet only 18 seconds for 1 million!

I appreciate your sentiments elsewhere that tilly's code is reliable--that's why I chose it to verify my own against :)--and being exhaustive and deterministic, it is easier to test and debug. The non-determinism of a random (genetic) algorithm makes it much harder to test. That's why I've spent the last two days attempting to get a reproducible random shuffle.

But for NP-hard problems, genetic algorithms have one huge advantage: you can specify the longest you are prepared to wait for a solution (in terms of either a number of iterations, or the best solution found within a time limit). That's much harder to arrange with an exhaustive solution.

There is another possibility: you can combine the two approaches. Start off using the exhaustive approach, and--either through a time limit, or (you'd have to consult tilly for the details) through some heuristic guesstimate of how long it is likely to take for the given data set--switch to the genetic algorithm, perhaps passing the best solution found so far by the exhaustive attempt into the genetic one to see if it can improve on it within some specified number of iterations.
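The wiring for the time-limit variant of that hybrid might look something like this. The coderef names and their interfaces are my own invention (neither tilly's code nor mine actually exposes them); this just shows the control flow:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical interfaces, for illustration only:
#   $exhaustive_step->() advances the exhaustive search by one step and
#     returns ($done, $best_so_far);
#   $genetic_improve->($seed) runs the GA, seeded with the exhaustive
#     search's best-so-far, and returns its result.
sub solve_with_budget {
    my ( $exhaustive_step, $genetic_improve, $budget_seconds ) = @_;
    my $deadline = time() + $budget_seconds;
    my $best;

    while ( time() < $deadline ) {
        my ( $done, $candidate ) = $exhaustive_step->();
        $best = $candidate if defined $candidate;
        return $best if $done;    # exhaustive search finished within budget
    }

    # Budget exhausted: hand the best-so-far to the GA as a starting point.
    return $genetic_improve->($best);
}
```

The nice property is that the caller gets a hard upper bound on waiting time, while still getting the exact answer whenever the exhaustive search happens to be quick for that data set.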

In any case, whichever approach meets your needs, thank you for presenting an extremely interesting (if at times, frustrating:) problem.

That's only the fourth or fifth GA I've ever attempted, and they are frustratingly hard to verify. I learnt one big thing from your feedback: using small datasets (3 or 4 elements) is a far better testing strategy than large arrays of random values! Three nested loops cycling through all the possible permutations of 3 x 0 .. 99 would be trivial and fast to verify through brute force, and (as your feedback showed) stands to highlight bugs very quickly.
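The original problem isn't restated in this thread, so as a stand-in here's that testing strategy applied to a toy partition task (split a small list into two groups minimizing the difference of their sums); the sub names and the 0 .. 9 range are my choices, and 0 .. 99 works the same way, just slower:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Oracle: try every subset, so it is trivially correct (for tiny inputs).
sub brute_force {
    my @n    = @_;
    my $total = sum(@n);
    my $best  = $total;
    for my $mask ( 0 .. ( 1 << @n ) - 1 ) {
        my $s = 0;
        $s += $n[$_] for grep { $mask & ( 1 << $_ ) } 0 .. $#n;
        my $diff = abs( $total - 2 * $s );    # |other half - this half|
        $best = $diff if $diff < $best;
    }
    return $best;
}

# Three nested loops over every 3-element dataset, comparing a candidate
# solver against the oracle and reporting the first mismatch.
sub verify {
    my ($candidate) = @_;
    for my $a ( 0 .. 9 ) {
        for my $b ( 0 .. 9 ) {
            for my $c ( 0 .. 9 ) {
                my $got  = $candidate->( $a, $b, $c );
                my $want = brute_force( $a, $b, $c );
                return "mismatch on ($a,$b,$c): got $got, want $want"
                    if $got != $want;
            }
        }
    }
    return "all 1000 datasets agree";
}
```

Plugging the GA in as the candidate would have caught the shuffle bug above almost immediately, on a dataset small enough to debug by hand.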

Obvious now that I've seen it in action. But hey! We're never too old to learn something new.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.
