saintmike has asked for the
wisdom of the Perl Monks concerning the following question:

It seems that starting in perl 5.17, the order in which keys() returns a hash's keys is truly randomized, which means that even within the same process, calling keys() twice on the same hash will result in a different key order.

In previous versions, the randomization was only per-process: you couldn't rely on the key order from one process invocation to the next, but calling keys() twice on the same hash within the same process always returned consistent results.

This seems like a major challenge for testing applications. If my hash serializer produces different results every time, does my CPAN module have to deal with the daunting task of generating every single permutation of the underlying keys() result, leading to an explosion of application-layer results to test against?

So far, in the test suite for
https://github.com/mschilli/php-httpbuildquery-perl
I've worked around this issue by running keys() once to determine the order, and then checking the result the application generates with its second keys() call against that order.
With perl 5.17, this is no longer possible.

What do people do to test against unpredictable core functions? Shouldn't the core provide some kind of API to figure out what the order is/was for testing?


Sort the keys in tests, or use tied hashes that preserve key order. Or test a round-trip (i.e. use your module to convert Perl data structure -> serialized -> Perl data structure, and use is_deeply to compare the input and output data structures).
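
A sketch of the first suggestion, assuming the serializer is the hash_serialize() function the OP describes further down, and that it escapes any literal '&' in keys and values:

    use strict;
    use warnings;
    use Test::More tests => 1;

    my $got = hash_serialize({ a => "b", c => "d" });

    # The pair order is unspecified, so canonicalize it: split the
    # string into pairs, sort them, and rejoin.
    my $canonical = join '&', sort split /&/, $got;

    is($canonical, 'a=b&c=d', 'same pairs, regardless of keys() order');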

Actually, this change is a removal from the code base; it's a simplification of the existing mechanism. Rather than perturbing the hash only when an attack is detected, the salt is always applied. To make that simplification safe, the salt needs to be different for each hash.

demerphq: Yes, I am unable to show any actual performance gains either.

So, not for performance reasons.

Also in his own words:

demerphq: So I think that the current rehash mechanism is about as secure as the random hash seed proposal.

And:

demerphq: Personally I don't think it's worth the effort of doing much more than thinking about this until someone demonstrates at least a laboratory-grade attack. IMO, in a real-world environment with multi-host web sites, web servers being restarted, load balancers, and so on, simple hash randomization is sufficient. It seems like any attack would require large numbers of fetch/response cycles and in general would just not be effective in a real production environment. I would assume that the administrators would notice the weird request pattern before an attacker could discover enough information to cause damage. Same argument for non-web services, IMO.

And it doesn't make anything more secure.

It looks like the primary motivation for moving to rehash was to restore binary compatibility within the 5.8.x branch, which had been inadvertently broken by 5.8.1.

I'm not particularly keen on having hashes always randomised - it makes
debugging harder, and reproducing a reported issue nigh-on impossible;
but if Yves can show a measurable performance gain without the rehash
checks, then I'll approve, as long as the hash seed can still be
initialised with the env var PERL_HASH_SEED=0 - otherwise debugging
becomes impossible.
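
For what it's worth, that escape hatch survived: on perl 5.18 and later, setting PERL_HASH_SEED in the environment also pins the per-hash traversal order (unless PERL_PERTURB_KEYS overrides it), so a pair of one-liners like these print their keys in the same order on every run, on the same perl build:

    $ PERL_HASH_SEED=0 perl -E 'my %h = (a => 1, b => 2, c => 3); say join ",", keys %h'
    $ PERL_HASH_SEED=0 perl -E 'my %h = (a => 1, b => 2, c => 3); say join ",", keys %h'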

That concern mirrors the OP's objections.

So, significant consequences.

So, why? What did Perl gain?

which means that even within the same process, calling keys() twice on the same hash will result in a different key order.

That's not what it means at all. For a given hash, multiple calls to keys (and values) are still guaranteed to return the same order if there has been no change to the hash, and the order has always been subject to change after hash modifications.

Difference one: The order is more likely to change on hash modification.

Difference two: In a given interpreter, if you built two hashes using identical insert and delete steps, you used to get the same key orderings. This is not always the case now.
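
A small demonstration of difference two; on a 5.17+ perl the two printed orders usually disagree, though they can coincide by chance:

    use strict;
    use warnings;

    # Two hashes built with identical insert steps ...
    my %h1 = map { ($_ => 1) } 'a' .. 'j';
    my %h2 = map { ($_ => 1) } 'a' .. 'j';

    # ... no longer share a traversal order:
    print join(',', keys %h1), "\n";
    print join(',', keys %h2), "\n";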

You are mistaken. That file never calls keys twice on the same hash. It calls keys on two different hashes (containing the same data). That code has been buggy since 5.8.1; the bug is just more likely to occur now.

Let me clarify: The module referenced in the original posting has a function hash_serialize() that takes a reference to a hash like { a => "b", c => "d" } and turns it into something like "a=b&c=d" or "c=d&a=b", depending on what keys() returns underneath:
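
Something along these lines, as a hypothetical and much-simplified stand-in for illustration (not the module's actual code):

    use strict;
    use warnings;
    use URI::Escape qw(uri_escape);

    sub hash_serialize {
        my ($href) = @_;
        return join '&',
            map { uri_escape($_) . '=' . uri_escape($href->{$_}) }
                keys %$href;    # pair order depends on what keys() returns
    }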

Now how am I supposed to test that the outcome is what I'd expect? With two hash elements, you could argue that I could generate all permutations of possible result strings and check whether one of them matches the one produced by running the function, but with a thousand entries this becomes unwieldy.

The general problem is this: I have an unpredictable function (keys()) whose result gets processed, and the processed result needs to be checked against an expected outcome.

Unless keys() can be switched to a test mode to produce predictable results (doesn't need to be sorted, just give me something predictable, like before perl 5.17), what are the options?

By the way, some people have suggested using "sort keys" in my algorithm every time, but adding extra complexity (a sort) at runtime just so that I can test perfectly fine code is plain wrong.

Unless keys() can be switched to a test mode to produce predictable results (doesn't need to be sorted, just give me something predictable, like before perl 5.17), what are the options?

To test hash_serialize():

I would deserialize it back into memory and compare it with the original hash (with something based on "sort keys", or with cmp_deeply).

However, this violates some ideas of unit testing (while it's fine for integration testing).
If that worries you, test it with a 1-element hash and a 2-element hash (yes, all 2 permutations), and stop. Do the rest of the testing with deserialization, as sketched below.
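
As a sketch, with hash_deserialize() standing in for a hypothetical inverse (the module's real API may differ):

    use strict;
    use warnings;
    use Test::More tests => 1;

    my %in  = (a => 'b', c => 'd');
    my $out = hash_deserialize(hash_serialize(\%in));  # hypothetical inverse
    is_deeply($out, \%in, 'round trip preserves the data');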

just for your testing. This seems to be simple enough that no new bugs are introduced, and it will not require a sort in your production code.
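
One way to pin the order only inside the test, in the spirit of the tied-hash suggestion above, is Tie::IxHash from CPAN, whose tied hashes return their keys in insertion order. This assumes hash_serialize() iterates whatever hash it is handed via keys() or each():

    use strict;
    use warnings;
    use Test::More tests => 1;
    use Tie::IxHash;

    # keys() on this hash follows insertion order: 'a', then 'c'.
    tie my %h, 'Tie::IxHash', a => 'b', c => 'd';

    is(hash_serialize(\%h), 'a=b&c=d', 'order pinned by the tie');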

While de-serialization could be a solution as well, one has to be very careful, as it adds additional complexity. For example, if your serialization function returns "a=b&c=d&a=b" because of some bug, that bug could easily be "fixed" in passing by the de-serialization procedure:
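
A sketch of how a naive deserializer masks exactly that bug:

    # A buggy serializer emits a duplicated pair ...
    my $buggy = 'a=b&c=d&a=b';

    # ... and parsing it back into a hash collapses the duplicate:
    my %round = map { split /=/, $_, 2 } split /&/, $buggy;

    # %round is now (a => 'b', c => 'd'), so a round-trip is_deeply()
    # test would still pass despite the bug.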

At this point it's more of an academic exercise: I'm looking for a general solution to the problem of having an unpredictable function somewhere deep in my call hierarchy while writing unit tests to confirm the result.

Do other languages have this problem, outside of a very few functions like random()?

Another possibility for your particular case is to sort the keys. The performance penalty is small, and generating a different URL each time is the bigger problem anyway: it can cause caching issues and other trouble.
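
Using the hypothetical serializer sketched above, that change is a single word:

    use URI::Escape qw(uri_escape);

    sub hash_serialize {
        my ($href) = @_;
        return join '&',
            map { uri_escape($_) . '=' . uri_escape($href->{$_}) }
                sort keys %$href;    # deterministic, cache-friendly output
    }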
