I often use very deeply nested structures in my code, which can make for ugly code: $self->{thiskey}{thatkey}{etc} = "etc..."
I've been going back and cleaning up some older code by taking a ref to the HoHoHoHo...s, and it got me thinking about hashes in general (that, and reading perlguts).
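To make the cleanup concrete, here's a small before/after sketch of what I mean (the key names here are invented for illustration):

```perl
use strict;
use warnings;

my $self = { config => { db => { host => 'localhost', port => 5432 } } };

# Before: repeating the full path on every access
$self->{config}{db}{host} = 'db.example.com';
$self->{config}{db}{port} = 5433;

# After: grab a reference to the inner hash once, then work through it.
# Both forms touch the same underlying hash.
my $db = $self->{config}{db};
$db->{host} = 'db.example.com';
$db->{port} = 5433;
```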

I always assumed that when you wrote $hash{'constant_key'}->{key2}, Perl would optimize this at compile time and not keep fetching $hash{'constant_key'} after it first encountered it.

I thought wrong.

It would appear that repeatedly indexing into a nested hash, rather than using a reference to the desired inner hash, carries a fairly large performance cost. This may well be a "Duh!" for others, but for me it was eye-opening.
Using a reference to a HoH speeds things up by about 15% on average, and a more practical test (a HoH) bears this out. Not a bad speedup.
On a HoHoH it averages about 35% faster. The deeper the nesting, the bigger the payoff.

I don't do a lot of benchmarking so if someone sees a flaw in my method, I'd appreciate feedback.
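For the curious, a benchmark along these lines can be sketched with Benchmark.pm (this is an illustrative sketch, not my exact test):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %h;
$h{outer}{inner}{$_} = $_ for 1 .. 100;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    direct => sub {
        my $sum = 0;
        $sum += $h{outer}{inner}{$_} for 1 .. 100;   # full path every time
    },
    cached => sub {
        my $ref = $h{outer}{inner};                  # fetch the inner hash once
        my $sum = 0;
        $sum += $ref->{$_} for 1 .. 100;
    },
});
```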

Lesson: using refs to nested data structures not only makes your code easier to read, it makes it a good bit faster.

In general, I take it as a sign that it's time to introduce some classes when complicated data structures start to occur in code (refactoring term: the code smells ;-) Usually the structure actually reflects the relationships between various objects (which can contain other objects, and so on).

Regardless of whether this results in a speedup of the code, introducing OO definitely leads to clearer and cleaner code.
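As a sketch of that kind of refactoring (the class and accessor names here are invented for illustration), the nested hash levels become objects holding other objects:

```perl
use strict;
use warnings;

package Host;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, users => {} }, $class;
}

sub add_user {
    my ($self, $user) = @_;
    $self->{users}{ $user->name } = $user;
    return $user;
}

sub user { $_[0]->{users}{ $_[1] } }

package User;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}

sub name { $_[0]->{name} }

package main;

# Instead of reaching through $data{$hostname}{users}{$username}{name}:
my $host = Host->new(name => 'web1');
$host->add_user(User->new(name => 'lee'));
print $host->user('lee')->name, "\n";   # lee
```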

I agree for the most part, but sometimes you have to go for speed, and sometimes it's just not worth the extra coding overhead. Most of these $self->{key}{subkey} items (in my code, anyway) exist because they satisfy my requirements but aren't sufficiently clustered to be considered their own object (I don't write OO for OO's sake), or because the number of times they'd be accessed would make the extra level of indirection unacceptable performance-wise.

The former clearly shows you are collecting data, per host, per user and per process. The latter is just a mess, and you quickly run out of sensible variable names.

If you have cases where $var{key1}{key2}{key3} becomes unclear, you either have to redesign your data structure, or need to find better key or variable names.

That using a reference to an inner structure is a win in your benchmark is clear, as you don't have to redo some calculations. But you cannot do that always - you can only do that if you access the same keys repeatedly. Often, the keys used are variable, and will differ from iteration to iteration.
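To illustrate the distinction, here is a sketch (host and user names invented): hoisting a reference pays off only when an outer key is reused across many inner accesses, as in an ordinary nested traversal:

```perl
use strict;
use warnings;

my %data = (
    web1 => { lee => 3, ann => 5 },
    web2 => { bob => 2 },
);

my $total = 0;
for my $host (keys %data) {
    # The outer key repeats for every inner iteration, so hoist it once...
    my $users = $data{$host};
    for my $user (keys %$users) {
        $total += $users->{$user};   # ...and skip re-fetching $data{$host}
    }
}
# If both keys changed on every single access, there would be nothing
# to hoist and no reference worth caching.
print "$total\n";   # 10
```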

Agreed, it can be a little hard on the eyes. I'm certainly not advocating that everyone use refs when working with HoHs. The main point of the post was that going deep gets slow quickly, and I'm surprised that in certain cases it isn't optimized away. Perl seems to do everything else for me, so I was surprised.

On occasion, I've used hashes simply as a way to group variables.

%hash = ( headers => { ... }, data => { ... } );   # "data" holds a huge hash

One such instance was a quick-and-dirty search tool: a HoH with a huge number of entries. If a simple assignment gets me a 15% speedup when iterating through large data sets, I think it's worth doing.

"That using a reference to an inner structure is a win in your benchmark is clear, as you don't have to redo some calculations. But you cannot do that always - you can only do that if you access the same keys repeatedly. Often, the keys used are variable, and will differ from iteration to iteration."

I'm not sure what you mean by this point. Care to clarify?

Thanks,
-Lee

"To be civilized is to deny one's nature."
Update: Do you mean $h{$thiskey}{$thatkey}? Even then, orderly traversals are a fairly common thing, so in those cases where time is an issue, I still think it's a good thing to know.

I don't know whether such an optimization would be possible in certain cases, but it's only useful in certain contexts, so it probably wouldn't be worth implementing in perl itself.

I guess I assumed Perl did this optimizing because ties can't detect nested assignments; I assumed it went right to the nested reference. Since it appears to hit every key along the way, I am a bit perplexed as to why it cannot catch nested stores and fetches, but that's a different problem for another day.

It could do that, if it were smart enough to look around that far in the code. In reality, the optimizer is really rather dumb. If you run perl -MO=Deparse,-x7 you'll get to see exactly what the compiler has done to your code; the -x7 switch tells B::Deparse not to rearrange constructs in order to prettify them. The code you see in that dump is exactly what perl is going to execute; anything that's mentioned twice there will actually be done twice. Poke around and you'll see that the compiler's optimizer is really quite limited and mainly implements a few highly specialized shortcuts.

The reason the compiler can't set lookups of constant keys in stone is that the hash itself is not set in stone: as you add or delete keys, things shift about. I suppose at least some shortcuts might still be possible, but, well - given the general level of ignorance of the optimizer..
