Feature #5903

Optimize st_table (take 2)

Now that some of the preparations for these patches have been merged into ruby-trunk,
I am proposing patches to improve st_table for a second time (the first attempt was #5789):

1) Use packing for st_table of any kind, not only for numeric hashes.

Most hashes allocated during a page render in Rails are smaller than 6 entries.
In fact, while rendering the "Issues" page of Redmine, 40% of hashes never grow
above 1 entry. They are small option hashes passed to numerous helper methods.

This patch packs hashes of up to 6 entries in a way similar to numeric hashes in trunk.
It also packs hashes of size 0 and 1 into the st_table itself, so that there is no
need to allocate any "bins" at all.
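
A minimal sketch of the idea of keeping a single entry inside the table header and allocating bins only when a second entry arrives. All names here (demo_table, demo_add and so on) are hypothetical and do not mirror the actual st_table fields or the patch itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the layout used in st.c. */
enum { DEMO_MAX_PACKED = 6 };

typedef struct { size_t hash; long key; long value; } demo_entry;

typedef struct {
    size_t num_entries;
    demo_entry inline_entry; /* holds the only entry while num_entries <= 1 */
    demo_entry *bins;        /* stays NULL until a second entry arrives */
} demo_table;

static void demo_init(demo_table *t)
{
    t->num_entries = 0;
    t->bins = NULL;
}

/* Append an entry (no duplicate check). Returns 0 on success, -1 on failure. */
static int demo_add(demo_table *t, size_t hash, long key, long value)
{
    demo_entry e = { hash, key, value };
    assert(t->num_entries < DEMO_MAX_PACKED);
    if (t->num_entries == 0) {
        t->inline_entry = e;             /* size 1: no allocation at all */
    } else {
        if (t->bins == NULL) {           /* first growth past one entry */
            t->bins = malloc(DEMO_MAX_PACKED * sizeof *t->bins);
            if (t->bins == NULL) return -1;
            t->bins[0] = t->inline_entry;
        }
        t->bins[t->num_entries] = e;
    }
    t->num_entries++;
    return 0;
}

static void demo_free(demo_table *t)
{
    free(t->bins);
    t->bins = NULL;
    t->num_entries = 0;
}
```

With this shape, an options hash that never grows past one entry costs no allocation beyond the table header.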

Another question about packing.
Why are PKEY_POS and PVAL_POS counted from the tail?

It keeps the hash values very close to each other, so that the while loop in find_packed_index runs through them very fast and does not touch another CPU cache line.
Only when it finds an equal hash does it jump to check the key. This lets a search in a packed hash be even slightly faster than in an unpacked hash of the same size.
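
A rough sketch of such a lookup, assuming a fixed capacity of 6 and a flat array where all hashes come first. The pk_* names and the exact layout are assumptions for illustration, not the code of find_packed_index:

```c
#include <assert.h>
#include <stddef.h>

enum { PK_MAX = 6 }; /* assumed fixed packed capacity */

/* Hypothetical layout: slots [0, PK_MAX) hold hashes, then keys, then values. */
typedef struct {
    size_t num_entries;
    size_t slots[PK_MAX * 3];
} pk_table;

#define PK_HASH(t, i) ((t)->slots[(i)])
#define PK_KEY(t, i)  ((t)->slots[PK_MAX + (i)])
#define PK_VAL(t, i)  ((t)->slots[PK_MAX * 2 + (i)])

static void pk_add(pk_table *t, size_t hash, size_t key, size_t val)
{
    size_t i;
    assert(t->num_entries < PK_MAX);
    i = t->num_entries++;
    PK_HASH(t, i) = hash;
    PK_KEY(t, i) = key;
    PK_VAL(t, i) = val;
}

/* Returns the index of the matching entry, or num_entries if there is none. */
static size_t pk_find(const pk_table *t, size_t hash, size_t key)
{
    size_t i = 0;
    while (i < t->num_entries) {
        /* The key slot is read only after the adjacent hash slot matched. */
        if (PK_HASH(t, i) == hash && PK_KEY(t, i) == key) return i;
        i++;
    }
    return t->num_entries;
}
```

With 8-byte words the six hash slots take 48 bytes, so the scan can stay within one typical 64-byte cache line when the array is suitably aligned.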

Initially I experimented with variable-sized packed hashes, where num_bins is used and the positions are counted from the tail to avoid division by 3.
With a fixed size this could be simplified.

I pushed a commit which places PKEY_POS and PVAL_POS after the hashes, but in forward order.

They could also be placed all together (like i*3, i*3+1, i*3+2). remove_packed_entry would have to be changed accordingly. I think this could improve iteration over the hash.
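
To compare the three index schemes discussed here, a small sketch follows. The function names, the fixed capacity of 6, and the exact tail formula are assumptions for illustration; only the interleaved i*3, i*3+1, i*3+2 form is taken directly from the text above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { LAYOUT_N = 6 }; /* assumed fixed packed capacity */

/* Tail order: hashes at the front, key/value pairs counted back from the end. */
static size_t tail_key(size_t i) { return LAYOUT_N * 3 - 2 - i * 2; }
static size_t tail_val(size_t i) { return LAYOUT_N * 3 - 1 - i * 2; }

/* Forward order: hashes at the front, key/value pairs following them. */
static size_t fwd_key(size_t i) { return LAYOUT_N + i * 2; }
static size_t fwd_val(size_t i) { return LAYOUT_N + i * 2 + 1; }

/* Interleaved: hash, key and value of one entry sit next to each other. */
static size_t il_hash(size_t i) { return i * 3; }
static size_t il_key(size_t i)  { return i * 3 + 1; }
static size_t il_val(size_t i)  { return i * 3 + 2; }

/* Removal in the interleaved layout: shift the following triples down by one. */
static void il_remove(size_t *slots, size_t *num_entries, size_t i)
{
    size_t following;
    assert(i < *num_entries);
    following = *num_entries - i - 1;
    memmove(&slots[il_hash(i)], &slots[il_hash(i + 1)],
            following * 3 * sizeof *slots);
    (*num_entries)--;
}
```

In the interleaved form removal becomes a single memmove over whole triples, and iteration reads each entry from adjacent words, while the lookup scan has to step over keys and values between hashes.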

Change rb_hash_modify to rb_hash_modify_check where st_table allocation is not necessary.
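
The distinction can be sketched as follows, assuming that the "modify" variant is the check plus lazy table allocation. The lz_* names and the toy table are hypothetical stand-ins, not the functions in hash.c:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { LZ_SLOTS = 8 }; /* arbitrary size for the toy table */

typedef struct {
    int frozen;
    int *tbl;   /* NULL means the table has not been allocated yet */
} lz_hash;

/* Returns 0 if modification is allowed, -1 if the hash is frozen. Never allocates. */
static int lz_modify_check(const lz_hash *h)
{
    assert(h != NULL);
    return h->frozen ? -1 : 0;
}

/* Same check, then makes sure the table exists. */
static int lz_modify(lz_hash *h)
{
    if (lz_modify_check(h) != 0) return -1;
    if (h->tbl == NULL) {
        h->tbl = calloc(LZ_SLOTS, sizeof *h->tbl);
        if (h->tbl == NULL) return -1;
    }
    return 0;
}

/* Clearing needs no table when there is none, so the check-only variant suffices. */
static int lz_clear(lz_hash *h)
{
    if (lz_modify_check(h) != 0) return -1;
    if (h->tbl != NULL) memset(h->tbl, 0, LZ_SLOTS * sizeof *h->tbl);
    return 0;
}
```

An operation such as clearing or deleting from a hash that was never written to then leaves it without any allocated table.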

Move part of the safe iteration logic to st.c to make it clearer.
This is an arguable change, because it clearly does not have a positive impact on performance,
but make check took 592.2 seconds before this change and 595.4 after, less than 1 percent more,
so I suppose the difference is negligible.

Removal of ST_CHECK.
ST_CHECK was always returned instead of ST_CONTINUE inside some st_foreach loops.
Now such calls to st_foreach are converted to calls to st_foreach_check,
so there is no reason to differentiate ST_CHECK and ST_CONTINUE, which simplifies the calling code a bit.
It also allows the st_foreach_check code to be simplified.
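
The shape of this change can be sketched with a toy table: the callback only ever returns "continue" or "stop", and the caller picks the checked or unchecked iterator. Everything here is hypothetical (the it_* names, the array table, and the version counter used to detect modification); st.c uses its own mechanism and signatures:

```c
#include <assert.h>
#include <stddef.h>

enum { IT_CAP = 8 };
enum it_retval { IT_CONTINUE, IT_STOP };

typedef struct {
    size_t len;
    size_t version;      /* bumped on every structural change */
    long items[IT_CAP];
} it_table;

typedef enum it_retval (*it_func)(long item, void *arg);

/* Unchecked iteration: trusts the callback not to modify the table. */
static void it_foreach(it_table *t, it_func f, void *arg)
{
    assert(t->len <= IT_CAP);
    for (size_t i = 0; i < t->len; i++)
        if (f(t->items[i], arg) == IT_STOP) break;
}

/* Checked iteration: returns 1 if the callback changed the table, else 0. */
static int it_foreach_check(it_table *t, it_func f, void *arg)
{
    size_t seen = t->version;
    assert(t->len <= IT_CAP);
    for (size_t i = 0; i < t->len; i++) {
        enum it_retval r = f(t->items[i], arg);
        if (t->version != seen) return 1;   /* modified during iteration */
        if (r == IT_STOP) break;
    }
    return 0;
}

/* Example callbacks: both return the same "continue" value. */
static enum it_retval it_sum(long item, void *arg)
{
    *(long *)arg += item;
    return IT_CONTINUE;
}

static enum it_retval it_truncate(long item, void *arg)
{
    it_table *t = arg;
    (void)item;
    t->len = 0;
    t->version++;
    return IT_CONTINUE;
}
```

The callbacks need no special return value to request checking; that decision sits entirely with the caller's choice of iterator.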