On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote:
> Dear PostgreSQL developers,
>
> I am re-sending this to keep this last change to the
> internal hash function on the radar.
>
Hi Ken,
A few comments:
1. New patch with very minor changes attached.
2. I reverted the change you made to indices.sgml. We still don't use
WAL for hash indexes, and in my opinion we should continue to discourage
their use until we do use WAL. We can add back in the comment that hash
indexes are suitable for large keys if we have some results to show
that.
3. There was a regression test failure in union.sql because the ordering
of the results was different. I updated the regression test.
4. Hash functions affect a lot more than hash indexes, so I ran through
a variety of tests that use a HashAggregate plan. Test setup and results
are attached. These results show no difference between the old and the
new code (about 0.1% better).
5. The hash index build time shows some improvement. The new code won in
every instance in which a there were a lot of duplicates in the table
(100 distinct values, 50K of each) by around 5%.
The new code appeared to be the same or slightly worse in the case of
hash index builds with few duplicates (1000000 distinct values, 5 of
each). The difference was about 1% worse, which is probably just noise.
Note: I'm no expert on hash functions. Take all of my tests with a grain
of salt.
I would feel a little better if I saw at least one test that showed
better performance of the new code on a reasonable-looking distribution
of data. The hash index build that you showed only took a second or two
-- it would be nice to see a test that lasted at least a minute.
Regards,
Jeff Davis