PJ> Date: Sun, 30 Dec 2007 08:35:21 +1100
PJ> From: Peter Jeremy
PJ> On Sat, Dec 29, 2007 at 08:50:14PM +0000, Edward B. DREGER wrote:
PJ> >
PJ> >perfect_hash = ( hash1[x] + hash2[x] ) % entry_count ;
PJ> This relies on pre-knowledge of all possible entries. It's
PJ> excellent for (eg) keyword lookups in a compiler (and gcc uses gperf
PJ> for that reason) but no good where the input can be arbitrary.
IIUC, Garrett wanted to eliminate duplicates. When an item is found,
check against the current [OP]MPH table; either make new reference or
follow existing, as appropriate.
IIRC, CHM92 had to precompute the entire table. However, one can
implement OPMPHashing in a way that supports dynamic deletions and
incremental additions.
Unfortunately, I know of no implementation that's either BSD-licensed or
in C, let alone both. It's been nearly a couple years since I wrote a
[proprietary] C++ implementation, but I recall that it wasn't terribly
difficult.
The trick for incremental additions is to deal with hash collisions,
drastically reducing the probability of a <hash1,hash2> graph cycle.
Thus, one rarely must discard the entire OPMPH table in favor of one
built from new hash functions. (Note that the OPMPH table must still be
"large enough" to deal with the additional items. I forget the exact
numbers, but space requirements were notably less than CHM92.)
Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita
________________________________________________________________________
DO NOT send mail to the following addresses:
davidc at brics.com -*- jfconmaapaq at intc.net -*- sam at everquick.net
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.