{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
{-# LANGUAGE MagicHash    #-}

{-|
A basic open-addressing hash table using linear probing. Use this hash table if
you...
  * want the fastest possible lookups, and very fast inserts.

  * don't care about wasting a little bit of memory to get it.

  * don't care that a table resize might pause for a long time to rehash all
    of the key-value mappings.

  * have a workload which is not heavy with deletes; deletes clutter the table
    with deleted markers and force the table to be completely rehashed fairly
    often.
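As a quick usage sketch of the API exported below (this assumes the module is
built as part of its package, so that the @hashtables@ dependencies are
available; the key and value types here are only for illustration):

```haskell
-- Usage sketch: create, populate, query, and delete, all inside runST.
import           Control.Monad.ST        (runST)
import qualified Data.HashTable.ST.Basic as H

results :: (Maybe Int, Maybe Int)
results = runST $ do
    t <- H.new
    H.insert t "one" 1
    H.insert t "two" 2
    H.delete t "two"
    r1 <- H.lookup t "one"   -- present
    r2 <- H.lookup t "two"   -- deleted, so absent
    return (r1, r2)

main :: IO ()
main = print results
```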
/Details:/
Of the hash tables in this collection, this hash table has the best insert and
lookup performance, with the following caveats.
/Space overhead/
This table is not especially memory-efficient. Firstly, the table has a maximum
load factor of 0.82 and will be resized if the load exceeds this value.
Secondly, to improve insert and lookup performance, we store the hash code for
each key in the table.
Each hash table entry requires three words, two for the pointers to the key and
value and one for the hash code. We don't count key and value pointers as
overhead, because they have to be there -- so the overhead for a full slot is
one word -- but empty slots in the hash table count for a full three words of
overhead. Define @m@ as the number of slots in the table and @n@ as the number
of key-value mappings. If the load factor is @k=n\/m@, the amount of space
wasted is:
@
w(n) = 1*n + 3(m-n)
@
Since @m=n\/k@,
@
w(n) = n + 3(n\/k - n)
= n (3\/k-2)
@
Substituting @k=0.82@, the maximum load factor, gives a /minimum/ overhead of
about 1.7 words per mapping. If @k=0.5@, the /maximum/ overhead situation under
normal usage, the overhead would be 4 words per mapping.
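The arithmetic above is easy to check; the following standalone snippet (the
function name is illustrative and not part of this module) evaluates the
per-mapping overhead @3\/k - 2@ at the two load factors just discussed:

```haskell
-- Per-mapping space overhead, in words, at load factor k (illustrative).
overheadPerMapping :: Double -> Double
overheadPerMapping k = 3 / k - 2

main :: IO ()
main = do
    print (overheadPerMapping 0.82)  -- about 1.66 words, at maximum load
    print (overheadPerMapping 0.5)   -- 4.0 words, at half load
```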
/Space overhead: experimental results/
In randomized testing (see @test\/compute-overhead\/ComputeOverhead.hs@ in the
source distribution), mean overhead (that is, the number of words needed to
store the key-value mapping over and above the two words necessary for the key
and the value pointers) is approximately 2.29 machine words per key-value
mapping with a standard deviation of about 0.44 words, and 3.14 words per
mapping at the 95th percentile.
/Expensive resizes/
If enough elements are inserted into the table to make it exceed the maximum
load factor, the table is resized. A resize involves a complete rehash of all
the elements in the table, which means that any given call to 'insert' might
take /O(n)/ time in the size of the table, with a large constant factor. If a
long pause waiting for the table to resize is unacceptable for your
application, you should choose the included linear hash table instead.
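The resize trigger described here can be sketched as a small standalone
predicate (the function name is illustrative; in the module below the real
check lives in the internal @checkOverflow@ step, and 0.82 mirrors the
module's @maxLoad@ constant):

```haskell
-- Illustrative sketch of the overflow check: the table is resized (or
-- rehashed in place) once live entries plus deleted markers exceed the
-- maximum load factor.
maxLoad :: Double
maxLoad = 0.82

needsRehash :: Int  -- ^ live entries
            -> Int  -- ^ deleted markers
            -> Int  -- ^ number of slots
            -> Bool
needsRehash entries deleted slots =
    fromIntegral (entries + deleted) / fromIntegral slots > maxLoad

main :: IO ()
main = do
    print (needsRehash 80 0 100)   -- False: load 0.80 is under the limit
    print (needsRehash 80 5 100)   -- True: deleted markers count toward load
```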
/References:/
* Knuth, Donald E. /The Art of Computer Programming/, vol. 3 Sorting and
Searching. Addison-Wesley Publishing Company, 1973.
-}

module Data.HashTable.ST.Basic
  ( HashTable
  , new
  , newSized
  , delete
  , lookup
  , insert
  , mapM_
  , foldM
  , computeOverhead
  ) where


------------------------------------------------------------------------------
import           Control.Exception                 (assert)
import           Control.Monad                     hiding (mapM_, foldM)
import           Control.Monad.ST
import           Data.Hashable                     (Hashable)
import qualified Data.Hashable                     as H
import           Data.Maybe
import           Data.Monoid
import           Data.STRef
import           GHC.Exts
import           Prelude                           hiding (lookup, read, mapM_)
------------------------------------------------------------------------------
import           Data.HashTable.Internal.Array
import qualified Data.HashTable.Internal.IntArray  as U
import           Data.HashTable.Internal.CacheLine
import           Data.HashTable.Internal.Utils
import qualified Data.HashTable.Class              as C


------------------------------------------------------------------------------
-- | An open addressing hash table using linear probing.
newtype HashTable s k v = HT (STRef s (HashTable_ s k v))

data HashTable_ s k v = HashTable
    { _size    :: {-# UNPACK #-} !Int
    , _load    :: !(U.IntArray s)   -- ^ How many entries in the table? Prefer
                                    -- unboxed vector here to STRef because I
                                    -- know it will be appropriately strict
    , _delLoad :: !(U.IntArray s)   -- ^ How many deleted entries in the table?
    , _hashes  :: !(U.IntArray s)
    , _keys    :: {-# UNPACK #-} !(MutableArray s k)
    , _values  :: {-# UNPACK #-} !(MutableArray s v)
    }


------------------------------------------------------------------------------
instance C.HashTable HashTable where
    new             = new
    newSized        = newSized
    insert          = insert
    delete          = delete
    lookup          = lookup
    foldM           = foldM
    mapM_           = mapM_
    computeOverhead = computeOverhead


------------------------------------------------------------------------------
instance Show (HashTable s k v) where
    show _ = "<HashTable>"


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:new".
new :: ST s (HashTable s k v)
new = newSized 30
{-# INLINE new #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:newSized".
newSized :: Int -> ST s (HashTable s k v)
newSized n = do
    let m = nextBestPrime $ ceiling (fromIntegral n / maxLoad)
    ht <- newSizedReal m
    newRef ht
{-# INLINE newSized #-}


------------------------------------------------------------------------------
newSizedReal :: Int -> ST s (HashTable_ s k v)
newSizedReal m = do
    -- make sure the hash array is a multiple of cache-line sized so we can
    -- always search a whole cache line at once
    let m' = ((m + numWordsInCacheLine - 1) `div` numWordsInCacheLine)
                 * numWordsInCacheLine
    h  <- U.newArray m'
    k  <- newArray m undefined
    v  <- newArray m undefined
    ld <- U.newArray 1
    dl <- U.newArray 1
    return $! HashTable m ld dl h k v


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:delete".
delete :: (Hashable k, Eq k) =>
          (HashTable s k v)
       -> k
       -> ST s ()
delete htRef k = do
    ht <- readRef htRef
    _  <- delete' ht True k h
    return ()
  where
    !h = hash k
{-# INLINE delete #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:lookup".
lookup :: (Eq k, Hashable k) => (HashTable s k v) -> k -> ST s (Maybe v)
lookup htRef !k = do
    ht <- readRef htRef
    lookup' ht
  where
    lookup' (HashTable sz _ _ hashes keys values) = do
        let !b = whichBucket h sz
        debug $ "lookup sz=" ++ show sz ++ " h=" ++ show h ++ " b=" ++ show b
        go b 0 sz
      where
        !h = hash k

        go !b !start !end = {-# SCC "lookup/go" #-} do
            idx <- forwardSearch2 hashes b end h emptyMarker
            debug $ "forwardSearch2 returned " ++ show idx
            if (idx < 0 || idx < start || idx >= end)
              then return Nothing
              else do
                h0 <- U.readArray hashes idx
                debug $ "h0 was " ++ show h0

                if recordIsEmpty h0
                  then return Nothing
                  else do
                    k' <- readArray keys idx
                    if k == k'
                      then do
                        debug $ "value found at " ++ show idx
                        v <- readArray values idx
                        return $! Just v
                      else if idx < b
                             then go (idx + 1) (idx + 1) b
                             else go (idx + 1) start end
{-# INLINE lookup #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:insert".
insert :: (Eq k, Hashable k) =>
          (HashTable s k v)
       -> k
       -> v
       -> ST s ()
insert htRef !k !v = do
    ht   <- readRef htRef
    !ht' <- insert' ht
    writeRef htRef ht'
  where
    insert' ht = do
        debug "insert': calling delete'"
        b <- delete' ht False k h
        debug $ "insert': writing h=" ++ show h ++ " b=" ++ show b
        U.writeArray hashes b h
        writeArray keys b k
        writeArray values b v
        checkOverflow ht
      where
        !h     = hash k
        hashes = _hashes ht
        keys   = _keys ht
        values = _values ht
{-# INLINE insert #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:foldM".
foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a
foldM f seed0 htRef = readRef htRef >>= work
  where
    work (HashTable sz _ _ hashes keys values) = go 0 seed0
      where
        go !i !seed
            | i >= sz   = return seed
            | otherwise = do
                h <- U.readArray hashes i
                if recordIsEmpty h || recordIsDeleted h
                  then go (i + 1) seed
                  else do
                    k <- readArray keys i
                    v <- readArray values i
                    !seed' <- f seed (k, v)
                    go (i + 1) seed'


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:mapM_".
mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s ()
mapM_ f htRef = readRef htRef >>= work
  where
    work (HashTable sz _ _ hashes keys values) = go 0
      where
        go !i
            | i >= sz   = return ()
            | otherwise = do
                h <- U.readArray hashes i
                if recordIsEmpty h || recordIsDeleted h
                  then go (i + 1)
                  else do
                    k <- readArray keys i
                    v <- readArray values i
                    _ <- f (k, v)
                    go (i + 1)


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:computeOverhead".
computeOverhead :: HashTable s k v -> ST s Double
computeOverhead htRef = readRef htRef >>= work
  where
    work (HashTable sz' loadRef _ _ _ _) = do
        !ld <- U.readArray loadRef 0
        let k = fromIntegral ld / sz
        return $ constOverhead / sz + overhead k
      where
        sz = fromIntegral sz'

        -- Change these if you change the representation
        constOverhead = 14
        overhead k    = 3 / k - 2


------------------------------------------------------------------------------
-- Private functions follow
------------------------------------------------------------------------------


------------------------------------------------------------------------------
{-# INLINE insertRecord #-}
insertRecord :: Int
             -> U.IntArray s
             -> MutableArray s k
             -> MutableArray s v
             -> Int
             -> k
             -> v
             -> ST s ()
insertRecord !sz !hashes !keys !values !h !key !value = do
    let !b = whichBucket h sz
    debug $ "insertRecord sz=" ++ show sz ++ " h=" ++ show h ++ " b=" ++ show b
    probe b
  where
    probe !i = {-# SCC "insertRecord/probe" #-} do
        !idx <- forwardSearch2 hashes i sz emptyMarker deletedMarker
        debug $ "forwardSearch2 returned " ++ show idx
        assert (idx >= 0) $ do
            U.writeArray hashes idx h
            writeArray keys idx key
            writeArray values idx value


------------------------------------------------------------------------------
checkOverflow :: (Eq k, Hashable k) =>
                 (HashTable_ s k v)
              -> ST s (HashTable_ s k v)
checkOverflow ht@(HashTable sz ldRef delRef _ _ _) = do
    !ld <- U.readArray ldRef 0
    let !ld' = ld + 1
    U.writeArray ldRef 0 ld'
    !dl <- U.readArray delRef 0

    debug $ concat [ "checkOverflow: sz="
                   , show sz
                   , " entries="
                   , show ld
                   , " deleted="
                   , show dl ]

    if fromIntegral (ld + dl) / fromIntegral sz > maxLoad
      then if dl > ld `div` 2
             then rehashAll ht sz
             else growTable ht
      else return ht


------------------------------------------------------------------------------
rehashAll :: Hashable k => HashTable_ s k v -> Int -> ST s (HashTable_ s k v)
rehashAll (HashTable sz loadRef _ hashes keys values) sz' = do
    debug $ "rehashing: old size " ++ show sz ++ ", new size " ++ show sz'
    ht' <- newSizedReal sz'
    let (HashTable _ loadRef' _ newHashes newKeys newValues) = ht'
    U.readArray loadRef 0 >>= U.writeArray loadRef' 0
    rehash newHashes newKeys newValues
    return ht'
  where
    rehash newHashes newKeys newValues = go 0
      where
        go !i
            | i >= sz   = return ()
            | otherwise = {-# SCC "growTable/rehash" #-} do
                h0 <- U.readArray hashes i
                when (not (recordIsEmpty h0 || recordIsDeleted h0)) $ do
                    k <- readArray keys i
                    v <- readArray values i
                    insertRecord sz' newHashes newKeys newValues (hash k) k v
                go $ i + 1


------------------------------------------------------------------------------
growTable :: Hashable k => HashTable_ s k v -> ST s (HashTable_ s k v)
growTable ht@(HashTable sz _ _ _ _ _) = do
    let !sz' = bumpSize sz
    rehashAll ht sz'


------------------------------------------------------------------------------
-- Helper data structure for delete'
data Slot = Slot
    { _slot       :: {-# UNPACK #-} !Int
    , _wasDeleted :: {-# UNPACK #-} !Int  -- we use Int because Bool won't
                                          -- unpack
    }
  deriving (Show)


------------------------------------------------------------------------------
instance Monoid Slot where
    mempty = Slot maxBound 0
    (Slot x1 b1) `mappend` (Slot x2 b2) =
        if x1 == maxBound then Slot x2 b2 else Slot x1 b1


------------------------------------------------------------------------------
-- Returns the slot in the array where it would be safe to write the given key.
delete' :: (Hashable k, Eq k) =>
           (HashTable_ s k v)
        -> Bool
        -> k
        -> Int
        -> ST s Int
delete' (HashTable sz loadRef delRef hashes keys values) clearOut k h = do
    debug $ "delete': sz=" ++ show sz ++ " h=" ++ show h ++ " b0=" ++ show b0
    (found, slot) <- go mempty b0 False
    let !b' = _slot slot
    when found $ bump loadRef (-1)
    -- bump the delRef lower if we're writing over a deleted marker
    when (not clearOut && _wasDeleted slot == 1) $ bump delRef (-1)
    return b'
  where
    bump ref i = do
        !ld <- U.readArray ref 0
        U.writeArray ref 0 $! ld + i

    !b0 = whichBucket h sz

    haveWrapped !(Slot fp _) !b = if fp == maxBound
                                    then False
                                    else b <= fp

    -- arguments:
    -- * fp    maintains the slot in the array where it would be safe to
    --         write the given key
    -- * b     search the buckets array starting at this index.
    -- * wrap  True if we've wrapped around, False otherwise
    go !fp !b !wrap = do
        debug $ "go: fp=" ++ show fp ++ " b=" ++ show b ++ ", wrap="
                  ++ show wrap
        !idx <- forwardSearch3 hashes b sz h emptyMarker deletedMarker
        debug $ "forwardSearch3 returned " ++ show idx
        if wrap && idx >= b0
          -- we wrapped around in the search and didn't find our hash code;
          -- this means that the table is full of deleted elements. Just return
          -- the first place we'd be allowed to insert.
          --
          -- TODO: if we get in this situation we should probably just rehash
          -- the table, because every insert is going to be O(n).
          then return $! (False, fp `mappend` (Slot (error "impossible") 0))
          else do
            -- because the table isn't full, we know that there must be either
            -- an empty or a deleted marker somewhere in the table. Assert this
            -- here.
            assert (idx > 0) $ return ()
            h0 <- U.readArray hashes idx
            debug $ "h0 was " ++ show h0

            if recordIsEmpty h0
              then do
                let pl = fp `mappend` (Slot idx 0)
                debug $ "empty, returning " ++ show pl
                return (False, pl)
              else do
                let !wrap' = haveWrapped fp idx
                if recordIsDeleted h0
                  then do
                    let pl = fp `mappend` (Slot idx 1)
                    debug $ "deleted, cont with pl=" ++ show pl
                    go pl (idx + 1) wrap'
                  else
                    if h == h0
                      then do
                        k' <- readArray keys idx
                        if k == k'
                          then do
                            let samePlace = _slot fp == idx
                            debug $ "found at " ++ show idx
                            debug $ "clearout=" ++ show clearOut
                            debug $ "sp? " ++ show samePlace
                            -- "clearOut" is set if we intend to write a new
                            -- element into the slot. If we're doing an update
                            -- and we found the old key, instead of writing
                            -- "deleted" and then re-writing the new element
                            -- there, we can just write the new element. This
                            -- only works if we were planning on writing the
                            -- new element here.
                            when (clearOut || not samePlace) $ do
                                bump delRef 1
                                U.writeArray hashes idx 1
                                writeArray keys idx undefined
                                writeArray values idx undefined
                            return (True, fp `mappend` (Slot idx 0))
                          else go fp (idx + 1) wrap'
                      else go fp (idx + 1) wrap'


------------------------------------------------------------------------------
maxLoad :: Double
maxLoad = 0.82


------------------------------------------------------------------------------
emptyMarker :: Int
emptyMarker = 0


------------------------------------------------------------------------------
deletedMarker :: Int
deletedMarker = 1


------------------------------------------------------------------------------
{-# INLINE recordIsEmpty #-}
recordIsEmpty :: Int -> Bool
recordIsEmpty = (== emptyMarker)


------------------------------------------------------------------------------
{-# INLINE recordIsDeleted #-}
recordIsDeleted :: Int -> Bool
recordIsDeleted = (== deletedMarker)


------------------------------------------------------------------------------
{-# INLINE hash #-}
hash :: (Hashable k) => k -> Int
hash k = out
  where
    !(I# h#) = H.hash k
    !m#      = maskw# h# 0# `or#` maskw# h# 1#
    !nm#     = not# m#
    !r#      = ((int2Word# 2#) `and#` m#) `or#` (int2Word# h# `and#` nm#)
    !out     = I# (word2Int# r#)


------------------------------------------------------------------------------
newRef :: HashTable_ s k v -> ST s (HashTable s k v)
newRef = liftM HT . newSTRef
{-# INLINE newRef #-}

writeRef :: HashTable s k v -> HashTable_ s k v -> ST s ()
writeRef (HT ref) ht = writeSTRef ref ht
{-# INLINE writeRef #-}

readRef :: HashTable s k v -> ST s (HashTable_ s k v)
readRef (HT ref) = readSTRef ref
{-# INLINE readRef #-}


------------------------------------------------------------------------------
{-# INLINE debug #-}
debug :: String -> ST s ()
#ifdef DEBUG
debug s = unsafeIOToST (putStrLn s)
#else
debug _ = return ()
#endif