{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
{-# LANGUAGE MagicHash    #-}
{-# LANGUAGE RankNTypes   #-}

{-| An implementation of linear hash tables. (See
<http://en.wikipedia.org/wiki/Linear_hashing>). Use this hash table if you...

  * care a lot about fitting your data set into memory; of the hash tables
    included in this collection, this one has the lowest space overhead

  * don't care that inserts and lookups are slower than the other hash table
    implementations in this collection (this one is slightly faster than
    @Data.HashTable@ from the base library in most cases)

  * have a soft real-time or interactive application for which the risk of
    introducing a long pause on insert while all of the keys are rehashed is
    unacceptable.
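
A quick usage sketch (illustrative only; in real code this module is normally
imported qualified, since several of its names clash with the "Prelude"):

@
example :: ST s (Maybe Int)
example = do
    ht <- new
    insert ht (1 :: Int) (10 :: Int)
    insert ht 2 20
    lookup ht 2
@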
/Details:/
Linear hashing allows for the expansion of the hash table one slot at a time,
by moving a \"split\" pointer across an array of pointers to buckets. The
number of buckets is always a power of two, and the bucket to look in is
defined as:
@
bucket(level,key) = hash(key) mod (2^level)
@
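
For example, at level 3 the table has @2^3 = 8@ buckets, so a key whose hash
is 13 lands in bucket @13 \`mod\` 8 = 5@. A sketch of the rule (the names here
are illustrative; internally the modulus is taken with a bit mask, since the
number of buckets is always a power of two):

@
bucket :: Int -> Int -> Int
bucket level h = h \`mod\` (2 ^ level)
@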
The \"split pointer\" controls the expansion of the hash table. If the hash
table is at level @k@ (i.e. @2^k@ buckets have been allocated), we first
calculate @b=bucket(level-1,key)@. If @b < splitptr@, the destination bucket is
calculated as @b'=bucket(level,key)@, otherwise the original value @b@ is used.
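
Concretely, the destination bucket can be sketched as follows (again with
illustrative names; the module's internal implementation uses bit masks
rather than @mod@):

@
destBucket :: Int -> Int -> Int -> Int
destBucket level splitptr h
    | b < splitptr = h \`mod\` (2 ^ level)  -- this bucket has already split
    | otherwise    = b
  where
    b = h \`mod\` (2 ^ (level - 1))
@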
The split pointer is incremented once an insert causes some bucket to become
fuller than some predetermined threshold; the bucket at the split pointer
(/not/ the bucket which triggered the split!) is then rehashed, and half of its
keys can be expected to be rehashed into the upper half of the table.
When the split pointer reaches the middle of the bucket array, the size of the
bucket array is doubled, the level increases, and the split pointer is reset to
zero.
Linear hashing, although not quite as fast for inserts or lookups as the
implementation of linear probing included in this package, is well suited for
interactive applications because it has much better worst case behaviour on
inserts. Other hash table implementations can suffer from long pauses, because
it is occasionally necessary to rehash all of the keys when the table grows.
Linear hashing, on the other hand, only ever rehashes a bounded (effectively
constant) number of keys when an insert forces a bucket split.
/Space overhead: experimental results/
In randomized testing (see @test\/compute-overhead\/ComputeOverhead.hs@ in the
source distribution), mean overhead is approximately 1.51 machine words per
key-value mapping, with a low standard deviation of about 0.06 words, and 1.60
words per mapping at the 95th percentile.
/Unsafe tricks/
If the @unsafe-tricks@ flag is on when this package is built (and it is on by
default), we use some unsafe tricks (namely 'unsafeCoerce#' and
'reallyUnsafePtrEquality#') to save indirections in this table. These
techniques rely on assumptions about the behaviour of the GHC runtime system
and, although they've been tested and should be safe under normal conditions,
are slightly dangerous. Caveat emptor. In particular, these techniques are
incompatible with HPC code coverage reports.
References:

  * W. Litwin. Linear hashing: a new tool for file and table addressing. In
    /Proc. 6th International Conference on Very Large Data Bases, Volume 6/,
    pp. 212-223, 1980.

  * P-A. Larson. Dynamic hash tables. /Communications of the ACM/ 31:
    446-457, 1988.
-}

module Data.HashTable.ST.Linear
  ( HashTable
  , new
  , newSized
  , delete
  , lookup
  , insert
  , mapM_
  , foldM
  , computeOverhead
  ) where


------------------------------------------------------------------------------
import           Control.Monad                         hiding (mapM_, foldM)
import           Control.Monad.ST
import           Data.Bits
import           Data.Hashable
import           Data.STRef
import           Prelude                               hiding (mapM_, lookup)
------------------------------------------------------------------------------
import qualified Data.HashTable.Class                  as C
import           Data.HashTable.Internal.Array
import qualified Data.HashTable.Internal.Linear.Bucket as Bucket
import           Data.HashTable.Internal.Linear.Bucket (Bucket)
import           Data.HashTable.Internal.Utils

#ifdef DEBUG
import           System.IO
#endif


------------------------------------------------------------------------------
-- | A linear hash table.
newtype HashTable s k v = HT (STRef s (HashTable_ s k v))

data HashTable_ s k v = HashTable
    { _level    :: {-# UNPACK #-} !Int
    , _splitptr :: {-# UNPACK #-} !Int
    , _buckets  :: {-# UNPACK #-} !(MutableArray s (Bucket s k v))
    }


------------------------------------------------------------------------------
instance C.HashTable HashTable where
    new             = new
    newSized        = newSized
    insert          = insert
    delete          = delete
    lookup          = lookup
    foldM           = foldM
    mapM_           = mapM_
    computeOverhead = computeOverhead


------------------------------------------------------------------------------
instance Show (HashTable s k v) where
    show _ = "<HashTable>"


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:new".
new :: ST s (HashTable s k v)
new = do
    v <- Bucket.newBucketArray 2
    newRef $ HashTable 1 0 v


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:newSized".
newSized :: Int -> ST s (HashTable s k v)
newSized n = do
    v <- Bucket.newBucketArray sz
    newRef $ HashTable lvl 0 v
  where
    k   = ceiling (fromIntegral n * fillFactor / fromIntegral bucketSplitSize)
    lvl = max 1 (fromEnum $ log2 k)
    sz  = power2 lvl


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:delete".
delete :: (Hashable k, Eq k) =>
          (HashTable s k v)
       -> k
       -> ST s ()
delete htRef !k = readRef htRef >>= work
  where
    work (HashTable lvl splitptr buckets) = do
        let !h0 = hashKey lvl splitptr k
        debug $ "delete: size=" ++ show (power2 lvl) ++ ", h0=" ++ show h0
                  ++ " splitptr: " ++ show splitptr
        delete' buckets h0 k
{-# INLINE delete #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:lookup".
lookup :: (Eq k, Hashable k) => (HashTable s k v) -> k -> ST s (Maybe v)
lookup htRef !k = readRef htRef >>= work
  where
    work (HashTable lvl splitptr buckets) = do
        let h0 = hashKey lvl splitptr k
        bucket <- readArray buckets h0
        Bucket.lookup bucket k
{-# INLINE lookup #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:insert".
insert :: (Eq k, Hashable k) => (HashTable s k v) -> k -> v -> ST s ()
insert htRef k v = do
    ht' <- readRef htRef >>= work
    writeRef htRef ht'
  where
    work ht@(HashTable lvl splitptr buckets) = do
        let !h0 = hashKey lvl splitptr k
        delete' buckets h0 k
        bsz <- primitiveInsert' buckets h0 k v

        if checkOverflow bsz
          then do
              debug $ "insert: splitting"
              h <- split ht
              debug $ "insert: done splitting"
              return h
          else do
              debug $ "insert: done"
              return ht
{-# INLINE insert #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:mapM_".
mapM_ :: ((k,v) -> ST s b) -> HashTable s k v -> ST s ()
mapM_ f htRef = readRef htRef >>= work
  where
    work (HashTable lvl _ buckets) = go 0
      where
        !sz = power2 lvl

        go !i | i >= sz   = return ()
              | otherwise = do
            b <- readArray buckets i
            Bucket.mapM_ f b
            go $ i+1


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:foldM".
foldM :: (a -> (k,v) -> ST s a) -> a -> HashTable s k v -> ST s a
foldM f seed0 htRef = readRef htRef >>= work
  where
    work (HashTable lvl _ buckets) = go seed0 0
      where
        !sz = power2 lvl

        go !seed !i | i >= sz   = return seed
                    | otherwise = do
            b      <- readArray buckets i
            !seed' <- Bucket.foldM f seed b
            go seed' $ i+1


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:computeOverhead".
computeOverhead :: HashTable s k v -> ST s Double
computeOverhead htRef = readRef htRef >>= work
  where
    work (HashTable lvl _ buckets) = do
        (totElems, overhead) <- go 0 0 0
        let n = fromIntegral totElems
        let o = fromIntegral overhead
        return $ (fromIntegral sz + constOverhead + o) / n
      where
        constOverhead = 5.0

        !sz = power2 lvl

        go !nelems !overhead !i
            | i >= sz   = return (nelems, overhead)
            | otherwise = do
                b       <- readArray buckets i
                (!n,!o) <- Bucket.nelemsAndOverheadInWords b
                let !n' = n + nelems
                let !o' = o + overhead
                go n' o' (i + 1)


------------------------------
-- Private functions follow --
------------------------------

------------------------------------------------------------------------------
delete' :: Eq k =>
           MutableArray s (Bucket s k v)
        -> Int
        -> k
        -> ST s ()
delete' buckets h0 k = do
    bucket <- readArray buckets h0
    _      <- Bucket.delete bucket k
    return ()


------------------------------------------------------------------------------
split :: (Hashable k) =>
         (HashTable_ s k v)
      -> ST s (HashTable_ s k v)
split ht@(HashTable lvl splitptr buckets) = do
    debug $ "split: start: nbuck=" ++ show (power2 lvl)
              ++ ", splitptr=" ++ show splitptr

    -- grab the bucket at the split pointer
    oldBucket <- readArray buckets splitptr
    nelems    <- Bucket.size oldBucket
    let !bsz  = max Bucket.newBucketSize $
                ceiling $ (0.625 :: Double) * fromIntegral nelems

    -- write an empty bucket there
    dbucket1 <- Bucket.emptyWithSize bsz
    writeArray buckets splitptr dbucket1

    -- grow the bucket array?
    let lvl2 = power2 lvl
    let lvl1 = power2 $ lvl-1

    (!buckets',!lvl',!sp') <-
        if splitptr+1 >= lvl1
          then do
              debug $ "split: resizing bucket array"
              let lvl3 = 2*lvl2
              b <- Bucket.expandBucketArray lvl3 lvl2 buckets
              debug $ "split: resizing bucket array: done"
              return (b, lvl+1, 0)
          else return (buckets, lvl, splitptr+1)

    let ht' = HashTable lvl' sp' buckets'

    -- make sure the other split bucket has enough room in it also
    let splitOffs = splitptr + lvl1
    db2   <- readArray buckets' splitOffs
    db2sz <- Bucket.size db2
    let db2sz' = db2sz + bsz
    db2'  <- Bucket.growBucketTo db2sz' db2
    debug $ "growing bucket at " ++ show splitOffs ++ " to size "
              ++ show db2sz'
    writeArray buckets' splitOffs db2'

    -- rehash the old bucket
    debug $ "split: rehashing bucket"
    let f = uncurry $ primitiveInsert ht'
    forceSameType f (uncurry $ primitiveInsert ht)
    Bucket.mapM_ f oldBucket
    debug $ "split: done"
    return ht'


------------------------------------------------------------------------------
checkOverflow :: Int -> Bool
checkOverflow sz = sz > bucketSplitSize


------------------------------------------------------------------------------
-- insert w/o splitting
primitiveInsert :: (Hashable k) =>
                   (HashTable_ s k v)
                -> k
                -> v
                -> ST s Int
primitiveInsert (HashTable lvl splitptr buckets) k v = do
    debug $ "primitiveInsert start: nbuckets=" ++ show (power2 lvl)
    let h0 = hashKey lvl splitptr k
    primitiveInsert' buckets h0 k v


------------------------------------------------------------------------------
primitiveInsert' :: MutableArray s (Bucket s k v)
                 -> Int
                 -> k
                 -> v
                 -> ST s Int
primitiveInsert' buckets !h0 !k !v = do
    debug $ "primitiveInsert': bucket number=" ++ show h0
    bucket <- readArray buckets h0
    debug $ "primitiveInsert': snoccing bucket"
    (!hw,m) <- Bucket.snoc bucket k v
    debug $ "primitiveInsert': bucket snoc'd"
    maybe (return ())
          (writeArray buckets h0)
          m
    return hw


------------------------------------------------------------------------------
fillFactor :: Double
fillFactor = 1.3


------------------------------------------------------------------------------
bucketSplitSize :: Int
bucketSplitSize = Bucket.bucketSplitSize


------------------------------------------------------------------------------
{-# INLINE power2 #-}
power2 :: Int -> Int
power2 i = 1 `iShiftL` i


------------------------------------------------------------------------------
{-# INLINE hashKey #-}
hashKey :: (Hashable k) => Int -> Int -> k -> Int
hashKey !lvl !splitptr !k = h1
  where
    !h0 = hashAtLvl (lvl-1) k

    !h1 = if h0 < splitptr
            then hashAtLvl lvl k
            else h0


------------------------------------------------------------------------------
{-# INLINE hashAtLvl #-}
hashAtLvl :: (Hashable k) => Int -> k -> Int
hashAtLvl !lvl !k = h
  where
    !h        = hashcode .&. mask
    !hashcode = hash k
    !mask     = power2 lvl - 1


------------------------------------------------------------------------------
newRef :: HashTable_ s k v -> ST s (HashTable s k v)
newRef = liftM HT . newSTRef

writeRef :: HashTable s k v -> HashTable_ s k v -> ST s ()
writeRef (HT ref) ht = writeSTRef ref ht

readRef :: HashTable s k v -> ST s (HashTable_ s k v)
readRef (HT ref) = readSTRef ref


------------------------------------------------------------------------------
{-# INLINE debug #-}
debug :: String -> ST s ()

#ifdef DEBUG
debug s = unsafeIOToST $ do
              putStrLn s
              hFlush stdout
#else
#ifdef TESTSUITE
debug !s = do
    let !_ = length s
    return $! ()
#else
debug _ = return ()
#endif
#endif