{-# LANGUAGE ScopedTypeVariables #-}
-- | Functions to represent a 'Vector' on disk in efficient, if
-- unportable, ways.
--
-- This module uses memory-mapping, a feature of all modern operating
-- systems, to mirror the disk contents in memory. Memory-mapping a
-- file has several advantages over reading it in the traditional way:
--
-- * Speed: memory-mapping is often much faster than traditional
-- reading.
--
-- * Memory efficiency: memory-mapped files are loaded into RAM on
-- demand and are easily evicted again. As a result, a program can
-- work with data sets larger than the available RAM, as long as it
-- accesses them carefully.
--
-- The caveat is that memory-mapped files hold the data in the
-- machine's native representation, so they are specific to the
-- current architecture (for example, its endianness). For more
-- information, see the description in "System.IO.MMap".
--
-- If you wish to write the contents in a portable fashion, either use
-- the ASCII load and save functions in "Numeric.Container", or use
-- the binary serialization in "Data.Binary".
module Data.Packed.Vector.MMap
    ( -- * Memory-mapping 'Vector' from disk
      unsafeMMapVector
    , unsafeLazyMMapVectors
      -- * Writing 'Vector' to disk
      -- | These functions write the 'Vector' in a way suitable for
      -- reading back with 'unsafeMMapVector'.
    , hPutVector
    , writeVector
    ) where

import Control.Monad (when)
import System.IO
import System.IO.MMap
import System.IO.Unsafe
import Foreign.ForeignPtr
import Foreign.Ptr
import Foreign.Storable
import qualified Data.Packed.Development as I
import qualified Data.Packed.Vector as I
import Data.Int

---------------------------
-- Memory-mapping 'Vector' from disk

-- | Map a file into memory (read-only) as a 'Vector'.
--
-- It is considered unsafe because changes to the underlying file may
-- (or may not) be reflected in the 'Vector', which breaks referential
-- transparency.
unsafeMMapVector :: forall a. Storable a
                 => FilePath
                    -- ^ Path of the file to map
                 -> Maybe (Int64, Int)
                    -- ^ 'Nothing' to map the entire file into memory,
                    -- otherwise @'Just' (fileOffset, elementCount)@,
                    -- with the offset given in bytes
                 -> IO (I.Vector a)
unsafeMMapVector path range = do
    (foreignPtr, offset, size) <- mmapFileForeignPtr path ReadOnly $
        case range of
            Nothing             -> Nothing
            Just (start, count) -> Just (start, count * eltSize)
    -- mmapFileForeignPtr reports the offset and size in bytes, while the
    -- vector's offset and length are counted in elements (this assumes
    -- the requested file offset is a multiple of the element size).
    return $ I.unsafeFromForeignPtr foreignPtr (offset `div` eltSize)
                                               (size `div` eltSize)
  where
    eltSize = sizeOf (undefined :: a)

-- | Map a file into memory as a lazy list of equal-sized 'Vector's,
-- even if they cannot all fit in the address space at the same time.
--
-- > (numVectors, vectors) <- unsafeLazyMMapVectors filename Nothing vectorSize
--
-- A data file commonly contains multiple vectors of equal length (for
-- example, the rows of a matrix). This function is convenient for
-- those uses, but it plays a more important role: supporting data sets
-- that cannot fit in the address space of the current machine.
--
-- On 32-bit machines the address space is only 4GB, and data sets too
-- large to be represented there, even in virtual memory, are easy to
-- come by.
--
-- This function maps the data in chunks. As long as you drop your
-- references to the vectors as you consume them, the old chunks become
-- eligible for garbage collection, at which point they are unmapped.
--
-- The number of vectors in the list is returned because it is often
-- needed, yet calculating it with 'length' would force the whole
-- list.
unsafeLazyMMapVectors :: forall a. Storable a
                      => FilePath
                         -- ^ Path of the file to map
                      -> Maybe (Int64, Int64)
                         -- ^ 'Nothing' to map the entire file into memory,
                         -- otherwise @'Just' (fileOffset, totalElementCount)@,
                         -- with the offset given in bytes
                      -> Int
                         -- ^ The number of elements in each 'Vector'
                      -> IO (Int64, [I.Vector a])
                         -- ^ Returns @(numberOfVectors, vectors)@
unsafeLazyMMapVectors path range vsize = do
    when (vecSize > maxChunkSize) vecTooBigError
    filesize <- withFile path ReadMode hFileSize
    let filesize' :: Int64
        filesize' = fI filesize
    vecs <- unsafeLazyMMapVectors' filesize' path range vsize
    return (nvectors range filesize', vecs)
  where
    eltSize, vecSize :: Int64
    eltSize = fI (sizeOf (undefined :: a))
    -- Size of one vector, in bytes.
    vecSize = fI vsize * eltSize

    -- The explicit range is counted in elements, the file size in bytes.
    nvectors :: Maybe (Int64, Int64) -> Int64 -> Int64
    nvectors Nothing fsz           = fsz `div` vecSize
    nvectors (Just (_, nelts)) _   = nelts `div` fI vsize

    vecTooBigError :: IO ()
    vecTooBigError = fail "The requested vector size can't be mapped into memory"

-- Worker for 'unsafeLazyMMapVectors': maps the whole range at once if it
-- is smaller than 'maxChunkSize', and otherwise maps it lazily, one
-- chunk at a time.
unsafeLazyMMapVectors' :: forall a. Storable a
                       => Int64 -> FilePath -> Maybe (Int64, Int64) -> Int
                       -> IO [I.Vector a]
unsafeLazyMMapVectors' fileSize fileName fileRange numEltsPerVec
    | mapSize < maxChunkSize = mmapAll
    | otherwise              = mmapChunks 0
  where
    mapSize, eltSize, vecSize, chunkSize, baseOffset :: Int64
    eltSize = fI $ sizeOf (undefined :: a)
    -- Start of the mapped region and its length, both in bytes.
    (baseOffset, mapSize) = case fileRange of
        Just (off, nelts) -> (off, nelts * eltSize)
        Nothing           -> (0, fileSize)
    vecSize = fI numEltsPerVec * eltSize
    -- Largest multiple of the vector size that fits in one chunk, so
    -- that no vector straddles two chunks.
    chunkSize = (maxChunkSize `div` vecSize) * vecSize

    -- The same range, with the element count narrowed to 'Int' as
    -- 'unsafeMMapVector' expects.
    fileRange' = do
        (offset, nelts) <- fileRange
        return (offset, fI nelts)

    -- Cut a large vector into consecutive vectors of 'numEltsPerVec'
    -- elements, dropping any incomplete trailing vector.
    splitVecs :: I.Vector a -> [I.Vector a]
    splitVecs bigVec =
        let nvecs = I.dim bigVec `div` numEltsPerVec
        in I.takesV (replicate nvecs numEltsPerVec) bigVec

    mmapAll :: IO [I.Vector a]
    mmapAll = do
        allVecs <- unsafeMMapVector fileName fileRange'
        return $ splitVecs allVecs

    -- Map the chunk starting 'offs' bytes into the region, and defer
    -- mapping the rest until the list is forced that far.
    mmapChunks :: Int64 -> IO [I.Vector a]
    mmapChunks offs
        | remaining <= 0 = return []
        | otherwise = do
            chunk <- unsafeMMapVector fileName mmapRange
            rest <- unsafeInterleaveIO $ mmapChunks (offs + chunkSize')
            return $ splitVecs chunk ++ rest
      where
        remaining  = mapSize - offs
        chunkSize' = min chunkSize remaining
        mmapRange  = Just (baseOffset + offs, fI (chunkSize' `div` eltSize))

-- Maximum size of a single mapping, in bytes.
maxChunkSize :: Int64
maxChunkSize = fI (maxBound `div` 4 :: Int)

-- Handy alias for 'fromIntegral'.
fI :: (Integral a, Num b) => a -> b
fI = fromIntegral

---------------------------
-- Writing 'Vector' to disk

-- | Write out a vector verbatim into an open file handle, in the
-- machine's native representation.
hPutVector :: forall a. Storable a => Handle -> I.Vector a -> IO ()
hPutVector h v =
    withForeignPtr fp $ \p -> hPutBuf h (p `plusPtr` (offset * eltsize)) sz
  where
    -- The vector's offset and length are counted in elements, not bytes.
    (fp, offset, n) = I.unsafeToForeignPtr v
    eltsize = sizeOf (undefined :: a)
    sz = n * eltsize

-- | Write the vector verbatim to a file, replacing any existing contents.
writeVector :: forall a. Storable a => FilePath -> I.Vector a -> IO ()
writeVector fp v = withBinaryFile fp WriteMode $ \h -> hPutVector h v
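
---------------------------
-- Usage sketch
--
-- A minimal round trip through the functions above, written as a
-- sketch rather than a tested program. It assumes the hmatrix
-- 'Data.Packed.Vector' module exports 'fromList' and 'dim', and it
-- uses a hypothetical file name, "vec.bin". It writes 1000 'Double's,
-- maps them back as a single 'Vector', and then maps the same file
-- lazily as vectors of 100 elements each.
--
-- > import Data.Int (Int64)
-- > import qualified Data.Packed.Vector as V
-- > import Data.Packed.Vector.MMap
-- >
-- > main :: IO ()
-- > main = do
-- >     writeVector "vec.bin" (V.fromList [1 .. 1000 :: Double])
-- >     whole <- unsafeMMapVector "vec.bin" Nothing :: IO (V.Vector Double)
-- >     print (V.dim whole)
-- >     (n, vs) <- unsafeLazyMMapVectors "vec.bin" Nothing 100
-- >                    :: IO (Int64, [V.Vector Double])
-- >     print n
-- >     print (map V.dim vs)
--
-- With 8-byte 'Double's the file is 8000 bytes, so the whole-file
-- mapping has 1000 elements and the lazy variant yields 10 vectors of
-- 100 elements. Because the file is far smaller than 'maxChunkSize',
-- the lazy variant maps it in a single chunk here.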