Hi,
I've been playing around with some code to track the file offsets of
data being read from an iteratee. It is currently in a branch named
"offset-bytestring" in zoom-cache, but I'd like some feedback before
doing anything more with it (e.g. merging it into something, splitting
it out into a new package, etc.)
The approach is to introduce a wrapper type (Offset a):
> data Offset a = Offset {-# UNPACK #-}!FileOffset !a
with instances for Nullable, NullPoint, Monoid, FoldableLL, ListLike defined in:
https://github.com/kfish/zoom-cache/blob/offset-bytestring/Data/Offset.hs
We then read data from a file using an enumerator of stream (Offset
ByteString). When each Chunk (Offset ByteString) is constructed it is
tagged with the current file position before reading; see
makeFdCallbackOBS in:
https://github.com/kfish/zoom-cache/blob/offset-bytestring/Data/Iteratee/IO/OffsetFd.hs
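To illustrate the tagging idea without any file I/O, here is a minimal pure sketch. It is not the code from OffsetFd.hs: tagChunks is a hypothetical helper that stands in for what makeFdCallbackOBS does per read, and it assumes each chunk starts where the previous one ended.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Posix.Types (FileOffset)

-- Same shape as the wrapper type above; the derived instances are
-- added here only so the example can be inspected and compared.
data Offset a = Offset {-# UNPACK #-} !FileOffset !a
  deriving (Eq, Show)

-- | Hypothetical helper: tag each chunk with the position at which it
-- starts, given the position of the first chunk. Assumes the chunks
-- are contiguous, i.e. no seeks happen between reads.
tagChunks :: FileOffset -> [B.ByteString] -> [Offset B.ByteString]
tagChunks _   []       = []
tagChunks pos (c : cs) =
  Offset pos c : tagChunks (pos + fromIntegral (B.length c)) cs
```

For example, tagging the chunks "abc", "de" and "f" starting from position 0 yields offsets 0, 3 and 5 respectively.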
Data can be read from such a stream using any Iteratee ByteString, via
the iteratee transformer convOffset. This is a hacked-up version of
countConsumed, which Alex Lang recently contributed to iteratee; I
wonder if it's possible to implement convOffset in terms of
countConsumed. There are also iteratee versions of tell, take, etc.
which operate on (Offset ByteString) and update the offset tag
appropriately, defined in:
https://github.com/kfish/zoom-cache/blob/offset-bytestring/Data/Iteratee/Offset.hs
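The core of "update the offset tag appropriately" can be shown on a single chunk, outside of any iteratee. The sketch below is an assumption about the intended semantics rather than the code in Data/Iteratee/Offset.hs; tellO and splitAtO are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Posix.Types (FileOffset)

data Offset a = Offset {-# UNPACK #-} !FileOffset !a
  deriving (Eq, Show)

-- | Hypothetical pure analogue of tell: the position of the first
-- byte of the chunk.
tellO :: Offset a -> FileOffset
tellO (Offset o _) = o

-- | Hypothetical pure analogue of take/drop: split a tagged chunk
-- after at most n bytes. The consumed prefix keeps the original
-- offset; the remainder is advanced by the number of bytes that were
-- actually consumed, which may be fewer than n for a short chunk.
splitAtO :: Int -> Offset B.ByteString
         -> (Offset B.ByteString, Offset B.ByteString)
splitAtO n (Offset o bs) =
  (Offset o l, Offset (o + fromIntegral (B.length l)) r)
  where
    (l, r) = B.splitAt n bs
```

Splitting a five-byte chunk tagged with offset 10 after two bytes gives a prefix at offset 10 and a remainder at offset 12, so a subsequent tellO on the remainder reports the right file position.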
All this seems to be working ok, and that branch of zoom-cache reports
the offsets of packets and summaries, and the corresponding branch of
scope works on streams of type (Offset Block) etc.
The definition of (Offset a) may be inefficient, and the
implementation of OffsetFd.hs is currently just a hacked-up version of
Data.Iteratee.IO.Posix.
The next step in zoom-cache is to actually use the file offsets for
building seek tables and so on. However, I'm wondering whether this
approach is the right way to go, or whether there is a simpler way to
associate file offsets with an iteratee stream.
Thoughts?
Conrad.