oleg:
> Given the stance against top-level mutable variables, I have not
> expected to see this Lazy IO code. After all, what could be more against
> the spirit of Haskell than a `pure' function with observable side
> effects. With Lazy IO, one indeed has to choose between correctness
> and performance. The appearance of such code is especially strange
> after the evidence of deadlocks with Lazy IO, presented on this list
> less than a month ago. Let alone unpredictable resource usage and
> reliance on finalizers to close files (forgetting that GHC does not
> guarantee that finalizers will be run at all).
>> Is there an alternative?
Hi Oleg!
I'm glad you joined the thread at this point.
Some background: our best solutions for this problem using lazy IO, are
based on chunk-wise lazy data structures, typically lazy bytestrings.
Often we'll write programs like:
import qualified Data.ByteString.Lazy.Char8 as B
import System.Environment
main = do
[f] <- getArgs
s <- B.readFile f
print (B.count '\n' s)
Which are nicely efficient
$ ghc -O2 A.hs --make
$ du -hs data
100M data
$ time ./A data
11078540
./A data 0.17s user 0.04s system 100% cpu 0.210 total
And we know from elsewhere the performance is highlycompetitive:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=all
Now, enumerators are very promising, and there's a lot of interest at
the moment, (e.g. just this week, Johan Tibell gave an inspiring talk at
Galois about this approach to IO, http://www.galois.com/blog/2008/09/12/left-fold-enumerators-a-safe-expressive-and-efficient-io-interface-for-haskell/
and we spent the day sketching out an enumerator bytestring design,
But there are some open questions. Perhaps you have some answers?
* Can we write a Data.ByteString.Enumerator that has matching or
better performance than its "dual", the existing chunk-wise lazy
stream type?
* Is there a translation from
data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
and functions on this type,
foldlChunks :: (a -> S.ByteString -> a) -> a -> ByteString -> a
foldlChunks f z = go z
where
go !a Empty = a
go !a (Chunk c cs) = go (f a c) cs
to an enumerator implementation?
* Can we compose enumerators as we can stream functions?
* Can we do fusion on enumerators? Does that make composition easier?
(Indeed, is there an encoding of enumerators analogous to stream
fusion control?)
Any thoughts?
-- Don