1 Introduction

1.1 Streams: the extensible I/O library

I have developed a new I/O library that IMHO is so sharp that it can
eventually replace the current I/O facilities based on using Handles.
The main advantage of the new library is its strong modular design using
typeclasses. The library consists of small independent modules, each
implementing one type of stream (file, memory buffer, pipe) or one
part of common stream functionality (buffering, Char encoding,
locking). 3rd-party libs can easily add new stream types and new common
functionality. Other benefits of the new library include support for
streams functioning in any monad, Hugs and GHC compatibility, high
speed and an easy migration path from the existing I/O library.

The Streams library is heavily based on the HVIO module written by John
Goerzen. I especially want to thank John for his clever design and
implementation. Really, I just renamed HVIO to Stream and presented
this as my own work. :) The direction of further development was inspired
by the "New I/O library" written by Simon Marlow.

1.2 Simple Streams

The key concept of the lib is the Stream class, whose interface mimics
the familiar interface for Handles, just with "h" replaced with "v" in
function names:
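
The naming idea can be illustrated with a minimal, self-contained sketch. It is not the library's actual class: the name `MiniStream`, the four methods chosen, and the `greet` helper are assumptions made for illustration only.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

import System.IO (Handle, hClose, hGetLine, hIsEOF, hPutStrLn, stdout)

-- Hypothetical, cut-down Stream-like class (not the library's definition).
-- 'm' is the monad the stream works in, 'h' is the stream type.
class Monad m => MiniStream m h | h -> m where
  vPutStrLn :: h -> String -> m ()
  vGetLine  :: h -> m String
  vIsEOF    :: h -> m Bool
  vClose    :: h -> m ()

-- A Handle is trivially such a stream: each v* operation is its h* twin.
instance MiniStream IO Handle where
  vPutStrLn = hPutStrLn
  vGetLine  = hGetLine
  vIsEOF    = hIsEOF
  vClose    = hClose

-- Code written against the class works with any instance.
greet :: MiniStream m h => h -> String -> m ()
greet h name = vPutStrLn h ("Hello, " ++ name ++ "!")

main :: IO ()
main = greet stdout "Streams"
```

The functional dependency `h -> m` says that the stream type determines the monad, so a call such as `greet stdout "Streams"` needs no type annotations.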

This means that you already know how to use any stream! The Stream interface
currently has 8 implementations: a Handle itself, raw files, pipes,
memory buffers and string buffers. Future plans include support for
memory-mapped files, sockets, circular memory buffers for interprocess
communication and UArray-based streams.

By themselves, these Stream implementations are rather simple. Basically,
to implement a new Stream type, it's enough to provide vPutBuf/vGetBuf
operations, or even vGetChar/vPutChar. The latter way, although
inefficient, allows us to implement streams that can work in any monad.
StringReader and StringBuffer streams use this to provide string-based
Stream class implementations both for IO and ST monads. Yes, you can
use the full power of Stream operations inside the ST monad!
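
As a rough sketch of how a Char-level stream can live in the ST monad, the code below keeps the unread part of a String in an STRef. The class `CharStream`, the type `STStringReader` and the helpers `newSTStringReader`, `vGetAll` and `countLines` are hypothetical names for this illustration, not the library's API.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Hypothetical Char-level stream class, parameterized by the monad.
class Monad m => CharStream m h | h -> m where
  vGetChar :: h -> m Char
  vIsEOF   :: h -> m Bool

-- A string reader whose state lives in an STRef, so it runs inside ST.
newtype STStringReader s = STStringReader (STRef s String)

instance CharStream (ST s) (STStringReader s) where
  vGetChar (STStringReader ref) = do
    str <- readSTRef ref
    case str of
      (c : cs) -> writeSTRef ref cs >> return c
      []       -> error "vGetChar: end of stream"
  vIsEOF (STStringReader ref) = null <$> readSTRef ref

newSTStringReader :: String -> ST s (STStringReader s)
newSTStringReader str = STStringReader <$> newSTRef str

-- Generic code: works for any CharStream in any monad.
vGetAll :: CharStream m h => h -> m String
vGetAll h = do
  eof <- vIsEOF h
  if eof
    then return []
    else do
      c  <- vGetChar h
      cs <- vGetAll h
      return (c : cs)

-- A pure function that uses a stream internally.
countLines :: String -> Int
countLines input = runST (do
  r <- newSTStringReader input
  length . lines <$> vGetAll r)

main :: IO ()
main = print (countLines "one\ntwo\nthree\n")  -- prints 3
```

Because only `vGetChar` and `vIsEOF` are needed, nothing in `vGetAll` mentions IO, which is what makes the pure `countLines` wrapper possible.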

1.3 Layers of Functionality

All additional functionality is implemented via Stream Transformers,
which are just parameterized Streams, whose parameters also
implement the Stream interface. This allows you to apply any number of stream
transformers to the raw stream and then use the result as an ordinary
Stream. For example:

    h <- openRawFD "test" WriteMode
           >>= bufferBlockStream
           >>= withEncoding utf8
           >>= withLocking

This code creates a new FD, which represents a raw file, and then adds
to this Stream buffering, Char encoding and locking functionality. The
result type of "h" is something like this:

WithLocking (WithEncoding (BufferedBlockStream FD))

The complete type, as well as all the intermediate types, implements the Stream
interface. Each transformer intercepts operations corresponding to its
nature, and passes the rest through. For example, the encoding transformer
intercepts only vGetChar/vPutChar operations and translates them to
the sequences of vGetByte/vPutByte calls of the lower-level stream.
The locking transformer just wraps any operation in the locking wrapper.

We can trace, for example, the execution of a "vPutBuf" operation on the
above-constructed Stream. First, the locking transformer acquires a lock
and then passes this call to the next level. Then the encoding transformer does
nothing and passes this call to the next level. The buffering
transformer flushes the current buffer and passes the call further.
Finally, FD itself performs the operation after all these
preparations, and on the return path the locking transformer releases
its lock.

As another example, the "vPutChar" call on this Stream is
transformed (after locking) into several "vPutByte" calls by the
encoding transformer, and these bytes go to the buffer in the
buffering transformer, with or without a subsequent call to the FD's
"vPutBuf".
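
The interception scheme traced above can be modelled in a few lines. This is a simplified, IO-only sketch under stated assumptions: the class `MiniStream`, the in-memory `Sink` (standing in for the buffered FD), and the wrappers `WithUtf8` and `WithLock` are hypothetical and are not the library's types.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)

-- Hypothetical two-operation stand-in for the Stream interface.
class MiniStream h where
  vPutByte :: h -> Word8 -> IO ()
  vPutChar :: h -> Char -> IO ()

-- In-memory sink playing the role of the lowest layer.
newtype Sink = Sink (IORef [Word8])

instance MiniStream Sink where
  vPutByte (Sink ref) b = modifyIORef' ref (b :)              -- newest byte first
  vPutChar s c          = vPutByte s (fromIntegral (ord c))   -- Latin-1 only

-- Encoding layer: intercepts vPutChar, passes vPutByte through untouched.
newtype WithUtf8 h = WithUtf8 h

instance MiniStream h => MiniStream (WithUtf8 h) where
  vPutByte (WithUtf8 h)   = vPutByte h
  vPutChar (WithUtf8 h) c = mapM_ (vPutByte h) (utf8 c)

-- Locking layer: wraps every operation in the same lock.
data WithLock h = WithLock (MVar ()) h

instance MiniStream h => MiniStream (WithLock h) where
  vPutByte (WithLock l h) b = withMVar l (\_ -> vPutByte h b)
  vPutChar (WithLock l h) c = withMVar l (\_ -> vPutChar h c)

-- UTF-8 encoding of a single Char.
utf8 :: Char -> [Word8]
utf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

demo :: IO [Word8]
demo = do
  ref <- newIORef []
  lck <- newMVar ()
  let h = WithLock lck (WithUtf8 (Sink ref))
  vPutChar h 'A'      -- one byte after encoding
  vPutChar h '\233'   -- U+00E9: two bytes after encoding
  vPutByte h 0        -- skips the encoding layer's translation
  reverse <$> readIORef ref

main :: IO ()
main = demo >>= print  -- prints [65,195,169,0]
```

The value `h` has the type `WithLock (WithUtf8 Sink)`, which mirrors the nesting of the type shown earlier: each layer is itself an instance of the class, so the layers can be stacked freely.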

1.4 Modularity

As you can see, stream transformers really are independent of each
other. This allows you to use them on any stream and in any combination
(but you should apply them in the proper order - buffering, then Char
encoding, then locking). As a result, you can apply to the stream
only the transformers that you really need. If you don't use the
stream in multiple threads, you don't need to apply the locking
transformer. If you don't use any encodings other than Latin-1 -- or
don't use text I/O at all -- you don't need an encoding transformer.
Moreover, you may not even need to know anything about the UserData transformer
until you actually need to use it :)

Both streams and stream transformers can be implemented by 3rd-party
libraries. Streams and transformers from arbitrary libraries will
seamlessly work together as long as they properly implement the Stream
interface. My future plans include implementation of an on-the-fly
(de)compression transformer and I will be happy to see 3rd-party
transformers that intercept vGetBuf/vPutBuf calls and use select(),
kqueue() and other methods to overlap I/O operations.

1.5 Speed

A quick comment about speed: it's fast enough -- 10-50 MB/s (depending
on the type of operation) on a 1 GHz CPU. The Handle operations, for
comparison, show speeds of 1-10 MB/s on the same computer. But that
doesn't mean that every operation in the new library is 10 times
faster. Strict I/O (including vGetChar/vPutChar) is a LOT faster. I
included a demonstration of this fascinating speed as
"Examples/wc.hs". If you need really high speed, don't forget to
increase the buffer size with "vSetBuffering".

On the other hand, lazy I/O (including any operations that receive or
return strings) shows only a modest speedup. This is limited by
Haskell/GHC itself and I can't do much to get around these limits.
Instead, I plan to provide support for I/O using packed strings. This
will make it possible to write I/O-intensive Haskell programs that are
as fast as their C counterparts.

The library includes benchmarking code in the file "Examples/StreamsBenchmark.hs".

1.6 Stage of Development

The library is currently at the beta stage. It contains a number of
known minor problems and an unknown number of yet-to-be-discovered bugs.
It is not properly documented, doesn't include QuickCheck tests, is not
cabalized, and not all "h*" operations have "v*" equivalents yet.
If anyone wants to join this effort in order to help fix these oddities
and prepare the lib for inclusion in the standard libraries suite, I would
be really happy. :) I will also be happy (although much less ;) to see
bug reports and suggestions about its interface and internal
organization. It's just a first public version, so we can still change
everything here!

The main disadvantage of the library is that it supports only Hugs and GHC
because it uses extensions to the type class system. I think that it
can be made H98-compatible at the cost of excluding support for non-IO
monads. I will try to make such a stripped version for other compilers
if people are interested.

2 Overview of Stream Transformers

2.1 Buffering

There are 3 buffering transformers. Each buffering transformer
implements support for vGetByte, vPutChar, vGetContents and other
byte- and text-oriented operations for streams that by themselves
support only vGetBuf/vPutBuf (or vReceiveBuf/vSendBuf) operations.

The first transformer can be applied to any stream supporting
vGetBuf/vPutBuf. This is applied by the operation "bufferBlockStream". The
well-known vSetBuffering/vGetBuffering operations are intercepted by
this transformer and used to control buffer size. At this moment, only
BlockBuffering is implemented, while LineBuffering and NoBuffering are
only in the planning stages.
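
To make the idea of block buffering concrete, here is a small sketch under simplifying assumptions: blocks are modelled as lists of bytes rather than raw memory, and the names `PutBuf`, `Buffered`, `bufferBlock`, `putByte` and `flush` are hypothetical, not the library's operations.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical block-level sink: all it can do is accept a whole block.
type PutBuf = [Word8] -> IO ()

-- State of the buffering layer.
data Buffered = Buffered
  { capacity :: Int             -- flush once this many bytes are pending
  , pending  :: IORef [Word8]   -- bytes collected so far, newest first
  , lower    :: PutBuf          -- the underlying block operation
  }

bufferBlock :: Int -> PutBuf -> IO Buffered
bufferBlock cap put = do
  ref <- newIORef []
  return (Buffered cap ref put)

-- Byte-level operation added on top of the block-level one.
putByte :: Buffered -> Word8 -> IO ()
putByte b x = do
  xs <- readIORef (pending b)
  let xs' = x : xs
  if length xs' >= capacity b
    then writeIORef (pending b) [] >> lower b (reverse xs')
    else writeIORef (pending b) xs'

-- Push out whatever is still pending.
flush :: Buffered -> IO ()
flush b = do
  xs <- readIORef (pending b)
  if null xs
    then return ()
    else writeIORef (pending b) [] >> lower b (reverse xs)

-- Record every block that reaches the lower level.
demo :: IO [[Word8]]
demo = do
  blocks <- newIORef []
  buf    <- bufferBlock 4 (\blk -> modifyIORef blocks (blk :))
  mapM_ (putByte buf) [1 .. 10]
  flush buf
  reverse <$> readIORef blocks

main :: IO ()
main = demo >>= print  -- prints [[1,2,3,4],[5,6,7,8],[9,10]]
```

Ten byte-level writes turn into three block-level calls here; a larger capacity means fewer calls to the lower level, which is the reason for increasing the buffer size when speed matters.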

Two other transformers can be applied to streams that implement
vReceiveBuf/vSendBuf operations -- that is, streams whose data
resides in memory, including in-memory streams and memory-mapped
files. In these cases, the buffering transformer doesn't need to allocate
a buffer itself; it just requests from the underlying stream the address and
size of the next available portion of data. Nevertheless, the final
result is the same -- we get support for all byte- and text-oriented
I/O operations. The "bufferMemoryStream" operation can be applied to any
memory-based stream to add buffering to it. The
"bufferMemoryStreamUnchecked" operation (which implements the third buffering
transformer) can be used instead, if you can guarantee that I/O
operations can't overflow the buffer in use.

2.2 Encoding

The Char encoding transformer allows you to encode each Char written to the
stream as a sequence of bytes, implementing UTF and other encodings.
This transformer can be applied to any stream implementing
vGetByte/vPutByte operations and in return it implements
vGetChar/vPutChar and all other text-oriented operations. This
transformer can be applied to a stream with the "withEncoding encoding"
operation, where `encoding` may be `latin1`, `utf8` or any other
encoding that you (or a 3rd-party lib) implement. Look at the
"Data.CharEncoding" module to see how to implement new encodings.
Encoding of streams created with the "withEncoding" operation can be
changed at any moment with "vSetEncoding" and queried with
"vGetEncoding". See examples of their usage in the file
"Examples/CharEncoding.hs".

2.3 Locking

The locking transformer ensures that the stream is properly shared by
several threads. You already know enough about its basic usage --
"withLocking" applies this transformer to the stream and all the
required locking is performed automagically. You can also use "lock"
operations to acquire the lock explicitly during multiple operations:
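
The difference between per-operation locking and an explicit lock held across several operations can be sketched with an MVar. This is only an illustration: `Locked`, `withLockingSketch`, `paired` and the signature given to `lock` here are assumptions, not the library's definitions.

```haskell
import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical locked resource: a lock plus the thing it protects.
data Locked a = Locked (MVar ()) a

withLockingSketch :: a -> IO (Locked a)
withLockingSketch x = do
  l <- newMVar ()
  return (Locked l x)

-- Hold the lock for the whole action, not just for a single operation.
lock :: Locked a -> (a -> IO b) -> IO b
lock (Locked l x) act = withMVar l (\_ -> act x)

-- Four threads each write a two-part record to a shared log.
demo :: IO [String]
demo = do
  out   <- newIORef []
  h     <- withLockingSketch out
  dones <- forM [1 .. 4 :: Int] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      lock h $ \ref -> do
        modifyIORef ref (("begin " ++ show i) :)
        yield  -- invite other threads to run while the lock is held
        modifyIORef ref (("end " ++ show i) :)
      putMVar done ()
    return done
  mapM_ takeMVar dones
  reverse <$> readIORef out

-- True when every "begin n" is immediately followed by its "end n".
paired :: [String] -> Bool
paired (b : e : rest) = take 5 b == "begin" && drop 6 b == drop 4 e && paired rest
paired []             = True
paired _              = False

main :: IO ()
main = do
  r <- demo
  mapM_ putStrLn r
  print (paired r)
```

In this sketch the action receives the unwrapped resource, so the operations inside it do not try to take the same lock a second time. The order in which the threads run is not fixed, but each record stays in one piece.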

See the file "Examples/Locking.hs" for examples of using the locking transformer.

2.4 Attaching user data

This transformer allows you to attach arbitrary data to any Stream. It does nothing extraordinary, except that the stream with attached data is itself a proper Stream. See an example of its usage in the file "Examples/UserData.hs".

3 Overview of Stream Types

3.1 Handle (legacy way to access files/sockets)

Handle is an instance of the Stream class, with a straightforward implementation. You can use the
Char encoding transformer with Handles. Although Handles implement
buffering and locking by themselves, you may also be interested in
applying these transformers to the Handle type. This has
benefits -- "bufferBlockStream" works faster than internal Handle
buffering, and the locking transformer enables the use of a "lock" operation to
create a lock around a sequence of operations. Moreover, the locking
transformer should be used to ensure proper multi-threading operation
of Handle with added encoding or buffering facilities.

3.2 FD (new way to access files)

The new method of using files, independent of the existing I/O
library, is implemented with the FD type. FD is just an Int representing a
POSIX file descriptor, and the FD type implements only the basic Stream I/O
operations - vGetBuf and vPutBuf. So, to create a full-featured FD-based
stream, you need to apply a buffering transformer. Therefore, the library
defines two ways to open files with FD - openRawFD/openRawBinaryFD
just create an FD, while openFD/openBinaryFD create an FD and immediately
apply the buffering transformer (bufferBlockStream) to it. In most cases
you will use the latter operations. Both pairs mimic the arguments and
behaviour of the well-known Handle operations openFile/openBinaryFile, so
you already know how to use them. Other transformers may then be applied
as you need. So, the example above can be abbreviated to:

    h <- openFD "test" WriteMode
           >>= withEncoding utf8
           >>= withLocking

Thus, to switch from the existing I/O library to using Streams, you
only need to replace "h" with "v" in the names of Handle operations, and
replace openFile/openBinaryFile calls with openFD/openBinaryFD, while
adding the "withLocking" transformer to files used in multiple threads.
That's all!

The file "Examples/FD.hs" shows how to use FD to work with files.

3.3 MemBuf (memory-resident stream)

MemBuf is a stream type that keeps its contents in a memory buffer.
There are two types of MemBufs you can create - you can either open
an existing memory buffer with "openMemBuf ptr size" or create a new one
with "createMemBuf initsize". A MemBuf opened by "openMemBuf" will
never be resized or moved in memory, and will not be freed by "vClose".
A MemBuf created by "createMemBuf" will grow as needed, can be manually
resized by the "vSetFileSize" operation, and is automatically freed by
"vClose".

Actually, raw MemBufs are created by the "createRawMemBuf" and "openRawMemBuf"
operations, while createMemBuf/openMemBuf incorporate an additional
"bufferMemoryStream" call (as you should remember, buffering adds vGetChar,
vPutStr and other text- and byte-I/O operations on top of vReceiveBuf
and vSendBuf). You can also apply Char encoding and locking
transformers to these streams. The "saveToFile" and "readFromFile"
operations provide an easy way to save/restore buffer contents in a file.

The file "Examples/MemBuf.hs" demonstrates the usage of MemBuf.

3.4 FunctionsMemoryStream

This Stream type allows you to implement arbitrary streams just by
providing 3 functions that implement the vReceiveBuf, vSendBuf and cleanup
operations. It seems that this Stream type is of interest only for my
own program and is worth studying only as an example of creating 3rd-party
Stream types. It is named "FunctionsMemoryStream"; see the sources if you
are interested.

3.5 StringReader & StringBuffer (String-based streams)

The four remaining Stream types were part of the HVIO module, and I copy
their descriptions from there:

In addition to Handle, there are several pre-defined stream types for
your use. 'StringReader' is a particularly interesting one. At
creation time, you pass it a String. Its contents are read lazily
whenever a read call is made. It can be used, therefore, to implement
filters (simply initialize it with the result from, say, a map over
hGetContents from another Stream object), codecs, and simple I/O
testing. Because it is lazy, it need not hold the entire string in
memory. You can create a 'StringReader' with a call to
'newStringReader'.

'StringBuffer' is a similar type, but with a different purpose. It
provides a full interface like Handle (it supports read, write and
seek operations). However, it maintains an in-memory buffer with the
contents of the file, rather than an actual on-disk file. You can
access the entire contents of this buffer at any time. This can be
quite useful for testing I/O code, or for cases where existing APIs
use I/O, but you prefer a String representation. Note however that
this stream type is very inefficient. You can create a 'StringBuffer'
with a call to 'newStringBuffer'.

One significant improvement over the original HVIO library is that
'StringReader' and 'StringBuffer' can work not only in IO, but also in
the ST monad.

3.6 Pipes (passing data between Haskell threads)

Finally, there are pipes. These pipes are analogous to the Unix pipes
that are available from System.Posix, but don't require Unix and work
only in Haskell. When you create a pipe, you actually get two Stream
objects: a 'PipeReader' and a 'PipeWriter'. You must use the
'PipeWriter' in one thread and the 'PipeReader' in another thread.
Data that's written to the 'PipeWriter' will then be available for
reading with the 'PipeReader'. The pipes are implemented completely
with existing Haskell threading primitives, and require no special
operating system support. Unlike Unix pipes, these pipes cannot be
used across a fork(). Also unlike Unix pipes, these pipes are
portable and interact well with Haskell threads. A new pipe can be
created with a call to 'newHVIOPipe'.
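
The idea of a pipe built purely from Haskell threading primitives can be sketched with a `Chan`. This is not how the library or HVIO implements its pipes; the names `ChanReader`, `ChanWriter`, `newChanPipe`, `pipePutStr`, `pipeClose` and `pipeGetContents` are hypothetical, and unlike a real pipe the channel here is unbounded.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Hypothetical pipe ends sharing one channel; Nothing marks end of stream.
newtype ChanWriter = ChanWriter (Chan (Maybe Char))
newtype ChanReader = ChanReader (Chan (Maybe Char))

newChanPipe :: IO (ChanReader, ChanWriter)
newChanPipe = do
  ch <- newChan
  return (ChanReader ch, ChanWriter ch)

pipePutStr :: ChanWriter -> String -> IO ()
pipePutStr (ChanWriter ch) = mapM_ (writeChan ch . Just)

-- Signal that no more data will be written.
pipeClose :: ChanWriter -> IO ()
pipeClose (ChanWriter ch) = writeChan ch Nothing

-- Read everything until the writer closes its end.
pipeGetContents :: ChanReader -> IO String
pipeGetContents (ChanReader ch) = go
  where
    go = do
      m <- readChan ch
      case m of
        Nothing -> return []
        Just c  -> (c :) <$> go

main :: IO ()
main = do
  (r, w) <- newChanPipe
  -- The writer runs in its own thread; the main thread reads.
  _ <- forkIO (pipePutStr w "hello through a pipe" >> pipeClose w)
  pipeGetContents r >>= putStrLn
```

The reader blocks in `readChan` until the writer thread supplies data, which is why the two ends are used from different threads.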