Conduit interface

runResourceT, provided by the resourcet package, ensures that resources are
properly cleaned up, even in the presence of exceptions. The type system
enforces that runResourceT is called wherever it is needed. The remainder of
this tutorial will not discuss runResourceT; please see the resourcet
documentation for more information.
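The rest of this section refers to a small file-copying program. It might look roughly like the following sketch, which assumes the conduit 0.5-era API (sourceFile and sinkFile from Data.Conduit.Binary; later releases moved them to the conduit-extra package) and hypothetical file names input.txt and output.txt.

```haskell
-- Sketch assuming the conduit 0.5-era API. File names are hypothetical,
-- and input.txt must already exist.
import Control.Monad.Trans.Resource (runResourceT)
import Data.Conduit (($$))
import Data.Conduit.Binary (sinkFile, sourceFile)

-- Copy input.txt to output.txt chunk by chunk. runResourceT guarantees that
-- both file handles are released, even if an exception interrupts the copy.
copyInputToOutput :: IO ()
copyInputToOutput =
  runResourceT $ sourceFile "input.txt" $$ sinkFile "output.txt"
```

The name copyInputToOutput is illustrative only.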

Looking at the rest of our example, there are three components to understand:
sourceFile, sinkFile, and the $$ operator (called "connect"). These
represent the most basic building blocks in conduit: a Source produces a
stream of values, a Sink consumes such a stream, and $$ connects the two.

In the case of file copying, the Sink produced no interesting value (only ()).
However, a Sink will often produce a result value, such as the sum of the
numbers it consumes.
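A sketch of a result-producing Sink, assuming the 0.5-era API with Data.Conduit.List imported qualified as CL (the helper names are illustrative). The second definition also uses left fusion ($=) to attach map to a Source before connecting it to consume:

```haskell
-- Sketch assuming the conduit 0.5-era API; helper names are illustrative.
import Data.Conduit (($$), ($=))
import qualified Data.Conduit.List as CL

-- The fold Sink returns a result: the sum of the whole stream.
sumOneToTen :: IO Int
sumOneToTen = CL.sourceList [1 .. 10] $$ CL.fold (+) 0

-- Left fusion: sourceList $= map is itself a Source, connected to consume,
-- which collects every value into a list.
incremented :: IO [Int]
incremented = CL.sourceList [1 .. 10] $= CL.map (+ 1) $$ CL.consume
```

sumOneToTen should produce 55 and incremented should produce [2 .. 11].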

The $= operator, called left fuse, combines a Source and a Conduit into a
new Source, which can then be connected to a Sink (for example, consume).
Similarly, right fusion (=$) combines a Conduit and a Sink into a new Sink,
and middle fusion (=$=) combines two Conduits into a new Conduit.
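Right and middle fusion can be sketched the same way, again assuming the 0.5-era operators (=$) and (=$=); doubledSum and incThenDouble are hypothetical names:

```haskell
-- Sketch assuming the conduit 0.5-era API; names are illustrative.
import Data.Conduit (Conduit, Sink, ($$), (=$), (=$=))
import qualified Data.Conduit.List as CL

-- Right fusion: a Conduit fused onto a Sink gives a new Sink.
doubledSum :: Monad m => Sink Int m Int
doubledSum = CL.map (* 2) =$ CL.fold (+) 0

-- Middle fusion: two Conduits fused together give a new Conduit.
incThenDouble :: Monad m => Conduit Int m Int
incThenDouble = CL.map (+ 1) =$= CL.map (* 2)
```

Connecting CL.sourceList [1 .. 10] to doubledSum should give 110, and CL.sourceList [1, 2, 3] $$ incThenDouble =$ CL.consume should give [4, 6, 8].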

A number of very common functions are provided by the Data.Conduit.List
module. Many of them, such as map, filter, fold, take, and drop, correspond
closely to standard Haskell list functions.
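As an illustration of that correspondence, here is a streaming pipeline next to the equivalent list expression. This is a sketch assuming CL.filter, CL.map, and CL.fold behave like their list counterparts, as described above; the function names are illustrative:

```haskell
-- Sketch assuming the conduit 0.5-era API; function names are illustrative.
import Data.Conduit (($$), ($=))
import qualified Data.Conduit.List as CL

-- Streaming version: values flow through filter and map one at a time.
streamed :: [Int] -> IO Int
streamed xs =
  CL.sourceList xs $= CL.filter even $= CL.map (* 3) $$ CL.fold (+) 0

-- The same computation with ordinary list functions from the Prelude.
listBased :: [Int] -> Int
listBased xs = foldl (+) 0 (map (* 3) (filter even xs))
```

Both should return 90 for the input [1 .. 10]; the streaming version handles one element at a time instead of building intermediate lists.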

In addition to connecting and fusing components together, we can also build up
more sophisticated components through monadic composition. For example, a Sink
that ignores the first 3 numbers and returns the sum of the remaining numbers
is simply drop 3 followed by a fold.
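A minimal sketch of that Sink, assuming the 0.5-era Sink alias and CL.drop/CL.fold (sumAfterThree is an illustrative name):

```haskell
-- Sketch assuming the conduit 0.5-era API; sumAfterThree is illustrative.
import Data.Conduit (Sink, ($$))
import qualified Data.Conduit.List as CL

-- Monadic composition: drop runs first, then fold continues on the same
-- input stream where drop left off.
sumAfterThree :: Monad m => Sink Int m Int
sumAfterThree = do
  CL.drop 3
  CL.fold (+) 0

-- Usage helper: run the Sink over a list.
sumAfterThreeOf :: [Int] -> IO Int
sumAfterThreeOf xs = CL.sourceList xs $$ sumAfterThree
```

Applied to [1 .. 10], it should give 49 (4 + 5 + ... + 10).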

In some cases, a component consumes more input than it needs and wants to
provide the excess to the next component in the monadic chain. We refer to
this excess input as leftovers. The simplest example is peek, which reads the
next value and immediately hands it back as a leftover.
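A sketch of peek followed by fold in the same monadic chain, under the same 0.5-era API assumption (peekThenSum is an illustrative name):

```haskell
-- Sketch assuming the conduit 0.5-era API; peekThenSum is illustrative.
import Data.Conduit (Sink, ($$))
import qualified Data.Conduit.List as CL

-- peek reads the first value and pushes it back as a leftover, so fold
-- still sees the entire stream, including that first value.
peekThenSum :: Monad m => Sink Int m (Maybe Int, Int)
peekThenSum = do
  first <- CL.peek
  total <- CL.fold (+) 0
  return (first, total)
```

For the input [1 .. 10] this should give (Just 1, 55): the total still includes the peeked 1.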

Notice that, although we "consumed" the first value from the stream via
peek, it was still available to fold. This idea becomes even more important
when dealing with chunked data such as ByteStrings or Text.

Finally, note that Source, Sink, and Conduit are just type aliases; this is
explained in the Pipe interface section. Another important aspect, resource
finalization, is also covered below.

A Sink consumes a stream of input values and produces a final result, without
producing any output.

Since 0.5.0

Connect/fuse

It is important to understand the lifecycle of our components. Notice that we
can connect or fuse two components together. When we do this, the component
providing output is called upstream, and the component consuming this input
is called downstream. We can have arbitrarily long chains of such fusion, so
a single component can simultaneously function as upstream and downstream.

Each component can be in one of four states of operation at any given time:

* It hasn't yet started operating.
* It is providing output downstream.
* It is waiting for input from upstream.
* It has completed processing.

Let's use sourceFile and sinkFile as an example. When we run sourceFile
input $$ sinkFile output, both components begin in the "not started"
state. Next, we start running sinkFile (note: we always begin processing on
the downstream component). sinkFile will open up the file, and then wait for
input from upstream.

Next, we start running sourceFile, which opens its file, reads some data from
it, and provides that data as output downstream. This is fed to sinkFile
(which was already waiting). sinkFile writes the data to its file, then asks
for more input. This process continues until sourceFile reaches the end of
its input, at which point sourceFile closes its file handle and switches to
the completed state. sinkFile is then sent a signal that no more input is
available, so it closes its own file and returns a result.

Now let's change things up a bit. Suppose we were instead connecting
sourceFile to take 1. We start by running take 1, which will wait for
some input. We'll then start sourceFile, which will open the file, read a
chunk, and send it downstream. take 1 will take that single chunk and return
it as a result. Once it does this, it has transitioned to the complete state.

We don't want to pull any more data from sourceFile, as we do not need it.
So instead, we call sourceFile's finalizer. Each time upstream provides
output, it also provides a finalizer to be run if downstream finishes
processing.

One final case: suppose we connect sourceFile to return (). The latter does
nothing: it immediately switches to the complete state. In this case, we never
even start running sourceFile (it stays in the "not yet started" state),
and so no finalization occurs.
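The take 1 and return () scenarios above can be observed without real files. In this sketch, which assumes the 0.5-era API, the hypothetical loggingSource stands in for sourceFile: it produces 1, 2, 3, ... forever and records each value in an IORef just before yielding it.

```haskell
-- Sketch assuming the conduit 0.5-era API; loggingSource is a hypothetical
-- stand-in for sourceFile.
import Control.Monad.IO.Class (liftIO)
import Data.Conduit (Source, yield, ($$))
import qualified Data.Conduit.List as CL
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- An endless Source that logs each value just before yielding it.
loggingSource :: IORef [Int] -> Source IO Int
loggingSource ref = go 1
  where
    go n = do
      liftIO (modifyIORef ref (n :))
      yield n
      go (n + 1)

-- Returns: what take 1 received, what upstream produced for it, and what
-- upstream produced when the downstream was simply return ().
lifecycle :: IO ([Int], [Int], [Int])
lifecycle = do
  ref1 <- newIORef []
  taken <- loggingSource ref1 $$ CL.take 1
  produced1 <- readIORef ref1
  ref2 <- newIORef []
  loggingSource ref2 $$ return ()
  produced2 <- readIORef ref2
  return (taken, produced1, produced2)
```

lifecycle should return ([1], [1], []): take 1 causes exactly one value to be produced, and return () never starts upstream at all.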

The takeaways from the above discussion are:

* When upstream completes before downstream, it cleans up all of its
  resources and sends a termination signal. We never think about upstream
  again. This can only occur while downstream is in the "waiting for input"
  state, since that is the only time that upstream is called.

* When downstream completes before upstream, we finalize upstream
  immediately. This can only occur when upstream produces output, because
  that's the only time when control is passed back to downstream.

* If downstream terminates without ever awaiting input, upstream was never
  started, and therefore it does not need to be finalized.

Note that all of the discussion above applies equally well to chains of
components. If you have an upstream, middle, and downstream component, and
downstream terminates, then the middle component will be finalized, which in
turn will trigger upstream to be finalized. This setup ensures that we always
have prompt resource finalization.

The connect operator ($$) pulls data from a Source and pushes it to a Sink.
When either side closes, the other side is immediately closed as well. If you
would like to keep the Source open for further operations, use the
connect-and-resume operator $$+.

The middle fuse operator (=$=) combines two Conduits into a new Conduit. Both
Conduits will be closed when the newly created Conduit is closed.

Leftover data returned from the right Conduit will be discarded.

Since 0.4.0

Pipe interface

We discussed three main types in the conduit package: Source, Sink, and
Conduit. In fact, these are all unified into a single type, Pipe. This
greatly simplifies the internal workings of this package, and makes it much
easier to build more powerful components from simpler ones. For example, it is
easy to combine a number of simple Sinks together to produce a more powerful
Conduit. For example, a Conduit that drops 3 input elements and doubles the
rest can be written as drop 3 followed by map (* 2).
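A sketch of that Conduit, assuming the 0.5-era Conduit alias (dropThreeDoubleRest is an illustrative name). Note that CL.drop is a Sink, yet sequencing it before CL.map yields a Conduit:

```haskell
-- Sketch assuming the conduit 0.5-era API; dropThreeDoubleRest is
-- illustrative.
import Data.Conduit (Conduit, ($$), ($=))
import qualified Data.Conduit.List as CL

-- drop consumes three inputs without producing output; map then doubles
-- every remaining input and passes it downstream.
dropThreeDoubleRest :: Monad m => Conduit Int m Int
dropThreeDoubleRest = do
  CL.drop 3
  CL.map (* 2)

-- Usage helper: run the Conduit over a list and collect its output.
runDropThree :: [Int] -> IO [Int]
runDropThree xs = CL.sourceList xs $= dropThreeDoubleRest $$ CL.consume
```

runDropThree [1 .. 6] should produce [8, 10, 12].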

If we look again at our examples from above, we'll see a few different aspects
to Pipes:

* Sinks and Conduits can consume a stream of input values. Both map and
  fold took a stream of Ints, while sinkFile took a stream of
  ByteStrings.

* Sources and Conduits can produce a stream of output values. sourceFile
  produced a stream of ByteStrings, which was then consumed by sinkFile.
  This is an important point in conduit: the output of the left-hand pipe
  (a.k.a., upstream) must match the input of the right-hand pipe (a.k.a.,
  downstream).

* All Pipes have some underlying Monad. The sourceFile and sinkFile
  functions needed to use MonadResource from resourcet to get exception
  handling, but our other functions could live in any monad. Since Pipe
  provides a MonadTrans instance, you can actually lift any action from the
  underlying Monad into your Pipe.

* Sinks can provide a result type. Our fold returned a final Int, while
  sinkFile returned ().

A Pipe also exposes two other features not covered by the above three types:

* Each Pipe has a leftover type. Above, we described a situation where the
  leftover type was identical to the input type. However, Pipe provides a
  separate type parameter for this, so that you can instead set the leftover
  type to Void, thereby ensuring that a Pipe does not provide any leftover
  values. This is important for ensuring that leftover values aren't
  accidentally discarded.

* Above, we described a situation where only Sinks could return results.
  However, it is sometimes advantageous to allow stream producers to return
  a result as well. We call this the upstream result.

Putting this all together, a Pipe has six type parameters: Pipe l i o u m r,
corresponding to each of the bullets above. Source, Conduit, and Sink are
simply type aliases that restrict one or more of these type parameters to
specific types. For example, both Source and Conduit have r restricted to
(), since neither may return a result.
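To make the point about the underlying monad concrete, here is a sketch of a Sink whose base monad is State Int (from the transformers package) rather than IO. It assumes the 0.5-era Sink alias and Pipe's MonadTrans instance; accumulate and runAccumulate are illustrative names:

```haskell
-- Sketch assuming the conduit 0.5-era API; names are illustrative.
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (State, execState, modify)
import Data.Conduit (Sink, await, ($$))
import qualified Data.Conduit.List as CL

-- A Sink over State Int: each input is added to the state via lift.
accumulate :: Sink Int (State Int) ()
accumulate = do
  mx <- await
  case mx of
    Nothing -> return ()
    Just x -> do
      lift (modify (+ x))
      accumulate

-- Run the whole pipeline purely; no IO is involved.
runAccumulate :: [Int] -> Int
runAccumulate xs = execState (CL.sourceList xs $$ accumulate) 0
```

runAccumulate [1 .. 5] should return 15.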

There are two ways that Pipes can be composed: via the Monad instance, and
via fusion. (Note: connecting is just a special case of fusion, where the
Pipe is then run. We'll discuss that more later on.) In the pipes package,
these are referred to as vertical and horizontal composition, respectively.
Let's clarify the distinction between these two:

Monadic composition takes two Pipes with the same input and output types, and
combines them into a single Pipe. These two Pipes will be run one after the
other, and they will share the same input and output streams. Essentially, the
second Pipe will continue consuming input where the first left off, and the
output streams of each will be concatenated. Any leftover values from the first
Pipe will be fed to the second Pipe.
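A small sketch of both effects, assuming the 0.5-era await and leftover primitives (firstAndAll and concatenated are illustrative names). In firstAndAll, the first step pushes a value back as a leftover and the second step sees it again; in concatenated, two Sources are sequenced and their outputs joined:

```haskell
-- Sketch assuming the conduit 0.5-era API; names are illustrative.
import Data.Conduit (Sink, await, leftover, ($$))
import qualified Data.Conduit.List as CL

-- First step: read one value and push it back as a leftover.
-- Second step: consume continues on the same stream and sees it again.
firstAndAll :: Monad m => Sink Int m (Maybe Int, [Int])
firstAndAll = do
  mx <- await
  maybe (return ()) leftover mx
  rest <- CL.consume
  return (mx, rest)

-- Two Sources run one after the other: their output streams are joined.
concatenated :: IO [Int]
concatenated = (CL.sourceList [1, 2] >> CL.sourceList [3, 4]) $$ CL.consume
```

Connecting CL.sourceList [1, 2, 3] to firstAndAll should give (Just 1, [1, 2, 3]), and concatenated should give [1, 2, 3, 4].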

Fusion, on the other hand, will connect the output from an upstream Pipe to
the input of a downstream Pipe. The upstream Pipe is required to have a
result type of (), since any results it produces are thrown out. This form of
composition produces a new Pipe with the input parameter of the upstream
Pipe and the output and result parameters of the downstream Pipe. (For
examples, see the initial examples on this page. Every usage of the connect or
fusion operators is fusion composition.)

Note: If you are building a library of conduit functions, it is best to
keep the type signatures as general as possible. For example, even though the
simplest type signature for the drop function would be Int -> Sink i m (),
this would prevent it from being used in construction of Conduits. Instead,
we give it a type signature of Int -> Pipe l i o u m ().
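A sketch of such a general signature, assuming Pipe and await are exported from Data.Conduit as in 0.5 (later releases replaced Pipe with ConduitM). dropN is a hypothetical hand-written version of drop, used here both inside a Sink and inside a Conduit:

```haskell
-- Sketch assuming the conduit 0.5-era Pipe interface; dropN and the other
-- names are hypothetical.
import Data.Conduit (Conduit, Pipe, Sink, await, ($$), ($=))
import qualified Data.Conduit.List as CL

-- Only the input type i is constrained, so this fits any Pipe shape.
dropN :: Monad m => Int -> Pipe l i o u m ()
dropN n
  | n <= 0 = return ()
  | otherwise = do
      mx <- await
      case mx of
        Nothing -> return ()
        Just _ -> dropN (n - 1)

-- Used as part of a Sink ...
sumAfterTwo :: Monad m => Sink Int m Int
sumAfterTwo = dropN 2 >> CL.fold (+) 0

-- ... and as part of a Conduit.
skipTwoThenNegate :: Monad m => Conduit Int m Int
skipTwoThenNegate = dropN 2 >> CL.map negate
```

With input [1 .. 5], sumAfterTwo should give 12 and skipTwoThenNegate should produce [-3, -4, -5].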

The underlying datatype for all the types in this package. It has six
type parameters:

* l is the type of values that may be left over from this Pipe. A Pipe
  with no leftovers would use Void here, and one with leftovers would use
  the same type as the i parameter. Leftovers are automatically provided to
  the next Pipe in the monadic chain.
* i is the type of values for this Pipe's input stream.
* o is the type of values for this Pipe's output stream.
* u is the result type from the upstream Pipe.
* m is the underlying monad.
* r is the result type.

A basic intuition is that every Pipe produces a stream of output values
(o), and eventually indicates that this stream is terminated by sending a
result (r). On the receiving end of a Pipe, these become the i and u
parameters.

The >+> operator fuses two Pipes together, connecting the output from the
left to the input of the right.

Notice that the leftover parameter for the Pipes must be Void. This
ensures that there is no accidental data loss of leftovers during fusion. If
you have a Pipe with leftovers, you must first call injectLeftovers on it.
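A sketch of that pattern, assuming the 0.5-era (>+>), injectLeftovers, and runPipe exported from Data.Conduit (peekAndSum is an illustrative name):

```haskell
-- Sketch assuming the conduit 0.5-era Pipe interface; peekAndSum is
-- illustrative.
import Data.Conduit (injectLeftovers, runPipe, (>+>))
import qualified Data.Conduit.List as CL

-- peek produces a leftover, so the sink below cannot be fused with (>+>)
-- directly. injectLeftovers feeds that leftover to the next await (inside
-- fold), giving a Pipe whose leftover type is free to be Void.
peekAndSum :: IO (Maybe Int, Int)
peekAndSum = runPipe (CL.sourceList [1 .. 10] >+> injectLeftovers sink)
  where
    sink = do
      first <- CL.peek
      total <- CL.fold (+) 0
      return (first, total)
```

peekAndSum should return (Just 1, 55).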

Transforms a Pipe that provides leftovers to one which does not,
allowing it to be composed.

This function will provide any leftover values within this Pipe to any
calls to await. If there are more leftover values than are demanded, the
remainder are discarded.

Since 0.5.0

Primitives

While conduit provides a number of built-in Sources, Sinks, and
Conduits, you will almost certainly want to construct some of your own.
Previous versions recommended using the constructors directly. Beginning with
0.5, the recommended approach is to compose existing Pipes into larger ones.

It is certainly possible (and advisable!) to leverage existing Pipes, like
those in Data.Conduit.List. However, you will often need to drop down to a
lower-level set of Pipes to start your composition. The following few
functions should be sufficient for expressing all constructs besides
finalization. Adding in bracketP and addCleanup, you should be able to create
any Pipe you need. (In fact, that's precisely how the remainder of this
package is written.)
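As a sketch of building a Source from these primitives plus bracketP, here is a hypothetical sourceLines that streams a text file line by line. It assumes the 0.5-era bracketP (taking acquire, release, and body, in that order) and a MonadResource base monad:

```haskell
-- Sketch assuming the conduit 0.5-era API; sourceLines and firstLineOf are
-- hypothetical helpers, not part of conduit.
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (MonadResource, runResourceT)
import Data.Conduit (Source, bracketP, yield, ($$))
import qualified Data.Conduit.List as CL
import System.IO (IOMode (ReadMode), hClose, hGetLine, hIsEOF, openFile)

-- bracketP opens the handle when the Source starts and registers hClose,
-- so the handle is closed at the latest when runResourceT exits.
sourceLines :: MonadResource m => FilePath -> Source m String
sourceLines path = bracketP (openFile path ReadMode) hClose loop
  where
    -- Yield one line per iteration until end of file.
    loop h = do
      eof <- liftIO (hIsEOF h)
      if eof
        then return ()
        else do
          line <- liftIO (hGetLine h)
          yield line
          loop h

-- Read only the first line; downstream stops after one value.
firstLineOf :: FilePath -> IO [String]
firstLineOf path = runResourceT (sourceLines path $$ CL.take 1)
```

The handle is opened only when the Source is first run, and runResourceT guarantees it is closed even when downstream stops early, as in firstLineOf.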

The three basic operations are awaiting, yielding, and leftovers.
Awaiting asks for a new value from upstream, or returns Nothing if upstream
is done. For example:

>>> import Data.Conduit
>>> import Data.Conduit.List (sourceList)
>>> sourceList [1..10] $$ await
Just 1

>>> import Data.Conduit
>>> import Data.Conduit.List (sourceList)
>>> sourceList [] $$ await
Nothing

Similarly, we have a yield function, which provides a value to the downstream
Pipe. yield features auto-termination: if the downstream Pipe has
already completed processing, the upstream Pipe will stop processing when it
tries to yield.

The upshot of this is that you can write code that appears to loop infinitely,
and yet will terminate.
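A sketch of such a loop, assuming the 0.5-era yield (naturals and firstFive are illustrative names):

```haskell
-- Sketch assuming the conduit 0.5-era API; names are illustrative.
import Data.Conduit (Source, yield, ($$))
import qualified Data.Conduit.List as CL

-- Looks like an infinite loop, but once downstream has finished, the
-- Source is not resumed after its last yield, so the program terminates.
naturals :: Monad m => Source m Int
naturals = go 1
  where
    go n = yield n >> go (n + 1)

firstFive :: IO [Int]
firstFive = naturals $$ CL.take 5
```

firstFive should return [1, 2, 3, 4, 5].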

Connect-and-resume

Sometimes, we do not want to force our entire application to live inside the
Pipe monad. It can be convenient to keep normal control flow of our program,
and incrementally apply data from a Source to various Sinks. A strong
motivating example for this use case is interleaving multiple Sources, such
as combining a conduit-powered HTTP server and client into an HTTP proxy.

Normally, when we run a Pipe, we get a result and can never run it again.
Connect-and-resume allows us to connect a Source to a Sink until the latter
completes, and then return the current state of the Source to be applied
later. To do so, we introduce three new operators: $$+, $$++, and $$+-.
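A sketch of the three operators together, assuming the 0.5-era signatures in which $$+ and $$++ return the ResumableSource alongside the Sink's result (resumeDemo is an illustrative name):

```haskell
-- Sketch assuming the conduit 0.5-era API; resumeDemo is illustrative.
import Data.Conduit (($$+), ($$++), ($$+-))
import qualified Data.Conduit.List as CL

-- Feed one Source to three Sinks in turn, from ordinary IO code.
resumeDemo :: IO ([Int], [Int], [Int])
resumeDemo = do
  -- $$+ runs the Sink and hands back the partially consumed Source.
  (rsrc1, a) <- CL.sourceList [1 .. 10] $$+ CL.take 3
  -- $$++ continues a ResumableSource where it left off.
  (rsrc2, b) <- rsrc1 $$++ CL.take 3
  -- $$+- finishes the Source and runs its finalizer.
  c <- rsrc2 $$+- CL.consume
  return (a, b, c)
```

resumeDemo should return ([1, 2, 3], [4, 5, 6], [7, 8, 9, 10]).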

The connect-and-resume operator. This does not close the Source, but
instead returns it to be used again. This allows a Source to be used
incrementally in a large program, without forcing the entire program to live
in the Sink monad.

Complete processing of a ResumableSource. This will run the finalizer
associated with the ResumableSource. In order to guarantee proper resource
finalization, you must use this operator after using $$+ and $$++.

A ResumableSource represents a Source which has already been run, and
therefore has a finalizer registered. As a result, if we want to turn it
into a regular Source, we need to ensure that the finalizer will be run
appropriately. By appropriately, I mean:

* If a new finalizer is registered, the old one should not be called.
* If the old one is called, it should not be called again.

This function returns both a Source and a finalizer which ensures that the
above two conditions hold. Once you call that finalizer, the Source is
invalidated and cannot be used.

Convenience re-exports

The Resource transformer. This transformer keeps track of all registered
actions, and calls them upon exit (via runResourceT). Actions may be
registered via register, or resources may be allocated atomically via
allocate. allocate corresponds closely to bracket.

Releasing may be performed before exit via the release function. This is a
highly recommended optimization, as it will ensure that scarce resources are
freed early. Note that calling release will deregister the action, so that
a release action will only ever be called once.
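A sketch of register, allocate, and early release, assuming the resourcet signatures allocate :: MonadResource m => IO a -> (a -> IO ()) -> m (ReleaseKey, a) and register :: MonadResource m => IO () -> m ReleaseKey. The log messages and the name resourceDemo are illustrative:

```haskell
-- Sketch assuming resourcet's allocate/register/release as described above;
-- resourceDemo and its log messages are illustrative.
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (allocate, register, release, runResourceT)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Returns the order in which the actions ran, oldest first.
resourceDemo :: IO [String]
resourceDemo = do
  logRef <- newIORef []
  let say msg = modifyIORef logRef (msg :)
  runResourceT $ do
    -- Acquire resource A and register its release action atomically.
    (keyA, _) <- allocate (say "acquire A") (\() -> say "release A")
    -- Register a plain cleanup action B, to run when runResourceT exits.
    _ <- register (say "cleanup B")
    -- Free A early; this also deregisters it, so it cannot run twice.
    release keyA
    liftIO (say "body done")
  -- By now runResourceT has run the remaining action (B).
  fmap reverse (readIORef logRef)
```

resourceDemo should return ["acquire A", "release A", "body done", "cleanup B"]: A is freed early and exactly once, and B runs when runResourceT exits.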

A Monad which allows for safe resource allocation. In theory, any monad
transformer stack that includes a ResourceT can be an instance of
MonadResource.

Note: runResourceT has a requirement for a MonadBaseControl IO m monad,
which allows control operations to be lifted. A MonadResource does not
have this requirement. This means that transformers such as ContT can be
an instance of MonadResource. However, the ContT wrapper will need to be
unwrapped before calling runResourceT.

A Monad which can throw exceptions. Note that this does not work in a
vanilla ST or Identity monad. Instead, you should use the ExceptionT
transformer in your stack if you are dealing with a non-IO base monad.

Unwrap a ResourceT transformer, and call all registered release actions.

Note that there is some reference counting involved due to resourceForkIO.
If multiple threads are sharing the same collection of resources, only the
last call to runResourceT will deallocate the resources.