Sunday, April 21, 2013

I was originally planning to release an extension library for pipes that would add an io-streams-like interface to pipes. However, this week I discovered a surprisingly elegant way to implement io-streams behavior (minus pushback) using the existing pipes API. In fact, this translation has been possible ever since pipes-2.4!

This post assumes that you are familiar with both pipes and io-streams. If not, then you can read the pipes tutorial or the io-streams tutorial, both of which I wrote!

Edit: I also just discovered that when you re-implement io-streams using pipes, then pipes ties io-streams in performance. See this reddit comment for code and benchmarks.

Conversions

I'll begin with the following generic correspondence between io-streams and pipes:

An input stream transformation is a fold in disguise! These transformations fold 0 or more values from the old input stream to return a new value.

Notice how this differs from the conventional flow of information in pipes. We're not converting a stream of as into a new stream of bs. Instead, we're folding a stream of as into a single return value of type b.

How do we even connect map to stdin, though? They don't have the right types for ordinary pipe composition:
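Under the io-streams-like encoding, their types would look roughly like the following. This is a sketch from memory of the pipes-3.x (Control.Proxy) API rather than the exact signatures, so treat the details as assumptions:

```haskell
-- An input stream returns its next value (Nothing on end-of-input)
stdin      :: (Proxy p) => () -> Session p IO (Maybe String)

-- The transformation requests Maybe Strings and returns a Maybe Int
map length :: (Proxy p) => () -> Consumer p (Maybe String) IO (Maybe Int)
```

stdin produces its value as a return value, not as a downstream output, so (>->) has nothing to connect.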

We need some way to feed the return value of stdin into the upstream input of map. This means we need to call stdin once each time that map requests a new value and feed that into the request. How can we do that using the existing pipes API?

Fortunately, this is exactly the problem that the (\>\) composition operator solves:

>>> read (stdin \>\ map length)
Test<Enter>
Just 4

Woah, slow down! What just happened?

The "request" category

The definition of (\>\) is actually really simple:

(f \>\ g) replaces all of g's requests with f.

So when we wrote (stdin \>\ map length), we replaced the single request in map with stdin, as if we had written it by hand:

-- Before
map length () = runIdentityP $ do
    ma <- request ()
    return (fmap length ma)

-- After
(stdin \>\ map length) () = runIdentityP $ do
    ma <- stdin ()
    return (fmap length ma)
In fact, this behavior means that (\>\) and request form a category. (\>\) is the composition operation, request is the identity, and they satisfy the category laws:
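Spelled out, the laws say that request is a left and right identity for (\>\) and that (\>\) is associative:

```haskell
-- Left identity
request \>\ f = f

-- Right identity
f \>\ request = f

-- Associativity
(f \>\ g) \>\ h = f \>\ (g \>\ h)
```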

We can illuminate this type by type-restricting the arguments. First, let's consider the case where the first argument is a Session and the second argument is a Consumer, and see what type the compiler infers for the result:

We need some way to feed the stream output of filterOutput into the argument of stdout. This means that we need to call stdout on each output of filterOutput. How can we do that using the existing pipes API?

We'll specialize this type signature in two ways. First, we'll consider the case where the first argument is a Producer and the second argument is a Session and then see what type the compiler infers for the result:

The only difference between an input stream and an output stream is whether it consumes or produces a value, and bidirectional streams blur the line even further.

Purity

None of this machinery is specific to the IO monad. Everything I've introduced is polymorphic over the base monad! We only incur IO if any of our stages use IO. This departs greatly from io-streams, where you are trapped in the IO monad for everything. However, I want to qualify that statement with a caveat: I expect that pipes-parse will require either an IO or an ST base monad (your choice) to properly manage pushback like io-streams does. Even so, you only pay this price if you actually need it.

pipes improves on purity in two other ways. First, building input and output streams is pure when you use pipes:

So far we've been ignoring the x'/x interfaces entirely, treating them as closed interfaces. However, there's no reason we have to! If we open up this interface, then every processing stage that we connect can read and write to this streaming interface.

In a future post, I will show you how you can really elegantly use this trick to generalize Hutton-Meijer parsers to permit effects.

ListT

The people who read the pipes-3.2 announcement post know that the (\>\) and (/>/) composition operators correspond to (>=>) for the two pipes ListT monads.
This means that you can take any output stream transformation and assemble more sophisticated behaviors using do notation:

The exact same trick works for input stream transformations, too, but it's less useful because we don't send information upstream as frequently as downstream.

Conclusions

One of the most common questions I get is "How do I escape from the pipe monad?", and I wrote this post to demonstrate how to do so. You can use pipes to program in exactly the same style as io-streams. The only thing missing is a standard library of io-streams-like utilities.

So far, I have not touched on the issue of pushback, a feature which io-streams provides. pipes-parse will build on the ideas I've introduced in this post to provide an io-streams-style interface to parsing that includes pushback functionality. This will make it easy to dynamically add or remove processing stages throughout the parsing process, which is necessary for parsing HTTP. In fact, everything I've presented here fell out very naturally from my work on pipes-parse, and I only later discovered that I had accidentally reinvented io-streams entirely within pipes.

So I will cautiously state that I believe pipes is "io-streams done right". Everything that Greg has been doing with io-streams matches quite nicely with the pre-existing "request" and "respond" pipes categories that I independently discovered within pipes. This indicates that Greg was really onto something and pipes provides the elegant theoretical foundation for his work.

Sunday, April 14, 2013

This post introduces the pipes-concurrency library (which you can find here), which is the renamed pipes-stm library that I previously promised. I ended up completing this much sooner than I anticipated, which is why it precedes the upcoming pipes-parse package.

Begin with the tutorial if you want to learn how to use the library. This post mainly highlights some features and compares the pipes-concurrency approach to other libraries. Also, I'll provide some bonus examples that are not in the tutorial.

Before I continue, I want to credit Eric Jones, who first began the library as pipes-stm on GitHub. Unfortunately, I lost all contact with him, and he didn't include a LICENSE in his repository, so I had to rebuild the library from scratch because I wasn't sure whether a fork would constitute copyright infringement. If he reads this, gets in touch with me, and approves of the BSD license, then I will add him to the LICENSE and also add him as a library author.

Reactive programming

Several people are beginning to realize that streaming libraries overlap substantially with reactive programming frameworks. pipes-concurrency provides the basic building blocks necessary to build reactive programming abstractions.

For example, let's say that I want to simulate reactive-banana's Events using pipes-concurrency:

Simple API

The spawn function creates a FIFO channel, send adds messages to the channel, and recv takes messages off the channel. That's it! The rest of the library consists of the following two higher-level pipes, which build on those two functions to stream messages into and out of the channel:
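If memory serves, the two pipes in pipes-concurrency-1.0 were named sendD and recvS. The signatures below are a sketch rather than the authoritative haddocks, so double-check the exact types against the documentation:

```haskell
spawn :: Buffer a -> IO (Input a, Output a)  -- create the FIFO channel
send  :: Input  a -> a -> STM Bool           -- False once every reader is gone
recv  :: Output a -> STM (Maybe a)           -- Nothing once every writer is gone

sendD :: (Proxy p) => Input  a -> x  -> p x a x a IO ()     -- stream into the channel
recvS :: (Proxy p) => Output a -> () -> Producer p a IO ()  -- stream out of the channel
```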

The library only has five functions total, making it very easy to learn.

Deadlock safety

What distinguishes this abstraction from traditional STM channels is that send and recv hook into the garbage collection system to automatically detect and avoid deadlocks. If they detect a deadlock they just terminate cleanly instead.

Surprisingly, this works so well that it even correctly handles crazy scenarios like cyclic graphs. For example, the run-time system magically ties the knot in the following example and both pipelines successfully terminate and get garbage collected:

I don't even know why the above example works, to be completely honest. I really only designed pipes-concurrency to avoid deadlocks for acyclic graphs and the above was just a really nice emergent behavior that fell out of the implementation. I think this is an excellent tribute to how amazing ghc is, and I want to give a big shout-out to all the people who contribute to it.

I call this "semi-automatic" reference management because you must still call the garbage collector manually after you stop using each reference, otherwise you cannot guarantee promptly releasing the reference. However, even if you forget to do this, all that happens is that it just delays stream termination until the next garbage collection cycle.
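Concretely, a reader pipeline might look like the following sketch. The module name Control.Proxy.Concurrent and the reader helper are assumptions for illustration; recvS and printD come from the pipes ecosystem, and performGC is the manual trigger from System.Mem:

```haskell
import Control.Proxy
import Control.Proxy.Concurrent  -- assumed module name for spawn/send/recv
import System.Mem (performGC)

-- Hypothetical reader: drain the channel, then promptly drop our reference.
reader output = do
    runProxy $ recvS output >-> printD
    performGC  -- without this, release waits for the next GC cycle
```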

Severability

I designed the API so that if any other streaming library wants to use it I can cleanly separate out the pipes-agnostic part, consisting of spawn, send, and recv. If you want to build on this neat deadlock-free abstraction with, say, conduit or io-streams, then just let me know and I will factor those functions out into their own library.

Comparisons

People might wonder how pipes-concurrency compares to the stm-conduit and io-streams approaches for managing concurrency. Before I continue I just want to point out that I contributed some of the concurrency code to io-streams, so I am at fault for some of its current weaknesses. One of the reasons I made the pipes-concurrency functionality severable was so that io-streams could optionally merge in this same feature to fix some of the concurrency issues that I couldn't resolve the first time around.

pipes-concurrency does several things that are unique in the streaming concurrency space, including:

Dynamic communication graphs with semi-automatic reference management

Correctly handling multiple readers and writers on the same resource

Deadlock safety (as mentioned above)

Exception safety (by virtue of deadlock safety)

Conclusion

There are still more features that I haven't even mentioned, so I highly recommend you read the tutorial to learn other cool tricks you can do with the library.

For people following the pipes ecosystem, the next library coming up is pipes-parse, which is getting very close to completion, although the version currently on GitHub is stale and doesn't reflect the current state of the project. Expect to see some very cool and unique features when it comes out, which should be within the next two weeks.

Wednesday, April 10, 2013

Many programs require default values of some sort, and normally we would consider this aspect of programming "business logic" and not give it a second thought. However, mathematics provides some surprising insight into the banal task of choosing appropriate defaults, so let's temporarily put on our theory hats and try to over-think this problem.

Patterns

Let's begin by trying to identify a common pattern unifying all default values. If I were to name a few types, most people would agree on the following default values:

Int: 0

Bool: False

[a]: [], the empty list

Text: "", the empty Text string (if you use OverloadedStrings)

Why not choose 5 as the default Int or "ridiculous" as the default Text string? What makes us gravitate towards choosing those particular values?

Well, we can discern a common pattern: all these default values seem to correspond to something "empty". But what does it really mean to be "empty"?

Well, for numbers it is obvious why we consider 0 empty. If you add 0 to any number n, you get back n, signifying that 0 must be empty:

0 + n = n
n + 0 = n

We can extend this same reasoning to the other values to justify why we consider them empty. For example, if you append "" to any Text string str, you get back str, signifying that "" added nothing:

"" `append` str = str
str `append` "" = str

Similarly, the empty list adds nothing when you concatenate it:

[] ++ xs = xs
xs ++ [] = xs

... and False does nothing when you (||) it:

False || b = b
b || False = b

The pattern is obvious: in every case we have some empty default value, which we will call mempty, and some combining function, which we will call mappend, that satisfy the following two equations:

mempty `mappend` x = x
x `mappend` mempty = x

This is a Monoid, and Haskell's Data.Monoid module defines mempty and mappend in the Monoid type class:

class Monoid m where
    mempty  :: m
    mappend :: m -> m -> m

... and also provides a convenient infix operator for mappend:

(<>) :: (Monoid a) => a -> a -> a
m1 <> m2 = m1 `mappend` m2
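For example, the list instance from base picks out exactly the "empty" pair we identified above:

```haskell
instance Monoid [a] where
    mempty  = []
    mappend = (++)
```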

Emptiness

Not all types have a unique Monoid instance. For example, Bool has two separate Monoid instances, and we must use the Any or All newtypes from Data.Monoid to distinguish which one we prefer.

The Any monoid corresponds to the one we chose above, where False is the empty value and (||) is the combining operation:
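Data.Monoid defines it essentially as follows (modulo details that changed in later base versions, such as the Semigroup superclass):

```haskell
newtype Any = Any { getAny :: Bool }

instance Monoid Any where
    mempty = Any False
    Any b1 `mappend` Any b2 = Any (b1 || b2)
```

The All monoid is the mirror image: True is the empty value and (&&) is the combining operation.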

Monoids teach a valuable lesson: there is no such thing as an intrinsically empty value. Values are only empty with respect to a specific combining operation. We can choose more exotic default values to be mempty, but if we must select an unusual mappend to justify their emptiness then that suggests that we chose the wrong defaults.

So the next time you find yourself needing a Default type class, chances are that you actually should use the Monoid type class instead. The Monoid class forces us to demonstrate why our default is truly empty by also providing the associated combining operator. This basic sanity check discourages us from defining "ridiculous" defaults.
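As a small illustration, a hypothetical defaultTo helper (not from any library) recovers "use the default when the value is absent" behavior from any Monoid, with no Default class needed:

```haskell
-- defaultTo is a hypothetical helper: fall back to the monoid's
-- empty value when no value was supplied.
defaultTo :: (Monoid a) => Maybe a -> a
defaultTo = maybe mempty id
```

For instance, defaultTo Nothing at type [Int] yields [], while defaultTo (Just "fast") yields "fast". The Monoid constraint documents, via mappend, why that fallback deserves to be called "empty".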