Sunday, December 30, 2012

The continuation monad is one of the least appreciated monads and in this post I hope to motivate when to use it. This post will first motivate continuations in general and then motivate them in their specific capacity as monads.

Continuations

A Haskell continuation has the following type:

newtype Cont r a = Cont { runCont :: (a -> r) -> r }

A continuation takes a function of type (a -> r) and generates an r, where r can sometimes be a fixed value like Int or IO ().

For example, I might write a long-running process that spawns an action every time the user enters a line of input:

onInput :: (String -> IO ()) -> IO ()
-- i.e. Cont (IO ()) String
onInput f = forever $ do
str
You will recognize this idiom if you've ever used frameworks with callbacks. We supply the framework with a function (i.e. a continuation) and the framework uses that function to do its job.

"Complete me Later"

You generally use continuations when you are programming something, but you want somebody else to complete it. Common reasons include:

You are programming a framework with callbacks that users supply

You are defining a custom map engine for game players to program

You are lazy

I'll use the following hypothetical code segment as an example:

unitAttack :: Target -> IO ()
unitAttack target = do
swingAxeBack 60
valid
Let's imagine you have to package up and compile this code for somebody else (say, a fellow colleague) to use later, but it won't compile yet because you still have the unspecified ??? function. What do you do?

Like all good programming, the best solution is the laziest one. We punt and take the incomplete behavior as a parameter so that whoever finishes the function later on can complete the function by passing the specified behavior in:

unitAttack :: Target -> (Target -> IO ()) -> IO ()
unitAttack target todo = do
swingAxeBack 60
valid
Problem solved! Notice how the right hand side of the type signature resembles the shape of our Cont type. If we just add a newtype, we can wrap it in Cont ourselves:

unitAttack :: Target -> Cont (IO ()) Target
unitAttack target = Cont $ \todo -> do
swingAxeBack 60
valid
... or, even better, we can use ContT instead. The benefit of ContT is that it is also a monad transformer, which comes in handy. ContT has the exact same Monad instance as Cont, so they are otherwise interchangeable:

Fortunately, there is a clean and general solution. Just define a data type that wraps both possible arguments in a sum type, and just define a single continuation that accepts this sum type:

data Hole = Swing Int | Attack Target
unitAttack :: Target -> ContT () IO Hole
unitAttack target = ContT $ \k -> do
k (Swing 60)
valid
Each constructor acts as a place-holder that signals to the continuation which hole it is currently filling. Then somebody else can continue where we left off and just write:

This trick generalizes to n holes with variable arguments per hole. Just define a type with n constructors, one for each hole, where each constructor stores whatever arguments that particular continuation will need:

data Hole = Hole1 Arg1 Arg2 | Hole2 | Hole3 Arg3 | Hole4

Algebraic Data Types

I want to digress for a moment to talk about algebraic data types. If you are not interested, skip to the next section.

It turns out we can elegantly derive the above trick for multiple holes. Type algebra says that if we squint then we can translate the following type constructors to algebraic operators and derive equivalent types from simple algebraic manipulations:

Either a b <=> a + b
(a, b) <=> a * b
a -> b <=> b ^ a

That means that if we have a function with two continuations:

(a1 -> r) -> ((a2 -> r) -> r)

... we just translate it to the equivalent algebraic expression:

(r ^ (r ^ a2)) ^ (r ^ a1)

... and then we can derive equivalent representations just by using the rules of algebra:

This is a Kleisli arrow! That means we can compose it with our previous Kleisli arrow:

unitAttack >=> halfAssedCompletion :: Target -> ContT () IO Int

This composition substitutes in halfAssedCompletion for each hole we left in the unitAttack function. However, halfAssedCompletion left smaller Int holes of its own that somebody else now has to finish up.

Notice how now we originally needed a handler of type:

handler :: Target -> IO ()

... but now we only need a smaller handler of type:

newHandler :: Int -> IO ()

... in other words, halfAssedCompletion acts as an intermediary that transforms handlers of type (Int -> IO ()) into handlers of type (Target -> IO ()).

The Cont monad is all about chaining these kinds of partial completions together until all the holes are finally filled. You could use this abstraction to complete a project in stages and seamlessly hand off work from person to person whenever circumstances require a change in maintainer before completing the project. Alternative, you can use this to condense the callback API of a framework into a single point of entry.

The Kleisli Category

Earlier I said that the key to a monad is its Kleisli arrows. The reason why is that Kleisli arrows are morphisms in the Kleisli category, where (>=>) is Kleisli arrow composition:

Things that obey these laws have nice properties. For example, it guarantees that you can reason about each Kleisli arrow in a composition chain in isolation. Each Kleisli arrow's behavior is completely determined by its input (i.e. domain) and output (i.e. codomain). So let's think about how that modularity translates to the Cont Kleisli category.

When you switch maintainers, you don't have to give the next maintainer a bunch of holes sprawled out over a large code base like this:

largeProgram = do
...
x
Instead you can unify all the holes using a single callback that accepts a single type (the "codomain") unifying all the holes you left:

largeProgram :: () -> ContT () IO Hole
largeProgram () = ContT $ \k -> do
...
x
This give the next person a single point of entry to continue from, because now they only have to write a Kleisli arrow that handles a single Hole input which encompasses all the previous holes:

Then you just use Kleisli composition to connect your code contribution:

largeProgram >=> nextContribution

This cleanly modularizes the first person's contribution so that you can hermetically seal it off from subsequent contributions. By repeating this process, each subsequent contribution to the code base becomes its own modular and composable Kleisli arrow, cleanly separated from other contributions:

This is why frameworks and game custom map makers all use continuations to delimit the interface between the company's code and the user's code. The continuation monad is all about establishing air-tight code boundaries, both internally within a project, and externally for user facing APIs. This lets you isolate responsibilities if you can separate each code contribution into its own Kleisli arrow.

Callback Hell

Frameworks are the canonical example of separating responsibilities, where the framework writer provides some code, but the user is expected to fill in the gap with callbacks of their own. This often results in callback hell in frameworks that take this principle to the extreme, like Node.js.

But it doesn't have to be that way. The continuation monad teaches us that we can always condense a sprawling API filled with callbacks into a single callback that takes a single argument. Even better, we get monadic syntactic sugar for composing multiple layers of callbacks.

I'll use the GLUT package as an example, which requires several callbacks like:

It says the return type is polymorphic, meaning that there is no hole left to fill. The above function just inserts return () in all holes and calls it a day. We can even prove a chain of continuations is done if its final return value type-checks as Void, the empty type:

run only accepts completed programs that have no holes left. We can use run for our previous GLUT example, since the final user callback handler leaves no unfinished holes:

run ((glut >=> userCallbacks) ()) :: IO ()

Conclusion

I hope this post inspires people to use the continuation monad to structure and modularize code completion boundaries. The continuation monad naturally arises at the boundaries between programmers and cleanly abstracts away callback hell into a simple and uniform interface with a single entry point.

Wednesday, December 12, 2012

Introduction

I'm releasing pipes-3.0, which significantly simplifies the entire library. This release began as the misnamed 2.6 branch of my Github repository, but then I finally cleanly solved the polymorphic constraints issue and this solution unlocked several features that I could finally implement.

The large change log includes:

Type-classing the entire Proxy API

Offering both fast and correct base implementations

Fixing type synonyms

Unifying the Pipes and Proxy APIs and dropping Frames

Adding the PFunctor type class (functors over proxies)

Performance improvements

Expanded laws and guarantees

Smaller dependencies

As always, if you want to learn how to use pipes, just consult the tutorial.

Code

I can demonstrate a lot of the new improvements just by taking the take' function from the pipes-2.4 announcement post and showing the difference before and after the new changes.

Control.Proxy now imports useful things from other libraries (like lift).

However, this release includes several more great changes as well.

Polymorphic Constraints

The big issue that held back the library in the wake of the 2.4 release was that I could not type class the request and respond operations. The pipes library really only has three fundamental operations:

(>->)

request

respond

Unfortunately, I could not type class request and respond to work with proxy transformers, meaning any standard libraries would be substantially crippled.

The problem boiled down to my inability to write polymorphic type class contexts like this:

This method has one drawback, which is that sometimes you need to embed the code in a newtype if all the type variables are polymorphic. The IdentityP proxy transformer fits this purpose perfectly, meaning that if you don't use any proxy transformers, you just write:

myPipe () = runIdentityP $ pipeCode

The () and runIdentityP are the only syntactic noise now. The () is essential, as I will describe below, and the runIdentityP avoids OverlappingInstances when all the type variables are polymorphic.

With this change I could then lift request and respond over proxy transformers and consolidate the entire Proxy API into the single Proxy type class. this solution is entirely Haskell98 and no longer requires FlexibleContexts. If you use pipes already you will discover that your new type signatures will become gorgeous now.

Type-classed API

pipes-3.0 type classes the entire Proxy API using the Proxy type class, which defines the following three essential functions:

(>->)

request

respond

The entire library implement EVERYTHING using these three functions and the upcoming prompt finalization library also is built entirely on these three functions. I can safely say that you don't need to use any other functions to write fully featured pipes libraries, which dramatically simplifies the API.

I type-classed all the utilities which means that I can now offer two base proxy implementations:

ProxyFast The fast proxy implementation from pipes-2.5

ProxyCorrect The correct proxy implementation from pipes-2.4

So now you can choose which implementation you prefer. Control.Proxy exports ProxyFast implementation by default, but you can easily switch to the correct implementation. Because all pipes code builds on the Proxy type class you are completely free to pick whichever implementation you prefer and they will both work transparently with all standard libraries.

However, these utilities don't just work with both base proxy implementations; they also now work as proxy transformers, too! This means you can now use these utilities seamlessly within any feature set without any lifting.

Fixing the double request law was an interesting challenge, but it was ultimately very rewarding and led to the correct formulation of the laws in terms of two symmetric proxy composition categories:

Pull-based category: (>->) and idT (the current one)

Push-based category: (>~>) and coidT (the new one)

The pull-based category composes proxies blocked on respond and returns a new proxy blocked on respond. The push-based category composes proxies blocked on request and returns a new proxy blocked on request.

The proxy laws say that these two categories are perfectly symmetric and that your code's behavior never changes under the following functor, which is an isomorphism between the two categories:

(>~>) to (>->)

(>->) to (>~>)

request to respond

respond to request

This means that the choice of composition operator is arbitrary, and I standardize on using the pull-based (>->) operator because the library already uses it and also matches the lazy demand-based expectations of Haskell programmers. Therefore, you never have to use the (>~>) operator, but I include in the library as a theoretical curiosity.

I've already spoken about the practical benefit of centralizing on proxies (duplex channels), but I also standardized on proxies because of their elegance, even more elegant than even pipes. For example:

The Proxy laws have a purely categorical formulation. Pipes do not permit such a formulation.

Proxy Kleisli arrows form morphisms in at least 5 categories, all of which have the same shape. Pipes are morphisms in just two categories (composition and Kleisli categories), and their morphisms do not overlap.

You can assemble most useful proxies purely using composition operators: i.e. (>=>) and (>->), unlike Pipes, which have an impedance mismatch between the two categories.

You cannot implement a push-based category using Pipes.

Type synonyms

Now type synonyms work incredibly well after I parametrized them to take the Proxy instance as a type parameter. This means that you never have to give them up when things get hairy and if you loved your good old-fashioned and simple Pipe, Consumer and Producer types, you can now use them transparently with the entire proxy ecosystem and they mesh perfectly with everything.

There is another reason I really like proxies: The Producer type synonym forbids requests. You can't do that with the unidirectional pipe implementation. I special case the Producer type synonym to close the upstream end to forbid communication upstream, so now you have a stronger guarantee.

I also went back to using () and C for type synonyms instead of Rank2Types to universally quantify unused ends. Rank2Types caused all sorts of problems when you tried to write proxy combinators that accepted pipes as arguments, whereas the simple approach always works and gives clearer types. The only disadvantage is that if you want to insert a pipe with a blocked end within an open one (like a Producer within a Pipe), you must explicitly reopen the end using the unitD and unitU helper functions.

Also, I chose not to include the initial parameter of proxies in the type synonym. The main reason is that you then lose the type synonym any time you define anything other than a composable proxy, which defeats all the other gains I just mentioned. Also, proxies lie along 5 categories, and special-casing the type synonyms for just one of those categories is a bad idea, especially when at least one of those other categories (the Kleisli category) is commonly used as well.

I also empirically experimented with both approaches in my own projects, and I can pretty confidently state that the type synonyms should never include the initial parameter. It does mean that your type signatures will be slightly longer, but it's worth it. Also, it makes it much easier for people less familiar with the pipes library to consume pipes utilities because they don't need to remember what the initial argument for any given type synonym is supposed to be.

API Consolidation

I went to a great deal of trouble to clean up the fragmentation that I began when I released Frames. Frames are gone and I've merged Pipe functionality into Proxys, so the library now only has three operations you ever need to use:

(>->)

request

respond

There is now just one way to compose things, something which a lot of users have requested.

I've deprecated the old Pipe API and included a "Control.Proxy.Pipe" module which helps users transition from Pipes to Proxys. This module not only provides the transition API but also has detailed instructions for how to upgrade your library while still keeping the niceties of the Pipes API.

In addition, I've consolidated all the tutorials into a single coherent tutorial in Control.Proxy.Tutorial with a single logical progression. The tutorial is long, but describes a great deal of the pipe idioms I've collected through my extensive usage of pipes in my own projects. More importantly, the tutorial collects all of the documentation for the library into a single location again so that users don't need to hunt over several tutorial modules or several blog posts of mine to learn about some key concept.

I also spent a lot of effort into guiding people towards a single natural coding style for pipes to make it easier for pipes users to read each other's code. The API exported from Control.Proxy has very little redundancy, and many old redundant API features (like the old await and yield) require you to pay an "import tax" to the tune of one extra import if you want to deviate from the new pipes "style guide". The tutorial also spends a lot of time talking about common idioms to encourage a uniform coding style.

Performance

The pipes library now includes rewrite rules that rewrite unoptimized user code into the equivalent hand-tuned code. Despite type-classing the entire standard library these rewrite rules fire very robustly without any assistance on your part. Just enjoy the extra performance. There are no performance regressions in this release and all the standard library utilities perform just as damn fast as they did in pipes-2.5, despite now being fully polymorphic and written using ordinary do notation.

Like always, "Control.Proxy.Prelude.Base" provides tons of examples of how to write idiomatic and high-performance pipes code, so feel free to consult that if you are learning how to write your own utilities.

Lightweight

I've dropped Frames, which means that I can now drop the index-core dependency. This means that the library only has one dependency: transformers-0.2. pipes is now the lightest dependency for any streaming library.

I will not add any extra dependencies to the main package and I will release all additional features in separate packages that build on top of pipes. I would like to see the core pipes package go as viral as possible. This is quite easy because in addition to having only a single dependency, pipes is:

Quick to compile

Safe Haskell

Light on extensions: Only KindSignatures and Rank2Types (for MFunctor and PFunctor)

Convenience imports

I added some extra imports from other modules to Control.Proxy, like lift and forever. I was pretty conservative in what I added and stuck to things that I considered essential for using pipes idiomatically.

hoist and lift

I renamed MFunctor's mapT to hoist. Sorry for the inconvenience, but mapT was a terrible name for several reasons and instead went with Edward's naming convention which I felt was more tasteful.

One of the really nice things to come out of pipes is a clean mechanism for mixing arbitrary monad transformer stacks and arbitrary proxy transformer stacks. It turns out that if two transformer stacks have the same base monad, then you can always interleave them in any way using the right combination of hoist and lift.

The exact same trick also applies to proxy transformer stacks. You can always interleave them using the right combination of hoistP (from the new PFunctor class) and liftP, which are just the higher-kinded versions of hoist and lift (and they also obey more laws, too).

The tutorial describes this trick in more detail and I find that this works phenomenally well and is very easy to reason about. For this reason I think MFunctor needs to be somewhere in the standard libraries as soon as possible because it is incredibly useful for this purpose.

Stability

I can state confidently that the pipes API can now handle any problem you throw at it and that the core is feature complete. The tutorial shows how you can combine the incredibly small pipes core into all the functionality you need. The only exception is prompt finalization, which I will describe in a separate upcoming library, but finalization similarly builds entirely on this core and does not add any new concepts or primitives.

There are only a few changes I foresee making to the main library in the future, which are:

Moving MFunctor to a better home

Adding a base proxy implementation with strictness annotations (never leaks, but half the performance of the fast implementation)

Note that this doesn't mean that pipes frequently leaks space. I've only found one case of leaking, which is the following code segment:

runProxy $ enumFromS 1

Everything else I have tried does not leak and I currently use pipes in long-running server applications of my own that also do not leak without any strictness annotations. However, if users do encounter leaks then I will go ahead and add the strict base implementation.

I will also be pretty conservative about expanding the core library because I want to stabilize it now. Almost all new functions will go in separate libraries now.

Conclusion

At this point I am very happy with the state of the main pipes library and I'm currently devoting my next efforts to standard libraries. The next libraries I will release will be (in this order):

Wednesday, October 31, 2012

Introduction

I optimized the entire pipes library very aggressively for version 2.5, and now the library runs faster than conduit on my micro-benchmarks. I'll begin with the purest benchmark which gives the greatest difference in speed since it only measure the efficiency of each library's implementation without any IO bottlenecks:

I achieved this speed increase by reverting to the pipes-1.0 trick of making the base monad optional, at the expense of breaking the monad transformer laws. I spent a considerable amount of effort trying to get the correct version to work, but I was led inexorably to the same conclusion that Michael already reached, which was that the original approach was best and that the gain in performance is worth bending the monad transformer laws.

Note that the above benchmark exaggerates the differences and is not indicative of real-world performance differences. For typical code you will not observe measurable differences between pipes and conduit when IO is the bottle-neck.

There is also one area in which conduit may still give (very slightly) better performance, which is in speeding up user-defined pipes. One goal I did not complete for this release was copying Michael's trick of using rewrite RULES to inline the Monad instance for user-defined pipes. I plan to copy this same trick in a separate release because I want to take the time to ensure that I can get the rewrite RULES to always fire without interfering with other optimizations.

Light-weight

The big focus of this release was to make pipes a very light-weight dependency, both in terms of performance and transitive dependencies. In the rewrite I dropped the free dependency so now the package only has two non-base dependencies:

transformers >= 0.2.0.0

index-core

... and I plan on dropping index-core along with Frames once I complete my resource management solution, leaving just a transformers dependency, which is about as light-weight as it could possibly get.

I also stole a page from Michael's book, by removing the -O2 flag from the pipes.cabal file. This flag no longer has any effect on performance after the rewrite, so you should see quicker compile times, making the pipes dependency even lighter.

There is still one other way I could make the pipes dependency even lighter, which is to remove the MFunctor class. The Control.MFunctor module requires Rank2Types, which might rule out pipes for projects that use non-GHC compilers, so if this is an issue for you, just let me know and I will try to migrate MFunctor to a separate library. Frames use a lot of extensions, but those will be on the way out, leaving behind just FlexibleContexts and KindSignatures, which are very mild extensions.

Resource management

I also wanted to use this update to point out that you can get deterministic resource management with pipes today if you use Michael's ResourceT in the base monad. So if you want to use pipes and all you care about is resource determinism then you can switch over already.

However, that alone will NOT give you prompt finalization and if you want promptness you will have to wait until I complete my own resource management extension. The extension I have in mind will be released as a proxy transformer that you can layer in any proxy transformer stack, so any proxy code you currently write can be transparently upgraded to work with resource management later on when I release the extension.

Another thing I want to mention is that while I will release the tools to manage resources promptly and deterministically, I do not plan on using these tools in the proxy standard libraries that I will release. The main reason for this is that:

There is no one true solution to finalization and I don't want people to have to buy in to my finalization approach to use the standard libraries I provide.

Most people I've talked to who care about finalization usually take the initiative write their own higher-level abstractions on top of whatever finalization primitives I provide them.

So my plan is that the standard libraries I will write will purely focus on the streaming part of the problem and leave initialization/finalization to the end user, which they can optionally implement using my resource management solution or whatever other approach they prefer (such as ResourceT, for example).

If you want to know what I personally use in my own projects at the moment, I just use the following pattern:

do withResource $ \h ->
runProxy $ ...
This gives "good enough" behavior for my purposes and out of all the finalization alternatives I've tried, it is by far the easiest one to understand and use.

The other reason I'm trying out this agnostic approach to finalization is due to discussion with Gregory Collins about his upcoming io-streams library, where he takes a very similar approach to the one I just described of leaving initialization/finalization to the end user to avoid cross-talk between abstractions and to emphasize handling the streaming aspect correctly.

Goals for the near future

I focused on improving the performance of pure code because I plan to release bytestring/text standard libraries and their corresponding parsing proxy transformers very soon, which demand exceptional pure performance. The goal of the upcoming proxy-based parsing libraries is not to beat attoparsec in speed (which I'm reasonably sure is impossible), but rather to:

Interleave parsing with with effects

Provide a low-memory streaming parser by allowing the user to selectively control backtracking

Still be really fast

The first two features are sorely missing from attoparsec, which can't interleave effects and always backtracks so that the file isn't cleared from memory until the parsing completes. For my own projects I need the second feature the most because I get a lot of requests to parse huge files (i.e. 20 GB) that do not fit in my computer's memory. More generally, I want pipes to be the fallback parser of choice for all problems that attoparsec does not solve.

Saturday, October 20, 2012

Haskell optimization is very opaque and getting tight inner loops is something of a black art for people who don't take the time to learn GHC's core output. If you like fine-grained optimization, this post will walk you through a very simple example to help you learn to read core.

I'm planning to write some posts in the future discussing optimizing Haskell code and I would like to use this post as a foundation for those latter discussions.

ghc-core

When you practice reading core, stick to really simple programs. In fact, let's study the core generated by the simplest possible Haskell program:

For example, Cheap=True says that it is cheap to inline main1. Similarly, ConLike=True says it is okay if a rewrite rule requires duplicating main1's code.

However, for the purposes of this post, we will just completely ignore these annotations and just focus on the type signatures and code. I'll remove the attributes to leave behind something easier on the eyes:

Now we see that there are four functions total, but a lot of noise still remain.

Primitive types and values

Let's walk through this core, beginning with the type signature of main1:

main1
:: State# RealWorld
-> (# State# RealWorld, () #)

Somebody sprinkled #s all over our type signature. These #s denote primitive types and values exposed by the compiler that are not defined within the Haskell language. You can check out the GHC.Prim module which declares these values and type. Notice how it cheats and declares all types empty and the values are all set to let x = x in x just so that everything type-checks.

IO is a state monad

Notice that this is just the right type to wrap main1, which uses the same stateful pattern:

main1 :: State# RealWorld -> (# State# RealWorld, () #)

IOs state-based implementation ensures proper sequencing of computations and the State# RealWorld token is an empty token that exists solely to ensure that each sequenced computation depends on the previous one so that the compiler does not rearrange, merge or duplicate them.

The chain of command

Now, let's study the definition of main1:

main1 =
\ (eta_B1 :: State# RealWorld) ->
(# eta_B1, () #)

If we tidy this up and remove the type annotation, we get:

main1 = \s -> (s, ())

This is just return () in the State (State# RealWorld) monad. The compiler translated our original return () statement into the equivalent stateful function under hood.

However, the next function seems to accomplish nothing:

main :: IO ()
main =
main1

... but that's only because the --no-casts flag removed type casts. If we add them back in, we get:

But wait, it runs main1 and not main! I'm guessing that the main token exists solely to establish the correspondence to the user-written code, but perhaps ghc doesn't actually care about it and instead uses the raw main1 directly.

Now, to be honest, I don't actually know what runMainIO1 does, but according to Edward Yang, it initializes some things like interrupt handlers before running the program.

Finally, we reach the true entry point of our program:

:main :: IO ()
:main =
main2 -- There is a hidden cast here

This passes our program to the Haskell runtime to be executed. That's it! We're done.

Monday, October 15, 2012

This is the second in a series of coding examples from my own work. /r/programming asked for non-trivial, yet digestible, Haskell examples, so I hope this fits the bill.

In this post I will show how learning Haskell changes the way you think. I will take a common Haskell idiom (monadic parsing) and apply it in a new way to solve a structural bioinformatics problem. To make this post more accessible to non-Haskell programmers, I'm going to gloss over implementation details and instead try to discuss at a high-level how I apply Haskell idioms to solve my problem.

The Parser type

When we approach parsing problems we normally focus on the text as the central element of our algorithm and design our program around combining and manipulating text. Monadic parsing turns this approach on its head and places the parser front and center, where we instead combine and manipulate parsers.

This means we need to define a Parser type:

type Parser a = String -> [(a, String)]

A Parser a takes a starting String as its input and parses it to return a single value of type a and the remaining unconsumed String. There might be multiple valid parses, so we actually return a list of possible parsings instead of a single one. If the parse fails, we return an empty list signifying no valid parsings.

We can also use Haskell's newtype feature to encapsulate the implementation and hide it from the user:

newtype Parser a
= Parser { runParser :: String -> [(a, String)] }

This defines the Parser constructor which we use to wrap our parsing functions:

Parser :: (String -> [(a, String)]) -> Parser a

.. and the runParser function which unwraps the Parser to retrieve the underlying function:

Monads

Now we want to combine our Parsers so we can parse both a True and a False with optional spaces in between. This means we need some elegant way to take the unconsumed input from each Parser and feed it directly into the next Parser in the chain.

Fortunately, Haskell solves this problem cleanly using monads. A Monad defines an interface to two functions:

Like interfaces in any other language, we can program generically to that interface. Haskell's do notation works with this generic Monad interface, so we can use the imperative do syntax to manipulate anything that implements Monad.

Our Parser implements the Monad interface quite nicely:

instance Monad Parser where
return a = Parser (\str -> [(a, str)])
m >>= f = Parser (\str1 ->
-- This is a list comprehension, basically
[(b, str3) | (a, str2)
This is not a monad tutorial, so I'm glossing over why that is the correct definition or what it even means, but if you want to learn more about monads, I highly recommend: You could have invented monads.

When we make Parser a Monad, we gain the ability to assemble Parsers using do notation, so let's use do notation to combine multiple parsers in an imperative style:

trueThenFalse :: Parser (Bool, Bool)
trueThenFalse = do
t
That reads just like imperative code: parse a True, skip some spaces, then parse a False. Finally, return the two values you parsed. This seems straightforward enough until you realize we haven't actually parsed any text, yet! All we've done is combine our smaller parsers into a larger parser poised to be run on as many inputs as we please.

Alternatives

Sometimes we want to try multiple parsing alternatives. For example, what if I want to parse a True or a False? I can define a () operator that tries both parsers and then returns the union of their results:

Since s is "polymorphic", we can set it to any conceivable type and the above code still works. The only String-specific behavior lies within the specific parser definitions, and their new compiler-inferred types reflect that:

But there's no reason we can't define entirely different Parsers that accept completely different non-textual input, such as chemical structures. So I'll switch gears and define parsers for chemical Structures, where a
Structure is some sort of a labeled graph:

Now, I can define new Parsers that operate on Structures instead of Strings.

Parsing bonds

The most primitive parser I'm interested in parses a single bond. It requires two AtomNames which specify what kind of bond to look for (i.e. a carbon-carbon bond, except it can be even more specific). Then, it outputs which two indices in the graph matched that bond-specification:

parseBond :: AtomName -> AtomName -> Parser Structure (Int, Int)

I can use the list monad (i.e. a list comprehension) to define this primitive parser (and don't worry if you can't precisely follow this code):

Haskell strongly encourages a pure functional style, which keeps me from "cheating" and using side effects or mutation to do the parsing. By sticking to a pure implementation, I gain several bonus features for free:

If our bond occurs more than once, this correctly matches each occurrence, even if some matches share an atom

If both AtomNames are identical, this correctly returns both orientations for each matched bond

This handles backtracking with () correctly

I can parallelize the search easily since every search branch is pure

I got all of that for just 6 lines of code!

Parsing substructures

Now I can build more sophisticated parsers on top of this simple bond parsers. For example, I can build a generic substructure parser which takes a sub-Structure to match and returns a list of matched indices:

Reusable abstractions

I find it pretty amazing that you can build a substructure parser in just 18 lines of Haskell code. You might say I'm cheating because I'm not counting the amount of lines of code I took to define the Parser type, the Monad implementation, and the () type. However, the truth is I can actually get all of those features using 1 line of Haskell:

I don't expect the reader to know what StateT or [] are, but what you should take away from this is that both of them are part of every Haskell programmer's standard repertoire of abstractions.

Moreover, when I combine them I automatically get a correct Monad implementation (i.e. do notation) and a correct Alternative implementation (which provides the () function), both for free!

Conclusions

This is just one of many abstractions I used to complete a structural search engine for proteins. Now that it's done, I'll be blogging more frequently about various aspects of the engine's design to give people ideas for how they could use Haskell in their own projects. I hope these kinds of code examples pique people's interest in learning Haskell.

Appendix

I've included the full code for the String-based Parsers. The Structure-based Parsers depend on several project-specific data types, so I will just release them later as part of my protein search engine and perhaps factor them out into their own library.

Also, as a stylistic note, I prefer to use ($) to remove dangling final parentheses like so:

Parse (\str -> => Parse $ \str ->
someCode => someCode
) =>

... but I didn't want to digress from the post's topic by explaining how the ($) operator behaves.

Saturday, October 6, 2012

This release packs a LOT of new features, so I will begin with the most significant feature: proxy transformers. The proxy transformer pattern provides a very simple extension framework that cleanly solves many problems that iteratee library authors face.

Users of the library should read Control.Proxy.Trans.Tutorial, which explains how proxy transformers work. However, this post also provides a decent introduction to them, too.

Introduction

Wouldn't it be nice if you could catch and handle errors within a proxy? Now you can! It's as simple as:

>>> main
1<Enter>
1
Test<Enter>
Could not parse an integer
Apple<Enter>
Could not parse an integer
5<Enter>
5

The above program condenses many new features of this release into a nice compact example and I'll use it to show-case each feature.

Proxy transformers

The above program uses the EitherP proxy transformer. To access this feature, you just import the transformer you wish to use:

import Control.Proxy.Trans.Either as E

Control.Proxy imports all the remaining machinery you need.

EitherP extends any proxy-like type with the ability to throw and catch errors locally, as if it lived inside a native EitherT block. It does so in such a way that preserves composition (and the category laws!), so you can directly compose the result without unwrapping the EitherP.

When you are done composing, just use runEitherK to convert it back to the underlying proxy:

Utilities

This release introduces the "proxy prelude", a set of convenience functions for users of the library. Control.Proxy automatically exports these and they don't clash with the Prelude or any common libraries.

Our old friend printer got a name-change and now goes by printD. This utility function prints all values bound downstream:

printD :: Show a => x -> Proxy x a x a IO r

I provide many more utility functions under the Control.Proxy.Prelude hierarchy, and people who enjoyed my functor design pattern post will also enjoy the abundance of cute trivial examples of the functor pattern in the documentation for Control.Proxy.Prelude.Base.

Proxy transformers are functors

However, this release includes a far more sophisticated set of functors: the proxy transformers themselves. Each proxy transformer implements the ProxyTrans class which defines two functions: mapP and liftP, related by the equation:

mapP = (liftP .)

mapP defines two separate functors.

The first functor behaves like a traditional monad transformer, converting the base Kleisli category to the extended Kleisli category:

mapP return = return
mapP (f >=> g) = mapP f >=> mapP g

You can write these laws using liftP to see that our proxy transformers behave like ordinary monad transformers:

liftP $ return x = return x
do x
The above program uses this capacity of liftP to lift operations from the Proxy monad to the EitherP String Proxy monad.

The second functor lifts the base proxy composition to the extended proxy composition:

mapP idT = idT
mapP (p1 >-> p2) = mapP p1 >-> mapP p2

This latter functor lets you compose simpler proxies with extended proxies. The above program uses mapP in this capacity to promote printD for composition with promptInts:

mapP printD
This demonstrates a concrete application of the functor design pattern, allowing seamless interoperability between proxies written to varying feature sets. The proxy transformers lift both the monad instance and the composition instance correctly so that simpler proxies play nicely with extended proxies.

Type signatures

The above program demos the new replacement for Void: C. This will shorten type signatures and also removes the dependency on void.

Also, now the Proxy type is a newtype around the underlying FreeT implementation. This gives nicer type errors when things go wrong.

Proxy Transformer Stacks

Just like monad transformers, you can stack proxy transformers to automatically combine their effects. By combining the StateP and EitherP proxy transformers, you can implement non-backtracking parsers for free:

>>> runProxy $ runParseK $ parser
I didn't include parsers in the library because I didn't want to add a bytestring or text dependency to the main pipes package. Instead, I will release the parsing extension as a separate library. This library will provide you with the streaming benefits of attoparsec with the ability to interleave effects.

Pushback

The above parsing example suggests my solution to push-back, which is to give each proxy its own local state using the StateP proxy transformer. You can then use the local state to keep track of unused input, as the above parsing example did.

Like all proxy transformers, this extension requires no special integration with the underlying proxy type and you can layer it anywhere within a proxy transformer stack with no special considerations.

Extra categories

The library now provides two additional categories for interacting with the Proxy type. These are term-rewriting categories (I believe the technical term is "sesquicategory", but I may be mistaken).

The first category's composition operator replaces all request statements within a Proxy with a suitably typed replacement:

Lifting request and respond

I originally envisioned that proxy transformers would also automatically lift request and respond statements. The laws for this lifting are quite simple:

mapP request = request
mapP respond = respond

In other words, the functor laws, applied to the identity of the two new categories I just introduced. However, unfortunately Haskell's type class system severely got in my way and I could not solve the issue before the release. I have a tentative plan for how to solve this using Edward's constraint package but it will take time. Until then, you will have to manually lift request and respond statements from the base Proxy type.

Overall, I was pretty disappointed with Haskell's type class system (more so than usual). This library really exercised it considerably and I even had to drop an additional proxy transformer because it was unimplementable due to the broken constraint system.

Conduit dramatically improves for entirely pure code since it bends the monad transformer laws to skip binds in the base monad. This is one reason that this pipes release type-classes all the Proxy operations. If people request that I copy conduit's approach, I will release a separate library that copies conduit's optional monad bind and have it implement all the same type-classes. Then all the proxy transformers are guaranteed to work transparently with it because they abstract completely over the type classes.

Additionally, I want to note that the pipes library currently has only one optimization PRAGMA in the entire library:

{-# INLINABLE mapK #-} -- An obscure utility function

... whereas conduit uses a considerable number of rewrite rules and INLINABLE statements. I don't know how much these contribute to conduit's speed, but I will copy Michael's optimizations in the next few releases and benchmark how much they contribute to performance.

Additionally, I've also benchmarked the overhead of proxy transformers. First, comparing performance for some trivial IO:

So I can summarize these benchmarks by saying that if you are doing even a little bit of IO, the performance differences are pretty small, and as I aggressively optimize the library, they should get even smaller.

Switch to free

Edward was kind enough to migrate my transformers-free functionality into his free package, so now pipes uses free for its free monad transformer dependency.

Resource management

I plan on releasing a Proxy-like type that implements resource management that will replace the Frame type. This type will include functions to promote existing Proxy code to this resource-managed version. Until then, you will have to manually manage resources by opening all file handles before composition, and closing them all afterwards, like so:

import Control.Proxy
import System.IO
main = do
h
... or you can use Michael's ResourceT in the base monad, if that is your thing.

You won't get the benefit of conserving handles, but you will still get predictable streaming performance.

Library writers

If you are considering building off the pipes library, I recommend implementing any functionality using the Proxy type, which I guarantee will be promotable to any future extensions, and I plan on personally writing several Proxy-based libraries over the next few months.

While I still preserve the Pipe type, I fully endorse the Proxy type as the type to standardize on as it has many more nice theoretical properties than the Pipe type and also supports greater functionality.

Conclusions

This release is very close to the final state I envisioned for the core pipes library. Most existing features won't disappear, with the exception of Control.Frame, which I will phase out once I release a suitable replacement in a separate library.

Most additional features that I plan on implementing will go into separate libraries that build on top of this one. I only plan on adding functionality to the core library if I discover additional interesting structure for the Proxy type.

In otherwords, I've stolen a page from Conor McBride's notebook and defined lift and embed as a higher-order monad in the category of monad morphisms. Going back to the previous names, we can establish that certain laws must hold:

Obviously, I won't suggest we break the existing MonadTrans class by adding an additional method. All we have to do is simply define a new MonadM class and make all existing monad transformers instances of it and possibly make MonadTrans a super-class of it.

I'll bet more experienced Haskell programmers have wanted mapT or squash in one form or another. The above type-class provides a uniform interface to these operations, so that you don't have to rely on transformer-specific functions like mapStateT or mapMaybeT.

Note that all monad transformers have a sensible instance for MonadM that obeys the above laws. Usually the easiest route is to first define squash (i.e. joinM) and mapT (i.e. fmapM). mapT is usually very straight-forward to write and simply involves type-chasing. squash simply takes the inner monad transformer and combines its operations with the outer monad transformer. Once you define these two, then you can easily define embed:

-- i.e.: (bindM f) = joinM . fmapM f
embed f = squash . mapT f

In the near future I will release a package containing this type-class and appropriate instances for the standard monad transformers. Additionally, the pipes-2.4 release will include an extra motivation for defining the above type-class, besides the obvious utility of having mapT and squash functions.

Saturday, September 15, 2012

This post builds on my previous post on the category design pattern and this time I will discuss the functor design pattern. If you are an intermediate Haskell programmer and you think you already understand functors, then think again, because I promise you this post will turn most of your preconceptions about functors on their head and show you how functors are much more powerful and generally applicable than you might realize.

Mixing features

So let's pretend that my previous post on categories inflamed your interest in compositional programming. The first thing you might ask is "Which category do I use?". This seems like a perfectly reasonable question, since you'd like to pick a category that:

attracts a lot of mindshare,

contains a large library of reusable components,

boasts many features, and

is simple to use.

Unfortunately, reality says that we seldom get all of the above qualities and they often conflict with one another. For example, let's compare two categories I previously discussed:

Ordinary functions of type a -> b that you compose using: (.)

Monadic functions of type a -> m b that you compose using: (<=

Ordinary functions are simpler to read, write, use, and you can reason about their behavior more easily. However, monadic functions boast more useful features, some of which are indispensable (such as side effects when we use the IO monad). We really need some way to mix these two categories together to get the best of both worlds.

Fortunately, programmers solve compatibility problems like this all the time. We often have tools written in different languages or different frameworks and if we want to mix features from multiple frameworks we write code to bridge between them. So let's solve our category mixing problem by writing an adapter layer between the two categories. We write some function that transforms all the components in one category into components in the other category, so that we can then freely mix components from the two categories.

Typically, one category will be more featureful than the other, so the
transformation is unidirectional. Using the above example, monadic
functions are strictly more featureful and powerful than ordinary functions.
Fortunately, we can promote all ordinary functions to monadic functions using the following map function:

Perfect! Now we can reuse all of our ordinary functions within this Kleisli
category and not have to rewrite anything!

However, there's still a problem. Monad binds are not free and sometimes they get in the way of compiler optimization, so you can imagine that it would be
wasteful if we lifted two pure functions in a row:

Well, we just unwittingly defined our first functor! But where is the functor?

Functors

A functor transforms one category into another category. In the previous section we transformed the category of Haskell functions into the category of monadic functions and that transformation is our functor.

I will notationally distinguish between the two categories in question so I can be crystal clear about the mathematical definition of a functor. I will denote our "source" category's identity as idA and its composition as (._A), and these must obey the category laws:

In other words, functors serve as adapters between categories that promote code
written for the source category to be automatically compatible with the destination
category. Functors arise every time we write compatibility layers and adapters between different pieces of software.

Functors hidden everywhere

I'll provide a few more examples of functors to tickle people's brains and show how functors arise all the time in your code without you even realizing it. For example, consider the length function:

So don't assume that the Functor class is in any way representative of the full breadth of functors.

The Functor class

So far I've deliberately picked examples that do not fit within the mold of
Haskell's Functor class to open people's minds about functors. A lot of new Haskell programmers mistakenly believe that functors only encompass "container-ish" things and I hope the previous examples dispel that notion.

However, the Functor class still behaves the same way as the functors I've already discussed. The only restriction is that the Functor class only encompass the narrow case where the source and target categories are both categories of ordinary functions:

Notice that functors free us from having to write code that targets some monolithic category. Instead, we write all our code using whatever category we deem most
appropriate and then promote it as necessary to whatever other categories might
need the code we just wrote. This lets us work within focused and specialized categories suitable for their respective tasks rather than waste our time arguing over what category to standardize on.

Another benefit of functors is that they make our code automatically future-proof. We write our components using whatever category we have at our disposal and then as new categories arise we just define new functors to promote our existing code to work within those new categories.

Monad morphisms

Compatibility issues arise all the time between various Haskell frameworks. For
example, let's assume I have a sizeable code-base written using the
iteratee library, but then I find a really useful library on hackage
using enumerator. I would rather not rewrite the enumerator-based library to use iteratee so I instead choose to write an adapter function that allows me to mix the two. I have to define some function, morph, that transforms Iteratees from the iteratee library into Iteratees from the enumerator library:

However, I might suspect that the iteratee library has a faster
Monad instance since it uses continuation-passing style (disclaimer: I
have no idea if this is true, it's just a hypothetical example). This means
that I would like to be able to factor code to use the iteratee
library's monad whenever possible:

These are just the monad transformer laws! However, they are usually written in this form:

lift $ return x = return x
lift $ do y
In other words, monad transformers are a special subset of monad morphisms and the monad transformer laws are just the functor laws in disguise!

Now, every time you use a monad transformer you can appreciate that you are
using a functor as an adapter layer between two categories: the base monad's
Kleisli category and the transformed monad's Kleisli category.

Conclusion

The functor design pattern embodies a philosophy of programming that emphasizes:

compatibility over standardization,

specialization over monolithic frameworks, and

short-term completion over future-proofing.

However, the above tenets, while popular, haven't completely taken hold because we associate:

compatibility with cruft,

specialization with fragmentation, and

short-term completion with lack of foresight.

In a future post I will discuss how the functor laws mitigate these problems and allow you to layer on as many abstractions over as many tools as you wish without the abstractions collapsing under their own weight.

Functors don't even necessarily need to be within a single programming language. A programmer could even use the category design pattern in completely separate programming languages and then use the functor design pattern to bridge components written in one language to another. Please don't limit your imagination to just the examples I gave!

However, the functor design pattern doesn't work at all if you aren't using categories in the first place. This is why you should structure your tools using the compositional category design pattern so that you can take advantage of functors to easily mix your tools together. This is true whether you are programming in one language or several languages. As long as each tool forms its own category and you obey the functor laws when switching between them, you can be very confident that all your tools will mix correctly.