The most difficult concept to master while learning Haskell is that
of understanding and using monads. We can distinguish two
subcomponents here: (1) learning how to use existing monads and (2)
learning how to write new ones. If you want to use Haskell, you must
learn to use existing monads. On the other hand, you will only need
to learn to write your own monads if you want to become a "super
Haskell guru." Still, if you can grasp writing your own monads,
programming in Haskell will be much more pleasant.

So far we've seen two uses of monads. The first use was IO actions:
We've seen that, by using monads, we can get away from the
problems plaguing the RealWorld solution to IO presented in
the chapter IO. The second use was representing different types of
computations in the section on Classes-computations. In
both cases, we needed a way to sequence operations and saw that a
sufficient definition (at least for computations)
was:
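For reference, the Computation class from that section looked roughly like this (a sketch; the Maybe instance is added here only for illustration and is not part of the original class):

```haskell
-- A sketch of the Computation class from the earlier section.
class Computation c where
  success :: a -> c a                   -- a computation that succeeds with a value
  failure :: String -> c a              -- a computation that fails with a message
  augment :: c a -> (a -> c b) -> c b   -- sequencing: feed the result onward
  combine :: c a -> c a -> c a          -- try the first; fall back on the second

-- For illustration, Maybe supports all four operations:
instance Computation Maybe where
  success            = Just
  failure _          = Nothing
  augment (Just x) f = f x
  augment Nothing  _ = Nothing
  combine Nothing  y = y
  combine x        _ = x
```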

Let's see if this definition will enable us to also perform IO.
Essentially, we need a way to represent taking a value out of an
action and performing some new operation on it (as in the example from
the section on Functions-io, rephrased slightly):

main = do
  s <- readFile "somefile"
  putStrLn (show (f s))

But this is exactly what augment does. Using augment, we
can write the above code as:

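The definition in question is Haskell's Monad class, shown here in its Haskell 98 form, which is the form this chapter assumes. (On a modern GHC, fail has moved to a separate MonadFail class; the Maybe instance below is added only for illustration, and we hide the Prelude's own class so the fragment stands alone.)

```haskell
import Prelude hiding (Monad(..), fail)

-- The Monad class in its Haskell 98 form.
class Monad m where
  return :: a -> m a
  fail   :: String -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
  m >> k = m >>= \_ -> k   -- default: sequence, discarding the result
  fail s = error s         -- default: crash with the given message

-- For illustration: Maybe as an instance of this class.
instance Monad Maybe where
  return        = Just
  fail _        = Nothing
  Just x  >>= f = f x
  Nothing >>= _ = Nothing
```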
In this definition, return is equivalent to our success;
fail is equivalent to our failure; and >>= (read:
"bind") is equivalent to our augment. The
>> (read: "then") method is simply a version of
>>= that ignores the result of its first argument. This will turn
out to be useful; although, as mentioned before, it can be defined in
terms of >>=:

We have hinted that there is a connection between monads and the
do notation. Here, we make that relationship concrete. There is
actually nothing magic about the do notation – it is simply
"syntactic sugar" for monadic operations.

As we mentioned earlier, using our Computation class, we could
define our above program as:

main =
  readFile "somefile" `augment` \s ->
  putStrLn (show (f s))

But we now know that augment is called >>= in the monadic
world. Thus, this program really reads:

main =
  readFile "somefile" >>= \s ->
  putStrLn (show (f s))

And this is completely valid Haskell at this point: if you defined a
function f :: Show a => String -> a, you could compile and run
this program.

This suggests that we can translate:

x <- f
g x

into f >>= \x -> g x. This is exactly what the compiler does.
Talking about do becomes easier if we do not use implicit layout
(see the section on Layout for how to do this). There are four
translation rules:
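Collecting the rules discussed below in one place (here es stands for a non-empty sequence of remaining statements):

```
do {e}             →  e
do {e; es}         →  e >> do {es}
do {let decls; es} →  let decls in do {es}
do {p <- e; es}    →  let ok p = do {es}
                          ok _ = fail "..."
                      in  e >>= ok
```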

The first translation rule, do {e} → e, states (as we
have stated before) that when performing a single action, having a
do or not is irrelevant. This is essentially the base case for
an inductive definition of do. The base case has one action
(namely e here); the other three translation rules handle the
cases where there is more than one action.

This states that do {e; es} → e >> do {es}. This
tells us what to do if we have an action (e) followed by a list
of actions (es). Here, we make use of the >> function,
defined earlier. This rule simply states that to perform do {e;
es}, we first perform the action e, throw away the result, and
then perform do {es}.

For instance, if e is putStrLn s for some string s,
then the translation of do {e; es} is to perform e (i.e.,
print the string) and then perform do {es}. This is clearly what we
want.

This states that do {let decls; es}
→ let decls in do {es}. This rule tells us how to deal
with lets inside of a do statement. We lift the
declarations within the let out and do whatever comes after
the declarations.

This states that do {p <- e; es} → let ok p = do
{es} ; ok _ = fail "..." in e >>= ok. Again, it is not exactly
obvious what is going on here. However, an alternate formulation of
this rule, which is roughly equivalent, is: do {p <- e; es} →
e >>= \p -> es. Here, it is clear what is happening. We run
the action e, and then send the results into es, but first
give the result the name p.

The reason for the complex definition is that p doesn't need to
simply be a variable; it could be some complex pattern. For instance,
the following is valid code:

foo = do ('a':'b':'c':x:xs) <- getLine
         putStrLn (x:xs)

In this, we're assuming that the results of the action getLine
will begin with the string "abc" and will have at least one more
character. The question becomes what should happen if this pattern
match fails. The compiler could simply throw an error, like usual,
for failed pattern matches. However, since we're within a monad, we
have access to a special fail function, and we'd prefer to fail
using that function, rather than the "catch all" error
function. Thus, the translation, as defined, allows the compiler to
fill in the ... with an appropriate error message about the
pattern matching having failed. Apart from this, the two definitions
are equivalent.

This states that return a >>= f ≡ f a. Suppose we
think about monads as computations. This means that if we create a
trivial computation that simply returns the value a regardless of
anything else (this is the return a part); and then bind it
together with some other computation f, then this is equivalent
to simply performing the computation f on a directly.

For example, suppose f is the function putStrLn and a
is the string "Hello World". This rule states that binding a
computation whose result is "Hello World" to putStrLn is the
same as simply printing it to the screen. This seems to make sense.

In do notation, this law states that the following two programs
are equivalent:

The second monad law states that f >>= return ≡ f for some computation f. In other words, the law states that if we perform
the computation f and then pass the result on to the trivial
return function, then all we have done is to perform the
computation.

That this law must hold should be obvious. To see this, think of
f as getLine (reads a string from the keyboard). This law
states that reading a string and then returning the value read is
exactly the same as just reading the string.

In do notation, the law states that the following two programs
are equivalent:

This states that f >>= (\x -> g x >>= h) ≡ (f >>=
g) >>= h. At first glance, this law is not as easy to grasp as the
other two. It is essentially an associativity law
for monads.

Note

Outside the world of monads, an operator · is associative if
(f · g) · h = f · (g · h). For instance, + and
* are associative, since bracketing on these functions doesn't
make a difference. On the other hand, - and / are not
associative since, for example, 5 − (3 − 1) ≠ (5 − 3) − 1.

If we throw away the messiness with the lambdas, we see that this law
states: f >>= (g >>= h) ≡ (f >>= g) >>= h. The
intuition behind this law is that when we string together actions, it
doesn't matter how we group them.

For a concrete example, take f to be getLine. Take g
to be an action which takes a value as input, prints it to the screen,
reads another string via getLine, and then returns that newly
read string. Take h to be putStrLn.

Let's consider what (\x -> g x >>= h) does. It takes a value
called x, and runs g on it, feeding the results into h.
In this instance, this means that it's going to take a value, print
it, read another value and then print that. Thus, the entire left
hand side of the law first reads a string and then does what we've
just described.

On the other hand, consider (f >>= g). This action reads a
string from the keyboard, prints it, and then reads another string,
returning that newly read string as a result. When we bind this with
h as on the right hand side of the law, we get an action that
does the action described by (f >>= g), and then prints the
results.

Clearly, these two actions are the same.

While this explanation is quite complicated, and the text of the law
is also quite complicated, the actual meaning is simple: if we have
three actions, and we compose them in the same order, it doesn't matter
where we put the parentheses. The rest is just notation.

In do notation, the law says that the following two programs are
equivalent:

One of the simplest monads that we can craft is a state-passing monad.
In Haskell, all state information usually must be passed to functions
explicitly as arguments. Using monads, we can effectively hide some
state information.

Suppose we have a function f of type a -> b, and we need
to add state to this function. In general, if state is of type
state, we can encode it by changing the type of f to a
-> state -> (state, b). That is, the new version of f takes
the original parameter of type a and a new state parameter. And,
in addition to returning the value of type b, it also returns an
updated state, encoded in a tuple.

For instance, suppose we have a binary tree defined as:

data Tree a
  = Leaf a
  | Branch (Tree a) (Tree a)

Now, we can write a simple map function to apply some function to each
value in the leaves:
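Such a map might look like this (a sketch; the name mapTree and the deriving clause are assumptions made here so the fragment stands alone):

```haskell
data Tree a
  = Leaf a
  | Branch (Tree a) (Tree a)
  deriving (Show, Eq)

-- Apply f to every value stored in the leaves.
mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f (Leaf a)         = Leaf (f a)
mapTree f (Branch lhs rhs) = Branch (mapTree f lhs) (mapTree f rhs)
```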

This works fine until we need to write a function that numbers the
leaves left to right. In a sense, we need to add state, which keeps
track of how many leaves we've numbered so far, to the mapTree
function. We can augment the function to something like:

This is beginning to get a bit unwieldy, and the type signature is
getting harder and harder to understand. What we want to do is
abstract away the state passing part. That is, the differences
between mapTree and mapTreeState are: (1) the augmented
f type, (2) we replaced the type -> Tree b with -> state
-> (state, Tree b). Notice that both types changed in exactly the
same way. We can abstract this away with a type synonym declaration:
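A sketch of this abstraction: a type synonym for the state-passing shape, together with the two "plumbing" functions examined next:

```haskell
-- A state-passing computation: takes a state, returns a new state and a value.
type State st a = st -> (st, a)

-- Return a value without touching the state.
returnState :: a -> State st a
returnState a = \st -> (st, a)

-- Run m, then feed its result (and the updated state) to k.
bindState :: State st a -> (a -> State st b) -> State st b
bindState m k = \st ->
  let (st', a) = m st
      m'       = k a
  in  m' st'
```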

Let's examine each of these in turn. The first function,
returnState, takes a value of type a and creates something
of type State st a. If we think of the st as the state, and
the value of type a as the value, then this is a function that
doesn't change the state and returns the value a.

The bindState function looks distinctly like the interior
let declarations in mapTreeState. It takes two arguments.
The first argument is an action that returns something of type a
with state st. The second is a function that takes this a
and produces something of type b also with the same state. The
result of bindState is essentially the result of transforming the
a into a b.

The definition of bindState takes an initial state, st. It
first applies this to the State st a argument called m.
This gives back a new state st' and a value a. It then lets
the function k act on a, producing something of type
State st b, called m'. We finally run m' with the new
state st'.

We write a new function, mapTreeStateM and give it the type:

mapTreeStateM :: (a -> State st b) -> Tree a -> State st (Tree b)

Using these "plumbing" functions (returnState and
bindState) we can write this function without ever having to
explicitly talk about the state:
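A sketch of mapTreeStateM in terms of these plumbing functions (the supporting definitions are repeated so the fragment stands alone):

```haskell
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show, Eq)

type State st a = st -> (st, a)

returnState :: a -> State st a
returnState a = \st -> (st, a)

bindState :: State st a -> (a -> State st b) -> State st b
bindState m k = \st -> let (st', a) = m st in k a st'

-- Map a state-passing action over a tree; the state is threaded invisibly.
mapTreeStateM :: (a -> State st b) -> Tree a -> State st (Tree b)
mapTreeStateM f (Leaf a) =
  f a `bindState` \b ->
  returnState (Leaf b)
mapTreeStateM f (Branch lhs rhs) =
  mapTreeStateM f lhs `bindState` \lhs' ->
  mapTreeStateM f rhs `bindState` \rhs' ->
  returnState (Branch lhs' rhs')
```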

In the Leaf case, we apply f to a and then bind
the result to a function that takes the result and returns a Leaf
with the new value.

In the Branch case, we recurse on the left-hand-side, binding the
result to a function that recurses on the right-hand-side, binding
that to a simple function that returns the newly created Branch.

As you have probably guessed by this point, State st is a monad,
returnState is analogous to the overloaded return method, and
bindState is analogous to the overloaded >>= method. In
fact, we can verify that State st a obeys the monad laws:

In the first step, we simply substitute the definition of
bindState. In the second step, we simplify the last two lines
and substitute the definition of returnState. In the third step,
we apply st to the lambda function. In the fourth step, we
rename st' to st and remove the let. In the last step,
we eta reduce.

Moving on to Law 2, we need to show that f >>= return
≡ f. This is shown as follows:

Finally, we need to show that State obeys the third law: f
>>= (\x -> g x >>= h) ≡ (f >>= g) >>= h. This is much
more involved to show, so we will only sketch the proof here. Notice
that we can write the left-hand-side as:

The interesting thing to note here is that we have both action
applications on the same let level. Since let is
associative, this means that we can put whichever bracketing we prefer
and the results will not change. Of course, this is an informal,
"hand waving" argument and it would take us a few more derivations
to actually prove, but this gives the general idea.

Now that we know that State st is actually a monad, we'd like to
make it an instance of the Monad class. Unfortunately, the
straightforward way of doing this doesn't work. We can't write:

instance Monad (State st) where { ... }

This is because you cannot make instances out of non-fully-applied
type synonyms. Instead, we need to convert the
type synonym into a newtype, as:

newtype State st a = State (st -> (st, a))

Unfortunately, this means that we need to do some packing and
unpacking of the State constructor in the Monad instance
declaration, but it's not terribly difficult:
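A sketch of the instance, followed by mapTreeM, the do-notation version of mapTreeStateM. (On a modern GHC, Monad requires Functor and Applicative instances as well, so those are included; the chapter's discussion only concerns return and >>=.)

```haskell
newtype State st a = State (st -> (st, a))

runStateM :: State st a -> st -> (st, a)
runStateM (State m) = m

instance Functor (State st) where
  fmap f (State m) = State (\st -> let (st', a) = m st in (st', f a))

instance Applicative (State st) where
  pure a = State (\st -> (st, a))
  State mf <*> State mx = State (\st ->
    let (st' , f) = mf st
        (st'', x) = mx st'
    in  (st'', f x))

instance Monad (State st) where
  return = pure
  -- unpack, thread the state, and repack
  State m >>= k = State (\st ->
    let (st', a) = m st
        State m' = k a
    in  m' st')

data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show, Eq)

-- mapTreeStateM, rewritten using the instance and do notation:
mapTreeM :: (a -> State st b) -> Tree a -> State st (Tree b)
mapTreeM f (Leaf a) = do
  b <- f a
  return (Leaf b)
mapTreeM f (Branch lhs rhs) = do
  lhs' <- mapTreeM f lhs
  rhs' <- mapTreeM f rhs
  return (Branch lhs' rhs')
```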

which is significantly cleaner than before. In fact, if we
remove the type signature, we get the more general type:

mapTreeM :: Monad m => (a -> m b) -> Tree a -> m (Tree b)

That is, mapTreeM can be run in any monad, not just our
State monad.

Now, the nice thing about encapsulating the stateful aspect of the
computation like this is that we can provide functions to get and
change the current state. These look
like:

getState :: State state state
getState = State (\state -> (state, state))
putState :: state -> State state ()
putState new = State (\_ -> (new, ()))

Here, getState is a monadic operation that takes the current
state, passes it through unchanged, and then returns it as the value.
The putState function takes a new state and produces an action
that ignores the current state and inserts the new one.

This may seem like a large amount of work to do something simple.
However, note the new power of mapTreeM. We can also print out
the leaves of the tree in a left-to-right fashion as:

Example:

State> mapTreeM print testTree
'a'
'b'
'c'
'd'
'e'

This crucially relies on the fact that mapTreeM has the more
general type involving arbitrary monads -- not just the state monad.
Furthermore, we can write an action that will make each leaf value
equal to its old value as well as all the values preceding:

In fact, you don't even need to write your own monad instance and
datatype. All this is built in to the Control.Monad.State
module. There, our runStateM is called evalState; our
getState is called get; and our putState is called
put.

This module also contains a state transformer monad, which we
will discuss in the section on Transformer.

List comprehension form is simply an abbreviated form of a monadic
statement using lists. In fact, in older versions of Haskell, the
list comprehension form could be used for any monad -- not just
lists. However, in the current version of Haskell, this is no longer
allowed.

The Maybe type is also a monad, with failure being
represented as Nothing and with success as Just. We get the
following instance declaration:
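A sketch of that instance. Since the real instance already lives in the Prelude (with the fail clause nowadays belonging to MonadFail), we use a local clone Maybe' here so the fragment compiles on its own; the declaration is identical in shape:

```haskell
-- A local clone of Maybe, to avoid clashing with the Prelude's instance.
data Maybe' a = Nothing' | Just' a deriving (Show, Eq)

instance Functor Maybe' where
  fmap _ Nothing'  = Nothing'
  fmap f (Just' x) = Just' (f x)

instance Applicative Maybe' where
  pure             = Just'
  Just' f  <*> x   = fmap f x
  Nothing' <*> _   = Nothing'

instance Monad Maybe' where
  return           = Just'     -- success
  Just' x  >>= k   = k x       -- pass the value onward
  Nothing' >>= _   = Nothing'  -- failure propagates

instance MonadFail Maybe' where
  fail _ = Nothing'            -- failure is Nothing
```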

What this means is that if we write a function (like
searchAll from
the section on Classes) only in terms of monadic
operators, we can use it with any monad, depending on what we mean.
Using real monadic functions (not do notation), the
searchAll function looks something like:
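A sketch of searchAll, with the Graph type from the section on Classes repeated (a list of labelled vertices and a list of labelled edges), along with the example graph used below. Note that on a modern GHC the use of fail forces the constraint MonadFail m rather than plain Monad m:

```haskell
data Graph v e = Graph [(Int, v)] [(Int, Int, e)]

searchAll :: MonadFail m => Graph v e -> Int -> Int -> m [Int]
searchAll g@(Graph vl el) src dst
  | src == dst = return [src]
  | otherwise  = search' el
  where search' [] = fail "no path"
        search' ((u, v, _):es)
          | src == u  = searchAll g v dst >>= \path ->
                        return (u : path)
          | otherwise = search' es

-- The example graph: edges a→b, a→c, b→d, c→d (nodes numbered 0..3).
gr :: Graph Char Char
gr = Graph [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
           [(0,1,'l'), (0,2,'m'), (1,3,'n'), (2,3,'o')]
```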

The type of this function is Monad m => Graph v e -> Int -> Int ->
m [Int]. This means that no matter what monad we're using at the
moment, this function will perform the calculation. Suppose we have
the following graph:

This represents a graph with four nodes, labelled a, b, c and d. There is an edge from a to both b and c, and an edge from both b and c to d. Using the Maybe monad, we can compute the path from a to d:

Example:

Monads> searchAll gr 0 3 :: Maybe [Int]
Just [0,1,3]

We provide the type signature, so that the interpreter knows what
monad we're using. If we try to search in the opposite direction,
there is no path. The inability to find a path is represented as
Nothing in the Maybe monad:

Example:

Monads> searchAll gr 3 0 :: Maybe [Int]
Nothing

Note that the string "no path" has disappeared since there's no way
for the Maybe monad to record this.

If we perform the same impossible search in the list monad, we get the
empty list, indicating no path:

Example:

Monads> searchAll gr 3 0 :: [[Int]]
[]

If we perform the possible search, we get back a list containing the
first path:

Example:

Monads> searchAll gr 0 3 :: [[Int]]
[[0,1,3]]

You may have expected this function call to return all paths,
but, as coded, it does not. See the section on Plus for more
about using lists to represent nondeterminism.

If we use the IO monad, we can actually get at the error message,
since IO knows how to keep track of error messages:

In the first case, we needed to give the type annotation to get GHCi
to actually evaluate the search.

There is one problem with this implementation of searchAll: if it
finds an edge that does not lead to a solution, it won't be able to
backtrack. This has to do with the recursive call to searchAll
inside of search'. Consider, for instance, what happens if
searchAll g v dst doesn't find a path. There's no way for this
implementation to recover. For instance, if we remove the edge from
node b to node d, we should still be able to find a path
from a to d, but this algorithm can't find it. We define:

The functions mapM, filterM and
foldM are our old friends map, filter and
foldl wrapped up inside of monads. These functions are
incredibly useful (particularly foldM) when working with monads.
We can use mapM_, for instance, to print a list of things to the
screen:

Example:

Monads> mapM_ print [1,2,3,4,5]
1
2
3
4
5

We can use foldM to sum a list and print the intermediate sum at
each step:
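A sketch of such a function (the name printSum is an assumption); for comparison, the same foldM in the Maybe monad is a fold that may fail:

```haskell
import Control.Monad (foldM)

-- Sum a list, printing the running total at each step.
printSum :: [Int] -> IO Int
printSum = foldM (\acc x -> do
  let acc' = acc + x
  print acc'
  return acc') 0

-- foldM works in any monad; in Maybe it is simply a fold that may fail:
sumMaybe :: [Int] -> Maybe Int
sumMaybe = foldM (\acc x -> Just (acc + x)) 0
```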

We can see that the underscored version doesn't return each value,
while the non-underscored version returns the list of the return
values.

The liftM function "lifts" a non-monadic function
to a monadic function. (Do not confuse this with the lift
function used for monad transformers in
the section on Transformer.) This is useful for shortening code
(among other things). For instance, we might want to write a function
that prepends each line in a file with its line number. We can do
this with:
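A sketch (the helper name numberLines and the function name numberFile are assumptions):

```haskell
import Control.Monad (liftM)

-- Prefix each line of a string with its line number.
numberLines :: String -> [String]
numberLines = zipWith (\n l -> show n ++ ": " ++ l) [(1 :: Int) ..] . lines

-- Read a file and print it with numbered lines; liftM applies the
-- pure function numberLines to the result of the monadic readFile.
numberFile :: FilePath -> IO ()
numberFile fp = liftM numberLines (readFile fp) >>= mapM_ putStrLn
```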

Given only the >>= and return functions, it is impossible to
write a function like combine with type c a
-> c a -> c a. However, such a function is so generally useful
that it exists in another class called
MonadPlus. In addition to having a
combine function, instances of MonadPlus also have a
"zero" element that is the identity under the "plus" (i.e.,
combine) action. The definition is:
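The class, restated locally here together with the Maybe and list instances discussed below (the real class is exported by Control.Monad with the same shape):

```haskell
-- A local restatement of the MonadPlus class.
class Monad m => MonadPlus m where
  mzero :: m a                -- the identity element ("zero")
  mplus :: m a -> m a -> m a  -- the combining operation ("plus")

instance MonadPlus Maybe where
  mzero              = Nothing
  Nothing `mplus` ys = ys
  xs      `mplus` _  = xs     -- the first Just wins

instance MonadPlus [] where
  mzero = []
  mplus = (++)
```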

In order to gain access to MonadPlus, you need to import the
Monad module (or Control.Monad in the hierarchical
libraries).

In the section on Common, we showed that
Maybe and list are both monads. In
fact, they are also both instances of MonadPlus. In the case of
Maybe, the zero element is Nothing; in the case of lists, it
is the empty list. The mplus operation on Maybe is
Nothing, if both elements are Nothing; otherwise, it is the
first Just value. For lists, mplus is the same as ++.

Now, when we're going through the edge list in search', and we
come across a matching edge, not only do we explore this path, but we
also continue to explore the out-edges of the current node in the
recursive call to search'.

The IO monad is not an instance of MonadPlus, so we're not able
to execute the search with this monad. We can see that when using
lists as the monad, we (a) get all possible paths in gr and (b)
get a path in gr2.

But note that this doesn't do what we want. Here, if the recursive
call to searchAll2 fails, we don't try to continue and execute
search' es. The call to mplus must be at the top level in
order for it to work.

Exercises

Suppose that we changed the order of arguments to mplus. I.e.,
the matching case of search' looked like:

Often we want to "piggyback" monads on top of each other. For
instance, there might be a case where you need access to both IO
operations through the IO monad and state functions through some state
monad. In order to accomplish this, we introduce a
MonadTrans class, which essentially "lifts"
the operations of one monad into another. You can think of this as
stacking monads on top of each other. This class has a simple
method: lift. The class declaration for
MonadTrans is:

class MonadTrans t where
  lift :: Monad m => m a -> t m a

The idea here is that t is the outer monad and that m lives
inside of it. In order to execute a command of type Monad m => m
a, we first lift it into the transformer.

The simplest example of a transformer (and arguably the most useful)
is the state transformer monad, which is a state monad wrapped around an arbitrary monad. Before, we defined a state
monad as:

newtype State state a = State (state -> (state, a))

Now, instead of using a function of type state -> (state, a) as
the monad, we assume there's some other monad m and make the
internal action into something of type state -> m (state, a).
This gives rise to the following definition for a state
transformer:

newtype StateT state m a =
  StateT (state -> m (state, a))

For instance, we can think of m as IO. In this case, our state
transformer monad is able to execute actions in the IO monad. First,
we make this an instance of MonadTrans:
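A sketch of the instances. The MonadTrans class is restated from above so the fragment stands alone; on a modern GHC, Monad again requires Functor and Applicative, and the fail clause moves to a MonadFail instance, but otherwise this mirrors the definitions discussed next:

```haskell
newtype StateT state m a = StateT (state -> m (state, a))

runStateT :: StateT state m a -> state -> m (state, a)
runStateT (StateT m) = m

class MonadTrans t where
  lift :: Monad m => m a -> t m a

instance MonadTrans (StateT state) where
  -- run the enclosed action, pairing its result with the unchanged state
  lift m = StateT (\s -> m >>= \a -> return (s, a))

instance Monad m => Functor (StateT state m) where
  fmap f (StateT m) = StateT (\s ->
    m s >>= \(s', a) -> return (s', f a))

instance Monad m => Applicative (StateT state m) where
  pure a = StateT (\s -> return (s, a))
  StateT mf <*> StateT mx = StateT (\s ->
    mf s >>= \(s' , f) ->
    mx s' >>= \(s'', x) ->
    return (s'', f x))

instance Monad m => Monad (StateT state m) where
  -- return keeps the state constant, pairing it with the value in the
  -- enclosed monad
  return a = StateT (\s -> return (s, a))
  -- bind threads the state through the first action, then the second
  StateT m >>= k = StateT (\s ->
    m s >>= \(s', a) ->
    let StateT m' = k a
    in  m' s')

-- fail simply defers to the enclosed monad's fail
instance MonadFail m => MonadFail (StateT state m) where
  fail s = StateT (\_ -> fail s)
```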

The idea behind the definition of return is that we keep the
state constant and simply return the state/a pair in the enclosed
monad. Note that the use of return in the definition of
return refers to the enclosed monad, not the state transformer.

In the definition of bind, we create a new StateT that takes a
state s as an argument. First, it applies this state to the
first action (StateT m) and gets the new state and answer as a
result. It then runs the k action on this new state and gets a
new transformer. It finally applies the new state to this
transformer. This definition is nearly identical to the definition of
bind for the standard (non-transformer) State monad described in
the section on State.

The fail function passes on the call to fail in the enclosed
monad, since state transformers don't natively know how to deal with
failure.

Of course, in order to actually use this monad, we need to provide the
functions getT, putT and evalStateT. These are analogous to
getState, putState and runStateM from
the section on State:
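Sketches of these three, assuming the StateT newtype above:

```haskell
newtype StateT state m a = StateT (state -> m (state, a))

-- return the current state as the value, leaving it unchanged
getT :: Monad m => StateT state m state
getT = StateT (\s -> return (s, s))

-- replace the state, returning ()
putT :: Monad m => state -> StateT state m ()
putT s = StateT (\_ -> return (s, ()))

-- run the action and keep only the final value, discarding the state
evalStateT :: Monad m => StateT state m a -> state -> m a
evalStateT (StateT m) s =
  m s >>= \(_, a) -> return a
```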

These functions should be straightforward. Note, however, that the
result of evalStateT is actually a monadic action in the enclosed
monad. This is typical of monad transformers: they don't know how to
actually run things in their enclosed monad (they only know how to
lift actions). Thus, what you get out is a monadic action in the
inside monad (in our case, IO), which you then need to run yourself.

We can use state transformers to reimplement a version of our
mapTreeM function from the section on State. The only
change here is that when we get to a leaf, we print out the value of
the leaf; when we get to a branch, we just print out "Branch."

The only difference between this function and the one from
the section on State is the calls to lift (putStrLn ...) as
the first line. The lift tells us that we're going to be
executing a command in an enclosed monad. In this case, the enclosed
monad is IO, since the command lifted is putStrLn.

Ignoring, for a second, the class constraints, this says that
mapTreeM takes an action and a tree and returns a tree, just
as before. In this, we require that t is a monad
transformer (since we apply lift in it); we require that t
IO is a monad (since we use putStrLn, we know that the enclosed
monad is IO); finally, we require that a is an instance of
Show -- this is simply because we use show to display the value of
leaves.

Now, we simply change numberTree to use this version of
mapTreeM, and the new versions of get and put, and we
end up with:

In this graph, there is a back edge from node b back to node
a. If we attempt to run searchAll2, regardless of what
monad we use, it will fail to terminate. Moreover, if we move this
erroneous edge to the end of the list (and call this gr4), the
result of searchAll2 gr4 0 3 will contain an infinite number of
paths: presumably we only want paths that don't contain cycles.

In order to get around this problem, we need to introduce state.
Namely, we need to keep track of which nodes we have visited, so that
we don't visit them again.

Here, we implicitly use a state transformer (see the calls to
getT and putT) to keep track of visited states. We only
continue to recurse when we encounter a state we haven't yet visited.
Furthermore, when we recurse, we add the current state to our set of
visited states.

Now, we can run the state transformer and get out only the correct
paths, even on the cyclic graphs:

This is not so useful in our case, as it will return exactly the
reverse of evalStateT (try it and find out!), but can be useful
in general (if, for instance, we need to know how many numbers are
used in numberTree).

Exercises

Write a function searchAll6, based on the code for
searchAll2, that, at every entry to the main function (not the
recursion over the edge list), prints the search being conducted. For
instance, the output generated for searchAll6 gr 0 3 should look
like:

Combine the searchAll5 function (from this section) with the
searchAll6 function (from the previous exercise) into a single
function called searchAll7. This function should perform IO as
in searchAll6 but should also keep track of state using a state
transformer.
It turns out that a certain class of parsers are all monads. This
makes the construction of parsing libraries in Haskell very clean. In
this chapter, we begin by building our own (small) parsing library in
the section on A Simple Parsing Monad and then, in the final section, introduce the Parsec parsing library.

This shows how easy it is to combine these parsers. We don't need to
worry about the underlying string -- the monad takes care of that for
us. All we need to do is combine these parser primitives. We can
test this parser by using runParser and
by supplying input:

On top of these primitives, we usually build some combinators. The
many combinator, for instance, will take a parser
that parses entities of type a and will make it into a parser
that parses entities of type [a] (this is a Kleene-star
operator):
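A sketch, assuming a Parser built on Either (Left for an error message, Right for the remaining input paired with the result), as in the running example; the char helper is included only so the fragment can be tested on its own:

```haskell
newtype Parser a = Parser
  { runParser :: String -> Either String (String, a) }

-- Match one specific character (a helper, for exercising many below).
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x:xs) | x == c -> Right (xs, c)
  _               -> Left ("expecting " ++ [c])

-- Kleene star: apply p as many times as possible, collecting results.
many :: Parser a -> Parser [a]
many (Parser p) = Parser mp
  where mp s = case p s of
          Left _        -> Right (s, [])    -- p failed: succeed with []
          Right (s', x) ->                  -- p succeeded: keep going
            case mp s' of
              Right (s'', xs) -> Right (s'', x : xs)
              Left err        -> Left err   -- (mp itself never fails)
```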

The idea here is that first we try to apply the given parser, p.
If this fails, we succeed but return the empty list. If p
succeeds, we recurse and keep trying to apply p until it fails.
We then return the list of successes we've accumulated.

In general, there would be many more functions of this sort, and they
would be hidden away in a library, so that users couldn't actually
look inside the Parser type. However, using them, you could
build up, for instance, a parser that parses (non-negative) integers:

In this function, we first match a digit (the isDigit function
comes from the module Char/Data.Char) and then match as many
more digits as we can. We then read the result and return it.
We can test this parser as before:

Now, suppose we want to parse a Haskell-style list of Ints. This
becomes somewhat difficult because, at some point, we're either going
to parse a comma or a close brace, but we don't know when this will
happen. This is where the fact that Parser is an instance of
MonadPlus comes in handy: first we try one, then we try the
other.

The first thing this code does is parse an open brace. Then, using
mplus, it tries one of two things: parsing using
intList', or parsing a close brace and returning an empty list.

The intList' function assumes that we're not yet at the end of
the list, and so it first parses an int. It then parses the rest of
the list. However, it doesn't know whether we're at the end yet, so
it again uses mplus. On the one hand, it tries to parse a comma
and then recurse; on the other, it parses a close brace and returns
the empty list. Either way, it simply prepends the int it parsed
itself to the beginning.

One thing that you should be careful of is the order in which you
supply arguments to mplus. Consider the following parser:

tricky =
  mplus (string "Hal") (string "Hall")

You might expect this parser to parse both the words "Hal" and
"Hall;" however, it only parses the former. You can see this with:

This is because, again, the mplus doesn't know that it needs to
parse the whole input. So, when you provide it with "Hall," it
parses just "Hal" and leaves the last "l" lying around to be
parsed later. This causes eof to produce an error message.

This works precisely because each side of the mplus knows that it
must read the end.

In this case, fixing the parser to accept both "Hal" and "Hall"
was fairly simple, due to the fact that we assumed we would be reading
an end-of-file immediately afterwards. Unfortunately, if we cannot
disambiguate immediately, life becomes significantly more complicated.
This is a general problem in parsing, and has little to do with
monadic parsing. The solution most parser libraries (e.g., Parsec,
see the section on Parsec) have adopted is to only
recognize "LL(1)" grammars: that means that you must be able to
disambiguate the input with a one token look-ahead.

Exercises

Write a parser intListSpace that will parse int lists but will
allow arbitrary white space (spaces, tabs or newlines) between the
commas and brackets.

Given this monadic parser, it is fairly easy to add information
regarding source position. For instance, if we're parsing a large
file, it might be helpful to report the line number on which an error occurred. We could do this simply by
extending the Parser type and by modifying the instances and the
primitives:

As you continue developing your parser, you might want to add more and
more features. Luckily, Graham Hutton and Daan Leijen have already
done this for us in the Parsec library. This section is intended to
be an introduction to the Parsec library; it by no means covers the
whole library, but it should be enough to get you started.

Like our library, Parsec provides a few basic functions to build
parsers from characters. These are: char, which is
the same as our char; anyChar, which is the
same as our anyChar; satisfy, which is the
same as our matchChar; oneOf, which takes a
list of Chars and matches any of them; and
noneOf, which is the opposite of oneOf.

The primary function Parsec uses to run a parser is
parse. However, in addition to a parser, this
function takes a string that represents the name of the file you're
parsing. This is so it can give better error messages. We can try
parsing with the above functions:

Here, we can see a few differences between our parser and Parsec:
first, the rest of the string isn't returned when we run parse.
Second, the error messages produced are much better.

In addition to the basic character parsing functions, Parsec provides
primitives for: spaces, which is the same as ours;
space which parses a single space;
letter, which parses a letter;
digit, which parses a digit;
string, which is the same as ours; and a few
others.

First, note the type signatures. The st type variable is simply
a state variable that we are not using. In the int function, we
use the many function (built in to Parsec) together with the
digit function (also built in to Parsec). The intList
function is actually identical to the one we wrote before.

Note, however, that using mplus explicitly is not the preferred
method of combining parsers: Parsec provides a <|> function that
is a synonym of mplus, but that looks nicer:

In addition to these basic combinators, Parsec provides a few other
useful ones:

choice takes a list of parsers and performs an or operation (<|>) between all of them.

option takes a default value of type a and a parser that returns something of type a. It then tries to parse with the parser, but it uses the default value as the return, if the parsing fails.

optional takes a parser that returns () and optionally runs it.

between takes three parsers: an open parser, a close parser and a between parser. It runs them in order and returns the value of the between parser. This can be used, for instance, to take care of the brackets on our intList parser.

notFollowedBy takes a parser and returns one that succeeds only if the given parser would have failed.

Suppose we want to parse a simple calculator language that includes
only plus and times. Furthermore, for simplicity, assume each
embedded expression must be enclosed in parentheses. We can give a
datatype for this language as:

Here, the parser alternates between two options (we could have used
<|>, but I wanted to show the choice combinator in action).
The first simply parses an int and then wraps it up in the Value
constructor. The second option uses between to parse text
between parentheses. What it parses is first an expression, then one
of plus or times, then another expression. Depending on what the
operator is, it returns either e1 :+: e2 or e1 :*: e2.

We can modify this parser, so that instead of computing an Expr,
it simply computes the value:

Now, suppose we want to introduce bindings into our
language. That is, we want to also be able to say "let x = 5 in"
inside of our expressions and then use the variables we've defined.
In order to do this, we need to use the getState
and setState (or
updateState) functions built in to Parsec.

The int and recursive cases remain the same. We add two more cases,
one to deal with let-bindings, the other to deal with usages.

In the let-bindings case, we first parse a "let" string, followed by
the character we're binding (the letter function is a Parsec
primitive that parses alphabetic characters), followed by its value
(a parseValueLet). Then, we parse the " in " and update the
state to include this binding. Finally, we continue and parse the
rest.

In the usage case, we simply parse the character and then look it up
in the state. However, if it doesn't exist, we use the Parsec
primitive unexpected to report an error.

We can see this parser in action using the
runParser command, which enables us to provide
an initial state:

Note that the bracketing does not affect the definitions of the
variables. For instance, in the last example, the use of "x" is, in
some sense, outside the scope of the definition. However, our parser
doesn't notice this, since it operates in a strictly left-to-right
fashion. In order to fix this omission, bindings would have to be
removed (see the exercises).

Exercises

Modify the parseValueLet parser, so that it obeys bracketing. In
order to do this, you will need to change the state to something like
FiniteMap Char [Int], where the [Int] is a stack of