Revision as of 08:26, 9 May 2008

Haskell I/O has always been a source of confusion and surprises for new Haskellers. While simple I/O code in Haskell looks very similar to its equivalents in imperative languages, attempts to write somewhat more complex code often result in a total mess. This is because Haskell I/O is really very different internally. Haskell is a pure language and even the I/O system can't break this purity.

The following text is an attempt to explain the details of Haskell I/O implementations. This explanation should help you eventually master all the smart I/O tricks. Moreover, I've added a detailed explanation of various traps you might encounter along the way. After reading this text, you will receive a "Master of Haskell I/O" degree that is equal to a Bachelor in Computer Science and Mathematics, simultaneously :)

If you are new to Haskell I/O you may prefer to start by reading the Introduction to IO page.

1 Haskell is a pure language

Haskell is a pure language, which means that the result of any function call is fully determined by its arguments. Pseudo-functions like rand() or getchar() in C, which return different results on each call, are simply impossible to write in Haskell. Moreover, Haskell functions can't have side effects, which means that they can't effect any changes to the "real world", like changing files, writing to the screen, printing, sending data over the network, and so on. These two restrictions together mean that any function
call can be omitted, repeated, or replaced by the result of a previous call with the same parameters, and the language guarantees that all these rearrangements will not change the program result!

Let's compare this to C: optimizing C compilers try to guess which functions have no side effects and don't depend on mutable global variables. If this guess is wrong, an optimization can change the program's semantics! To avoid this kind of disaster, C optimizers are conservative in their guesses or require hints from the programmer about the purity of functions.

Compared to an optimizing C compiler, a Haskell compiler is a set of pure mathematical transformations. This results in much better high-level optimization facilities. Moreover, pure mathematical computations can be much more easily divided into several threads that may be executed in parallel, which is increasingly important in these days of multi-core CPUs. Finally, pure computations are less error-prone and easier to verify, which adds to Haskell's robustness and to the speed of program development using Haskell.

Haskell's purity allows the compiler to call only functions whose results
are really required to calculate the final value of the top-level function
(i.e., main) - this is called lazy evaluation. It's a great thing for
pure mathematical computations, but what about I/O actions? A function like

putStrLn "Press any key to begin formatting"

can't return any meaningful result value, so how can we ensure that the
compiler will not omit or reorder its execution? And in general: how can
we work with stateful algorithms and side effects in an entirely lazy
language? This question has had many different solutions proposed in 18
years of Haskell development (see History of Haskell), though a solution
based on monads is now the standard.

2 What is a monad?

What is a monad? It's something from mathematical category theory, which I
don't know anymore :) In order to understand how monads are used to
solve the problem of I/O and side effects, you don't need to know it. It's
enough to just know elementary mathematics, like I do :)

Let's imagine that we want to implement in Haskell the well-known
'getchar' function. What type should it have? Let's try:

getchar :: Char
get2chars = [getchar, getchar]

What will we get with 'getchar' having just the 'Char' type? You can see
all the possible problems in the definition of 'get2chars':

Because the Haskell compiler treats all functions as pure (not having side effects), it can avoid "excessive" calls to 'getchar' and use one returned value twice.

Even if it does make two calls, there is no way to determine which call should be performed first. Do you want to return the two chars in the order in which they were read, or in the opposite order? Nothing in the definition of 'get2chars' answers this question.

How can these problems be solved, from the programmer's viewpoint?
Let's introduce a fake parameter of 'getchar' to make each call
"different" from the compiler's point of view:

getchar :: Int -> Char
get2chars = [getchar 1, getchar 2]

Right away, this solves the first problem mentioned above - now the
compiler will make two calls because it sees them as having different
parameters. The whole 'get2chars' function should also have a
fake parameter, otherwise we will have the same problem calling it:
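The snippet that belonged here is missing; it presumably just gave 'get2chars' a fake parameter of its own, along these lines (illustrative pseudocode in the spirit of the surrounding examples):

```
get2chars :: Int -> String
get2chars _ = [getchar 1, getchar 2]
```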

Now we need to give the compiler some clue to determine which function it
should call first. The Haskell language doesn't provide any way to express
order of evaluation... except for data dependencies! How about adding an
artificial data dependency which prevents evaluation of the second
'getchar' before the first one? In order to achieve this, we will
return an additional fake result from 'getchar' that will be used as a
parameter for the next 'getchar' call:
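The code under discussion is missing here; a reconstruction in the text's pseudocode, where 'getchar' returns an extra fake Int that sequences the two calls (not compilable Haskell, just the model):

```
getchar :: Int -> (Char, Int)

get2chars _ = [a, b]  where (a, i) = getchar 1
                            (b, _) = getchar i
```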

So far so good - now we can guarantee that 'a' is read before 'b'
because reading 'b' needs the value ('i') that is returned by reading 'a'!

We've added a fake parameter to 'get2chars' but the problem is that the
Haskell compiler is too smart! It may believe that the external 'getchar'
function really depends on its parameter, but for 'get2chars' it
will see that we're just cheating, because we throw the parameter away! Therefore it won't feel obliged to execute the calls in the order we want. How can we fix this? How about passing this fake parameter on to the 'getchar' calls?! In that case
the compiler can't guess that it is really unused :)

get2chars i0 = [a, b]  where (a, i1) = getchar i0
                             (b, i2) = getchar i1

And more - 'get2chars' has all the same purity problems as the 'getchar'
function. If you need to call it two times, you need a way to describe
the order of these calls. Look at:
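The example is missing; it presumably showed that calling 'get2chars' twice with constant fake parameters reintroduces the ordering problem (illustrative pseudocode):

```
get2chars :: Int -> String
get4chars = [get2chars 1, get2chars 2]  -- order of the two calls is undefined!
```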

But what's the fake value 'get2chars' should return? If we use some integer constant, the excessively-smart Haskell compiler will guess that we're cheating again :) What about returning the value returned by 'getchar'? See:
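The missing code presumably had 'get2chars' return the fake value produced by its last 'getchar', so that callers can thread it onward (illustrative pseudocode):

```
get2chars :: Int -> (String, Int)
get2chars i0 = ([a, b], i2)  where (a, i1) = getchar i0
                                   (b, i2) = getchar i1

get4chars i0 = (a ++ b)  where (a, i1) = get2chars i0
                               (b, i2) = get2chars i1
```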

Believe it or not, but we've just constructed the whole "monadic"
Haskell I/O system.

3 Welcome to the RealWorld, baby :)

The 'main' Haskell function has the type:

main :: RealWorld -> ((), RealWorld)

where 'RealWorld' is a fake type used instead of our Int. It's something
like the baton passed in a relay race. When 'main' calls some IO function,
it passes the "RealWorld" it received as a parameter. All IO functions have
similar types involving RealWorld as a parameter and result. To be
exact, "IO" is a type synonym defined in the following way:

type IO a = RealWorld -> (a, RealWorld)

So, 'main' just has type "IO ()", 'getChar' has type "IO Char" and so
on. You can think of the type "IO Char" as meaning "take the current RealWorld, do something to it, and return a Char and a (possibly changed) RealWorld". Let's look at 'main' calling 'getChar' two times:
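The example is missing here; a reconstruction in the low-level, world-passing pseudocode this section uses:

```
main :: RealWorld -> ((), RealWorld)
main world0 = let (a, world1) = getChar world0
                  (b, world2) = getChar world1
              in ((), world2)
```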

Look at this closely: 'main' passes to first 'getChar' the "world" it
received. This 'getChar' returns some new value of type RealWorld
that gets used in the next call. Finally, 'main' returns the "world" it got
from the second 'getChar'.

Is it possible here to omit any call of 'getChar' if the Char it read is not used? No, because we need to return the "world" that is the result of the second 'getChar' and this in turn requires the "world" returned from the first 'getChar'.

Is it possible to reorder the 'getChar' calls? No: the second 'getChar' can't be called before the first one because it uses the "world" returned from the first call.

Is it possible to duplicate calls? In Haskell semantics - yes, but real compilers never duplicate work in such simple cases (otherwise, the programs generated will not have any speed guarantees).

As we already said, RealWorld values are used like a baton which gets passed
between all routines called by 'main' in strict order. Inside each
routine called, RealWorld values are used in the same way. Overall, in
order to "compute" the world to be returned from 'main', we should perform
each IO procedure that is called from 'main', directly or indirectly.
This means that each procedure inserted in the chain will be performed
just at the moment (relative to the other IO actions) when we intended it
to be called. Let's consider the following program:
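The program itself is missing; a plausible reconstruction in ordinary Haskell (the helper 'ask' is an assumption, chosen to match the exercise in the next paragraph):

```
main = do a <- ask "What is your name?"
          b <- ask "How old are you?"
          return ()

ask s = do putStr s
           readLn
```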

Now you have enough knowledge to rewrite it in a low-level way and
check that each operation that should be performed will really be
performed with the arguments it should have and in the order we expect.

But what about conditional execution? No problem. Let's define the
well-known 'when' operation:

when :: Bool -> IO () -> IO ()
when condition action world = if condition
                                then action world
                                else ((), world)

As you can see, we can easily include IO procedures (actions) in the
execution chain or exclude them from it, depending on data values. If
'condition' is False when 'when' is called, 'action' will never be executed,
because real Haskell compilers, again, never call functions whose results
are not required to calculate the final result (i.e., here, the final "world" value of 'main').

Loops and more complex control structures can be implemented in
the same way. Try it as an exercise!

Finally, you may want to know how much passing these RealWorld
values around the program costs. It's free! These fake values exist solely for the compiler while it analyzes and optimizes the code, but when it gets to assembly code generation, it "suddenly" realizes that this type is like "()", so
all these parameters and result values can be omitted from the final generated code. Isn't it beautiful? :)

4 '>>=' and 'do' notation

All beginners (including me :)) start by thinking that 'do' is some
magic statement that executes IO actions. That's wrong - 'do' is just
syntactic sugar that simplifies the writing of procedures that use IO (and also other monads, but that's beyond the scope of this tutorial). 'do' notation eventually gets translated to statements passing "world" values around like we've manually written above and is used to simplify the gluing of several
IO actions together. You don't need to use 'do' for just one statement; for instance,

main = do putStr "Hello!"

is desugared to:

main = putStr "Hello!"

But nevertheless it's considered Good Style to use 'do' even for one statement
because it simplifies adding new statements in the future.

Let's examine how to desugar a 'do' with multiple statements in the
following example:

main = do putStr "What is your name?"
          putStr "How old are you?"
          putStr "Nice day!"

The 'do' statement here just joins several IO actions that should be
performed sequentially. It's translated to sequential applications
of one of the so-called "binding operators", namely '>>':

main = (putStr "What is your name?")
       >> ( (putStr "How old are you?")
            >> (putStr "Nice day!") )

This binding operator just combines two IO actions, executing them
sequentially by passing the "world" between them:
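The definition of '>>' that belongs here, written in the text's world-passing model (illustrative pseudocode, since the real IO type is abstract):

```
(>>) :: IO a -> IO b -> IO b
(action1 >> action2) world0 =
    let (a, world1) = action1 world0
        (b, world2) = action2 world1
    in (b, world2)
```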

Now you can substitute the definition of '>>' at the places of its usage
and check that the program constructed by the 'do' desugaring is actually the
same as the one we could write by manually manipulating "world" values.

A more complex example involves the binding of variables using "<-":

main = do a <- readLn
          print a

This code is desugared into:

main = readLn >>= (\a -> print a)

As you should remember, the '>>' binding operator silently ignores
the value of its first action and returns as an overall result
the result of its second action only. On the other hand, the '>>=' binding operator (note the extra '=' at the end) allows us to use the result of its first action - it gets passed as an additional parameter to the second one! Look at the definition:

First, what does the type of the second "action" (more precisely, a function which returns an IO action), namely "a -> IO b", mean? By
substituting the "IO" definition, we get "a -> RealWorld -> (b, RealWorld)".
This means that the second action actually has two parameters
- the value of type 'a' actually used inside it, and the value of type RealWorld used for sequencing IO actions. That's always the case - any IO procedure has one
more parameter than you see in its type signature. This
parameter is hidden inside the definition of the type alias "IO".

Second, you can use these '>>' and '>>=' operations to simplify your
program. For example, in the code above we don't need to introduce the
variable, because the result of 'readLn' can be sent directly to 'print':
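The simplified version presumably was just:

```
main = readLn >>= print
```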

where the second argument of '>>=' has the type "a -> IO b". It's the way
the '<-' binding is processed - the name on the left-hand side of '<-' just becomes a parameter of subsequent operations represented as one large IO action. Note also that if 'action1' has type "IO a" then 'x' will just have type "a"; you can think of the effect of '<-' as "unpacking" the IO value of 'action1' into 'x'. Note also that '<-' is not a true operator; it's pure syntax, just like 'do' itself. Its meaning results only from the way it gets desugared.

Look at the next example:

main = do putStr "What is your name?"
          a <- readLn
          putStr "How old are you?"
          b <- readLn
          print (a, b)

This code is desugared into:

main = putStr "What is your name?" >> readLn >>= \a ->
       putStr "How old are you?" >> readLn >>= \b ->
       print (a, b)

I omitted the parentheses here; both the '>>' and the '>>=' operators are
left-associative, but lambda-bindings always stretch as far to the right as possible, which means that the 'a' and 'b' bindings introduced
here are valid for all remaining actions. As an exercise, add the
parentheses yourself and translate this procedure into the low-level
code that explicitly passes "world" values. I think it should be enough to help you finally realize how the 'do' translation and binding operators work.

Oh, no! I forgot the third monadic operator - 'return'. It just
combines its two parameters - the value passed and "world":

return :: a -> IO a
return a world0 = (a, world0)

How about translating a simple example of 'return' usage? Say,

main = do a <- readLn
          return (a*2)

Programmers with an imperative language background often think that
'return' in Haskell, as in other languages, immediately returns from
the IO procedure. As you can see in its definition (and even just from its
type!), such an assumption is totally wrong. The only purpose of using
'return' is to "lift" some value (of type 'a') into the result of
a whole action (of type "IO a") and therefore it should generally be used only as the last executed statement of some IO sequence. For example try to
translate the following procedure into the corresponding low-level code:

main = do a <- readLn
          when (a >= 0) $ do
              return ()
          print "a is negative"

and you will realize that the 'print' statement is executed even for non-negative values of 'a'. If you need to escape from the middle of an IO procedure, you can use the 'if' statement:

main = do a <- readLn
          if (a >= 0)
            then return ()
            else print "a is negative"

Moreover, Haskell layout rules allow us to use the following layout:

main = do a <- readLn
          if (a >= 0) then return ()
            else do
          print "a is negative"
          ...

that may be useful for escaping from the middle of a longish 'do' statement.

Last exercise: implement a function 'liftM' that lifts operations on
plain values to the operations on monadic ones. Its type signature:

liftM :: (a -> b) -> (IO a -> IO b)

If that's too hard for you, start with the following high-level
definition and rewrite it in low-level fashion:

liftM f action = do x <- action
                    return (f x)

5 Mutable data (references, arrays, hash tables...)

As you should know, every name in Haskell is bound to one fixed (immutable) value. This greatly simplifies understanding algorithms and code optimization, but it's inappropriate in some cases. As we all know, there are plenty of algorithms that are simpler to implement in terms of updatable
variables, arrays and so on. This means that the value associated with
a variable, for example, can be different at different execution points,
so reading its value can't be considered as a pure function. Imagine,
for example, the following code:
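The code being discussed is missing; a sketch of the kind of impure pseudocode meant, where 'readVariable' and 'writeVariable' are hypothetical non-IO functions:

```
main = do let a0 = readVariable varA
              _  = writeVariable varA 1
              a1 = readVariable varA
          print (a0, a1)
```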

Does this look strange? First, the two calls to 'readVariable' look the same, so the compiler can just reuse the value returned by the first call. Second,
the result of the 'writeVariable' call isn't used so the compiler can (and will!) omit this call completely. To complete the picture, these three calls may be rearranged in any order because they appear to be independent of each
other. This is obviously not what was intended. What's the solution? You already know this - use IO actions! Using IO actions guarantees that:

the execution order will be retained as written

each action will have to be executed

the result of the "same" action (such as "readVariable varA") will not be reused

Here, 'varA' has the type "IORef Int" which means "a variable (reference) in
the IO monad holding a value of type Int". newIORef creates a new variable
(reference) and returns it, and then read/write actions use this
reference. The value returned by the "readIORef varA" action depends not
only on the variable involved but also on the moment this operation is performed so it can return different values on each call.

Arrays, hash tables and any other _mutable_ data structures are
defined in the same way - for each of them, there's an operation that creates new "mutable values" and returns a reference to it. Then special read and write
operations in the IO monad are used. The following code shows an example
using mutable arrays:
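The example itself is missing; a minimal version using Data.Array.IO that matches the description below (array bounds 1..10, initial value 37):

```haskell
import Data.Array.IO

main = do arr <- newArray (1, 10) 37 :: IO (IOArray Int Int)
          a <- readArray arr 1     -- read initial value of element 1
          writeArray arr 1 64      -- overwrite it
          b <- readArray arr 1     -- read it again
          print (a, b)
```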

Here, an array of 10 elements with 37 as the initial value at each location is created. After reading the value of the first element (index 1) into 'a' this element's value is changed to 64 and then read again into 'b'. As you can see by executing this code, 'a' will be set to 37 and 'b' to 64.

Other state-dependent operations are also often implemented as IO
actions. For example, a random number generator should return a different
value on each call. It looks natural to give it a type involving IO:

rand :: IO Int

Moreover, when you import C routines you should be careful - if this
routine is impure, i.e. its result depends on something in the "real
world" (file system, memory contents...), internal state and so on,
you should give it an IO type. Otherwise, the compiler can
"optimize" repetitive calls of this procedure with the same parameters! :)

For example, we can write a non-IO type for:

foreign import ccall
   sin :: Double -> Double

because the result of 'sin' depends only on its argument, but

foreign import ccall
   tell :: Int -> IO Int

If you declare 'tell' as a pure function (without IO), then you may
get the same position on each call! :)

6 IO actions as values

By this point you should understand why it's impossible to use IO
actions inside non-IO (pure) procedures. Such procedures just don't
get a "baton"; they don't know any "world" value to pass to an IO action.
The RealWorld type is an abstract datatype, so pure functions also can't construct RealWorld values by themselves, and it's a strict type, so 'undefined' also can't be used. So, the prohibition of using IO actions inside pure procedures is just a type system trick (as it usually is in Haskell :)).

But while pure code can't _execute_ IO actions, it can work with them
as with any other functional values - they can be stored in data
structures, passed as parameters, returned as results, collected in
lists, and partially applied. But an IO action will remain a
functional value because we can't apply it to the last argument - of
type RealWorld.

In order to _execute_ the IO action we need to apply it to some
RealWorld value. That can be done only inside some IO procedure,
in its "actions chain". And real execution of this action will take
place only when this procedure is called as part of the process of
"calculating the final value of world" for 'main'. Look at this example:
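The example is missing; a reconstruction in the low-level world-passing pseudocode, matching the description that follows:

```
main world0 = let get2chars = getChar >> getChar
                  ((), world1) = putStr "Press two keys" world0
                  (answer, world2) = get2chars world1
              in ((), world2)
```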

Here we first bind a value to 'get2chars' and then write a binding
involving 'putStr'. But what's the execution order? It's not defined
by the order of the 'let' bindings, it's defined by the order of processing
"world" values! You can arbitrarily reorder the binding statements - the execution order will be defined by the data dependency with respect to the
"world" values that get passed around. Let's see what this 'main' looks like in the 'do' notation:
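A plausible 'do'-notation version of the same 'main', keeping only the 'let' binding for 'get2chars':

```
main = do let get2chars = getChar >> getChar
          putStr "Press two keys"
          get2chars
          return ()
```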

As you can see, we've eliminated two of the 'let' bindings and left only the one defining 'get2chars'. The non-'let' statements are executed in the exact order in which they're written, because they pass the "world" value from statement to statement as we described above. Thus, this version of the function is much easier to understand because we don't have to mentally figure out the data dependency of the "world" value.

Moreover, IO actions like 'get2chars' can't be executed directly
because they are functions with a RealWorld parameter. To execute them,
we need to supply the RealWorld parameter, i.e. insert them in the 'main'
chain, placing them in some 'do' sequence executed from 'main' (either directly in the 'main' function, or indirectly in an IO function called from 'main'). Until that's done, they will remain like any function, in partially
evaluated form. And we can work with IO actions as with any other
functions - bind them to names (as we did above), save them in data
structures, pass them as function parameters and return them as results - and
they won't be performed until you give them the magic RealWorld
parameter!

6.1 Example: a list of IO actions
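The list definition that the next paragraph refers to is missing; it presumably looked like this (the specific actions are assumptions, chosen so that each element has type IO ()):

```haskell
ioActions :: [IO ()]
ioActions = [(print "Hello!"),
             (putStr "just kidding"),
             (getChar >> return ())]
```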

I used additional parentheses around each action, although they aren't really required. If you still can't believe that these actions won't be executed immediately, just recall the real type of this list:

ioActions :: [RealWorld -> ((), RealWorld)]

Well, now we want to execute some of these actions. No problem, just
insert them into the 'main' chain:

main = do head ioActions
          ioActions !! 1
          last ioActions

Looks strange, right? :) Really, any IO action that you write in a 'do'
statement (or use as a parameter for the '>>'/'>>=' operators) is an expression
returning a result of type 'IO a' for some type 'a'. Typically, you use some function that has the type 'x -> y -> ... -> IO a' and provide all the x, y, etc. parameters. But you're not limited to this standard scenario -
don't forget that Haskell is a functional language and you're free to
compute the functional value required (recall that "IO a" is really a function
type) in any possible way. Here we just extracted several functions
from the list - no problem. This functional value can also be
constructed on-the-fly, as we've done in the previous example - that's also
OK. Want to see this functional value passed as a parameter?
Just look at the definition of 'when'. Hey, we can buy, sell, and rent
these IO actions just like we can with any other functional values! For example, let's define a function that executes all the IO actions in the list:
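The definition of 'sequence_' that belongs here was presumably along these lines (we hide the Prelude function of the same name so the sketch stands alone):

```haskell
import Prelude hiding (sequence_)

sequence_ :: [IO a] -> IO ()
sequence_ []     = return ()
sequence_ (x:xs) = do x              -- run the first action
                      sequence_ xs   -- then the rest, in list order
```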

No black magic - we just extract IO actions from the list and insert
them into a chain of IO operations that should be performed one after another (in the same order that they occurred in the list) to "compute the final world value" of the entire 'sequence_' call.

With the help of 'sequence_', we can rewrite our last 'main' function as:

main = sequence_ ioActions

Haskell's ability to work with IO actions as with any other
(functional and non-functional) values allows us to define control
structures of arbitrary complexity. Try, for example, to define a control
structure that repeats an action until it returns the 'False' result:

while :: IO Bool -> IO ()
while action = ???

Most programming languages don't allow you to define control structures at all, and those that do often require you to use a macro-expansion system. In Haskell, control structures are just trivial functions anyone can write.

6.2 Example: returning an IO action as a result

How about returning an IO action as the result of a function? Well, we've done
this each time we've defined an IO procedure - they all return IO actions
that need a RealWorld value to be performed. While we usually just
execute them as part of a higher-level IO procedure, it's also
possible to just collect them without actual execution:
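The missing example probably bound some actions with 'let' without ever running them, along these lines (the specific actions are assumptions):

```haskell
import Control.Monad (when)

main = do let a = when True (print "Hello")  -- built, never executed
              b = getChar >> getChar         -- built, never executed
          putStrLn "These let-bound actions are never executed!"
```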

These assigned IO procedures can be used as parameters to other
procedures, or written to global variables, or processed in some other
way, or just executed later, as we did in the example with 'get2chars'.

But how about returning a parameterized IO action from an IO procedure? Let's define a procedure that returns the i'th byte from a file represented as a Handle:

readi h i = do hSeek h AbsoluteSeek i
               hGetChar h

So far so good. But how about a procedure that returns the i'th byte of a file
with a given name without reopening it each time?
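The answer that belongs here is a constructor that opens the file once and returns a parameterized action reusing the handle; a sketch relying on the 'readi' defined above (the name 'readfilei' is an assumption):

```
readfilei :: String -> IO (Integer -> IO Char)
readfilei fn = do h <- openFile fn ReadMode
                  return (readi h)
```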

This way of using IO actions is very typical for Haskell programs - you
just construct one or more IO actions that you need,
with or without parameters, possibly involving the parameters that your
"constructor" received, and return them to the caller. Then these IO actions
can be used in the rest of the program without any knowledge about your
internal implementation strategy. One thing this can be used for is to
partially emulate the OOP (or more precisely, the ADT) programming paradigm.

6.3 Example: a memory allocator generator

As an example, one of my programs has a module which is a memory suballocator. It receives the address and size of a large memory block and returns two
procedures - one to allocate a subblock of a given size and the other to
free the allocated subblock:
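The type signature that belongs here might be sketched as follows (the exact pointer types are assumptions based on the description):

```
memoryAllocator :: Ptr a -> Int
                -> IO (Int -> IO (Ptr b),   -- alloc: subblock of a given size
                       Ptr c -> IO ())      -- free: release a subblock
```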

How is this implemented? 'alloc' and 'free' work with references
created inside the memoryAllocator procedure. Because the creation of these references is part of memoryAllocator's chain of IO actions, a new, independent set of references will be created for each memory block for which
memoryAllocator is called:
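The implementation is missing; a toy sketch (a bump allocator whose 'free' does nothing - the strategy and all names here are assumptions, shown only to illustrate the closure-over-IORef pattern):

```
memoryAllocator buf size =
    do used <- newIORef 0                      -- bytes handed out so far
       let alloc n = do offset <- readIORef used
                        writeIORef used (offset + n)
                        return (buf `plusPtr` offset)
           free _  = return ()                 -- toy: never reclaims
       return (alloc, free)
```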

What we've defined here is just a pair of closures that use state
available at the moment of their definition. As you can see, it's as
easy as in any other functional language, despite Haskell's lack
of direct support for impure functions.

The following example uses the procedures returned by memoryAllocator to
simultaneously allocate/free blocks in two independent memory buffers:
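A usage sketch (the buffer addresses, sizes, and request sizes are made up):

```
main = do (alloc1, free1) <- memoryAllocator buf1 1000
          (alloc2, free2) <- memoryAllocator buf2 2000
          ptr1 <- alloc1 100
          ptr2 <- alloc2 100
          free1 ptr1
          free2 ptr2
```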

6.4 Example: emulating OOP with record types

Let's implement the classical OOP example: drawing figures. There are
figures of different types: circles, rectangles and so on. The task is
to create a heterogeneous list of figures. All figures in this list should
support the same set of operations: draw, move and so on. We will
represent these operations as IO procedures. Instead of a "class" let's
define a structure containing implementations of all the procedures
required:
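The record under discussion presumably looked something like this (the field names and the Displacement type are assumptions matching the operations named above):

```haskell
data Figure = Figure { draw :: IO ()                  -- draw the figure
                     , move :: Displacement -> IO ()  -- move it on screen
                     }

type Displacement = (Int, Int)  -- horizontal and vertical displacement
```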

Now let's define "full-featured" figures that can actually be
moved around. In order to achieve this, we should provide each figure
with a mutable variable that holds each figure's current screen location. The
type of this variable will be "IORef Point". This variable should be created in the figure constructor and manipulated in IO procedures (closures) enclosed in
the Figure record:
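A constructor sketch for a movable circle in that style (the Point and Radius types and all drawing details are assumptions):

```
circle :: Point -> Radius -> IO Figure
circle center radius = do
    centerVar <- newIORef center     -- mutable current location
    let drawF = do c <- readIORef centerVar
                   putStrLn ("Circle at " ++ show c)
        moveF (dx, dy) = modifyIORef centerVar (\(x, y) -> (x + dx, y + dy))
    return (Figure { draw = drawF, move = moveF })
```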

It's important to realize that we are not limited to including only IO actions
in a record that's intended to simulate a C++/Java-style interface. The record can also include values, IORefs, pure functions - in short, any type of data. For example, we can easily add to the Figure interface fields for area and origin:

7 Exception handling (under development)

Although Haskell provides a set of exception raising/handling features comparable to those in popular OOP languages (C++, Java, C#), this part of the language receives much less attention. The first reason is that you mostly just don't need to pay attention - most of the time it simply works "behind the scenes". The second reason is that Haskell, lacking OOP inheritance, doesn't allow you to easily subclass exception types, which limits the flexibility of exception handling.

First, the Haskell RTS raises more exceptions than traditional languages do - pattern match failures, calls with invalid arguments (such as head []), and computations whose results depend on the special values undefined and error "...." all raise their own exceptions:

example 1:

main = print (f 2)

f 0 = "zero"
f 1 = "one"

example 2:

main = print (head [])

example 3:

main = print (1 + (error "Value that wasn't initialized or cannot be computed"))

This allows programs to be written in a much more error-proof way.

8 Interfacing with C/C++ and foreign libraries (under development)

While Haskell is great for algorithm development, speed isn't its strongest side. We can combine the best of both worlds, though, by writing the speed-critical parts of a program in C and the rest in Haskell. We just need a way to call C functions from Haskell and vice versa, and to marshal data between the two worlds.

We also need to interact with the C world to use Windows/Linux APIs and to link to various libraries and DLLs. Even interfacing with other languages often requires going through C as the "common denominator". Appendix [6] to the Haskell'98 standard provides a complete description of interfacing with C.

We will learn the FFI via a series of examples. These examples include C/C++ code, so a C/C++ compiler needs to be installed; the same is true if you need to include code written in C/C++ in your program (a C/C++ compiler is not required if you just need to link against existing libraries providing APIs with the C calling convention). On Unix (and Mac OS?) systems, the system-wide default C/C++ compiler is typically used by the GHC installation. On Windows, no default compiler exists, so GHC is typically shipped with a C compiler, and you may find on the download page a GHC distribution bundled with C and C++ compilers. Alternatively, you may find and install a gcc/mingw32 version compatible with your GHC installation.

If you need to make your C/C++ code as fast as possible, you may compile it with the Intel compilers instead of gcc. However, these compilers are not free; moreover, on Windows, code compiled by the Intel compilers may interact with GHC-compiled code only if one of them is put into a DLL (due to RTS incompatibility) [not checked! please correct if I'm wrong].

8.1 Calling functions

First, we will learn how to call C functions from Haskell and Haskell functions from C. The first example consists of three files:

Or, you may compile C module(s) separately and link in the .o files (this may be preferable if you use make and don't want to recompile unchanged sources; ghc's --make option provides smart recompilation only for .hs files):

ghc -c evil.c
ghc --make main.hs evil.o

You may use gcc/g++ directly to compile your C/C++ files, but I recommend doing the linking via ghc because it adds a lot of libraries required for the execution of Haskell code. For the same reason, even if your main routine is written in C/C++, I recommend calling it from the Haskell function main - otherwise you'll have to explicitly initialize and shut down the GHC RTS (run-time system).

8.2 All about "foreign" statement

The "ccall" specifier in foreign statements means use of the C (not C++!) calling convention. This means that if you want to write the external function in C++ (instead of C) you should add an extern "C" specification to its declaration - otherwise you'll get a linking error. Let's rewrite our first example to use C++ instead of C:

where evil.cpp is just the renamed evil.c from the first example. Note that the new prototypes.h is written in a manner that allows it to be compiled both as C and as C++ code. When it's included from evil.cpp, it's compiled as C++ code. When GHC compiles main.hs via the C compiler (enabled by the -fvia-C option), it also includes prototypes.h, but compiles it in C mode. That's why you need to specify .h files in "foreign" declarations - depending on the Haskell compiler you use, these files may be included to check the consistency of the C and Haskell declarations.

The quoted part of a foreign statement may also be used to import or export a function under another name:

specifies that the C function called CFunction will become known as the Haskell function c_function, while the Haskell function haskell_function will be known in the C world as HaskellFunction. This is required when the C name doesn't conform to Haskell naming requirements.
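The example itself was elided here; using the names from the paragraph above, it was presumably a pair of declarations like this fragment (it assumes a C function CFunction and a Haskell function haskell_function exist elsewhere):

```
foreign import ccall unsafe "CFunction"
    c_function :: CInt -> IO CInt

foreign export ccall "HaskellFunction"
    haskell_function :: CInt -> IO CInt
```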

Although the Haskell FFI standard describes many other calling conventions in addition to ccall - cplusplus, jvm, net - current Haskell implementations support only ccall and stdcall. The latter, also known as the Pascal calling convention, is used to interface with the WinAPI:
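The WinAPI example was elided; it might have looked like this fragment (Windows-only; MessageBeep is a real WinAPI function, chosen here purely as an illustration):

```
foreign import stdcall unsafe "windows.h MessageBeep"
    messageBeep :: CUInt -> IO CInt
```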

And finally, about the safe/unsafe specifier: a C function imported with the "unsafe" keyword is called directly, and the Haskell runtime is stopped while the C function executes (when there are several OS threads executing the Haskell program, only the current OS thread is delayed). Such a call is not allowed to recursively re-enter the Haskell world by calling any Haskell function - the Haskell RTS is just not prepared for such an event. On the other hand, unsafe calls are as quick as calls in the C world. They are ideal for "momentary" calls that quickly return to the caller.

When "safe" is specified, C function called in safe environment - Haskell execution context is saved, so it's possible to call back to Haskell and, if C call executed too much time, other OS thread may be started to execute Haskell code (of course, in threads other that one called C code). This has its own price, though - around 1000 CPU ticks per call.

You can read more about the interaction between FFI calls and Haskell concurrency in [7].

8.3 Marshalling simple types

Calling by itself is relatively easy; the real problem of interfacing languages with different data models is passing data between them. There is not even a guarantee that Haskell's Int is the same type as C's int, that Haskell's Double is the same as C's double, and so on. While on *some* platforms they are the same, and you can write throw-away programs that rely on this, the goal of portability requires you to declare imported and exported functions using the special types described in the FFI standard, which are guaranteed to correspond to C types. These are:
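The list of types was elided here; these types live in the Foreign.C.Types module. A partial sketch (the type names are real; the printed sizes depend on the platform's C ABI):

```haskell
import Foreign.C.Types (CChar, CInt, CLong, CDouble)
import Foreign.Storable (sizeOf)

-- CChar ~ char, CShort ~ short, CInt ~ int, CLong ~ long,
-- CFloat ~ float, CDouble ~ double, CSize ~ size_t, ...
main :: IO ()
main = do
  print (sizeOf (undefined :: CInt))     -- matches the platform's C int
  print (sizeOf (undefined :: CDouble))  -- matches the platform's C double
```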

Note that pure C functions (whose results depend only on their arguments) are imported without IO in their return type. "const" C specifiers are not reflected in Haskell types, so the corresponding compiler checks are not performed.

All these numeric types are instances of the same classes as their Haskell cousins (Ord, Num, Show and so on), so you may perform calculations on this data directly. Alternatively, you may convert them to native Haskell types. It's very typical to write simple wrappers around imported and exported functions just to provide interfaces using native Haskell types:
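The wrapper example was elided; a sketch of the idea, using the C library's sqrt as a stand-in for whatever function you are importing:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types

foreign import ccall unsafe "math.h sqrt"
    c_sqrt :: CDouble -> CDouble

-- wrapper providing an interface with native Haskell types
sqrt' :: Double -> Double
sqrt' = realToFrac . c_sqrt . realToFrac

main :: IO ()
main = print (sqrt' 4)   -- 2.0
```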

8.6 Marshalling composite types

There is no built-in support for marshalling C structures or using C constants in Haskell. These are implemented in the c2hs preprocessor, though.

Binary marshalling (serializing) of data structures of any complexity is implemented in the Binary library.

8.7 Dynamic calls
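This section is empty in the original; a minimal sketch of what GHC's FFI offers here, assuming the reader wants to pass function pointers across the boundary. A "wrapper" import turns a Haskell function into a C function pointer, and a "dynamic" import calls through a function pointer obtained at run time:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types
import Foreign.Ptr

-- "wrapper": turn a Haskell function into a C function pointer
foreign import ccall "wrapper"
    mkFun :: (CInt -> CInt) -> IO (FunPtr (CInt -> CInt))

-- "dynamic": call through a C function pointer
foreign import ccall "dynamic"
    callFun :: FunPtr (CInt -> CInt) -> (CInt -> CInt)

main :: IO ()
main = do
  fp <- mkFun (+ 1)
  print (callFun fp 41)   -- 42
  freeHaskellFunPtr fp    -- wrappers must be freed explicitly
```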

8.8 DLLs

Because I don't have experience using DLLs, can someone write this section? Ultimately, we need to cover the following tasks:

using DLLs of 3rd-party libraries (such as ziplib)

putting your own C code into a DLL to use in Haskell

putting Haskell code into a DLL which may be called from C code

8.9 HSFFIG (C libraries binding generator)

HSFFIG is a utility that automatically generates bindings to C libraries by analyzing C header files.

9 Dark side of IO monad

9.1 unsafePerformIO

Programmers coming from an imperative language background often look for a way to execute IO actions inside a pure procedure. But what does this mean?
Imagine that you're trying to write a procedure that reads the contents of a file with a given name, and you try to write it as a pure (non-IO) function:

readContents :: FilePath -> String

Defining readContents as a pure function will certainly simplify the code that uses it. But it will also create problems for the compiler:

This call is not inserted in a sequence of "world transformations", so the compiler doesn't know at what exact moment you want to execute this action. For example, if the file has one kind of contents at the beginning of the program and another at the end - which contents do you want to see? You have no idea when (or even if) this function is going to get invoked, because Haskell sees this function as pure and feels free to reorder the execution of any or all pure functions as needed.

Attempts to read the contents of files with the same name can be factored (i.e. reduced to a single call) despite the fact that the file (or the current directory) can be changed between calls. Again, Haskell considers all non-IO functions to be pure and feels free to omit multiple calls with the same parameters.

So, implementing pure functions that interact with the Real World is
considered to be Bad Behavior. Good boys and girls never do it ;)

Nevertheless, there are (semi-official) ways to use IO actions inside
of pure functions. As you should remember this is prohibited by
requiring the RealWorld "baton" in order to call an IO action. Pure functions don't have the baton, but there is a special "magic" procedure that produces this baton from nowhere, uses it to call an IO action and then throws the resulting "world" away! It's a little low-level magic :) This very special (and dangerous) procedure is:

where 'createNewWorld' is an internal function producing a new value of
the RealWorld type.

Using unsafePerformIO, you can easily write pure functions that do
I/O inside. But don't do this without a real need, and remember to
follow this rule: the compiler doesn't know that you are cheating; it still
considers each non-IO function to be a pure one. Therefore, all the usual
optimization rules can (and will!) be applied to its execution. So
you must ensure that:

The result of each call depends only on its arguments.

You don't rely on side effects of this function, which may not be executed if its results are not needed.

Let's investigate this problem more deeply. Function evaluation in Haskell
is determined by a value's necessity - the language computes only the values that are really required to calculate the final result. But what does this mean with respect to the 'main' function? To calculate the "final world's" value, you need to perform all the intermediate IO actions that are included in the 'main' chain. By using 'unsafePerformIO' we call IO actions outside of this chain. What guarantee do we have that they will be run at all? None. The only time they will be run is if running them is required to compute the overall function result (which in turn should be required to perform some action in the
'main' chain). This is an example of Haskell's evaluation-by-need strategy. Now you should clearly see the difference:

- An IO action inside an IO procedure is guaranteed to execute as long as
it is (directly or indirectly) inside the 'main' chain - even when its result isn't used (because the implicit "world" value it returns will be used). You directly specify the order of the action's execution inside the IO procedure. Data dependencies are simulated via the implicit "world" values that are passed from each IO action to the next.

- An IO action inside 'unsafePerformIO' will be performed only if the
result of this operation is really used. The evaluation order is not
guaranteed, and you should not rely on it (except when you're sure about
whatever data dependencies may exist).

I should also say that inside an 'unsafePerformIO' call you can organize
a small internal chain of IO actions with the help of the same binding
operators and/or 'do' syntactic sugar we've seen above. For example, here's a particularly convoluted way to compute the integer that comes after zero:
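The example was elided; a reconstruction of what it presumably looked like, using an IORef inside unsafePerformIO:

```haskell
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- a convoluted way to compute 1: the whole chain of IO actions
-- runs whenever the value of 'one' is demanded
one :: Int
one = unsafePerformIO $ do
        var <- newIORef 0
        modifyIORef var (+ 1)
        readIORef var

main :: IO ()
main = print one   -- 1
```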

and in this case ALL the operations in this chain will be performed as
long as the result of the 'unsafePerformIO' call is needed. To ensure this,
the actual 'unsafePerformIO' implementation evaluates the "world" returned
by the 'action':
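A simplified sketch of how that forcing works (pseudocode in the spirit of the GHC source, not the literal code):

```
unsafePerformIO :: IO a -> a
unsafePerformIO (IO action) =
    case action realWorld# of
      (# _world', result #) -> result   -- forcing the pair runs the action
```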

9.2 inlinePerformIO

Semantically, inlinePerformIO = unsafePerformIO,
in as much as either of those has any semantics at all.

The difference, of course, is that inlinePerformIO is even less safe than
unsafePerformIO. While GHC will try not to duplicate or common up
different uses of unsafePerformIO, it aggressively inlines
inlinePerformIO. So you can really only use it where the IO content is
really properly pure, like reading from an immutable memory buffer (as
in the case of ByteStrings). However, things like allocating new buffers
should not be done inside inlinePerformIO, since that can easily be
floated out and performed just once for the whole program, so you would end up
with many things sharing the same buffer, which would be bad.

So the rule of thumb is that IO actions wrapped in unsafePerformIO have
to be externally pure, while with inlinePerformIO they have to be really,
really pure, or it'll all go horribly wrong.

That said, here's some really hairy code. This should frighten any pure
functional programmer...

This does not adhere to my rule of thumb above. Don't ask exactly why we
claim it's safe :-) (and if anyone really wants to know, ask Ross
Paterson who did it first in the Builder monoid)

9.3 unsafeInterleaveIO

But there is an even stranger operation called 'unsafeInterleaveIO' that
gets the "official baton", makes its own pirate copy, and then runs
an "illegal" relay-race in parallel with the main one! I can't talk further
about its behavior without causing grief and indignation, so it's no surprise
that this operation is widely used in countries that are hotbeds of software piracy such as Russia and China! ;) Don't even ask me - I won't say anything more about this dirty trick I use all the time ;)

One can use unsafePerformIO (not unsafeInterleaveIO) to perform I/O
operations not in a predefined order, but on demand. For example, the
following code:

do let c = unsafePerformIO getChar
   do_proc c

will perform the getChar I/O call only when the value of c is really
required by the calling code, i.e. the call will be performed lazily,
like any usual Haskell computation.
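The next example was elided; it presumably built a list of three lazily-read characters, roughly like this sketch (the name get3chars is assumed here):

```
get3chars :: [Char]
get3chars = [ unsafePerformIO getChar
            , unsafePerformIO getChar
            , unsafePerformIO getChar ]
```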

The three chars inside this list will be computed on demand too, and this
means that their values will depend on the order in which they are consumed.
That is not what we usually need :)

unsafeInterleaveIO solves this problem - it performs I/O only on
demand, but allows you to define the exact *internal* execution order for
parts of your data structure. That is why I wrote that unsafeInterleaveIO
makes an illegal copy of the baton :)

First, unsafeInterleaveIO takes an (IO a) action as a parameter and returns
a value of type 'a':

do str <- unsafeInterleaveIO myGetContents

Second, unsafeInterleaveIO doesn't perform any action immediately; it
only creates a box of type 'a' which, when its value is requested, will
perform the action specified as the parameter.

Third, this action by itself may compute the whole value immediately
or... use unsafeInterleaveIO again to defer the calculation of some
sub-components:
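The elided definition of myGetContents presumably looked something like the following sketch (the name comes from the surrounding text; the shape is inferred from the step-by-step description below):

```
myGetContents :: IO String
myGetContents = do
   c <- getChar
   s <- unsafeInterleaveIO myGetContents
   return (c:s)
```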

This code will be executed only at the moment when the value of str is
really demanded. At that moment, getChar will be performed (with its
result assigned to c) and one more lazy IO box will be created - for s.
This box again contains a link to the myGetContents call.

Then a list cell is returned that contains the one char just read, plus a
link to the myGetContents call as the way to compute the rest of the list.
Only at the moment when the next value in the list is required will this
operation be performed again.

As a final result, we can't read the second char in the list before the
first one, but reading as a whole remains lazy. Bingo!

PS: of course, real code should include EOF checking. Also note that
you can read many chars/records in each call:
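A hedged sketch of such a variant of myGetContents, with EOF checking added (isEOF is from System.IO):

```
myGetContents :: IO String
myGetContents = do
   eof <- isEOF
   if eof
     then return []
     else do c <- getChar
             s <- unsafeInterleaveIO myGetContents
             return (c:s)
```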

A little disclaimer: I should say that I'm not describing
here exactly what a monad is (I don't even completely understand it myself), and my explanation shows only one _possible_ way to implement the IO monad in
Haskell. For example, the hbc Haskell compiler implements the IO monad via
continuations. I also haven't said anything about exception handling,
which is a natural part of the "monad" concept. You can read the "All About
Monads" guide to learn more about these topics.

But there is some good news: first, the IO monad understanding you've just acquired will work with any implementation and with many other monads. You just can't work with RealWorld
values directly.

Second, the IO monad implementation described here is really used in the GHC,
yhc/nhc (Hugs/jhc, too?) compilers. Here is the actual IO definition
from the GHC sources:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

It uses the "State# RealWorld" type instead of our RealWorld, it uses the "(# #)" strict tuple for optimization, and it adds an IO data constructor
around the type. Nevertheless, there are no significant changes from the standpoint of our explanation. Knowing the principle of "chaining" IO actions via fake "state of the world" values, you can now easily understand and write low-level implementations of GHC I/O operations.

This implementation makes the "World" disappear somewhat, and returns either a
result of type "a", or, if an error occurs, an "IOError". The lack of the World on the right-hand side of the function type can only be done because the compiler knows special things about the IO type, and won't over-optimise it.