Sunday, January 1, 2012

Haskell for C Programmers - For Loops

C programmers make incredibly heavy use of for loops and when they switch over to Haskell they have trouble writing idiomatic code because Haskell doesn't provide an analagous programming construct. Simon Peyton Jones calls Haskell "the world's finest imperative programming language", so I'll take a few simple C code examples and translate them to exactly equivalent Haskell and then compare them to idiomatic Haskell.

sum :: [Double] -> Int -> Double
sum xs n = flip evalState def $ do
result ~= 0
for (i ~= 0) (liftM (< n) (access i)) (i += 1) $
i'
However, we don't really need to use the length, because Control.Monad provides a way to call an action once for each element of a list, namely forM. This eliminates the n and i variables completely and removes any chance of an out-of-bounds run-time error by providing the wrong list length. forM resembles other languages' foreach constructs, except it is more general:

Hopefully you can see that Haskell's forM/mapM replace C's for loops for monadic code, but you shouldn't rely on them like a crutch. For example, if I wanted to print all pair-wise combinations of two numbers from 1 to 10, I could write:

forM_ [1..10] $ \i ->
forM_ [1..10] $ \j ->
print (i, j)

... which looks an awful lot like a nested for-loop you'd write in C. But I could also write it like:

mapM_ print $ do
i
... or you could get even fancier and use the Applicative class:

mapM_ print $ (,) [1..10] [1..10]

... which almost seems like cheating.

Modularity

Idiomatic Haskell separates data generation from data consumption and the last function demonstrates that. You can partition the function into a generator and consumer of data:

generator = (,) [1..10] [1..10]
consumer = mapM_ print

This design pattern separate concerns and creates more modular code.

You might think this doesn't work when you need to process data before all of it is available, but you'd be wrong. Haskell uses enumerators and "iteratees" to solve precisely that problem. You write generator and consumer code independently, yet they still magically "fuse" into the correct data-processing pipeline when you combine them, handling data as it is generated. I will discuss these in a later post.

This leads to a good rule of thumb: strive to separate data generation from consumption, even when not programming in Haskell.

6 comments:

For a newcomer to Haskell, this blog is not easy. Without introducing type system of haskell everything seems hazy. May be if you could write a blog to mark the correct order of studying your other blogs then it could be immensely helpful.Thanks for your efforts!

This is a big problem of Haskell. Unless you completely scrap everything you know, and learn EVERYTHING in haskell from examples. Only THEN can you look at your old code, spend a few dozen of hours figuring out what the program has to do (as it was initially designed in a testable imperative way, and haskell does not support imperative constructs), then you can, using the big dictionary of expressions defined in Haskell, write your program in a way which makes sense in terms of functional programming. After it works, you can rewrite it back as imperative program (this transition is easier) while knowing that it does exactly what you want it to do, without any distractions arising from data-loss during optimization step which sacrifices expressiveness for efficiency.

You'd be right if you had made that remark a few years ago, back when Haskell was essentially playing catch-up with imperative languages. Things are different now and Haskell is slowly but surely beating imperative languages at their own game. I can think of at least four killer Haskell features off of the top of my head where Haskell is best-in-class, many of which are virtually inexistent in mainstream languages:

I think you would have difficulty writing anything even remotely as powerful as that in an imperative language.

Now combine that with a type system that makes programs immensely more bug-free and maintainable. Sure, you can translate a Haskell program back to an imperative one, but you'll wish you hadn't the moment your client requests new features or for you to rearchitect your code, both of which are significantly easier in Haskell.

Haskell's strong and expressive type system acts like a silent guardian protecting you from your mistakes, so you feel much braver about adding new features or re-engineering large swaths of code, even more so than if you had constructed a very large and comprehensive test suite in another language. This alone is the greatest contributor to productivity in Haskell, since you spend less time writing tests and fixing bugs and more time actually implementing features, the way programming was meant to be.

However, I want to caution you that pure functions are not the solution to everything. Haskell is not an excellent programming language because everything is done purely and functionally. Rather, Haskell is an excellent language because functional programming is the perfect foundation for rigorously building more sophisticated abstractions. Imperative languages can't do this because they can't opt-out of state and side effects, making it incredibly difficult to reason about the correctness of abstractions, whereas Haskell gets to opt-in whenever it wants.

If you haven't read my category design pattern post then please do, because there I argue that the key to powerful, easy, and reliable programming is not just functions, but rather categories in general. Of the four killer features I mentioned above, only one actually uses the category of Haskell functions (lenses). I consider the overemphasis on pure functional style to be the biggest cultural weakness of the Haskell community.

It took some time for me to realize that this examples don't work anymore. Probably because of some old version of lens. Here is my attemp to rewrite them (this is one of my first experiences of using lenses ;) ):