Introduction to Haskell, Part 2: Pure Functions

General purpose programming languages can be divided into two broad categories: Haskell and everything else.

In Part 1 of this series, I described how all programming languages are grouped into four families: imperative, object-oriented, logic, and functional. Most programming languages start with an imperative core and add object-oriented or functional features on top. C, for example, is a purely imperative programming language. Languages like C++, Java, and C# build on C and wrap up this imperative model with objects. Perl, Python, and Ruby add a little bit of functional programming into the mix as well.

Haskell is different because it is a purely functional language. That is, it supports only the functional style of programming. This unconventional approach gives Haskell a reputation of being a difficult language to learn and use, especially among professional programmers who casually jump around between languages like Java, Perl, Python, PHP, and Ruby. Most languages focus on what features to add and blend to make developers more productive. Haskell takes the opposite approach, removing needless features until there is nothing left to take away.

Pure vs. Impure Functions

Functions are the heart of Haskell. However, functions in Haskell aren't like functions in other languages. Haskell functions are pure, meaning that they define a relationship between inputs and outputs, and have no side effects. This behavior mimics the notion of function you learned in algebra many years ago. Operations like square root, sine, and cosine are pure functions; the square root of four will always be two, regardless of how, where, or when it is computed.

Impure functions refer to external state, modify their behavior based on external state, or change external state. C functions like fopen(), printf(), and malloc() are examples. Calling malloc() returns a pointer to a different block of memory each and every time it is called, but when you attempt to allocate more than the operating system can provide, malloc() will fail. Similarly, if you attempt to allocate a large block of memory, malloc() may succeed, but cause the operating system to swap considerably, degrading performance for the entire machine. Clearly, impure functions are necessary, but unpredictable, because their behavior varies based on how, when, and where they are invoked.

Both kinds of functions are necessary to model the real world. Converting temperatures from Fahrenheit to Celsius is a pure function; by definition, 212&deg F should always convert to 100&deg C. Converting U.S. dollars to U.K. pounds is an impure function that is determined by a very large number of external variables, and there is no expectation that performing the same transaction at two different times or two different places should produce the same result.

Of course, to be a truly general purpose programming language, Haskell needs to support both pure and impure functions. Haskell makes impure functions available, but isolates them through the use of monads, which are actually built from pure functions. Monads are a deep and interesting topic, and are discussed in the next part of this series.

Programming with Pure Functions

Because pure functions refer only to their inputs, they are deterministic, easy to understand, and easy to predict. For example, a function like:

square x = x * x

will always receive a single argument, and return the value of multiplying that argument by itself. Therefore, an expression like square 2 always returns 4. A function that computes 2 * 2and writes a message to the console or a logfile is an impure function whose behavior cannot be predicted; the computation may succeed, but the function may fail, due to the logging operation. Logging may succeed most of the time, but there is no way to predict whether this function will attempt to write to a closed file handle, write to a file on a filesystem that is no longer available, write to a full filesystem, or any of a number of other I/O failures.

The key to understanding pure functions is to think in terms of small, composable units of code. Simple operations like addition, multiplication, sine, cosine, and concatenation are all pure functions. Pure functions can be composed from other pure functions, like these here:

f x = (3 * x * x) + (6 * x) + 9

Many functions depend on more than one value, and accept multiple arguments:

min a b = if a < b
then a
else b

Note that Haskell can use indentation to signal when an expression is defined over multiple lines. This is known as the off-side rule, and is similar to Python's rules for block scoping. Unlike Python, Haskell's off-side rule is merely a convention for inferring where to place curly braces and semicolons around a series of expressions and statements. Either form is valid, but using the off-side rule is the more convenient and more common form. (The Haskell 98 Report defines the precise interpretation of the layout rule.)

Some functions are complex and refer to more involved sub-expressions. Here is a function that calculates the cost of a shopping basket, where some items are taxable, and others are not:

This example defines two functions, taxRate, which returns a constant value, and total, which computes the total cost of the list of items in a shopping cart. (Although the taxRate definition appears to be defining a variable, it's best to think of it as a constant function, a function that takes no parameters and always returns the same value.) The definition of total is quite expressive, and highlights the intent of the function, by isolating and naming important sub-expressions in the computation. (total also refers to an isTaxable function, not presented here.)

The where clause in total defines meaningful sub-expressions that are used to evaluate the primary expression, subtotal + tax. The subtotal, as you would expect, is just the sum of all the items in the cart. The definition of tax refers to taxable, which is the subset of all items in the cart that are taxable (more precisely, the set of items in the cart where isTaxable is true for that item). Computing the tax involves computing the sum of the taxable subset, and multiplying it by the tax rate. The two utility functions sum and filter do what their names imply and are predefined in the Prelude, Haskell's standard library of useful functions.

We could also define total by avoiding the sub-expressions in the where clause, and writing the entire computation out in a fully expanded form. This version is just as correct, but much harder for people to understand:

total cart = (sum cart) +
(sum (filter isTaxable cart)) * taxRate

The compiler doesn't care how you write these functions, because they are equivalent expressions of the same idea. In fact, the compiler is allowed to rewrite the first definition into the second. However, using tools like where clauses and libraries of useful and reusable functions can help make functions concise and expressive, and easier for people to read.

Real-World Functional Programming

Haskell has deep roots in the world of mathematics, and it is easy to see how it can be used to define simple mathematical functions. However, real-world problems have very little in common with simple mathematical functions.

Or do they?

Consider a program that needs to find the 10 most-used words in a document. Assume for the time being that the document is available as a string value. (Reading a file from disk or standard input would require using the I/O monad, which will be discussed in the next article in this series.)

Fortunately, there are a variety of pre-defined functions in the Prelude that can make this job simple. The first step is to convert a document into a list of words. This is precisely the operation performed by the words function in the Prelude:

top10 doc = ....
where
listOfWords = words doc
...

Next, the list of words needs to be normalized so that variations in capitalization don't sway the results. One way to do this is to convert every word to lowercase. However, there is no lowercase that converts every uppercase character in a string to its lowercase equivalent. But there is a toLower function in the Data.Char library that will lowercase one character. Using map we can apply this transformation to each character in a string:

import Data.Char
lowercase str = map toLower str

Or, more consisely:

import Data.Char
lowercase = map toLower

The Haskell compiler can infer that lowercase must take a string (a list of characters) as an argument, because it knows that map takes two arguments (a function and a list), and toLower converts one character into another. Because map toLower needs one argument, lowercase must need the same argument.

Updating the definition of top10 is easy. Instead of splitting the document into a list of words, we can split the lowercased version of the document into a list of lowercased words:

Next, the words need to be sorted and grouped. These operations can be performed by the sort and group functions in the Data.List library. The sort function does what you would expect. The group function takes a list of values and returns a list of nested lists, where each nested list contains adjoining equal values. For example:

Ordering words by frequency involves a little manipulation of the list of word groups. Each word group is a list of one or more instances of a single word. An easy way to order by frequency is to convert these lists into a pair of values, where the first element is the word count (frequency) and the second element is the word itself. A simple function can perform this action on a single list of words:

makePair l = (length l, head l)

Converting a list of word groups is as simple as using map to apply this function over the entire list:

However, makePair is deeply tied to the problem we are trying to solve and isn't very generic. Instead of creating a name for this throwaway function, we can use a lambda expression to define this function in-place:

-- same as above
map (\l -> (length l, head l)) wordGroups

The \ character starts a lambda, or an anonymous function. Named parameters are found to the left of the arrow and the body of the function is found to the right of the arrow.

Generating the list of words and their frequencies is only part of the solution. Next, the words need to be sorted in descending order:

reverse (sort (map (\l -> (length l, head l)) wordGroups))

Unfortunately, the nesting parentheses complicate the structure of this expression. We can use the $ operator to clean up the expression like so:

reverse $ sort $ map (\l -> (length l, head l)) wordGroups

The $ operator is a little bit of syntactic sugar. It takes everything to the right and treats it as a single expression, then uses the right hand side as the last parameter to the expression on the left hand side. Therefore, these two forms are actually seen by the compiler as the very same expression.

Next, the list of (frequency, word) pairs need to be reduced to a list of words. Pairs are such a useful data structure in Haskell programs that the Prelude includes two functions to get at their contents: fst returns the first element in a pair, and snd returns the second element in a pair. To get the words out of these pairs, we need to use snd. To convert the list of pairs into a list of words, we just need to use map:

Finally, the goal of the top10 function is to find the ten most frequent words in a document. Since wordsByFreq is ordered from most frequent to least frequent, all we need to do is find the first ten elements in the list. This behavior is captured in the take function in the Prelude. take has two parameters: a desired size and a list of values; it returns a list no larger than the desired size.

With this one last piece of the puzzle, here is the complete definition of the top10 function:

Using this bottom-up approach to defining a function like top10, we could write a series of similar functions. For example, a version that strips leading and trailing punctuation characters on each word. Or a version that builds a unique list of words, for use inside a spell checker. Or a version that returns both words and frequencies. Or a version that ignores stopwords like the, of and to in the source text. Writing a series of these functions will likely identify common components, like a function that converts a list of words to a list of (frequency, word) pairs, or a function that converts a list of pairs into a list of words.

The list goes on, and I encourage you to play with this simple little function to explore Haskell further.

Conclusion

Many experienced programmers feel a kind of culture shock when first approaching Haskell. How can you program without mutable variables? Where are the objects? The class hierarchies? The design patterns? How do you debug a program without peppering print statements about? If pure functions can model simple mathematical concepts, can they also model real programming problems?

In this article, I've demonstrated how to develop real code using only pure functions. This approach is both powerful and different from what most programmers use today. Haskell encourages you to think in terms of small, powerful functions that do one thing and do it well. To help you in your efforts, the standard libraries provide a wealth of functions to study, reuse, and combine into interesting and novel functions that let you focus on the problems you are trying to solve, not on how to make your problem conform to an API or design pattern.

Articles in This Series

Adam Turoff
is a software developer with more than a dozen years of experience in building web backend systems with open source tools. His current project uses a mixture of Perl, Ruby, Rails, Haskell, and other tools.