This error indicates that the compiler found a problem when it was checking the types. The error message gives us a lot of information about exactly where it ran into trouble. In most cases when I get a type error, the location and the type mismatch reported at the top of the message are enough to quickly see and fix the problem.

But sometimes the error is not so obvious and the extra information is useful. Let's look more closely:

/tmp/Test.hs:1:12:

This tells us that it ran into a problem in /tmp/Test.hs at line 1, column 12, i.e., when it sees 'r'.

Couldn't match expected type `Bool' against inferred type `Char'

GHCi was expecting to find a Bool value (e.g., True or False), but instead it found a Char value.

In the first argument of `not', namely 'r'
In the expression: (not 'r')
In the definition of `test': test = (not 'r')

The next three lines give us a contextual description of where the error occurred. Although the first line already told us the exact file, line number, and column, the context information can sometimes help you better understand how the compiler sees the code. For example, you can see that it implicitly added parentheses in the expression (not 'r'), and that it thinks (correctly) that 'r' is the first argument to the function not.

Creating Your Own Types

The Bool type can be defined like this:

data Bool = False | True

The keyword data indicates that we are declaring a new algebraic data type. Bool is the name of the type, also known as a type constructor. After the = we have two data constructors: False and True. The | is a separator that is required between constructors.

That is all that is required to define the type. We can use it like this:

module Main where

import Prelude hiding (Bool(..)) -- this hides the predefined version of Bool

data Bool = False | True

main =
    case True of
      True  -> putStrLn "True"
      False -> putStrLn "False"

We see the case statement that we introduced last lesson. This time instead of pattern matching on a String we are matching on a Bool.

Cool Stuff We Learned Today

Today we saw some really cool stuff!

Type Inference

The first thing we saw is that the compiler is able to infer the types of expressions automatically. This inference is not limited to just predefined values like not or True. The compiler was able to infer the types of new expressions that we created, like not True.
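Here is a small sketch of that inference at work. The names flipped and flippedTwice are made up for this example; neither one is given a type signature, yet :t in GHCi should report Bool for both.

```haskell
module Main where

-- No type signatures are written for these two definitions;
-- the compiler infers that both have type Bool.
flipped = not True
flippedTwice = not (not True)

main :: IO ()
main = do
  print flipped
  print flippedTwice
```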

Static Type Checking

In addition to inferring the types of expressions, the compiler also checked that we used expressions in a sensible way at compile time. For example, it noticed that we tried to pass 'r', a Char, to not, which expects a Bool.

Static type checking eliminates a whole category of common bugs by preventing you from calling a function with a nonsense argument. It is also useful if you change the arguments that a function takes. The compiler will let you know all the places that need to be updated to reflect the change. This is especially useful if you change a library function that is used by lots of applications. When you later try to rebuild those applications, you won't have to remember if they use any functions that have changed -- the compiler will tell you.

Due to type inference, we get these benefits for free -- we don't have to run the code, we don't have to write unit tests to test the code, and we don't have to explicitly declare our types. We just have to compile the code or load it into the interpreter.
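As a rough illustration of the check, the sketch below compiles as written. The names good and bad are invented for this example. If you uncomment the bad line, the compiler should reject the whole file before anything runs.

```haskell
module Main where

-- Accepted: not expects a Bool, and True is a Bool.
good = not True

-- Rejected at compile time if uncommented, because 'r' is a Char:
-- bad = not 'r'

main :: IO ()
main = print good
```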

New Types

We also learned how easy it is to create new types in Haskell.

We Know A Lot About not

We saw earlier that not is a pure function with the type signature Bool -> Bool. It turns out that we know an awful lot about the function just from the type signature.

We know this function is pure because all functions in Haskell are pure by default. A pure function always produces the same output when given the same input. This is because a pure function has no way to remember information from one call to another, and it has no way to access external IO resources, such as the disk, the keyboard, or a random number generator. Therefore, there is no way for it to return a different result when presented with the same input. You are already familiar with pure functions from basic arithmetic; functions like + and - are pure functions. 1 + 1 always equals 2 no matter what calculations you have done in the past.

We also know that this function takes a Bool and returns a Bool.

Combining purity and the type signature Bool -> Bool, we can see that not could only do one of five possible things:
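The list itself is not reproduced here, so take the following as one plausible reading rather than the original list. There are exactly four functions of type Bool -> Bool that always return a result, and a fifth possibility is a function that never returns at all (it loops forever or raises an error). The sketch below writes out the four that return, using made-up names and the predefined Bool.

```haskell
module Main where

-- The four functions of type Bool -> Bool that always return a result.
-- All four names are hypothetical, chosen for this sketch.
alwaysTrue :: Bool -> Bool
alwaysTrue _ = True

alwaysFalse :: Bool -> Bool
alwaysFalse _ = False

same :: Bool -> Bool
same b = b

opposite :: Bool -> Bool
opposite b =
  case b of
    True  -> False
    False -> True

main :: IO ()
main = do
  -- opposite agrees with the predefined not on both possible inputs.
  print (opposite True == not True)
  print (opposite False == not False)
```

Because there are only two possible inputs, checking both of them is enough to know which of the four a given function is.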

Later, we will see how purity makes it easy to do automated unit testing.

Additional Notes

There are a few simple rules you need to know when declaring a new data type.

data Declarations Must Be at the Top Level

You cannot declare new data types inside functions or inside anything else. They must always be declared at what is called the top level.

Case matters

Type and data constructors must always start with an uppercase letter, followed by zero or more letters (upper or lower case), digits, underscores, or apostrophes.
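A minimal sketch that follows both rules might look like this. TrafficLight, its constructors, and describe are hypothetical names picked for the example, not anything from the standard libraries.

```haskell
module Main where

-- A top-level declaration; the type name and each data constructor
-- start with an uppercase letter.
data TrafficLight = Red | Amber | Green

-- Pattern match on our own constructors, just like we did with Bool.
describe :: TrafficLight -> String
describe light =
  case light of
    Red   -> "stop"
    Amber -> "get ready"
    Green -> "go"

main :: IO ()
main = putStrLn (describe Green)
```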

Exercise

Your simple exercise for today is to create your own data type, and use it in a case statement, similar to the Bool example given above.

Preview

If you experimented with the :t command, you may have found that some values return some unusual looking results, for example, we see that "hello" is a list of characters:

Prelude> :t "hello"
"hello" :: [Char]

We will learn more about lists very soon. The number 1 has a really funny looking type:

Prelude> :t 1
1 :: (Num t) => t

That is because we want to be able to use 1 to represent values of several different types, for example, an Integer or a Float. We will learn more about this type signature when we learn about polymorphism and type classes.
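To make that a little more concrete, here is a small sketch in which the same literal 1 is used at two different types. The names asInteger and asFloat are invented for the example.

```haskell
module Main where

-- The literal 1 used as an Integer.
asInteger :: Integer
asInteger = 1

-- The very same literal used as a Float.
asFloat :: Float
asFloat = 1

main :: IO ()
main = do
  print asInteger
  print asFloat
```

Printing the two values shows the difference: the Integer prints as 1 and the Float prints as 1.0.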

Closing

Today's lesson had lots of technical terms in it. Don't worry too much about the exact meaning of the words, and don't worry about trying to remember them all at once. These same terms will come up again and again in future lessons and I will continue to try to make their meanings obvious from the context. Once you have more experience with Haskell, it will be easier to give concrete definitions of the terms.

Tuesday, September 11, 2007

Overview

Today we will start learning about the case statement. Here is some code to get us started:

module Main where

main =
    do putStrLn "Do you like Haskell? [yes/no]"
       answer <- getLine
       case answer of
         "yes" -> putStrLn "yay!"
         "no"  -> putStrLn "I am sorry to hear that :("
         _     -> putStrLn "say what???"

A Closer Look at case

The first line of the case statement looks like this:

case answer of

You can put any valid Haskell expression between the keywords case and of. In this example we have a very simple expression: the variable answer.

The next two lines look like this:

"yes" -> putStrLn "yay!"
"no"  -> putStrLn "I am sorry to hear that :("

Notice that they are indented more than the case line. This is another example of whitespace-sensitive layout in Haskell. As with the do statement, each alternative of the case statement will be indented the same amount. If a line is indented more, then it is a continuation of the previous line. If a line is indented less, then the previous line is the last alternative in the case statement.

The case statement will check each alternative, in the order they are listed until it finds a pattern that matches. Once a match is found, the expression on the right hand side of the -> is evaluated. After a match is found, no further alternatives are considered.

The Default Wild Card Alternative

The final line is:

_ -> putStrLn "say what???"

The underscore is a wild card pattern that will match anything. So this alternative will match when the user enters something besides yes or no.
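If you want to poke at the matching order without typing answers at a prompt, one option is to move the case into a pure function. In this sketch, respond is a hypothetical helper that returns the reply as a String instead of printing it.

```haskell
module Main where

-- respond is a made-up helper: the same alternatives as in the lesson,
-- but returning the reply instead of printing it.
respond :: String -> String
respond answer =
  case answer of
    "yes" -> "yay!"
    "no"  -> "I am sorry to hear that :("
    _     -> "say what???"  -- reached only when nothing above matched

main :: IO ()
main = do
  putStrLn (respond "yes")
  putStrLn (respond "maybe")
```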

case Always Matches Exactly One Alternative

The case statement will always evaluate exactly one alternative. Let's see what happens when there is more than one match or no matches at all.

Overlapping Patterns

Let's say we stay up too late hacking Haskell code, and we accidentally put in the "yes" alternative twice:

main =
    do putStrLn "Do you like Haskell? [yes/no]"
       answer <- getLine
       case answer of
         "yes" -> putStrLn "yay!"
         "yes" -> putStrLn "awesome!"
         "no"  -> putStrLn "I am sorry to hear that :("
         _     -> putStrLn "say what???"

If you try running the code, you will see that when you enter yes it always prints yay! and never prints awesome!. Notice that if we put the wild card pattern first, we will also get an overlapping pattern warning:

main =
    do putStrLn "Do you like Haskell? [yes/no]"
       answer <- getLine
       case answer of
         _     -> putStrLn "say what???"
         "yes" -> putStrLn "yay!"
         "no"  -> putStrLn "I am sorry to hear that :("

GHCi tells us that "yes" and "no" will never be considered, since _ matches everything:

It's nice that the exception tells us which file and line number the non-exhaustive pattern is at, but it would be even nicer if it told us before we tried to run the code. GHC can do this if we enable some extra warnings with the -W flag. In GHCi, we can set this flag by typing :set -W at the prompt:

Now, GHCi produces a (somewhat bizarre) warning, telling us that we have a non-exhaustive pattern. The last part of the warning is not very easy to understand, but if we just look at the first two lines, things make sense:

/tmp/Incomplete.hs:6:7:
    Warning: Pattern match(es) are non-exhaustive

This tells us that the case statement at line 6, column 7 in the file Incomplete.hs does not have alternatives for all possible values.
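For reference, here is a sketch of the kind of code that should trigger this warning when compiled with -W. It is not the original Incomplete.hs; reply is a hypothetical function whose case has no wild card alternative.

```haskell
module Main where

-- reply is a made-up function with no wild card alternative, so
-- strings other than "yes" and "no" are not handled. With -W the
-- compiler should flag this case as non-exhaustive.
reply :: String -> String
reply answer =
  case answer of
    "yes" -> "yay!"
    "no"  -> "I am sorry to hear that :("

main :: IO ()
main = putStrLn (reply "yes")
```

The program runs fine as long as reply only ever sees "yes" or "no". Any other string would fail at run time, which is exactly what the warning is trying to tell us about in advance.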

If you are compiling the code, you can just add the flag -W to the command-line:

$ ghc --make -O2 -W Incomplete.hs -o incomplete

You may wonder why the warning for incomplete pattern matches is not enabled by default. Consider the following example:

The warning says, you only matched on the value 2, but you have not handled all the cases where the value is not equal to 2 (e.g. 1,3,4,5,6,...). Obviously 2 is the only value that will ever come up, so it does not matter that the other alternatives are not matched.
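The example itself is not shown here, so the following is only a guess at code that fits the description. The names two and message are invented; the point is that the value being matched can only ever be 2, yet -W should still complain about the missing alternatives.

```haskell
module Main where

-- two is always 2, so the single alternative below always matches.
two :: Int
two = 1 + 1

-- Only the value 2 is handled; the compiler cannot tell that no
-- other value will ever show up, so -W should still warn here.
message :: String
message =
  case two of
    2 -> "the value is 2"

main :: IO ()
main = putStrLn message
```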

In this case, it is rather obvious that the warning can be ignored. A more sophisticated compiler might be able to figure this out as well, and not bother to warn you. In fact, there is a program catch, by Neil Mitchell, which does just that. I expect catch will be integrated in GHC someday.

Cool Stuff

We are not done learning about the case statement yet, but we have already seen some cool stuff. If you have used other languages such as C, C++, Java, etc., you are probably familiar with a similar construct known as the switch statement. However, in many languages, switch only works with a few (numeric) data types. The case statement in Haskell, however, can be used with (almost) all data types. In C, we would have to use a bunch of if-then-else statements like:

I think you will agree that the Haskell version looks a lot more elegant and easier to comprehend. At the very least, the Haskell version is easier on the fingers to type.

Aesthetics aside, Haskell can help us avoid bugs by noticing overlapping patterns or incomplete patterns. A C compiler is not likely to notice if we have overlapping or incomplete patterns in our if-then-else-if... statement.

Warnings

GHC has lots of warnings that you can enable. They are documented in the GHC User's Guide. Some projects, such as xmonad, enable all the warnings using the -Wall flag, and fix all the warnings before shipping. All the extra warnings can be bothersome when you are developing. But enabling and fixing the warnings is a good way to clean up your code and perhaps kill a few bugs before a release.

Friday, September 7, 2007

Overview

main =
    do putStrLn "What is your name?"
       name <- getLine
       putStrLn ("Hello, " ++ name ++ ". I think you will really like Haskell!")

Copy this code into a file named HelloYou.hs, and then run it in GHCi (C-c C-l, and then run the main function), or compile it and run it (M-C ghc --make -O2 HelloYou.hs -o helloYou).

Files containing Haskell source code will almost always end with the extension .hs. You should follow this convention as well, because the compiler expects it.

do notation

The first new thing we see is the do keyword. In this context, the do keyword indicates that we want to perform several IO (input/output) actions in a row.

Notice how each action is indented the same amount. In Haskell, the layout of the code is significant. The do statement automatically ends when a line that is indented less is encountered.

For example, in this code:

main =
    do putStrLn "What is your name?"
       name <- getLine
       putStrLn ("Hello, " ++ name ++ ". I think you will really like Haskell!")

cheese = "cheddar"

Because cheese = "cheddar" starts at the first column, it is not part of the previous do statement. The blank line before cheese is ignored.

If a line is indented more, then it is considered to be a continuation of the previous line. For example, we can reformat our program so that the third action is split across two lines:

main =
    do putStrLn "What is your name?"
       name <- getLine
       putStrLn ("Hello, " ++ name ++
                 ". I think you will really like Haskell!")

The meaning is not changed at all.

Variable Binding

The next new thing we see is this line:

name <- getLine

getLine is an IO action that reads a line of input from stdin. The <- operator binds the variable name to the value read by getLine.

What Does bind Mean?

You can imagine that the value returned by getLine is a cardboard box with a String inside. Binding name to the value is like putting a label on the cardboard box. This makes it easy to refer to that value later, because we can just use the variable name.

Not Like Variables You Have Seen Before

If you have used other programming languages, you are probably familiar with a different kind of variable. For example, in C, you would declare a variable i and assign it the value 9 like this:

int i;
i = 9;

This is called a destructive update and is different from binding a variable. If we think about the box analogy, the statement int i; creates a new cardboard box with the label i already attached to it. The statement i = 9; opens up the box, destroys the current contents, and then puts 9 in the box.

This difference is pretty substantial -- just imagine how you would feel if you were still using the old contents of the box! We will cover this concept more in a few lessons, and see why the difference is so exciting.

The ++ operator

The next new thing we see is the ++ operator. ++ concatenates two lists together and returns a new list. In Haskell, a String is just a list of characters.
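Here is a short sketch of ++ in action. The function greeting is a made-up helper for this example; the last two lines of main show that ++ works on any list and that a String literal really is a list of characters.

```haskell
module Main where

-- greeting is a hypothetical helper that builds a String with ++.
greeting :: String -> String
greeting name = "Hello, " ++ name ++ "."

main :: IO ()
main = do
  putStrLn (greeting "Ada")
  -- ++ works on any list, not just Strings.
  print ([1, 2] ++ [3 :: Int])
  -- A String literal is the same value as a list of characters.
  print ("ab" == ['a', 'b'])
```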

Where did those () come from?

In the first lesson, we noted that Haskell does not require you to use parentheses when calling a function; but now we have some parentheses, so what's the deal? The parentheses in Haskell are used to group operations, in the same way you would in math.

So in this line:

putStrLn ("Hello, " ++ name ++ ". I think you will really like Haskell!")

The parentheses indicate that we want to concatenate the strings first, and then apply putStrLn to the new string. If we did not use parentheses, the compiler would add implicit parentheses like this:

(putStrLn "Hello, ") ++ (name ++ ". I think you will really like Haskell!")

which does not make any sense; it says we want to print the string "Hello, ", and then append the String (name ++ ". I think you will really like Haskell!") to the value returned by putStrLn.

No Variable Declarations, But Still Safe

In our program, we use a variable called name. You may have noticed that we did not declare that variable before we started using it. That saves us some typing, which is nice, but what happens if we make a typo and spell it nmae? Let's load this code into GHCi and see:

module Main where

main =
    do putStrLn "What is your name?"
       name <- getLine
       putStrLn ("Hello, " ++ nmae ++ ". I think you will really like Haskell!")

Nice! GHC tells us that the identifier `nmae' at line 6, column 30, is not defined. Notice that we have not tried to run the code yet; the bug was detected at compile time. So, we get the best of both worlds: we don't have to tell the compiler about our variables before we use them, but the compiler can still tell us if we accidentally use an undefined variable.

Cool Stuff We Learned Today

Today's code example was pretty simple, but we managed to avoid four extremely common bugs that have been responsible for thousands of security holes and program crashes.

Safe from Buffer Overflows

The first two bugs we avoid are buffer overflows. Buffer overflows are an extremely common source of security holes and program crashes. Buffer overflows occur when a string is too big to fit in the space allocated for it, or when some code thinks a string is longer than it really is, and tries to read characters beyond the end of the string.

Two common places to encounter buffer overflows are when you are reading input and receive more input than you expected, or when you are concatenating strings and don't allocate enough space, or you accidentally copy too much data.

In our code sample of the day, we read input with getLine and concatenate Strings with ++. But we never had to worry about how long the Strings were; it was all handled automatically for us.

Safety from buffer overflows is nice, but does not really set Haskell apart. Many other languages, such as Java, Python, Perl, Ruby, etc., are also safe from buffer overflows in this way (as far as I know). So let's look at the next two bugs we managed to avoid, which are a bit more interesting.

Safe from Uninitialized Variables

We saw earlier that when we misspelled name as nmae, the compiler caught our mistake. It noticed that we were trying to use the variable nmae, but nmae had not been bound to anything yet.

This is really nice! In many languages that do not require you to declare your variables in advance, you would not notice this bug until you ran the program. In some cases the program would die when it tried to use nmae. In other cases, it would just assume that nmae was equal to the empty string "", so you might not even notice the problem!

Even in languages where you have to declare your variables in advance, we are susceptible to a similar bug. Consider the following C code:

int a;
int b;
int c;

b = 1;
c = a + b;
printf("%d\n", c);

Even though we declared our variables a, b, and c, I forgot to assign a value to a. This means that when we try to use a to calculate c, we have absolutely no idea what will happen. On my system c was equal to -1077263575 the first time I ran it, and -1080545591 the second time.

Fortunately, Haskell goes the extra step and requires that all the variables we use are actually bound to a value, so we will not be seeing any of those nasty behaviors.
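For comparison, here is a rough Haskell counterpart of the C fragment. The value 2 for a is arbitrary, chosen just for this sketch. If you delete the definition of a, the program should fail to compile instead of printing an unpredictable number.

```haskell
module Main where

-- Every name used below is bound to a value before the program can
-- compile. The value 2 is an arbitrary choice for this sketch.
a :: Int
a = 2

b :: Int
b = 1

c :: Int
c = a + b

main :: IO ()
main = print c
```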

How It Looks Is Meaningful

We saw in this lesson that the formatting of the code is important to the compiler. This is nice because the code looks pretty, and we don't have to type in lots of extra characters like (){};. But it also helps us avoid bugs. Almost all programmers try to format their code so that you can understand the flow of the code by the way it looks. But, in most languages, the compiler does not care how the code looks. Consider the following C code:

if (something)
    doSomething();
    doSomethingElse();

doSomethingToo();

Even though doSomethingElse() looks like it is inside the if statement, it is not. The compiler reads it like this:

if (something)
    doSomething();

doSomethingElse();
doSomethingToo();

Ouch! Since Haskell does care about the formatting, you are far less likely to see things differently than the compiler does.

Summary

Well, I think I went over my time again, but there was a lot of new material, even though today's program was only two lines longer than yesterday's. Next time we will learn about the case statement. This will allow us to do different things depending on what the user enters.

You don't need to memorize or perfectly understand everything in this lesson. We will be exploring these concepts more in the upcoming lessons, which should help you to remember and understand them.

emacs corner: TAB indenting

In the previous lesson, we saw that using emacs makes it easy to load programs into GHCi. You probably also noticed that emacs colored the source code for you. In this lesson, we learned that the indentation of each line is significant. emacs can help here too. If you copied and pasted the code into emacs, try typing it in by hand instead. When you need to indent a line, press the TAB key a couple of times in a row. You will see that emacs cycles through the different possibilities. Personally, I find this feature extremely useful.

Installing Haskell and supporting tools

Disclaimer

In some of the initial lessons I assume you are as well, but if you are not, you should still be able to follow along. Once we really get into the language, it should not matter much, until we get to advanced lessons that use features only found in recent versions of GHC.

To switch between the code and ghci buffers press C-x o, or use the mouse and click inside the buffer you want active.

To run the main function, just type main and hit enter at the *Main> prompt:

*Main> main
Hello, World!
*Main>

Compiling the Code

Switch back to the HelloWorld.hs buffer and enter M-C. The M stands for Meta, which is usually the key labelled Alt, and the C is uppercase. So you will need to press Alt-Shift-c. At the bottom of the emacs window it should now say:

This tells the compiler that we want to compile a module and all of its dependencies, and produce an executable that we can run.

-O2

This tells the compiler that we want to enable level 2 optimizations (the highest level available). This should result in an executable that runs faster.

HelloWorld.hs

This is the name of the file that contains the main function.

-o helloWorld

This tells the compiler that the executable should be named helloWorld. If you do not specify a name, it will probably default to a.out.

Parts of the Program

The program is pretty short, but let's go over the details anyway.

module Main where

This tells the compiler that this file contains the Main module. When your program is run, it always starts at the main function in the module Main.

main = putStrLn "Hello, World"

This defines a function named main that prints Hello, World followed by a newline to stdout. putStrLn is a function, and "Hello, World" is a String. To call a function with an argument, we simply put some whitespace after the function name followed by the argument.

Cool Stuff We Learned

This lesson was a bit long, because we had to install stuff, but we learned some cool stuff:

Haskell code can be compiled to an executable, or run interactively

Haskell syntax is very clean. We do not need lots of characters like {}(); to get stuff done