Back in FIXME: add ref to chapter 3ish, you saw how to
use type to create handy aliases for types. That's a useful feature,
but in this chapter we'll take it a step further. We'll show you how to
create entirely new types. After doing that, we'll also show you some of
the built-in tools that Haskell provides for arranging large amounts of
data.

Creating New Types

To create a new type, we use the data keyword. You can create an
amazing variety of types using data. We'll take a look at them,
starting with the most simple, and moving on to more complex types.

Basic Type Creation

To create a new type, we use the data keyword. In its most simple,
though probably useless[10], form, you can create a type like this:

data Silly = Foodata Silly = Foo

This defines a new type called Silly. There is one
type constructor for Silly:
Foo. When you write Foo in your
program, this Foo is a value of type
Silly. You can actually use the same word for both,
but it must start with an uppercase letter.

Looking at it with ghci, there's not much you can do with it yet:

ghci> :l data1.hs
[1 of 1] Compiling Main ( data1.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t Foo
Foo :: Silly
ghci> Foo
<interactive>:1:0:
No instance for (Show Silly)
arising from use of `print' at <interactive>:1:0-2
Possible fix: add an instance declaration for (Show Silly)
In the expression: print it
In a 'do' expression: print it
ghci> Foo == Foo
<interactive>:1:0:
No instance for (Eq Silly)
arising from use of `==' at <interactive>:1:0-9
Possible fix: add an instance declaration for (Eq Silly)
In the expression: Foo == Foo
In the definition of `it': it = Foo == Foo
ghci> :l data1.hs
[1 of 1] Compiling Main ( data1.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t Foo
Foo :: Silly
ghci> Foo
<interactive>:1:0:
No instance for (Show Silly)
arising from use of `print' at <interactive>:1:0-2
Possible fix: add an instance declaration for (Show Silly)
In the expression: print it
In a 'do' expression: print it
ghci> Foo == Foo
<interactive>:1:0:
No instance for (Eq Silly)
arising from use of `==' at <interactive>:1:0-9
Possible fix: add an instance declaration for (Eq Silly)
In the expression: Foo == Foo
In the definition of `it': it = Foo == Foo

Note that ghci doesn't know how to display Foo to
the screen, or how to compare it to itself. That's because we haven't
made our new type a member of the Show and Eq typeclasses. We'll
make a new example that is a member of these classes, and then we'll
get more information out of ghci. For more on typeclasses, refer to
Chapter 6, Using Typeclasses.

This defined one new type named Color. You can
create a value of type Color by using any of the
three literals Red, Green, or
Blue. There are several reasons this might be
useful as opposed to using something like a String to store simple
color names:

You are guaranteed that a Color
always represents one of these three values

When you do pattern matching on a
Color, the compiler will warn you if you don't
consider all three possible values

It is possible to hide the implemention of
Color from users when exporting symbols from the
module; you can include Color in the list but
not the three constructors
FIXME: need to go into detail about exporting types
somewhere and add a ref to it from here

That's pretty simple: a function that takes a Color
and converts it to
a String. Note that we don't have to deal with the case where the
input to the function is something other than our three colors, because
that can't possibly happen.

You use a type defined in this way all the time. Haskell 98 defines
this:

As you'll see in the rest of this chapter, many of the core Haskell
features you rely upon are actually defined in the prelude using
data. That is, they're not built into the compiler in any special
way. The only thing special about them is that they are loaded for you
by default since they're in the prelude.

FIXME: have we discussed prelude?

Defining Records

Custom types need not be defined solely in terms of static data. They
can also take parameters. Let's say that we wanted to take our
color example a new direction and let a user represent arbitrary colors
in the RGB (red, green, blue) colorspace. We could use an
(Int, Int, Int) tuple for this, but to illustrate
records, let's define a new type.

This defines a new type CustomColor and one type
constructor also named CustomColor. This type
constructor, however, requires three parameters, all Ints.

The type constructor here is a unique beast in Haskell. You can use it
as a function that takes three parameters, and it will return to you
one value ot type CustomColor. You can also use it
to perform pattern matching in function or case definitions.

Let's use ghci to inspect this for a bit:

You can see that CustomColor 100 0 50 returns a
single value of type CustomColor. Now you're
probably wondering how to extract the data from that
CustomColor. We use pattern matching to do that.
Here's how:

A ClockTime consists of two Integers. The first
is the number of whole seconds since midnight UTC on January 1, 1970.
The second is an additional number of picoseconds. Since an Integer
can be negative and is unbounded, a ClockTime can
effectively represent any moment in history down to the picosecond.

This is useful as it is, but when you have half a dozen or more
values to store in your record, it can get annoying to have to match
them all. There's where named fields come in, which we'll discuss
next.

Records with Named Fields

Our earlier example of a type that holds a color looked easy enough.
But you have to remember the ordering of the fields that are part of
the type. Also, you always have to pattern match all fields, even if
you're interested in only one. Of course, you could write a function
such as getRed to do that, but there's an easier
way.

In Haskell, records can have named fields. When you name the fields
in a record, you can still access it just as you would without the
named fields. But you gain two things: automatic functions for
picking out specific fields, plus an easier way to create and update
these objects.

Let's take a look at a re-designed CustomColor
type that uses named fields:

First, we inspected the type of the CustomColor
constructor. Note that it's type is exactly the same as the
constructor for the type
that didn't use named fields. Then, we created a object
with identical data three different ways. The first way didn't make
use of the named fields. The second and third ways did. Notice that
when you use named fields, you don't have to specify the values in
order.

Haskell automatically creates accessor functions for each of the
named fields. Let's look at how we extract the red component of our
color:

We can use that to write a modified color2string
function that accesses each named field without having to pattern
match each individual field. For reference, the original
color2string is included here as
color2string2.

With just three fields, the burden of pattern matching isn't all that
great. But what if you had a dozen or more? Named fields really
help out in that case. The standard Haskell library uses named
fields in System.Time.CalendarTime. Here's a
excerpt from its definition:

Named fields are a tremendous time saver here. If you have a
CalendarTime and want to extract just the year,
you can simply say ctYear ct rather than having to
match against 12 different fields in order.

Polymorphic Types

Sometimes you don't know in advance what the types of the data you
want to store will be. For instance, our
CustomColor type used Ints. But one could
certainly also represent color values as a Float or an Integer.
Haskell makes it possible to write generic code that
could work with any of those. Why not make the type generic as well?

Haskell has a well-known type that is defined just this way: Maybe.
It's defined in the prelude like this:

This type is often used when the result of a function could be an
error or some other invalid result (Nothing), or else a real value
(Just). For instance, if you are searching in a list for the first
result that matches search criteria, you will either find something
or you won't. If you do, you could get back the data wrapped in
Just, and if you don't, you could get back Nothing.

Maybe is a polymorphic type because its type depends on the type of
data that is encapsulated within it. You can, in fact, encapsulate
any type of data within it. Let's look at this
with ghci.

Notice here that the type of Nothing is Maybe a,
but the type of Just "hi" is Maybe
[Char]. The reason for this is Haskell's type inference.
Nothing doesn't actually encapsulate any data, so it -- by itself
-- is valid with any Just. You can also give it an explicit type
to force it to behave a certain way.

You can compare Nothing to a Just easily enough; that comparison
will of course always be false. The comparison between two Strings
wrapped in Just also works. But the type checker wisely gives an
error on the last attempt, because you're trying to compare a
String to a number.

You'll see Maybe used quite a bit throughout this book, and in
Haskell code in general.

A Final Word on Types

We've shown you a lot of different ways to create your own types.
Note that, by using multiple constructors, you can effectively
combine various approaches in a single type. For instance, you could
say data Foo a = Bar | Baz Int | Other a if you so
desire. This may be useful in certain situations.

Association Lists

Often times, we have to deal with data that is unordered but is indexed
by a key. For instance, a Unix administrator might have a list of
numeric UIDs and the textual usernames that they correspond to. The
utility of this list is being able to look up a textual username for a
given UID, not the order of the data. In otherwords, the UID is a key
into a database.

In Haskell, there are several ways to handle data that is structured in
this way. The two most common are association lists and the
Data.Map module. Association lists are handy
because they are simple. They are standard Haskell lists, so all the
functions that work on those lists work on association lists. However,
for large data sets, Data.Map will have a
considerable performance advantage over association lists. We'll
consider both in this chapter.

An association list is just a normal list containing (key, value)
tuples. The type of a list of mappings from UID to username might be
[(Integer, String)]. You could use just about any
type for both the key and the value.

You can built association lists just like you would build any other
list. Haskell comes with one built-in function called
Data.List.lookup to look up data in an association
list. Its type is Eq a => a -> [(a, b)] -> Maybe b.
Can you guess how it works from that type? Let's take a look in
ghci.

This function returns Nothing if passed the empty list. Otherwise,
it compares the key with the key we're looking for. If a match is
found, the corresponding value is returned. Otherwise, it searches
the rest of the list.

Let's take a look at a more complex example of association lists.
On Unix/Linux machines,
there is a file called /etc/passwd that stores
usernames, UIDs, home directories, and various other information.
Let's write a program that parses such a file, creates an association
list, and lets the user look up a username by giving a UID.

Let's look at this program. The heart of it is
findByUID, which is a simple function that parses
the input one line at a time, then calls lookup over
the result. The remaining program is concerned with parsing the input.
The input file looks like this:

The Data.Map module has some functions with same
names as those in Prelude or other common modules.
Therefore, when using it, most people import it using
import qualified Data.Map as Map and use
Map.function to refer to
functions in that module.
Let's start our look at Data.Map by
taking a look at some ways to build a map.

Functions like Map.insert work in the usual Haskell
way: they return a copy of the input data, with the requested change
applied. This is quite handy with maps. It means that you can use
foldl to build up a map as in the
mapFold example. Or, you can chain together
calls to Map.insert as in the
mapManual example. Let's use ghci to verify
that all of these maps are as expected:

Notice that the output from mapManual doesn't occur
in the order it was passed in. Maps do not guarantee that they will
preserve the original ordering.

Maps operate similar in concept to association lists. The
Data.Map module provides functions for adding and
removing data from maps. It also provides functions for converting
maps back and forth to association lists, filtering them, modifying
them, and folding them. The library documentation for this module is
good, so instead of going into detail on each function, we're going to
present an example that ties together much of the concepts we've
discussed in this chapter.

Extended Example: /etc/passwd

In order to illustrate the usage of a number of different data
structures together, we've prepared an extended example. This example
parses and stores entries from files in the format of
a typical /etc/passwd file.

This example maintains two maps: one from username to
PasswdEntry and another one from UID to
PasswdEntry. Database developers may find it
convenient to think of this has having two different indices into the
data to speed searching on different fields.

Take a look at the Show and Read instances for
PasswdEntry. There is already a standard format for
rendering data of this type as a string: the colon-separated version
already used by the system. So our Show function displays a
PasswdEntry in the format, and Read parses that
format.

[10] The built-in type
() carries no data, so there is little need to
define your own unless you are writing FFI interfaces to C types.