Translate Haskell into English Manually

Write a program in Haskell that translates C type declarations into English.

Welcome to Haskell

I'm going to work my way toward the complete program via a series of smaller
programs. Start by installing Hugs (www.haskell.org/hugs), which is a
friendly, easy-to-use interpreter for Haskell.

Let's start with the obligatory "Hello, World!":

main = do putStrLn "Hello, World!"

The advanced Haskell programmer may note that the do is not strictly
required here. At this point, I'm going to hand-wave and say that the
do is syntactically unnecessary (I'm not referring to style) in the same
way that the braces are unnecessary in the following C snippet:

if (n > 5) {
    n -= 3;
}

To run the program, simply execute the following at the command line:

$ runhugs hello_world.hs
Hello, World!

Translating the source into English, it's simply:

The value of main is to do:
Write "Hello, World!\n"

I like to think of Haskell programs as a bunch of mathematical equations
pretending to act like a programming language. After all, the following makes
sense to a mathematician, but similar code most definitely won't work in most
other languages:

a = 1
main = do print c
c = a + b
b = 2

No matter what order those four lines are arranged in, the program still
works. In most languages, "a = 1" is read "set a equal to 1", but in Haskell,
it's read as "a is defined to be 1". The order of the definitions doesn't
really matter most of the time.

By the way, print c is the same as putStrLn (show c). show c converts c
into a string; in Java, you'd write c.toString(). This illustrates three
points:

You don't use parentheses for function
calls in Haskell (although you may sometimes have to wrap a particular
argument in parentheses).

In general, what would be object.method()
in Java is written method object in Haskell.

Every call to print
implicitly calls show. Because Haskell is oriented toward functions,
the idiom is to pass an object as the first argument to a function that
acts like a method.
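
To make the equivalence concrete, here is a small sketch in the same style as the programs above. The value given to c is only illustrative. Both lines of main print the same text.

```haskell
-- Illustrative value; any type with a Show instance would behave the same way.
c :: Int
c = 3

main :: IO ()
main = do
  print c              -- print calls show for you
  putStrLn (show c)    -- the same thing, spelled out
```

Note that show is written in front of its argument, where Java would put toString() after it.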

Before I get too carried away, let me show you one of the things I find most
fascinating about Haskell's type system. (Haskell's type system and this
feature in particular are shared with other languages in the ML language
family.) Haskell's compiler is smart enough to let you write a swap function
that takes two values of any type and returns a tuple of the two values
reversed.

swap x y = (y, x)
main = do print (swap 3 4)

This is possible because the types really don't matter here. However, the
compiler is also smart enough to infer types if you don't explicitly state
them and even to complain about type errors at compile time that you would
think could be caught only at runtime. For instance, a function that does
arithmetic on its argument x is acceptable when it is called with a number,
but calling the same function with a character is rejected.

In that case, the compiler is complaining that I passed a Char to something
that requires a Num. Num is what's called a type class, which is similar to
an interface in Java. There are many different types that "implement" the Num
"interface", but Char is not one of them.
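
As a hedged illustration of this kind of error, the sketch below assumes a hypothetical function addOne; the name and body are mine and are not taken from the article's listing. No type is written for x anywhere, yet the compiler works out that x must belong to Num. The offending call is left commented out so that the file still loads.

```haskell
-- Hypothetical example: addOne is an illustrative name, not from the article.
-- No type is declared for x; the compiler infers that x must be in Num.
addOne x = x + 1

main :: IO ()
main = do
  print (addOne 3)
  -- Uncommenting the next line makes the program fail to compile,
  -- because Char is not an instance of Num:
  -- print (addOne 'a')
```

The exact wording of the error message differs between Hugs and GHC, so it is not reproduced here.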

Now, Pascal programmers might start feeling smug at this point until I remind
them that nowhere in the code did I mention x is an int or a Num or anything
like that. Similarly, fans of scripting languages with duck typing
(en.wikipedia.org/wiki/Duck_typing) might also start feeling smug at
this point until I remind them that this error was caught at compile
time. (Although Hugs is an interpreter, there is a traditional, native
compiler for Haskell called the Glasgow Haskell Compiler
(www.haskell.org/ghc), and it could indeed detect this type error at
compile time.)

Translating the State of the World

To remain purely functional, Haskell makes a strong distinction between
functions that do I/O, which is considered a side effect, and those that
don't, which are considered purely functional. In Java, all checked
(non-runtime) exceptions that can be thrown by a method must be declared in
its signature. (Whether or not checked exceptions are a good thing in Java is
another subject.) Similarly, in Haskell, all functions that do I/O must
make use of the IO monad, which impacts the signature of the function.
Hence, the question is, as a programmer, where do you draw the line
between functions that can do I/O and functions that live in the purely
functional world?

For most applications, actual I/O can be constrained to a very tiny
portion of the program. For instance, in cdecl, I/O can be limited to
the main function. The main function can read the C type declaration
from the user and later write the English output to the user. The rest
of the program can be devoted to the logic necessary to convert C type
declarations into English.

If you generalize this pattern, you might say that a small part of the program
reads and writes the "state of the world" (for instance, reading from STDIN,
writing to STDOUT, manipulating files, talking to network sockets and so on),
whereas the rest of the program can be composed of functions that are purely
functional, that is, they take data and return data--that's it.

Comparing the situation to Star Trek, the purely functional part of
the program is Captain Kirk--you give him a question, and he'll give
you a decision. The part of the program that does I/O is the ensign
sitting at the controls, and for completeness, Haskell is the Enterprise!

If the "translator" at the heart of the Haskell program simply outputs
whatever it is given as input, you arrive at a stripped-down version of
the UNIX program cat (or rather, the DOS program type), which is a
nice place to start:

main = do s <- getContents
          putStr s

Note that getContents returns a string whose contents are read from
standard input lazily, as needed. Here, the program is taking all of
standard input and printing it back out. You would think that would
be hideously inefficient, but thanks to Haskell's lazy nature, Haskell
doesn't have to be finished reading input before it can start writing
output.

Now, it'd be nice to do something in the middle; cat is interesting only
for so long. Ideally, the program would change the state of the world.
Consider a function named makeCool.

makeCool is a function that takes an input string (which happens to
be all of standard input when it is called in main), translates it
(by appending " is cool!"), and then returns it (where it happens
to get printed to STDOUT). This is a suitable beginning for a "C to
English" compiler. Now, if only the makeCool function were a lot
more intelligent!
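
A minimal sketch consistent with that description might look like the following. It is reconstructed from the prose, so the type signatures and the choice of putStr are assumptions.

```haskell
-- Pure function: takes a string and returns a new string.
makeCool :: String -> String
makeCool s = s ++ " is cool!"

-- Only main touches the outside world.
main :: IO ()
main = do
  s <- getContents        -- all of standard input, read lazily
  putStr (makeCool s)     -- print the translated result
```

Because getContents returns everything that was typed, including any trailing newline, the appended text lands after that newline.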

Keeping an eye on the C version, let's start by translating the token
struct and token stack into Haskell. As I mentioned earlier, there are
no global variables in Haskell, so I'll have to use a new data type
to represent the state of the parser. I'll call it ParseContext.
This ParseContext can be passed to functions explicitly. Later,
I'll show how the State monad can be used to pass the ParseContext
to functions implicitly.

The data keyword creates a new data type in Haskell. Hence, TokenType
is a new type. The data keyword replaces C's enum, struct and union
in one shot. Here, it's acting as a cross between an enum and a union.
A value of the type TokenType might be an Identifier, a Qualifier,
a Type or a Symbol that wraps a specific Char (for instance,
Symbol '+').

deriving Show tells the compiler that it can automatically figure out a
suitable implementation for the show function (conceptually, toString());
the inverse, read (conceptually, fromString()), comes from deriving Read
instead. deriving Eq tells the compiler that it can figure out a suitable
implementation for ==. For instance, it just makes sense that
Symbol '+' == Symbol '+' should be True, but
Identifier == Qualifier should be False.
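
A declaration matching that description could be sketched as follows. It is reconstructed from the prose, so the order of the constructors and the layout are assumptions.

```haskell
-- Part enum (the first three constructors), part union (Symbol carries a Char).
data TokenType = Identifier
               | Qualifier
               | Type
               | Symbol Char
               deriving (Show, Eq)

main :: IO ()
main = do
  putStrLn (show (Symbol '+'))        -- uses the derived show
  print (Symbol '+' == Symbol '+')    -- True, via the derived ==
  print (Identifier == Qualifier)     -- False
```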

Token is a new type. A value of that type is constructed using the
constructor named Token. (Note that sometimes the type and the
constructor have the same name, but sometimes they don't--look at
TokenType above; it has four different constructors.) A Token
has a member tokenType of type TokenType and a member tokenValue
of type String. The :: should be read aloud as "has type" or "of type".
Notice that this data declaration is very much like a struct in C.
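
As a hedged sketch, a Token declaration with those two members could be written in record syntax like this. The deriving clause on Token and the sample value are assumptions added for illustration.

```haskell
data TokenType = Identifier | Qualifier | Type | Symbol Char
  deriving (Show, Eq)

-- Record syntax: each field name also becomes an accessor function.
data Token = Token {
    tokenType  :: TokenType,  -- which kind of token this is
    tokenValue :: String      -- the text of the token
  } deriving (Show, Eq)

-- Hypothetical sample value.
example :: Token
example = Token {tokenType = Identifier, tokenValue = "argv"}

main :: IO ()
main = do
  print (tokenType example)       -- Identifier
  putStrLn (tokenValue example)   -- argv
```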

Let's start with makeCool :: ParseContext -> ParseContext. This says
makeCool is a function that takes a ParseContext as an argument
and returns a ParseContext. Declaring the signature for a function
is usually optional in Haskell, so I could have left this line out.
However, it often aids readability. Furthermore, it ensures that you
and the compiler agree about what's going on.

ParseContext {input = s} can be read as "I take a 'ParseContext'
as input, and I'm going to call the value of the 'input' field 's'."
The ParseContext that it returns has "" for the "input" field, but
's ++ " is cool!"' for the "output" field, which is really the core of
the earlier makeCool function.

The main function has changed by the addition of a let statement.
A let statement is a way to create a sub-equation that's usable
by the lines nested within the "in" clause. Here a new ParseContext
is being created named ctx. The "input" is "s", and "output" is "".

The next line, putStrLn $ output $ makeCool ctx, may cause the reader
to have Perl flashbacks, but here, $ is actually used for readability.
(That's a joke. Perl programmers should avoid the urge to send me hate
mail at this point!)

I think it's helpful to read the last line backwards: "Pass 'ctx' to
the 'makeCool' function. Then pass the result to the 'output' function.
Then pass the result to the 'putStrLn' function."

Last of all, note that the makeCool function returns a ParseContext
and the output function is how to get the output field from the
ParseContext. Remember, what would be ctx.output in Java is written
output ctx in Haskell.
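
Putting the pieces from this walk-through together gives something like the sketch below. It is reconstructed from the prose rather than copied from a listing, so details such as indentation are assumptions.

```haskell
data ParseContext = ParseContext {
    input  :: String,  -- The input that has not been parsed yet.
    output :: String   -- The output generated so far.
  } deriving Show

-- Bind the input field to s, then build a new context with the result.
makeCool :: ParseContext -> ParseContext
makeCool ParseContext {input = s} =
  ParseContext {input = "", output = s ++ " is cool!"}

main :: IO ()
main = do
  s <- getContents
  let ctx = ParseContext {input = s, output = ""}
    in putStrLn $ output $ makeCool ctx
```

The in is indented further than the let so that the layout rule treats the last two lines as a single statement of the do block.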

The $ leads me to my next point. If you look at
putStrLn $ output $ makeCool ctx, it's sort of a pipeline that runs from
right to left. If you look at the makeCool function, it's a function that
takes a ParseContext and returns a ParseContext. Whether or not the $
construct is used isn't all that important, but it leads me to the point that
the main structure of the program can simply be:

A main function that reads input from the outside world and writes
output to the outside world.

A ton of functions that transform ParseContext objects, all tied
together into a pipeline.

Hence, if the C version is written as:

a();
b();
c();

Where a, b and c each might modify some global state and print
output to the user, in Haskell, one might write:
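
A hedged sketch of such a pipeline follows. The stages a, b and c are hypothetical stand-ins that each append one letter to the output field, and the ParseContext type is repeated so that the block stands alone.

```haskell
data ParseContext = ParseContext {
    input  :: String,  -- The input that has not been parsed yet.
    output :: String   -- The output generated so far.
  } deriving Show

-- Hypothetical stages: each takes a ParseContext and returns an updated copy.
a, b, c :: ParseContext -> ParseContext
a ctx = ctx {output = output ctx ++ "a"}
b ctx = ctx {output = output ctx ++ "b"}
c ctx = ctx {output = output ctx ++ "c"}

main :: IO ()
main = do
  s <- getContents
  let ctx = ParseContext {input = s, output = ""}
    in putStrLn $ output $ c $ b $ a ctx   -- runs a, then b, then c
```

No stage modifies global state. Each one receives the context produced by the stage before it, and only main performs I/O.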

The output function isn't defined in the Haskell source code given. If it's operating on your ParseContext type, presumably you'd have to write it yourself? I know it'd be a pretty simple function, but I'm just checking that you do have to write it yourself, and that it's not generated automatically via some form of Haskell magic?

Ugh, looking back, I got confused. When you said "output" function, you were literally talking about the function named "output", whereas I instantly jumped to the conclusion that you were talking about the function used to do output. Sorry for the confusion.

So any type deriving from the Show class becomes compatible with the 'output' function which expects instances of the Show class? Or is it that the Show class allows each field of a type to be referred to by field name?

Using the following as a minimal example:
data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String -- The output generated so far.
} deriving Show

But changing the name of the output field to outputB gives an error when using the function 'output', resolved by using a corresponding function name of 'outputB'. So is it that field names become functions, presumably with the type ParseContext -> String ?

Incidentally, I tried dropping off the 'deriving Show' declaration and accessing fields by name still works, perhaps because String instances already derive Show?

I know I'm probably belabouring the obvious here, I'm just trying to pin the origin of specific functionality down, which has been my biggest bugbear with learning Haskell to date. Looking forward to part 2!

Haskell automatically creates functions "input" and "output" of type "ParseContext -> String" as you suggested. This comes for free and has nothing to do with "deriving Show". As another commenter commented, what would be "ctx.output" in Java is "output ctx" in Haskell.

What "deriving Show" gives you is the ability to call "show ctx" which has type "something that is showable -> String". Imagine an interface in Java that has one method, "toString()". "deriving Show" means that not only should ParseContext implement that interface, but the compiler should automatically figure out a reasonable implementation for what in Java would be the "toString()" method. What Java calls an interface, Haskell calls a type class (which is a horribly confusing re-use of the word class). ParseContext is a member of the Show type class. This is documented here.

My hope is that if you take your time studying my article, you won't have as hard a time reading other Haskell tutorials as I did ;)

By the way, I just asked the editor to hurry up and publish the second half ;)

That's my understanding too I think! It's funny, I'd implicitly associated field access in that way with OO programming, but I guess there's no reason it shouldn't apply to functional programming too. I'd gotten out of the habit of equating objects with types but it looks like I took it too far.
