Solipsistic philosophers

Of course, a good optimizing compiler will replace your solipsistic philosopher with a no-op.

A good optimising compiler, or any Haskell compiler :-) Since results that are never required are not computed in Haskell, due to laziness, we can write high performance solipsism simulators all day long:

    main = do
        let largestNumber = last [1..]
        return ()

Running our simulation of the philosopher-mathematician pondering some thoughts on large numbers:

Getting to work

Yesterday we implemented a few toy unix programs, including ‘cat’. Today we’ll look at writing a complete cat program, but with a focus on interacting properly with the environment and being careful about command line handling. For our running examples, we’ll consider the ‘cat’ and ‘tac’ programs. The basic spec for ‘cat’ is:

The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (`-') or absent, cat reads from the standard input.

It’s the ‘id’ function of the unix shell. BSD ‘cat.c’ is a 255 line C program. From the man page we can see it does more than just concatenate files. It can also:

number the output lines, starting at 1;

squeeze multiple adjacent empty lines;

display non-printing characters so they are visible.

Let’s start by looking at the command line argument processing code.

Getting in arguments

The basic way to get arguments in a Haskell program is provided by the System.Environment library. We can use the getArgs function:
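As a minimal sketch of how this might look: getArgs returns the command line arguments as a list of strings, which we can hand straight to a pure function. The names ‘describeArgs’ and ‘echoArgs’ are invented here for illustration and are not part of the original program.

```haskell
import System.Environment (getArgs)

-- Hypothetical pure helper: summarise an argument list.
describeArgs :: [String] -> String
describeArgs []   = "no arguments: read from stdin"
describeArgs args = show (length args) ++ " argument(s): " ++ unwords args

-- Hypothetical entry point: fetch the arguments and print the summary.
-- Bind it to 'main' to run it as a standalone program.
echoArgs :: IO ()
echoArgs = getArgs >>= putStrLn . describeArgs
```

Keeping the IO down to a single call to getArgs means everything else can be tried out in GHCi on ordinary lists of strings.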

This program concatenates and prints the contents of files in reverse (or reads from stdin with no arguments), along with a couple of basic command line flags for version and help strings. It’s also reasonably careful about setting exit status on finishing, using the functions from System.Exit. The actual core algorithm for ‘tac’ is a nice pure Haskell function, and really all the hard work is done processing the command line args.
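The pure core might be sketched roughly as follows, assuming ‘tac’ simply reverses the order of the input lines. The driver ‘tacFiles’ is a hypothetical name used only for this illustration, and it leaves out the flag handling and exit codes described above.

```haskell
-- Assumed pure core of 'tac': reverse the order of the lines of the input.
tac :: String -> String
tac = unlines . reverse . lines

-- Hypothetical driver: with no file names, filter standard input;
-- otherwise apply 'tac' to the contents of each file in turn.
tacFiles :: [FilePath] -> IO ()
tacFiles []    = interact tac
tacFiles files = mapM_ (\f -> readFile f >>= putStr . tac) files
```

Since ‘tac’ is an ordinary function from String to String, it can be tested in GHCi without touching any files.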

That is, the lookup function takes some key, ‘k’, and a Map from keys to elements of type ‘a’, and returns an element, if found, in some monad.

More on failure

You may recall from the first tutorial that the Map ‘lookup’ function will fail if the key is not found. The particular way you wish it to fail depends on which monad you use. You can tell this from the type of lookup. The

    lookup :: (Monad m) => ... m a

syntax indicates that lookup is polymorphic in its monad: it will work for any monad type, and its behaviour is determined by the particular instance of the monad interface you ask for. When a lookup fails, it calls the ‘fail’ function for the monad you’re using. When a lookup is successful, it calls the ‘return’ function of the same monad. Being ‘polymorphic in a monad’ really just means that it will call the ‘fail’ and ‘return’ of whichever concrete monad ‘subclass’ you happen to be using.

Looking at the various useful monads for this, we can choose which failure behaviour we would prefer. Here’s the implementation of the ‘fail’ interface for a variety of monads. It’s up to you to pick which behaviour you’d like.

For Maybes, we get the null value, Nothing, on failure:

    instance Monad Maybe where
        return = Just
        fail _ = Nothing

For Eithers, we get an error string:

    instance (Error e) => Monad (Either e) where
        return = Right
        fail s = Left (strMsg s)

For lists, we get the empty list on failure:

    instance Monad [] where
        return x = [x]
        fail _   = []

And for IO we get an exception thrown:

    instance Monad IO where
        fail   = ioError . userError
        return = returnIO

So, depending on the type signature, the compiler will statically pick one of these ‘fail’s to use on a lookup failing at runtime. For example, to fail with a null value, we’d use the Maybe monad:
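Here is a hedged sketch of that idea. In current versions of the containers library, Data.Map.lookup returns a Maybe directly, so the sketch wraps it in a hypothetical helper, ‘lookupM’, to recover the monad-polymorphic behaviour described above. It assumes GHC 8.8 or later, where ‘fail’ belongs to the MonadFail class exported by the Prelude. The table and the other names are invented for illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical helper: a lookup that reports a missing key through
-- whichever MonadFail instance the caller's type signature selects.
lookupM :: (Ord k, MonadFail m) => k -> Map.Map k a -> m a
lookupM k m = maybe (fail "lookupM: key not found") return (Map.lookup k m)

-- A small example table (illustrative data only).
table :: Map.Map String Int
table = Map.fromList [("one", 1), ("two", 2)]

-- In the Maybe monad, a missing key gives Nothing.
missingAsMaybe :: Maybe Int
missingAsMaybe = lookupM "three" table

-- In the list monad, the same failing lookup gives the empty list.
missingAsList :: [Int]
missingAsList = lookupM "three" table
```

The two definitions call the same function on the same key; only the type signature differs, and that is what selects the failure behaviour.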

The ‘data’ keyword defines a new data type, ‘Flag’, which can have one of several values. Such a type is often called a sum (or union) type. So ‘Flag’ is a new user-defined type, just like other types, such as Bool or Int. The identifiers on the right hand side of the | are the type’s constructors, that is, the values which have type ‘Flag’. We ask the compiler to also derive some instances of various common classes for us (so we don’t have to write the code ourselves).

With just this we can already start playing around with the flag data type in GHCi:

User defined data types are really first class citizens in Haskell, and behave just like the ‘inbuilt’ types.
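To make this concrete, here is the Flag declaration as it appears in the complete source at the end of this post, together with one small helper, ‘allFlags’, which is a name added here purely for illustration.

```haskell
-- The abstract flags, in the same order as the complete source.
data Flag
    = Blanks | Dollar | Squeeze | Tabs
    | Unbuffered | Invisible | Number | Help
    deriving (Eq, Ord, Enum, Show, Bounded)

-- Hypothetical helper: every flag in declaration order, which the
-- derived Enum and Bounded instances give us for free.
allFlags :: [Flag]
allFlags = [minBound .. maxBound]
```

With this loaded in GHCi, the derived instances let us compare flags (Blanks < Help is True), step through them (succ Blanks is Dollar) and print them (show Tabs is "Tabs"), all without writing any instance code by hand.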

Binding to command line flags

The next step is to associate some particular command line strings with each abstract flag. We do this by writing a list of ‘Option’s, which tie long and short argument flags to the particular abstract Flag value we need, and also associate a help string with each flag:

    flags =
       [Option ['b'] []       (NoArg Blanks)
            "Implies the -n option but doesn't count blank lines."
       ,Option ['e'] []       (NoArg Dollar)
            "Implies the -v option and also prints a dollar sign (`$') at the end of each line."
       ,Option ['n'] []       (NoArg Number)
            "Number the output lines, starting at 1."
       ,Option ['s'] []       (NoArg Squeeze)
            "Squeeze multiple adjacent empty lines, causing the output to be single spaced."
       ,Option ['t'] []       (NoArg Tabs)
            "Implies the -v option and also prints tab characters as `^I'."
       ,Option ['u'] []       (NoArg Unbuffered)
            "The output is guaranteed to be unbuffered (see setbuf(3))."
       ,Option ['v'] []       (NoArg Invisible)
            "Displays non-printing characters so they are visible."
       ,Option []    ["help"] (NoArg Help)
            "Print this help message"
       ]

Parsing the flags

To actually turn the list of command line flags that getArgs gives us into a useful list of abstract Flag values, we use the ‘getOpt’ function, which returns a triple consisting of the flags that were set, a list of any non-flag arguments, and a list of error messages. First we need a couple of libraries:
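A small sketch of how getOpt’s triple might be consumed, using a cut-down three-constructor Flag type and option list so that the block stands alone. The wrapper ‘parseArgs’ and its Either result are assumptions made for this illustration. The complete source at the end of the post instead prints usage information and exits on error.

```haskell
import System.Console.GetOpt

-- A reduced flag type, just for this sketch.
data Flag = Number | Squeeze | Help
    deriving (Eq, Show)

flags :: [OptDescr Flag]
flags =
    [ Option ['n'] []       (NoArg Number)  "Number the output lines, starting at 1."
    , Option ['s'] []       (NoArg Squeeze) "Squeeze multiple adjacent empty lines."
    , Option []    ["help"] (NoArg Help)    "Print this help message"
    ]

-- Hypothetical wrapper: split an argument vector into abstract flags and
-- file names, or hand back getOpt's error messages if there were any.
parseArgs :: [String] -> Either [String] ([Flag], [String])
parseArgs argv = case getOpt Permute flags argv of
    (fs, files, [])   -> Right (fs, files)
    (_,  _,     errs) -> Left errs
```

Because ‘Permute’ allows flags and file names to be freely interleaved, an argument list such as ["-n", "foo", "-s"] yields the flags Number and Squeeze and the single file name "foo".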

Where ‘cat’ will process the files one at a time. ‘cat’ is where all the hard work is done.

Most of the operations the cat program performs require access to each line of the file. We also need to be able to handle the special file name, “-”. What we’d like to do is separate out any IO from operations on each file’s content. To do this we’ll write a higher order function, ‘withFile’, which takes a filename, opens it, splits it into lines and applies a function to the contents of the file, before writing the result to stdout:
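A sketch of such a function follows, modelled on the definition in the complete source at the end of this post. It is called ‘withLines’ here (an illustrative name) because current versions of System.IO export their own, unrelated ‘withFile’. The pure helper ‘onLines’ is likewise an addition for this sketch.

```haskell
-- Pure part: view the contents as lines, transform them, glue them back.
onLines :: ([String] -> [String]) -> String -> String
onLines f = unlines . f . lines

-- IO part: read the named file (or stdin for "-"), apply the line
-- transformation, and write the result to stdout.
withLines :: FilePath -> ([String] -> [String]) -> IO ()
withLines path f = putStr . onLines f =<< open path
  where
    open "-" = getContents
    open p   = readFile p
```

For example, withLines "-" reverse would behave like a simple ‘tac’ over standard input, while withLines "-" id is a plain ‘cat’.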

Now we can implement the pure ‘cat’ function, implementing the cat program’s functionality. Firstly, if there are no command line flags, the ‘cat’ function does nothing to the input:

    cat [] f = withFile f id

That is, it applies the ‘id’ function to the stream generated by withFile. That was easy.

Now, if there are some arguments, we’ll need to process them. This can be a little tricky, since the effects of the command line flags are cumulative, and we had better process them in the right order. What is that order? Well, from experimentation :-) it seems that (if all flags are enabled) ‘cat’ proceeds to:

first squeeze any blank lines;

then any visibility flags are processed;

then line numbering occurs;

then, finally, any visible newlines are printed as ‘$’.

The visibility flags transform non-printing characters into a visible representation. The key to coding this up is recognising that it’s just a functional pipeline. So we can write it as:

    cat as f = withFile f (newline . number . visible as)

Where ‘visible’ renders any non-printing chars. Then we number the resulting lines (if the arguments are set), and then finally make any remaining newlines visible. Note that the core of the algorithm does no IO. It’s a pure function from [String] -> [String]. Now the implementation of ‘number’:

Here we actually handle all the data traversal, and use a little helper function, ‘ifset’, to conditionally apply a function if the corresponding command line flag is set. Note the slight trickiness involving numbering: either we number all lines, or number the non-blank lines, but not both.
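The numbering stage and the ‘ifset’ helper can be sketched in isolation like this. The definitions of numberLine, numberAll, numberSome and ifset follow the complete source at the end of this post. The cut-down Flag type and the wrapper name ‘pipeline’ are assumptions made so that the block stands on its own, and the visibility stage is left out.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')
import Text.Printf (printf)

-- Only the flags this sketch needs.
data Flag = Blanks | Dollar | Number
    deriving (Eq, Show)

-- Prefix a line with its right-aligned number.
numberLine :: Integer -> String -> String
numberLine = printf "%6d %s"

-- Number every line.
numberAll :: [String] -> [String]
numberAll = zipWith numberLine [1 ..]

-- Number only the non-blank lines; blank lines pass through unnumbered.
numberSome :: [String] -> [String]
numberSome = reverse . snd . foldl' draw (1, [])
  where
    draw (n, acc) s
        | all isSpace s = (n, s : acc)
        | otherwise     = (n + 1, numberLine n s : acc)

-- Hypothetical wrapper: the numbering and end-of-line stages as one
-- pure function from [String] to [String].
pipeline :: [Flag] -> [String] -> [String]
pipeline as = newline . number
  where
    number s
        | Blanks `elem` as = numberSome s
        | otherwise        = ifset Number numberAll s
    newline   = ifset Dollar (map (++ "$"))
    ifset a f = if a `elem` as then f else id
```

With no flags set, every stage collapses to ‘id’, so pipeline [] returns its input unchanged. With [Blanks] the blank lines are skipped by the counter rather than numbered.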

And we’re done! In the end, our entire implementation is some 89 lines of code, of which 60 are to do with importing modules, or command line argument parsing. The actual heart of the program is fairly tiny in the end.

Summary

Well, in the end I didn’t get on to exception handling, or the use of bytestring to improve performance further. However, we have implemented (95%) of the unix ‘cat’ program, including all argument handling and functionality, in about an hour and a half.

Once it typechecked, the code just worked, except for one bug where I originally rendered newline before counting lines (simply because the spec was underspecified). Lesson: you can start writing your unix scripts in Haskell right now. They’ll be flexible, clean, and easy to maintain. And most of all, fun to write!

Hopefully next time we’ll look into using bytestrings for processing larger volumes of data, and the use of exception handling to deal with unusual errors.

The complete source

And just for reference, here’s the complete source:

    import System.Console.GetOpt
    import System.IO hiding (withFile)  -- we define our own withFile below
    import System.Exit
    import System.Environment
    import Data.List
    import Data.Char
    import Control.Monad
    import Text.Printf

    main = do
        (args, files) <- getArgs >>= parse
        when (Unbuffered `elem` args) $ hSetBuffering stdout NoBuffering
        mapM_ (cat args) files

    -- Read a file (or stdin for "-"), transform its lines, write to stdout
    withFile s f = putStr . unlines . f . lines =<< open s
      where
        open f = if f == "-" then getContents else readFile f

    cat [] f = withFile f id
    cat as f = withFile f (newline . number . visible as)
      where
        number  s    = if Blanks `elem` as then numberSome s else ifset Number numberAll s
        newline s    = ifset Dollar (map (++"$")) s
        visible as s = foldl' (flip render) s as
        ifset a f    = if a `elem` as then f else id

    render Squeeze   = map head . group
    render Tabs      = map $ concatMap (\c -> if c == '\t' then "^I" else [c])
    render Invisible = map $ concatMap visible
      where
        visible c | c == '\t' || isPrint c = [c]
                  | otherwise              = init . tail . show $ c
    render _         = id

    numberLine   = printf "%6d %s"
    numberAll s  = zipWith numberLine [(1 :: Integer)..] s
    numberSome s = reverse . snd $ foldl' draw (1,[]) s
      where
        draw (n,acc) s
            | all isSpace s = (n,   s : acc)
            | otherwise     = (n+1, numberLine n s : acc)

    data Flag
        = Blanks      -- -b
        | Dollar      -- -e
        | Squeeze     -- -s
        | Tabs        -- -t
        | Unbuffered  -- -u
        | Invisible   -- -v
        | Number      -- -n
        | Help        -- --help
        deriving (Eq,Ord,Enum,Show,Bounded)

    flags =
       [Option ['b'] []       (NoArg Blanks)
            "Implies the -n option but doesn't count blank lines."
       ,Option ['e'] []       (NoArg Dollar)
            "Implies the -v option and also prints a dollar sign (`$') at the end of each line."
       ,Option ['n'] []       (NoArg Number)
            "Number the output lines, starting at 1."
       ,Option ['s'] []       (NoArg Squeeze)
            "Squeeze multiple adjacent empty lines, causing the output to be single spaced."
       ,Option ['t'] []       (NoArg Tabs)
            "Implies the -v option and also prints tab characters as `^I'."
       ,Option ['u'] []       (NoArg Unbuffered)
            "The output is guaranteed to be unbuffered (see setbuf(3))."
       ,Option ['v'] []       (NoArg Invisible)
            "Displays non-printing characters so they are visible."
       ,Option []    ["help"] (NoArg Help)
            "Print this help message"
       ]

    parse argv = case getOpt Permute flags argv of
        (args,fs,[]) -> do
            let files = if null fs then ["-"] else fs
            if Help `elem` args
                then do hPutStrLn stderr (usageInfo header flags)
                        exitWith ExitSuccess
                else return (nub (concatMap set args), files)

        (_,_,errs)   -> do
            hPutStrLn stderr (concat errs ++ usageInfo header flags)
            exitWith (ExitFailure 1)

      where
        header = "Usage: cat [-benstuv] [file ...]"

        -- -e and -t both imply -v
        set Dollar = [Dollar, Invisible]
        set Tabs   = [Tabs,   Invisible]
        set f      = [f]