Sunday, December 19, 2010

I've just released a new version of Hoogle to both Hackage and to haskell.org/hoogle. Hoogle is a Haskell search engine that allows you to search for functions by either name or approximate type signature.

What's New

The first release in over a year, building with up to date packages and compilers.

I'm still not sure what should be searched by default, and which collections of modules should be available, but I'm open to suggestions.

Installing Hoogle Locally

Many of the improvements to Hoogle are of specific benefit when installing Hoogle yourself, not using the web version. To install Hoogle:

cabal updatecabal install hooglehoogle data

The last step will download information and generate databases for the Haskell Platform. You can then run searches, such as hoogle filter -n10. Hoogle now uses cmdargs, so hoogle --help will detail some of the options available.

You can also run Hoogle as a web server by typing hoogle server. Now visit localhost in a web browser and you'll have the power of Hoogle on your computer. If you often work offline, you can run hoogle data --local and hoogle server --local to use documentation on your local machine where available.

What Now?

Hoogle is currently the spare time project I'm focusing on - there are lots of improvements I am intending to make. Hoogle 4.1 is about getting the code up to a standard that can be easily maintained, allowing future versions to deliver more features. Please try out Hoogle, report any bugs you find, and let me know your thoughts.

Sunday, December 12, 2010

The network library used to be bundled with GHC. Unfortunately it was unbundled from the standard installer (in my opinion, a mistake), and people were required to either install it using Cabal (rather tricky) or switch to using the Haskell Platform (which is not suitable for power developers who want to test with newer/older GHC versions, or those who want to upgrade/downgrade network). This post describes how to install the library on Windows using Cabal.

This configures network to use mingw32, and sets up the PATH so that mingw binaries from GHC are first, and get configured in. You can then use the network library as normal, without ever using Cygwin again. I have successfully executed this command on GHC 6.12.3 and 7.0.1.

Dependencies on foreign libraries are a problem for any cross platform language. Several years ago, installing Haskell libraries was a painful process - but Hackage and Cabal have made it impressively easy. The only remaining packages that are complex to install are those that bind to foreign libraries, most of which use configure scripts, which are not well suited to Windows. I hope over time even libraries with foreign dependencies will become easy to install.

Thursday, October 14, 2010

Summary: I wrote a little script on top of cabal sdist that checks for common mistakes.

Cabal is the standard way to distribute Haskell packages, and has significantly simplified the process of installing libraries. However, I often make mistakes when creating cabal distributions using sdist. One common problem is that if you don't include a file in your .cabal file it will not be put in the distribution package and the package will fail to build for other people, but if the file is available locally it will still work for you. To work around this problem, I often perform a cabal install immediately after I upload a cabal package. Last week I discovered a package that I'd requested to be uploaded had suffered the same fate (hws – from the fantastic Haskell Workshop paper on writing a scalable web server in Haskell), showing this mistake isn't specific to me.

Eric Kow often says that programmers spend too little time writing tools to help their workflow. In response to this observation I have created the "neil" tool, available in darcs from http://community.haskell.org/~ndm/darcs/neil - darcs get and then cabal install. One feature of this neil tool is neil sdist. The actions neil sdist performs are:

cabal check

cabal configure to a temporary directory

cabal sdist to a temporary directory

change to that temporary directory

cabal configure –Werror –fwarn-unused-imports

cabal build

cabal haddock –executables

The intention is that this sequence of actions will fail if anything bad would happen on a users machine. If the file fails to build then cabal build will fail. If the haddock fails to generate than cabal haddock will fail. If there would be warnings during build then –Werror will cause it to fail. If all these checks pass then the package is likely to install without problems (once the cabal test feature is present then that can be included, which will give even more assurance).

I have deliberately not uploaded the neil tool to Hackage, and have no intention of doing so. The name "neil" is not suitable for Hackage, and I intend to add specific actions to help my workflow. If anyone wants to take any of the actions included in this tool and break them out in to separate tools, roll them back into cabal itself etc, they are very welcome.

Taking Eric's advice, I have found that by automating aspects of my workflow I can take steps which were error prone, or verbose, and make them concise and accurate. With neil sdist I have eliminated an entire class of potential errors, with very little ongoing work.

Over the last few hours I've been going through my inbox, trying to deal with some of my older emails. In that process, I've had to admit defeat on three GHC bugs that I'd left in my inbox to come back to. All these bugs relate to changes to the Haskell standard libraries, that were opened as bugs, and that got resolved as closed/wontfix. I will never get time to tackle these bugs, but perhaps someone will? The bugs are:

Currently the recognised way to test at runtime if your application is being run on Windows is:

import System.Info

.... = os == "mingw"

This is wrong for many reasons:

The return result of os is not an operating system, but a ported toolchain.

The result "mingw" does not imply that MinGW is installed on the computer.

String comparisons are unsafe and unchecked, a simple typo breaks this code.

In GHC this comparison will take place at runtime, even though the result is a constant.

The Haskell abstractions and command line tools for almost all non-Windows operating systems have converged to the point where most programs with operating system specific behaviour have two cases - one for Windows and one for everything else. It makes sense to directly support what is probably the most common usage of the os function, and to encourage people away from the C preprocessor where possible.

I've personally defined this function in binarydefer, catch, derive, hlint, hoogle, my website generator and yhc. There's even a copy in GHC. If a function has been defined identically that many times, it clear deserves to be in the standard library. We have mapM, filterM, zipWithM, but concatMapM is suspiciously absent.

The QSem module defines a quantity semaphore, where the quantity semaphores must be natural numbers. Attempts to construct semaphores with negative numbers raise an error. There is, however, a perfectly sensible and obvious interpretation if negative numbers are allowed. It is a shame that this module could provide total functions, which never raise an error, but does not. In addition, for some problems the use of negative quantity semaphores is more natural.

What Now?

I've removed all these bugs from my inbox, and invite someone else to take up the cause - I just don't have the time. Until these issues are resolved, I will test for Windows in a horrible way, define concatMapM whenever I start a new project, and lament the lack of generality in QSem. None of the issues is particularly serious, but all are slightly annoying.

Email etiquette: Today I've cleared about 50 emails from my inbox. Unfortunately my inbox remains big and unwieldy. If you ever email me, and I don't get back to you, email me again a week later. As long as you reply to the first message, Gmail will collapse the reply in to the original conversation, and there won't be any additional load on my inbox - it will just remind me that I should have dealt with your email. I apologise for any emails that have fallen through the cracks.

Monday, August 23, 2010

Summary: A simple CmdArgs parser is incredibly simple (just a data type). A more advanced CmdArgs parser is still pretty simple (a few annotations). People shouldn't be using getArgs even for quick one-off programs.

A few days ago I went to the Haskell Hoodlums meeting - an event in London aimed towards people learning Haskell. The event was good fun, and very well organised - professional quality Haskell tutition for free by friendly people - the Haskell community really is one of Haskell's strengths! The final exercise was to write a program that picks a random number between 1 and 100, then has the user take guesses, with higher/lower hints. After writing the program, Ganesh suggested adding command line flags to control the minimum/maximum numbers. It's not too hard to do this directly with getArgs:

import System.Environment

main = do [min,max] <- getArgs print (read min, read max)

Next we discussed adding an optional limit on the number of guesses the user is allowed. It's certainly possible to extend the getArgs variant to take in a limit, but things are starting to get a bit ugly. If the user enters the wrong number of arguments they get a pattern match error. There is no help message to inform the user which flags the program takes. While getArgs is simple to start with, it doesn't have much flexibility, and handles errors very poorly. However, for years I used getArgs for all one-off programs - I found the other command line parsing libraries (including GetOpt) added too much overhead, and always required referring back to the documentation. To solve this problem I wrote CmdArgs.

A Simple CmdArgs Parser

To start using CmdArgs we first define a record to capture the information we want from the command line:

For our number guessing program we need a minimum, a maximum, and an optional limit. The deriving clause is required to operate with the CmdArgs library, and provides some basic reflection capabilities for this data type. Once we've written this data type, a CmdArgs parser is only one function call away:

Our simple CmdArgs parser is probably sufficient for this task. I doubt anyone will be persuaded to use my guessing program without a fancy iPhone interface. However, CmdArgs provides all the power necessary to customise the parser, by adding annotations to the input value. First, we can modify the parser to make it easier to add our annotations:

guess = cmdArgsMode $ Guess {min = 1, max = 100, limit = Nothing}

main = do x <- cmdArgsRun guess print x

We have changed Guess to use record syntax for constructing the values, which helps document what we are doing. We've also switched to using cmdArgsMode/cmdArgsRun (cmdArgs which is just a composition of those two functions) - this helps avoid any problems with capturing the annotations when running repeatedly in GHCi. Now we can add annotations to the guess value:

Here we've specified that min/max must be at argument position 0/1, which more closely matches the original getArgs parser - this means the user is always forced to enter a min/max (they could be made optional with the opt annotation). For the limit we've added a name annotation to say that we'd like the flag -n to map to limit, instead of using the default -l. We've also given limit some help text, which will be displayed with --help. Finally, we've given a different summary line to the program.

Monday, August 16, 2010

I've just released CmdArgs 0.2 to Hackage. CmdArgs is a library for defining and parsing command lines. The focus of CmdArgs is allowing the concise definition of fully-featured command line argument processors, in a mainly declarative manner (i.e. little coding needed). CmdArgs also supports multiple mode programs, as seen in darcs and Cabal. For some examples of CmdArgs, please see the manual or the Haddock documentation.

For the last month I've been working on a complete rewrite of CmdArgs. The original version of CmdArgs did what I was hoping it would - it was concise and easy to use. However, CmdArgs 0.1 had lots of rough edges - some features didn't work together, there were lots of restrictions on which types of fields appear where, and it was hard to add new features. CmdArgs 0.2 is a ground up rewrite which is designed to make it easy to maintain and improve in the future.

The CmdArgs 0.2 API is incompatible with CmdArgs 0.1, for which I apologise. Some of the changes you will notice:

Several of the annotations have changed name (text has become help, empty has become opt).

Instead of cmdArgs "Summary" [mode1], you write cmdArgs (mode1 &= summary "Summary"). If you need a multi mode program use the modes function.

All the basic principles have remained the same, all the old features have been retained, and translating parsers should be simple local tweaks. If you need any help please contact me.

Explicit Command Line Processor

The biggest change is the introduction of an explicit command line framework in System.Console.CmdArgs.Explicit. CmdArgs 0.1 is an implicit command line parser - you define a value with annotations from which a command line parser is inferred. Unfortunately there was not complete separation between determining what parser the user was hoping to define, and then executing it. The result was that even at the flag processing stage there were still complex decisions being made based on type. CmdArgs 0.2 has a fully featured explicit parser that can be used separately, which can process command line flags and display help messages. Now the implicit parser first translates to the explicit parser (capturing the users intentions), then executes it. The advantages of having an explicit parser are substantial.

It is possible to translate the implicit parser to an explicit parser (cmdArgsMode), which makes testing substantially easier. As a result CmdArgs 0.2 has far more tests.

I was able to write the CmdArgs command line itself in CmdArgs. This command line is a multiple mode explicit parser, which has many sub modes defined by implicit parsers.

The use of explicit parsers also alleviates one of the biggest worries of users, the impurity. Only the implicit parser relies on impure functions to extract annotations. In particular, you can create the explicit parser once, then rerun it multiple times, without worrying that GHC will optimise away the annotations.

Once you have an explicit parser, you can modify it afterwards - for example programmatically adding some flags.

The explicit parser better separates the internals of the program, making each stage simpler.

The introduction of the explicit parser was a great idea, and has dramatically improved CmdArgs. However, there are still some loose ends. The translation from implicit to explicit is a multi-stage translation, where each stage infers some additional information. While this process works, it is not very direct - the semantics of an annotation are hard to determine, and there are ordering constrains on these stages. It would be much better if I could concisely and directly express the semantics of the annotations, which is something I will be thinking about for CmdArgs 0.3.

GetOpt Compatibility

While the intention is that CmdArgs users write implicit parsers, it is not required. It is perfectly possible to write explicit parsers directly, and benefit from the argument processing and help text generation. Alternatively, it is possible to define your own command line framework, which is then translated in to an explicit parser. CmdArgs 0.2 now includes a GetOpt translator, which presents an API compatible with GetOpt, but operates by translating to an explicit parser. I hope that other people writing command line frameworks will consider reusing the explicit parser.

Capturing Annotations

As part of the enhanced separation of CmdArgs, I have extracted all the code that captures annotations, and made it more robust. The essential operation is now &= which takes a value and attaches an annotation. (x &= a) &= b can then be used to attach the two annotations a and b (the brackets are optional). Previously the capturing of annotations and processing of flags was interspersed, meaning the entire library was impure - now all impure operations are performed inside capture. The capturing framework is not fully generic (that would require module functors, to parameterise over the type of annotation), but otherwise is entirely separate.

Consistentency

One of the biggest changes for users should be the increase in consistency. In CmdArgs 0.1 certain annotations were bound to certain types - the args annotation had to be on a [String] field, the argPos annotation had to be on String. In the new version any field can be a list or a maybe, of any atomic type - including Bool, Int/Integer, Double/Float and tuples of atomic types. With this support it's trivial to write:

data Sample = Sample {size :: Maybe (Int,Int)}

Now you can run:

$ sampleSample {size = Nothing}

$ sample --size=1,2Sample {size = Just (1,2)}

$ sample --size=1,2 --size=3,4Sample {size = Just (3,4)}

Hopefully this increase in consistency will make CmdArgs much more predictable for users.

Help Output

The one thing I'm still wrestling with is the help output. While the help output is reasonably straightforward for single mode programs, I am still undecided how multiple mode programs should be displayed. Currently CmdArgs displays something I think is acceptable, but not optimal. I am happy to take suggestions for how to improve the help output.

Conclusion

CmdArgs 0.1 was an experiment - is it possible to define concise command line argument processors? The result was a working library, but whose potential for enhancement was limited by a lack of internal separation. CmdArgs 0.2 is a complete rewrite, designed to put the ideas from CmdArgs 0.2 into practical use, and allow lots of scope for further improvements. I hope CmdArgs will become the standard choice for most command line processing tasks.

Saturday, July 10, 2010

I have written a paper building on my Supero work that has been accepted to ICFP 2010. Various people have asked for copies of the paper, so I have put the current version online. This version will be removed in about three weeks and replaced with the final version, although the linked copy has nearly all the changes suggested by the reviewers. The abstract is:

Supercompilation is a program optimisation technique that is particularly effective at eliminating unnecessary overheads. We have designed a new supercompiler, making many novel choices, including different termination criteria and handling of let bindings. The result is a supercompiler that focuses on simplicity, compiles programs quickly and optimises programs well. We have benchmarked our supercompiler, with some programs running more than twice as fast than when compiled with GHC.

I've also uploaded a corresponding package to Hackage, but that code should be considered an early version - I intend to revise it before ICFP, to make it easier to run (currently I change the options by editing the source code) and include all the benchmarks. I don't recommend downloading or using the current version, and won't be supporting it in any way, but it's there for completeness.

In the near future I will be posting more general discussions about supercompilation, and about the work covered in this paper. In the meantime, if you find any mistakes in the paper, please mention them in the comments!

PS. Currently community.haskell.org is down, but I have uploaded the paper there anyway. It should be present when the site recovers.

Sunday, April 25, 2010

The Uniplate library contains many traversal operations, some based on functions in SYB (Scrap Your Boilerplate). SYB provides everywhere for bottom-up traversals and everywhere' for top-down traversals. Uniplate provides transform for bottom-up traversals, but has no operation similar to everywhere'. This article explains why I didn't include a transform' operation, and why I believe that most uses of everywhere' are probably incorrect.

The transform operation applies a function to every node in a tree, starting at the bottom and working upwards. To give a simple example:

data Exp = Lit Int | Add Exp Exp | Mul Exp Exp

f (Mul (Lit 1) x) = xf (Mul (Lit 3) x) = Add x (Add x x)f x = x

Calling transform f on the input 3 * (1 * 2) gives 2 + (2 + 2). We can write transform in terms of the Uniplate operation descend, which applies a function to every child node:

transform f x = f (descend (transform f) x)

On every application of f the argument always consists of a root node which has not yet been processed, along with child nodes that have been processed. My thesis explains how we can guarantee a transform reaches a fixed point, by calling f again before every constructor on the RHS of any clause:

Now transform f is guaranteed to reach a fixed point. The transform operation is predictable, and naturally defines bottom-up transformations matching the users intention. Unfortunately, the ordering and predictability of transform' is significantly more subtle. We can easily define transform':

transform' f x = descend (transform' f) (f x)

Here the transformation is applied to every child of the result of f. With transform every node is processed exactly once, but with transform' some nodes are processed multiple times, and some are not processed at all. The first clause of f, which returns x, does not result in the root of x being processed. Similarly, our second cause returns two levels of constructor, causing the inner Add to be both generated and then processed.

When people look at transform' the intuitive feeling tends to be that all the variables on the RHS will be processed (i.e. x), which in many cases mostly matches the behaviour of transform'. Being mostly correct means that many tests work, but certain cases fail - with our function f, the first example works, but 1 * (1 * 1) results in 1 * 1. The original version of Uniplate contained transform', and I spent an entire day tracking down a bug whose cause turned out to be a function whose RHS was just a variable, much like the first clause of f.

Before describing the solution to top-down transformations, it's interesting to first explore where top-down transformations are necessary. I have identified two cases, but there many be more. Firstly, top-down transformations are useful when one LHS is contained within another LHS, and you wish to favour the larger LHS. For example:

f (Mul (Add _ _) _) = ...f (Add _ _) = Mul ...

Here the second LHS is contained within the first. If we perform a bottom-up transformation then the inner Add expression will be transformed to Mul, and the first clause will never match. Changing to a top-down transformation allows the larger rule to match first.

Secondly, top-down transformations are useful when some information about ancestors is accumulated as you proceed downwards. The typical example is a language with let expressions which builds up a set of bindings as it proceeds downwards, these bindings then affect which transformations are made. Sadly, transform' cannot express such a transformation.

The solution to both problems is to use the descend operation, and explicitly control the recursive step. We can rewrite the original example using descend:

Here we explicitly call f to continue the transformation. The intuition that all variables are transformed is now stated explicitly. In this particular example we can also common-up the three subexpressions f x in the second clause, giving a more efficient transformation. If needed, we can add an extra argument to f to pass down information from the ancestors.

After experience with Uniplate I decided that using transform'/everywhere' correctly was difficult. I looked at all my uses of transform' and found that a few of them had subtle bugs, and in most cases using transform would have done exactly the same job. I looked at all the code on Hackage (several years ago) and found only six uses of everywhere', all of which could be replaced with everywhere without changing the meaning. I consider top-down transformations in the style of everywhere' to be dangerous, and strongly recommend using operations like descend instead.

Upcoming Workshops

Are you interested in either generics programming or Haskell? Are you going to ICFP 2010? Why not:

Submit a paper to the Workshop on Generic Programming - Generic programming is about making programs more adaptable by making them more general. This workshop brings together leading researchers and practitioners in generic programming from around the world, and features papers capturing the state of the art in this important area.

Wednesday, April 07, 2010

Haskell has long been my favoured scripting language, and in this post I thought I'd share one of my more IO heavy scripts. I have an external hard drive, that due to regular dropping, is somewhat unreliable. I have a 1Gb file on this drive, which I'd like to copy, but is partly corrupted. I'd like to copy as much as I can.

In the past I've used JFileRecovery, which I thoroughly recommend. The basic algorithm is that it copies the file in chunks, and if a chunk copy exceeds a timeout it is discarded. It has a nice graphical interface, and some basic control over timeout and block sizes. Unfortunately, JFileRecovery didn't work for this file - it has three basic problems:

The timeout sometimes fails to stop the IO, causing the program to hang.

If the copy takes too long, it sometimes gives up before the end of the file.

If the block size is too small it takes forever, if it is too large it drops large parts of the file.

To recover my file I needed something better, so wrote a quick script in Haskell. The basic algorithm is to copy the the file in 10Mb chunks. If any chunk fails to copy, I split the chunk and retry it after all other pending chunks. The result is that the file is complete after the first pass, but the program then goes back and recovers more information where it can. I can terminate the program at any point with a working file, but waiting longer will probably recover more of the file.

I have included the script at the bottom of this post. I ran this script from GHCi, but am not going to turn it in to a proper program. If someone does wish to build on this script please do so (I hereby place this code in the public domain, or if that is not possible then under the ∀ n . BSD n licenses).

The script took about 15 minutes to write, and makes use of exceptions and file handles - not the kind of program traditionally associated with Haskell. A lot of hard work has been spent polishing the GHC runtime, and the Haskell libraries (bytestring, exceptions). Now this work has been done, slotting together reasonably complex scripts is simple.

Saturday, January 23, 2010

HLint is a tool for suggesting improvements to Haskell code. Recently I've put some effort in to optimisation and HLint is now over 20 times faster. The standard benchmark (running HLint over the source code of HLint) has gone from 30 seconds to just over 1 second. This blog post is the story of that optimisation, the dead ends I encountered, and the steps I took. I've deliberately included reasonable chunks of code in this post, so interested readers can see the whole story - less technical readers should feel free to skip them. The results of the optimisation are all available on Hackage, as new versions of hlint, uniplate and derive.

Before I start, I'd like to share my five guiding principles of optimisation:

Tests - make sure you have tests, so you don't break anything while optimising.

Profile - if you don't profile first, you are almost certainly optimising the wrong thing.

Necessity - only start the optimisation process if something is running too slowly.

Benchmark - use a sensible and representative test case to benchmark, to make sure you optimise the right thing.

Optimising - to make a function faster, either call it less, or write it better.

Below are the steps in the optimisation, along with their speed improvement.

Special support for Rational in Uniplate.Data, 30s to 10s

HLint uses Uniplate extensively. HLint works over large abstract syntax trees, from the library haskell-src-exts (HSE), so a generics library is essential. There are two main variants of Uniplate - Data builds on the Scrap Your Boilerplate (SYB) instances, and Direct requires special instances. HLint uses the Data variant, as it requires no instances to be written.

One of the advantages of Uniplate is that it generally outperforms most generics libraries. In particular, the variant written on top of Data instances is often many times faster than using SYB directly. The reason for outperforming SYB is documented in my PhD thesis. The essential idea is that Uniplate builds a "hit table", a mapping noting which types can be contained within which other types - e.g. that there is potentially an Int inside Either String [Int], but there isn't an Int inside String. By consulting this mapping while traversing a value, Uniplate is able to skip large portions, leading to an efficiency improvement.

When computing the hit table it is necessary for Uniplate to create dummy values of each type, which it then traverses using SYB functions. To create dummy values Uniplate uses undefined, unfortunately given the definition data Foo = Foo !Int the value Foo undefined will be forced due to the strictness annotation, and the code will raise an error - as described in bug 243. Uniplate 1.2 had a special case for Rational, which is the only type with strict components contained within HSE. Uniplate 1.3 fixed this problem more generally by catching the exception and turning off the hit table facility on such types. Unfortunately this caused Uniplate 1.3 to turn off the hit table for HSE, causing HLint to run three times slower.

The fix was simple, and pragmatic, but not general. In Uniplate 1.4 I reinstated the special case for Rational, so now HLint makes use of the hit table, and goes three times faster. A more general solution would be to manufacture dummy values for certain types (it's usually an Int or Integer that is a strict component), or to create concrete dummy values using SYB. It's interesting to observe that if HLint used SYB as it's generics library, it would not be using the hit table trick, and would run three times slower.

Use Uniplate.Direct, 10s to 5s, reverted

Uniplate also provides a Direct version, which performs better, but requires instances to be written. In order to further improve the speed of HLint I decided to try the Direct version. The biggest hurdle to using the Direct version is that many instances need to be generated, in the case of HLint it required 677. The first step was to modify the Derive tool to generate these instances (which Derive 2.1 now does), and to write a small script to decide which instances were necessary. With these instances in place, the time dropped to 5 seconds.

Unfortunately, the downside was that compilation time skyrocketed, and the instances are very specific to a particular version of HSE. While these problems are not insurmountable, I did not consider the benefit to be worthwhile, so reverted the changes. It's worth pointing out that most users of Uniplate won't require so many instances to use the Direct versions, and that a program can be switched between Direct and Data versions without any code changes (just a simple import). I also considered the possibility of discovering which Uniplate instances dominated and using the Direct method only for those - but I did not investigate further.

Add and optimise eqExpShell, 10s to 8s

The next step was to profile. I compiled and ran HLint with the following options:

ghc --make -O1 -prof -auto-all -o hlinthlint src +RTS -p

Looking at the profiling output, I saw that the function unify took up over 60% of the execution time:

The function unify is an essential part of the rule matching in HLint, and attempts to compare a rule to a target expression, and if successful returns the correct substitution. Each rule is compared to every expression in all files, which means that unify is called millions of times in a normal HLint run. Looking closely, my first suspicion was the second line from the bottom in the guard - the call to descend and (==). This line compares the outer shell of two expressions, ignoring any inner expressions. It first uses the Uniplate descend function to insert a dummy value as each subexpression, then compares for equality. To test my hypothesis that this method was indeed the culprit I extracted it to a separate function, and modified unify to call it:

I reran the profiling, and now all the time was being spent in eqExpShell. My first thought was to expand out the function to not use Uniplate. I quickly rejected that idea - there are expressions within statements and patterns, and to follow all the intricacies of HSE would be fragile and verbose.

The first optimisation I tried was to replace toNamed "_", the dummy expression, with something simpler. The toNamed call expands to many constructors, so instead I used Do an [] (an is a dummy source location), which is the simplest expression HSE provides. This change had a noticeable speed improvement.

My second thought was to add a quick test, so that if the outer constructors were not equal then the expensive test was not tried. Determining the outer constructor of a value can be done by calling show then only looking at the first word (assuming a sensible Show instance, which HSE has).

This change had a bigger speed improvement. I found that of the 1.5 million times eqExpShell was called, the quick equality test rejected 1 million cases.

My next thought was to try replacing constr with the SYB function toConstr. There was no noticeable performance impact, but the code is neater, and doesn't rely on the Show instance, so I stuck with it. After all these changes HLint was 2 seconds faster, but eqExpShell was still the biggest culprit on the profiling report.

Write eqExpShell entirely in SYB, 8s to 8s, reverted

My next thought was to rewrite eqExpShell entirely in SYB functions, not using the Eq instance at all. The advantages of this approach would be that I can simply disregard all subexpressions, I only walk the expression once, and I can skip source position annotations entirely. Starting from the geq function in SYB, I came up with:

Unfortunately, this code takes exactly the same time as the previous version, despite being significantly more complex. My guess is that toConstr is not as fast as the Eq instance, and that this additional overhead negates all the other savings. I decided to revert back to the simpler version.

Call eqExpShell less, 8s to 4s

Having failed to optimise eqExpShell further, I then thought about how to call it less. I added a trace and found that of the 1.5 million calls, in 1.3 million times at least one of the constructors was an App. Application is very common in Haskell programs, so this is not particularly surprising. By looking back at the code for unify I found several other constructors were already handled, so I added a special case for App.

It is easy to show that if an App (or any specially handled constructor) is passed as either argument to eqExpShell then the result will be False, as if both shells had been equal a previous case would have matched. Taking advantage of this observation, I rewrote the line with the generic match as:

With this change the eqExpShell function was called substantially less, it disappeared from the profile, and the speed improved to 4 seconds.

Fix Uniplate bug, 4s to 1.3s

The next step was to rerun the profiling. However, the results were very confusing - almost 70% of the execution time was recorded to three CAF's, while I could see no obvious culprit. I reran the profiler with the -caf-all flag to more precisely report the location of CAF's, and was again confused - the optimiser had rearranged the functions to make -caf-all useless. I then reran the profiler with optimisation turned off using -O0 and looked again. This time the profiling clearly showed Uniplate being the source of the CAF's. The hit table that Uniplate creates is stored inside a CAF, so was an obvious candidate.

Turning back to Uniplate, I attempted to reproduce the bug outside HLint. I enhanced the benchmarking suite by adding a method to find all the String's inside very small HSE values. The standard Uniplate benchmarks are for testing the performance of running code, and I had neglected to check the creation of the hit table, assuming it to be negligible.

These results surprised me. In particular, the hit table from Exp to String is a subset of the Module one, so should always be faster. The computation of the hit table is reasonably complex, and I was unable to isolate the problem. The tricky part of the hit table is that it is necessary to take the fixed point of the transitive closure of reachability - I had tried to keep track of recursive types and reach a fixed point with the minimal number of recomputations. I clearly failed, and probably had an exponential aspect to the algorithm that under certain circumstances caused ridiculously bad behaviour.

Rather than try to diagnose the bug, I decided instead to rethink the approach, and simplify the design. In particular, the fixed point of the transitive closure is now written as:

fixEq trans (populate box)

Where populate finds the immediate children of a constructor, trans takes the transitive closure based on the current state, and fixEq takes the fixed point. Using this new simpler design I was also able to compute which types contained with types recursively and cache it, meaning that now computing the hit table for Module not only computes the hit table for Exp, but does so in a way that means the result can be reused when asking about Exp. After rewriting the code I reran the benchmark:

I had hoped that the rewrite would fix the performance problems, and it did. I have not diagnosed the original performance bug, but at 0.03 seconds I was satisfied. I have now released Uniplate 1.5 with the revised hit table code. With this change, the time for HLint drops to 1.3 seconds and all the CAF's went away from the profile.

Sharing computation, 1.3s to 1.2s

At this point, I was happy to finish, but decided to profile just one last time. The top function in the list was matchIdea:

This function first strips both the rule (lhs) and the expression (x) of their source position information to ensure equality works correctly. However, both the rules and expressions are reused multiple times, so I moved the fmap calls backwards so each rule/expression is only ever stripped of source position information once. With this change the runtime was reduced to 1.2 seconds.

Now the largest contributor to the HLint runtime is the parsing of Haskell files. There are no obvious candidates for easy optimisations, and the code runs sufficiently fast for my purposes.

Conclusions

There is still some scope for optimisation of HLint, but I leave that for future work. One possible avenue for exploration would be turning on selected packages of hints, to see which one takes the most time - profiling on a different measure.

In optimising HLint I've found two issues in Uniplate, the first of which I was aware of, and the second of which came as a total surprise. These optimisations to Uniplate will benefit everyone who uses it. I have achieved the goal of optimising HLint, simply by following the profile reports, and as a result HLint is now substantially faster than I had ever expected.

Footnote: I didn't actually profile first, as I knew that a performance regression was caused by the upgrade to Uniplate 1.3, so knew where to start looking. Generally, I would start with a profile.

Thursday, January 14, 2010

A few days ago I posted about my plan to use .ghci files for all my projects. I am now doing so in at least five projects, and it's working great. There were two disadvantages: 1) every command had to be squeezed on to a single line; 2) some names were introduced into the global namespace. Thanks to a hint from doliorules, about :{ :} I can eliminate these disadvantages.

Saturday, January 09, 2010

I develop a reasonable number of different Haskell projects, and tend to switch between them regularly. I often come back to a project after a number of months and can't remember the basics - how to load it up, how to test it. Until yesterday, my technique was to add a ghci.bat file to load the project, and invoke the tests with either :main test or :main --test. For some projects I also had commands to run hpc, or perform profiling. Using .ghci files, I can do much better.

All my projects are now gaining .ghci files in the root directory. For example, the CmdArgs project has:

:set -w -fwarn-unused-binds -fwarn-unused-imports:load Main

let cmdTest _ = return ":main test":def test cmdTest

The first line sets some additional warnings. I usually develop my projects in GHCi with lots of useful warnings turned on. I could include these warnings in the .cabal file, but I'd rather have them when developing, and not display them when other people are installing.

The second line loads the Main.hs file, which for CmdArgs is the where I've put the main tests.

The last two lines define a command :test which when invoked just runs :main test, which is how I run the test for CmdArgs.

To load GHCi with this configuration file I simply change to the directory, and type ghci. It automatically loads the right files, and provides a :test command to run the test suite.

I've also converted the HLint project to use a .ghci file. This time the .ghci file is slightly different, but the way I load/test my project is identical:

The first section turns on/off the appropriate warnings, then sets the include path and loads the main module.

The second section defines a command named :test, which runs the tests.

The final section defines a command named :hpc which runs hpc and pops up a web browser with the result. Unfortunately GHC requires definitions entered in a .ghci file to be on one line, so the formatting isn't ideal, but it's just a list of commands to run at the shell.

Using a .ghci file for all my projects has a number of advantages:

I have a consistent interface for all my projects.

Typing :def at the GHCi prompt says which definitions are in scope, and thus which commands exist for this project.

I've eliminated the Windows specific .bat files.

The .ghci file mechanism is quite powerful - I've yet to explore it fully, but could imagine much more complex commands.

Sunday, January 03, 2010

This tutorial explains how to perform IO in Haskell, without attempting to give any understanding of monads. We start with the simplest example of IO, then build up to more complex examples. You can either read the tutorial to the end, or stop at the end of any section - each additional section will let you tackle new problems. We assume basic familiarity with Haskell, such as the material covered in chapters 1 to 6 of Programming in Haskell by Graham Hutton.

This program gets the contents of file.in, runs the operate function on it, then writes the result to file.out. The main function contains all the IO operations, while operate is entirely pure. When writing operate you do not need to understand any details of IO. This pattern of IO was sufficient for my first two years of programming Haskell.

Action List

If the pattern described in Simple IO is insufficient, the next step is a list of actions. A main function can be written as:

The main function starts with do, then has a sequence of xI <- exprI statements, and ends with return (). Each statement has a pattern on the left of the arrow (often just a variable), and an expression on the right. If the expression is not of type IO, then you must write xI <- return (exprI). The return function takes a value, and wraps it in the IO type.

As a simple example we can write a program that gets the command line arguments, reads the file given by the first argument, operates on it, then writes out to the file given by the second argument:

As before, operate is a pure function. The first line after the do uses a pattern match to extract the command line arguments. The second line reads the file specified by the first argument. The third line uses return to wrap a pure value. The fourth line provides no useful result, so we ignore it by writing _ <-.

Simplifying IO

The action list pattern is very rigid, and people usually simplify the code using the following three rules:

_ <- x can be rewritten as x.

If the penultimate line doesn't have a binding arrow (<-) and is of type IO (), then the return () can be removed.

The functions we've written so far have all been of type IO (), which lets us perform IO actions, but not give back interesting results. To give back the value x, we write return x as the final line of the do block. Unlike the imperative language return statement, this return must be on the final line.

For optional IO you make the final statement of the do block an if, then under each branch continue the do. The only subtle point is that the else must be indented by one more space than the if. This caveat is widely considered to be a bug in the definition of Haskell, but for the moment, the extra space before the else is required.

Break Time

If you've gone from understanding no IO to this point in the tutorial, I suggest you take a break (a cookie is recommended). The IO presented above is all that imperative languages provide, and is a useful starting point. Just as functional programming provides much more powerful ways of working with functions by treating them as values, it also allows IO to be treated as values, which we explore in the rest of the tutorial.

Working with IO Values

The next stage is to work with IO as values. Until now, all IO statements have been executed immediately, but we can also create variables of type IO. Using our title function from above we can write:

main :: IO ()main = do let x = title "Welcome" x x x

Instead of running the IO with x <-, we have placed the IO value in the variable x, without running it. The type of x is IO (), so we can now write x on a line to execute the action. By writing the x three times we perform the action three times.

Passing IO Arguments

We can also pass IO values as arguments to functions. In the previous example we ran the IO action three times, but how would we run it fifty times? We can write a function that takes an IO action, and a number, and runs the action that number of times:

This definition makes use of optional IO to decide when to stop, and recursion to continue performing the IO. We can now rewrite the previous example as:

main :: IO ()main = do let x = title "Welcome" replicateM_ 3 x

In an imperative language the replicateM_ function is built in as a for statement, but the flexibility of Haskell allows us to define new control flow statements - a very powerful feature. The replicateM_ function defined in Control.Monad is like ours, but more general, and can be used instead.

IO in Structures

We've seen IO values being passed as arguments, so it's natural that we can also put IO in structures such as lists and tuples. The function sequence_ takes a list of IO actions, and executes each action in turn:

If there are no elements in the list then sequence_ stops, with return (). If there are elements in the list then sequence_ gets the first action (with head xs) and executes it, then calls sequence_ on the remaining actions. As before, sequence_ is available in Control.Monad, but in a more general form. It is now simple to rewrite replicateM_ in terms of sequence_:

A much more natural definition of sequence_, rather than using null/head/tail, is to make use of Haskell's pattern matching. If there is exactly one statement in a do block, you can remove the do. Rewriting sequence_ we can eliminate the do after the equals sign, and the do after the then keyword.

A Haskell program usually consists of an outer IO shell calling pure functions. In the previous example main and operateFile are part of the IO shell, while operate and everything it uses are pure. As a general design principle, keep the IO layer small. The IO layer should concisely perform the necessary IO, then delegate to the pure part. Use of explicit IO in Haskell is necessary, but should be kept to a minimum - pure Haskell is where the beauty lies.

Where to go now

You should now be equipped to do all the IO you need. To become more proficient I recommend any of the following:

Write lots of Haskell code.

Read chapters 8 and 9 of Programming in Haskell by Graham Hutton. You should expect to spend about 6 hours thinking and contemplating on sections 8.1 to 8.4 (I recommend going to a hospital A&E department with a minor injury).