Chris Done's home page feed
http://chrisdone.com

Fast Haskell: Competing with C at parsing XML
http://chrisdone.com/posts/fast-haskell-c-parsing-xml
Wed, 11 Jan 2017
In this post we’re going to look at parsing XML in Haskell, how it compares with an efficient C parser, and steps you can take in Haskell to build a fast library from the ground up. We’re going to get fairly detailed and get our hands dirty.

A new kid on the block

A few weeks ago Neil Mitchell posted a blog post about a new XML library that he’d written. The parser is written in C, and the API is written in Haskell which uses the C library. He writes that it’s very fast:

Hexml has been designed for speed. In the very limited benchmarks I’ve done it is typically just over 2x faster at parsing than Pugixml, where Pugixml is the gold standard for fast XML DOM parsers. In my uses it has turned XML parsing from a bottleneck to an irrelevance, so it works for me.

In order to achieve that speed, he cheats by not performing operations he doesn’t care about:

To gain that speed, Hexml cheats. Primarily it doesn’t do entity expansion, so &amp; remains as &amp; in the output. It also doesn’t handle CData sections (but that’s because I’m lazy) and comment locations are not remembered. It also doesn’t deal with most of the XML standard, ignoring the DOCTYPE stuff. [..] I only work on UTF8, which for the bits of UTF8 I care about, is the same as ASCII - I don’t need to do any character decoding.

Cheating is fine when you describe in detail how you cheat. That’s just changing the rules of the game!

But C has problems

This post caught my attention because it seemed to me a pity to use C. Whether you use Haskell, Python, or whatever, there are a few problems with dropping down to C from your high-level language:

The program is more likely to segfault. I’ll take an exception over a segfault any day!

The program opens itself up to possible exploitation due to lack of memory safety.

If people want to extend your software, they have to use C, and not your high-level language.

At the moment, sorry to say – I wouldn’t use this library to parse any arbitrary XML, since it could be considered hostile, and get me owned. Using American Fuzzy Lop, after just a few minutes, I’ve already found around 30 unique crashes.

But C is really fast right? Like 100s of times faster than Haskell! It’s worth the risk.

But-but C is fast!

Let’s benchmark it. We’re going to parse a 4KB, a 31KB and a 211KB XML file.

The byte values 60 and 62 are the ASCII codes for < and >. In XML the only characters that matter are < and > (if you don’t care about entities). < and > can’t appear inside quotes (attribute values). They are the only important things to search for. Results:

File hexml xeno
4KB 6.395 μs 2.630 μs
31KB 37.55 μs 7.814 μs

So the baseline performance of walking across the file in jumps is quite fast! Why is it fast? Let’s look at that for a minute:

The ByteString data type is a safe wrapper around a vector of bytes. Underneath, it’s equivalent to a char* in C, plus an offset and length.

With that in mind, the S.elemIndex function is implemented using the standard C function memchr(3). As we all know, memchr jumps across your file in large word-sized steps, or even using SIMD operations, meaning it’s bloody fast. But the elemIndex function itself is safe.

So we’re effectively doing a for(..) { s=memchr(s,..) } loop over the file.
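Sketched in Haskell (a simplified toy, not the actual Xeno source), that loop amounts to repeatedly asking elemIndex for the next angle bracket:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

-- Count '<' characters by jumping from one to the next with
-- S.elemIndex, which calls memchr(3) under the hood. S.drop is O(1):
-- it only bumps the offset, it doesn't copy.
countOpenAngles :: S.ByteString -> Int
countOpenAngles = go 0
  where
    go !n bs =
      case S.elemIndex 60 bs of   -- 60 is the byte for '<'
        Nothing -> n
        Just i  -> go (n + 1) (S.drop (i + 1) bs)

main :: IO ()
main = print (countOpenAngles (S8.pack "<a><b>hi</b></a>"))  -- prints 4
```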

Keep an eye on the allocations

Using the weigh package for memory allocation tracking, we can also look at allocations of our code right now:

We see that it’s constant. Okay, it varies by a few bytes, but it doesn’t increase linearly or anything. That’s good! One thing that stood out to me: didn’t we pay for allocation of the Maybe values? For 1000 < and > characters, we should have 1000 allocations of Just/Nothing. Let’s go down that rabbit hole for a second.

Looking at the Core

Well, if you compile the source like this

stack ghc -- -O2 -ddump-simpl Xeno.hs

You’ll see a dump of the real Core code that is generated after the Haskell code is desugared, and before it’s compiled to machine code. At this stage you can already see optimizations based on inlining, common-sub-expression elimination, deforestation, and other things.

The output is rather large. Core is verbose, and fast code tends to be longer. Here is the output, but you don’t have to understand it. Just note that there’s no mention of Maybe, Just or Nothing in there. It skips that altogether. See here specifically. There is a call to memchr, then there is an eqAddr comparison with NULL, to see whether the memchr is done or not. But we’re still doing safety checks so that the resulting code is safe.

Inlining counts

The curious reader might have noticed that INLINE line in my first code sample.

{-# INLINE elemIndexFrom #-}

Without the INLINE, the whole function is twice as slow and has linear allocation.
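For context, elemIndexFrom is a thin wrapper over S.elemIndex that searches from an offset; something close to this (a sketch, close to but not necessarily identical to the real Xeno definition):

```haskell
import qualified Data.ByteString as S
import Data.Word (Word8)

-- Find the next occurrence of a byte at or after the given offset.
-- The INLINE matters: it lets GHC see through the Maybe at the call
-- site and optimize the Just/Nothing allocations away entirely.
elemIndexFrom :: Word8 -> S.ByteString -> Int -> Maybe Int
elemIndexFrom c str offset = fmap (+ offset) (S.elemIndex c (S.drop offset str))
{-# INLINE elemIndexFrom #-}
```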

Right at the top, we have findTagName, doing all the allocations. So I looked at the code, and found that the only possible thing that could be allocating, is S.drop. This function skips n elements at the start of a ByteString. It turns out that S.head (S.drop index0 str) was allocating an intermediate string, just to get the first character of that string. It wasn’t copying the whole string, but it was making a new pointer to it.

So I realised that I could just replace S.head (S.drop n s) with S.index s n:
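The difference in miniature: the left-hand version allocates a throwaway ByteString header just to look at one byte; the right-hand version reads the byte directly.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import Data.Word (Word8)

-- Allocates an intermediate ByteString (a fresh pointer/offset/length
-- record) before reading its first byte:
headAfterDrop :: Int -> S.ByteString -> Word8
headAfterDrop n s = S.head (S.drop n s)

-- Reads the byte at the offset directly, no intermediate value:
byteAt :: Int -> S.ByteString -> Word8
byteAt n s = S.index s n

main :: IO ()
main = print (headAfterDrop 2 (S8.pack "abc") == byteAt 2 (S8.pack "abc"))
```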

This function performs at the same speed as process did before it accepted any callback arguments. This means that the only overhead to SAX’ing will be whatever work the callback functions themselves do.

Specialization is for insects (and, as it happens, optimized programs)

One point of interest is that adding a SPECIALIZE pragma for the process function increases speed by roughly 1 μs. A generic (type-class polymorphic) function accepts a dictionary argument at runtime for the particular instance; specializing it means generating a separate copy of the code fixed to one exact instance, so no dictionary needs to be passed. Below is the Identity monad’s (i.e. just pure, does nothing) specialized type for process.
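The shape of the pragma looks like this. Note this is a toy stand-in: the real process takes several callbacks, whereas this hypothetical version takes one, just to show how SPECIALIZE is written.

```haskell
import Data.Functor.Identity (Identity (..), runIdentity)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

-- Toy stand-in for the SAX driver: invoke the callback once per
-- '<'-delimited chunk of the input. (Hypothetical body.)
process :: Monad m => (S.ByteString -> m ()) -> S.ByteString -> m ()
process openF str = mapM_ (openF . S.take 1) (S8.split '<' str)

-- Generate a copy of process fixed to Identity, so calls at this
-- type pay no dictionary-passing overhead:
{-# SPECIALIZE process
      :: (S.ByteString -> Identity ()) -> S.ByteString -> Identity () #-}

main :: IO ()
main = print (runIdentity (process (\_ -> pure ()) (S8.pack "<a><b>")))  -- prints ()
```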

In the 4KB case it’s only 800 ns, but as we say in Britain, take care of the pennies and the pounds will look after themselves. The 240->285 difference isn’t big in practical terms, but when we’re playing the speed game, we pay attention to things like that.

Where we stand: Xeno vs Hexml

Currently the SAX interface in Xeno outperforms Hexml in space and time. Hurrah! We’re as fast as C!

It’s also worth noting that Haskell does all this safely. All the functions I’m using are standard ByteString functions which do bounds checking and throw an exception on out-of-bounds access. We don’t accidentally access memory that we shouldn’t, and we don’t segfault. The server keeps running.

If you’re interested, if we switch to unsafe functions (unsafeTake, unsafeIndex from the Data.ByteString.Unsafe module), we get a notable speed increase:

Implementing a DOM parser for Xeno

All isn’t lost. Hexml isn’t a dumb parser that’s fast because it’s in C, it’s also a decent algorithm. Rather than allocating a tree, it allocates a big flat vector of nodes and attributes, which contain offsets into the original string. We can do that in Haskell too!

Here’s my design of a data structure contained in a vector. We want to store just integers in the vector. Integers that point to offsets in the original string. Here’s what I came up with.

We have three kinds of payloads. Elements, text and attributes:

1. 00 # Type tag: element
2. 00 # Parent index (within this array)
3. 01 # Start of the tag name in the original string
4. 01 # Length of the tag name
5. 05 # End index of the tag (within this array)

1. 02 # Type tag: attribute
2. 01 # Start of the key
3. 05 # Length of the key
4. 06 # Start of the value
5. 03 # Length of the value

That’s all the detail I’m going to go into. You can read the code if you want to know more. It’s not a highly optimized format. Once we have such a vector, it’s possible to define a DOM API on top of it which can let you navigate the tree as usual, which we’ll see later.
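To make the layout concrete, here is a hypothetical helper (names mine, not Xeno’s) that reads an element’s tag name back out of such a flat vector, using the slot layout above:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.Vector.Unboxed as V

-- Given the flat node vector and the original document, recover the
-- tag name of the element whose record starts at slot i. Per the
-- layout above, slots i+2 and i+3 hold the name's offset and length.
elementName :: V.Vector Int -> S.ByteString -> Int -> S.ByteString
elementName nodes str i =
  S.take (nodes V.! (i + 3)) (S.drop (nodes V.! (i + 2)) str)

main :: IO ()
main =
  -- The five-slot element record from the text: type tag, parent,
  -- name offset, name length, end index.
  S8.putStrLn (elementName (V.fromList [0, 0, 1, 1, 5]) (S8.pack "<a></a>") 0)
```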

We’re going to use our SAX parser–the process function, and we’re going to implement a function that writes to a big array. This is a very imperative algorithm. Haskellers don’t like imperative algorithms much, but Haskell’s fine with them.

The function runs in the ST monad which lets us locally read and write to mutable variables and vectors, while staying pure on the outside.

I allocate an array of 1000 64-bit Ints (on 64-bit arch), I keep a variable of the current size, and the current parent (if any). The current parent variable lets us, upon seeing a tag, assign the position in the vector of where the parent is closed.

Whenever we get an event and the array is too small, I grow the array by doubling its size. This strategy is copied from the Hexml package.

Finally, when we’re done, we get the mutable vector, “freeze” it (this means making an immutable version of it), and then return that copy. We use unsafeFreeze to re-use the array without copying, which includes a promise that we don’t use the mutable vector afterwards, which we don’t.
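The overall pattern, in miniature (a sketch, not the real parser): a mutable unboxed vector, a size counter, doubling on overflow, and an unsafeFreeze at the end.

```haskell
import Control.Monad (foldM)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- Push a stream of Ints into a growing mutable vector, then freeze
-- the used prefix without copying.
fillVector :: [Int] -> V.Vector Int
fillVector xs = runST $ do
  initial <- MV.new 4
  (mv, n) <- foldM push (initial, 0) xs
  V.unsafeFreeze (MV.slice 0 n mv)   -- safe: mv is never touched again
  where
    push (mv, n) x = do
      -- Double the capacity whenever we run out of room.
      mv' <- if n >= MV.length mv
               then MV.grow mv (MV.length mv)
               else pure mv
      MV.write mv' n x
      pure (mv', n + 1)
```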

The DOM speed

Not bad! The DOM parser is only <2x slower than Hexml (except in the 31KB where it’s faster. shrug). Here is where I stopped optimizing and decided it was good enough. But we can review some of the decisions made along the way.

In the code we’re using unboxed mutable references for the current size and parent, the mutable references are provided by the mutable-containers package. See these two lines here:

sizeRef <- fmap asURef (newRef 0)
parentRef <- fmap asURef (newRef 0)

Originally, I had tried STRef’s, which are boxed. Boxed just means it’s a pointer to an integer instead of an actual integer. An unboxed Int is a proper machine register. Using an STRef, we get worse speeds:

File xeno-dom
4KB 12.18 μs
31KB 6.412 μs
211KB 631.1 μs

Which is a noticeable speed loss.

Another thing to take into consideration is the array type. I’m using the unboxed mutable vectors from the vector package. When using atomic types like Int, it can be a leg-up to use unboxed vectors. If I use the regular boxed vectors from Data.Vector, the speed regresses to:

Tada! We matched Hexml, in pure Haskell, using safe accessor functions. We provided a SAX API which is very fast, and a simple demonstration DOM parser with a familiar API which is also quite fast. We use reasonably little memory in doing so.

This package is an experiment for educational purposes, to show what Haskell can do and what it can’t, for a very specific domain problem. If you would like to use this package, consider adopting it and giving it a good home. I’m not looking for more packages to maintain.

Things learned

I made some statements in that post that I’m going to re-evaluate in this post:

Let’s have a code style discussion. I propose to solve it with tooling.

It’s not practical to force everyone into one single style.

Code formatting is solved with tooling

I’ve used hindent for two years and it solves the problem. There are a couple of exceptions¹. On the whole, though, it’s a completely different working experience:

Code always looks the same.

I don’t make any style decisions. I just think about the tree I need for my program.

I don’t do any manual line-breaking.

I’ve come to exploit it by writing lazy code like do x<-readLn;when(x>5)(print 5) and then hitting a keybinding to reformat it.

Switching style is realistic

I’ve been writing Haskell in my own style for years. For me, my style is better for structured editing, more consistent, and visually easier to read, than most code I’ve seen. It’s like Lisp. Using hindent, with my ChrisDone style, I had it automatically formatted for me. I used 2-space indents.

The most popular style in the community² is JohanTibell: The alignment, line-breaking, and spacing (4 spaces instead of 2) differs significantly from my own style.

At FP Complete I’ve done a lot of projects, private FP Complete projects, client projects, and public FP Complete projects (like Stack). For the first year or so I generally stuck to my guns when working on code only I was going to touch and used my superior style.

But once the JohanTibell style in hindent was quite stable, I found that I didn’t mind using it while collaborating with people who prefer that style. The tooling made it so automatic, that I didn’t have to understand the style or make any style decisions, I just wrote code and got on with it. It doesn’t work great with structured-haskell-mode, but that’s ok. Eventually I got used to it, and eventually switched to using it for my own personal projects.

I completely did a U-turn. So I’m hoping that much of the community can do so too and put aside their stylistic preferences and embrace a standard.

Going forward

There is a demonstration web site in which you can try examples, and also get a link for the example to show other people the output (for debugging).

HIndent now has a “literate” test suite here: TESTS.md. You can read through it as a document, a bit like Johan’s style guide. But running the test suite parses this file and checks that each code fence is printed as written.

There’s also a BENCHMARKS.md. Since I rewrote comment handling, switched to a bytestring-builder, and improved the quadratic line-breaking algorithm to short-circuit, among other improvements, hindent now formats things in 1.5ms instead of 1s.

For those who still want to stick with their old hindent, Andrew Gibiansky is keeping a fork of hindent 4 for his personal use, and has said he’ll accept PRs for that.

HIndent is not perfect, there’s always room for improvement (issue tracker welcomes issues), but over time that problem space gets smaller and smaller. There is support for Emacs, Vim and Atom. I would appreciate support for SublimeText too.

Give it a try!

1. Such as CPP #if directives–they are tricky to handle. Comments are also tricky, but I’ve re-implemented comment handling from scratch and it works pretty well now. See the pretty extensive tests.↩

2. From a survey of the top downloaded 1000 packages on Hackage, 660 are 4-spaced and 343 are 2-spaced. All else being equal, 4 spaces wins.↩

Mon, 29 Aug 2016
http://chrisdone.com/posts/hindent-5

A philosophical difference between Haskell and Lisp
http://chrisdone.com/posts/haskell-lisp-philosophy-difference
One difference in philosophy between Lisp (e.g. Common Lisp, Emacs Lisp) and Haskell is that the latter makes liberal use of many tiny functions that each do one single task. This is known as composability, or the UNIX philosophy. In Lisp a procedure tends to accept many options which configure its behaviour. This is known as monolithism, or making procedures like a kitchen sink, or a Swiss Army knife.

Which one is better can be discussed in another post. I just want to make the simple case that there is indeed a difference in philosophy and practice. Having written my fair share of non-trivial Emacs Lisp (and a small share of Common Lisp; I’ve maintained Common Lisp systems) and my fair share of non-trivial Haskell I think I’m in a position to judge.

Full disclosure: We’ll just look at some trivial examples anyone can understand, with the (unproven but asserted) implication that these examples are representative of the general way software is written in these languages.

An example which should be readily familiar to any programmer of any background is working on lists. For example, CL has the remove-if-not procedure. Its documentation signature is like this:

(REMOVE-IF-NOT predicate seq :key :count :start :end :from-end)

It packs a number of ideas into one procedure.

By comparison, Haskell has the filter function:

filter :: (a -> Bool) -> [a] -> [a]

Given a problem statement “take all elements from the list–except the first three–that satisfy predicate p, and take only the first five of those”, in Common Lisp you’d express it quite concisely as this:

(remove-if-not #'p xs :count 5 :start 3)

The same in Haskell would be expressed as this:

take 5 . filter p . drop 3

The difference which should be obvious whether you know Haskell or Lisp is that in the Lisp code the function does a few behaviours and accepts arguments to configure them. In the Haskell code, we use three different functions which do one task:

The . operator composes functions together, just like pipes in UNIX. We might express this in UNIX something like:

bash-3.2$ cat | tail -n '+4' | grep -v '^p' | head -n 5
1
2
3
4
5
6
7
8
9
10

Press Ctrl-d and we get:

4
5
6
7
8

Like pipes in UNIX, the functions are clever enough to be performant when composed together–we don’t traverse the whole list and generate a new list each time, each item is generated on demand. In fact, due to stream fusion, the code will be compiled into one fast loop.

If we want things that don’t satisfy the predicate, we just compose again with not:

take 5 . filter (not . p) . drop 3

In Common Lisp composition is a bit wordier because it’s rarely if ever used, so instead there is another function for that:

(remove-if #'p xs :count 5 :start 3)

(Probably a more Lispy approach would’ve been to have a :not keyword argument to the remove-if function.)

The most pathological example of such a kitchen sink in Lisp is the well known LOOP macro.

Problem: get all elements less than 5, then just the even ones of that set.

With the LOOP macro this can be expressed quite readily:

> (loop for i in '(1 2 3 4)
when (evenp i)
collect i
when (> i 5) do (return))
(2 4)

In Haskell this is expressed with two separate functions:

λ> (filter even . takeWhile (<5)) [1..4]
[2,4]

In Haskell the same applies to vector libraries and text libraries and bytes libraries, which can be fused. Fusion is chiefly an advantage of purity – you can fuse n loops together into one loop if you know that they don’t do side-effects. Such an advantage can also be applied to other pure languages like Idris or PureScript or Elm.

Sat, 19 Dec 2015
http://chrisdone.com/posts/haskell-lisp-philosophy-difference

Idle thoughts: More open, more free software
http://chrisdone.com/posts/more-open-software
I’m a bit busy, these are just some idle thoughts.

I just upgraded my Android OS to some other kind of dessert name and a bunch of stuff changed in a way I had no desire for.

It made me think about the virtues of open source software. I can just go and change it! Free software means benefiting from the work of others without being shackled by them at the same time.

And then about the problems of open source software: only developers (skilled developers, with specific knowledge) are able to approach the codebase of an app they use, update it, and then use that new software in a continuous and smooth way. Everyone else’s hands are effectively tied behind their backs.

So that got me thinking about how software could be more “open” than simply “open source”, if it was inherently more configurable. And also about better migration information from one piece of software to the next.

So I imagined a world in which when I get an update for a piece of software I could see a smart diff, as a regular human, of what the new UI and behaviour looks like, how it changed. This button moved there, changed color. Pressing this button used to exhibit X behaviour, now that behaviour is more complicated, or more limited, to trigger this action, and so on.

I believe that a properly declarative UI library with explicit state modeling, such as in Elm or whatnot, could actually handle a thing like that, but that it would have to be designed from the bottom up like that. And every component would need to have some “mock” meta-data about it, so that the migration tool could say “here’s what the old UI looks like with lorem ipsum data in it and here’s what that same data, migrated, looks like in the new UI” and you could interact with this fake UI on fake data, with no consequences. Or interact with the user’s data in a read-only “fake” way.

You could say: actually, no, I want to configure that this button will stay where it is, that the theme will stay my current dark theme, etc.

You could visualize state changes in the UI such as with the time traveling thing in Elm or React and make new decision trees, or perhaps pick between built-in behaviours.

But one key idea could be that when you update software, unless you’re removing the ability to do a feature completely (e.g. the server won’t even respond to that RPC call), you should indicate that in the intelligent “software diff”. Then the user can say: no, I still want to use that. Now they have a “patched” or “forked” version of the software locally, but one that the maintainers of the software don’t have to worry about.

Normally configuring software is a thing developers manually hard code into the product. It seems obviously better to make software inherently configurable, from a free software perspective at least (not from a proprietary locked-in perspective).

Of course, you could write code at any time; drop down to that. But if most of the code can be self-describing at least in a high-level “do the thing or that thing” way, this would be far more accessible to general users than code itself which at the moment is magic and certainly beyond my interest to go and patch for the most part.

Mon, 26 Oct 2015
http://chrisdone.com/posts/more-open-software

Use the REPL, Luke
http://chrisdone.com/posts/haskell-repl
There was an online discussion about iteration times in Haskell and whether and why they are slow. For me, it’s not slow. I do all my Haskell development using a REPL. Here are some tips I wrote up in that discussion.

Prepare for GHCi use

The first thing you want to do before writing anything for your project is make sure you can load your code in the REPL; GHCi. Sometimes you have special configuration options or whatnot (cabal repl and stack ghci make this much easier than in the past). The sooner you start the better. It can be a PITA to load some projects that expect to just be a “start, run and die” process, they often launch threads without any clean-up procedure; in this way the REPL makes you think about cleaner architecture.

Make sure it scales

Learn how to make GHCi fast for your project so that you don’t hit a wall as your project scales. Loading code with byte-code is much faster than object code, but loading with object code has a cache so that in a 100 module project if you only need to reload one, it’ll just load one. Make sure this is happening for you, when you need it. Dabble with the settings.

Write small, parametrized functions

Code that is good for unit tests is code that is good for the REPL. Write small functions that take state as arguments (dependency injection) rather than loading their own state, then they can be run in the REPL and used in a test suite easily. Regard functions that you can’t just call directly with suspicion.

Test work-in-progress implementations in the REPL

While writing, test your function in the REPL with typical arguments it will expect, rather than implementing a function and then immediately using it in the place you want to ultimately use it. You can skip this for trivial “glue” functions, but it’s helpful for non-trivial functions.

Setup/teardown helpers

Write helpful setup/teardown code for your tests and REPL code. For example, if you have a function that needs a database and application configuration to do anything, write a function that automatically and conveniently gets you a basic development config and database connection for running some action.
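For instance, something like the following, where every name and type is hypothetical (the real helper depends entirely on your application):

```haskell
import Control.Exception (bracket)

-- Hypothetical application types.
data Config = Config { dbFile :: FilePath, verbose :: Bool }
newtype Connection = Connection FilePath

devConfig :: Config
devConfig = Config { dbFile = "dev.sqlite3", verbose = True }

-- Stubs standing in for a real database library.
connect :: FilePath -> IO Connection
connect = pure . Connection

disconnect :: Connection -> IO ()
disconnect _ = pure ()

-- One call in the REPL gets you a ready config and connection, and
-- bracket guarantees the connection is closed afterwards.
withDev :: (Config -> Connection -> IO a) -> IO a
withDev action =
  bracket (connect (dbFile devConfig)) disconnect (action devConfig)

main :: IO ()
main = withDev (\_cfg (Connection path) -> putStrLn path)
```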

Make data inspectable

Make sure to include Show instances for your data types, so that you can inspect them in the REPL. Treat Show as your development instance, it’s for you, don’t use it for “real” serialization or for “user-friendly” messages. Develop a distaste for data structures that are hard to inspect.

Figure out the fastest iteration for you

Use techniques like :reload to help you out. For example, if I’m working on hindent, then I will test a style with HIndent.test chrisDone "x = 1", for example, in the REPL, and I’ll see the output pretty printed as Haskell in my Emacs REPL. But I work on module HIndent.Style.ChrisDone. So I first :load HIndent and then for future work I use :reload to reload my .ChrisDone changes and give me the HIndent environment again.

Configuration

Make sure you know about the .ghci file which you can put in your ~/ and also in the project directory where GHCi is run from. You can use :set to set regular GHC options including packages (-package foo) and extensions (-XFoo), and any special include directories (-ifoo).
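A project-local .ghci might contain something like this (illustrative settings, not a prescription):

```
:set -XOverloadedStrings -XScopedTypeVariables
:set -isrc -itest
:set -package bytestring
:set -Wall
```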

More advanced tricks

Consider tricks like live reloading; if you can support it. I wrote an IRC server and I can run it in the REPL, reload the code, and update the handler function without losing any state. If you use foreign-store you can make things available, like the program’s state, in an IORef or MVar.

This trick is a trick, so don’t use it in production. But it’s about as close as we can get to Lisp-style image development.

In summary

Haskell’s lucky to have a small REPL culture, but you have to work with a Lisp or Smalltalk to really know what’s possible when you fully “buy in”. Many Haskellers come from C++ and “stop program, edit file, re-run compiler, re-run whole program” cycles and don’t have much awareness or interest in it. If you are such a person, the above probably won’t come naturally, but try it out.

Motivation

This came out of working on a number of projects at FP Complete that use file paths in various ways. We used the system-filepath package, which was supposed to solve many path problems by being an opaque path type. It occurred to me that the same kind of bugs kept cropping up:

Expected a path to be absolute but it was relative, or vice-versa.

Expected two equivalent paths to be equal or order the same, but they did not (/home//foo vs /home/foo/ vs /home/bar/../foo, etc.).

Unpredictable behaviour with regards to concatenating paths.

Confusing files and directories.

Not knowing whether a path was a file or directory or relative or absolute based on the type alone was a drag.

All of these bugs are preventable.

Approach

My approach to problems like this is to make a type that encodes the properties I want and then make it impossible to let those invariants be broken, without compromise or backdoors to let the wrong value “slip in”. Once I have a path, I want to be able to trust it fully. This theme will be seen throughout the things I lay out below.

Solution

After having to fix bugs due to these in our software, I put my foot down and made:

An opaque Path type (a newtype wrapper around String).

Smart constructors which are very stringent in the parsing.

Make the parsers highly normalizing.

Leave equality and concatenation to basic string equality and concatenation.

Include relativity (absolute/relative) and type (directory/file) in the type itself.

Use the already cross-platform filepath package for implementation details.

Implementation

The data types

Here is the type:

newtype Path b t = Path FilePath
  deriving (Typeable)

The type variables are:

b - base, the base location of the path; absolute or relative.

t - type, whether file or directory.

The base types can be filled with these:

data Abs deriving (Typeable)
data Rel deriving (Typeable)

And the type can be filled with these:

data File deriving (Typeable)
data Dir deriving (Typeable)

(Why not use data kinds like data Type = File | Dir? Because that imposes an extension overhead of adding {-# LANGUAGE DataKinds #-} to every module you might want to write out a path type in. Given that one cannot construct paths of types other than these, via the operations in the module, it’s not a concern for me.)

The only delimiter syntax accepted is the path separator; / on POSIX and \ on Windows.

Any other delimiter is rejected; .., ~/, /./, etc.

All parsers normalize into single separators: /home//foo -> /home/foo.

Directory parsers always normalize with a final trailing /. So /home/foo parses into the string /home/foo/.

It was discussed briefly whether we should just have a class for parsing rather than four separate parsing functions. In my experience so far, I have had type errors where I wrote something like x <- parseAbsDir someAbsDirString because x was then passed to a place that expected a relative directory. With a class, overloading the return type would have silently accepted the mistake. So I don’t think having a class is a good idea. Being explicit here doesn’t exactly waste our time, either.

Why are these functions in MonadThrow? Because it means I can have it return an Either, or a Maybe, if I’m in pure code, and if I’m in IO, and I don’t expect parsing to ever fail, I can use it in IO like this:

do x <- parseRelFile (fromCabalFileName x)
   foo x
   …

That’s really convenient and we take advantage of this at FP Complete a lot.
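A small example of that in-IO usage with the real path API (the parse functions throw an exception in IO if the input doesn’t parse):

```haskell
import Path  -- from the path package

main :: IO ()
main = do
  dir  <- parseAbsDir "/home/chris"    -- throws in IO on a bad path
  file <- parseRelFile "foo.txt"
  -- (</>) only lets you append a relative path to a base:
  putStrLn (toFilePath (dir </> file))
```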

The instances

Equality, ordering and printing are simply re-using the String instances:

Self-documentation

Now I can read the path like:

{ fooPath :: Path Rel Dir, ... }

And know that this refers to the directory relative to some other path, meaning I should be careful to consider the current directory when using this in IO, or that I’ll probably need a parent to append to it at some point.

In practice

We’ve been using this at FP Complete in a number of packages for some months now, it’s turned out surprisingly sufficient for most of our path work with only one bug found. We weren’t sure initially whether it would just be too much of a pain to use, but really it’s quite acceptable given the advantages. You can see its use all over the stack codebase.

Doing I/O

Currently any operations involving I/O can be done by using the existing I/O library:

doesFileExist (toFilePath fp)
readFile (toFilePath fp)

etc. This has problems with respect to accidentally running something like:

doesFileExist $(mkRelDir "foo")

But I/O is currently outside the scope of what this package solves. Once you leave the realm of the Path type, invariants are back to your responsibility.

As with the original version of this library, we’re currently building up a set of functions in a Path.IO module over time that fits our real-world use-cases. It may or may not appear in the path package eventually. It’ll need cleaning up and considering what should really be included.

Doing textual manipulations

One problem that crops up sometimes is wanting to manipulate paths. Currently the way we do it is via the filepath library and re-parsing the path:

parseAbsFile (addExtension (toFilePath fp) "ext")
-- e.g. "/directory/path" becomes "/directory/path.ext"

It doesn’t happen too often, in our experience, to the extent this needs to be more convenient.

Accepting user input

Sometimes you have user input that contains ../. The solution we went with is to have a function like resolveDir:

Which will call canonicalizePath which collapses and normalizes a path and then we parse with regular old parseAbsDir and we’re cooking with gas. This and others like it might get added to the path package.
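A sketch of that function (hedged: the real one lives alongside the package in a Path.IO module and handles more cases):

```haskell
import Path (Abs, Dir, Path, parseAbsDir, toFilePath)
import System.Directory (canonicalizePath)
import qualified System.FilePath as FP

-- Resolve possibly-relative user input (which may contain "..")
-- against a known absolute directory, then re-parse the result so
-- we end up back inside the Path invariants.
resolveDir :: Path Abs Dir -> FilePath -> IO (Path Abs Dir)
resolveDir cwd userInput = do
  canonical <- canonicalizePath (toFilePath cwd FP.</> userInput)
  parseAbsDir canonical
```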

Comparing with existing path libraries

filepath and system-filepath

The filepath package is intended as the complementary package to be used before parsing into a Path value, and/or after printing from a Path value. The package itself contains no type-safety; instead it contains a range of cross-platform textual operations. Definitely reach for this library when you want to do more involved manipulations.

system-canonicalpath, canonical-filepath, directory-tree

The system-canonicalpath and canonical-filepath packages are both a kind of subset of path. They canonicalize a string into an opaque path, but neither distinguishes directories from files, or absolute from relative. Useful if you just want a canonical path, but they don’t do anything else.

The directory-tree package contains a sum type of dir/file/etc but doesn’t distinguish in its operations relativity or path type.

pathtype

Finally, we come to a path library that path is similar to: the pathtype library. There are the same types of Path Abs File / Path Rel Dir, etc.

The points where this library isn’t enough for me are:

There is an IsString instance, which means people will use it, and will make mistakes.

Paths are not normalized into a predictable format, leading to me being unsure when equality will succeed. This is the same problem I encountered in system-filepath. The equality function normalizes, but according to what properties I can reason about? I don’t know.

It has functions like <.>/addExtension which let you insert an arbitrary string into a path.

Some functions let you produce nonsense (could be prevented by a stricter type), for example:

System.Path.Posix> takeFileName ("/tmp/" :: Path Abs Dir)
tmp

I’m being a bit picky here, a bit unfair. But the point is really to show the kind of things I tried to avoid in path. In summary, it’s just hard to know where things can go wrong, similar to what was going on in system-filepath.

data-filepath

The data-filepath library is also very similar; I discovered it after writing my own at work and was pleased to see it’s mostly the same. The main differences are:

Uses DataKinds for the relative/absolute and file/dir distinction which as I said above is an overhead.

Uses a GADT for the path type, which is fine. In my case I wanted to retain the original string which functions that work on the FilePath (String) type already deal with well. It does change the parsing step somewhat, because it parses into segments.

It’s more lenient at parsing (allowing .. and trailing .).

The API is a bit awkward for just parsing a directory: it requires a couple of functions (going via WeakFilePath), returns only an Either, and there are no functions like parent. But there’s not much to complain about. It’s a fine library; I just didn’t feel the need to drop my own in favor of it. Check it out and decide for yourself.

Summary

There’s a growing interest in making practical use of well-typed file path handling. I think everyone’s wanted it for a while, but few people have really committed to it in practice. Now that I’ve been using path for a while, I can’t really go back. It’ll be interesting to see what new packages crop up in the coming year, I expect there’ll be more.

]]>Sat, 27 Jun 2015 00:00:00 UThttp://chrisdone.com/posts/path-packageExistentials and the heterogenous list fallacyhttp://chrisdone.com/posts/existentials
An oft-stated argument against static typing is that heterogeneous lists are unreasonably difficult to model. Why is static typing so difficult? Why can’t it just be like dynamic typing? This is a specious argument.

For example, in one article I read, I saw:

In fact you can program heterogeneous lists in dependently typed languages, but it’s unreasonably complicated. Python makes no complaints:

(I’m not sure what “methodological weakness” is supposed to mean, but let’s ignore that.)

There are two problems with this argument and demonstration:

It’s contrived. I’ve written about as much Emacs Lisp and JavaScript as I have written Haskell and C#, and I cannot with all intellectual honesty remember wanting a heterogeneous list.1

It’s ill-defined. What is this data structure? What can I assume about the elements so that I can write operations generically? Which, I presume, is the only reason I would be using a list in the first place (otherwise a record would be the correct thing to use); to write algorithms that apply generally to any index.

Even cutting the author some slack and assuming they might want to just temporarily put some things together as a tuple, static languages have tuples, which are heterogeneous.

When you look at it beyond the superficial, it’s rather odd.

Regardless, I am sporting. Some people will say, yes, okay, it’s contrived, and never really arises, but if I really wanted this, how could I do it in a statically typed language? So here is the above in Haskell.

So the list contains a bunch of disparate things and the implicit invariant here is that we can print each of them out. So we can model that with an existential data type Py (for “Python”) that holds some type a that is showable.

data Py = forall a. Show a => Py a
instance Show Py where show (Py s) = show s

(Oh, Haskell doesn’t define an instance for printing functions, so let’s use instance Show (a -> b) where show _ = "<function>" to vaguely match Python.)

I may not know, or care, what the type is, but I at least need to know something about it, in a duck-typing kind of way. If it walks like a duck, quacks like a duck, etc. then it’s a good enough duck for my purposes. In this case, Py says, is it at least showable?
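Here is a self-contained sketch (repeating the definitions above) of the heterogeneous list itself:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Py = forall a. Show a => Py a
instance Show Py where show (Py s) = show s

-- A Python-style list of disparate things, each wrapped in Py:
xs :: [Py]
xs = [Py (1 :: Int), Py "a string", Py True, Py [1, 2, 3 :: Int]]

main :: IO ()
main = mapM_ print xs  -- prints each element using its own Show instance
```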

Suppose I replace this instance with a new instance that has constraints:

instance (Show a, Show b) => Show (MyTuple a b) where
  show (MyTuple a b) = "MyTuple " ++ show a ++ " " ++ show b

Question: Does that change whether GHC decides to pick this new version of instance over others that may be available, compared to the one above? Have a think.

The answer is: nein! The constraints of an instance don’t have anything to do with deciding whether an instance is picked from the list of instances available. Constraints only apply after GHC has already decided it’s going with this instance.

So, cognizant of this obvious-after-the-fact property, let’s use the equality constraint that was introduced with GADTs and type families (enabling either brings in ~):

instance a ~ () => IsString (Writer String a) where
  fromString = tell

Let’s try it:

λ> execWriter (do "hello"; "world" :: Writer String ())
"helloworld"

This instance is picked by GHC, as we hoped, because of the a. The instance method also type checks, because the constraint applies when type checking the instance methods, just like if you write a regular declaration like:
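For instance, here is a standalone declaration with the same kind of equality constraint (a contrived sketch, not from the original post):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Within the body, GHC may assume a is (), so returning () type checks:
unit :: a ~ () => a
unit = ()
```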

Actually, it’s a natural consequence of grokking how instance resolution works (but calling it a “trick” makes for a catchy title).↩

]]>Fri, 19 Jun 2015 00:00:00 UThttp://chrisdone.com/posts/haskell-constraint-trickStream fusion and composability (Java 8 and Haskell) for newbieshttp://chrisdone.com/posts/stream-composability
In an online discussion, when Java 8 released its stream API, written about here, you can write e.g.

Someone asked, “But my question: would the streams be faster than loops? Or is the only benefit better readability?” Someone answered that the benefit is that streams compose and loops don’t. What does composable mean here? Below is my answer, using two languages I know, JavaScript and Haskell.

Composable in this context means: To be able to compose two things into one without redundancy or overhead. For example, consider you want to map a function f over an array arr to produce a new array, you might do this:

Now, if you want to do that all in one process you have a few options:

Put them all one after the other verbatim as I’ve written above. Redundant, a maintenance issue and inefficient.

Merge them all into one clever loop. Also redundant (re-implementing the same concept of mapping, filtering and taking), error prone (it’s easy to get manual loops wrong, especially merging several concepts together), and a maintenance burden.

Put them each into a method on your language’s Array type as map(), filter(), and takeWhile() and then write arr.map(f).filter(p).takeWhile(p2). Good abstraction, very low maintenance because the functions are black boxes. But inefficient.

An ideal stream API will give you the last point, but be able to understand concepts like mapping and filtering and know how to merge them together into an efficient loop. This is called stream fusion, which you can google if you want to know more.

I don’t know Java but I can give a Haskell example:

map f . filter p . takeWhile p2

(Note: In Haskell the operations separated by . are run right to left, like map f (filter p (takeWhile p2 …)).)

Looking at the reduced output called Core (a language the compiler generates before producing assembly or byte code), the map f and filter p are compiled into a single loop (Core output is verbose, so I collapsed it into this more readable form). This just walks over the list, checks whether the item is even, and if so keeps it and adds 2 to it, otherwise skips that item:
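The single fused loop behaves like this hand-written recursion (a readable sketch rather than the literal Core):

```haskell
-- Fused equivalent of: map (+2) . filter even
loop :: [Int] -> [Int]
loop [] = []
loop (x:xs)
  | even x    = x + 2 : loop xs  -- keep it, adding 2
  | otherwise = loop xs          -- skip it
```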

]]>Thu, 11 Jun 2015 00:00:00 UThttp://chrisdone.com/posts/stream-composabilityHow Haskellers are seen and see themselveshttp://chrisdone.com/posts/haskellers
How Haskellers are seen

The type system and separated IO is an awkward, restricting space suit:

Spending most of their time gazing longingly at the next abstraction to yoink from mathematics:

Looking at anything outside the Haskell language and the type system:

Using unsafePerformIO:

How Haskellers see themselves

No, it’s not a space suit. It’s Iron Man’s suit!

The suit enables him to do impressive feats with confidence and safety:

Look at the immense freedom and power enabled by wearing the suit:

Reality

]]>Tue, 19 May 2015 00:00:00 UThttp://chrisdone.com/posts/haskellersMy Haskell tooling wishlisthttp://chrisdone.com/posts/haskell-wishlist
I spend a lot of my time on Haskell tooling, both for my hobbies and my job. Almost every project I work on sparks a desire for another piece of tooling. Much of the time, I’ll follow that wish and take a detour to implement that thing (Fay, structured-haskell-mode, hindent, are some Haskell-specific examples). But in the end it means less time working on the actual domain problem I’m interested in, so a while ago I intentionally placed a quota on the amount of time I can spend on this.

So this page will contain a list of things I’d work on if I had infinite spare time, and that I wish someone else would make. I’ll update it from time to time as ideas come to the fore.

These projects are non-trivial but are do-able by one person with enough free time and motivation. There is a common theme among the projects listed: they are things that Haskell, more than most other well-known languages, is particularly well suited for, and yet we don’t have them as standard tools in the Haskell tool box. They should be!

An equational reasoning assistant

Equational reasoning lets you prove properties about your functions by following a simple substitution model to state that one term is equal to another. The approach I typically take is to expand and reduce until both sides of the equation are the same.

Here is an example. I have a data type, Consumer. Here is an instance of Functor:

I want to prove that it is a law-abiding instance of Functor, which means proving that fmap id ≡ id. You don’t need to know anything about the Consumer type itself, just this implementation. Here are some very mechanical steps one can take to prove this:

Reason that if every branch of a case returns the original value of the case, then that whole case is an identity and can be dropped.

Eta-reduce.

Again, pattern-matching lambdas are just syntactic sugar for cases, so by the same rule this can be considered identity.

End up with what we wanted to prove: fmap id ≡ id

These are pretty mechanical steps. They’re also pretty laborious and error-prone. Of course, if you look at the first step, it’s pretty obvious the whole thing is an identity, but writing the steps out provides transformations that can be statically checked by a program. So it’s a good example, because it’s easily understandable and you can imagine proving something more complex would require a lot more steps and a lot more substitutions. Proof of identity for Applicative has substantially more steps, but is equally mechanical.
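To make the style concrete, here is the same kind of mechanical derivation for a simpler type, Maybe (a sketch of my own, not the Consumer proof itself):

```haskell
-- fmap for Maybe:
--   fmap f m = case m of
--                Nothing -> Nothing
--                Just a  -> Just (f a)
--
-- Proof that fmap id ≡ id:
--
--   fmap id m
-- ≡ case m of                 -- expand fmap
--     Nothing -> Nothing
--     Just a  -> Just (id a)
-- ≡ case m of                 -- axiom: id a ≡ a
--     Nothing -> Nothing
--     Just a  -> Just a
-- ≡ m                         -- every branch returns the original value
--
-- Hence fmap id ≡ id.
```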

Wouldn’t it be nice if there was a tool which given some expression would do the following?

Suggest a list of in-place expansions.

Suggest a list of reductions based on a set of pre-defined rules (or axioms).

Then I could easily provide an interactive interface for this from Emacs.

In order to do expansion, you need the original source of the function name you want to expand. So in the case of id, that’s why I suggested stating an axiom (id a ≡ a) for this. Similarly, I could state the identity law for Monoids by saying mappend mempty a ≡ a, mappend a mempty ≡ a. I don’t necessarily need to expand the source of all functions. Usually just the ones I’m interested in.

Given such a system, for my example above, the program could actually perform all those steps automatically and spit out the steps so that I can read them if I choose, or otherwise accept that the proof was derived sensibly.

In fact, suppose I have my implementation again, and I state what must be satisfied by the equational process (and, perhaps, some axioms that might be helpful for doing it, but in this case our axioms are pretty standard), I might write it like this:

This template-haskell macro proof would run the steps above and if the equivalence is satisfied, the program compiles. If not, it generates a compile error, showing the steps it performed and where it got stuck. TH has limitations, so it might require writing it another way.

Such a helpful tool would also encourage people (even newbies) to do more equational reasoning, which Haskell is often claimed to be good at but which you don’t often see in evidence in codebases. In practice it isn’t a standard thing.

Update 2014-01-25: Andrew Gill got back to me that HERMIT is the continuation of HERA. It seems that you can get inlining, general reduction and a whole bunch of case rewrites from this project. Check the KURE paper for the DSL used to do these rewrites; it looks pretty awesome. So if anyone’s thinking of working on this, I’d probably start with reading HERMIT.Shell or HERMIT.Plugin and see how to get it up and running. It’s a pity it has to work on Core, that’s a little sad, but as far as trade-offs go it’s not too bad. Doing proofs on things more complicated than Core might be hard anyway. It does mean you’ll probably want to add a rewrite that does global variable renaming: x and y are a little easier to read than x0_6 and the like that you get in Core.

Catch for GHC

Ideally, we would never have inexhaustive patterns in Haskell. But a combination of an insufficient type system and people’s insistence on using partial functions leads to a library ecosystem full of potential landmines. Catch is a project by Neil Mitchell which considers how a function is called when determining whether its patterns are exhaustive or not. This lets us use things like head and actually have a formal proof that our use is correct, or a formal proof that our use, or someone else’s use, will possibly crash.

map head . group

This is an example which is always correct, because group returns a list of non-empty lists.

Unfortunately, it currently works for a defunct Haskell compiler, but apparently it can be ported to GHC Core with some work. I would very much like for someone to do that. This is yet another project which is the kind of thing people claim is possible thanks to Haskell’s unique properties, but in practice it isn’t a standard thing, in the way that QuickCheck is.

A substitution stepper

This is semi-related, but different, to the proof assistant. I would like a program which can accept a Haskell module of source code and an expression to evaluate in the context of that module and output the same expression, as valid source code, with a single evaluation step performed. This would be fantastic for writing new algorithms, for understanding existing functions and algorithms, writing proofs, and learning Haskell. There was something like this demonstrated in Inventing on Principle. The opportunities for education and general development practice are worth such a project.

Each step in this is a valid Haskell program, and it’s just simple substitution.
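For a flavor of what such stepping might look like, here is a tiny sketch (not from the original post):

```haskell
--   foldr (+) 0 [1, 2]
--   1 + foldr (+) 0 [2]        -- expand foldr
--   1 + (2 + foldr (+) 0 [])   -- expand foldr
--   1 + (2 + 0)                -- expand foldr (base case)
--   1 + 2                      -- special-case step for (+)
--   3                          -- special-case step for (+)
```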

If the source for a function isn’t available, there are a couple options for what to do:

Have special-cases for things like (+), as above.

Just perform no substitution for that function, it will still be a legitimate program.

It’s another project I could easily provide see-as-you-type support for in Emacs, given an engine to query.

Again, this is just one more project which should just be a standard thing Haskell can do. It’s a pure language. It’s used to teach equational reasoning and following a simple lambda calculus substitution model. But there is no such tool. Haskell is practically waving in our faces with this opportunity.

Existing work in this area:

stepeval - a prototype which nicely demonstrates the idea. It’s based on HSE and only supports a tiny subset. There aren’t any plans to move this forward at the moment. I’ll update the page if this changes.

]]>Sat, 24 Jan 2015 00:00:00 UThttp://chrisdone.com/posts/haskell-wishlistMeasuring duration in Haskellhttp://chrisdone.com/posts/measuring-duration-in-haskell
Happy new year, everyone. It’s a new year and time for new resolutions. Let’s talk about time. Specifically, measuring it in Haskell.

A wrong solution

How do you measure how long something takes in Haskell? Here’s a naive attempt:
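The naive attempt presumably looks something like this, using getCurrentTime from the time package (a sketch of the idea, not necessarily the original snippet):

```haskell
import Data.Time.Clock (diffUTCTime, getCurrentTime)

main :: IO ()
main = do
  start <- getCurrentTime
  let total = sum [1 .. 1000000 :: Integer]  -- some work to measure
  total `seq` return ()                      -- force the work to happen now
  end <- getCurrentTime
  print (diffUTCTime end start)
```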

Inaccurate measuring

Here’s what’s wrong with this implementation:

The clock can be changed by the user at any moment.

Time synchronization services regularly update time.

If you’re on an Ubuntu desktop, time is updated when you first boot up from NTP servers. If you’re on a server, likely there is a daily cron job to update your time, because you don’t tend to reboot servers. My laptop has been on for 34 days:

Additionally, leap seconds can be introduced at any time and cannot be predicted systematically, though time servers get at least six months’ advance notice. In 2015 an extra second will be added between the 30th of June and the 1st of July.

These factors mean that if our main function is run during an update, the reported time could be completely wrong. For something simple like the above, maybe it doesn’t matter. For long term logging and statistics gathering, this would represent an anomaly. For a one-off, maybe it’s forgivable, because it’s convenient. But above all, it’s simply inaccurate reporting.

Accurate measuring

Readers familiar with this problem will think back to measuring time in C; it requires inspecting the system clock and dividing by clocks per second. In fact there are a couple of solutions around that use this:

In turn, that package uses System.CPUTime from base, which is also handy.
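A minimal use of System.CPUTime looks like this: getCPUTime returns picoseconds, so divide the difference by 10^12 to get seconds (a sketch):

```haskell
import System.CPUTime (getCPUTime)

main :: IO ()
main = do
  start <- getCPUTime
  let total = sum [1 .. 1000000 :: Integer]  -- some work to measure
  total `seq` return ()
  end <- getCPUTime
  -- getCPUTime is in picoseconds; convert the difference to seconds:
  print (fromIntegral (end - start) / (10 ^ 12) :: Double)
```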

These are more reliable, because the time cannot be changed. But they are limited: both only measure CPU time, not IO time. So if your program takes 10 seconds but only does 5 seconds of CPU processing and 5 seconds of waiting for the disk, then you will not have the real time, also known as wall time.

In the Criterion package, there’s need for fine-grained, fast, accurate measuring of both real and CPU time, so it includes its own cross-platform implementations:

That’s nice, but it’s embedded in a specific package built for benchmarking, which we may not necessarily be doing. For example, I am dabbling with a program to measure the speed of my key presses. It turns out there is a package that does similar work to Criterion, already prepared and similarly cross-platform, which depends only on base and ghc-prim.

Then I decide to change the type of mu, so instead I want to just write:

bind (mu bar)
(foo zot)

Which is just like fmap but the function can run in the monad. Similar to traverse:

(Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)

As someone who isn’t a fan of operators, I generally am appreciative of alternative regular plain English word versions of functions, which I find easier to type, read and edit. Currently without defining such a handy name, I have to transform the code to this:

mu bar =<<
foo zot

The name for this function is a no-brainer ((>>=) is pronounced “bind”):

bind :: Monad m => (a -> m b) -> m a -> m b
bind = (=<<)

For comparison, the not-very-pleasant <$> and <*> each have word alternatives, fmap and ap.

I submitted this to the haskell libraries mailing list, but include it here for future reference.

]]>Tue, 09 Dec 2014 00:00:00 UThttp://chrisdone.com/posts/bindLucid 2.0: clearer than beforehttp://chrisdone.com/posts/lucid2
Since my last post about Lucid, I’ve updated Lucid to major version 2.0 in a way that removes the need for the with combinator. Now, you can just write:
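For example, an element now takes its attributes directly as a list; this sketch shows the shape of the Lucid 2 API (the class names here are my own illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Lucid

page :: Html ()
page =
  -- Attributes go in a list as the first argument; no with needed:
  div_ [class_ "container", id_ "main"] $
    p_ "No with combinator required."
```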

Convenient construction of custom elements

But you can also construct normal elements with a custom class, so that you don’t have to use with for extending elements like our old container_ example, you can construct an element with some given attributes:

But in practice it seems that elements with no children almost always take a number of attributes. Exceptions to that rule are br_ and hr_, but those are quite rare. So this is a very happy trade-off, I feel. (See the ‘real examples’ at the end of this post.)

Extending elements like this is straight-forward using our usual with combinator. Example, suppose you’re sick of writing the classic input type="text", you can define a combinator like this:
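Something like this, assuming Lucid’s with combinator extends attribute-taking elements such as input_ (a sketch; the definition in the post may differ):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Lucid

-- A custom input combinator baking in type="text":
text_ :: [Attribute] -> Html ()
text_ = with input_ [type_ "text"]
```

Then text_ [name_ "username"] would render an input carrying both the baked-in type attribute and the extra name attribute.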

Summary

In total, I’ve made this library almost perfect for my own tastes. It’s concise, easy to read and edit (and auto-format), it lacks namespace issues, it’s easy to make re-usable terms, and it’s fast enough. The need for the with combinator was the only wart that niggled me over the past week; I knew I’d end up making some change. I’ve also covered the trade-offs that come with this design decision.

As far as I’m concerned, Lucid can rest at major version 2.* for a long time now. I added some newfangled HTML5 elements (who knew main was now an element?) and a test suite. You can expect the only minor version bumps henceforth to be bugfixes, regression tests, and more documentation.

]]>Thu, 20 Nov 2014 00:00:00 UThttp://chrisdone.com/posts/lucid2Lucid: templating DSL for HTMLhttp://chrisdone.com/posts/lucid
I’m not big on custom templating languages, for reasons I’ll write about another time. I prefer EDSLs. I preferred the xhtml package back when that was what everybody used. It looked like this:

However, after several years of using that, I’ve come to write my own. I’ll cover the infelicities about Blaze and then discuss my alternative approach.

Reading back through what I’ve written below, it could be read as a bit attacky, and some of the issues are less philosophical and more incidental. I think of it more that the work on writing HTML in a DSL is incomplete and to some degree people somewhat gave up on doing it more conveniently at some point. So I’m re-igniting that.

The combination of having a need to write a few HTML reports and recent discussions about Blaze made me realise it was time for me to come at this problem a-fresh with my own tastes in mind. I also haven’t used my own approach much, other than porting some trivial apps to it.

Blaze

Names that conflict with base

The first problem is that Blaze exports many names which conflict with base. Examples:

div, id, head, map

The obvious problem with this is that you either have to qualify any use of those names, which means you have to qualify Blaze, and end up with something inconsistent like this:

H.div ! A.id "logo" $ "…"

Where H and A come from importing the element and attribute modules like this:

import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A

Or you don’t import Prelude and only import Blaze, but then you can’t do a simple map without qualification.

You might’ve noticed that in the old xhtml package, thediv and identifier are used instead. The problem with using different names from the actual things they refer to is that they’re hard to learn and remember, both for regular Haskellers and for newbies coming to edit your templates.

Names that are keywords

This is a common problem in DSLs, too. In Blaze the problems are class and type (perhaps others I don’t recall). Blaze solves them with class_ and type_.

Again, the problem with this is that it is inconsistent with the other naming conventions. It’s another exception to the rule that you have to remember and makes the code look bad.

Conflicting attribute and element names

There are also names which are used for both attributes and elements. Examples are style and map. That means you can’t write:

Additionally, this instance is too liberal. You end up getting this warning:

A do-notation statement discarded a result of type GHC.Prim.Any

Suppress this warning by saying _ <- "Example" or by using the flag -fno-warn-unused-do-bind

So you end up having to write in practice (again, taken from a real Blaze codebase by one of the authors):

void "Hello!"

Which pretty much negates the point of using IsString in the first-place. Alternatively, you use -fno-warn-unused-do-bind in your module.

Working with attributes is awkward

The ! syntax seems pretty convenient from superficial inspection:

link ! rel "stylesheet" ! type_ "text/css" ! href "screen.css"

But in practice it means you always have the same combination:

div ! H.class_ "logo" $ "…"

Which I find—personally speaking—a bit distasteful to read, it’s not far from what we saw in the old xhtml package:

thediv ! [theclass "logo"] << "…"

Did we really save that much in the attribute department? Operators are evil.

But mostly it presents an editing challenge. Operators like this make code tricky to navigate, to format in a regular way, and to do code transformations on. All Haskell code has operators, so this is a general problem. But if your DSL doesn’t actually need these operators, I consider this a smell.

Attributes don’t compose

You should be able to compose attributes. For example, let’s say you want to define a re-usable component with Bootstrap:

container inner = div ! class_ "container" $ inner

Now you can use it to make a container. But consider now that you also want to add additional attributes to it later. You can do that with another call to with:

Apart from the import backflips you have to do to resolve names properly, you have at least three imports to make just to render some HTML. Call me lazy, or stupid, but I never remember this deep hierarchy of modules and always have to look it up every single time. And I’ve been using Blaze for as long as the authors have.

Transforming

A smaller complaint is that it would sometimes be nice to transform over another monad. Simplest example is storing the read-only model information in a reader monad and then you don’t have to pass around a bunch of things as arguments to all your view functions. I’m a big fan of function arguments for explicit state, but not so much if it’s the same argument every time.

No Show instance

It would be nice if you could just write some markup in the REPL without having to import some other modules and wrap it all in a function just to see it.

Lucid

My new library, Lucid, attempts to solve most of these problems.

Naming issues

Firstly, all names which are representations of HTML terms are suffixed with an underscore _:

p_, class_, table_, style_

No ifs or buts. All markup terms.

That solves the following problems (from the issues described above):

Names that conflict with base: div_, id_, head_, map_, etc.

Names that are keywords: class_, type_, etc.

Conflicting attribute and element names: solved by abstracting those names via a class. You can write style_ to mean either the element name or the attribute name.

Inconsistency is difficult and ugly: there’s no inconsistency, all names are the same format.

No import problems or qualification. Just write code without worrying about it.

How it looks

Plain text is written using the OverloadedStrings and ExtendedDefaultRules extensions, and is automatically escaped:

Duplicate attributes are composed with normal monoidal append. Note that I added a space in my definition of container anticipating further extension later. Other attributes might not compose with spaces.

Unceremonious

Another part I made sure was right was lack of import nightmare. You just import Lucid and away you go:

If I want to do more advanced stuff, it’s all available in Lucid. But by default it’s absolutely trivial to get going and output something.

Speed

Actually, despite having a trivial implementation, being a real monad and a monad transformer, it’s not far from Blaze. You can compare the benchmark reports here. A quick test of writing 38M of HTML to file yielded the same speed (about 1.5s) for both Lucid and Blaze. With such decent performance for very little work I’m already ready to start using it for real work.

Summary

So the point of this post was really to explain why another HTML DSL and I hope I did that well enough.

The code is on Github. I pushed to Hackage but you can consider it beta for now.

]]>Thu, 20 Nov 2014 00:00:00 UThttp://chrisdone.com/posts/lucidFast pagination on PostgreSQLhttp://chrisdone.com/posts/postgresql-pagination
UPDATE: 2014-11-19: Some people asked me how much creating an index on event(channel,id) helps. Answer: not much.

During the implementation of IRCBrowse I discovered that Postgres’s built-in offset is not very fast.

Of course, sending a request in your browser will take longer due to the connection overhead and assets, but generally the goal was for it to be very snappy. The old ircbrowse.com (by another individual, who kindly let me have the name) was very slow indeed. You’d see the page loading the data incrementally from the database.

Here is my package, shell-conduit. It’s still in the experimental phase, but I don’t foresee any changes now for a while.

Bash is evil

I hate writing scripts in Bash. Until now, it was the easiest way to just write unix scripts. Its syntax is insane, incredibly error prone, its defaults are awful, and it’s not a real big person programming language.

Perl/Python/Ruby are also evil

If you’re going to go as far as using a real programming language, why bother with these dynamically typed messes? Go straight for Haskell.

Like a glove

I had an inkling a while back that conduits mirror the behaviour of bash pipes very well. I knew there was something to the idea, but didn’t act on it fully for a while. Last week I experimented somewhat and realised that the following Haskell code

source $= conduit $$ sink

does indeed accurately mirror

source | conduit > sink

And that also the following

(do source
    source $= conduit)
  $$ sink

is analogous to

source
source | conduit

We’ll see examples of why this works later.

I must Haskell all the things

Another trick I realised is to write some Template Haskell code which calculates all the executables in your PATH at compile time and generates a top-level name for each one: a Haskell function that launches that process. So instead of writing

run "ls"

you could instead just write

ls

There are a few thousand executables, so it takes about 10 seconds to compile such a module of names. But that’s all.

Again, we’ll see how awesome this looks in a minute.

Modeling stdin, stderr and stdout

My choice of modeling the typical shell scripting pipe handles is by having a type called Chunk:

type Chunk = Either ByteString ByteString

All Left values are from stderr. All Right values are either being pulled from stdin or being sent to stdout. In a conduit the difference between stdin and stdout is more conceptual than real.

When piping two commands, the idea is that any Left values are just re-yielded along, they are not consumed and passed into the process.

A process conduit on chunks

Putting the previous model into practice, we come up with a type for launching a process like this:

Returning to our mass name generation

Let’s take our earlier work of generating names with template-haskell. With that in place, we have a process conduit for every executable in PATH. Adding variadic argument handling for each one, we get a list of names like this:

String types

If you’re using OverloadedStrings so that you can pass Text arguments, then also enable ExtendedDefaultRules; otherwise you’ll get ambiguous type errors.

{-# LANGUAGE ExtendedDefaultRules #-}

But this isn’t necessary if you don’t need Text yet. String literals will be interpreted as String. You can still pass a value of type Text, or any instance of CmdArg, without needing conversions.

Using it for real scripts

So far I have ported a few small scripts to shell-conduit from Bash and have been happy every time. I suck at Bash. I’m pretty good at Haskell.

The next test is applying this to my Hell shell and seeing if I can use it as a commandline shell, too.

My friend complained that having to quote all arguments is a pain. I don’t really agree that this is bad. In Bash it’s often unclear how arguments are going to be interpreted. I’m happier writing something predictable than something super convenient but possibly nonsense.

Summary

I set out a week ago to just stop writing Bash scripts. I’ve written a bunch of scripts in Haskell, but I would still write Bash scripts too. Some things were just too boring to write. I wanted to commit to Haskell for scripting. Today, I’m fairly confident I have a solution that is going to be satisfactory for a long while.

In this post I’m going to use the word “style” to refer to the way code is printed in concrete terms. No changes in the code that would yield a different syntax tree are considered “style” here.

What’s the deal with code style?

Code style is important! If you’re a professional Haskell programmer, you’re working with Haskell code all day. The following things are affected by the style used:

How easily it can be manipulated with regular editors: the more code is laid out in a way that prevents you from readily using your normal editor functions, the less efficient you are.

How well general tooling works with it: do diff and things like that work well?

How easily you can absorb the structure: do you have to spend time hunting around for the start and end of syntactical nodes? Can’t see the tree for the forest?

How quickly you can write it: can you just write code or do you have to spend time thinking about how it’s going to be laid out before writing, or editing the layout afterwards?

How aesthetically offended you are: does the code you’re looking at assault your sense of beauty?

Code style is important! Let’s have a code style discussion. I propose to solve it with tooling.

Is this really an issue, though?

Okay, so I’m one guy with a bee in his bonnet. Let’s do a quick Google and see what others are saying in this StackOverflow question:

Could someone provide a link to a good coding standard for Haskell? I’ve found this and this, but they are far from comprehensive.

The following points refer to style:

Format your code so it fits in 80 columns. (Advanced users may prefer 87 or 88; beyond that is pushing it.)

Put spaces around infix operators. Put a space following each comma in a tuple literal.

Prefer a space between a function and its argument, even if the argument is parenthesized.

Use line breaks carefully. Line breaks can increase readability, but there is a trade-off: Your editor may display only 40–50 lines at once. If you need to read and understand a large function all at once, you mustn’t overuse line breaks.

When possible, align -- lines, = signs, and even parentheses and commas that occur in adjacent lines.
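The quoted advice can be seen applied in a small fragment (the example itself is mine, not from the quoted answer):

```haskell
-- Spaces around infix operators, a space between a function and its
-- argument, and '=' signs aligned on adjacent lines, within 80 columns.
circumference :: Double -> Double
circumference r = 2 * pi * r

small, large :: Double
small = circumference 1.0
large = circumference 10.0
```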

Even the Haskell community is not immune to long, protracted debates about tabs vs spaces. That reddit submission has zero points. That means it’s very controversial. The submission also has 117 comments. That means people are very vocal about this topic. That’s because bike-shedding is proportional to the triviality of the debated thing. We know that.

Everyone has their own style

So then let’s make a tool with a select number of styles, you might say. The problem is that people don’t even use the standards that exist out there. They use slightly varying versions. Ask any Haskeller what style they use, and they will say “mostly like X, but with some differences.”

For the most part the style guides that I have added above (and the tools provided) mirror my own style guide (or perhaps my guide mirrors them). However, there is one item of style that particularly annoys me on a regular basis. […]

Can’t we just use structured editors?

Some more optimistic folk out there might suggest, perhaps, we should just throw away text files and go straight for structured code databases. Put all this formatting nonsense behind us. Layout is just a stylesheet! It’s not data to be stored in a file!

Maybe so. But nobody is using structured editors yet.

A practical way forward

Taking all of the above into consideration, here is my approach to the problem: the hindent library and program. Styled after GNU indent, the intention is that you simply run the program on some source code and it reformats it.

It contains authorship metadata. It holds an initial state which can be used during printing. Most importantly, it has a list of extenders: means to extend the printer and change its behaviour on a node-type-specific basis.
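The type being described would look roughly like this. It is a hedged sketch modelled on the hindent package: the field names are assumptions, String stands in for Text, and Extender is stubbed out to keep the fragment self-contained:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Stub standing in for the real extender machinery.
data Extender s = ExtenderStub

data Style = forall s. Style
  { styleName         :: String        -- authorship metadata
  , styleAuthor       :: String
  , styleDescription  :: String
  , styleInitialState :: s             -- initial state used during printing
  , styleExtenders    :: [Extender s]  -- node-type-specific overrides
  }

demo :: Style
demo = Style
  { styleName         = "demo"
  , styleAuthor       = "someone"
  , styleDescription  = "an example style"
  , styleInitialState = ()
  , styleExtenders    = []
  }
```

Because s is existentially quantified, each style can carry its own state type while all styles share the one Style type.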

(It runs in a Printer type to store some state about the current column and indentation level, things like that.)

Now, we implement the prettyInternal method for all our types. But when implementing instances, we never use the prettyInternal method directly. Instead, we use another function, pretty, which can be considered the main “API” for printing, like before. Here’s the type:

pretty :: (Pretty ast) => ast NodeInfo -> Printer ()

We’ll look at its implementation in a moment. Here is an example instance of Pretty:

Both constructors are rank-n. Both accept the current state as an argument and the current node. The Extender constructor is Prismesque. It’s existential, and lets you say “I want things of this type”. The CatchAll will just accept anything.
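Here is a hedged sketch of the two constructors being described, with IO () standing in for Printer (); the exact shape is an assumption modelled on hindent, not taken from this post:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes #-}

import Data.Typeable (Typeable, cast)

data Extender s
  = forall a. Typeable a => Extender (s -> a -> IO ())
    -- ^ Existential: "I want things of this type."
  | CatchAll (forall a. Typeable a => s -> a -> IO ())
    -- ^ Rank-n: accepts any node.

-- Dispatch: an Extender fires only when the node's runtime type matches
-- (checked with Typeable's cast); a CatchAll always fires.
applyExtender :: Typeable b => Extender s -> s -> b -> Maybe (IO ())
applyExtender (Extender f) st node = fmap (f st) (cast node)
applyExtender (CatchAll f) st node = Just (f st node)
```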

All that adds up to me being able to do something like this. Here’s a demo style:

Write your own style!

I welcome anyone to write their own style. All styles are based upon the fundamental style, which should never change, by extending it. You can base yours off of another style, or start from scratch. While you can keep your style locally, like your XMonad configuration, it’s encouraged to contribute your style as a module.

My style and Johan’s are quite different. But yours may be similar with small tweaks. Another distinctly different style is Simon Peyton Jones’s with explicit braces. This is a style you can implement if you want it.

Preserving meaning

A recommendation is to preserve the meaning of the code: don’t make AST changes, like removing parentheses, changing $ into parens, or moving lets into wheres. You can do it, but the results might surprise you.

Editing advantages

Having implemented Emacs support, I have been using this for a few weeks on my own code. It’s amazing. I’ve all but stopped manually making style changes. I just write code and then hit C-c i and it almost always does exactly what I want.

It can’t always do what I want. It has simple, predictable heuristics. But even when it doesn’t do what I want, I’ve so far been okay with the results. The advantages vastly outweigh that.

Remaining problems

I need to write a test suite for the fundamental style (and maybe the others). This isn’t hard, it’s just a bit laborious so I haven’t gotten round to it yet.

There are some parts of the AST I haven’t finished filling out. You can see them marked by FIXMEs. This just means that if you try to format a node type it doesn’t know how to print, you’ll get a message saying so. Your code won’t be touched.

Comment re-insertion is a little bit of a challenge. I have a decent implementation that generally preserves comments well. There’re some corner cases that I’ve noticed, but I’m confident I have the right change to make to clear that up.

The fundamental printer is fast. My personal ChrisDone style is slower, due to its heuristics for deciding when to lay things out a certain way. It took 6s to lay out a complex and large declaration. I updated my style and brought it down to 2s. That’s okay for now. There are always speed tricks that can be done.

There are of course issues like whether HSE can parse your code, whether you have #ifdef CPP pragmas in your code, and things like that. That’s just part of the general problem space of tools like this.

Remaining ideas

Currently you can only reformat a declaration. I don’t yet trust (any) Haskell printer with my whole file. I invoke it on a per-declaration basis and I see the result. That’s good for me at the moment. But in the future I might extend it to support whole modules.

Implement some operator-specific layouts. There are some operators that can really only be laid out in a certain way. Control.Applicative operators spring to mind:

Foo <$> foo
    <*> bar
    <*> mu

This can easily be handled as part of a style. Other considerations might be strings of lens operators. I’ve noticed that people tend not to put spaces around them, like:

foo^.bar.^mu.~blah

There’s also alignment, which is another possibility and easily implemented. The challenge will be deciding when alignment will look good versus making the code too wide and whitespacey. In my own style I personally haven’t implemented any alignment as it’s not that important to me, but I might one day.

Summary

Hopefully I’ve motivated the case that style is important, that formalizing style is important, and that automating it is practical: something we should solve and then move on from, redirecting the energy we wasted on manually laying things out and debating it.

The Holey type

This is my version of the HoleyMonoid. To make this into a useful package I changed a few things.

The Category instance implied a name conflict burden with (.), so I changed that to (%):

(%) :: Monoid n => Holey n b c -> Holey n b1 b -> Holey n b1 c

Rather than have the name-conflicting map function, I flipped the type arguments of the type and made it an instance of Functor.
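To make the type concrete, here is a minimal self-contained sketch of the Holey idea (after the HoleyMonoid package) using String as the monoid; the real library builds a lazy Text Builder instead:

```haskell
-- A formatter that consumes a continuation (m -> r) and produces an a,
-- which may itself be a function asking for more arguments.
newtype Holey m r a = Holey { runHM :: (m -> r) -> a }

-- Emit a fixed piece of output.
now :: m -> Holey m r r
now m = Holey ($ m)

-- Leave a hole, to be filled by an argument at the call site.
later :: (a -> m) -> Holey m r (a -> r)
later f = Holey (\k -> k . f)

-- Composition, named (%) to avoid clashing with (.).
(%) :: Monoid m => Holey m b c -> Holey m b1 b -> Holey m b1 c
Holey f % Holey g = Holey (\k -> f (\m1 -> g (\m2 -> k (m1 `mappend` m2))))

-- Run the formatter with the identity continuation.
run :: Holey m m a -> a
run (Holey f) = f id

-- The type of greeting is inferred from the holes: String -> Int -> String.
greeting :: String -> Int -> String
greeting = run (now "Hello, " % later id % now "! You are " % later show % now ".")
```

For example, greeting "Chris" 30 yields "Hello, Chris! You are 30.".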

Printers

There is an array of top-level printing functions for various output types:

-- | Run the formatter and return a lazy 'Text' value.
format :: Holey Builder Text a -> a

-- | Run the formatter and return a strict 'S.Text' value.
sformat :: Holey Builder S.Text a -> a

-- | Run the formatter and return a 'Builder' value.
bprint :: Holey Builder Builder a -> a

-- | Run the formatter and print out the text to stdout.
fprint :: Holey Builder (IO ()) a -> a

-- | Run the formatter and put the output onto the given 'Handle'.
hprint :: Handle -> Holey Builder (IO ()) a -> a

All the combinators work on a lazy text Builder which has good appending complexity and can output to a handle in chunks.

Retrospective

I’ve been using formatting in a bunch of projects since writing it. Happily, its API has been stable since release, with some additions.

It has the same advantages as Parsec. It’s a combinator-based mini-language with all the same benefits.

Sat, 20 Sep 2014 00:00:00 UT
http://chrisdone.com/posts/formatting

Zhuangzi on Freedom
http://chrisdone.com/posts/zhuangzi-peng
The Story of P’eng

In the northern darkness there is a fish and his name is K’un. The K’un is so huge I don’t know how many thousand li he measures. He changes and becomes a bird whose name is P’eng. The back of the P’eng measures I don’t know how many thousand li across and, when he rises up and flies off, his wings are like clouds all over the sky. When the sea begins to move, this bird sets off for the southern darkness, which is the Lake of Heaven.

The Universal Harmony records various wonders, and it says: “When the P’eng journeys to the southern darkness, the waters are roiled for three thousand li. He beats the whirlwind and rises ninety thousand li, setting off on the sixth month gale.” Wavering heat, bits of dust, living things blowing each other about – the sky looks very blue. Is that its real color, or is it because it is so far away and has no end? When the bird looks down, all he sees is blue too.

If water is not piled up deep enough, it won’t have the strength to bear up a big boat. Pour a cup of water into a hollow in the floor and bits of trash will sail on it like boats. But set the cup there and it will stick fast, for the water is too shallow and the boat too large. If wind is not piled up deep enough, it won’t have the strength to bear up great wings. Therefore when the P’eng rises ninety thousand li, he must have the wind under him like that. Only then can he mount on the back of the wind, shoulder the blue sky, and nothing can hinder or block him. Only then can he set his eyes to the south.

The cicada and the little dove laugh at this saying, "When we make an effort and fly up, we can get as far as the elm or the sapanwood tree, but sometimes we don’t make it and just fall down on the ground. Now how is anyone going to go ninety thousand li to the south!"

If you go off to the green woods nearby, you can take along food for three meals and come back with your stomach as full as ever. If you are going a hundred li, you must grind your grain the night before; and if you are going a thousand li you must start getting together provisions three months in advance. What do these two creatures understand? Little understanding cannot come up to great understanding; the short-lived cannot come up to the long-lived. …

Among the questions of T’ang to Ch’i we find the same thing. In the bald and barren north, there is a dark sea, the Lake of Heaven. In it is a fish which is several thousand li across, and no one knows how long. His name is K’un. There is also a bird there, named P’eng, with a back like Mount T’ai and wings like clouds filling the sky. He beats the whirlwind, leaps into the air, and rises up ninety thousand li, cutting through the clouds and mist, shouldering the blue sky, and then he turns his eyes south and prepares to journey to the southern darkness.

The little quail laughs at him, saying, “Where does he think he’s going? I give a great leap and fly up, but I never get more than ten or twelve yards before I come down fluttering among the weeds and brambles. And that’s the best kind of flying anyway! Where does he think he’s going?” Such is the difference between big and little.

Therefore a man who has wisdom enough to fill one office effectively, good conduct enough to impress one community, virtue enough to please one ruler, or talent enough to be called into service in one state, has the same kind of self-pride as these little creatures. Sung Jung-tzu would certainly burst out laughing at such a man. The whole world could praise Sung Jung-tzu and it wouldn’t make him exert himself; the whole world could condemn him and it wouldn’t make him mope.

Extract from the story of Hsü Yu

I have no use for the rulership of the world! Though the cook may not run his kitchen properly, the priest and the impersonator of the dead at the sacrifice do not leap over the wine casks and sacrificial stands and go take his place.

Lesson

The lessons of Zhuangzi on the freedom to be yourself are related to his treatment of usefulness. It is part of the broader scope of free and easy wandering. These lessons are more familiar to the modern western sensibility than his other teachings.

Mon, 04 Aug 2014 00:00:00 UT
http://chrisdone.com/posts/zhuangzi-peng

Teaching: A good (and bad) example
http://chrisdone.com/posts/teaching
Rather than write a bunch of prose about how I think teaching should be done, this time I’ll simply comment an existing example. Needless to say, I am a proponent of the Socratic method. As I begin more work in the area of education, I’ll refer back to this page.

One day, a newbie asked a Haskell question on an IRC channel. Here is a highly editorialized and commentaried version of the conversation, correcting typos and patching out the names. I also tried to group conversations so they’re a bit less interleaved.

It doesn’t really matter who is who, but I wanted to include a real log to show that there are real examples of this all the time. It demonstrates some points about teaching that I think are important. I’ve used names for some philosophers I admire. There isn’t (really) any meaning to the assignments.

The Dialogue

The play starts out with an unknown person joining the channel and asking for help:

Huizi: Hey guys, quick question for you. I am working on expanding my Haskell skills by doing some challenges at https://www.hackerrank.com, and I am having issues with one thing that I think has got to be much easier than I am making it.

Huizi: Reading and parsing STDIN.

The canvas is prepared. It’s not clear what exactly they are having trouble with. What’s the first thing you should do? Ask them questions.

Socrates: Do you want to avoid direct spoilers?

Aristophanes: Huizi: Sure, can you give a specific example of something you’d like to do?

Questions begin. This is a good start.

Huizi: So, say I am expecting a string of "2 3", how would I convert that to a [Int]?

A very specific, easy to investigate question is proposed. What’s the best approach to proceed? First, establish the person’s experience.

Aristophanes: map read . lines

Aristophanes: Err.

Laozi: map read . lines <$> readLine

Confucius: You mean words.

Confucius: And getLine, not readLine.

Socrates: Huizi, scratch that question about direct spoilers. The answer’s been given verbatim.

Sadly, the response is not helpful. It is unempathic, presumptuous and muddled. Naturally, the learner is confused by this noise.

Huizi: What is the . in there for?

And their struggle continues:

Confucius: Huizi: Function composition.

Laozi: :t (.)

Minerva: (b -> c) -> (a -> b) -> a -> c

So far, the poor learner has been subjected to line noise. But it doesn’t go unnoticed:

Socrates: Hmm, newbie handling in here isn’t as awesome as back in 2008. Rather clumsy. [winking]

Socrates: Huizi, how much Haskell do you know already?

Amusingly, nor does the criticism, starting a separate meta-discussion:

Laozi: Socrates: What exactly is not satisfactory to you?

Confucius: Socrates: I doubt this is actually the puzzle Huizi is trying to solve. [tongue in cheek]

The important reply of the earlier question comes:

Huizi: I have ran through a lot of sample projects, involving list manipulation.

Huizi: So, I would still consider myself a noob.

In other words, this is a complete Haskell newbie who needs to read more material, or, if they are to be taught here, it should be done with care and patience.

Oddly, the spoilering continues. The learner has already stated that they’re going through exercises. The channel is persisting in trying to spoil the learning process and it does so clumsily. The meta-discussion continues:

Socrates: Laozi, the part where you give a complete newbie some spoiler code that mixes function composition with Applicative operators? [wink]

Confucius: Yeah, if I was going to use fmap, I’d just write fmap here.

Confucius: I only ever use <$> if I’m also using <*>.

The learner expresses bad insights typical of someone unfamiliar with a topic:

Huizi: But I can work with the language. Just anything with reading input or output just feels unnatural in the language.

Huizi: But that’s likely because it is unnatural for functional programming languages.

Confucius: Huizi: Eh, it’s not so bad, you just need to get used to it.

A strange selfish approach to teaching is expressed in reply to criticism in the meta-discussion:

Laozi: Socrates: They are free to ask for more help, I don’t like starting off assuming no knowledge, it’s too much work for me.

Confucius: Huizi: A do-block can be used to glue a bunch of IO actions together into a larger one. Inside a do-block, v <- x means “execute the action x, and call its result v”. If x has type IO t, then v will have type t.

Confucius: Huizi: In the end, main will be one of these IO actions, and will be executed in a compiled Haskell program.

The learner soldiers on trying to solve their actual problem:

Huizi: Yeah, I was trying to use read, I just kept getting errors.

More lecturing:

Confucius: getLine :: IO String is an IO action which when executed, will request a line of text from stdin, and produce that as its result.

Actual interaction with the learner tries to make some headway:

Socrates: Huizi, what was your use of read like? You can type small Haskell expressions in here by prefixing it with “>”.

Socrates: > 1 + 2

Minerva: 3

Lecture still continues:

Confucius: Huizi: It’s important to distinguish execution (of IO actions) from evaluation, which is the process of reducing expressions to values in order to be able to match them against patterns.

Confucius: When people say that Haskell is pure or referentially transparent or stateless or whatever, they’re talking about evaluation.

The learner is now trying to diagnose their problem and voicing their discoveries:

Huizi: Looks like the main issue I was having is that I was using splitOn ' ', instead of words.

This is the first obvious clue to a misstep the newbie has made. Can you see their mistake? If not, you’ll learn something, too. Now the task is to help them realize it. Meanwhile, a little bit more meta discussion:

Socrates: Confucius, does your approach of throwing stuff at people and seeing what sticks without any feedback normally work? It seems strange to me.

Confucius: Socrates: I am hoping for feedback here.

The learner is starting to reason through the issue more:

Huizi: splitOn must have a different output type.

Socrates: Huizi, what’s the type of splitOn? Is that from Data.List.Split?

Huizi: Yeah.

Socrates: :t splitOn -- Does lambdabot have it in scope?

Minerva: Eq a => [a] -> [a] -> [[a]]

Socrates: Aha!

A strange lecture-only perspective is professed, which I’ll address later:

Confucius: Socrates: But I figure I might as well put everything on the table even if I get none.

Confucius: Huizi: Does that stuff make sense?

Attempting to continue with the issue at hand, a play is made to try to help the learner see what’s wrong:

Socrates: Huizi, you can still use splitOn for this problem (even though words might be more convenient), can you see the change to make in your call?

Huizi: :t words

Minerva: String -> [String]

Socrates: Huizi, check this out:

Socrates: :t ' '

Minerva: Char

Socrates: :t " "

Minerva: [Char]

But the learner’s still confused, but asking questions, which is good:

Huizi: So basically I end up with an output of [[Char]] with words, and [[a]] with splitOn. Right?

Huizi: But since the input to splitOn is from the getLine, does it not consider itself a [Char] type?

Huizi: Ok, so I am splitting the list wherever it finds the sequence in that list.

Socrates: Yup.

Huizi: > splitOn [2,3] [1,2,3,4,5,6]

Minerva: [[1],[4,5,6]]

Huizi: Perfect, thanks guys.

Socrates: Welcome!

End of conversation.
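For the record, the learner’s misstep and its fix can be condensed into a few lines (parseInts is a name of my choosing):

```haskell
-- splitOn ' ' fails because splitOn takes a *list* as its delimiter:
-- splitOn " " (a String) type-checks, while splitOn ' ' (a Char) does not.
-- With only the Prelude, words does the same job for whitespace:
parseInts :: String -> [Int]
parseInts = map read . words
```

So the string "2 3" parses to [2,3], which is where the dialogue ended up.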

Sat, 19 Jul 2014 00:00:00 UT
http://chrisdone.com/posts/teaching

An alternative Haskell home page
http://chrisdone.com/posts/haskell-lang
A couple of months back I started an alternative home page for Haskell. It is a work in progress, but as I work on it every so often I push changes to it.

What’s wrong with haskell.org?

haskell.org isn’t doing a satisfactory job for me as a place to impress people with Haskell and to guide them into using it.

Its design is broken or just strangely put together and it’s not responsive.1

There are too many links on one page which indicates indecision about priorities and a lack of user story or particular target audience. Who’s this site targeting? Where are they supposed to go next? Is it answering the right questions?

Also, the uncoordinated effort of the wiki misleads people and pages begin to bitrot. There are too many to vet.

Why not fix haskell.org?

The current home page is historically resistant to change, technologically and socially. My relationship to haskell.org over the years has been one of stonewalling when requesting access, of slow replies, and of bike-shedding and nitpicking when proposing designs. A camel is a horse designed by committee and haskell.org is such a horse.

So your plan is?

The plan goes like this:

The first part of the plan was to survey existing programming language web sites.

There are good points and bad points for each one, but I came up with a set of things that are common among all, and a couple additional points I came up with:

A theme

Logo

Menu

Download

Community

Documentation

News

Visual things

Opening paragraph

Code sample

Thumbnails of videos

Pictures of community stuff; human beings

Screenshots

Selling points

News

Twitter/feeds

Supporters / sponsoring companies

Other links

Application areas / success stories

Language options (locale; Japanese, German, etc.)

Existing crop of language home pages

If you’re interested in my review of each home page, here’s what I wrote:

F#’s is boring, it has no character, features no code samples. But it does have a bunch of company backing logos.

Ruby’s is among the best. It has character. It has two navigations, which is bad,2 but it’s otherwise perfect. My only criticism is that it overemphasizes news, which most new people don’t care about and which Rubyists get via other sources.

Python’s, like Ruby’s, is good. It has character. It has code samples. But it’s worse than Ruby in that it has four areas of navigation. The top bar, the second bar, the third call to action section, and finally the footer. Each of which has a different subset of total places of interest. Again, it uses space presenting news items. However, I particularly like the section which shows Web Programming, GUI Development, etc. and then next to each the library one would use to accomplish that task. That’s very practical and speaks positively about the language and community.

OCaml’s is not bad either. It has a deserty theme giving it its own character. It suffers from link overload, which implies it might’ve been copying Haskell’s or Python’s home pages.

Go’s home page is notable for its embedded try feature, something which I’ve wanted Haskell’s home page to have for a long time. It’s also got a very simple and straight-forward navigation. The logo/mascot is in there, giving the whole page a bit of fun character, too. While not much to look at, unresponsive to device, clearly written by a pragmatist systems person, it has a lot going for it and is in my mind among the best I’ve looked at.

For Perl’s homepage, I’ll echo similar complaints as before. Link overload. It’s a rather lazy way to make a home page. Let’s throw in as many links as we can and hope people read it all and by chance find what they’re looking for. Oh and to fill out the page, let’s add recent uploads (who cares?) and news items (again, who cares?). Finally, it has no character whatsoever. It has the awful O’Reilly pen drawing of a random animal that’s supposed to embody the character of the language, but is meaningless. I probably dislike this one the most, a close tie with F#’s.

Scala’s is very trendy and pretty. It’s got a lot of events and training info which, along with the header mountains, gives it a very communal and active fresh feel. Again, echoing the praise of Go’s page, it has very simple navigation. One navigation at the top, and then two big buttons for the most common tasks. After that, like Python’s home page, there’s a good review of features of the language that make this language more interesting than the next language. I give credit to this page for visual inspiration.

Clojure suffers a little bit from linkitis, too. It has three menus and then a page full of links. It has zero code samples on the first page you land on. But it is clean and has a certain character to it.

Generally, I’m not sure why sites bother with search boxes. Unless they’re implementing code-aware searches, Google will be faster and more accurate every time. As Joel Spolsky says of his StackOverflow, Google is the user interface to SO.

Regarding linkitis, I will quote Don’t Make Me Think: a user will more happily click through three links to narrow down what they want than have to think and search around a page to find it, if they even have the patience for that.

The audience

The audience is newbies. People who use Haskell don’t go to haskell.org. They go to Hackage, or they google search and find wiki entries, or the GHC manual. A home page shouldn’t cater to Haskellers, it should cater to would-be Haskellers.

Naysayers for an attractive home page say things like “we don’t want superficial people joining the community” (as if they could just learn Haskell on a whim!), but forget that people live insular lives. There are millions of people out there who’ve never heard of Haskell. If a random web user stumbles upon it and is taken by the look of it, what are they going to do with it? Share it. How did you first hear of Haskell? I was told about it by a friend.

To decide on the kinds of things I want to see on a landing page when I first look at a language I’m unfamiliar with I ask a bunch of common questions. I’ve condensed them all in the user stories section.

The theme

I’ve always liked the purple and green of the old Haskell logo. I don’t know why gray/sky blue ended up being used for the new logo. So I decided I’d keep that purple theme and made some mockups. Purple is a cool color.

User stories

The user stories I’ve identified have been encoded in the main navigation:

A user just wants to try Haskell. They scroll to ‘Try it’ and, well, try it. There can be links to further sites like Try Haskell, School of Haskell, Code Pad, Lambdabot, services like that.

A user wants to download Haskell. They click ‘Downloads’. What particular file they want to download doesn’t matter. It could be GHC, it could be the Haskell Platform, it could be some packages. If they want to download something, they go to Downloads.

A user wants to interact with/find community. They click ‘Community’. On that page is a list of various community places of interest, which may itself be expanded with videos and things like that.

A user wants to get information. They click ‘Documentation’. That means books, reports, papers, tutorials.

A user wants to catch up with what’s new in general, with Haskell. They click ‘News’ and there can be an RSS feed available on that page. Haskell News is mostly suitable to this task.

I’ve also made a little page to render wiki pages from haskell.org. A simple request is sent to haskell.org for /wiki/* pages; the wiki syntax is parsed with pandoc and rendered to HTML, at least for the pages that MediaWiki is kind enough to serve. Example: Handling errors in Haskell

Note that MediaWiki is a bit stunted in the data it exposes for use. Some pages just aren’t available, others produce invalid XML, etc. This is why the wiki is not exposed in the navigation.

I’m not sure about exposing the wiki directly, but rather some selected vetted pages, perhaps.

Going forward

I still have to:

Fill in the Try support

The features copy

Examples for each of said features

A list of video thumbnails to appear under the community banner (as in the comp)

Upcoming/past events

At least 5 examples for the header code

Add books & manuals to the Documentation tab

I’m happy with the look and feel and organization. Now is the matter of filling it with useful things. That’ll take about a month, by weekend/spare-time development. Once that’s done, it will be ready to link to newbies. I’ll have a link to be proud of when people bring up Haskell.

I could solicit the community for contributions via pull requests. It depends on people approving of the project and my approach. So if you’re reading this and you accept my design and organization3 and would like to contribute content (content pages are written in markdown), then pull requests to the github repo would be most handy. I will merge your changes and redeploy with relative speed.

In particular, content that is wanting and which is not straightforward for me to produce:

About 5 examples of concise, short Haskell code which can sit in the header. Ideally, each example can be clicked and it will take you to a markdown page under an Examples hierarchy that explains how the code works.

The features section needs to be filled out with content. I’m not entirely sure that the headers are decent, but I’m pretty sure they’re a good start.4 Pages are needed for each of those, containing example code of real problems being solved.
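As a purely hypothetical illustration of the kind of concise header example asked for above (this is my own pick, not one of the site’s snippets):

```haskell
-- A hypothetical header-sized example: the infinite list of primes
-- via a simple (and deliberately naive) sieve.
primes :: [Int]
primes = sieve [2 ..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
```

Clicking such a snippet would then lead to a markdown page under an Examples hierarchy explaining how it works.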

I won’t be able to actively work on this for a few days, but I can do bits and bobs here and there on the weekend and I always have time to merge straight-forward changes.

Questions/comments, feel free to email me: chrisdone at gmail dot com Put a note in the email if you wish to be CC’d with other people in the discussion.

When I open haskell.org on my phone, I see the tablet-sized layout with tiny text. The layout goes wonky on the tablet version.↩

That means that you won’t nitpick design decisions, bike shed about the theme, organization, choice of picture in the landing page, etc.↩

Maybe type-classes and monads might be of interest because both were pioneered by Haskell and, at least in their native support, are both peculiar to Haskell.↩

]]>Thu, 29 May 2014 00:00:00 UThttp://chrisdone.com/posts/haskell-langPresentations updatehttp://chrisdone.com/posts/presentations-update
Just a small update. I took 15 mins and updated the haskell-mode printer a bit so that everything is indented by default, and lists are expanded as [1,2,…] rather than 1:(2:…).

Andrew Gibiansky contacted me about getting a front-end added for IHaskell, which would be lovely! I designed the present package specifically aimed at working on Emacs or the browser or wherever. So I sent him back an excessively long email about how to integrate it.

It might also be worth adding to tryhaskell, too. It’d be rather easy and helpful to newbies.

I heard about this from John Wiegley a while ago, but every time I recall it, I can’t remember how it goes, so I thought I’d write it down for myself. I think there’s a paper about it, but I can’t find it. Hopefully I’m recalling it correctly.

The Identity monad trick: Let’s say I want to expose an API that lets you work with a data structure. I want you to be able to keep hold of that data structure and pass it back into my library, and I’ll give it back to you later and we can go back and forth.

But I don’t want to actually give you the data structure freely so you can go and give it to your friends. So instead I force you into the Identity monad, via a newtype wrapper that only I can unpack.
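A minimal sketch of the trick, with names of my own choosing (imagine the module exports only the type and `secret`, never the constructor):

```haskell
import Data.Functor.Identity (Identity (..))

-- The constructor stays private to the library, so callers can
-- compute over a Secret but never unwrap it themselves.
newtype Secret a = Secret (Identity a)

instance Functor Secret where
  fmap f (Secret i) = Secret (fmap f i)

instance Applicative Secret where
  pure = Secret . pure
  Secret f <*> Secret x = Secret (f <*> x)

instance Monad Secret where
  Secret (Identity a) >>= k = k a

-- Only the library hands out values, already wrapped.
secret :: a -> Secret a
secret = Secret . Identity
```

A caller can bind over the value and map functions over it, but every result stays wrapped as another Secret.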

Note that the whole type of this expression is Secret Text. You still don’t have the secret, you’ve got a computation over it.

You’ve used the value, but it never escaped1 the actual Identity monad. It’s like I’m giving you the value, but I’m also not giving you the value.

As always, there’s a difference between “secure against your own stupidity” and “secure against attackers.” For the former, this is satisfied.

For the latter, bottom complicates it, so you should force it in the IO monad and catch any exceptions e.g.

extract :: Secret a -> IO (Maybe a)

This prevents people from using

(v >>= \a -> error ("The value is " ++ show a))

to try to get around it.
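A sketch of that extract, assuming the newtype wrapper over Identity described earlier (redefined here so the block stands alone):

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Functor.Identity (Identity (..))

newtype Secret a = Secret (Identity a)

-- Force the value in IO and catch any exception, so a bottom
-- smuggled in via 'error' can't blow up the consumer.
extract :: Secret a -> IO (Maybe a)
extract (Secret (Identity a)) = do
  r <- try (evaluate a)
  return (hush r)
  where
    hush :: Either SomeException b -> Maybe b
    hush = either (const Nothing) Just
```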

unsafePerformIO and other non-standard Haskell can get around it, but if you’re defending against developers, you probably have control over the environment, so you can just white-list the imports and extensions and there’s nothing they can do. This is what tryhaskell (via mueval) does.↩

Problem

I was working with haskell-names the other day. Its data types are nice enough, but are rather unwieldy to read in the REPL when debugging and inspecting. This got me thinking about inspection and printers for Haskell data structures again.

I’ve made several approaches to this for haskell-mode in the past.

One which requires parsing the output of Show with Haskell and then printing that to s-expressions for Emacs to consume. This is generally unreliable and hacky.

Then I settled with making the REPL just syntax highlight the output. That generally works flawlessly and is an okay solution.

Then I really wanted collapsing support again, so I implemented one based on Emacs’s awareness of expression boundaries (of ( ) and { } and " " etc.). Simple. Kind of reliable.

Today I implement yet another one, but this one I like best. I’ve always wanted to have a Haskell printer that can evaluate on demand, piece-wise, taking care not to evaluate the whole structure too eagerly. I should be able to type [1..] into my REPL and not be spammed by numbers, but rather to expand it at my leisure.

Implementation

My plan was to use the Data.Data API to traverse data structures breadth-first, display to the user something like Just … and then allow the user to continue evaluating on request by clicking the … slot.
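The first layer of that idea can be sketched with Data.Data alone (my illustration, not the present package’s actual code): render only the outermost constructor, with a placeholder per field that a UI could let the user click to expand.

```haskell
import Data.Data

-- Show the top constructor and one "…" slot per immediate field,
-- leaving the fields themselves unevaluated.
presentTop :: Data a => a -> String
presentTop x =
  unwords (showConstr (toConstr x) : map (const "…") (gmapQ (const ()) x))
```

For example, `presentTop (Just 'a')` renders as `Just …`, without touching the field at all.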

I chatted with Michael Sloan about it and we came up with a simple experimental design and thought it would be a nice idea. We hypothesized a nice class-based way to provide custom presenters for your types, so that e.g. a Diagram could be rendered as a bitmap inline with the rest of the data structure, but that needs more thinking about.

I’ve implemented a basic version of it in the present package (a la “presentations” in CLIM) and implemented a usable front-end for it in Emacs. There’s some information about the implementation in the README which you can read on Github.

Result

Yes! It works. Here is a demonstration video. Concept proven. This is definitely my favourite way so far. I will probably write a simple algorithm in Emacs to format things on separate lines, which would make it much easier to read, and I want to make strings expand to fill the screen width, but no further. But this is already an improvement.

I’ll trial it for a while; if I end up using it more often than not, I’ll make the option to make :present implicit for all REPL evaluations.

In other words, every [Tree Integer] is a placeholder that you can click to get more output.

]]>Sat, 26 Apr 2014 00:00:00 UThttp://chrisdone.com/posts/the-printer-haskell-deservesPresciencehttp://chrisdone.com/posts/prescience
]]>Fri, 25 Apr 2014 00:00:00 UThttp://chrisdone.com/posts/prescienceTypeable and Data in Haskellhttp://chrisdone.com/posts/data-typeable
Data.Typeable and Data.Data are rather mysterious. Starting out as a Haskell newbie you see them once in a while and wonder what use they are. Their Haddock pages are pretty opaque and scary in places. Here’s a quick rundown I thought I’d write to get people up to speed nice and quick so that they can start using it.1

It’s really rather beautiful as a way to do generic programming in Haskell. The general approach is that you don’t know what data types are being given to you, but you want to work upon them almost as if you did. The technique is simple when broken down.

Requirements

First, there is a class exported by each module: the class Typeable and the class Data. Your data types have to be instances of these if you want to use the generic programming methods on them.

Happily, we don’t have to write these instances ourselves (and in GHC 7.8 it is actually not possible to do so): GHC provides the extension DeriveDataTypeable, which you can enable by adding {-# LANGUAGE DeriveDataTypeable #-} to the top of your file, or providing -XDeriveDataTypeable to ghc.

Now you can derive instances of both:

data X = X deriving (Data, Typeable)

Now we can start doing generic operations upon X.

The Typeable class

As a simple starter, we can trivially print the type of any instance of Typeable. What are some existing instances of Typeable? Let’s ask GHCi:

Use-case 1: Print the type of something

So we can use this function on a Char value, for example, and GHCi can print it:

λ> :t typeOf 'a'
typeOf 'a' :: TypeRep
λ> typeOf 'a'
Char

This is mostly useful for debugging, but can also be useful when writing generic encoders or any tool which needs an identifier to be associated with some generic value.

Use-case 2: Compare the types of two things

We can also compare two type representations:

λ> typeOf 'a' == typeOf 'b'
True
λ> typeOf 'a' == typeOf ()
False

Any code which needs to allow any old type to be passed into it, but which has some interest in sometimes enforcing or triggering on a specific type can use this to compare them.

Use-case 3: Reifying from generic to concrete

A common need, when given a generic value, is to sometimes, if the type is right, work with the value as its concrete type, not a polymorphic one. For example, a printing function:

char :: Typeable a => a -> String

The specification for this function is: if given a Char, return its string representation, otherwise return "unknown". To do this, we need a function that will convert from a polymorphic value to a concrete one:

cast :: (Typeable a, Typeable b) => a -> Maybe b

This function from Data.Typeable will do just that. Now we can implement char:
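An implementation along the lines of the specification above (my sketch of it):

```haskell
import Data.Typeable

-- If the value really is a Char, show it; otherwise give up.
char :: Typeable a => a -> String
char x =
  case cast x :: Maybe Char of
    Just c  -> show c
    Nothing -> "unknown"
```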

The Data class

That’s more or less where the interesting practical applications of the Typeable class end. But once you have it, the Data class can take advantage of it, and the Data class is much more interesting. The point is to be able to look into a data type’s constructors and its fields, and to traverse across or fold over them. Let’s take a look at the class.

There aren’t any other interesting instances for this type, but we’ll look at uses for this type later. Representations (so-called FooRep) tend to be references from which you can reify into more concrete values.

Use-case 2: Inspecting a data type

The most common thing to want to do is to get a list of the constructors that a type contains. The Maybe type, for example, contains two.
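My reconstruction of the kind of query meant here, using dataTypeConstrs from Data.Data:

```haskell
import Data.Data

-- List the constructors of Maybe via any value of the type.
maybeConstrs :: [Constr]
maybeConstrs = dataTypeConstrs (dataTypeOf (Nothing :: Maybe ()))
-- λ> map showConstr maybeConstrs
-- ["Nothing","Just"]
```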

Use-case 5: Make a real value from its constructor

It’s actually possible to produce a value from its constructor. We have this function

fromConstr :: Data a => Constr -> a

Example:

λ> fromConstr (toConstr (Nothing :: Maybe ())) :: Maybe ()
Nothing

But what do you do when the constructor has fields? No sweat. We have this function:

fromConstrB :: forall a. Data a
            => (forall d. Data d => d) -> Constr -> a

Haskell beginners: Don’t fear the rank-N type. What it’s saying is merely that the fromConstrB function determines what the type of d will be by itself, by looking at Constr. It’s not provided externally by the caller, as it would be if the forall d. were at the same level as the a. Think of it like scope. let a = d in let d = … doesn’t make sense: the d is in a lower scope. That means we can’t just write:

fromConstrB (5 :: Int) (toConstr (Just 1 :: Maybe Int)) :: Maybe Int

The Int cannot unify with the d because the quantification is one level lower. It basically doesn’t exist outside of the (forall d. Data d => d) (nor can it escape). That’s okay, though. There is a type-class constraint which lets us be generic. We already have a function producing a value of that type:
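That function is fromConstr itself, which is polymorphic enough to be the rank-N argument. My example of feeding it back in:

```haskell
import Data.Data

-- The polymorphic argument is instantiated at each field's type;
-- here the single field is an Int, built from the constr for 1.
built :: Maybe Int
built =
  fromConstrB (fromConstr (toConstr (1 :: Int)))
              (toConstr (Just 2 :: Maybe Int))
-- λ> built
-- Just 1
```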

Here I’m doing a little check on any field in the constructor of type Char and if it’s upper case, replacing it with !, otherwise leaving it as-is. The first trick is to use the cast function we used earlier to reify the generic d into something real (Char). The second trick is to cast our concrete Char back into a generic d type.
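That check can be sketched with gmapT (my version of it; the two casts are the tricks described above):

```haskell
import Data.Char (isUpper)
import Data.Data
import Data.Maybe (fromMaybe)

-- Map over the immediate fields: reify any Char field with 'cast',
-- transform it, and cast the result back to the generic field type.
-- Fields of other types are left untouched.
censor :: Data a => a -> a
censor = gmapT (\d -> fromMaybe d (cast d >>= \c -> cast (bang c)))
  where
    bang :: Char -> Char
    bang c = if isUpper c then '!' else c
```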

Just like fromConstrM earlier, if you want to operate on exact indices of the constructor rather than going by type, you can use gmapM and use a state monad to do the same thing as we did before.

Use-case 7: generating from data structures generically

Another slightly different use-case is to walk over the values of a data structure, collecting the result. You can do this with gmapM and a state monad or a writer, but there’s a handy function already to do this:

gmapQ :: forall a. Data a => (forall d. Data d => d -> u) -> a -> [u]

Trivial example:

λ> gmapQ (\d -> toConstr d) (Foo 5 'a')
[5,'a']

A more useful example can be found in structured-haskell-mode which walks over the Haskell syntax tree and collects source spans into a flat list. Another decent example is in the present package. There’s also an example in Fay to encode types to JSON with a specific Fay-runtime-specific encoding.

Printer example

Here’s a trivial (not very good, but something I wrote once) generic printer:
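The elided code amounts to something like this gshow-style traversal (my reconstruction; the one in the post was similarly naive):

```haskell
import Data.Data

-- Naive generic printer: show the constructor, then recurse into
-- each field, parenthesising everything on the way down.
gshow :: Data a => a -> String
gshow x =
  "(" ++ showConstr (toConstr x)
      ++ concat (gmapQ ((" " ++) . gshow) x)
      ++ ")"
```

It is naive: strings, for instance, print as chains of (:) cells. But it works on anything with a Data instance.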

I wrote it because the GHC API doesn’t have Show instances for most of its data types, so it’s rather hard to actually inspect any data types that you’re working with in the REPL. It has instances for pretty printing, but pretty printing confuses presentation with data.

Summary

We’ve briefly covered how to query types, how to cast them, and how to walk over them or generate from them. There are other things one can do, but those are the main ones. The real trick is understanding how to make the types work, and that comes with a bit of experience. Fiddle around with the concepts above and you should gain an intuition for what is possible with this library. See also: Data.Generics.Aliases.

Hope it helps!

I’ll migrate this to the HaskellWiki when it doesn’t look so, uh, shall we say, unattractive.↩

]]>Tue, 22 Apr 2014 00:00:00 UThttp://chrisdone.com/posts/data-typeableHaskell structured diffshttp://chrisdone.com/posts/haskell-diff
Project-request: someone please implement a program that will diff Haskell in a cleverer way than lines.

In an effort to rein in my incessant work on Haskell tooling1, I’m outlining a tool that I’d personally like, and I welcome people to implement it. Otherwise it serves as a motivating problem description for the next time I come around to it myself with free time.

Before anyone emails me saying “lines/words are simple, other things are hard, that’s why it’s not been done yet. People undervalue the simple solution …” with a long lecture, spare me!

The concrete diff

The concrete diff is the line-based, and sometimes character-based, diff that we all know and love. There’s no reason to throw this away. You will need to keep this as an optional backend for when you are unable to parse a Haskell file.

Pros: simple to implement. You produce the necessary lines to delete and insert to create the change from A to B.

Cons: doesn’t know about syntactic redundancy where some changes don’t mean anything, and where the actual important change occurs. For example:
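The elided example was along these lines (reconstructed from the discussion below, so the exact code is a guess):

```diff
 main =
-  do putStrLn "Write your name:"
-     name <- getLine
-     print name
+  case Nothing of
+    Nothing ->
+      do putStrLn "Write your name:"
+         name <- getLine
+         print name
```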

But it’s clear to observe that this is not the change we made in spirit, it’s just one line-based way to achieve it. In actual fact, our do putStrLn … was moved into a case, un-changed. At this size, it’s not a big deal. When the code is more interesting, it’s important to know what was really changed, and what remains the same.

The abstract syntax diff

Enter the syntactic diff. We show the difference between two syntactic trees. How this is to be achieved in a readable way is the rub, but here are some ideas.

Now, at least at a superficial glance, you don’t even need this explained to you. You can see exactly what has happened: The code before has changed to the code after, but we can see that node2 has just moved to inside the case.

Where the trickiness arises is taking this to its logical conclusion and applying it generally. What’s displayed if you also change the string in the putStrLn? Good question. Here’s an idea:

Because the node "Write your name" has now been lost, we don’t need to reference it any longer. So one way to show that it has been removed could be to put -{…}. And then to show what replaced it, put in +{…}, a la classic diffs:

In reality this rule would insert more -{…} and +{…} than I’ve written here, but I’m writing these examples manually so take them with a grain of salt. Let’s take it further and say that the string has actually been moved. Then we should indeed give it a number to reference it later:

Again, I don’t think anybody is going to find this confusing. The node3 has moved into a where clause, which has been named greeting and referenced in place of its original place.

Am I making obvious sense, here? It’s not a particularly novel display, it states what happened syntactically, precisely. With a UI, you could expand/collapse nodes in a nested fashion or “explode” all the pieces into a flat list of numbered or +’d or -’d nodes, or just narrow down to one specific interesting expression, like

²{do putStrLn +{greeting}
     name <- getLine
     print name}

If you’re sufficiently nerd-sniped to find this interesting and do-able, then I invite you to go ahead and give it a go. I’d love to see a prototype. I don’t plan on implementing this in the near or distant future, so we won’t be stepping on each other’s toes.

The reduced semantic diff

If you’re still reading by this point, let me try to entice you with ambitious ideas. Take the above approach, everything we just laid out, but let’s put an additional step in there: instead of diffing Haskell’s abstract syntax tree, diff the Core.

You can see that the pointless case has been removed. This is the bread and butter of Core simplification. But if I remove the case myself, the Core is exactly the same. This is redundant semantic content, which is why GHC removed it.

If someone made a change like this in a real codebase which removed some redundant semantic content, not just syntactical redundancy, your diff could show it like that. In other words, nothing important semantically actually happened here.

In fact, if I refactored a bunch of code, re-organized a bit, does my next colleague really want to read through all the syntax tree just to see the crux of what changed? Sometimes, but not always. Sometimes, they just want to see the precise thing that will change at runtime.

It might actually be insane, with big blow ups in code difference for minute high-level changes, or it might be great for teams caring about performance. Difficult to know until you try it. You can also do a source-mapping back to the original Haskell source, for a more interesting display.

If you want to implement this, I would love to see any results.

The typed diff

Okay, you’re still reading so you’re pretty easily nerd sniped. Let’s continue with the ideas. Another type of difference between two sources is the types of expressions in there. Consider:

main = let x = [1,2,3]
       in print (x <> x)

Now you change the code to:

main = let x = myFancyMonoid
       in print (x <> x)

Our structural diff laid out earlier will show this:

main = let x = -{[1,2,3]}
       in print (x <> x)

After:

main = let x = +{myFancyMonoid}
       in print (x <> x)

But actually, more things have changed here. As a result of the different monoid instance, the print (x <> x) will do something different. Maybe it’s a * rather than +, maybe it’s a number, whatever. Maybe that expression is used in a more interesting way than merely printing it. What’s the real diff?

Or something like that. I’m being hand-wavey in the display, here. The real difference is that we’ve changed the type of x. It’s an important change, which has semantic meaning. My ideas are more vague here. I haven’t thought through many scenarios of how to represent this. But one thing is clear: a diff of types can actually be useful and interesting.

The editing diff

The diffs above are all examples of “cold” diffs. Calculating the difference between two files as-is. If you’re in a structured editor like Lamdu, then you don’t have to do cold diffs and figure out and guess at what happened. You know exactly what happened. This node was raised here, this variable was renamed there, etc. But if you want to work on that, you pretty much have to work on Lamdu.

Summary

In summary, I’ve intentionally listed increasingly more wacky diff ideas, from the familiar to the fairly novel. My general approach to tooling is progressive: start with the simplest working implementation, then step up. Structured-haskell-mode is an example of this. It’s no Lamdu, and it’s no vanilla text-based mode. It’s a stepping stone in between. The impedance to try SHM is lower.

In the same way, maybe we can start with the abstract syntax diff, let people become acclimatized to it, let it stabilize, get it integrated into things like Git, and then work our way up from there.

If nobody bothers trying out these ideas, I’ll probably end up doing them myself eventually, but I thought I’d put the suggestion out there first.

In favour of writing programs that concern themselves with things other than Haskell for once!↩

A new car comes out, and it has some cool feature: Hey, it has road surface detection and changes the steering accordingly! But it should also come with all the old stuff that you come to expect. Comfy seats, seatbelts, airconditioner, heated windows, wipers, proximity detection, power steering, cruise control, etc.

With new programming languages, what you tend to get is a chassis, engine and steering wheel, and the road surface detection.

Here is a list of cool ideas that have been discovered and implemented in programming languages, but which do not in their whole make up any existing language:

]]>Sun, 30 Mar 2014 00:00:00 UThttp://chrisdone.com/posts/one-step-forward-two-steps-backReloading running code in GHCihttp://chrisdone.com/posts/ghci-reload
Something I’ve been using over the past couple of weeks in a personal Yesod web site is a way to reload the code while the server is still running in GHCi. I saw Greg Weber’s blog post about a “reload mode” for web servers and thought I’d share my approach. GHCi already supports reloading of code, it just doesn’t know it.

The problem with doing this in GHCi is always that the :load and :reload commands will clear out any bindings made in the REPL. This means that even if you start your web server in a separate thread—and it will stay running between reloads—you have no way to update or talk to it directly.

That’s why I wrote a package called foreign-store. Its purpose is to make a stable pointer to some Haskell value and store it at an index, and then keep hold of it in C. Later, it can provide that stable pointer by that index. That’s its whole purpose. Because the C code is unaffected by GHCi’s reloads, the pointers are retained, and they are not garbage collected, because that is the point of a stable pointer.

Now, with that created, it’s possible to run a web server, keep hold of the thread id, reload some code in GHCi, kill that thread and restart it. Another option is to keep an IORef of the web handler itself, and then update the IORef instead. In my use of it so far, this has worked flawlessly.
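The IORef variant is easy to sketch (names are mine; in the real setup the IORef itself would live in foreign-store so it survives a :reload):

```haskell
import Data.Char (toUpper)
import Data.IORef

type Handler = String -> String

-- The long-running server thread would read the current handler on
-- every request; after a GHCi reload we simply write a new one in.
swapDemo :: IO (String, String)
swapDemo = do
  ref    <- newIORef (map toUpper :: Handler)
  before <- readIORef ref
  writeIORef ref reverse          -- "reload": swap in new behaviour
  after  <- readIORef ref
  return (before "hello", after "hello")
```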

I made a demo project with a README explaining the (simple) approach. The short of it is that I can make some change to a Haskell module in my web project, hit a key (F12), and instantaneously see the browser page refresh with the new update. This is pretty much optimal for me.

It doesn’t end at web servers, of course. Any kind of long-running program that you would like to keep running while developing is fair game. For example, an IRC server. Why not run the server and also inspect the innards of its state while it’s running, and also update the message handler? I’ve done this with my Hulk IRC server before. You can inspect the data structures, query the types of things, etc. all from the REPL.1

If you want to get really funky, you can try using the continuation monad to implement Common Lisp’s restarts. Restarts are especially handy for when you’re running some long IO process and it bails out. You want to be able to correct the code and the continue from where you left off. Restarts let you do that.

I shouldn’t have to tell anyone this but just in case: don’t use this in production.

Of course, there aren’t many of us Haskellers who live in the REPL like Smalltalkers and Lispers do. Many Haskellers never even launch GHCi while developing.↩

]]>Sun, 16 Mar 2014 00:00:00 UThttp://chrisdone.com/posts/ghci-reloadAttempto Controlled Englishhttp://chrisdone.com/posts/attempto-controlled-english
Attempto Controlled English is a formally defined unambiguous language which is a subset of the English language. It’s pretty sweet.

I’ve known about it for some time, but I never fiddled with it because the standard implementation setup is rather elaborate. I wanted a nice, simple package in Haskell which would define a parser and a printer only, much like haskell-src-exts does. That way I can use ACE to parse some simple English for all sorts of purposes1, with a simple familiar API that I can peruse on Hackage. Partly it’s also a good learning experience.

So I went through the paper The Syntax of Attempto Controlled English to see whether it was comprehensive enough to write a Parsec parser out of. It was! I first wrote a tokenizer with Attoparsec and wrote some tests. From those tokens I produced a set of combinators for Parsec, then I wrote a parser. While writing the parser I produced a set of test-cases for each grammar production. Finally, I wrote a pretty printer, and wrote some tests to check that print . parse . print . parse = id.

Newbies to Haskell parsing might find it an interesting use-case because it tokenizes with Attoparsec (from Text) and then parses its own token type (Token) with Parsec. A common difficulty is to avoid parsing from String in Parsec, which most tutorials use as their demonstration.

The Hackage package is here. I find the documentation interesting to browse. I tried to include helpful examples for the production rules. You shouldn’t have to know syntax theory to use this library.

Here is an ACE sample. We can parse the sentence “a <noun> <intrans-verb>” like this:

Anything to do with vocabulary is written as <foo>. The parser actually takes a record of parsers so that you can provide your own parsers for each type of word. These words are not of interest to the grammar, and your particular domain might support different types of words.

I.e. we get back what we put in. I also wrote a HTML printer. A more complicated sentence demonstrates the output:

for each <noun> <var> if a <noun> that <trans-verb> some <noun> and <proper-name>’s <noun> <trans-verb> 2 <noun> then some <noun> <intrans-verb> and some <noun> <distrans-verb> a <intrans-adj> <noun> <proper-name>’s <noun> <adverb>.

Can be printed with

fmap (renderHtml . toMarkup) . parsed specification

and the output is:

for each <noun> <var> if a <noun> that <trans-verb> some <noun> and <proper-name>'s <noun> <trans-verb> 2 <noun> then some <noun> <intrans-verb> and some <noun> <distrans-verb> a <intrans-adj> <noun> <proper-name>'s <noun> <adverb>.

The colors and parenthesizing embellishments are just to demonstrate what can be done. I’m not sure this output would actually be readable in reality.

This is a good start. I’m going to leave it for now and come back to it later. The next steps are: (1) write more tests, (2) add feature restrictions and related type information in the AST, (3) add a couple sample vocabularies, (4) implement the interrogative (useful for query programs) and imperative moods (useful for writing instructions, e.g. text-based games).

Specifically, I want to use this to experiment with translating it to logic-language databases and queries, and from that produce interactive tutorials, and perhaps experiment with a MUD-like game that utilizes it.↩

]]>Mon, 24 Feb 2014 00:00:00 UThttp://chrisdone.com/posts/attempto-controlled-englishEmacs, Notmuch and Offlineimaphttp://chrisdone.com/posts/emacs-mail
I kind of hate writing in anything other than Emacs. Especially email. Writing email in Emacs with message-mode is lovely. I get all my editing facilities and any key bindings that I want. More than that, generally managing my mail in Emacs rather than relying on what’s available in the GMail UI is desirable.

So I moved my email reading to Emacs. Here’s how I did it.

Offlineimap

First, I installed offlineimap. Second, I made a ~/.offlineimaprc configuration file:

to import all new mail from my database, all 19K messages of it. I think that took about 5 minutes.

Post-sync hook

Rather than manually running offlineimap and notmuch new all the time, you can put

autorefresh = 1

under your [Account] setting in .offlineimaprc. That will make Offlineimap run in one continuous process. I run it in screen for now, but I will probably add it as a system script when I’m feeling masochistic.

Another thing you can add to the [Account] section is a postsynchook:

postsynchook = /usr/bin/offlineimap-postsync

That path points to my post-sync script. It contains:

notmuch new

And then a bunch of tagging commands.

Tagging

In GMail I would organize everything with filters and tags. I like this approach, so I took the same with Notmuch. First, mailing lists skip the inbox and are tagged. For example:
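Given the description that follows, the filter presumably looks something like this (reconstructed in notmuch’s tagging syntax, not copied from my actual script):

```
notmuch tag -inbox +ghc-devs to:ghc-devs@haskell.org tag:inbox
```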

In other words, “remove the inbox tag, and add the ghc-devs tag for all messages addressed to ghc-devs@haskell.org in my inbox.”

When I receive an electric bill I tag it and flag it:

notmuch tag +flagged +bill from:serviziweb@trenta.it tag:inbox

I also have some inbox skipping filters from lists or people I don’t have interest in seeing in my inbox.

Then I have 69 deletion filters on various mailing lists I never signed up for and am unable to unsubscribe from.

In all I have about 130 filters. I copied them from my GMail account and ran some keyboard macros to convert them to Notmuch’s tagging style.

Emacs

Once you have Notmuch set up, you can use notmuch.el and it works out of the box for reading and searching mail. The mode has some strange choices for its defaults, so I copied the repo with the full intention of patching it heavily in the future, and I made some further configurations in a separate file.

The mode is pretty self-explanatory, it just has very silly keybindings. Otherwise it works very well.

Sending email

One thing that doesn’t work out of the box is sending mail. For this I configured my mail user agent:

I wrote Try Haskell in 2010 and hadn’t really updated the codebase since. It was rather poor, but had generally been stable enough. Last week I finally set some time aside to rewrite it from scratch (barring the tutorial part). In doing so I gave it a fresh design, rewrote the backend and stopped using my patched version of mueval (faster, but less stable).

Aside from reproducing the functionality, I afterwards added a new piece of functionality: typing in a function name (or composition of functions) will simply show the generalized type of that expression. This removes the need for an explicit :t and is more friendly to newbies who would rather see something useful than an error.1

After that, Bob Ippolito requested that some simple IO operations be made available, like getLine, putStrLn, etc. People have requested this in the past, but it seemed more complicated back then. This time, it seemed rather easy to support simple input/output. The console library already supports continued prompts, so the question became simply how to do safe IO.

As Haskellers worth their salt know, the IO monad is not special. Examples of a monad ADT for newbies are available, and we know that pre-monad Haskell IO was modelled as a request/response system before it got updated in Haskell 1.3. Fuelled by the nerdiness factor of this, and the fact that it was the weekend, I rose to the challenge. I knew it wouldn’t be hard, but it would be fun with a real use-case (Try Haskell’s REPL).

My constraints were that it shouldn’t be a continuation-based library, because I cannot have any state in Try Haskell. The server evaluates an expression and returns the result. No other context is kept, no process kept open, and it should return immediately. Given that it’s rather hard to serialize closures, but rather easy to serialize a list of inputs and outputs (aka responses/requests), I thought I’d go that route.

In the end I settled on an ErrorT monad over a State monad containing Input and an Output. The inputs would be stdin lines as [String]. The outputs would be stdout lines and either a final value, or an interrupt.

runIO :: Input -> IO a -> (Either Interrupt a, Output)

Whenever the expression being evaluated runs getLine, it reads from the Input state and pops that line of text off the stack. When getLine tries to read something and the stack is empty, it throws an error (in the ErrorT monad), returning the interrupt InterruptStdin.
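To make that concrete, here is a minimal sketch of the idea (not the actual pure-io API; the real library is built from ErrorT over State, where this hand-rolls the equivalent state-and-error plumbing):

```haskell
import Prelude hiding (IO, getLine, putStrLn)

-- Stdin lines in; stdout lines and a possible interrupt out.
newtype Input = Input [String] deriving (Show, Eq)
newtype Output = Output [String] deriving (Show, Eq)
data Interrupt = InterruptStdin deriving (Show, Eq)

-- A pure stand-in for IO: a state-and-error monad over Input/Output.
newtype IO a =
  IO { unIO :: Input -> Output -> (Either Interrupt a, Input, Output) }

instance Functor IO where
  fmap f m = m >>= pure . f

instance Applicative IO where
  pure a = IO (\i o -> (Right a, i, o))
  mf <*> ma = mf >>= \f -> fmap f ma

instance Monad IO where
  IO m >>= f = IO (\i o ->
    case m i o of
      (Left e, i', o')  -> (Left e, i', o')
      (Right a, i', o') -> unIO (f a) i' o')

-- Pop a line off the Input stack; interrupt if it's empty.
getLine :: IO String
getLine = IO (\(Input ls) o ->
  case ls of
    []        -> (Left InterruptStdin, Input [], o)
    (l : ls') -> (Right l, Input ls', o))

-- Append a line to the Output.
putStrLn :: String -> IO ()
putStrLn s = IO (\i (Output os) -> (Right (), i, Output (os ++ [s])))

runIO :: Input -> IO a -> (Either Interrupt a, Output)
runIO i (IO m) = let (r, _, o) = m i (Output []) in (r, o)
```

For instance, runIO (Input ["hello"]) (getLine >>= putStrLn) evaluates to (Right (), Output ["hello"]), while runIO (Input []) getLine gives (Left InterruptStdin, Output []).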

That resulted in the library pure-io which some thought was a joke. It supports enough of the subset of IO operations for, I think, a newbie to at least feel like they’re doing some realistic I/O. So I added it to Try Haskell! You can now run interactive commands and write/save/list files. Any file system operations you do will be saved in your browser’s local storage.

It’s really a rather nice referentially transparent IO service. Even if you run forever (getLine >>= putStrLn) :: IO (), it will run forever, but the server can be restarted in between. No state is stored on the server at all; it’s all in the client. All the client has to do is pass it back and forth when it communicates with the server.

I’d recommend that intermediate Haskellers (perhaps not newbies) implement their own IO monad as a free monad, or as an mtl transformer, partly for the geeky fun of it and partly for the insights.

Like “No instance for (Show (a0 -> a0)) arising from a use of …”, which is frankly a useless message to print in a REPL; it’s strange that this is GHCi’s default behaviour.↩

“Monadic I/O has already become the de-facto standard in the various Haskell systems. We have chosen a fairly conservative, but extensible basic design (an IO monad with error handling),” in the changes list.↩

Yes, that means running the same computation every time from scratch, like a transaction.↩

Sun, 12 Jan 2014 00:00:00 UT
http://chrisdone.com/posts/pure-io-tryhaskell

Dijkstra on Haskell and Java
http://chrisdone.com/posts/dijkstra-haskell-java
In 2001, Edsger W. Dijkstra wrote a letter to the Budget Council of The University of Texas. A PDF is available here; I’ve typed it up so that everyone can read it. Sadly, the curriculum was changed to Java. Relatedly, the algorithmic language Scheme was replaced by Python in 6.01, the successor to MIT’s introductory course based on The Structure and Interpretation of Computer Programs.

To the members of the Budget Council

I write to you because of a rumor of efforts to replace, in the introductory programming course of our undergraduate curriculum, the functional programming language Haskell by the imperative language Java, and because I think that in this case the Budget Council has to take responsibility lest the decision be taken at the wrong level.

You see, it is no minor matter. Colleagues from outside the state (still!) often wonder how I can survive in a place like Austin, Texas, automatically assuming that Texas’s solid conservatism guarantees equally solid mediocrity. My usual answer is something like “Don’t worry. The CS Department is quite an enlightened place, for instance for introductory programming we introduce our freshmen to Haskell”; they react first almost with disbelief, and then with envy —usually it turns out that their undergraduate curriculum has not recovered from the transition from Pascal to something like C++ or Java.

A very practical reason for preferring functional programming in a freshman course is that most students already have a certain familiarity with imperative programming. Facing them with the novelty of functional programming immediately drives home the message that there is more to programming than they thought. And quickly they will observe that functional programming elegantly admits solutions that are very hard (or impossible) to formulate with the programming vehicle of their high school days.

A fundamental reason for the preference is that functional programs are much more readily appreciated as mathematical objects than imperative ones, so that you can teach what rigorous reasoning about programs amounts to. The additional advantage of functional programming with “lazy evaluation” is that it provides an environment that discourages operational reasoning.

Finally, in the specific comparison of Haskell versus Java, Haskell, though not perfect, is of a quality that is several orders of magnitude higher than Java, which is a mess (and needed an extensive advertizing campaign and aggressive salesmanship for its commercial acceptance). It is bad enough that, on the whole, industry accepts designs of well-identified lousiness as “de facto” standards. Personally I think that the University should keep the healthier alternatives alive.

It is not only the violin that shapes the violinist, we are all shaped by the tools we train ourselves to use, and in this respect programming languages have a devious influence: they shape our thinking habits. This circumstance makes the choice of first programming language so important. One would like to use the introductory course as a means of creating a culture that can serve as a basis for computing science curriculum, rather than be forced to start with a lot of unlearning (if that is possible at all: what has become our past, forever remains so). The choice implies a grave responsibility towards our undergraduate students, and that is why it can not be left to a random chairman of something but has to be done by the Budget Council. This is not something that can be left to the civil servants or the politicians; here statesmen are needed.

Wed, 08 Jan 2014 00:00:00 UT
http://chrisdone.com/posts/dijkstra-haskell-java

Emacs users are like Terry Pratchett’s Igors
http://chrisdone.com/posts/emacs-users-are-like-igor
Within the constraints of the Emacs environment, there are no limits. Emacs is built upon this principle; the Lisp principle. Make some small kernel of features. A substrate. Take care to make this substrate programmable (in any language; Emacs chooses Lisp), and then build the system upon it. Let users worry about future features. Some 359,000 lines of C code comprise its kernel, and 1,637,000 lines of Emacs Lisp make up the rest.1

Similarly, the nature of The Lisp Curse is that what can be written to express any given problem is so arbitrary and free that you are spoiled for choice, and every programmer re-invents solutions to the same problems, uncaring about sharing a common language with the world outside. The core problem is that Lisp makes programmers selfish. Giving programmers so much flexibility is inviting the alienation of other programmers; Lispers think: how can I express this problem best, for me?

That phenomenon is not a problem for Emacs. Work environments are a very personal thing. They exist to serve only one person: you. Me-me-me is a winning attitude when dealing with your environment. Maybe in the real world, out there, where you have to share programs with other people, where other people will see or even use your efforts, you have to take other people into consideration. Not in your environment. Not in your Emacs. It is your virtual home for at least nine hours of the day, every day.

My road to Emacs love developed slowly. I first came to it due to Common Lisp. I knew enough Lisp to get by, copy-pasting example snippets, configuring just enough to edit my environment. It felt a little bit barebones compared to the IDEs I’d used before it. Little did I know the world of functionality and configuration waiting beneath my feet.

Eventually I started patching some things here and there, writing my own hooks, little things like that. I used Emacs for a long time, just becoming proficient as a user with the keybindings and window management, before I ever wrote any Elisp. It hadn’t occurred to me that writing any Elisp would ever be of interest to me. I would often shaft my .emacs configuration, and everything would break, and I wouldn’t quite know why.

Finally, I wrote my first mode. I think it was a mode for ASP. It wasn’t very good, and I didn’t fully understand everything that was going on. But it gave me some key insights. This thing isn’t just an editor, it’s really an environment all the way down. I can configure everything about this mode. And the mode consists of a bunch of functions and variables. It’s all code.

After that, it was really a sky-rocket of productivity. Eventually I would write Elisp casually in between programming on work projects. I would notice that a way of working was repetitive, or that Emacs behaved in a way I just didn’t quite like, or I simply thought of a nice thing that I could add. I’d happily spend anywhere from 30 seconds to half an hour writing some functionality to extend my editing facilities.

And it was extended for good. That amazed me, and still does. My problems are only problems for as long as I don’t notice them. Once I do, I write some Elisp to solve them, and then they’re never a problem again. In typical editors and IDEs, I simply wouldn’t even think of fixing such things, never mind actually putting my work to one side for a minute, solving them, and then going back to work again.

I’ve now written a lot of Elisp to support my development, especially with respect to Haskell. Many times, for many months at a time, over the years, I’ve been working on an experimental feature, or feature set, mode, what-have-you, and it’s been very spotty. Mostly working, but breaking a lot, interrupting my work, but with me persevering, pushing through, until that work becomes stable and quite robust through sheer usage and battle testing.

When working recently it occurred to me that a lot of the functionality I depend on presently in Emacs for work is built upon my own work. I use the session/interactive-mode work for interacting with Cabal and GHCi, I use structured-haskell-mode in conjunction with that, and then atop that I use god-mode, my own Emacs input method. At one time or another in the past they have all been unusable, or flaky as hell. SHM still has a few growing pains, but is basically there.

This really reminds me of Terry Pratchett’s Igor clan. I discovered this amiable race in The Fifth Elephant. Igors are a people inspired by the typical hunchbacked Igor archetype, but in Discworld, they are also self-modifiers. Their bodies consist of mixed and matched and patched and swapped body parts from other members of their clan, of scars and self-adjustments. They are infinitely self-improving, self-experimenting. They might end up with a botched job and have to hobble around for a few days, but in the end it’s always fixable.

And they lisp.

Though, while Elisp is Emacs’s programmability language of choice, the particular language doesn’t matter much. It could be Python, JavaScript, Haskell, whatever. The key is: if most of your feature set is written in your editor’s programming language, then that editor is very programmable.↩

Wed, 25 Dec 2013 00:00:00 UT
http://chrisdone.com/posts/emacs-users-are-like-igor

structured-haskell-mode
http://chrisdone.com/posts/structured-haskell-mode
For about 2 months I’ve been working on and off on an Emacs package called structured-haskell-mode.1 A full explanation and demo of the features is available on the Github page. In summary, it is a mode that offers paredit-mode2 abilities for Haskell code.

I’ve been keeping it to myself in a private Github repo, hoping to finish fleshing out the feature set3 and smooth over stability issues. In the end I decided I’d put it out there, because the base functionality is quite reliable and enough to get work done better. It does actually change how you work.

The key features that enable new ways of working are:

Cutting and pasting actually preserves the indentation of the particular syntactic node. One doesn’t have to think or care about “re-indenting” or worry about how much nesting is happening for fear of having to clean it up later. It’s so trivial now.

Typing characters or removing them will “bring along” dependent node source, meaning that re-indentation is handled automatically. This means that you can use a nice consistent Lisp style4 without caring about how you’re going to have to manually re-indent it whenever you make changes.

You now don’t have to think about indentation. You think about nesting level. To go to the right place, you use the ) keybinding to go further outwards in the node layers, and hit C-j to start a new sibling at that node level. There is no “tab cycle”. This style is 100% reliable.

Context-awareness is useful. In strings, the quote character is escaped. When hitting C-j in a list (of values, or types in a record, or a list of constructors in a data declaration), it can automatically add delimiter characters properly indented and spaced out. Something you don’t want to have to care about doing yourself.

Parentheses are actually good. The Haskell tendency to abuse $ to avoid having to manage parentheses is symptomatic of having crappy editing facilities. Managing parentheses in Haskell code is a pain, because editors don’t know about things like Haskell’s case expressions, or lambdas, or patterns, or whatever, and re-indentation is a nightmare inside parentheses. Not in this mode. Parentheses make editing a triviality rather than a chore.

The overarching theme to this whole library is to remove redundancy in your work. Stop thinking so much about layout and syntactic debt5 and appealing to the status quo6, and start just thinking about the real work you’re doing, which is plugging together programming constructs.

It is actually a rewrite of a package I wrote six months ago of the same name. That package was stable, but the code was not favourable and there were some kinks to be ironed out. The new version uses Emacs markers so structured operations fail less often.↩

Emacs users who’ve written their share of Elisp will know that paredit-mode is among the most enjoyable editing experiences out there. Strangers to this editing experience are simply missing out on the cream of the crop.↩

Stealing ideas from paredit-mode (e.g. slurping, barfing, convoluting) and coming up with my own ideas, such as operand manipulation, automagic re-indentation.↩

Syntactic debt is the energy and time you spend later on for making decisions or choices now. Feel like you’re nesting your function too deep? Better stop now or you’ll pay for it later because you’ll have to come back and collapse it down to fit within 80/120 columns! That’s a real problem when your editor sucks. When you have much better control over your code, things like that are a non-issue. Just write the code, worry about layout when you’re done. Lispers know this.↩

The status quo has to be debunked incrementally, I think. The next thing to sort out is diffs. People waste their time making their code more friendly to diff engines that only know about lines. Diffs should be smart enough to know better. Expect further development in this area.↩

Mon, 09 Dec 2013 00:00:00 UT
http://chrisdone.com/posts/structured-haskell-mode

Recording simple GIFs for demos
http://chrisdone.com/posts/recording-small-gifs
Sometimes you might like to record little GIF animations of your screen to demonstrate an Emacs feature you did (hey, some of you might…). For example, these. I made a wee F9 keybinding for Emacs to run a small screenshot-frame command that captures a frame of the target window into /tmp/frames/.

You can get the window id of your target window with xwininfo -display :0.

I would execute screenshot-frame after running a command or pressing a key (it sounds painful, but it’s not, and it allows you to make mistakes). A short sleep before each capture ensures that the buffer has finished updating. I also disabled blink-cursor-mode. Then, to preview the animation so far, I would use

animate -delay 35 /tmp/frames/*.png

If some frames were redundant I’d remove them. And then finally to write out a .gif I’d use

convert -delay 35 /tmp/frames/*.png out.gif

I found the whole thing quite convenient!

Mon, 09 Dec 2013 00:00:00 UT
http://chrisdone.com/posts/recording-small-gifs

Making GHCi scale better and faster
http://chrisdone.com/posts/making-ghci-fast
A common complaint with GHCi is that it doesn’t scale well when the size of the project gets bigger. Once you hit 20, 30, 50, or 150 modules, it stops being fun anymore and you start wishing you didn’t have to wait for it to load.

I recommend enabling -fobject-code. You can enable this by running

$ ghci -fobject-code

Or by setting it in the REPL:

:set -fobject-code

If you want it on all the time, you can put the above line in a .ghci file either in your home directory or in the directory of your project.

This makes GHCi compile everything once and then use incremental recompilation thereafter. You’ll find that you can load 100-module projects and work with them just fine in this way.

After that, you may notice that loading some modules gives less type information and general metadata than before. For that, re-enable byte-compilation temporarily with -fbyte-code (:set -fbyte-code) and :load the module again; now you have fast recompilation with complete information, too.

Another tip is to use -fno-code to have really fast compilation. This also works in combination with -fobject-code. But I’d recommend using this only for type checking, not for getting useful warnings (like pattern match inexhaustiveness). So I would combine it with -fobject-code in the same way as above with -fbyte-code, and then once you’re done hacking, re-enable -fobject-code and rebuild everything.

The Waterflow Problem

In Fig. 1, we have walls of different heights. Such pictures are represented by an array of integers, where the value at each index is the height of the wall. Fig. 1 is represented with an array as [2,5,1,2,3,4,7,7,6].

Now imagine it rains. How much water is going to be accumulated in puddles between walls? For example, if it rains in Fig 1, the following puddle will be formed:

[Fig. 2: the wall from Fig. 1 with the puddle of water filled in]

No puddles are formed at the edges of the wall; water is considered to simply run off the edge.

We count volume in square blocks of 1×1. Thus, we are left with a puddle between column 1 and column 6, and the volume is 10.

Write a program to return the volume for any array.

My Reaction

I thought, this looks like a spreadsheet problem, and closed the page to get on with my work. The last thing I need right now is nerd sniping.

The scan-based solution is, as expected, the fastest algorithm on this page, clocking in at a mean of 128.2953 us for a random vector of 10000 elements.
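A scan-based solution of that shape can be sketched as follows (a reconstruction, not necessarily the exact code that was benchmarked):

```haskell
-- For each column, the water above it is bounded by the highest wall
-- to its left and the highest wall to its right; it holds the smaller
-- of those two bounds, minus its own height.
waterflow :: [Int] -> Int
waterflow hs =
  sum (zipWith (-)
        (zipWith min (scanl1 max hs) (scanr1 max hs))
        hs)
```

On the example wall, waterflow [2,5,1,2,3,4,7,7,6] gives 10, matching Fig. 2.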

But I still thought my spreadsheet idea was feasible.

My approach

In a similar way to Philip Nilsson, I can define the problem as it comes intuitively to me. As I saw it in my head, the problem can be broken down into “what is the volume that a given column will hold?” That can be written like this:

volume(0) = 0

volume(|S|−1) = 0

volume(i) = max(0, min(left(i−1), right(i+1)) − height(i))

Where left and right are the running peak heights from the left and right respectively (the max(0, …) guards columns that stand taller than a neighbouring peak):

left(0) = height(0)

left(i) = max(height(i), left(i−1))

right(|S|−1) = height(|S|−1)

right(i) = max(height(i), right(i+1))

That’s all.

A visual example

Take an example column i:

[figure: the wall with a single column i marked]

We spread out in both directions to find the “peak” of the columns:

[figure: the wall with the peaks found on either side of column i]

How do we do that? We simply define the volume of a column to be in terms of our immediate neighbors to the left and to the right:

[figure: column X with its immediate neighbours A (left) and B (right)]

X is defined in terms of A and B. A and B are, in turn, defined in terms of their immediate neighbors. Until we reach the ends:

[figure: columns A, X, Y, B, where A and B sit at the ends of the wall]

The ends of the wall are the only columns that have just one side, defined in terms of their single neighbor, which makes complete sense. Their volume is always 0. It’s impossible to have a puddle on the edge. A’s “right” will be defined in terms of X, and B’s “left” will be defined in terms of Y.

But how does this approach avoid infinite cycles? Easy. Each column in the spreadsheet contains three values:

The peak to the left.

The peak to the right.

My volume.

A and B below depend upon each other, but for different slots. A depends on the value of B’s “right” peak value, and B depends on the value of A’s “left” value:

[figure: columns A and B referring to each other’s opposite-side peaks]

The height of the column’s peak will be the smallest of the two peaks on either side:

[figure: the column’s peak is the smaller of the two peaks on either side]

And then the volume of the column is simply the height of the peak minus the column’s height:

[figure: the volume is the peak height minus the column’s height]

Enter loeb

I first heard about loeb from Dan Piponi’s From Löb’s Theorem to Spreadsheet Evaluation some years back, and ever since I’ve been wanting to use it for a real problem. It lets you easily define a spreadsheet generator by mapping over a functor containing functions: each function in the container is passed the container itself.

So as described in the elaboration of how I saw the problem in my head, the solution takes the vector of numbers, generates a spreadsheet of triples defined in terms of their neighbors—except edges—and then simply takes the sum of the third value, the volumes.
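Here is a sketch of that construction using plain lists and tuples; the original used vectors and lenses, so the names and details below are illustrative:

```haskell
-- loeb: pass the whole container of functions to each function in it.
loeb :: Functor f => f (f a -> a) -> f a
loeb fs = xs where xs = fmap ($ xs) fs

-- Each spreadsheet cell holds (peak to the left, peak to the right,
-- volume), each defined in terms of neighbouring cells; laziness
-- resolves the references, as in a spreadsheet.
water :: [Int] -> Int
water hs = sum [v | (_, _, v) <- loeb (zipWith cell [0 ..] hs)]
  where
    n = length hs
    cell i h cells = (leftPeak, rightPeak, volume)
      where
        leftPeak
          | i == 0 = h
          | otherwise = max h (fst3 (cells !! (i - 1)))
        rightPeak
          | i == n - 1 = h
          | otherwise = max h (snd3 (cells !! (i + 1)))
        -- the max 0 guards columns taller than a neighbouring peak
        volume
          | i == 0 || i == n - 1 = 0
          | otherwise =
              max 0 (min (fst3 (cells !! (i - 1)))
                         (snd3 (cells !! (i + 1))) - h)
    fst3 (l, _, _) = l
    snd3 (_, r, _) = r
```

On the example wall, water [2,5,1,2,3,4,7,7,6] gives 10, as before.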

It’s not the most efficient algorithm (it relies on laziness in an almost perverse way), but I like that I was able to express exactly what occurred to me. And loeb is suave. It clocks in at a mean of 3.512758 ms for a vector of 10000 random elements. That’s not too bad, compared to the scanr/scanl version.

This was also my first use of lens, so that was fun. The cloneLens calls are required because you can’t pass in an arbitrary lens and then use it both as a setter and a getter: the type becomes fixed on one or the other, making it not really a lens anymore. I find that pretty disappointing. But otherwise the lenses made the code simpler.

Update with comonads & pointed lists

Michael Zuser pointed out another cool insight from Comonads and reading from the future (Dan Piponi’s blog is a treasure trove!): while loeb lets you look at the whole container, giving you absolute references, the equivalent corecursive fix (wfix) on a comonad gives you relative references. Michael demonstrated this using Jeff Wheeler’s pointed-list library and Edward Kmett’s comonad library.

I think if I’d’ve heard of this before, this solution would’ve come to mind instead, it seems entirely natural!

Sadly, this is the slowest algorithm on the page. I’m not sure how to optimize it to be better.

Update on lens

Russell O’Connor gave me some hints for reducing the lens verbiage. First, eta-expanding the locally defined lens l in my code removes the need for the NoMonomorphismRestriction extension, so I’ve removed that. Second, a rank-N type can also be used, but then the type signature is rather large and I’m unable to reduce it presently without reading more of the lens library.

Thu, 14 Nov 2013 00:00:00 UT
http://chrisdone.com/posts/twitter-problem-loeb

God-mode for Emacs
http://chrisdone.com/posts/god-mode
A month ago I blogged about ways to reduce strenuous key presses in my Emacs use. I analyzed my runs of chords in Emacs, then speculated on the merits of exclusive vs mixed editing. Since then I wrote an Emacs mode called god-mode. It’s a mode that you toggle in and out of, and when you’re in it, all keys are implicitly prefixed with C- (among other helpful shortcuts). Overall, it’s been a resounding success. A couple of other people, including the author of multiple mark mode, contributed some patches. I’ve been using it for a month and have been very satisfied.

That’s not bad. I grant that my Vim fu is weak, so there are probably shorter ways to write the Vim examples. But at any rate, Emacs does well here.

Evaluation After One Month

I’ve been using this, turned on by default in my Emacs, for one month. I knew I was going to stick with it after a week or so of use; it was already ingrained into how I use Emacs. Now, when I access a remote Emacs on a server or whatnot, I find that I reach for the Caps Lock key (my toggle key) to do an involved editing operation, only to find that it’s not there! Oh, no! I’ll have to use Ctrl for all these dull commands…

I’ve also noticed that the more tired I get with my hands towards the end of the day, the more I tend to stick in god-mode. That gives me extra mileage to finish those last things.

Retaining God Mode Exclusively

In fact, in some modes it’s possible to remain entirely in God mode. In CSS mode, for example, I’m able to produce the following:

.foo {
display: none;
}

by typing

{ .foo ↲ : d ↲ ↲

What happens there is that { prompts me for a rule and inserts { } and puts my cursor inside it. Then : prompts for a property name, which is completed with ido-mode. Then it prompts for a value. In the case of the display property, it knows there’s only a list of values available for it, and it prompts for a choice of none, block, etc. I hit ↲ to choose the default.

If I want to edit a property/value pair, I hit ; and it prompts me for the value with the input containing the existing value.

The more one is able to stay in God mode, the greater the speed and convenience benefits.

The Keymapping

(This is described in the README, but included here for posterity.)

God-mode defines the following mapping:

All commands are assumed to be C-<something> unless otherwise indicated. Examples:

a → C-a

s → C-s

akny → C-a C-k C-n C-y

xs → C-x C-s

x s → C-x s

Note the use of space to produce C-x s.

g is a special key to indicate M-<something>. This means that there is no way to write C-g in this mode; you must therefore type C-g directly. Examples:

gf → M-f

gx → M-x

G is a special key to indicate C-M-<something>. Example:

Gx → C-M-x

Digit arguments:

12f → M-12 C-f

Repetition:

gfzz → M-f M-f M-f

Universal boolean argument:

uco → C-u C-c C-o

There is a key (default i, think “insert”) to disable God mode, similar to Vim’s i.
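Since god-mode itself is Elisp, purely as an illustration, the mapping rules above can be encoded in a few lines of Haskell:

```haskell
import Data.Char (isDigit)

-- Translate a god-mode key sequence into the Emacs keys it sends,
-- following the rules above: C- by default, space for a literal key,
-- g for M-, G for C-M-, digits for a digit argument, z to repeat.
translate :: String -> [String]
translate = reverse . go []
  where
    go acc [] = acc
    go acc (' ' : c : rest) = go ([c] : acc) rest
    go acc ('g' : c : rest) = go (("M-" ++ [c]) : acc) rest
    go acc ('G' : c : rest) = go (("C-M-" ++ [c]) : acc) rest
    go acc ('z' : rest) =
      case acc of
        (k : _) -> go (k : acc) rest   -- repeat the previous key
        []      -> go acc rest
    go acc s@(d : _)
      | isDigit d =
          let (ds, rest) = span isDigit s
          in go (("M-" ++ ds) : acc) rest
    go acc (c : rest) = go (("C-" ++ [c]) : acc) rest
```

For example, translate "akny" yields ["C-a","C-k","C-n","C-y"] and translate "gfzz" yields ["M-f","M-f","M-f"].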

Sat, 21 Sep 2013 00:00:00 UT
http://chrisdone.com/posts/god-mode

Functional Programming is Hard?
http://chrisdone.com/posts/functional-programming-is-hard
Just a reminder to those who think that imperative, object-oriented style is inherently easier for humans to understand, and that functional programming languages are really hard: if you look at any class of complete programming newbies trying to learn a modern language like Java or C++, you quickly realise this is false. Here’s an email I received a few years ago from a friend who was taking programming at college, verbatim:

Encapsulation, Inheritance, Class, Object

I have to define what these mean in terms of programming yet every time I research them I come across information I don’t understand. If you could lend me a hand or point me in the right direction of something a bit easier to understand that’d be great.

Just need some one to explain the terms in a simple manner, I don’t get why every time you research something they try to explain it in the most complex terms possible.

Fri, 30 Aug 2013 00:00:00 UT
http://chrisdone.com/posts/camelcase-vs-underscores-vs-hyphens

Analysis of Emacs keys pressed
http://chrisdone.com/posts/emacs-key-analysis
Here’s the deal: Emacs keybindings make my fingers hurt. I don’t think I ever experienced RSI before I started using Emacs. I guess I’ve been using Emacs for about 6 years. I’m very efficient with it. I can edit almost as fast as I can think, my fingers never need to take a break. But that efficiency comes at a steep price, I feel.

I hypothesize that chords are to blame, and that I would be happier and less achey if I used a modal set of keybindings, like in Vim, in which every key binding is a single character. Not all the keybindings (e.g. $) are a single key press, but most are.

I’ve tried evil-mode, and it’s pretty poor. It doesn’t provide a proper mapping to Emacs; hitting $ doesn’t actually execute move-end-of-line, it executes evil-end-of-line, which does not integrate with existing modes well at all. It’s catering to Vimers, but it’s not good for Emacs power users.

I suspect that I would like to have a global modal switcher that makes C- and M- implicit somehow, so that a SPC e w is equivalent to typing C-a C-SPC C-e C-w. Before sitting down to develop such a system, and tackling the problems of how to start and exit the mode and how to deal with the meta key, I thought I would collect some statistics. (And actually there are systems like sticky keys or chords for Emacs for tackling stuff like this, so it’s not a scary, new area.)

What I wanted to prove (or collect evidence for) was:

I waste a lot of energy on C- and M- commands.

Said commands happen in clusters, which would justify a modal switcher.

I already had a trivial script to print key presses for screencasts, so I modified that to also store the time and mode in the buffer, and I opened a keys.log file to which I would save the key presses for a day.

I then whipped up a script to read in those statistics and print out a summary, to (hopefully) provide evidence for the above claims.

For unique clusters, I’m doing 2.26 commands per cluster. So if I used sticky keys or a modal switcher, it would not be a gain. E.g. C f f C vs C-f C-f is no gain; it’s actually more presses, due to having to hit C again.

But in terms of non-unique clusters, there’s a gain at 3.44 commands per cluster. That means C f f f C vs C-f C-f C-f, which is one key press fewer. If I’m pressing 9218 keys for C-/M- commands, there might be a 20% decrease in key presses.
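One way to sketch the arithmetic (treating a chord as two presses, and a modal toggle as one press on each side of a cluster):

```haskell
-- A chorded cluster of k commands costs 2k presses (modifier + key
-- each time); a modal cluster costs k + 2 (toggle in, k keys,
-- toggle out).
chorded, modal :: Int -> Int
chorded k = 2 * k
modal k = k + 2

-- Fractional saving of modal over chorded for a mean cluster length.
saving :: Double -> Double
saving k = 1 - (k + 2) / (2 * k)
```

At the mean non-unique cluster length of 3.44, saving 3.44 comes out at about 21%, in line with the estimate above.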

I’d love to see a similar analysis done of Vim. How often do Vim users switch from insert mode to normal or presentation mode? I will continue recording my keys for the next couple of days.

Very interesting is how much I use navigation functions. In reaction to this, I’m disabling those keybindings and switching to arrow keys. And I’ve found M-e, a more convenient binding for C-M-u. I will also stop using C-d and use DEL.

Wed, 07 Aug 2013 00:00:00 UT
http://chrisdone.com/posts/emacs-key-analysis

Haskell News
http://chrisdone.com/posts/haskell-news
As a consumer of Haskell content, I have neither the time nor the inclination to follow haskell-cafe and the various other mailing lists, the reddits, the Google+ community, Planet Haskell, Hackage releases, Twitter, YouTube, and whatever other submission places I haven’t heard of. So I built Haskell News, which aggregates all of these in one place.

It has two views: grouped and mixed. Grouped lists all the items according to their source, and mixed lists everything in a flat stream. The mixed view polls for updates every ten minutes, so that users can leave it open in a tab a la Twitter. There is also an RSS feed, because I heard you like feeds, so I put a feed in your feed so you can subscribe while you subscribe.

I think this paints a fairly comprehensive picture of the Haskell community’s public activities. Certainly, if you want Haskell news, here is the best place online to go.

All the feeds are updated every ten minutes. All of the feeds are taken from RSS or Atom feeds, with the exception of three, which I scraped with tagsoup:

Google+, which provides no RSS feed (but they do provide an API, which I could look into if I had nothing better to do)

Twitter, which no longer provides an RSS feed (but they do provide an API, which I could look into if I had nothing better to do)

Github, which does not provide an RSS feed for language-specific project updates (I don’t know if they have an API, nor care too much)

All feed items are stored in a database forever. There are currently 17k entries from 4 months of running. From time to time some feeds are unparseable by the feed library.

Fri, 26 Jul 2013 00:00:00 UT
http://chrisdone.com/posts/haskell-news

IRC Browse
http://chrisdone.com/posts/ircbrowse
Haven’t blogged in a while, had some time to write now.

Since I last blogged, I made IRC Browse. It’s a service which allows you to browse the IRC logs of the #haskell and #lisp channels of Freenode. The logs come from tunes, and, for Haskell, they go back to 2001. I like IRC. I don’t go on it that frequently anymore, but I like to read the logs and I see it for the useful communication and coordination tool it is. I’ve always wanted a trivial way to view and share IRC logs as a service, so I made one. The source code is here.

It’s written in Haskell, using Snap, PostgreSQL for the database, and Sphinx for search. It’s fast.

I made it ages ago, really, but thought it worth blogging about once.

The IRC summary

The IRC summary is generated upon request, and reveals some possibly interesting insights into channel activity and the top contributors.

Most interesting is the activity by year, which indicates that 2009 was the apex of the IRC channel’s activity. It has since dwindled, and appears to be continuing to dwindle: despite sustained activity, conversation generally is decreasing.

There are various hypotheses put forth for this. I speculate that

People have been moving to other channels, such as #haskell-lens, #haskell-blah, etc.

People can now read reliable, well-publicized books, in contrast to the past

Some very active people have moved on

Browsing

This is where the name “IRC Browse” comes from. There used to be a service at ircbrowse.com, a few years back, providing a similar browsing service. I asked the author of that old site whether I could use the name ircbrowse.net, and they approved and wished me luck.

One thing that bugged me about the old IRC Browse was the speed. It was god-awfully slow. It would take ages just to display one page of logs. What I wanted was to have a log browsing service that would be instantaneous and snappy.

After some learning with PostgreSQL, I discovered some ways to make paginating 26 million rows of a table quite fast. Simply using OFFSET/LIMIT is far too slow—takes about one second to retrieve a result. I couldn’t simply query on the IDs, because there isn’t just one channel, or one pagination type. So I created a separate table to store paging indexes. For every row of the “event” table, I created a corresponding, ordered, row in the index table. After that, it was snappy.
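The scheme can be sketched roughly like this (the table and column names below are my own invention for illustration, not the actual IRC Browse schema): a precomputed index table gives each event a dense position per channel and pagination type, so fetching a page becomes an indexed range scan instead of an OFFSET over millions of rows.

```sql
-- Hypothetical sketch of the paging-index idea:
CREATE TABLE event_order_index (
  idx      integer NOT NULL,  -- dense, ordered position: 1, 2, 3, ...
  channel  integer NOT NULL,  -- one index per channel (and pagination type)
  event_id integer NOT NULL REFERENCES event(id),
  PRIMARY KEY (channel, idx)
);

-- Page 500, at 50 events per page, of channel 1 -- no OFFSET needed:
SELECT e.*
FROM event_order_index i
JOIN event e ON e.id = i.event_id
WHERE i.channel = 1
  AND i.idx BETWEEN 499 * 50 + 1 AND 500 * 50
ORDER BY i.idx;
```

The range scan over the primary key is what makes this snappy: the planner jumps straight to the first row of the page instead of counting past everything before it.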

Another thing I discovered is that my pgsql-simple library was a little sluggish. The pages would retrieve in, say, 50ms, rather than, say, 2ms. So I switched the library to postgresql-simple and got the extremely snappy responsiveness that I wanted.

Searching

For searching I learned how to use the tool called Sphinx. It takes in a configuration and a database, and then populates a search index. From that search index, it provides very fast full text search.

I couldn’t get the Sphinx library to work with the version of Sphinx I was using at the time. I made a trivial wrapper to the command line program instead. That worked. At some point I will replace this with use of the Haskell sphinx library.

Another optimization I can do is split the indexes into #haskell and #lisp.

Profiles

Profiles give a nice way to tell when someone probably goes to sleep and is probably available. It also tells whether someone has been active lately. If they haven’t been active lately, you can check their complete history by year, and if you see it dwindling, perhaps they’re not on the IRC anymore.

There are also quotes @remember’d by lambdabot, which can be fun to read.

Importation

Importing the logs happens daily, at 10:30 UTC. One day I might update this so that it connects to the IRC directly and updates the logs in real time. But I’m not sure it’s worth it.

Other stuff

I also did a social graph thing, but it’s not that good and I will probably remove it. There’s also a word cloud, which looks pretty enough; I’ll keep that.

]]>Tue, 23 Jul 2013 00:00:00 UThttp://chrisdone.com/posts/ircbrowseFay, JavaScript, etc.http://chrisdone.com/posts/fay
A couple months back I released Fay in a preliminary stage, with a little web site of its own. I haven’t blogged about it yet, so I thought I’d do that.

Setting the scene

And lo, when God created the world, he looked at it, and saw that it was good.

When Brendan Eich created JavaScript, he looked at it, and saw that it was good enough given the questionable requirements and strict time constraints.

When I look at JavaScript, I see that it is bad. And not good enough given the various other superior languages out there.

Recognizing it as a real problem

I think any developer with their head screwed on knows about the above problems and that JavaScript needs to be replaced as soon as possible. But the problem is immediate.

My approach to the problem, as with everyone else, has long been: well, we can’t do anything about it, let’s just wait for Google to complete their native client project and hope that it breaks the market. Or, let’s wait until the existing compiling-to-JavaScript solutions become usable.

Any way you look at it, as you sit down to write a new project, and every time you get a stupid error due to JavaScript’s wackiness, you say to yourself “just one more project in JavaScript… just this quick script…”

After seeing Inventing on Principle1, I was profoundly influenced by Bret Victor’s message. His talk was impressive, but his message more so. The idea that I took away from watching it was:

“If you recognise something as a problem, and you have the capability to fix it, you have a moral duty to fix that problem.”

I’m not sure I have such strong convictions as Bret to apply that generally, but his principled approach influenced me. One day I wanted to write a web app, and got that sinking feeling of wasting it on JavaScript, and decided never to write any JavaScript again for a new project.

Fixing the problem

I decided that to make such a claim, I should have to back it up with a solution, and do it fast. So I spent that weekend hacking on a Haskell compiler for JavaScript. I spent another weekend polishing it, and on the third week I was using it at work in production. Back then the project was called “hj.” And for months it sat hidden, private to me. A mini-success and a solution to the problem I saw.

I’ll note that Elm and Roy also inspired me to give it a go.

Reaction

Fast-forward a couple months, I decide it’s time to re-brand it to something friendly and put it online. I called it “Fay.”

Someone posted it to Reddit’s programming forum and Hacker News, and the site got about ten thousand hits in two days. Lots of interest generated, and people emailed me asking what the implications of such a project are. That’s really encouraging!

I got invited to talk at LXJS, a JavaScript conference. I will be going in two weeks. Ironically, I will basically be saying how much I dislike JavaScript to a crowd of people who mostly like it, but that’s how I roll.

Today and tomorrow I’ll be producing a bunch of demo examples of Fay code, and finalizing my short 20-minute talk.

Future work

Fay is missing some things that would be nice to have:

Type-classes

Tail-call optimization

Strictness analysis

Source mappings

Cabal support

Lots more other stuff

But these can wait a bit of time. I could/would write more, in more detail, but I have a lot of stuff to do at the moment. So, apologies for the brief post, but I thought it was worth having this post in the blog for the sake of chronology, and to get this series of events out of my system. That’s the point of a blog, right?

]]>Sat, 15 Sep 2012 00:00:00 UThttp://chrisdone.com/posts/fayMaking HaskellDB slightly more type-safehttp://chrisdone.com/posts/haskelldb-more-type-safe
I was just discussing HaskellDB’s major flaws with Oliver Charles and I noted that one huge problem is that the type of update does not restrict the record given to make the update. Its type is

This problem actually bit me in the ass in production once before. That is not an exciting bug to have.

So I thought: we need to prove that for the type above, s <: r (read as “s is a subtype of r”). How do we express that? How about a type class?

The type-class can be

class Subset sub super

But how to implement it? Well, we need to say that for every field of sub, that field is also a field of super. That’s made easy for us, because HaskellDB already has a HasField field record class for exactly that!
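A sketch of how that might look, using simplified stand-ins for HaskellDB’s RecNil/RecCons record types and its HasField class (the real definitions live in HaskellDB; everything below is cut down and self-contained for illustration):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, EmptyDataDecls #-}

-- Simplified stand-ins for HaskellDB's heterogeneous record types.
data RecNil = RecNil
data RecCons f v r = RecCons v r

-- "Record r has field f" -- HaskellDB provides the real HasField.
class HasField f r
instance {-# OVERLAPPING #-} HasField f (RecCons f v r)
instance HasField f r => HasField f (RecCons g v r)

-- s <: r: every field of sub must also be a field of super.
class Subset sub super
instance Subset RecNil super
instance (HasField f super, Subset rest super)
  => Subset (RecCons f v rest) super

-- Example field tags and records.
data TitleF
data CountF
type Sub   = RecCons TitleF String RecNil
type Super = RecCons TitleF String (RecCons CountF Int RecNil)

-- Compiles only when sub really is a subset of super.
checkSubset :: Subset sub super => sub -> super -> Bool
checkSubset _ _ = True

main :: IO ()
main = print (checkSubset (RecCons "t" RecNil :: Sub)
                          (RecCons "t" (RecCons 1 RecNil) :: Super))
```

With a constraint like this on update, passing a record containing a field the table doesn’t have becomes a missing-instance compile error rather than a runtime surprise.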

Testing this on my codebase actually found a bug in which I was using the wrong field!

I will send this to the maintainer of HaskellDB as it’s a glaring bug waiting to happen to someone.

]]>Sat, 25 Aug 2012 00:00:00 UThttp://chrisdone.com/posts/haskelldb-more-type-safeComments on my bloghttp://chrisdone.com/posts/blog-comments
Comments are unsupported on my blog for the same reason as Dave Winer gives:

“…to the extent that comments interfere with the natural expression of the unedited voice of an individual, comments may act to make something not a blog…. The cool thing about blogs is that while they may be quiet, and it may be hard to find what you’re looking for, at least you can say what you think without being shouted down. This makes it possible for unpopular ideas to be expressed. And if you know history, the most important ideas often are the unpopular ones…. That’s what’s important about blogs, not that people can comment on your ideas. As long as they can start their own blog, there will be no shortage of places to comment.”

Joel Spolsky first exposed me to the above. He elaborates in his blog:

The important thing to notice here is that Dave does not see blog comments as productive to the free exchange of ideas. They are a part of the problem, not the solution. You don’t have a right to post your thoughts at the bottom of someone else’s thoughts. That’s not freedom of expression, that’s an infringement on their freedom of expression. Get your own space, write compelling things, and if your ideas are smart, they’ll be linked to, and Google will notice, and you’ll move up in PageRank, and you’ll have influence and your ideas will have power.

If I wanted to have my expression shared and democratised I’d comment on a forum.

People email me all the time regarding my blog, with ideas and errata/amendments; I’m glad for them, and I reply.

]]>Fri, 06 Jan 2012 00:00:00 UThttp://chrisdone.com/posts/monads-are-burritosJi, a little library for controlling a web browser from Haskellhttp://chrisdone.com/posts/ji-haskell-web
Of late I have only been creating a lot of new projects, not working on existing ones or finishing off half-done ones.1

So here’s yet another little project that is to test the concept of controlling the web browser’s DOM from Haskell as a means to write user applications.

It doesn’t use websockets as websockets aren’t well supported2, so I just used a simple poll/push protocol.

It seems fairly viable so far. I would have liked to produce many more examples, but I couldn’t really think of any. I stole the idea for the dollars from Albert Lai. There is more room for optimization, but until I do a larger-scale test it’s hard to say exactly where it’s needed.

Rewriting TryHaskell with it might be a good test, though that’s probably still too easy. I’ll give it a while and think about it.

It could be a base on which to build a more high-level library or framework.

I could also write a back-end for digestive-formlets.

Partly this is a way to feel like I’ve spent my time well, as it’s easier to complete something small; and partly, ideas that sound feasible tend to linger in my head asking to be prototyped, so this is a way of flushing them out.↩

I wanted to use socket.io but there is no Haskell socket.io back-end, and I didn’t feel like writing Node. There are several websockets Haskell libraries, but, as mentioned, websockets itself isn’t well supported; I’d have to upgrade to try it (and so would everyone else). Websockets would be the eventual protocol, though.↩

]]>Mon, 26 Dec 2011 00:00:00 UThttp://chrisdone.com/posts/ji-haskell-webA concept for editing code as an AST rather than texthttp://chrisdone.com/posts/concept-for-editing-asts-rather-than-text
Here is a demo video.

It’s not my intention for this to be all point and click, at all, but coming up with keybindings and a means to navigate the AST via keyboard is an interesting problem, somewhat separate from the problem of creating/displaying/editing as-is. As you can see, I’m even confused myself when using it, and it’s a lot slower with the mouse. With keyboard control it could be blindingly faster than any text-editing-based language editors available now. Think paredit-mode, but for non-Lisp languages.

I’ve had this idea in the back of my mind for years and today thought that I might do a concept implementation to solidify the idea somewhat.

The idea is that you can’t create a syntactically invalid tree, and at each point it can offer you a choice between the valid choices. That’s one part, the correctness. But that’s merely a nice side-effect of the idea of purely syntactical editing, rather than textual editing, so that jumping around, transposing, moving, and deleting expressions will be a lot easier. So much so that there is no need to care about indentation; rather, you move things about the AST. Still merely a concept at this point. It’s really hard to think about what you might like from an editing mode like this without going ahead and implementing something.

It can technically be generalized to any programming language, but Haskell is my main working language so I am targeting it specifically.

It could also be helpful for newbies, being guided on the syntax. Purists will argue people should be able to write syntax. I suppose they’d be right.

In summary I made a little DSL for describing an AST for manipulating like this. There is a “list of things” combinator, an “optional” combinator (e.g. the “module” decl is optional), there is a “choice” combinator, e.g. when adding a new top-level decl it prompts for a choice between the different types of decls, validating text inputs (e.g. module name, constructor, variable, etc.), and that is more or less enough, as far as I can tell, so far. Hopefully it won’t get much more complicated than that.
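A rough sketch of what such a description DSL could look like in Haskell (the combinator and function names here are invented for illustration, not the actual prototype’s API):

```haskell
-- Combinators mirroring the ones described above: a list of things,
-- an optional part, a choice between node kinds, and validated text.
data Syntax
  = ListOf Syntax              -- e.g. a list of top-level decls
  | Optional Syntax            -- e.g. the optional module decl
  | Choice [(String, Syntax)]  -- e.g. prompt between kinds of decl
  | Input (String -> Bool)     -- validated text, e.g. a module name

-- A toy validator: module names are nonempty and start upper-case.
moduleName :: Syntax
moduleName = Input (\s -> not (null s) && head s `elem` ['A' .. 'Z'])

main :: IO ()
main = case moduleName of
  Input valid -> print (valid "Main", valid "main")  -- (True,False)
  _           -> return ()
```

An editor driven by such a description can only ever offer edits that keep the tree well-formed, which is exactly the correctness property mentioned above.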

For a real implementation I would probably do it in Emacs, if overlays would permit me enough power to do it (I think so). But it could also be implemented in Yi, or Leksah, or Vim, or whatever, were those users so inclined.

If you’re implementing something like this already, I’d be interested to see it. If you have some interesting ideas, feel free to comment.

]]>Sat, 17 Dec 2011 00:00:00 UThttp://chrisdone.com/posts/concept-for-editing-asts-rather-than-textA map generic upon the value thanks to typeablehttp://chrisdone.com/posts/generic-map
Not sure why I never tried this before.

]]>Mon, 05 Dec 2011 00:00:00 UThttp://chrisdone.com/posts/generic-mapHaskellDB: A long tutorialhttp://chrisdone.com/posts/haskelldb-tutorial
I’ve been using HaskellDB in production for about two years. I decided that I’d write a proper, up-to-date description, or tutorial, about what it is, how it works, what it can do, and my experience using it in projects.1

ORM approach

Fields

The approach for the object relational mapping is that one defines the column types and entity schemas up front. So, supposing our project is named Caturday, in a module named Caturday.Model.Fields,5 using the field macro,6 one declares fields. For example:

Speed and optimisation

But the subquery is useless in this example, so clearly the optimizer isn’t magic.

λ> ppSqlUnOpt simpleDoubleSelection
SELECT id2 as id,
       title2 as title
FROM (SELECT id as id2,
             title as title2
      FROM content as T1) as T1,
     (SELECT id as id1,
             title as title1
      FROM content as T1) as T2

In fact, subqueries are created in all cases.

For normal query optimizers, e.g. PostgreSQL’s, the subquery is lifted so as to be equivalent to there being one query. I am not sure about MySQL; it may have trouble when joins are involved. Don’t expect good performance from HaskellDB if you’re using MySQL.10

For example, PostgreSQL sees such use of sub-query as equivalent to direct join:

I’m not joining on any indexes so it’s a sequence scan. For people not used to PostgreSQL output, this basically means it will do a cartesian product in both versions.

Maintenance

The great part about HaskellDB is that it is in first-class Haskell land. Fields and tables have a statically enforced membership and field-type schema.

The obvious use case is that it avoids making mistakes in naming and ending up with the wrong field type, or using a field that doesn’t exist in a given table.

The fact that all fields are defined up front with the right type means that one really has to think about how meaningful a type is and how one will use it. For example:

field "Abstract" "abstract" "abstract" [t|Maybe String|]

This is how to encode a database text field that is nullable. When encoding a database schema into the Haskell type system, one finds that one really has to think properly about what types are in the database, particularly nullability.

In my day-to-day work, I have to work with database schemas that aren’t mine; I have to interface with them. Due to my use of HaskellDB, I end up putting a lot of correctness questions about these schemas to their authors, if they are available for consultation.

It often comes up that I ask “why is this field nullable?” and the answer comes back, “I don’t know.” As the PostgreSQL documentation says, in most database designs the majority of columns should be marked not null.11

Note that in Haskell nullability is not implicit. No values can be null. But you can have choice between a value or not a value, as in Maybe:

data Maybe a = Just a | Nothing

And so if we use the abstract field, as mentioned, and use it as a string, it’s not a string, it’s a Maybe String, so we get a compile error such as:

Mismatch: Demo.hs:23:32: “Maybe String” ≠ “String”

Another nice property is that fields named in your codebase, and their names in the database, are entirely separate and configurable. Just because Joe Master Designer chose certain names in his schema, that doesn’t mean that you have to conform to those names. Maybe they call it thetitle, and you just want title:

field "Title" "title" "thetitle" [t|String|]

Another fact is changes to the schema underneath: if someone (you or someone else) changes the type or availability of a field or table in the schema, all you need do is make the necessary change in the field module or table module, and the compiler will tell you immediately which modules need updating with the new invariants.

Suppose we change the type of the field title to Int (for example), when we recompile our examples above, we get:

Extension

Pagination and composing queries

Because the query DSL is a monad (as plenty of Haskell DSLs are), it is really nicely composable. This means it’s trivial to split up queries into discrete parts that have meaningful and generic purposes.

For example, take pagination, which is essentially the simple problem of an offset and a count. I implemented this in HaskellDB.Database.Pagination.12

Thus the following implementation is possible. Suppose we write some functions to search the articles by title in the database, but paginated. Two things we need for this are:

Stability

The problem with HaskellDB is that the implementation can be unstable. I found that I had to patch the PostgreSQL library to handle simple stupid things like fields named “user” or “order”, by making sure to quote all fields.

I also had to open up some of the internal parts of the API so that I could extend it further, such as for the operator (.@@.) defined above. I’ll push these fixes and extensions to fork repos at some point.

Reading error messages

HaskellDB gets a lot of stick for hard-to-read error messages. This is true when you get things badly wrong.

In the general case the errors are quite straightforward.

For example, if I try to use a field which doesn’t exist in the table, like this:

Error: Demo.hs:39:13:
    No instance for (HasField F.Count RecNil)
      arising from a use of `!' at Demo.hs:39:13-27
    Possible fix:
      add an instance declaration for (HasField F.Count RecNil)
    In the first argument of `(.==.)', namely `content ! F.count'
    In the second argument of `($)', namely
      `content ! F.count .==. val 1'
    In a stmt of a 'do' expression:
      restrict $ content ! F.count .==. val 1

Which is a very useful error message: content does not have a field count.

For getting the wrong type, it merely shows “couldn’t match type A against type B”, which is straightforward.

The cases where compile errors blow up are, for example, if I wrote this:

The error actually makes sense if you understand the API well enough, but otherwise it can be very confusing and worrying. Don’t worry about it: you didn’t break something complicated, you just made a typo somewhere. It shows the offending expression; you realise you tried to use a table as a field, and you correct it.

Files

Afterwards it would seem like a good idea to get a proper comprehensive tutorial on the HaskellWiki, or much better yet, embed a tutorial in the Haddock documentation for HaskellDB. At the moment the haddock docs are literally just an API listing, with no elaborative explanation or examples. Writing in Haddock mark-up is quite a painful, boring experience. Regardless, I believe the haddock docs of a project should (most of the time) be sufficient to explain its use, linking to external papers and blog posts and whatnot is annoyingly terse and quickly becomes out of date.↩

Embedded domain-specific language. A common notion in Haskell and Lisp languages, though implemented differently in each.↩

This is the convention I have chosen to use. It makes good sense and can be very helpful for all fields used in the project to be defined on a per-project basis, rather than per-entity, and of the same type.↩

A macro that you can get from Database.HaskellDB.TH, which I have yet to put into a library or get added to HaskellDB mainline. I don’t care to debate API decisions with the HaskellDB maintainers right now.↩

When table names conflict with field names—and eventually it happens—this is useful to have. Alternatively as F also makes sense, to be consistent.↩

]]>Sun, 06 Nov 2011 00:00:00 UThttp://chrisdone.com/posts/haskelldb-tutorialCommon Lisp/Haskell syntactic comparisonhttp://chrisdone.com/posts/common-lisp-haskell
This is a little reminder/documentation for myself to explain that, despite having nice regular s-expression syntax, Common Lisp actually has a lot of syntactic concepts. I add a comparison to Haskell because Haskell is known (perhaps superficially) for having a lot of syntax compared to other languages.

Expanding with examples

Someone commented:

What if my data type is a list and I know that head will not throw an exception?

foo [] = 0
foo xs = bar $ head xs

The problem is that this is an invariant that only exists in the programmer’s head (sorry) and is not encoded in the type system (such is the problem with all partial functions), when it so easily can be. Some examples:

Sometime last year I found a Haddock bug:

haddock: internal Haddock or GHC error: Prelude.head: empty list

The cause is line 191:

packageMod = ifaceMod (head ifaces)

in the render function, because the author assumed that the “not-null” invariant would never be broken. But then the renderStep function was used again, at line 158, in the main function:

So, despite the invariant being satisfied at the time of writing, later that tacit invariant was broken and the developer didn’t realise it. This is more or less the most common case of partial function exceptions. You just know X will never happen, and then it does.

It’s trivial to abstract away partiality. In some cases handling cases might be a speed concern, but that should be a case-by-case localized optimization based on profiling.
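For instance, here is a minimal sketch of encoding the non-empty invariant in a type, making head total (modern base provides this as Data.List.NonEmpty; the definitions below are self-contained for illustration):

```haskell
-- A list that cannot be empty by construction.
data NonEmpty a = a :| [a]

-- Total: there is no empty-list case to forget about.
safeHead :: NonEmpty a -> a
safeHead (x :| _) = x

toList :: NonEmpty a -> [a]
toList (x :| xs) = x : xs

main :: IO ()
main = print (safeHead (1 :| [2, 3 :: Int]))  -- 1
```

Callers are then forced to handle the empty case exactly once, at the point where a plain list is converted to a NonEmpty, instead of at every head call site.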

]]>Mon, 17 Oct 2011 00:00:00 UThttp://chrisdone.com/posts/boycott-head“Value polymorphism”, simple explanation with exampleshttp://chrisdone.com/posts/value-polymorphism
A concept in Haskell which is particularly novel is that polymorphism works at the value level rather than function-parameter or object-dereference level.

Function-parameter polymorphism comes in some different forms, for example, C++:

The type of an expression def therefore is Default a => a, or, “any instance of Default”. I can instantiate an instance myself by specifying a type signature:

λ> def :: Int
0
λ> def :: Char
'a'

Or by type inference, meaning that the combination of this expression with other expressions allows the compiler to infer the single correct type instance:

λ> def : "bc"
"abc"
λ> def - 2
-2
λ> def == 0
True

But with no information it will be a static compile error:

λ> def
Ambiguous type variable `a' in the constraint:
  `Default a' arising from a use of `def' at <interactive>:1:0-2
Probable fix: add a type signature that fixes these type variable(s)
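The behaviour above can be reproduced with a minimal Default class (a sketch; a fuller version lives in the data-default package):

```haskell
-- A minimal Default class: def is polymorphic in its *value*.
class Default a where
  def :: a

instance Default Int where
  def = 0

instance Default Char where
  def = 'a'

main :: IO ()
main = do
  print (def :: Int)   -- 0
  print (def :: Char)  -- 'a'
```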

Why is value polymorphism beneficial? Some trivial examples follow (and you are trusted to extrapolate to the more sophisticated things that might otherwise obscure the essence of this feature).

The Read class contains a method read which is polymorphic on the return value:

class Read a where read :: String -> a

It parses a data type from a string. Combined with the Show class, together Read and Show make a naive serialization library. In the same way, it would be ambiguous to read without specifying the instance:

λ> read "2"
Ambiguous type variable `a' in the constraint:
  `Read a' arising from a use of `read' at <interactive>:1:0-7
Probable fix: add a type signature that fixes these type variable(s)

But specifying with a type signature or using type inference are fine:

λ> read "2" :: Int
2
λ> read "2" * 3
6
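The naive Show/Read serialization mentioned earlier falls straight out of derived instances; a small sketch:

```haskell
-- Derived Read can parse exactly what derived Show prints.
data Person = Person { name :: String, age :: Int }
  deriving (Show, Read, Eq)

main :: IO ()
main = do
  let p  = Person "Alice" 30
      s  = show p            -- serialize to a String
      p' = read s :: Person  -- deserialize; instance fixed by annotation
  print (p' == p)  -- True
```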

Another example is JSON parsing (the real class is different to this, but introduces questions that are irrelevant to the point of this post).

class JSON a where decode :: String -> Result a

The decode function is return-value polymorphic, it can be read like this:

decode :: (JSON a) => String -> Result a

That is, it returns a result (success or fail) with a value which is an instance of the JSON class.

In fact, the literal 1 is also polymorphic with type Num a => a, meaning that the number could be an Integer, a Double, a Rational, or a user-defined type like Scientific. It will be determined by inference or annotation.
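For instance, the very same literal takes whatever Num instance the context demands:

```haskell
main :: IO ()
main = do
  print (1 :: Int)       -- 1
  print (1 :: Double)    -- 1.0
  print (1 :: Rational)  -- 1 % 1
```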

Such static value polymorphism is difficult to do in popular languages such as C#, Java, C++, without some kind of proxy objects to explicitly instantiate an object to dereference using generics or templates, and hard to do in Lisp, Python, Ruby and JavaScript without static type systems (although it can also be approximated with proxy aka “witness” objects). This is, for example, why implementing the Monad class is rather awkward in other languages.

]]>Sun, 16 Oct 2011 00:00:00 UThttp://chrisdone.com/posts/value-polymorphismRank-N types, a simple DB examplehttp://chrisdone.com/posts/rankntypes
This is a very simple example of rank-N types to demonstrate to non-Haskellers/newbies.

Following the resources theme, rank-N types as seen in the ST monad are also a gem:

This is pretty nice if your DB library implementation, e.g., is supposed to ensure operations on a connection run inside a transaction, or if your operations assume a connection exists. Otherwise you’re liable to have DB code run outside of a transaction, or code throwing exceptions because the connection was closed but we tried to use it anyway, or in severe cases, some C DB libraries will just segfault.

We didn’t have to do anything complex or write any boilerplate or macros or whatnot, just use the type system. That’s what it’s for.
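A sketch of the technique (the names below are invented for illustration, not any particular DB library’s API): the rank-2 type quantifies a phantom parameter s inside the callback, exactly like runST, so a Connection can never leak out of its transaction.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A connection tagged with a phantom type tying it to one transaction.
newtype Connection s = Connection ()

-- A stand-in for running a query; a real library would talk to the DB.
query :: Connection s -> String -> IO [String]
query _ q = return ["result of " ++ q]

-- The callback must work for *any* s, so the type checker rejects any
-- attempt to return or store the Connection outside the transaction.
withTransaction :: (forall s. Connection s -> IO a) -> IO a
withTransaction body = body (Connection ())  -- begin/commit/rollback elided

main :: IO ()
main = do
  rows <- withTransaction (\conn -> query conn "SELECT 1")
  print rows  -- ["result of SELECT 1"]
```

Trying to write `withTransaction return` (smuggling the connection out) fails to compile, because the result type a cannot mention the locally quantified s.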

JavaScript per se is insufficient. The depths to which JavaScript fails are well documented and well understood. Its main faults are: its verbose function syntax1; late binding2, which has led to the creation of various static analysis tools to alleviate this language flaw3, but with limited success4 (there is even a static type checker5); finicky equality; this behaviour; and the lack of static types and modules6.

Using JavaScript for what it is good for7, but not using the language per se, is therefore desirable, and many are working to achieve this8, in some form or another. There are various ways to do it9, but I will opt for compiling an existing language, Haskell, to JavaScript, because I do not have time to learn or teach other people a new language, or to build anew the library set, the type checker, and all the rest that Haskell implementations already provide.

Given the option, I’d choose GHC because it is the flagship Haskell compiler, with the most features, which most Haskellers write all their projects with. Haskell has a standard, presently Haskell 2010, but I think that most projects out there use quite a few language extensions10 outside of the standard; Haskellers target GHC. This is not to say that for compiling to JS, Haskell 98 wouldn’t be a vast improvement.

Fortunately there is a project maintained by Victor Nazarov called GHCJS. You can use GHC 6.12.3 or GHC 7+. For my experimentation I am using 6.12.3. I followed the instructions given, with a fix for the build process11, and some tweaks to the libraries12. In order to build the libraries and copy them to the examples/ directory, I wrote a little script13, which helps automate this. There is also BuildTest.hs in the examples/ dir which gentle reader should try first.

After much twiddling and fudging with the example file and the provided FFI, some help from Victor Nazarov, with some trial and error, I managed to get some fundamental things working that are necessary to be able to write effectively in the JavaScript environment14. Timers work (and AJAX requests will), but this example is merely a clickable blank page which alerts “‘Ello, World!”. Uninteresting functionally, but a good test of the fundamentals (see the pasted Haskell source).

Next up, I will write a simple pong game15 to test integration with the canvas element and speed of the runtime and establish some sort of base library and project template from which other Haskellers can more easily experiment. Perhaps we could even have in the future a browser-based IDE and compiler which can of course run the compiled code in the user’s browser. That would be nice.

Its support for closures is commonly noted as being one of JavaScript’s redeeming features.↩

Early binding allows for static verification of the existence of method-signature pairs (e.g. v-tables). Late binding does not give the compiler (or an IDE) enough information for existence verification, it has to be looked up at run-time.↩

There are several hinting libraries, which developers insist are indispensable tools when developing JavaScript seriously, such as JavaScript lint, JSLint, and JSure.↩

This will ensure that invocations to Bad.query() will be well-typed. See the Google closure docs for more examples. Developers I’ve spoken to at Google say this makes JS bearable with sufficient self-discipline, but without it, maintaining a large codebase in JS is unrealistic.↩

It is established that JavaScript is now a target platform due to its browser ubiquity. If we want to write tools, applications, games, etc. that will reach a wide audience with little effort on the user’s part, targetting the browser and therefore JavaScript is an excellent option.↩

On Ubuntu, I had to explicitly add -pthread to the build configuration of libraries/unix, otherwise it didn’t figure it out automatically.↩

There were maybe 5 foo# shaped functions that were out of scope throughout the base libraries, particularly in GHC. I simply replaced these with undefined, or because that’s not available, let a = a in a, or whatever bottom value to stop it complaining. I don’t know whether GHC will detect let a = a in a, I think it does. So the runtime will just throw an exception on these values.

I.e. a way to use closure callbacks for e.g. setInterval/setTimeout and AJAX, a way to serialize data structures like strings and arrays from/to Haskell and JavaScript, and a way to access the DOM and bind events to it.↩

Pong is a good demo. I’ve already started work on this, but hit some walls when trying to separate the build into a more generic and less example-y structure. It’s quite easy to break this system at present.↩

“The Bible of Software Engineering”, because “everybody quotes it, some people read it, and a few people go by it.”

In my reading of the book, around chapter 11, “Plan to Throw One Away”, I got the idea to annotate and underline sentences and paragraphs that rang true with my experience or that I thought were insights that I and everyone should take into account.

Now that I’ve finished it, I thought I’d jot those points, that I felt the need to underscore, here. Flicking back through the earlier chapters there are lots of other points I ought to underscore, but that’s for another time.

I often see or participate in debates about software development that are better summed up by many clear insights from MMM, so it’s good for me to jot them down; having a common vocabulary and literature avoids a bunch of redundant discussion. For example, I saw some rather odd posts to Reddit’s programming section with laboured gardening and writing analogies.

I’m not sure what the legality of typing up so much of a book is. There is a lot more context to each of the points below, so you really need the book anyway to fully grok everything covered. Whether a point got underscored often depended on whether a pen was to hand at the time, and I have skipped the first ten chapters. Other points downright do not make sense without context that I don’t feel comfortable quoting verbatim at further length.

At any rate, most of the quotes below have been quoted verbatim elsewhere.

Plan to Throw One Away

Pilot Plants and Scaling Up

“Programming system builders have also been exposed to this lesson, but it seems to have not yet been learned. Project after project designs a set of algorithms and then plunges into construction of customer-deliverable software on a schedule that demands delivery of the first thing built.”

“In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three.”

“The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done.”

“The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers.”

“Hence plan to throw one away; you will, anyhow.”

The Only Constancy is Change Itself

“But the very existence of a tangible object serves to contain and quantize user demand for changes.”

“Clearly a threshold has to be established, and it must get higher and higher as development proceeds, or no product ever appears.”

“The throw-one-away concept is itself just an acceptance of the fact that as one learns, he changes the design.”

Plan the System for Change

“Most important is the use of a high-level language and self-documenting techniques so as to reduce errors induced by changes. Using compile-time operations to incorporate standard declarations helps powerfully in making changes.”

“Every product should have numbered versions, and each version must have its own schedule and a freeze date, after which changes go into the next version.”

Plan the Organization for Change

“[…] the reluctance to document designs is not due merely to laziness or time pressure. Instead it comes from the designer’s reluctance to commit himself to the defense of decisions which he knows to be tentative.”

“[M]anagers themselves often think of senior people as ‘too valuable’ to use for actual programming.”

Two Steps Forward and One Step Back

“The fundamental problem with program maintenance is that fixing a defect has a substantial (20-50 percent) chance of introducing another. So the whole process is two steps forward and one step back.”

“In fact it often has system-wide ramifications, usually nonobvious. […] the far-reaching effects of the repair will be overlooked.”

“Clearly, methods of designing programs so as to eliminate or at least illuminate side effects can have an immense payoff in maintenance costs. So can methods of implementing designs with fewer people, fewer interfaces, and hence fewer bugs.”

One Step Forward and One Step Back

“Sooner or later the fixing ceases to gain any ground. Each forward step is matched by a backward one. Although in principle usable forever, the system has worn out as a base for progress.”

“A brand-new, from-the-ground-up redesign is necessary.”

“Systems program building is an entropy-decreasing process, hence inherently metastable. Program maintenance is an entropy-increasing process, and even its most skillful execution only delays the subsidence of the system into unfixable obsolescence.”

Sharp Tools

“A good workman is known by his tools.” (proverb)

High-level Language and Interactive Programming

“I cannot easily conceive of a programming system I would build in assembly language.”

The other face

Self-Documenting Programs

(An almost meta-quote here considering the context of this post) “Refer to standard literature to document basic algorithms wherever possible. This saves space, usually points to a much fuller treatment than one would provide, and allows the knowledgeable reader to skip it with confidence that he understands you.”

No Silver Bullet

Does It Have to Be Hard?—Essential difficulties

“First, we must observe that the anomaly is not that software progress is so slow, but that computer hardware progress is so fast.”

“No other technology since civilization began has seen six orders of magnitude price-performance gain in 30 years.”

“Second, to see what rate of progress we can expect in software technology, let us examine its difficulties. Following Aristotle, I divide them into essence—the difficulties inherent in the nature of the software—and accidents—those difficulties that today attend its production but that are not inherent.”

“I believe the hard part of building software to be the specification, design and testing of this conceptual construct, not the labour of representing it and testing the fidelity of the representation. […] If this is true, building software will always be hard. There is inherently no silver bullet.”

Complexity

“Software entities are more complex for their size than perhaps any other human construct, because no two parts are alike (at least above the statement level). If they are, we make the two similar parts into one, a subroutine, open or closed. In this respect software systems differ profoundly from computers, buildings, automobiles, where repeated elements abound.”

“Digital computers are themselves more complex than most things people build; they have very large numbers of states. This makes conceiving, describing, and testing them hard. Software systems have orders of magnitude more states than computers do.”

Below; functional programming springs to mind:

“From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability.”

“From complexity of structure comes the difficulty of extending programs to new functions without creating side effects. From complexity of structure comes the unvisualized states that constitute security trapdoors.”

“The physicist labors on, however, in a firm faith that there are unifying principles to be found […] because God is not capricious or arbitrary. No such faith comforts the software engineer.”

“[…] not because of necessity but only because they were designed by different people, rather than God.”

“Partly this is because the software in a system embodies its function, and the function is the part that most feels the pressures of change. Partly it is because software can be changed more easily—it is pure thought-stuff, infinitely malleable.”

“All successful software gets changed. Two processes are at work. As a software product is found to be useful, people try it in new cases at the edge of, or beyond, the original domain. The pressures for extended function come chiefly from users who like the basic function and invent new uses for it.”

“As soon as we attempt to diagram software, we find it to constitute not one, but several, general directed graphs, superimposed on one another.”

I think the below is a very interesting point; having a visual mind does not seem to help you in programming.

“In spite of progress in restricting and simplifying the structures of software, they remain inherently unvisualizable, thus depriving the mind of some of its most powerful conceptual tools.”

Which follows nicely into the next point I underscored a page later:

Graphical programming

“Nothing even convincing, much less exciting, has yet emerged from such efforts. I am persuaded that nothing will.”

Program verification

“Program verification does not mean error-proof programs. There is no magic here, either. Mathematical proofs also can be faulty. So whereas verification might reduce the program-testing load, it cannot eliminate it.”

“More seriously, even perfect program verification can only establish that a program meets its specification. The hardest part of the software task is arriving at a complete and consistent specification, and much of the essence of building a program is in fact the debugging of the specification.”

Environments and tools

“Perhaps the biggest gain yet to be realized in the programming environment is the use of integrated database systems to keep track of the myriads of details that must be recalled accurately by the individual programmer and kept current in a group of collaborators on a single system.”

(And I don’t think ‘intellisense’ really covers it.)

Promising Attacks on the Conceptual Essence

I found this to be a very interesting perspective considering the era in which it was written:

“There are dramatic exceptions to my argument that the generalization of the software packages has changed little over the years: electronic spreadsheets and simple database systems. These powerful tools, so obvious in retrospect and yet so late appearing [bold added for emphasis], lend themselves to myriad uses, some quite unorthodox.”

Incremental development — grow, not build, software

“Some years ago, Harlan Mills proposed that any software system should be grown by incremental development. That is, the system should first be made to run, even though it does nothing useful except call the proper set of dummy subprograms. Then, bit by bit it is fleshed out, with the subprograms in turn being developed into actions or calls to empty stubs in the level below.”

“The morale effects are startling. Enthusiasm jumps when there is a running system, even a simple one.”

“I find that teams can grow much more complex entities in four months than they can build.” (Yes, I see the gardener analogy here, but please.)

Great designers

“I think the most important single effort we can mount is to develop ways to grow great designers.”

There is a lot more crammed into this book, several more chapters of it, but I’ll stop here.

]]>Sun, 26 Jun 2011 00:00:00 UThttp://chrisdone.com/posts/the-mythical-man-month-insights‘amb’ operator and the list monadhttp://chrisdone.com/posts/amb-list-monad
A friend was messing about with the amb operator in JavaScript after seeing it in Common Lisp. The amb (or ambiguous) operator was first described by our pal John McCarthy (1967), and is something I first encountered in SICP.
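The post’s code didn’t survive in this feed, but amb falls out of the list monad almost for free. A minimal sketch (the names here are mine, not from the original post): a list *is* an ambiguous choice, and guard prunes the branches that fail.

```haskell
import Control.Monad (guard)

-- In the list monad, a list already represents an ambiguous choice
-- between its elements, so amb is just the identity.
amb :: [a] -> [a]
amb = id

-- A classic amb example: all pairs (i, j) whose sum is divisible by 3.
-- The monad explores every combination; guard rejects failing branches.
pairs :: [(Int, Int)]
pairs = do
  i <- amb [1 .. 5]
  j <- amb [1 .. 5]
  guard ((i + j) `mod` 3 == 0)
  return (i, j)

main :: IO ()
main = print (take 3 pairs)
```

Backtracking comes for free from the list monad’s bind, which is exactly the behaviour amb gives you in Scheme.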

Motivation

Last Wednesday night I whipped up a simple IRC server in Haskell in about four hours. We had long been sick of the poor quality of the Skype Linux client, which was our dev team’s main point of communication. We agreed something like IRC would be good, and I thought it would be easy to make such a thing in Haskell. It was; the next day we were chatting on it!

General Haskell Projects

It’s good Haskell practice to start any project with cabal init, which asks you a series of questions and generates a .cabal file for you. Common practice is to put source in the src dir, and to have your project’s modules in a sub-directory matching the project name:

$ ls src
Control Data GeneratePass.hs Hulk Main.hs

Code that isn’t specific to the particular project but could be used anywhere should go in appropriate modules such as Control.*, Data.*, etc. You will commonly need this code in other projects, and because the dependency between these modules and your main project’s modules runs in only one direction, you can simply copy the files over to your new project.

Hulk’s module hierarchy

The first two just contain utilities that I tend to use often. The Main module is the main entry point, then control goes to Hulk.Server which starts listening on the right port, accepting connections and handling/sending messages to/from clients.

Purity vs Impurity

In order to handle messages and reply to them from clients, the Hulk.Client module is used. The code in Hulk.Client is entirely pure, and it is the bulk of the project. This is an intentional effort. The original program I whipped up used a bunch of MVars and was basically an imperative program, and about as confusing.

Another “good practice” is for Haskell programs to be like a well-oiled super villain base. On the edge is where all the explosions happen, and inside is where the bad guys sit and drink Orzo and control everything.

Impure code is like the wreckless henchmen who always wreck everything, and double-cross you at every opportunity. Pure code is the evil genius who devises the master plan, tells the henchmen what to do, and keeps them in separate living quarters.

It’s also common to put all your types into one module named Types, as you tend to use types from every module and this avoids circular dependency problems in the long run.

I initialise the sockets subsystem for Windows and then install a handler for SIGPIPE, because that signal is sent in Unix when a program attempts to write to a socket that has been closed. Both Windows and Unix have their novel design choices. Go figure.

The reading process is merely a simple monad that either returns the Config object or an error. I choose to just throw an error when there’s an issue. I use this library for pretty much every project I write; it really is an essential library.

An alternative way to express the code above is to runReaderT, define a function like get' = ask >>= flip get and then you can express the above with Applicative operators (<$>) and (<*>).
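As a rough illustration of that alternative (the Config type, its fields, and the association-list lookup here are hypothetical stand-ins, since the original code isn’t shown in this feed):

```haskell
import Control.Monad.Trans.Reader (ReaderT (..), ask, runReaderT)

-- A hypothetical configuration record standing in for the original.
data Config = Config { hostname :: String, port :: Int }
  deriving Show

type ConfigMap = [(String, String)]

-- In the spirit of get' = ask >>= flip get: ask for the environment,
-- then look the key up in it, failing (Nothing) if it is absent.
get' :: String -> ReaderT ConfigMap Maybe String
get' key = ask >>= \env -> ReaderT (const (lookup key env))

-- The fields combine with the Applicative operators (<$>) and (<*>).
readConfig :: ReaderT ConfigMap Maybe Config
readConfig = Config <$> get' "hostname" <*> (read <$> get' "port")

main :: IO ()
main = print (runReaderT readConfig
                [("hostname", "irc.example.com"), ("port", "6667")])
```

The payoff is that adding a field to Config means adding one more `<*> get' "…"` clause, with missing keys propagating as Nothing automatically.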

Server starter

I start the server by using the listenOn function, covered in Peteris’s post, and accept connections, setting the buffering to NoBuffering. This turns out to be rather important; as Peteris mentions, this avoids surprises with buffering, which is something I experienced when testing out LineBuffering in this project. In certain situations unknown to me, access to handles locks up.

Connection handling

I fork a new thread per handle. No big deal. I have one value, envar, of type MVar Env, which stores the state of the whole server. It can only be accessed by one thread at a time; that’s why I put it in an MVar. The definition of Env is:
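The definition itself was lost from this feed; a plausible shape for a server state held in an MVar looks something like the following (the field names are invented for illustration, not Hulk’s actual ones):

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import qualified Data.Map as M

-- Hypothetical server state: who is connected, and who is in which channel.
data Env = Env
  { envClients  :: M.Map Int String    -- connection ref -> nickname
  , envChannels :: M.Map String [Int]  -- channel name -> member refs
  }

-- A pure state transition, kept separate from the MVar plumbing.
insertClient :: Int -> String -> Env -> Env
insertClient ref nick env =
  env { envClients = M.insert ref nick (envClients env) }

-- All threads share one MVar Env; updates are serialised through it.
registerClient :: MVar Env -> Int -> String -> IO ()
registerClient envar ref nick =
  modifyMVar_ envar (return . insertClient ref nick)

main :: IO ()
main = do
  envar <- newMVar (Env M.empty M.empty)
  registerClient envar 1 "chris"
  env <- readMVar envar
  print (M.toList (envClients env))
```

modifyMVar_ takes the value, applies the update, and puts it back atomically with respect to other takers, which is exactly the one-thread-at-a-time property described above.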

I get a line which is Right, or fail and return what’s Left. The case of getLine failing is when the socket is closed. I ignore messages only containing newline characters, and the middle case is actually getting a valid line which I pass to runHandle that runs the pure client handler, then loops again.

It passes the program state (env) and the current connection info (conn) to the function handleLine, which is the single export from Hulk.Client, which is a transformer over an arbitrary monad. Technically, in this case I’m running it inside a readerT on IO, so it’s not actually pure. The handleLine action returns a bunch of replies/instructions for the Server module to perform and a new state (env).

When I said that the Hulk.Client module was pure, I meant that it is abstracted over whether it is pure or impure, and therefore can be treated as pure for testing and developing, and when running the server, runs in IO, but only 0.1% of the code uses IO. Also, when I said “arbitrary monad”, I meant any monad implementing the MonadProvider class.

This means that these are the only “impure” things I need when running the program. I need to read the preface, motd, key, and password files on demand. In the IO case, I simply read the file. In the pure case, I can stick it in a Reader or Identity monad and the whole computation is thus pure.
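A sketch of how such a class might look (the real MonadProvider in Hulk may differ; the method names and file names here are illustrative): the client logic asks for file contents through a class method, so the same code runs in IO against real files or purely against canned strings.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Control.Monad.Trans.Reader (Reader, ask, runReader)
import qualified Data.Map as M

class Monad m => MonadProvider m where
  providePreface :: m String
  provideMotd    :: m String

-- Impure instance: read the real files on demand.
instance MonadProvider IO where
  providePreface = readFile "preface.txt"
  provideMotd    = readFile "motd.txt"

-- Pure instance: serve canned contents from a Reader environment.
instance MonadProvider (Reader (M.Map String String)) where
  providePreface = fmap (M.findWithDefault "" "preface") ask
  provideMotd    = fmap (M.findWithDefault "" "motd") ask

-- Client logic written once, against the class, not against IO.
greeting :: MonadProvider m => m String
greeting = fmap ("Welcome!\n" ++) provideMotd

main :: IO ()
main = print (runReader greeting (M.fromList [("motd", "Be excellent.")]))
```

Running greeting in the Reader instance makes testing trivial: no files, no threads, just a pure function of the environment.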

What’s the benefit? This means I can run arbitrary parts of the computation trivially, and make pure test suites out of it. QuickCheck my IRCd, anyone? The main benefits are not to have to worry about conflicting simultaneous threads, and being able to run any function from the module with whatever state one desires.

Client replies

The Client module replies with one of the following:

data Reply = MessageReply Ref Message | LogReply String | Close

MessageReply: Send this Message to the given handle (Ref).

LogReply: Log this String.

Close: Close the current connection.

I find this separation of IO and logic to be useful.
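To make that separation concrete, here is a toy interpreter for Reply, with Ref and Message simplified to Int and String (Hulk’s real types are richer): the impure edge just carries out each instruction that the pure core emitted.

```haskell
-- Simplified stand-ins for Hulk's Ref and Message types.
type Ref = Int
type Message = String

data Reply = MessageReply Ref Message | LogReply String | Close

-- Render an instruction; kept pure so it is easy to test.
render :: Reply -> String
render (MessageReply ref msg) = "-> " ++ show ref ++ ": " ++ msg
render (LogReply s)           = "log: " ++ s
render Close                  = "closing connection"

-- The IO layer only performs what the pure core told it to.
runReply :: Reply -> IO ()
runReply = putStrLn . render

main :: IO ()
main = mapM_ runReply [LogReply "client joined", MessageReply 1 "PING", Close]
```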

The IRC monad stack

The rest of the project lies in Hulk.Client and is academic/straightforward. I will explain the IRC monad, though:
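The explanation’s code was lost from this feed; a stack fitting the description in this post (reads the connection info, emits replies, threads the server state) can be sketched with RWS, though this is a reconstruction rather than Hulk’s actual code:

```haskell
import Control.Monad.Trans.RWS (RWS, ask, execRWS, modify, tell)

-- Cut-down stand-ins for the real Conn, Env and Reply types.
data Conn  = Conn { connRef :: Int }
data Env   = Env  { envLog  :: [String] }
data Reply = LogReply String deriving (Eq, Show)

-- Reader: the current connection; Writer: replies to perform;
-- State: the whole server state.
type IRC = RWS Conn [Reply] Env

reply :: Reply -> IRC ()
reply r = tell [r]

-- A pure line handler: update state, emit instructions, touch no IO.
handleLine :: String -> IRC ()
handleLine line = do
  conn <- ask
  modify (\env -> env { envLog = line : envLog env })
  reply (LogReply (show (connRef conn) ++ ": " ++ line))

main :: IO ()
main = print (snd (execRWS (handleLine "PING") (Conn 1) (Env [])))
```

execRWS returns the final state and the accumulated replies, which is exactly the “new env plus a bunch of instructions” interface the Server module consumes.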

Summary

That’s all, folks! I hope this is useful to some people thinking of writing their first Haskell daemon project.

Haskell is the only language I know in which I can write ~400 lines of code without running it, then run it and have it work as expected.

]]>Sun, 30 Jan 2011 00:00:00 UThttp://chrisdone.com/posts/hulk-haskell-irc-serverLisk - Lisp and Haskellhttp://chrisdone.com/posts/lisk-lisp-haskell
In my spare time I’m working on a project called Lisk. Using the -pgmF option for GHC, you can give GHC the name of a program that is called to preprocess the file before GHC compiles it. It also works in GHCi and imports. You use it like this:
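The example itself was lost from this feed; the standard -pgmF invocation puts a pragma at the top of the source file, roughly like the following (the pragma is GHC’s real mechanism, but the Lisk syntax shown is illustrative, from memory, not necessarily exactly what Lisk supported):

```
{-# OPTIONS -F -pgmF lisk #-}
(module main)
(= main (put-str-ln "Hello, World!"))
```

With that first line in place, GHC (and GHCi) runs the lisk executable over the source and compiles its Haskell output.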

I literally only support what is exhibited in the example above, and it is not ready for use at all. But I am using haskell-src-exts’s AST and pretty printer to convert from Lisk to Haskell, so I’m in good hands regarding completeness of the syntax. I don’t have a lot of time to work on it right now, but I will be.

]]>Thu, 25 Nov 2010 00:00:00 UThttp://chrisdone.com/posts/lisk-lisp-haskellDuck typing in Haskellhttp://chrisdone.com/posts/duck-typing-in-haskell
This is a simple Literate Haskell file demonstrating duck typing in Haskell. You can copy the whole web page, paste it into a .hs file, and compile/load it. Grab the normal Haskell version here.

I’m using a library called Has, which you can grab from Hackage (or just cabal install has). It’s pretty neat: it allows you to define field names and their types that you are going to use at some point, and then lets you construct/access/update arbitrary records based on those fields.

We need to enable type families and flexible class contexts to work with this library. And also it’s nice if we disable the monomorphism restriction. I don’t want mono, I want manymorphism! As many as you can fit in your pocket.

λ> age ^. chris
67
λ> age ^. donald
<interactive>:1:0: No instance for (Contains (Labelled Age Integer) TyNil)
    arising from a use of `^.' at <interactive>:1:0-12

So there you have it, duck typing in a statically typed way. We get to have our cake and eat it too.

By the way, and this isn’t particularly important, I used a function to make creating record fields a little nicer, because I don’t like the namelessness of writing fieldOf 23:

> -- | Creation: I like to be able to name the fields that I'm assigning.
> (^-) :: a -> TypeOf a -> FieldOf a
> (^-) = const $ fieldOf
> infixr 6 ^-

]]>Mon, 22 Nov 2010 00:00:00 UThttp://chrisdone.com/posts/duck-typing-in-haskellAmelie: hpaste.org gets an updatehttp://chrisdone.com/posts/hpaste-update
The hpaste.org site had some database locking issues, and the person hosting it has not been around to fix it for a long time. The source code for the site seemed a bit big to me, so I thought it would be a neat project to write from scratch, as a long screencast about building a Haskell project from the ground up, using Hackage libraries, etc. I will post more about the screencast later, once I am done. The project is called amelie.

I plan on adding some niceties like hlint and the Context in IRC feature that paste.lisp.org used to support. I will add some trivial spam filters depending on the type and frequency of the spam received. I will also add an API pretty soon and an RSS feed. Once the spam’s sorted out we can properly bring back the hpaste bot into the IRC channel.

I also have an archive of all the old pastes from hpaste.org (there are 37,000 of them). I have already written an import script for this so I will get those imported sometime this week.

I recently had to switch to a new VPS host, so tryhaskell may be a little twitchy for a while. It seems okay though! Contact me at chrisdone at gmail dot com if you experience problems.

(Yes, I did just re-use the design/colours from TryHaskell.)

]]>Wed, 15 Sep 2010 00:00:00 UThttp://chrisdone.com/posts/hpaste-updateHaskell Formlets: Composable web form construction and validationhttp://chrisdone.com/posts/haskell-formlets
Note: This is an archive of an old 2008 post from an old blog.

I think we all saw formlets some months ago when Chris Eidhof posted a blog entry about Formlets in Haskell. For a reminder, a brief description follows. Then I will jump straight into examples.

Description of a formlet

A formlet contains information about how to render itself in a mark-up language, whether the values provided to it are valid, and what error to show when those values are invalid; if they are valid, the formlet returns a new value. Perhaps as a loose model, we can enumerate these as:

Presentation

Validation

Parsing

Failure

Success

It is, therefore, a composition of five properties present in web forms. I contend that keeping these properties specified in the same place, and therefore automatically consistent with each other, is something we want as developers.

A formlet is self-contained and composable. By ‘self-contained’, this means that all the data needed for a formlet is contained inside its definition. By ‘composable’, this means that formlets can be used together without influencing each other, and that I can make new, valid formlets out of existing ones. Composability is something which Haskell is exceedingly good at1, as we will see. The Haskell Formlets library provides us, very concisely, with a way to use formlets. This entry discusses some examples of this library. I hope to convince the reader that this is an excellent way to develop web forms.

Example 1: A user registration form

Suppose I have a formlet that is a user registration form, called register. The registration form takes a username and a password, and the password must be entered twice, in two fields, for confirmation. I might compose this formlet from two other formlets: user and pass. The user formlet may simply display a text field and check that the field is not empty. The pass formlet, on the other hand, ought to be composed of two password entry formlets. Each of those sub-formlets will perform the task of checking that the password is valid (such as ensuring that it is greater than six characters in length), and the password formlet merely needs to check that the values returned from these two are equal.

Let us convince ourselves that we can indeed express this, using Haskell.

Description of a simple validating formlet

Firstly, for the register formlet I have added type annotations. Types in Haskell help us understand the behaviour of our code, and here is a good example. It is a form which displays mark-up of type Html (provided by Text.XHtml.Strict), is intended to be run in some monad m (or instance of Applicative), and returns a Registration value.

register :: (Applicative m, Monad m) => Form Html m Registration

Next, we can instantly see that register is composed of user and pass.

register = Registration <$> user <*> pass

But what are user and pass? I will explain user; the meaning of pass can then be inferred. We’ve defined user as displaying a text input box, with no default value, i.e. Nothing. The value provided from this form element is then checked against what is called a Failing2, which ensures that a value is valid, or displays an error.

It takes a validating function (e.g. valid), an error message, and a value to validate. It either returns a Failure with the error message, or a Success with the value.

Now, the type of check should make sense to us, and this is very lovely:

check :: Monad m => Form xml m a -> (a -> Failing b) -> Form xml m b

We can see that it takes a formlet returning a, it takes a function which validates a and returns b, finally producing a formlet which returns b. What we have, now, is a function which wraps a validation around a formlet, producing a new formlet.

Correction of the registration formlet

Of course, the definition of pass is insufficient. It needs to display two fields, check them both, and ensure that their values are equal. Let us correct this by defining a new formlet, passConfirmed. The name means that this password has been confirmed by the user, by entering it twice. Here is the definition3:
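The definition’s code was lost from this feed; the equality check at its heart can be sketched without the formlets plumbing (the Failing type is inlined here to keep the sketch self-contained, and the names follow the post’s prose rather than the original source):

```haskell
-- Inlined from Control.Applicative.Error for self-containedness.
data Failing a = Failure [String] | Success a
  deriving (Eq, Show)

-- Two individually-valid passwords must also be equal; if so, return one.
validPasswords :: (String, String) -> Failing String
validPasswords (p1, p2)
  | p1 == p2  = Success p1
  | otherwise = Failure ["The passwords do not match."]

main :: IO ()
main = print (validPasswords ("hunter22", "hunter22"))
```

In the real formlet this function would be wrapped around the two pass sub-formlets with check, so passConfirmed never needs to re-validate password length itself.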

validPasswords could be described as a wrapper around passwords: it validates the values and simply returns them if they are valid (i.e. equal). passwords simply takes two valid passwords and puts them in a tuple.

Here we must recognise something special! Composability, ladies and gentlemen! Our passConfirmed formlet does not have to care about whether the passwords are six characters or longer, because the pass formlets have done this for us!

Adding labels: wrapping mark-up

Right now, despite being very lovely, the display of our widgets would be less than presentable. There are no labels on the form inputs! Neither the user nor the pass are labelled. They are also not paragraphed, in the mark-up. Therefore let us write a function which will take a formlet and stick a label and a paragraph around it.

We now have a proper label for the user input field. We ought to now confirm that this is indeed the case by running our code. …Finally!

Running a Formlet

In order to run our formlet, we need to use the runFormState function. Studying its type, we see that it takes an environment, a prefix for the form’s element names, and the formlet itself. Finally, it returns a tuple of the return success or failure of the form, mark-up which ought to be displayed, and the form content type, which we are not interested in.

I have set the monad type to IO. This is merely because we are testing it in GHCi, which defaults to the IO monad, which is just dandy for our purposes.4

Example 2: User registration with monadic validation

We have so far covered how to develop a formlet from an informal description, improve it and then test it in our Haskell prompt. We have addressed the four points previously mentioned, i.e. we have worked with how formlets are presented in mark-up, we have worked with how they validate input, how they fail on invalid input, and how they succeed on valid input. Furthermore, we have demonstrated to ourselves, albeit in a limited way, that all of these things can be composed.

So far, our validation has been pure. A formlet takes a value and validates it purely functionally. This is completely acceptable. Indeed, initially the Haskell Formlets library was only pure. However, we now have access to monadic validation! Yes! We shall briefly discuss why this is a good thing, and then demonstrate with an example.

Consider a customer database; we want to use this database in our registration form. Suppose we want to check to see if the username already exists in the database. If it does, the form fails, otherwise it returns a registration which can then be sent off to some other function we don’t care about right now. This means that when validation occurs, it ought to have the ability to behave differently for the same input. We need to impurely talk to the database in order to complete the validation. But why do the formlets need to be impure? Why not run the whole form, and then compare the returned values against a database? Because then the whole model of composability is broken. It is no longer a formlet that validates inputs. It is a formlet that validates some inputs, and then breaks the abstraction when it gets a bit hairy. We want to be able to have a username formlet that has everything necessary in its description contained within its definition. We are now going to experiment with this notion.

Let us continue from where we left off from Example 1.

Create a simple database to use

For this example, we’ll play it safe and use Sqlite3.

import Database.HDBC
import Database.HDBC.Sqlite3

We shall create a customer table with some entries to work with, with md5 hashed passwords:

I shall summarise what has been covered in this section. We have familiarised ourselves with the monadic versions of check and ensure. We have happily combined pure and impure formlets, and kept each formlet contained, doing one thing well. ’Tis the beauty of abstraction in programming at its most clear. We have established how to run a formlet with a monad such as ReaderT, and we can see how this might be done with StateT, for instance.

Example 3: Custom form inputs

Earlier, we touched upon producing custom mark-up for a formlet by wrapping around an existing one. We will now create our own (simple) form input from scratch, one that is not provided by the Text.XHtml.Formlets library: a checkbox.

A checkbox can only return two values, checked or unchecked, therefore Bool is completely satisfactory as a return value for our formlet. To help us think about the type, let’s look at an existing form input’s type:

Of course, we have only demonstrated a very simple example. However, most form elements are simple. The point here is that one can use arbitrary markup in a formlet and then compose that with any other. Consider a clever Javascript colour selector, and things like that.

Formlets in a real CGI program

We have now covered the lovely features of the Haskell Formlets. Finally, I have put our demonstration code into practice in a live server application which simply accepts registrations and lists the usernames. You can view the page and view the raw source code or a syntax highlighted version. Forgive the messy code; it was written in about ten minutes, and the code around the formlets isn’t really important; it is the formlets themselves that matter.

UPDATE: 11 April 09: Also, I have an example of using formlets in a real project with time constraints etc.

Summary

I think I have covered, quite fully, the idea of formlets. Formlets are composable pieces which contain (1) presentation, (2) validation, (3) parsing, (4) success and (5) failure. Formlets are useful because these five points are kept in one place and thus consistent, without any manual ‘synching’. We can compose formlets, and a formlet’s definition contains all the required information. We can customise the presentation of existing formlets, or create our own new input methods. A self-contained formlet can validate using side-effects, in a safe, composable manner. Formlets parse validated values into proper program values (such as the Registration type). Formlets are an excellent example of the kind of abstractions that Haskellers use all the time.

Notes

Please email me about any typing mistakes or inconsistencies that you notice.

Formlets, Parsec, Text.XHtml, the various monads, are some of my favourite examples.↩

This is a type, defined in Control.Applicative.Error, with a definition that will make clear to you why it is used:
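The definition did not survive in this feed; it is essentially an Either with a list of error messages on the failure side, along these lines (reproduced from memory of the applicative-extras package, with an example validator added to show why it is used):

```haskell
type ErrorMsg = String

data Failing a
  = Failure [ErrorMsg]  -- invalid: the error messages to display
  | Success a           -- valid: the value to return
  deriving (Eq, Show)

-- A validator maps a raw input to Failure or Success.
notEmpty :: String -> Failing String
notEmpty "" = Failure ["Please enter a value."]
notEmpty s  = Success s

main :: IO ()
main = print (notEmpty "chris")
```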