Switching this site from Jekyll to Hakyll

I used to manage this site with Jekyll. I’ve now switched to Hakyll. Here’s my reasoning and some notes on how it went.

Static Site Generators

I love static site generators. They let me run a faster, cheaper, more secure web site.

As a professional paranoid, I like the smaller trusted computing base. The only code that serves requests on my server (outside the kernel) is publicfile, which contains 0.2% as much code as an Nginx/MySQL/WordPress stack, for example.

So I didn’t even consider alternatives like WordPress. Not that they aren’t awesome; they just aren’t my style.

Jekyll

I was an early adopter of Jekyll. It sort of pioneered the current generation of generators, the ones with YAML metadata and blog-oriented features.

But in my use case, Jekyll didn’t age well. I adopted Jekyll at version 0.10.0, when the plugin API was rather limited — and the accepted way of extending it was essentially monkey patching. (I’m not sure we thought it was monkey patching at the time, but it proved to be fragile.) As a result, once I had done the work to teach Jekyll all the tricks I wanted, I was more or less stuck using 0.10.0 forever.

Which might be fine, except that:

It meant that when the community introduced better ways of doing what I was doing, I couldn’t upgrade.

It meant I needed to keep an ancient copy of Jekyll, and all its dependencies, available and fresh.

You could say that this fragility was my fault; I wouldn’t argue much. But I also think that this sort of fragility happens more easily in languages and environments with certain properties. In particular, while dynamic languages like Ruby are famous for making this sort of expedient patching possible, they also render such patches far easier to break. It’s easy to get silent failures when a method you’ve overridden is no longer called, for example. The proper fix for this in Ruby is pervasive testing.

And while high test coverage and continuous integration are the mainstay of my professional life, this is a blog. It’s already a relatively low priority in my life, and I want to spend what little blogging-time I get writing, not writing tests.

The time came when I had to choose between applying OS security updates to my server and having Jekyll continue to run. I chose the former. This has left me unable to update my blog without rebuilding a sandboxed Ruby environment to reproduce the state of the world circa 2011.

So when I decided to rectify the situation, I looked around for static site generators similar to Jekyll, but written in a language that has some meaningful level of automated checking, and ideally with a module structure that makes for robust plugins.

And I found one!

Hakyll

Hakyll is a static site generator written in Haskell, which happens to be one of my favorite programming languages.

I’ll be honest: if you’re not already comfortable with Haskell, or looking for an excuse to become so, Hakyll is probably not for you. A lot of folks can probably use Jekyll effectively without ever learning Ruby; the same is not true of Hakyll and Haskell.

However, if you already speak Haskell, Hakyll has some real advantages:

Whereas a Ruby program can only be considered correct with 100% statement-level and control-flow coverage [1], the joke among Haskell programmers is that any program that actually compiles is likely to be correct. That’s an exaggeration, but less of an exaggeration than you might think.

Haskell’s module system lets Hakyll’s designers explicitly designate which bits of Hakyll are stable and available for me to (ab)use, vs. those that are purely internal.

If I use a function or an extension point, and it goes away in a future revision to Hakyll, I will get a compilation error — but:

Hakyll sites can be managed using Cabal, which means setting up a fixed library environment of all your site’s dependencies is as easy as typing cabal sandbox init once. Cabal also lets you fix the versions of dependencies if desired, for an easily reproducible build. I will not be bitten by a surprise upgrade to Hakyll or one of its dependencies.

Hakyll uses Pandoc under the hood, which is one of my all-time favorite text munging tools. Every time I use a less capable text munger, like the Markdown formatter in Jekyll, I have sadface. Things that are easy in Pandoc and very tedious in lesser systems:

Apply a Markdown template to an HTML document (not the other way around).

Find any HTML header tags in a post teaser and demote them by two levels.

Iterate over all URLs in a document and rewrite them.

Generate a PDF version of certain posts.

Format some sections using TeX.

Pull out a div with a certain id as a teaser instead of splitting at some arbitrary text marker.

Hakyll is crazy fast. My site is not that big, but Jekyll was getting noticeably slower as I added content.
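To make the header-demotion item from the Pandoc list above concrete, here is a minimal sketch that works on Pandoc's document tree through the pandoc-types library. It is not taken from this site's code: the name demoteTwice is made up for illustration, and clamping at level 6 is an assumption about what should happen to headers that are already deep.

```haskell
import Text.Pandoc.Definition (Block (Header), Pandoc (Pandoc), nullAttr, nullMeta)
import Text.Pandoc.Walk (walk)

-- Demote every header in a document by two levels.
-- Clamping at 6 (the deepest HTML heading) is an assumption.
demoteTwice :: Pandoc -> Pandoc
demoteTwice = walk demote
  where
    demote :: Block -> Block
    demote (Header level attr inlines) = Header (min 6 (level + 2)) attr inlines
    demote block = block

-- Tiny demonstration on a document holding one empty level-1 header.
main :: IO ()
main = print (demoteTwice (Pandoc nullMeta [Header 1 nullAttr []]))
```

Because walk visits every Block in the tree, the same shape of function covers the other items in that list that amount to "find every X and change it".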

So, that much was an easy sell. But it’s taken me a little over a year to finally get around to porting to Hakyll, for one reason:

While Hakyll’s name would suggest that it’s similar to Jekyll, it’s really not. In fact, its approach is almost diametrically opposed to Jekyll’s. Where Jekyll follows the Rails-style “convention over configuration” scheme, Hakyll will do literally nothing out of the box. You have to tell it where to find inputs, what to do with them, and where to put the results.

It’s not as bad as it sounds — there are good tutorials, and the configuration language is straightforward. But it’s something you’ve gotta do.
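For a sense of what "telling it everything" means, here is a minimal sketch of a Hakyll site description. It is not this site's actual configuration: the blog/**/index.md pattern and the templates/default.html template are assumed names, and the function names are those of Hakyll 4.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

main :: IO ()
main = hakyll $ do
    -- Where to find templates (assumed location); they are loaded, not published.
    match "templates/*" $ compile templateBodyCompiler

    -- Where to find inputs: any index.md under blog/ (assumed layout).
    match "blog/**/index.md" $ do
        -- Where to put the result: same path, with an .html extension.
        route $ setExtension "html"
        -- What to do with it: run Pandoc, wrap in a template, fix up links.
        compile $ pandocCompiler
            >>= loadAndApplyTemplate "templates/default.html" defaultContext
            >>= relativizeUrls
```

Each match block answers the three questions in turn: the pattern says where the inputs are, route says where the output goes, and compile says what happens in between.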

Moving to Hakyll

Do not imagine that you can write some routing rules and have Hakyll process your existing Jekyll site. If you are very lucky, and your site is very simple, you might pull this off. But do not expect it. This put me off Hakyll for nearly a year before I quit whining and dug in.

Hakyll is incompatible with Jekyll in two ways that bit me immediately: blog post handling and metadata.

Blog Post Handling in Hakyll

Hakyll does not grok Jekyll’s _posts convention and does not do formatted permalink (“slug”) generation. That is, where Jekyll will take a file called

blog/_posts/2011-01-20-hello-blag.md

and render a post that can be reached at the URL

blog/2011/01/20/hello-blag/

Hakyll just stares blankly.

It is, of course, possible to teach Hakyll how to do this. One can teach Hakyll to do anything. But there doesn’t seem to be a commonly agreed upon module to do it; each blog I’ve looked at uses its own code. Synthesizing these approaches, or rolling my own from scratch, sounded like negative work to me, relative to my actual goal of posting something on my blog.
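For the curious, the core of such a mapping is small. The following is a sketch only, using nothing beyond base; jekyllPostRoute and its helpers are hypothetical names, and it assumes file names of exactly the form YYYY-MM-DD-slug.ext inside a _posts directory. In principle a function like this could be handed to Hakyll through customRoute, but that wiring is not shown here.

```haskell
import Data.Char (isDigit)
import Data.List (intercalate)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOn sep rest

-- Drop the final extension, if any (e.g. ".md").
dropExt :: String -> String
dropExt name = case break (== '.') (reverse name) of
    (_, [])          -> name
    (_, _ : revBase) -> reverse revBase

-- Map a Jekyll-style source path to the output path behind its permalink.
-- Returns Nothing for paths that do not look like Jekyll posts.
jekyllPostRoute :: FilePath -> Maybe FilePath
jekyllPostRoute path =
    case reverse (splitOn '/' path) of
        (file : "_posts" : revDirs) ->
            case splitOn '-' (dropExt file) of
                (y : m : d : slug@(_ : _))
                    | all isDateField [(4, y), (2, m), (2, d)] ->
                        Just (intercalate "/"
                            (reverse revDirs ++ [y, m, d, intercalate "-" slug, "index.html"]))
                _ -> Nothing
        _ -> Nothing
  where
    -- A date field is a run of digits of the expected width.
    isDateField (n, s) = length s == n && all isDigit s

main :: IO ()
main = print (jekyllPostRoute "blog/_posts/2011-01-20-hello-blag.md")
-- prints: Just "blog/2011/01/20/hello-blag/index.html"
```

Writing index.html into a directory named after the slug is what makes the trailing-slash URL work on an ordinary static file server.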

My priority in the migration was to preserve the user-visible structure of the site — to break as few links as possible. Preserving the source structure that I edit was a non-goal. So, in the end, I decided the path of least resistance was to give up this fancy URL-transformation stuff from Jekyll. The blog post mentioned above would now simply live at

blog/2011/01/20/hello-blag/index.md

This generates a low, constant amount of work for me: when posting, I have to create a directory or three. I can automate that if it becomes annoying.
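If the directory creation does become annoying, it could be scripted along these lines. This is a minimal sketch, assuming the blog/YYYY/MM/DD/slug/index.md layout described above; postDir and newPost are hypothetical names, and it relies on the directory and filepath packages that ship with GHC.

```haskell
import Control.Monad (unless)
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Directory for a post, given its date parts and slug.
postDir :: String -> String -> String -> String -> FilePath
postDir year month day slug = "blog" </> year </> month </> day </> slug

-- Create the directory (and any missing parents) plus an empty index.md.
-- An existing index.md is left untouched.
newPost :: String -> String -> String -> String -> IO FilePath
newPost year month day slug = do
    let dir  = postDir year month day slug
        file = dir </> "index.md"
    createDirectoryIfMissing True dir
    exists <- doesFileExist file
    unless exists (writeFile file "")
    return file

main :: IO ()
main = do
    args <- getArgs
    case args of
        [y, m, d, slug] -> newPost y m d slug >>= putStrLn
        _               -> putStrLn "usage: newpost YEAR MONTH DAY SLUG"
```

Run as, say, runghc newpost.hs 2011 01 20 hello-blag from the site's root directory; the script name is also just an example.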

Metadata: a total Hakyll

Jekyll allows arbitrary YAML in its metadata, and I leaned pretty heavily on this to do everything from sizing image thumbnails to generating 3D previews of objects and associating them with a page. It was awesome.

Pandoc also supports YAML metadata in exactly the same style. I’ve used this too.

Hakyll’s own metadata, by contrast, is far more limited. Yup: string keys and string values, a single level. Not even lists! You can kind of hack around this by writing dynamic Context implementations; I wrote one that would extract a second metadata block from pages and make it available using Pandoc’s support. But that’s only available in templates, and even then, it’s restricted to strings and lists of things that can become strings. In the end, I found it inflexible and abandoned it.

Sigh.

In the end, Hakyll offered enough advantages that I simply ditched all the YAML-driven functionality on my site. But I’m still sad about it. I’ll gradually find ways to implement it differently.

[1] Seriously. Ignore for the moment the fact that any function can be replaced at any time. Because of the risk of undefined variables, wrong argument counts, etc., you really want to have run every line of your Ruby program at least once. Because the definition, types, and nil-ness of variables may change along each code path, you also need full branch coverage in the strictest case. Unfortunately, establishing branch coverage in Ruby, as in Smalltalk, is actually really difficult. So write lots of tests and hope for the bests.