The Old New Thing: Turning josh.sg Static

May 20, 2018

Fifteen years ago, when I spun up what became josh.sg, Movable Type was the de rigueur blogging platform, the CMS before any of us even really knew what a “CMS” was. It was Perl (with all the quirks that that implies); it was static (PHP-powered dynamic page generation was still TK); but it still meant that any idiot could post content on the web without having to learn to wrangle HTML. This Macworld article is pretty representative: “Weblogs (often referred to as blogs)” appears in the third graf (and, separately, kudos to Macworld for keeping the post live and avoiding link rot even after fifteen years).

Dynamic page generation became the buzzword later in the 2000s; why not store the site in a database and construct it on each request? CPU cycles are cheap, right? And Wordpress still owns the space, but it’s got its own headaches: oddly flaky (especially on low-end machines like the AWS t2.micro that hosts josh.sg); notoriously prone to security holes; and regularly in need of updates and maintenance even when you’re not posting (which I’m not really, having migrated most of my snarky shortform commentary to Twitter).

And I don’t update the site much any more; you might notice it’s gone from multiple posts in a day back in 2009 to a grand total of zero posts for the whole of 2017. So: why not set myself the task of turning josh.sg into a nice clean static site that:

Loads a hell of a lot faster than Wordpress;

Can be hosted in Amazon S3; and,

Doesn’t fall over in a heap whenever MySQL or the backup script crashes on the hosting server.

I’ve spent the last few days extracting josh.sg’s content from a locally-hosted Wordpress installation and rebuilding it using the Hugo static site generator. Here’s the process I used, and a few notes that might come in handy if you’re trying the same thing.

Step 0: Why Hugo?

There’s a whole world of static site generators out there. The only reason I picked Hugo (the #2 most popular) over Jekyll (the #1) was that Hugo ships as a single executable, and I didn’t want to have to set up the entire Ruby toolchain that Jekyll depends on.

Step 1: Extract posts from Wordpress to Markdown

I used exitwp to convert my Wordpress posts (extracted from a Wordpress XML dump) to Markdown. It generally works fine, though there were a few conversion issues: notably, when it ran into blockquote tags, it added the > characters to signify a blockquote but also left the HTML tags intact.

The solution was a quick Python script to loop through the generated Markdown files and rip out any <blockquote> lines; the output works fine.
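A minimal sketch of that kind of cleanup script follows. It is not the exact script used here: it assumes the leftover tags sit on lines of their own (possibly behind the > markers exitwp added), and the function names are hypothetical.

```python
import re
from pathlib import Path

# A line holding nothing but an opening or closing blockquote tag, optionally
# preceded by Markdown ">" markers and whitespace.
BLOCKQUOTE_TAG_LINE = re.compile(
    r"^\s*(?:>\s*)*</?blockquote(?:\s[^>]*)?>\s*$", re.IGNORECASE
)


def strip_blockquote_tags(markdown: str) -> str:
    """Drop lines that only hold a leftover <blockquote> or </blockquote> tag."""
    kept = [
        line
        for line in markdown.splitlines(keepends=True)
        if not BLOCKQUOTE_TAG_LINE.match(line)
    ]
    return "".join(kept)


def clean_directory(post_dir: str) -> int:
    """Rewrite every .md file under post_dir in place; return how many changed."""
    changed = 0
    for path in Path(post_dir).rglob("*.md"):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_blockquote_tags(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Calling clean_directory("content/post") (a hypothetical path) returns the number of files it rewrote. Tags that share a line with real text are left alone on purpose, and since the files are edited in place it's worth running it against a copy first.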

The other problem: footnotes. exitwp doesn’t play at all nicely with the third-party Wordpress footnote plugin I was using, so the only solution was to go through the small number of posts with footnotes and edit them by hand.
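Fixing those posts is manual, but finding them needn't be. This sketch lists converted files that still contain footnote markup. The plugin isn't named here, so the "[ref]" and "((" patterns are hypothetical placeholders; swap in whatever syntax your own plugin left behind.

```python
import re
from pathlib import Path

# Hypothetical stand-ins for the footnote plugin's syntax; adjust to match yours.
FOOTNOTE_MARKUP = re.compile(r"\[ref\]|\(\(")


def has_footnote_markup(text: str, pattern: re.Pattern = FOOTNOTE_MARKUP) -> bool:
    """True if the text still contains plugin-style footnote markup."""
    return pattern.search(text) is not None


def posts_needing_footnote_fixes(post_dir: str) -> list[Path]:
    """Sorted list of Markdown files that need a manual footnote pass."""
    return sorted(
        path
        for path in Path(post_dir).rglob("*.md")
        if has_footnote_markup(path.read_text(encoding="utf-8"))
    )
```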

Step 2: Generate new Hugo site
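Creating the site skeleton is a single Hugo command, hugo new site, which lays out the directory structure (content, layouts, static, themes and so on). To stay in Python, here's a hedged sketch that runs that command and copies the converted posts into content/post. The function names and the content/post location are assumptions; Hugo is happy with other section names too.

```python
import shutil
import subprocess
from pathlib import Path


def new_site_command(site_dir: str) -> list[str]:
    """The Hugo CLI invocation that creates an empty site skeleton."""
    return ["hugo", "new", "site", site_dir]


def create_site(site_dir: str, converted_posts: str) -> int:
    """Create the site, then copy the converted Markdown into content/post.

    Returns the number of posts copied. Requires the hugo binary on PATH.
    """
    subprocess.run(new_site_command(site_dir), check=True)
    destination = Path(site_dir) / "content" / "post"  # assumed section name
    destination.mkdir(parents=True, exist_ok=True)
    copied = 0
    for post in sorted(Path(converted_posts).glob("*.md")):
        shutil.copy2(post, destination / post.name)
        copied += 1
    return copied
```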

Step 3: Choose a theme

Coming from Wordpress to Hugo is a splash of cold water in the face. There’s a decent library of themes available for Hugo, but the ones tailored to blogging, at least, tend to be pretty minimal, and reward some hand-tailoring of the CSS and the template pages.
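One way to do that hand-tailoring without editing the theme itself relies on Hugo's lookup order: a file in the site's own layouts/ or static/ directory takes precedence over the file at the same relative path inside themes/<name>/. The sketch below copies a theme file into the site so you edit the copy instead; the theme name and file paths are hypothetical, so check what your theme actually ships.

```python
import shutil
from pathlib import Path


def override_paths(site_dir: str, theme: str, relative_path: str) -> tuple[Path, Path]:
    """Return (theme file, site-level override) for a path such as
    "layouts/_default/single.html" or "static/css/style.css"."""
    site = Path(site_dir)
    return site / "themes" / theme / relative_path, site / relative_path


def override_theme_file(site_dir: str, theme: str, relative_path: str) -> Path:
    """Copy a theme file into the site tree so edits survive theme updates."""
    source, target = override_paths(site_dir, theme, relative_path)
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting your edits")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target
```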

Step 4: Generate and upload

Preview your new site using Hugo’s live-reloading server (hugo server), which is quite a nice feature; at this point you’ll want to browse your site and manually clean up any issues that you find. Running hugo on its own then writes the finished site to the public/ directory.
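Browsing by eye catches a lot, but with a large archive it can help to also scan the built HTML in public/ for telltale conversion damage. This is a heuristic sketch: the patterns are assumptions about what broken output tends to look like (tags escaped into visible text, underscores that never became italics, and the placeholder comment newer Hugo versions emit in place of raw HTML unless it's explicitly allowed), and they will produce some false positives.

```python
import re
from pathlib import Path

# Heuristic patterns for conversion damage in rendered HTML (assumptions, not rules).
SUSPECT_PATTERNS = {
    # Newer Hugo (Goldmark) replaces raw HTML with this comment unless unsafe = true.
    "raw_html_omitted": re.compile(r"<!-- raw HTML omitted -->"),
    # A tag that got escaped and now shows up as visible text.
    "escaped_tag": re.compile(r"&lt;/?(?:blockquote|em|strong|a)\b", re.IGNORECASE),
    # _word_ left in the text instead of becoming <em>word</em>.
    "literal_underscores": re.compile(r"(?<![\w/])_[A-Za-z][^_<>\n]*_(?![\w/])"),
}


def find_suspects(html: str) -> list[str]:
    """Names of every suspect pattern that appears in one page."""
    return [name for name, pattern in SUSPECT_PATTERNS.items() if pattern.search(html)]


def scan_site(public_dir: str = "public") -> dict[str, list[str]]:
    """Map each generated page (relative path) to the problems it seems to have."""
    root = Path(public_dir)
    report = {}
    for page in sorted(root.rglob("*.html")):
        problems = find_suspects(page.read_text(encoding="utf-8"))
        if problems:
            report[page.relative_to(root).as_posix()] = problems
    return report
```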

I’m uploading the generated files to an S3 bucket with static website hosting enabled. If you try this and you get mysterious 404s on the individual article pages, or on any page in a subdirectory, even though you think you’ve made all objects public, it’s because S3 doesn’t seem to set permissions recursively. This Stack Overflow page explains how to use a bucket policy to make all the items in your hosting bucket public.
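The usual shape of such a policy grants s3:GetObject on every object in the bucket to everyone. Here's a sketch that builds it and attaches it with boto3's put_bucket_policy; the bucket name is a placeholder. One caveat for newer accounts: on a bucket with S3 Block Public Access turned on (now the default for new buckets), AWS rejects a public policy until that setting is relaxed.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy allowing anyone to read (but not list or write) every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The trailing /* covers every key, including ones in "subdirectories".
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def apply_public_read_policy(bucket: str) -> None:
    """Attach the policy to the bucket (needs AWS credentials configured)."""
    import boto3  # imported here so public_read_policy works without boto3 installed

    policy = json.dumps(public_read_policy(bucket))
    boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=policy)
```

Because the policy matches keys by pattern rather than living on each object, files uploaded later are covered automatically.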

What’s next?

Workflow: oh yeah, the workflow. There basically isn’t one. Until I get that figured out, publishing a new post is a laborious process of “find site generator files; add new .md file into post directory; generate on laptop; manually upload to S3; fix permissions; tweet link to new post; remember to back up files…”.
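The generate-and-upload part of that list is the easiest bit to script. A sketch, assuming the hugo binary is on PATH and boto3 has credentials: it rebuilds the site into public/ and uploads each file with a Content-Type guessed from its extension. With a public-read bucket policy in place there's no per-object permission step. The function names and bucket are placeholders.

```python
import mimetypes
import subprocess
from pathlib import Path


def s3_key(public_dir: Path, file_path: Path) -> str:
    """Object key for a generated file: its path relative to public/, forward slashes."""
    return file_path.relative_to(public_dir).as_posix()


def content_type_for(file_path: Path) -> str:
    """Guess a Content-Type so browsers render pages instead of downloading them."""
    guessed, _ = mimetypes.guess_type(str(file_path))
    return guessed or "application/octet-stream"


def publish(site_dir: str, bucket: str) -> int:
    """Rebuild the site with Hugo and upload everything in public/; returns file count."""
    import boto3  # only needed for the upload step

    subprocess.run(["hugo"], cwd=site_dir, check=True)  # writes the site to public/
    public_dir = Path(site_dir) / "public"
    s3 = boto3.client("s3")
    uploaded = 0
    for file_path in sorted(p for p in public_dir.rglob("*") if p.is_file()):
        s3.upload_file(
            str(file_path),
            bucket,
            s3_key(public_dir, file_path),
            ExtraArgs={"ContentType": content_type_for(file_path)},
        )
        uploaded += 1
    return uploaded
```

This never deletes pages you've removed from the site; if you'd rather lean on the AWS CLI, aws s3 sync public/ s3://your-bucket --delete handles that case.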

Netlify seems to be the default here, but I’d rather not rely on Netlify to stick around for the remainder of the life of the universe, so it’s probably going to be some combination of an S3 upload notification triggering the Hugo executable. Watch this space.
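The trigger half of that plan could be an AWS Lambda function subscribed to a bucket's object-created events. This sketch only handles the event parsing: S3 notifications arrive as a Records list carrying the bucket name and a URL-encoded object key for each change. The content/post/ prefix and the rebuild step are assumptions. The sources should also live in a different bucket from the published site, or every upload of generated output would retrigger the function.

```python
from urllib.parse import unquote_plus


def changed_posts(event: dict, prefix: str = "content/post/") -> list[tuple[str, str]]:
    """(bucket, key) pairs for Markdown posts mentioned in an S3 event notification."""
    posts = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix) and key.endswith(".md"):
            posts.append((bucket, key))
    return posts


def handler(event: dict, context: object) -> dict:
    """Lambda entry point: decide whether this batch of uploads needs a rebuild."""
    posts = changed_posts(event)
    # Hypothetical next step, not shown: download the sources, run a Hugo binary
    # packaged with the function, and upload public/ to the separate hosting bucket.
    return {"needs_rebuild": bool(posts), "posts": [key for _, key in posts]}
```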

Rogue formatting bugs: the exitwp conversion to Markdown is still a little buggy; for example, there’s text throughout the site that renders with literal underscores rather than in italics. I’m looking for a solution that doesn’t involve manually walking through a thousand-plus posts.
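I can't say for certain what's breaking in these posts, but one plausible culprit is emphasis that picked up whitespace inside its delimiters during conversion (_like this _), which Markdown won't treat as italics. A hedged sketch of a bulk fix under that assumption: it moves stray inner spaces outside the markers and switches underscores to asterisks, while leaving snake_case words, escaped \_ and fenced code blocks alone. Inline code spans aren't protected, so diff the results before committing.

```python
import re

# _text_ (optionally with stray spaces just inside the underscores) that isn't part
# of a snake_case word, an escaped \_, or an existing *...* run.
UNDERSCORE_EMPHASIS = re.compile(
    r"(?<![\w\\*])_([ \t]*)([^_\n]*?[^_\s\\])([ \t]*)_(?![\w*])"
)
FENCE = "`" * 3  # a Markdown code fence marker


def fix_line(line: str) -> str:
    """Rewrite _text_ or _ text _ as *text*, keeping stray spaces outside the markers."""
    return UNDERSCORE_EMPHASIS.sub(r"\1*\2*\3", line)


def fix_emphasis(markdown: str) -> str:
    """Apply fix_line to every line outside fenced code blocks."""
    fixed = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # opening or closing fence
            fixed.append(line)
        elif in_fence:
            fixed.append(line)
        else:
            fixed.append(fix_line(line))
    return "".join(fixed)
```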

Comments: haha no, comments are a cesspit; my spam-to-comments ratio on the old site was somewhere north of 1000 to 1. I’m going to point commenters to Twitter.

SSL & CloudFront: the old site used Let’s Encrypt for SSL, and it worked beautifully. Serving an S3-hosted site over HTTPS requires a bit of CloudFront wrangling, since S3’s website endpoints don’t support HTTPS on their own.
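The wrangling mostly comes down to a few settings. Below is a sketch of them as the dict fragments boto3's CloudFront create_distribution call expects inside its DistributionConfig; it is deliberately incomplete (CallerReference, Comment, Enabled, cache settings and so on are left out), and the domain, endpoint and certificate ARN are placeholders. Two choices matter: pointing CloudFront at the bucket's website endpoint as a custom, HTTP-only origin (the plain REST endpoint won't turn post/foo/ into post/foo/index.html, which breaks Hugo's directory-style URLs), and using an ACM certificate, which CloudFront requires to be issued in us-east-1.

```python
ORIGIN_ID = "s3-website-origin"  # arbitrary label linking the origin and cache behavior


def cloudfront_settings(website_endpoint: str, domain: str, acm_cert_arn: str) -> dict:
    """Partial DistributionConfig for serving an S3 website endpoint over HTTPS."""
    if ":us-east-1:" not in acm_cert_arn:
        raise ValueError("CloudFront needs an ACM certificate issued in us-east-1")
    return {
        "Aliases": {"Quantity": 1, "Items": [domain]},
        "Origins": {
            "Quantity": 1,
            "Items": [
                {
                    "Id": ORIGIN_ID,
                    "DomainName": website_endpoint,
                    # Website endpoints only speak HTTP, so CloudFront talks HTTP to S3.
                    "CustomOriginConfig": {
                        "HTTPPort": 80,
                        "HTTPSPort": 443,
                        "OriginProtocolPolicy": "http-only",
                    },
                }
            ],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": ORIGIN_ID,
            # Visitors arriving over plain HTTP get redirected to HTTPS.
            "ViewerProtocolPolicy": "redirect-to-https",
        },
        "ViewerCertificate": {
            "ACMCertificateArn": acm_cert_arn,
            "SSLSupportMethod": "sni-only",
            "MinimumProtocolVersion": "TLSv1.2_2021",
        },
    }
```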

On the whole

10/10 would recommend, though it’d be nice if there were a better ecosystem of prebuilt themes and a better workflow for publishing. If you’ve ever wanted to move off Wordpress, there’s no time like the present.