Scott's Weblog
The weblog of an IT pro focusing on cloud computing, Kubernetes, Linux, containers, and networking

The Story Behind the Migration

A number of people have asked me why I migrated from WordPress—which powered my blog for 9 years—to Jekyll and GitHub Pages. Now that the migration is finally complete, I can share with you the story behind the migration: why I migrated, the process I followed, and some of the tools I used.

Why I Migrated

“Why?” is a question I heard quite a bit as I was sharing updates on the progress of the blog migration over the Christmas/New Year holiday. It’s quite simple, really: I needed to walk the walk.

Allow me to explain. For the last couple of years, I’ve occasionally given presentations at VMUG meetings and other events on how to stay relevant in the fast-changing world of IT. The most recent instance was a whirlwind tour of Dallas, Chicago, and Phoenix last September, where I presented this deck, titled “Closing the Cloud Skills Gap.”

In that presentation, one of the recommendations I made to the audience was to become more familiar with the software development process. That includes tools like Git (and, by extension, GitHub), Vagrant (a quick introduction is available here), and others. I don’t fully buy into the “Everyone needs to become a programmer” mantra, but I do believe that IT pros who educate themselves on the software development process will have an advantage over those who don’t. I believe it’s sound advice; the problem was that I wasn’t following it myself. I was only somewhat familiar with Git, and even less familiar with GitHub. I needed to walk the walk as well as talk the talk.

So I upped my game: I started becoming more active on GitHub, primarily through contributions to the Open vSwitch web site (which, like this site, is also hosted on GitHub Pages). I learned about Git branches, remotes, squashing commits, and submitting pull requests. I realized that the only way to make using these tools and processes as natural as possible was to integrate them into something that was already deeply a part of me—my blog. By migrating my blog to GitHub Pages and Jekyll, publishing a blog post becomes committing changes and pushing them to GitHub. If I decide I want to work on a new theme or work on adding new functionality to the site, I can create a branch to hold my work.

There are other reasons why this move made sense for me, not the least of which was the fact that I was already writing all my blog posts in Markdown anyway (see this post). Switching to Jekyll means I get to keep writing in Markdown and skip the HTML conversion process (Jekyll handles that for me on the server side). I gain the freedom to switch providers (I can run Jekyll locally and upload the resulting site to any web server). Heck, I could run the website out of Amazon S3 if I wanted to (which is what Werner Vogels does). It also frees me to use whatever content creation tool I want to use, since Markdown is just plain text. vim for writing blog posts, anyone?

The Process I Followed

Once I had decided to migrate, the focus shifted to how I was actually going to do it. I had a lot of content in WordPress (over 1,600 posts). After some trial and error, I finally settled on a process that seemed reasonable:

First, I exported the content out of WordPress using the Export tool in the WordPress dashboard. This created an XML file containing the blog posts and their associated metadata.

Next, I used a tool called exitwp (itself a project hosted on GitHub) to take the data in the XML export and turn it into a boatload of Markdown files and a directory structure. (Jekyll has a certain directory structure that it expects, and exitwp helps build that structure for you.)
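To make the export-and-convert idea concrete, here is a minimal Python sketch of reading a WordPress export and producing Jekyll-style post files. It is not how exitwp itself works; it only illustrates the general shape of the transformation. The namespace URIs (in particular the 1.2 version suffix), the sample XML, and the function name export_to_posts are assumptions made for illustration. The _posts/YYYY-MM-DD-slug.md naming and the front matter block follow Jekyll's documented conventions.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they commonly appear in WordPress export files. The
# version suffix (1.2) is an assumption and may differ in your export.
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# A tiny, made-up export with a single post, for demonstration only.
SAMPLE_EXPORT = """<rss version="2.0"
     xmlns:wp="http://wordpress.org/export/1.2/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello, World</title>
      <wp:post_date>2015-01-06 09:00:00</wp:post_date>
      <wp:post_name>hello-world</wp:post_name>
      <content:encoded><![CDATA[First post body.]]></content:encoded>
    </item>
  </channel>
</rss>
"""


def export_to_posts(xml_text):
    """Return a dict mapping Jekyll post paths to file contents."""
    root = ET.fromstring(xml_text)
    posts = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        date = item.findtext("wp:post_date", default="", namespaces=NS)
        slug = item.findtext("wp:post_name", default="", namespaces=NS)
        body = item.findtext("content:encoded", default="", namespaces=NS)

        # Jekyll expects posts named YYYY-MM-DD-slug.md under _posts/.
        day = date.split(" ")[0]
        path = f"_posts/{day}-{slug}.md"

        # Minimal YAML front matter; titles containing double quotes
        # would need escaping, which this sketch does not handle.
        front_matter = (
            f'---\nlayout: post\ntitle: "{title}"\ndate: {date}\n---\n\n'
        )
        posts[path] = front_matter + body + "\n"
    return posts
```

Calling export_to_posts(SAMPLE_EXPORT) yields a single entry keyed by _posts/2015-01-06-hello-world.md. Note that the body is copied through unchanged; in a real export it is HTML, and converting it to Markdown is a separate step that a dedicated tool handles far better than this sketch.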

These two steps will get you most of the way (say, 90%) to getting your old WordPress content over to Jekyll. However, being the stickler that I am, I knew that a bunch of my old content wasn’t formatted correctly: there were strange characters left over from a previous blog migration, for example. So I kept going.

I then ran the Markdown files generated by exitwp through an application named TextSoap (an application designed to “clean” text files like the Markdown files I had), applying a series of regular expressions (regexes) to all the files. These regexes did things like remove ASCII gremlins (odd characters), straighten all quotes, remove extra returns and spaces, etc. TextSoap was a huge lifesaver in cleaning up the files: I was able to transform all 1,600+ posts in a matter of minutes (once I had created and tested the regexes). For most people, this would have been more than enough. I kept going.
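For readers without TextSoap, the same kind of cleanup can be approximated with a short script. The sketch below is in Python; the rules are illustrative guesses at the categories mentioned above (odd characters, curly quotes, extra returns and spaces), not the actual regexes I used, and the names CLEANUP_RULES and clean_markdown are made up for this example.

```python
import re

# Each rule is (compiled pattern, replacement), applied in order.
# These are illustrative assumptions, not the original TextSoap rules.
CLEANUP_RULES = [
    # Curly single quotes and apostrophes -> straight
    (re.compile("[\u2018\u2019]"), "'"),
    # Curly double quotes -> straight
    (re.compile("[\u201c\u201d]"), '"'),
    # Non-breaking space -> ordinary space
    (re.compile("\u00a0"), " "),
    # Stray control characters (tab, newline, carriage return are kept)
    (re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]"), ""),
    # Trailing spaces and tabs at the end of each line
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),
    # Three or more newlines -> a single blank line
    (re.compile(r"\n{3,}"), "\n\n"),
]


def clean_markdown(text):
    """Apply every cleanup rule to the text and return the result."""
    for pattern, replacement in CLEANUP_RULES:
        text = pattern.sub(replacement, text)
    return text
```

One caveat on the design: stripping trailing whitespace also removes Markdown's two-space hard line breaks, so that rule should be dropped if your posts rely on them. As with TextSoap, it pays to test the rules on a handful of files before running them across the whole set.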

I manually reviewed and cleaned up every single blog post, committing them to the GitHub repository only after I was satisfied with the source content files. Part of this final review was switching hard-coded blog URLs to post_url Liquid tags (so that post URLs are created correctly when Jekyll generates the site), which was another area where TextSoap came in handy. This was obviously quite time-consuming, but it (hopefully) yielded a higher quality site.
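The URL rewrite can also be sketched in a few lines of Python. This assumes the old permalinks followed a /YYYY/MM/DD/slug/ pattern; the host blog.example.com is a placeholder, and both the pattern and the function name to_post_url are hypothetical. The output uses Jekyll's post_url tag, which takes the post's filename without its extension.

```python
import re

# Hypothetical old permalink format: http://blog.example.com/YYYY/MM/DD/slug/
# Adjust the host and pattern to match your own permalink structure.
OLD_URL = re.compile(
    r"https?://blog\.example\.com/(\d{4})/(\d{2})/(\d{2})/([a-z0-9-]+)/?"
)


def to_post_url(text):
    """Replace hard-coded post URLs with Jekyll post_url Liquid tags."""

    def replace(match):
        year, month, day, slug = match.groups()
        # post_url expects the post filename without the extension.
        return "{% post_url " + f"{year}-{month}-{day}-{slug}" + " %}"

    return OLD_URL.sub(replace, text)
```

For example, a Markdown link to http://blog.example.com/2014/05/12/using-git/ becomes a link to {% post_url 2014-05-12-using-git %}. As far as I know, Jekyll reports an error at build time when a post_url tag names a post that doesn't exist, so this only works if the slug in the old URL matches the generated filename.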

The Tools I Used

I’ve already talked about some of the tools I used, but let me recap them here:

A couple of different tools can take the exported XML and transform it for you. I used exitwp, but there are other tools as well. I believe Jekyll has a WordPress-to-Jekyll importer, for example.

To further “clean” and/or fine-tune your Markdown content, you can use a tool like TextSoap to modify all the text files rapidly. (The more hardcore among us could use tools like egrep and sed to do the same thing.)

You’ll obviously need to use git or the GitHub client, as you have to push changes from your local Git repository to GitHub.

Summary

I hope this post helps provide some context and information on why I switched to Jekyll, how I handled the migration, and some of the tools that were useful during the migration. There’s so much more to be shared about the migration, so stay tuned for more posts in the near future.