Setting Up a Research Blog with Gitlab Pages - A Guide

February 1, 2017

NOTE: this post may be subject to edits as I learn more and add more to the blog.

Towards the end of last year, I decided to set up this blog to document and
motivate my research progress. Since it turned into a more involved task
than I had anticipated, I thought I’d better write something about the
process here in the hope of aiding other budding blogeurs.
Let me preface this post by stating at the outset that I am by no means a
web expert (be that front-end, back-end or no-end), but when I have an itch
to scratch, I do like to hack my way through it, sometimes tearing what remains
of my hair out and sometimes learning some useful things along the way.
This particular venture might best be considered as having nurtured some insidious
combination of the two. So without further ado, allow me to present what I
have uncovered amidst the fresh-plucked remnants of my follicular outgrowth.

Jekyll

Since I have been using Jekyll and will be referencing it throughout this post,
I’ll just briefly describe it here.
Jekyll is a static site generator written in Ruby, so adding fun new
features to your site usually involves installing Ruby gems.
As is proudly touted on the Jekyll homepage, it is possible to
get a new Jekyll site up and running in seconds:
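The quickstart on the Jekyll homepage is, give or take, the four commands below. Here they are wrapped in a small shell function so that nothing runs until you call it; the name jekyll_quickstart is my own invention, and my-awesome-site is just the placeholder name the Jekyll homepage uses. Typing the four commands straight into a terminal works just as well.

```shell
#!/bin/sh
# Sketch of the Jekyll homepage quickstart, wrapped in a hypothetical
# helper function so that nothing is installed until you call it.
jekyll_quickstart() {
  site="${1:-my-awesome-site}"  # placeholder name used on the Jekyll homepage
  gem install jekyll bundler    # install Jekyll, plus Bundler to manage gem versions
  jekyll new "$site"            # scaffold a new site in ./$site
  cd "$site" || return 1
  bundle exec jekyll serve      # build, serve at http://localhost:4000, rebuild on changes
}

# Usage: jekyll_quickstart my-awesome-site
```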

That last line, bundle exec jekyll serve, is particularly useful,
and you will find yourself using it all the time to test
the current version of your site, or simply leaving it running
in a terminal, since Jekyll picks up most of your changes
on the fly.

Themes

There is an abundance of themes available for these site generators
(e.g. see here for Jekyll themes), and it is
helpful to pick one early on, fork its repo, and use it to
kickstart your blog development process.
I used the al-folio theme
(see here for a demo), which
in turn was based on the *folio theme,
and hacked it up for my own purposes.

Free Hosting with Gitlab Pages

When I first decided to put together a blog, I had intended to use
Github Pages for the task, but I wanted two things it does not currently support:
a custom domain secured end-to-end with SSL/TLS, and custom Jekyll plugins.
(You can use Cloudflare for SSL, but it only secures the connection between users
and the Cloudflare network, not between Cloudflare and the hosting service, i.e. Github;
see here for a discussion.)
So I decided to go with Gitlab Pages instead.
Gitlab Pages has several other nice advantages over Github Pages,
e.g. your choice of static site generator and a customizable build process,
and, importantly, there are ways and means of getting around its disadvantages,
e.g. slow build times (see the section on continuous integration below).
Some of the main differences between them are summarised
here
and here.

Continuous Integration and Gitlab Runners

Okay, let’s get our cards on the table: upon first reading these terms,
I was as confused as you probably are.
They are buzzwords that feel so abstracted away from the reality of what they
might be doing that it is not at all obvious what their purpose is.
Let me try to break it down for you as I understand it.

Continuous Integration

Continuous integration (CI), according to Wikipedia,
refers to
“the practice of merging all developer working copies to a shared mainline several times a day.”
However, for our purposes on Gitlab Pages, what this effectively means is
that every time you push a commit to the Gitlab server, it will rebuild your project.
Why? Well, I’ll leave more detailed explanations to the Wikipedia page,
Gitlab’s own explanatory effort, or
the more procedurally cognisant, but suffice it to say that this is a good way of making sure
that everything is always working.
The most important thing to note is that this process will build and deploy your site.
How? In brief, you create a .gitlab-ci.yml file in the root of your project that
tells Gitlab how to build and deploy it.
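A sketch of such a file, pieced together from the sections described below and from the pages job shown later in this post, might look like the following. It is wrapped in a shell heredoc so it can be written into the project root in one go. Two parts are my own assumptions rather than the exact original: the locale fix (here simply exporting C.UTF-8) and the name of the test job’s output directory, test.

```shell
#!/bin/sh
# Writes a sketch .gitlab-ci.yml into the current directory (the Jekyll project root).
# The quoted 'EOF' stops the shell from expanding anything inside the YAML.
cat > .gitlab-ci.yml <<'EOF'
image: ruby:2.3

cache:
  paths:
    - vendor/

before_script:
  # Assumed locale fix so UTF-8 characters (e.g. in author names) don't break the build
  - export LANG=C.UTF-8
  - export LC_ALL=C.UTF-8
  # Install gems into vendor/ so they can be cached between builds
  - bundle install --path vendor

test:
  stage: test
  script:
    - bundle exec jekyll build -d test
  artifacts:
    paths:
      - test
  except:
    - master

pages:
  stage: deploy
  script:
    - JEKYLL_ENV=production bundle exec jekyll build -d public
  artifacts:
    paths:
      - public
  only:
    - master
EOF
```

Commit the generated file and push; Gitlab looks for .gitlab-ci.yml in the repository root on every push.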

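I also keep a Gemfile in the project root, listing the gems that Bundler should install:

source 'https://rubygems.org'

# Jekyll
gem 'jekyll'

# Added these to get al-folio working
gem 'jekyll-paginate'
gem 'jemoji'
gem 'jekyll-scholar'
gem 'pygments.rb'

# Needed for converting Gravatar to favicons
gem 'rmagick'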

We’ll discuss the usefulness of Bundler and the contents
of the Gemfile in more detail as we go along,
but let’s get back to the .gitlab-ci.yml file for now and
try to get an overview of what’s going on there in each of
its sections:

The image Specification

This part is really simple - it just tells Gitlab which Ruby Docker image to load.
For those who don’t know, Docker is the containerization tool that is taking the world by storm.
It allows you to package your software in a “container”, i.e. a complete filesystem containing
all of the bits and pieces it needs to run, so that the software always runs in the same
environment, whatever machine it is on. A nice overview can be found here.
In this instance, we’re asking Gitlab Pages to use a Ruby 2.3 Docker image so that we
can run Jekyll, which is Ruby-based.

The cache Specification

According to the Gitlab documentation,
“cache is used to specify a list of files and directories which should be cached between builds.
You can only use paths that are within the project workspace.”
So here, we tell Gitlab to keep the contents of the vendor directory
between builds. Why?
Well, as we shall see, we will be instructing Bundler to install gems into the
vendor directory, so by caching that directory between builds,
we can speed up the build process. Neat.

The before_script Specification

This section dictates what should be done before any build jobs are executed.
I currently do two things here.
Firstly, I fix some locale settings to avoid a problem caused by UTF-8 characters
in the author names of some of my publications.
Secondly, and perhaps more importantly for most use-cases,
I tell Bundler to install Ruby gems to the vendor directory using the command
bundle install --path vendor.
As was explained in the previous section, this is an attempt to speed up the build process.

The test Job

This is the first build job definition.
It instructs Gitlab to build the site in every branch except master,
place the results in a test directory for each branch, and zip up the results
for download.
This really comes into its own when incorporated into a workflow where
you use different branches for writing drafts of blog posts and so on,
before merging them into the master branch for deployment.
More on this later.

The pages Job

This is where the real magic happens: it is where we deploy the master
branch of the project as a Gitlab Pages public site!
According to the Gitlab Pages documentation,
in order to make use of Gitlab Pages, the following three conditions must be satisfied:

A special job named pages must be defined

Any static content which will be served by GitLab Pages must be placed under a public/ directory

artifacts with a path to the public/ directory must be defined

Since this job carries the name pages, the first condition is already satisfied.
The instruction bundle exec jekyll build -d public tells Bundler/Jekyll to
build the site in the public/ directory, so that satisfies the second requirement
(the command is accompanied by some Google Analytics specifications, but more on that later).
The artifacts setup is pretty much the same as in the test job case, and
satisfies the third requirement.

That’s it! Once this file is in the root directory of your Jekyll project and
everything is committed and pushed to the Gitlab server, Gitlab will
launch “runners” to build the project and deploy the site.
And it is these Gitlab Runners that are the subject of the next section.

Gitlab Runners

When Gitlab builds your project during continuous integration, it needs machines
to run the builds on. That’s where Gitlab Runners come in.
A Gitlab Runner is a program that picks up build jobs from Gitlab and executes them.
It can run on Gitlab’s own servers, some other server(s) linked to a Gitlab instance,
or even your own laptop or other machine.
Runners are categorised as either shared runners or specific runners.

Shared Runners

For most use-cases, these are going to be Gitlab’s own servers, which can be slow
at times depending on their workload because they’re used to build
the jobs of Gitlab’s other users as well.
If you do not have a specific runner set up, then Gitlab will default
to using its own shared runners.
Good to be able to fall back on, but maybe not an ideal solution.

Specific Runners

Setting up a specific runner, e.g. on your own PC, allows you to dedicate
your own resources to your own builds. No more waiting for shared runners
on remote servers to queue your project! Caveat: you’ll still need a decent
internet connection for speed, because the runner seems to like to ping
the Gitlab server constantly during the build. This could of course
be avoided if you had your own Gitlab instance running on a separate
server, but we’re not dealing with that in this guide, so let’s not
worry about it.

Here, I will discuss how to install a Docker specific runner on Ubuntu,
but the instructions for other systems/methods are readily available.
There are a few different executors, which provide different
ways and means of building a project in the runner.
There are good security reasons,
amongst others, for using the Docker executor, so we will go along with that.
Here is how to go about the runner installation
(more detailed instructions can be found here):
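At the time of writing, the Ubuntu instructions boiled down to installing Docker (which the Docker executor needs) and then installing the runner from Gitlab’s own package repository. The sketch below wraps those steps in a hypothetical install_docker_runner function. It assumes Ubuntu and the gitlab-ci-multi-runner package name in use at the time; later releases renamed the package and command to gitlab-runner, so check the current documentation before running it.

```shell
#!/bin/sh
# Sketch: install Docker and the Gitlab runner on Ubuntu (2017-era package name).
install_docker_runner() {
  # Docker's convenience install script; the Docker executor needs Docker itself
  curl -sSL https://get.docker.com/ | sh
  # Add Gitlab's apt repository for the runner package...
  curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-ci-multi-runner/script.deb.sh | sudo bash
  # ...then install the runner from it
  sudo apt-get install -y gitlab-ci-multi-runner
}

# Usage: install_docker_runner
```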

After that, the specific runner needs to be registered in order to run builds
for your project. Run the sudo gitlab-ci-multi-runner register
command, and when prompted for a token, go
to the Settings -> Runners section of your Gitlab project page
to retrieve the registration token provided in Step 3 under
“How to setup a specific Runner for a new project”.
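The interactive register command asks for, among other things, the coordinator URL, the token, a description and the executor. As an alternative, the same registration can be done in one non-interactive call. In the sketch below the function name and the description string are hypothetical, and ruby:2.3 is chosen only to match the image used in .gitlab-ci.yml.

```shell
#!/bin/sh
# Sketch: register a Docker-executor runner without the interactive prompts.
register_runner() {
  token="$1"  # registration token from Settings -> Runners
  sudo gitlab-ci-multi-runner register \
    --non-interactive \
    --url "https://gitlab.com/" \
    --registration-token "$token" \
    --executor docker \
    --docker-image ruby:2.3 \
    --description "my-laptop-runner"
}

# Usage: register_runner YOUR_REGISTRATION_TOKEN
```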

After this, the next time you push a commit to Gitlab’s remote servers,
your specific runner should pick up on the build request and build
your project locally. Note again: this might still involve heavy network traffic
between Gitlab’s servers and your machine, but it might just be a faster build overall.

Using a Custom Domain

You don’t necessarily need a custom domain for your site
(Gitlab Pages will provide you with a nice URL along the lines of
https://barryridge.gitlab.io by default), but a custom domain
(e.g. barog.net) is a nice thing to have
for various reasons, so I will try to explain how to set one up here.
First of all, you will need to choose and register your domain name with
a domain name registrar, and for that I would recommend
namecheap.com, but there are many other
options available.

The Gitlab Pages documentation here
and here
explains how to set things up from the Gitlab side.
This involves going to Settings -> Pages -> New Domain under your
project dashboard to add your domain; the documentation also specifies the DNS records
you need, namely an A record pointing to 104.208.235.32
and a CNAME record pointing to username.gitlab.io.
You will then need to set up those records in your DNS settings on Namecheap
so that your domain name points to Gitlab’s servers.
David Ensinger provides a nice guide
for setting the DNS for Github Pages on Namecheap, but the procedure
for Gitlab Pages is not much different.
Here is what my setup looks like:

Namecheap Setup

As the Gitlab documentation specifies (see here and
here), you will need to set an A record pointing to 104.208.235.32
and a CNAME record pointing to username.gitlab.io.
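Once the records are saved (DNS changes can take a while to propagate), dig offers a quick way to check them. The helper below is a hypothetical sketch: it assumes the CNAME lives on a subdomain such as www.YOURDOMAIN.org, and it hard-codes the 104.208.235.32 address given above.

```shell
#!/bin/sh
# Hypothetical helper: check that a domain's DNS points where Gitlab Pages expects.
check_gitlab_dns() {
  apex="$1"        # bare domain, e.g. YOURDOMAIN.org
  cname_host="$2"  # host carrying the CNAME, e.g. www.YOURDOMAIN.org (assumption)
  user="$3"        # your Gitlab username
  a=$(dig +short "$apex" A)
  cname=$(dig +short "$cname_host" CNAME)
  if [ "$a" != "104.208.235.32" ]; then
    echo "A record for $apex is '$a', expected 104.208.235.32"
    return 1
  fi
  # dig prints CNAME targets as fully qualified names, with a trailing dot
  if [ "$cname" != "$user.gitlab.io." ]; then
    echo "CNAME for $cname_host is '$cname', expected $user.gitlab.io."
    return 1
  fi
  echo "DNS looks right"
}

# Usage: check_gitlab_dns YOURDOMAIN.org www.YOURDOMAIN.org username
```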

Securing Your Site with SSL/TLS and Let’s Encrypt

You may notice when setting up your custom domain that
there is a section in your Gitlab Pages project dashboard
under Settings -> Pages -> New Domain where you
can add an SSL/TLS certificate and its key.
But where do you get a certificate?
That’s where Let’s Encrypt,
the free, automated, and open certificate authority, comes in.
Long story short, it allows you to generate your own security
certificates for free so that you can have that warm and
reassuring HTTPS next to your domain name.

I largely followed this excellent guide
to get this going, but I did run into some tricky issues that
I will try to help you with here. The first thing that you
will need to do is install Let’s Encrypt on your local machine:
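One way to do this, sketched under the assumption that you use the letsencrypt-auto script from the Let’s Encrypt repository with its manual authenticator (which prints a challenge and waits for you to serve it before you press ENTER), is shown below. YOURDOMAIN.org is a placeholder, and both the LE_DIR checkout location and the request_manual_cert function name are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch letsencrypt-auto and request a certificate with manual validation.
LE_DIR="${LE_DIR:-$HOME/letsencrypt}"  # hypothetical checkout location

request_manual_cert() {
  domain="$1"
  if [ ! -d "$LE_DIR" ]; then
    git clone https://github.com/letsencrypt/letsencrypt "$LE_DIR"
  fi
  # certonly: obtain the certificate without trying to install it anywhere
  # -a manual: you serve the challenge file yourself (here, via Jekyll)
  "$LE_DIR/letsencrypt-auto" certonly -a manual -d "$domain"
}

# Usage: request_manual_cert YOURDOMAIN.org
```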

When it runs, the tool prints the name of a challenge file and a token that your
site must serve; below, I refer to the generated filename as XXXX and the
generated token as YYYY.
You should keep this interface open WITHOUT pressing enter,
and proceed to set up a Jekyll page called letsencrypt-setup.html
in your project root directory containing the following:

---
layout: null
permalink: /.well-known/acme-challenge/XXXX
---
YYYY
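Let’s Encrypt certificates expire after 90 days, so you may end up repeating this step. A small helper that writes the page for you saves some copy-and-paste; in this sketch the make_acme_page name is hypothetical, and it simply takes the XXXX filename and YYYY token as its two arguments.

```shell
#!/bin/sh
# Hypothetical helper: write letsencrypt-setup.html in the current directory.
# $1 = challenge filename (XXXX), $2 = token (YYYY), both printed by letsencrypt-auto.
# The unquoted EOF lets the shell substitute $1 and $2 into the page.
make_acme_page() {
  cat > letsencrypt-setup.html <<EOF
---
layout: null
permalink: /.well-known/acme-challenge/$1
---
$2
EOF
}

# Usage (from the project root): make_acme_page XXXX YYYY
```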

This will cause your Jekyll site to generate a file called XXXX.html
in the public/.well-known/acme-challenge directory when deployed, served at
http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX.html.
The problem is, the letsencrypt-auto tool will look for the YYYY token
at the URL http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX
without the .html extension.
To fix this, we add a shell copy instruction to .gitlab-ci.yml as follows:

pages:
  stage: deploy
  script:
    # Generate public site and deploy
    - JEKYLL_ENV=production bundle exec jekyll build -d public # JEKYLL_ENV used for Google Analytics
    # Use this when creating a new letsencrypt cert,
    # since jekyll adds .html to the file and letsencrypt
    # does not expect a .html extension
    - cp ./public/.well-known/acme-challenge/XXXX.html ./public/.well-known/acme-challenge/XXXX
  artifacts:
    # Save a zipped version for download
    paths:
      - public
  only:
    # Only deploy the master branch
    - master

Remember: be sure to substitute XXXX and YYYY in the above with the
actual strings generated by letsencrypt-auto!
Once you’ve pushed the code to the Gitlab servers, you should then
be able to test it as follows:

~ $ curl http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX
YYYY

If the string YYYY is returned successfully, then you can return to
the letsencrypt-auto tool terminal interface (that you should still have open!)
and hit ENTER as instructed.
The tool will then check the link just like you did to see if it returns the string,
and if it does, it should congratulate you on successfully generating your
certificate. All that remains is to copy the certificate and its private key
over to your Gitlab Pages custom domain settings page.
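By default the client saves its output under /etc/letsencrypt/live/YOURDOMAIN.org/: fullchain.pem holds your certificate plus the intermediate certificate (usually what you want to paste as the certificate), and privkey.pem holds the private key. The sketch below just prints both so they can be copied into the form; the function name and the LE_LIVE override are hypothetical, and since these files are normally readable only by root, in practice you would run the equivalent cat commands with sudo.

```shell
#!/bin/sh
# Hypothetical helper: print the certificate chain and key for pasting into Gitlab.
LE_LIVE="${LE_LIVE:-/etc/letsencrypt/live}"  # default Let's Encrypt output location

show_cert_for_gitlab() {
  domain="$1"
  echo "== certificate (fullchain.pem) =="
  cat "$LE_LIVE/$domain/fullchain.pem"
  echo "== private key (privkey.pem) =="
  cat "$LE_LIVE/$domain/privkey.pem"
}

# Usage: show_cert_for_gitlab YOURDOMAIN.org
```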

Adding Google Analytics

To set up Google Analytics,
I followed this tutorial
for Jekyll. I will not repeat the details here, other than
mentioning that an important aspect is that you need to
set the JEKYLL_ENV=production environment variable ahead of the
bundle exec command in your .gitlab-ci.yml file, as in the pages job shown earlier:
JEKYLL_ENV=production bundle exec jekyll build -d public
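The VAR=value prefix sets the variable for that one command only; without it Jekyll runs in its default development environment. Jekyll exposes the value to templates as jekyll.environment, so a layout can check for "production" before including the analytics snippet and keep local previews out of your statistics. A minimal sketch, with a hypothetical function name, mirroring the build line above:

```shell
#!/bin/sh
# Hypothetical helper mirroring the build line of the pages job.
# JEKYLL_ENV=production applies only to this command, not to the rest of the shell.
build_production_site() {
  JEKYLL_ENV=production bundle exec jekyll build -d public
}

# Usage: build_production_site
```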

Coming Soon!

As you have probably already figured out by now,
setting up one of these blog thingies can be
a deceptively complex process.
I could go on and on writing about what I’ve had
to do to get to this point, but I wanted to get
something out there (like, you know, an actual blog post!),
so I’ve decided to stop here for now.
I would still like to write about some other things
in relation to this journey at some point in the future
though, so I will leave some placeholders here to give
you a taste of what is, hopefully, to come.

Leveraging Bower, npm and Grunt for Package Management

The package managers Bower and
npm, as well as the
automation tool Grunt,
are extremely useful doohickeys to have in your
toolkit.
Torsten Scholak has written an excellent post
on how to best make use of them with Jekyll and Github Pages
over on his Meticulous Disorder blog.
There really isn’t too much difference when employing them on Gitlab Pages.

Creating Blog Posts from Jupyter Notebooks

I haven’t actually tried this out yet, but there is a nice post
available here
detailing how it might be achieved.
This could be a very nice addition to the workflow of
writing a research blog, so I’m hoping for good things here.