Posts

Wow, it’s been less than 2 years since Sanger started its live private cloud service. Over this time we’ve had an incredible journey with both the OpenStack community at large and our local developers, research staff and vendors.

We have just heard from SuperComputing 2018 that we have won the Readers’ Choice award for Best Use of HPC in the Cloud. This is a considerable achievement for the IT, informatics and project teams that have been a part of this journey.

Introduction

The RADOS Gateway (rgw for short) is a component of Ceph that provides S3-compatible storage. Our users use it to make data publicly available, as well as to share data privately with collaborators.

Naturally, we want to use HTTPS for this, which means we need a TLS certificate. This needs to be a wildcard certificate, as S3 typically puts the bucket name into the request hostname (e.g. for our service cog.sanger.ac.uk, bucket foo would be referred to as foo.cog.sanger.ac.uk).

We use a commercial wildcard certificate for our production S3 gateway, but these have a number of downsides: renewal/rollover is typically a manual process, and they cost money (around £75/year from the Jisc provider, up to £180/year from a commercial provider). So we decided to look at LetsEncrypt (LE) for our test S3 gateways, with a view to using LE in production when our current certificate expires.

This article briefly outlines how we set this up, in the hope it might be of interest to others.

LetsEncrypt

LetsEncrypt (LE) provides a free, automatic certification authority, based on Free Software. It is a non-profit, and funded by donations. Its certificates are short-lived (90 days), so you essentially have to automate their acquisition and deployment.

The protocols used by LE are well-documented elsewhere, but essentially it’s a challenge-response system – you have to prove to LE that you own the domain(s) you’re requesting a certificate for. There are two ways to do this – over HTTP (i.e. putting the challenge token on a web-server) or DNS (putting the challenge token into a special DNS record).
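To make the DNS variant concrete, here is a minimal Python sketch of what the "special DNS record" looks like under the ACME dns-01 challenge: a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The function names are ours, for illustration only; in practice the LE client computes these values for you.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """Return the DNS name that must hold the TXT challenge record."""
    # A wildcard such as *.example.org is validated at the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base.rstrip(".")


def challenge_record_value(key_authorization: str) -> str:
    """Return the TXT value: unpadded base64url of the SHA-256 digest."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

For example, `challenge_record_name("*.cog.sanger.ac.uk")` gives `_acme-challenge.cog.sanger.ac.uk`, which is where the token for our wildcard certificate has to be published.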

Dehydrated & DNS updates

There are a number of LE clients available; we chose dehydrated, because it’s a fairly simple shell script, doesn’t need (or expect) root access, and doesn’t try to do any of its own crypto. Also, the setup is pretty straightforward.

For S3, we need a wildcard certificate, and those are only available using the DNS-based challenge. Dehydrated supports this; you need to supply a hook to let it update the relevant DNS records. The Dehydrated wiki has hooks for a number of providers and resolvers, but not one for Infoblox, the BIND-based DNS/DHCP/IPAM platform we use. So we had to write one. Essentially, this involves POSTing small JSON fragments to the Infoblox API.
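As a rough illustration of what such a JSON fragment might look like, the Python sketch below builds the request for creating a TXT record. It is not our hook script: the `record:txt` object and its `name` and `text` fields reflect our understanding of the Infoblox WAPI and should be checked against the documentation for your WAPI version, and the base URL in the usage note is hypothetical. The sketch only builds the request; sending it (with authentication) is left out.

```python
import json


def build_txt_request(base_url: str, record_name: str, token_value: str):
    """Return (url, body) for creating a TXT record via an Infoblox-style REST API.

    Assumption: the API exposes a `record:txt` object that accepts
    `name` and `text` fields in a JSON POST body.
    """
    url = base_url.rstrip("/") + "/record:txt"
    body = json.dumps({"name": record_name, "text": token_value}, sort_keys=True)
    return url, body
```

A call such as `build_txt_request("https://infoblox.example.org/wapi/v2.7", "_acme-challenge.cog.sanger.ac.uk", token)` would yield the URL to POST to and the JSON body carrying the challenge token. A matching delete is needed afterwards to clean the record up.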

Beyond that, dehydrated needs relatively little configuration: a domains.txt containing the domains we want certificates for, and a configuration file, most of which can be left at its defaults. We change a couple of things to specify the LE API endpoint (there’s a staging endpoint that can be used for testing, to avoid various rate limits), that we want to use the DNS challenge type, and where the hook script is.
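The three settings described above correspond to the `CA`, `CHALLENGETYPE` and `HOOK` variables in dehydrated's configuration file. As a sketch (not our actual configuration), the small Python helper below renders such a file, as a deployment tool might; the helper name and the hook path in the usage note are hypothetical, and the URLs are the LE ACME v2 staging and production directories.

```python
LE_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"
LE_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"


def render_dehydrated_config(hook_path: str, staging: bool = True) -> str:
    """Render the non-default settings for a dehydrated config file."""
    ca = LE_STAGING if staging else LE_PRODUCTION
    lines = [
        f'CA="{ca}"',              # which LE endpoint to talk to
        'CHALLENGETYPE="dns-01"',  # wildcards need the DNS challenge
        f'HOOK="{hook_path}"',     # script that updates the DNS records
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_dehydrated_config("/usr/local/bin/infoblox-hook.sh")` produces a staging configuration; pass `staging=False` once testing is done.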

Key distribution

At this point we can get a wildcard certificate for *.cog.sanger.ac.uk, but we still need a way to deploy it across our rgws. Since each rgw is a Ceph client (and so has a Ceph credential), the easiest way to do this is using a Ceph pool. So we make a small Ceph pool, and then adjust the hook script to store the key, certificate, and certificate chain in the pool.
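The storing step can be sketched as follows. This is an illustrative Python version rather than our shell hook: it assumes the file paths are those dehydrated hands to its `deploy_cert` hook, that the standard `rados` command-line tool is on the path with a usable credential, and that the object names (`privkey.pem`, `cert.pem`, `chain.pem`) are free to choose; they are our invention here.

```python
import subprocess


def rados_put_commands(pool: str, keyfile: str, certfile: str, chainfile: str):
    """Build one `rados put` command per file to be stored in the pool."""
    # Object names are hypothetical; any stable names would do.
    objects = {"privkey.pem": keyfile, "cert.pem": certfile, "chain.pem": chainfile}
    return [["rados", "--pool", pool, "put", name, path]
            for name, path in objects.items()]


def store_in_pool(pool, keyfile, certfile, chainfile, run=subprocess.run):
    """Upload key, certificate and chain; raise if any upload fails."""
    for cmd in rados_put_commands(pool, keyfile, certfile, chainfile):
        run(cmd, check=True)
```

The `run` parameter exists only so the function can be exercised without a Ceph cluster; in normal use the default `subprocess.run` is what executes the commands.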

Each rgw then runs a cron job that gets the key, chain, and certificate out of the rgwtls pool, performs some sanity checks, and then copies them into place if they’re different to the currently-deployed set. The web server within the rgw is civetweb, which expects a single .pem file containing key, certificate, and intermediate chain.
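The deployment side can be sketched in the same spirit. The Python below is a simplified stand-in for our cron script, assuming the three PEM texts have already been fetched from the pool: it does a very basic sanity check, concatenates them into the single file civetweb expects, and only replaces the deployed file when the content has changed. The function names and the checks themselves are illustrative; the real script does more.

```python
import os
import tempfile


def build_pem(key: str, cert: str, chain: str) -> str:
    """Concatenate key, certificate and chain after a basic sanity check."""
    if "PRIVATE KEY-----" not in key:
        raise ValueError("key does not look like a PEM private key")
    for name, text in (("certificate", cert), ("chain", chain)):
        if "-----BEGIN CERTIFICATE-----" not in text:
            raise ValueError(f"{name} does not look like a PEM certificate")
    return "".join(part.strip() + "\n" for part in (key, cert, chain))


def deploy_if_changed(pem: str, target: str) -> bool:
    """Install pem at target if it differs; return True if a change was made."""
    try:
        with open(target, encoding="ascii") as fh:
            if fh.read() == pem:
                return False  # already deployed, nothing to do
    except FileNotFoundError:
        pass
    # Write to a temporary file (created with mode 0600) in the same
    # directory, then rename it into place so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="ascii") as fh:
        fh.write(pem)
    os.replace(tmp, target)
    return True
```

The boolean return value is what tells the caller whether the rgw needs to be restarted or told to reload its certificate.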

In outline, that’s all there is to it! One script to negotiate the DNS challenge with LE and copy the key/certificate/chain into a Ceph pool, and another script to deploy them.

Automated deployment

We use Ansible to manage our Ceph deployment; specifically, we manage a local branch off the 3.0 Stable version of ceph-ansible. So we developed a small role to automate deploying this on our Ceph clusters. It’s mostly just the obvious: copying scripts into place, installing cron jobs, and so on. We do, however, only want one of our rgws to actually be running dehydrated (otherwise you’d have multiple dehydrated clients getting different keys and certificates for the same domain, which would lead to confusion!); we achieve this by arranging for it only to be installed and run on the first member of the rgws group.

Issues

We found a couple of wrinkles that are worth noting. Firstly, the LE servers do not share state between them, which means you need a stable external IP address for the DNS challenge to work. If you get strange errors from dehydrated, it’s worth checking this; our scripts achieve this by using a single IP address of our local web proxy as the HTTP proxy.

Secondly, the civetweb used in Ceph versions prior to Luminous has no mechanism for noticing when its TLS key/cert change, so you have to restart the rgw, which is disruptive. Given the short validity period of LE certificates, this equates to a restart every 90 days. S3 clients should retry, so this is unlikely to cause a problem, but it’s something to be aware of. With Luminous or later, you can set ssl_short_trust in your civetweb configuration, and then the restart isn’t needed.

Show Me The Code!

We’ve glossed over a lot of details of error-handling and suchlike, in the interests of brevity. If you actually want to try this yourself, then you’ll want to care about the tedious details 🙂

Our Ansible role is available on GitHub; it’s not exactly a drop-in role: you’ll need the hook script to talk to your DNS machinery, for example, and store the related credential using regpg or similar. But you can see all the error-checking details that we glossed over above, and it’s hopefully a useful starting point.

We’ve offered the role to the upstream ceph-ansible project, so it may yet appear there in future…

Our iRODS infrastructure continues to grow. We have just signed off another >1PB of iRODS capacity to our CASM group. This brings our current iRODS archives up to approximately 14PB total capacity, between Sanger and JSDC data centers.

At the end of 2017, we delivered another 1PB of Ceph storage to our rapidly expanding internal cloud, the flexible compute environment. Over the course of 2017-2018, we have gone from 0PB to 5.5PB of usable Ceph-based storage.

This has become our most rapidly deployed storage platform to date, and stability in particular has remained outstanding. We have more details available here: