Expanding a Ceph cluster with Juju

We just got a set of new SuperMicro servers for one of our Ceph clusters at HUNT Cloud.
This made for a great opportunity to write up the simple steps of expanding a Ceph cluster with Juju.

New to Juju? Juju is a cool controller- and agent-based tool from Canonical to easily deploy and manage applications (called Charms) on different clouds and environments (see how it works for more details).

Scaling applications with Juju is easy and Ceph is no exception.
You can deploy more Ceph OSD hosts with just a simple juju add-unit ceph-osd command.
The challenging part is to add new OSDs without impacting client performance due to large amounts of backfilling.

Below is a brief walk-through of how you can scale your Ceph cluster, with a small example cluster that you can deploy locally on LXD containers to follow along:

1. Set crush initial weight to 0

2. Add new OSDs to the cluster

3. Clear crush initial weight

4. Reweight new OSDs

Deploy a Ceph cluster on LXD for testing

The first step is to get a Juju controller up and running so you can deploy Ceph.
If you’re new to Juju and LXD, you can get started with the official docs here.
In case you already have all the requirements installed, you can simply bootstrap a new controller and deploy a small test cluster.
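A minimal sketch of that sequence (the unit counts and the OSD directory path are illustrative, and it assumes the localhost LXD cloud is already configured):

juju bootstrap localhost lxd-test
juju deploy -n 3 ceph-mon
juju deploy -n 3 ceph-osd --config osd-devices=/srv/ceph-osd
juju add-relation ceph-osd ceph-mon
juju ssh ceph-mon/0 sudo ceph -s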

At this point, ceph -s reports that the cluster does not contain any pools, pgs or objects.
So, let’s fix that by creating a pool called testpool and writing some data to it with one of Ceph’s internal benchmarking tools, rados bench, so that we have something to actually shuffle around:
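Here's a sketch of those two steps (the placement group count and the bench duration are illustrative):

juju ssh ceph-mon/0 "sudo ceph osd pool create testpool 32"
juju ssh ceph-mon/0 "sudo rados bench -p testpool 60 write --no-cleanup"

The --no-cleanup flag keeps the benchmark objects around afterwards, which is exactly what we want here.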

Finally, let’s take a look at ceph osd tree which prints out a tree of all the OSDs according to their position in the CRUSH map.
Pay particular attention to the WEIGHT column as we will be manipulating these values for the new OSDs when you expand the cluster later.
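Through Juju, that's:

juju ssh ceph-mon/0 sudo ceph osd tree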

Don't worry if you don't end up with the exact same weight or usage numbers.
Those depend on the size of storage available on the LXD host.

So, with a working Ceph cluster, we can finally get started.

Set crush initial weight to 0

As mentioned at the top, the challenge is to manage the amount of backfilling when adding new OSDs.
One way of doing this is to make sure that all new OSDs get an initial weight of 0.
This ensures that Ceph doesn’t start shuffling around data right away when we introduce the new OSDs.

The ceph-osd charm has a handy configuration option called crush-initial-weight which allows us to set this easily across all OSD hosts:

juju config ceph-osd crush-initial-weight=0
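You can read the value back to confirm it took effect:

juju config ceph-osd crush-initial-weight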

Note: There was a bug in the ceph-osd charm before revision 261 which did not render the correct configuration when setting crush-initial-weight=0.
Here’s a workaround for those on older revisions:
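One option (a sketch, assuming your charm revision supports the config-flags option) is to inject the setting into ceph.conf yourself:

juju config ceph-osd config-flags='{"osd": {"osd crush initial weight": 0}}'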

Add new OSDs to the cluster

With the initial weight in place, new OSD hosts can be added with juju add-unit ceph-osd without kicking off any backfilling.
If you only want to add OSDs (drives) to existing Ceph OSD hosts, you can instead use the osd-devices configuration option.
Here's an example for this test cluster which adds a new directory as a new OSD on all ceph-osd hosts:
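Both variants sketched (the directory path is illustrative; note that osd-devices takes the full whitespace-separated list, so keep any existing entries):

juju add-unit ceph-osd
juju config ceph-osd osd-devices="/srv/ceph-osd /srv/ceph-osd2"

Clear crush initial weight

Once the new OSDs have joined the cluster at weight 0, you can clear the crush-initial-weight option again so that any future OSDs come up with their normal default weight. Resetting the option to the charm's default should do the trick:

juju config ceph-osd --reset crush-initial-weight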

Reweight new OSDs

There are a few ways to bring the new OSDs up to their target weight: reweight individual OSDs with ceph osd crush reweight, reweight a whole host bucket at once with ceph osd crush reweight-subtree, or decompile, edit and recompile the CRUSH map yourself.
The last option is good for manually controlling the CRUSH map with automation, version control etc.

The main point is that you want to increment the weight of the OSDs in small steps in order to control the amount of backfilling Ceph will have to do.
As with all things Ceph, this will of course depend on your cluster, so I highly recommend you try these things in a staging cluster and start small.

For our test, let’s use the reweight-subtree command with a weight of 0.01 so we can reweight all the OSDs of the new Ceph OSD host:
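A sketch of that command; the host bucket name is a placeholder, so substitute the hostname of the new host as shown in ceph osd tree:

juju ssh ceph-mon/0 "sudo ceph osd crush reweight-subtree <new-osd-host> 0.01"

From here you can keep incrementing the weight in small steps, keeping an eye on ceph -s between each step, until the new OSDs reach the same weight as the rest of the cluster.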

And there you go.
All the new OSDs have been introduced to the cluster and weighted correctly.

Afterword

Thanks to all the OpenStack Charmers for creating and keeping all of these charms in great shape.
Also thanks to Dan van der Ster and the storage folks at CERN for the tools and many great tips on how to run Ceph at scale.
Finally, many thanks to Oddgeir Lingaas Holmen for helping write and clean up these posts.

Upgrading Juju from 2.1 to 2.3

I recently spent some time upgrading our Juju environments from 2.1 to 2.3. Below are a few lessons learned, aimed at other Juju enthusiasts doing the same.

First, Juju is a cool controller- and agent-based tool from Canonical to easily deploy and manage applications (called Charms) on different clouds and environments (see how it works for more details).

We run an academic cloud, HUNT Cloud, where we utilize a highly available Juju deployment, in concert with MAAS, to run things like OpenStack and Ceph. For this upgrade, we were looking forward to some of the new features such as cross model relations and overlay bundles.

How to upgrade Juju (for dummies)

Upgrading a Juju environment is usually a straightforward task completed with a cup of coffee and a couple of commands. The main steps are:

1. Upgrade your Juju client (the client talking to the Juju controllers, usually on your local machine, with apt upgrade juju or snap refresh juju)

2. Upgrade the controller model (juju upgrade-juju --model controller)

3. Upgrade your workload models (juju upgrade-juju --model <model-name>)
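Before touching anything, it's worth confirming where you actually are. A quick sanity check could look like this:

juju version
juju status --model controller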

Issue No. 1
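The first controller upgrade was kicked off without pinning a version, along these lines:

juju upgrade-juju --model controller

Left to pick a version on its own, Juju went for the next available release rather than the latest.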

Now, if you look closely, that upgrade landed on 2.2.9, not 2.3.2, which was the latest version at the time and the one I actually wanted.
Well, the upgrade to 2.2.9 succeeded, so I continued by running juju upgrade-juju --model controller once more to reach 2.3.2.

This time things didn't go as smoothly for the controllers; they got stuck upgrading, which rendered the environment unusable.
It did however produce some rather bleak yet humorous error messages.

I was able to reproduce this in one of our larger staging areas and the bug got fixed in 2.3.3 in lp#1746265.

Issue No. 2

So, after getting stuck with the issue above, I was encouraged to try upgrading straight to 2.3.2, skipping 2.2.9 altogether.
Juju allows you to specify the target version using the --agent-version flag.
The command you end up with is juju upgrade-juju --model controller --agent-version 2.3.2.

Sticking to good form and the rule of three, the controllers got stuck upgrading, rendering the environment unusable once again.
Fortunately, it was easy to reproduce both in our staging area and on local LXD deployments so this one also got fixed in 2.3.3 in lp#1748294.

Issue No. 3

We gave the upgrade a new try when version 2.3.4 rolled around late in February.
Things looked good after multiple runs in staging, so I finally upgraded one of our production controllers using juju upgrade-juju --model controller --agent-version 2.3.4.

The upgrade process took around 15 minutes. After a lot of logspam in the controller logs and some unnerving error messages in the juju status --model controller output, things seemed to settle.
Almost.
We noticed charm agent failures and connection errors between the controllers and a small number of the applications in the main production Juju model containing our OpenStack and Ceph deployments.

After filing lp#1755155, I was advised to push on and upgrade the Juju model even though some of the charms had errored out.
This approach resolved the connection errors.

The root cause was most likely lp#1697936 which was reported last year.
It turned out 2.1 agents could fail to read from 2.2 and newer controllers.
I did eventually find a mention of the bug in the changelog for 2.2.0; however, the description did not contain the error messages, so my searches on Launchpad kept coming up empty.

Upgrading the model with juju upgrade-juju --model openstack --agent-version 2.3.4 and restarting the affected agents finally did the trick and all components were running smoothly on 2.3.4.
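For reference, restarting a stuck unit agent on a systemd-based machine looks something like this (the unit name is illustrative; the service name follows the jujud-unit-<application>-<number> pattern):

juju ssh ceph-osd/3 'sudo systemctl restart jujud-unit-ceph-osd-3'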

Afterword

Now you might rightfully ask, Sandor, why on earth didn’t you just upgrade the model right away as described in step 3?
Well, I simply became a bit wary of proceeding without any easy way to roll back after running into all the previous bugs where things got stuck.