Tackling the glibc vulnerability

I had only been in my current role for a few months when the glibc vulnerability reared its ugly head. Without going into the specifics, here's what the vulnerability really meant.

It was discovered that the GNU C Library incorrectly handled receiving
responses while performing DNS resolution. A remote attacker could use this
issue to cause the GNU C Library to crash, resulting in a denial of
service, or possibly execute arbitrary code.

Puppet

To patch it, we had to roll out the update using Puppet, which was easy enough. It was just a case of using the Puppet package resource and applying it to all nodes, typically via a shared module:

package { 'libc6':
  # Placeholder: pin this to the patched package version for your release
  ensure => 'specific_package_release',
}

Restarting all nodes

Because the bug affects such a fundamental library in a *nix system, the only way to fully resolve it is to restart the system after patching. The problem is that, while the patch is rolling out, you don't know which nodes still need rebooting and which are already safe. This is where checkrestart comes to the rescue.

Checkrestart

checkrestart is part of the debian-goodies package and shows which processes need restarting after an upgrade. Using the tool, I could figure out which machines still needed rebooting and which had already been upgraded and bounced.
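To give a rough idea of how that output can be turned into a yes/no answer per machine, here's a minimal Python sketch. It assumes checkrestart prints a summary line of the form "Found N processes using old versions of upgraded files"; the sample text and the function names are mine for illustration, and the exact wording may differ between versions.

```python
import re

# Assumed sample of checkrestart's summary output; the real text may vary by version.
SAMPLE_OUTPUT = """\
Found 3 processes using old versions of upgraded files
(2 distinct programs)
(2 distinct packages)
"""


def stale_process_count(output: str) -> int:
    """Return how many processes the summary line reports (0 if no such line)."""
    match = re.search(r"^Found (\d+) processes", output, re.MULTILINE)
    return int(match.group(1)) if match else 0


def needs_attention(output: str) -> bool:
    """True if at least one process is still running against old files."""
    return stale_process_count(output) > 0


if __name__ == "__main__":
    print(stale_process_count(SAMPLE_OUTPUT), needs_attention(SAMPLE_OUTPUT))
```

On a real node the text would come from running checkrestart as root, for example with subprocess.run(["checkrestart"], capture_output=True, text=True), rather than from a hard-coded sample.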

Sensu / Alerta

For monitoring in my current role we use Sensu, and for alerting we use Alerta. I found a community Sensu check called check-process-restart.rb that uses checkrestart and alerts if a node has more than a given number of processes that need restarting.
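The threshold idea behind a check like this is simple enough to sketch in a few lines of Python. This is not the community check's code, just an illustration: the function name and default thresholds are hypothetical, and the only thing borrowed from Sensu is the Nagios-style exit status convention (0 for OK, 1 for warning, 2 for critical).

```python
# Nagios-style exit statuses, which Sensu checks also use.
OK, WARNING, CRITICAL = 0, 1, 2


def check_status(stale_processes: int, warn: int = 1, crit: int = 2) -> int:
    """Map a count of processes needing a restart to a check status.

    The warn/crit defaults are hypothetical; tune them to your own tolerance.
    """
    if stale_processes >= crit:
        return CRITICAL
    if stale_processes >= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    for count in (0, 1, 5):
        print(count, check_status(count))
```

A real check would exit with the returned status so that the monitoring system can raise the matching alert.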

Once all the nodes were subscribed to the check (and by doing so making our alerting super angry) I could identify which systems needed rebooting and which had been done already. You know. Because screw doing anything manually.

Docker images and base AMIs

Due to the nature of Docker, it was pretty easy to patch the base images and roll out updated containers.

As for our base AMIs, we just had to rebuild them using our CI setup (Jenkins) and then let autoscaling do the rest for us.

Learning the infrastructure

As painful and laborious as the entire patching and reboot saga was, it was a great way to get familiar with the new infrastructure I was then managing. I had only been in my new role for about 3 months, so it gave me the opportunity to go through the many Puppet manifests and corresponding modules. From this I could gain an insight into what was actually business critical and what was just a massive ballache to mess with.

Following on from this, it has also given me a few ideas as to what the SPOFs are, and allowed me to look into remedying them.

All in all, it was actually a very useful exercise and, in retrospect, I'm quite glad it happened.