Minimizing maintenance time while updating thousands of Acquia Cloud Site Factory sites

Note (January 2016): This blog post was originally written about drupalgardens.com and first published on April 06, 2010, but the same basic strategy for updating sites is still used in the Acquia Cloud Site Factory (ACSF) platform that emerged from it. Some specific details in the original post are outdated (for example, we now use git instead of svn), but ACSF still updates sites by using two docroots with different code versions and two different vhost ports to choose between them.

In his Drupal Gardens two-month update, Chris wrote about a number of the improvements made to Drupal Gardens in our last sprint (see archived blog post). I want to focus on our solution for a system that now lets us perform Gardens site upgrades with only a couple of minutes of maintenance (site offline) time per site when running database updates. The problem was made a little more difficult by the custom domain feature that was going live at the same time, which means each Drupal site (database) might be referenced by multiple sites directories. I describe here the solution we worked out, which involves using Drush, the (internal) Acquia Hosting APIs, and communication with the http://www.drupalgardens.com/ site so that each site in our multi-site installs can be seamlessly moved between two different versions of Drupal 7.

Prior to this new system, the early beta testers sometimes experienced a fairly long period with their site in maintenance mode during Drupal Gardens upgrades. The delays arose because we use Drupal's multi-site feature to run thousands of Drupal Gardens sites off the same code base. The multi-site configuration gives us better PHP opcode performance, and we also decided it was the easiest configuration for us to maintain. However, when we deployed new code, all sites immediately started using it while their databases were not yet updated, so the safest approach was to take all sites offline.

Drupal Gardens is deployed on top of Acquia Hosting in a high-availability configuration in which (at the moment) each site is served by redundant load balancers, three or more web servers, and a cluster of database servers. At the time of this project, the Aegir Hosting and Provision projects were not suited to work in such an environment, so we have our own system that uses Acquia Hosting's internal API calls to provision new databases, and Drupal Gardens-specific code to deploy them. However, Drush (an integral component of Aegir) was the obvious tool for running database updates and other commands across a large number of sites.

For each multi-site installation the load balancer has two name-based virtual host (vhost) configurations. Each of those vhosts directs requests to the same set of Apache web servers, but on different ports. Apache is configured with a vhost for each port that has a different document root (docroot). So, a host name that matches vhost #1 on the load balancer will use the Drupal code present in docroot #1 on the web server.
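The routing described above can be pictured as two small lookup tables. The following is only an illustrative sketch, not the actual load balancer or Apache configuration: the port numbers, docroot paths, host names, and the LoadBalancer class are all hypothetical and chosen purely to show how a host name resolves to a docroot.

```python
# Illustrative model only: ports, paths, and host names are hypothetical.
from dataclasses import dataclass, field

# Web server side: one Apache vhost per port, each with its own docroot.
PORT_TO_DOCROOT = {
    8081: "/var/www/docroot1",
    8082: "/var/www/docroot2",
}


@dataclass
class LoadBalancer:
    """Maps host names to a vhost, and each vhost to a backend port."""
    vhost_ports: dict = field(default_factory=lambda: {1: 8081, 2: 8082})
    host_to_vhost: dict = field(default_factory=dict)

    def assign(self, hostname: str, vhost: int) -> None:
        # Changing this one mapping is the only step needed to move a host.
        if vhost not in self.vhost_ports:
            raise ValueError(f"unknown vhost {vhost}")
        self.host_to_vhost[hostname] = vhost

    def docroot_for(self, hostname: str) -> str:
        # Host name -> vhost -> port -> docroot.
        port = self.vhost_ports[self.host_to_vhost[hostname]]
        return PORT_TO_DOCROOT[port]


lb = LoadBalancer()
lb.assign("example.drupalgardens.com", 1)
lb.assign("www.example.com", 2)
print(lb.docroot_for("example.drupalgardens.com"))
print(lb.docroot_for("www.example.com"))
```

In this model, moving a host name between code versions touches only the host-to-vhost table; nothing on the web servers changes.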

Having an easy way to route requests to a different docroot based on the host name is the key to speeding up the process. Obviously there are other ways we could have achieved this, including name-based vhosts on the web servers or virtual document root configurations for the web servers. By limiting the configuration change to the load balancer, however, we get faster changes that are easier to verify, and we can easily put the two docroots on different web servers if desired.

From there the rest of the process is mostly coordination, including a few Drush commands run from within docroot #1 on one of the web servers.

1. Update the svn repository for the Drupal Gardens code and tag the new code.

2. Prepare a drushrc file so that there is just one alias for each Drupal Gardens site, even if it has multiple host names configured. The information for this comes via an API call to http://www.drupalgardens.com/ that returns a list of all sites and the associated host names for each.

3. A custom Drush command calls the hosting API to move all domain names from vhost #1 to vhost #2 on the load balancer. This means they all use docroot #2, which has the current code.

4. Switch docroot #1 to use the updated code corresponding to the tag in svn.

5. Run a wrapper script that iterates through the site aliases and, for each site, invokes the following:

   a. A custom Drush command puts the site into maintenance mode and causes a message to appear for the site owner with the estimated offline time.

   b. A custom Drush command calls the hosting API to move the domain names for that one site from vhost #2 to vhost #1, and hence requests go to docroot #1 with the new code.

   c. Standard Drush commands run the database updates and then clear all caches.

   d. A custom Drush command ensures the site is out of maintenance mode.

   The output of all these steps, and whether they succeeded or registered any failures, is logged in case follow-up action is needed.

6. Finally, switch docroot #2 to the new code to be ready to start the cycle again.
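The sequence above can be modelled as a small simulation. This is a sketch under stated assumptions rather than the actual Drupal Gardens tooling: the real steps are Drush commands and hosting API calls, whereas the Site and Platform classes, the rolling_update function, and the host names below are hypothetical stand-ins. It also assumes that a site whose update fails stays in maintenance mode for follow-up, which the original process does not specify.

```python
# Simulation only: real steps are Drush commands and hosting API calls.
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    hostnames: list           # several host names can share one database
    schema: str = "old"       # stands in for the database schema version
    maintenance: bool = False


@dataclass
class Platform:
    docroots: dict = field(default_factory=lambda: {1: "old", 2: "old"})
    vhost_of: dict = field(default_factory=dict)   # host name -> vhost number

    def move_hosts(self, hostnames, vhost):
        for hostname in hostnames:
            self.vhost_of[hostname] = vhost

    def code_for(self, hostname):
        return self.docroots[self.vhost_of[hostname]]


def rolling_update(platform, sites, new_tag):
    log = []
    # Move every host name to vhost #2; docroot #2 still has the current code.
    for site in sites:
        platform.move_hosts(site.hostnames, 2)
    # Docroot #1 now serves no traffic, so it can switch to the new code.
    platform.docroots[1] = new_tag
    for site in sites:
        try:
            site.maintenance = True                 # site goes offline
            platform.move_hosts(site.hostnames, 1)  # all its host names move together
            site.schema = new_tag                   # stands in for database updates
            site.maintenance = False                # site comes back online
            log.append((site.name, "ok"))
        except Exception as exc:
            # Assumption: a failed site stays in maintenance mode for follow-up.
            log.append((site.name, f"failed: {exc}"))
    # Docroot #2 is idle now; switch it so the next cycle can start.
    platform.docroots[2] = new_tag
    return log


sites = [
    Site("alpha", ["alpha.drupalgardens.com", "www.alpha.example"]),
    Site("beta", ["beta.drupalgardens.com"]),
]
platform = Platform()
log = rolling_update(platform, sites, "new")
print(log)
```

The point of the model is that only one site is offline at any moment, and each site's host names switch to the new code only while that site is in maintenance mode.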

Keeping current with Drupal code and database updates is key for the long-term success of any Drupal project. As we scale Drupal Gardens from thousands of sites today, to hundreds of thousands and maybe to millions, we will continue to refine approaches like this that minimize the impact of maintenance on Drupal Gardens sites and users.
