Scaling A Drupal Community With Authenticated Users

This is a guest post written by Stephen Pope, Partner at Project Ricochet, a Rackspace partner and a full-service web development firm that specializes in Drupal development and responsive web design.

When we set out to build a new social community using Drupal during a recent project, the question of how to scale dominated the technical discussions. Scaling has become an amorphous topic, and people get lost trying to think through every possible pain point. Because funds are often limited in a startup, the trick is to find a balance between cost and capacity, and we found that balance in Drupal and Rackspace services.

We attacked the project by breaking it down into the fundamental pieces:

Web servers, dynamic content (HTML)

Databases

Static assets (images, css, javascript)

Handling growth

We think of things on a very practical level. We could nerd out on any of the subjects above, but we know our time and our developers’ time is valuable. With a focus on using off-the-shelf software, a bit of research and a hosting partner like Rackspace, we can realize huge cost savings and eliminate some unnecessary complications. In our project, we started with hosting as the place to make smart choices that maximize our scalability and cost savings in the long term. In general, we find that this saves us from having to implement custom solutions before we need them (if in fact we ever do!).

Web servers, dynamic content

Using load balancers

Our scaling challenge in this project involved authenticated users. You can’t cache the majority of content like you can for a website with anonymous users. It’s very common for anonymous sites to use a front end cache proxy, such as Varnish. In our case, each page’s content is different because its content revolves around the user in question.

The memory requirements for PHP (and Drupal more specifically) can be quite high: with even just a few contributed modules, a single Drupal page request could hog 128 MB or more per process. If you have continuously heavy traffic, you could bottleneck and dogpile fairly quickly.

Load balancing isn’t always straightforward, but with Drupal it’s fairly transparent. You simply have to decide how many Apache processes you’ll need by using your worst case PHP memory usage (a limit set in your php.ini file), then divide that into the memory size you have available on your cloud server (after taking into account overhead of services like Apache, MySQL, Varnish or Tomcat). As you grow, you’ll need X number of www cloud servers to handle the given load. At that point, it’s simple arithmetic.
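To make that arithmetic concrete, here is a minimal sketch in Python. The server size, overhead and peak-traffic numbers are illustrative assumptions, not measurements from our project; only the 128 MB per-process figure comes from the discussion above. The function names are hypothetical.

```python
import math


def max_apache_workers(server_ram_mb: int, overhead_mb: int, php_memory_limit_mb: int) -> int:
    """Return how many Apache/PHP processes fit if each one hits the PHP memory limit."""
    usable_mb = server_ram_mb - overhead_mb
    if usable_mb <= 0 or php_memory_limit_mb <= 0:
        raise ValueError("usable memory and PHP memory limit must be positive")
    # Integer division: a partial process does not count.
    return usable_mb // php_memory_limit_mb


def servers_needed(peak_concurrent_requests: int, workers_per_server: int) -> int:
    """Return how many www servers are needed to cover the peak concurrency."""
    if workers_per_server <= 0:
        raise ValueError("workers_per_server must be positive")
    return math.ceil(peak_concurrent_requests / workers_per_server)


# Assumed example: a 4 GB cloud server, 1 GB reserved for other services,
# and a 128 MB PHP memory limit.
workers = max_apache_workers(server_ram_mb=4096, overhead_mb=1024, php_memory_limit_mb=128)
print(workers)                       # 24 processes per server
print(servers_needed(100, workers))  # 5 servers for 100 concurrent requests
```

Sizing against the worst case is deliberately conservative. Most requests will use less than the limit, so treat the result as a safe ceiling for your Apache process count and not as a prediction of typical usage.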

Handling Sessions, caching (and more)

MongoDB

If we’re going to stretch MySQL as far as we can, we need to reduce and focus the work that we ask MySQL to accomplish. By default, Drupal stores its sessions and cache data in MySQL. It works, but this data is used often and the tables can grow large, even though they have only a simple index and a column or two. Other services like MongoDB and Memcache are perfect replacements for this commonly used data because they are specialized for the simple key-based operations it needs.
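The access pattern behind session and cache data is simple: look up a value by key, and fall back to the expensive source only on a miss. The sketch below illustrates that pattern in Python with an in-memory class standing in for a service like Memcache. It is not Drupal's implementation or any real client library; `TTLStore` and `cached` are hypothetical names used only for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStore:
    """In-memory stand-in for a key-value service with expiring entries."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped and reported as a miss.
            del self._data[key]
            return None
        return value


def cached(store: TTLStore, key: str, ttl_seconds: float, load: Callable[[], Any]) -> Any:
    """Return the cached value for key, calling load() only on a miss."""
    value = store.get(key)
    if value is None:
        value = load()
        store.set(key, value, ttl_seconds)
    return value
```

Every operation here is a single lookup or write by key, with no joins and no complex queries. That is why moving this traffic off MySQL frees it up for the relational work it is actually good at.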

Databases

MySQL

Scaling your MySQL database might be one of the harder parts of this project. There isn’t going to be a one-size-fits-all solution. Typically a multiple read, single write setup will take you pretty far. The basic idea is that your www servers can request information from any number of read nodes, and writes are directed to a single server.
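As a rough illustration of the multiple read, single write idea, here is a small Python sketch that picks a database host for each query: write statements go to the one write server, and everything else rotates across the read nodes. The host names and the `ReadWriteRouter` class are hypothetical, and the keyword check is a simplification. This is not how Drupal itself routes queries.

```python
import itertools
from typing import Iterator, List

# Statements treated as writes in this sketch (an assumed, non-exhaustive list).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP", "TRUNCATE"}


class ReadWriteRouter:
    """Send writes to a single server and spread reads across read nodes."""

    def __init__(self, write_host: str, read_hosts: List[str]) -> None:
        if not read_hosts:
            raise ValueError("at least one read host is required")
        self.write_host = write_host
        # Round-robin over the read nodes.
        self._reads: Iterator[str] = itertools.cycle(read_hosts)

    def host_for(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in WRITE_VERBS:
            return self.write_host
        return next(self._reads)


router = ReadWriteRouter("db-write", ["db-read-1", "db-read-2"])
print(router.host_for("UPDATE users SET name = 'x' WHERE uid = 1"))  # db-write
print(router.host_for("SELECT * FROM node"))                         # db-read-1
```

A real setup also has to account for replication lag, since a read node may briefly return stale data right after a write, and for transactions, which this sketch ignores.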

Add a storage array to your cloud server

One of the typical drawbacks of cloud hosting is the shared I/O devices such as the hard drive. If you have a busy neighbor processing a lot of files, your application may suffer.

Using Rackspace’s Cloud Block Storage, you can get a dedicated hard drive for your MySQL server. You can even add an SSD drive to ensure blazing fast access to your data. It will also help extend the life of your MySQL server as you grow. SSD is recommended for high volume sites; however, you can start off with a traditional drive and move to SSD later if you’d like to keep costs down. The main point is to get a dedicated I/O device so you don’t need to share resources.

Static Assets and Cloud Files

Handling a large volume of uploaded images

In our project, our specific web community had a unique challenge: users could upload extremely large image files that could be shared and sold. There were a few main challenges:

Your typical server simply didn’t have enough hard drive space to hold all the images, especially with five to six resized versions of each image.

Users across the globe needed to be able to download the images from servers in their own geographic region.

We needed to share uploaded images across multiple web servers.

The images were so large that serving them required a lot of bandwidth on the server.

Apache processes ended up handling a large number of image requests that they didn’t need to serve.

As images are uploaded into the Drupal system (where they would normally be stored in /sites/My_Site/files), they are instead transferred to the blazing fast Rackspace CDN. The module will seamlessly serve URLs from Akamai, a world leader (and partner of Rackspace) in distributed file hosting. Not only will you not require large amounts of hard drive space on your servers, but your assets will be spread across the globe and served up from servers closest to where they’re being requested.

You’ll reduce load, stress and bandwidth on the Apache server as well, allowing each to dedicate itself to processing the dynamic parts of your site.
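Conceptually, serving assets from a CDN comes down to rewriting a local file path into a URL on the CDN's hostname. The Python sketch below shows that mapping only. It does not reflect how the Drupal module works internally, and `cdn_url` and `https://cdn.example.com` are hypothetical placeholders.

```python
from urllib.parse import quote


def cdn_url(local_path: str, files_root: str, cdn_base: str) -> str:
    """Map a file under the local files directory to its URL on the CDN."""
    root = files_root.rstrip("/") + "/"
    if not local_path.startswith(root):
        raise ValueError("path is not inside the files directory")
    relative = local_path[len(root):]
    # Percent-encode the path but keep the slashes between directories.
    return cdn_base.rstrip("/") + "/" + quote(relative)


print(cdn_url(
    "/sites/My_Site/files/styles/large/photo.jpg",
    "/sites/My_Site/files",
    "https://cdn.example.com",  # hypothetical CDN hostname
))
# https://cdn.example.com/styles/large/photo.jpg
```

Once pages link to URLs like this, the browser fetches images directly from the CDN and the request never reaches your Apache servers.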

Handling Growth

Adding more nodes to your server configuration

Now that you’ve set up your basic architecture, most of the leg work as you start to grow will involve adding more servers to the mix. Rackspace lets you clone any of your servers by making a virtual image of that server, then creating new server clones from that image.

Before you make your server images

How should you start? I recommend a larger number of smaller nodes (as opposed to fewer higher power instances) – even as you continue to grow. The minimum setup would probably look like this:

Two Apache/www servers

One Cloud load balancer

Can handle 32 nodes

One Cloud MySQL write server

Dedicated storage array

One Cloud MySQL read server

Storage array optional at first

One Backend cache server

MongoDB

MemCache

You may wonder, “Why not start with all the services on a single cloud server if I don’t have the traffic yet to justify breaking services up into additional servers?”

Well, sure, you could stack all these services on a single “larger” cloud server until you get more traffic, but that’s not always as simple as you might think (as we all know, it’s *never* as simple in practice as you might think beforehand). The point is to set up something that can scale without a lot of rework, without a complicated and tedious migration, and maybe even without a developer’s or sysadmin’s help. By going with more servers from the get-go, you’ve separated the various concerns into distinct areas. Trouble spots or bottlenecks are much easier to spot, you become less dependent on a single cloud server’s performance, and you can add servers to the areas where your site actually needs more juice.

Conclusion

We use Rackspace for most of our client projects. We find that the versatility of the platform and the multitude of affordable services help us serve all manner of clients, large and small. We host our own internal projects and websites on Rackspace too. By following the rough guidelines I outlined above, you too can partner with Rackspace for highly scalable Drupal applications.

And if you have any questions, do what we do – hit Rackspace up on chat. They are always helpful and willing to walk you through complicated implementations or services, even for cloud products (their least expensive tier of services).

As the web becomes a larger and larger part of our lives and day to day services, scalability will continue to grow in importance. Just remember the fundamentals outlined above and you’ll be just fine.

About the Author

This is a post written and contributed by Brian White.

Brian joined Rackspace in 2007 and has been working with Rackspace partners and resellers since 2009. Brian enjoys learning about his partners' business and working with the partners on joint business development efforts. When he isn’t working on Channel initiatives, Brian likes spending time with his wife and daughters, hiking and listening to Red Dirt country music.

To learn more about Project Ricochet’s Drupal Development services, feel free to give us a visit!

Pat

Very interesting article about Drupal scaling, but somehow it got me worried: I might be wrong, but after browsing the Ricochet client page I was under the impression that the community project with a lot of file uploading you are talking about should be bluecanvas.com. Looking at the Alexa traffic indicator (I know it’s only a rule of thumb), this site shows something like 3000 visitors a day. So this is not such heavy traffic to handle, and if Drupal requires such an architecture to handle just that, I’m worried for my project that has to deal with 100K registered visitors a day…