I'm currently trying to spec up a horizontally scalable cluster for a Drupal-based web app that looks something like the colorful diagram below:

The load balancer implements sticky sessions, so a user keeps state once they've been allocated a server to work with.

Each app server has the following:

varnish at the front

Drupal 6 in the middle, running on the LAMP stack

memcached at the back

The two MySQL database servers are on a shared IP, and they're in an HA cluster with DRBD and Heartbeat, so that losing one won't bring down the whole platform.

There are a few things I'm not certain about that I'd appreciate your opinions on:

How should file storage scale horizontally?

I'm thinking of using NFS to mount a shared files directory on each app server, so a file uploaded in one place is available on all of them. I'm leaning towards NFS because it's been around for ages and we've used it before, whereas I've no experience with MogileFS or GlusterFS.

Are there any guidelines for working out how many servers it's wise to share a directory with over NFS this way?

How should HA be provided on the shared file storage here?

One problem here is that the NFS server is a single point of failure.

We're already using Heartbeat and DRBD on the MySQL servers, and I'd prefer to keep the number of technologies involved in the stack as low as possible. What pitfalls would there be if I were to use the same HA strategy for the file servers too?

An alternative approach

This is for an internal-facing site with a finite number of users, who occasionally use the site very intensively for short periods when an internal initiative is on. So this doesn't need to scale infinitely like some startup.

Given that

there is an upper limit to traffic we can expect

adding HA to the file servers, and designing a setup to scale horizontally like this, introduces considerable complexity

I'm also considering just making the two web servers beefier so that they would handle the peak load between them, and setting up unison or rsync across the two on a cron job, so that:

the files are still in sync (sticky sessions keep a user on the same server they uploaded a file to)

losing one means the site is still operational.

Does this sound like a reasonable way to avoid the NFS/DRBD HA complexity headaches?
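To make the rsync option concrete, here is a minimal sketch of the command each web server could run from cron to push its uploads to the other one. The directory `/var/www/files` and the host name `web2.internal` are placeholders I've made up for illustration, not part of the real setup.

```python
import shlex


def build_rsync_command(local_dir, peer_host, remote_dir):
    """Return the rsync argv that pushes local_dir to the peer server."""
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = local_dir.rstrip("/") + "/"
    destination = f"{peer_host}:{remote_dir.rstrip('/')}/"
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --update skips files that are newer on the receiving side.
    return ["rsync", "-az", "--update", source, destination]


if __name__ == "__main__":
    # Hypothetical path and host name, for illustration only.
    cmd = build_rsync_command("/var/www/files", "web2.internal", "/var/www/files")
    print(shlex.join(cmd))
```

Each server would run the mirror image of this against its peer. Note that a push like this never propagates deletions, and adding `--delete` to a two-way push risks removing files the other side has just received, so deleted files would need separate handling.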

3 Answers

The NFS server will need at least the same provisioning as the MySQL server, since they have basically the same function and limitations (both are places you write data to). I don't like the idea of multiple writers to NFS: it makes managing file locks very complex, and my own experience with it didn't go well.

My suggestion would be to concentrate all the writes on one of the app servers (maybe have one app server dedicated to writing to the NFS server) and have multiple reader app servers mount it read-only (I know that Drupal has some dynamic thumbnails that need to be written, but you can keep most of it on an RO filesystem). You will need at least a second NFS server to ensure HA (DRBD is the best choice here if you don't have shared storage like a SAN).
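As a rough sketch of the single-writer layout, the helper below renders the `/etc/fstab` line for each app server, mounting the export read-write on the writer and read-only everywhere else. The server name `nfs1.internal` and both paths are hypothetical, and `hard` is just one reasonable mount option, not a tuned recommendation.

```python
def fstab_entry(server, export, mount_point, writable=False):
    """Render one /etc/fstab line for an NFS mount.

    Only the dedicated writer should pass writable=True; every other
    app server gets a read-only mount.
    """
    mode = "rw" if writable else "ro"
    # Fields: device, mount point, fs type, options, dump, fsck pass.
    return f"{server}:{export} {mount_point} nfs {mode},hard 0 0"


if __name__ == "__main__":
    # Hypothetical server name and paths, for illustration only.
    print(fstab_entry("nfs1.internal", "/export/files", "/var/www/files", writable=True))
    print(fstab_entry("nfs1.internal", "/export/files", "/var/www/files"))
```

The directory Drupal writes generated thumbnails to would still need a writable location on the reader servers, for example a local directory outside the read-only mount.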

The best way is to find a good storage solution. Depending on the scale and type of application, you can use a good NAS with support for NFS and at least two gigabit ports and two power supplies (check out some enterprise solutions).

If you're really serious about your application, your best bet is to look at some SAN solutions, but this might be very expensive as it requires special hardware (it can be done with off-the-shelf hardware, but it might be too slow).