Recent Downtime

It looks like we ran out of both RAM and swap space, causing the system to lock up completely. I think this was due to a combination of factors. First, we'd been running pretty low on RAM recently. Second, the nightly backup process kicked in, running gzip on some large (GB+) files. Finally, memcached seems to have run up to about 280MB of memory usage (despite my attempt at configuring it to use only 64MB).
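One way to keep a backup from tipping a low-memory box over the edge is to check the remaining RAM and swap before starting it. Here is a rough Python sketch of that idea, assuming a Linux system that exposes /proc/meminfo. The function names and the 512MB threshold are made up for illustration and are not part of the current setup.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def headroom_mb(info):
    """Estimate remaining RAM plus free swap, in MB."""
    available = info.get("MemAvailable")
    if available is None:
        # Older kernels do not report MemAvailable; approximate it.
        available = (info.get("MemFree", 0)
                     + info.get("Buffers", 0)
                     + info.get("Cached", 0))
    return (available + info.get("SwapFree", 0)) / 1024


def safe_to_start_backup(info, min_headroom_mb=512):
    """Hypothetical policy: only start the backup above a headroom threshold."""
    return headroom_mb(info) >= min_headroom_mb
```

On the server, a wrapper around the backup job would read /proc/meminfo, pass its contents to parse_meminfo, and skip or delay the gzip step when safe_to_start_backup returns False.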

At any rate, these stresses put serious strain on the overall system by, I believe, slowing down DB access considerably, which resulted in the DB maxing out its available connections. This killed the C Board and also led to a ton of open HTTP connections. I suspect that some combination of all of these activities created a cascading failure that forced a server reboot.

It isn't clear whether this would have happened had vBulletin 3 still been installed (that is, whether it was caused by memcached pushing memory usage over the top or by the extra processing required for vBulletin 4), but based on what I was seeing in top, I suspect that adding more RAM to the system is a good step toward solving this problem. We're currently running with only 2GB of RAM and 2GB of swap.

It looks like tonight's downtime happened at about the same time as last night's, so my theory is that both were triggered by the nightly backups. Fortunately, another reboot solved the issue, and I have a few other ideas about why it might have happened again. For example, the nightly backups currently need to deal with a much larger amount of data, since I have a 1.6GB dump of the C board as a backup prior to the vBulletin upgrade. I can easily move this out of the backup dir and reduce the size of the backup files a lot.
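To find files like that dump before they bloat the nightly run, something along these lines could list everything over a given size under a directory. This is only a sketch: the function name is hypothetical, and the directory and size cutoff are whatever you pass in.

```python
import os


def large_files(root, min_bytes):
    """Return (size, path) pairs for files under root of at least min_bytes, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

Calling large_files on the backup directory with a cutoff of, say, 1024 ** 3 bytes would list anything of 1GB or more, which can then be moved elsewhere before the backup job runs.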