Month: December 2008

Running a system off of one hard drive is just asking for trouble. Hard drives are among the components most likely to fail in a system. If your system is running off of a single drive and that drive fails, the results could be devastating. Even with a good, current backup, if a drive dies, your system will be down until you can figure out how to reload the OS and restore your data. Software RAID is a good way to keep your system available when a drive decides to fail. I’ve personally had more success dealing with failures on software RAID than I have with any hardware RAID product.

There are a number of situations where you might have to migrate a running system to a RAID array. Doing so is relatively risky, but certainly doable. In my case, I have a client with a new server at ServerBeach. The server came with two identical drives, but for some reason ServerBeach won’t install the OS to a software RAID array (probably because they use pre-built images). The OS is installed on the first drive, and the second drive is completely blank.

There are a few howtos for getting this working, and I took bits and pieces from them to make it work on CentOS 5:

First, an overview of the process:
1- Boot into your running system, install mdadm, and copy the partition table to the blank drive.
2- Create your RAID devices using just the blank drive. The arrays will run in degraded mode, because the original drive is still in use by the running OS.
3- Copy your working filesystem to the degraded RAID arrays on the blank drive.
4- Configure GRUB to boot from the mirrored drive.
5- Reboot onto the mirrored drive and make sure everything works.
6- Add the original drive to the RAID arrays to bring them to full redundancy.
7- Configure GRUB on both drives so that if one fails, the other will boot.

Falco’s guide does a very good job of walking through the whole process. I followed it, and would recommend it with just a few changes.

1- I don’t see any purpose in having a RAID1 swap partition. Make it RAID0, or just enable two independent swap partitions without RAID.

2- Don’t edit /etc/fstab and /etc/mtab on the live, working system. Edit those on the mirrored drive after the filesystem has been copied over. This leaves the working system functional if you need to fall back to it (and you probably will!).

3- The initrd image created by mkinitrd didn’t work for me, and at this point I’m not sure why. Falco’s guide says to run these commands:

This makes a backup of the existing initrd image and then builds a new one. I tried quite a few variations of the command, including pointing it at an fstab that used the software RAID arrays, but to no avail. I had to manually extract, edit, and recreate the initrd image by following step #12 in this post.

I don’t have direct access to the console, but the data center relayed the console error that included this:

From what I can tell, the initrd image runs an init script that tries to run the command ‘mount /sysroot’, and that command was failing. Without the real root filesystem mounted on /sysroot, the initrd can’t pass control over to the real system. I replaced that line with ‘mount -o defaults --ro -t ext3 /dev/md1 /sysroot’ and then manually repacked the image with cpio and gzip. From there I was able to boot off of the mirrored drive and continue as normal.
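The init-script edit can also be scripted. The sketch below makes several assumptions. First, the image has already been unpacked into a working directory, commonly with `zcat /boot/initrd-VERSION.img | cpio -idm`. Second, the line to replace is a bare `mount /sysroot`, as in a stock CentOS 5 initrd. Third, the root array is /dev/md1, as above. It raises an error unless exactly one such line is found. Repacking is usually done from the working directory with `find . | cpio -o -H newc | gzip -9 > /boot/initrd-VERSION.img`, where VERSION is a placeholder.

```python
# Patch the init script of an already-unpacked initrd (text in, text out).
# Assumes the stock CentOS 5 line "mount /sysroot" and a root array on /dev/md1.

REPLACEMENT = "mount -o defaults --ro -t ext3 /dev/md1 /sysroot"


def patch_init(script: str, replacement: str = REPLACEMENT) -> str:
    """Swap the bare 'mount /sysroot' line for an explicit mount of the array."""
    lines = script.splitlines()
    hits = [i for i, line in enumerate(lines) if line.strip() == "mount /sysroot"]
    if len(hits) != 1:
        # Bail out rather than guess if the script doesn't look as expected.
        raise ValueError(f"expected one 'mount /sysroot' line, found {len(hits)}")
    lines[hits[0]] = replacement
    return "\n".join(lines) + "\n"


def patch_init_file(path: str) -> None:
    """Patch an init script in place (path is wherever you unpacked the image)."""
    with open(path) as f:
        patched = patch_init(f.read())
    # Reopening the existing file with "w" truncates it but keeps its mode,
    # so the script stays executable.
    with open(path, "w") as f:
        f.write(patched)
```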

I have one or two more systems to do like this, so I’m hoping to refine the process a bit and maybe figure out what went wrong with the initrd. It was educational to dig into the initrd image and learn a bit more about how a modern Linux box boots.
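The second change in the list was to edit fstab on the copied filesystem instead of the live one. A small helper can do that substitution. This is a sketch under stated assumptions. The mapping from /dev/sdaN partitions to md devices is hypothetical, and entries written as LABEL= or UUID= are left alone and need separate attention. Point it at the copy (for example a hypothetical /mnt/raid/etc/fstab), never at the live /etc/fstab.

```python
# Remap partition device names in a *copied* fstab to their md equivalents.
# DEVICE_MAP is a hypothetical layout; LABEL= and UUID= lines are not touched.

DEVICE_MAP = {"/dev/sda1": "/dev/md0", "/dev/sda2": "/dev/md1", "/dev/sda3": "/dev/md2"}


def remap_fstab(text: str, device_map: dict = DEVICE_MAP) -> str:
    """Return fstab text with matching device fields replaced."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        # Blank lines, comments and LABEL=/UUID= entries never match the map.
        if fields and fields[0] in device_map:
            fields[0] = device_map[fields[0]]
            line = "\t".join(fields)  # whitespace on rewritten lines becomes tabs
        out.append(line)
    return "\n".join(out) + "\n"
```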

The Hallmark Hall of Fame movie ‘Front of the Class’ premiered this past weekend to an expected 12-15 million viewers. We have been preparing the website (ClassPerformance.com) for the event. We expected a significant number of visitors in the 24-48 hours after the movie aired, so I did a number of things to make sure the site would run without incident during this critical time.

Move temporarily to a higher-powered server.

The site is normally hosted on an inexpensive shared-hosting plan. I’ve run some shared-hosting servers before and don’t have much faith that they would handle any significant load. They also usually don’t let you configure some of the Apache settings I describe below.

Serve images and other static content from an alternate location.

I set up a domain alias of ‘static.classperformance.com’ pointed to the same DocumentRoot as the main site. Then I edited the template files to serve most of the background, header, and footer images from that location. For normal usage, serving them from the same server works fine, but this allows the flexibility to move that static content to a separate server if/when it is needed.

I also copied the entire website to a second server and configured it so that, at any time, I could change DNS to point ‘static.classperformance.com’ at the second server and take that bandwidth off the primary server.
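Editing the templates by hand works fine, but the same change can also be scripted. This is a rough sketch. It assumes images live under a hypothetical /images/ path and are referenced as src="/images/..." or url(/images/...). Anything else, such as ordinary page links, is left untouched.

```python
import re

STATIC_HOST = "http://static.classperformance.com"

# Site-relative image references such as src="/images/x.jpg" or url(/images/x.png).
# The /images/ directory is a hypothetical layout; adjust the pattern to yours.
_IMAGE_REF = re.compile(r"""(src=["']|url\(\s*["']?)(/images/)""")


def use_static_host(markup: str, host: str = STATIC_HOST) -> str:
    """Prefix site-relative image paths with the static host."""
    return _IMAGE_REF.sub(lambda m: m.group(1) + host + m.group(2), markup)
```

Because both hostnames share one DocumentRoot, the rewritten URLs resolve immediately. They only start offloading traffic once DNS for the static name is pointed elsewhere.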

Generate static pages wherever possible.

I used wget to download everything, and then deleted the pages that needed to be parsed through PHP (e.g., contact forms). Most pages don’t change from visitor to visitor, so this works for the home page, all of the blog posts, and most other pages. Serving those pages as static files eliminates their database queries, as well as the overhead of running PHP and including multiple files.

I then added this to my Apache configuration to tell the web server to use the static content if it exists:
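The directives themselves aren't reproduced here. Setups like this commonly use a mod_rewrite RewriteCond with the -f (regular file exists) test. The decision that rule makes can be sketched in Python. The static root path, the index.html naming (which matches how wget saves directory-style URLs by default), and the /index.php fallback are all assumptions.

```python
import os
import posixpath

STATIC_ROOT = "/var/www/static"  # hypothetical location of the wget mirror


def resolve(request_path, static_root=STATIC_ROOT, is_file=os.path.isfile,
            fallback="/index.php"):
    """Return the pre-generated file for a URL path, or the PHP fallback."""
    wants_index = request_path.endswith("/")
    # normpath on an absolute path collapses "..", so requests can't climb
    # out of static_root.
    clean = posixpath.normpath("/" + request_path.lstrip("/"))
    candidate = posixpath.join(static_root, clean.lstrip("/"))
    if wants_index:
        # wget saves directory-style URLs (e.g. /blog/post/) as .../index.html
        candidate = posixpath.join(candidate, "index.html")
    return candidate if is_file(candidate) else fallback
```

The `is_file` parameter defaults to a real filesystem check but can be swapped for a set lookup when testing.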

I did some performance tests with ApacheBench (ab), and serving the static content had a dramatic effect on both speed and the number of concurrent users. There is probably a more elegant way to configure mod_cache to do something similar in a more automated fashion, but this was quick and easy, and I didn’t have to worry about checking the various HTTP headers. In my opinion, this was the single most effective change. When serving static files, Apache also correctly handles many of the HTTP headers that enable effective caching (ETag and Last-Modified, plus Expires if mod_expires is configured).
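Comparing ab runs by eye gets tedious. Below is a small parser for ab's summary lines. It assumes ab's standard text output, with lines starting 'Complete requests:', 'Failed requests:' and 'Requests per second:'. A typical invocation is `ab -n 500 -c 50 http://example.com/`, where -n is the total number of requests and -c the concurrency. Feed it output from your own runs; it makes no claims about this site's numbers.

```python
import re

# Summary lines printed by ApacheBench (ab) at the end of a run.
_AB_FIELDS = {
    "complete": (r"^Complete requests:\s+(\d+)", int),
    "failed": (r"^Failed requests:\s+(\d+)", int),
    "rps": (r"^Requests per second:\s+([\d.]+)", float),
}


def parse_ab(output: str) -> dict:
    """Extract request counts and throughput from ab's text output."""
    result = {}
    for key, (pattern, convert) in _AB_FIELDS.items():
        match = re.search(pattern, output, re.MULTILINE)
        if match:
            result[key] = convert(match.group(1))
    return result


def speedup(before: str, after: str) -> float:
    """Ratio of requests/second between two ab runs (after / before)."""
    return parse_ab(after)["rps"] / parse_ab(before)["rps"]
```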

Install a PHP accelerator

I’ve previously written about how easy eAccelerator is to install and how effective it is. There are very few scenarios where it doesn’t help. Again, ApacheBench tests easily showed a huge increase in the number of concurrent requests when eAccelerator was enabled.

Check Apache settings

On a vanilla CentOS install, Apache’s ServerLimit is set to 256, which caps the number of child processes (MaxClients can’t exceed it). Serving primarily static content will likely reduce the amount of memory each Apache child needs, which leaves room for more children. I did some quick math and figured that I could run around 800 children before memory became a concern. I also enabled KeepAlive with a very short (1 second) KeepAliveTimeout, so that sequential requests from the same visitor don’t have to open new TCP connections.
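The "quick math" amounts to dividing the memory available to Apache by the resident size of one child. Here is a worked sketch with hypothetical numbers: 4 GB of RAM, 512 MB kept for the OS and other services, and 4.5 MB per static-serving child versus 20 MB per PHP child. Measure your own per-child size, for example with `ps -C httpd -o rss`.

```python
def max_children(total_mb, reserved_mb, per_child_mb):
    """Rough upper bound on Apache prefork children that fit in RAM."""
    usable_mb = total_mb - reserved_mb
    return int(usable_mb // per_child_mb)


if __name__ == "__main__":
    # Hypothetical figures, not measurements from the server in the post.
    print("static children:", max_children(4096, 512, 4.5))  # 796
    print("PHP children:   ", max_children(4096, 512, 20))   # 179
```

With these made-up figures, the PHP case would not even reach the default limit of 256. The static case justifies raising it.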

Also, while switching to static content, I found that WordPress had been handling the 301 redirect from the non-www version of the site to the correct URL. I moved that into Apache with this directive:
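The directive itself isn't reproduced here. In Apache this is typically a mod_rewrite rule conditioned on %{HTTP_HOST}. The logic of such a redirect is sketched below in Python, using the host names from this post; the function name is illustrative. Only the bare domain is redirected, so the static.classperformance.com alias that shares the same DocumentRoot keeps working.

```python
BARE_HOST = "classperformance.com"
CANONICAL_HOST = "www." + BARE_HOST


def canonical_redirect(host, path, query=""):
    """Return (301, location) for requests to the bare domain, else None."""
    # Ignore any :port suffix and letter case in the Host header.
    if host.split(":")[0].lower() != BARE_HOST:
        return None  # www.* and static.* are served as-is
    location = f"http://{CANONICAL_HOST}{path}"
    if query:
        location += "?" + query
    return 301, location
```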

The default Apache install doesn’t compress any content. I configured mod_deflate to compress the static content and reduce bandwidth usage. Compression typically cuts the size of HTML and CSS files at least in half, and sometimes down to a tenth. This not only reduces your bandwidth bill. Since the server’s 100 Mbps switch port is a potential bottleneck, it also allows more concurrent users if traffic approaches that limit (and it might have if I hadn’t enabled compression).
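You can estimate the savings before turning mod_deflate on by compressing a page locally. The sketch below uses Python's gzip module at level 6, which is zlib's default and also what mod_deflate uses unless DeflateCompressionLevel says otherwise. The sample is a synthetic, highly repetitive HTML fragment. Real pages usually compress less well than this, so run the function on your own saved files for a realistic figure.

```python
import gzip


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)


if __name__ == "__main__":
    # Synthetic, very repetitive markup: compresses far better than a real page.
    sample = ('<li><a href="/blog/">A blog post title</a></li>\n' * 200).encode("ascii")
    print(f"{compression_ratio(sample):.1%} of original size")
```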

Set up some Monitoring

I installed MRTG with some basic graphs. I also enabled Apache’s mod_status so that I could view the server-status page, and installed iftop to get a real-time view of bandwidth usage.
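For a quick look without a browser, mod_status also offers a machine-readable form: the same URL with ?auto appended. This is commonly http://localhost/server-status?auto, depending on where the handler is mapped. Below is a minimal parser for that "Key: value" format that summarises the worker counts and scoreboard. Fetching the page (for example with urllib.request) is left out.

```python
def parse_status(text: str) -> dict:
    """Parse mod_status 'server-status?auto' output into a key/value dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line without a "Key: value" shape
            stats[key.strip()] = value.strip()
    return stats


def worker_summary(text: str) -> dict:
    """Summarise worker counts and the scoreboard."""
    stats = parse_status(text)
    board = stats.get("Scoreboard", "")
    return {
        "busy": int(stats.get("BusyWorkers", 0)),
        "idle": int(stats.get("IdleWorkers", 0)),
        # In the scoreboard, "K" marks a keepalive (read) and "." an open slot.
        "keepalive": board.count("K"),
        "open_slots": board.count("."),
    }
```

A steadily shrinking count of open slots is the signal that the child limit discussed above is about to be reached.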

With all of these changes in place, I’m very happy to report that we had tens of thousands of visitors during and shortly after the show, and everything ran perfectly. I had the static content running on a separate server for the busiest period, and combined bandwidth usage peaked at around 90 Mbps shortly after the end of the show.