Author Archive: Michael Patrick

When you think about all the things that have to go right, all the time (and "all the time" here means millions of times per second), for a user to get your content, it can be a little... daunting. The software, the network, and the hardware all have to work for this bit of magic we call the Internet to actually occur.

There are points of failure all over the place. Take a server, for example: hard drives can fail, power supplies can fail, the OS can fail. The people running servers can fail too... maybe you try something new and it has unforeseen consequences. This is simply the way of things.

Mitigation comes in many forms. If your content is mostly images, you could use something like a content delivery network to move your content into the "cloud" so that a failure in one area might not take out everything. On the server itself you can use things like redundant power supplies and RAID arrays. Proper testing and staging of changes can help minimize the chance of software bugs and configuration errors impacting your production setup.

Even if nothing fails there will come a time when you have to shut down a service or reboot an entire server. Patches can't always update files that are in use, for example. One way to work around this problem is to have multiple servers working together in a server cluster. Clustering can be done in various ways, using Unix machines, Windows machines and even a combination of operating systems.

Since I've recently set up a Windows 2008 cluster, that's what we're going to discuss. First we need to define some terms. A node is a member of a cluster. Nodes are used to host resources, which are the things a cluster provides. When a node in a cluster fails, another node takes over the job of offering that resource to the network. This is possible because resources (files, IPs, etc.) live on shared storage, which is typically a set of SAN drives to which multiple machines can connect.

Windows clusters come in a couple of conceptual forms. Active/Passive clusters have the resources hosted on one node and have another node sitting idle, waiting for the first to fail. Active/Active clusters, on the other hand, host some resources on each node, putting every node to work. The key with clusters is that you need to size the nodes so that your workloads can still function even if a node fails.
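That sizing rule can be sketched in a few lines. This is a toy model, not anything from the cluster software itself; the node names and capacity units are hypothetical:

```python
# Sizing check for an Active/Active cluster (hypothetical numbers):
# after losing any single node, the survivors must still be able to
# host every resource in the cluster.

def survives_single_failure(node_capacity, resource_load):
    """node_capacity: dict of node name -> capacity units.
    resource_load: total capacity units needed by all cluster resources."""
    total = sum(node_capacity.values())
    # Worst case is losing the biggest node.
    worst_remaining = total - max(node_capacity.values())
    return worst_remaining >= resource_load

# Two equal nodes, each sized to carry the full workload alone:
print(survives_single_failure({"NodeA": 100, "NodeB": 100}, 100))  # True
# Two nodes both needed at full tilt cannot absorb a failure:
print(survives_single_failure({"NodeA": 100, "NodeB": 100}, 150))  # False
```

If the second check fails, a node reboot means some resources simply have nowhere to go.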

Ok, so you have multiple machines, a SAN between them, some IPs, and something you wish to serve up in a highly available manner. How does this work? Once you create the cluster, you go about defining resources. In the case of the cluster I set up, my resource was a file share. I wanted these files to be available on the network even if I had to reboot one of the servers. The resource was actually a combination of an IP address that could be answered by either machine and the iSCSI drive mount which contained the actual files.

Once the resource was established it was hosted on NodeA. When I rebooted NodeA though the resource was automatically failed over to NodeB so that the total interruption in service was only a couple of seconds. NodeB took possession of the IP address and the iSCSI mount automatically once it determined that NodeA had gone away.
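The failover behavior itself is simple to model. Here is a toy simulation of what the cluster service did (the class and resource names are made up for illustration; the real cluster service does this with heartbeats and arbitration, not a Python dict):

```python
# Toy model of cluster failover: a resource group (IP + iSCSI mount)
# is owned by one node, and ownership moves to a surviving node when
# the owner goes away.

class Cluster:
    def __init__(self, nodes):
        self.up = set(nodes)
        self.owner = {}  # resource name -> owning node

    def host(self, resource, node):
        self.owner[resource] = node

    def node_down(self, node):
        self.up.discard(node)
        # Fail over every resource the departed node owned.
        for resource, owner in self.owner.items():
            if owner == node:
                self.owner[resource] = next(iter(self.up))

c = Cluster(["NodeA", "NodeB"])
c.host("FileShare (IP + iSCSI mount)", "NodeA")
c.node_down("NodeA")  # e.g. NodeA is rebooted for patching
print(c.owner["FileShare (IP + iSCSI mount)"])  # NodeB
```

Clients keep talking to the same IP address; they never need to know which node is answering.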

File serving is a really basic example, but you can cluster much more complicated things like the Microsoft Exchange e-mail server, Internet Information Services (IIS), virtual machines, and even network services like DHCP/DNS/WINS.

Clusters are not the end of service failures. The shared storage can fail, the network can fail, the software configuration or the humans could fail. With a proper technical staff implementing and maintaining them, however, clusters can be a useful tool in the quest for high availability.

You'll see the word "cloud" bouncing around quite a bit in IT nowadays. If you have been following The Inner Layer you'll have seen it a few times here as well. A cloud service is just something that is hosted on the Internet. Typically in a cloud scenario you are not actually doing the hosting but rather using hosted resources someone else is providing. Usually you'll hear it in terms of computing and storage.

This is going to be a brief article on a cloud storage product we are doing here at SoftLayer called CloudLayer™ Storage.

CloudLayer™ Storage is a WebDAV-based system which uses client software on the local machine to redirect filesystem operations to the storage repository here at SoftLayer. In Windows you end up with a drive letter; on a Unix system you end up with a mount point. In both cases, when you create folders and save files in those locations, the actions actually happen on our storage repository here. Because the files are located with us, you are able to access them wherever you are. Through the CloudLayer™ web UI you're also able to set up files for sharing with others, so even if your location never changes there is still value in using the cloud storage.

Even when using a cloud storage system you must maintain proper backups. Hardware, software, and human errors all happen. Tied in with that concept of "errors happen"... if you have files on CloudLayer™ that you need for a presentation, download them ahead of time. You don't want to be caught without your files simply because the Internet connection at your hotel decided to take a nap the morning of your event.

Now what of security? Well, the connection to the CloudLayer™ endpoint here at SoftLayer is done via an encrypted session so people cannot snoop on your transmissions. This does mean you need to allow 443/tcp outbound communications via your firewall but since that is the normal HTTPS port I'd imagine you already have it open. Within CloudLayer™ you can control with whom you share your files.

Since CloudLayer™ is a filesystem redirected over the Internet, the performance you get will depend on your local connection speed. It's best to treat CloudLayer™ Storage as simply a file repository. If you find you need some kind of off-machine storage for running applications on your server, you could look into our iSCSI product.

I'm going to start this blog off by making a statement: 250 + 250 > 500. Some of you already know where this is going and you're sitting there thinking 'well, duh'. Other readers probably think I've just failed basic math. If you're in the first group, scan through and see if I bring up something you haven't thought of. If you're in the second group, this article could be useful to you.

Let us suppose you are running a medium popularity website. You have a forum, a database and you serve a bunch of video or image data. You knew that for content plus backups you'd need about 500GB. You have two hard drives, 2 x 250 GB. Content is on drive zero, and drive one is used for holding backups (you are routinely backing up, aren't you?). Being a good and noble system administrator you've kept an eye on your resource usage over time and lately you've noticed that your load average is going up.

Say you were to run a 'top' and get back something like this:

First a couple of caveats:

1) For the astute in the class... yes, I've made up the numbers above. I don't have a distressed machine handy, but the made-up numbers come from what I've seen in actual cases.

2) A load of 15 may not be bad for your machine and workload. The point here is that if your load is normally 5 and lately it's been more like 15, then something is going on. It is all about knowing what is normal for your particular system.

So, what is top saying? It's saying that on average you've got 14 or 15 things going on and wanting to run. You'll notice from the swap line that the machine isn't particularly hitting swap space, so you're probably not having a RAM issue. Let's look closer at the CPU line.

10% user time, 5% system time... doesn't seem so bad. We even have 15% idle time. But wait, what do we have here... 80% wa? Who is wa and why is he hogging all the time? %wa is the percentage of time your system is spending waiting on I/O requests to finish. Frequently this is time spent waiting on your hard drive to deliver data. If the processor can't work on something right now (say, because it needs some data), that thing stays queued up waiting to run. What you can end up with is a pile of processes queued up because the CPU is waiting on the hard drive to cough up data for each one. The hard drive is doing its best, but sometimes there is just too much going on.

So how do we dig down further? So glad you asked. Our next friend is going to be 'iostat'. The iostat command will show you which hard drive partitions are spending most of the time processing requests. Normally when I run it I'm not concerned with the 'right then' results or even the raw numbers... rather I'm looking for the on-going action trend so I'll run it as 'iostat 5' which tells it to re-poll every 5 seconds.

(again I've had to fudge the numbers)

So if you were running 'iostat 5' and over time you were seeing the above formation, what this tells you is that disk0 (hda, in the chart) is doing all the work. This makes sense... previously we discussed that your setup is disk0 - content, disk1 - backups. What we know so far is that disk0 is spinning its little heart out servicing requests while disk1 is sitting back at the beach, just watching and giggling. This is hardly fair.
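If you want to spot that imbalance programmatically rather than by eyeballing intervals, the parsing is a one-liner per row. The device rows below are invented numbers in the shape of an `iostat` interval, matching the scenario above (hda busy, hdb nearly idle):

```python
# Hypothetical `iostat` interval: hda (content) busy, hdb (backups) idle.
interval = """\
Device:   tps   Blk_read/s   Blk_wrtn/s
hda     412.0       9200.0       3100.0
hdb       0.4          1.2          6.0
"""

# Skip the header row, split each device row into columns.
rows = [line.split() for line in interval.splitlines()[1:]]

# Rank devices by transfers per second (column 1).
busiest = max(rows, key=lambda r: float(r[1]))
print(busiest[0])  # hda -- nearly every request landing on one spindle
```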

How do you right this injustice? Well, that depends on your workloads and whether you want to (or can) go local or remote. The general idea is to re-arrange your storage to spread the workload across multiple spindles, and you could see a nice gain in performance (make sure a thing and its backup are NOT on the same disk, though). The exact best ideas for your situation are going to be a case-by-case thing. If you have multiple busy websites, you might just start with putting the busiest one on disk1 and see how it goes. If you have only one really active site, then you have to look at how that workload is structured. Are there multiple parts to that site which could be spread between disks? The whole point of all this is that you want to eliminate bottlenecks on performance. A single hard drive can be just such a bottleneck.

If the workload is largely writes (most of the time files being created/uploaded), then you're looking at local solutions such as making better use of existing drives in the system or adding new drives. Depending on what you are doing, it might be possible to jump up to faster drives as well, say 15,000 RPM drives. I should mention RAID0 here. RAID0 is striping without parity: multiple physical drives are presented to the operating system as one single unit. As a file is written, parts of it end up on different drives, so you have multiple spindles going for each request. This can be wickedly fast. HOWEVER... it is also dangerous, because if you lose one drive you have potentially lost the entire volume. Make no mistake, hard drives will fail, and they'll find the most irritating time to do it. If you think you would want to use RAID0 but you cannot afford the downtime when a drive fails and takes the whole volume with it, then you might look at RAID10. RAID10 is a RAID0 that is mirrored against another RAID0, which provides some fault tolerance against a failed drive.
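The striping idea, and why RAID0 is fragile, both fall out of a toy illustration. This is not a real RAID implementation (real controllers work at the block layer with configurable stripe sizes); it just shows chunks of one file alternating across two drives:

```python
# Toy RAID0 striping: consecutive chunks of the data alternate across
# drives, so every read/write engages every spindle -- but losing any
# one drive loses part of every file.

def stripe(data, n_drives, chunk=4):
    drives = [bytearray() for _ in range(n_drives)]
    for i in range(0, len(data), chunk):
        drives[(i // chunk) % n_drives].extend(data[i:i + chunk])
    return drives

d0, d1 = stripe(b"ABCDEFGHIJKLMNOP", 2)
print(bytes(d0))  # b'ABCDIJKL'
print(bytes(d1))  # b'EFGHMNOP'
# If either drive dies, every file is missing half of its chunks,
# which is why RAID0 alone is a gamble and RAID10 mirrors the stripes.
```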

If the workload is mostly reads, say you host images or videos, then you could do multiple drives in the system, or go with the "Big Dog" of adding spindles for read-only workloads: something like our CDNLayer product, where a global network of caches stores the image files and people get them from their nearby cache. This takes quite a bit of the I/O load off your system, and (subject to network routing goofiness) your visitors could be getting your content from a box that is hundreds or thousands of miles closer to them.

So, with apologies to my math teacher friend, there are times when 250 + 250 > 500 can be true. Until next time, have a good day and respect your spindles!