Posts Tagged 'Data'

Last night, while making my regular backup of my World of Warcraft configuration, I thought about the blog and realized I didn't remember seeing an article that went into more detail than "backups are good" about backing up and restoring data.

If you've been around the InnerLayer for a while, you will have noticed that the topic of backing up data comes up periodically. This happens because we frequently see customers whose world is turned upside down when a mistyped command wipes out their data. If you just thought "that won't happen to me... I'm careful at a prompt"... well, how about a cracker getting in via an IIS zero-day exploit? A kernel bug corrupting the filesystem? A hard drive failure? Data loss will happen to you, whatever the cause.

Data that is not backed up is data that isn't viewed as important by the server administrator. As the title of this post suggests, backing up isn't the end of the server administrator's responsibility. Consider the following points.

Is the backup in a safe location? Backing up to the same hard drive which houses the live data is not a good practice.

Is the backup valid? Did the commands to create it all run properly? Did they get all the information you need? Do you have enough copies?

Can your backup restore a single file or directory? Do you know how to restore it? Simply put, a restore is getting data from a backup back into a working state on a system.

Backup Safety
At a minimum, backups should be stored on a separate hard drive from the data they protect. Better still is a local copy of the backup on the machine in use plus a copy of the backup off the machine: in eVault, on a NAS which is _NOT_ always mounted, even on another server. Why? The local backup gives you quick access to the content, while the off-machine copies give you the safety that if one of your employees does a secure wipe on the machine in question, you haven't lost both the data and the backup.
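Here is a minimal sketch of that "NAS which is not always mounted" idea. The paths are illustrative, the mount point is assumed to have an fstab entry, and mount/umount need root; treat it as a shape to adapt, not a prescription:

```python
#!/usr/bin/env python3
"""Hedged sketch: push a local backup to a NAS that stays unmounted
between runs. Paths and the mount point are assumptions."""

import shutil
import subprocess
from pathlib import Path

LOCAL_BACKUP = Path("/var/backups/configbackup.tar.gz")  # hypothetical archive
NAS_MOUNT = Path("/mnt/nas-backups")                     # assumed fstab entry

def push_to_nas():
    # Mount only for the duration of the copy, so a break-in or a
    # runaway wipe on this machine can't also reach the offline copy.
    subprocess.run(["mount", str(NAS_MOUNT)], check=True)
    try:
        shutil.copy2(LOCAL_BACKUP, NAS_MOUNT / LOCAL_BACKUP.name)
    finally:
        subprocess.run(["umount", str(NAS_MOUNT)], check=True)

if __name__ == "__main__":
    push_to_nas()
```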

Validity
A backup is valid if it gets all the data you need to bring your workload back online in the event of a failure. This could be web pages, database data, config files (frequently forgotten) and notes on how things work together. Information systems get complicated, and if you've got a Notepad file somewhere listing how Tab A goes into Slot B, that should be in your backups. Yes, you know how it works... great, but if you get hit by a bus, does your co-admin know how that system is put together? Don't forget dependencies: a forum website is pretty worthless if it is backed up but the database it relies on is not. For me, another mark of a valid backup is one which has some history. Do not back up today and delete yesterday's copy. Leave a week or more of backups available; people don't always notice immediately that something has broken.
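As a sketch of that "keep some history" rule, here is a minimal daily backup with a retention window. The source directories and the seven-day window are assumptions; widen both to whatever your workload actually needs:

```python
#!/usr/bin/env python3
"""Hedged sketch of a dated daily backup plus retention pruning."""

import datetime
import tarfile
from pathlib import Path

SOURCES = [Path("/etc"), Path("/var/www")]  # hypothetical: configs + web data
DEST = Path("/var/backups/daily")
KEEP_DAYS = 7                               # keep a week or more of history

def backup():
    """Write today's archive, named by date so history is self-evident."""
    DEST.mkdir(parents=True, exist_ok=True)
    archive = DEST / f"backup-{datetime.date.today().isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in SOURCES:
            tar.add(src, arcname=src.name)
    return archive

def prune():
    """Never "back up today and delete yesterday": keep KEEP_DAYS of copies."""
    cutoff = datetime.date.today() - datetime.timedelta(days=KEEP_DAYS)
    for old in DEST.glob("backup-*.tar.gz"):
        stamp = old.name[len("backup-"):-len(".tar.gz")]
        if datetime.date.fromisoformat(stamp) < cutoff:
            old.unlink()

if __name__ == "__main__":
    print(f"wrote {backup()}")
    prune()
```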

Restores
A good way to test a restore is to get a second server for a month, configured the same as your primary, then take the backup from the primary and restore it onto the secondary. See what happens. Maybe it will go great; more likely you will run into issues. Did you forget about a small operating system tweak made some morning at 4am? How about time? How long does it take to go from a clean OS install to a working system? If that takes too long, you might have too much going on one server and need to split your workload among a few machines. As with everything else in maintaining a server, practicing your restores is not a one-time thing. Schedule yourself a couple of days once a quarter to do a disaster simulation.
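A full drill on a second server is the real test, but even a small scripted check catches dead archives early. This sketch assumes the archive layout from the retention example above; it unpacks the newest backup into a scratch directory and diffs it against the live tree:

```python
#!/usr/bin/env python3
"""Hedged sketch of a minimal restore drill. This only proves the
archive is readable and complete; the real drill described above
restores onto a second server."""

import filecmp
import tarfile
import tempfile
from pathlib import Path

DEST = Path("/var/backups/daily")  # archive layout from the sketch above
LIVE = Path("/etc")                # one of the trees the archives protect

archive = sorted(DEST.glob("backup-*.tar.gz"))[-1]  # newest by date stamp
with tempfile.TemporaryDirectory() as scratch:
    with tarfile.open(archive) as tar:
        tar.extractall(scratch)
    # Recursively report anything missing, extra, or different.
    filecmp.dircmp(str(Path(scratch) / LIVE.name), str(LIVE)).report_full_closure()
```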

For those who might be looking at this and saying "that is a lot of work": yes, it is. It is part of running a server. I do this myself on a regular basis for a small server hosting e-mail and web data for some friends. I have a local "configbackup" directory on the server which holds the mail configs, the server configs, the nameserver configs and the database data. In my case, I've told my users straight up that their home directories are their own responsibility; maybe you can do that, maybe not. Weekly, that configbackup data is copied to a file server here at my apartment. The file server itself is backed up periodically to a USB drive which is kept at a friend's house.

So there I was after work today, sitting in my favorite watering hole drinking my Jagerbomb, when Caira, my bartender, asked what was on my mind. I told her that I had been working with clouds and elephants all day at work, and neither of those things is little. She laughed and asked if I had stopped anywhere to get a drink prior to her bar. I replied no, I'm serious: I had to make some large clouds and a stampede of elephants work together. I then explained to her what Hadoop was. Hadoop is a popular open source implementation of Google's MapReduce. It allows transformation and extensive analysis of large data sets using thousands of nodes while processing petabytes of data. It is used by websites such as Yahoo! and Facebook, as well as Baidu, China's top search engine. I explained to her what cloud computing was (multiple computing nodes working together), hence my reference to the clouds, and how Hadoop was named after a stuffed elephant belonging to the child of one of its creators, Doug Cutting. Now she doesn't think I am quite as crazy.
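For a taste of the MapReduce model Hadoop implements, here is a minimal word count in the Hadoop Streaming style (plain stdin and stdout). Hadoop would fan the map and reduce phases out across many nodes; this sketch runs both locally just to show the shape of the computation:

```python
#!/usr/bin/env python3
"""Hedged sketch of MapReduce's two phases, Hadoop Streaming style.
Try it with: cat some_file.txt | python3 wordcount.py"""

import sys
from itertools import groupby

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Reduce phase: Hadoop hands each reducer its pairs grouped and
    # sorted by key; sorting here stands in for that shuffle step.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    for word, total in reducer(mapper(sys.stdin)):
        print(f"{word}\t{total}")
```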

When it comes to managing a server, remember you can never be too careful. In this day and age we face a lot of things that can damage a server or even bring it to its knees. Here are a few things for everyone to consider.

Anti-virus:

This is a must on systems open to the net nowadays. There are always nasty little things floating around looking to take your server apart from the OS out. For Windows servers there are a multitude of choices, and I'll just mention a few that can help protect your goods: avast! (which offers a free edition), ClamWin (open source), Kaspersky, and Panda, to name a few. Before installing any of these, I would suggest checking a list such as http://en.wikipedia.org/wiki/List_of_antivirus_software, which covers several choices and their compatibility. You may also want to read reviews that compare the available options and give you an idea of what to expect from them. This will allow you to make an informed choice about which one works best for you. On Linux there are also several options, including the well-known ClamAV, which from personal experience works really well and can be installed on a variety of Linux distros (aka distributions). It's very simple to use and may save you a headache later on down the road.
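As a sketch of putting ClamAV to work, here is a small wrapper around its clamscan command, suitable for a nightly cron job. It assumes ClamAV is installed, and the scan path is an illustrative stand-in; clamscan exits 0 when clean and 1 when something matched a signature:

```python
#!/usr/bin/env python3
"""Hedged sketch: a nightly wrapper around ClamAV's clamscan command."""

import subprocess
import sys

SCAN_PATH = "/var/www"  # hypothetical: whatever your server exposes

result = subprocess.run(
    ["clamscan", "--recursive", "--infected", SCAN_PATH],
    capture_output=True, text=True,
)
if result.returncode == 1:
    # Something matched a signature: surface it (mail it, page someone).
    print("ClamAV found infected files:", file=sys.stderr)
    print(result.stdout, file=sys.stderr)
sys.exit(result.returncode)
```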

Firewalls:

Firewalls are a double-edged sword, but they are most definitely needed. A firewall can save you quite a bit of headache; however, if set up too strictly it can block legitimate traffic and even lock you out of your own server. In the long run, though, it is a definite way to help protect your server from unwanted visitors. A lot of firewalls also have modules and add-ons that further assist in protecting you and securing your server. If in doubt, it's always a good idea to have a security company do an audit, and even a security hardening session on your server, to make sure you are protected the best way possible.
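As a sketch of the "don't lock yourself out" caveat, here is a minimal default-deny baseline applied through iptables. The ports are assumptions for a typical web server, root is required, and the ordering is the point: the ACCEPT rules go in before the policy flips to DROP, ideally from a console session rather than over SSH:

```python
#!/usr/bin/env python3
"""Hedged sketch of a default-deny iptables baseline."""

import subprocess

def ipt(*args):
    """Run one iptables command, failing loudly if it is rejected."""
    subprocess.run(["iptables", *args], check=True)

# Add the ACCEPT rules *before* flipping the default policy to DROP,
# so applying this over SSH does not sever the session doing it.
ipt("-A", "INPUT", "-i", "lo", "-j", "ACCEPT")                    # loopback
ipt("-A", "INPUT", "-m", "state", "--state", "ESTABLISHED,RELATED",
    "-j", "ACCEPT")                                               # replies
ipt("-A", "INPUT", "-p", "tcp", "--dport", "22", "-j", "ACCEPT")  # SSH
ipt("-A", "INPUT", "-p", "tcp", "--dport", "80", "-j", "ACCEPT")  # web
ipt("-P", "INPUT", "DROP")                                        # default deny
```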

Passwords:

This is probably one of the most important things you can do to secure your server. Use strong passwords (no, "password" and "jello" are not secure passwords, even in all caps), and if you are worried about not being able to come up with a secure one, there are several password generators on the web that can do it for you. Passwords should contain capital letters, numbers, and symbols, and should be at minimum 8 - 10 characters (the more the better). It's the easy-to-remember, easy-to-read passwords that get you into trouble.
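Rather than trusting a random website with your new password, a few lines of code can generate one locally. This sketch follows the guidelines above: mixed case, digits, symbols, and a length well past the 8 - 10 character minimum:

```python
#!/usr/bin/env python3
"""Hedged sketch of a local password generator using the secrets module."""

import secrets
import string

def generate(length=16):
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        # Re-draw until every required character class is present.
        if (any(c.isupper() for c in pw) and any(c.islower() for c in pw)
                and any(c.isdigit() for c in pw)
                and any(c in string.punctuation for c in pw)):
            return pw

if __name__ == "__main__":
    print(generate())
```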

Armed with this information, and the wealth of security material that can be located on the web using the great and all-powerful Google, you should have a good start on making sure you don't have to worry about data loss and system hacks. Also remember: no matter how secure you think you are, make regular backups of all your important data, as if your server could crash at any time.

My grandmother used to say an ounce of prevention is worth a pound of cure. Usually this was her polite way of telling me to pick my skateboard up off the stairs before she stepped on it and broke her neck or to put a sheet of newspaper over her antique kitchen table before I began refueling my model airplane. All very sound advice looking back. And now here I find myself repeating the same adage some twenty years later in the context of predicting mechanical drive failure. An ounce of prevention is worth a pound of cure.

Hard disk drive manufacturers recognized both the reality and the advantages of being able to predict normal hard disk failures associated with drive degradation sometime in the mid-1990s. This led a number of leading hard disk makers to collaborate on a standard which eventually became known as SMART. This acronym stands for Self-Monitoring, Analysis and Reporting Technology, and when used properly it is a formidable weapon in any system administrator's arsenal.

The basic concept is that firmware on the hard disk itself will record and report key "attributes" of that drive which, when monitored and analyzed over time, can be used to predict and avoid catastrophic hard disk failures. Anyone who has been around computers for more than a day knows the terrible feeling that manifests in the pit of your stomach when it becomes apparent that your server or workstation will not boot because the hard disk has cratered. Luckily, we ALL of course back up our hard drives daily! Right?

All kidding aside, even with a recent backup, just the task of restoring and getting your system back in working order is a serious hassle, and it's not something you get the luxury of scheduling if the machine is critical to operations and failed in the middle of your work day or, worse yet, the middle of your beauty sleep. That is where SMART comes in. When properly used, SMART data can give "clues" that a drive is reaching a failure point--prior to it failing. This in turn means you can schedule a drive cloning and replacement within your next regular maintenance window. Really, aside from a hard disk that lasts forever, what more could an administrator ask for?

SMART drive data has been described as a jigsaw puzzle. That's because it takes monitoring a myriad of data points consistently over time to put together a picture of your hard disk's health. The idea is that an administrator regularly records and analyzes characteristics of the installed spinning media and looks for early warning signs that something is going wrong. While different drives have different data points, some of the key and most common attributes are:

head flying height

data throughput performance

spin-up time

reallocated sector count

seek error rate

seek time performance

spin retry count

drive calibration retry count

These items are considered typical drive health indicators and should be baselined at drive installation and then monitored for significant degradation. While the experts still disagree on the exact value of SMART data analysis, I have seen sources that claim at least 30% of drive failures can be detected some 60 days prior to the actual failure through the monitoring of SMART data.
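On Linux, the usual way to read these attributes is smartmontools. Here is a sketch that shells out to smartctl and records the raw values of a few of the counters listed above; the device path and the set of attributes watched are illustrative, and the column layout assumed is the typical `smartctl -A` attribute table:

```python
#!/usr/bin/env python3
"""Hedged sketch: snapshot a few SMART attributes via smartctl
(smartmontools assumed installed). Baseline these at install time,
log them on a schedule, and alert on rising raw values."""

import subprocess

DEVICE = "/dev/sda"  # illustrative; point this at your actual drive
WATCH = {"Reallocated_Sector_Ct", "Spin_Retry_Count", "Seek_Error_Rate"}

def read_attributes(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    attrs = {}
    for line in out.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; on the typical table
        # layout the tenth column holds the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCH:
            attrs[fields[1]] = int(fields[9])
    return attrs

if __name__ == "__main__":
    for name, raw in sorted(read_attributes(DEVICE).items()):
        print(f"{name}\t{raw}")  # append to a log; rising counts = trouble
```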

Of course, not all drive failures can be predicted, and some failures are caused by factors other than drive degradation. Consider drives damaged by power surges, or drives that are dropped in shipping, as good examples of drive failures that cannot normally be detected through SMART monitoring. However, in my humble opinion, even one hard disk failure prevented over the course of my career is something to celebrate--unless you happen to own stock in McNeil Consumer Healthcare, a.k.a. the distributors of Tylenol!

So what does this have to do with SoftLayer? Well, I am certainly not claiming that SoftLayer is going to predict all your hard drive disasters so that you never have to back up your data. In fact, I recommend not just backing it up, but backing it up in geographically disparate locations (did I mention we have data centers in Dallas and Seattle?). What I do mean to share is that technologies like SMART data are just one of the many ways SoftLayer is currently investigating to improve what is already the best hosting company in the business.

I should know. I was tasked with writing the low-level software to extract this data. That’s right. SoftLayer has engineers working at the application layer, down at the device driver layer, and everywhere in between. If that doesn’t give you a warm fuzzy about your hosting company, I don’t know what will.