This is the 130th article in the Spotlight on IT series. If you'd be interested in writing an article on the subject of backup, security, storage, virtualization, mobile, networking, wireless, DNS, MSPs, or printers for the series, PM Eric to get started.

I work at a photographic lab in Richmond, Virginia, that recently went all digital. We are a small business with 100–150 employees, 10 retail stores, and a main lab. Our stores are spread all over the state, and Hurricane Sandy has been splashing us with her wind and rain. Some stores have barely been impacted, and others have been hit with power outages and flooding.

As a 24/7 operation, we must have an active Internet connection in order to receive files submitted by our customers. Most of our photo ordering software uses FTP to send the orders to our main lab. In the event of severe weather, it’s my mission to ensure we stay online at all costs.

Making a plan
My week began with a meeting about our plan. As broadcasters started calling it the “Storm of the Century” and a “Super Storm,” we took the alerts seriously and started preparing for the worst.

When things get messy, I jump in my car and drive 45 minutes through the severe weather to get to the office in time to assess the damage, shut down servers, crank generators, or do whatever the situation requires. The show must go on; we must continue to receive orders, and our photo ordering software only works when there is an FTP site to send to.

We've been extremely fortunate here. While we did have a few stores in Hampton Roads, Virginia, that were subject to power outages and flooding, we did not encounter any outages at our main lab. Our store in Norfolk was the most heavily affected; it was closed Monday and Tuesday.

We’ve suffered outages in the past. The worst was when we lost power for five days a few years back. This was a challenge complicated further by the fact that it’s hard to remain operational at all when there is no mail or UPS.

From that outage, we learned it’s important to have a backup generator, and we were prepared this week. We have primary and secondary battery backups in place (APC Smart-UPS 3000 RM XL) that will sustain power to our 20 servers for almost 9 hours. This gives us enough time to safely shut down anything non-essential and redirect power to a generator. We haven’t needed it yet, but redundancy is always important. It may be an additional price to pay, but when it saves you in a time of need, it’s well worth the investment.

I also wrote a simple script that could be easily executed remotely to shut down all of our non-essential servers. (Thanks to the Spiceworks community for some advice on this.) I then sent out a broadcast message to all users warning them that there could be a loss of power and suggesting everyone shut down any machines not needed by the night crew.
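The broadcast itself can be scripted as well. Here is a minimal sketch in PowerShell using the built-in msg.exe; the workstations.txt file name and the message text are placeholders, and the target machines have to permit remote messaging:

    # Send a warning to every machine in a (hypothetical) workstations.txt list,
    # one hostname per line. Targets must allow remote RPC for msg.exe to reach them.
    $warning = "Possible power loss tonight. Please shut down any machine the night crew does not need."
    foreach ($pc in Get-Content .\workstations.txt) {
        msg * /server:$pc /time:300 $warning   # display for up to 5 minutes
    }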

Revisiting the plan
I’ve learned from the few incidents we’ve been through that it’s best to prepare for the worst, and we’re preparing to make more changes going forward.

Our current backup plan is built on redundancy: our SAN is replicated to another storage device in case it fails. We don’t virtualize anything yet, and the current setup is failing in more ways than one. We are planning to implement a virtualized solution that would ultimately make disaster preparedness much easier and almost 100% automated.

Actions are already underway to ensure there is an off-site location to collect our FTP orders. The network was set up long ago by my predecessor, and it’s not up to par with today's standards. My goal is to put our network behind an actual registered domain name, which will let us replace the hard-coded FTP IP address with a hostname. We can then set up our DNS records to point to our physical location during normal operations and redirect to our "hosted" solution in the event of an outage. That way it will appear to our customers that we are still online.
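As a rough sketch of what the failover step could look like in PowerShell, assuming the zone lives on a Windows DNS server we manage (hosted DNS providers expose similar record-update APIs); the zone name, record name, and IP addresses below are all placeholders:

    # If the main lab's FTP address stops answering, point the record at the hosted site.
    $zone    = "example.com"
    $primary = "203.0.113.10"    # main lab FTP (placeholder)
    $hosted  = "198.51.100.20"   # off-site FTP (placeholder)

    if (-not (Test-Connection $primary -Count 3 -Quiet)) {
        # Assumes a single A record named "ftp" in the zone.
        $old = Get-DnsServerResourceRecord -ZoneName $zone -Name "ftp" -RRType A
        $new = [ciminstance]::new($old)
        $new.RecordData.IPv4Address = [System.Net.IPAddress]::Parse($hosted)
        Set-DnsServerResourceRecord -ZoneName $zone -OldInputObject $old -NewInputObject $new
    }

For this to work quickly, the record's TTL needs to stay short (a few minutes), or customers' machines will keep resolving the old address until their caches expire.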

Also, this event has finally made management realize that our current server/network setup is aged and flawed, which has given me the opportunity to plan the entire network from scratch and build it with all new equipment.

Most importantly, at home things have not been as bad as we had been preparing for, and not nearly as bad as what the rest of the East Coast has had to deal with. The bulk of the damage around here was downed power lines from fallen trees, rain, and wind.

23 Replies

Joshua, thanks for sharing your experience. It is very important to have a disaster recovery plan. While we have a backup solution for most of the processes in our environment, I still have to put an efficient, easy-to-restore solution in place in case our SAN crashes for any reason (tape backup aside).

Amazed you can get by on battery for 9 hours. We went to a generator about 4 years ago and haven't really had to use it, except on some weekends when power was deliberately turned off at work and only the maintenance crew was here.

I would second your move to an off-site or even hosted FTP server. A multi-site server farm could give you 100% uptime no matter the weather, even if your main lab is temporarily down and can't receive the files right away.

Something else you might consider is a Managed File Transfer (MFT) service or software. It might be cost prohibitive or overkill, but some of these companies offer extremely reliable services for making sure you don't lose any data that is sent your way.

Just looked at the APC Smart-UPS 3000 RM XL specs. Very impressive! What is the load rating (in watts) of each of your servers, and do you have one UPS for each server? I also would be interested in seeing your script for shutting things down...

What kind of generators are folks using here? Portable or permanent? We lease our space, so I'm thinking a portable generator would be best, but I've also considered approaching the other businesses in the building and our landlord to see if any are interested in splitting the cost of a decent natural gas generator with us. Even though we lease, we don't plan on moving anytime in the foreseeable future. Any advice?

Our organization (spread out over 50 miles and 10 locations) experienced a one-day outage at our operations center last year. Each local facility has backups done and conceivably could be restored from them. We had been discussing emergency backup operations, but it wasn't moving too fast. After the outage, there was a change in sentiment and priorities, shall we say.

We then created an emergency kit for the operations center and a remote location. A "spare" DC was moved to a remote location with enough software installed that a backup could be restored to cover basic daily operations until a primary system could be put back in operation.

The problem was the outage duration we would experience until then.

The next plan was to build a server large enough to replicate the primary servers and applications with virtual disks (determined to not be an inexpensive plan). Once the VMs were created, we used an app to periodically replicate the servers to an existing VM. A remote disaster recovery site was established with a secure vendor, and once the site was selected, MPLS lines were arranged to it so that disk writes could be replicated to a server at the disaster site. The servers are replicated using a program named Zerto. We've now tested the operating configuration and cutovers: in a matter of minutes, we can cut our entire network operations center over to the disaster recovery site, and all other non-affected locations can still work as normal. In the event it's needed, once the local operations center is back online, the MPLS lines can be reassigned and all communications move back to and run from the primary site.

"Revisiting the plan
I’ve learned from the few incidents we’ve been through it’s best to prepare for the worse, and we’re preparing to make more changes going forward."

BMoore,

Our location won't see flooding like what was just experienced on the East Coast, but we have experienced earthquakes, extremely high winds, power outages, blizzards, and tornadoes. When we began planning to build a new DR plan, we began with an assumption:

If we were hit with a tornado and the entire operations center was reduced to scrapings on the ground (which I have seen happen), what would be the best plan to ensure the other connecting agencies could continue at least basic operations until a new NOC was built or acquired?

When you start from your weakest point, your options become more clear.

@Derek: The script just loops through a list of hostnames and sends each one a remote shutdown command, and the @servers.txt file holds the list of all non-essential servers (one hostname per line).
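In PowerShell, the equivalent is roughly this (a sketch rather than my exact script; Stop-Computer needs admin rights on each target):

    # Shut down every host named in servers.txt, one hostname per line.
    foreach ($srv in Get-Content .\servers.txt) {
        Write-Host "Shutting down $srv..."
        Stop-Computer -ComputerName $srv -Force
    }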

@Spicydeb: Our generator is nothing specific to computing, and it is well known that mechanical equipment WILL put "static" (electrical noise) on circuits. (I have seen places where an A/C window unit was plugged into the same breaker as some servers. Not good at all.) In our case it is only for emergencies, and we only use it if absolutely necessary.

@Derek: We currently have 13 battery units (it was 14 until yesterday) split into a primary and a secondary bank. Ideally, at full load, they can run for 4-5 hours each. In the event of severe weather, where we have time to shut down non-essential servers, we can get about 8-9 hours from them. This is to power our DC, FTP server, 2 modems, 1 firewall, and a monitor. The DC and the FTP server are both Dell PowerEdge 1750s with 650W PSUs, rated at 50 amps total (25 each).

@Sellers: Good point. I've just been assuming the worst-case scenario is temporary. Now that you mention it, I need to start from scratch and devise a full plan for the true worst case, when there is nothing left. It's really scary to think about, but it is something we must consider.

I can definitely agree with that. There is also an aspect of compliance with government regulations, as well as reputation. A local water purification plant or electrical plant being unavailable would have far more impact on the public than a mom & pop store being offline, yet I have seen just that type of outage in the past.

Here in New Hampshire, we got off with only a 6-hour power outage. Using lessons learned from previous ice storms, hurricanes, blizzards and such, we switched to our (rented) diesel generator and back with no impact on operations.

We're not too concerned about the diesel's power "not being clean." We expect, perhaps overly optimistically, that our UPS power conditioners give the computers what they need. No problems so far, and we've run on rented diesels three times now in the 1.5 years I've been with the company.

Our building is wired with a bus transfer switch that puts half the building - the most essential loads - on the diesel when needed. We rent the diesel when we anticipate power outages and hook it up ahead of time. Then, when the lights go out, we start the engine and flip the bus transfer switch. Our UPSs keep the servers and essential PCs running long enough for us to make that transition.

We usually stay on the diesel for a while once we're on it. Power can go up and down for hours in situations like that, so it's safer and easier just to stay on diesel power. Once we're sure that commercial power is back to stay, flipping the bus transfer switch back to commercial power just takes half a second.

As environments (both the natural external one, and the internal technological one) change, it's important to revisit your DR plan, and make sure it's still meeting your recovery time and point objectives.

A great question to ask your leadership team is "How long can your organization feasibly be out of business before the loss in production/revenue means that you'll be out of business forever?"

I hope that we begin to hear many more stories like yours in the aftermath of Sandy.

It's always great to have a plan in place, so kudos to you! Being in the UPS business, you can imagine the horror stories we hear on a daily basis, not to mention those that occur in the midst of a devastating weather event like Sandy. One thing to add: make sure your UPS units are in working condition every few months, or (at the very least) when you know bad weather's on the horizon. It's so easy to set a UPS up and forget about it until it's too late. Here's how to make sure you're up and running and getting the most out of your investment.

And, I agree with Katie - I like hearing all of the success stories! Keep 'em coming :)

Even though it's not directly under my responsibilities, I know we do constant updates with Zerto to our DR center. The combination of a DR center (and the hardware therein), Zerto, and Veeam allows us to stay up to date on any required data. Word is that we can now switch to our DR site and be online to all locations with only about a 5-second loss in the changeover. A year and a half ago... not so much.

It would be a good product to research.

I've been speaking with a Zerto rep and I understand that at the remote site I need to have a vCenter installation. Can you tell me what other VMware licenses I have to have at the remote site for such a switch over?

The rep also told me that some companies were replacing their backup software with Zerto.

Bill, you'll need additional VMware licenses at the DR site to match the hardware you currently have on the systems you're backing up. Also consider something like MPLS lines to handle data between your site, the DR site, and any remote facilities you communicate with. The company we use for MPLS lines has DNS records set up such that we can easily point all external sites to the DR site and run normally from there. When we switch back, Zerto helps replicate the data changes back to the servers at the primary site.

When I was at IBM, we would also do something I've not seen anyone mention here: test. Once you've got your DR plan in place and ready to go, you should ideally test it to make sure it will actually do what you need it to do. This is an excellent article on what can (and does) happen when one just assumes that life will always be rosy. Always plan for the worst, but hope for the best.

Thanks!
