Hurricane Sandy: Disaster Recovery Improv Tales

In lower Manhattan, Peer 1 Hosting used a bucket brigade to replenish fuel for diesel generators on the 18th floor after the pumps and the elevator broke down. In New Jersey, SunGard rerouted fuel trucks to avoid flooded intersections.

Disaster preparedness is a well-known best practice in running a data center, but Hurricane Sandy is showing that in disasters, the unexpected happens. When it does, some disaster recovery plans turn out to have holes in them, while others may still require improvisation.

Even SunGard, a specialist in disaster recovery, had its own brush with disaster when a river levee broke in Carlstadt, N.J., where it had three data centers in the vicinity. They had been built on raised floors on what little high ground was available in the region, and all three ended up avoiding the rising waters that crept up the margins of the site and into its parking lots. And then there was the issue with the fuel trucks. But more on that later.

The response that may win the Mayor's Office's resilience award, if not one for physical fitness as well, was the formation of a bucket brigade at the Peer 1 facility at 75 Broad in lower Manhattan to carry diesel fuel in five-gallon buckets and jerry cans from a tank at ground level to the 17th floor. From there it was pumped from a day tank into generators positioned atop the 18th floor.

Sabio Banducci, president and CEO of Peer 1 Hosting, said his firm had expected to have to shut down after "four feet of water filled the lobby" Monday evening and infiltrated the basement, where the building's diesel fuel delivery system was located. That fuel distribution system failed when it was submerged in salt water.

Building engineers at 75 Broad attempted to implement a workaround piping system, but the building pump available wasn't powerful enough to move fuel oil up 17 stories. Peer 1's disaster recovery team, which had survived the 2003 power outage in Manhattan for days without a shutdown, thought ahead and contacted a firm with truck-mounted pumps. But with public transit shut down, city streets were clogged, and the firm wasn't able to move a truck to the Peer 1 facility quickly enough to prevent a shutdown once the short-term tanks at the generators ran dry.

Peer 1 customer SquareSpace, a website development firm, notified its customers of a potential shutdown, noting Peer 1's efforts: "Fuel and water-pumps are in short supply" in the city, it said in a posting on its website Tuesday.

Peer 1 posted a status update at 10:45 a.m. Tuesday saying that it was "going to implement a controlled shutdown." But the shutdown never occurred.

Peer 1's generators were on the roof and not subject to flooding, provided the team could find some way to get fuel to them. The Peer 1 data center manager and his team decided they could organize a squad to carry fuel by hand up the building's stairwell (the elevators were out of service, of course) to the 17th floor. There the fuel could be poured into the day tank's distribution system. And with that, a latter-day bucket brigade, not conceived of in the disaster recovery plan, was born.

Data center staff and other Peer 1 employees, plus some contractors, formed a team of 25 that worked deep into the night of Oct. 30 and the morning of Oct. 31 to refill the 17th floor tank and keep the generators running. Customers who had seen the "controlled shutdown" notice arrived at the scene, thinking they might need to take extraordinary measures to preserve data. Instead they found a data center that was continuing to run, against the odds.

"Some of our customers came down believing they would have to power down, and instead they joined the bucket brigade," said Banducci.

The brigade hauled heavy five-gallon containers up flight after flight of the building's stairs. One customer lending a hand was SquareSpace, the Manhattan website development firm, whose employees posted pictures of the operation on the firm's website Wednesday. Another was Fog Creek, an online project management firm for collaborative software development, located next door at 55 Broad.

Fuel trucks arrived intermittently, usually with eight fifty-gallon drums of diesel that were unloaded and painstakingly poured into five-gallon containers at street level. The appetite of the generators on the 18th floor proved relentless. Three 25-man teams worked in shifts in the stairwell.

Banducci said a black humor emerged about how the company had engineered a self-improvement fitness program, except for the sleep deprivation. As he discussed the situation Wednesday evening, the generators were still running with a nearly full tank, and the data center had been up continuously. There was fuel to spare at ground level and no prospect of a shutoff. The workers had even been given 90 minutes off for lunch earlier in the day. Other staffers brought lunch to the crews on foot over the Brooklyn Bridge, avoiding the city's clogged streets. That contingency hadn't been in the disaster recovery plan either.

Not everybody was so fortunate. Co-location and managed service provider Internap at the same location reported a different story. It too was prepared for a power outage with on-site generators and a back-up fuel supply. "As a result of the flooding, both our redundant fuel pumps and our generator fuel tank were compromised and shut down. The system continued to run until all fuel within the secondary feeder tanks were exhausted and our facility lost power" at 11:45 a.m. Tuesday, the company said in a notice.

Senior VP for development and operations Steve Orchard reported in an update Wednesday morning that the 75 Broad Street facility had been brought back online through a restored fuel system and generator power. Customers' systems were up and running, with 40 hours of diesel fuel on hand and a resupply truck available, he wrote.

A second Internap facility at 111 Eighth Avenue in Manhattan also had rooftop generators. But the "building-fed fuel system malfunctioned" and "pumps could not provide diesel to the rooftop generators, causing them to stop supplying power to our un-interruptible power supply system. Once battery backup was exhausted, our infrastructure lost power," Orchard explained in an Internap status update.

Some great stories of people adapting to the circumstances. It is unrealistic to believe you can think of everything that could possibly go wrong ahead of time. I realize that 20-20 hindsight makes people say "Well, they should have thought of that," but things that are obvious after they happen just aren't so obvious beforehand. Bottom line: you have to be adaptable to get through events like this. That doesn't mean you don't plan for what you can think of, but also don't beat yourself up too badly afterwards. Adjust your plan, but don't fret over the unforeseeable.
