In a data center, nothing is more important to business continuity than redundancy.

At Vantage, testing redundancy is non-negotiable. In fact, we’re never not
doing it. We have a computerized maintenance management system (CMMS),
and every one of the components that make up our data centers is in that
system. Each component has a schedule for when, and how, it needs to be
maintained—and that includes testing.

“Certain things have a monthly schedule; other things have a
quarterly or annual,” says Chris Yetman, our Chief Operating Officer.
“And at some point, you’re turning the entire component off.”

Whenever we turn off a component, we note the alarms generated—and
sometimes not generated. If we don’t get the expected result, we create
additional follow-through tickets to resolve the alarm problem.
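As a rough illustration only (not Vantage's actual CMMS logic), that check can be pictured as a set comparison between the alarms a shutdown is expected to raise and the alarms actually observed. The function name, component label, alarm names and ticket wording below are all hypothetical, and treating unexpected alarms as discrepancies is an assumption.

```python
def follow_up_tickets(component, expected_alarms, observed_alarms):
    """Return one follow-up ticket description per alarm discrepancy."""
    expected = set(expected_alarms)
    observed = set(observed_alarms)
    tickets = []
    # Alarms that should have fired during the shutdown but did not.
    for alarm in sorted(expected - observed):
        tickets.append(f"{component}: expected alarm '{alarm}' was not generated")
    # Alarms that fired but were not anticipated (assumed to matter as well).
    for alarm in sorted(observed - expected):
        tickets.append(f"{component}: unexpected alarm '{alarm}' was generated")
    return tickets


# Hypothetical example: one expected alarm never appeared.
for ticket in follow_up_tickets(
    "UPS-A",
    expected_alarms=["input power lost", "on battery"],
    observed_alarms=["on battery"],
):
    print(ticket)
```

An empty list means the component alarmed exactly as expected and no follow-through ticket is needed.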

Introducing the campus-wide blackout.

Of course, we don’t stop with the component level. In the past, we’ve
run blackouts on entire buildings to make sure that we’re successfully
redundant. And this year, we decided to run a blackout on our entire
Santa Clara campus.

We didn’t do it lightly. “We had very stringent methods of
procedure,” Chris says. “And we had steps to take in the event of a
failure, so that there were no interruptions in service.”

For the test, which was completed on April 26, 2017, we brought in
additional staff, and we created stations with small teams of people
wherever we had an electrical switch or a key mechanical component.
There was communication across the board among all of our teams, using
radios and other methods, to make sure that we were staying on track.

When we pulled the plug, each station knew exactly how its equipment
was supposed to react. As long as things went according to script,
nobody would have to touch a thing. And if something didn’t work, we
had plans for every station in place, down to the second. As soon as a
number of seconds passed without the required result, each team had a
process for manually forcing the switch or backing out to ensure that
there was no disruption to our customers’ critical loads.
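The per-station rule described above can be sketched as a simple timeout decision. This is a minimal sketch under stated assumptions: the function name, the action labels and the timeout value are hypothetical, and the real methods of procedure are not given in this article.

```python
def station_action(seconds_elapsed, result_seen, timeout_seconds):
    """Decide what a station team does at a given moment of the test."""
    if result_seen:
        # Equipment reacted as scripted, so nobody touches anything.
        return "hands off"
    if seconds_elapsed < timeout_seconds:
        # Still inside the allowed window for the expected reaction.
        return "wait"
    # Window expired without the required result: intervene manually.
    return "force switch or back out"


# Hypothetical station with a 10-second window.
print(station_action(3, False, 10))
print(station_action(10, False, 10))
```

Keeping the decision down to one elapsed-time comparison per station reflects the point of the script: each team knows in advance exactly when it stops waiting and acts.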

How did it feel to black out an entire campus? “It may have been a
little scary,” Chris says. “But we knew it was the right thing to do.”

Reporting back from the blackout.

At 11:00 a.m. on April 26, 2017, we dropped the mains coming off our
substation and watched nearly 30 generators light up at once. We watched
the power transfer over successfully. And we watched the campus
continue to perform, with no interruption in service.

The blackout confirmed some of our hopes and expectations around
improvements we’d made. For example, over the previous six months, we’d
spent time analyzing the code that tells the parallel switchgear when
to switch over. As a result, we’d re-engineered our timing to improve
the speed, so we’d spend less time on battery and recover mechanical
power faster—and the blackout confirmed the efficiency of that
arrangement. In fact, we cut the amount of time it takes to transfer the
buildings to the generators nearly in half. Of course, we’re never
dropping critical load, but the lights came back on faster and the
mechanical equipment recovered faster, so there was less of a
temperature deviation. And the only way to prove that our buildings now
respond better under these conditions than they did before was to test.

We had a few minor blips, too. One of our breakers failed. The power
was simply shunted off in another direction, so it wasn’t a problem—and
it’s an easy go-back-and-fix. And if we hadn’t caught the fault during the
intentional campus-wide blackout, that breaker might not have worked when we
actually needed it. We also had an issue with a single generator, but we
have more than enough generator redundancy, so that wasn’t a problem,
either. Again, we were able to fix the generator problem and avoid
issues in the future.

Some of our customers were very interested in the blackout. Our
largest customer came over for a visit just before we started. “They
were a bit nervous,” Chris says, “but they completely understood why we
needed to do it.”

After the cut, the customer stopped by again and reported that
everything was good on their end—in fact, the temperature had hardly
moved. Later, when we sent out a notice that we were going to move back
to grid power, this customer wanted to stay and watch. “We invited them
into the room,” Chris laughs, “on the condition that they touch nothing
and remain quiet.”

Most of our customers didn’t even notice that anything had happened. In fact, on the morning of April 27, the day after the
blackout, we got an email from a customer asking us when we’d be doing
it. “They were disbelieving,” Chris says. “And it made me chuckle. But
that’s the point. Nothing happened, and nothing should happen.”

As COO, Chris has over 18 years of operations, engineering and IT experience in the Internet infrastructure industry. Chris is responsible for leading operations, security, network and IT for Vantage. He most recently served as SVP, Process and Technology at Integra. Previously, Chris was VP of AWS Infrastructure Operations at Amazon, where he had worldwide responsibility for operations and network for Amazon’s data centers. Chris also served as SVP of Operations at Level 3 Communications, SVP of Operations at Elevation Data Centers and VP of Operations Architecture at Genuity.

Chris graduated from Northeastern University with a Bachelor of Science in Computer Engineering.

Brought to You By

Vantage Data Centers powers, cools, protects and connects the technology of the world’s well-known hyperscalers, cloud providers and large enterprises. Developing and operating across six markets in North America and five markets in Europe, Vantage has evolved data center design in innovative ways to deliver dramatic gains in reliability, efficiency and sustainability in flexible environments that can scale as quickly as the market demands.