Uptime Assurance through the Mobilization and Augmentation of Data Center IT Teams with Artificially Intelligent Applications

Ask professionals anywhere in the data center industry to name their top concern, and most will give the same answer: downtime. Even the most diligent teams with thorough operational protocols have too many factors to juggle to completely safeguard against outages, especially where human error is possible. Because downtime is so debilitating (reports show that in 2017 and 2018 a single hour of unplanned downtime could cost upwards of $5 million, not counting the reputational damage that comes with unreliability), data center professionals are seeking new ways to prevent costly outages.

When it comes to protecting data center systems against failure, many problems can be solved by careful monitoring and vigilance across day-to-day operations. However, without some way to augment a data center’s existing teams of technicians and engineers, it is not feasible to maintain such an extraordinarily high level of constant oversight. Fortunately, the development of newer technologies based on Artificial Intelligence (AI) will enable data centers to successfully combat issues like downtime, allowing them to meet uptime guarantees and prevent costly outages. In describing the top ten strategic technology trends for 2018, Gartner analysts predicted that AI would become a major industry player. AJ Byers, CEO of ROOT Data Center, a Montreal-based next-generation data center company, notes, “the ability to use AI to enhance decision making, reinvent business models and ecosystems, and remake the customer experience will drive the payoff for digital initiatives through 2025.”

ROOT investigated the prospect of using AI as the extended eyes and ears of its data center’s operational teams, adding layers of automated surveillance that can foresee, and potentially correct, issues. ROOT challenged itself to develop a plan for using AI sensors and machine learning to predict possible faults, reduce human error and downtime, and drive efficiency across the data center.

How This Opportunity or Challenge Was Met

ROOT developed a five-year strategy consisting of four related projects, the first of which launched in 2017 and ran through the end of 2018. This initial project focused on installing and deploying sensors within the generator platform of a 5MW data hall. The sensors collect data, and machine learning is applied to that data to establish a baseline operating level. The system can then alert data center personnel when the generators operate outside the baseline indices. Since the AI persona was implemented, it has gone through over 3,000 training sessions with the generators, representing 250 hours of monitoring augmentation.
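The baseline-and-alert idea can be illustrated with a minimal sketch. The article does not describe ROOT's actual model, so everything below is an assumption for illustration only: the readings are made-up values, the function names `fit_baseline` and `out_of_baseline` are hypothetical, and a simple mean and standard deviation threshold stands in for whatever machine learning method ROOT used.

```python
from statistics import mean, stdev

def fit_baseline(readings):
    """Summarize normal operation as (mean, sample standard deviation)."""
    return mean(readings), stdev(readings)

def out_of_baseline(reading, baseline, k=3.0):
    """Flag a reading more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(reading - mu) > k * sigma

# Hypothetical sensor readings collected while a generator runs normally.
normal_readings = [0.98, 1.02, 1.00, 0.99, 1.01, 1.03, 0.97, 1.00]
baseline = fit_baseline(normal_readings)

# New readings arrive; anything outside the baseline band would trigger an alert.
new_readings = [1.01, 0.99, 1.45]
alerts = [r for r in new_readings if out_of_baseline(r, baseline)]
print(alerts)
```

In this sketch the threshold multiplier `k` trades off sensitivity against false alarms; a production system would likely learn separate baselines for different operating conditions rather than one global band.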

For the subsequent phases and projects, ROOT established goals and outlined plans for augmenting data center trend analysis, enhancing AI data center controls and, finally, moving to an AI-first operator system with a human fail-safe. The AI persona is planned to expand step by step into the primary monitoring systems, where it will predict generator failure and allow for preventative maintenance. From there, it would be incorporated holistically, so that operators would no longer make decisions themselves but only confirm the AI’s appraisals and decisions.

The AI system utilized 3,000 training sessions and 25,000 work units to expand on the domain knowledge of the sensors, such that they are sensing, identifying and learning about generator maintenance issues in a large variety of operating conditions.

This project is the world’s first instance of using AI in a colocation data center to measure and reduce customer downtime. Alex, the name ROOT’s team gave the AI system, has become an effectively integrated part of data center operations.

Benefits of the Initiative

By developing and implementing the first stages of its five-year plan, ROOT produced a cost-effective and innovative strategy to reduce the risk of human error and minimize downtime. Alex successfully maintained uptime for ROOT customers.

ROOT’s AI was able to overcome significant technical barriers, including varying ambient noise, individual noise signatures and a range of other occurrences that demand a dynamic approach and adjustment strategy for generator operations. Throughout this project, ROOT not only reduced the risk of downtime and achieved 100 percent uptime within its facility but also increased the operational efficiency of ongoing maintenance.

Overall, Alex set a precedent that has possible applications in other data centers throughout the world, benefitting the industry and its customers that depend on data center uptime. ROOT’s well-read white paper on this use case has resulted in other data centers following in the company’s footsteps, a clear indication of how AI can reduce the risk of downtime.

About the Author

AJ Byers is CEO of ROOT Data Center, a leading Montreal firm that specializes in next-generation colocation that goes beyond reliability and security. AJ leverages over 20 years of experience in the data center industry to support and promote business growth and transformation. Prior to joining ROOT, AJ served as President of Rogers Data Centers, where he was instrumental in leading the team in the development of one of Canada’s largest data center service companies.
