IBM Power Systems Tops Global Reliability Survey

The saying “Time is money” perfectly encapsulates the reality of daily operations in today’s complex and demanding corporate IT environments. Reliability and continuous, uninterrupted access to network resources form the essential foundation of daily business operations. High availability (HA) is crucial in an era when servers, OSes, desktops, mobile devices and applications are increasingly interconnected via the Internet of Things (IoT), virtualization and the cloud.

When data is inaccessible for any reason, business ceases. This has a domino effect on the corporation and its end users, customers, business partners and suppliers.

Uptime Requirements

According to Information Technology Intelligence Consulting’s (ITIC) 2017-2018 Global Server Hardware and Server OS Reliability survey, 80 percent of businesses surveyed require a minimum of 99.99 percent uptime, commonly referred to as “four nines.” That is more than double the 39 percent of respondents who said they required four nines of uptime five years ago. The independent web-based survey polled 800 executives and IT managers at corporations worldwide in August and September. To obtain the most accurate and objective results, ITIC accepted no vendor sponsorship.

Additionally, 17 percent of respondents indicated their businesses now require “five nines,” or 99.999 percent server and operating system uptime. That equates to 5.26 minutes a year, 25.9 seconds a month or 6.05 seconds a week of unplanned downtime per server (see Figure 1). Moreover, 3 percent of leading-edge businesses, mostly in highly regulated industries such as banking, finance, government and healthcare, need 99.9999 percent uptime. This near-flawless server availability equates to 31.5 seconds of unplanned downtime per server, per year.
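These downtime figures come from simple arithmetic. Permitted downtime is the length of the period multiplied by the unavailable fraction, (100 - availability) / 100. The short Python sketch below reproduces the numbers. It assumes a 365-day year, a 30-day month and a 7-day week. The survey does not say which calendar conventions it used, and the function name `allowed_downtime_seconds` is purely illustrative.

```python
# Allowed unplanned downtime for a given availability target.
# Assumption (not stated in the survey): 365-day year, 30-day month, 7-day week.
SECONDS_PER_DAY = 24 * 60 * 60
PERIODS = {
    "year": 365 * SECONDS_PER_DAY,
    "month": 30 * SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
}


def allowed_downtime_seconds(availability_pct: float, period: str) -> float:
    """Seconds of downtime permitted in `period` at `availability_pct` uptime."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return PERIODS[period] * unavailable_fraction


if __name__ == "__main__":
    levels = [("four nines", 99.99), ("five nines", 99.999), ("six nines", 99.9999)]
    for label, pct in levels:
        year_s = allowed_downtime_seconds(pct, "year")
        month_s = allowed_downtime_seconds(pct, "month")
        week_s = allowed_downtime_seconds(pct, "week")
        # Report the yearly budget in seconds and minutes, then month and week.
        print(f"{label} ({pct}%): {year_s:.1f} s/year ({year_s / 60:.2f} min), "
              f"{month_s:.1f} s/month, {week_s:.2f} s/week")
```

For five nines, the sketch gives the article's figures of about 5.26 minutes a year, 25.9 seconds a month and 6.05 seconds a week. For six nines, it gives roughly 31.5 seconds a year. The 30-day month is an assumption chosen because it matches the 25.9-second monthly figure. Using one-twelfth of a year instead would give about 26.3 seconds.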

“Reliability is paramount. There’s zero tolerance for downtime here,” says Jim Brown, a software quality assurance analyst and system administrator at Boston University. “If BU students and productivity workers can’t access their data and applications for even a few minutes, it’s problematic for all concerned.”

Brown says BU is very proactive in both quality assurance (QA) and reliability testing of all servers and software, while also prioritizing vendor technical service and support. “We need a minimum of four nines and preferably five nines of availability. But invariably, situations can arise that cause unplanned server, OS and application downtime. When that happens, we need to know that our vendors will be right there to assist us,” Brown says.

Andrew Baker, CIO at Brainwave Consulting, a Charleston, West Virginia-based security consultancy, agrees. “The only good downtime is no downtime,” Baker says. “Once a server goes down or an application is unavailable, companies only want to know one thing: ‘How fast can you get us up and running again?’ ” And for small and medium-sized organizations that lack the resources of their enterprise counterparts, extended downtime can put these companies out of business, Baker observes.

IBM Enterprise Servers Top ITIC Reliability Poll

For the 10th year in a row, corporate enterprise users rated IBM Power Systems* and IBM Z* enterprise servers No. 1 in reliability. Among midrange servers, POWER8* processor-based systems recorded just 2.5 minutes of unplanned downtime per server, per annum due to inherent flaws in the server hardware or components. Between them, the IBM Power Systems and IBM Z servers delivered the highest availability among 14 server models and 11 different server hardware virtualization platforms (see Figure 2).

Other survey highlights include:

IBM Power Systems running Linux* once again exhibited the least unplanned downtime among all mainstream Linux server platforms, with 2.5 minutes per server, per year. These results are notable both for the low unplanned downtime rates and for the consistency of IBM servers.

An 88 percent majority of IBM Power Systems clients running Red Hat Enterprise Linux (RHEL), SUSE or Ubuntu Linux experience less than one unplanned outage per server, per year.

Only 1 percent of IBM servers recorded more than four hours of unplanned downtime per server, per annum, followed by 6 percent of HPE servers, 8 percent of Dell servers and 10 percent of Oracle servers.

IBM hardware running Linux was first or second in every reliability category, including virtualization and security.

Survey respondents reported that server workloads have increased by an average of 37 percent over the last 12 to 24 months. The larger workloads are driven by a rise in more compute-intensive applications, including data analytics, CRM and ERP.

Overall, 47 percent of respondents across all server platforms said the increased data center server workloads had negatively affected monthly and annual server reliability and availability, versus 34 percent who said their firms had not experienced a decline in server uptime.

Hourly Downtime Costs Soar

Organizations, irrespective of size and vertical market, are almost wholly dependent on the reliability of their servers to do business. Additionally, corporate environments continue to increase in size and scope. Businesses are also expanding their use of complex technologies such as virtualization, cloud, IoT and the network edge/perimeter. Today’s businesses also incorporate mobility and bring your own device (BYOD) solutions.

The consequences of downtime ripple across an organization’s entire ecosystem, and the cost for a single hour of server, OS or application downtime continues to increase. In the ITIC survey, 98 percent of respondents reported that a single hour of downtime costs their companies $150,000 or more. And 31 percent estimated that an hour of downtime costs their companies up to $400,000—a 7 percent increase from the 2014 survey. A third of respondents say one hour of downtime now costs between $1 million and $5 million.

It’s important to note that these statistics represent only the average downtime costs related to lost productivity and remediation time to restore connectivity and resume full operations. The costs will increase commensurately if data is lost, damaged, destroyed or changed. The downtime figures are also exclusive of any litigation, civil or criminal penalties that may ensue in the wake of the outage. In addition, they omit the cost of any discounts or rebates an organization may give to its customers, business partners or suppliers as a goodwill gesture.

Reputation damage is more difficult to quantify monetarily, but can result in lost business opportunities.

Catastrophic Events and Security

Security breaches and natural disasters, such as the 2017 hurricanes Harvey, Irma, Maria and Jose, can wreak havoc on businesses. Security is an ongoing battle, and 59 percent of survey respondents cited security breaches as a chief factor negatively impacting reliability.

Mother Nature’s unpredictability has undermined reliability for countless businesses during hurricane season. While server and application downtime may be inevitable when power fails after a storm makes landfall, organizations can take practical steps to mitigate the impact. These include deploying reliable server infrastructure, preparing in advance and working with vendors to ensure a fast response in the wake of a disaster.

Steve Sommer, CIO at Stromberg & Forbes LLC, a healthcare firm based in Marco Island, Florida, took a direct hit from Hurricane Irma. Sommer is experienced in dealing with catastrophic events; he was the CIO at a New York City law firm near the World Trade Center on Sept. 11, 2001. Power in lower Manhattan—including at many of the organization’s offsite backup locations—was knocked out for weeks.

“We were lucky. Our vendors, like IBM and Microsoft, just showed up within 24 hours without us having to call them and got us up and running again,” Sommer recalls of the 2001 experience. “It’s crucial to craft a cohesive disaster recovery (DR) and backup plan and to work with vendors who have reliable products and who are true partners, like IBM,” he adds. That forethought and planning aided Sommer as he helped his current organization recover from Irma.

Stu Sjouwerman, founder and chief executive of Clearwater, Florida-based cybersecurity firm KnowBe4, concurs. In the 1990s, when Hurricane Andrew and other storms struck his former company, Sunbelt Software, staffers hauled power generators to the physical data center to keep operations running.

“You have to be prepared and a big part of preparedness is having the most reliable, robust and secure servers that are appropriately right-sized for their workloads,” Sjouwerman says. “And you have to have a DR plan that includes the cloud and off-site locations.”

When Hurricane Irma hit Florida in September, the business impact was minimal because KnowBe4’s data resides on remote servers.

KnowBe4 recovered quickly without having its business operations disrupted, although it did lose power at its Clearwater office. The company set up a temporary office nearby for employees who hadn’t had power restored to their homes yet so that operations could continue. “It was a headache but we made it work,” Sjouwerman says, crediting the quick recovery to reliable server infrastructure, top-notch security and close partnerships with vendors. “You can’t predict the weather, but you can minimize downtime with preparedness,” he says.

IBM Systems Magazine is a trademark of International Business Machines Corporation. The editorial content of IBM Systems Magazine is placed on this website by MSP TechMedia under license from International Business Machines Corporation.