For the third year in a row, IBM’s AIX Unix operating system (OS) running on the company’s Power Systems servers scored the highest reliability ratings among 19 different server OS platforms – including other Unix variants, Microsoft’s Windows Server, Linux distributions and Apple’s Mac OS X.

More than three-quarters (78 percent) of survey respondents indicated they experienced fewer than one of the most common, minor Tier 1 incidents per server, per annum on IBM’s AIX v. 5.3 and AIX v. 7.1 distributions. Those are the results of the ITIC 2010-2011 Global Server Hardware and OS Reliability Survey. ITIC partnered with GFI Software (formerly Sunbelt Software) to conduct this independent Web-based survey. It polled C-level executives and IT managers at 468 corporations from 23 countries worldwide from November through January.

The survey data indicated that the reliability and uptime of all the major server OS and server hardware distributions have improved significantly over the past several years.

Microsoft’s Windows Server 2008 and Windows Server 2008 R2 served up the biggest surprise in the survey, scoring impressive reliability gains that made the platform one of the top three most reliable mainstream server OSes. Windows Server 2008 R2’s reliability renaissance is especially impressive since Microsoft’s Windows Server OS noticeably lagged behind the majority of the UNIX, Linux and Open Source distributions in the ITIC/Sunbelt 2008 and 2009 Server Reliability surveys. This was particularly evident when it came to chronicling the most severe Tier 3 outages, which typically last four or more hours, involve data loss and require multiple members of the IT department to perform remediation.

An overwhelming 92% majority of Windows Server 2008 R2 users experienced one or fewer Tier 3 outages per server, per annum, followed closely by the 90% of respondents using IBM’s AIX 7.1 who said they experienced one or fewer severe Tier 3 incidents per server, per annum. Some 86% of Novell SuSE Linux Enterprise Server 11 users and 84% of HP-UX 11i v3 users also testified to the reliability of those platforms, reporting that they experienced one or fewer unplanned Tier 3 outages per server, annually.

The survey found that all server OSes continue to make year-over-year reliability gains. The essay comments and first-person customer interviews revealed that the majority of the moderate and severe Tier 2 and Tier 3 outages were attributable to integration and interoperability issues such as incompatible drivers, trouble applying patches (particularly in highly customized environments), misconfigurations and the lack of a specific component or software fix for a particular platform.

Some IT managers also acknowledged that complexity and the IT department’s unfamiliarity with new products, software versions and new technologies like virtualization and private clouds prolonged downtime. This is particularly true in instances where corporations lacked the time or the funds to certify and re-train the appropriate members of the IT staff on new technologies.

Sun Solaris 10, now owned by Oracle, had respectable reliability statistics, though Solaris on SPARC systems lagged behind most other OS distributions. Nearly 73 percent of respondents reported that Sun Solaris 10 recorded fewer than one Tier 1 outage per server, per annum, while only 63 percent of Sun Solaris 10 SPARC users achieved those same reliability results. The numbers were similar for the more moderately serious Tier 2 outages, with 70 percent of users running Sun Solaris 10 on SPARC systems reporting fewer than one incident per server, per year. Sun Solaris 10 on x86 systems fared slightly better, with 71 percent recording fewer than one Tier 2 incident per server on an annual basis. With respect to the most severe Tier 3 outages, 70 percent of survey participants running Sun Solaris 10 on SPARC said they experienced fewer than one incident on each server during the year, compared with 74 percent of those running Sun Solaris 10 on x86 platforms who reported fewer than one severe Tier 3 incident per server, per annum.

Overall, with respect to the most severe and prolonged unplanned Tier 3 outages, Sun Solaris 10 also lagged behind all of the major OS distributions, with 70 percent of customers reporting fewer than one outage. That is approximately the same percentage reported by organizations that are still using the eight-year-old Windows Server 2003 server operating system. Some 69 percent of Windows Server 2003 users reported fewer than one Tier 3 outage per server, per annum.

IBM Tops in Server Hardware Reliability

IBM hardware was also best in class in terms of reliability, stability and performance. IBM’s System z mainframes recorded the least amount of downtime; 76% of respondents indicated System z machines experienced just one-to-five minutes of unplanned outages per server, per year, the equivalent of 99.999% or better availability.

Stratus Technologies’ ftServer 6300 and 4500 series and Fujitsu’s Primequest and Primergy servers also made impressive showings. Some 75% of Stratus ftServer 6300 and 4500 users said they experienced one-to-five minutes of per server, per annum downtime, for five nines of availability. Some 74% of HP Integrity and Fujitsu Primequest and Primergy server users said they experienced five minutes or less of unplanned annual server downtime.
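The link between annual downtime minutes and "nines" of availability is simple arithmetic, and a minimal Python sketch of it follows. The function names are hypothetical, and the sketch assumes a 365-day year with servers expected to be up around the clock; the survey itself does not state how it converted minutes to percentages.

```python
# Illustrative only: convert annual unplanned downtime to an availability
# percentage and back. Assumes a 365-day year of 24x7 expected uptime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_percent(downtime_minutes_per_year: float) -> float:
    """Return availability (in percent) for a given annual downtime."""
    if not 0 <= downtime_minutes_per_year <= MINUTES_PER_YEAR:
        raise ValueError("downtime must be between 0 and one year of minutes")
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the annual downtime budget (in minutes) for a target availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    for minutes in (1, 5):
        pct = availability_percent(minutes)
        print(f"{minutes} min/year of downtime -> {pct:.5f}% availability")
    budget = allowed_downtime_minutes(99.999)
    print(f"99.999% availability allows about {budget:.2f} min/year of downtime")
```

Under these assumptions, five nines corresponds to a budget of roughly 5.26 minutes of downtime per server, per year, which is why one-to-five minutes of unplanned outages lands at 99.999% or better.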

Among the other survey highlights:

A 57% majority of respondents said their server hardware is between one and three years old. One-in-five corporations – 20% – said their servers were three-to-four years old.

One-quarter – 25% – of businesses refresh their main line-of-business server hardware “as needed” and 10% said they upgrade a portion of their servers annually.

Only a very small 2% minority of organizations aggressively upgrade their servers every two years. The majority of companies are on a three-, four- or five-year server refresh cycle, with 15% of participants stating they upgrade servers every two years; 15% upgrade every three years and 17% are on a protracted five- or six-year server upgrade cycle. Another 15% said they have “no specific” server upgrade timetable.

A higher percentage of users prefer to apply patches manually rather than automatically. Nearly three out of 10 organizations – 30 percent – say they opt to apply patches manually, all or most of the time. Another 35 percent of survey participants say they “sometimes” apply patches manually. Only 16 percent of respondents never apply patches manually.

Some 26 percent of respondents always use group policy to apply patches and 16 percent sometimes utilize group policy methods, compared with 52 percent of survey respondents who eschew group policy.

The manual patch method does take longer than applying patches automatically or using group policies. Overall, 61 percent of those polled said they spend more than one hour applying patches to their server platforms for each specific upgrade. Of that figure, just under half – 29 percent – revealed that it takes them in excess of four hours to apply patches for each incident.

The length and severity of Tier 1, Tier 2 and Tier 3 unplanned outages and the patching actions related to each correspond to specific line item capital expenditure (CAPEX) and operational expenditure (OPEX) costs for the business. Reliability, measured by downtime, can positively or negatively impact TCO and accelerate or delay the time it takes to realize ROI.

Improvements or declines in reliability also mitigate or increase technical and business risks to the organization’s end users and its external customers. The ability to meet service-level agreements (SLAs) hinges on server reliability, uptime and manageability. These are key indicators that enable organizations to determine which server operating system platform or combination thereof is most suitable.

Overall, these survey responses provide crucial, comparative reliability metrics to enable customers to make informed choices on which server hardware and server operating system, or combination thereof, best suits their specific business and budget needs.

Conclusions and Recommendations

In summary, the ITIC 2010-2011 Global Server Hardware and Server OS Reliability Survey findings indicate that all of the server operating system platforms have achieved a high degree of reliability. However, the IBM AIX 7.1 operating system, followed closely by Windows Server 2008 R2, HP-UX 11i v3 and Novell SuSE Enterprise Linux 11, are the top four most reliable server OS distributions.

Do you have any availability/downtime data on Red Hat Linux running on Dell M710 Blades? I’m in the process of building an availability model and will appreciate any info you have re: Dell Blades running Linux.

Would be great to get these results translated into tables, charts. I’ve been following Laura’s reliability surveys for Yankee and now ITIC and have found them most useful, and I’d love to be able to compile multi-year changes to see where ‘cross-overs’ occur.

Hi, Colin: Thanks for the kind words about my research. I do have this information in tables and charts and I do track the year over year results. Just let me know what you’re looking for and I’ll see what I can do. Best Regards, Laura

Hello, Giovanni: Yes, I will post the graphic on customer satisfaction with IBM and other vendors within the next week. I just completed the ITIC 2013 Global Server Hardware and Server OS Reliability Survey, so you’ll be able to see the latest data. Thanks for asking.

Laura, thanks. I’ve been away! Enjoying the rodeo in Houston – all that testosterone…
I’m really looking for a simple graphic of the improvements in reliability provided by the main server OSs, over time (years). In our specific case, we’re examining AIX running over Power 5-7, and Linux (typically RHEL or SUSE) over x86 blades. IBM’s offerings are excellent and come out in front in many of these surveys, but ‘how far away’ are blade and Linux, with recent advances? I see a lot of our telco-grade systems switching to blade, but not our batch systems.

I second Colin’s comment about charts and tables. I am very much interested in annual changes in reliability of x86/x64 servers vs. Power and SPARC as well as reliability of Windows, Linux, AIX and Solaris.