Ask Ars: with Xeon’s improvement, why bother with Itanium?

Itanium is always getting "canceled" in the press, but in the real world Intel …

In 1998, Ask Ars was an early feature of the newly launched Ars Technica. Now, as then, it's all about your questions and our community's answers. Each week, we'll dig into our question bag, provide our own take, then tap the wisdom of our readers. To submit your own question, see our helpful tips page.

Q: I've been reading for years that Itanium is going to get cancelled, but Intel still keeps producing new versions of it. So my question is what, specifically, is Itanium so good at that Intel keeps it around, despite the fact that Xeon keeps getting more powerful and is much cheaper than Itanium? What kinds of applications are people using Itanium for, and why can't they just switch to Xeon instead of hassling with a different architecture?

It's not for nothing that Intel's Itanium processor family is commonly called "Itanic." Predictions of the line's demise regularly crop up in the tech press, with the most recent one coming courtesy of Oracle, which declared the Itanium line dead before canceling the future development of its popular database stack for the architecture.

Despite the perpetual doom and gloom that has become part of Itanium's lore, the processor line marches on. Intel recently unveiled Poulson, the next entry in the Itanium line—a 32nm, 3 billion-transistor monster of a chip. And then there's Kittson, Poulson's official successor and the part that's rumored to be, yet again, the "last" Itanium processor before the line gets canceled.

Intel also recently unveiled the Xeon E7, another monster of a processor about which the company's own Kirk Skaugen made the following claim at IDF 2011 Beijing this past April: "Xeon's reliability and performance is now equal [to]—and in some cases better than—Itanium, and they're going to leapfrog [each other] in performance over time."

So the question you've posed is a good one. If, by Intel's own admission, Xeon is as good as Itanium now while being drastically cheaper, why do Intel and its customers bother with Itanium?

As with so many other things in this world, the answer boils down to money. But let's break the "money" answer down further by looking at Itanium from the perspective of both Intel and its customers.

Why Intel still makes Itanium

Intel has recently become fond of pointing out a set of numbers that tells you all you need to know about why the company continues to pour development money into Itanium.

In both the aforementioned IDF presentation and at the most recent Intel Investor meeting, Intel execs pointed out that Itanium has grown into a $4 billion per year business for the chipmaker.

Now, this $4 billion number is a lot smaller than $30 billion, which is the size of Intel's Xeon business. But it's a lot larger than $1.6 billion, which was the revenue for all of AMD combined (CPUs, GPUs—the whole company) in the first quarter of 2011. Furthermore, Intel puts the size of the entire Opteron ecosystem at $2.8 billion, so Intel's Itanium revenues alone beat everyone's total revenues (AMD plus all of its OEM partners) from Opteron.

So while Itanium shipments may be so small that they number in the hundreds of thousands, the business is big enough to make it worth staying in for Intel.

Why customers still buy Itanium

The Itanium decision on the customer side is dominated by two big issues: legacy software and RAS. Let's take RAS first, since it came first historically.

RAS is an acronym that stands for "Reliability, Availability, Serviceability," and it essentially means "all of the features that you need to ensure that your system never goes down. Ever. For any reason. Because downtime costs you millions of dollars an hour."
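The "Ever. For any reason." framing can be made concrete with a little availability arithmetic. The sketch below (Python, with a purely hypothetical $1 million/hour downtime cost standing in for the article's "millions of dollars an hour") shows how quickly the allowed downtime shrinks as you add nines:

```python
# Illustrative availability math: allowed downtime per year at each
# "number of nines," and what it costs at a hypothetical $1M/hour.

HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_per_year(availability):
    """Return allowed downtime in hours/year for a given availability."""
    return HOURS_PER_YEAR * (1 - availability)

cost_per_hour = 1_000_000  # hypothetical downtime cost, USD

for label, a in [("three nines (99.9%)", 0.999),
                 ("four nines (99.99%)", 0.9999),
                 ("five nines (99.999%)", 0.99999)]:
    hours = downtime_per_year(a)
    print(f"{label}: {hours * 60:.1f} min/year "
          f"-> ${hours * cost_per_hour:,.0f}/year at risk")
```

Five nines works out to roughly five minutes of downtime per year, which is the level mainframe-class RAS features are built to hit.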

Because companies depend on their mainframes to have zero downtime over the course of decades, mainframe architectures like Itanium and its competitors have long had RAS features that go beyond mere support for ECC RAM. If you're interested in learning more about these features, here's a really good breakdown of Itanium RAS features from 2007; Intel is committed to seeing Xeon support all of these, so that RAS isn't part of the Xeon vs. Itanium decision going forward.

While the commodity Xeon line may be finally gaining support for mainframe-level RAS features, it will take a very long time for Xeon to displace Itanium and other mainframe architectures in the mainframe market. That's because in the mainframe market, legacy issues are an order of magnitude more important than they are in any corner of the commodity tech market.

For mainframe users, "legacy" doesn't just mean a giant, crufty, Rube Goldberg-esque conglomeration of legacy programs and libraries that keeps your specific, idiosyncratic workload running 24/7—although that's part of the picture. It also means critical vendor and contractor relationships, including support and maintenance agreements that have multiyear timespans. With so much legacy inertia standing in the way of change, mainframe users are just not interested in entertaining the terrifying prospect of porting their mission-critical mainframe systems to an entirely new hardware and OS stack, so they keep refreshing and expanding the systems that they already have. Hence the market for new mainframe systems like Itanium. In other words, this is a classic case of vendor lock-in.

Intel's Kirk Skaugen acknowledged the centrality of OS and vendor choices in his aforementioned IDF presentation. Here's the full quote for reference:

"We used to position Itanium as highest performance, highest reliability... We're still committed to Itanium. It's really now a choice of operating system. Xeon's reliability and performance is now equal [to]—and in some cases better than—Itanium, and they're going to leapfrog [each other] in performance over time."

"If you like HP-UX, OpenVMS, Nonstop, and [other] mainframe operating systems, we're going to fully support you on Itanium. But now Xeon is in a space where there's no workload on the planet that Xeon can't handle."

While there may be no workload on the planet that Xeon can't handle, there are definitely some mainframe operating systems out there that it can't, and it can't because those OSes run on Itanium.

The reason Skaugen listed HP-UX first is that it's a really popular Itanium OS, and a factor in keeping customers locked into Itanium. According to a widely quoted IDC estimate, HP ships some 90 percent of Itanium systems sold, so in the vast majority of cases, choosing Itanium means choosing HP and HP-UX. And customers keep buying those HP systems because they bought HP Itanium systems in the past and want to keep upgrading and expanding them.

Whither Itanium?

While Itanium's future may not look particularly bright, it's not all that dim, either. Intel's newly announced Poulson will double the performance of the Itanium platform, and it also marks a fundamental shift in direction from the Itanium chips that have come before.

Poulson essentially gives up on the core VLIW (very long instruction word) idea that was the genesis of Intel's original IA-64 project, and it breaks apart VLIW's signature, statically scheduled instruction bundles into individual instructions before scheduling them (sometimes out-of-order).

Poulson also scales back the massively powerful floating-point hardware that (along with RAS) has historically set the Itanium line apart from Intel's x86 chips; this floating-point muscle was an artifact of the workstation era, when Intel designed Itanium to compete in the now-defunct RISC workstation market. In throwing out the monster FPU and replacing it in part with post-VLIW bookkeeping apparatus that's designed to improve Poulson's performance on the kinds of server workloads for which Itanium is actually used today, Intel is finally bringing Itanium a few major steps toward Xeon, even as it moves Xeon toward Itanium via RAS improvements.

So while Itanium and Xeon are beginning to look more and more alike, the aforementioned legacy and economic issues suggest that Itanium will stick around, at least in the medium-term, as a separate and viable product line within Intel.

This is probably a stupid question since I'm not familiar with the subject matter. But how much of that $4 Billion business represents new customers versus existing customers upgrading? I guess a related question would be, for someone doing a completely new build-out (and zero Itanium hardware) is there any reason not to go with Xeon?

"This is probably a stupid question since I'm not familiar with the subject matter. But how much of that $4 Billion business represents new customers versus existing customers upgrading? I guess a related question would be, for someone doing a completely new build-out (and zero Itanium hardware) is there any reason not to go with Xeon?"

There's a market for large systems - above 4 to 8 sockets - where x86 has relatively little market share. HP-UX Itanium boxes scale to 32 sockets, with 64 sockets on the horizon. Their Itanium-based NonStop systems have very impressive scalability, to several thousand processors.

Mainframe, that word does not mean what you seem to think it means (even NonStop and VAX are only descendants of minicomputer OSes). The biggest problem is trying to emulate the weird bitness of the legacy chips on x86; the VLIW design of Itanium has some serious advantages when it comes to emulating these old systems, so even though the raw performance of Itanium and the top-bin Xeons is close, the gap when running legacy code is large. The biggest threat to Itanium is Oracle and IBM, because they make the two largest RDBMSes and both want that $4B per year of system revenue (much more if you include services) for themselves, and both are doing their best to kill HP-UX.

There are a lot of reasons for new customers to take it up. Some people buy it because it ends up saving almost half a million dollars in Oracle licenses, because of the way HP in particular lets you hard-partition systems. So, you don't need to license the whole box, just the CPUs you're actively using. And as far as real "mainframe" computing goes, with things like Nonstop, those systems are designed to make it so you NEVER lose a packet or a DB query. A lot of Xeon environments like to use virtualization, and if your VM fails and has a hot-standby running, you can lose a packet or two. Not only that, but you can lose CPUs and DIMMs without the box going down. It will actually de-allocate a CPU, and continue running until you have a scheduled downtime window. The best x86 box will force you to reboot before you can do that, and usually at the loss of multiple CPUs to de-allocate one. Even on the latest Westmere-EX.
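The licensing math behind that "almost half a million dollars" claim is easy to sketch. The figures below are assumptions for illustration only: a per-processor database list price in the neighborhood of Oracle's widely cited $47,500, and a hypothetical 16-CPU box hard-partitioned down to 6 licensed CPUs. Real Oracle licensing involves core factors, editions, and negotiated discounts.

```python
# Rough sketch of hard-partitioning savings: license only the CPUs in
# the active partition instead of every CPU in the box.
# All numbers are assumed, purely for illustration.

list_price_per_cpu = 47_500    # assumed per-processor list price, USD
total_cpus_in_box = 16         # hypothetical machine size
cpus_in_hard_partition = 6     # CPUs actually running the database

cost_whole_box = total_cpus_in_box * list_price_per_cpu
cost_partition = cpus_in_hard_partition * list_price_per_cpu
savings = cost_whole_box - cost_partition

print(f"Savings from hard partitioning: ${savings:,}")
```

With these assumed numbers, the 10 unlicensed CPUs come to $475,000 in avoided license fees, which lines up with the commenter's figure.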

Did the Itaniums that support QPI use a variation on the 'Boxboro' chipset? How different is the Itanium version from the Xeon version (or is it just BIOS/firmware)? If the platforms are similar, is it out of the question to consider a system set up to take either Itaniums or Xeons, with the potential to have a mixed hybrid system in the future (especially considering the Itanium roots of UEFI and how Intel has been pushing it)?

Mainframe, that word does not mean what you seem to think it means (even NonStop and VAX are only descendants of minicomputer OSes). The biggest problem is trying to emulate the weird bitness of the legacy chips on x86; the VLIW design of Itanium has some serious advantages when it comes to emulating these old systems, so even though the raw performance of Itanium and the top-bin Xeons is close, the gap when running legacy code is large. The biggest threat to Itanium is Oracle and IBM, because they make the two largest RDBMSes and both want that $4B per year of system revenue (much more if you include services) for themselves, and both are doing their best to kill HP-UX.

And I say more power to them. Trying to use HPUX was the most frustrating experience of my life. It was probably entirely user (my) error, but I was glad when we stopped supporting that platform. Linux on x86 is good enough for me. A pox on the house of HPUX!

The Itanium chip is used by several manufacturers but largely by HP. Itanium is the hardware for HP's UNIX (HP-UX), just as POWER is the hardware for IBM's UNIX (AIX) and SPARC is for Sun/Oracle's UNIX (Solaris). IDC reported UNIX to represent 21.8% of total server revenue for Q1 2011, so UNIX may be shrinking, but it continues to be a sizeable part of the server market. Itanium is used extensively in healthcare, telecom, and trading companies, where downtime could result in loss of lives, large losses against SLAs, or large lost trading opportunities. The combination of Itanium hardware with HP-UX creates an environment with 99.999% reliability, with features not yet present in Xeon with Linux or Microsoft Windows (or with the virtual server technologies available on those operating systems). Such features include full electrical isolation of hard partitions (within a server), built-in support for failover clusters, redundant I/O paths, multiple levels of error-correcting code, double chip sparing, and hot-swappable power, fans, PCIe cards, clocks, crossbars, etc.

In my opinion (excluding extraordinary influences) Itanium/HP-UX will be replaced by Xeon when the combination of Xeon and the available operating systems provide similar features to Itanium/HP-UX. When that happens, I believe Itanium, POWER, and SPARC will likely all die together in favor of x86 chips.

It is funny to note that the press seems to focus on Itanium when companies like HP have for the past 10 years maintained between a 30-40% market share in high-end RISC + EPIC servers (Itanium-based), while companies like Sun/Oracle have seen their share of the same market drop from about 24% to 7%.

We run OpenVMS on Itanium and we see uptimes of 2 or 3 years. The main reason for downtime is power outages, because the batteries in the UPSes only last a few years. (Yes, they could do something different, but a reboot every few years isn't a problem.)

In my opinion (excluding extraordinary influences) Itanium/HP-UX will be replaced by Xeon when the combination of Xeon and the available operating systems provide similar features to Itanium/HP-UX

I mostly agree with that but I would modify it a bit. Itanium goes away when the feature difference is small enough that you just can't justify the massive price difference. Of course that day never comes when you are the government.

I always thought one of the big drivers of Itanium was the research/modeling area that tends to have a floating point fetish so I am surprised to see them cutting back on the FPU.

Now, this $4 billion number is a lot smaller than $30 billion, which is the size of Intel's Xeon business. But it's a lot larger than $1.6 billion, which was the revenue for all of AMD combined (CPUs, GPUs—the whole company) in the first quarter of 2011.

Why are you comparing yearly revenue to quarterly revenue? AMD's 2010 revenue was around 6.5B, so it seems like the Itanium business is about 60% of AMD by this measure.
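Worked through, the commenter's annualization point looks like this. The 4x multiplier is a naive run-rate of the single quarterly figure, used purely for illustration; AMD's actual 2010 revenue was roughly $6.5 billion, which lands in the same ballpark.

```python
# Comparing like with like: annualize AMD's quarterly revenue before
# holding it against Itanium's yearly revenue.
# The 4x run-rate is a simplifying assumption for illustration.

itanium_yearly = 4.0    # $B/year, per Intel's stated figure
amd_q1_2011 = 1.6       # $B, AMD's Q1 2011 revenue

amd_annualized = amd_q1_2011 * 4          # naive run-rate: $6.4B/year
ratio = itanium_yearly / amd_annualized   # ~0.625

print(f"Itanium revenue is roughly {ratio:.0%} of AMD's annualized revenue")
```

So the apples-to-apples comparison puts Itanium at a bit over 60 percent of AMD's business, not two and a half times its size.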

Did the Itaniums that support QPI use a variation on the 'Boxboro' chipset? How different is the Itanium version from the Xeon version (or is it just BIOS/firmware)? If the platforms are similar, is it out of the question to consider a system set up to take either Itaniums or Xeons, with the potential to have a mixed hybrid system in the future (especially considering the Itanium roots of UEFI and how Intel has been pushing it)?

I can only speak to the HP version of Itanium servers, but their entry-level blades actually use Boxboro. The Superdome, which has more RAS features and scales much larger, uses a proprietary chipset.

And I say more power to them. Trying to use HPUX was the most frustrating experience of my life. It was probably entirely user (my) error, but I was glad when we stopped supporting that platform. Linux on x86 is good enough for me. A pox on the house of HPUX!

HP-UX may not be as friendly as Linux, but I've got dozens of Itanium and PA boxes running in a shared, abusive environment, and it is mostly a boot-it-and-forget-it affair - uptimes upwards of thousands of days aren't uncommon.

And the backwards compatibility with PA-RISC is stellar too. The only issue is Oracle, as highlighted - many of the HP-UX boxes run Oracle, and losing that would definitely hurt. But it still makes for a great OS to run native and Java EE applications.

For those of you wondering why the FPU was cut down, it's simply because x86-64 took over in the last few years. The HPC market prefers to have a lot of cheap cores and doesn't care about losing one node of a cluster once in a while.

Itanium was a great processor for HPC but it was too power hungry and too expensive in the end.

This graph from top500.org displays well the rise and fall of Itanium in this particular market:

HP ships some 90 percent of Itanium systems sold, so in the vast majority of cases, choosing Itanium means choosing HP and HP-UX. And the reason customers keep buying those HP systems is because they bought HP Itanium systems in the past and they'd like to keep upgrading them and expanding them.

This raises the question of why those customers bought Itanium systems in the first place. The answer is simple: planned obsolescence by HP for its previous HP-UX platform, PA-RISC. Not only that, HP managed to move customers from Compaq's Alpha architecture to Itanium as part of the HP-Compaq merger. OpenVMS was ported to Itanium, and a port of Tru64 was discussed early on but never materialized.

That collusion between HP and Intel to migrate toward Itanium pretty much left the high-end server market where it is today. Oracle is still developing the SPARC lineup after it acquired Sun, and IBM is still pushing POWER chips everywhere from high-end servers to game consoles (including the just-announced Wii U). Would Itanium be around today if it didn't have two legacy platforms and MS supporting Windows on it? Not very likely. Speaking of Windows, MS is dropping support for Itanium too.

As for Itanium being a mainframe processor, I'd describe it more as a midrange architecture between the traditional mainframe and Unix-based servers. Memory mirroring and lock-step processor mirroring are features that can be added to many platforms with a bit of additional system logic to improve RAS. A true mainframe takes RAS to another level. For example, instead of memory mirroring, a modern mainframe incorporates RAID5 across multiple ECC-protected memory channels, with the ability to hot swap memory modules on a running system. System controllers are duplicated to provide further redundancy. Lock-step mirroring is applied not just across different sockets but across entire systems. Virtualization is also a mainframe hallmark, with the ability to virtualize a bare-metal hypervisor inside another bare-metal hypervisor. Furthermore, the hypervisor can report virtual hardware to a guest OS, like 8 virtual processors on a system with only 4 cores and no SMT.
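The RAID5-across-memory-channels idea works just like RAID5 on disks: stripe data across channels and keep an XOR parity channel, so the contents of any single failed channel can be rebuilt from the survivors. Here's a toy Python sketch of the principle only; real mainframe memory protection (e.g., IBM's RAIM) is far more elaborate, with per-channel ECC on top.

```python
# Toy XOR-parity sketch: three "memory channels" plus a parity channel.
# Any one lost channel is recoverable by XORing the remaining channels
# with the parity channel.

from functools import reduce

def parity(channels):
    """XOR corresponding bytes across all channels to form a parity channel."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*channels))

def rebuild(surviving_channels, parity_channel):
    """Reconstruct a single lost channel from the survivors plus parity."""
    return parity(surviving_channels + [parity_channel])

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # three data channels
p = parity(data)

lost = data[1]                                   # simulate a channel failure
survivors = [data[0], data[2]]
assert rebuild(survivors, p) == lost             # contents recovered
```

The cost is one extra channel of capacity and an XOR on every write, which is why this level of redundancy stayed a mainframe feature rather than a commodity one.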

Now, this $4 billion number is a lot smaller than $30 billion, which is the size of Intel's Xeon business. But it's a lot larger than $1.6 billion, which was the revenue for all of AMD combined (CPUs, GPUs—the whole company) in the first quarter of 2011.

Jon, are you sure these numbers are right? I can believe that Itanium system sales are $4 billion a year, but that includes a whole lot more than chips and motherboards - the Intel slice of that is a lot smaller. Even the $30 billion for Xeon doesn't make sense from an Intel perspective - looking at Intel's 2010 financial reports, they had revenues of ~$43 billion total, and only 20 percent of that was from the Xeon/Itanium segment, so $8-9 billion total for all their chips and motherboards.

A true mainframe takes RAS to another level. For example, instead of memory mirroring, a modern mainframe incorporates RAID5 across multiple ECC protected memory channels with the ability to hotswap memory modules on a running system.

IBM brought this all the way down to the x86 market with Chipkill; you can run most Tier 1 servers with both Chipkill and mirroring (RAID 51, if you please). Hot swap isn't such a big deal in the x86 world, which is why HP discontinued things like hot-swap PCI devices. If I had something with uptime requirements where hot-swap PCI cards were a consideration, I would rather just cluster the thing, give each node redundant components, and swap out the failed unit during a maintenance window.

$4 billion a year sounds nice, but I wonder how much money Intel has dumped into Itanic development. Xeon and Opteron are offshoots of an architecture also sold in desktops, notebooks, and workstations. Itanic is a massive die with a single market. Sure, prices are high, but yields are probably really low. I just wonder, because we all know how corporations can try to make things look favorable to hold a customer base.

Don't forget that Itanium also underlies NSK, the descendant of the Tandem computing line. Those boxes are incredibly reliable. In fact, it used to be that you could pretty much fire a gun through the computer and take out a few CPUs, and all you'd lose would be about a hundredth of a second and no transactions.

However, coding for that level of reliability is a bit too expensive. Nowadays you can still fire the gun, but you'll probably lose a transaction or two that were executing on the CPUs you just destroyed. But the rest of the computer will roll merrily along.

Of course, since the network that people rely upon to access these boxes isn't nearly as reliable, (by about a factor of 100 in my experience), that's a *lot* of money being paid for a pretty marginal gain in availability.

I guess a related question would be, for someone doing a completely new build-out (and zero Itanium hardware) is there any reason not to go with Xeon?

If the software you need to run doesn't run on a Xeon-compatible OS.

I recently did a job at a trucking company that had redundant IBM mainframes to support some legacy software for scheduling or something or another. Swapping the software wasn't an option, so they paid on the order of $1.3mil a month in support/maintenance/etc. contracts for those machines.

Generally, the hardware costs in a large environment are chump change compared to the operating costs, which are again chump change compared to the cost of a radical change in your environment.

If Intel is backing off from producing CPUs with powerful FPUs (floating-point units) built-in, then what's going to stop someone from developing a new discrete math coprocessor? Can GPUs handle floating-point arithmetic as well as a specialist FPU could do so?

Thanks for the article Jon, that explains a lot more of the differences between mainframes and desktop PCs.

matthewslyman wrote:

If Intel is backing off from producing CPUs with powerful FPUs (floating-point units) built-in, then what's going to stop someone from developing a new discrete math coprocessor? Can GPUs handle floating-point arithmetic as well as a specialist FPU could do so?

GPUs themselves are specialized floating point processors, but becoming more general-purpose. It all depends on the type of load. GPUs have a ridiculous number of floating-point units that can run in parallel (the Radeon 6990 has 3000+ stream processors!)

I'd agree that the vendor servicing contracts and lock-in are a major reason for Itanium's continued usage. There was mention of legacy systems, but in our case the issue was not that the software wasn't supported on newer architecture, but rather the licensing costs of moving to a new platform.

We are still running one of those oldie HP Alpha boxes with VMS... and we were considering moving to Itanium and HP-UX. The problem is that changing the processor and/or the OS changes the platform... which would require new licensing. Licensing costs for InterSystems Caché run from hundreds of thousands to millions of dollars just to change a couple of words on a piece of paper.

Now we'd have had to take on that licensing cost regardless of processor because we were also going to change OSes, but that may not be the case for someone already on Itanium. You don't want to switch if you're already licensed on a platform. As the article and comments mention...you just upgrade and expand your current processor family. You don't switch unless there are very high dollar savings in converting to a new processor via equipment, staffing, and support.

Now, this $4 billion number is a lot smaller than $30 billion, which is the size of Intel's Xeon business. But it's a lot larger than $1.6 billion, which was the revenue for all of AMD combined (CPUs, GPUs—the whole company) in the first quarter of 2011.

Jon, are you sure these numbers are right? I can believe that Itanium system sales are $4 billion a year, but that includes a whole lot more than chips and motherboards - the Intel slice of that is a lot smaller. Even the $30 billion for Xeon doesn't make sense from an Intel perspective - looking at Intel's 2010 financial reports, they had revenues of ~$43 billion total, and only 20 percent of that was from the Xeon/Itanium segment, so $8-9 billion total for all their chips and motherboards.

I find ars' analysis of financials to be a key weak point. The guys (and girls) are excellent at tech, and so should either just stick to that, or engage a suitable writer to cover the financial aspects of the articles more appropriately.

IBM brought this all the way down to the x86 market with Chipkill; you can run most Tier 1 servers with both Chipkill and mirroring (RAID 51, if you please). Hot swap isn't such a big deal in the x86 world, which is why HP discontinued things like hot-swap PCI devices. If I had something with uptime requirements where hot-swap PCI cards were a consideration, I would rather just cluster the thing, give each node redundant components, and swap out the failed unit during a maintenance window.

Hot swap of a PCI/PCI-X/PCIe card isn't that difficult a feature to find even in the x86 server world. Hot swapping an expansion card is far simpler than hot swapping an active memory module.

ChipKill generally operates at the DIMM level for detection and correction: a single DRAM chip failure on an individual DIMM can be tolerated. On its own, this does not allow for memory hot swapping, as there is no data redundancy between DIMMs or memory channels. RAID 1 of a memory channel would provide the redundancy in this case, but it would take a bit more to support hot swapping. A modern mainframe can support an entire memory channel failure without losing data.

Though I will agree that clustering is 'good enough' for most environments, and a high-end x86 box will suffice. Mainframes escalate RAS into the realm of insanity.