If you're still using Intel's Itanium processors, you'd better get your orders in soon. Intel has announced that it will make its final shipments of Itanium 9700 processors on July 29, 2021. The company says orders must be placed no later than January 30, 2020 (spotted by AnandTech).

The Itanium 9700 line of four- and eight-core processors represents the last vestiges of Intel's attempt to switch the world to an entirely new processor architecture: IA-64. Instead of being a 64-bit extension to IA-32 ("Intel Architecture-32," Intel's preferred name for x86-compatible designs), IA-64 was an entirely new design built around what Intel and HP called "Explicitly Parallel Instruction Computing" (EPIC).

High-performance processors of the late 1990s—both the RISC processors in the Unix world and Intel's IA-32 Pentium Pros—were becoming increasingly complicated pieces of hardware. The instruction sets the processors used were essentially serial, describing a sequence of operations to be performed one after the other. Executing instructions in that exact serial order limits performance (because each instruction must wait for its predecessor to finish), and, it turns out, isn't actually necessary.

There are often instructions that don't depend on each other, and they can be executed simultaneously. Processors like the Pentium Pro and DEC Alpha analyzed the instructions they were running and the dependencies between them, and used this information to execute instructions out of order. They extracted parallelism between independent instructions, breaking free from the strictly serial order that the program code implies. These processors also performed speculative execution: an instruction depending on the result of another instruction can still be executed if the processor can make a good guess at what the result of the first instruction is. If the guess is right, the speculative calculation is used; if the guess is wrong, the processor undoes the speculation and retries the calculation with the correct value.
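The dependency analysis described above can be sketched in a few lines. This toy scheduler (the data format and function name are invented for illustration, not any real hardware interface) groups a serial instruction stream into "waves" of mutually independent operations that could, in principle, issue at the same time:

```python
# Sketch: extracting instruction-level parallelism from a serial program.
# Each instruction names its destination register and its source registers.
# Instructions in the same "wave" have no dependencies on each other, so an
# out-of-order core could execute them simultaneously.

def schedule_waves(program):
    """Group instructions into waves of mutually independent operations."""
    ready_at = {}          # register -> wave in which its value is produced
    waves = []
    for dest, srcs in program:
        # An instruction can issue one wave after its latest input is ready.
        wave = max((ready_at.get(s, 0) for s in srcs), default=0)
        if wave == len(waves):
            waves.append([])
        waves[wave].append(dest)
        ready_at[dest] = wave + 1
    return waves

# a = x + y; b = x * 2; c = a + b; d = y - 1
program = [("a", ["x", "y"]), ("b", ["x"]), ("c", ["a", "b"]), ("d", ["y"])]
print(schedule_waves(program))  # a, b, and d are independent; c must wait
```

Note that `d` appears later in program order than `c` but lands in the first wave: that is exactly the reordering an out-of-order core performs.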

The processor must still act "as if" it's running instructions serially, one by one, in the exact order that the program determines. Considerable processor resources are dedicated to handling this: first figuring out which instructions can be run in parallel and out of order, then putting everything back together again when updating system memory, to ensure the illusion of serial execution is preserved. Instead of putting all this complexity in the processor, Intel's idea for IA-64 was to put it into the compiler. Let the compiler identify which instructions can be run simultaneously, and let it tell the processor explicitly to run those independent instructions in parallel. With this approach, the processor's transistors could be used for things like cache and functional units—the first-generation IA-64 processors could run six instructions in parallel, and the current chips can run a whopping 12 instructions in parallel—instead of using those transistors for all the machinery to handle out-of-order, speculative execution.
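The compiler's side of that bargain can be sketched the same way. In this toy "compiler" (a deliberately simplified model: real IA-64 packs three instructions plus a template field into a 128-bit bundle, with rules this sketch ignores), independent operations are packed into fixed-width bundles ahead of time, and slots the compiler can't fill are padded with NOPs:

```python
# Toy illustration of EPIC-style static scheduling: the *compiler* packs
# independent operations into fixed-width bundles before the program runs,
# padding with NOPs when it cannot find enough parallel work.

BUNDLE_WIDTH = 3

def bundle(program):
    """Pack instructions into fixed-width bundles at 'compile time'."""
    bundles, current, written = [], [], set()
    for dest, srcs in program:
        # Start a new bundle if this op depends on a value produced in the
        # bundle being built, or if the current bundle is already full.
        if any(s in written for s in srcs) or len(current) == BUNDLE_WIDTH:
            bundles.append(current + ["nop"] * (BUNDLE_WIDTH - len(current)))
            current, written = [], set()
        current.append(dest)
        written.add(dest)
    if current:
        bundles.append(current + ["nop"] * (BUNDLE_WIDTH - len(current)))
    return bundles

# a = x + y; b = x * 2; c = a + b; d = y - 1
program = [("a", ["x", "y"]), ("b", ["x"]), ("c", ["a", "b"]), ("d", ["y"])]
for packed in bundle(program):
    print(packed)
```

The NOP slots are the telling detail: when the compiler can't prove independence, the bundle ships half-empty, wasting the parallel hardware.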

Theory meets reality

This was a nice idea, and indeed for some workloads—particularly heavy-duty floating point number crunching—Itanium chips performed decently. But for common integer workloads, Intel discovered a problem that compiler developers had been warning the company about all along: it's actually very hard to figure out all those dependencies and know which things can be done in parallel at compile time.

For example, loading a value from memory takes a varying amount of time. If the value is in the processor's cache, it can be very quick, fewer than 10 cycles. If it is in main memory, it may take a few hundred cycles to load. If it's been paged out to a hard disk, it could be billions of cycles before the value is actually available for the processor to use. An instruction that depends on that value might thus become ready for execution within a handful of nanoseconds, or a billion of them. When the processor is dynamically choosing which instructions to run and when, it can handle this kind of variation. But with EPIC, the scheduling of instructions is fixed and static. The processor has no way of carrying on with other work while waiting for a value to be fetched from memory, and it can't easily fetch values "early" so that they'll be available when they're actually needed.
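A back-of-the-envelope model makes the cost concrete. All the numbers below are illustrative, not measurements; the point is only the shape of the comparison between a fixed schedule and a dynamic one:

```python
# Toy model of why variable memory latency hurts a static schedule more
# than a dynamic one. A load may hit in cache (fast) or miss to DRAM
# (slow). An out-of-order core keeps executing independent work while the
# load is outstanding; a static schedule has already fixed the order and
# simply stalls.

CACHE_HIT, DRAM_MISS = 4, 300     # latencies in cycles (illustrative)
INDEPENDENT_WORK = 50             # cycles of work not needing the load

def static_schedule(load_latency):
    # The compiler scheduled the dependent op right after the load, so
    # everything behind it waits out the full latency before proceeding.
    return load_latency + INDEPENDENT_WORK

def dynamic_schedule(load_latency):
    # Out-of-order hardware overlaps the independent work with the load.
    return max(load_latency, INDEPENDENT_WORK)

for latency in (CACHE_HIT, DRAM_MISS):
    print(latency, static_schedule(latency), dynamic_schedule(latency))
```

On a cache hit the two schedules are close; on a miss, the static schedule eats the entire latency while the dynamic one hides most of the independent work behind it.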

This problem alone was likely insurmountable, at least for general-purpose computing. But Itanium then faced challenges even in those fields where it showed some strength. The initial Itanium hardware included hardware-based IA-32 compatibility, so it could run existing x86 software, but it was much slower than contemporaneous x86 processors. For companies wanting to transition their software from 32-bit to 64-bit, this wasn't very satisfactory. During the transition, the ability to run mixed workloads (some software 32-bit, some 64-bit) is valuable. IA-64 didn't really offer this transitional path; it could run 64-bit software at native speed but took a big hit for 32-bit software, and the x86 chips that were good at 32-bit software couldn't run IA-64 software at all.

Intel's competitor AMD also wanted to build 64-bit processors, but without the resources to come up with an all-new 64-bit architecture, AMD did something different. Its AMD64 architecture was developed as an extension to x86 that supported 64-bit computation. AMD didn't want to fundamentally change how processors and compilers worked; AMD64 processors continued to use the same out-of-order execution and complex hardware as was found in high-performance IA-32 chips (and which continues to be essential to high-performance processors to this day). Because AMD64 and IA-32 were so similar, the same hardware could be easily designed to handle both, and there was no performance hit to running 32-bit software on the 64-bit chips, so transitional, mixed workloads could run unhindered.

This made AMD64 much more appealing to developers and enterprises alike. Intel scrambled to create its own extension to IA-32, but Microsoft—which already supported IA-32, IA-64, and AMD64—told the company that it wasn't willing to support a second 64-bit extension to x86, leaving Intel with little choice but to adopt AMD64 itself. It duly did so (albeit with some incompatibilities), under the name Intel 64.

IA-64 left with no place to go

This squeezed Itanium out of most markets. AMD64 offered the transitional path from IA-32, so it won over the enterprise and swiftly moved down into the consumer space, too. Itanium still had a few tricks up its sleeve—Intel's most advanced reliability, availability, and serviceability (RAS) features made their debut on Itanium, so if you needed a system that could take serious problems like memory failures and processor failures in stride, Itanium was, for a time, the way to go. But for the most part, those features are now available in Xeon chips, eliminating even that advantage.

The proliferation of vector instruction sets—AMD64 made SSE2 mandatory, and Intel's AVX-512 adds substantial new capabilities—also means that it's still possible, in some ways, to explicitly instruct the processor to perform operations in parallel, albeit in a fashion that's much more constrained. Rather than bundles of different instructions all meant to be performed simultaneously, the vector instruction sets apply the same operation to multiple pieces of data simultaneously. This is not as rich and flexible as the EPIC idea, but it turns out to be good enough for many of those same number-crunching workloads that Itanium excelled at.
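The distinction the paragraph draws can be shown in miniature. A SIMD ("single instruction, multiple data") operation applies one operation across several data lanes, unlike an EPIC bundle, which packs several different operations. This pure-Python model is only an illustration; real hardware does the whole thing in a single instruction (SSE2's packed-add, for instance):

```python
# SIMD in miniature: one operation applied to several data lanes at once.
# The lanes of a vector register are modeled here with a plain list.

LANES = 4

def simd_add(a, b):
    """Add two 4-lane vectors element-wise, like a packed-add instruction."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

One operation, four results per step: narrower than EPIC's "any three independent operations," but far easier for a compiler (or a programmer) to reason about.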

Currently, the only vendor still selling Itanium machines is HPE (the enterprise company that came out of HP's 2015 split), in its Integrity Superdome line, which runs the HP-UX operating system. Superdome systems offer a particular emphasis on RAS, which once made Itanium a good fit, but now they can be equipped with Xeon chips. Those, rather than Itanium, have a long-term future. HPE will support the systems until at least 2025, but with manufacturing ending in 2021, the machines will be living on borrowed time.

156 Reader Comments

"This made AMD64 much more appealing to developers and enterprises alike. Intel scrambled to create its own extension to IA-32, but Microsoft—which already supported IA-32, IA-64, and AMD64—told the company that it wasn't willing to support a second 64-bit extension to x86, leaving Intel with little choice but to adopt AMD64 itself. It duly did so (albeit with some incompatibilities), under the name Intel64."

What are some of the incompatibilities between AMD64 and Intel64? This is the first I've heard of this, and I'm genuinely curious.

Also, not that Itanium was widespread in its adoption, but it's always sad to see yet another architecture go. We're entering a world where we're going to be an ARM/x86(64) bi-culture as MIPS slips, POWER disappears, and RISC-V hasn't materialized yet in a meaningful way.

Goodbye, Itanium. An interesting idea, executed poorly (in that Intel wanted to squeeze EVERY BIT of money out of the new hardware, putting it out of reach of almost everybody). I think it's hilarious that AMD literally got its name on the 64-bit instruction set because Intel was so short-sighted it didn't believe a transitional set was useful.

I remember the arguments in CPU & MoBo over Itanium. a few people slobbering all over themselves when Merced's SpecFP results were posted, and one or two in particular who insisted IA-64 was going to take over the entire world.

Turns out a general-purpose CPU has to be good at things other than stuff written in FORTRAN.

What is EPIC? You just kinda name-drop an acronym, and nowhere in the article is it defined.

Explicitly Parallel Instruction Computing. Taking a bunch of instructions, assembling them into an "instruction bundle," then executing them all at once. Unfortunately, because of the dependencies Peter describes, those bundles would contain a lot of "NOP" (no operation) instructions just to fill them.

I remember an HP rep in 2004 telling me it would be a mistake to choose IBM Power over Itanium because Intel was about to ramp up its investment in Itanium (although he couldn't put that in writing). It was already becoming clear at that time that Intel was not willing to invest enough to make Itanium viable.

Another HP rep around the same time sold another group in my company on the idea that Itanium was the perfect solution for a Windows SQL Server box running out of capacity, and gave us a box to test on. When we ran the tests, around 80% of the time Itanium would outperform x86 by 10%. The other 20% of the time, Itanium was 400% slower. HP blamed Microsoft and promised a fix would be out any day. It might have been MS's fault, but over the next year, the fix never came.

As recently as 2012, another HP rep tried to convince me that even though they were nowhere near as fast as x64, the reliability features more than made up for it.

As indicated in the article, the architecture probably had insurmountable flaws from day 1.

AMD64 is probably the best thing that happened to Intel. If they had continued to focus their 64 bit efforts on Itanium, IBM Power would probably be a much more common chip today.

I had a buddy who nearly got a job out of graduate school working for Intel early on developing the Itanium line, right before it got steamrolled by AMD64 and the Opteron. As I recall, he basically had the job, but it got held up by some technical issue that scuttled the whole thing.

To this day, he insists that he still regrets not landing the job, and I just as stridently insist that he dodged the mother of all career bullets. He's a sensible guy, so he obviously doesn't really think this, but I suspect some part of his animal brain does think he could have fixed it.

I remember the arguments in CPU & MoBo over Itanium. a few people slobbering all over themselves when Merced's SpecFP results were posted, and one or two in particular who insisted IA-64 was going to take over the entire world.

Turns out a general-purpose CPU has to be good at things other than stuff written in FORTRAN.

I think as late as 2011 or so if you said Itanium too many times in one thread, Paul Demone would reregister an account just to make sure everyone knew Itanium was going to eventually replace x86.

Itanium was really interesting to read up on after I took some microprocessor classes. Cool stuff, but its downfall seems like it's due to the same sneaky reality that many don't realize - great software is 10 times harder to do than hardware.

I took a CUDA computing course, and I know full well that compilation and parallelism are hard; you really need to customize code to the specific task you're doing. So looking back on Intel trying to push the complexity into the compiler, it's no surprise it didn't go well.

Whether or not it was a practical idea, Itanium was an interesting experiment that tested a genuinely new approach to executing instructions. Who - if anyone - has done anything bold for general computing since then?

"This made AMD64 much more appealing to developers and enterprises alike. Intel scrambled to create its own extension to IA-32, but Microsoft—which already supported IA-32, IA-64, and AMD64—told the company that it wasn't willing to support a second 64-bit extension to x86, leaving Intel with little choice but to adopt AMD64 itself. It duly did so (albeit with some incompatibilities), under the name Intel64."

What are some of the incompatibilities between AMD64 and Intel64? This is the first I've heard of this, and I'm genuinely curious.

There were some minor differences in the exact instructions supported in 64 bit mode between the AMD Hammer chips and the first Intel P4s with 64 bit support. They were corrected pretty quickly though and didn't affect much in practice.

That's mostly due to HP, as they have been more committed to Itanium than Intel has been for years. There was a point where Oracle announced they were ending support for Itanium due to low demand, and HP took them to court for $3 Billion (and won), and forced them to resume development and support for the platform.

What is EPIC? You just kinda name-drop an acronym, and nowhere in the article is it defined.

Explicitly Parallel Instruction Computing

That makes more sense than Electronic Privacy Information Center, or Elderly Pharmaceutical Insurance Coverage, or European Prospective Investigation Into Cancer and Nutrition (which would be EPICN, but that's being nitpicky), or Eclipse Plugin Central or, well, here's the list.

Explicitly Parallel Instruction Computing is actually #4 on it. Guess that particular instruction set needed to be run in parallel with the reading of the article. Hey! That means an Itanium processor could have handled it, right?

Guess the human brain is more like the AMD64 than the IA32 or IA64, huh?

I recall hearing a lot of hoopla over Itanium, but as it was a high-end offering and I was mostly dealing with home/SOHO-level equipment, I didn't pay a lot of attention to it. Then I heard the soft chirping of crickets until Xeon hit the scene, and I've seen AMD64 all over the place. Now I know why. Thanks for the history lesson I SHOULD have picked up as it was happening.

"This made AMD64 much more appealing to developers and enterprises alike. Intel scrambled to create its own extension to IA-32, but Microsoft—which already supported IA-32, IA-64, and AMD64—told the company that it wasn't willing to support a second 64-bit extension to x86, leaving Intel with little choice but to adopt AMD64 itself. It duly did so (albeit with some incompatibilities), under the name Intel64."

What are some of the incompatibilities between AMD64 and Intel64? This is the first I've heard of this, and I'm genuinely curious.

Also, not that Itanium was widespread in its adoption, but it's always sad to see yet another architecture go. We're entering a world where we're going to be an ARM/x86(64) bi-culture as MIPS slips, POWER disappears, and RISC-V hasn't materialized yet in a meaningful way.

EM64T’s BSF and BSR instructions act differently when the source is 0 and the operand size is 32 bits. The processor sets the zero flag and leaves the upper 32 bits of the destination undefined.

AMD64 supports 3DNow! instructions. This includes prefetch with the opcode 0x0F 0x0D and PREFETCHW, which are useful for hiding memory latency.

EM64T lacks the ability to save and restore a reduced (and thus faster) version of the floating-point state (involving the FXSAVE and FXRSTOR instructions).

EM64T lacks some model-specific registers that are considered architectural to AMD64. These include SYSCFG, TOP_MEM, and TOP_MEM2.

EM64T supports microcode update as in 32-bit mode, whereas AMD64 processors use a different microcode update format and control MSRs.

EM64T’s CPUID instruction is very vendor-specific, as is normal for x86-style processors.

EM64T supports the MONITOR and MWAIT instructions, used by operating systems to better deal with Hyper-threading.

AMD64 systems allow the use of the AGP aperture as an IO-MMU. Operating systems can take advantage of this to let normal PCI devices DMA to memory above 4 GiB. EM64T systems require the use of bounce buffers, which are slower.

SYSCALL and SYSRET are also only supported in IA-32e mode (not in compatibility mode) on EM64T. SYSENTER and SYSEXIT are supported in both modes.

Near branches with the 0x66 (operand size) prefix behave differently. One type of CPU clears only the top 32 bits, while the other type clears the top 48 bits.
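Differences like these are why truly low-level code sometimes branches on the CPU vendor string reported by CPUID ("AuthenticAMD" vs. "GenuineIntel"). Here's a sketch of that dispatch in plain Python; a real implementation would execute the CPUID instruction itself, and the function name is made up for illustration:

```python
# Sketch: choosing a system-call instruction based on CPU vendor. Per the
# differences listed above, early EM64T parts only support SYSCALL/SYSRET
# in 64-bit mode, so 32-bit compatibility-mode code there must fall back
# to SYSENTER, while AMD64 parts accept SYSCALL in both modes.

def compat_mode_syscall_insn(vendor: str) -> str:
    """Pick a system-call instruction usable in 32-bit compatibility mode."""
    return "syscall" if vendor == "AuthenticAMD" else "sysenter"

print(compat_mode_syscall_insn("GenuineIntel"))  # sysenter
```

Operating system kernels had to carry exactly this kind of vendor-conditional path during the early x86-64 years.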

In 2008, HP agreed to pay Intel $440 million over five years – between 2009 and 2014 – to keep producing Itanium chips. Of course, HP would also have to pay for the cost of the processors it ordered. Then in 2010, the two companies signed another $250 million deal that would keep Itanium on life support through 2017.

That's mostly due to HP, as they have been more committed to Itanium than Intel has been for years. There was a point where Oracle announced they were ending support for Itanium due to low demand, and HP took them to court for $3 Billion (and won), and forced them to resume development and support for the platform.

Well, they cast off HP/PA and Alpha in favor of it, so all their eggs were in one basket. IIRC reading (from I think DriverGuru in the forums) that IA-64 supported certain "privilege" levels or rings required by VMS which didn't exist on x86/amd64.

Whether or not it was a practical idea, Itanium was an interesting experiment that tested a genuinely new approach to executing instructions. Who - if anyone - has done anything bold for general computing since then?

I was going to contradict you here, but when I thought about it, I realized that all of my examples effectively predated Itanium.

You're right. The world has essentially become awash in cheap x86/AMD64-architected servers, ARM core-based boxes, and a handful of GPU-style compute designs used in some kind of daisy-chained system. (And the latter isn't really "general purpose.") There is some innovative custom silicon out there, but it's basically all special-purpose. Sort of sad, really.

I've got fond memories of working with Superdomes back in the early to mid 2000's.

64 Itaniums and 1TB of memory in a single system image, in a box like two refrigerators side by side. With something like 32 2Gb Fibre Channel links to a gaggle of EMC DMX3 disk arrays, it really ran the pants off Oracle.

At the time, I think the only other option around that could, maybe, handle the same concurrent Oracle load was an IBM mainframe, which we were trying to migrate away from at the time.

It was really fun working with those old beasts. Took a long time before collections of white-box hosts working together could achieve the same level of performance AND reliability.

Whether or not it was a practical idea, Itanium was an interesting experiment that tested a genuinely new approach to executing instructions. Who - if anyone - has done anything bold for general computing since then?

I wonder if HPE's "The Machine" is getting any closer to commercial use. That seems like their next big bet to keep the mainframe segment alive after Itanium winds down.

I feel old(er) now. I remember being in college when IA-64 was becoming a thing.

The path forward for computing was rather murky at the time. I remember hearing and being part of lots of passionate (and probably not entirely well-informed) arguments about CISC architectures running out of steam because Moore's Law was "hitting a wall," clock skew across ever-expanding die sizes was becoming "insurmountable" for the high gate counts CISC instructions required, and therefore super-fast, super-lean RISC was the future.

IA-64 seemed a lot more plausible back then. It was pretty clear that "do one thing, make it go faster" wasn't a long-term success strategy. Somehow, multi-threaded programming and parallelism were going to magically make everything okay.

This month's Communications of the ACM has a good article / Turing lecture by John L. Hennessy & David A. Patterson (the latest recipients of the Turing Award - for RISC). They describe the history of CPU architectures including the Itanium failure. They put it as "too difficult to write compilers to take full advantage of the architecture", but it is essentially the same reason.

Unlike some of the posts above, they think we're about to enter a golden age for computer architecture. Yes, there are lots of challenges due to the "laws" of Moore, Dennard, and Amdahl, and changing needs (e.g., power/temperature management in embedded systems), but these problems represent opportunities for innovation.

But for common integer workloads, Intel discovered a problem that compiler developers had been warning the company about all along: it's actually very hard to figure out all those dependencies and know which things can be done in parallel at compile time.

Yup, that's a lesson that is slowly being learned. It used to be thought that static analysis was the bee's knees, since you had all the time in the world (relatively speaking) for analysis and high-quality code generation. But dynamic optimization, such as that done by modern CPUs, as well as dynamic code modification like that in the HotSpot JVM, has less time for analysis yet can do things that a static compiler never could: not only speculative execution, but risky things like speculative inlining, which can require dynamically modifying the stack and/or running code when speculation fails. Static analysis simply can't do that.

Which strategy is better can vary depending on the situation, but for modern, object-oriented languages with dynamic dispatch and a trend towards highly factored code, the dynamic route wins more and more.
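The "risky" speculation described above can be sketched as an inline cache: a call site guesses that the receiver's type won't change, takes a fast path while the guess holds, and "deoptimizes" back to generic dispatch when it turns out wrong. (This is a toy model of the technique, with invented class and method names, not the HotSpot implementation.)

```python
# Sketch of speculative dispatch with deoptimization. A call site caches
# the last-seen receiver type; on a hit it takes the fast path, and on a
# miss it falls back to generic lookup and re-specializes.

class InlineCache:
    def __init__(self):
        self.expected_type = None   # the type we're speculating on
        self.fast_path = None       # cached method for that type
        self.deopts = 0             # how many times the guess failed

    def call(self, obj, method_name):
        if type(obj) is self.expected_type:
            return self.fast_path(obj)   # speculation holds: fast path
        if self.expected_type is not None:
            self.deopts += 1             # guess failed: deoptimize
        # Generic (slow) dispatch, then re-specialize on the new type.
        self.expected_type = type(obj)
        self.fast_path = getattr(type(obj), method_name)
        return self.fast_path(obj)

class Circle:
    def area(self): return 3

class Square:
    def area(self): return 4

ic = InlineCache()
print(ic.call(Circle(), "area"))  # slow path; cache now expects Circle
print(ic.call(Circle(), "area"))  # fast path hit
print(ic.call(Square(), "area"))  # miss: deoptimize, re-specialize
print(ic.deopts)                  # 1
```

A static compiler can't do this, because the cheap fast path only exists by betting on runtime behavior and keeping an escape hatch for when the bet loses.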
