Question

In this article John Dvorak calls Itanium "one of the great fiascos of the last 50 years". While he describes the over-optimistic market expectations and the dramatic financial outcome of the idea, he doesn't go into the technical details of this epic fail. I had the chance to work with Itanium for a while, and I personally loved its architecture: it was so clear, simple, and straightforward compared to modern x86 processor architecture...

So what are/were the technical reasons for its failure? Under-performance? Incompatibility with x86 code? Compiler complexity? Why did this "Itanic" sink?

[Image: Itanium processor block]


Solution

Itanium failed because VLIW for today's workloads is simply an awful idea.

Donald Knuth, a widely respected computer scientist, said in a 2008 interview that "the 'Itanium' approach [was] supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write."1

That pretty much nails the problem.

For scientific computation, where you get at least a few dozen instructions per basic block, VLIW probably works fine. There are enough instructions there to create good bundles. For more general-purpose workloads, where you often get only about 6-7 instructions per basic block (that's the average for SPEC2000, IIRC), it simply doesn't. The compiler can't find enough independent instructions to put in the bundles.
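To make the bundle problem concrete, here is a hypothetical sketch (not a real compiler) of greedy static packing of one basic block into 3-wide bundles: an instruction may only be placed once everything it depends on sits in an earlier bundle. A dependent chain wastes most slots; only fully independent instructions fill the machine.

```python
# Hypothetical sketch: greedily pack a basic block into 3-wide VLIW
# bundles. Each instruction is (name, set_of_dependency_names); it may
# only be bundled after all of its dependencies issued in an earlier
# bundle. Not real Itanium encoding -- just the scheduling constraint.

def pack_bundles(block, width=3):
    """Pack a basic block (list of (name, deps)) into bundles; return them."""
    done, bundles = set(), []
    remaining = list(block)
    while remaining:
        bundle = []
        for name, deps in remaining:
            if len(bundle) < width and deps <= done:
                bundle.append(name)
        if not bundle:  # guard against a malformed (cyclic) block
            raise ValueError("cyclic dependencies")
        remaining = [(n, d) for n, d in remaining if n not in bundle]
        done |= set(bundle)
        bundles.append(bundle)
    return bundles

# A dependent chain, typical of pointer-heavy general-purpose code:
# one instruction per bundle, two of three slots wasted every time.
chain = [("i1", set()), ("i2", {"i1"}), ("i3", {"i2"}),
         ("i4", {"i3"}), ("i5", {"i4"}), ("i6", {"i5"})]

# Six independent instructions, the numeric-kernel case: full bundles.
independent = [(f"i{k}", set()) for k in range(6)]

print(len(pack_bundles(chain)))        # 6 bundles for 6 instructions
print(len(pack_bundles(independent)))  # 2 full bundles
```

With ~6-7 instructions per basic block and realistic dependency chains, the static packer simply has nothing to fill the slots with.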

Modern x86 processors, with the exception of Intel Atom (pre Silvermont) and I believe AMD E-3**/4**, are all out-of-order processors. They maintain a dynamic instruction window of roughly 100 instructions, and within that window they execute instructions whenever their inputs become ready. If multiple instructions are ready to go and they don't compete for resources, they go together in the same cycle.

So how is this different from VLIW? The first key difference between VLIW and out-of-order is that the out-of-order processor can choose instructions from different basic blocks to execute at the same time. Those instructions are executed speculatively anyway (based on branch prediction, primarily). The second key difference is that out-of-order processors determine these schedules dynamically (i.e., each dynamic instruction is scheduled independently; the VLIW compiler operates on static instructions).
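The dynamic-scheduling point can be sketched the same way (again a toy model, not real hardware): each cycle, any instruction whose inputs are ready may issue, up to the machine width, regardless of which basic block it came from.

```python
# Toy model of dynamic (out-of-order) issue: up to `width` instructions
# whose inputs are ready issue each cycle, from anywhere in the window.
# Instructions are (name, set_of_dependency_names), as before.

def ooo_cycles(instrs, width=4):
    """Return how many cycles a width-wide OoO core needs to issue instrs."""
    done, cycles = set(), 0
    pending = list(instrs)
    while pending:
        ready = [n for n, d in pending if d <= done][:width]
        if not ready:  # guard against a malformed (cyclic) program
            raise ValueError("cyclic dependencies")
        pending = [(n, d) for n, d in pending if n not in ready]
        done |= set(ready)
        cycles += 1
    return cycles

# Two dependent chains from two basic blocks, executed speculatively
# past a predicted branch: the OoO core interleaves them side by side.
two_chains = ([(f"a{k}", {f"a{k-1}"} if k else set()) for k in range(4)] +
              [(f"b{k}", {f"b{k-1}"} if k else set()) for k in range(4)])
print(ooo_cycles(two_chains))  # 4 cycles for 8 instructions
```

A static VLIW compiler would have to prove at compile time that mixing those two blocks is safe; the out-of-order core just does it, cycle by cycle, under branch prediction.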

The third key difference is that implementations of out-of-order processors can be as wide as wanted, without changing the instruction set (Intel Core has 5 execution ports, other processors have 4, etc). VLIW machines can and do execute multiple bundles at once (if they don't conflict). For example, early Itanium CPUs execute up to 2 VLIW bundles per clock cycle, 6 instructions, with later designs (2011's Poulson and later) running up to 4 bundles = 12 instructions per clock, with SMT to take those instructions from multiple threads. In that respect, real Itanium hardware is like a traditional in-order superscalar design (like P5 Pentium or Atom), but with more / better ways for the compiler to expose instruction-level parallelism to the hardware (in theory, if it can find enough, which is the problem).
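For reference, the fixed bundle format is where "6 instructions per clock from 2 bundles" comes from: each IA-64 bundle is 128 bits, a 5-bit template (which unit types the slots map to, and where stops fall) plus three 41-bit instruction slots. A minimal decoding sketch (field values below are made up for illustration):

```python
# Sketch of the IA-64 bundle layout: 128 bits = 5-bit template plus
# three 41-bit instruction slots. The sample bundle contents here are
# arbitrary; only the field widths and positions reflect the format.

TEMPLATE_BITS, SLOT_BITS, SLOTS = 5, 41, 3
assert TEMPLATE_BITS + SLOTS * SLOT_BITS == 128

def decode_bundle(bundle):
    """Split a 128-bit integer into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
             for i in range(SLOTS)]
    return template, slots

# Round-trip a made-up bundle.
tmpl, slots = 0b01101, [1, 2, 3]
bundle = tmpl
for i, s in enumerate(slots):
    bundle |= s << (TEMPLATE_BITS + i * SLOT_BITS)
print(decode_bundle(bundle))  # (13, [1, 2, 3])
```

Three slots per bundle, two bundles per clock on early Itanium, gives the 6-wide issue figure quoted above.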

Performance-wise, with similar specs (caches, cores, etc.), modern x86 processors just beat the crap out of Itanium.

So why would one buy an Itanium now? Well, the only reason really is HP-UX. If you want to run HP-UX, that's the way to do it...

Many compiler writers don't see it this way - they always liked the fact that Itanium gives them more to do and puts them back in control, etc. But they won't admit how miserably it failed.


Footnote 1:

This was part of a response about the value of multi-core processors. Knuth was saying parallel processing is hard to take advantage of; finding and exposing fine-grained instruction-level parallelism (and explicit speculation: EPIC) at compile time for a VLIW is also a hard problem, and somewhat related to finding coarse-grained parallelism to split a sequential program or function into multiple threads to automatically take advantage of multiple cores.

11 years later he's still basically right: per-thread performance is still very important for most non-server software, and something that CPU vendors focus on because many cores is no substitute.

OTHER TIPS

Simple. It wasn't x86 compatible. That's why x86_64 chips are.

Itanium's design rested on the philosophy of very wide instruction-level parallelism to scale processor performance when clock frequency is limited by thermal constraints.

But AMD's Opteron disrupted Itanium adoption by proliferating x86_64 cores to achieve scalable performance while remaining compatible with 32-bit x86 binaries.

Itanium servers are around 10x as expensive as x86 servers with a similar processor count.

All of these factors slowed adoption of Itanium servers in the mainstream market. Itanium's main market now is mission-critical enterprise computing, a good $10B+/year market dominated only by HP, IBM, and Sun.

It was very hard to write code generators for, and it didn't have many reasons to succeed in the first place (it was made by Intel - so what?).

I've heard some JITs gave worse performance than interpreters on Itanium because gcc optimized the interpreter better; that's a no-go if a processor requires that level of optimization.

Non-mainstream RISCs were losing ground; they didn't see that, or hoped Itanium would become mainstream; too bad it didn't, because there weren't any reasons for it to.

I read that article, and I'm completely missing the "fiasco" he refers to. As he mentions near the end, at the mere sight of Itanium, "one promising project after another was dropped". MIPS, Alpha, PA-RISC -- gone. Sun cancelled its last two big SPARC projects, though SPARC wasn't exactly a big seller even before those. PowerPC is only surviving in the embedded space.

How is Intel killing off all the competition, using a single product line, anything but the greatest microprocessor victory of all time? I'm sure they weren't smart enough to have anticipated this, but even if they knew it would fail, throwing a few $billion at a feint worked wonderfully. Apparently they could afford it, and everybody else just dropped dead.

On the desktop, in the server room, and even in supercomputers (87% of the top-500 list), it's x86-compatible as far as the eye can see. If that's the result of an Intel "fiasco", then what words are left for the processors that didn't make it?

Intel and Itanium, in my book, ranks up there with Microsoft and MS-DOS: despite how lousy it may have been technically, it enabled them to utterly dominate the industry.

EDIT: And Itanium had x86 compatibility from day 1, so that's not it. It was slow, but it was there.

I think Itanium still has its market - high-end systems and HP blade servers. Performance is still much higher compared to x86. I'm not sure why someone would call it a failure when it is generating billions of dollars for HP (although it is not just the processor; it is Itanium server sales that generate the revenue).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow