In technology there is a relentless drive for Faster, Better, Cheaper. Of the three, we usually only get to pick two. Then Intel revealed the Itanium and told the world to pick NONE. That comes down to a deep, fundamental flaw at the heart of the architecture, but most people don't fully understand what it is, why it matters, and how Intel managed to get it so, so wrong.
Today on Education Yourself, we will be looking at the mechanism that makes processors as fast as they are, and how Itanium messed it up so badly that it is now the laughing stock of CPU design.
EDUCATION
YOURSELF
Part 1: Doing More At Once
For the longest time, the typical CPU workflow was to first fetch an instruction and its operands, then process them. Done that way, the part of the CPU that does the fetching sits around doing nothing while the current instruction is being executed, and the rest of the CPU twiddles its thumbs while the next instruction is getting fetched. Seems rather wasteful, doesn't it? Well, there is a way to solve this: fetch the next instruction and its operands while the current instruction is being worked on. This very crude and simplified example is what is known as "pipelining". It eliminates the waiting, meaning more instructions get processed in the same amount of time, leading to a faster CPU.
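To put some rough numbers on that, here's a tiny back-of-the-envelope sketch in C. The one-cycle stage costs are made up purely for illustration; real pipelines have many more stages and far messier timing.

```c
#include <stdio.h>

/* Toy model of a two-stage CPU: every instruction takes 1 cycle to fetch
   and 1 cycle to execute. The stage costs are invented for illustration. */
int main(void) {
    const long n = 1000;                /* number of instructions          */
    const long fetch = 1, execute = 1;  /* cycles per stage (hypothetical) */

    /* No pipelining: fetch, then execute, one instruction at a time. */
    long serial = n * (fetch + execute);

    /* Two-stage pipeline: while instruction i executes, instruction i+1
       is being fetched, so after the very first fetch the stages overlap. */
    long pipelined = fetch + n * execute;

    printf("serial:    %ld cycles\n", serial);    /* 2000 */
    printf("pipelined: %ld cycles\n", pipelined); /* 1001 */
    return 0;
}
```

Nearly double the throughput without making any single instruction faster, which is exactly why every serious CPU does this.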
But there is one issue with this. There is a kind of instruction called a "branch" instruction, and it is what makes a computer a computer and not just a really big calculator. Branching allows the direction of program flow to change based on the results of previous instructions. For example, a program could do some math on some input numbers and output the result, and if the result is 69, it would also print out "Very Nice". In assembly, this would be done by comparing the result with 69, and if they are NOT equal, branching around the instructions that print "Very Nice". It is branch instructions like this that let software adapt to the inputs it is given and perform all the critical tasks it does today.
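In C, that check is just an ordinary if statement; it's the compiler that turns it into the compare-and-branch sequence described above. A minimal sketch (the comments show roughly the x86-style instructions a compiler might emit, not its exact output):

```c
#include <stdio.h>

int main(void) {
    int a = 34, b = 35;
    int result = a + b;        /* do some math on the input numbers */

    printf("%d\n", result);    /* always print the result */

    /* A compiler turns this into something like:
         cmp  result, 69   ; compare the result with 69
         jne  skip         ; if NOT equal, branch around the print
         ...print "Very Nice"...
       skip:                                                        */
    if (result == 69) {
        printf("Very Nice\n");
    }
    return 0;
}
```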
So how does the CPU handle pre-fetching the next instruction if the current instruction is a branch? The correct path isn't known yet... but it can only be one of two possible choices. So the CPU makes an educated guess as to which instruction most likely comes next, and pre-fetches that. This is known as "branch prediction". If it picks the correct path, everything continues along at normal speed. If it picks the wrong path, the CPU has to flush the pipeline of its contents (because they are incorrect) and restart from where it was actually supposed to go. This only slows things down for a moment, but if it happens a lot in succession, the user will notice the CPU running slower than normal. The deeper the pipeline (that is, the more instructions in flight at once), the more work gets thrown away and the bigger the slowdown.
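You can actually watch branch prediction win and lose from plain C. This sketch sums the same array twice: once with the values in random order (the branch is basically a coin flip, so the predictor is wrong about half the time) and once sorted (the branch goes the same way for long stretches, so the predictor nails it). The array size and threshold are arbitrary; compile with low optimization (e.g. -O1), since a clever compiler may replace the branch with a conditional move and hide the effect.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000L

/* The if() in this loop is the branch the predictor has to guess. */
static long sum_big(const int *a, long n) {
    long sum = 0;
    for (long i = 0; i < n; i++)
        if (a[i] >= 128)
            sum += a[i];
    return sum;
}

static int cmp_int(const void *x, const void *y) {
    return *(const int *)x - *(const int *)y;
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    if (!data) return 1;

    srand(42);
    for (long i = 0; i < N; i++) data[i] = rand() % 256;

    clock_t t0 = clock();
    long s = sum_big(data, N);
    printf("random: sum=%ld time=%.3fs\n", s, (double)(clock() - t0) / CLOCKS_PER_SEC);

    qsort(data, N, sizeof *data, cmp_int);   /* same values, now sorted */

    t0 = clock();
    s = sum_big(data, N);
    printf("sorted: sum=%ld time=%.3fs\n", s, (double)(clock() - t0) / CLOCKS_PER_SEC);

    free(data);
    return 0;
}
```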
Now if you're asking yourself "But Techo, couldn't this pre-fetching system be abused to make the CPU read data it shouldn't?", then CONGRATULATIONS you've just rediscovered the basis of Meltdown and Spectre, which abuse exactly this kind of speculative execution to make a CPU blurt out things it shouldn't, like encryption keys and your banking password. Newer CPUs were designed smarter to (hopefully!) not allow these kinds of exploits, while existing CPUs instead rely on mitigation patches delivered through OS updates and motherboard firmware updates. However, these fixes tend to solve these very serious security issues by disabling parts of branch prediction and speculation, making CPUs run a bit slower. As a result, people actively try to bypass and remove these mitigations in the name of MOAR SPEED. I can't make this shit up. People - namely hardcore Linux users - intentionally make their computers less secure to make them faster.
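If you're on Linux and curious which of these mitigations your own kernel has applied, it reports them under /sys/devices/system/cpu/vulnerabilities/. Here's a minimal sketch that just reads and prints a few of those entries; the exact set of files varies by kernel version, and the path is Linux-only.

```c
#include <stdio.h>

/* Print the kernel's reported mitigation status for a few well-known
   CPU vulnerabilities. Linux-specific; entries vary by kernel version. */
int main(void) {
    const char *vulns[] = { "meltdown", "spectre_v1", "spectre_v2" };
    char path[128], line[256];

    for (int i = 0; i < 3; i++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/vulnerabilities/%s", vulns[i]);
        FILE *f = fopen(path, "r");
        if (!f) continue;                /* entry not present on this kernel */
        if (fgets(line, sizeof line, f))
            printf("%-10s: %s", vulns[i], line);   /* e.g. "Mitigation: ..." */
        fclose(f);
    }
    return 0;
}
```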
So, how does this relate to Itanium?
Part 2: Attempting To Replace The x86
In 1989, at Hewlett Packard, engineers were working on a new processor to replace their existing PA-RISC line. The goal was to have a CPU that could execute multiple instructions in parallel to get insane performance. This type of architecture is called Explicitly Parallel Instruction Computing, or EPIC for short. With software properly written to take advantage of parallelism, this type of architecture could outperform RISC. Meanwhile, Intel had launched its i860 RISC processor line, which did very meh in the market. In 1993, HP approached Intel and offered a collaborative partnership to turn their EPIC design, then called PA-WideWord, into a market-dominating beast. Intel was so impressed, thinking it could kill both x86 AND PowerPC, that it ultimately canned what would have been its next x86 CPU, P7, and focused on adopting HP's designs instead.
When it came time to finally start making the chip, they ran into lots of issues, like the processor just being too damn huge. It was originally going to have an x86 core for backwards compatibility, but that was removed entirely just to make things fit. They also had to nerf various subsystems and shrink the on-chip cache, and then wait for a newer lithographic process node to get the die down to a reasonable size. On top of that, the chip's speed turned out to be very fragile, with even the slightest tweak messing up the timing. It took until July of 1999 for the processor to be taped out, and actual complete chips were finally made in August of that year.
And yet, Intel told people to... not use this chip? HP was working on a major overhaul called "McKinley" that would perform better, and everyone was told to wait for that for actual production use, and to only use the first-generation chips for development, testing, and debugging. This created the "wait for McKinley" narrative, and meant that production-worthy Itanium systems wouldn't exist until 2001 at the earliest. Well, 2001 was when the first-generation chips from Intel were released, and the "McKinley" chips didn't launch until mid-2002. It wasn't long before AMD dethroned it in 2003 with the Opteron and their own take on a 64-bit version of x86, which Intel... quickly copied for the next generation of Xeon processors, launched in 2004.
HP continued to feed money to Intel to try to keep Itanium alive with newer revisions built on upgraded process nodes and multi-core designs, but by 2017 both HP and Intel had finally given up and retired the thing.
Part 3: Itanium's Flaws
So, this processor was plagued with a nightmarish design phase, a delayed launch, a show of zero confidence from half the people responsible for it, and it was quickly sidelined by AMD making x86-but-64-bit. But there is more! Those "McKinley" Itaniums had a design flaw that made internal circuit paths unstable and could cause system crashes, with the fix being to underclock the CPU. The architecture itself is crazy complicated, yet suffers from one key weakness. Remember earlier how we discussed branch prediction? Well, the Itanium's solution is to... make it the compiler's problem. Instead of the hardware guessing which way a branch goes, the compiler is expected to schedule the instructions ahead of time and, where possible, get rid of branches entirely.
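Here's a rough C-level sketch of the main trick, known as predication or "if-conversion". Real Itanium code does this with predicate registers in assembly rather than arithmetic; the multiply trick below is only there to show the idea of producing a result with no branch left to predict.

```c
/* Branchy version: the hardware has to predict which way the if() goes. */
int clamp_branchy(int x, int limit) {
    if (x > limit)
        x = limit;
    return x;
}

/* Branch-free version: evaluate the condition into a flag and use it to
   select a result. On Itanium the compiler is expected to do this kind of
   if-conversion itself, turning the comparison into a predicate register
   and tagging instructions with it, so there is no branch to predict. */
int clamp_predicated(int x, int limit) {
    int over = (x > limit);               /* the "predicate": 0 or 1     */
    return over * limit + (1 - over) * x; /* pick one result, no branch  */
}
```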
Gee. I sure hope the compiler isn't complete dogshit!
Part 4: The Compiler Is Complete Dogshit
Oh sweet Jebus, this is a disasterpiece. The official compiler for Itanium is the sorriest excuse for a compiler ever made. It is bloated, slow, and complicated, and it generates large, poorly optimized binaries that don't perform much better than the same code compiled for x86! In fact, back with the i860, that processor had special capabilities for parallelism and floating-point handling, and the compiler didn't even USE them. And they did it again here with Itanium! Instructions are not properly bundled to take advantage of the parallelism of the architecture, and special features are just flat-out ignored. Other compilers did come out later, but they didn't stick around for long, as people began backing away from the platform.
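To make "properly bundled" a bit more concrete, here's a hypothetical sketch of the kind of code an EPIC compiler is supposed to exploit. The function names are made up; the point is the difference between operations that must wait on each other and operations that could be packed into one wide bundle and issued together.

```c
/* Dependent chain: each line needs the previous result, so no compiler
   can issue these at the same time, no matter how wide the hardware is. */
long dependent(long a) {
    long b = a + 1;
    long c = b * 3;
    long d = c - 7;
    return d;
}

/* Independent work: these three additions share nothing, so a good EPIC
   compiler could pack them into a single Itanium bundle and all three
   would execute at once. A compiler that fails to spot this issues them
   one at a time and leaves the wide hardware sitting idle, which is
   roughly what the early Itanium compilers kept doing. */
long independent(long a, long b, long c) {
    long x = a + 1;
    long y = b + 2;
    long z = c + 3;
    return x + y + z;
}
```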
Part 5: Closing Thoughts
Wow. This was a CPU that had a lot of promise, but it instead suffered numerous technical issues and massive delays, was shunned by half of its own family, and was never able to properly take advantage of everything it could do. This is just sad.
NEXT TIME ON EDUCATION YOURSELF:
Let's look at an older failure of a computer! And this time there's no security angle to it... I hope!
