In this blog, you can read about the 'aha' moments in my rendezvous with computers, computing, and electronics; HOWTOs and whys your textbook could not answer; and, once in a while, a strange mixture of global economics.

Wednesday, July 15, 2009

The argument between CISC architecture and RISC architecture is longstanding. For compiler designers, RISC adds a little burden, since the same C code can translate to several times more lines of RISC assembly than x86 assembly. But from a purely academic point of view, it is easy to see that RISC wins the argument because of several advantages. The RISC instruction set is very small, which makes it easy to optimize the hardware. Simple instructions that complete in a single clock cycle are a typical characteristic of RISC, and they permit aggressive pipelined parallelism. RISC designs also invest more area in registers (some, like SPARC, use a technique called register windowing for fast procedure calls), and the large register file eases out-of-order execution. OOO and pipelining are possible in CISC too, but they are clumsier.
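To make the load-store distinction concrete, here is a toy illustration (not real ISA syntax; the mnemonics and addresses are invented): a single CISC instruction that adds a register to a memory location, versus the load/add/store sequence a load-store RISC machine would need for the same effect.

```python
# Hypothetical CISC: one instruction reads memory, adds r1, writes back.
cisc_program = [
    ("add", "mem[0x100]", "r1"),
]

# Hypothetical load-store RISC: the same effect takes three instructions,
# because arithmetic only operates on registers.
risc_program = [
    ("load",  "r2", "mem[0x100]"),  # load the operand into a register
    ("add",   "r2", "r1"),          # register-to-register add
    ("store", "mem[0x100]", "r2"),  # write the result back to memory
]

print(len(cisc_program), len(risc_program))  # 1 3
```

Each RISC instruction is trivial to decode and pipeline; the single CISC instruction packs more work, but that work is harder to overlap.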

One reason that RISC could not win despite all these advantages is Intel. Microsoft too is one of the major reasons, because during the PC revolution Win 95 had no support for RISC processors. Intel, with its CISC-based x86 architecture, blocked all the avenues in general-purpose computing for RISC processors. RISC has a good presence in embedded processing, however, because of its low power, good real-time behavior, and small area.

Two years ago I tried to investigate why Intel did not change its x86 core to a RISC. The findings were astounding, but I did not have time then to write them down in a blog like this. Better late than never. After its success with CISC-based CPUs, Intel entered the RISC arena in the late 1980s with the introduction of the i960. The i960 architecture, however, mainly targeted the embedded systems domain and not the general-purpose computer, understandably, due to the lack of software support.

In the general computing domain, the Intel Pentium employed a superscalar design with two parallel pipelines for its IA-32 instructions. But the presence of variable-length instructions forced inherently sequential decoding, because each cycle first had to determine the length of the current instruction: a new instruction can begin anywhere within the bytes the processor fetches. As the world moved towards parallel programming, the only advantage CISC enjoyed was software support, and that advantage looked likely to fade.
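A small sketch of why variable-length decoding is sequential: the start of instruction N+1 is unknown until instruction N's length has been decoded. The opcodes and lengths below are made up for illustration, not real x86 encodings.

```python
# Hypothetical opcode -> total instruction length in bytes.
LENGTHS = {0xB8: 5, 0x01: 2, 0x90: 1}

def find_boundaries(code):
    """Walk the byte stream one instruction at a time; each step
    depends on the length decoded in the previous step."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += LENGTHS[code[pc]]   # must decode length before moving on
    return starts

stream = bytes([0xB8, 0, 0, 0, 0, 0x90, 0x01, 0, 0x90])
print(find_boundaries(stream))  # [0, 5, 6, 8]
```

With fixed-length instructions the boundaries would be known up front, so several decoders could work on consecutive instructions in parallel; here, the loop cannot be trivially parallelized.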

Sometimes, when you think you know where things are heading, there will be a groundbreaking invention that changes the entire scenario. One such seminal invention came in the form of the high performance substrate (HPS), introduced by the famous microarchitecture guru Yale Patt. Although I am tempted to explain HPS in detail, I would rather consider it out of the scope of this blog post. A very simple (not necessarily accurate) description would be that Patt succeeded in converting a CISC instruction into multiple RISC-like instructions, or micro-ops.

Intel demonstrated its fast fingers by implementing this in its P6 architecture. Like any successful, innovative company, Intel is good at adapting to a new wave. It did it by jumping from its memory business to microprocessors back in the eighties, and it did it again by using HPS. Intel's first IA-32-to-micro-op decoder featured in the Pentium Pro. The P6 architecture contained three parallel decoders to simultaneously decode CISC instructions into micro-ops, resulting in a deeply pipelined execution (see figure). This instruction-decoding hardware can become extremely complex, but as the feature size shrank at a very fast rate, Intel did not face any significant performance issue with this approach.
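The idea can be sketched as a toy decoder in the spirit of the P6 approach: each CISC-style instruction is cracked into RISC-like micro-ops that a pipelined, out-of-order back end can then execute. All mnemonics here are invented for illustration.

```python
def crack(instr):
    """Translate one CISC-style instruction into a list of micro-ops."""
    op, dst, src = instr
    if op == "add_mem":                       # CISC: mem += reg
        return [("uload",  "tmp", dst),       # micro-op 1: load operand
                ("uadd",   "tmp", src),       # micro-op 2: register add
                ("ustore", dst,   "tmp")]     # micro-op 3: store result
    return [("u" + op, dst, src)]             # simple ops map 1:1

program = [("add_mem", "0x100", "r1"),        # cracks into 3 micro-ops
           ("mov", "r2", "r3")]               # cracks into 1 micro-op

micro_ops = [u for instr in program for u in crack(instr)]
print(len(micro_ops))  # 4
```

The programmer-visible ISA stays CISC, while everything past the decoders sees only small, uniform micro-ops, which is exactly what makes deep pipelining workable.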

Now we are in the post-RISC era, where processors have the advantages of both RISC and CISC architecture. The gap between RISC and CISC has blurred significantly, thanks to the scale of integration possible today and the increased importance of parallelism. Trying to jot down the differences between the two is no longer relevant. Intel's Core 2 Duo processor can execute more than one CISC instruction per clock cycle thanks to its superscalar design, and this also allows CISC instructions to be pipelined. On the other hand, RISC instructions are becoming complex (CISC-like) to take advantage of increased processing speed, and RISC processors also use complicated hardware for superscalar execution. So at present, classifying a processor as RISC or CISC is almost impossible, because their instruction sets all look similar.

Intel stayed with CISC even when the whole world went towards RISC, and it enjoyed the advantage of software support. When the situation started favoring RISC with the advent of parallel processing, Intel used micro-op converters to exploit the pipelining advantages of RISC. Current Intel processors have a highly advanced micro-op generator and intricate hardware to execute complex instructions in a single cycle – a powerful CISC-RISC combination.

Just started computer science? You may need to complete your microprocessor or computer architecture course to understand this. I did not want to dig further, because this particular post already covers a wide range of architectural paradigms and I don't have the space or time to go into more detail.

Great analysis! Modern-day Intel CPUs are actually running a RISC processor inside. They have two portions, the front end and the back end. The front end takes the CISC instructions and converts them into a series of RISC-like instructions; the back end performs out-of-order execution of those instructions. The front end and back end are not necessarily running at the same clock speed! This less-efficient approach is a main reason Intel has such a hard time getting Atom processors to compete with high-performance ARM cores. ARM doesn't have this horrible front-end issue to deal with – the decode front end is a terrible energy hog.