RISC vs CISC: What's the Difference?

A new study comparing Intel's x86, ARM, and MIPS CPUs finds that microarchitecture matters more than instruction set architecture, RISC or CISC.

If you are one of the few hardware or software developers out there who still think that instruction set architectures, reduced (RISC) or complex (CISC), have any significant effect on the power, energy or performance of your processor-based designs, forget it.

Ain't true. What is more important is the processor microarchitecture — the way those instructions are hardwired into the processor and what has been added to help them achieve a specific goal.

Sankaralingam, associate professor of Computer Sciences and Electrical and Computer Engineering at the University of Wisconsin-Madison, said the study is the most comprehensive analysis to date of all aspects of the design and implementation of three major architectures: Intel's x86, the ARM architecture, and Imagination Technologies' MIPS CPU.

"While there may have been differences in the past between RISC and CISC ISAs, in current architectures there certainly aren't any now in terms of the parameters we focused on: performance, power, and energy," Sankaralingam told EE Times. "Where the ISA is lacking, the microarchitecture is enhanced to make up for it, and vice versa."

He said that there is only one true RISC architecture out there, Imagination's MIPS, which is based on the architecture developed by researchers at Stanford University. The x86 was originally a pure CISC design but over the years has taken on a much more RISC-like structure, while ARM's nominally RISC architecture has taken on more CISC features, including the addition of the Thumb and Thumb-2 ISAs.

"So basically what it comes down to is comparing today's implementations of the processors from Intel, ARM, and Imagination in today's market environment. And by almost every measure we used, even there the ISA is irrelevant."

Because previous studies were handicapped by comparisons between systems with varying hardware and software resources, Sankaralingam said the VRG team worked hard to make sure measurements were made on roughly equivalent platforms and in roughly comparable environments. To separate out implementation and ISA effects, where possible they used multiple chips for each ISA with similar microarchitectures.

This study confined its comparisons to implementations of the Cortex-A8 or higher, with little focus on any of the Cortex-M devices. "The reason for this is simple: one of our goals was to have platforms we could compare and quantify," he said. "There is no way to do that below the A9, in terms of competitive architectures." In the Cortex-M0 environment, where ARM is competing with 8-bit MCUs over the 1-20 MHz and 2-50 mW range, the operating overheads of the x86 instruction set make that untenable.

The team's evaluations were performed on one MIPS implementation (Loongson), three ARM platforms (Cortex-A8, Cortex-A9, and Cortex-A15), and three x86-based designs (Atom, Bobcat, and Sandy Bridge i7). They also used the same operating system, Linux 2.6 LTS (but 2.8 on the A5), and the same gcc 4.4-based cross-compiler front end. For mobile client workloads, they used the CoreMark and WebKit benchmarks. For desktop apps, SPEC CPU2006 was used, while server workloads included lighttpd and CLucene.

The implementations span diverse ISAs and, within each ISA, diverse microarchitectures. "Overall, our choice of platforms provides a reasonably equal footing, and we performed detailed analysis to isolate microarchitecture and technology effects," said Sankaralingam. The VRG team did performance comparisons of the processors in terms of execution time, cycle count, instruction count, instruction format and mix, microarchitecture, and ISA influence on microarchitecture. Power and energy analysis measurements were also extensive: average power, average technology-independent power, and average energy, among others.
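
The metrics listed above are related by the classic "iron law" of processor performance, which is what lets a study like this separate ISA effects (dynamic instruction count) from microarchitecture effects (cycles per instruction) and process technology effects (cycle time). In standard notation, not necessarily the paper's:

```latex
T_{\text{exec}} = N_{\text{inst}} \times \text{CPI} \times t_{\text{cycle}},
\qquad
E = P_{\text{avg}} \times T_{\text{exec}}
```

where $N_{\text{inst}}$ is the dynamic instruction count, CPI is the average cycles per instruction, and $t_{\text{cycle}}$ is the clock period.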

According to Sankaralingam, the takeaway of this report is that although ISA is relevant to power and performance by virtue of support for various specializations (virtualization, accelerators, floating point arithmetic, etc.), whether the ISA is RISC or CISC is largely irrelevant for today's mature microprocessor design world.

"Based on this study, developers can safely consider ARM, MIPS, and x86 processors simply as engineering design points optimized for different levels of performance," he said. “There's nothing fundamentally more energy-efficient in one ISA class versus another."

In the concluding paragraph of the report, the authors write: "It appears that decades of hardware and compiler research has enabled efficient handling of both RISC and CISC ISAs, and both are equally positioned for the coming years of energy-constrained innovation."

While Sankaralingam thinks the VRG team developed an exhaustive and rigorous methodology for its study, he said "there are many ways to analyze the data." So for those interested in doing their own analysis, the raw data for the study can be downloaded from the University of Wisconsin VRG web page.

You can also download "ISA Wars: Understanding the Relevance of ISA being CISC or RISC" at the Association for Computing Machinery web site. Even if you are not an ACM member and so have to pay a $15 fee to download the paper, it is worth the one-time price.


Even though the researchers went to significant effort to limit confounding factors in that previous paper, one is inherently limited by the availability of hardware and software. For example, it is known that gcc has been improving its ARM support, so even using the same compiler version will not result in a perfectly fair comparison. The use of 32-bit x86 noticeably reduces performance compared to using 32-bit pointers with x86-64 (the extra registers have a significant impact on performance, perhaps around 10%).

Aside from the process technology differences and higher-level core microarchitectural differences, there are also differences in non-core microarchitecture and in low-level optimization. ARM cores often pay an abstraction penalty both in the use of simple synthesis and in a more generic interface to the rest of the chip. The measurement of system power further complicates comparisons both from the variability of implementation quality and the diminished impact of the core. (The latter is a fact which inherently, and rightly, diminishes the effect of ISA.) The fact that ISA influences microarchitecture increases the difficulty of comparison.

While the conclusion that process technology and microarchitecture are the dominant factors in higher performance designs is firm, the implication some may take that ISA doesn't matter at all is problematic.

Such studies also do not fully address the effect of ISA on performance or energy efficiency since they are limited to specific architectures. What information should be communicated to the hardware and how to encode that information are still, I believe, open questions, in part because the answers vary with design targets (including time to market). For some targets a better answer is VLIW, not RISC or CISC.

The new paper presumably has significant refinements, but the measurement difficulties are likely to be persistent. In some sense, these questions do not matter since end-users choose existing hardware and most hardware developers either choose existing designs or work for a company with existing ISA commitments.

fab76: I agree that functional verification is hugely expensive. In my experience across many different large ASICs, it pretty much matches the design cost: about a 1:1 ratio of design-to-verification engineers, with the division of labor drawn at the design/verification interface. This seems to apply whether it's CISC, RISC, a router, etc.

All that said, these are NREs. In a large production run, the cost per chip is still less than the package cost and, if managed well, the effort can be amortized over several projects.

For x86, validation is largely a sunk cost (for existing x86 implementers). In addition, one can design a much cleaner CISC (variable length instructions, 16 GPRs, memory+op instructions, memcopy, et al.) than x86. ISAs tend to accumulate cruft, so it should not be surprising that x86 is on the crufty side. (I will not claim that the ISA was well managed, but some degree of cruft would have been difficult to avoid given the different tradeoffs in different markets with different manufacturing technologies.)

In addition, once one gets to the high end of performance, the complexity is substantially contained within the front end, and other complexity factors tend to dominate. It should also be recognized that only certain features need to be made performant, which allows significant simplification of the design. This is a similar case of previously paid costs: the techniques to make a high-performance x86 had significant research and development costs but can be reused without additional cost.

x86 is not a great example of a modern CISC (Renesas RX is much cleaner, though the design is more focused on code density for microcontrollers), but the costs for Intel et al. are not that extreme.

I think the main takeaway was actually stated in the article: the ISA *does* matter to performance, but other aspects of the design matter more, so you can compensate for a poorer ISA choice and end up with a decently performing product. Comparisons were in the upper echelon of performance/power, while area and design cost were not factors in the study.

The primary way I see the article as potentially misleading: the study found that chips from both CISC and RISC vendors had comparable total performance, so you are safe using either. It did not determine the performance impact of the ISA itself (other than on CPI), so it's hard to say which will scale further in the future.

It's like the Porsche analogy: Porsches are one of the best handling cars in the world, despite the common wisdom that you shouldn't put the engine in the rear. The fact that Porsche is a master of engineering and compensated for this doesn't mean we should all go out and make rear engine sports cars. :) But it also means that you shouldn't avoid buying a Porsche because it has uneven weight distribution, either. Cost would be a bigger reason. :)

"There is no way to do that below the A9, in terms of competitive architectures." In the Cortex-M0 environment, where ARM is competing with 8-bit MCUs over the 1-20 MHz and 2-50 mW range, the operating overheads of the x86 instruction set make that untenable.

So the study defines "comparable" as "undifferentiated with respect to certain parameters" and then concludes that the subjects are, well, undifferentiated with respect to those parameters?

Tell me: am I correct in interpreting this as saying that in order to make an apples-to-apples comparison, they left out low-power/mobile because ... what? x86 isn't a "competitive architecture" there?

I never understood the RISC zealots. For all their fervor, you would have expected 'RISC' ISAs to have had a significant commercial advantage, but of course, they didn't. I never really understood how the PowerPC's rich and expressive assembly language was RISC, and ARM with predicated instructions, then Thumb and NEON was RISC, but the original 8086 was CISC. Doesn't a _reduced_ instruction set imply fewer instructions?

The best definition I heard for RISC was:

Any processor or ISA produced or defined after 1989. [When RISC architectures captured the imagination of many]

In my view, this definition accurately captures the absurdity of the debate - which this report hopefully puts to rest. Processors should be measured and characterized by their actual performance.

Jason - You may know RISC WAS originally a reduced instruction architecture. Early RISCs were load/store register machines with single-word instruction encodings, fixed operand bit fields, and one instruction / one operation / one clock. Often there was little or no embedded immediate operand in the instruction word, and they were often 3-operand machines with a zero register. SPARC and MIPS are typical early RISC architectures.

Over time, those "pure RISC" architectures have become things of the past. Today's ARM Cortex handles multi-word instruction encodings that can carry fairly long operands, and a single instruction can initiate a rather complex sequence: for example, a single "mov pc, lr" works as a return from interrupt, popping several registers from the stack and then switching processor context. I agree with you that it is hardly a "reduced, simplified, one instruction / one operation" architecture in the sense of RISC back in the '90s.

Perhaps the only characteristic today's RISC architectures inherited from the early days is the load/store architecture. You have to mov r2, #4 / ld r1, [r3] / add r1, r2 / st r1, [r3]. You cannot write "add dword ptr [di], 4" as in x86 syntax. I don't think it is a fundamental difference in the RISC vs CISC argument, just a matter of flavor.
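
The instruction-count difference the comment describes can be shown with a toy simulator (a minimal sketch with made-up mnemonics, not a real ISA): a load/store machine must route a memory update through registers, while a register-memory machine does it in one memory+op instruction.

```python
# Toy sketch: count the instructions needed to add a constant to a
# memory word on a load/store (RISC-style) machine vs. a
# register-memory (CISC-style) machine.  Mnemonics are illustrative.

def load_store_add(mem, addr, const):
    """RISC-style: immediate -> register, load, register add, store."""
    trace = []
    r2 = const;      trace.append("mov r2, #4")      # load immediate
    r1 = mem[addr];  trace.append("ld  r1, [r3]")    # load from memory
    r1 = r1 + r2;    trace.append("add r1, r1, r2")  # register-to-register add
    mem[addr] = r1;  trace.append("st  r1, [r3]")    # store back
    return trace

def reg_mem_add(mem, addr, const):
    """CISC-style: one memory+op instruction does the whole job."""
    mem[addr] += const
    return ["add dword ptr [di], 4"]

mem = {0x100: 10}
risc = load_store_add(mem, 0x100, 4)

mem2 = {0x100: 10}
cisc = reg_mem_add(mem2, 0x100, 4)

# Same result in memory, different dynamic instruction counts.
print(len(risc), len(cisc), mem[0x100] == mem2[0x100])  # 4 1 True
```

Of course, a real CISC decoder may crack that one instruction into comparable micro-ops internally, which is exactly why the study found the distinction washes out at the microarchitecture level.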

Load/store architecture is generally considered the hallmark of a RISC, as are simple addressing modes, simple decode with few instruction formats, plenty of registers to reduce memory traffic, and simple instructions that can be executed in a single cycle.

Many RISCs do indeed support more complex instructions like multiply and divide, which can take multiple cycles, and several support load/store multiple (or at least two registers). The return-from-interrupt example is something that is complex on every CPU; on the Cortex-M, transistors were saved by popping registers from memory rather than enlarging the register file. This is not really complex when you already support load/store-multiple instructions.

I agree being a purist is bad, and that goes both for RISC and CISC. Very CISCy architectures have all died (x86 is one of the least CISCy; it's very lucky in that compilers can completely avoid all the complex microcoded instructions, and even load+operate and complex addressing modes are rarely used). Similarly, very pure RISCs have not been successful.

Jason see my other post for the accepted definition of RISC. RISC ISAs certainly turned out to have a significant commercial advantage - just compare x86 with ARM volume: ~300 million vs 12 billion per year. You can also compare x86 CPUs with similarly performing ARM ones, which do you think turns out to be significantly larger, more complex and more expensive?

So I don't believe this report is in any way conclusive - they would have to prove 2 equally capable teams could develop RISC/CISC CPUs at similar cost/area/power/performance. The CPUs actually on the market clearly prove this is impossible. The fact is that Intel has pumped ~$10 billion into promoting x86 in phones and still has zero market share. So claiming the debate is over is wishful thinking at best.

The Data General Nova was a RISC machine years before the term RISC was invented.

When using a MIPS processor, be sure your startup code is in read-only memory, as some MIPS processors perform random writes when the caches are initialized.

My understanding is that the early RISC developers bet that speed would be cheaper than complexity as fabrication processes improved, and that RAM would remain faster than processors. Instead, complexity turned out to be cheaper than speed, and processors became faster than RAM.

I'd say the 6502 bet on memory speed (it really was faster back then), not modern RISC (maybe you could class the 6502 and PIC as pre-modern RISC?).

RISC did introduce a number of approaches that were then adopted by other CPU makers (some of this was probably done earlier by the mini/mainframe guys): optimizing the instruction set for the compiler (regular instructions are much easier for the compiler; x86 has some odd-ball instructions that are probably never emitted by any compiler), analyzing actual programs to figure out which instructions programs actually use, designing the ISA for pipelining, and large, regular register sets (many early ISAs have all sorts of limits on which register can be used for what).

Two of the main RISC guys are still working, although in different roles: David Patterson (Berkeley, inspired SPARC) is still there, working on RISC-V (which EE Times covered), and John Hennessy (Stanford, MIPS) is President of Stanford University.

I believe the x64 architecture is much more RISC-like than the x86, e.g. more registers and such.

On the MCU side, the non-RISC processors would be ISAs like the 68000 (pretty much replaced by ColdFire and ARM at Freescale), Renesas RX (and earlier, including legacy Hitachi, NEC, etc. ISAs), 8051, etc.

TonyTib, you are correct, the earliest processors were slower than memory. I was thinking of the 32-bit RISC processors of the 1990s, as on-chip signals started to become faster than off-chip outputs.

In the mid-1990s, I took a graduate class in RISC processors, using Hennessy and Patterson's text, "Computer Architecture: A Quantitative Approach", if I recall correctly. Each chapter stated and developed a concept for making processors faster, and concluded with a postscript of famous cases where the concept failed to speed things up. One of the appendices in the back of the book apologetically covered the x86, criticizing the x86's nonorthogonality but noting the x86's sales volume was so large as to be elegant in its own way.

In a talk that I heard John Cocke give in the mid-1980s, he said that his goal with 801 and his definition of RISC was an ISA designed to make it easier for the compiler to generate good code, while a CISC is an ISA designed to make it easier for a human to write assembler. He noted that 801 was not really a RISC architecture in terms of instruction count. He preferred to talk about a streamlined instruction set. The apex of CISC was reached with VAX, and a lot of the VMS operating system had hand-written assembler. I used to point out that the VAX presented compilers with a difficult "reverse semantic gap" in that sometimes a whole procedure's worth of C/C++ code could be compiled into one instruction, but compilers were not clever enough to do so.

In terms of hardware, a CISC obviously has a larger overhead in instruction fetch and decode, but that is a small fraction of a processor. A bigger impact of CISC is the smaller register set, which requires more memory accesses and, in some ISAs, more instructions. Bill Wulf once said that a big headache for compiler writers is the plethora of CISC addressing modes and the difficulty of having the compiler use them all efficiently. Compilers didn't use them all, which is why most were dropped in RISC.

@HankWalker: In a talk that I heard John Cocke give in the mid-1980s, he said that his goal with 801 and his definition of RISC was an ISA designed to make it easier for the compiler to generate good code, while a CISC is an ISA designed to make it easier for a human to write assembler.

That accords with my memories. I was watching when RISC CPUs were the new hotness. The VAX architecture was held up as the exemplar of CISC, with a "super instruction set". The problem was that most of those instructions were never used. Increasingly, programmers didn't write in assembler. They wrote in a high-level language like C or Pascal, and the compilers didn't generate code that used all of those fancy instructions.

So designers said "Why have them? Most of those high level instructions can be implemented as combinations of simpler ones, so let's design CPUs with only the basic simple instructions, concentrate on making them run as fast as possible, and let the compiler do the heavy lifting and optimization."

The results included the DEC Alpha, Sun SPARC, and HP PA-RISC architecture among others.

CISC won, but the reasons I could see had nothing to do with performance and everything to do with cost. DEC was already in trouble when the Alpha was released: the market for the VAX was eroding rapidly under the pressure of super-micros based on off-the-shelf MC680X0 CPUs running flavors of Unix, which could do what a VAX did almost as fast at a tenth of the price. DEC tried to ramp up production and sales of Alpha-based workstations, but couldn't do so quickly enough to stem the bleeding. DEC competitor Data General had the same sort of woes with its RISC entry.

HP shifted from PA-RISC to x86 because they could get the required performance from off-the-shelf chips that were well understood, with a substantial ecosystem and a highly developed toolchain for creating software. They didn't have to spend money on design, manufacture, and updates. Sun stuck by SPARC, but hedged its bets with a line of Opteron-based x86 architecture models. Using x86 was simply cheaper. Performance might not have been at RISC levels, but it was good enough. The advantage from using RISC wasn't pronounced enough to justify the higher cost. The decisions were ultimately economic, not technical.

(And I was grimly amused at one point. AMD had a RISC processor called the 29000 back then. They came out with a new x86 compatible CPU, and from what I could see, it used the 29000 RISC core. x86 instructions in code were intercepted and converted on the fly to the underlying instructions the 29000 actually executed.)

ARM is winning in the mobile space because of lower power consumption, but ARM has the advantage of being fabless. Lots of folks license ARM designs and make chips based on them, and the market is large enough that Intel x86 chips don't have a cost advantage. OEMs making products using them can buy them off the shelf, and don't have the overhead of design and manufacture.

I think CISC vs RISC frames the question in the wrong terms. It's about the money, and the question is what the cheapest solution is that will do the job.