RISC-V: An Open Standard for SoCs

Just as Linux has become the standard OS for most computing devices, Berkeley researchers envision RISC-V becoming the standard ISA for all computing devices.

Systems-on-a-chip (SoCs), where the processors and caches are a small part of the chip, are becoming ubiquitous. Thus many more companies today are making chips that include processors than in the past. Given that the industry has been revolutionized by open standards and open-source software -- like TCP/IP and Linux -- why is one of the most important interfaces proprietary?

While instruction set architectures (ISAs) may be proprietary for historical or business reasons, there is no good technical reason for the lack of free, open ISAs.

It's not an error of omission. Companies with successful ISAs like ARM, IBM, Intel, and MIPS have patents on quirks of their ISAs, which prevent others from using them without licenses that academia and many small companies can't afford. Even IBM's OpenPower is an oxymoron; you must pay IBM to use its ISA.

An ARM license doesn't even let you design an ARM core; you just get to use its designs. (Only about 10 big companies have licenses that allow them to design custom versions of ARM cores.) While the business is sound, licenses stifle competition and innovation by stopping many from designing and sharing their ISA-compatible cores.

Nor is it because the companies do most of the software development. Despite the value of the software ecosystems that grow around popular ISAs, outsiders build almost all of their software.

Neither do companies exclusively have the experience needed to design a competent ISA. While it's a lot of work, many today can design ISAs.

Nor are the most popular ISAs wonderful ISAs. ARM and 80x86 aren't considered ISA exemplars.

Nor are the companies that design ISAs the only ones who can verify them. Long ago, open organizations developed mechanisms to ensure compatibility with hardware standards, such as floating point units (IEEE 754), networking chips and switches (Ethernet), and I/O buses (PCIe). If not for such organizations, open IT standards would not be so popular.

Finally, proprietary ISAs are not guaranteed to last. If a company dies, it takes its ISAs with it. Digital Equipment's demise also terminated the Alpha and VAX ISAs. Note that an ISA is really an interface specification, and not an implementation.

There are three types of ISA implementations:

Private closed-source, analogous to Apple iOS

Licensed open-source, like Wind River VxWorks

Free, open-source that users can change and share, like Linux

Proprietary ISAs in practice allow the first two types of cores, but you need a free, open ISA to enable all three.

We conclude that the industry would benefit from viable, freely open ISAs just as it has benefited from freely open versions of the software stack. For example, it would enable a real, free, open market of processor designs, which patents on ISA quirks prevent. This could lead to:

Greater innovation via free-market competition from many more designers, including open vs. proprietary implementations of the ISA.

Shared, open core designs, which would mean shorter time to market, lower cost due to reuse, fewer errors given many more eyeballs, and transparency that would make it hard, for example, for government agencies to add secret trap doors.

Affordable processors for more devices, which would help expand the Internet of Things, whose target cost could be only $1.

RISC-V is not following the same worn path as past instruction sets, which traditionally grow in size over time, leaving compilers to figure out how to use new instructions every year or so. The 80x86 has added an average of one instruction per month over its 30+ year lifetime. Its instruction count kind of tracks Moore's Law, just not as fast.

RISC-V has a well-designed integer base (RVI) that will never change. The optional compressed instructions (RVC) are handled by the assembler, since every 16-bit format maps one-to-one onto an equivalent 32-bit one.
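To illustrate the one-to-one mapping, here is a minimal Python sketch that expands a single compressed instruction, C.ADDI, into its 32-bit ADDI equivalent. The field positions follow our reading of the RVC encoding; the function names are hypothetical, and a real expander covers every RVC format, not just this one.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def expand_c_addi(half: int) -> int:
    """Expand a 16-bit C.ADDI into the equivalent 32-bit ADDI rd, rd, imm."""
    # C.ADDI: funct3=000 in bits 15:13 and quadrant (op) 01 in bits 1:0.
    if (half & 0b11) != 0b01 or ((half >> 13) & 0b111) != 0b000:
        raise ValueError("not a C.ADDI encoding")
    rd = (half >> 7) & 0x1F  # rd also serves as rs1
    # 6-bit immediate: imm[5] sits in bit 12, imm[4:0] in bits 6:2.
    imm = (((half >> 12) & 0x1) << 5) | ((half >> 2) & 0x1F)
    imm = sign_extend(imm, 6)
    # I-type ADDI: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011.
    return ((imm & 0xFFF) << 20) | (rd << 15) | (0b000 << 12) | (rd << 7) | 0b0010011


print(hex(expand_c_addi(0x0085)))  # c.addi x1, 1  ->  0x108093 (addi x1, x1, 1)
```

Because the expansion is a pure bit shuffle with no extra state, the compressed form adds no new semantics to the ISA, only a denser encoding.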

As those who read the RISC-V manual can see, we recommend that people target software for RVG, which is shorthand for IMAFD: the base Integer ISA plus the optional Multiply, Atomic, single-precision Floating-point, and Double-precision floating-point extensions.
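As a small illustration of the shorthand, the Python sketch below expands "G" inside a hypothetical ISA string such as "rv64gc". It uses the IMAFD definition given here; as an assumption worth checking against the current manual, later spec revisions also fold the Zicsr and Zifencei extensions into G.

```python
# Expansion of the "G" shorthand as described in this text (IMAFD).
# Assumption: later spec revisions also include Zicsr and Zifencei in G.
G_EXPANSION = "imafd"
BASES = ("rv128", "rv64", "rv32")


def expand_isa_string(isa: str) -> str:
    """Expand a hypothetical ISA string such as 'rv64gc' into 'rv64imafdc'."""
    isa = isa.lower()
    for base in BASES:
        if isa.startswith(base):
            extensions = isa[len(base):]
            return base + extensions.replace("g", G_EXPANSION, 1)
    raise ValueError(f"unknown base ISA in {isa!r}")


print(expand_isa_string("RV64GC"))  # rv64imafdc
```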

Many of the other optional extensions will be done in libraries (e.g., decimal floating point).

As the tables in the technical report show, there are surprisingly few instructions that need to be added when going from 32-bit addresses to 64-bit addresses to 128-bit addresses. Basically, all the registers just get wider.
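A minimal Python sketch of "the registers just get wider": ADD is modeled as one operation parameterized by the register width XLEN, while ADDW stands in for the handful of extra RV64 instructions that operate on 32-bit values. This models semantics only, not encodings, and the function names are our own.

```python
def add(rs1: int, rs2: int, xlen: int) -> int:
    """ADD is the same instruction in RV32I, RV64I and RV128I; only XLEN changes."""
    return (rs1 + rs2) & ((1 << xlen) - 1)


def addw(rs1: int, rs2: int) -> int:
    """ADDW (RV64 only): add the low 32 bits, then sign-extend the 32-bit result to 64 bits."""
    low = (rs1 + rs2) & 0xFFFF_FFFF
    if low & 0x8000_0000:
        low |= 0xFFFF_FFFF_0000_0000
    return low


print(hex(add(0xFFFF_FFFF, 1, 32)))  # 0x0: wraps at 32 bits
print(hex(add(0xFFFF_FFFF, 1, 64)))  # 0x100000000: same ADD, wider register
print(hex(addw(0x7FFF_FFFF, 1)))     # 0xffffffff80000000
```

The W variants exist so that 32-bit C code keeps its wraparound behavior on a 64-bit machine; everything else carries over unchanged.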

RISC-V also has a very fast trap to user mode on unimplemented instructions. In addition, it has 16-bit and 32-bit jump-and-link instructions that a linker can use to replace any unimplemented instruction with a jump and link to library code that implements the missing instruction.
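Python can't show a real trap or a linker rewrite, but the dispatch idea can be sketched. This minimal model assumes a core without the M extension: each operation goes to hardware when the core implements it, and otherwise to a library routine, which is what the trap handler or the linker-inserted jump-and-link would reach. SOFTWARE_FALLBACKS, execute, and the shift-and-add soft_mul are hypothetical illustrations, not RISC-V runtime APIs.

```python
from typing import Callable, Dict

Handler = Callable[[int, int, int], int]


def soft_mul(a: int, b: int, xlen: int) -> int:
    """Shift-and-add multiply returning the low XLEN bits, as MUL would."""
    mask = (1 << xlen) - 1
    a &= mask
    b &= mask
    result = 0
    while b:
        if b & 1:
            result = (result + a) & mask
        a = (a << 1) & mask
        b >>= 1
    return result


# Hypothetical library of software fallbacks for instructions a core may omit.
SOFTWARE_FALLBACKS: Dict[str, Handler] = {"mul": soft_mul}


def execute(op: str, a: int, b: int, hardware: Dict[str, Handler], xlen: int = 32) -> int:
    """Run `op` in hardware if the core implements it, else call the library routine."""
    handler = hardware.get(op) or SOFTWARE_FALLBACKS.get(op)
    if handler is None:
        raise NotImplementedError(op)
    return handler(a, b, xlen)


# A core that implements ADD but not MUL (no M extension).
base_core: Dict[str, Handler] = {"add": lambda a, b, xlen: (a + b) & ((1 << xlen) - 1)}
print(execute("mul", 6, 7, base_core))  # 42, computed by soft_mul
```

The program's code is identical for both cores; only the path taken for MUL differs, which is the property the trap and the linker rewrite are meant to preserve.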

By laying out all this instruction planning up front, we believe the compiler issues are well in hand, and yet we can adapt to the needs of each SoC by leaving out what you don't need or adding extensions that you do need.

We need to prove this, but we thought about it carefully while designing RISC-V, and so we're aware of the implications of what we're doing.

Sorry that we did not have enough space to fully describe that particular experiment. The area numbers we pulled from ARM exclude floating point and NEON; see http://www.arm.com/products/processors/cortex-a/cortex-a5.php. Also, the RISC-V core used had TLBs, branch prediction (BTB, BHT, and return address stack), and caches designed to match the Cortex-A5 memory hierarchy. We didn't have time to strip our core down to 32 bits to match the ARM core, so unfortunately we were handicapped by the full 64-bit virtual address width in the TLB/BTB/RAS, as well as in the integer register file.

Variants of this RISC-V "Rocket" core have been fabricated multiple times in 45 nm and 28 nm processes, and the resulting chips boot Linux. Some variants run well over 1 GHz. Some include full 64-bit IEEE 754-2008 vector floating-point units, with well over 10 GFLOPS/W energy efficiency running actual kernels (not just peak). Some are cache-coherent multicores. Others run below 0.5 V with extremely high energy efficiency. One of our first chip publications will appear at ESSCIRC in September, if you'd like to see more concrete details.
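To put "over 10 GFLOPS/W" in more physical terms: a watt is a joule per second, so GFLOPS/W is simply billions of floating-point operations per joule. A short Python helper (the name is ours) converts it into energy per operation:

```python
def picojoules_per_flop(gflops_per_watt: float) -> float:
    """Energy per floating-point operation implied by an efficiency in GFLOPS/W.

    GFLOPS/W = (1e9 FLOP/s) / (J/s) = 1e9 FLOP/J, so J/FLOP = 1 / (g * 1e9),
    which in picojoules is 1e12 / (g * 1e9) = 1000 / g.
    """
    return 1000.0 / gflops_per_watt


print(picojoules_per_flop(10.0))  # 100.0 pJ per FLOP
```

So 10 GFLOPS/W corresponds to 100 pJ per floating-point operation, and "well over" 10 GFLOPS/W means less energy than that per operation.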

We look forward to seeing ARM publish SPEC numbers, or other open benchmark scores, for representative versions of their cores, so we can have fair and open comparisons.

To be clear, we believe ARM is a great company that has built a very productive ecosystem for SoC designers. However, there are significant markets that don't match ARM's business model and we would like to provide an alternative.

(It's fun to be criticized again about RISC benchmarks; I'm feeling a wave of nostalgia. In fact, I'm playing Aerosmith in the background while writing this response.)

Rocket has caches, TLBs, FPUs... the whole nine yards. It boots Linux and we run SPEC2006 on it. And by the way, it has 64-bit addresses and datapaths. It's a real computer.

We picked Cortex-A5 (even though it's just a wimpy 32-biter) because it is a single-issue, in-order pipeline like Rocket and because, as ARM currently says on its website:

The ARM® Cortex®-A5 processor is the smallest, lowest cost and lowest power ARMv7

(http://www.arm.com/products/processors/cortex-a/cortex-a5.php)

Rocket is 1/2 the size, 1/2 the power, and 10% faster at the same GHz.
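Taking these figures at face value, and assuming "10% faster" means 10% more work per unit time on the same workload at the same clock, the efficiency ratios follow directly. A minimal Python sketch (the helper name is ours):

```python
def relative_efficiency(rel_perf: float, rel_power: float, rel_area: float) -> dict:
    """Derive efficiency ratios from performance, power and area relative to a baseline."""
    return {
        "perf_per_watt": rel_perf / rel_power,
        "perf_per_area": rel_perf / rel_area,
        # Energy per task = power * time, and time scales as 1 / performance.
        "energy_per_task": rel_power / rel_perf,
    }


# Rocket relative to Cortex-A5, per the figures above: 1.1x perf, 0.5x power, 0.5x area.
# Expect about 2.2x perf/W and perf/area, and about 0.45x energy per task.
print(relative_efficiency(1.1, 0.5, 0.5))
```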

Should we have compared size and power to a larger ARM implementation?

We'd LOVE to get SPEC numbers for ARM. We asked our friends at ARM, and they said there are no such numbers available. We even asked them to recommend a platform on which we could run it ourselves, and they couldn't come up with one that would run SPEC2006.

Alas, the only benchmark that runs on ARM that we can compare against is Dhrystone (!).
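For readers who haven't met it, Dhrystone results are usually reported as DMIPS: raw Dhrystones per second normalized to the VAX-11/780's 1757 Dhrystones per second, and often further divided by clock rate. A minimal Python sketch; the measurement in the example is made up for illustration.

```python
# Dhrystone MIPS are normalized to the VAX-11/780, which scored 1757 Dhrystones/s.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec: float) -> float:
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """The clock-normalized figure typically quoted for processor cores."""
    return dmips(dhrystones_per_sec) / clock_mhz


# Hypothetical measurement: 1,757,000 Dhrystones/s on a 1000 MHz core.
print(dmips(1_757_000), dmips_per_mhz(1_757_000, 1000))  # 1000.0 1.0
```

A single small synthetic loop reduced to one number is exactly why Dhrystone says so little about real programs.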

Hennessy and I dropped the pitfall about not running Dhrystone in the 3rd edition of Computer Architecture: A Quantitative Approach because we thought Dhrystone was dead. Apparently, Dhrystone is the Dracula of bad benchmarks.

Until we can get our hands on something that runs full Linux on ARM, it's the best we can do; we're anxious to show off RISC-V on real programs.

"Thanks in part to the open-source Chisel hardware design system, one 64-bit RISC-V core is half the area, half the power, and faster than a 32-bit ARM core with a similar pipeline made in the identical process."

This is hardly a valid comparison. The Cortex-A5 supports fast multiplies, DSP extensions, SIMD extensions, large TLBs and caches, branch prediction, compressed instructions, hardware Java execution, security extensions, interrupt control, multi-core, etc. The base RISC-V ISA is more similar to a Cortex-M0, which is significantly smaller and more efficient than a Cortex-A5. But like RISC-V's base ISA, it is not suitable for running, e.g., Android.

It is easy to make a simple MIPS-like RISC ISA and a bare-bones CPU that appears to do well on Dhrystone. However, that's hardly proof of anything. MIPS used to have various CPUs showing that with 64-bit load/store and delayed branches you get amazing 32-bit Dhrystone scores from a simple pipeline: a nice trick, just a shame that it didn't help nearly as much when running real code. Cortex-A5 actually runs Android pretty well, and I bet RISC-V with its very basic ISA won't be able to keep up.

Also, this appears to be a comparison of a five-year-old, widely used CPU with a simulation of an unfinished CPU. Let's compare when actual hardware is available, and instead of Dhrystone, compare with the 64-bit A53 running, e.g., SPEC2000.

Do you envision having variants of the architecture for different domains (like MIPS and ARM), or do you think we should move forward with a heterogeneous approach (different ISAs/approaches for different tasks)?

BTW, love the fact that it runs on Zynq! We will give it a try on Parallella immediately. :-)

Thanks for the quick reply. I've used it as an excuse to go off and read the V2.0 ISA spec for RISC-V instead of doing the day job :-) I see that RISC-V does have multicore support in its memory model.

OpenRISC has a core instruction set, which is stable and then various extensions around that core. We have debated the number of extension sets - there are too many at present, meaning compilers need too many multilibs to use them efficiently.

As we have found with the OpenRISC GCC implementation, a combinatorial explosion of multilibs may be a challenge for RISC-V. There is a 32-bit base, a 64-bit base, a 128-bit base and 10 standard extensions, so a compiler potentially needs 3 × 2^10 = 3,072 multilib variants to efficiently support all possible ISA combinations. It is possible that some extensions will have no impact on compiled code, but the number of multilibs will still be too high, so the compiler will need to restrict itself to likely popular combinations, degrading efficiency; 15-20 multilibs is a practical limit. Alternatively, users can build a compiler just for their specific architecture, but that still requires the compiler writer to juggle all the possible options (consider, for example, how to optimally compile a * b for all C/C++ types with all possible combinations of optional ISA extensions).
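Counting the variants directly: each of the three bases can be paired with any subset of the ten extensions, since each extension is either present or absent. The brute-force Python enumeration below confirms the total; the extension names are placeholders, because the list of ten isn't spelled out here.

```python
from itertools import combinations

BASES = ["rv32", "rv64", "rv128"]
# Placeholder names: the comment counts 10 standard extensions but doesn't list them.
EXTENSIONS = [f"ext{i}" for i in range(10)]


def multilib_variants(bases, extensions):
    """Yield every (base, extension subset) pair a fully general compiler might target."""
    for base in bases:
        for k in range(len(extensions) + 1):
            for subset in combinations(extensions, k):
                yield base, subset


count = sum(1 for _ in multilib_variants(BASES, EXTENSIONS))
print(count)  # 3072 == 3 * 2**10
```

Even trimming to the combinations that change generated code leaves far more than the 15-20 multilibs a toolchain can realistically ship.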

BTW, OpenRISC made delayed branches optional a few years ago. All the recent implementations (e.g. Julius Baxter's mor1kx implementations) don't have delayed branches. The online version of the OpenRISC 1000 architecture spec should reflect this.

I hope RISC-V is successful, but I would rather it focused on forward-looking innovation in ISA development instead of industry standardization, which is inevitably backward-looking.