ARM’s Race to Embedded World Domination

From a Tiny Acorn Grew a Mighty Business

The computer and semiconductor industries are so PC-centric that if you ask most people what the best-selling 32-bit microprocessor family last year was, they would probably say x86. But in terms of unit volume, the x86 was outsold by about 20% by various versions of a certain RISC processor architecture. This processor family, licensed by virtually every semiconductor company on the planet, including Intel, sold a little over 150 million processors in 1999 [1]. Some of these were in the familiar form of standalone microprocessors, while many more were buried within larger special-purpose chips.

Despite far exceeding all rivals in growth rate by more than tripling its 1998 sales, this processor is not an overnight success story; in fact, it has a long and colorful history. Even its acronym-based name, ‘ARM’, has changed meaning over the years. It started out as the Acorn RISC Machine, named after a British company, Acorn Computers Ltd. In the early 1980s, Acorn was looking to replace the 6502 processor in its line of personal computers, a chip that was rapidly running out of steam (not to mention address space). Acorn briefly considered the Motorola 68000 but rejected it on the grounds that its inclusion of long-running uninterruptible instructions, such as the DIVS divide, would have more than doubled its interrupt latency compared to the 6502. This meant that expensive direct memory access (DMA) hardware would have been needed to support fast input/output operations.

To Acorn’s visionary and daring engineers, the answer was to exploit a brand new way of designing high performance processors. Called Reduced Instruction Set Computing, or RISC, it was simple enough that even a class of electrical engineering graduate students could create a competitive 32-bit processor as a term project (See RISC vs. CISC Still Matters). A group of Acorn engineers, including Robert Heaton, Stephen Furber, and Jamie Urquhart, created a behavioral model of the new processor architecture, and worked on the design of the register file, data path, and control circuitry. They completed the design in 18 months and taped it out to VLSI Technology Inc. in January 1985. Test chips were running in evaluation boards soon after silicon became available in April [2].

The chip Acorn created, dubbed ARM, was incredibly modest even by 1985 standards. Yet it compared quite favorably to much more complicated and expensive designs, such as the Motorola 68020, which represented the state of the art in CISC processor design. The 2.0-um 68020 incorporated 190,000 transistors in an 85-mm2 die. In contrast, the ARM incorporated about 25,000 transistors on a 50-mm2 die manufactured using a 3-um CMOS process. It boasted a 25-entry 32-bit register file, and executed a little over two dozen different instruction types in a three-stage pipeline at 3 MHz, for a peak execution rate of 3 MIPS.

The ARM instruction set architecture is a fairly conventional RISC design with several unusual and forward-looking features. ARM instructions have 4-bit register fields and can address 16 distinct registers, R0 through R15, with R15 implementing the program counter (PC). The processor contained more than 16 physical registers, because some of the 16 logical registers were shadowed with extra physical registers that would replace them during specific exception-processing states. These special registers reduced interrupt latency and processing overhead by providing immediate scratch-pad registers to interrupt handlers, without the need to explicitly save any registers to memory and later restore them.
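The effect of register shadowing can be sketched in a few lines of Python. This is a hypothetical, much-simplified model (the class name, the choice of R13/R14 as the banked registers, and the mode names are illustrative assumptions, not ARM's actual implementation): when the mode switches to an exception state, a few logical register numbers silently resolve to shadow physical registers, so the handler gets private scratch registers without touching memory.

```python
# Hypothetical sketch of banked (shadowed) registers -- not ARM's actual
# microarchitecture. In "irq" mode, logical registers 13 and 14 resolve to
# shadow physical registers instead of the user-mode copies.

class BankedRegisterFile:
    def __init__(self):
        self.user = [0] * 16            # R0..R15 as seen in user mode
        self.irq_bank = {13: 0, 14: 0}  # shadow copies used in IRQ mode
        self.mode = "user"

    def read(self, n):
        if self.mode == "irq" and n in self.irq_bank:
            return self.irq_bank[n]
        return self.user[n]

    def write(self, n, value):
        if self.mode == "irq" and n in self.irq_bank:
            self.irq_bank[n] = value
        else:
            self.user[n] = value

rf = BankedRegisterFile()
rf.write(13, 0x1000)          # user code sets up its stack pointer in R13
rf.mode = "irq"               # an interrupt arrives: switch register bank
rf.write(13, 0x8000)          # handler uses "its own" R13 immediately
rf.mode = "user"              # return from the interrupt
assert rf.read(13) == 0x1000  # user state intact; nothing was saved/restored
```

The point of the sketch is that the interrupt handler never executed a save or restore: the mode switch alone gave it fresh working registers, which is exactly where the latency savings over the 68000-style approach came from.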

ARM uses the standard RISC convention of fixed-length, 32-bit instruction formats. In a move that foreshadowed the IA-64 architecture’s use of full predication by a decade, all ARM instructions are conditional, with the upper 4 bits of the instruction specifying logical combinations of the PSR flags (as well as never-execute and always-execute conditions). The space in the instruction word to accomplish this came from the use of 4-bit register addressing instead of the conventional 5-bit addressing found in most RISC instruction set architectures. The ability to make any instruction conditional can be used to eliminate some conditional branches through a process known as ‘if-conversion’, which can increase both the performance and the density of compiled code. Compiler code generation and optimization were also made easier by the ability to explicitly control which instructions modify the condition code, or flag, bits.
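If-conversion can be illustrated with a small sketch, here in Python standing in for compiler output (the function names are illustrative, and the Python conditional expression is only a stand-in for a predicated machine instruction): the branchy form selects a result with a conditional jump, while the if-converted form evaluates a predicate and applies it to a single instruction, mirroring how an ARM instruction such as RSBLT executes only when its condition field matches the flags and otherwise falls through as a no-op.

```python
# Hypothetical illustration of if-conversion; names and structure are
# assumptions for exposition, not generated compiler code.

def abs_branchy(x):
    # Compiled naively: a compare followed by a conditional branch
    # around the negation.
    if x < 0:
        x = -x
    return x

def abs_if_converted(x):
    # Predicated form: the comparison sets a predicate (the N flag on ARM),
    # and the negation is guarded by it. On ARM this is one conditional
    # instruction (e.g. RSBLT R0, R0, #0) with no branch at all.
    negative = x < 0
    x = -x if negative else x   # stand-in for a predicated instruction
    return x

for v in (-7, 0, 42):
    assert abs_branchy(v) == abs_if_converted(v)
```

The if-converted form is shorter (no branch and no branch target) and avoids a potentially mispredicted jump, which is the source of the density and performance gains mentioned above.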