Embedded Processor Forum Day 1: 32-Bit, 64-Bit CPUs

This site may earn affiliate commissions from the links on this page. Terms of use.

Attendees gathered at the Embedded Processor Forum to celebrate next-generation embedded cores – and to mourn the demise of the single-chip embedded microprocessor.

While desktop chips like National Semiconductor’s Geode have attempted to merge the CPU, core logic, and graphics processor, the design is the exception rather than the rule. “Discrete” is generally synonymous with “performance”, at least in the desktop space.

The opposite is true in the embedded world, where tightly coupled on-chip buses push data transfers to multi-gigabit speeds, powering the Internet backbone and smaller high-speed networks. Chips like PMC-Sierra Inc.’s RM9000x2 also defy that desktop rule of thumb.

“The digital genie is out of the lamp; we’ve digitized everything,” said Steve Leibson, analyst with MicroDesign Resources Inc., Sunnyvale, Calif., and editor of The Microprocessor Report. “But nobody’s willing to pay for it.”

While that doesn’t surprise embedded systems-on-chip designers, it does dictate the design of the next-generation microprocessor cores. Attendees provided presentations on both 32-bit and 64-bit processors, with a small helping of configurable cores as well.

Although attendees said the event drew fewer people than in the past, it was not for lack of presentations. More than 30 new chips will be introduced over the three days, with Thursday entirely devoted to network processors.

Motorola Inc., Schaumburg, Ill., didn’t even have time to announce the successor to the DragonBall Super VZ line of embedded processors, the backbone of handhelds manufactured by Palm Computing Inc., as well as the MX1, based on an ARM920T core from ARM Ltd. The Super VZ line still uses a 32-bit address/16-bit data bus, based on the FLX68K CPU, but runs at a faster 66-MHz speed. The Super VZ is scheduled to sample in the fourth quarter for $14. The MX1 will be available at about the same time for $19; a list of features was not available at press time.

64-bit cores

While some may argue that 64-bit processing is unnecessary, Leibson pointed out that IPv6 network protocols feature 128-bit addressing, a far cry from the 32-bit addressing used by IPv4. Still, he noted, a typical chip might have only six to ten pipeline stages, posing a difficult barrier to pushing speeds to 1-GHz and above. However, designers are also usually savvy enough to look at the chip’s performance in a variety of applications.
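To put Leibson’s comparison in numbers, the short Python sketch below computes the size of each address space and how many registers a single address would occupy. The names and the register widths are illustrative assumptions, not figures from any presentation.

```python
IPV4_ADDRESS_BITS = 32
IPV6_ADDRESS_BITS = 128
REGISTER_BITS = 64  # assumed general-purpose register width on a 64-bit core

# Number of distinct addresses each protocol can express.
ipv4_addresses = 2 ** IPV4_ADDRESS_BITS
ipv6_addresses = 2 ** IPV6_ADDRESS_BITS


def registers_needed(address_bits, register_bits):
    """Whole registers needed to hold one address (ceiling division)."""
    return -(-address_bits // register_bits)


print(ipv4_addresses)
print(ipv6_addresses)
print(registers_needed(IPV6_ADDRESS_BITS, REGISTER_BITS))
print(registers_needed(IPV6_ADDRESS_BITS, 32))
```

Under these assumptions an IPv6 address fits in two 64-bit registers, where a 32-bit core would need four.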

MIPS Technologies Inc., Mountain View, Calif., introduced the MIPS64 5Kf, a 64-bit core that is compliant with the MIPS64 instruction set. The core issues two floating-point arithmetic instructions simultaneously, and the FPU is IEEE 754 compliant. A single-precision multiply-add instruction yields 2 floating-point operations (FLOPS) per cycle. MIPS allows instruction and data caches up to 64 Kbytes each, individually controlled by the cache controller, up to 4-way set associative.

Instructions are dispatched through the integer pipeline, but they are executed in parallel with the floating-point pipeline, improving performance by up to 100 percent in selected applications. The instruction pipeline itself is 6 stages.

“The major goal is that performance should not limit frequency,” said Morten Zilmer, engineering manager at MIPS.

In total, the size of the core ranges from 3.4 to 4.2 sq. mm, excluding caches; the core runs at 340 to 390 MHz typical and above 270 MHz in a worst-case scenario. Sustained performance is about 340 to 390 MFLOPS, or between 510 and 585 DMIPS. The core consumes between 1.3 and 1.5 mW/MHz, according to MIPS.
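The figures MIPS quoted can be cross-checked with a little arithmetic. The Python sketch below derives DMIPS per MHz and a rough power envelope; pairing the lowest power figure with the lowest clock and the highest with the highest is an assumption made here for illustration, not something MIPS stated.

```python
# Figures quoted by MIPS for the MIPS64 5Kf under typical conditions.
FREQ_MHZ = (340, 390)      # clock range
DMIPS = (510, 585)         # Dhrystone MIPS at those clocks
MW_PER_MHZ = (1.3, 1.5)    # core power per MHz


def dmips_per_mhz(dmips, freq_mhz):
    """Dhrystone efficiency, independent of clock speed."""
    return dmips / freq_mhz


def core_power_watts(mw_per_mhz, freq_mhz):
    """Core power in watts, excluding caches."""
    return mw_per_mhz * freq_mhz / 1000.0


for dmips, freq in zip(DMIPS, FREQ_MHZ):
    print(freq, "MHz:", dmips_per_mhz(dmips, freq), "DMIPS/MHz")

# Assumed pairing: best case at the low end, worst case at the high end.
low_w = core_power_watts(MW_PER_MHZ[0], FREQ_MHZ[0])
high_w = core_power_watts(MW_PER_MHZ[1], FREQ_MHZ[1])
print(f"Estimated core power: {low_w:.3f} W to {high_w:.3f} W")
```

Both ends of the range work out to the same Dhrystone efficiency, which suggests the quoted DMIPS numbers simply scale with clock speed.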

NEC, a MIPS licensee, also showed off the VR5500, a core which was based upon the MIPS IV instruction set. “The primary goal for this design is to enable performance scalability through higher clock frequencies,” said Tomohisa Arai, engineering manager for the microcomputer division of NEC Corp., San Jose.

The MIPS VR5000 ISA sits atop the VR5432, providing compiler-level backwards compatibility to the previous core. The new ISA adds 3-operand integer multiplies, 2-operand integer MACs, and 3-operand integer MACs. The core features an 8- to 10-stage pipeline, depending on the implementation, as part of a dual-issue out-of-order superscalar architecture with six execution units; dual 64-bit instruction and floating-point units have been integrated.

Both the instruction and data caches are 32 Kbytes, 2-way set associative, with per-line cache locking. NEC is calling this implementation the “Blue Sapphire”. Speeds will range from 300 MHz to 400 MHz and beyond, Arai said, pushing 603 Dhrystone 2.1 MIPS and a peak output of 150 MFLOPS at 300 MHz. The die size was not disclosed; however, the chip will be packaged in a 272-pin ABGA. A “Yellow Sapphire” core is due next year, featuring a DDR interface and an enhanced level-2 cache.

PMC-Sierra Inc. captured some of the limelight with the RM9000x2, a high-performance core for the networking market. In a 0.13-micron CMOS process, the seven-stage core–designed by Quantum Effect Devices on a MIPS code base–can hit 1-GHz, according to David Lau, a fellow with PMC-Sierra. The new pipeline contributed a 67 percent speed increase compared to the older RM7000, Lau said.

Instructions feed into the dual cores by way of a 16 Kbyte instruction and data cache, as well as a 256 Kbyte level 2 cache. If the L1 cache is missed, a 5-cycle penalty typically occurs before the L2 cache is accessed. Each cache is 4-way associative, with cache locking per line allowed in both caches. Since the caches must remain coherent with identical information, the company developed a 5-state protocol, adding “Modified” and “Shared” to a standard Modified/Exclusive/Shared/Invalid (MESI) protocol.
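For readers unfamiliar with cache coherency, the Python sketch below models the standard four-state MESI protocol for a single cache line shared by two CPUs. It is a simplified textbook illustration with hypothetical class and method names; it does not model PMC-Sierra’s 5-state variant, whose details were not spelled out.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only copy, differs from memory
    EXCLUSIVE = "E"  # only copy, matches memory
    SHARED = "S"     # one of several clean copies
    INVALID = "I"    # no valid copy


class CacheLine:
    """One memory line tracked across several caches (hypothetical model)."""

    def __init__(self, n_caches=2):
        self.states = [State.INVALID] * n_caches

    def read(self, cpu):
        if self.states[cpu] is not State.INVALID:
            return  # hit: M, E and S all satisfy a read locally
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.INVALID]
        for i in holders:
            # A Modified copy would be written back here; holders drop to Shared.
            self.states[i] = State.SHARED
        self.states[cpu] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, cpu):
        # A write invalidates every other copy and leaves the writer Modified.
        for i in range(len(self.states)):
            if i != cpu:
                self.states[i] = State.INVALID
        self.states[cpu] = State.MODIFIED


line = CacheLine()
line.read(0)    # CPU 0 is the only holder
line.read(1)    # both CPUs now share the line
line.write(1)   # CPU 1 takes ownership, CPU 0 is invalidated
print([s.value for s in line.states])
```

Protocols with a fifth state typically exist to cut down on write-backs to memory when a dirty line is shared, but that is a general observation rather than a description of this chip.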

The CPUs are then connected via a “buffer pool”, an on-chip memory fabric that’s actually constructed as a register file with multiple 500-MHz ports to provide true concurrent data transfers, Lau said. From there, data can be fed to a 200-MHz DDR SDRAM controller, control registers, and a HyperTransport I/O controller.

The 9 mm x 11 mm die will be fabricated in a 0.13-micron process from TSMC, featuring 8 layers of metal and a copper-based process. The SBGA package will require 656 total pins. Power consumption is an estimated 8 watts.

Speed was also of the essence for the TriCore 2 core from Infineon Technologies Inc., San Jose, Calif. Although analyst Leibson wondered if the lack of performance in the first-generation core contributed to the lack of sales, Sorin Zarnescu, technical marketing manager for 32-bit cores, said that wasn’t true.

The instruction set is a superset of the first-generation ISA, but adds a new 6-stage superscalar pipeline. However, the longer pipeline also introduced latency, according to Robert Ober, director of architecture at Infineon. The company therefore designed “trombone” pipes that could feed data from loads into different points of the integer pipe. As a result, the new pipeline looks identical to the TriCore 1 pipeline, eliminating the need for software changes, Ober said.

The final core in the 64-bit presentations was offered by Broadcom Corp., whose FirePath processor architecture was originally designed by Element 14 Corp., acquired in July of 1999. While the FirePath core was originally thought to be a general SoC core, Broadcom will use it in DSL products, according to Sophie Wilson, the core’s chief architect.

“SIMD is a fact of life for a 64-bit data path,” she told attendees. “Otherwise, your wasted effort in your data path becomes quite large.”

The core uses two forms of parallelism: each of the two 64-bit data paths uses a long-instruction-word format, and each data path uses a SIMD (Single Instruction Multiple Data) process. The result is “128-bit” execution, Wilson said. Like Infineon, she characterized the presentation as a technology disclosure, and said Broadcom would offer specifics at a future time.
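Wilson’s point about wasted data-path width can be illustrated with a small Python sketch that packs four 16-bit values into one 64-bit word and adds them lane by lane. The 16-bit lane width and all function names are assumptions chosen for illustration; Broadcom did not disclose FirePath’s actual lane formats.

```python
LANE_BITS = 16                    # assumed lane width
LANES = 64 // LANE_BITS           # four lanes per 64-bit data path
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES small integers into one 64-bit word, lane 0 lowest."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(a, b):
    """Lane-wise add with wraparound; carries never cross lane boundaries."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 0xFFFF])
b = pack([10, 20, 30, 1])
print(unpack(simd_add(a, b)))
```

A single 16-bit add on a 64-bit path would leave three quarters of the hardware idle; treating the word as four lanes keeps all of it busy with one instruction.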

Sun Microsystems Inc.’s Microelectronics Group didn’t present a core, but a bus. The JBUS project was designed to allow small-way symmetric multiprocessing on a single chip, combining between 4 and 8 cores. Sun will likely use JBUS in future SPARC implementations, according to Renu Raman, director of engineering for Sun, Mountain View, Calif.

Sun’s JBUS features a 128-bit packet-switched split-transaction request and data bus, governed by a simple MOESI coherency protocol. The protocol will support out-of-order data return from different cacheable addresses, fully synchronous operation, and peer-to-peer transfers. In a reference implementation in 0.25-micron CMOS, the 200-MHz JBUS core required 18 sq. mm of die area. Speeds can be pushed to 1-GHz using a 0.13-micron process, Raman said.

32-bit cores

While 32-bit cores may seem humdrum by comparison, attendees said that they still provide the best mix of price and performance. “It’s the bread and butter in embedded,” said analyst Leibson.

MIPS again introduced a core, this time the MIPS32 4KE. The 4K family includes a single-issue, 5-stage pipeline, a common memory management unit (MMU), a 32×16 multiply-divide unit, and debug capabilities. The 4KE family adds MIPS16 code-compression support, a larger, 64-Kbyte cache, and extended power management.

The 8 new instructions in the so-called MIPS16e extension include four casting instructions, two compact jump instructions, and two special instructions for stack setup and teardown. Overall, code size is reduced by 40 percent, according to Larry Hudepohl, director of 32-bit core development. The core runs at 275 to 340 MHz, performing 385 to 475 DMIPS. It consumes below 0.45 mW/MHz, and requires under 1.5 sq. mm, minus the caches.

Motorola SPS announced the e500, the company’s first PowerPC using the Book E instructions for improved memory handling. The core is synthesized within an 800-MHz 5-layer-metal implementation, using a 0.13-micron process.

English designer ARM Ltd., meanwhile, showed off the ARM 10 Rev1, a semi-custom core running either at 215 MHz at 1.8 volts or 400 MHz at 1.0 volts. The core, which includes a floating point unit, encompasses approximately 2.4 sq. mm.

Finally, TriMedia Technologies Inc. demonstrated a prototype of a 32-bit TriMedia chip, within a 352-pin BGA package. The core, which measures approximately 16.9 sq. mm, will be fabricated in 0.18-micron CMOS.
