Sunday, March 23, 2008

Embedded + CPU != x86

I work on embedded systems, generally specialty networking gear like switches, routers, and security products. We use a commodity processor to handle administration tasks like web-based management, a command line interface, SNMP, etc. This processor runs an actual OS, which up to about six years ago would have been a commercial realtime package like vxWorks or QNX. In the last few years everything I've worked on has used Linux.

One question which comes up with reasonable frequency is why do embedded systems generally use RISC CPUs like PowerPC, MIPS, and ARM? Usually the question is phrased the other way around, "Why don't you just use x86?" To be clear: the processors used in the embedded systems I'm talking about are not the ultra-cheap 4, 8 or 16 bit CPUs you might think of when someone says "embedded". They are 32 or 64 bit CPUs, with memory protection and other features common to desktop CPUs. Asking why we're not using x86 is a legitimate question.

Lets not kid ourselves: pricing is by far the most important factor in component selection. If the pricing worked out, any other issues with using an x86 processor would either find a resolution or just be accepted as is. Common embedded processors in the market segment I'm talking about range from about $20 at the low end (various ARM9 chips, Freescale PowerPC 82xx) to $500 or more at the high end (Broadcom SB-1 MIPS, PA Semi PowerPC). More importantly, these prices are available at relatively low volumes: in the hundreds to thousands of units, rather than the 50,000+ unit pricing thresholds common in the x86 market. [Actually the pricing in the x86 market is far more complex than volume thresholds, but that quickly gets us into antitrust territory where my head starts to hurt.]

This leads us to one of the vagaries of the embedded market: it is enormous, but very fragmented. For example in 2007 Gartner estimates that 264 million PCs were sold, nearly all of which contain an x86 processor. In the same year almost 3 billion ARM processors were sold. To be sure, the average selling price of the ARM CPU would have been considerably less than the average x86 CPU, but it is still an enormous market for chip makers.

Yet while the top 6 PC manufacturers consume 50% of the x86 CPUs sold, the embedded market is considerably more fragmented. For every product like the Blackberry or Linksys/Netgear/etc routers selling in the millions of units, there are hundreds of designs selling in far lower volumes. In the embedded market the pricing curve at the tail end is considerably flatter: you can get reasonable prices even at moderate volumes. As a practical matter this means the chip maker's pricing also has to factor in several layers of distributors between themselves and the end customer.

System on Chip

Once we get past the pricing and volume discount models, the biggest differences between an embedded CPU and an x86 CPU designed for PCs are in the peripheral support. Consider the block diagram shown here: the CPU attaches to its Northbridge via a Front-Side Bus (*note1). The Northbridge handles the high bandwidth interfaces like PCIe and main memory. Lower speed interfaces are handled by the Southbridge, like integrated ethernet and USB. Legacy ports like RS-232 have mostly been removed from current Southbridge designs; an additional SuperIO component would be required to support them. Modern PCs have dispensed with these ports, but a serial port for the console is an expected part of most embedded networking gear.

I'll hasten to add that this arrangement makes a great deal of sense for the PC market, given its design cycles. CPU designs change relatively frequently: there will be at least a clock speed increase every few months, and a new core design every couple years. Memory technologies change somewhat more slowly, requiring a new Northbridge design every few years. The lower-speed interfaces handled by the Southbridge change very slowly, Southbridge chips can stay on the market for years without modification. Segregating functionality into different chips makes it easier to iterate the design of some chips without having to open up all of the unrelated functionality.

Requiring a Northbirdge, Southbridge, and additional supporting chips adds cost in two ways: the cost of the additional chips, and the additional board space they require. The primary functionality of these products is not the CPU: the CPU is just there for housekeeping really, the primary functionality is provided by other components. Taking up more space for the CPU complex leaves less space for everything else, or requires a more expensive board with denser wiring to fit everything.

(*note1): AMD incorporates the memory controller into the CPU, so DRAM would be connected directly. Future Intel CPUs will do the same thing. They do this to reduce latency to DRAM and therefore enhance performance. A Northbridge chip will still be required to provide I/O and hook to the Southbridge.

By contrast, CPUs aimed at the embedded market are System on Chip designs. They contain direct interfaces for the I/O ports and buses typically required for their application, and require minimal support chips. The CPUs I work with include a MIPS or PowerPC CPU core running at a modest frequency of 1GHz or below. The DRAM controller, PCI bus, and all needed ports are part of the CPU chip. We strap Flash and memory to it, and off we go.

There are a vast number of embedded CPUs tailored for specific industries, with a different set of integrated peripherals. For example, automotive designs make heavy use of an interface called the Controller Area Network. Freescale Semiconductor makes a MPC5xx line of PowerPC CPUs with integrated CAN controllers, a healthy amount of Flash memory directly on the CPU, and other features tailored specifically for automotive designs. Other markets get similar treatment: the massive mobile phone market has a large number of CPUs tailored for its needs.

Production lifetime

Its relatively easy for a processor vendor to produce versions of a given CPU at different frequencies: they simply "speed bin" CPUs coming off the line according to the maximum frequency they can handle. CPUs which run faster are sold at higher prices, with a reduced price for lower frequencies. At some point the pricing curve becomes completely artificial, for example when essentially all CPUs coming off the line will run at 1.5 GHz but a 1.0 GHz speed grade is sold because there is sufficient demand for it.

It becomes more difficult for the processor vendor when they switch to a new core design. Maintaining a production line for the older design means one fewer production line for the newer design, which undoubtedly sells at a better profit margin. The vendor will pressure their customers to move off of the oldest designs, allowing them to be End of Lifed (EOL).

PC vendors typically revise their system designs relatively frequently, simply to remain competitive. You might have bought an iMac in 2005 and another iMac in 2007, but if you open them up they won't be the same. By contrast embedded system designs are often not revised at all: once introduced to the market, the exact same product will remain on the price list for years. Newer products with an improved feature set will be added, but if a customer has qualified one particular product they can continue buying that exact same product year after year.

In the embedded world a minimum 5 year production lifetime is standard. The design lifetime to EOL varies widely in the x86 world. An industrial version of Intel's 486 remained in production until 2006, but the Core Solo was only produced for about 18 months.

Power and heat

The type of systems I work on are relatively compact, using a 1U rackmount chassis. There are a number of other power-hungry chips in the system in addition to the CPU, and we have to reserve sufficient power to handle a number of PoE ports. All of this power consumption demands airflow to keep it all cool, so we'll have a line of fans blowing a sheet of air over the entire board.

This means we don't have much room in the power supply or cooling capacity for a hot CPU. The power-hungry Pentium 4 was simply unthinkable, and the typical 65 watt Thermal Design Power (TDP) of current x86 CPUs is still too much. We look for something closer to 8 watts. Indeed, none of the systems I've worked on has ever had a CPU fan, just a heat sink.

Future Developments

AMD and Intel have both dabbled with embedded derivatives of x86 chips over the years. Intel's 80186 was an 8086 CPU with additional peripherals and logic, making it into a System on Chip for the embedded market of that time (I bet you thought they skipped from 8086 directly to 80286). Intel also produced various industrial versions of the 386, 486, Pentium, and Core CPUs. AMD produced embedded versions of the 486 and K5, branding them "Elan." These CPUs have experienced various degrees of success, but mostly the chip vendors have seemed to treat the embedded market as a place to dump the hand-me-downs from last years PCs. If your design really does call for something which looks a lot like a PC you might use these chips, but otherwise the various ARM/MIPS/PowerPC SoC products would be a better fit.

Starting in 2008 Intel began a renewed focus on the embedded market. The Tolopai design is the most interesting for the kind of products I work on: a highly integrated SoC with multiple gigabit ethernet MACs and no integrated graphics. I can definitely see a Tolopai-like processor being used in the sort of products I work on, if the middle volume pricing is competitive. We'll have some endianness issues to work through: we write a lot of code in C and the processors used now are mostly big-endian. Though we try to handle the endianness correctly, there are undoubtedly places where we've slipped up.