Intrinsity boosts the brainpower of a cellphone chip to the level of a PC

In September, when Samsung boasted at a Taipei industry conference that it could make smartphone chips with PC-like performance, a company in Austin, Texas, took an offstage bow.

It was Intrinsity, a small chip designer, that had made the South Korean silicon giant’s claims possible. It had taken the 650-megahertz ARM Cortex-A8—a CPU designed for smartphones, licensed to Samsung by ARM Holdings—and hot-rodded it into a 1-gigahertz processor dubbed Hummingbird. The result could, with Samsung’s backing, power an impressive portion of the next generation of must-have mobile devices. Some even speculate that Apple itself will put Hummingbird in a coming upgrade to its iPhone. It’s all happening very quickly for a start-up that still shares office space with a local magazine and a dentist.

Now that it’s souping up a next-generation Cortex-A9 for an undisclosed partner, Intrinsity is bidding to carve itself a slice of the market for higher-priced smartphones, netbooks, and other portables. And what a market it is! Just as PCs supplanted minicomputers and laptops supplanted PCs, so smartphones and netbooks will replace laptops for many of their uses. Intrinsity and other designers stand to do well for themselves.

“We’re a speed shop that doesn’t go and burn a lot of fuel in the process,” says Intrinsity president and CEO Bob Russo. He and his 99 engineers grease the wheels of a CPU by trimming inefficiencies in the way logic gates are used.

For instance, when those engineers found that the conventional Cortex-A8 used 20 logical steps to perform some simple binary addition functions, they worked out a way to do the same job in four steps, saving computation time. Because this made some regions in the A8 speed up, the transistors in other regions needed to accommodate less traffic and therefore could be shrunk. Smaller transistors mean less power consumption—a critical element in a battery-powered device.

The key to this and many other performance tricks is the type of logic gate Intrinsity uses: 1-of-n domino logic, or NDL, part of its suite of technologies called Fast14 (named after the atomic number of silicon). Russo says NDL can speed up a logical step by 40 to 60 percent. About a fifth of the A8’s functions are benefiting from it, he adds.

Domino logic is a technology typically used in laptops and desktops, where power consumption isn’t nearly as big a deal as it is in a smartphone. Domino logic does most of its work with n-channel metal-oxide semiconductor (NMOS) transistors, which use electrons to carry charge. In contrast, complementary-metal-oxide semiconductor (CMOS) logic uses equal helpings of both NMOS and p-channel metal-oxide semiconductor (PMOS) transistors, which are slower because they juggle holes—the absences of electrons in a crystal lattice. By relying on faster transistors than CMOS logic does, domino logic can speed computation.

EXPERT CALLS
“Choosing to invent a funky new logic structure—without the required factor-of-three improvement—is not a sustainable strategy.”
—T.J. Rodgers

“Seems like a lot of work for a modest improvement in performance.”
—Kenneth R. Foster

“I’d say the company is a winner, but they seem to be relying on trickiness rather than fundamental technological breakthroughs. Is this a lasting solution for a business model?”
—Robert W. Lucky

Today firms such as AMD, IBM, and Intel use domino logic but only in certain circuits. That’s because it consumes more power than CMOS logic, its circuit timing can be more difficult to manage, it’s more sensitive to noise, and it generally requires more time and money to design and implement.

Fast14 puts a twist on domino logic, which lets it use power more economically. One of the most obvious differences is in the way it represents bits. In CMOS logic you can represent the numbers 0 through 3 on two wires as 00, 01, 10, and 11. In traditional “dual-rail” domino logic, you’d need four wires—two for the bits and two for their complements—so you have to switch twice as many wires every time the output changes, wasting power.

Intrinsity’s 1-of-n domino logic saves power by switching less often. To represent 0 through 3, it uses four wires but switches only one at a time: Zero is represented by 0001, 1 by 0010, 2 by 0100, and 3 by 1000. That scheme means that many transistors will be turned off most of the time. What’s more, the 1-of-n design lets engineers make more-complex functions in a single gate, reducing the number of steps needed to complete a logic function.
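To make the wire counts concrete, here is a minimal Python sketch of the two encodings. It is an illustration only, not Intrinsity's circuit design: the function names are invented for this example, and it assumes, as a simplification, that each asserted wire costs one switching event per evaluation.

```python
def one_of_n(value, n=4):
    """Return the 1-of-n code for `value` as a bit string (highest wire on the left)."""
    if not 0 <= value < n:
        raise ValueError("value must be in the range 0..n-1")
    return format(1 << value, f"0{n}b")


def dual_rail(value, bits=2):
    """Return a (true, complement) wire pair for each bit, most significant bit first."""
    pairs = []
    for i in reversed(range(bits)):
        bit = (value >> i) & 1
        pairs.append((bit, 1 - bit))
    return pairs


def asserted_wires_one_of_n(value, n=4):
    """Count the wires driven high in the 1-of-n code (simplifying assumption: one switch each)."""
    return one_of_n(value, n).count("1")


def asserted_wires_dual_rail(value, bits=2):
    """Count the wires driven high across all dual-rail pairs."""
    return sum(true + comp for true, comp in dual_rail(value, bits))


for v in range(4):
    print(v, one_of_n(v), asserted_wires_one_of_n(v), asserted_wires_dual_rail(v))
```

Under that simplification, every value from 0 to 3 asserts one wire in the 1-of-4 code and two wires in the dual-rail code, which is the halving of switching activity the text describes.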

Indeed, Russo says, one of the qualities Intrinsity looks for when recruiting engineers is the ability to translate standard bit-based gates into faster 1-of-n domino circuits. The real art of integrating NDL into a processor core like the A8, he says, lies in discovering which parts need the NDL speedup badly enough to justify complicating that part of the design with 1-of-n logic. (As much of Intrinsity’s NDL process is proprietary, Russo would say only that the trade-off isn’t always worth the extra effort, which is why Intrinsity has used NDL “sparingly” in Hummingbird. It’s a small tweak that makes a huge difference.)

Another key trick to Fast14 is the use of multiple slightly out-of-phase clocks. Mark McDermott, the company’s vice president of engineering, likens it to a taxi’s progress along a street dotted with traffic lights. An instruction such as “Call up memory register A and add it to the contents of memory register B” is like telling the cabbie to go 50 blocks uptown and race through as many green lights as possible on the way. A regular processor core would use a route in which every light was in sync, turning red or green at the same time. That way, the taxi driver would move only a handful of blocks before stopping. But when you add Fast14 to the processor, McDermott says, you stagger the traffic lights so that the taxicab can travel dozens of blocks before having to stop for a red light. Everything goes faster.
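McDermott's analogy can be played out in a few lines of Python. This is a toy model of the traffic lights only, not of Fast14's clocking: the function name, the 10-tick light cycle, the 5-tick green phase, and the one-block-per-tick taxi are all assumptions made for illustration.

```python
def count_stops(blocks, period, green, offset):
    """Count how often a taxi halts while driving `blocks` blocks.

    The light at block i is green at tick t when (t - i * offset) % period < green.
    The taxi advances one block per tick whenever the light ahead is green.
    """
    position, tick, stops = 0, 0, 0
    moving = True
    while position < blocks:
        if (tick - position * offset) % period < green:
            position += 1
            moving = True
        else:
            if moving:
                stops += 1  # count each halt once, not every tick spent waiting
            moving = False
        tick += 1
    return stops


# offset=0: every light changes at the same moment (the "regular" core).
synchronized = count_stops(blocks=50, period=10, green=5, offset=0)
# offset=1: each light turns green one tick after the previous one (staggered).
staggered = count_stops(blocks=50, period=10, green=5, offset=1)
print(synchronized, staggered)
```

With these made-up numbers the synchronized route forces a halt every five blocks, while the staggered route lets the taxi ride a wave of green lights. Real circuits are messier, so treat the result as a picture of the idea rather than a measurement.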

Intrinsity’s engineers use these and other time-shaving tricks to ensure that the CPU stands idle as little as possible. “All it takes is one choke point, and that’s going to set your frequency,” says Intrinsity chip designer Brent Chambers.
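Chambers's point about choke points follows from a textbook rule of thumb: a clocked pipeline can tick no faster than its slowest stage allows. The sketch below assumes that simple model and ignores overheads such as clock skew and latch setup time. The stage delays are hypothetical numbers chosen for illustration, not Hummingbird figures.

```python
def max_clock_mhz(stage_delays_ps):
    """Return the highest clock rate in MHz that the slowest stage permits.

    A period of 1 picosecond corresponds to 1,000,000 MHz, so the limit
    is 1e6 divided by the longest stage delay in picoseconds.
    """
    return 1e6 / max(stage_delays_ps)


# Hypothetical stage delays in picoseconds; the second stage is the choke point.
baseline = [900, 1500, 1100]
# Speeding up a stage that is not the choke point changes nothing.
faster_elsewhere = [600, 1500, 1100]
# Speeding up the choke point raises the clock until the next-slowest stage limits it.
faster_choke_point = [900, 1000, 1100]

for delays in (baseline, faster_elsewhere, faster_choke_point):
    print(delays, round(max_clock_mhz(delays), 1))
```

In this simplified model, only the third case runs at a higher clock rate, and its new limit is set by the 1100-picosecond stage that has become the slowest.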

What’s crucial to Hummingbird’s appeal is that it represents a modification of the ARM Cortex-A8 rather than a top-to-toe redesign. This means that makers of devices that now run on regular A8s can drop in a Hummingbird without changing a thing. By substituting one chip, Russo says, the manufacturer can get either a device that runs faster or a device that draws less power at a slower speed.

“There’s no one in the market right now who can deliver this kind of [clock speed] with this kind of power range,” says Russo. “We’ve had a lot of interest from a lot of big companies to implement our technology in their designs.”

Tom Halfhill, a senior analyst at In-Stat’s Microprocessor Report, says that Hummingbird boasts some impressive performance enhancements. He says that Hummingbird’s specs, published in July, imply that the chip will consume 750 milliwatts at 1 GHz, leaking current in the “very low milliwatt range.” Compare that, for instance, to Intel’s Atom N270, a processor for embedded systems, which clocks in at a slightly faster 1.6 GHz but guzzles 2.5 watts. (Unfortunately, power usage comparisons to the unmodified A8 are not easy, Halfhill says, because the A8 runs on less power than a Hummingbird—385 mW—but only at 650 MHz. The proper comparison would be a Hummingbird dialed down to 650 MHz. But neither Intrinsity nor Samsung has released power specs for Hummingbirds running at such speeds.)
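One rough way to line up the figures Halfhill cites is to divide each chip's power by its clock speed. The short Python sketch below does that arithmetic. Milliwatts per megahertz is a crude yardstick introduced here for illustration, not a metric from Halfhill or Intrinsity, and because power does not scale in a straight line with clock speed it cannot stand in for the like-for-like comparison he says is missing.

```python
# Figures quoted in the article: (power in milliwatts, clock in megahertz).
CHIPS = {
    "Hummingbird": (750, 1000),
    "Atom N270": (2500, 1600),
    "Cortex-A8 (unmodified)": (385, 650),
}


def milliwatts_per_megahertz(power_mw, clock_mhz):
    """Crude normalization: power drawn per megahertz of clock speed."""
    return power_mw / clock_mhz


for name, (power_mw, clock_mhz) in CHIPS.items():
    ratio = milliwatts_per_megahertz(power_mw, clock_mhz)
    print(f"{name}: {ratio:.2f} mW/MHz")
```

By this measure the Hummingbird draws 0.75 mW per megahertz, about half the Atom's figure, while the unmodified A8 at its lower clock comes in somewhat below both.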

So how much does Hummingbird’s beefed-up performance cost? “The value of the A8 in the iPhone 3GS is probably about US $15 to $18,” says Will Strauss, a market analyst with Forward Concepts, in Tempe, Ariz. “The Hummingbird version will carry a premium price. It’s probably going to be more in the $18 to $25 range.”

The extra dollars add up fast, too: In 2008, 139 million smartphones were sold. If Hummingbird makes Intrinsity the leading hot-rodder of smartphone CPUs, the company will surely build more partnerships with powerhouse companies—and take more offstage bows—in the months to come.

Hummingbird faces a host of competitors, but Intrinsity has the advantage of having made a great career move. Not only does the company wring superior performance from a chip, it has chosen the right chip to wring it from. “The A8 is likely to be the dominant engine in smartphones” by the middle of 2010, Strauss says.

Probably Hummingbird’s closest competitor in 2010 will be Qualcomm’s top-to-toe redesign of the A8, dubbed Snapdragon, with two cores running at up to 1.5 GHz.

But raw power, says Microprocessor Report’s Halfhill, isn’t everything. Equally important is the ease with which Hummingbird can be integrated into preexisting smartphone designs, such as the Palm Pre and the iPhone 3GS, that already use the A8.

“If you have a whole new microarchitecture, like Snapdragon, then that could possibly change the whole rest of the chip design—all the peripherals attached to it, the coprocessors,” Halfhill says. “If you’ve got to redo the whole chip, then you’re adding maybe another year to the project.”

And if a company decides to add that extra year of design time, by the time it goes to market it could find itself facing other smartphones powered by ARM’s dual-core next-generation Cortex-A9—it’s rated at 2 GHz and expected as soon as mid-2010. (Remember, Intrinsity is now hard at work souping up that A9.)

Then there’s Intel, the world’s biggest chipmaker, which is trying to break into the smartphone market with its Atom processor. Intel faces an uphill climb, with 95 percent or more of the cellphones in the world running on ARM processors, Strauss estimates. Nor does Intel present a great threat, because its refusal to license its technology means that any smartphone design using the Atom line would have to begin at Intel itself. And Intel is no consumer products company.

ARM technology, however, has flowered precisely because so many third parties have had a chance to shape it as they saw fit. And fourth parties, too: Intrinsity didn’t need any license to tinker with ARM’s technology—it just had to license its own hot-rodding to Samsung.

The real fight, instead, looks to be between the Samsung/Intrinsity alliance and two key ARM processor makers: Texas Instruments and Qualcomm. The unmodified A8 powers TI’s new OMAP3 family of processors, one prominent user of which is the new Nokia N900 smartphone. Unlike Qualcomm, however, TI doesn’t hold what’s known as an architectural license from ARM, which would allow the company to do a full redesign the way Qualcomm is doing with Snapdragon. TI has a license only to sell ARM products or ARM “look-alikes,” cores whose cycle-by-cycle operation is the same as that of an ARM core. One such look-alike is Hummingbird. In fact, Halfhill speculates that Intrinsity could end up on both sides of Samsung’s battlefield. Holding its own Cortex-A8 license, TI could itself turn to Hummingbird to speed up its next-generation OMAP3s, Halfhill says.

With the smartphone industry constantly shifting and allies and enemies often changing sides, Halfhill says the only sure thing about Hummingbird is that it’s a well-designed core that should wind up in a number of smartphones, netbooks, and other portable devices in the coming year.

The marketplace is moving in Intrinsity’s direction, Strauss adds, with an increasing number of mobile devices running off a shrinking number of chips or even a single system-on-a-chip, or SoC. And once a device maker designs an SoC around a processor core, it will want to be able to upgrade its device with the least possible disruption to the SoC design. That could mean just dropping in an amped-up processor core like the Hummingbird, or its successors coming out of Intrinsity’s shop.

We judge Intrinsity’s modified A8 a winner because it improves the performance-to-wattage ratio, as the market for mobile devices demands, and because it does so in an open-chip architecture that’s on the ascendancy. In a year’s time, you might well be holding a phone with a Hummingbird inside.

This article originally appeared in print as “A More Cerebral Cortex.”

About the Author

Mark Anderson covered two winning technologies in this issue. He describes a hot-rodded smartphone chip with the power of a PC processor, designed by Intrinsity. Even during breaks, the brains of Intrinsity’s engineers were still running at full speed, Anderson noted. The engineers turned the lunchroom’s wall of windows into a dry-erase board, covering it in equations and circuit diagrams. In “Optical Lasers in a $100 Cable. Really,” Anderson reports that optical cable technology is overtaking copper, even in home electronics.