Physical Register Files to Save Power

The original x86 instruction set has a very limited number of registers (8). In order to maintain backwards compatibility with legacy x86 code, the ISA and associated registers were preserved. To scale performance with wide out-of-order architectures, however, larger register files were needed. The solution was register renaming: the hardware has additional registers not defined in the x86 spec and maps the architectural registers onto them on the fly.

Register renaming is done in all modern day x86 processors. There are two approaches to register renaming. The current Phenom II/Opteron approach actually carries the data from renamed registers along with the instruction as it moves through queues before it gets executed. You effectively create very wide instructions, which is horribly power inefficient (moving data on a chip takes a lot of power) although it gets the job done from a performance standpoint.

The alternative is something that we don’t see used in any current generation microprocessors. Instead of carrying data along with the instructions, you simply carry pointers to the data with those instructions. There’s added management complexity but you don’t have to worry about moving lots of data around, and therefore avoid much of the power penalty.
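To make the pointer-based approach concrete, here is a minimal sketch of renaming against a physical register file. It is a toy model, not AMD's implementation: the Renamer class, the 16-entry file size, the two-opcode instruction set and the free list that is never refilled (a real core reclaims registers at retirement) are all assumptions made for illustration. The point is only that a renamed instruction carries small indices into the register file rather than the operand values themselves.

```python
from dataclasses import dataclass

# The eight general purpose registers visible to x86 software.
ARCH_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


@dataclass
class RenamedOp:
    """An instruction after renaming: it holds PRF indices, not data."""
    opcode: str
    dest: int     # physical register that will receive the result
    srcs: tuple   # physical registers to read when the op executes


class Renamer:
    """Hypothetical toy renamer backed by a physical register file (PRF)."""

    def __init__(self, num_physical=16):
        self.prf = [0] * num_physical
        # Initially architectural register i lives in physical register i.
        self.rat = {name: i for i, name in enumerate(ARCH_REGS)}
        # Remaining physical registers are free for new results.
        # Simplification: entries are never returned to this list.
        self.free = list(range(len(ARCH_REGS), num_physical))

    def rename(self, opcode, dest, srcs):
        # Look up sources first so an op that reads and writes the
        # same architectural register sees the old mapping.
        phys_srcs = tuple(self.rat[s] for s in srcs)
        phys_dest = self.free.pop(0)
        self.rat[dest] = phys_dest
        return RenamedOp(opcode, phys_dest, phys_srcs)

    def execute(self, op):
        # Operand values are fetched from the PRF only at execute time.
        a, b = (self.prf[p] for p in op.srcs)
        self.prf[op.dest] = a + b if op.opcode == "add" else a - b

    def read(self, name):
        return self.prf[self.rat[name]]


r = Renamer()
r.prf[r.rat["eax"]] = 5
r.prf[r.rat["ebx"]] = 7
op1 = r.rename("add", "ecx", ("eax", "ebx"))  # ecx = eax + ebx
op2 = r.rename("sub", "eax", ("ecx", "ebx"))  # eax = ecx - ebx
r.execute(op1)
r.execute(op2)
print(op1, op2, r.read("ecx"), r.read("eax"))
```

While op1 and op2 sit in the queues they hold nothing but a few small integers. In the data-carrying approach described earlier, each queue entry would instead hold full-width copies of its source operands.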

Bobcat (as well as Bulldozer) uses physical register files to save power. Intel actually did this in the Pentium 4 but hasn’t used PRFs since. AMD argues that with power as a major driver of design, PRFs will be necessary in future architectures.

Bobcat’s Performance Expectations

With nearly the same pipeline depth as Atom (15 vs. 16 stages), nearly the same cache latencies, the same instruction issue width and presumably competitive clock speeds (~1.5GHz), Bobcat based microprocessors should inherently outperform Atom thanks to their out-of-order architecture.

Atom does hold an advantage in that each core is multithreaded, so heavily threaded apps may have an advantage on Intel’s architecture. That being said, by far the biggest issue we have with Atom based netbooks is their single threaded performance that contributes to an overall slow user experience. Bobcat should hopefully address that.

On the threaded side, AMD does have another solution. As I mentioned before, Bobcat won’t be used in a microprocessor by itself - Ontario will feature two of them. AMD said that future designs are expected to integrate 2 or 4 Bobcat cores. There are no plans to produce a single core version, but it’s always possible.

I believe a dual core Ontario based on Bobcat, if clocked high enough, could deliver a good enough balance of single and multithreaded performance to really challenge Atom in the netbook space. The assumption is that graphics performance will be much better than Atom with Ontario integrating an AMD GPU.

AMD’s official line is that Ontario will be able to deliver 90% of the performance of a mainstream notebook in less than half the die area. AMD isn’t just looking to compete with Atom, but to go after even the CULV market with Ontario. Only time will tell if the latter is overzealous.

Power Concerns

AMD calls Bobcat sub-1W capable, which seems to imply that short of a smartphone Bobcat could go anywhere Atom could go. Technically, if AMD wanted to, even getting one into a smartphone wouldn’t be impossible - it would just require a healthy investment in chipsets.

It remains to be seen how good TSMC’s 40nm process will be compared to Atom’s Intel-manufactured 45nm transistors in terms of power consumption. Presumably the out of order aspect of the design will guarantee higher power consumption than Atom, but for the netbook/CULV notebook market the added performance may be worth the added power consumption.

76 Comments

Comments like this really bother me. You may not care about netbooks, but a lot of people do. Current ones don't pass the grandma test (can your grandmother do whatever task she needs to on them, like check e-mail, browse the internet, or watch HD video?), and any advance here is welcome.

Generally speaking a netbook is not supposed to be your main machine, but something you can chuck into your bag and take with you and do a little work on here and there. I write a lot, and have to work on other peoples' computers from time to time, so a netbook that doesn't completely suck is invaluable to me. Netbook performance is dismal right now, but Bobcat could successfully fix this market segment.

So no, you're not interested in netbooks and you'd rather be raked through hot coals than purchase one. But that just means they're not useful - TO YOU. There are a lot of people here interested in what Bobcat can do for these portables, and I count myself among them.

Seriously Anand, it is crummy that I cannot find a whole section of your website. I hate to spam an entirely separate article, but how completely lame it is to have to spend 15 minutes doing a Google advanced search to find the Anandtech article I'm looking for.

One of the very, very few truly Class A+ hardware sites on the internet - you can count all the members of that class on one hand - and you make it seriously hard to find past articles and you completely OMIT a link to an entire category of your reviews. Insane.

In fact, most common tags can be put there (i.e. /AMD, /Intel, /NVIDIA, /HP, /ASUS, etc.) The only catch is that many of the tags will only bring up articles since the site redesign, so you'll want to stick with the older main topics for some areas. Hope that helps.

Bulldozer sounds like amazing stuff. I wonder whether the way they have arranged int units into modules means that we will be getting more cores for our dollars, compared to Intel. More REAL cores, I mean. I'm just a little disappointed that the int pipelines went from 3 ALUs to 2 ALUs. I hope that doesn't affect performance too much.

Integer instruction pipelines are increased from 3 to 4. That's 33% more peak throughput. The number of ALUs/AGUs to keep these pipelines busy is meaningless without knowing details. K10 has 3 ALUs and 3 AGUs, but they are bottlenecked and partially idling most of the time. Bulldozer can do more operations per cycle while drawing less power, even with only 2 ALUs and 2 AGUs. How can that be disappointing?

I think Bulldozer has the potential to be really competitive, mainly because Sandy Bridge looks quite unimpressive. In a recent leaked powerpoint from Intel, apparently until Q3 2011 the best Intel CPU is still going to be Gulftown based, possibly Core i7 990X. According to Intel benchmarks on the leaked powerpoint, the best Sandy Bridge, that is, Core i7 2600, apparently will be around 15% to 25% better than the i7 870, with the i7 980X being 25% to 35% better than the i7 2600.

I have a question: it was earlier speculated that BD would have four ALU pipelines per integer core. It was thought that one way they could make use of them was to send both sides of a branch down two pipes and take the correct result. Obviously this isn't the case, but my question is, why not? Wouldn't it be better to do that and just discard the branch predictors entirely? Why isn't that better?