Physical Register Files to Save Power

The original x86 instruction set has a very limited number of registers (8). In order to maintain backwards compatibility with legacy x86 code, the ISA and associated registers were preserved. To scale performance with wide out of order architectures however, we needed larger register files. The solution was to enable register renaming, where the hardware could have additional registers not defined in the x86 spec and rename them on the fly.

Register renaming is done in all modern day x86 processors. There are two approaches to register renaming. The current Phenom II/Opteron approach actually carries the data from renamed registers along with the instruction as it moves through queues before it gets executed. You effectively create very wide instructions, which is horribly power inefficient (moving data on a chip takes a lot of power) although it gets the job done from a performance standpoint.

The alternative is something that we don’t see used in any current generation microprocessors. Instead of carrying data along with the instructions, you simply carry pointers to the data with those instructions. There’s added management complexity but you don’t have to worry about moving lots of data around, and therefore avoid much of the power penalty.

Bobcat (as well as Bulldozer) uses physical register files to save power. Intel actually did this in the Pentium 4 but hasn’t used PRFs since. AMD argues that with power as a major driver of design, PRFs will be necessary in future architectures.

Bobcat’s Performance Expectations

With nearly the same pipeline depth as Atom (15 vs. 16 stages), nearly the same cache latencies, the same instruction issue width and presumably competitive clock speeds (~1.5GHz), Bobcat based microprocessors should inherently outperform Atom thanks to its out of order architecture.

Atom does hold an advantage in that each core is multithreaded, so heavily threaded apps may have an advantage on Intel’s architecture. That being said, by far the biggest issue we have with Atom based netbooks is their single threaded performance that contributes to an overall slow user experience. Bobcat should hopefully address that.

On the threaded side, AMD does have another solution. As I mentioned before, Bobcat won’t be used in a microprocessor by itself - Ontario will feature two of them. AMD said that future designs are expected to integrate 2 or 4 Bobcat cores, while there are no plans to produce a single core version it’s always possible.

I believe a dual core Ontario based on Bobcat, if clocked high enough, could deliver a good enough balance of single and multithreaded performance to really challenge Atom in the netbook space. The assumption is that graphics performance will be much better than Atom with Ontario integrating an AMD GPU.

AMD’s official line is that Ontario will be able to deliver 90% of the performance of a mainstream notebook in less than half the die area. AMD isn’t just looking to compete with Atom, but go after even the CULV market with Ontario. Only time will tell if the latter is over zealous.

Power Concerns

AMD calls Bobcat sub-1W capable, which seems to imply that short of a smartphone Bobcat could go anywhere Atom could go. Technically, if AMD wanted to, even getting one into a smartphone wouldn’t be impossible - it would just require a healthy investment in chipsets.

It remains to be seen how good TSMC’s 40nm process will be compared to Atom’s Intel-manufactured 45nm transistors in terms of power consumption. Presumably the out of order aspect of the design will guarantee higher power consumption than Atom, but for the netbook/CULV notebook market the added performance may be worth the added power consumption.

Post Your Comment

76 Comments

If you're encoding using Adobe software, ditch AMD until Bulldozer. Adobe's software makes heavy use of SSE 4.1 instructions, which current AMD chips lack, and the extra two cores don't pick up the slack compared to a fast i7.Reply

I think he's picking up on the point that this general purpose design is going to favour integer operations over floating point. Looking at this architecture from the perspective of someone wanting to perform a lot of floating point matrix calculus then the performance improvement of each "core" is going to be proportionally less than for integer calcs.

So what he's saying is that quite clearly AMD believe that general purpose CPUs are just that and have designed for a well defined balance of FP and Interger operations i.e. If you want more FLOPS go talk to the GPU? Reply

"And if Bulldozer comes any later, it will be up against the die shrink of SandyBridge, Ivy Bridge. Things dont look so good in here."

Basically, you've contradicted yourself right here:

"Most of us dont need SUPER FAST computer."

True, and true.... Ivy will probably be faster than Bulldozer (speculatively) as is Nehalem to Stars, but most people, i.e. the "cash cow" won't buy these expensive products. Instead they will focus on mid to low end computers which by their performance is more then/or enough for their needs.

So things might not look good in reviews and bench tops, but in the stores and on people's bank balances they will look pretty good.Reply

"All the real world is interested in is the best CPU for the buck in a $400 PC box to run W7 and Office on. AMD needs to get a proper marketing dept to start telling folks that."

"The real world lost interest in CPU performance the minute dual cores arrived and they could finally run IE/Office and a couple of mainframe sessions without it grinding to a halt."

Apparently us Engineers aren't part of "The rest of the world".Try running products from the likes of Mentor Graphics, Cadence, and Synopsis for reasonably large designs. Check out what a difference each new CPU makes in PROe (assuming sufficient GPU horsepower). Run some large Matlab simulations, Visual studio compilations, and Xilinx builds. You don't even have to get out of college before you run into many of these scenarios.

Trust me when I say that we care about the next greatest thing.An extra $1000 dollars on a CPU is easily justified when companies are billing $100+ per Engineering hour (not to be confused with take home pay). Reply

Exactly so: An example would be a 24hr calculation to perform a detailed 3D finite element analysis. This is not unusual using highly spec'd Xeon work stations from your vendor of choice.

It might take 5 to 10 days to set up a model including testing of different aspects: Mesh density, discretisation errors, boundary effects, parametric studies. The set up time with numerous supporting pre-analysis runs is what really costs. Anything we can do to reduce this is worth while.

The above would be the typical process BEFORE considering a batch-job on a HPC cluster if we wanted to look at a series of load cases etc.

I know a number of movie studios who love every extra bit ofCPU muscle they can get their hands on. Rendering reallyhammers current hardware. One place has more than 7000XEON cores, but it's never enough. Short of writing specialisedsw to exploit shared-memory machines that use i7 XEONs (whichhas its own costs), the demand for ever higher processingspeed will always persist. Visual effects complexity constantlyincreases as artists push the boundaries of what is possible.And this is just one example market segment. As BitJunkiesuggests, these issues surface everywhere.

Another good example: the new Cosmos machine in the UKwhich contains 128 x 6-core i7 XEON (Nehalem EX) with2TB RAM (ie. 768 cores total). This is a _single system_,not a cluster (SGI Altix UV). Nothing less is good enough forrunning modern cosmological simulations. There will bemuch effort by those using the system on achieving goodefficiency with 512+ cores; atm many HPC tasks don't scalewell beyond 32 to 64 cores. Point being, improving theperformance of a single core is just as important as generalcore scaling for such complex tasks. SGI's goal is to producea next-gen UV system which will scale to 262144 cores ina single shared-memory system (32768 x 8-core CPUs).

You're 1% of the market... for you, Intel and AMD have reserved cherry-picked chips that they can charge you 1K for but at the same time offer you that needed speed. How's that?

BTW, he said real world, not rest of the world. That makes you somewhat of an illusion. But don't take it the bad way... more like most of us would dream working in an environment full of hot setups, big projects and big bux, unlike in the real world where you have to mop the floor after debugging for 8 hours straight... if they don't force you to work extra two hours without pay, never-mind that before you start the workday you have to go to various bureaucratic public clerk offices to deal with stuff that was supposed to be taken care by secretaries... which got fired for no apparent reason some time ago.