A quick look back at Banias

The core technologies of the Pentium M remain unchanged in Dothan. We've already explained them in great detail but here's a quick recap for those of you who haven't read or don't remember the original article.

The Pentium M is characterized by the following 7 design features and principles:

Mid-Length pipeline

The Pentium M has a pipeline that's shorter than that of the Pentium 4 (much shorter than that of Prescott), but longer than that of the Pentium III. Intel needed a longer pipeline to ensure that higher clock speeds would be possible, but shunned the Pentium 4's extremely long pipeline as it is quite a power hog. Although extremely high clock speeds can be wonderful for performance and marketing, they are a nightmare when it comes to power consumption. The longer your pipeline, the harder you have to work to keep that pipeline filled at all times and the bigger the penalty that you pay if the pipeline is ever left idle or has to be flushed (thanks to a mispredicted branch, for example).
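To put the flush penalty in rough numbers, here is a minimal sketch of the textbook cycles-per-instruction model. Everything in it is an assumption made for illustration: the function name, the 20% branch frequency, the 5% misprediction rate, and the pipeline depths are not Intel figures, and the penalty is simply taken to equal the pipeline depth.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are counted."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 1 in 5 instructions is a branch, 5% of branches mispredict.
for depth in (10, 20, 30):
    # Assumption: a flush costs roughly one cycle per pipeline stage.
    cpi = effective_cpi(1.0, 0.20, 0.05, depth)
    print(f"{depth}-stage pipeline: CPI ~ {cpi:.2f}")
```

With the same workload and predictor, the deeper pipeline pays more cycles per instruction for every flush, which is the trade-off described above.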

To this day, Intel has not disclosed the number of stages in the Pentium M pipeline, out of an extreme desire to protect the processor's underlying architecture. The only thing we know is that Dothan's pipeline remains unchanged from Banias, a very good thing considering the surprise we all got with Prescott.

Much of Banias (and also Dothan) remains unpatented and protected using trade secret law in order to prevent the underlying ideas behind the CPUs' design from being picked up by competitors.

Micro Ops Fusion

The Pentium M, like all of Intel's modern day microprocessors, decodes regular x86 instructions into smaller micro-ops that are the actual operations sent down the pipeline for execution. Micro Ops Fusion takes certain micro-ops and "fuses" them together so that they are sent down the pipeline together and are either executed in parallel or serially without being reordered (or separated from one another). Micro Ops Fusion can only apply to certain types of instructions, which Intel has not officially disclosed.

The benefits of Micro Ops Fusion are twofold: first, you have the obvious performance improvements, and alongside them, you also have reduced power consumption, thanks to not wasting any cycles waiting for dependent micro-ops to retire before working on others.

Dedicated Stack Manager

Banias' dedicated stack manager is another power-saving tool, designed to manage stack pointers and other stack-related data. Remember that stacks are used to store information about the current state of the CPU, including data that cannot be kept in registers due to limits in the number of available registers; thus, a dedicated manager can help performance considerably. As usual, whenever efficiency is improved, power consumption is reduced, which is the case with Banias here as well.

High Performance Branch Predictor

Banias' branch predictor reduced mispredicted branches by around 20% when compared to the Pentium III (measured on SPEC CPU 2000 tests, though the improvements carry over to real-world code). The improvements are thanks to a larger branch history table (for storing data used to predict branches) and better handling of branching in loops, the latter of which is improved in Dothan.
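To see why a larger table helps, consider a minimal sketch of a generic 2-bit saturating-counter predictor. This is the textbook scheme, not Intel's undisclosed design; the class name, table sizes, branch addresses, and trace are all hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, table_entries):
        self.table_entries = table_entries
        self.counters = [2] * table_entries  # every entry starts at "weakly taken"

    def predict(self, branch_address):
        # Counter values 2 and 3 predict taken, 0 and 1 predict not taken.
        return self.counters[branch_address % self.table_entries] >= 2

    def update(self, branch_address, taken):
        index = branch_address % self.table_entries
        if taken:
            self.counters[index] = min(3, self.counters[index] + 1)
        else:
            self.counters[index] = max(0, self.counters[index] - 1)


def count_mispredictions(predictor, trace):
    """Run a list of (address, taken) pairs through the predictor and count misses."""
    misses = 0
    for address, taken in trace:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


# Two hypothetical branches: one always taken, one never taken, interleaved.
trace = [(0x40, True), (0x44, False)] * 100

small = count_mispredictions(TwoBitPredictor(4), trace)   # both branches share entry 0
large = count_mispredictions(TwoBitPredictor(16), trace)  # each branch gets its own entry
print(small, large)
```

In the 4-entry table the two branches collide on the same counter and the never-taken branch is mispredicted every time (100 misses); in the 16-entry table each branch trains its own counter and only the first encounter misses (1 miss).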

Pentium 4 FSB, Pentium III Execution Units

The execution back end of Banias is identical to that of the Pentium III, making the Pentium M a relatively narrow microprocessor when compared to AMD's Athlon 64 and Intel's Pentium 4. Given the low power target for Banias, this decision makes a lot of sense as it reduces power consumption and die size; but keep in mind that the lack of extreme width in the pipeline means that technologies like Hyper-Threading will be kept away from the Pentium M. Instead, we can look forward to multi-core Pentium M designs, which are made somewhat easier to implement thanks to a relatively small die.

In order to keep the processor fed, however, Intel implemented the Pentium 4's 64-bit quad-pumped front side bus. Currently, the FSB clock on all Banias (and Dothan) parts is 100MHz quad-pumped (effectively, 400MHz for 3.2GB/s of bandwidth), but by the end of this year, it will move to 133MHz (effectively 533MHz).
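The bandwidth figure follows directly from the bus parameters. The short sketch below redoes the arithmetic; the function name is ours, and 133MHz is treated as exactly 133 (the real clock is nominally 133.33MHz, so the actual figure is slightly higher).

```python
def fsb_bandwidth_gb_per_s(base_clock_mhz, transfers_per_clock=4, bus_width_bits=64):
    """Peak front side bus bandwidth in GB/s (1 GB = 10^9 bytes)."""
    transfers_per_second = base_clock_mhz * 1_000_000 * transfers_per_clock
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_second * bytes_per_transfer / 1_000_000_000


for clock_mhz in (100, 133):
    effective_mhz = clock_mhz * 4  # quad-pumped: four transfers per clock
    print(f"{clock_mhz}MHz x 4 = {effective_mhz}MHz effective, "
          f"{fsb_bandwidth_gb_per_s(clock_mhz):.2f}GB/s")
```

The 100MHz bus works out to the 3.2GB/s quoted above; the move to 133MHz adds roughly a third more peak bandwidth.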

Power Saving Cache

Banias (and Dothan) implement an 8-way set associative L2 cache, which is not uncommon amongst modern day microprocessors. A set associative cache increases hit rate (the likelihood that something you want will be found in cache) at the expense of increased cache latency. Latency increases because once the set that may hold the data has been located, the cache must still determine which "way" the data is in and select it - an incorrect determination will further increase cache latency.

In order to optimize the 8-way set associative cache for low power consumption, each "way" is further divided into quadrants. Once a "way" is selected, the L2 controller determines in which quadrant the needed data resides and activates only that part of the cache. With such a large cache, it is important to save power here as much as possible.
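As a rough illustration of the lookup, the sketch below splits an address into tag, set index, and line offset for an 8-way cache, then picks a quadrant. The 1MB capacity and 64-byte line size are parameters assumed for the example, and the rule used to choose the quadrant (the upper bits of the set index) is purely hypothetical, since Intel has not described how the controller makes that decision.

```python
CACHE_BYTES = 1024 * 1024  # assumed capacity for this example
LINE_BYTES = 64            # assumed cache line size
WAYS = 8                   # 8-way set associative, per the article
QUADRANTS = 4              # each way is split into quadrants, per the article

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # bits that select a byte within a line
INDEX_BITS = SETS.bit_length() - 1         # bits that select a set


def decompose(address):
    """Split an address into (tag, set_index, quadrant, offset)."""
    offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    # Hypothetical rule: each way's sets are split into four equal blocks,
    # so the quadrant is given by the top two bits of the set index.
    quadrant = set_index // (SETS // QUADRANTS)
    return tag, set_index, quadrant, offset


tag, set_index, quadrant, offset = decompose((5 << 17) | (1500 << 6) | 17)
print(tag, set_index, quadrant, offset)
```

Under these assumptions, a hit in one way only needs the single quadrant holding the set to be powered, rather than the whole way.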

Artificially Limited Clock Speed Design

Generally speaking, when you design a microprocessor, you want it to run as fast as possible. Normally, there's an initial idea of target clock speed and once the chip is actually back from the plant, it's not uncommon to find parts of the chip that run slower than your clock target, while others run faster (sometimes much faster). In desktop microprocessor design, the goal is to speed up the slowest parts of the chip (or critical paths as they are known among chip designers) and tweak the chip and the manufacturing process to run as fast as the fastest parts.

With Banias, Intel took a different approach. The design team set a clock speed target, and if any part of the chip exceeded that clock speed target, then that part of the chip had to be slowed down. The idea was that if a chip can run faster than its target, then you're wasting power - a luxury that isn't present in mobile chip design. The upside to this design methodology is that power consumption is further reduced, and when coupled with the other power-saving advancements that we've talked about, we're dealing with a fairly low power chip. The downside is that each generation of the Pentium M has a very well defined clock speed wall, and the only way over that wall is to use a smaller, cooler and faster manufacturing process. This is why you will see Pentium M ramp much slower in clock speed than any other Intel chip and why you will see clock speed bumps coincide with new manufacturing processes. It also means that if Intel ever has yield problems with a new manufacturing process (which isn't uncommon), the Pentium M will suffer. It's a risky move, but it's the type of move that is necessary to truly build a good mobile CPU.

28 Comments

I'd like to see a P4 3.2-3.4 w/ dual channel DDR3200 used instead of single channel DDR2700 that was used for the P4 testbed - as a rough idea where a Dell 9100/XPS, for example, would fall in line with the Dothan.

Something else I noticed - where's the Banias 1.7 and the Dothan 735? It would make comparing Banias to Dothan so much easier to do that...

Also, what motherboards did you use on the desktop chips?

#25: Take a look at http://cpu-museum.de/forum/viewtopic.php?t=1089. PowerLeap actually DID have something in the works, and was going to release in Quarter 1, but they cancelled it because of (well, at least this is what they said - I wouldn't be surprised if it was because Intel held them at gunpoint) LGA775 coming out (which can't really work with an adaptor) and the fact that the P-M wasn't running at 2.0GHz (I told them about Dothan, though).

Intel's "dominance" in encoding? Those days are history, my friends. If they ever existed.

The AnandTech article published just before this one proves it. Here's a quote:

"It was difficult to resist being a little sensationalist in this 939 roundup and titling the review, "Who needs 925X?" That would have been a fair title, however, since you can clearly see that all of the Socket 939/FX53 boards completely outperform Intel's top 560 on the top 925X motherboard. Even Media Encoding, the last bastion of Intel dominance, has fallen in benchmarks with our new AutoGK benchmark. "

As a matter of fact it's always been a misnomer if you look at other suites/Codecs such as:

Since the Pentium M shares its bus protocol with the Pentium 4, I think companies like Powerleap should be more than able to provide an adaptor for desktop use.

This would likely result in an "Intel unknown" detection, along with a lack of power save options, but previous experience with the "Tualatin" P3 (1400) showed that this could work and even offer full performance compared to full recognition.

Also, I think the limited memory bandwidth did not hold the Athlon 64 back, because I still remember seeing the Athlon FX (dual channel) and 64 (single channel) perform almost equally at the same clock speed. However, the Pentium 4 would benefit from more bandwidth, as happened before on every FSB bump it got. So I think adding faster memory would only make a noticeable difference for the Pentium 4.

I didn't say it would make up *all* the performance, but enough to make it competitive. For example: the P4 is essentially faster than A64 in those tasks but the A64 is still sufficiently close to be a viable alternative. Dothan doesn't have to beat the P4 in media encoding or content creation, just come close enough to be a good desktop CPU. It should be possible by increasing the FSB.

One comment I forgot to mention: even with the talk of thermal design targets and clock speed limitations, I imagine that a Dothan CPU in a desktop motherboard with a large copper heatsink would have quite a lot of overclocking headroom. I know there was a French site that claimed to have overclocked a Dothan to 2.4 GHz, and the performance was quite impressive (if true). I would really like to see overclocking results for Dothan (and Banias) on a desktop system. With processors designed to generate 1/4 the amount of heat of a P4 3.2+, it could be interesting. Here's hoping some motherboard manufacturers will accommodate us! :)

Overall, I thought this was a great article. Those complaining about various configuration issues need to stop whining. Never once did AT actually give out any numbers for battery life or low-power performance, or claim that the Dothan was beating a desktop Athlon 64/P4. All we're looking at is what the various CPUs can do in typical *laptop* configurations. Getting a P4 and A64 laptop would make them into real laptops, but then we would have the laptop manufacturer's configuration, and likely it wouldn't be the same as what they had for the Dothan system. What we've got is three platforms running the same RAM, hard drive, and graphics card. Yes, it's limiting what some of the CPUs could do, but it's about as fair as you can get.

Of course, if history is any indication, I'm fairly confident that Dothan is going to dominate other mobile architectures in battery life - when using the same screen, hard drive, and battery. Too bad it's so difficult to actually meet those criteria. Screens differ quite a bit, and many P4 and A64 laptops are shipping with 90+ Whr batteries, while Banias and Dothan laptops often get by with 60 to 70 Whr. Anyway, the Banias was generally the best laptop CPU before (i.e. most efficient while still providing good performance), so why shouldn't the Dothan be similar if not better? The results are hardly surprising.

I really hope that we one day see some desktop boards designed for Dothan, though. I imagine that getting the FSB up to 200 MHz quad-pumped should be possible, although even 133 or 166 would be helpful. Combine a Dothan CPU with all the other desktop accoutrements, and it would likely be a formidable gaming platform. Of course, it would really only be about the same as Athlon 64, and Intel is currently milking the Dothan/Banias line for all they can. $600+ for a processor that probably costs Intel less money to create than their P4 chips.

Again, great article, Anand. (And for the interested, Jon "Hannibal" Stokes over at Ars Technica put together as much information as I've seen anywhere about how the internals of the Banias/Dothan function. Not much detail in comparison to other CPU comparison articles he's written, but there is some additional information about what ops can benefit from the fusion technique, IIRC.)

I would also like to see this review updated with an Athlon 64 with 1 meg L2 cache. It seems a shame to compare an expensive 2 meg L2 cache Dothan with the cheaper 512KB L2 cache Athlon 64s. In addition, by keeping Athlon 64s to the slower/single channel RAM, you are making the L2 cache more important than normal, and hindering the Athlon 64.