The 5 Things that Comprise Dothan

There are five basic parts of Dothan that differentiate it from Banias. Unfortunately (just as was the case with Banias), Intel is not very forthcoming with details about Dothan, out of a desire to guard its intellectual property. Even a year after its release, we have yet to see any serious competition for the Pentium M, and Intel wants it to stay that way for as long as possible.

That being said, we will try to be as specific as possible about the details of Dothan, starting with the most obvious change: its 90nm process.

90nm process and 2MB L2

Banias was built on Intel's 0.13-micron manufacturing process when that process was at its peak. Because the process was tried and true, Banias faced no manufacturing delays and could hit its target clock speeds without a problem.

Dothan gets its most noticeable improvements over Banias from the move to Intel's smaller 90nm manufacturing process. This is the same process used to manufacture Prescott, which means a couple of things. For starters, it explains why Dothan availability has been limited since launch: 90nm production is still ramping. Availability aside, 90nm lets Dothan pack almost twice as many transistors onto the chip without increasing the overall die size compared to Banias.

Dothan is now a 140 million transistor chip (up from 77 million in Banias), and those 140 million transistors occupy almost the same die area as Banias: 84 mm², versus roughly 83 mm² for Banias. Almost twice the transistors with no real increase in die size? It's a chip manufacturer's dream. Because the die size is essentially unchanged, yields should not differ between Banias and Dothan (once Intel's 90nm process has truly matured), and Dothan shouldn't cost Intel any more to produce than Banias did.
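As a quick sanity check on "almost twice the transistors in the same area", the density arithmetic can be worked out from the figures above. The sketch takes Banias' die as 83 mm² (the text only says it is about 1 mm² smaller than Dothan's 84 mm²), and the transistor counts are rounded, so treat the results as approximate.

```python
# Figures from the text; Banias' 83 mm^2 is inferred from "about 1 mm^2 smaller".
banias = {"transistors_m": 77, "die_mm2": 83}
dothan = {"transistors_m": 140, "die_mm2": 84}


def density(chip: dict) -> float:
    """Millions of transistors per square millimetre of die."""
    return chip["transistors_m"] / chip["die_mm2"]


ratio = density(dothan) / density(banias)

print(f"Banias: {density(banias):.2f} M/mm^2")  # 0.93
print(f"Dothan: {density(dothan):.2f} M/mm^2")  # 1.67
print(f"Density ratio: {ratio:.2f}x")           # 1.80
```

In other words, under these assumptions 90nm lets Dothan fit roughly 1.8 times as many transistors into each square millimetre.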

Most of the increase in transistor count comes from Dothan's 2MB L2 cache, twice the size of Banias' 1MB cache. The 64KB L1 cache is unchanged from Banias.
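To see why the L2 accounts for most of the new transistors, here is a rough back-of-the-envelope estimate. It assumes a classic six-transistor (6T) SRAM cell per bit and counts only the data array, ignoring tags, ECC, decoders and sense amplifiers. Intel has not published Dothan's actual cell design, so this is an illustration, not a breakdown of the real die.

```python
BITS_PER_BYTE = 8
TRANSISTORS_PER_CELL = 6  # assumption: classic 6T SRAM cell, one cell per bit
MB = 1024 * 1024


def data_array_transistors(cache_bytes: int) -> int:
    """Transistors in the data array only (no tags, ECC, decoders or sense amps)."""
    return cache_bytes * BITS_PER_BYTE * TRANSISTORS_PER_CELL


banias_l2 = data_array_transistors(1 * MB)
dothan_l2 = data_array_transistors(2 * MB)
extra_l2 = dothan_l2 - banias_l2

# Rounded totals from the text: 140M (Dothan) minus 77M (Banias).
total_increase = 140_000_000 - 77_000_000

print(f"Extra L2 data-array transistors: {extra_l2 / 1e6:.1f}M")       # 50.3M
print(f"Share of the 63M increase: {extra_l2 / total_increase:.0%}")  # 80%
```

Even this crude model attributes about 80% of the 63 million new transistors to the doubled L2.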

We believe that Intel is using the same 90nm SRAM cells in Dothan as in Prescott. If so, Dothan's small 84 mm² die is further enabled by the significantly smaller SRAM cells that Intel developed for 90nm. However, it is not clear how much the SRAM cell designs of Banias and Dothan differ from those of the desktop chips, given their unique power requirements.

Along with the larger L2 cache, Intel has made Dothan prefetch data into its cache more aggressively, to take advantage of the extra on-die L2. This is common practice when cache size increases while the rest of the architecture stays the same, and it helps turn the added capacity into performance.
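Intel hasn't described Dothan's prefetch logic, so the following is only a generic illustration of what prefetching does: a toy LRU cache with a hypothetical next-line prefetcher, where `degree` (an invented knob) sets how many following lines are fetched on each access. On a sequential scan, prefetching turns nearly every miss into a hit.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Toy fully associative LRU cache with a hypothetical next-line prefetcher.

    `degree` is how many following lines are prefetched on every access
    (0 disables prefetching). This is not Dothan's actual mechanism.
    """

    def __init__(self, capacity_lines: int, degree: int) -> None:
        self.capacity = capacity_lines
        self.degree = degree
        self.lines = OrderedDict()  # line address -> None, oldest first
        self.hits = 0
        self.misses = 0

    def _install(self, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)  # refresh LRU position
            return
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line

    def access(self, line: int) -> None:
        if line in self.lines:
            self.hits += 1
        else:
            self.misses += 1
        self._install(line)
        for ahead in range(1, self.degree + 1):
            self._install(line + ahead)  # prefetch the next `degree` lines


def scan(degree: int, n_lines: int = 100, capacity: int = 16) -> tuple:
    """Read n_lines consecutive cache lines and return (hits, misses)."""
    cache = PrefetchingCache(capacity, degree)
    for line in range(n_lines):
        cache.access(line)
    return cache.hits, cache.misses


print(scan(0))  # (0, 100): every line of the scan misses
print(scan(1))  # (99, 1): only the first line misses
```

The catch, not modeled here, is that every prefetched line occupies cache space and can evict useful data, which is why a bigger L2 makes more aggressive prefetching easier to afford.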

The 90nm process will also allow Dothan to scale up in clock speed, thanks in part to Intel's strained silicon technology. We are already seeing the fruits of that today in its introductory 2GHz clock speed (up from Banias' 1.6GHz launch speed), and Dothan is expected to move beyond 2GHz by the end of 2004. Remember that Intel's design philosophy with Dothan, just like Banias, is to design the chip for a specific power consumption target and to leave clock speed scaling mostly to the manufacturing process.

In the end, Dothan's 90nm manufacturing process gives it higher clock speeds and a larger L2 cache, which are its most tangible advantages over Banias. It's also important to keep in mind that these are the only major changes; the rest are smaller tweaks. Even Intel's Dothan design team views Prescott as a somewhat risky move, because it combines significant architectural changes with a brand new manufacturing process. It's no surprise, then, that outside of the move to 90nm Dothan remains largely unchanged architecturally: unlike Prescott's, its pipeline has not been changed at all, and both the pipeline and L1 cache are identical to Banias'.

Micro Ops Fusion

Intel has been deliberately vague about Banias' micro ops fusion, and it remains just as vague about the changes to the fusion engine in Dothan. All we are allowed to publish is that Dothan can fuse more types of micro ops. That isn't a bad thing; it would just be nice to know which ones, and what enables Dothan to fuse them.
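For readers unfamiliar with the idea, micro ops fusion lets two micro ops decoded from a single x86 instruction travel through much of the pipeline as one tracked entry, while still executing as two operations. The sketch below only counts tracked entries for a made-up three-instruction sequence. The fusible pairs it uses (a load with its dependent ALU op, and a store's address and data micro ops) are the ones commonly cited for the Pentium M; the micro op names are hypothetical, and since Intel hasn't said which additional pairs Dothan fuses, "fusing more types" here would simply mean a larger `FUSIBLE` set.

```python
# Hypothetical micro op breakdown for a made-up instruction sequence.
DECODED = [
    ("add eax, [esi]", ["load", "alu"]),              # memory operand + add
    ("mov [edi], eax", ["store_addr", "store_data"]),  # a store
    ("inc ecx", ["alu"]),                              # already a single micro op
]

# Pairs this sketch treats as fusible (assumption; see the lead-in).
FUSIBLE = {("load", "alu"), ("store_addr", "store_data")}


def tracked_uops(decoded, fusible) -> int:
    """Count the entries the out-of-order core must track.

    A fused pair occupies one entry even though it still executes
    as two operations on separate execution units.
    """
    entries = 0
    for _text, uops in decoded:
        if len(uops) == 2 and tuple(uops) in fusible:
            entries += 1
        else:
            entries += len(uops)
    return entries


print(tracked_uops(DECODED, set()))    # 5 entries without fusion
print(tracked_uops(DECODED, FUSIBLE))  # 3 entries with fusion
```

Fewer tracked entries means less bookkeeping work per instruction, which is where fusion saves both power and decode/retire bandwidth.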

Local Branch Prediction Improvements

With Dothan, Intel has improved branch prediction in order to reduce power consumption and increase performance. Remember that the fewer branch mispredicts you have, the less power is wasted refilling the pipeline after a flush.

One of the biggest improvements to Dothan's branch predictors is in its loop detector. Although most people don't think of a loop as a branch, every loop begins or ends with some sort of comparison that determines whether the loop should continue to execute (e.g. if i < 10, keep looping). Loops are normally handled by a static branch predictor that always predicts taken once a loop is detected, so usually the only mispredicts are at the end of the loop. This works fine for larger loops (100+ iterations), but not so well for extremely small loops (e.g. 5 iterations). What ends up happening is that on the 5th, 6th and 7th time around, the predictor mispredicts a taken branch when the loop has actually finished. Mispredicting 3 times for a loop that only runs for 5 iterations does not help branch prediction accuracy, so we have a problem on our hands.

Dothan uses a more sophisticated algorithm to detect and predict branches in small loops. Once again, Intel was purposely vague about exactly what Dothan does that Banias did not; just know that Dothan has better overall branch prediction, thanks to changes like improved detection of short loops.
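Since Intel hasn't disclosed Dothan's loop detector, this is a generic sketch of the idea rather than Intel's design: a hypothetical predictor that remembers how many times a loop's branch ran last time and predicts the exit, compared against an always-taken predictor. The model is simplified and charges the always-taken predictor only one mispredict per loop exit, yet short loops still suffer the most.

```python
class AlwaysTaken:
    """Static predictor: always predicts the loop-back branch as taken."""

    def predict(self) -> bool:
        return True

    def update(self, taken: bool) -> None:
        pass  # static: learns nothing


class LoopPredictor:
    """Hypothetical loop predictor that learns a loop's trip count.

    After one full run it remembers how many times the loop-back branch
    was evaluated and predicts "not taken" on the last evaluation.
    """

    def __init__(self) -> None:
        self.learned_trip = None  # trip count seen on the previous run, if any
        self.count = 0            # branch evaluations so far in the current run

    def predict(self) -> bool:
        if self.learned_trip is None:
            return True  # nothing learned yet: fall back to "taken"
        return self.count + 1 < self.learned_trip

    def update(self, taken: bool) -> None:
        self.count += 1
        if not taken:  # loop exited: record its trip count and reset
            self.learned_trip = self.count
            self.count = 0


def simulate(predictor, trip_count: int, runs: int) -> float:
    """Run a loop `runs` times and return the prediction accuracy of its branch."""
    correct = total = 0
    for _ in range(runs):
        for i in range(trip_count):
            actual = i < trip_count - 1  # taken on every iteration but the last
            correct += predictor.predict() == actual
            predictor.update(actual)
            total += 1
    return correct / total


for trip in (5, 100):
    print(trip, simulate(AlwaysTaken(), trip, 10), simulate(LoopPredictor(), trip, 10))
```

Running each loop ten times, the always-taken predictor is right 80% of the time on the 5-iteration loop but 99% of the time on the 100-iteration loop, while the trip-count predictor reaches 98% and 99.9%, missing only the first exit.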

Faster Integer Division

When moving to a smaller manufacturing process, it's often possible to include logic that originally didn't make the cut due to space constraints, and that is the case with Dothan's integer division. Once again, all we know is that integer division is faster on Dothan; we have no idea how much faster, or why.

Enhanced Register Access Manager

As we mentioned at the beginning of this article, much of what went into Dothan consists of tweaks that couldn't be implemented in Banias without pushing its design completion date further out. One such fix addresses a register access issue that stalled the entire pipeline on Banias. The situation was a specific one: a partial register write followed by a full register read would stall the pipeline. Dothan includes a workaround, so there is no longer a performance penalty for a partial register write followed by a full register read.
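A partial register write changes only part of a register, so a following full-register read needs a value merged from the new bits and the older ones. The sketch below models just that merge for EAX and its AL/AH sub-registers (the x86 instructions appear only in comments); it says nothing about pipeline timing, and Intel hasn't described how Dothan's workaround avoids the stall.

```python
MASK32 = 0xFFFFFFFF


def write_al(eax: int, value: int) -> int:
    """Model `mov al, value`: replace bits 0-7, keep bits 8-31 of the old EAX."""
    return (eax & (MASK32 ^ 0xFF)) | (value & 0xFF)


def write_ah(eax: int, value: int) -> int:
    """Model `mov ah, value`: replace bits 8-15, keep the rest of the old EAX."""
    return (eax & (MASK32 ^ 0xFF00)) | ((value & 0xFF) << 8)


eax = 0x12345678
eax = write_al(eax, 0xAB)  # partial write: only AL changes
# A full read of EAX (e.g. `add ebx, eax`) now depends on both the new AL
# and the upper 24 bits written earlier, so the two must be merged first.
print(hex(eax))  # 0x123456ab
```

On Banias, waiting for that merged value is what stalled the pipeline.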


I'd like to see a P4 3.2-3.4 w/ dual channel DDR3200 used instead of the single channel DDR2700 that was used for the P4 testbed, to get a rough idea of where a Dell 9100/XPS, for example, would fall in line with the Dothan.

Something else I noticed - where's the Banias 1.7 and the Dothan 735? It would make comparing Banias to Dothan so much easier to do that...

Also, what motherboards did you use on the desktop chips?

#25: Take a look at http://cpu-museum.de/forum/viewtopic.php?t=1089. PowerLeap actually DID have something in the works, and was going to release it in Q1, but they cancelled it because of (well, at least this is what they said - I wouldn't be surprised if it was because Intel held them at gunpoint) LGA775 coming out (which can't really work with an adaptor) and the fact that the P-M wasn't running at 2.0GHz (I told them about Dothan, though).

Intel's "dominance" in encoding? Those days are history, my friends. If they ever existed.

AnandTech's article published just before this one proves it. Here's a quote:

"It was difficult to resist being a little sensationalist in this 939 roundup and titling the review, "Who needs 925X?" That would have been a fair title, however, since you can clearly see that all of the Socket 939/FX53 boards completely outperform Intel's top 560 on the top 925X motherboard. Even Media Encoding, the last bastion of Intel dominance, has fallen in benchmarks with our new AutoGK benchmark. "

As a matter of fact, it's always been a misnomer if you look at other suites/codecs such as:

Since the Pentium M shares its bus protocol with the Pentium 4, I think companies like PowerLeap should be more than able to provide an adaptor for desktop use.

This would likely result in an "Intel unknown" CPU detection, along with a lack of power-saving options, but previous experience with the "Tualatin" P3 (1400) showed that this could work and even offer full performance compared to proper recognition.

Also, I think the limited memory bandwidth did not hold the Athlon 64 back, because I still remember seeing the Athlon FX (dual channel) and Athlon 64 (single channel) perform almost equally at the same clock speed. However, the Pentium 4 would benefit from more bandwidth, as happened with every FSB bump it got. So I think adding faster memory would only make a noticeable difference for the Pentium 4.

I didn't say it would make up *all* the performance, but enough to make it competitive. For example, the P4 is essentially faster than the A64 in those tasks, but the A64 is still close enough to be a viable alternative. Dothan doesn't have to beat the P4 in media encoding or content creation, just come close enough to be a good desktop CPU. That should be possible by increasing the FSB.

One comment I forgot to mention: even with the talk of thermal design targets and clock speed limitations, I imagine that a Dothan CPU in a desktop motherboard with a large copper heatsink would have quite a lot of overclocking headroom. I know there was a French site that claimed to have overclocked a Dothan to 2.4 GHz, and the performance was quite impressive (if true). I would really like to see overclocking results for Dothan (and Banias) on a desktop system. With processors designed to generate 1/4 the amount of heat of a P4 3.2+, it could be interesting. Here's hoping some motherboard manufacturers will accommodate us! :)

Overall, I thought this was a great article. Those complaining about various configuration issues need to stop whining. Never once did AT actually give out any numbers for battery life or low-power performance, or claim that the Dothan was beating a desktop Athlon 64/P4. All we're looking at is what the various CPUs can do in typical *laptop* configurations. Testing actual P4 and A64 laptops would make it a real laptop comparison, but then we would have each laptop manufacturer's configuration, and it likely wouldn't match what they had for the Dothan system. What we've got is three platforms running the same RAM, hard drive, and graphics card. Yes, it's limiting what some of the CPUs could do, but it's about as fair as you can get.

Of course, if history is any indication, I'm fairly confident that Dothan is going to dominate other mobile architectures in battery life, when using the same screen, hard drive, and battery. Too bad it's so difficult to actually meet those criteria. Screens differ quite a bit, and many P4 and A64 laptops are shipping with 90+ Whr batteries, while Banias and Dothan laptops often get by with 60 to 70 Whr. Anyway, the Banias was generally the best laptop CPU before (i.e. most efficient while still providing good performance), so why shouldn't the Dothan be similar if not better? The results are hardly surprising.

I really hope that we one day see some desktop boards designed for Dothan, though. I imagine that getting the FSB up to 200 MHz quad-pumped should be possible, although even 133 or 166 would be helpful. Combine a Dothan CPU with all the other desktop accoutrements, and it would likely be a formidable gaming platform. Of course, it would really only be about the same as Athlon 64, and Intel is currently milking the Dothan/Banias line for all they can. $600+ for a processor that probably costs Intel less money to create than their P4 chips.

Again, great article, Anand. (And for the interested, Jon "Hannibal" Stokes over at Ars Technica put together as much information as I've seen anywhere about how the internals of the Banias/Dothan function. Not much detail in comparison to other CPU comparison articles he's written, but there is some additional information about what ops can benefit from the fusion technique, IIRC.)

I would also like to see this review updated with an Athlon 64 with 1 meg L2 cache. It seems a shame to compare an expensive 2 meg L2 cache Dothan with the cheaper 512KB L2 cache Athlon 64's. In addition, by keeping Athlon 64's to the slower/single channel RAM, you are making the L2 cache more important than normal, and hindering the Athlon 64.