The 5 Things that Comprise Dothan

There are five basic parts of Dothan that differentiate it from Banias, but unfortunately (just as was the case with Banias), Intel is not very forthcoming with details about Dothan out of a desire to guard its intellectual property. Even a year after its release, we have yet to see any serious competition for the Pentium M, and Intel wants it to remain that way for as long as possible.

That being said, we will try to be as specific about the details of Dothan as possible, and we'll start with the most obvious: its 90nm process.

90nm process and 2MB L2

Banias was built on Intel's 0.13-micron manufacturing process when that process was at its peak. The tried and true manufacturing process meant that Banias faced no manufacturing delays and could hit its target clock speeds without a problem.

Dothan gets its most noticeable improvements over Banias thanks to the move to Intel's smaller 90nm manufacturing process. This is the same process that's used in the manufacturing of Prescott, which means a couple of things. For starters, it explains why availability of Dothan hasn't been incredible since its launch, as 90nm production is still ramping. The availability problem aside, 90nm gives Dothan the ability to cram almost twice as many transistors onto the chip without increasing the overall die size compared to Banias.

Dothan is now a 140 million transistor chip (up from 77 million in Banias), with those 140 million transistors occupying almost the same 84 mm^2 die area as Banias (Banias is about 1 mm^2 smaller). Almost twice the transistors with no increase in die size? It's a chip manufacturer's dream. Because the die size is essentially unchanged, yields should not differ between Banias and Dothan (once Intel's 90nm process has truly matured) and it shouldn't cost Intel any more to produce Dothan than it did Banias.
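To put the "almost twice the transistors" claim into numbers, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above and assumes a Banias die area of 83 mm^2 (that is, "about 1 mm^2 smaller" than Dothan's 84 mm^2); the names `chips` and `density` are ours, purely for illustration.

```python
# Transistor counts and die areas as quoted in the article.
# Banias' 83 mm^2 is an assumption derived from "about 1 mm^2 smaller".
chips = {
    "Banias": {"transistors": 77e6, "die_mm2": 83.0},
    "Dothan": {"transistors": 140e6, "die_mm2": 84.0},
}


def density(chip):
    """Return transistors per square millimetre for one chip entry."""
    return chip["transistors"] / chip["die_mm2"]


banias_density = density(chips["Banias"])
dothan_density = density(chips["Dothan"])
density_ratio = dothan_density / banias_density
count_ratio = chips["Dothan"]["transistors"] / chips["Banias"]["transistors"]

print(f"Banias: {banias_density / 1e6:.2f} M transistors/mm^2")
print(f"Dothan: {dothan_density / 1e6:.2f} M transistors/mm^2")
print(f"Transistor count ratio: {count_ratio:.2f}x")
print(f"Density ratio:          {density_ratio:.2f}x")
```

Under these assumptions, both the transistor count and the density come out at roughly 1.8 times those of Banias, which is where "almost twice" comes from.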

The majority of the increase in transistor count is thanks to Dothan's 2MB L2 cache, twice the size of Banias' 1MB cache. The 64KB L1 cache remains the same as in Banias.

We believe that Intel is using the same 90nm SRAM cells from Prescott in Dothan. If so, then the extremely small 84 mm^2 die is further enabled by the significantly smaller 90nm SRAM cells that Intel developed. However, we are not clear as to how independent Banias and Dothan's SRAM cell design remains from that of the desktop chips, given their unique power requirements.

Along with a larger L2 cache, Intel has increased how aggressively Dothan prefetches data into its cache in order to take advantage of the extra on-die L2. This is a fairly normal practice that microprocessor designers employ to help improve performance whenever an architecture stays the same but cache size increases.

The 90nm process will also allow Dothan to scale up in clock speed, thanks in part to Intel's strained silicon technology, something that we're already seeing the fruits of today with its introductory 2GHz clock speed (up from Banias' 1.6GHz intro speed). Dothan will break the 2GHz barrier by the end of 2004. Remember that Intel's design philosophy with Dothan, just like Banias, is to design the chip for a specific power consumption and to leave clock speed scaling mostly up to the manufacturing process to enable.

Dothan's 90nm manufacturing process, in the end, gives it the higher clock speeds and larger L2 cache, which offer some of the more tangible advantages over Banias. Another very important fact to keep in mind is that these are the only major changes to Banias that make up Dothan; unlike Prescott, the pipeline has not been changed at all. Even Intel's Dothan design team views Prescott as a bit of a risky move, to try out significant modifications to the architecture alongside a brand new manufacturing process. Thus, it's no surprise that Dothan remains relatively unchanged architecturally outside of the move to 90nm; the pipeline and L1 cache are identical to Banias.

Micro Ops Fusion

Intel has been deliberately vague about Banias' micro ops fusion, and it continues to be so about the modifications to the micro ops fusion engine in Dothan. All that we are allowed to publish is that Dothan now allows more types of micro ops to be fused. That isn't a bad thing; it would just be nice to know which ones, and what enables Dothan to support the fusing of more micro ops.

Local Branch Prediction Improvements

With Dothan, there have been some improvements to branch prediction performance in order to reduce power consumption and increase performance. Remember that the fewer branch mispredicts you have, the less power is wasted on refilling the pipeline after a flush.

One of the biggest improvements to Dothan's branch predictors is in its loop detector. Although most don't think of a loop as a branch, all loops either end or begin with some sort of comparison that determines whether the loop should continue to execute (e.g. if i < 10, then keep looping). Loops are normally handled by a static branch predictor that always predicts taken once a loop is detected, and usually the only mispredicts that occur once a loop is detected are at the end of the loop. While this works fine for larger loops (100+ iterations), it does not work so well for extremely small loops (e.g. 5 iterations). What ends up happening is that the 5th, 6th and 7th time around, the predictor will mispredict a taken branch when the loop has actually finished. Mispredicting 3 times for a loop that only runs for 5 iterations does not help branch prediction accuracy, so we have a problem on our hands.

Dothan includes a more sophisticated algorithm in its detection and prediction of branches involving small loops; once again, Intel was purposely vague about exactly what Dothan does that Banias did not, but just know that Dothan has better overall branch predictor performance, thanks to modifications like improved detection of short loops.
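Since Intel has not disclosed how Dothan's loop detector works, the following is only a toy sketch of why short loops hurt an always-taken predictor and how a generic, textbook-style trip-count predictor could help. It is not Intel's algorithm. The model is simplified: it charges a single mispredict at each loop exit, and every function name here is hypothetical.

```python
def loop_branch_outcomes(trip_count, executions):
    """Outcomes of a loop's closing branch over several runs of the loop.

    Each run is trip_count - 1 taken branches (keep looping) followed by
    one not-taken branch (loop exit).
    """
    outcomes = []
    for _ in range(executions):
        outcomes.extend([True] * (trip_count - 1))
        outcomes.append(False)
    return outcomes


def mispredicts_always_taken(outcomes):
    """Static predictor: always guesses 'taken', so every exit is missed."""
    return sum(1 for taken in outcomes if not taken)


def mispredicts_trip_counter(outcomes):
    """Hypothetical loop detector that remembers the last trip count.

    It predicts 'taken' until the current run of taken branches reaches
    the length seen before the previous exit, then predicts 'not taken'.
    """
    learned = None  # number of taken branches seen before the last exit
    run = 0         # taken branches seen so far in the current run
    misses = 0
    for taken in outcomes:
        predict_taken = True if learned is None else run < learned
        if predict_taken != taken:
            misses += 1
        if taken:
            run += 1
        else:
            learned = run
            run = 0
    return misses


for trip_count in (5, 100):
    outcomes = loop_branch_outcomes(trip_count, executions=100)
    static_rate = mispredicts_always_taken(outcomes) / len(outcomes)
    counter_rate = mispredicts_trip_counter(outcomes) / len(outcomes)
    print(f"{trip_count:>3} iterations: always-taken {static_rate:.1%}, "
          f"trip counter {counter_rate:.1%}")
```

Even in this simplified model, the always-taken predictor misses one branch in five for a 5-iteration loop but only one in a hundred for a 100-iteration loop, while the trip-count sketch misses only the very first exit in both cases.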

Faster Integer Division

When moving to a smaller manufacturing process, it's often possible to include logic that didn't make the cut originally due to space constraints; such is the case with Dothan and its integer division performance. Once again, all we know is that integer division is faster on Dothan; we have no idea how much faster or why.

Enhanced Register Access Manager

As we mentioned at the beginning of this article, much of what went into Dothan were tweaks to Banias that couldn't be implemented without pushing the design completion date further out. One such fix that didn't make it to Banias was a workaround to a register access issue that caused the entire pipeline to stall in Banias. The situation was a unique one, where a partial register write followed by a full register read would cause the pipeline to stall. Dothan features a workaround for the problem and there is no longer a performance penalty for performing a partial register access followed by a full register access.
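To make the access pattern concrete, here is a small illustrative sketch. It shows what a partial register write looks like (updating only the low 8 bits of a 32-bit register, as a write to AL does to EAX on x86) and flags the "partial write, then full read" sequence in a made-up instruction trace. The trace format and function names are our own assumptions, and the sketch says nothing about how Banias or Dothan handle the case internally.

```python
def write_al(eax, value8):
    """Partial write: replace only the low 8 bits of a 32-bit value."""
    return (eax & 0xFFFFFF00) | (value8 & 0xFF)


def count_partial_then_full(trace):
    """Count full-width reads whose latest write to that register was partial.

    trace is a list of (op, register, width_in_bits) tuples, where op is
    "write" or "read". This is a hypothetical format for illustration.
    """
    last_write_width = {}
    hazards = 0
    for op, reg, width in trace:
        if op == "write":
            last_write_width[reg] = width
        elif op == "read" and width == 32 and last_write_width.get(reg, 32) < 32:
            hazards += 1
    return hazards


# Made-up trace: the comments show the kind of x86 instruction meant.
trace = [
    ("write", "eax", 32),  # mov eax, imm32  (full write)
    ("write", "eax", 8),   # mov al, imm8    (partial write)
    ("read", "eax", 32),   # add ebx, eax    (full read after partial write)
    ("write", "eax", 32),  # mov eax, imm32  (full write)
    ("read", "eax", 32),   # add ebx, eax    (full read after full write)
]

merged = write_al(0x12345678, 0xAB)
hazard_count = count_partial_then_full(trace)
print(f"EAX after partial write: {merged:#010x}")
print(f"Partial-write-then-full-read sequences: {hazard_count}")
```

The full read after the partial write needs the merged value (the old upper 24 bits plus the new low 8 bits), which is the situation described above as stalling Banias.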

28 Comments

#3 I agree. Banias is a better chip. It would be nice to see Banias at 0.09 with 1MB cache; it would be smaller, cheaper and give a lot more chips per wafer, but Intel isn't interested in these yet, at least maybe a Celeron line when Banias is phased out.

Isn't Ati 9100 chipset compatible with Banias and P4 compatible? A bios change or something more wouldn’t do the trick?

Interesting read. Some comments though: the Dothan has a HUGE L2 cache, which people, in a thread over at Ace's, suggest gives it a large edge in many applications (there were complaints that it excels in SpecInt simply because of this, and with very large datasets, performance rapidly tails off). Nothing wrong with that, but it might explain why the Dothan has issues with media-encoding and the like, where the volume of data is so large that the size of the L2 cache becomes less important.

Also, the test was a little bit of comparing apples to oranges. I see why this was done: to try and give a laptop-like playing field. But Dothan is almost certainly highly optimised to run with, say, single channel, slow RAM. By forcing this on Athlon64 and Pentium 4 desktops, which are optimised for fatter memory channels, you are slightly crippling performance. As such, it's probably a fair test for laptop performance, but probably doesn't indicate how a Dothan-like desktop chip would hold up. This might explain how well it holds its own against the Athlon64 and beats the P4 in many tests.

Just a question... I thought the new successor to the prescott was going to be the derivative of the dothan -- eg merging back the mobile and desktop solutions? I'm wrong right? So what exactly are they going to replace prescott with?

Excellent chip. However, it's bloody expensive. At $637 it is exactly the same price as a 3.6GHz Prescott 560 or right between Athlon 64 3500+ and 3700+, so it's not a good choice for the desktop.

Also Anand's comment "...it's faster and uses less power than Banias" is not quite accurate.

Under full CPU load, yes this is certainly true but, as you'd expect from 90nm, the leakage power has shot right up, meaning that in its low power states, the CPU is draining a great deal more power than Banias. How much time does a laptop spend idling relative to flat out? My guess: quite a bit. I'd still choose a Banias in my laptop for that reason alone.

Still good article, and I'd love (from a purely academic point of view) to see what this baby could do when coupled up with a dual-channel memory interface and a good desktop chipset!

Probably the best heat vs. performance processor out there, at least for x86. Why Intel is dumb to shove Prescotts which use 5x more power for the same performance is beyond me; I would get this for a desktop quicklike.

Of course, we have Intel's TDP instead of what the processor may actually put out on worst case conditions. That and we don't know what the Athlon 64 at 90nm will put out, at least at 2.0ghz, since all they are doing is a few tweaks to the core (isn't it smaller than 100mm?) That and I guess if you really meant unpatented, that was what to make sure no one really knows why it's so great?