The launch of Bulldozer in October wasn't exactly a success for AMD. In our review, Anand ended up recommending the Intel i5-2500K over AMD's FX-8150. One of the reasons behind the poor performance of Bulldozer is its unique design: each Bulldozer module consists of two integer cores and one shared floating-point unit. Today's operating systems don't know how to optimally schedule threads for this design and, as a result, the full potential of Bulldozer has not been achieved. Microsoft has released a hotfix for Windows 7 and Server 2008 R2 that should increase the performance of Bulldozer.

Let's look at the problem to see what happened and how the hotfix helps address it. Before the update, Windows didn't know how to ideally schedule threads on Bulldozer. Essentially, it didn't know when it was better to place threads on a single module versus multiple modules.

The picture above explains this pretty well. Before the update, Windows placed threads more or less randomly, which meant many modules were unnecessarily active at the same time. This capped the maximum Turbo speeds, because those can only be achieved when some of the modules are inactive (power gated).
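To make the two placement options concrete, here is a minimal Python sketch. It is a toy model, not the Windows scheduler's actual logic: the function names are hypothetical, and it assumes logical cores 2m and 2m+1 belong to module m, which the hotfix description does not confirm.

```python
# Toy model of thread placement on a four-module, eight-core FX-8150.
# Assumption: logical cores 2m and 2m+1 share module m.
CORES_PER_MODULE = 2
MODULES = 4


def module_of(core):
    """Return the module index for a logical core under the assumed numbering."""
    return core // CORES_PER_MODULE


def place_packed(n_threads):
    """Fill both cores of a module before waking the next module.

    Keeps more modules idle (power gated), which leaves room for Turbo.
    """
    return list(range(n_threads))


def place_spread(n_threads):
    """Give each thread its own module first, then fill the second cores.

    Avoids sharing a module's resources, but keeps more modules active.
    """
    order = [m * CORES_PER_MODULE + c
             for c in range(CORES_PER_MODULE)
             for m in range(MODULES)]
    return order[:n_threads]


def active_modules(cores):
    """Count how many distinct modules the given cores keep awake."""
    return len({module_of(c) for c in cores})


if __name__ == "__main__":
    for policy in (place_packed, place_spread):
        cores = policy(4)
        print(policy.__name__, cores, "active modules:", active_modules(cores))
```

With four threads, the packed policy keeps two modules idle while the spread policy wakes all four. Which one is faster depends on the workload.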

VR-Zone is claiming that Windows sees one Bulldozer module as a single multi-threaded core, similar to an Intel Hyper-Threading core. Basically, your 8-core FX-8150 is seen as a quad-core, 8-thread CPU—just like Intel's i7-2600K for instance. This goes against AMD's design and marketing because Bulldozer is closer to an 8-core CPU.

We have not yet tested Bulldozer with the hotfix, but don't expect miracles, as Microsoft is suggesting a 2-7% increase. Better scheduling for the Bulldozer CPUs will improve performance a bit, but not enough to close the gap in many scenarios. Windows 8 already has the new thread scheduler, and according to AMD's own and third-party tests the performance increase is up to around 10%, but Bulldozer needs a lot more than 10% to surpass Sandy Bridge.

Update: VR-Zone reports (and we can confirm) that the download link for the hotfix is no longer functional. There were apparently unexpected performance drops in some cases after applying the hotfix and Microsoft is investigating the issues. Modifying the scheduler in Windows is not something to be done lightly, as it changes a core element of the OS, so more testing and validation for such updates is always a good idea.

Update 2: Apparently there is a second part to the hotfix that was not pushed live, and this hotfix was pushed live prematurely.

Each Bulldozer module shares the FP (floating point) and instruction decode units between the two cores on the module.

The consequence of this is that when 2 threads are scheduled on a single module, the threads are competing for resources... and thus, depending on what they are doing, may run slower when scheduled on a single module, than when scheduled on separate modules.

Offsetting this is the competing tension that when fewer modules are active, turbo mode kicks in and ups the clock speed.

The point is there's this tension between "maximizing clock speed" vs. "maximizing Instructions per clock cycle".

If a given pair of threads is collectively bottlenecked by "Instructions per Clock Cycle" (because the penalty they incur by competing for resources exceeds the benefit they would receive from a higher clock speed), then those threads will run faster on separate modules, despite the lower overall clock speed.

The point is that the choice between scheduling a task on a separate module or on the same module is potentially task dependent.
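A rough way to see the trade-off is to treat per-thread throughput as clock speed times instructions per clock, and apply a penalty when two threads share a module. The Python sketch below does that. The function names are made up for illustration, and the clock speeds and penalties are illustrative values, not measurements.

```python
# Toy model: throughput = clock * IPC, with an IPC penalty for sharing a module.
# All numbers are illustrative assumptions, not benchmark results.

def relative_throughput(clock_ghz, ipc, sharing_penalty=0.0):
    """Per-thread throughput; sharing_penalty is the fraction of IPC lost."""
    return clock_ghz * ipc * (1.0 - sharing_penalty)


def break_even_penalty(base_clock, turbo_clock):
    """Sharing penalty at which both placements give the same throughput."""
    return 1.0 - base_clock / turbo_clock


def better_placement(sharing_penalty, base_clock=3.6, turbo_clock=4.2, ipc=1.0):
    """Compare two threads on one module (turbo clock, shared resources)
    with two threads on separate modules (base clock, no sharing)."""
    same = relative_throughput(turbo_clock, ipc, sharing_penalty)
    separate = relative_throughput(base_clock, ipc)
    return "same module" if same > separate else "separate modules"


if __name__ == "__main__":
    print(better_placement(0.05))  # light contention
    print(better_placement(0.30))  # heavy contention, e.g. FP-bound threads
    print(round(break_even_penalty(3.6, 4.2), 3))
```

Under these assumed clocks, sharing a module only wins while the contention penalty stays below roughly 14%. The scheduler cannot see that penalty directly, which is why a one-size-fits-all rule is hard.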

This is very, very nuanced... and tricky for something as low level as the task scheduler to get right without additional hints... so it will be interesting to see whether, without compiler support, Microsoft can come up with a one-size-fits-all solution in the scheduler.

The thing that gets me about this is that AMD should have waited until MSFT had the patch ready before releasing the processors, AND they should have included an update disk with each CPU so that you could get maximum performance from the processor even if you were not connected to a network.

You do realize that AMD CPUs run on operating systems other than MS Windows? So for them to hold up an entire product launch because of Windows doesn't make sense, especially on the server side of things.

As far as desktops are concerned, although the CPUs didn't run at full potential without this updated scheduler, they did run well enough that it made sense to release the product. AMD just can't sit on their hands waiting for MS to finish the scheduler optimizations before releasing their product. They knew the update was coming soon, so they decided to release Bulldozer.

Also, if you for one second think companies don't release products until they have fully optimized drivers, etc., you need to wake up. This is fairly standard practice. This is not limited to AMD CPUs; just look at the GPU industry for starters. Intel released drivers that take advantage of its new architecture in its Sandy Bridge iGPUs after its hardware release.

Uhhh, wrong. Where in the x86 world have you been?? Every piece of hardware released as an x86 processor must run legacy x86. Hence, new processors should run existing software just as fast as previous hardware.

"new processors should run existing software just as fast as previous hardware"

Hrm... So P4 Hyper-Threading was a complete failure too? There were lots of applications, notably for desktops (games were a big one), that were noticeably slower with Hyper-Threading enabled.

How about when multi-core processors came out with slightly lower clocks than their single-core siblings and were slower in single-threaded applications? Was that a big fail?

Or how about when applications would refer to the RTC in multi-core processor systems rather than the OS Clock? Some would glitch and freeze, some would return incorrect results, some would crash.

Or how about applications that would just flat out crash with SpeedStep?

Heck, pretty much all the code from the previous generation is supported on Bulldozer; whether it will get optimized or not is still a good question. Windows is only a part of the problem from a performance perspective. I would recommend taking a broader look at the situation.

Software/data necessitates hardware, yes, but software updates tend to follow hardware for support. Whether it's a driver, an application update, or an operating system update varies depending on the situation.

Don't get me wrong, the product doesn't impress me. I will wait until Piledriver before making a decision between it and Ivy Bridge, but your argument is clearly flawed looking at history.

People buying Bulldozer-based Zambezi CPUs do not want Intel CPUs. The hotfix is a free, small performance enhancement. It is not intended to make the FX CPUs as fast as Intel's CPUs. It's intended to make better use of the available processing power of the Bulldozer architecture. People are voting with their wallets and FX CPUs are selling very well.