Much of AMD's bad luck over the last three months revolves around a nasty bug it just can't shake

Erratum, to those in the hardware or software industry, is a nice way of saying "we missed a test case" during development and design.

Yesterday, The Tech Report confirmed AMD's iteration of Intel's F00F bug. The bug, which has been documented since at least early November, can cause a deadlock during recursive or nested cache writes.

How does the TLB erratum occur? All AMD quad-core processors utilize a shared L3 cache. When software uses nested memory pages, these processors can experience a race condition.

AMD's desktop product marketing manager Michael Saucier describes a race condition as a series of events "where the other guy wins who isn't supposed to win."

In the software world, a typical memory race condition occurs when a memory arbiter is instructed to overwrite an older block of memory, but also to write the old block somewhere else in cache. When two arbiters follow this same rule set, it's easy to see how a race condition can occur: both arbiters attempt to overwrite the same blocks of information, resulting in a deadlock.

From what AMD engineers would tell DailyTech, this example is very similar to what occurs with nested memory pages in virtualized machines on these K10 processors.
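The two-arbiter scenario described above can be sketched in a few lines of Python. This is only a software analogy under stated assumptions, not AMD's hardware mechanism: the names `run_arbiters`, `arbiter-1`, `arbiter-2` and the blocks `A` and `B` are hypothetical, and each cache block is modeled as a simple lock. Each arbiter claims the block it wants to overwrite, then tries to claim the block where the old contents should go. Because the two arbiters want the same pair of blocks in opposite order, neither can proceed.

```python
import threading


def run_arbiters(timeout=0.2):
    """Model two arbiters contending for the same two blocks.

    Returns the sorted names of the arbiters that could not finish
    within `timeout` seconds. All names here are illustrative only.
    """
    # Each cache block is modeled as a lock (an assumption of this sketch).
    blocks = {"A": threading.Lock(), "B": threading.Lock()}
    sync = threading.Barrier(2)  # reusable rendezvous point for both arbiters
    stuck = []

    def arbiter(name, overwrite, evict_to):
        # Step 1: claim the block that is about to be overwritten.
        with blocks[overwrite]:
            # Wait until the other arbiter also holds its first block.
            sync.wait()
            # Step 2: try to claim the block that receives the old contents.
            if blocks[evict_to].acquire(timeout=timeout):
                blocks[evict_to].release()
            else:
                # A real deadlock would hang forever; the timeout lets
                # this sketch report the problem instead.
                stuck.append(name)
            # Keep holding the first block until both attempts are over,
            # so the outcome does not depend on thread timing.
            sync.wait()

    threads = [
        threading.Thread(target=arbiter, args=("arbiter-1", "A", "B")),
        threading.Thread(target=arbiter, args=("arbiter-2", "B", "A")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stuck)


if __name__ == "__main__":
    print(run_arbiters())
```

Both arbiters end up in the returned list, since each one holds the block the other is waiting for. The timeout exists only so the demonstration terminates; without it, both threads would wait on each other indefinitely.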

AMD partners tell DailyTech that all bulk Barcelona shipments have been halted pending application screening based on the customer. Cray, for example, was allowed its latest allocation for machines that will not use these nested virtualization techniques. Other AMD corporate customers were told to use Revision F3 (K8) processors in the meantime.

The TLB erratum will be fixed in the B3 stepping of all AMD quad-core processors, including Phenom and Barcelona. However, AMD considers the B3 stepping a "March" item on its 2008 roadmap. Processors shipped between now and then will still carry the TLB bug, though with the BIOS workaround these machines will not experience a lockup.

AMD's latest roadmap hints that its tri-core processors are merely quad-core processors with one core disabled. The company also indicated that it will introduce some of these tri-core processors with the L3 cache disabled. Removing the shared-L3 cache from the chip design eliminates the TLB bug.