AMD quad-core 'erratum' creates problems for early adopters

A "design and process tuning step" is being blamed for the first 2.4 GHz quad-core AMD processors shipping with a BIOS fix that contains a workaround for what now appears to be a serious erratum.

An AMD spokesperson told BetaNews this afternoon that the first wave of its quad-core Opteron server CPUs and Phenom desktop CPUs was shipped on the November 19 launch date with a known erratum -- a documented bug. Customers received CPUs along with a BIOS fix that includes a workaround.

"Respective to Quad-Core AMD Opteron," AMD's Phil Hughes told BetaNews, "we are only shipping processors earmarked for specific end-user installations where customers have had the opportunity to validate the stability and robustness of the solution where it leverages the BIOS fix or some other potential software workarounds. Quad Core AMD Opteron processors shipping for general availability in the Q108 timeframe will not have this erratum."

Hughes acknowledged shipment delays for the latest wave of Barcelona architecture CPUs, but reiterated earlier comments made by AMD executives that the company continues to plan to ship "hundreds of thousands of quad-core processors" during the fourth quarter of the year...a quarter that is now three weeks from being over.

The problem, Hughes confirmed, was first brought to light on Monday by The Tech Report, and concerns a critical element of the CPU called the translation lookaside buffer (TLB). Here's what it does: In modern computer environments, each application thinks it has the entire memory space all to itself. Virtual memory enables an application to "think" it has 4 GB of unencumbered address space, while the CPU is busy translating the addresses the application presents into real addresses in physical memory.

To do that, the operating system presents the CPU with a virtual memory handler. But modern CPUs bypass this handler, using a faster local cache where they can buffer translated addresses in advance. This is the TLB, which for AMD quad-cores is located in the L3 cache. The TLB is designed for failover, so in the event of a "TLB miss" (when the buffer does not hold the translation being requested), the OS virtual memory handler is called in for backup.
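
For readers who want to see the idea in miniature, here is a rough Python sketch of a translation buffer sitting in front of a slower lookup. It is an illustration only, not a model of AMD's hardware: the ToyTLB class, the 4 KB page size, the four-entry capacity, and the sample page table are all assumptions made up for this example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes, chosen for illustration


class ToyTLB:
    """A small cache of address translations in front of a slower page table."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.capacity = capacity      # number of translations the buffer can hold
        self.entries = OrderedDict()  # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        # Split the address into a page number and an offset within that page.
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.entries:
            # TLB hit: the translation is already buffered.
            self.hits += 1
            self.entries.move_to_end(page)
            frame = self.entries[page]
        else:
            # TLB miss: fall back to the slower page-table lookup.
            self.misses += 1
            frame = self.page_table[page]
            if self.capacity > 0:
                self.entries[page] = frame
                if len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)  # evict the oldest entry
        return frame * PAGE_SIZE + offset


# Hypothetical page table and access pattern.
page_table = {0: 7, 1: 3, 2: 9}
addresses = [0x0010, 0x0020, 0x1004, 0x0030, 0x2008, 0x1008]

tlb = ToyTLB(page_table)
physical = [tlb.translate(a) for a in addresses]

# With capacity 0 the buffer is effectively switched off,
# so every access takes the slow path.
disabled = ToyTLB(page_table, capacity=0)
for a in addresses:
    disabled.translate(a)

print(tlb.hits, tlb.misses, disabled.hits, disabled.misses)
```

In this toy run, half of the lookups are served from the buffer, while the zero-capacity version takes the slow path every time. The sketch says nothing about how large the real-world cost is; it only shows why losing a translation buffer makes every memory access more expensive.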

Apparently, a certain level of stress, which AMD's own engineers did not test for, causes more TLB misses than usual. And though AMD has yet to say so explicitly, it would appear its BIOS fix could be turning off the TLB altogether...which could explain what many hardware enthusiast sites are reporting to be a 10% performance hit compared to an unpatched BIOS.

Despite the apparent setback, Hughes said his company is sticking by its projections of shipping "hundreds of thousands" of quad-core processors this quarter.

"Quad Core AMD Opteron processor is the most advanced x86 processor ever introduced to the market," he stated, "and as such there are design and process tuning steps that have taken longer than expected."