DragonFly BSD developer stung by Opteron bug

Matthew Dillon, the lead developer behind the DragonFly BSD fork of the open source FreeBSD Unix variant, spent more than a year chasing crashes on Opteron-based systems running his operating system - and now Advanced Micro Devices says the cause is a bug in earlier generations of Opteron processors.

The erratum does not affect current Opteron 4200 and 6200 processors or the impending Opteron 3200 chips, all of which have a different microarchitecture based on the "Bulldozer" cores.

Dillon described the crashes he was seeing with DragonFly BSD in a posting on his kernel list on Christmas Day, and said he had been bug hunting for more than a year.

The problem occurs with the cc1 C compiler in the open-source GNU gcc 4.4.7 compiler set. Dillon ran tests on Opteron and Phenom II machines as well as on Sandy Bridge Xeon E3 processors from Intel, and said he didn't see the issue on Intel iron and had ruled out the OS as the source of the crashes. On Monday, in another post, Dillon said that after reviewing his issues, AMD confirmed the bug and said that during certain kinds of loop operations, the affected Opteron processors can incorrectly update the stack pointer.

AMD provided El Reg with the following statement about the bug:

A program exception has been identified in previous generations of the AMD Opteron processor that occurs in certain environments that leverage a very specific GCC compiler build. A workaround has been identified for the small segment of customers this could potentially impact.

While there are millions of these processors in the field, no other cases have been reported.

To see this observation multiple events needed to happen concurrently and required a certain BSD-derivative environment (BSD is based on the Unix operating system) that uses a unique GCC compiler build.

This erratum CANNOT occur on AMD Opteron 3200, 4200 and 6200 ("Valencia" and "Interlagos") Series processors since it utilizes a different microarchitecture.

And finally and even more important for readers to understand, AMD and Intel post errata updates on a regular basis, the difference here is a developer with a blog and unique GCC compiler build uncovered it.

In the wake of that statement, AMD told Dillon that it would be updating its revision guides for its Family 10h and 12h processors to document this erratum, which has been given issue #721. At that time, AMD will provide a model-specific workaround for the issue, it said. ®