Skylake bug causes Intel chips to freeze under ‘complex workloads’


Intel has disclosed that its sixth-generation Core products (known as Skylake) suffer from a CPU bug that can cause a system to hang. The company has only publicly identified one application family that causes it, Prime95.

The Prime95 thread on Skylake instability dates back to early December, when testers noted that running the 768K test on the latest Intel processors would cause the application to fail — sometimes within minutes, sometimes only after hours. The forum users collectively worked through the usual suspects and double-checked RAM, motherboard vendors, voltage levels, clock speeds, Prime95 software versions, and whether the CPU was overclocked or not.

Disabling Hyper-Threading apparently fixes the problem (based on user reports), but none of the other variables had a measurable impact on the issue. If you run Prime95 on a Skylake CPU using the maximum number of threads the processor supports, with the “CpuSupportsFMA3=0” setting (which forces the use of AVX) and the 768K FFT size, the system will eventually crash.
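As a rough illustration, the user-reported trigger conditions (Hyper-Threading enabled, AVX available) can be checked on a Linux machine with a short Python sketch. It assumes the usual `/proc/cpuinfo` field names (`siblings`, `cpu cores`, `flags`); the function names and the sample text are hypothetical, and the script only reports whether those conditions are present. It does not detect the bug itself or confirm that a CPU is a Skylake part.

```python
def parse_cpuinfo(text):
    """Return the key/value fields of the first processor entry in cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor entry
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def trigger_conditions(text):
    """Report the conditions that forum users associated with the hang."""
    fields = parse_cpuinfo(text)
    flags = set(fields.get("flags", "").split())
    siblings = int(fields.get("siblings", "1"))  # logical CPUs per package
    cores = int(fields.get("cpu cores", "1"))    # physical cores per package
    return {
        # More logical CPUs than physical cores implies Hyper-Threading is on.
        "hyper_threading": siblings > cores,
        "avx": "avx" in flags,
        # Linux reports FMA3 support under the flag name "fma".
        "fma3": "fma" in flags,
    }


# Made-up sample entry, for illustration only.
SAMPLE = """\
processor : 0
model name : Example CPU
siblings : 8
cpu cores : 4
flags : fpu sse2 avx avx2 fma
"""

if __name__ == "__main__":
    print(trigger_conditions(SAMPLE))
```

On a real Linux system, the same function could be called as `trigger_conditions(open("/proc/cpuinfo").read())`.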

Hello All, Intel has identified an issue that potentially affects the 6th Gen Intel® Core™ family of products. This issue only occurs under certain complex workload conditions, like those that may be encountered when running applications like Prime95. In those cases, the processor may hang or cause unpredictable system behavior. Intel has identified and released a fix and is working with external business partners to get the fix deployed through BIOS.

It’s not clear yet what the fix will be, or if it will require end users to avoid certain code paths or features when testing processors. Niche cases like this can have enormous impacts on companies. In the early 1990s, Intel’s Pentium processors suffered from what became known as the FDIV bug. The chips worked perfectly in the vast majority of cases, but would return an incorrect value in specific floating-point division cases. Specifically, the returned values were incorrect by roughly 0.000061.
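To put the size of the FDIV error in perspective, here is a small Python sketch built on the most widely cited test case for the bug. The operands and the flawed quotient are the commonly reported values for that case, not figures taken from this article. In this example, the relative error works out to about 6.1e-5, in line with the figure quoted above.

```python
# Widely cited FDIV test case (assumed here; not taken from this article).
NUMERATOR = 4195835
DENOMINATOR = 3145727

# Quotient commonly reported from affected Pentium chips for this division.
FLAWED_QUOTIENT = 1.333739068902037589

# A correct FPU (such as the one running this script) gives the true quotient.
correct = NUMERATOR / DENOMINATOR

absolute_error = correct - FLAWED_QUOTIENT
relative_error = absolute_error / correct

if __name__ == "__main__":
    print(f"correct quotient: {correct:.12f}")
    print(f"flawed quotient:  {FLAWED_QUOTIENT:.12f}")
    print(f"absolute error:   {absolute_error:.3e}")
    print(f"relative error:   {relative_error:.3e}")
```

The two quotients agree to four significant digits and diverge at the fifth, which is why the flaw went unnoticed in most everyday workloads.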

Nonetheless, the bug caused serious headaches for Intel. The company took a hammering in the press and a charge of $475 million against earnings to resolve the problem. Since then, we’ve seen a number of high-profile errors: AMD had its TLB bug with the original Phenom, and Intel’s first iteration of TSX (Transactional Synchronization Extensions) was disabled via microcode update. There’s also a bug in Intel’s VM implementation that can allow a guest VM to fault in a way that traps the CPU in an infinite loop.

Intel turned some of the flawed Pentium chips into keychains.

We think of processors as essentially flawless devices that “just work,” but reality tells a different story. Check out Intel’s list of errata in Haswell: there’s a five-page list of flaws and issues, virtually all of which are labeled as “No fix.” The solution, in the majority of cases, is “Don’t do it like that.” AMD chips aren’t immune to these kinds of issues by any means, but there’s been less hammering on AMD chips since they don’t have the enterprise market share they used to command.

Sometimes bugs are disclosed, sometimes they aren’t — Piledriver has a significant problem with 256-bit AVX instructions, for example, that injects an 18-20 cycle delay into executing multiple consecutive instructions. Every original Intel Atom (before Bay Trail) had a floating point flaw that could insert a NOP (no operation) into every other cycle, effectively doubling FPU compute time. No one bought an Atom for its FPU performance, so the bug didn’t get talked about.

We’ll have to wait and see what Intel’s solution for this problem is. The simplest way to fix it might be to tell the CPU to avoid using AVX in specific instances, but the FDIV bug demonstrated that users often demand 100% compatible CPUs — even if they aren’t using the functions that actually trigger a bug. The problem is, as CPUs add more features and capabilities, it takes longer and longer to adequately test those functions.
