Last year I reported on a bug in 64-bit Windows 7 SP1’s support for AVX-capable processors. This bug causes stack corruption when a 32-bit program crashes while being debugged in Visual Studio, even if AVX is not used.

Microsoft has a fix, but they will only ship it for Windows 7 if there is enough demand.

So this is your chance. Comment honestly on whether this bug affects you. Note that for this bug to be triggered it is sufficient to have an AVX capable processor – you don’t have to be doing AVX programming.

Bug details

The bug is in the AVX support added to Windows 7 SP1. Saving the state of the AVX registers requires additional space, and apparently the WoW64 (32-bit Windows on 64-bit Windows) debug support fails to reserve enough space, so the stack gets corrupted. Oops.

In my sample test program I have a Crash() function which can be invoked by selecting “Crash normally” from the file menu. It seems reasonable, especially in a debug build, that crashing in this function should give a nice helpful call stack like this:

That used to be what would happen. But no longer. On 64-bit Windows 7 SP1 on AVX processors, when debugging 32-bit C++ code with any version of Visual Studio, you will probably see something like this:

The most common signature of this bug is seeing ntdll.dll!_ZwRaiseException on the call stack, typically twice.

The first call stack makes the bug trivial to diagnose. The second call stack… it doesn’t even show the location of the crash, and it lists three functions that aren’t really on the crash call stack. At least it lists the parent function this time – but don’t count on that.

Clearly the corrupted stack can make crash analysis a lot trickier. Depending on the stack layout the corruption may hit multiple stack frames, including the local variables contained within them.

Luckily this bug does not seem to affect minidump files saved by exception handlers, so post-mortem debugging seems to be unaffected.

Take action now

The bug is well understood, and Microsoft really just wants to know whether it’s worth the cost and risk of fixing it. So let them know. Remember that this bug requires 64-bit Windows 7 SP1, an AVX capable processor, and 32-bit development. If you’re running a 32-bit OS (really?), or don’t have an AVX capable processor, or you’re doing 64-bit development then you are immune. You’re also immune if you are running Windows 8 (it’s fixed there), Windows Vista (no AVX support), Linux, or MacOS.

If you have noticed this bug then say so in a comment below.

If you have not noticed this bug then maybe download the test program and see if you can repro it. Share your experiences either way.

If you think this is a complete waste of time, perhaps because you have already moved on to Windows 8, Linux, or MacOS, then let us know.

I prefer comments here, but commenting on reddit works also. Whatever is easiest.

Workarounds

While waiting for Microsoft to respond there are two workarounds available, each with its own downsides:

Change Visual Studio solution settings

The stack corruption happens in the first-chance exception handler. You can tell Visual Studio to halt in the debugger before running this, thus giving you a chance to see the crash details before they are corrupted. To do this go to the Visual Studio ‘Debug’ menu and select ‘Exceptions’. In the dialog that comes up check Win32 Exceptions.

One problem with this workaround is that this must be done for every Visual Studio solution. Also, this workaround doesn’t help if a process crashes and then the just-in-time debugger attaches. The stack will already be corrupted before you attach.

Disable AVX

The other workaround is to disable AVX support. You can do that by running this command from an elevated command prompt and then rebooting:

bcdedit /set xsavedisable 1

The obvious disadvantage is that you no longer have AVX support – if you implement AVX detection properly then it will be detected as no longer available. I don’t like this solution, but given the number of different projects that I work on, and the importance of just-in-time debugging, I had no choice but to do this. If Microsoft ever fixes this bug then you can remove the workaround by running this command and then rebooting:

bcdedit /set xsavedisable 0

You can see your current bcdedit settings by running bcdedit with no parameters from an elevated command prompt. If xsavedisable is present in the output and has a non-zero value then the buggy code in Windows is disabled.
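If you want to script that check across many machines, here is a minimal sketch in C++. It takes the text printed by bcdedit and looks for an xsavedisable entry with a non-zero value. The function name XsaveDisabled is made up for this example, and the code assumes each setting is printed as a name followed by whitespace and a value on one line, so treat it as a starting point rather than a verified parser.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if some line of the captured bcdedit
// output has "xsavedisable" as its first token and a non-zero integer as its
// second token. The "name  value" line layout is an assumption.
bool XsaveDisabled(const std::string& bcdeditOutput) {
  std::istringstream lines(bcdeditOutput);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string name, value;
    if (!(fields >> name >> value)) continue;  // blank or single-token line
    if (name != "xsavedisable") continue;
    try {
      // Base 0 accepts decimal ("1") as well as hex ("0x1") values.
      if (std::stol(value, nullptr, 0) != 0) return true;
    } catch (const std::exception&) {
      // Not a number: treat the entry as "not disabled" and keep looking.
    }
  }
  return false;
}
```

Capturing the output is left to the caller, for example by redirecting bcdedit to a file from an elevated command prompt and reading that file into a string.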

I recommend getting your IT department to push the bcdedit command to all developer machines, or to all machines. It’s the only way to solve the problem until Microsoft fixes it.
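On the point about detecting AVX properly: the usual recipe is to check both that the CPU implements AVX and that the OS has enabled saving of the AVX register state. The sketch below shows that recipe in C++. The names AvxUsable and DetectAvx are invented for this example. It assumes, as I understand the xsavedisable setting, that the OS then leaves XSAVE disabled, so the OSXSAVE bit reads as zero and the check fails cleanly, whereas code that only looks at the CPU's AVX bit could wrongly conclude that AVX is safe to use.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

// CPUID leaf 1, ECX feature bits.
constexpr uint32_t kCpuidOsxsave = 1u << 27;  // OS has enabled XSAVE/XGETBV
constexpr uint32_t kCpuidAvx = 1u << 28;      // CPU implements AVX
// XCR0 bits: register state the OS saves and restores on context switches.
constexpr uint64_t kXcr0Sse = 1u << 1;  // XMM registers
constexpr uint64_t kXcr0Avx = 1u << 2;  // upper halves of the YMM registers

// Pure decision logic, kept separate from the hardware queries so that it can
// be tested on any machine. Pass xcr0 = 0 when OSXSAVE is clear.
bool AvxUsable(uint32_t cpuid1Ecx, uint64_t xcr0) {
  const uint32_t neededCpu = kCpuidOsxsave | kCpuidAvx;
  if ((cpuid1Ecx & neededCpu) != neededCpu) return false;
  const uint64_t neededOs = kXcr0Sse | kXcr0Avx;
  return (xcr0 & neededOs) == neededOs;
}

// Queries the current machine. XGETBV is only executed after OSXSAVE has been
// confirmed, because the instruction faults when the OS has not enabled it.
bool DetectAvx() {
  uint32_t ecx = 0;
  uint64_t xcr0 = 0;
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  int regs[4] = {0, 0, 0, 0};
  __cpuid(regs, 1);
  ecx = static_cast<uint32_t>(regs[2]);
  if (ecx & kCpuidOsxsave) xcr0 = _xgetbv(0);
#elif defined(__i386__) || defined(__x86_64__)
  unsigned a = 0, b = 0, c = 0, d = 0;
  if (__get_cpuid(1, &a, &b, &c, &d)) ecx = c;
  if (ecx & kCpuidOsxsave) {
    unsigned lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    xcr0 = (static_cast<uint64_t>(hi) << 32) | lo;
  }
#endif
  // On non-x86 builds both values stay zero and the answer is "no AVX".
  return AvxUsable(ecx, xcr0);
}
```

A program would call DetectAvx() once at startup and choose its AVX or non-AVX code path based on the result.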

Why a blog vote?

I tried creating an issue at connect.microsoft.com but that site doesn’t seem to support Windows bugs. A suggestion I made for Visual Studio that would have mitigated this bug was marked private, thus shutting down voting. So I’m posting here. And I promise that Microsoft will at least take a look.

Credit where credit is due

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs.
I also unicycle. And play (ice) hockey. And juggle.

84 Responses to Should This Windows 7 Bug be Fixed?

Reproducible on my i5-3570K. I can understand Microsoft’s reluctance in patching everyone’s WoW64, so maybe skip Windows Update, but they should release this as a hotfix at the very least, so those of us with the need and knowledge can apply the fix.

I’ve tested this bug on my laptop, which is a Core i7 running Win7 64-bit. It does indeed crash with an unreadable stack just as Bruce describes.

I would very much like to see 32-bit software development continue to be supported. There are many instances where the traditional programming models are useful; it would be a shame to be forced to cut back on 32-bit support.

I’ve experienced this bug, along with colleagues. As I use SEH in development as well as AVX the workarounds are not that useful. Microsoft need to fix this.

I’ve not tried a crash dump handler like Breakpad yet on Win 7 to see if this issue prevents us from getting proper call stack information from customers – if it does this is an even more serious issue.

I spent weeks after getting a new computer at work trying to chase crashes with useless stacks. It took debugging back to the stone age, and I didn’t understand why.

When I finally had enough and decided to dig into the problem, I noticed the wrong stack pointer after returning from the exception handling code. With that in hand, a bit of googling took me to your previous post on this matter and the workaround to disable AVX, and life was good again.

But obviously that is just a workaround, and it needs to be fixed. Please, Microsoft, remember: it’s all about developers, developers, developers.

Been bitten by this too many times (and sometimes you can’t get another repro of the crash for a while) and it’s a definite must-fix IMO (I’ve seen this on platforms other than PC and it’s really cumbersome).

As a middleware developer, I need to support a wide variety of systems, and that definitely includes shipping 32-bit executables on Windows 7 SP1.
That bug needs to get fixed. It’s hindering development.

Disabling AVX is not an option if we also want to have optimized AVX code paths!

I experienced the crash too and had trouble for days before finding the workaround you described. And I blamed myself, thinking “what a moron I was to deactivate that”. I never suspected that it was a bug, nor that it was due to a Windows update. With time I stumbled on the problem again, and I have to re-enable breaking on exceptions like you describe every once in a while. Would be nice to have a fix.

All of the above. It is an OS bug, not a debugger bug. The bug can actually be triggered with windbg (or any other debugger) as well. The first-chance exception handling in any debugger will corrupt the stack.

Wow, I have actually had this error before. I always found the cause, but it would have been helpful if I had been given the line! I always blamed my code when the debugger failed to pick up the location of errors.

So that’s what happened!
I develop in 32 bit because it’s the only way to get edit and continue to work and the startup times of the applications I’m working on are frustratingly long, so indeed it’d be nice to have a fix for this.

I only recently stumbled on this one, since most of my work is prototyped in 64-bit mode and then built in 32-bit when it’s pretty much bug free. A current personal project was 32-bit-only code, and I spent a few hours wondering why Visual Studio was dying. Yes, this must be fixed. I’m rather shocked this wasn’t considered a class #1 bug due to its nature. :(

No doubt, a fix is definitely needed.
This issue makes debugging take much longer. If you don’t know which process will crash, you don’t know which process to attach to in advance, and you need to rely only on post-mortem debugging. But on W7 you can’t :/

My main development machine is a Win7 64-bit SP1 one. But it seems the Core i7 CPU is old enough to not have AVX support, so I couldn’t reproduce this with the test program. My normal development is still 32-bit, so if I had a newer CPU I would certainly be annoyed about this, and would hope for a fix. (However, next time I upgrade hardware, I’ll most likely also switch to a newer version of Windows, so I’ll probably be skipping over this issue.)

I hit this all the time and it drove me mad until I figured out it was AVX related and implemented a workaround. Given the hours of productivity I’ve wasted on this, it amazes me that MS hasn’t shipped a fix for it.

At the company I work at, 8 out of 11 development machines are running Windows 7 SP1 x64 on an AVX supporting i7, developing 32bit Software.

Now that we know about the issue, we can disable AVX at work. At home, I am not willing to disable AVX as I use my computer for a lot of other things than just coding, most of them benefitting from AVX.

The worst thing about this issue is the huge amount of people not knowing about it. Hard to imagine how much productivity might be crushed by this bug just at this very moment.

We don’t hit this often during development, and I think it’s because we have our own unhandled exception filter. I’m surprised it’s working though, because I would expect the same broken pieces are being used. In particular, we use LPEXCEPTION_POINTERS::ContextRecord and I would expect that to fall victim to this bug. There’s something that’s going on there that hides this bug most of the time.

After our filter has gathered the info and created a crash dump, the application will usually keep on throwing STATUS_WAIT_0 exceptions. Maybe that’s because of this stack corruption? That is super annoying because the app needs to be force killed through task manager. This is a pain for some of the devs, because they won’t realize they need to do that. They wind up frustrated and confused with several ghost processes running on their machine taking up time and stopping them from rebuilding the exe.

I have run into the bug many times when the solution wasn’t set to trap on the exception.

Microsoft, I or my fellows here could be writing the NEXT big app, so tell me what is the risk assessment on losing that to a non-Windows platform vs. patching this bug? With a platform that isn’t the be-all it used to be, the one area where Microsoft has a generally stronger position is developer tools… or does it?

This bug cost a lot of my time developing a Windows game. Please fix this for all those poor souls out there who don’t realize (like I didn’t for a very long time) that this isn’t a problem in their codebase but a problem in the OS.

That hot fix is for a different issue. The bug that I am hoping that they will fix is stack corruption when a crash occurs (on 64-bit Windows 7 SP1 on an AVX capable system while debugging 32-bit code).

I originally blogged about two issues in one article, which may have caused some confusion. The original article is here:

I am aware it’s not for the stack corruption issue, I just wanted to point out that for the issue it DOES fix, it also needs to be enabled via the registry setting, not just installed (which isn’t possible if you already have an OS version that it was rolled into, so many people assume the KB article isn’t applicable at all). And since you said you weren’t sure what it did to change the behavior that hotfix addresses – the registry key name itself gives us at least a hint on what it does to do so. :) (As does this stackoverflow entry: http://stackoverflow.com/questions/11376795/why-cant-64-bit-windows-unwind-user-kernel-user-exceptions )

It seems this hotfix is for non-SP1 Windows 7. Googling number 976038 shows that this fix is from year 2011, so it definitely doesn’t resolve this.
Anyway – I tried, but I got “The update is not applicable to your computer.” for my Windows 7 SP1 x64.

The hotfix does not just require installation but also enabling via a registry setting – I assume this hotfix was rolled into SP1, but you still need to enable it via the registry setting specified in the KB!

And I forgot one thing: For this to work as you’d expect it to, you probably also want to have the DisablePagingExecutive registry setting enabled. (Which you should already have enabled anyway as a developer.)

That hot fix is for a separate issue. The stack corruption bug which this post discusses was introduced in Windows 7 SP1 so it could hardly have been fixed in a hot-fix that was released before then. However a hot-fix for the stack corruption has finally been released — see this post:

I hit this all the time, in multiple projects. It must have lost me so many hours (that have been repaid in grey hairs), and I’m sure I’m not alone. Great to have a workaround – a proper fix would help everyone who hits this but isn’t fortunate enough to know about the workarounds.

It never even occurred to me that there might be a bug, I just thought Visual Studio was mostly useless when it came to debugging crashes. Talking to people around the office, that’s what they thought too. Please fix!

I just tried this on my machine with x9000 running on windows 7 professional 64 bit and vs 2012.

I believe Microsoft has enough cash, and it’s a small risk to fix something like this. I can’t say that I’m as experienced as some of the programmers who commented, but I believe we as developers gave something to Microsoft and they should respond in a better way.
Since they have a solution, holding it back makes no sense (in the end we pay for good support and a good product with features and without bugs).

Unfortunately, no. That hot-fix adds AVX support to VS 2010, which is important if you are trying to debug AVX code. The issue discussed in this post is a Windows 7 SP1 bug that corrupts the stack when your code crashes if you have an AVX processor, regardless of whether you are using AVX. There is no way that a patch in VS 2010 can undo the stack corruption which the OS causes. Only an OS fix can correct this bug.