Process vanishing randomly

I have a server application that just disappears at random intervals. The process will vanish and the eventlog shows a ...

That's an Access Violation (0xc0000005) at some point. I cannot debug it, because it will fail randomly (sometimes while a client is connected, sometimes while connecting, sometimes while a job is running and sometimes even when idle) and it will not show up in any of my catch blocks.

We have about 30 of those servers running and in a day, about 5 will fail.

"mfc100!Ordinal2014" probably means that the error happened in version 10.0 of an MFC DLL, in the function exported as ordinal 2014. But without a function name, I have no idea how to find out what was called. My Google-fu let me down: no hits for Ordinal 2014.

I'm downloading the Windows SDK right now, but if anyone has additional hints on how to find out why my process is swallowed by eternal darkness every few hours, I'd be very happy.

I'd be checking the code surrounding the MFC calls for buffer overruns, uninitialized variables, unchecked failures of malloc() and fopen(), memory leaks, handle leaks etc. If the problem you're experiencing is unique enough not to be mentioned on Google, it is very likely a problem in your code, not the MFC code, that's causing it.

The process just vanishes, with no crash dialog, because somebody has called SetErrorMode(SEM_NOGPFAULTERRORBOX), or the per-thread equivalent on Windows 7 (SetThreadErrorMode), somewhere.

As for not catching access violation exceptions, you need to use Structured Exception Handling (Windows).
Standard C++ try/catch will not catch them.

That depends on the compilation flags: catch(...) will catch SEH exceptions if the code is compiled with /EHa in Visual Studio. There's no denying SEH's __except is better for them though, since you can actually get at the exception information.
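To illustrate the point about getting at the exception information, here is a minimal sketch of an `__except` filter that logs the exception code and faulting address. `RunGuarded` and `LogAndHandle` are made-up names for this post, not anything from the original application, and `__try`/`__except` only exist in Microsoft's compiler, so the other branch is just a stand-in that runs the job unguarded so the file still builds elsewhere.

```cpp
#include <cstdio>

#ifdef _MSC_VER
#include <windows.h>

// Filter expression helper (hypothetical name): logs what went wrong,
// then tells the system to run the __except block.
static int LogAndHandle(EXCEPTION_POINTERS* info) {
    std::fprintf(stderr, "SEH exception 0x%08lX at %p\n",
                 info->ExceptionRecord->ExceptionCode,
                 info->ExceptionRecord->ExceptionAddress);
    return EXCEPTION_EXECUTE_HANDLER;
}

// Runs `job`; returns false if it raised a structured exception
// such as an access violation (0xC0000005).
bool RunGuarded(void (*job)()) {
    __try {
        job();
        return true;
    } __except (LogAndHandle(GetExceptionInformation())) {
        return false;
    }
}
#else
// No SEH outside MSVC: this stand-in only runs the job, unguarded.
bool RunGuarded(void (*job)()) {
    job();
    return true;
}
#endif
```

Keep in mind that carrying on after an access violation is rarely safe. Something like this is mostly useful for logging where the fault happened before shutting the process down in a controlled way.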

Thanks for the help. I found out that /EHa somehow disappeared when we converted the project to 2010, so that will be back in the next version. I'm about to install the debugging tools on our production servers and attach a crash debugger to all processes, so I will at least get a full dump when another one crashes.

Okay, I'm back. And more confused than ever. From various crash dumps I came to the following conclusion: the application crashes when it tries to handle a WM_NCPAINT message with the parameters 1 and 0, where the 1 is supposed to be a handle to the region to be painted and the second parameter is not used. Although we do use the message loop for custom window messages, I neither send WM_NCPAINT nor handle it. It enters my switch block, is identified as "not one of my messages" and is passed along to the original window handler, which then crashes. I guess "1" is not a valid handle; at least it looks fishy. But what can I do? Why would a WM_NCPAINT message appear out of nowhere with invalid parameters?

None of our custom messages uses the value of WM_NCPAINT (133). All the SendMessage calls we make use our defined values. No math or typos. I could handle (ignore) WM_NCPAINT altogether, but I guess my window would look like... nothing. Is there any way to find out what's wrong, or any ideas what I could check?
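One way to rule out a collision for good is to let the compiler check it. The sketch below is only an illustration: the range boundaries (WM_USER = 0x0400, WM_APP = 0x8000, registered messages from 0xC000) are redeclared by hand so it builds without the Windows headers, and `kMsgJobStarted` / `kMsgJobDone` are hypothetical stand-ins for the real custom message IDs.

```cpp
#include <cstdint>

// Values as defined in <winuser.h>, redeclared so the sketch builds anywhere.
constexpr std::uint32_t kWmNcPaint = 0x0085;  // 133
constexpr std::uint32_t kWmUser    = 0x0400;
constexpr std::uint32_t kWmApp     = 0x8000;
constexpr std::uint32_t kWmAppEnd  = 0xBFFF;

enum class MsgRange { System, PrivateClass, Application, Registered, Reserved };

// Maps a message ID to the range it falls in.
constexpr MsgRange Classify(std::uint32_t msg) {
    return msg < kWmUser     ? MsgRange::System        // reserved for the system
         : msg < kWmApp      ? MsgRange::PrivateClass  // WM_USER .. 0x7FFF
         : msg <= kWmAppEnd  ? MsgRange::Application   // WM_APP .. 0xBFFF
         : msg <= 0xFFFF     ? MsgRange::Registered    // RegisterWindowMessage
                             : MsgRange::Reserved;
}

// Hypothetical custom messages; substitute the application's own IDs.
constexpr std::uint32_t kMsgJobStarted = kWmApp + 1;
constexpr std::uint32_t kMsgJobDone    = kWmApp + 2;

static_assert(Classify(kWmNcPaint) == MsgRange::System,
              "WM_NCPAINT is a system message");
static_assert(Classify(kMsgJobStarted) == MsgRange::Application,
              "custom message must not overlap system messages");
static_assert(Classify(kMsgJobDone) == MsgRange::Application,
              "custom message must not overlap system messages");
```

If every custom ID passes a check like this, the stray WM_NCPAINT is not one of the application's own messages under another name, and it is more likely coming from the system itself.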

A handle of 1 is a valid parameter for WM_NCPAINT; it means the entire non-client area of the window. Or at least it used to mean that. This seems to have been removed from the documentation for some reason. But even if the handle is invalid, it shouldn't cause a crash in the default window procedure. I just tested this in both an MFC and a normal Win32 app by repeatedly posting WM_NCPAINT with random numbers as wParams.
Have you examined the exact point in DefWindowProc() where it crashes? It might be hard since you don't have the source for it, but it might give some clues as to why it crashes.