Using the powers of the debugger to solve the problems of the world - and a bag of chips
by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

I work as a developer evangelist at Microsoft, and my job is to help
developers make the most of their skills on the MS stack.

In this blog I share tips on anything from debugging and troubleshooting to
development on platforms like Windows, Web, Windows Phone and Kinect, along with
some random tidbits about computing and my life at MS.

.NET Debugging Demos Lab 2: Crash

It was nice to see that so many people have already downloaded the demo site and checked out the instructions for the first lab. Thanks to Pedro for pointing out that the original demo site required .NET Framework 3.5. I've changed it now, so the version you can download from the setup instructions page should not require .NET Framework 3.5 (even though I would encourage you to download 3.5 and play around with it anyway :)).

2. Click on the Refresh button in the reviews page. This will crash the w3wp.exe process (or aspnet_wp.exe on IIS 5)

Note: If you have Visual Studio installed, a Just-In-Time Debugger message may pop up (just click No for the purposes of this exercise). However, since this message box will sit there and wait for user input before the app can shut down, you may want to disable JIT debugging if you have Visual Studio installed on a test system.

Examine the event logs

1. Open the Application and System event logs. The information in the event logs will differ based on the OS and IIS version you are running. Among other events you may have a System event looking something like this...

2. Open a command prompt, move to the debuggers directory, type in "adplus -crash -pn w3wp.exe" and hit enter.

Q: A new window should appear on the toolbar, what is it?

Q: What is the debugger waiting for? Hint: Check the help files for ADplus/crash mode in windbg

3. Reproduce the issue by clicking on the refresh button in the reviews page.

Q: What files got created in the dump folder? Note: The dump folder will be located under your debuggers directory, with a name made up of crash_mode and today's date and time.
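If you run the lab several times you end up with several dump folders, so here is a minimal sketch (in Python, not part of the lab itself) for listing the contents of the most recent one. It assumes only what the note above says, namely that the folder name starts with crash_mode. The function names are made up for this example.

```python
import os
import sys


def newest_crash_folder(debuggers_dir):
    """Return the most recently modified sub-folder whose name starts
    with 'crash_mode' (case-insensitive), or None if there is none."""
    candidates = [
        entry.path
        for entry in os.scandir(debuggers_dir)
        if entry.is_dir() and entry.name.lower().startswith("crash_mode")
    ]
    return max(candidates, key=os.path.getmtime, default=None)


def list_dump_files(debuggers_dir):
    """Return the sorted file names in the newest crash_mode folder."""
    folder = newest_crash_folder(debuggers_dir)
    if folder is None:
        return []
    return sorted(os.listdir(folder))


if __name__ == "__main__":
    # Pass your debuggers directory as the first argument;
    # the current directory is used if none is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in list_dump_files(target):
        print(name)
```

Run it with the path to your debuggers directory as the only argument.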

Open the dump in windbg

1. Open the dump file labeled 2nd Chance CLR Exception in windbg (file/open crash dump). Note that this dump got created just before the 1st chance process shutdown.

Note: if you throw an exception (.net or other) you have a chance to handle it in a try/catch block. The first time it is thrown it becomes a 1st chance exception and is non-fatal. If you don't handle the exception it will become a 2nd chance exception (unhandled exception) and any 2nd chance exceptions will terminate the process.
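As a rough analogy only: the difference between a handled exception and an unhandled one that takes the process down can be seen in any runtime. The sketch below uses Python child processes. Python has no 1st/2nd chance notifications (those are what the debugger sees on Windows), so this only illustrates the handled vs. unhandled outcome. The names HANDLED, UNHANDLED and run are made up for this example.

```python
import subprocess
import sys

# The exception is caught, so the child process keeps running.
HANDLED = """
try:
    raise ValueError("handled")
except ValueError:
    pass
print("still running")
"""

# Nobody catches the exception, so the child process terminates.
UNHANDLED = """
raise ValueError("nobody catches this")
print("never reached")
"""


def run(snippet):
    """Run a snippet in a child interpreter and return
    (exit code, text written to standard output)."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    print("handled:  ", run(HANDLED))
    print("unhandled:", run(UNHANDLED))
```

The handled case exits normally, while the unhandled case ends the child process with a non-zero exit code before it reaches its print statement.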

In a crash dump, the active thread is the one that caused the exception (since the dump is triggered on an exception).

Q: Which thread is active when you open the dump? Hint: check the command bar at the bottom of the windbg window.

Examine the callstacks and the exception

1. Examine the native and managed callstacks.

kb 2000

!clrstack

Q: What type of thread is it?

Q: What is this thread doing?

2. Examine the exception thrown

!pe

Note: !pe/!PrintException will print out the current exception being thrown on this stack if no parameters are given

Q: What type of exception is it?

Note: In some cases, like this one where the exception has been rethrown, the original stack trace may not be available in the exception. In cases like this you may get more information if you find the original exception.

3. Look at the objects on the stack to find the address of the original exception

!dso

Q: What is the address of the original exception?

Hint: Look at your previous !pe output to see the address of the rethrown exception. Compare this to the addresses of the objects on the stack. You should have multiple exceptions, a few with the rethrown exception address, but one of the bottommost exceptions will be the original one (look for one with a different address).
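If the !dso output is long, the comparison in the hint can be sketched as a small script (Python, not part of the lab). It assumes you have pasted the !dso output into a string and that each object line has three columns: stack location or register, object address, type name. The addresses and type names in SAMPLE are invented for illustration and are not taken from the lab's dump.

```python
import re

# stack location or register, object address (hex), type name
DSO_LINE = re.compile(r"^(\S+)\s+([0-9a-fA-F]{8,16})\s+(\S.*)$")

# Invented sample data, NOT taken from the lab's dump.
SAMPLE = """\
OS Thread Id: 0x1f0c (18)
ESP/REG  Object   Name
02a7f510 0a1b2c30 System.ArgumentException
02a7f558 0a1b2c30 System.ArgumentException
02a7f5a4 0a1b2b10 System.FormatException
02a7f5c0 0a1b1f00 System.String
"""


def normalize(address):
    """Lower-case an address and drop an optional 0x prefix."""
    address = address.strip().lower()
    return address[2:] if address.startswith("0x") else address


def original_exception_candidates(dso_output, rethrown_address):
    """Return (address, type name) for each distinct exception object
    on the stack whose address differs from the rethrown exception."""
    rethrown = normalize(rethrown_address)
    found = []
    for line in dso_output.splitlines():
        match = DSO_LINE.match(line.strip())
        if not match:
            continue  # header lines and anything unparsable
        address = normalize(match.group(2))
        type_name = match.group(3).strip()
        if not type_name.endswith("Exception"):
            continue
        if address != rethrown and (address, type_name) not in found:
            found.append((address, type_name))
    return found


if __name__ == "__main__":
    for address, type_name in original_exception_candidates(SAMPLE, "0a1b2c30"):
        print(address, type_name)
```

Any address it returns is only a candidate; you still verify it by running !pe on that address.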

4. Print out the original exception and look at the information and the callstack

!pe <original exception address>

Q: In what method is the exception thrown?

Q: What object is being finalized?

Note: you could actually have gotten this information by dumping out the _exceptionMethodString of the rethrown exception as well, but with !pe of the original exception you get the information in a cleaner way.

Q: Normally exceptions thrown in ASP.NET are handled with the global exception handler and an error page is shown to the user. Why did this not occur here? Why did it cause a crash?

Examine the code for verification

1. Open Review.cs to find the destructor/finalizer for the Review class

Q: Which line or method could have caused the exception?

As an extra exercise you can also examine the disassembly of the function to try to pinpoint better where in the function the exception is caused.

Although I probably wouldn't recommend running 200 apps per app pool because of how much memory would be used per process (you would likely get OOMs just from the dlls loaded alone), your question is very valid.

The finalizer thread is common to all apps in the process, but for all other threads you can check out the threads in !threads and find which appdomain the code is running in by running !dumpdomain on the domain in the Domain column.

It's obvious that the 100% CPU utilization is a problem (that has been solved already). My question is if and how the Threadpool uses the current CPU utilization for scheduling WorkItems or to control how and if new worker threads are created (the Threadpool is primarily used for async Socket Operations (HttpListener) within this project).

It seems like no new threads are started, even if the current number of Threadpool threads is below the Max-Threads (which could make sense because there would be no resources available for the new thread).

So ... how can the values (Total, Running, Idle, MaxLimit and MinLimit) be interpreted?

First of all: No. This is not related to this lab. I just read the Review posting where you said that questions - even not directly related to the lab - are welcome.

Thank you for sharing this information! Is the 80% threshold hardcoded or configurable?

Maybe you can also explain the Completion Port Thread values (Total, Free, MaxFree, CurrentLimit, MaxLimit and MinLimit) too? I've already searched the web over and over but didn't find anything about them.

It's totally cool to ask questions not related to the lab :) I just wanted to make sure that the lab didn't behave like that on your machine.

The completion port threads are pretty much the same. Completion ports are mostly used for callbacks but can be used for work items too if there are available completion port threads but no available worker threads.

The 80% is hard coded, but in reality there is no use in changing it since you really can't do much with new threads at that CPU level anyway.
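To make the discussion a bit more concrete, here is a simplified sketch (Python) of the kind of check being described. This is not the CLR's actual code. The function name, the parameters and the exact comparison at the boundary are assumptions made for illustration. The only things taken from this thread are the 80% figure and the observation that no new threads are started at high CPU even when the thread count is below the max limit.

```python
# Figure taken from the discussion above; how the real thread pool
# treats a reading of exactly 80% is an assumption in this sketch.
CPU_THRESHOLD_PERCENT = 80


def may_start_worker(cpu_percent, total_threads, max_limit, queued_items):
    """Hypothetical model: decide whether a new worker thread may start."""
    if queued_items == 0:
        return False  # nothing is waiting, so no thread is needed
    if total_threads >= max_limit:
        return False  # already at the configured maximum
    # Assumed rule: at or above the threshold a new thread would only
    # compete for an already saturated CPU, so none is started.
    return cpu_percent < CPU_THRESHOLD_PERCENT


if __name__ == "__main__":
    print(may_start_worker(50, 10, 100, 5))  # below threshold and below max
    print(may_start_worker(95, 10, 100, 5))  # CPU too high
```

In this model, being below MaxLimit is necessary but not sufficient for a new thread to start, which matches the behavior described in the question.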

Andrew Lomakin

14 Feb 2008 5:43 AM

Hello Tess!

Great post, it gives very important knowledge needed for a newbie .NET crash analyst :)

I have a situation where the DFS management snap-in is crashing due to a null-reference exception (0x80004003), and I've gone a long way to identify at what level the exception occurs (let me know if you're curious enough to look at the dump, you should have my email somewhere hopefully). Eventually I came to the CLR thread stack where the exception occurs, but I'm stuck, because I want to observe what parameters are passed to each function in the stack. I went through your blog posts, and Johan's, and I can't seem to find a way to do this. Can you advise a little bit please? I've been banging my head against the table about this case for weeks now.

Yepp, I remember you:) I can't really commit to look at any dumps but I can give you some pointers.

0x80004003 is not really a clr exception, and I am not sure based on your comment if you are actually stopped at the exception or just see it on the heap. If you got it from the heap you won't be able to inspect the parameters etc. so in that case you would have to set up debug diag or an adplus config file to get a full dump on 0x80004003. Check the windbg help files for adplus config for more info on that...

If you are stopped on the exception you can either use !clrstack -p to find the parameters or, if that doesn't help, you can try !dso to see the objects on the stack,