Thursday, June 28, 2007

Just got directed (via Richard Seroter) to Rob Steel's blog on BizTalk performance. He is putting together what he refers to as the BizTalk Performance Lab Delivery Guide, which is the process his team has established for successful BizTalk testing. Go check it out at BizTalk Performance Explorer!

If you've ever been tasked with testing BizTalk, you can get a jumpstart from reviewing this blog. If you're managing a BizTalk testing engagement, or any testing engagement for that matter, you'll still find good information around lab processes. He's also posted links to several other relevant pieces of BizTalk testing information.

Open Windbg, select File | Open Crash Dump… and select the *.dmp you want to analyze.

Since we are analyzing managed code, we need a little help in the form of a Windbg extension. The extension we are interested in is SOS.DLL (Son of Strike), which is included in the Debugging Tools for Windows installation and allows Windbg to understand CLR constructs such as the managed heap.

To load the SOS extension, type the following in Windbg:

0:068> .load clr10\sos.dll

Alternatively, you can pass the explicit path to the SOS that matches the version of the .NET Framework you're targeting.

0:068> .load C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\sos.dll

Typically, when a snap is created as the result of an exception (-crash), the debugger will identify which thread raised the exception and position the Windbg context on that thread. One of the first things you want to do is take a look at the call stack. The managed call stack can be viewed by using !clrstack. This will produce something similar to below.
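To view the managed call stack for the thread the debugger has positioned you on, type the following in Windbg (same session as above):

0:068> !clrstack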

Note that in order to get great detail around the stack, you'll need to have debug symbols available for Windbg to resolve some of the names and information. If you have source code available as well, you can get to the actual source and line data!

A quick review of the managed threads will show if there are any threads that have exceptions. For example, the output of !threads shows that we have two threads with exceptions: TIDs 63 and 68. Notice how the debugger has already positioned us on TID 68?
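The thread and exception summary comes from the SOS !threads command:

0:068> !threads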

Notice that we’ve already got useful information, the _message object has already been dumped for us. However, we could use !dumpobject 0x0110f9e4 and inspect it ourselves. If we do so, we find that it’s actually a reference to a System.String.

Looking at the contents of the EDI register, we have the address 0x051a1a28. If we dump the address in this register, we see that it is indeed a string with the value of ‘MC’. This value obviously cannot be converted into a double, so a System.FormatException is thrown. The astute observer will notice that the address can be traced back up the call stack to the very top. Recognizing that sort of pattern, and a little insight as to what’s going on, will go a long way!
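Assuming the same session, the string can be inspected directly by dumping the object at the address we saw in EDI:

0:068> !dumpobject 0x051a1a28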

Conclusion

I actually contrived this issue just a little. Not much, granted: an unhandled System.FormatException will terminate a process just as fast as a System.OutOfMemoryException will. However, I didn't have any sample code illustrating an unhandled exception; each of us can create that in our sleep. What I did have was a live test case in our BizTalk staging environment where, while doing a little issue resolution on another exception type, I noticed this little guy being thrown. Not often, but I was seeing them.

This System.FormatException obviously wasn't terminating the process; we'd have noticed it much sooner if it had! That means it was being handled somewhere, either in our code base or in BizTalk. By attaching Windbg to the BTSNTSvc.exe process and giving the debugger first chance at any exceptions (Adplus -crash -p xxxx -FullOnFirst), I was able to catch the exception in action. Reviewing the resulting call stack shows that this exception was happening during the execution of a BizTalk map, from an orchestration, specifically when a particular functoid was being invoked: IsNumeric. It seems that IsNumeric is simply attempting to parse the value into a System.Double. Presumably, if it can, it returns true. If it can't, the System.FormatException is thrown, but then caught inside IsNumeric, which returns false. This catch uncovered a bug in the map that hadn't yet been flushed out in testing.
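The pattern described above is the classic parse-and-catch numeric test. Here is a minimal sketch of that pattern in Python, purely for illustration; the real functoid is .NET code, and the function name is_numeric here is hypothetical:

```python
def is_numeric(value):
    """Return True if value parses as a floating-point number.

    Mirrors the parse-and-catch pattern described above: the parse
    attempt raises on failure, and the handler turns that into False.
    """
    try:
        float(value)       # analogous to parsing into a System.Double
        return True
    except ValueError:     # analogous to catching System.FormatException
        return False

print(is_numeric("3.14"))  # a numeric string parses: True
print(is_numeric("MC"))    # the value from the crash dump: False
```

This is exactly why the exception showed up under -FullOnFirst even though the process kept running: the exception is thrown and caught internally on every non-numeric input, so a first-chance debugger sees it while the application never does.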

Last time we discussed how to snap a process; let's build on that a little further.

Scenario

An exception is not being handled, and it causes the process to terminate abnormally. The runtime has given every caller in the call stack a chance to handle the exception and none has done so. This type of exception is then passed back to the kernel, which invokes the default exception handler; that handler terminates the process to preserve system integrity.

Identify the Process

Typically, you know which process is crashing; it’s your process after all. Open up your favorite system tool and take note of the process identifier (PID).
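For example, tasklist (included with Windows) can list running processes with their PIDs and can filter by image name:

C:\> tasklist /FI "IMAGENAME eq BTSNTSvc.exe"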

If you don’t own the process, it gets a little trickier. For example, all BizTalk code executes in one, or more, instances of the BTSNTSvc.exe host process. ASP.NET has a similar host model, at least with IIS 6.0, in that code can run in multiple instances of the IIS worker process W3WP.exe. Reporting Services also has its multiple hosts: ReportingServices.exe and W3WP.exe.

Unfortunately, using the standard Windows process tool, Task Manager, you may not be able to get enough information to identify the specific process your code is executing in. Never fear! Process Explorer is here! Process Explorer, now part of Microsoft Sysinternals, is the end-all, be-all system information tool. It can help you identify which process instance you are truly interested in.

For BizTalk, using Process Explorer, you can identify the command line of the specific service instance you're interested in. Remember, since BizTalk has a generic host instance for code to run in, the way it determines which code will run where is by GUID. This GUID, along with a user-friendly host name, is passed to the service at startup on the command line. You can view this information by adding the 'Command Line' column in Process Explorer.

For Reporting Services and ASP.NET applications, the name of the IIS Application Pool will be passed on the command line as well. If you are in a situation where you have multiple W3WP.exe processes running for a given Application Pool, it gets a little more complicated. You can manually attach/snap the individual processes based on the command line identifiers. Fortunately, the tools team realized how much programmers hate repetition and gave us an easy out: just pass the -iis flag to Adplus. This will probably create more dumps than you're interested in, but you'll be sure you get everything. It will dump each instance of an IIS-related process: INETINFO.EXE, DLLHOST.EXE, and W3WP.EXE.

Alternatively, you can use the -pn flag and execute the same Adplus commands against all processes with the same name.
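For example, the following would attach a crash-mode debugger to every running instance of the BizTalk host process (assuming the default host process name):

adplus -crash -pn BTSNTSvc.exe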

Now that you've got the correct process instance you're interested in, it's time to actually set the trap and capture the crash!

Capture the Crash

1. Open a command prompt.
2. Change to the Debugging Tools for Windows directory.
3. Execute adplus -crash -p xxxx (where xxxx is the PID of the process you want to capture the crash from, e.g. 1234).

This will open a new debugger host process attached to the specified process. Now you either wait for the crash to happen during the normal course of events or, if you know how to trigger the crash, do so to speed things along. E.g., you know that when BizTalk processes a particular message, a certain report is executed, or a certain user event in your web application triggers the unhandled exception.

Of course, this only captures those exceptions which are unhandled by anyone. What happens when you suspect there are exceptions that are being handled? That's where the -FullOnFirst flag for Adplus comes into play. When using -FullOnFirst, Adplus will create a full memory snap the first time it sees an exception.

Capture the Crash (First Chance Exceptions)

1. Open a command prompt.
2. Change to the Debugging Tools for Windows directory.
3. Execute adplus -crash -p xxxx -FullOnFirst (where xxxx is the process ID of the process you want to capture the crash from).

One word of warning: if this is something that happens frequently, you can get a LOT of snaps in a very short period of time. Remember, since we are snapping the state of the process, each dump includes all of its memory. If it's 400MB in memory, it will be 400MB on disk!