Using the powers of the debugger to solve the problems of the world - and a bag of chips
by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

Tess Ferrandez

I work as a developer evangelist at Microsoft, and my job is to help
developers make the most of their skills on the MS stack.

In this blog I share tips on anything from debugging and troubleshooting to
development on platforms like Windows, Web, Windows Phone and Kinect. And also some random
tidbits about computing and my life at MS.

The export symbols are part of the dll itself. For example, ntdll.dll and kernel32.dll expose a big part of their functions as export symbols so that they can be called as APIs, but most dlls that you find in a process have a very small set of exported symbols. Generally, export symbols don’t contain parameter info for the functions, and since very few functions are exposed this way you can’t really rely on the validity of the stacks when you have only export symbols.

Public symbols contain some basic symbols such as function names and global variables, but again, not all function names are exposed in public symbols. The developer of the dll chooses what to expose as public symbols and thus he/she can hide anything that they feel would give away too much information about the implementation.

Private symbols contain pretty much everything listed in the first paragraph.

When debugging, symbols are matched up with the respective dlls or exe’s by a GUID linking the dll/exe to a symbol file (thanks Pavan for the correction). This means that if you have multiple ntdll.pdb’s in your symbol search path, the debugger will know which one corresponds to your particular version of ntdll.dll. The search path is given by .sympath, and apart from what is listed in your sympath the debugger will also look in the directory where the dll is loaded from as well as anything in the paths given in the environment variable _NT_SYMBOL_PATH.
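To make the matching idea concrete, here is a toy sketch in Python. It only models the comparison described above (pick the pdb whose GUID equals the one stored in the dll); it is not how the debugger is actually implemented, and the GUIDs and paths are made up for the illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolFile:
    path: str  # where the .pdb lives (hypothetical paths)
    guid: str  # identifier stamped into the .pdb when it was built


def find_matching_pdb(module_guid: str, candidates: list[SymbolFile]) -> Optional[SymbolFile]:
    """Return the first candidate whose GUID equals the one stored in the module."""
    for pdb in candidates:
        if pdb.guid.lower() == module_guid.lower():
            return pdb
    return None  # no match found among the candidates


# Hypothetical GUIDs and paths, invented for this example.
candidates = [
    SymbolFile(r"c:\symbols\old\ntdll.pdb", "11111111-aaaa-bbbb-cccc-000000000001"),
    SymbolFile(r"c:\symbols\new\ntdll.pdb", "22222222-aaaa-bbbb-cccc-000000000002"),
]
match = find_matching_pdb("22222222-AAAA-BBBB-CCCC-000000000002", candidates)
print(match.path if match else "no matching pdb")
```

Two files with the same name are not a problem here, because the file name never takes part in the comparison.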

So… what happens if you have the wrong symbols?

Let’s look at this stack with public symbols for mscorsvr.dll: (sorry about the formatting)

The debugger is nice and tells us that we couldn’t find the symbol file for mscorsvr, but yet it gives us a function name (from the export symbols) so it appears that we’re calling some function called GetCompileInfo, hmm… very strange, why did it pick that name?

If we were to list the symbols (export symbols for mscorsvr) we would get a list that looks like this

So we are executing something at address 791dbe6e (791d2fd5+0x8e99). If we take a peek at the symbols with public symbols loaded, we would see that that address landed inside the intermediateThreadProc function (+0x44 to be exact).

So in essence, the reason it picked the name GetCompileInfo was that that was the last symbol it could find, preceding the address that we were executing at.

You can see how this can easily get confusing. Oh and btw, not only did it give us a completely wrong function name, but if you take a closer look at the two stacks, we lost a whole function call on the stack with the export symbols (The call to WorkerThreadStart). The reason for this is that the function we were supposed to look at was at address 791b578b and this address is located before our first exported function so it can’t even resolve it to something fake.
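The "last symbol preceding the address" rule can be sketched in a few lines of Python. This is a simplified model, not the debugger's real lookup code. The address of GetCompileInfo and the two addresses being resolved come from the text above; the start of intermediateThreadProc is derived from the +0x44 offset, and printing a bare address for the unresolved case is my own simplification.

```python
import bisect


def resolve(address, symbols):
    """Name an address as 'symbol+0xoffset' using the closest symbol at or below it."""
    table = sorted((addr, name) for name, addr in symbols.items())
    starts = [addr for addr, _ in table]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        # The address lies before the first known symbol: nothing to name it after.
        return hex(address)
    addr, name = table[i]
    return f"{name}+{hex(address - addr)}"


# Only the export mentioned in the post; a real export table has more entries.
export_symbols = {"GetCompileInfo": 0x791D2FD5}

# Same table plus the function the address really belongs to.
public_symbols = {
    "GetCompileInfo": 0x791D2FD5,
    "intermediateThreadProc": 0x791DBE6E - 0x44,  # derived from the +0x44 offset
}

print(resolve(0x791DBE6E, export_symbols))  # GetCompileInfo+0x8e99
print(resolve(0x791DBE6E, public_symbols))  # intermediateThreadProc+0x44
print(resolve(0x791B578B, export_symbols))  # 0x791b578b
```

The same address gets two different names depending on how many symbols are in the table, and the third address gets no name at all because nothing in the export table precedes it.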

How do you know if your symbols are good?

For starters, just by looking at the stacks we can see that the function names seem pretty weird. It doesn’t make sense that a thread would start with lstrcmpiW, they usually start with BaseThreadStart or something to that effect. The second thing that gives it away just by looking at the stack is that we were supposed to be 0x8e99 bytes into the function GetCompileInfo, that is 36505 bytes, wow, that would be a mighty long function.

The more definite way of knowing would be to run lmv on the executable. (notice the extra m before mscorsvr for match pattern)

Lmv will give you a lot of good information about the dll, but in this particular case what we are interested in is what is printed after the module name, i.e. export symbols in this case. (You can also get this for all modules by running lm)

One last comment on this: As you may have noticed, when going to the public Microsoft symbol server you sometimes get public symbols and sometimes end up with export symbols, for mscorsvr.dll for example. This is because some builds, like certain hotfixes and private builds, have not been uploaded to the public symbol server.

Lazy symbol loading and downloading symbols for offline debugging

If you run lm you will notice that a lot of the dlls and exes will have their symbols listed as deferred. This is because the debugger doesn’t actually load all the symbols when the debugger starts (as this would take a looooooooong time, especially if you have a remote symbol store). Symbols are loaded on an as-needed basis, i.e. when you run kb (to list the stacks) or run x to examine the symbols for a particular executable, or any other command that requires symbol lookup.
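As a rough illustration of the deferred idea, here is a small Python sketch in which a module's symbols are only loaded the first time something asks for them. The class, the loader function and the symbol table are all hypothetical stand-ins; the debugger's own bookkeeping is certainly more involved.

```python
class Module:
    """A loaded dll/exe whose symbols are fetched only on first use."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable that does the expensive load
        self._symbols = None   # None means the load has not happened yet

    @property
    def status(self):
        return "deferred" if self._symbols is None else "loaded"

    def lookup(self, symbol_name):
        if self._symbols is None:  # the first lookup triggers the load
            self._symbols = self._loader(self.name)
        return self._symbols.get(symbol_name)


def fake_loader(module_name):
    # Hypothetical stand-in for reading a .pdb from disk or a symbol server.
    return {"GetCompileInfo": 0x791D2FD5}


mod = Module("mscorsvr", fake_loader)
print(mod.status)  # deferred
mod.lookup("GetCompileInfo")
print(mod.status)  # loaded
```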

Sometimes though, it is useful to download all the symbols for a process, a common example would be if your application is running on a machine with no internet access and you want to be able to live debug it.

In this case what you can do is this:

Take a dump of the process when all modules are loaded and open it on a machine with internet access.

Set the sympath to srv*c:\myappsymbols*http://msdl.microsoft.com/download/symbols;

Run .reload /i /f to force load all the symbols for the dump

This will download all the symbols available on Microsoft’s public symbol server that match your dlls into c:\myappsymbols, and then you can simply copy this folder over to the machine without internet access and set the symbol path there to srv*c:\myappsymbols*c:\myappsymbols;c:\folderwithanyadditionalsymbols.
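If the two symbol paths above look cryptic, this small Python sketch pulls them apart. It assumes only the two forms used in this post: elements separated by semicolons, where an element is either a plain directory or srv*cache*upstream. The function name and the dictionary keys are my own, and other forms of the srv syntax are not handled.

```python
def parse_sympath(sympath):
    """Split a symbol path into entries; expand srv*cache*upstream elements."""
    entries = []
    for element in sympath.split(";"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing ';'
        parts = element.split("*")
        if parts[0].lower() == "srv" and len(parts) == 3:
            entries.append({"kind": "server", "cache": parts[1], "upstream": parts[2]})
        else:
            entries.append({"kind": "directory", "path": element})
    return entries


online = r"srv*c:\myappsymbols*http://msdl.microsoft.com/download/symbols;"
offline = r"srv*c:\myappsymbols*c:\myappsymbols;c:\folderwithanyadditionalsymbols"

for entry in parse_sympath(online) + parse_sympath(offline):
    print(entry)
```

On the online machine the cache is the local folder and the upstream is the public server; on the offline machine both point at the copied folder, followed by one extra plain directory.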

But… some of the functions on my stack only show up as addresses, what is up with that?

This really doesn’t have anything to do with symbols but I thought it was worth mentioning anyways. In the callstack below we can see one function call only listed by its address… 0xd44334a. The reason for this is that it is a managed function which doesn’t have a native translation, and after all windbg is merely a native debugger. You can run !clrstack or !ip2md on the address to get the .net function name.

Technically, for debugging with windbg the answer is that you would need them very seldom or never. If you do have managed symbols and the dlls are compiled in debug mode you can get line numbers and source files. You can also get some parameter info when using !clrstack -p. But all in all, the extra “stuff” you get from managed symbols is very little when using a native debugger like windbg.

Coderjoe

14 Nov 2006 4:41 PM

Wow. I was trying to figure out how I was going to load symbols for Windows DLLs that don't seem to match the symbol packs while remote debugging without being able to access the internet to get to the symbol server (stupid VPN software redirects all traffic except DHCP and the VPN connection over the VPN, and the customer site does not allow internet access.) I'm debugging using Visual Studio, and was not aware that I could create a dump file using it. I'm currently waiting for the dump file to transfer across the VPN so I can load it while I have internet access. I don't see why this wouldn't work, though. Thanks!

(found this blog entry via a google search, fwiw)

Coderjoe

14 Nov 2006 6:50 PM

Apparently, I can create the dump using VS, but in order to download the symbols, I have to load the dump into WinDbg and do the force reload command given above. Once I do that, everything works. Thanks again!

The debugger matches the symbols not by the timestamp and checksum but by a GUID that links the dll and pdb. This GUID is generated by the compiler and is inserted into the dll and pdb headers.

Thanks.

yanglei

2 Oct 2009 4:05 AM

0:054> lmv mmscorsvr

should be

0:054> lmv m mscorsvr

Tess

2 Oct 2009 4:32 AM

Yanglei, you can do it either way

JR

12 Dec 2009 1:52 PM

I was wondering what steps are needed for offline dump analysis for .net code. I always end up xcopying the tools to the prod server because when I take a crash dump and copy it locally the managed functions are gone.

it seems as though windbg needs the actual .net exe and dll files to get the info from because none of the .net code was built debug. Is offline dump analysis of sites that use the gac and temp dlls possible / practical without copying a bunch of files along with the dump file?