A Mixed-Mode Stackwalk with the IDebugClient Interface

A native stackwalk function like StackWalk64 cannot handle mixed-mode stacks, since managed code does not use the stack in the same way as native code does. There is an API called IDebugClient that does walk a mixed-mode stack correctly, which we will explore.

Introduction

Do you need to traverse the callstack of a mixed-mode (unmanaged/managed) application, or are you curious how it can be done?
In this article, I show how the IDebugClient interface can be used to walk a mixed-mode callstack, and how to use the
IXCLRDataProcess interface to find the symbol names of managed methods.

Although this approach gives a full native callstack, it is not able to completely resolve all managed
method names. But if you are curious about the IDebugClient interface
and want to know more about how to interact with the CLR runtime, I think it can be interesting to continue reading.

Background

It all began with a wish to improve the performance of an application at work. The application is partly written in C++, partly in C#. Part of the C++ framework had already been optimized.
This was done by simply inserting a call to StackWalk64 (dbghelp.dll) at places that were called often, and writing the callstack to disk. Yes,
it is a poor-man's profiler,
but the truth is that it is effective and very easy to use. The weakness is that it does not handle managed code, so I only got a partial stacktrace.
This led me to look at alternative stackwalk APIs, each with its own pros and cons.

Alternative APIs

Below is an example of a mixed-mode callstack.

We start from the bottom, where native code calls into managed code. This managed code calls into native code, and we end up with an interleaved stack.

System.Diagnostics.StackTrace

In .NET, you can quite easily walk the stack with the System.Diagnostics.StackTrace class. Below is a sample written in C++/CLI (usable from both managed and unmanaged code).

It correctly unwinds the callstack down to the last function call MixedABCDEF.

StackWalk64

StackWalk64 lets us see native stack frames on the stack, but not the managed frames. One reason for this is that managed frames do not use the stack in the same way as native code.
Below is how you more or less use the StackWalk64 function. In order to map the instruction pointers to symbol names, one must
call SymInitialize once, and call
SymGetSymFromAddr64 for each EIP found.
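
Conceptually, the per-address lookup can be pictured as "find the nearest symbol that starts at or before this address, and report the offset from it". The following is a minimal portable C++ sketch of that idea only. It does not call dbghelp; the SymbolTable type, the ResolvedSymbol struct, and the resolve function are hypothetical names made up for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Result of a lookup: the symbol name and how far into it the address lies.
struct ResolvedSymbol {
    std::string name;            // empty if nothing was found
    std::uint64_t displacement;  // offset of the address from the symbol start
};

// Hypothetical symbol table: start address -> function name.
using SymbolTable = std::map<std::uint64_t, std::string>;

// Finds the symbol with the greatest start address that is <= `address`.
ResolvedSymbol resolve(const SymbolTable& symbols, std::uint64_t address) {
    // upper_bound gives the first symbol that starts after `address`.
    auto it = symbols.upper_bound(address);
    if (it == symbols.begin()) {
        return ResolvedSymbol{"", 0};  // address precedes every known symbol
    }
    --it;
    return ResolvedSymbol{it->second, address - it->first};
}
```

Because this toy table stores no symbol sizes, any address past the last symbol resolves to that symbol with a large displacement. A real symbol handler also knows where each function ends.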

There are symbols that cannot be found. They actually belong to kernel32 and msvcrt. They should have been resolved, and with a little troubleshooting they probably can be.
Remember that SymInitialize is asynchronous: it returns, but symbol files are loaded in the background. If you try to resolve an address before the symbol file has been loaded, you will get an error.

What I wanted to show was that the managed frames are not displayed. They are displayed as function calls within the CLR runtime, which isn't very helpful.

The stack looks very similar to mine. It is a bit better, because it correctly resolves functions from kernel32 and msvcrt.
But look closely: there are still addresses that cannot be resolved. Apparently, a normal stackwalk gets confused: "WARNING: Frame IP not in any known module. Following frames may be wrong."
Normally, DLLs get loaded into a memory range, and their code is located within that range. Assemblies are also loaded, but don't contain any executable code.
The JIT compiler takes the IL code and generates machine code, which it puts on the heap. A native stackwalker only sees the generated code, which is not in any loaded module, so the warning is correct.
The stackwalker doesn't know anything about the IL code, nor can it use the PDB files correctly, because they map to the IL code and not to the machine-dependent code.
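
To make the "not in any known module" warning concrete, here is a minimal C++ sketch of the range check a native stackwalker effectively performs for each frame. The Module struct, the findModule helper, and any addresses you feed it are hypothetical; a real walker obtains the module ranges from the OS loader.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a loaded module covering [base, base + size).
struct Module {
    std::string name;
    std::uint64_t base;
    std::uint64_t size;
};

// Returns the module containing `address`, or nullptr if the address lies
// outside every loaded module, which is the case for JIT-compiled code
// placed on the heap.
const Module* findModule(const std::vector<Module>& modules,
                         std::uint64_t address) {
    for (const Module& m : modules) {
        assert(m.size > 0);  // an empty range cannot contain code
        if (address >= m.base && address - m.base < m.size) {
            return &m;
        }
    }
    return nullptr;
}
```

A frame whose instruction pointer makes findModule return nullptr is exactly the kind of frame the debugger warns about: there is no module, and therefore no native symbol file, to consult for it.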

WinDbg with SOS extension

SOS is a WinDbg extension for debugging managed applications. It is capable of walking mixed
stack frames with the !clrstack command. Let's see how well it performs.

Discarding System.Diagnostics.StackTrace

The StackTrace class works great, but it has disadvantages.

Firstly, it uses Reflection and is really slow. Secondly, the old way of manually instrumenting code is not good any more. The stacktrace calls cannot be left in the source code,
and adding and removing them each time would be time-consuming. Thirdly, I didn't really know where we had performance problems and where the stack should be traced.
What I needed now was to take stacktrace samples and, based on their frequency, be pointed in the general direction of the problem.
This is also known as sample profiling. An external application hooks up to a target app, and at regular intervals, e.g., every 20 ms, it takes a stacktrace sample.
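
The "based on the frequency" part can be sketched in a few lines of portable C++. This is only an illustration of the aggregation step, assuming the samples have already been collected and resolved to function names. StackSample, rankByInclusiveHits, and estimatedMilliseconds are hypothetical names, not part of any profiler API.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One stack sample: function names, innermost frame first.
using StackSample = std::vector<std::string>;

// Counts in how many samples each function appears at least once
// ("inclusive" hits) and returns the functions sorted by descending count.
std::vector<std::pair<std::string, int>>
rankByInclusiveHits(const std::vector<StackSample>& samples) {
    std::map<std::string, int> hits;
    for (const StackSample& sample : samples) {
        // Deduplicate within one sample so recursion is not counted twice.
        std::vector<std::string> frames(sample);
        std::sort(frames.begin(), frames.end());
        frames.erase(std::unique(frames.begin(), frames.end()), frames.end());
        for (const std::string& fn : frames) {
            ++hits[fn];
        }
    }
    std::vector<std::pair<std::string, int>> ranked(hits.begin(), hits.end());
    // stable_sort keeps std::map's alphabetical order among equal counts.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });
    return ranked;
}

// Rough time estimate: each hit stands for one sampling interval.
int estimatedMilliseconds(int hits, int intervalMs) {
    assert(intervalMs > 0);
    return hits * intervalMs;
}
```

With a 20 ms interval, a function that shows up in 3 samples was on the stack for roughly 60 ms. The estimate is statistical, so it only becomes meaningful with many samples.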

Where to find information about possible solutions

One possibility is to try to reverse engineer the SOS extension.

Another is to look for a suitable CLR API, for example in mscoree.h and related includes.

If you want to know more about the CLR runtime and how to interact with it through mscoree and the CLR hosting interfaces,
I can recommend reading Customizing the Microsoft® .NET Framework Common Language Runtime.
The CLR APIs might not be enough. But a wonderful source of inspiration is the Rotor source code, the implementation of an unoptimized CLR runtime written by Microsoft
for standardization purposes.

The implementation

We will explore two interfaces: IDebugClient, which is said to give the full stacktrace, and
IXCLRDataProcess (mscordacwks.dll), which translates managed addresses into readable method names.

IDebugClient

Microsoft has been kind enough to provide an API that can walk mixed frames: IDebugClient,
which is exposed by dbgeng.dll.

It is fairly straightforward to use the API. There are several samples on the internet.
You create an object through a special function called DebugCreate and feed it the GUID of the
IDebugClient interface.

I actually ran into problems with the attach. It sometimes worked, sometimes not. It worked when I debugged it. It even worked when I added a sleep after the attach.
I found out what it was. The attach call only initiates the attach, which takes some time to complete, so the object isn't really ready when the call returns.
What we can do is set the execution status of the target app to "go". The process is already running (it was never suspended),
so the call will hopefully return immediately. But here is the clever thing: it returns only when the debugger is properly attached.

I made one big mistake, which took some time to fix. Since I was just interested in the function
IDebugControl4::GetStackTrace,
and I wasn't going to use it for stepping, setting breakpoints, etc., I didn't bother to implement the callback functions for printing to screen,
and the thing kept crashing on me when trying to get the interfaces. Come on! Couldn't someone have inserted an extra test to see whether
users of the API were interested in the events or the output? I implemented these debug output callback classes too.
Well, I left the function bodies totally empty. I wasn't interested in it anyway.

CLRDataCreateInstance

CLRDataCreateInstance (header: clrdata.idl, library: CorGuids.lib) can return a COM object with
the interface IXCLRDataProcess. This object can enumerate tasks, appdomains, methods, etc.
It also contains functions for mapping addresses or internal CLR IDs to methods, classes, assemblies, etc. Quite neat. This might be what we need.

A small problem is that the IXCLRDataProcess interface doesn't have a header file, but you can generate one from
xclrdata.idl
which is part of the Rotor source code.
According to the license, it is allowed to use the source code for non-commercial purposes.

Some of the Rotor source code can be non-trivial to understand. I want to give credit to Steve's Blog,
which gives some useful instructions, but unfortunately, Steve supplies no source code. So, there was actually quite
a lot of implementation and debugging work left to do.

Implementing ICLRDataTarget

In order to create an IXCLRDataProcess, the function CLRDataCreateInstance expects an ICLRDataTarget object
that interacts with the managed application. It is an interface that needs to be implemented by the user. I have no idea why a default implementation of this interface doesn't already exist.
It does only basic stuff such as reading and writing to raw memory, returning pointer size, etc.
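
To give a feel for how little such an object has to do, here is a simplified, non-COM stand-in written in portable C++. It is not the real ICLRDataTarget (which is a COM interface with methods such as GetPointerSize and ReadVirtual); the class name SnapshotDataTarget and its members are hypothetical, and it reads from an in-memory snapshot rather than from a live process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Simplified stand-in for a data target: answers "how big is a pointer?"
// and "give me these bytes of target memory" from a captured snapshot.
class SnapshotDataTarget {
public:
    SnapshotDataTarget(std::uint64_t baseAddress, std::vector<std::uint8_t> bytes)
        : base_(baseAddress), bytes_(std::move(bytes)) {}

    // x86 only, matching the corner cut in this article: 4-byte pointers.
    std::uint32_t pointerSize() const { return 4; }

    // Copies up to `count` bytes starting at `address` into `buffer`.
    // Returns the number of bytes copied (0 if the address is out of range).
    std::size_t readVirtual(std::uint64_t address, void* buffer,
                            std::size_t count) const {
        assert(buffer != nullptr);
        if (address < base_ || address - base_ >= bytes_.size()) {
            return 0;
        }
        const std::size_t offset = static_cast<std::size_t>(address - base_);
        const std::size_t available = bytes_.size() - offset;
        const std::size_t n = count < available ? count : available;
        std::memcpy(buffer, bytes_.data() + offset, n);
        return n;
    }

private:
    std::uint64_t base_;
    std::vector<std::uint8_t> bytes_;
};
```

In an implementation against a live process, the body of the read would typically forward to ReadProcessMemory, and the methods would return HRESULT values instead of byte counts.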

I cut some corners doing the implementation. I only support the x86 architecture. Managed apps compiled for the "Any CPU" platform can run in either x86 or x64 mode depending
on the OS hosting them, but in Visual Studio 2010, the default is actually the x86 architecture. Apart from that, I wanted x86 to work first, before I tried x64.
Always do the easy case first. When it works, we extend.

How do you know if a process is a .NET process? The PE file header, present in all executables, contains that information.
A simpler way is to check whether certain CLR modules have been loaded, like clr, clrjit, mscorlib_ni, mscoree, etc.

How do you know if it is CLR v4.0? Look for clr.dll: mscorwks.dll was renamed to clr.dll between v2.0 and v4.0. This might give false positives if someone names their own module
clr.dll,
but isn't this article about mixed-mode/managed apps anyway?
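
The module-name check can be sketched as follows, assuming the list of loaded module names has already been obtained from the target process. The ClrVersion enum and the detectClr and toLowerAscii helpers are hypothetical names for illustration. Preferring clr.dll when both runtimes are loaded is a design choice of this sketch, not something the article's code is known to do.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

enum class ClrVersion { None, V2, V4 };

// Lowercases an ASCII module name so the comparison is case-insensitive,
// as file names are on Windows.
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// clr.dll indicates CLR v4.0; mscorwks.dll indicates the older runtime.
ClrVersion detectClr(const std::vector<std::string>& moduleNames) {
    bool hasMscorwks = false;
    for (const std::string& raw : moduleNames) {
        const std::string name = toLowerAscii(raw);
        if (name == "clr.dll") {
            return ClrVersion::V4;
        }
        if (name == "mscorwks.dll") {
            hasMscorwks = true;
        }
    }
    return hasMscorwks ? ClrVersion::V2 : ClrVersion::None;
}
```

The result is what later decides from which framework directory the matching mscordacwks.dll has to be loaded.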

Obtaining the IXCLRDataProcess object

To create the IXCLRDataProcess object, we need to call CLRDataCreateInstance, located in the data access
DLL named mscordacwks.dll.
Remember that the DLL is CLR version dependent, so it must be loaded from the correct file location. That is why we check the CLR version.
Then we have to manually load the library into memory and call GetProcAddress to get the address of
CLRDataCreateInstance.
Finally, we call the function, giving it an instance of our ICLRDataTarget.

I am sorry about all the TCHARs, chars, std::strings, and
std::wstrings you might find in my code. A TCHAR is a
wchar_t when compiled for Unicode, and a char when compiled for multibyte.
Regardless of how it is compiled, StackWalk64 always uses chars, but most Win32 APIs adapt
themselves. It can be messy sometimes when you have to convert back and forth.

Resolving Managed Method names using IXCLRDataProcess::Request

In Steve's blog, you can read about something called
DacpMethodDescData and IXCLRDataProcess::Request.
It is a generic interface that takes an enum value describing what type of data you want, a pointer to the input parameter, and a pointer to the output parameter.

A powerful interface if you know what enum values to send in; otherwise you
are doomed. It gives the same info as GetRuntimeNameByAddress.
Below is a code snippet, which you can also find in the attached source code.

Sample apps

There are three applications present in the demo folder. They look in the current folder for
PDB files.
They also look in C:\symbols, and they try to download PDB files from Microsoft and store them in
C:\symbols.
Without these symbols, StackWalk64 might get lost very quickly, since it doesn't know about the calling convention, omitted frame pointers, and other optimizations.
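
A search order like this is usually expressed as a single symbol path string handed to the symbol handler, for example as the search path argument of SymInitialize. The helper below is a small sketch of how such a string can be assembled. The buildSymbolPath name is hypothetical, and the server URL used in the usage note is an assumption (Microsoft's public symbol server), since the article does not name it.

```cpp
#include <string>

// Builds a dbghelp-style symbol search path: a local directory first, then
// the cache directory, then a symbol server entry ("srv*cache*url") that
// stores downloaded files in the same cache directory.
std::string buildSymbolPath(const std::string& localDir,
                            const std::string& cacheDir,
                            const std::string& serverUrl) {
    return localDir + ";" + cacheDir + ";srv*" + cacheDir + "*" + serverUrl;
}
```

Calling buildSymbolPath(".", "C:\\symbols", "https://msdl.microsoft.com/download/symbols") yields a path that checks the current folder, then C:\symbols, and only then goes to the network. The srv* syntax relies on symsrv.dll being available next to dbghelp.dll.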

Start CppCliApp.exe first. It prints its process ID, and outputs a
StackWalk64 and a System.Diagnostics.StackTrace callstack. Use the
process ID when you run the other apps.

We managed to get a full stacktrace from a managed app. It even resolved the addresses that WinDbg failed on, thanks to
mscordacwks.dll.
But my own managed classes still don't appear. This is unfortunate. The calls to
clr!xxx make absolute sense if we think about it.
IL code cannot run; it must be JITted to machine code, but the CLR probably has
a native function that executes JITted code.
It is this function that we see.

On a purely managed app, I actually get a much better stacktrace. Many CLR functions show up in readable code, but my own managed method names are still hiding.

The IDebugClient interface is supposed to be able to walk the callstack. I don't know why it fails. The difference is that here I have a managed C# app
at the bottom that calls into mixed-mode/managed libraries. The other app was a C++/CLI app that called into mixed-mode/managed libraries. The libraries are the same.

After all that, it was not the result I expected. Even though I got the full stacktrace, using
IDebugClient and IXCLRDataProcess is not enough.

The solution

The solution is to use the ICorProfiler interface instead. It allows you to create an in-process profiler that interacts with the CLR.
It contains code for walking mixed-mode apps. In-process means that it is a DLL that loads into the process space of the target process.
This also means that we can say goodbye to the IDebugClient interface, since it is not possible to attach a debugger to
the same process we make the attach from.
I have written a small sampling profiler too, but that will be another article.

Points of interest

There are some great sources on the internet: "Profiler stack walking: Basics and beyond" and
"Building a mixed mode stack walker".
The last link made me realise that what I really needed was the ICorProfiler interface. But at that point
I had already done 95% of what I have just shown you. So I decided to finish it anyway.
I made a brave attempt, and I learned a great deal along the way. I hope some of this information can be useful for you too.

For people analyzing memory dumps of .NET apps using WinDbg and SOS, mscordacwks.dll
is a must to know about. A memory dump of a .NET app from one machine cannot
simply be copied to another machine for analysis. The SOS extension must load
the correct version of mscordacwks.dll in order to understand the memory dump.
If the analysis machine doesn't have exactly the same .NET Framework version as the
machine where the memory dump was saved, SOS cannot understand the data. To overcome this problem,
mscordacwks.dll should be copied along with the dump file.

About the Author

Mattias works at Visma, a leading Nordic ERP solution provider. He has good knowledge in C++/.Net development, test tool development, and debugging. His great passion is memory dump analysis. He likes giving talks and courses.