Asker

WPR and managed call stacks

Question

I have noticed that I can get managed call stacks with the newest version of WPR, which is part of the Windows 8 SDK. I would like to use it to find bottlenecks in an application I am developing with VS, so it would be nice to get full call stacks. When I create a small sample app with a decently nested call stack, I get back only the currently executing managed method, in this case F9.

Line #  NewThreadId  NewThreadStack                                              Count  Ready (us)  Ready (us)  Waits (us)   Waits (us)  Count:Waits  OldInSwitchTime (us)  % CPU Usage
2       11128        [Root]                                                      17680  107939,480  195,067     4261686,154  112927,851  17390        1642470,502           0,41
3                    |- FastEventLogReader.exe!FastEventLogReader.Program::F9    17630  107009,457  195,067     3833861,204  3667,100    17354        1623974,498           0,39
4                    |- ntdll.dll!RtlUserThreadStart                             36     646,254     106,749     394412,491   112927,851  25           15579,220             0,02
5                    |- ntdll.dll!LdrInitializeThunk                             8      135,164     59,519      289,911      155,900     8            1221,086              0,00
6                    |- FastEventLogReader.exe!FastEventLogReader.Program::Main  1      47,615      47,615      4630,913     4630,913    1            1473,368              0,00
7                    |- ?!?                                                      2      17,280      10,752      28488,947    28488,947   1            112,125               0,00
8                    |- FastEventLogReader.exe!FastEventLogReader.Program::F4    1      23,423      23,423      0,000        0,000       0            28,031                0,00
9                    |- FastEventLogReader.exe!FastEventLogReader.Program::F3    1      3,840       3,840       0,000        0,000       0            2,304                 0,00
10                   |- ntkrnlmp.exe!KiStartUserThreadReturn                     1      56,447      56,447      2,688        2,688       1            79,870                0,00

(The view also contained the columns NewProcess, ReadyingProcess, ReadyingThreadId, SwitchInTime (s), NewInPri, OldProcess, OldThreadId, OldOutPri, OldState, OldWaitReason, OldWaitMode, Cpu and IdealCpu, which had no values in these rows.)

I am not sure how to deal with this seemingly broken call stack. It looks as if, while sampling the stacks, I get an extra node for each distinct executing method. When the thread is in F3, a new node is created. When it is in F9 (most of the time), I get a lot of counts there, but the parent methods are missing. Is there a way to see a little more of the managed call stacks as well? One stack frame is nice, but a few more would help in many cases.

Do I need to modify the trace buffer sizes or buffer counts to see more? Or does this have something to do with the CLR Rundown event provider, which enables mixed stack walks? I am using .NET 4.0 x64 on Windows 7 on an 8-core Xeon 2.8 GHz with 12 GB RAM.

All replies

It is a known issue that for x64 processes before Windows 8 (Server 2012), the ETW stack crawling logic stops at the first frame whose code was dynamically generated (that is, just-in-time compiled). This issue is fixed in Windows 8.

You can work around the problem by:

- Running the app as a 32-bit application
- NGENing the code you care about
- Running on Windows 8
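
As a rough illustration of the NGEN workaround, the Python sketch below builds `ngen install` command lines for a hand-picked list of assemblies rather than for a whole product. The `ngen.exe` path assumes the default 64-bit .NET 4.0 install location, and `C:\build\FastEventLogReader.exe` is a hypothetical build output path; neither comes from this thread. The helper only prints the commands unless `dry_run=False` is passed.

```python
import subprocess

# Assumption: default location of the 64-bit .NET 4.0 native image generator.
NGEN_X64 = r"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\ngen.exe"


def ngen_commands(assemblies, action="install", ngen=NGEN_X64):
    """Return one ngen command (as an argument list) per assembly path."""
    if action not in ("install", "uninstall"):
        raise ValueError("action must be 'install' or 'uninstall'")
    return [[ngen, action, path] for path in assemblies]


def run_ngen(assemblies, action="install", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in ngen_commands(assemblies, action):
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            # Needs Windows and an elevated prompt; raises if ngen fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path: only the assembly whose stacks matter is precompiled.
    run_ngen([r"C:\build\FastEventLogReader.exe"])
```

Running the commands for real requires an elevated prompt on the machine being profiled. Calling the same helper with `action="uninstall"` removes the native images again once the trace has been taken.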

There is a whole section on this in the PerfView user's guide that goes into these mitigations (you can get PerfView from http://www.microsoft.com/en-us/download/details.aspx?id=28567). See the FAQ or 'BROKEN stacks'. These mitigations will work for WPR too (in general, WPR and PerfView can use each other's data).

I will try this out when I get back to work. Is there any chance of getting a fix for Windows 7 as well? If it is a small thing, I could try to open a business case to get it. Do you know which OS component would need to be patched? Since it is fixed in Windows 8, it should be easy to backport? This tool is immensely useful, but only if the call stacks are not broken. When I want to check whether a specific build works as expected, I do not want to NGen it every time, since this is rather time-consuming for our product (~2h on my 8-core machine). 32-bit is also not an option in my case, since the application processes large amounts of data.