"How to crush a DC in 10 easy steps." or, what is debugging anyway?

Before we move along, I should say that I look at stuff all day long and use all kinds of tools and different methods to fix problems. Many of the cases involve going through some source code of random component X in order to see where things blew up.

As I mentioned in a previous post: "Debugging isn't a tool - it's a methodology. Use the knowledge of your system to piece things together in a logical fashion and you'll find the problem."

This post is a good example of NOT using any code, but a familiar toolset, in order to realize what was going on.

So the day begins with a request to see why a DC is being crushed. CPU is through the roof, and they have some SPA data as well as some network traces.

I jumped right to the net traces.

Looking at CrushedDC.cap I see 500 requests for LsarLookupNames.

Plus repeated calls (about 370) for group expansions via SamrGetMembersInGroup (this API lists all the members in a group), and the RIDs in each request vary.

Those numbers came from printing the trace to a file in Ethereal (ya ya, Wireshark, I know...) and running simple searches.
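If you would rather not count by hand, the "simple searches" step can be sketched in a few lines of Python. This is only a rough illustration, not what I actually ran: it assumes the trace has already been printed to a text file, and the helper names are made up for this example. It counts lines that mention each operation name, so a request and its response both count, and a name like LsarLookupNames2 would also match LsarLookupNames.

```python
import re
from collections import Counter

# Operation names taken from this post; extend the tuple as needed.
OPERATIONS = ("LsarLookupNames", "SamrGetMembersInGroup")


def count_operations(lines, operations=OPERATIONS):
    """Count how many lines mention each operation name (case-insensitive)."""
    counts = Counter({op: 0 for op in operations})
    patterns = {
        op: re.compile(re.escape(op), re.IGNORECASE) for op in operations
    }
    for line in lines:
        for op, pattern in patterns.items():
            if pattern.search(line):
                counts[op] += 1
    return counts


def count_operations_in_file(path):
    """Same count, reading a text export of the trace (path is up to you)."""
    with open(path, errors="replace") as handle:
        return count_operations(handle)
```

Calling count_operations_in_file on the printed trace returns a Counter keyed by operation name, which is enough to see which call dominates the capture.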

Once we determined that the CPU utilization was being driven by client machines, we asked for a complete kernel dump from a client the next time the issue occurred.

Of course, the client needed to be identified from a trace taken at the same time. Luckily they were able to do this and found 2 local clients to get data from.

When I got the dumps, what was the goal?

To see which process on the clients was sending ridiculous amounts of traffic for LookupNames, or similar lookup requests.

In order to do this I first set up a log file using:

.logappend c:\processes.log

And then dumped all stacks for all processes on the machine.

!for_each_process ".process /p /r @#Process;!process @#Process"

(Read as: for each process, set the process context and reload the user-mode symbols, then do a !process to get all stacks for the current process.)

Stepped away to work on something else for a bit as both debugger instances chunked through each process and dumped all the stacks to my logfile.

Once I came back it was a simple matter of looking for the string “lookupaccount”.
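A text editor search is all this really needs, but with a log this large it can help to have the hits grouped by process. Here is a minimal Python sketch of that idea. It assumes the log contains the usual !process output, where each process block carries a line beginning with "Image:" followed by the image name; the function name and that parsing rule are my own assumptions for illustration, so adjust them to whatever your log actually looks like.

```python
import re
from collections import Counter

# Assumption: each process block in the log has a line like "    Image: foo.exe".
IMAGE_RE = re.compile(r"^\s*Image:\s*(\S+)")


def find_callers(lines, needle="lookupaccount"):
    """Count lines containing `needle`, grouped by the most recent image name."""
    needle = needle.lower()
    current_image = "<unknown>"
    hits = Counter()
    for line in lines:
        match = IMAGE_RE.match(line)
        if match:
            # A new process block starts; remember which image it belongs to.
            current_image = match.group(1)
        elif needle in line.lower():
            hits[current_image] += 1
    return hits
```

Feeding it the open log file (for example, the c:\processes.log created earlier) gives a Counter of image names, and the process with the most matching stack frames is the first place to look.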

Ahaaa!! There was the bugger. There it was... WMI.

From the code I could see the parameters passed on for the exact queries being done, and dumped these queries from the data in the dump:

These queries, from a number of clients, are inefficient and cause the DC to enumerate way too much data to find what the client needs.

WARNING: Stack unwind information not available. Following frames may be wrong.

00162044 00000000 00000000 00030004 0008019e BadApp+0x47eb

So, we go back and reconfigure badapp.exe (of course that wasn’t the real name; the real app is a well-known one I don’t think I can mention here) to NOT do such terribly expensive WMI queries constantly... and no more DC being crushed.