Issue

A colleague recently brought me this problem. A customer was experiencing heap corruption, so, as expected, we enabled PageHeap. There was a catch, though: the application had to run for a long time (around 30 days) to reproduce the crash, and we had no idea what was causing it.

How do we enable PageHeap?

We can enable standard PageHeap by running the following command from an elevated (admin) command prompt:

gflags /p /enable ImageFileName

To enable full PageHeap, use the following:

gflags /p /enable ImageFileName /full

(MSDN) Use care in interpreting the Enable page heap check box for an image file in the GFlags dialog box. It indicates that page heap verification is enabled for an image file, but it does not indicate whether it is full or standard page heap verification. If the check results from selecting the check box, then full page heap verification is enabled for the image file. However, if the check results from use of the command-line interface, then the check can represent the enabling of either full or standard page heap verification for the image file.

Why did the application hang?

So the customer enabled PageHeap and went home. When they came back the next day, the application had stopped responding and was hung. The hang appeared only after PageHeap was enabled. With full PageHeap, every allocation gets at least one page of its own plus an inaccessible guard page. As a result, committed memory, which is backed by the page file, grows very quickly. So can you guess why the hang took place? Page file size!

Resolution

The customer had left the page file at its default size, which was not enough in this case. We suggested increasing the page file size, and the hang went away. Note, however, that if you enable PageHeap and leave the application running unattended, the page file will eventually fill up no matter how large it is, because it is finite, and the result is unpredictable. You might need to tweak your PageHeap settings, for example by enabling it only for specific modules or by using standard (non-full) PageHeap.
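To get a feel for why full PageHeap is so heavy on the page file, here is a rough back-of-the-envelope model in Python. It assumes 4 KB pages and treats each allocation as occupying its own run of committed pages (rounded up), followed by one inaccessible guard page. It ignores the small per-block header PageHeap keeps and any other bookkeeping, so treat the numbers as an order-of-magnitude estimate, not as what PageHeap actually reports.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KB pages, as on x86/x64 Windows


def full_pageheap_cost(request_size: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Rough model of a single full-PageHeap allocation.

    Returns (committed_bytes, reserved_bytes). The data pages are committed and
    therefore count against the commit limit (RAM plus page file); one extra
    trailing guard page is reserved but left inaccessible. The small per-block
    header that PageHeap keeps is ignored in this model.
    """
    data_pages = max(1, -(-request_size // page_size))  # ceiling division, at least one page
    committed = data_pages * page_size
    reserved = committed + page_size  # plus the trailing guard page
    return committed, reserved


def total_committed(sizes: List[int]) -> int:
    """Sum the modeled committed bytes for a set of live allocations."""
    return sum(full_pageheap_cost(size)[0] for size in sizes)


if __name__ == "__main__":
    sizes = [32] * 100_000  # hypothetical workload: 100,000 live 32-byte blocks
    print(f"requested: {sum(sizes):,} bytes")
    print(f"committed (model): {total_committed(sizes):,} bytes")
```

Under this model, 3.2 MB of live 32-byte blocks costs about 409.6 MB of commit charge. An application that keeps allocating for days only pushes that number higher.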

Conclusion

Please note that there are different variants of PageHeap. In this case we needed full PageHeap, and full PageHeap is pretty heavy on the page file.

What is a Symbol?

When you compile your executable, the compiler generates debugging symbol information for every file it compiles. The linker then assembles all of this symbol information into one file called a PDB (Program DataBase) file. Every variable and function in your application code can be called a symbol, and symbols come in two flavors: private and public.

The linker embeds the full local path of the generated .pdb file into the executable. This comes in handy when you debug the application on your development machine, because the debugger picks up the PDB file from that embedded path. The PDB then helps the debugger work out line numbers, file names, call stacks, and local and global variables.
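For the curious: the embedded path lives in a CodeView debug record (signature RSDS) that the linker writes into the image's debug directory. That record also holds a GUID and an age, which the debugger uses to check that a PDB actually matches the binary. The Python sketch below only parses such a record once you already have its bytes. Locating the record through the PE headers is left out, and decoding the path as UTF-8 is an assumption. The `symbol_store_key` helper is a hypothetical name; it shows the GUID-plus-age directory name that symbol stores conventionally use.

```python
import struct
import uuid
from typing import Tuple


def parse_rsds(record: bytes) -> Tuple[uuid.UUID, int, str]:
    """Parse a CodeView 'RSDS' record into (guid, age, pdb_path)."""
    if len(record) < 24 or record[:4] != b"RSDS":
        raise ValueError("not an RSDS CodeView record")
    # The GUID is stored in the Windows mixed-endian layout, which uuid calls bytes_le.
    guid = uuid.UUID(bytes_le=record[4:20])
    (age,) = struct.unpack_from("<I", record, 20)  # little-endian 32-bit age
    # The PDB path follows as a NUL-terminated string (assumed UTF-8 here).
    path = record[24:].split(b"\x00", 1)[0].decode("utf-8")
    return guid, age, path


def symbol_store_key(guid: uuid.UUID, age: int) -> str:
    """Directory name a symbol store conventionally uses: GUID hex, then age in hex."""
    return f"{guid.hex.upper()}{age:X}"


if __name__ == "__main__":
    # A made-up record, built by hand for illustration.
    guid = uuid.UUID("12345678-9abc-def0-1234-56789abcdef0")
    record = b"RSDS" + guid.bytes_le + struct.pack("<I", 2) + b"C:\\src\\Test\\Debug\\Test.pdb\x00"
    print(parse_rsds(record))
    print(symbol_store_key(guid, 2))
```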

Why do we need Symbols?

A symbol file (.pdb file) contains information that is not needed to run your application but comes in handy when debugging it. Without PDB files, figuring out bugs or exact call stacks on Windows is a pain. You might then ask why this information is not embedded into the executable. The answer is that symbols are not always needed, so they are kept in a separate .pdb file. You can debug when needed, and you can also choose who sees which symbols, which makes it harder for people to reverse engineer your code.

Public and Private Symbols

When the linker generates a PDB file, it contains both private and public debugging symbol information. You can configure what it generates in the linker property pages.

Private symbol data mostly contains the following:

Global and local variables

Functions

All user-defined types

Line number and source file information

The public symbol table contains the following:

Functions (just the address)

Global variables that are visible across .obj files

As you might have inferred, private symbol files are bigger than public symbol files. Also, since a private symbol file contains the public symbol information as well, we can generate a separate public symbol file from it. We use a tool called pdbcopy.exe for this purpose; it comes with the Windows debugger installation.
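The difference between the two tables can be pictured with a small conceptual model in Python. The `Symbol` record and the `to_public` function are hypothetical names, and the model is not how pdbcopy works internally. It simply keeps what the lists above describe as public (functions with their address, plus globals visible across .obj files) and drops everything else, including type and line information.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str                                      # "function", "global", "local" or "type"
    address: Optional[int] = None
    external: bool = False                         # visible across .obj files?
    type_name: Optional[str] = None                # private detail: type information
    source_line: Optional[Tuple[str, int]] = None  # private detail: file and line


def to_public(private_symbols: List[Symbol]) -> List[Symbol]:
    """Conceptual model only: keep what the public symbol table holds."""
    public = []
    for sym in private_symbols:
        if sym.kind == "function" or (sym.kind == "global" and sym.external):
            # Keep just the name and address; drop type and line information.
            public.append(Symbol(sym.name, sym.kind, sym.address, sym.external))
        # Locals, user-defined types and file-local globals stay private.
    return public
```

Because the public list is derived entirely from the private one, a private PDB can always produce a public one, but not the other way around.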

Symbol Path

So how do we tell the debugger where to look for symbols? One of my favorite ways is the _NT_SYMBOL_PATH environment variable. It lets us specify cache directories for downloaded symbols, and we can even specify a separate cache directory per symbol server.

The following value for _NT_SYMBOL_PATH downloads symbols from the server and puts them into the C:\Symbols folder.
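To show how such a value is structured, here is a small Python sketch that splits a symbol path into its parts. It assumes the commonly used form srv*C:\Symbols*https://msdl.microsoft.com/download/symbols. It handles only srv* elements, cache* elements and plain directories. In a srv* element, the last part is the store or server, and anything in between is a local downstream cache. `parse_symbol_path` is a hypothetical helper, not a debugger API.

```python
import os
from typing import Dict, List


def parse_symbol_path(value: str) -> List[Dict[str, object]]:
    """Split a symbol path into its ';'-separated elements and classify each one."""
    entries: List[Dict[str, object]] = []
    for element in (e.strip() for e in value.split(";")):
        if not element:
            continue  # tolerate empty elements, e.g. a trailing ';'
        parts = element.split("*")
        keyword = parts[0].lower()
        if keyword == "srv" and len(parts) >= 2:
            # srv*[cache*...]*store: the last part is the store or server URL,
            # everything in between is a local downstream cache directory.
            entries.append({"kind": "server", "caches": parts[1:-1], "store": parts[-1]})
        elif keyword == "cache" and len(parts) >= 2:
            # cache*dir: cache symbols from the elements that follow in dir.
            entries.append({"kind": "cache", "path": "*".join(parts[1:])})
        else:
            entries.append({"kind": "path", "path": element})  # a plain directory
    return entries


if __name__ == "__main__":
    default = r"srv*C:\Symbols*https://msdl.microsoft.com/download/symbols"
    for entry in parse_symbol_path(os.environ.get("_NT_SYMBOL_PATH", default)):
        print(entry)
```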

The Windows debugger provides two commands to control the symbol path: .sympath and .symfix. I use .symfix to quickly set up a default symbol path; symbols are then downloaded to a sym folder under the debugger folder. .sympath is handy too. To quickly add a symbol path to the debugger, just do the following…

.sympath+ C:\AnotherSymbolFolder
.reload

Controlling Symbol Loading in Windows Debugger

The debugger provides a command called .symopt. If we run the command without any arguments, it shows our current symbol loading settings, for example:

In this case, we see that loading of line number information is configured. Since SYMOPT_PUBLICS_ONLY is not set, private symbols are loaded. SYMOPT_AUTO_PUBLICS tells the debugger to search public symbols only as a last resort.
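The value .symopt prints is a bit mask of SYMOPT_* flags. The Python sketch below decodes such a mask using a subset of the flag values from dbghelp.h. The table is deliberately incomplete, and bits it does not know are reported as-is. The example mask 0x30237 is only illustrative and is not claimed to be any debugger's default.

```python
from typing import List

# A subset of the SYMOPT_* values defined in dbghelp.h.
SYMOPT_FLAGS = {
    0x00000001: "SYMOPT_CASE_INSENSITIVE",
    0x00000002: "SYMOPT_UNDNAME",
    0x00000004: "SYMOPT_DEFERRED_LOADS",
    0x00000008: "SYMOPT_NO_CPP",
    0x00000010: "SYMOPT_LOAD_LINES",
    0x00000020: "SYMOPT_OMAP_FIND_NEAREST",
    0x00000040: "SYMOPT_LOAD_ANYTHING",
    0x00000080: "SYMOPT_IGNORE_CVREC",
    0x00000100: "SYMOPT_NO_UNQUALIFIED_LOADS",
    0x00000200: "SYMOPT_FAIL_CRITICAL_ERRORS",
    0x00000400: "SYMOPT_EXACT_SYMBOLS",
    0x00004000: "SYMOPT_PUBLICS_ONLY",
    0x00008000: "SYMOPT_NO_PUBLICS",
    0x00010000: "SYMOPT_AUTO_PUBLICS",
    0x00020000: "SYMOPT_NO_IMAGE_SEARCH",
}
KNOWN_BITS = sum(SYMOPT_FLAGS)  # the keys are distinct single bits, so sum == OR


def decode_symopt(mask: int) -> List[str]:
    """Return the names of the flags set in mask, lowest bit first."""
    names = [name for bit, name in sorted(SYMOPT_FLAGS.items()) if mask & bit]
    unknown = mask & ~KNOWN_BITS
    if unknown:
        names.append(f"0x{unknown:X} (not in this table)")
    return names


if __name__ == "__main__":
    for name in decode_symopt(0x30237):
        print(name)
```

For 0x30237 the decoded list includes SYMOPT_LOAD_LINES and SYMOPT_AUTO_PUBLICS but not SYMOPT_PUBLICS_ONLY, which is exactly the situation described above.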

In addition, to see a list of modules for which symbol loading failed, use the ‘lme’ command. To get verbose output of the symbol loading process in the debugger, use ‘!sym noisy’; to turn it off, use ‘!sym quiet’.

ATL's CString locates its CStringData header with a bit of pointer arithmetic. m_pszData is first cast to a pointer to CStringData, and then 1 is subtracted from that pointer. In pointer arithmetic, that moves the address back by sizeof(CStringData) bytes. So let's see whether, while debugging, we can get to the CStringData located at that negative offset. First, let's get the size of ATL::CStringData in memory.

0:045> ?? sizeof(ATL::CStringData)
unsigned int 0x10

The size of ATL::CStringData comes to 0x10 bytes. So in my test application, let's find out what is located at a negative offset of 0x10 bytes from the string data. In my current frame I have the following locals; my CString object is called TestCString.
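To make the negative-offset arithmetic concrete, here is a small Python simulation of a 32-bit process. It assumes the 32-bit layout of ATL::CStringData: a string manager pointer followed by nDataLength, nAllocLength and nRefs, 4 bytes each, which adds up to the 0x10 bytes reported above. It also assumes a narrow-character string, a made-up base address and a made-up manager pointer. The helper names are hypothetical, and the code only illustrates the arithmetic; it does not read real process memory.

```python
import struct
from typing import Dict, Tuple

# Assumed 32-bit layout of ATL::CStringData:
# pStringMgr (pointer), nDataLength (int), nAllocLength (int), nRefs (long).
CSTRINGDATA_FORMAT = "<Iiii"
CSTRINGDATA_SIZE = struct.calcsize(CSTRINGDATA_FORMAT)  # 16 bytes, i.e. 0x10


def build_fake_string(base: int, text: bytes, alloc_length: int, refs: int = 1) -> Tuple[bytes, int]:
    """Lay out a header immediately followed by the characters, as CString does.

    Returns (memory, m_pszData), where m_pszData is the address of the first character.
    """
    fake_mgr = 0x00ABCD00  # made-up string manager pointer
    header = struct.pack(CSTRINGDATA_FORMAT, fake_mgr, len(text), alloc_length, refs)
    memory = header + text + b"\x00"
    return memory, base + CSTRINGDATA_SIZE


def read_header(memory: bytes, base: int, m_pszData: int) -> Dict[str, int]:
    """Step back sizeof(CStringData) bytes from m_pszData and decode the header."""
    header_address = m_pszData - CSTRINGDATA_SIZE  # same as ((CStringData*)m_pszData) - 1
    mgr, length, alloc, refs = struct.unpack_from(CSTRINGDATA_FORMAT, memory, header_address - base)
    return {"header_address": header_address, "pStringMgr": mgr,
            "nDataLength": length, "nAllocLength": alloc, "nRefs": refs}


if __name__ == "__main__":
    base = 0x00500000  # hypothetical address of the allocation
    memory, psz = build_fake_string(base, b"Hello", alloc_length=7)
    print(hex(psz), read_header(memory, base, psz))
```

In the debugger, the equivalent step is to subtract 0x10 from the string's data pointer and dump that address as an ATL::CStringData, for example with dt.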

High Memory Usage Scenario

I recently had a customer who was complaining about high memory usage on Windows 8.1. The application consumed about 140 MB on Windows 8.1, compared to a meager 3 to 4 MB on a Windows 7 or 8 machine.

Hmm, interesting. Having been troubleshooting for some time now, I suspected some kind of debug flag setting. So I immediately checked with the customer whether he had accidentally left some GFlags setting configured.

This reminded me of a customer whose applications all started showing high memory usage on his box; eventually this turned out to be caused by a system-wide flag configured via GFlags. GFlags is a helpful tool, but please remember to undo your changes once you're done debugging. Perhaps put a sticky note somewhere to remind yourself to turn these settings off.

So, coming back to this incident: why would the application consume so much memory on Windows 8.1? Note: he had compiled the application using VS2008.

Memory Dump Analysis for High Memory Usage

I opened a memory dump of Test.exe running on Windows 8.1 in our debugger and saw that some heap validation features were enabled. That explains the high memory consumption, since these heap validation features require extra memory.

I was a bit surprised, as the customer said he didn't have GFlags on his box. So I renamed Test.exe to Test1.exe and looked at a dump again. It looked like something was enabling heap validation flags specifically for Test.exe.
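Per-image GFlags settings are stored as a GlobalFlag value under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<image name>. That is consistent with the flags depending on the executable's name. The value is a bit mask; in the debugger, !gflag shows the flags in effect for the target. The Python sketch below decodes a few of the heap-related bits, using the abbreviations GFlags uses. The table is only a subset, and the example masks are illustrative.

```python
from typing import List

# A subset of NtGlobalFlag bits, with the GFlags abbreviations.
GLOBAL_FLAGS = {
    0x00000010: ("htc", "FLG_HEAP_ENABLE_TAIL_CHECK"),
    0x00000020: ("hfc", "FLG_HEAP_ENABLE_FREE_CHECK"),
    0x00000040: ("hpc", "FLG_HEAP_VALIDATE_PARAMETERS"),
    0x00000080: ("hvc", "FLG_HEAP_VALIDATE_ALL"),
    0x00000100: ("vrf", "FLG_APPLICATION_VERIFIER"),
    0x00001000: ("ust", "FLG_USER_STACK_TRACE_DB"),
    0x02000000: ("hpa", "FLG_HEAP_PAGE_ALLOCS"),
}
# Bits that make the heap do extra checking (and use extra memory).
HEAP_CHECK_BITS = 0x10 | 0x20 | 0x40 | 0x80 | 0x02000000


def decode_global_flag(mask: int) -> List[str]:
    """Return the GFlags abbreviations for the known bits set in mask."""
    return [abbr for bit, (abbr, _name) in sorted(GLOBAL_FLAGS.items()) if mask & bit]


def heap_checks_enabled(mask: int) -> bool:
    """True if any of the heap checking bits in the table are set."""
    return bool(mask & HEAP_CHECK_BITS)


if __name__ == "__main__":
    print(decode_global_flag(0x70), heap_checks_enabled(0x70))
```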

Cause of High Memory Usage

Eventually we figured out what was turning the heap validation flags on: the integrated Application Verifier included in the Visual Studio Team Suite and Visual Studio Team System for Developers editions of Visual Studio. This behavior was expected. The customer had the Professional edition, which is probably why he didn't see the setting in the project properties. This is what the project property pages look like…

So if you have Application Verifier installed on your box, you'll see your application listed in it, because Visual Studio turns certain registry settings on or off based on your project settings. Once your application starts, these settings take effect. Troubleshooting is fun, isn't it?

About AgeStore

It's a good habit to clear out old symbol files. Debugging Tools for Windows comes with a built-in tool that helps us do this, named AgeStore.

AgeStore executes in three modes…

-date=mm-dd-yy – deletes all files that were last accessed before the specified date.

-days=xx – deletes all files that were last accessed before today minus the number of days specified by ‘xx’.

-size=xx – deletes files in order of last access time (oldest first), until all the files in the directory total to the amount of bytes specified by ‘xx’.
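The -days and -size modes can be modeled as a dry run, similar to running with -l, over an in-memory list of files. This Python sketch is an assumption about the selection logic based on the descriptions above, not AgeStore's actual code. It never touches the disk, it compares exact timestamps rather than whole dates, and `FileInfo`, `plan_by_days` and `plan_by_size` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int
    last_access: datetime


def plan_by_days(files: List[FileInfo], days: int, now: datetime) -> List[FileInfo]:
    """-days=xx: files last accessed before (now - xx days)."""
    cutoff = now - timedelta(days=days)
    return [f for f in files if f.last_access < cutoff]


def plan_by_size(files: List[FileInfo], max_total: int) -> List[FileInfo]:
    """-size=xx: oldest-accessed files first, until the remaining total is <= xx bytes."""
    total = sum(f.size for f in files)
    doomed = []
    for f in sorted(files, key=lambda f: f.last_access):  # oldest access first
        if total <= max_total:
            break
        doomed.append(f)
        total -= f.size
    return doomed
```

Because whole files are removed one at a time, the remaining total can end up well below the requested size, just as in the -size sample output further down.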

There is a caveat when running this tool on Windows Vista and later: by default, "Last Access Time" updating is disabled, and since AgeStore works on "Last Access Time", the tool will fail. Use the fsutil command to turn on the "Last Access Time" feature, as follows…

E:\>fsutil behavior set DisableLastAccess 0
DisableLastAccess = 0

This turns on the last access time feature. Note that if this feature was off, you won't see any old files (based on access time) right away, because you only just turned the feature on. So you'll have to leave the feature on for a while and run the AgeStore command later.

Note also that the default action of AgeStore is to delete files unless you pass the -l switch, so please be very careful. AgeStore can be used on any folder, not just a symbol folder.

AgeStore Help Text

E:\>Agestore
agestore [pathspec]
Deletes all files from a directory based on the last access time of the files.
[pathspec] defines the root path and file specification.
The default is all files in the current working directory
It runs in one of these modes...
-date=mm-dd-yy - deletes all files that were last accessed before the specified date.
-days=xx - deletes all files that were last accessed before today minus the
amount of days specified by 'xx'.
-size=xx - deletes files in order of last access time (oldest first), until all the
files in the directory total to the amount of bytes specified by 'xx'.
-size - lists the amount of bytes in the directory.
-lat=<on off> - toggles filesytem support for last-access-time.
These other command line switches alter the behavior of the program.
-l - list files only, don't delete
-s - include subdirectories.
-k - keep empty subdirectories - normally they are removed.
-q - quiet mode stops listing of files as they are deleted.
-y - eliminates the (y/n) prompt.
-r - deletes RO files
This program deletes files. You should run agestore with the -l switch
to see what it will delete, before actual usage

Sample Commands

The following command lists all symbol files last accessed before the given date:

AgeStore e:\pdbsymbols -date=07-08-13 -s -l

The following command lists all PDB files last accessed before today minus the given number of days:

AgeStore e:\pdbsymbols -days=60 -s -l

The following command lists, in order of last access time (oldest first), the files that would be deleted until the files in the directory total no more than the number of bytes passed to -size:

AgeStore e:\pdbsymbols -size=8000000 -s -l
<snip>
10375868360 bytes would be deleted
4336640 bytes would remain

The following command lists the number of bytes in the directory:

AgeStore e:\pdbsymbols -size -s

While stepping through disassembly, you might have wondered whether there is a way to jump directly to the next branching instruction, the next call, or the next return. The answer is yes: there are some very useful commands. The following descriptions are taken from the WinDbg documentation.

Target executes until it reaches a call instruction or return instruction. If the current instruction is a call instruction or return instruction, the instruction is traced into until a new call or return is reached.

Target executes until it reaches any kind of branching instruction, including conditional or unconditional branches, calls, returns, and system calls. If the current instruction is a branching instruction, the instruction is traced into until a new branching instruction is reached.

File name and line number information is stored in private symbols (the .pdb file). So if private symbols are available, the debugger will try to figure out the line number information. Note: public symbols don't contain line number information.

A question I've heard people new to WinDbg ask is how to turn off the display of line numbers. The easiest way, and what I normally do, is to use the ‘.lines’ command. It is a toggle: the next time you execute .lines, it turns line number information back ‘on’.

The symbol option of interest to us is SYMOPT_LOAD_LINES. Following is the MSDN description of this option.

This symbol option allows line number information to be read from source files. This option must be on for source debugging to work correctly.

In KD and CDB, this option is off by default; in WinDbg, this option is on by default. In CDB and KD, the -lines command-line option will turn this option on. Once the debugger is running, it can be turned on or off by using .symopt+0x10 or .symopt-0x10, respectively. It can also be toggled on and off by using the .lines (Toggle Source Line Support) command.

This option is on by default in DBH. Once DBH is running, it can be turned on or off by using symopt +10 or symopt -10, respectively.
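The commands quoted above all manipulate a single bit, 0x10, in the symbol options mask. The Python sketch below models that bit arithmetic: .symopt+ sets bits, .symopt- clears them, and .lines flips SYMOPT_LOAD_LINES. The function names are hypothetical, and the starting mask 0x30237 is only an example value.

```python
SYMOPT_LOAD_LINES = 0x10  # value from dbghelp.h, the bit the quoted text toggles


def symopt_add(mask: int, bits: int) -> int:
    """Model of .symopt+<bits>: set the given bits."""
    return mask | bits


def symopt_remove(mask: int, bits: int) -> int:
    """Model of .symopt-<bits>: clear the given bits."""
    return mask & ~bits


def toggle_lines(mask: int) -> int:
    """Model of .lines: flip SYMOPT_LOAD_LINES."""
    return mask ^ SYMOPT_LOAD_LINES


def lines_enabled(mask: int) -> bool:
    return bool(mask & SYMOPT_LOAD_LINES)


if __name__ == "__main__":
    mask = 0x30237  # example mask with SYMOPT_LOAD_LINES set
    print(hex(toggle_lines(mask)), lines_enabled(toggle_lines(mask)))
```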

What’s NonInvasive debugging?

Noninvasive debugging is a useful technique for debugging hung processes. During noninvasive debugging, the debugger suspends all threads in the process and has access to all of the process's threads, memory, and registers. However, it cannot modify process memory, nor can it instruct the process to run.

Open WinDbg and press F6, or choose File -> Attach to a Process. Make sure you select the "Noninvasive" check box in the "Attach to Process" dialog.

While noninvasive debugging is in progress, we can attach another instance of the debugger to the debuggee. This shows that a noninvasive debugger is not actually attached to the debuggee; note that on Windows only one debugger can be attached to a process at any time. Also, while debugging noninvasively, common WinDbg commands like ‘g’ won't work. The debugger is not attached to the debuggee invasively, so it cannot instruct the process to resume execution. A debugger attached invasively manipulates the debuggee via a thread created in the remote process.

Thread manipulation during NonInvasive debugging

As mentioned earlier, for noninvasive debugging the debugger suspends all the threads in the process, which also means we can resume them. The command to do this is as follows…

0:000> ~*m

~m resumes a thread, while ~n suspends a thread. If we don't resume the threads, we won't see the process UI, because the UI thread is suspended too.
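Suspension works through a per-thread suspend count: ~n increments it, ~m decrements it, and a thread can run only when its count is zero. The Python sketch below is a hypothetical model of that bookkeeping, not a debugger API. It shows why a single ~*m is enough to undo the one suspension the noninvasive attach performs. The assumption is that the application had not suspended any of its own threads beforehand; a thread it had suspended would stay suspended.

```python
from typing import List


class ThreadModel:
    """Hypothetical model of a thread's suspend count (not a real debugger API)."""

    def __init__(self, tid: int) -> None:
        self.tid = tid
        self.suspend_count = 0

    def suspend(self) -> None:  # roughly what ~n does
        self.suspend_count += 1

    def resume(self) -> None:  # roughly what ~m does
        if self.suspend_count > 0:
            self.suspend_count -= 1

    @property
    def runnable(self) -> bool:
        return self.suspend_count == 0


def attach_noninvasive(threads: List[ThreadModel]) -> None:
    """The noninvasive attach suspends every thread once."""
    for t in threads:
        t.suspend()


def resume_all(threads: List[ThreadModel]) -> None:
    """Equivalent of ~*m: decrement every thread's suspend count once."""
    for t in threads:
        t.resume()
```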

Viewing debuggee process memory during NonInvasive debugging

Now, with two debuggers monitoring the process, we can view the process's memory via the noninvasive debugger as well. For example, when you set a breakpoint via the second (invasively attached) debugger, it's interesting to see how the debugger modifies the function code to make the breakpoint work.

This is what the code for ntdll!NtOpenFile looks like before a breakpoint is set…

In effect, the original three-byte instruction (4c8bd1) is replaced by (cc8bd1). The only change is 4c -> cc, and cc is the opcode for int 3. When the breakpoint is hit (or when we press Ctrl+Break), the debugger puts the original opcode back, i.e. 4c8bd1.

We could observe all of this via the noninvasive debugger. If a process is hung, we can also go through the call stacks and find potential hang scenarios, for example, a process waiting on a network drive.
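The patching described above can be modeled in a few lines of Python. 4c 8b d1 encodes mov r10, rcx, the first instruction of the x64 system-call stub. The sketch saves the original first byte, writes 0xCC (int 3), and restores the saved byte when the breakpoint is removed. `SoftwareBreakpoints` is a hypothetical class operating on a bytearray, not a debugger API.

```python
from typing import Dict

INT3 = 0xCC  # opcode of the int 3 breakpoint instruction


class SoftwareBreakpoints:
    """Hypothetical model of software breakpoints over a code buffer."""

    def __init__(self, code: bytearray) -> None:
        self.code = code
        self.saved: Dict[int, int] = {}  # offset -> original byte

    def insert(self, offset: int) -> None:
        if offset in self.saved:
            return  # already patched
        self.saved[offset] = self.code[offset]
        self.code[offset] = INT3

    def remove(self, offset: int) -> None:
        self.code[offset] = self.saved.pop(offset)

    def remove_all(self) -> None:
        """What a debugger does when it breaks in: put the original bytes back."""
        for offset in list(self.saved):
            self.remove(offset)


if __name__ == "__main__":
    code = bytearray.fromhex("4c8bd1")  # mov r10, rcx
    bps = SoftwareBreakpoints(code)
    bps.insert(0)
    print(code.hex())
    bps.remove_all()
    print(code.hex())
```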