Hi! My name is Tate. I’m an Escalation Engineer on the Microsoft Critical Problem Resolution Platforms Team. I wanted to share one of the most common errors we troubleshoot here on the CPR team, whose root cause is pool consumption, and the methods by which we can remedy it quickly!

This issue is commonly misdiagnosed. However, 90% of the time it is actually quite possible to determine the resolution quickly, without any serious effort at all!

This is our friend the Server Service reporting, via Event ID 2020 or 2019, that while trying to satisfy a request it could not find enough free memory of the respective pool type: 2020 indicates Paged Pool and 2019 indicates NonPaged Pool. This doesn’t mean that the Server Service (srv.sys) is broken or is the root cause of the problem. More often it is simply the first component to see the resource problem and report it to the Event Log. Thus, there could be (and usually are) a few more symptoms of pool exhaustion on the system, such as hangs, out-of-resource errors reported by drivers or applications, or all of the above!

What is Pool?

First, Pool is not the amount of RAM on the system. Rather, it is a segment of the virtual memory (address space) that Windows reserves at boot. These pools are finite because the address space itself is finite. A 32-bit (x86) machine can address 2^32 bytes (4GB), so by default Windows uses 2GB for applications and 2GB for the kernel. The kernel’s 2GB must also hold other things, such as Page Table Entries (PTEs), which is why the maximum Paged Pool on 32-bit (x86) is only ~460MB. That figure puts our realistic limits per processor architecture in perspective. As this implies, 64-bit (x64 and IA64) machines have less of a problem here thanks to their larger address space, but there are still limits, and thus no free lunch.
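To make that arithmetic concrete, here is a minimal C sketch that computes the x86 address space and the kernel’s share of it, with and without the /3GB switch mentioned later in this post. The function names are hypothetical and exist only for illustration. The sketch models just the top-level user/kernel split. How the kernel then divides its share among Paged Pool, NonPaged Pool, PTEs and so on is decided by the memory manager and is not modeled here.

```c
#include <stdint.h>

/* Total bytes a 32-bit (x86) pointer can address: 2^32 = 4GB. */
uint64_t x86_address_space_bytes(void)
{
    return (uint64_t)1 << 32;
}

/* Kernel share of the x86 address space. By default Windows gives 2GB to
   applications and 2GB to the kernel; /3GB moves the split to 3GB/1GB.
   Paged Pool, NonPaged Pool, PTEs, etc. all have to fit in this share. */
uint64_t x86_kernel_bytes(int three_gb_switch)
{
    const uint64_t gb = (uint64_t)1 << 30;
    const uint64_t user = three_gb_switch ? 3 * gb : 2 * gb;
    return x86_address_space_bytes() - user;
}
```

With the default split this leaves 2GB for the kernel. With /3GB it leaves only 1GB, which is why that switch shrinks the pool maximums.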

*For more about determining the current pool limits, see the common question “Why am I out of Paged Pool at ~200MB…” at the end of this post.

These pools are used by the kernel directly, by the kernel indirectly when it creates structures in response to application requests (CreateFile, for example), and by drivers installed on the system, which allocate memory through the kernel pool allocation functions.

Literally, NonPaged means that this memory, once allocated, will not be paged to disk and is thus resident at all times, which is an important feature for drivers. Paged memory, conversely, can be, well… paged out to disk. In the end, though, all of this memory is allocated through a common set of functions, the most common being ExAllocatePoolWithTag.

Ok, so what is using it/abusing it? (our goal right!?)

Now that we know the culprit could be Windows (or a component shipping with Windows), a driver, or an application requesting lots of things that the kernel has to create on its behalf, how can we find out which one it is?

There are really four basic methods that are typically used (listed in order of increasing difficulty):

1.) Find By Handle Count

Handle Count? Yes. An application can request something of the OS that the OS must then create and provide a reference to. That reference is typically represented by a handle, and it is therefore charged to the process’ total handle count!

The quickest way by far, if the machine is not completely hung, is to check this via Task Manager: Ctrl+Shift+Esc…Processes tab…View…Select Columns…Handle Count. Now sort on the Handles column and check whether there is a significantly large one (this information is also available via Perfmon.exe, Process Explorer, Handle.exe, etc.).

What’s large? Well, typically we should raise an eyebrow at anything over 5,000 or so. That’s not to say that anything over this amount is inherently bad. Just know that there is no free lunch: a handle to something usually means that on the other end there is a corresponding object stored in NonPaged or Paged Pool, which takes up memory.
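The Task Manager check above boils down to “sort by handles, look at the top.” As an illustration only, here is a small C sketch over a hypothetical snapshot of (image name, handle count) pairs. It sorts them in descending order and counts how many exceed the ~5,000 rule of thumb. The struct, macro and function names are made up for this example. Gathering the real counts is still the job of Task Manager, Perfmon or Process Explorer.

```c
#include <stdlib.h>

/* Hypothetical snapshot of one row from Task Manager's Processes tab. */
struct proc_handles {
    const char *image_name;
    unsigned long handle_count;
};

/* Rule of thumb from the text: raise an eyebrow above ~5,000 handles. */
#define HANDLE_EYEBROW_THRESHOLD 5000UL

/* qsort comparator: descending by handle count, like sorting the Handles column. */
static int by_handles_desc(const void *a, const void *b)
{
    const struct proc_handles *pa = a, *pb = b;
    if (pa->handle_count < pb->handle_count) return 1;
    if (pa->handle_count > pb->handle_count) return -1;
    return 0;
}

/* Sorts the snapshot in place and returns how many entries exceed the
   threshold; after the call those entries sit at the front of the array. */
size_t rank_handle_suspects(struct proc_handles *procs, size_t n)
{
    size_t suspects = 0;
    qsort(procs, n, sizeof *procs, by_handles_desc);
    while (suspects < n && procs[suspects].handle_count > HANDLE_EYEBROW_THRESHOLD)
        suspects++;
    return suspects;
}
```

Being over the threshold only earns a process a closer look. As the text says, a high count is not inherently bad.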

So, for example, let’s say we have a process, mybadapp.exe, that has 100,000 handles. What do we do next?

Well, if it’s a service, we could stop it (which releases the handles). If it is an application running interactively, we could try to shut it down. Either way, we then look at how much total Kernel Memory (Paged or NonPaged, depending on which one we are short of) we get back. Suppose we were at 400MB of Paged Pool (Performance tab…Kernel Memory…Paged) and, after stopping mybadapp.exe with its 100,000 handles, we are now at a reasonable 100MB. Well, there’s our bad guy. The next step would be to follow up with the owner, or to investigate further what type of handles are being consumed (with Process Explorer from Sysinternals or the Windows debugger, for example).
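Putting rough numbers on that example: freeing 300MB of Paged Pool (400MB down to 100MB) by closing 100,000 handles works out to roughly 3KB per handle on average. The hypothetical helper below just does that division. Treat the result as a ballpark figure only, because objects differ in size and not every byte reclaimed is necessarily tied to a handle.

```c
/* Rough average kernel memory freed per handle when a process exits.
   Inputs: pool usage in bytes before and after stopping the suspect,
   and how many handles it held. Returns 0.0 if nothing was freed. */
double avg_pool_bytes_per_handle(unsigned long long before_bytes,
                                 unsigned long long after_bytes,
                                 unsigned long handles_released)
{
    if (handles_released == 0 || after_bytes >= before_bytes)
        return 0.0;
    return (double)(before_bytes - after_bytes) / (double)handles_released;
}
```

For example, `avg_pool_bytes_per_handle(400ULL << 20, 100ULL << 20, 100000UL)` is about 3,146 bytes.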

Tip:

For essential yet legacy applications for which there is no hope of replacement or support, we may consider setting up a performance monitor alert on the handle count when it hits a couple thousand or so (Performance Object: Process, Counter: Handle Count), and taking action to restart the bad service. This is a less than elegant solution, for sure, but it could keep the one rotten apple from spoiling the bunch by hanging or crashing the machine!
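The alert-and-restart idea in the tip can be sketched as a tiny state machine. This is not how Perfmon implements alerts. It is a hypothetical C helper that assumes we sample Process\Handle Count periodically and want to act only once the count has stayed above the limit for several consecutive samples, so that a brief spike does not trigger a restart. Actually restarting the service is left to whatever mechanism we already use.

```c
#include <stdbool.h>

/* Hypothetical watchdog state (not a Perfmon API). */
struct handle_watchdog {
    unsigned long limit;          /* e.g. "a couple thousand" handles */
    unsigned int  required_hits;  /* consecutive samples over the limit */
    unsigned int  hits;           /* current run of samples over the limit */
};

/* Feed one Handle Count sample. Returns true when the service should be
   restarted; the run counter then resets, so another restart needs a new run. */
bool watchdog_sample(struct handle_watchdog *w, unsigned long handle_count)
{
    if (handle_count <= w->limit) {
        w->hits = 0;              /* back under the limit: start over */
        return false;
    }
    if (++w->hits < w->required_hits)
        return false;
    w->hits = 0;
    return true;
}
```

For instance, with `{2000, 3, 0}` the third consecutive sample above 2,000 handles returns true.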

2.) By Pooltag (as read by poolmon.exe)

Okay, so no handle count gone wild? No problem.

On Windows Server 2003 and later, a feature is enabled by default that tracks the pool consumer via something called a pooltag. For earlier OSes, we need to use a utility such as gflags.exe to enable pool tagging (which unfortunately requires a reboot). A pooltag is usually just a 3-4 character string, or more technically “a character literal of up to four characters delimited by single quotation marks”, that the caller of the kernel API allocating the pool provides as its 3rd parameter (see ExAllocatePoolWithTag).
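One practical wrinkle with that third parameter is byte order. The tag is passed as a 32-bit value, and on little-endian x86/x64 machines its bytes are stored low byte first. As a result, poolmon and the debugger display the characters in the reverse of the order in which a multi-character literal is written in source code (for example, the literal 'tSmM' shows up as MmSt), which is why driver sources often contain tags that look backwards. The C sketch below converts between the displayed four-character form and the 32-bit value. It uses hypothetical helper names and deliberately avoids multi-character literals, whose value the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Builds the 32-bit tag value whose bytes, stored little-endian (x86/x64),
   read back as the 4 characters poolmon displays. For "MmSt" this is
   0x74536D4D, the same value a compiler like MSVC gives the literal 'tSmM'. */
uint32_t pool_tag_from_display(const char display[4])
{
    return (uint32_t)(unsigned char)display[0]
         | (uint32_t)(unsigned char)display[1] << 8
         | (uint32_t)(unsigned char)display[2] << 16
         | (uint32_t)(unsigned char)display[3] << 24;
}

/* Inverse: turns a tag value into the NUL-terminated display string. */
void pool_tag_to_display(uint32_t tag, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((tag >> (8 * i)) & 0xFF);  /* low byte first */
    out[4] = '\0';
}
```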

The tool we use to find out which pooltag is using the most is poolmon.exe. Launch it from a cmd prompt, hit B to sort by bytes (descending) and P to switch the list between pool types (Paged, NonPaged, or Both), and we have a live view into what’s going on in the system. Look specifically at the Tag Name and its respective Byte Total column for the guilty party!
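The view poolmon gives us after pressing P and B is essentially a filter followed by a descending sort. Purely to illustrate that view, here is a C sketch over hypothetical rows of tag, pool type and outstanding bytes. Poolmon itself reads live kernel counters, which this sketch does not. The type and function names are made up for the example.

```c
#include <stdlib.h>

/* Hypothetical row mirroring poolmon's Tag / Type / Bytes columns. */
enum pool_type { POOL_PAGED, POOL_NONPAGED };

struct pool_row {
    char tag[5];               /* 4-character tag as displayed, NUL-terminated */
    enum pool_type type;
    unsigned long long bytes;  /* outstanding bytes for this tag */
};

/* qsort comparator: larger byte totals first. */
static int by_bytes_desc(const void *a, const void *b)
{
    const struct pool_row *ra = a, *rb = b;
    return (ra->bytes < rb->bytes) - (ra->bytes > rb->bytes);
}

/* Copies the rows of the requested pool type into out (which must hold at
   least n rows), sorts them largest first, and returns how many were copied. */
size_t top_tags_by_bytes(const struct pool_row *rows, size_t n,
                         enum pool_type type, struct pool_row *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].type == type)
            out[m++] = rows[i];
    qsort(out, m, sizeof *out, by_bytes_desc);
    return m;
}
```

The first entry of `out` is the tag to look up next.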

The cool thing is that most of the pooltags used by the OS are already documented in pooltag.txt, so we can tell whether a tag matches one of the Windows components. For instance, if we see MmSt as the top tag, consuming far and away the largest amount, pooltag.txt tells us that it belongs to the memory manager. Searching for that tag in a search engine might also turn up the popular KB304101, which may resolve the issue!

We will find pooltag.txt in the ...\Debugging Tools for Windows\triage folder when the debugging tools are installed.
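Looking a tag up in pooltag.txt is a simple text match. The C sketch below assumes each entry line has the layout <tag> - <binary> - <description>. That is an assumption worth checking against the copy in the triage folder. The helper name is hypothetical, and lines that do not follow the assumed layout simply do not match.

```c
#include <string.h>

/* Assumes pooltag.txt entry lines look like "<tag> - <binary> - <description>".
   Returns a pointer to the text after the tag's "-" separator when the line
   documents `tag` (exactly 4 characters, space-padded), else NULL. */
const char *pooltag_match(const char *line, const char tag[4])
{
    if (strncmp(line, tag, 4) != 0)
        return NULL;
    const char *p = line + 4;
    while (*p == ' ' || *p == '\t')   /* skip padding before the separator */
        p++;
    if (*p != '-')
        return NULL;
    p++;
    while (*p == ' ' || *p == '\t')   /* skip padding after the separator */
        p++;
    return p;
}
```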

Oh no, what if it’s not in the list? No problem…

We might be able to find its owner by using one of the following techniques:

• For 32-bit versions of Windows, use poolmon /c to create a local tag file that lists each tag value assigned by drivers on the local machine (%SystemRoot%\System32\Drivers\*.sys). The default name of this file is Localtag.txt.

• For all versions of Windows (the article itself targets Windows 2000 and Windows NT 4.0), use Search to find files that contain a specific pool tag, as described in KB298102, How to Find Pool Tags That Are Used By Third-Party Drivers.
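This kind of file search can work because a tag passed as an immediate operand is stored little-endian inside the driver binary, so its bytes appear in the .sys file in the same order that poolmon displays them. The C sketch below counts such byte matches in a driver image that has already been read into memory. The helper name is hypothetical. A hit only suggests that a driver uses the tag, because the same four bytes can occur by coincidence.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of a 4-character pool tag (as poolmon displays it)
   in a driver image already loaded into memory. */
size_t count_tag_hits(const unsigned char *image, size_t size, const char tag[4])
{
    size_t hits = 0;
    for (size_t i = 0; i + 4 <= size; i++)
        if (memcmp(image + i, tag, 4) == 0)
            hits++;
    return hits;
}
```

Running this over each file in %SystemRoot%\System32\Drivers and listing the files with nonzero counts gives a short list of candidate owners.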

3.) Via Driver Verifier

Using Driver Verifier is a more advanced approach to this problem. Driver Verifier provides a whole suite of options, targeted mainly at driver developers, for running what amounts to quality control checks before shipping their driver.

However, should pooltag identification be a problem, its Pool Tracking option does the heavy lifting for us: it matches the pool consumer directly to the driver!

Be careful, however. The only option we will likely want to check is Pool Tracking. The other settings are potentially costly enough that, if the set of drivers installed on the machine is not perfect, we could end up in an unbootable situation, with constant bluescreens reporting that driver xyz is doing bad thing abc, along with some follow-up suggestions.

In summary, Driver Verifier is a powerful tool at our disposal, but use it with care, and only after the easier methods have not resolved our pool problem.

4.) Via Debug (live and postmortem)

As mentioned earlier, the API used to allocate this pool memory is usually ExAllocatePoolWithTag. If we have a kernel debugger set up, we can set a breakpoint there to brute-force debug who our caller is… but that’s not usually how we do it (can you say “extended downtime”?). There are other, more advanced live debugging methods that we may post about later…

Usually, debugging this problem involves a postmortem memory.dmp taken from a machine that has logged Event ID 2020 or 2019, is no longer responsive to client requests, is hung, or often several of these at once. We can gather this dump via the Ctrl+Scroll Lock method (see KB244139), even while the machine is “hung” and seemingly unresponsive to the keyboard or Ctrl+Alt+Del!

When loading the memory.dmp via windbg.exe or kd.exe we can quickly get a feel for the state of the machine with the following commands.

So in this rudimentary example, the offender is clearly mybadapp.exe, given its abundance of threads. One could dig further to determine what type of threads or functions are being executed and follow up with the owner of the executable for more detail, or take a look at the code if the application is yours!

Common Question:

Why am I out of Paged Pool at ~200MB when we say that the limit is around 460MB?

This is because, at boot, the memory manager decided that, given the amount of RAM on the system and other memory manager settings such as /3GB, our maximum is X rather than the architectural maximum. There are two ways to see the maximums on a system:

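1.) Process Explorer from Sysinternals, which can show the Paged and NonPaged pool limits once symbols are configured. Note that we have to specify a valid path to dbghelp.dll and a Symbols path via Options…Configure Symbols.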

For example:

Dbghelp.dll path:

c:\<path to debugging tools for windows>\dbghelp.dll

Symbols path:

SRV*C:\websymbols*http://msdl.microsoft.com/download/symbols

2.) The debugger (live, or against a memory.dmp, by running !vm)

*NonPaged pool size is not configurable other than through the /3GB boot.ini switch, which lowers NonPaged Pool’s maximum: 128MB with the /3GB switch, 256MB without.

Conversely, Paged Pool size can often be raised manually to around its maximum via the PagedPoolSize registry setting, which is described, for example, in KB304101.

So what is this Pool Paged Bytes counter I see in Perfmon for the Process Object?

This is when the allocation is charged to a process via ExAllocatePoolWithQuotaTag. Typically, we will see ExAllocatePoolWithTag used instead, so this counter is less effective… but hey… don’t pass up free information in Perfmon. Be on the lookout for this easy win.

Welcome to the Microsoft NTDebugging blog! I’m Matthew Justice, an Escalation Engineer on Microsoft’s Platforms Critical Problem Resolution (CPR) team. Our team will be blogging about troubleshooting Windows problems at a low level, often by using the Debugging Tools for Windows. For more information about us and this blog, check out the about page.

To get things started I want to provide you with a list of tools that we’ll be referencing in our upcoming blog posts, as well as links to some technical documents to help you get things configured.

The big list of tools:

The following tools are part of the “Debugging Tools for Windows” – you’ll definitely need these