If the crashes appear to be random and you don't know how to trigger them, it's hard to connect windbg to the problem postgres.exe before it crashes.

+

If the crashes appear to be random and you don't know how to trigger them, it's hard to connect windbg or Visual Studio to the problem <code>postgres.exe</code> before it crashes.

−

Setting windbg as the JIT (just-in-time) or post-mortem debugger won't help you, because postgresql generally runs as a service under a different user account that cannot interact with the desktop. You could always initdb a new cluster under your normal user account and use pg_ctl to start the postmaster with that cluster manually, so you can JIT debug under your own user account where Pg can interact with the desktop. This isn't suitable for production use, though, and you might not be able to reproduce the problem that way.

+

Setting <code>windbg</code> as the JIT (just-in-time) or post-mortem debugger won't help you, because PostgreSQL generally runs as a service under a different user account that cannot interact with the desktop. You could always <code>initdb</code> a new cluster under your normal user account and use <code>pg_ctl</code> to start the postmaster with that cluster manually, so you can JIT debug under your own user account where Pg can interact with the desktop. This isn't suitable for production use, though, and you might not be able to reproduce the problem that way.

−

For that reason, there's now a pre-release version of a [https://commitfest.postgresql.org/action/patch_view?id=396 crash dump handler for Pg on Windows] that traps backend crashes and writes dump files that may be opened later with Windbg or with Visual Studio (paid versions only) to debug the backend crash. This lets you to debug without disrupting the operation of your server, or send the crash dump to somebody else who can learn more from it than you might. Testing of the crash dump handler would be greatly appreciated.

+

In PostgreSQL 9.0 and above there is a [http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/port/win32/crashdump.c;h=7550fa6f26b82d6fc41f5f68afb35ec44d25d00b;hb=HEAD crash dump hander] included in PostgreSQL. To use it:

* Give the PostgreSQL user (<code>postgres</code> by default) "full control" of it in the security tab of the folder properties

+

* Run the problem code. You don't need to restart Pg or change any settings.

+

* When a backend crashes, a Windows minidump should be created in the <code>crashdumps</code> directory.

+

+

You can debug this file with Microsoft Visual Studio or with windbg, using much the same procedures you use when attaching to a process ID as explained above. Unfortunately the more recent Express editions of Visual Studio do not have the ability to debug minidumps.

+

+

Minidumps can also be sent to somebody else to debug. For this to work you must tell the other person the *exact* PostgreSQL version you are using, because the debug symbols vary from minor release to minor release. You can't debug a 9.1.0 minidump with debug symbols taken from a 9.1.1 install. It's often easiest to just send them the <code>.pdb</code> files from your install alongside the minidump.

Once you have installed Process Explorer and the Debugging Tools for Windows, you'll need to do a little bit of setup so you can get useful stack traces.

Note: The instructions below assume your copy of Windows is installed on c: so you may have to adjust them if this is not the case.

Configuring the symbol path

First, you need to set the symbol path. First you have to figure out what it is, since it depends on the version of PostgreSQL you've installed, the language of your version of Windows, and where PostgreSQL was installed to. What you need to find is the location of the symbols folder inside the PostgreSQL install directory. If you've used the EnterpriseDB 1-click installer for PostgreSQL 8.4 and you're using an English version of Windows, it'll be:

C:\Program Files\PostgreSQL\8.4\symbols

I'll assume that's the case in the following instructions, so adjust the path shown to match your system if required.

To set the symbol path, you'll need to use the System control panel. You can get to this quickly and conveniently with all versions of Windows by holding down the Windows key and pressing the BREAK key (it's often under the Pause key). You can also find it in the Control Panel; you may need to search for "system". If you can't find it, open the Start menu and click "Run" then enter "sysdm.cpl" and press enter.

In the System control panel, click on the Advanced tab and then click "Environment Variables". An "Environment Variables" dialog will appear. In that dialog, under the label "User Variables for <username>" you'll see a list. What you need to do is click the "New" button under that list.

In the dialog that appears, enter the variable name exactly as:

_NT_SYMBOL_PATH

Note the leading underscore ("_") and the fact that the variable name is ALL UPPER CASE with UNDERSCORES_BETWEEN_WORDS.

The value is in two parts. The first part depends on where PostgreSQL is installed, as described above. The second part tells debugging tools to use a Microsoft server (the Symbol Server) to find out about parts of Windows its self.

Setting up Process Explorer

Open Process Exploer from the start menu, use "procexp.exe" in the Start->Run dialog. Read and accept the license agreement if prompted.

Now, in the "options" menu choose "configure symbols". First, note the "symbols path" entry, which should be the same as the value you entered for _NT_SYMBOL_PATH in the System control panel. If it isn't, you've made a mistake setting up the environment variable for the symbol path in the System control panel.

There is another entry in the "configure symbols" dialog labelled "dbghelp.dll path". If this begins with "c:\windows" you'll need to change it to the alternative provided by the Debugging Tools for Windows. The exact location varies depending on your debugging tools version. Use the "..." button, type %ProgramFiles% into the "file name" field and press enter. Now find the debugging tools for windows folder, open it, and find dbghelp.dll. Double click on it. The dbghelp.dll path should've changed to something like:

C:\Program Files\Debugging Tools for Windows (x86)\dbghelp.dll

You can now click "OK".

Checking the setup

You should now be able to look at the stack trace of a commonplace process that's pre-installed on Windows systems. I suggest starting Notepad (start->run->notepad.exe), then launching Process Explorer, double-clicking on "notepad.exe" in the process list, switching to the "threads" tab, and double-clicking on the only entry in the list shown there. The displayed stack trace should look like this (on Windows XP SP3 at least):

with lines about wow64. This backtrace is useless for debugging. You're debugging a 32-bit binary on 64-bit windows with 64-bit debugging tools. You might be able to get a useful trace by installing and using the 32-bit debugging tools, but I haven't verified that.

Getting the stack trace

Phew. You now have things configured and can actually start collecting information.

Important: Don't waste your time generating stack traces that're useless. Please read #How to make sure a stack trace is useful before spending lots of time collecting information or posting to the mailing list. Your stack trace needs to contain "!SymbolicNames" as well as "+offsets" to be much use.

If you are collecting information about a postgresql backend that appears to be in an infinite loop or using too much CPU, you will need to take more than one stack trace. You should try to get quite a few over a short period of time so that developers can tell where it's spending its time. One stack trace alone doesn't really tell you a lot about what it's doing overall.

Getting the process ID of the PostgreSQL backend

Identifying the process ID of the PostgreSQL process you're interested in isn't always easy. If you have an active connection to the backend you want to look at, you can run select pg_backend_pid(); to get it's process ID. Sometimes it's obvious - it's the one using constantly high amounts of CPU as shown in Process Explorer, for example. At other times, though, you'll need to use psql or PgAdmin III to look in the pg_catalog.pg_stat_activity or pg_catalog.pg_locks tables/views to find out the process id ("pid" or "procpid") of the process you're interested in.

Getting a stack trace using Process Explorer

Launch Process Explorer from the Start menu, or by running "procexp.exe".

You'll need to find the postgres.exe instance you're interested in. It might be one showing consistently high CPU usage, but if not, see the instructions above for finding it by process ID.

Once you know which postgres.exe backend you want to examine, double-click on it in the list in Process Explorer. In the dialog that appears open the "Thread" tab. You'll now see a list of threads. In most cases you're interested in the first thread in the list. Select it by clicking on it and click the "stack" button. After a short delay a list will appear in a new dialog, looking something like this:

See how there's different text after the exclamation mark on each line? If that's not the case, your symbol path is probably set up wrong, and Process Explorer can't figure out what the program is doing. Without the text after the exclamation marks the stack trace is mostly useless, so you'll need to collect that information.

You need to copy the whole list. Click on the first entry, hold down the shift key, and press the "end" key (or scroll to the bottom and click the bottom entry), then click the "copy" button.

You can now paste the stack trace into your email program to send it to the PostgreSQL mailing list. By its self it is not useful, so you MUST provide the usual debugging information too - your OS version, PostgreSQL version, what's going wrong, what you're doing when it goes wrong, etc etc. See Guide to reporting problems.

Getting a stack trace using WinDBG

The Debugging Tools for Windows include a program called windbg.exe, the Windows Debugger. It's a very powerful and useful tool, but it's not user friendly. Nonetheless, it provides better debugging information than Process Explorer can, and unlike Process Explorer it is capable of capturing debugging information from a postgres.exe backend that crashes. Windbg.exe may also be used to remotely debug a system even if you don't have full GUI access. Note, however, that unlike Process Explorer using windbg may interrupt the execution of the backend or possibly (though not usually) even hold up the entire database system.

You will need to know the process ID of the backend you want to connect to. See "Getting the process ID of the PostgreSQL backend" above for how to find that out.

Once you know the process ID, you need to start WinDBG. It'll be in the Start menu under Debugging Tools for Windows. To connect to the PostgreSQL backend of interest, use the File -> Attach to Process menu item (shortcut: F6). Enter the pid you found earlier into the "process id" field and press enter.

windbg has now attached to that postgres backend. Its execution has paused, and it will appear to have hung. This is normal. You're in a command-line interface, but you don't really need to know much about it. All you'll need to do is type the command:

~*k

in and press enter, then select all the output (CTRL-A) and copy it (CTRL-C) to the clipboard. You can now paste this information into a text file, email, etc. It'll look something like this:

If you've collected everything you needed to you can detach from the PostgreSQL proces (debug menu -> Detach debugee) and close WinDBG. Alternately, if you want to collect a few more stack traces at different points in execution, use the "G" command to tell PostgreSQL to resume running. When you want to interrupt it again, use "Break" from the Debug menu and enter ~*k to get another set of stack traces.

Getting a stack trace of a repeatable backend crash

If you're trying to track down a backend crash and you can reasonably predictably trigger the crash in a connection of your choice, you'll probably want to use windbg. You will need to attach windbg to the backend while it's still running (see above) and issue the "G" command in windbg to get the backend to continue running. Now you need to do whatever is required to trigger the crash in the backend using your connection to it.

When the backend crashes, windbg should interrupt the crash. At this point, you can run the following command in windbg to get some stack trace information about the crash:

~*k

Random and unpredictable backend crashes

If the crashes appear to be random and you don't know how to trigger them, it's hard to connect windbg or Visual Studio to the problem postgres.exe before it crashes.

Setting windbg as the JIT (just-in-time) or post-mortem debugger won't help you, because PostgreSQL generally runs as a service under a different user account that cannot interact with the desktop. You could always initdb a new cluster under your normal user account and use pg_ctl to start the postmaster with that cluster manually, so you can JIT debug under your own user account where Pg can interact with the desktop. This isn't suitable for production use, though, and you might not be able to reproduce the problem that way.

In PostgreSQL 9.0 and above there is a crash dump hander included in PostgreSQL. To use it:

Create a directory named crashdumps (all lower case) in the PostgreSQL data directory (as shown by SHOW data_directory; in psql)

Give the PostgreSQL user (postgres by default) "full control" of it in the security tab of the folder properties

Run the problem code. You don't need to restart Pg or change any settings.

When a backend crashes, a Windows minidump should be created in the crashdumps directory.

You can debug this file with Microsoft Visual Studio or with windbg, using much the same procedures you use when attaching to a process ID as explained above. Unfortunately the more recent Express editions of Visual Studio do not have the ability to debug minidumps.

Minidumps can also be sent to somebody else to debug. For this to work you must tell the other person the *exact* PostgreSQL version you are using, because the debug symbols vary from minor release to minor release. You can't debug a 9.1.0 minidump with debug symbols taken from a 9.1.1 install. It's often easiest to just send them the .pdb files from your install alongside the minidump.

Crashes during postmaster startup

Remote debugging with windbg.exe

You might not want to, or be able to, sit at the database server console for debugging. That won't stop you using windbg on PostgreSQL, though - you can launch a remote debugging server on the Pg machine and remote debug the machine.

Launch dbgsrv.exe on the remote machine as an Administrator. Use the command: dbgsrv.exe -t tcp:port=12345,password=fred . (If you need to, use runas.exe to get admin rights).

On the workstation you're debugging from:

Launch windbg

In the file menu, choose "Connect to remote stub..."

As the connection string enter "tcp:server=YOUR_PG_SERVER_NAME_OR_IP,port=12345,password=fred"

Click "OK"

Windbg will now try to do all work via the remote debug stub, as if you were running windbg on that machine, so you can follow the instructions as given above.

Debugging using Microsoft Visual Studio or VS Express 2008

Windbg is free, quick and easy to install, supports remote debugging, and a real pain to use. You'll want something much nicer to use if you're trying to do more than just get a stack trace - and for some problems, you will indeed need more than a stack trace to know what's going wrong. If you intend to step through the execution of PostgreSQL's code and examine the value of variables, using Microsoft Visual C++ will make it a lot easier.

Microsoft Visual C++ has a much nicer debugger that I highly recommend using in favour of windbg.exe - if you have the option. The free 2008 express edition can be used for many debugging tasks; it lacks remote debugging support and just-in-time postmortem debugging (which the full, paid edition has) but otherwise does a very good job as a user-friendly debugger. Unfortunately VS 2010 removes the ability to attach to already-running processes, so unlike 2008 it's basically useless for debugging PostgreSQL. This means you can't use 2010 express edition at all, and you can't use 2008 express edition for debugging random crashes, but 2008 is still great for tracking down predictable crashes and runaway backends - anything where you can connect to the backend by process ID.

Attaching

Once you've installed VC++ 2008 Express, launch it and use "Attach to Process" in the Tools menu to connect to the PostgreSQL backend you want. Make sure "show processes from all users" is checked, select the postgres.exe process you want to debug, and click "Attach". You'll get a security warning, which you should read and understand before accepting.

VC++ will load symbols for PostgreSQL, which may take a few minutes if you haven't used VC++ to debug PostgreSQL before (it'll have to fetch them from the symbol server). Once VC++ finishes loading symbols, execution of the Pg backend will resume and your database system will be working normally.

Breaking into execution

You can now break into the execution of the postgres.exe backend with the break button on the toolbar. It looks like a pause button. If you're debugging a runaway backend, you'll want to break into its execution now. Alternately, if you're debugging a crash, you should trigger the crash now.

If VC++ detects a crash it'll display a dialog like this to give you the chance to break into the process's execution:

If you break into a process's execution while it's running in the kernel you'll see a dialog like this:

... which is common for PostgreSQL since it'll often be waiting for network or disk activity. You can generally ignore it.

Starting debugging

You'll get a dialog telling you that "there is no source code available for your current location". Click OK. Now, on the bottom right (by default) you'll see a set of tabs named "call stack", "breakpoints", and "output". Click on "call stack".

You can see the stack of function calls that were in progress in Pg's main thread at the time you interrupted postgresql. (If you want or need to change threads, use the "threads" tab on the bottom left - double click on the thread of interest to debug it - though you usually want the main thread which is active by default).

In the call stack you can double-click on an entry to see where in the executable the function call arose. If you don't have source code, it'll offer to show you a disassembly, but you have source code for PostgreSQL so you can do better than that. (If you don't have source code for Pg, download and extract the sources for your version now. If you can't open the .tar.gz file, install [www.7-zip.org/|7-zip]). Double-click on the topmost line in the stack trace that starts with "postgres.exe" and a dialog will appear asking for a .c file:

It'll note the original location of the file at the top of the dialog. In my case, it wants socket.c from src\backend\port\win32\socket.c, so I find the postgresql sources (on my machine: c:\developer\postgresql-8.4.0) and from there open src, then backend, then port then win32. Visual studio notices "win32.c" within the directory and opens it automatically. From now on, during this debugging session VS should find all other postgresql source files too.

Debugging tips

The Visual Studio debugger is far too capable to fully describe here. I'll just give a few hints - and suggest that you use the documentation.

Now that you have sources displayed, you can step through the code line-by-line, run until the end of functions, step into functions, and otherwise control the execution of the PostgreSQL process in detail using the buttons on the toolbar. You can also examine local variables at the stack frame you're on using the "locals" tab on the left, set breakpoints in functions using the "breakpoints" tab on the right, etc.

For example, if I wanted to pause execution the next time PostgreSQL tried to receive data from the network I might set a breakpoint on the pgwin32_recv function and resume execution:

If I then issue a query on the backend I'm debugging, Visual Studio will shortly break back into the debugger at the start of the pgwin32_recv function. I can now step through the function's execution to see what functions are called, what values variables take, etc. I can add and remove breakpoints by clicking in the margin on the left of the source file display.

Note that you must be careful not to change the values of any variables or cause execution to skip lines using the "set next statement" option. If you're not careful you could crash the server or possibly severely corrupt your database.

How to make sure a stack trace is useful

A stack trace isn't always trustworthy or useful. This is particularly the case on Windows, where programs generally contain no debugging information and proper stack traces can't reliably be created without it.

If you're going to post a stack trace as part of a bug report or request for help, it's really important that it contains symbolic information about the call path, like the example posted above. If, instead, it contains just module names and numeric offsets within the modules, it doesn't tell the reader anything useful unless they know exactly what versions of the modules you have. Even then, it's difficult and cumbersome to read. So, make sure your stack trace contains lines showing modulename!SymbolicName+offset, like this:

The C function <code>process_implied_equality(...) from backend/optimizer/plan/initsplan.c is deep in the query planner, and is (according to the comments) "currently used only when an EquivalenceClass is found to contain pseudoconstants". It's not going to be calling into hal.dll - not, at least, without a fairly long call chain between it and hal.dll. So we can't trust that postgres.exe was even in the process_implied_equality function when it called into hal.dll and if it was we don't have any idea how it got there. The call trace is basically useless.

Compare it to the example stack trace shown earlier in the article, where we can clearly see that PostgreSQL has called WaitForMultipleObjectsEx (to wait until data is delivered to a socket) via its win32 wrapper layer from the pg_getbyte function ... and so on.

No symbolic names? Why?

Your stack traces aren't showing !SymbolicNames, and you don't know why. Here are a few possibilities.

No symbols for the module

You might simply not have any way to get symbols for the module you're debugging. If you're finding that most of the stack trace (at least the parts in the system DLLs and NT kernel) have symbolic names shown, but other parts don't, it's quite likely you just have no debugging information (pdb files) for those parts.

This shouldn't be the case for PostgreSQL if you've set up the symbol path to point to the symbols for your version of PostgreSQL as well as the Microsoft Symbol Server, though.

_NT_SYMBOL_PATH not set

Are you sure you set the _NT_SYMBOL_PATH environment variable?

In the Start menu choose Run, then enter "cmd" and press enter.

In the resulting window, type in exactly:

set | findstr symbol

You should get the following result, allowing for differences in PostgreSQL versions, drive letters, etc:

Note that _NT_SYMBOL_PATH is in capitals, with underscores between words and a leading underscore.

The same string should show up in Process Explorer, in Configure Symbols under the Options menu.

Add-in software firewall

Do you have an add-in software firewall installed? The firewall built in to Windows will not cause problems as it only affects listening sockets and incoming connections, but 3rd party firewalls often block outbound connections too. This may prevent access to the symbol server from working.