Debugging the MaxDB Kernel on Windows – How to get a Call stack from hanging tasks

Ok, today I’ll show you how to install, setup and use the Windows Debugger to get information about hanging MaxDB kernel tasks.The first thing we need is obviously a hanging MaxDB session, so let’s produce one!

Setting up the test case

My MaxDB instance is a 7.6.03 Build 9 instance with only one 50 MB data volume. The goal of the test setup is to create a DB FULL situation, since this is a rather common MaxDB hanging situation when DATA AREA AUTOEXTEND is not active.Therefore I create a simple table in SQL Studio:

create table test (col1 int primary key)

and fill it with LOTS of data (notice the Cartesian product of the three-way join with the files table):

insert into test (select rownum from files a, files b, files c)

After a short amount of time my DB will be filled up. This is something the DBMGUI really tells you:

For our analysis of this situation we need some more information, especially we need the Windows thread ID of our hanging session. This information is available via the x_cons tool (you may as well use the dbmcli-command ‘db_cons’ or the DBMGUI function CHECK -> Database Server -> ACTIVE:

As we see only two tasks are currently ACTIVE. One (T33) is the SAVEPOINT that is triggered automatically whenever a LOG FULL/DB FULL situation occurs.The second one (T17) is our hanging session and we see it’s waiting on the DB FULL (197) suspend reason. Note the ‘Win tid’ column; it’s what we’re looking for right now. Our user task has the Windows thread ID 0x4EC in Hex-Format or 1260 in decimal format.

Installing the Windows Debugger

Now we’ve two options to get a call stack. The first one is to use the MS Windows Debugger. This tool is available on the Microsoft Website for free download.

Depending on your Platform either choose “Install Debugging Tools for Windows 32-bit Version” or “Install Debugging Tools for Windows 64-bit Versions” and install the tools on the machine where our database runs.

I installed mine into the standard path ‘C:\\Program Files\\Debugging Tools for Windows’.

The next things we need are the symbol files.These file contain the information that make it possible that the debugger can tell us, what function is at what memory address of any windows program.These symbols can either be downloaded completely via the ‘Download Windows Symbol Packages’ or we can tell the debugger to load them on-demand while we debug. As my machine is connected to the Internet.I go for this option and show you how to configure in the next few paragraphs.

Debugging our user task

Now, we just start the WinDbg.exe (after the installation there is a new entry in the START-menu).In order to analyze anything we’ve to tell the Debugger where to find all the symbols. Via the Menu ‘File’-> ‘Symbol File Path’ we get a dialog like this:

We’ve to enter a PATH Variable here, so we can put in several paths limited by the semicolon sign ‘;’. Since we want to have the Windows symbols automatically downloaded from the MS website the first path we enter is this one:

The bold printed part denotes the local folder where the debugger should store the downloaded symbol files. Next we need to supply the paths to the MaxDB symbols. Since these are delivered by default in each installation I just have to put in these paths on my machine:

C:\\sapdb\\nbd\\db\\symbols;C:\\sapdb\\programs\\symbols

As you see there is a ‘symbols’-Folder in each DEPENDENT folder and one in the INDEPENDENT-folder.

Now the debugging can start.To do so we need to attach the debugger to the process we want to debug. This can be done via ‘File’ -> ‘Attach to process [F6]‘.We get a list of running processes and choose our ‘KERNEL.EXE’:

Make sure you check the box ‘Noninvasive’ since we want our process to run further after our debugging and all we want to do now is ‘just looking’!

If you’re unsure which one of possibly several KERNEL.EXE-processes on your machine would be the right one, just look up the process ID in the knldiag-File in the startup part:

We click on ‘Display Threads’ [1] first and choose a thread in the list. Now we need to remember our Windows thread ID from above: 0x4EC.After we select our thread we click on ‘show Callstack’ [2] et voilà:

That’s the call stack of our hanging session.To read it more easily or to hand it over to support for analysis we want a text-version of it. Nothing easier than that: right-click on the Windows-Title bar and click ‘Copy stack to clipboard’ in the context menu. Now paste it into notepad and we’re done.

To stop our debugging session we just select ‘Debug’ -> ‘DETACH DEBUGEE’ from the menu.

Process Explorer Approach

What we’ve right now can also be easily archived with the Sysinternals tool Process Explorer. All I can say about the Sysinternals tool suite for Windows is: get it onto your servers!Whenever there is anything weird to analyze these tools have been reliable sources of in-depth system information.So double recommendation: install them! Today!The tools are also available for free at the MS website:

To be able to actually use symbols and produce readable call stacks we need to setup the process explorer a little bit.It also needs to have the Windows Debugger tools installed, since it makes use of the special Dbghelp.dll that comes with them.All we need to do now is to tell process explorer where this DLL is stored and enter our symbols path (see above):

Now we just browse through the process overview and double-click on our KERNEL.EXE.The process ID from the knldiag-file can again help us here to find ‘our’ process:

Now we browse through the threads until we find thread ID 1260 (= Hex 0x4EC) and click on [Stack]:

Again we can use ‘copy’ to save this information to a text file or a support message.

To ‘read’ the call stack we start from the button and move upwards (that’s how a stack works). I will just comment on a few of these lines now.We see at line 2d that this is a user task ‘kernel!SQLTask+0x7b’ that currently handles a statement that changes the db content ’27 kernel!a501exec_with_change_rec+0x299′.Since the statement is not a single-table statement but involves a sub-select we see ’26 kernel!a502complex_execution+0x407′ before that.A bit further up we see that the table-content will be changed ‘1f kernel!k61table_ins_del_upd+0x34′ and also we see that the select of the subquery is handled ‘1e kernel!k720_single_select+0x24’ and later it’s inserted into our target table ’17 kernel!kb61insert_rec+0xa7′.An insert need of course some work on the b*tree in which the table is stored so we see ’15 kernel!bd30AddToTreeWithBeforeImage+0x153‘ and know by now that this transaction will be rollback-able.After all Undo- and Log-Information has been written (yes, we follow the WAL Write-Ahead-Log approach) a new page needs to be allocated, to hold our data ‘0d kernel!Data_PageAccessManager::NewPage+0x1e1′.As this page has to be provided by the Converter we find‘0a kernel!Converter_Converter::RequestNewPageNo+0x18a’.The Converter than realizes that there is no space available anymore, so it suspends our task and lets it wait for the notification that the DB FULL situation is gone ’05 kernel!__os74WaitForWakeup+0xa4‘.

As we see it’s no too difficult to understand what tasks are waiting for once we have the necessary information (the call stack).This was just a demo situation but in other hanging situation this canhelp to find the cause.So better install the Windows Debugger and the Sysinternals tools (plus the Symbols if no Internet is available) on your Windows server today!

Cleaning up

But one last thing: how to solve our Data Full Situation?In MaxDB you HAVE (sorry no more ways to stress this via formatting…) to add a data volume now. There’s no way to cancel or kill the user task (that is the user session) that causes the big data load even if it’s not wanted beforehand.So what to do when we don’t really want the data or can not provide much storage space right now?Well, we kill the user task anyhow!

The ‘!’ sign denotes that this user task has been marked to be killed.Due to the cooperative multitasking model on user task level, the usertask is not killed by some authority but has to commit ‘suicide’.It first has to be able to read the KILL flag and react on this.But to be able to do so, it must get out of the DB FULL situation.

So we really need to provide a new data volume here to get the task actually killed.Therefore I add just a very small volume. Since the task will be killed the transaction that loads the data will be rolled back. In effect the database will have much more freespace again afterwards.

After the rollback is done I can simply drop the data volume again – this is the point where we are happy that this database is already on 7.6 😉MaxDB Versions <7.5 are not able to drop volumes!