
Debugging a Stuck Process in Linux

The other day I ran into a problem with a monitoring setup: its web UI was not responding. I SSHed into the server and checked whether the process was running. It was. I checked whether the port was open. It was. So the process was running and listening on the port, but it was stuck somewhere and not accepting connections. There it was: a running but stuck process.

Now, I could simply have restarted the stuck process, but that wouldn't have told me what actually happened or where it was stuck.

This is not a step-by-step guide, but it gives some insight into how various tools and commands can be used. Here's what I did to investigate what was going on:

Find The Stuck Process:

You can use the following to get the process ID (PID):

```shell
$ ps auxww | grep <process-name>
```
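One catch with ps auxww | grep is that the grep process itself usually shows up in the results. pgrep -f avoids that. The minimal POSIX shell sketch below scans /proc/<PID>/cmdline directly, which shows where that information comes from. It assumes Linux. find_pids is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the PID of every process whose full
# command line contains PATTERN, similar in spirit to `pgrep -f PATTERN`.
find_pids() {
  local pattern="$1" dir cmd
  for dir in /proc/[0-9]*; do
    # cmdline is NUL-separated; turn the NULs into spaces. A process can exit
    # while we scan, so skip entries that can no longer be read.
    cmd=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
    case $cmd in
      *"$pattern"*) echo "${dir#/proc/}" ;;
    esac
  done
}
```

Run it as find_pids <process-name> to get one PID per line. Like ps | grep, it can still match a shell whose own command line happens to contain the pattern.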

Okay, so now we have the PID. Let's look at what the stuck process is doing right now. strace comes to the rescue here, and it showed something like:

```shell
$ strace -p <PID>
recvfrom(11,
```

If you look up the recvfrom system call (for example with man 2 recvfrom), you will find something like:

The recvfrom() and recvmsg() calls are used to receive messages from a socket, and may be used to receive data on a socket whether or not it is connection-oriented.

So now we know the process is trying to receive data and is stuck there. Reading further into that man page, the first argument is the socket file descriptor (sockfd), which was 11 in my case. That lets us find out more about the socket the process is stuck on.
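If strace isn't available, /proc/<PID>/syscall gives a similar one-off snapshot. It shows the number of the system call the process is blocked in, followed by its argument registers in hex. The minimal sketch below prints the call number and its first argument. blocked_syscall is a hypothetical helper name. The sketch assumes your kernel exposes this file, and reading it needs the same permission as attaching a debugger to the process.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report the system call a process is blocked in
# and its first argument, by reading /proc/<PID>/syscall instead of using strace.
# The file holds "NR ARG1 ARG2 ... SP PC" (arguments in hex), "-1 SP PC" when
# the process is blocked outside a system call, or "running".
blocked_syscall() {
  local pid="$1" nr arg1 rest
  read -r nr arg1 rest < "/proc/$pid/syscall" || return 1
  case $nr in
    running) echo "PID $pid is running, not blocked" ;;
    -1) echo "PID $pid is blocked, but not inside a system call" ;;
    # $((arg1)) converts the hex register value (e.g. 0xb) to decimal (11).
    *) echo "syscall $nr, first argument $((arg1))" ;;
  esac
}
```

On x86-64, recvfrom is system call 45. A process stuck like the one above would therefore be expected to report syscall 45 with first argument 11. Syscall numbers differ between architectures, so check the table for yours.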

To dig further into that socket fd, we use the /proc filesystem:

```shell
$ ls -l /proc/<PID>/fd
lrwx------ 1 sanket sanket 64 Feb  5 23:00 0 -> /dev/pts/19
lrwx------ 1 sanket sanket 64 Feb  5 23:00 1 -> /dev/pts/19
lrwx------ 1 sanket sanket 64 Feb  5 22:59 2 -> /dev/pts/19
...
lrwx------ 1 sanket sanket 64 Feb  5 23:00 11 -> socket:[102286]
```
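When you only care about one descriptor, readlink on that single entry returns the link target directly. Shell parameter expansion can then pull out the inode number. A minimal sketch follows. fd_inode is a hypothetical helper name. It also accepts pipes (which appear as pipe:[inode]), so it is easy to try out on any process.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the inode behind one fd of a process,
# e.g. turn /proc/<PID>/fd/11 -> "socket:[102286]" into "102286".
fd_inode() {
  local pid="$1" fd="$2" link
  link=$(readlink "/proc/$pid/fd/$fd") || return 1
  case $link in
    # Sockets and pipes both appear as "type:[inode]".
    socket:\[*\]|pipe:\[*\])
      link=${link#*\[}   # strip everything up to and including "["
      echo "${link%\]}"  # strip the trailing "]"
      ;;
    *)
      echo "fd $fd of PID $pid is not a socket or pipe: $link" >&2
      return 1
      ;;
  esac
}
```

For the listing above, fd_inode <PID> 11 would print 102286.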

Note that FD 11 is a socket fd, and note its inode number (102286). Now let's dig further into that socket. lsof can help us here.

This will finally tell us what the socket is connected to. It could be your database server, for example. So there: now you know it's your database you have to fix.
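With lsof, one way to restrict the output to fd 11 of the process is lsof -a -n -P -p <PID> -d 11. Here -a ANDs the -p and -d selections, and -n and -P skip host and port name lookups. The kernel also lists TCP sockets, keyed by inode, in /proc/<PID>/net/tcp, so the lookup can be done without lsof. The sketch below does that. decode_addr and socket_peer are hypothetical helper names. The sketch assumes an IPv4 TCP socket on a little-endian CPU such as x86-64.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): find the TCP endpoints of a socket inode by
# reading /proc/<PID>/net/tcp. Assumes IPv4 and a little-endian CPU (x86-64).

# Decode a /proc/net/tcp address such as "0100007F:1F90" to "127.0.0.1:8080".
decode_addr() {
  local hex="${1%:*}" port="${1#*:}"
  # The address is printed as a host-order 32-bit word, so on little-endian
  # CPUs its bytes appear reversed; the port is already in host order.
  set -- $(echo "$hex" | sed 's/\(..\)\(..\)\(..\)\(..\)/\4 \3 \2 \1/')
  printf '%d.%d.%d.%d:%d\n' "0x$1" "0x$2" "0x$3" "0x$4" "0x$port"
}

# Print "local -> remote" for the socket with inode $1, as seen in the network
# namespace of PID $2 (default: the calling process).
socket_peer() {
  local inode="$1" table="/proc/${2:-self}/net/tcp"
  # Field 2 is local_address, field 3 rem_address, field 10 the inode.
  set -- $(awk -v ino="$inode" '$10 == ino { print $2, $3; exit }' "$table" 2>/dev/null)
  if [ "$#" -ne 2 ]; then
    echo "inode $inode not found in $table" >&2
    return 1
  fi
  echo "$(decode_addr "$1") -> $(decode_addr "$2")"
}
```

For the socket above, that would be socket_peer 102286 <PID>. IPv6 sockets are listed in /proc/<PID>/net/tcp6 instead, with longer addresses that this sketch does not decode.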

Doing Something with the Stuck Process:

I went a step further and tried to unfreeze the process, getting it working again without restarting it. This is where a debugger comes into the picture: fire up gdb, attach to the process, and force it to give up on that fd, i.e. call close() on the stuck fd.
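A minimal sketch of that step follows. It assumes gdb is installed and that you are allowed to attach to the process (same user, subject to the Yama ptrace_scope setting, or root). It runs gdb non-interactively: -batch makes gdb exit after the -ex commands. The (int) cast is there because recent gdb versions will not call a function without debug information, such as libc's close(), unless you state its return type. force_close_fd and is_number are hypothetical helper names.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): make a stuck process close one of its own file
# descriptors by attaching gdb and calling close() inside the process.
# Assumes gdb is installed and you are permitted to ptrace the target.

# Succeed only for a non-empty string of digits.
is_number() {
  case $1 in ''|*[!0-9]*) return 1 ;; *) return 0 ;; esac
}

force_close_fd() {
  local pid="$1" fd="$2"
  # Refuse anything that is not a plain number before touching a live process.
  if ! is_number "$pid" || ! is_number "$fd"; then
    echo "usage: force_close_fd PID FD" >&2
    return 2
  fi
  # -batch quits after the -ex commands. The (int) cast is needed because
  # libc's close() usually lacks debug info, so gdb does not know its type.
  gdb -p "$pid" -batch -ex "call (int) close($fd)" -ex detach
}
```

For the case above, that would be force_close_fd <PID> 11. With the descriptor gone, the pending recvfrom() should fail (for example with EBADF) instead of waiting forever. Whether the process then recovers depends on its own error handling.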