The session trying to issue the kill will hang for 60 seconds and then return this “session marked for kill” message. And the target session does not get killed at all.

So why is that?

The issue is in what this alter system kill command is doing. It’s not actually killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target sessions state object, which marks that the target session should end. But its entirely up the target session to check this bit and act on it!

So, intead of ALTER SYSTEM KILL SESSION, the command should look something like ALTER SYSTEM ASK SESSION TO COMMIT SUICIDE.

All the kill session command is doing is ASK the target session to clean up and exit – via setting that bit.

Now, normally the target sessions are nice and check that bit often enough in their code, act on it and die.

But sometimes when the target session happens to be busy looping in some tight loop (due a bug perhaps) or is hung, then it never gets to check that “please die” bit and never exits.

This is why DBAs often need to kill the OS process or thread via OS tools to get rid of that session (and its locks, transactions) as when you kill the OS process, PMON will detect it (if not fast enough then it can be woken up via ORADEBUG WAKEUP call few times) and clean up after that session.

So, the “ORA-00031: session marked for kill” message you see after 60 seconds just means that:

So, the killer waits for inactive session wait for 60 seconds and then times out. If the target session to be killed ever checks that “please die” bit, it will clean up and exit, otherwise it could be stuck forever!

This is the case when you need to log on to the server and kill the target process from OS level. If you can’t log on to target OS for whatever reason (but still have SYSDBA access) then you could try attaching to target process with oradebug and running ORADEBUG EVENT IMMEDIATE CRASH as I’ve explained here.

My preferred approach for killing sessions usually is:

1) of course – verify whether you’re killing the right session first

2) ALTER SYSTEM KILL SESSION

3) If that doesn’t work immediately then check whether the target session has acknowledged the kill and is rolling back its (large) transaction. You can do this by checking V$TRANSACTION.USED_UREC for that session’s transaction (if it has any) and see if its decreasing. If yes, the transaction is rolling back and we just need to wait it to finish. There’s also a bit in V$TRANSACTION which states whether the transaction is rolling back, but I don’t have my notes available right now (I’m on vacation actually :)

4) If there’s no rollback happening and session just seems to be stuck, then its time to kill that session’s process from OS level.

5) If couple of minutes after killing the process from OS level that sessions and its locks & resources are still not released (remember, we have verified that there was no long transaction rollback happening) then I would attach to my own process with oradebug and run “ORADEBUG WAKEUP 2″ couple of times (and checking if the session has gone + waiting few seconds between each invocation). The “2” here means Oracle PID of PMON process which is usually 2, but you should check it from your V$PROCESS view.

If the session is still not gone, it’s time to open a support request.

Note that I didn’t post all the little details here – for example, before killing a problem process which is stuck it makes sense to gather some diagnostic data first (like run pstack on the problem process few times to get its current stack trace).

16 Responses to Alter system kill session and ORA-00031: session marked for kill

That’s probably one of the main reasons, why in 11g Oracle has introduced ALTER SYSTEM DISCONNECT SESSION … IMMEDIATE command. According to documentation: The IMMEDIATE setting disconnects the session and recovers the entire session state immediately, without waiting for ongoing transactions to complete.
And it works quite nice in real world, we don’t have to login to DB host to kill sessions anymore.

Tanel:
Very informative article. Is 152 the session from which you issued the kill? When you say ” Your session waits the target session to die for 60 seconds and times out after it doesn’t happen”. So it waits for 60 seconds for the bad session to get killed otherwise it just sets the bit and we hope that that the bit is acknownledged by the bad session? How is the limit of 60 seconds determined?

Correct me if I’m wrong, but under 9i it seems that after you killed an Oracle session and it gets “marked for kill”, the process goes away but not the session. So at that point you can’t get the SPID, as it’s not found in V$PROCESS.

I believe this ISN’T the case in 10g. I need to fit 9i on my laptop and test …

I will write a separate article about the disconnect session and kill session immediate. In my previous experience the immediate options haven’t worked reliably. I tested on 11.1.0.7 on Linux and still it succeeds only in some cases.

Regarding why I would not want to kill -9 a session when its already rolling back a long transaction – the reason is that I would prefer the session to clean up after itself, instead of disturbing it and letting smon to take over.

There can be value in killing the session and letting smon to take over though, if you’re on EE, have lots of CPUs and have fast_start_parallel_rollback enabled, this can make smon spawn multiple slaves for recovery. However it doesn’t always necessarily speed up the rollback and I’ve had bad experiences with completely hung parallel rollbacks in 9.2. I’m sure this issue is fixed by now but I’m just being careful – if a session is already rolling back after itself and if the estimated rollback time is acceptable, then don’t touch it!

1. create the session by logging into the database as user x and execute exec dbms_lock.sleep(100)
2. create another session and select sid,serial#,event from v$session where username=’userx';
3. on the same session, ‘alter system kill session ‘sid,serial#';
BIG NOTE: Wait for 60 seconds and don’t be impatient. It’ll be killed after sometime.

@ronald
Ronald, you’ll probably find that the session got finally “killed” after 100 seconds, the time you requested it to sleep. But if the sleep is longer, or the session is hopelessly stuck in some loop (due a bug) then the session won’t be killed until the sleep/operation completes.

As you promised “I will write a separate article about the disconnect session and kill session immediate.”, i am just requesting you to, if you please get time to write it or if you have posted it, kindly share the link, so that I may know what is mystery behind simple kill and disconnect session.