9 Answers

Generally, you should use kill -15 before kill -9 to give the target process a chance to clean up after itself. (Processes can't catch or ignore SIGKILL, but they can and often do catch SIGTERM.) If you don't give the process a chance to finish what it's doing and clean up, it may leave corrupted files (or other state) around that it won't be able to understand once restarted.
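The escalation described above can be sketched as a small script; this is a minimal sketch, and the 5-second grace period and the `sleep` stand-in are arbitrary choices:

```shell
# Demo: graceful-then-forceful kill of a background process.
sleep 60 &                          # stand-in for a process you want gone
pid=$!
kill -15 "$pid"                     # polite request first: SIGTERM can be caught
i=0
while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 5 ]; do
    sleep 1                         # grace period: up to 5 seconds
    i=$((i + 1))
done
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"                  # last resort: SIGKILL cannot be caught
fi
wait "$pid" 2>/dev/null || true     # reap the child
echo "done"
```

`kill -0` sends no signal at all; it only checks whether the process still exists, which makes it a convenient probe between the polite and the forceful step.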

strace/truss, ltrace and gdb are generally good ideas for looking at why a stuck process is stuck. (truss -u on Solaris is particularly helpful; I find ltrace too often presents arguments to library calls in an unusable format.) Solaris also has useful /proc-based tools, some of which have been ported to Linux. (pstack is often helpful).
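On Linux, a first look at a stuck process needs no debugger at all; /proc exposes the kernel wait channel and the open file descriptors directly. A sketch, using a `sleep` as a hypothetical stand-in for the stuck process:

```shell
# First look at a "stuck" process using only /proc (Linux), no debugger needed.
sleep 60 &                          # hypothetical stand-in for the stuck process
pid=$!
sleep 1                             # let it settle into its wait
wchan=$(cat "/proc/$pid/wchan")     # kernel function it is sleeping in
nfd=$(ls "/proc/$pid/fd" | wc -l)   # how many files/sockets it has open
echo "waiting in: $wchan, open fds: $nfd"
kill "$pid"
```

If `wchan` names a filesystem or network function and the fd listing points at an NFS mount or a dead socket, you already have a good hypothesis before reaching for strace or gdb.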

The compelling reason is that if you get into the habit of sending SIGKILL, then when you get to a program which will, for example, corrupt an important database for you or your company, you'll really regret it. kill -9 has its use as a last-resort terminator, emphasis on last resort; admins who use it before the last resort a) do not understand being an admin too well, and b) shouldn't be on a production system.
– Arcege Mar 9 '11 at 12:39


@Mikel Another thing to throw in: sometimes it's best to trick an app into cleaning itself up with a signal like SIGQUIT or SIGSEGV if it won't respond to SIGINT/SIGTERM. For example, a full-screen 3-D app or even Xorg. With SIGQUIT it won't have a chance to clean up anything, but tricking it into thinking a segmentation fault happened makes it feel it has no choice but to clean up and exit.
– penguin359 Apr 3 '11 at 11:10


@Arcege Do you think that a database that corrupts data if killed with -9 is a database worth using at all? iirc, mysql, bdb, pg, etc... all behave well when killed with -9.
– dhruvbird Jan 28 '14 at 6:52

@dhruvbird: just because your DBs are supposed to come equipped with bullet-proof vests doesn't mean you should shoot them if you don't need to. While you may be right that it's not as risky as Arcege seems to say, I think his point still stands that it's risky and should be a last resort.
– iconoclast Jan 30 '14 at 15:34

Won't the operating system close any open file descriptors (including sockets) when the process terminates?
– Brian Gordon Jan 28 '14 at 5:10


Yes it will. But suppose you are killing a server process with clients connected; then the clients won't notice that the server is gone until they time out.
– Björn Lindqvist Jan 28 '14 at 8:48


Ah yes, the old "if it is in any way imperfect you are stupid to use it" argument.
– Timmmm Jan 28 '14 at 19:17


Or stupid to use it if the process in question is your company's production
– Warren P Jan 29 '14 at 3:24

@Timmmm: I had the same impression as you, but I read the post again and nowhere does it say or suggest "you are stupid". May it be that we're simply misinterpreting a strong statement about an action as saying something about the doer?
– ShreevatsaR Jan 26 at 22:58

It should always be OK to do kill -9, just like it should always be OK to shutdown by pulling the power cable. It may be anti-social, and leave some recovery to do, but it ought to work, and is a power tool for the impatient.

I say this as someone who will try plain kill (15) first, because it does give a program a chance to do some cleanup -- perhaps just writing to a log "exiting on sig 15". But I won't accept any complaint about ill-behaviour on a kill -9.

The reason: plenty of customers do it to things programmers would prefer they didn't. Random kill -9 testing is a good and fair test scenario, and if your system doesn't handle it, your system is broken.
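A minimal version of such a test, sketched in shell; the record-appending loop is a hypothetical stand-in for a real service, and "recovery" here is simply ignoring a possibly torn final line:

```shell
# Crash test: SIGKILL a writer mid-flight, then check it can "recover".
# The writer below is a hypothetical stand-in for your real service.
( i=0; while :; do i=$((i + 1)); echo "record $i"; done >> journal.txt ) &
pid=$!
sleep 1                              # let it write for a while
kill -9 "$pid"                       # simulated crash: no cleanup allowed
wait "$pid" 2>/dev/null || true

# "Recovery" on restart: the last line may be torn; complete lines must survive.
complete=$(grep -c '^record ' journal.txt)
echo "recovered $complete complete records"
```

A real service would do this with its own journal or write-ahead log, but the shape of the test is the same: crash it at a random moment, restart, and verify that everything it claimed to have committed is still there.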

How do you test for "random kill -9"? When you get kill -9, you are done and finished.
– Karel Bílek Jan 28 '14 at 7:28


@Karel: You test whether your system can recover afterwards, and clean up any mangled transactions that were being processed at the time of SIGKILL.
– Tadeusz A. Kadłubowski Jan 28 '14 at 8:09


It is not OK to do a kill -9, just like it is not OK to pull the power cable. While of course there are situations where you have no choice, this should be a last-resort action. Of course, pulling the power cable or kill -9 shouldn't have adverse effects like preventing the application or the OS from restarting properly, if at all, but shit happens, and using the recommended ways (kill [-15]) or a regular shutdown will help avoid the mess that might occur if you routinely interrupt programs and OSes that way. In any case, there is always a risk of losing data regardless of the code's robustness.
– jlliagre Jan 28 '14 at 12:51


I suspect what Michael meant by 'OK' is that your program should deal with this situation gracefully, and be able to do some form of cleanup on restart. For instance, cleaning up PID files and so forth, rather than just throwing its toys out of the pram and refusing to start.
– gerryk Jan 28 '14 at 22:58

@gerryk They should indeed but the issue is some people will take that answer as a "license to kill -9" whatever the situation and the environment. It is an irresponsible attitude.
– jlliagre Jan 29 '14 at 7:24

I use kill -9 in much the same way that I throw kitchen implements in the dishwasher: if a kitchen implement is ruined by the dishwasher then I don't want it.

The same goes for most programs (even databases): if I can't kill them without things going haywire, I don't really want to use them. (And if you happen to use one of these non-databases that encourage you to pretend they have persisted data when they haven't: well, I guess it is time you started thinking about what you are doing.)

Because in the real world stuff can go down at any time for any reason.

People should write software that is tolerant to crashes. In particular on servers. You should learn how to design software that assumes that things will break, crash etc.

The same goes for desktop software. When I want to shut down my browser it usually takes AGES to shut down. There is nothing my browser needs to do that should take more than at most a couple of seconds. When I ask it to shut down it should manage to do that immediately. When it doesn't, well, then we pull out kill -9 and make it.

I agree that a process should be written to be tolerant of such a failure, but I think it is still bad practice to do this. A database will recover, but it might detect the rude abort and then trigger significant recovery checking when restarted. And what about the requests a process is serving? They will all be severed instantly, and the clients might have bugs and fail too.
– Daniel James Bryars May 24 '14 at 9:40


A database that can't be killed at any time isn't a properly reliable database. This is a pretty basic requirement if you require consistency. As for the clients: if they go haywire and corrupt data when the connection is severed, they are badly designed as well. The way to address loss of service is through redundancy and automatic failover/retry strategies. Usually for most of the system failing fast is preferable to trying to recover.
– borud Sep 19 '14 at 16:36


@borud It may not be perfectly written software, but it's software people use all the time. What system administrators have the luxury of always being able to choose software that's perfectly written, down to always recovering gracefully from sudden disruption? Not many. Personally I use shutdown scripts, and start/stop processes via this. If they don't respond to the shutdown script (which does a proper signaling to the process), I kill -9.
– Steve Sether Dec 29 '14 at 20:59

Killing processes willy-nilly is not a smooth move: data can be lost, and poorly-designed apps can break themselves in subtle ways that cannot be fixed without a reinstall. But it completely depends on knowing what is and is not safe in a given situation, and what would be at risk. The user should have some idea of what a process is, or should be, doing, what its constraints are (disk IOPS, rss/swap), and be able to estimate how much time a long-running process should take (say a file copy, mp3 re-encoding, email migration, backup, [your favorite timesink here]).

Furthermore, sending SIGKILL to a pid is no guarantee of killing it. If it's stuck in a syscall or already zombied (Z in ps), it may continue to be a zombie. This is often the case when you ^Z a long-running process and forget to bg it before trying to kill -9 it. A simple fg will reconnect stdin/stdout and probably unblock the process, usually followed by the process terminating. If it's stuck elsewhere or in some other form of kernel deadlock, only a reboot may be able to remove the process. (Zombie processes are already dead once SIGKILL has been processed by the kernel (no further userland code will run); there's usually a kernel reason, similar to being blocked waiting on a syscall to finish, for the process not terminating.)
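It is worth checking what state a process is actually in before deciding how to kill it; ps shows this in the STAT column (T = stopped, D = uninterruptible sleep, Z = zombie). A quick illustration with a deliberately stopped process, using SIGSTOP/SIGCONT as a scriptable stand-in for ^Z and fg:

```shell
# Check what state a process is in before deciding how to kill it.
sleep 60 &
pid=$!
kill -STOP "$pid"                   # put it in the same state as a ^Z'd job
sleep 1                             # give the stop a moment to take effect
state=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "state while stopped: $state"  # STAT starting with T = stopped

kill -CONT "$pid"                   # the fg/bg equivalent: let it run again
kill -TERM "$pid"                   # an ordinary TERM is now acted upon
```

For a process showing D or Z, the same ps check tells you that signals (including SIGKILL for the D case) may simply not take effect until whatever the kernel is waiting on completes.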

Also, if you want to kill a process and all of its children, get into the habit of calling kill with the negated PID, not just the PID itself. There's no guarantee of SIGHUP, SIGPIPE or SIGINT or other signals cleaning up after it, and having a bunch of disowned processes to clean up (remember mongrel?) is annoying.
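A sketch of the negated-PID form, assuming the Linux setsid(1) utility is available to start the tree as its own process-group leader (the two sleeps are hypothetical stand-ins for a parent and its child):

```shell
# Kill a whole process tree by signalling its process group (negative PID).
# setsid starts the parent as a new group leader; children inherit the group.
setsid sh -c 'sleep 60 & exec sleep 60' &
pgid=$!                              # with setsid, pid == pgid of the new group
sleep 1                              # give it time to spawn the child
kill -TERM -- "-$pgid"               # negative PID = every process in the group
```

The `--` matters: without it, kill would parse `-$pgid` as a signal name rather than a process-group target.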

Bonus evil: kill -9 -1 is slightly more damaging than kill -9 1 (Don't do either as root unless you want to see what happens on a throw-away, non-important VM)

Never never do a kill -9 1. Also avoid doing a kill on certain processes like `mount`. When I have to kill a lot of processes (say, for example, when an X session gets hung and I have to kill all the processes of a certain user), I reverse the order of the processes. For example:

The downvote was someone else. But which resources are not released? Do you just mean the process can't perform its normal cleanup? What about file locks, semaphores, etc.? Can you elaborate?
– Mikel Mar 9 '11 at 11:39

This answer is part confusing and part wrong. kill -9 1 is just ignored under most unices. There's no need to avoid kill -9 for mount, but no point in it either. I don't know what you mean by “reverse the order of the processes”. kill -9 does stop (as in, kill) a process, without giving it a chance to complain, however the killing won't happen immediately if the process is in a non-interruptible system call. Killing a process with kill -9 does release most resources, but not all.
– Gilles Mar 9 '11 at 21:01

You do not want to kill -9 a process that is doing any form of writes. For example, a heavily used mysql server should have its cache flushed first. You can see which process is writing with iotop, top, atop, or fatrace.
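On Linux, even without iotop or fatrace installed, you can get a rough idea of a process's write activity from /proc; the counters are cumulative bytes, so sampling twice shows whether it is still writing. A Linux-specific sketch, using the current shell as a safe example pid:

```shell
# Rough look at a process's write activity via /proc (Linux-specific),
# when iotop/fatrace are not installed. Counters are cumulative bytes.
pid=$$                               # our own shell, as a safe example
grep -E '^(wchar|write_bytes)' "/proc/$pid/io"
w1=$(awk '/^write_bytes/ {print $2}' "/proc/$pid/io")
echo "write_bytes so far: $w1"
# Sample again a second later: if write_bytes is still climbing,
# the process is mid-write and a poor candidate for kill -9.
```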