I have been asked this question in two consecutive interviews, but after some research and checking with various systems administrators I haven't received a good answer. I am wondering if somebody can help me out here.

A server is out of disk space. You notice a very large log file and determine it is safe to remove. You delete the file but the disk still shows that it is full. What would cause this and how would you remedy it? And how would you find which process is writing this huge log file?

5 Answers

This is a common interview question and a situation that comes up in a variety of production environments.

The file's directory entries have been deleted, but the logging process is still running. The space won't be reclaimed by the operating system until all file handles have been closed (e.g., the process has been killed) and all directory entries removed. To find the process writing to the file, you'll need to use the lsof command.
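To illustrate (assuming lsof is installed; the /var mount point is just an example):

```shell
# List open files that have been unlinked (on-disk link count 0) but
# still occupy space; the COMMAND and PID columns identify the process
lsof +L1

# Narrow the search to a particular filesystem, e.g. the full one at /var
lsof +L1 /var
```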

The other part of the question can sometimes be "how do you clear a file that's being written to without killing the process?" Ideally, you'd "zero" or "truncate" the log file with something like : > /var/log/logfile instead of deleting the file.
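As a quick sketch (the path is the one from the answer above; truncate is part of GNU coreutils):

```shell
# Zero the file in place; the writing process keeps its file descriptor,
# so nothing needs to be restarted
: > /var/log/logfile

# Equivalent, using coreutils:
truncate -s 0 /var/log/logfile
```

One caveat: if the writer did not open the file in append mode, its next write lands at its old offset, leaving a sparse file. The apparent size jumps back up, but actual disk usage stays low.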

I ask a variant of this question on every interview: "You're getting disk full messages. df says you're out of space, du says you're barely using any. What's causing it, and why don't the two tools agree?"
– voretaq7♦ Mar 5 '12 at 17:16

What should you do if, after > /var/log/file, disk usage still shows 100%? The log file appears to be empty, but the space is only recovered after restarting the program that writes to it. Is there a way to recover the disk space without restarting the program?
– alemani Mar 9 '12 at 17:42

There's still another link to the file (either hard link or open file handle). Deleting a file only deletes the directory entry; the file data and inode hang around until the last reference to it has been removed.

It's somewhat common practice for a service to create a temporary file and immediately delete it while keeping the file open. This creates a file on disk, but guarantees that the file will be deleted if the process terminates abnormally, and also keeps other processes from accidentally stomping on the file. MySQL does this, for example, for all its on-disk temporary tables. Malware often uses similar tactics to hide its files.
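The create-then-unlink pattern can be sketched in bash (on Linux):

```shell
# Create a temporary file, keep it open, and immediately unlink it
tmp=$(mktemp)                # create a temporary file
exec 3>"$tmp"                # open file descriptor 3 for writing
rm "$tmp"                    # unlink it; the inode survives while fd 3 is open
echo "still writable" >&3    # writes keep going to the now-anonymous inode
ls -l /proc/$$/fd/3          # on Linux, the symlink target shows '(deleted)'
exec 3>&-                    # close fd 3; only now is the space reclaimed
```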

Under Linux, you can conveniently access these deleted files as /proc/<pid>/fd/<filenumber>.
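For example, supposing lsof showed PID 1234 holding the deleted log on file descriptor 4 (both numbers are illustrative):

```shell
# Deleted-but-open files show up as '... (deleted)' symlinks here
ls -l /proc/1234/fd

# Copy the data back out before it is lost for good
cat /proc/1234/fd/4 > /tmp/recovered.log

# Or truncate the deleted file in place to free the space immediately
: > /proc/1234/fd/4
```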

I'm not a sysadmin, but from what I've gathered on Unix.SE, a Linux system won't actually delete a file (mark its space as free and reusable) after it is unlinked until all file descriptors pointing to it have been closed. So to answer the first part, the space isn't free yet because a process still has the file open. To answer the second, you can see which process is using the file with lsof.

If the process writing the file runs as root, it can write into the reserved superuser space. The filesystem keeps this reservation (typically 5% by default on ext filesystems) so the system stays operational when a user task fills up the disk. This space is invisible to many tools.
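On ext2/3/4 you can inspect and adjust this reservation with tune2fs (the device name /dev/sda1 is illustrative, and both commands require root):

```shell
# Show the reserved block count and the user it is reserved for
tune2fs -l /dev/sda1 | grep -i 'reserved block'

# Shrink the reservation to 1% if the default 5% is too generous
tune2fs -m 1 /dev/sda1
```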

lsof can show you which process has the file open and is therefore writing to it.

One alternative answer besides the obvious hard link/open file answer: that file is a (very) sparse file such as /var/log/lastlog on RHEL that wasn't actually taking up all that much space. Deleting it had very little impact, so you need to look at the next biggest file.
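The apparent-vs-actual size gap is easy to demonstrate (the file name is illustrative):

```shell
# Create an apparently huge but sparse file: 1 GB of holes, ~0 blocks on disk
truncate -s 1G /tmp/sparse.demo
ls -lh /tmp/sparse.demo      # apparent size: 1.0G
du -h /tmp/sparse.demo       # actual disk usage: near zero
rm /tmp/sparse.demo
```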