Archive for November, 2008

Finally, it happened: the infamous nspluginwrapper is no longer needed to use the proprietary Flash plugin on amd64. Get the plugin from Adobe themselves. It’s supposedly alpha quality, but it really can’t be worse than nswrapper.bin either eating your CPU or crashing, or showing a grey area instead of Flash content.

Vboxmount is a tool using the VirtualBox APIs to expose virtual disks (I’m not sure whether it is limited to the VirtualBox format or works with any format VirtualBox knows) as block devices under Linux, using the network block device driver (nbd).

I have to say there are two things I don’t like about nbd.

First, it adds needless overhead, since everything has to go through the network stack, even when the whole thing is local.

Second, I have had a bad experience with nbd’s stability, though I must say I only tested it on old stuff. A while ago, VMware ESX 2.5 shipped a tool, named vmware-mount, that would basically do what Vboxmount does, for ESX vmdk files, on the ESX service console (a 2.4.something kernel). The thing is, it would bring the whole server down (kernel panic or deadlock, I can’t remember which) more often than not, which is why the tool has not been shipped since ESX 3.0. That tool was using nbd.

There are IMHO better ways to implement something like this, though the nicest doesn’t exist yet.

dm-userspace allows for something similar, but requires the image file to be backed by a loopback device, and the data in the image cannot be compressed.

Fuse would allow presenting a flattened image as a file, which you could then turn into a block device with the loopback device driver. I happen to have written something like that, except its legal status is unclear.

As for the nicest solution I can see, which, as said above, doesn’t exist yet: it would be some kind of “Buse” (Block device in USEr space), or process-backed loopback device, call it what you want, that would allow a process to answer random reads on a (virtual) block device, much like a Fuse file system process answers random reads on a (virtual) file. This has been discussed several times on several mailing lists, but has not been implemented yet, as far as I know.

Yesterday, at work, we had the typical case where df says there is (almost) no space left on some device, while du doesn’t see as much data as you would expect given that situation. This happens when you delete a file that another process has open (and, obviously, has not yet closed).

In typical UNIX filesystems, files are actually only entries in a directory, pointing (linking) to the real information about the content, the inode.

The inode contains the information about how many such links exist on the filesystem, the link count. When you create a hard link (ln without -s), you create another file entry in some directory, linking to the same inode as the “original” file. You also increase the link count for the inode.

Likewise, when removing a file, the entry in the directory is removed (though most of the time, really only skipped, but that’s another story), and the link count is decreased. When the link count reaches zero, the inode is usually marked as deleted.
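The link count mechanics can be watched directly with stat (a throwaway sketch in a temporary directory; the file names are made up, and the -c option assumes GNU stat):

```shell
# Sketch: watch the link count as hard links come and go (GNU stat, Linux).
cd "$(mktemp -d)"
echo data > original
ln original hardlink                # a second directory entry, same inode
stat -c '%i %h' original hardlink   # same inode number, link count 2 for both
rm hardlink                         # one entry removed, link count decreased
stat -c '%h' original               # back to 1
```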

Except when the usage count is not zero.

When a process opens a file, the kernel keeps a usage count for the corresponding inode in memory. When some process is reading from a file, it doesn’t really expect it to disappear suddenly. So, as long as the usage count is non-zero, even when the link count in the inode is zero, the content is kept on the disk and still takes up space on the filesystem.

On the other hand, since there is no entry left in any directory linking to the inode, the size of this content can’t be added to du's total.
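On Linux, you can watch both counters at work from /proc (a quick sketch; fd 3 is simply the descriptor this shell opens):

```shell
# Sketch: an unlinked file survives while a process holds it open (Linux).
tmp=$(mktemp)
exec 3<"$tmp"              # keep a file descriptor open on the file
rm "$tmp"                  # link count drops to 0, usage count is still 1
readlink /proc/$$/fd/3     # the symlink target is flagged "(deleted)"
exec 3<&-                  # closing the last descriptor finally frees the space
```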

Back to our problem: it originated with someone who had to free some space on a 1GB filesystem, and thought a good idea would be to delete that 860MB log file nobody cares about. Except that deleting it didn’t really free the space, and he didn’t really check.

Later, the “filesystem full” problem came back to someone else, who came to ask me which files from a small list he could remove. But those files were pretty small, and removing them wouldn’t have freed enough space. That gave me the feeling that we were probably in the typical case I introduced this post with, which du -sk confirmed: 970MB used on the filesystem according to df, but only 110MB worth of data visible to du…

Just in case you need to find the pid of the process that still has the deleted file open, or, even better, get access to the file itself, you can use the following command:

find -L /proc/*/fd -type f -links 0

(this works on Linux; remove -L on recent Solaris; on other OSes, you can find the pid with lsof)

Each path this command returns can be opened and its content read with a program such as cat, which gives access to the deleted content.
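Here is the whole scenario condensed into a runnable sketch for Linux (fd 3 and the file content are this example’s; in the real case, the pid and fd come out of the find command above):

```shell
# Sketch: delete an open file, locate it through /proc, recover its content.
cd "$(mktemp -d)"
tmp=$(mktemp)
echo "precious log data" > "$tmp"
exec 3<"$tmp"                          # some process still holds the file open
rm "$tmp"                              # no directory entry is left
find -L /proc/$$/fd -type f -links 0   # lists /proc/<pid>/fd/3
cat /proc/$$/fd/3 > recovered          # the deleted content is still readable
exec 3<&-
```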

I already addressed how to re-link such a file, which somewhat works under Linux, but in my case, all that mattered this time was to really remove the file. But we didn’t know whether it was safe to stop the process still holding the file, nor how to properly restart it. We were left without a definitive resolution, but still needed to come up with something before the filesystem got really full, while waiting to be able to deal with the root of the problem.

The first crazy idea I had was to attach a debugger to the process, and use it to close the file descriptor and open a new file instead (I think you can find some examples with google). But there was no debugger installed.

So, I had this other crazy idea: would truncate() work on these /proc/$pid/fd files?

You know what? It does work. So I bought us some time by running:

perl -e 'truncate("/proc/$pid/fd/$fd", 0);'

(somehow, there is no standard executable to do a truncate(), so I always resort to perl)

Afterwards, I also verified that the same works under Linux (where you couldn’t really have known beforehand what it would do with these files, which are symbolic links to a path that no longer exists).

The following, even simpler, command works too.

> /proc/$pid/fd/$fd

It doesn’t truncate(), but open()s with O_WRONLY | O_CREAT | O_TRUNC, then close()s right after (to simplify), which has the same effect.