Friday, June 19, 2009

After a short investigation I found that the external backup disk had disappeared. It was physically still there, of course, but the symbolic link to its device path was gone.

A while ago I added the following UDEV rule, which creates a symbolic link, /dev/elements, pointing to the device path of the external backup disk (e.g. /dev/sda1) when the disk is first connected to the computer:

This allows me to always address the disk as /dev/elements in scripts and configuration files, instead of using the actual device path. This is useful because the device path can, and does, change depending on the order in which the kernel detects the disk (e.g. it may appear as /dev/sdb1 instead of /dev/sda1).

I could've used the UUID link to the device, /dev/disk/by-uuid/306fc694-5328-4a6d-b6ec-4d1310c2feb8, but my custom symlink is shorter to type, easier to remember and, to my eye, somehow prettier.

Anyhow, my nice link was gone, but the device path /dev/sda1 was still there, as were the various other links to this device in /dev/disk/by-uuid, /dev/disk/by-path, etc.

I was able to mount the disk via the UUID link, so it didn't appear to be hardware related.

I tried re-triggering UDEV events:

udevadm trigger

and was pleased to find that the link re-appeared.

I chalked it up to my incomplete grasp of reality. It's a fluke. A one-time event. Nothing to worry about.

All was well for a few days, and then I had to reboot my box. I was rather upset to find out that the link was missing again. I had no time to investigate this and simply added udevadm trigger to /etc/init.d/bootmisc.sh and restarted the machine. This seemed to work fine.

Close to two months passed. Every time I had to reboot my box I recalled that incident, but I was not inclined to investigate any further; I am trying to curb my pathological dislike of workarounds.

I almost forgot about it, until I tried to mount my encrypted live-HDD.

This time it worked, and I was left with an executable at src/cryptsetup. I launched gdb, stepped through the code, and realized that the link was removed as soon as the first close call completed.

Did I mention how weird this is? All I did was open and close /dev/aluminum-crypto, and that was enough to remove it. This shouldn't happen.

I had to verify that I actually saw what I thought I saw, so I wrote the following little program, named rmlink.c:

This little program successfully removed my custom device symlinks by simply opening and then closing them.

Curiously, I was not able to remove other device symlinks (e.g. the ones under /dev/disk/by-uuid).

What was going on here? Was this a personal thing that UDEV had against me? After all, UDEV is responsible for all those symlinks, both standard and custom, and they are all generated by UDEV rules that look similar to mine (see /lib/udev/rules.d/60-persistent-storage.rules).

I used the following magic to persuade UDEV to verbosely log its actions to /var/log/syslog:

udevadm control --log-priority=debug

After running ./rmlink /dev/gigapod (yet another external USB disk that I use), I saw the following messages appear:

I scrutinized my own UDEV rules, pondered the ones in /lib/udev/rules.d/60-persistent-storage.rules, and noticed two things that I had missed before:

- the standard rules fire on both "add" and "change" events, while mine only handle "add"
- the standard rules enable an option: OPTIONS+="watch"

I couldn't find any documentation for the "watch" option. But one of the commenters in Ubuntu bug #332270 explained that this option causes the kernel to trigger "change" events for inotify events (such as closing a block device...), and it seems to be a recent addition...

The fix was easy enough to figure out now: add the "change" action to my rules, so that the symlinks are re-created whenever a "change" event is triggered:

Friday, June 12, 2009

When was the last time you used the floppy disk drive on your box? Oh, you don't have one? Well, I do have a floppy disk drive on my aging laptop. And I'm quite sure that the only use I had for it was when I first installed Debian on it.

Yup! I actually installed Debian/Etch from floppies! That was my only option at the time, because the optical drive on this box is busted, and it doesn't boot from USB. As I said, it's an old box. I've since upgraded to Debian/Lenny and recently to Squeeze, and it seems that installing from floppies is no longer supported. It's kinda sad.

This was something new. I assumed that the problem was with VirtualBox. And since at first I had read only the second line of the error message, I found myself going through the VirtualBox documentation, rummaging through its various directories and files, and methodically working through all of its configuration options and menus. No luck.

Finally, after more than an hour of futzing around, I went back to the error message and realized my original sin. I then searched the Net for the error message, and found a short thread on linuxquestions.org suggesting that the floppy disk drive be disabled in the virtual machine's settings.

I tried it and it did the trick - I managed to launch my virtual Window$ PC and finish my original business.

But it continued to bother me. The floppy drive exists, so maybe it's just a permissions issue? I tried

ls -l /dev/fd0

and was surprised to find that /dev/fd0 didn't actually exist. WTF?

Time for some more Net searching. I found a thread on the Debian Forums site and another on linuxquestions.org that seemed rather relevant. Both suggested that the floppy module (device driver) was not loaded, that this was intentional (since floppy disk drives are pretty rare), and that manually loading the floppy module would fix the problem.

I tried

lsmod | grep floppy

as root, and got nothing. Indeed, the floppy module wasn't loaded. I manually loaded the module

modprobe floppy

(again as root) and verified that /dev/fd0 appeared. I then re-enabled the physical floppy drive in VirtualBox, and was able to start the Window$ virtual PC.

Finally, I added the module to /etc/modules in order to ensure that it'll be loaded on the next reboot:

echo floppy >> /etc/modules

Still, I wasn't satisfied. The floppy disk drive had never given me trouble before, so this had to be a recent regression. I hate it when things break down like this. I hate it even more when all I can do about it is a workaround and not a solution. But I'm making progress, and am willing to accept workarounds. Sometimes.

In any case, I finally managed to find the root cause of the floppy failure. It's described in Debian bug #521520.

I'm now curious to know if the floppy disk drive works at all, but I don't have any floppy disk to test it with...

Friday, June 5, 2009

More often than not I bring a camera (a Canon PowerShot A620 that I purchased a few years ago) to family events and take pictures.

To be honest, my camera is better than I am at taking pictures. I usually set the camera on AUTO, aim, half-press the button until I hear a beep and see the green rectangles around the people or stuff that I'm trying to capture, and then shoot. People around me are usually aware that I'm taking their pictures (I guess it's because I ask them to smile and stand still). I'm pretty happy with the results, most of the time.

My wife's dad has a similar approach to family events but, unlike me, takes photography seriously. He took lessons, purchased a professional-looking camera (a Canon Digital Rebel XTi), and is always futzing around with its settings. He tries to blend in and conceal the fact that he's taking pictures, in an attempt to capture people in their natural state. He's actually quite good at it.

The other difference between us is that I usually post my pictures on our self-hosted Gallery2 website. I often pester him to send me a CD with his photos so that I can post them too. I take care of quantity, and he takes care of quality.

And here's where I run into a problem. I like to arrange the photos in chronological order on the website. That's easy with photos from the same camera, because each picture's file name contains a serial number, so sorting by name is equivalent to sorting chronologically. But the numbering of photos from the other camera is, naturally, not synchronized with my own.

The first time I hit this, I tried to order the photos in Gallery2 by the timestamp embedded in each photo's EXIF metadata. But I found out that the internal clocks of the cameras were not synchronized (one of them was still an hour off, on daylight saving time). So I fixed this with exiv2 by running:

exiv2 -v -a 1:06 adjust my_pics_*.jpg

(Yeah, a bit more than an hour. The offset was determined by visually matching photos from the two sessions and subtracting the timestamps of corresponding photos.)

But this didn't get the photos ordered in Gallery2. I then tried setting each file's timestamp to match its EXIF timestamp:

exiv2 -v -T rename *.jpg

but Gallery2 either sorts by name only or rounds the timestamp to the nearest minute. Whatever the reason, I still couldn't get the photos ordered to my liking. I finally renamed all the files so that each file's name matches its timestamp: