This happens when ceph-disk activate mounts /var/lib/ceph/osd/ceph-0 while ceph-disk mounts /var/lib/ceph/tmp/mnt.y9HL5a to get information about the data partition. If ceph-disk is interrupted or fails for any reason, it does not unmount the temporary mount point and both stay mounted.
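A quick way to spot the leftover double mount is to list everything still holding the data partition; a minimal sketch, assuming the device from the udev trace below and the mount points above:

    # List every mount of the OSD data partition (device name taken from
    # the udev trace in this report; adjust for your setup).
    findmnt --source /dev/vdb1
    # Both mount points will show up, e.g.:
    #   /var/lib/ceph/osd/ceph-0   and   /var/lib/ceph/tmp/mnt.y9HL5a
    # The stale temporary mount can then be released by hand:
    umount /var/lib/ceph/tmp/mnt.y9HL5a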

Associated revisions

When activating a device, ceph-disk trigger restarts the ceph-disk systemd service. Two consecutive udev add on the same device will restart the ceph-disk systemd service and the second one may kill the first one, leaving the device half activated.

The ceph-disk systemd service is instructed to not kill an existing process when restarting. The second run waits (via flock) for the first one to complete before running so that they do not overlap.
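A minimal sketch of what that fix could look like as a systemd drop-in, assuming a ceph-disk@.service unit and a /var/lock/ceph-disk lock file (both names are assumptions here, not necessarily what Ceph ships):

    # KillMode=none stops a restart from killing the activation already
    # in progress; the blocking flock makes the second run wait for the
    # first one instead of overlapping with it.
    mkdir -p /etc/systemd/system/ceph-disk@.service.d
    cat > /etc/systemd/system/ceph-disk@.service.d/override.conf <<'EOF'
    [Service]
    KillMode=none
    ExecStart=
    ExecStart=/bin/sh -c 'flock /var/lock/ceph-disk ceph-disk trigger --sync %f'
    EOF
    systemctl daemon-reload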

The scenario creating duplicate mount points is when the osd is destroyed while activating. This is a border case that only happens during testing and is probably not worth fixing. Instead, the test case must wait for the osd activation to complete.
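For the test case, that wait could be a simple poll; a sketch, assuming ceph-disk's convention of writing an 'active' marker file into the osd directory once activation finishes (the marker name and the timeout are assumptions):

    # Poll until the osd is fully activated before destroying it.
    osd_dir=/var/lib/ceph/osd/ceph-0
    for i in $(seq 1 60); do
        [ -e "$osd_dir/active" ] && break   # activation completed
        sleep 1
    done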

The ceph-disk activate that was run via systemctl is interrupted before it can finish, which leaves the file system mounted and, in some cases, incompletely set up. The following is a mix of the output of udevadm monitor and /var/log/messages. Two consecutive udev add vdb1 events are received and the second one kills the first one.
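Such a trace can be captured by watching both sources at the same time, for example:

    # Show udev events with their properties while following the system
    # log (journalctl -f is the equivalent on journal-only systems).
    udevadm monitor --kernel --udev --property &
    tail -f /var/log/messages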

I think it would be better to move flock outside of the systemd service, into the udev action, with the -n option. This would prevent the systemd script from being interrupted, prevent multiple simultaneous starts, and prevent the service from ending up in a failed state because of a race. It should still "RemainAfterExit": after it has been run, it is complete and should not run again if activated again.
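A rough sketch of that proposal as a udev rule, assuming the rule file name, the lock path, and the partition-type GUID for ceph osd data partitions (all of these are illustrative, not the shipped rule):

    # Take the lock non-blockingly in the udev RUN action itself, so a
    # second add event on the same device gives up instead of restarting
    # the activation that is already running.
    cat > /etc/udev/rules.d/95-ceph-osd.rules <<'EOF'
    ACTION=="add", SUBSYSTEM=="block", \
      ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
      RUN+="/bin/sh -c 'flock -n /var/lock/ceph-disk ceph-disk trigger /dev/$name'"
    EOF
    udevadm control --reload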

Honestly, for a oneshot service that hasn't finished, multiple starts should not result in the service being started multiple times. This seems like a systemd bug.

With the current approach, at least we cover all cases, at the expense of running activate for each and every udev add event. If we flock -n instead, we open the following race condition (sketched in code after the list):

1. udev add -> flock -n takes the lock and runs ceph-disk activate.

2. ceph-disk activate does not have enough information to run because ceph-disk prepare has not finished preparing the device; it starts to shut down but has not stopped yet.

3. ceph-disk prepare finishes preparing the device and fires a udev add to notify the world.

4. udev add -> flock -n sees the lock is held and gives up.

5. ceph-disk activate finishes shutting down.
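The difference between the two locking modes, in shell terms (the lock path is an assumption, the device is the one from the trace above):

    # Blocking flock: the second activation queues behind the first, so
    # the udev add fired by ceph-disk prepare is never lost.
    flock /var/lock/ceph-disk ceph-disk activate /dev/vdb1

    # Non-blocking flock: the second event gives up while the first
    # activation is still shutting down (the lost wakeup of step 4).
    flock -n /var/lock/ceph-disk ceph-disk activate /dev/vdb1 \
        || echo "lock held, giving up"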

There won't be another udev add event after that and the device won't activate. Although it is more expensive, I think running ceph-disk activate on every event addresses all possible concurrency scenarios. This is assuming ceph-disk activate is idempotent.