Re: Making forced unmounts work

On Dec 4, 2012, at 4:44 PM, David Holland <dholland-tech%netbsd.org@localhost>
wrote:
> On Sun, Dec 02, 2012 at 05:29:01PM +0100, J. Hannken-Illjes wrote:
>>>>> Also I wonder if there's any way to accomplish this that doesn't
>>>>> require adding fstrans calls to every operation in every fs.
>>>>
>>>> Not in a clean way. We would need some kind of reference counting for
>>>> vnode operations and that is quite impossible as vnode operations on
>>>> devices or fifos sometimes wait forever and are called from other fs
>>>> like ufsspec_read() for example. How could we protect UFS updating
>>>> access times here?
>>>
>>> I'm not entirely convinced of that. There are basically three
>>> problems: (a) new incoming threads, (b) threads that are already in
>>> the fs and running, and (c) threads that are already in the fs and
>>> that are stuck more or less permanently because something broke.
>>>
>>> Admittedly I don't really understand how fstrans suspending works.
>>> Does it keep track of all the threads that are in the fs, so the (b)
>>> ones can be interrupted somehow, or so we at least can wait until all
>>> of them either leave the fs or enter fstrans somewhere and stall?
>>
>> Fstrans is a recursive rw-lock. Vnode operations take the reader lock
>> on entry. To suspend a fs we take the writer lock and therefore
>> it will catch both (a) and (b). New threads will block on first entry,
>> threads already inside the fs will continue until they leave the fs
>> and block on next entry.
>
> Ok, fair enough, but...
>
>> A suspended fs has the guarantee that no other thread will be inside
>> fstrans_suspend / fstrans_done of any vnode operation.
>>
>> Threads stuck permanently as in (c) are impossible to catch.
>
> ...doesn't that mean the (c) threads are going to be holding read
> locks, so the suspend will hang forever?
Yes.
> We have to assume there will be at least one such thread, as people
> don't generally attempt umount -f unless something's wedged.
If forced unmounts have to work with vnode operations that will not
leave a fs the only solution will be to disable forced unmounts and
leave reboot as the last option. How could we safely determine if
a vnode operation just takes some time or is stuck forever? What should
we do with threads that are stuck forever?
>>> If we're going to track that information we should really do it from
>>> vnode_if.c, both to avoid having to modify every fs and to make sure
>>> all fses support it correctly. (We also need to be careful about how
>>> it's done to avoid causing massive lock contention; that's why such
>>> logic doesn't already exist.)
>>
>> Some cases are nearly impossible to track at the vnode_if.c level:
>>
>> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
>> as they run part or all of the operation with vnode interlock held.
>
> Ugh. But this doesn't, for example, preclude atomically incrementing a
> per-cpu counter or setting a flag in the lwp structure.
Sure, but we need a point in time where it is safe to mark a vnode dead
and that means to reclaim its inode or other fs-private state. This is
not just accounting -- we must be sure no thread is using fs-private data
and I don't see a way without suspending vnode operations.
>> - Accessing devices and fifos through a file system cannot be tracked
>> at the vnode_if.c level. Take ufs_spec_read() for example:
>>
>> fstrans_start(...);
>> VTOI(vp)->i_flag |= IN_ACCESS;
>> fstrans_done(...);
>> return VOCALL(spec_vnodeop_p, ...)
>>
>> Here the VOCALL may sleep forever if the device has no data.
>
> This I don't understand. To the extent it's "in" the fs while it's
> doing this, vnode_if.c can know that, because it called ufs_spec_read
> and ufs_spec_read hasn't returned yet. To the extent that it's in
> specfs, specfs won't itself ever be umount -f'd, since it isn't
> mounted, so there's no danger.
Ok, we had to use VOCALL rather than vnode_if.c and add the accounting
here.
>
> If umount -f currently blats over the data that specfs uses, then it's
> currently not safe, but I don't see how it's different from any other
> similar case with ufs data.
>
> I expect I'm missing something.
So another approach is to serialize vclean(), at least when cleaning
active vnodes, and have VOCALL() take care of the vnode currently being
cleaned. We would need just per-thread state then. This would still
not work for vnode operations stuck in the fs.
Is this the direction you have in mind?
--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig
(Germany)