On Thu, 22 Jun 2000 08:44:29 -0400,
Jeff Garzik <jgarzik@xxxxxxxxxxxxxxxx> wrote:
>Two things to remember: one, module_cleanup is called only when module
>use count (and hence user count) drops to zero, and two
>unregister_netdev closes the net device, so any users that slipped in
>and opened the net device during module_cleanup during (is that even
>possible?) are booted when the net device is closed. This also takes
>care of any wacky hardware cleanup that needs to be done in dev->stop(),
>so module_cleanup can simply concern itself with unregistered and
>freeing stuff and then exiting.
Not that simple. Module cleanup is entered with the big kernel lock
which is supposed to prevent open() or its equivalents from entering
the module until cleanup has completed. But module cleanup can sleep,
losing the big kernel lock. If cleanup sleeps before unregister then
open() can enter the module during cleanup, and the module structures
can be in any old state.
Other calls to module functions do not run under the big kernel lock.
Instead they are meant to call open() first which bumps the reference
count. It is then safe to enter module code without a lock, assuming
the open race in the previous paragraph has been fixed. Alas we are
finding code that calls module functions and has no reference count on
the module and does not run under the big kernel lock. In other words,
it is a wide open race condition. You can even have this :-
CPU 0 CPU 1
lock_kernel(); No lock or reference, incorrect code
module_exit(); but it exists right now.
if (fops->ioctl)
unregister();
fops->ioctl(); <=== now zero!
The existing locking mechanism is just too fragile, it relies on every
module writer and every piece of kernel code that calls a module being
coded exactly right. Plus it all relies on the big kernel lock - can
you say non-scalable?
Yes, the existing mechanism can be made to work, but only at huge
expense. Every module has to be checked to see if it conforms to the
lock design. Every kernel call via a function pointer has to be
checked to see if it conforms to the same design. This has to be done
forever, for all new code and methods. And the whole design will not
scale.
What Al Viro is doing is a lot cleaner in the long run. And it will
scale a lot better because it uses per fops spin locks.