eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
"release" callback.  This lets eventfd clients know if the eventfd is about
to go away and is very useful particularly for in-kernel clients.  However,
as it stands today it is not possible to use this feature of eventfd in a
race-free way.  This patch adds some additional logic to eventfd in order
to rectify this problem.

Background:
-----------------------
Eventfd currently only has one reference count mechanism: fget/fput.  This
in and of itself is normally fine.  However, if a client expects to be
notified if the eventfd is closed, it cannot hold a fget() reference
itself or the underlying f_ops->release() callback will never be invoked
by VFS.  Therefore we have this somewhat unusual situation where we may
hold a pointer to an eventfd object (by virtue of having a waiter registered
in its wait-queue), but no reference.  To make matters more complicated,
the release callback is issued in an unlocked state.  This makes it nearly
impossible to design a mutual decoupling algorithm: you cannot unhook one
side from the other (or vice versa) without racing.
-----------------------
In summary, there are two fundamental problems:

1) The POLLHUP wakeup is broadcast lockless
2) There are no references to the wait-queue-head (embedded in eventfd_ctx)

We fix this by using the locking variant of wakeup for POLLHUP
(wake_up_poll(), which takes the wait-queue lock itself), and by
adding/exposing a kref to the underlying eventfd_ctx.  Clients should then
be able to govern their usage of the wait-queue as they do for any other
wait-queue in the kernel.

We propose this more raw solution rather than trying to encapsulate the
poll-callback because there are advantages to decoupling the
remove_wait_queue from the kref_put().  Namely, it's nice to unhook the
wait-queue inside the wakeup, but to defer the kref_put() until we can
synchronize with the client.

Between these points, we believe we now have a race-free release
mechanism.
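
As an illustration only, the ordering described above (unhook inside the
wakeup, drop the reference later) can be modelled in plain userspace C.
This is a minimal sketch, not the patch itself: ctx_model, client_attach(),
client_on_hup(), model_release() and client_detach() are all hypothetical
names, and an ordinary int stands in for the kref because the model is
single-threaded.

```c
/* Userspace model only, NOT kernel code. Every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ctx_model {
	int refs;     /* stands in for the kref added to eventfd_ctx */
	bool hooked;  /* stands in for the client's wait-queue entry */
	bool *freed;  /* set to true when the last reference is dropped */
};

static struct ctx_model *ctx_create(bool *freed)
{
	struct ctx_model *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->refs = 1;  /* reference owned by the file itself */
	c->hooked = false;
	c->freed = freed;
	*freed = false;
	return c;
}

static void ctx_get(struct ctx_model *c)
{
	c->refs++;
}

static void ctx_put(struct ctx_model *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0) {
		*c->freed = true;
		free(c);
	}
}

/* Client takes its own reference before hooking into the wait-queue. */
static void client_attach(struct ctx_model *c)
{
	ctx_get(c);
	c->hooked = true;
}

/* Models the POLLHUP callback: it only unhooks, it does not drop the ref. */
static void client_on_hup(struct ctx_model *c)
{
	c->hooked = false;
}

/* Models the release path: wake the waiter, then drop the file's ref. */
static void model_release(struct ctx_model *c)
{
	if (c->hooked)
		client_on_hup(c);
	ctx_put(c);
}

/* Client drops its reference later, once it has synchronized. */
static void client_detach(struct ctx_model *c)
{
	ctx_put(c);
}

/* Returns true if the context outlives release and is freed on detach. */
static bool demo_ctx_survives_release(void)
{
	bool freed = false;
	struct ctx_model *c = ctx_create(&freed);
	bool survived;

	if (!c)
		return false;
	client_attach(c);
	model_release(c);
	survived = !freed;  /* client reference keeps the context alive */
	client_detach(c);
	return survived && freed;
}
```

The point of the model is only the lifetime: the context is still valid
after the release path has run, and it is freed when the client drops its
own reference.  Real code would rely on kref's atomic operations and on the
wait-queue lock rather than on a single thread of execution.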

-	/*
-	 * No need to hold the lock here, since we are on the file cleanup
-	 * path and the ones still attached to the wait queue will be
-	 * serialized by wake_up_locked_poll().
-	 */
-	wake_up_locked_poll(&ctx->wqh, POLLHUP);
-	kfree(ctx);
+	wake_up_poll(&ctx->wqh, POLLHUP);
+	_eventfd_put(&ctx->kref);
 	return 0;
 }