Re: [Lse-tech] Re: [PATCH] Process Notification (pnotify)

> Is making pnotify leaner an option for PAGG ? something as simple as
> what is in the patch
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111532025203086&w=2
>
> Note that we do not need all the events listed there anymore, only fork
> and exit.

I just took a look at this patch. Forgive me if I missed something. I
find I have to use things like this for real before I catch every detail,
and I didn't go that far with this before writing.

One thing about the pnotify patch is that there are lots of comments.
Could that be making it seem larger than it really is?

There are a couple of added things in pnotify that make it a bit different.
First, of course, it's more generic. More importantly, pnotify has a
task-associated data pointer. This allows subscriber kernel modules to
quickly locate data associated with the task. Without this, some pnotify
users (including Job as an example, but others as well) would need to
implement their own hash table lookup mechanism to associate task data
with a given task.
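
To make the data pointer point concrete, here is a rough userspace
sketch. It is not code from the pnotify patch: struct task, struct
subscriber, task_subscribe() and task_data() are names I made up for
illustration, and I am assuming a plain singly linked list per task.
The idea is only that the lookup walks the task's own short list
rather than a global hash table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are from pnotify. */
struct subscriber {
	const char *owner;        /* which "module" this entry belongs to */
	void *data;               /* module-private data for this task */
	struct subscriber *next;
};

struct task {
	int pid;
	struct subscriber *subs;  /* per-task subscriber list, NULL if empty */
};

/* Attach a caller-allocated entry to the head of a task's list. */
static void task_subscribe(struct task *t, struct subscriber *s,
			   const char *owner, void *data)
{
	s->owner = owner;
	s->data = data;
	s->next = t->subs;
	t->subs = s;
}

/* Find a module's data by walking the task's own list only. */
static void *task_data(const struct task *t, const char *owner)
{
	for (const struct subscriber *s = t->subs; s; s = s->next)
		if (strcmp(s->owner, owner) == 0)
			return s->data;
	return NULL;
}
```

A module that subscribed with owner "job" would get its per-task data
back with task_data(t, "job"), and NULL for a task it never attached to.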

pnotify does have subscriber inheritance built in at the fork callout
(children are by default given the same subscriber list as the parent).
I suppose this could instead be implemented by each subscriber kernel
module that needs it.
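
Roughly what I mean by inheritance at fork, again as a userspace sketch
with invented names (task_subscribe(), task_release(), task_fork()).
I am assuming here that the child simply shares the parent's data
pointer; a real subscriber would probably want a say in what the child
gets.

```c
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are from pnotify. */
struct subscriber {
	const char *owner;
	void *data;
	struct subscriber *next;
};

struct task {
	struct subscriber *subs;  /* NULL means nobody is subscribed */
};

/* Add an entry to a task; returns 0, or -1 if allocation fails. */
static int task_subscribe(struct task *t, const char *owner, void *data)
{
	struct subscriber *s = malloc(sizeof(*s));

	if (!s)
		return -1;
	s->owner = owner;
	s->data = data;
	s->next = t->subs;
	t->subs = s;
	return 0;
}

/* Drop every entry on a task. */
static void task_release(struct task *t)
{
	while (t->subs) {
		struct subscriber *s = t->subs;

		t->subs = s->next;
		free(s);
	}
}

/*
 * Fork hook: the child starts with an empty list and then gets its own
 * copy of each parent entry. Head insertion reverses the order, which
 * does not matter for this sketch.
 */
static int task_fork(struct task *child, const struct task *parent)
{
	child->subs = NULL;
	for (const struct subscriber *s = parent->subs; s; s = s->next) {
		if (task_subscribe(child, s->owner, s->data) != 0) {
			task_release(child);  /* undo a partial copy */
			return -1;
		}
	}
	return 0;
}
```

Without this in the common code, every subscriber that wants to follow
children would carry its own version of the loop in task_fork().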

One other thing pnotify has is a way to associate all currently running
tasks with a given kernel module subscriber at pnotify registration time.
I think this feature is very useful. I suppose it could be done in the
subscriber kernel module itself for modules that wish to use that
functionality - so that might reduce the patch size some if people aren't
interested.
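
As a sketch of the registration-time sweep, assuming a fixed array of
tasks and invented names (task_attach(), register_all(), MAX_SUBS).
Real kernel code would also have to cope with tasks forking and exiting
while the sweep runs, which this ignores.

```c
#include <stddef.h>

#define MAX_SUBS 4  /* arbitrary limit, for the sketch only */

/* Hypothetical userspace model; none of these names are from pnotify. */
struct task {
	int pid;
	const char *subs[MAX_SUBS];  /* names of attached subscribers */
	size_t nsubs;
};

/* Attach one subscriber to one task; returns 0, or -1 if the task is full. */
static int task_attach(struct task *t, const char *name)
{
	if (t->nsubs == MAX_SUBS)
		return -1;
	t->subs[t->nsubs++] = name;
	return 0;
}

/*
 * Registration-time sweep: attach the new subscriber to every task
 * that already exists. Returns how many tasks were attached.
 */
static size_t register_all(struct task *tasks, size_t ntasks,
			   const char *name)
{
	size_t attached = 0;

	for (size_t i = 0; i < ntasks; i++)
		if (task_attach(&tasks[i], name) == 0)
			attached++;
	return attached;
}
```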

Finally, for the module subscribers I'm familiar with, I need a way to
associate a kernel module with any possible task (not necessarily just
current or in-construction children). This is because sometimes kernel
modules need to be notified about a task that isn't the current task at
the time. In the case of Job, perhaps a batch scheduler or similar wants
to track a certain process separately from the others for some
administrative reason.

If I understand your patch right, we go searching for callouts for each
task, even if nothing cares about a specific task. This would still be
quick, I suppose, if nobody registered for events. However, the pnotify
solution is per-task, not global. If the task's subscriber list is empty,
we're done (except that, for fork, we do have to set up the list head and
semaphore for new tasks). i.e.:

+static inline int pnotify_fork(struct task_struct *child,
+			       struct task_struct *parent)
+{
+	INIT_PNOTIFY_LIST(child);
+	if (!list_empty(&parent->pnotify_subscriber_list))
+		return __pnotify_fork(child, parent);
+
+	return 0;
+}

In other words, to be quick with the ckrm patch, there have to be no
registered events. For pnotify, a kernel module subscriber may not care
about all tasks. The performance hit is reduced because we don't take it
globally when someone is registered - we take it per-task when a
kernel module subscriber is registered to a given task. The size of the
subscriber list may differ from task to task too. For example, one
task may be part of a Linux job and be part of an Array session. Another
task may just be part of a Linux job and nothing else. Finally, there could
be tasks that have no subscribers at all.
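
To put rough numbers on the per-task versus global difference, here is
a small model. It assumes my reading of the ckrm patch is right (every
registered handler is looked at for every task), and the function names
are made up. Take the three tasks from the example above: one in a job
and an array session, one in a job only, one with no subscribers.

```c
#include <stddef.h>

/* Hypothetical model; it only counts callouts, it does not run any. */
struct task {
	int pid;
	size_t nsubs;  /* length of this task's own subscriber list */
};

/* Global scheme (assumed): every handler is visited for every task. */
static size_t exit_calls_global(const struct task *tasks, size_t ntasks,
				size_t nhandlers)
{
	(void)tasks;  /* the cost does not depend on the individual task */
	return ntasks * nhandlers;
}

/* Per-task scheme: only the handlers attached to a task run for it. */
static size_t exit_calls_per_task(const struct task *tasks, size_t ntasks)
{
	size_t calls = 0;

	for (size_t i = 0; i < ntasks; i++) {
		if (tasks[i].nsubs == 0)
			continue;  /* fast path: empty list, nothing to do */
		calls += tasks[i].nsubs;
	}
	return calls;
}
```

With two handlers registered (Job and Array) and those three tasks
exiting, the global model visits 3 * 2 = 6 handlers, while the per-task
model makes 2 + 1 + 0 = 3 callouts.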

I think the above reasons explain why pnotify is a bit heavier than
this ckrm patch. I think the key pieces I'd hope for in an implementation
are:

* A means to subscribe arbitrary kernel modules to any given task (not
  just the current task or child-in-construction at the moment).
* A task-associated data pointer that points to the task-associated data
  a kernel module subscriber cares about.
* Notification that is task based - so if the kernel module subscriber
  list is empty or small, we don't take a performance hit.
* Events for fork, exec, and exit.

I think the rest of what is in pnotify is very useful. But if it came to it
and the community wanted to chop bits out besides the above, it would at
least still be possible to do what is needed efficiently. The penalty would
be some code duplication in subscriber modules and increased code size in
subscriber modules in some cases.
Erik