Commit Message

This patchset converts inotify to use the newly introduced
per-userns sysctl infrastructure.

Currently the inotify instances/watches are accounted in the
user_struct structure. This means that in setups where multiple
users in unprivileged containers map to the same underlying
real user (i.e. point to the same user_struct), the inotify limits
are shared as well, allowing one user (or application) to exhaust
all the others' limits.

Fix this by switching the inotify sysctls to use the
per-namespace/per-user limits. This allows the server admin to
set sensible global limits, which can be tuned further inside every
individual user namespace. Additionally, in order to preserve the
sysctl ABI, make the existing inotify instances/watches sysctls
modify the values of the initial user namespace.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Serge Hallyn <serge@hallyn.com>
---
Okay, so here is another version, which should hopefully be free of
slab corruptions. There was an issue in ucount.c where the ifdef
checked CONFIG_INOTIFY_USER_ (note the trailing _, which was clearly
a mistake). As a result, the user_table (and all tables duplicated
from it) did not contain the inotify-related members. In my local
testing I got kasan splats even during kernel boot, due to
out-of-bounds writes. Let's see how this version fares.
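The misspelled config symbol described above can be illustrated with a small userspace sketch. It is not the code from ucount.c: the names with a _SKETCH/_sketch suffix and the table layout are assumptions made for illustration. It shows how an #ifdef on a symbol that is never defined silently drops table entries, and how a compile-time size check is one possible way to catch the mismatch at build time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Stand-in for the kernel config option, defined here so the sketch builds. */
#define CONFIG_INOTIFY_USER 1

/* Hypothetical counter indices; UCOUNT_COUNTS_SKETCH is the number of counters. */
enum ucount_type_sketch {
	UCOUNT_USER_NAMESPACES_SKETCH,
#ifdef CONFIG_INOTIFY_USER
	UCOUNT_INOTIFY_INSTANCES_SKETCH,
	UCOUNT_INOTIFY_WATCHES_SKETCH,
#endif
	UCOUNT_COUNTS_SKETCH,
};

struct sysctl_entry_sketch {
	const char *name;
};

/*
 * One table entry per counter, plus a terminating sentinel.
 * Writing "#ifdef CONFIG_INOTIFY_USER_" below (trailing underscore) would
 * compile the two inotify entries out without any diagnostic, leaving the
 * table shorter than the enum says it is.
 */
static const struct sysctl_entry_sketch user_table_sketch[] = {
	{ "max_user_namespaces" },
#ifdef CONFIG_INOTIFY_USER
	{ "max_inotify_instances" },
	{ "max_inotify_watches" },
#endif
	{ NULL },
};

#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))

/* Fails the build if the table and the enum disagree. */
static_assert(ARRAY_SIZE_SKETCH(user_table_sketch) ==
	      (size_t)UCOUNT_COUNTS_SKETCH + 1,
	      "user_table_sketch is out of sync with enum ucount_type_sketch");

/* Number of real (non-sentinel) entries in the table. */
size_t user_table_entries_sketch(void)
{
	return ARRAY_SIZE_SKETCH(user_table_sketch) - 1;
}
```

With the typo reintroduced in the table's #ifdef, the static_assert is expected to stop the build instead of leaving the error to show up as out-of-bounds writes at run time.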
fs/notify/inotify/inotify.h | 17 +++++++++++++++++
fs/notify/inotify/inotify_fsnotify.c | 6 ++----
fs/notify/inotify/inotify_user.c | 34 +++++++++++++++++-----------------
include/linux/fsnotify_backend.h | 3 ++-
include/linux/sched.h | 4 ----
include/linux/user_namespace.h | 4 ++++
kernel/ucount.c | 6 +++++-
7 files changed, 47 insertions(+), 27 deletions(-)
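
The accounting change described in the commit message can be sketched as a simplified userspace model. This is not the kernel implementation: the struct and function names are hypothetical, and the model assumes that a charge made inside a user namespace is also charged to the counter of the user who created that namespace, so that an outer limit still bounds everything below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ucounts_model;

/* A user namespace with its own watch limit (hypothetical model). */
struct user_ns_model {
	struct ucounts_model *creator; /* counter charged in the parent; NULL for the initial ns */
	long max_watches;              /* limit applied to each counter in this namespace */
};

/* One counter per (namespace, uid) pair rather than per user_struct. */
struct ucounts_model {
	struct user_ns_model *ns;
	unsigned int uid;
	long watches;
};

/* Charge one watch at every level; roll back and fail if any level is full. */
bool model_inc_watches(struct ucounts_model *uc)
{
	struct ucounts_model *iter;

	for (iter = uc; iter; iter = iter->ns->creator) {
		if (iter->watches >= iter->ns->max_watches)
			goto fail;
		iter->watches++;
	}
	return true;

fail:
	/* Undo the charges made below the level that refused. */
	for (struct ucounts_model *undo = uc; undo != iter; undo = undo->ns->creator)
		undo->watches--;
	return false;
}

/* Release one watch at every level. */
void model_dec_watches(struct ucounts_model *uc)
{
	for (struct ucounts_model *iter = uc; iter; iter = iter->ns->creator) {
		assert(iter->watches > 0);
		iter->watches--;
	}
}
```

In this model, two containers created by the same host user each get their own counter and limit, so one container reaching its limit does not stop the other from adding watches, while the host-level counter still sees the sum of both.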

Comments

Nikolay Borisov <n.borisov.lkml@gmail.com> writes:
> This patchset converts inotify to using the newly introduced
> per-userns sysctl infrastructure.
>
> Currently the inotify instances/watches are being accounted in the
> user_struct structure. This means that in setups where multiple
> users in unprivileged containers map to the same underlying
> real user (i.e. pointing to the same user_struct) the inotify limits
> are going to be shared as well, allowing one user(or application) to exhaust
> all others limits.
>
> Fix this by switching the inotify sysctls to using the
> per-namespace/per-user limits. This will allow the server admin to
> set sensible global limits, which can further be tuned inside every
> individual user namespace. Additionally, in order to preserve the
> sysctl ABI make the existing inotify instances/watches sysctls
> modify the values of the initial user namespace.
>
> Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
> Acked-by: Jan Kara <jack@suse.cz>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> ---
>
> Okay, so here is another version, which should
> hopefully be free of slab corruptions. There was an issue
> where in ucount.c the ifdef was checking the CONFIG_INOTIFY_USER_
> (pay attention to the trailing _, this was clearly a mistake). This
> led to the user_table (and all duplicated from it tables) to not
> contain the inotify-related members. In my local testing I got
> kasan splats even during kernel boot, due to out-of-bound writes.
> Let's see how this version fares.
Thank you. I will place this in my for-testing branch shortly and see how
it fares.
Eric

Nikolay Borisov <n.borisov.lkml@gmail.com> writes:
> This patchset converts inotify to using the newly introduced
> per-userns sysctl infrastructure.
>
> Currently the inotify instances/watches are being accounted in the
> user_struct structure. This means that in setups where multiple
> users in unprivileged containers map to the same underlying
> real user (i.e. pointing to the same user_struct) the inotify limits
> are going to be shared as well, allowing one user(or application) to exhaust
> all others limits.
>
> Fix this by switching the inotify sysctls to using the
> per-namespace/per-user limits. This will allow the server admin to
> set sensible global limits, which can further be tuned inside every
> individual user namespace. Additionally, in order to preserve the
> sysctl ABI make the existing inotify instances/watches sysctls
> modify the values of the initial user namespace.
>
> Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
> Acked-by: Jan Kara <jack@suse.cz>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> ---
>
> Okay, so here is another version, which should
> hopefully be free of slab corruptions. There was an issue
> where in ucount.c the ifdef was checking the CONFIG_INOTIFY_USER_
> (pay attention to the trailing _, this was clearly a mistake). This
> led to the user_table (and all duplicated from it tables) to not
> contain the inotify-related members. In my local testing I got
> kasan splats even during kernel boot, due to out-of-bound writes.
> Let's see how this version fares.
So there is one more thing that needs to be addressed with your patch.

In inotify.h the functions need to be marked static inline
rather than just static, or else there are a number of new compiler warnings.

I have addressed this for now, but if anything else comes up or if you
resend this patch I would appreciate it if you add the static inline
annotations in your internal copy of the patch.
Thank you,
Eric Biederman
> diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
> index ed855ef6f077..b5536f8ad3e0 100644
> --- a/fs/notify/inotify/inotify.h
> +++ b/fs/notify/inotify/inotify.h
> @@ -30,3 +30,20 @@ extern int inotify_handle_event(struct fsnotify_group *group,
>  			      const unsigned char *file_name, u32 cookie);
>  
>  extern const struct fsnotify_ops inotify_fsnotify_ops;
> +
> +#ifdef CONFIG_INOTIFY_USER
> +static void dec_inotify_instances(struct ucounts *ucounts)
> +{
> +	dec_ucount(ucounts, UCOUNT_INOTIFY_INSTANCES);
> +}
> +
> +static struct ucounts *inc_inotify_watches(struct ucounts *ucounts)
> +{
> +	return inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_INOTIFY_WATCHES);
> +}
> +
> +static void dec_inotify_watches(struct ucounts *ucounts)
> +{
> +	dec_ucount(ucounts, UCOUNT_INOTIFY_WATCHES);
> +}
> +#endif
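
The static inline request above can be illustrated with a small self-contained sketch. The struct and helper names are hypothetical stand-ins, not the kernel's. The helpers play the role of functions defined in a header: a file that includes the header but calls only some of them would typically draw "defined but not used" warnings (GCC's -Wunused-function) for the rest if they were plain static, whereas static inline definitions do not trigger that warning.

```c
#include <assert.h>

/* Simplified stand-in for the counters touched by the helpers (assumption). */
struct hdr_ucounts_sketch {
	long watches;
	long instances;
};

/* --- what would live in the header --- */

static inline void hdr_inc_watches(struct hdr_ucounts_sketch *uc)
{
	uc->watches++;
}

static inline void hdr_dec_watches(struct hdr_ucounts_sketch *uc)
{
	assert(uc->watches > 0);
	uc->watches--;
}

/* Not called below; as plain "static" this is the kind of helper that warns. */
static inline void hdr_dec_instances(struct hdr_ucounts_sketch *uc)
{
	assert(uc->instances > 0);
	uc->instances--;
}

/* --- what would live in one .c file that includes the header --- */

/* Uses only two of the three helpers; returns the resulting watch count. */
long hdr_add_then_remove_watch(struct hdr_ucounts_sketch *uc)
{
	hdr_inc_watches(uc);
	hdr_dec_watches(uc);
	return uc->watches;
}
```

Compiling this with -Wall and then again with inline removed from hdr_dec_instances is a quick way to compare the two forms.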

On 15.12.2016 02:37, Eric W. Biederman wrote:
> Nikolay Borisov <n.borisov.lkml@gmail.com> writes:
>
>> This patchset converts inotify to using the newly introduced
>> per-userns sysctl infrastructure.
>>
>> Currently the inotify instances/watches are being accounted in the
>> user_struct structure. This means that in setups where multiple
>> users in unprivileged containers map to the same underlying
>> real user (i.e. pointing to the same user_struct) the inotify limits
>> are going to be shared as well, allowing one user(or application) to exhaust
>> all others limits.
>>
>> Fix this by switching the inotify sysctls to using the
>> per-namespace/per-user limits. This will allow the server admin to
>> set sensible global limits, which can further be tuned inside every
>> individual user namespace. Additionally, in order to preserve the
>> sysctl ABI make the existing inotify instances/watches sysctls
>> modify the values of the initial user namespace.
>>
>> Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
>> Acked-by: Jan Kara <jack@suse.cz>
>> Acked-by: Serge Hallyn <serge@hallyn.com>
>> ---
>>
>> Okay, so here is another version, which should
>> hopefully be free of slab corruptions. There was an issue
>> where in ucount.c the ifdef was checking the CONFIG_INOTIFY_USER_
>> (pay attention to the trailing _, this was clearly a mistake). This
>> led to the user_table (and all duplicated from it tables) to not
>> contain the inotify-related members. In my local testing I got
>> kasan splats even during kernel boot, due to out-of-bound writes.
>> Let's see how this version fares.
>
> So there is one more thing that needs to be addressed with your patch.
>
> In inotify.h the functions need to be marked static inline
> rather than just static or else there a number of new compiler warnings.
>
> I have addressed this for now, but if anything else comes up or if you
> resend this patch I would appreciate it if you add the static inline
> notations in your internal copy of the patch.
Okay, I will keep this in mind. Btw, do you compile with W=1 to get
those warnings, since I don't get them when I just run plain make?
>
> Thank you,
> Eric Biederman
>
>
>> diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
>> index ed855ef6f077..b5536f8ad3e0 100644
>> --- a/fs/notify/inotify/inotify.h
>> +++ b/fs/notify/inotify/inotify.h
>> @@ -30,3 +30,20 @@ extern int inotify_handle_event(struct fsnotify_group *group,
>>  			      const unsigned char *file_name, u32 cookie);
>>  
>>  extern const struct fsnotify_ops inotify_fsnotify_ops;
>> +
>> +#ifdef CONFIG_INOTIFY_USER
>> +static void dec_inotify_instances(struct ucounts *ucounts)
>> +{
>> +	dec_ucount(ucounts, UCOUNT_INOTIFY_INSTANCES);
>> +}
>> +
>> +static struct ucounts *inc_inotify_watches(struct ucounts *ucounts)
>> +{
>> +	return inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_INOTIFY_WATCHES);
>> +}
>> +
>> +static void dec_inotify_watches(struct ucounts *ucounts)
>> +{
>> +	dec_ucount(ucounts, UCOUNT_INOTIFY_WATCHES);
>> +}
>> +#endif