Commit Message

In order to protect against ptrace(2) and similar attacks on container
runtimes when they join namespaces, many runtimes set mm->dumpable to
SUID_DUMP_DISABLE. However, doing this means that attempting to set up
an unprivileged user namespace will fail because an unprivileged process
can no longer access /proc/self/{setgroups,{uid,gid}_map} for the
container process (which is the same uid as the runtime process).
Fix this by changing pid_getattr to *also* change the owner of regular
files that have a mode of 0644 (when the process is not dumpable). This
ensures that the important /proc/[pid]/... files mentioned above are
properly accessible by a container runtime in a rootless container
context.
The most blantant issue is that a non-dumpable process in a rootless
container context is unable to open /proc/self/setgroups, because it
doesn't own the file.
int main(void)
{
prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
unshare(CLONE_NEWUSER);
/* This will fail. */
int fd = open("/proc/self/setgroups", O_WRONLY);
if (fd < 0)
abort();
return 0;
}
Cc: dev@opencontainers.org
Cc: containers@lists.linux-foundation.org
Signed-off-by: Aleksa Sarai <asarai@suse.de>
---
fs/proc/base.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

>> In order to protect against ptrace(2) and similar attacks on container>> runtimes when they join namespaces, many runtimes set mm->dumpable to>> SUID_DUMP_DISABLE. However, doing this means that attempting to set up>> an unprivileged user namespace will fail because an unprivileged process>> can no longer access /proc/self/{setgroups,{uid,gid}_map} for the>> container process (which is the same uid as the runtime process).>>>> Fix this by changing pid_getattr to *also* change the owner of regular>> files that have a mode of 0644 (when the process is not dumpable). This>> ensures that the important /proc/[pid]/... files mentioned above are>> properly accessible by a container runtime in a rootless container>> context.>>>> The most blantant issue is that a non-dumpable process in a rootless>> container context is unable to open /proc/self/setgroups, because it>> doesn't own the file.>> This changes a lot more than just setgroups, doesn't it? This bypasses> the task_dumpable check for all kinds of things.
Yeah. I can special case /proc/self/setgroups as well as uid_map,
gid_map and the other *specific* things that runC needs. But ultimately
I think we should come up with agreement on what things should always
appear to be owned by the process's user.
> Though, I expect the> has_pid_permissions() check to be the harder one to pass. Why does> has_pid_permissions() succeed in the case you've given?
Because the group id of the container process is the same as the parent
process, so is_group_p() will succeed. Also hide_pid_min is set such
that it will work in either case.

>> In order to protect against ptrace(2) and similar attacks on container>> runtimes when they join namespaces, many runtimes set mm->dumpable to>> SUID_DUMP_DISABLE. However, doing this means that attempting to set up>> an unprivileged user namespace will fail because an unprivileged process>> can no longer access /proc/self/{setgroups,{uid,gid}_map} for the>> container process (which is the same uid as the runtime process).>>>> Fix this by changing pid_getattr to *also* change the owner of regular>> files that have a mode of 0644 (when the process is not dumpable). This>> ensures that the important /proc/[pid]/... files mentioned above are>> properly accessible by a container runtime in a rootless container>> context.>>>> The most blantant issue is that a non-dumpable process in a rootless>> container context is unable to open /proc/self/setgroups, because it>> doesn't own the file.>>>> int main(void)>> {>> prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);>> unshare(CLONE_NEWUSER);>>>> /* This will fail. */>> int fd = open("/proc/self/setgroups", O_WRONLY);>> if (fd < 0)>> abort();>>>> return 0;>> }>> I do agree that failing to open anything in /proc/self/ is more than> unexepcted! I cannot judge the patch but my gut feeling tells me that> the fix should be somewhere in the open handler.
Maybe that would suffice as a more specific fix (for the special case of
/proc/self), but the fact that none of the users and groups are
correctly set in /proc/[pid] will cause issues for runC and other
container runtimes (because they don't go through /proc/self -- it's
accessing /proc/[pid] from another process).
Though I get the feeling that the *correct* fix would be to remove the
conditional and *always* change the owner -- maybe I'm missing something
but I can't think of the security issue that this code currently fixes
(since all of the important permission checks are *in addition* to the
generic_permission used for /proc/self/..., which use ptrace_may_access).