On Tue, Aug 06, 2013 at 04:03:21PM +0200, Christoph Hellwig wrote:
> On Tue, Aug 06, 2013 at 09:42:54AM -0400, Rich Felker wrote:
> > > As told you earlier on linux-kernel just send a patch with your semantics
> >
> > Apologies, I did not see the reply, and I'm still looking for it. I
> > should have put the request to CC me more prominently in the email...
>
> Sorry, it actually was libc-alpha that I replied to. I didn't notice
> you sent two slightly different messages instead of having a cross-posted
> discussion, which would have been more useful.

I agree totally. That's why I cross-posted this new thread.

> > > to lkml. We're not going to reserve a value for a namespace that is
> > > reserved for the kernel to implement something that should better
> > > be done in kernel space.
> >
> > Did you mean "that should better be done in user space"?
>
> No. It should be done in kernelspace, just like all other O_ flags.

OK, I was just confused by your wording.

> > Whether O_SEARCH and O_EXEC are provided fully natively by the kernel
> > or handled by userspace, either way a reserved value in the open flags
> > must be set aside. Otherwise any value used by the userspace
> > implementation would risk conflicting with future kernel features
> > using the same bit(s).
>
> No flag is going to get reserved without a proper (kernel-level)
> implementation.

This is frustrating because early on in the O_PATH discussions on LKML
when it was first added, there were requests for O_SEARCH and O_EXEC
semantics in the kernel, and these requests were rejected with the
response being roughly "you can do it in userspace using the more
general O_PATH approach". So we have two contradictory conditions:

- O_SEARCH/O_EXEC semantics won't be added in the kernel because you
  can do it in userspace with O_PATH.

- O_SEARCH/O_EXEC can't be added in userspace because they can't be
  assigned a value without having an implementation in kernelspace.
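
For reference, the userspace emulation mentioned in the first point can
be sketched roughly as follows. This is only a minimal illustration, not
code from any libc: the helper name open_pathonly is hypothetical, and it
assumes Linux 3.6 or later, where fstat() works on O_PATH descriptors.
Since O_PATH combined with O_NOFOLLOW yields a descriptor for the symlink
itself, the sketch detects that after the fact and fails with ELOOP:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (illustrative name, not an existing libc API):
 * open PATH for search/exec-only use via O_PATH, but fail with ELOOP
 * when O_NOFOLLOW was requested and the final component is a symlink,
 * which is what a regular open() with O_NOFOLLOW would do. */
int open_pathonly(const char *path, int extra_flags)
{
    assert(path != NULL);

    int fd = open(path, O_PATH | O_CLOEXEC | extra_flags);
    if (fd < 0)
        return -1;

    if (extra_flags & O_NOFOLLOW) {
        struct stat st;
        if (fstat(fd, &st) < 0) {
            /* Assumption: fstat() on an O_PATH fd needs Linux >= 3.6. */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (S_ISLNK(st.st_mode)) {
            /* O_PATH|O_NOFOLLOW opened the link itself; reject it. */
            close(fd);
            errno = ELOOP;
            return -1;
        }
    }
    return fd;
}
```

Note that the descriptor is still a plain O_PATH descriptor as far as
the kernel and fcntl(F_GETFL) are concerned, which is why a distinct
reserved flag value is still needed.
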

If there's a willingness to override/drop that previous decision
(which I believe Linus was in on, but I'd have to search for the old
threads again) then I can propose a patch. As far as I can tell, the
simplest implementation would be to follow the O_PATH code path but
include a check for this new mode and fail at the point of opening a
symlink where O_NOFOLLOW is processed. I am not sufficiently familiar
with this code to write the patch yet, but I can try to learn it. My
guess is that the patch would be less than 20 lines, half of it being
a change for the top-level O_PATH logic in openat that strips other
flags when O_PATH is present and half of it being

If I do this, do you have a recommendation on the value to use? My
guess for the best choice would be O_PATH|3, so that O_PATH, O_SEARCH,
O_EXEC, O_RDONLY, O_WRONLY, and O_RDWR can all fall under O_ACCMODE
without adding more than one bit to O_ACCMODE. If we do it this way,
the patch should also ensure that the extra bits (bits 0 and 1) set at
open time are preserved when fcntl(F_GETFL) is called, so that the
application correctly sees the access mode it requested.
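
To make the proposed encoding concrete, here is a small sketch of how
the access mode would be decoded. The X_ names and the enum are
hypothetical and exist only for illustration; no kernel or libc header
defines them, and the value O_PATH|3 is just the proposal above:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>

/* Hypothetical definitions following the proposal above. */
#define X_O_SEARCH  (O_PATH | 3)            /* would also serve as O_EXEC */
#define X_O_ACCMODE (O_ACCMODE | O_PATH)    /* one extra bit in the mask */

/* The scheme relies on O_PATH not overlapping the two low mode bits. */
static_assert((O_PATH & O_ACCMODE) == 0, "O_PATH overlaps O_ACCMODE");

enum x_access {
    X_ACC_RDONLY, X_ACC_WRONLY, X_ACC_RDWR,
    X_ACC_PATH, X_ACC_SEARCH, X_ACC_INVALID
};

/* Classify open flags (or an F_GETFL result that preserves bits 0-1)
 * under the widened access-mode mask. */
enum x_access x_classify(int flags)
{
    switch (flags & X_O_ACCMODE) {
    case O_RDONLY:   return X_ACC_RDONLY;
    case O_WRONLY:   return X_ACC_WRONLY;
    case O_RDWR:     return X_ACC_RDWR;
    case O_PATH:     return X_ACC_PATH;
    case X_O_SEARCH: return X_ACC_SEARCH;
    default:         return X_ACC_INVALID;  /* e.g. 3, O_PATH|1, O_PATH|2 */
    }
}
```

With this layout a single mask separates all of the modes, and the
combinations O_PATH|1 and O_PATH|2 remain unassigned.
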

Really, my preference would be if O_PATH could be changed to honor
O_NOFOLLOW just like other open types, and a new O_SYMLINK could be
added to open the link itself, but this would be an incompatible
change in the kernel API and I fully agree that would not be
appropriate.

Rich