Michael C. Toren:
> 1) Open a file descriptor pointing to the current working directory.
>
> 2) Create a temporary directory within the jail, and chroot() to it.
>
> 3) Using fchdir(), change the working directory to the file descriptor
> saved from step 1.

Oho, I hadn't seen that before. The chroot() in step 2 is required to
avoid the special case in the kernel that checks to see if you are
doing ".." in the current root directory. But because you chrooted()
yourself somewhere else, the special case isn't exercised.
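
The quoted steps can be sketched in C roughly as follows. This is a
minimal sketch, not a tested tool: escape_jail() and its tmpdir
parameter are names made up for illustration, the process is assumed
to already have the privilege that chroot() requires, and the final
climb through ".." is the usual continuation rather than one of the
three quoted steps.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 on any failure. */
int escape_jail(const char *tmpdir)
{
    assert(tmpdir != NULL);

    /* Step 1: keep a descriptor for the current working directory. */
    int cwd = open(".", O_RDONLY);
    if (cwd < 0)
        return -1;

    /* Step 2: create a temporary directory in the jail, chroot() to it.
     * The saved descriptor now refers to a directory outside the root. */
    if (mkdir(tmpdir, 0700) < 0) {
        close(cwd);
        return -1;
    }
    if (chroot(tmpdir) < 0) {
        rmdir(tmpdir);
        close(cwd);
        return -1;
    }

    /* Step 3: return to the saved directory with fchdir(). */
    int rc = fchdir(cwd);
    close(cwd);
    if (rc < 0)
        return -1;

    /* Assumed continuation: ".." is no longer stopped by the root
     * check, so climb a bounded number of levels and re-root there. */
    for (int i = 0; i < 1024; i++) {
        if (chdir("..") < 0)
            return -1;
    }
    return chroot(".");
}
```

Without privilege the chroot() call fails and the function returns -1
after removing the directory it created.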

Older systems don't have fchdir(), which is a fairly recent addition.

With the proliferation of "f" calls in recent years (fchdir, fchmod,
fchown, fstat, fsync, etc.) I wonder what would be the result if the
Unix system interface were redesigned to eliminate the non-"f"
versions of the calls entirely. Instead, there would be a generic
function, which we might call "iname", which transforms a path name to
an "inode" structure:

struct inode * iname (const char *path);

Unix kernels already contain a function that does this job;
traditionally it is called namei().

The system calls that formerly accepted path names are changed to require
an inode structure. So instead of

fd = open("dir/file", ...)

one now has

fd = open(iname("dir/file"), ...)

(There are some minor language and usability issues here: what if
iname() returns NULL? Ignore those; I want to discuss OS issues, not
language issues.)

There would be a function, analogous to iname(), that also returned an
inode structure, but which took an open file descriptor instead of a
path name:

struct inode * inode(int fd);

This is essentially equivalent to the fstat() function we have now.
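
As a rough user-space illustration, the two lookups can be
approximated today with stat() and fstat(). In this sketch the
existing struct stat stands in for the proposed struct inode, the
signatures are changed to fill a caller-supplied buffer instead of
returning a pointer, and same_file() and demo_same_inode() are
helpers invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Resolve a path name; struct stat is a stand-in for struct inode. */
int iname(const char *path, struct stat *out)
{
    return stat(path, out);
}

/* Resolve an open file descriptor to the same kind of structure. */
int inode(int fd, struct stat *out)
{
    return fstat(fd, out);
}

/* Two results name the same file if device and i-number both match. */
int same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Usage: "/" resolved by name and by descriptor is the same inode. */
int demo_same_inode(void)
{
    struct stat by_name, by_fd;
    int fd = open("/", O_RDONLY);
    assert(fd >= 0);

    int ok = iname("/", &by_name) == 0 && inode(fd, &by_fd) == 0
             && same_file(&by_name, &by_fd);
    close(fd);
    return ok;
}
```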

chown() and fchown() would merge to become a single call that accepted
an inode structure; instead of:

chown("dir/file", owner)
fchown(fd, owner)

one would have:

chown(iname("dir/file"), owner)
chown(inode(fd), owner)

Similarly, instead of:

chdir(path);
fchdir(fd);

one would have:

chdir(iname(path));
chdir(inode(fd));
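
The same factoring can already be approximated with existing calls:
a path-taking call is just "resolve the name, then apply the f call".
A small sketch, in which chdir_via_fd() is a hypothetical name and an
open file descriptor plays the part of the inode structure:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* chdir(path) rebuilt from a name lookup followed by fchdir(). */
int chdir_via_fd(const char *path)
{
    assert(path != NULL);

    /* Name resolution happens here and only here. */
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    /* The operation itself never sees the path name. */
    int rc = fchdir(fd);
    close(fd);
    return rc;
}
```

Unlike the proposed iname(), open() needs read permission on the
directory, so this is only an approximation of the idea.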

stat() and fstat() would not only merge but would disappear entirely;
the struct inode can do everything that the struct stat can do. This
code:

stat("dir/file", &statbuf);
fstat(fd, &statbuf);

turns into this:

statbuf = iname("dir/file");
statbuf = inode(fd);

There are some security implications to this idea. There needs to be
protection against counterfeiting an inode structure. For example,
consider a world-readable file in a secret, nonsearchable directory.
Suppose the file happens to have i-number 123456. If it's possible to
do this, then security has failed:

struct inode I;
I.inumber = 123456;
fd = open(I, O_RDWR);

It should be impossible for anyone to manufacture the struct inode
that represents the secret file without actually using iname()
somewhere along the line. A simple way to arrange this would be to
have the kernel cryptographically sign each struct inode. This can be
done inexpensively.

This still has some access implications. Consider a
world-readable file in a world-searchable directory. Process A
iname()s the file, obtaining its struct inode. The search
permissions on the directory are then removed. Process A can still
open the file. This is analogous to a similar situation in standard
Unix in which process A opens the file before the permissions are
changed, and can still read and write it afterwards. So that's not a
big change. What might be a big change is that A can dump the struct
inode to a file and then a different process can read it back again,
evading the increased access protections on the directory. The
cryptographic signature technique can fix this problem by restricting
struct inodes to be used by a single process.

Whether this is worth doing I don't know. My main idea in thinking it
up was to avoid the increasing duplication of system calls. Does
Unix need an "fsymlink" call? Does it need three different ones?

This also fixes some of the proliferation in the system call interface
between calls that work on symlinks and calls that work through
symlinks. For example, stat() and lstat(), and chown() and lchown().
On normal files, each pair is the same. But on a symlink, stat() stats
the pointed-to file while lstat() stats the symlink itself; similarly
chown() changes the owner of the pointed-to file while lchown()
changes the owner of the symlink itself. But where's lchmod()? What
about llink()? There's no way to make a hard link to a symbolic
link! With the inode() / iname() technique above, you only need one
extra call to handle all possible operations on a symbolic link:

struct inode * liname (const char *path);

where liname() is just like iname(), except that if the resulting file
is a symbolic link, its inode is returned immediately; iname() would
have read the target of the symbolic link and called itself
recursively to resolve the target.
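
The difference between the two resolvers can be sketched with today's
stat() and lstat(). As before, struct stat stands in for struct inode
and the signatures are adapted to fill a caller-supplied buffer;
is_symlink() and demo_liname() are invented for the example, and the
demo assumes a writable /tmp.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Follows symbolic links, like the proposed iname(). */
int iname(const char *path, struct stat *out)
{
    return stat(path, out);
}

/* Stops at a symbolic link, like the proposed liname(). */
int liname(const char *path, struct stat *out)
{
    return lstat(path, out);
}

/* One generic operation serves both; no "l" variant is needed. */
int is_symlink(const struct stat *in)
{
    return S_ISLNK(in->st_mode);
}

/* Usage: make a file and a link to it, then resolve the link twice. */
int demo_liname(void)
{
    char dir[] = "/tmp/liname-XXXXXX";
    char target[64], lnk[64];
    struct stat followed, unfollowed;

    char *made = mkdtemp(dir);
    assert(made != NULL);
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(lnk, sizeof lnk, "%s/link", dir);

    FILE *f = fopen(target, "w");
    assert(f != NULL);
    fclose(f);
    int rc = symlink(target, lnk);
    assert(rc == 0);

    int ok = iname(lnk, &followed) == 0
          && liname(lnk, &unfollowed) == 0
          && !is_symlink(&followed)
          && is_symlink(&unfollowed);

    unlink(lnk);
    unlink(target);
    rmdir(dir);
    return ok;
}
```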

It also seems to me that this interface might make it easier to
communicate open files from one process to another. Some Unix systems
offer experimental features for passing file descriptors around;
this system only requires that the struct inode be communicated
directly to the receiving process.
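
For comparison, one descriptor-passing mechanism that is widely
available now is SCM_RIGHTS ancillary data on a Unix-domain socket.
The sketch below follows the usual sendmsg()/recvmsg() pattern;
send_fd(), recv_fd() and demo_pass_fd() are names chosen for the
example, and the demo sends to itself over a socketpair() only to
stay short. A real use would have a second process on the other end.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned for struct cmsghdr. */
union ctl_buf {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the Unix-domain socket sock. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';               /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union ctl_buf ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from sock; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union ctl_buf ctl;
    struct msghdr msg;
    int fd;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET
        || cmsg->cmsg_type != SCM_RIGHTS
        || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}

/* Usage: pass the read end of a pipe, then read through the copy. */
int demo_pass_fd(void)
{
    int sv[2], p[2];
    char buf[2] = { 0, 0 };

    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    rc = pipe(p);
    assert(rc == 0);

    if (send_fd(sv[0], p[0]) < 0)
        return 0;
    int got = recv_fd(sv[1]);
    if (got < 0)
        return 0;
    if (write(p[1], "hi", 2) != 2)
        return 0;
    ssize_t n = read(got, buf, 2);

    close(got); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return n == 2 && buf[0] == 'h' && buf[1] == 'i';
}
```

The receiver gets a new descriptor number that refers to the same
open file, which is the effect the struct inode scheme would achieve
by handing over the structure itself.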