
The video explains it, but it allows programs to 'drop' capabilities they no longer need.
For example, tcpdump needs root access to open the network interface, but after that it can give up those capabilities, so if there is a bug in tcpdump and it gets compromised by a maliciously crafted packet, the attacker does not have any excess privileges to exploit.


Actually it's different than "needs root access" and "it can give up". Tcpdump needs to open a raw socket, which is a particular capability (which can be granted without root access). The capabilities are quite fine grained.

If you give tcpdump the "-w" option to save the capture to a file, it will keep the "file-write" capability (but not "file-read" or "file-seek"); otherwise it drops "file-write". If you tell it not to do hostname lookups ("-n"), then the "dns-lookup" capability is dropped.

Dropping privileges means that the application (or one of its sub-processes) gives up the ability to perform a certain class of operations, such as switching user ID, binding to ports below 1024 etc. This is already done in a lot of networking applications. If I understood it correctly, using capabilities means that you cannot perform an operation unless you have some kind of token (such as a file descriptor) that represents the resource and the type of access you're allowed on that resource.

What I understood from the presentation is that in Capsicum your process starts to run with traditional access control. You acquire the capabilities you'll need and then you switch into a different mode in which resources can only be accessed using capabilities. In other words, you drop all possible operations at once, not a selected subset. Except when using capabilities you acquired in advance or were given by a different (sub-)process, you can do nothing at all with resources.

I still say that it has to do with what level you want to think of it. If you look at the mechanism by which a policy is achieved, yes, a capability-based system is quite different from the typical Unix model.

But in both cases, you have a program that's saying "if I get told to do something I shouldn't (e.g. by being compromised), I want to make sure I can't do something -- I know, I'll stop my future self from doing it by removing the ability." In the traditional Unix model you do that by changing users, i

One difference is that if you drop privileges, you cannot regain them, while capabilities can be passed from one process to another. For example, if a web site asks for access to your web cam, it would be good if the user can allow or deny that on a case-by-case basis. With capabilities, the capability to access the web cam can be transferred to the tab sandbox that asked for it. With privilege separation, you'd either have to give all sandboxes the ability to access the web cam, thus risking exploited tabs

So instead of dropping the privilege to write files at all, which would not be feasible in many applications, you would have a file descriptor to a directory that allows writing files under that directory.

That's an interesting choice of example, since I've often seen the *opposite* used to demonstrate capability systems; specifically that we can keep a filesystem capability in our shell and do everything else with stdio and pipes. For example, rather than giving a filesystem capability to "cp", we can get rid of "cp" altogether and use "cat < file1 > file2". Since "cat" is only operating on stdio, we don't need to give it any filesystem capability.
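A minimal Python sketch of that "the shell keeps the authority" pattern, in case it helps: the parent opens both files and hands cat nothing but stdin and stdout. The helper name copy_with_cat is made up for illustration, and it assumes a cat binary on PATH. Nothing here is Capsicum-specific; on an ordinary Unix, cat still has ambient authority, it just isn't given any file names.

```python
import subprocess


def copy_with_cat(src_path, dst_path):
    """Copy src_path to dst_path; cat itself never sees a file name."""
    # The parent (playing the role of the shell) is the only party that
    # touches the filesystem namespace. It opens both files up front.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # cat gets no arguments, so it only reads stdin and writes stdout.
        subprocess.run(["cat"], stdin=src, stdout=dst, check=True)
```

In a real capability system the child would additionally be unable to open anything by name, so the two descriptors would be its entire view of the filesystem.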

Root access is more like a "privilege" than a "capability". It lets you do whatever you want in a global namespace.

There's potential for confusion here, because there are things that Linux has called "capabilities" for a long time, but they are not "capabilities" in the (even older) sense that Capsicum uses. Linux process "capabilities" like CAP_CHOWN or CAP_NET_RAW are not "capabilities" in the Capsicum sense. They're basically just limited subsets of root privilege.

Capsicum is a lightweight OS capability and sandbox framework developed at the University of Cambridge Computer Laboratory, supported by a grant from Google. Capsicum extends the POSIX API, providing several new OS primitives to support object-capability security on UNIX-like operating systems

Much of this sounds familiar. Fine grained rights on file handles sounds awfully like SELinux, which is itself merely an implementation of access control. Though it sounds like Capsicum has left it up to the app to decide what rights it needs, whereas SELinux maintains a big file of extended rights, basically a big extension to the old UNIX security model of "rwx" for owner, group, and world. Last time I tried SELinux, many years ago, I found I was always having to expand privileges so that utilities and

Capsicum is not a way for the OS to enforce security, but a way for the application to also enforce it. It's just an extra layer, meant to keep application flaws from breaking out. If you don't trust the app, then it doesn't matter.

An example of this is a web server. Say you need to load config files once when the web server is starting up. You give the web server file system access at the OS level, but then the web server loads the files and promptly drops permissions. This means the app willfully give

> Last time I tried SELinux, many years ago, I found I was always having to expand privileges so that utilities and apps could do their jobs.
> We finally said the heck with it, and gave pretty much every permission to every program.

Other people had the same problem and reported exactly what the log said, so the distro default policies have been updated. That pretty much solved the problem, so SELinux is ready for you to give it another try.

> but it doesn't sound too secure, leaving it up to the app to police itself.
> We've seen how well that didn't work in places like Wall Street. Is Capsicum's real security the sandboxing?

As a simple example, many web server exploits write files to /var/tmp and then to /sbin. It's not that Apache or Nginx is malicious; Apache is being tricked into writing those files. With Capsicum, Apache would, on start-up, declare to the OS "don't let me write any files outside of cgi-data/."

There are a few big differences. The first is that SELinux is really an extension of the ACL approach: you have a big matrix with things that can do stuff, and things that can have stuff done to them, and a bit in each position indicating whether that combination is allowed. SELinux (and most ACL implementations) compress this, because a matrix with one row for every process-user pair and one column for every filesystem or kernel object would be huge. The goal of things like SELinux is for system adminis

Capsicum is a genus of flowering plants in the nightshade family Solanaceae. Its species are native to the Americas, where they have been cultivated for thousands of years. In modern times, it is cultivated worldwide, and has become a key element in many regional cuisines. In addition to use as spices and food vegetables, capsicum has also found use in medicines....

The piquant (spicy) variety are commonly called chili peppers, or simply "chilies".

Google is funding this (both the direct research [cam.ac.uk] on FreeBSD and the port to the Linux kernel) because it addresses an aspect one level above the browser. Google Chrome would then be quite tightly sandboxed. This sure beats my method of running browsers as another user (I symlink ~/Downloads to my web user's version of that area and move things out of it quickly), especially since my method wouldn't do anything against actual privilege escalation (to root).

No, not really. It's just that modern OSs weren't designed for the damn security that hardware gives them, and they're too general purpose to utilize these hardware features properly. For instance: Instead of memory barriers and capability based security I've experimented with hypervisory mode sandboxing in some of my toy OSs. Every application thinks it is in its own OS, so instead of constantly verifying capabilities I can pre-allocate permitted resources and be fucking done.

They aren't as fine grained or powerful, I'll grant you, but they could easily be made to be so if Microsoft (or customers) cared enough. The problem is _they don't_ because the real-world security improvements you'll see from something like capsicum are minimal.

How does this compare to existing (coarse grained) Linux capabilities?

Linux capabilities are not capabilities in the classical sense of unforgeable tokens of authority. They are just permissions. Linux capabilities allow a non-root process to do some things as if they were root. Capsicum implements a traditional capability model, where there is no ambient authority and a sandboxed process can not do anything unless it holds the relevant capability.

How does this compare to SELinux?

They address different goals and do so in different ways. The goal of SELinux (and the FreeBSD MAC framework's type enforcement mode) is to allow a system administrator to restrict access of a particular program or user (or program-user pair) to some subset of system resources. It is very bad for application compartmentalisation, because you can't fork() a process and have different permissions for the different children and it's difficult to update the permissions on the fly (although Apple does this with their port of the FreeBSD MAC framework, to implement sandboxing on OS X and iOS, allowing powerboxes to grant permission to specific files or directories).

Capsicum is intended for application sandboxing. It is assumed that the user trusts the binary to be non-malicious, but the program author does not trust his code to be free of exploitable bugs. A sandboxed process should call cap_enter() early on, at which point it can't create any new file descriptors. In some simple cases, that's enough. For example, a sandboxed version of man opens the file containing the mdoc sources and then calls cap_enter(). Even if it's run as root, and is reading a malicious source file that exploits the troff parser and allows arbitrary code execution, it can't do anything other than write text to the terminal. More complex applications will keep open a UNIX domain socket to a (more) privileged process that can pass in new file descriptors as they're needed, after doing some application-specific policy checking.
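The last point, a more privileged process handing descriptors to the sandbox over a UNIX domain socket, can be sketched in Python (3.9 or later, for socket.send_fds and socket.recv_fds). This is only an illustration of descriptor passing: the function names are hypothetical, the policy check is left as a comment, and cap_enter() is not called because the Python standard library has no Capsicum binding.

```python
import os
import socket


def send_file_descriptor(sock, path):
    """Privileged side: open path and pass the descriptor to the peer."""
    # A real broker would run its application-specific policy check here
    # before deciding whether the sandbox may have this file at all.
    fd = os.open(path, os.O_RDONLY)
    try:
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        os.close(fd)  # the receiver ends up with its own copy


def receive_file_descriptor(sock):
    """Sandboxed side: receive one descriptor from the privileged peer."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return fds[0]
```

The sandboxed side never names a path; it can only use whatever descriptor the broker chose to send.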

Does this complement things like Linux's seccomp?

Seccomp is far more restrictive than Capsicum and prevents even harmless system calls (e.g. getpid(), gettimeofday() - although not on platforms where these use VDSO, I believe, but others of equal utility and harmlessness are blocked). Capsicum allows a whitelisted set of calls. Additionally, Capsicum adds finer-grained permissions on file handles (for example, you can be allowed to read but not mmap, or append but not seek) and a set of at-suffixed system calls that work on directory descriptors, such as openat, which allows a sandboxed process to open files in a directory that it holds the relevant capability for. This means that, for example, you can give a sandboxed browser tab process a directory descriptor to a cache directory and it can write cache data there without needing any interposition from a more privileged process. It talks directly to the kernel.
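For anyone who hasn't used the at-suffixed calls, here is a rough Python sketch of the cache directory case. Python's os.open() with dir_fd goes through openat(2); the function names and the 0o600 mode are my own choices for illustration. On a system without Capsicum nothing stops a name like "../escape", so this shows only the calling pattern, not the enforcement.

```python
import os


def open_cache_dir(path):
    """Open a directory descriptor; this is the token given to the sandbox."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def write_cache_entry(cache_dir_fd, name, data):
    """Create a file relative to cache_dir_fd, the way openat() would."""
    # dir_fd makes Python resolve name relative to the directory descriptor
    # instead of the process's current working directory.
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600,
                 dir_fd=cache_dir_fd)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```

The privileged parent would call open_cache_dir() before sandboxing the tab process; afterwards the tab only ever calls write_cache_entry() with the descriptor it was given.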

Seccomp-bpf extends seccomp by allowing system calls to be blocked or allowed based on the execution of a BPF filter. This is more expressive than Capsicum (Google's first port of Capsicum implemented it in terms of seccomp-bpf, although it was slow and not complete), but it doesn't allow the policies to easily associate permissions data with file descriptors and requires you to implement a complex BPF policy.

What's the overhead compared to the above?

There is basically no overhead for capsicum. It's one extra bitmask check on each system call that interacts with a file descriptor, which is in the noise for most workloads[1].
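To give a feel for how cheap a bitmask check is, here is a toy model in Python. The constant and function names are invented for this sketch and the real Capsicum rights type and constants differ; the point is only that the check is a single AND and compare.

```python
# Hypothetical rights bits, one per operation class.
RIGHT_READ = 1 << 0
RIGHT_WRITE = 1 << 1
RIGHT_SEEK = 1 << 2
RIGHT_MMAP = 1 << 3


def check_rights(held, required):
    """Return True when every bit in required is also set in held."""
    return (held & required) == required


def limit_rights(held, wanted):
    """Narrow a rights mask; bits not already held can never be added."""
    return held & wanted
```

For example, a descriptor limited to RIGHT_WRITE alone passes check_rights(held, RIGHT_WRITE) but fails check_rights(held, RIGHT_WRITE | RIGHT_SEEK), which is roughly the "append but not seek" case.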

In terms of programmer overhead, there's a summary table for the number of lines of code changed to implement sandboxing with various mechanisms in Chrome in the original Capsicum paper. Here's the short version:

Hey TR, thanks for the comprehensive replies (to be honest I thought I'd asked so late no one would see them) - you elaborated on things that I did not glean from the presentation. Well done for splitting seccomp and seccomp-bpf up too. I have a few more questions:

Does Capsicum only work at the process level? I can't have a more privileged thread that is still uncontained (i.e. still able to perform a blocked syscall) while other threads are contained?


Yes. There's no point sandboxing a thread, because if you compromise one thread you can write over every other thread's memory, and trivially do ROP tricks to make another thread make the system calls on your behalf.

How do you envision codebases supporting Capsicum in a way that leaves them still portable to platforms where Capsicum is not available? Is it going to be a case of #ifdefs all the way down?

The Capsicum APIs can mostly be #ifdef'd away. The things that restrict rights on a file descriptor and the cap_enter() syscall can just be turned into no-ops.
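The same "turn it into a no-op" idea, sketched in Python rather than with the C preprocessor. This is an assumption-laden illustration: it looks up cap_enter in the process's libc via ctypes at run time (POSIX only), and enter_sandbox is a made-up wrapper name.

```python
import ctypes


def enter_sandbox():
    """Try to enter capability mode; return True only if it took effect."""
    # CDLL(None) exposes the symbols already loaded into this process.
    libc = ctypes.CDLL(None, use_errno=True)
    cap_enter = getattr(libc, "cap_enter", None)
    if cap_enter is None:
        # No Capsicum on this platform: behave like the #ifdef'd-out no-op.
        return False
    cap_enter.restype = ctypes.c_int
    cap_enter.argtypes = []
    # cap_enter() returns 0 on success and -1 on failure.
    return cap_enter() == 0
```

Callers can carry on either way, but a False return means the process is running unsandboxed, which is probably worth logging.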

Would it be possible to make a sandbox program that uses Capsicum to in turn sandbox another (Capsicum-unaware) program that it goes on to run or is it likely going to be too restrictive for the second program?

You can't easily use it to isolate other unmodified programs. However, it is possible to use LD_PRELOAD to insert a shim libc that will trap any attempts to open files and will instead ask the parent process to pass it a file descriptor. That becomes a bit tricky because rtld also wants to open files (although it's now been modified to allow you to pass in some directory descriptors for search paths) and it must (obviously) run before any shared library code can run.

Improved security models like SE Linux or now capsicum are intellectually interesting, but do they solve a real problem?

After all, there are really few working exploits in the wild against up-to-date Linux systems, and a non-up-to-date system will be hacked anyway, with or without fancy security models.

Isn't the whole point of dropping privileges (whether using something fancy like Capsicum, or simply switching to a non-root user after initialisation) to mitigate what happens after a process gets successfully attacked?
Whether a zero-day vuln or an old vuln that hasn't been patched, if the attacker can't do anything important once they get in, doesn't that help?

No software programmer is perfect. There will always be exploits, and those exploits often show up in use in the wild before they are patched out by the responsible party. The difference with Capsicum is that it shifts the load from the system administrator to the programmer. The administrator trusts that the programmer knows the limited set of privileges that their application needs, and drops all else. It's mitigating the harm that can be caused by an unpatched exploit, and doing it in a fashion that