I'm Tav, a 29yr old from London. I enjoy working on large-scale
social, economic and technological systems.

Note: This article isn't about securing the Python interpreter against
crashes/segfaults or resource-exhaustion attacks. For help with that, take a
look at the excellent sandboxing features of PyPy. Those of you who just want
to know about the practical applications of this can scroll down to the bottom
of the article =)

There have been many attempts to secure the Python interpreter so that untrusted
code can be safely executed alongside trusted code. Working attempts like
RestrictedPython and zope.proxy unfortunately come at a high cost in terms of
performance.

Old-school Python hackers will probably remember the deprecated rexec
module which used to be enabled in the standard library. This module, along with
its Bastion sibling, provided a framework for “restricted execution” of
Python code.

The rexec module encouraged a certain design pattern which depended on class
attributes being kept “private” from untrusted code. Unfortunately, Python's
introspection powers are heavily geared against this and there are many many
dark corners from which one can peer deep into the heart of classes.

So it was no surprise that, soon after the introduction of new-style classes in
Python, rexec was dumped. And all hopes of securing the Python interpreter in an
efficient way went the way of Plan 9.

Now, those in the security world are probably aware of the Object Capability
model of security as pioneered by the likes of the Actors model and the E
language. Entire operating systems have been implemented free of viruses thanks
to this model.

For a long while I have felt that there exists a major subset of Python that is
suited for use through the object capability model. After all, capabilities are
just non-forgeable references, and we already have those in Python.

The next step is to simply ensure that there is no global shared state. And
whilst a lot of existing code uses global shared state, there is nothing in the
Python language that imposes this limitation. Thus it should be possible to
isolate a capability-secure subset of Python and build up from there.

Since I had this insight, the Google Caja project has done exactly the same for
JavaScript. They identified a capability-secure subset of JavaScript and have
built up from there…

So how can we Python hackers get beyond shared state? After all, there is no
“private” in Python. Right?
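
Closures offer one answer. As a minimal sketch (written in Python 3 syntax;
make_account, deposit and get_balance are hypothetical names chosen for
illustration, not anything taken from safelite.py), state held in an enclosing
scope is reachable only through the functions that close over it, at least
until introspection enters the picture:

```python
def make_account(balance):
    """Return two functions sharing a balance that has no public name."""

    def deposit(amount):
        nonlocal balance
        if amount <= 0:
            raise ValueError("deposit must be positive")
        balance += amount
        return balance

    def get_balance():
        return balance

    return deposit, get_balance


deposit, get_balance = make_account(100)
deposit(50)
print(get_balance())  # prints 150
```

Neither returned function carries a `balance` attribute; the value lives only
in the closure they share.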

But what about Python's various introspective powers, you ask? Unlike the deep
plumbing of classes, Python's functions are relatively isolated, which makes our
lives much easier. This makes sense when you realise that Python classes are
actually syntactic sugar and sets of protocols on top of functions.

But functions aren't opaque beasts by default. There are a number of variables
which “leak” information. The ones I identified were:

As you can see, this is a pretty small list. (Special thanks to Paul Cannon for
being the first, with his hardcore hack, to show that frame objects are
accessible.)
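
To see what such a leak looks like in practice, here is a small sketch in
Python 3 syntax, where the closure attribute is spelled `__closure__`. The names
make_checker and check are made up for the example:

```python
def make_checker(secret):
    """Return a function that can test a guess but should not reveal secret."""

    def check(guess):
        return guess == secret

    return check


check = make_checker("s3kr1t")

# Ordinary use: the caller only learns whether a guess matched.
assert check("wrong") is False

# The leak: the closure cell is reachable from the function object itself.
leaked = check.__closure__[0].cell_contents
print(leaked)
```

Anyone holding a reference to check can recover the value it was supposed to
keep hidden, which is why attributes like this one need to be out of reach of
untrusted code.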

Now this list is in no way the definitive final list. The Python challenge is
still ongoing — try safelite.py yourself and see if
you can find more! But the fact that there have been no new exploits in the last
24 hours despite over 1,000 unique downloads of safelite.py in the same time
gives me some confidence that we are getting towards a comprehensive list.

If we can ensure that untrusted code will never be able to access the final list
of these variables, then we can ensure that “private” data using closures stays
private. And from that basis, we can start building an object capability
framework in Python!!

In safelite.py, I use ctypes to completely remove these variables from the
Python interpreter. This is a neat approach which Phillip J. Eby showed me and
means that we can start building an object capability framework in Python today!

The flip-side of removing these variables however is that the code which uses
these variables won't work. Boo! So I made getter functions like
sys.get_func_code and patched the handful of functions in the standard
library like inspect.getargspec to use these instead.

The idea was that trusted code would have a reference to the sys module
and be able to use these getters whilst untrusted code would not. But Guido van
Rossum, in the conversation that followed, convinced me that Python already has
support for doing this!

And this is where our old friend rexec deserves some thanking. It turns out that
rexec is only one half of Python's restricted execution support. The other half
has been living inside the Python Interpreter for well over a decade. For the
sake of simplicity let's call this PIRE — Python Interpreter's Restricted
Execution.

And since there is seemingly no comprehensive documentation of PIRE, I'll
provide a summary here.

Whenever you read/write an attribute on one of Python's builtin objects, it will
raise a RuntimeError stating that the attribute is restricted if both of
the following conditions are true:

The attribute has a READ_RESTRICTED and/or WRITE_RESTRICTED flag set.

PyEval_GetRestricted() returns True.

The flags are set when members of an object are defined. For example, in
funcobject.c we find:

In other words, PyEval_GetRestricted() checks to see if the __builtins__
variable in the current execution frame is the exact same as the default
__builtin__ module [Note the difference in spelling of the two variables]. If
they differ, restricted execution is assumed.
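
The comparison itself is easy to model. The sketch below is an approximation in
Python 3, which as far as I know no longer has this restricted mode built in.
is_restricted is a hypothetical helper, and it leans on sys._getframe, a CPython
implementation detail, to look at the caller's frame:

```python
import builtins
import sys


def is_restricted(depth=1):
    """Report whether the calling frame uses non-default builtins.

    Approximates the check described above: a frame counts as restricted
    when its builtins mapping is not the interpreter's own.
    """
    frame = sys._getframe(depth)
    return frame.f_builtins is not builtins.__dict__


# Code run with a custom __builtins__ sees itself as restricted...
sandbox = {"__builtins__": {}, "is_restricted": is_restricted}
exec("flag = is_restricted()", sandbox)

# ...while code run with the default builtins does not.
normal = {"__builtins__": builtins, "is_restricted": is_restricted}
exec("flag = is_restricted()", normal)

print(sandbox["flag"], normal["flag"])
```

The check is an identity comparison, not a comparison of contents: even a
perfect copy of the default builtins would count as restricted.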

Now the eagle-eyed amongst you would have noticed the import of the inspect
module above. We will use this to show how trusted code can still access
restricted attributes whilst within restricted execution. The inspect module has
a useful getargspec function which accesses restricted attributes to find a
function's signature. And, as we can see, it works even in restricted execution
mode:

Why does this work? Because the scope in which getargspec was defined
didn't have a custom __builtins__, and this was captured in
getargspec.func_globals. This is just genius! And it provides us with a
framework on top of which we can build the object capability secure Python.
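
The scoping rule behind this is still easy to observe. In the sketch below
(Python 3 syntax, hypothetical names), the untrusted namespace has an empty
__builtins__, yet a function defined in trusted code keeps resolving names
through its own globals, which is the same mechanism that lets getargspec pass
the check:

```python
def trusted_length(items):
    # len is looked up via trusted_length.__globals__, not the caller's.
    return len(items)


# Untrusted code gets no builtins at all, only a reference to the function.
untrusted_ns = {"__builtins__": {}, "trusted_length": trusted_length}

exec("n = trusted_length([1, 2, 3])", untrusted_ns)
print(untrusted_ns["n"])  # prints 3

try:
    exec("n = len([1, 2, 3])", untrusted_ns)
except NameError as exc:
    print("untrusted code cannot see len:", exc)
```

What the untrusted code can do is therefore decided by which references it is
handed, not by where it happens to be called from.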

All we need to do is add the identified set of leak variables to the existing
set of restricted attributes. For those who are not familiar with the internals
of PIRE, I present a summary here of the current (in Python's SVN trunk) set
of restricted attributes.

The bitwise-OR-able flag constants are defined in structmember.h:

READ_RESTRICTED: Not readable in restricted mode.

WRITE_RESTRICTED: Not writable in restricted mode.

RESTRICTED: Not readable or writable in restricted mode.
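
Putting the flags together with the two conditions listed earlier, the check can
be modelled in a few lines. This is only a Python sketch of the behaviour, not
the C code: the numeric flag values, the mode dictionary standing in for
PyEval_GetRestricted(), and the Member and FakeFunction classes are all
assumptions made for illustration.

```python
# Illustrative values; only the bitwise-OR relationship matters here.
READ_RESTRICTED = 2
WRITE_RESTRICTED = 4
RESTRICTED = READ_RESTRICTED | WRITE_RESTRICTED

# Stand-in for PyEval_GetRestricted(); flip the value to simulate the mode.
mode = {"restricted": False}


class Member:
    """Descriptor that refuses access when flagged and in restricted mode."""

    def __init__(self, flags=0):
        self.flags = flags

    def __set_name__(self, owner, name):
        self.name = name
        self.slot = "_" + name

    def _check(self, flag):
        if self.flags & flag and mode["restricted"]:
            raise RuntimeError("restricted attribute: " + self.name)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        self._check(READ_RESTRICTED)
        return obj.__dict__[self.slot]

    def __set__(self, obj, value):
        self._check(WRITE_RESTRICTED)
        obj.__dict__[self.slot] = value


class FakeFunction:
    func_code = Member(RESTRICTED)        # no reads or writes
    func_name = Member(WRITE_RESTRICTED)  # reads allowed, writes refused

    def __init__(self, name, code):
        self.func_name = name
        self.func_code = code


f = FakeFunction("spam", "<code>")
mode["restricted"] = True
print(f.func_name)  # reading a WRITE_RESTRICTED member still works
try:
    f.func_code
except RuntimeError as exc:
    print(exc)
mode["restricted"] = False
```

The flag alone does nothing; access is refused only when the flag is set and
the interpreter considers the current frame restricted.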

In classobject.c, instance method objects:

im_class: RESTRICTED

im_func: RESTRICTED

__func__: RESTRICTED

im_self: RESTRICTED

__self__: RESTRICTED

In classobject.c, class objects:

__dict__: RESTRICTED

__class__: WRITE_RESTRICTED

In classobject.c, instance objects:

__dict__: RESTRICTED

__class__: RESTRICTED

In cPickle.c:

A private copy of the Pickler registry tables is used when
PyEval_GetRestricted().

In fileobject.c:

The file() constructor will raise an error when PyEval_GetRestricted().

In funcobject.c, function objects:

func_closure: RESTRICTED

__closure__: RESTRICTED

func_code: RESTRICTED

__code__: RESTRICTED

func_defaults: RESTRICTED

__defaults__: RESTRICTED

func_dict: RESTRICTED

__dict__: RESTRICTED

func_doc: WRITE_RESTRICTED

__doc__: WRITE_RESTRICTED

func_globals: RESTRICTED

__globals__: RESTRICTED

func_name: WRITE_RESTRICTED

__name__: WRITE_RESTRICTED

__module__: WRITE_RESTRICTED

In marshal.c:

Unmarshalling code objects will raise an error when PyEval_GetRestricted().

In methodobject.c, builtin functions:

__self__: RESTRICTED

__module__: WRITE_RESTRICTED

As you can see, some of the “leak” attributes that I want to restrict are already
restricted in Python! All we need to do is make the following changes:

In codeobject.c:

Creating new code objects directly will raise an error when
PyEval_GetRestricted().

In frameobject.c:

All attributes of Frame objects are restricted except for f_restricted.

In genobject.c:

gi_code: RESTRICTED

gi_frame: RESTRICTED

In typeobject.c:

__subclasses__: RESTRICTED
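
The reason __subclasses__ belongs on this list is that any object, even an empty
tuple, gives a path to object and from there to every class currently alive in
the interpreter. A short sketch in Python 3 syntax, with DatabaseHandle as a
made-up stand-in for something trusted code would rather keep to itself:

```python
class DatabaseHandle:
    """Pretend this is a sensitive class defined by trusted code."""


# Untrusted code with no builtins and no reference to DatabaseHandle.
untrusted_ns = {"__builtins__": {}}
exec("classes = ().__class__.__base__.__subclasses__()", untrusted_ns)

found = [cls for cls in untrusted_ns["classes"] if cls is DatabaseHandle]
print(found)
```

The untrusted code was never handed the class, yet it ends up holding a
reference to it, which is exactly the kind of forged capability the model has
to rule out.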

The nice thing about this is that we can then use it in environments like
Google App Engine, where we cannot use
the ctypes-based approach.

With this patch in place (and assuming that there aren't more “leak” attributes
lying around), we can start building up a true, secure, object-capability
framework in Python.

We'd need to add things like import mechanisms and start whitelisting builtin
functions for use. This is a big undertaking and one that I am committed to,
and I would appreciate fellow collaborators who want to make this happen. That
includes you hopefully =)

Now, some of you may be wondering what the fuss is about. Why bother creating
such an object capability framework in Python? For that, let me give you a few
use cases, all on App Engine.

Custom Templates by Users

Web applications like Blogger don't allow users to customise their blogs using a
rich language. Instead they have a proprietary templating system which for the
most part is just variable substitution.

Imagine instead if you could let your users use a templating language like
Genshi. Users could have the full expressivity
of the Python language to generate the output they want.

The problem with letting users do that today is that they would be able to use
it to get at the rest of your application and start doing evil things to your
database.

But with an object capability based framework in place, you could give users the
capability to execute Genshi templates without worrying about them somehow
getting access to your database.

And the nice thing about App Engine is that they already have something similar
to PyPy's sandbox running — so your users won't be able to segfault your
processes.

UserScripts: Python Services in Apps

Web applications like Twitter and Facebook provide APIs which let developers write services
which run on their own servers. Imagine instead a ‘Plex’ application on App
Engine which allowed users to create and run arbitrary Python services on their
data.

Not only would this save resources — how many copies of Twitter's database are
there?? — but it could allow for interesting and composable services. Perhaps
even a command line for the internet?

Services could be provided with a minimal __builtins__ which allowed them to
access the current user's data and not anyone else's.
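
As a rough sketch of that last idea (Python 3 syntax; run_service,
SAFE_BUILTINS and the data/result convention are hypothetical, and a minimal
__builtins__ is only safe once the leak attributes discussed above are closed
off), a service could be executed like this:

```python
SAFE_BUILTINS = {"len": len, "max": max, "min": min, "sorted": sorted, "sum": sum}


def run_service(source, user_data):
    """Run service code with whitelisted builtins and one user's data."""
    namespace = {
        "__builtins__": dict(SAFE_BUILTINS),  # fresh copy per run
        "data": user_data,
    }
    exec(source, namespace)
    return namespace.get("result")


print(run_service("result = sum(data) / len(data)", [3, 4, 5]))  # prints 4.0
```

The service sees only the names it was given: a call to open() or __import__()
fails with a NameError because neither is in the whitelist.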