Another factor to consider is how you define what “your data” is. For example, if you look at it as just exporting your photos out of Picasa and importing them to flickr, I’d posit that’s a rather simplistic view. A large part of what makes your data useful and valuable is all the relationships associated with it. I share my photos with my friends and family, I license some under Creative Commons, I group them, I tag them – all of these make my data very context rich. How do you liberate this context? And if you do, what does it mean to import it elsewhere?

On a public forum I used to frequent, one user used to immediately delete all his posts whenever he lost an argument. In the context of Data Liberation, this could be considered a good thing: his posts were his data, not the property of the company (or rather, the volunteer community member) hosting the data. But on the other hand, his behavior also made entire conversations completely inscrutable to everyone else in the community. What used to be an interesting public dialogue between two people suddenly became one person talking at a wall.

It’s very easy to assume that the things we create are ours, and not some corporation’s: but what happens when you give what you created to someone, or to a community, or to the public? Does the ownership of that information become theirs to any extent?

If you take a photograph and give it to your grandma, what kind of rights should you have to take it back? Suppose you gave it to her by posting it to your photo stream on Flickr: should grandma have the freedom to copy the photo to her computer’s hard drive before you delete it from Flickr? Or should you have the freedom to magically zap your data from her hard drive?

Since learning JavaScript over a year ago, it’s become one of my favorite dynamic programming languages alongside Python. And as I’ve mentioned before, I think the two languages actually complement each other pretty well.

Python, at its heart, is a platform that’s built to be extended. The evidence for this is plentiful: there are modules and packages out there that offer practically any functionality you want, from web servers to 3D game engines to natural language processing toolkits and more, all instantly accessible through a simple command or an installer download. Yet one of the costs of all this generativity has been the fact that Python doesn’t really have much of a security model to speak of: any Python program has as much access to the underlying system as the current user does, which, compared to the Web, is basically omnipotence. Creating programs that obey the principle of least privilege is pretty hard.

JavaScript, on the other hand, has many of the opposite problems. For one thing, it’s really built for embedding: until the very recent advent of CommonJS and Narwhal, for instance, the language has always lacked a general-purpose platform and standard library. A Pythonic way of saying this is that the language doesn’t come with “batteries included”, but this can actually be a good thing from a security standpoint: because the simplest possible embedding has no privileges and needs to be explicitly given all its capabilities by its embedder, it’s very easy to follow the principle of least privilege. Recent work on membranes and capability models puts JavaScript way ahead of many other languages in the security realm, yet the lack of a mature general-purpose platform has meant that anyone who’s wanted to leverage these strengths has always had to muck around in C/C++ to create the kind of embedding they wanted.

Java developers have had it a bit better in this regard. One of the many aspects of Java that I’ve frequently been envious of is Rhino, a JavaScript engine written entirely in Java, which allows anyone who knows Java to create their own embedding solution that leverages Java’s strengths. But I prefer Python to Java, and moreover, Rhino isn’t worked on with as much intensity as the JS engines that power real-world consumer products like V8 and SpiderMonkey, so new language features are slow to be implemented and performance isn’t great.

I’d briefly tried resurrecting John J. Lee’s Python-Spidermonkey last year but I soon discovered that it wasn’t really what I wanted. For instance, JS objects were copied into Python approximations as they crossed the language boundary, which resulted in a “lossy” transfer and prevented features like identity preservation. It was essentially a high-level wrapper created to solve a specific problem, rather than a low-level tool intended to enable any kind of wrapping based on context (e.g., how trusted the JS code is).

Introductions

In part because of all this, and in part because I’d always wanted to write a Python C extension from scratch, I’ve decided to create a new Python-Spidermonkey bridge: Pydermonkey.

Pydermonkey’s mission is pretty simple and straightforward: it’s just meant to wrap Spidermonkey’s C API as faithfully as possible—including its debugging API—while enforcing the memory safety that Python is known for. This makes it awfully low-level for casual programmers, but thanks to Python’s awesome support for magic methods, it’s not hard to create high-level wrappers that provide much more convenient bridging between JavaScript and Python code.
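To give a sense of what such a high-level wrapper might look like, here’s a toy sketch. The LowLevelBridge class below is a stand-in for a handle-based, C-style API (it is not Pydermonkey’s actual interface), and the magic methods __getattr__ and __setattr__ turn explicit bridge calls into ordinary attribute access:

```python
class LowLevelBridge:
    """Stand-in for a handle-based C API: everything is an explicit call."""

    def __init__(self):
        self._objects = {}

    def define_property(self, handle, name, value):
        self._objects.setdefault(handle, {})[name] = value

    def get_property(self, handle, name):
        return self._objects.get(handle, {}).get(name)


class ObjectProxy:
    """High-level wrapper: plain attribute access becomes bridge calls."""

    def __init__(self, bridge, handle):
        # Use object.__setattr__ so these don't go through our catch-all.
        object.__setattr__(self, '_bridge', bridge)
        object.__setattr__(self, '_handle', handle)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for bridged properties.
        return self._bridge.get_property(self._handle, name)

    def __setattr__(self, name, value):
        self._bridge.define_property(self._handle, name, value)


bridge = LowLevelBridge()
proxy = ObjectProxy(bridge, handle=1)
proxy.answer = 42   # routed through bridge.define_property(1, 'answer', 42)
```

The low-level API stays faithful and explicit, while casual users of the wrapper never see handles at all.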

Where It’s At

Pydermonkey is currently at version 0.0.6; its API supports a decent subset of the Spidermonkey C API, but it’s still quite lacking in places. Operation callbacks make it safe to run untrusted code that might loop forever, since they let you interrupt it, and throw hooks allow for full Python-esque stack tracebacks of JS code; yet property catchalls haven’t been implemented, which means that security is constrained to conventional sandboxing (membranes and object capabilities aren’t currently possible). There’s also the nasty problem of not being able to detect reference cycles that cross language boundaries, which means that such cycles need to be broken manually for now.

Getting It

Pydermonkey is available at the Python Package Index in source form, and as a precompiled binary for the few platforms that I happen to have access to at the moment.

You should be able to type easy_install pydermonkey at the command line and everything should “just work”: I’ve set up the Paver build script such that the Spidermonkey source code is automatically downloaded and built before the C extension if you’ve got the compiler toolchain on your system, though there are a few snags on Windows to circumvent. For more information, read the Pydermonkey documentation. And please feel free to file a bug if you run into one!

Where To Go From Here

If you’d like to see an example of a high-level wrapper, check out my Pydertron experiment. It provides a simple interface to expose untrusted JS functionality to Python code and also contains a CommonJS-compliant implementation of the SecurableModule standard. I’m also playing around with creating a Pydermonkey engine for Narwhal on github; contributions to any of these codebases are more than welcome, and there’s some low-hanging fruit in Pydermonkey that would be perfect for students or first-time contributors.

Finally, if you do anything interesting with Pydermonkey, I’d love to know about it.

Every time I think about why I like the open web, I basically think of how well it fits with the way I learned to use and program computers as a kid: my first computer, an Atari 400, came with everything I needed to do programming, and I (or my parents) didn’t have to spend hundreds of dollars or sign an NDA to get a development tool.

All of this was easy enough for a child to grasp—often far easier, as Jef Raskin observed in The Humane Interface, than today’s development tools. Being able to use a tool with such an incredibly low barrier to generativity is something I value a lot about my childhood. It’s in part where a lot of the real passion and excitement for open source and the Open Web come from: people like me see in them the qualities that made us truly excited about computers as kids, qualities that we’re constantly in danger of losing today as the field becomes more professionalized and controlled.

So that got me thinking about Drumbeat again: what if promotional materials for the Open Web focused on how it makes lives better for children who are budding hackers? Lots of adults aren’t tech savvy, but they know that their kids are, and if we can prove that the Open Web is better for their kids, and that they can make their kids’ lives better by choosing a standards-compliant browser, maybe they’ll make that choice.

Lately I’ve been thinking a bit about Drumbeat, and what the Open Web actually means to me. This morning, I came across an article by Katherine Mangu-Ward titled Transparency Chic which reminded me about a few of its most important aspects.

Transparency Chic discusses a Firefox addon called RECAP, which helps make U.S. Judicial Records as freely searchable as everything else on Google: it takes any of the free information browsed through PACER (the Federal court system’s clunky web-based database, which charges eight cents per page) and automatically submits it to a free Internet archive.

RECAP reminds me of one of the foundational principles of the Internet: Jonathan Zittrain’s notion, explained in The Future of the Internet, that the endpoint matters. Cell phones, console gaming systems, and PCs are some of the destinations of the information and functionality that the Internet is built to transmit. Yet only the PC unilaterally provides its user with an extraordinary amount of control to alter any aspect of its behavior through third-party software. If it weren’t for this fact, and if it weren’t for the generativity enabled by Firefox exposing its internals to addon developers—that “freedom at the endpoint”—a subversive-yet-legal tool like RECAP simply couldn’t exist and be so accessible to so many people at once.

Of course, this isn’t to say that freedom at the endpoint doesn’t carry with it a slew of safety concerns, like viruses and malware—but these are problems we want to be able to solve without losing the freedom that makes our endpoints as innovative as they are. Drumbeat should raise awareness about this notion because it’s a freedom most of us take for granted, and it’s one that could easily disappear if stewards aren’t there to protect it.

One of the recurring issues that the Mozilla platform team has to contend with is the issue of how to allow trusted, privileged JavaScript code to interact with untrusted JavaScript code. Google’s Caja team actually has to deal with a very similar problem, albeit at a different layer in the technology stack.

This issue is quite subtle, and fully explaining it is beyond the scope of this blog post. If you know JavaScript, I recommend checking out the Caja Specification, which nicely lays out the problems inherent in running code with different trust levels in the same environment.

Firefox has to deal with this issue because much of it is actually written in JavaScript. Developers call the JS that powers Firefox chrome JavaScript: it has the ability to write to the filesystem, launch other programs on your computer, and do pretty much anything else that Firefox itself can do. The code that runs in web pages, on the other hand, is called content JavaScript. Chrome and content JS can interact with each other securely thanks to XPConnect wrappers: little layers of code that “wrap” objects and mediate access between them and the outside world. The self-proclaimed WrapMaster and implementer of most of these wrappers is Blake Kaplan, known in some circles as “Mr. B-Kap” (mrbkap).

Google Caja’s team also has a need for the same kind of functionality, but at a different level: they need to make it possible for web pages themselves to run code that they don’t trust, which is useful when creating plug-in frameworks for web applications. The Caja team calls these wrappers membranes—a word which I find more intuitive than “wrappers” because it’s not an overloaded term in computer science and because its biological definition closely matches that of its CS counterpart.
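As a conceptual sketch only (this is not how Caja or XPConnect actually implement them), here’s the membrane idea in Python: every non-primitive value that crosses the boundary gets wrapped, anything a wrapped object returns gets wrapped in turn, and the entire surface can be revoked at once.

```python
class Membrane:
    """Wraps objects crossing a trust boundary; can be revoked wholesale."""

    def __init__(self):
        self.revoked = False

    def wrap(self, obj):
        # Primitives pass through unchanged; everything else gets a proxy.
        if isinstance(obj, (int, float, str, bool, type(None))):
            return obj
        return MembraneProxy(self, obj)


class MembraneProxy:
    def __init__(self, membrane, target):
        # Stash internals via object.__setattr__ to bypass our own hooks.
        object.__setattr__(self, '_membrane', membrane)
        object.__setattr__(self, '_target', target)

    def __getattr__(self, name):
        if self._membrane.revoked:
            raise PermissionError('membrane has been revoked')
        # Results cross the boundary too, so they get wrapped as well.
        return self._membrane.wrap(getattr(self._target, name))
```

The key property, matching the biological metaphor: nothing inside ever leaks a direct reference to the outside, so flipping revoked severs all access at once.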

As I wrote in Jetpack: Summer 2009 State of Security, Part 1, the boundary between trusted and untrusted code has been of some concern to the Jetpack project. Unfortunately, all the XPConnect wrappers currently in Firefox have very specific purposes: for instance, most of them are made expressly to prevent omnipotent chrome code from being exploited by impotent content code. Jetpack’s needs are unique in that a Jetpack feature should be neither as omnipotent as Firefox, nor as impotent as a web page: ideally, we should follow the principle of least privilege and give it the minimum set of capabilities it needs to do its task, and no more.

After talking with the Firefox JS and Google Caja teams, we decided that wrappers were the right kind of solution to Jetpack’s security challenges. The problem was, though, that all of Firefox’s wrappers are in C++, which made them hard to experiment with. Jetpack is, after all, a Labs project, and as such, we needed a sort of “flexible membrane” whose security characteristics we could easily change as the platform evolved. So we decided to expose some functionality to chrome JavaScript that’s traditionally only available to C/C++ code.

One nice aspect of the flexible membranes we’ve created is that they’re useful for more than just prototyping membranes: they effectively allow chrome JS to create objects with characteristics that the JavaScript language doesn’t traditionally make room for, like catch-alls for object properties. Python programmers know of these by names like __getattr__ and __setattr__, and many other dynamic languages have them, but JavaScript doesn’t—yet something like them is needed to implement basic Web APIs like HTML5 localStorage. In other words, these flexible membranes should make it easy for us to develop nicer APIs for Jetpack.
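A rough Python analogy of what a property catch-all buys you, with localStorage-flavored semantics (the class below is purely illustrative): reads and writes of arbitrary, never-declared property names are intercepted and routed to a backing store.

```python
class LocalStorage:
    """Illustrative localStorage-like object built on Python's catch-alls."""

    def __init__(self):
        object.__setattr__(self, '_items', {})

    def __setattr__(self, key, value):
        # Catch-all for writes: any property name lands in the store.
        # localStorage coerces values to strings, so we do too.
        self._items[key] = str(value)

    def __getattr__(self, key):
        # Catch-all for reads; missing keys read as None (JS's null).
        return self._items.get(key)
```

JavaScript had no standard hook for this at the time, which is exactly why implementing such APIs from chrome JS required the flexible membranes described above.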

If you’re interested in digging into these flexible membranes, check out our Binary Components documentation on the wiki. And feel free to take the pre-compiled component from our HG repository and use it in your own Firefox extensions.

Back in 2001, I made a satirical site for Nike Sweatshops, arguing that poverty is a great thing for capitalism.

Poverty is a great thing for capitalism, but Tim Harford’s The Undercover Economist—which I recently picked up from Dog Eared Books and finished this morning—offers an excellent explanation for why sweatshops and similar forms of foreign investment are ultimately a good thing for the world.

What impresses me most about The Undercover Economist is Harford’s underlying humanitarianism. This is someone who thinks that free markets are beautiful, yet who also believes that anyone who loses their job for reasons beyond their control deserves help and support. For anyone who’s grown weary of the demonization of the modern corporation—yet who nonetheless is skeptical of the benefits of a free-market economy—this book offers a refreshing perspective on the world and human behavior.

At Mozilla we get the opportunity to design the back of our business card. As I’ve written about before, Mozilla is a unique hybrid organization with a mission that lots of people don’t know about. That mission is often hard to communicate in passing, so I decided to put it on my business card:

I don’t really expect many people to read it, but at least it’s out there for anyone who wants to learn more. Mozilla gives us a “budget” of 250 cards to order, so I’ve only ordered 75 of these; I’ll come up with something more visual and fun for the other cards.

Over the past few weeks I’ve had the pleasure of working with Dion Almaer on a Browser Memory Tool Prototype. This has been a lot of fun for me; for one thing, I’ve always wanted to help developers diagnose the problem of “I’ve been running my web app/Firefox extension for 8 hours, why’s it taking up 800 megabytes of RAM?”. And I’ve also always wanted to have an excuse to learn about the internals of SpiderMonkey, Mozilla’s JavaScript engine, and play with its C API. So working on this tool has helped me kill two birds with one stone.

The architecture we decided to use for the memory tool was particularly fun to design and implement. Because much of Firefox itself is implemented in JavaScript, the whole application needs to be frozen while its memory use is profiled. The problem with this, though, is that it seemingly bars us from using JS to implement the memory profiler itself, which would be a bummer, especially considering that we didn’t know the problem domain terribly well and would therefore need the freedom to easily experiment to find the best solution.

Fortunately, it turns out that SpiderMonkey was designed to support multiple instances of what’s called a JavaScript Runtime. From the JSAPI User Guide:

A JSRuntime, or runtime, is the space in which the JavaScript variables, objects, scripts, and contexts used by your application are allocated. Every JSContext and every object in an application lives within a JSRuntime. They cannot travel to other runtimes or be shared across runtimes. Most applications only need one runtime.

All the JavaScript code in Firefox—whether it belongs to the Mozilla platform, an extension, or a web page—executes in the same runtime. We’ll call that runtime the “Firefox runtime”. The trick to implementing a memory profiler in JavaScript itself was just to “freeze” the Firefox runtime and create a new runtime, which we’ll call the “memory profiling runtime”, to peek into the Firefox runtime via a simple API. We also added a simple blocking socket object to the memory profiling runtime, which allows it to embed a web server that a separate process could connect to—in this case, the cool Memory Tool Ajax Application that Dion made.

Since Dion and I were really the only ones writing code for the memory profiling runtime, we haven’t actually documented the API yet. It’s pretty simple, though; every object in the Firefox runtime is referred to by a unique integer ID, and various functions can be used to get metadata about an object. For instance, getObjectInfo(id) returns a JSON-able data structure containing information about the object with the given ID, such as its prototype, its parent (i.e., its global scope), what other objects it points to, and so forth. Dion then used this API to write a memory profiler server.
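The real API lives in JS, but the idea is simple enough to sketch in Python. The names below are illustrative, not the tool’s actual code: every tracked object gets a stable integer ID, and a getObjectInfo-style function returns JSON-able metadata about it, including the IDs of the container objects it points to.

```python
import gc

_ids = {}      # id(obj) -> stable small-integer ID
_objects = {}  # stable ID -> obj

def track(obj):
    """Assign (or look up) a stable integer ID for an object."""
    key = id(obj)
    if key not in _ids:
        _ids[key] = len(_ids) + 1
        _objects[_ids[key]] = obj
    return _ids[key]

def get_object_info(obj_id):
    """Return JSON-able metadata about a tracked object."""
    obj = _objects[obj_id]
    # For brevity, only follow container-type children.
    children = [track(c) for c in gc.get_referents(obj)
                if isinstance(c, (list, dict, tuple))]
    return {
        'id': obj_id,
        'type': type(obj).__name__,
        'size': len(obj) if hasattr(obj, '__len__') else None,
        'points_to': children,
    }
```

Because everything is plain integers and dicts, the results serialize straight to JSON, which is what lets a separate web-based client like Dion’s consume them.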

One nice thing about the memory profiling runtime, though, is that it doesn’t have to embed a web server: it could just get some information about the heap and return it as a JSON object to the Firefox runtime, which could then do something useful with it. There are lots of interesting things we’d eventually like to see; for instance, a visualization of the JS heap over time would be pretty cool.

Of course, we also ran into our own share of problems, many of which we’re still trying to resolve. They’re pretty technical in nature, but you’re welcome to read the write-up on our Memory Profiling Notes from July 2009 on the wiki.

Creating this tool wouldn’t have been possible without the excellent SpiderMonkey documentation on the Mozilla Developer Center or the friendly folks on #jsapi at irc.mozilla.org—particularly David Baron and Blake Kaplan. I’m definitely looking forward to tinkering more with the JSAPI in the future, and working with Dion and Ben to make developing for the Open Web a lot more fun.

Update: I’ve since documented the API for the memory profiling binary component and runtime.

Security is hard! It’s tough enough designing a platform that’s powerful, well-documented, and easy to use; but what about security? If we aren’t careful, adding an incorrectly tuned or naive security model negatively affects generativity and usability. Jetpack needs to balance all three.

The following is something I wrote at the beginning of June, but didn’t post until now because I’ve had my head in code for a bit too long. What follows is still accurate, except where noted otherwise. In my next post, I’ll explain the research and tools we’ve created so far to solve the problem of security.

In order to understand Jetpack’s current security model, one has to first understand how Jetpack got to where it is.

Previous History

In May of 2009, I implemented the initial Jetpack prototype. This was originally intended to be securable, in the sense that a Jetpack Feature (sometimes called “a Jetpack”) should be sandboxed within either a standard Web page or a Components.utils.Sandbox with a limited principal, and be given only the objects it needed to accomplish its task.

However, this quickly met with difficulties due to either limitations in the Mozilla platform or limitations in my understanding of it. For instance, the simple use case of a single JS script communicating securely with two open Web pages via XPCNativeWrapper objects proved difficult. Here’s some code from the Unad demo included with the Jetpack prototype:

$(widget).click(toggleState);

In this case, widget is the XPCNativeWrapped contentDocument property of an iframe XUL element with content privileges embedded in the status bar, which contains an HTML file on the server hosting the Jetpack. The $() function is a jQuery-like interface to simplify DOM manipulation at a level that is possible with XPCNativeWrapped objects.

In another snippet from the same demo, doc is the XPCNativeWrapped HTML document object for a browser tab whose DOMContentLoaded event has just fired. blocklist is a simple interface for detecting whether a URL matches a known list of domains whose content should be removed from the page (presumably because they contain unwanted advertisements).

Ideally, to make this as easy on the developer as possible while remaining secure, both of the above code snippets should be contained in the same script, and executed within the same global object so that variables and so forth can be freely accessed without having to go through a cumbersome low-level barrier, such as a JSON message-passing bridge.

As far as I could find, however, such a solution wasn’t possible due to the Mozilla platform’s “binary” approach to codebase principals: either a script’s security principal could be associated with a single domain, in which case it could access the contents of pages on that domain easily but not those of pages on other domains; or it could be associated with chrome, in which case it had free access to everything on the end-user’s system via the Components global. There appeared to be no “in-between” principal offering more granular permissions that would allow unfettered access to certain things (for instance, the DOM structures of pages on two or three explicitly-mentioned domains) while restricting access to others.

Above all, the Jetpack team believed that the initial prototype should focus on making the platform as easy and generative as possible. The main reason for this was simply that we didn’t actually know what people were going to do with such creative power once they had immediate access to it: better in the early stages to allow them to experiment unfettered, which allows us to see what they build and develop clean, secure APIs to encapsulate the kind of functionality they need.

For the reasons outlined above, we decided to go the route of giving Jetpacks a chrome codebase principal.

The Original Security Model

The original, highly-tentative plan was simply to allow Jetpacks to remain chrome-privileged, but introduce lightweight sandboxing mechanisms that would ultimately allow the Jetpack script to function as a high-privileged broker between less-privileged code. An early attempt at this can be seen in the Unad demo previously mentioned: the status bar panel is a content-space iframe, and the block-listing logic is contained in a locked-down script—actually a naive implementation of a SecurableModule, which I’ll talk about later—and the Jetpack script itself manages everything else.

It was then envisioned that Jetpacks could be made secure by encouraging developers to minimize the amount of code placed into their chrome-privileged Jetpack script; we could then rely on a healthy code review community to perform reviews of Jetpacks to ensure they were non-malicious. Further easing this process would be the simplicity and power of the Jetpack API: replacing 20 lines of XPCOM boilerplate code for retrieving the clipboard contents with a simple call to jetpack.os.clipboard.get() makes not only the developer’s life easier, but the code reviewer’s as well.

After discussing this model with Mike Connor and Lucas Adamski on June 1, 2009, however, it was quickly found that this model had a number of vulnerabilities:

Code reviews don’t scale well in relation to the magnitude of code created in an extension model. Even in a healthy code review community where members have excellent social incentives to perform good reviews, performing a meaningful review of code that runs in the context of a high-privileged broker requires significant understanding of the technical details of security.

Even assuming that the intent of a Jetpack is completely benign, running a Jetpack in a chrome context still means that it’s very easy for its code to accidentally crash the user’s browser, break something important, or—perhaps most worrisome—create a new security hole through which untrusted web pages can exploit an end-user’s system. Further exacerbating this potential is the fact that because Jetpack’s development model is designed to make extending Firefox as easy as writing a webpage, more developers with very little knowledge of security fundamentals will be extending Firefox than ever before.

Given the above points, it was determined that the original security plan, taken as a whole, was untenable.

A New Plan

A tenable security model for Jetpack involves the following components:

A mechanism for extensible, securable code reuse. The Mozilla platform is incredibly powerful, and wrapping it in a secure, versioned/backwards-compatible, and elegant API is time-consuming but parallelizable. In the short-term, we need a straightforward way for modules that encapsulate parts of the platform to define their own privilege levels and parameters, as this will allow the community to contribute to the creation of Jetpack’s core API via Jetpack Enhancement Proposals and their reference implementations.

In the long term, however, the ability to easily share and reuse code is the foundation for any healthy development ecosystem. To do this securely, we need to follow the principle of least privilege: for instance, even though my Jetpack Feature may need access to the local filesystem, I want to make sure that the Twitter library I load doesn’t have access to the filesystem, and that it has the ability to contact twitter.com on the user’s behalf even if the rest of my Jetpack doesn’t.

I’m not sure what the best way to do this is. One compelling standard I’ve seen is the ServerJS group’s SecurableModules. Brendan Eich also discussed the notion of object tainting at the Mozilla All-Hands in April 2009, and another potential solution may be an object-capability model like that of Caja (though perhaps SecurableModules are a variant as well, I’m not sure). Yet another key to the solution may involve implementing a new, extensible codebase principal whose capabilities can be defined as a set of key-value pairs by the Jetpack Runtime. I honestly don’t know enough about security or the internals of Spidermonkey/XPConnect to know what the ultimate solution is—but given the volatility of the domain, it’s preferable that it be something that’s malleable from JS chrome code, so that the details of the security model can be rapidly changed without building and distributing new binary components. (Note: we’ve since made a lot more progress on this, which will be explained in part 2.)
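As a toy illustration of the least-privilege loading described above (in Python for brevity, with invented names; this is neither SecurableModules nor any real Jetpack API): a module’s code runs with access only to the capabilities its loader explicitly grants, rather than to everything the host program can do.

```python
def load_module(source, capabilities):
    """Execute module source with only the given named capabilities."""
    scope = {'__builtins__': {}}   # no ambient authority: no open(), etc.
    scope.update(capabilities)     # only what the loader explicitly grants
    exports = {}
    scope['exports'] = exports     # CommonJS-style exports object
    exec(source, scope)
    return exports

# A stand-in network capability limited to a single host; the "twitter
# library" below gets this and nothing else.
def fake_fetch(url):
    assert url.startswith('http://twitter.com/'), 'host not allowed'
    return '{"tweets": []}'

twitter = load_module(
    "exports['timeline'] = lambda: fetch('http://twitter.com/home')",
    capabilities={'fetch': fake_fetch},
)
```

Because fetch is the only name the module can see, swapping in a different capability (or withholding it entirely) changes the module’s authority without changing its code.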

A humane user interface for presenting risk and trust information to end-users. This can be done either before the user first installs a Jetpack or when an installed Jetpack tries to do something that requires privilege escalation, but it clearly needs to avoid the pitfalls presented by solutions like Vista’s User Account Control. This is an unsolved problem, and ideally the solution will be something that’s technical as well as social: for instance, imagine a visual that displays both the privileges required by the Jetpack as well as head-shots of the user’s trusted friends who use it. Striking artwork based on the security profile of the Jetpack could be used to make the user pay attention and not simply perceive the presentation as “yet another click-through”. As this is also a volatile domain, it needs to be easy to change; it’s also open enough to iteration by designers that we could expose this UI to Extensions and hold a Design Challenge for it. There’s plenty of room to leverage the community here.

The Immediate Future

Most importantly, we need to define the interface through which Jetpacks and Jetpack modules declare their security requirements and dependencies, so that we can at least mock it out in the Jetpack extension. For the time being, all that’s needed is a mechanism to tell Jetpack authors that they’re doing things “the right way”, so that once the security model is fully implemented, their Jetpack will work without requiring any changes. While a preview page of what the Jetpack risk UI will look like once the security mechanism is implemented will be useful for authors and will also allow us to iterate on the UI, the actual page that will be displayed when users try installing Jetpacks—the one that looks eerily similar to Ubiquity’s red screen of death—will remain the same, since the Jetpacks are actually insecure until the security model is fully implemented.

Furthermore, for the immediate future, we’d actually like to promote the fact that Jetpacks have chrome privileges, and encourage developers to copy-and-paste snippets from MDC into their Jetpacks and use Components to their heart’s content, because only once we see what they want to make can we know what APIs need to be made to securely wrap them. In fact, we’d ideally like to always make it possible for Jetpacks to be run in a sort of “developer mode” context, so that it’s possible for a developer to create their Jetpack in a setting where security is a concern but not an impediment, and deal with locking-down the Jetpack once they’re done experimenting, possibly even delegating the creation of an appropriate “security manifest” to another person or party.

A little while ago, Vladimir Vukićević wrote an excellent blog post outlining the reasons why he’s not a fan of exposing a specific implementation of SQL to Web Content.

I agree with everything he says in his post; I’ve also been a fan of CouchDB for some time. A CouchDB-like API seems like a nice solution to persistent storage on the Web because so many of its semantics are delegated out to the JavaScript language, which makes it potentially easy to standardize, as well as easy to learn for Web developers. Furthermore, CouchDB’s MapReduce paradigm also naturally takes advantage of multiple processor cores—something that is increasingly common in today’s computing devices.
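To make the map/reduce appeal concrete, here’s a minimal sketch of a CouchDB-style view (in Python for illustration; BrowserCouch itself is JavaScript, and these names are mine): each document is fed to a map function that emits key/value pairs, and an optional reduce function then folds the values for each key.

```python
def map_view(docs, map_fn, reduce_fn=None):
    """Compute a CouchDB-style view over a list of documents."""
    rows = {}
    for doc in docs:
        # The map function may call emit(key, value) any number of times.
        map_fn(doc, lambda key, value: rows.setdefault(key, []).append(value))
    if reduce_fn is None:
        return rows
    return {key: reduce_fn(values) for key, values in rows.items()}

docs = [
    {'type': 'post', 'tags': ['js', 'couchdb']},
    {'type': 'post', 'tags': ['js']},
]

def by_tag(doc, emit):
    # Emit each tag with a count of 1; reduce sums them per tag.
    for tag in doc['tags']:
        emit(tag, 1)

tag_counts = map_view(docs, by_tag, reduce_fn=sum)
```

Because each document is mapped independently, the map phase is exactly the kind of work that can be farmed out to multiple cores, or to Web Workers in a browser.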

To explore the possibility, I decided to spend some time prototyping a JavaScript implementation of CouchDB, which I’ve dubbed BrowserCouch. It’s intended to work across all browsers, gracefully upgrading its functionality when support for features like Web Workers and DOM Storage is detected.

Right now this is very much a work-in-progress and there isn’t anything particularly shiny to see; just the test suite and the semi-large data set test. In the future, it’d be great to make CouchDB’s Futon client work entirely using BrowserCouch as its backend instead of a CouchDB server, but that’s a ways away.