
Friday, September 19, 2008

fun with WorkerPools

Arrr

I suspect I've looked at
Google Gears
a half dozen or so times since it was originally released.
Always looked kinda intarstin', but I hadn't really had any use for it.
I took another look when
Chrome
was announced, since, o' all the stuff in Chrome,
the fact that
Gears was baked in
was the most intarstin' bit t' me.

(Bein' a bit curmudgeonly har, as in, the flashy chrome bits don't excite me.
"When I was a kid" my web browser (OS/2's WebExplorer) could only remember 10
bookmarks. The Nintendo DS Browser works for me, in a pinch. Now
GET OFF MY YARD!)

Gears is the most interesting bit in Chrome because it's the programmable bit.

What I decided to dig into the other day were the
WorkerPool bits,
since I hadn't really looked at them much before. What I realized
pretty quickly is that they should actually have been called
ActorPools.
You may be familiar with the Actor paradigm via the recent
Erlang hotness.

In a nutshell, the WorkerPool facility provides the following capabilities:

the ability to run a hunk of JavaScript code in a new context

that context does not share code or state with any other JavaScript context,
including the context which 'launched' the code

the only way to communicate with the code is via asynchronous one-way
message sends of JavaScript objects (basically, the same sorts of
objects describable in JSON; no functions or other non-trivial objects allowed, for instance)

the ability to run that code independently of other contexts. Think
threads, though that's just an implementation detail.
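To make the message restriction concrete, here is a hypothetical helper (not part of Gears) that checks whether a value fits the JSON-style shape described above. It uses JSON's own rules as a stand-in for whatever Gears checks internally, so treat it as an approximation rather than Gears' exact behavior.

```javascript
// Hypothetical helper (not part of Gears): is `value` the sort of thing a
// WorkerPool message can carry? JSON's rules stand in for Gears' own checks.
function isSendable(value, seen) {
  seen = seen || [];
  if (value === null) return true;
  var t = typeof value;
  if (t === 'string' || t === 'boolean') return true;
  if (t === 'number') return isFinite(value);   // JSON has no NaN or Infinity
  if (t !== 'object') return false;             // functions, undefined, ...
  if (seen.indexOf(value) !== -1) return false; // cycles can't be written as JSON
  var proto = Object.getPrototypeOf(value);
  if (!Array.isArray(value) && proto !== Object.prototype && proto !== null) {
    return false;                               // Dates, DOM nodes, class instances
  }
  seen.push(value);
  var ok = true;
  for (var key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key) &&
        !isSendable(value[key], seen)) {
      ok = false;
      break;
    }
  }
  seen.pop();
  return ok;
}
```

So `isSendable({ a: [1, 'two'] })` passes, while anything holding a function, a Date, or a reference cycle does not.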

If that's all it was, it wouldn't be terribly interesting. You can get aspects
of this type of stuff with traditional JavaScript code, though it's often messy.
Gears, at least, packages these capabilities up fairly cleanly.

But here's where it gets interesting. The hunk of code you create a worker from
can be loaded from an arbitrary server, referenced via a URL. That code then
follows the "same origin policy" rules of its own server for the other Gears APIs available to workers,
for instance, the
Database APIs and
HttpRequest APIs.
In other words, your worker code can access HTTP-resources on the server it came
from, and have a 'protected' Database that it manages that is only visible to
workers that also were loaded from that server.

Very cool, because this means that you can build workers that act as self-contained
service modules, giving other applications access to HTTP resources on the worker's server. None of
the usual cross-site chicanery we've had to deal with. In addition, these modules
can manage their own protected cache of data. Also, such modules can be reused across
multiple web applications, with each one sharing the same code and database store.
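A sketch of the core of such a service module, assuming it runs inside a Gears worker: it fetches resources from the worker's own server with the Gears HttpRequest module (`'beta.httprequest'`) and caches them in the worker's origin-private database (`'beta.database'`). The name `makeCachingFetcher`, the table layout, and passing the factory in as a parameter (so the sketch can be tried with a fake outside Gears) are illustrative assumptions.

```javascript
// Hypothetical service-module core for a Gears worker: GET resources from
// the worker's own origin and cache them in its origin-private database.
// `factory` is google.gears.factory inside a real worker; it is a parameter
// here only so the sketch can be exercised with a fake.
function makeCachingFetcher(factory, dbName) {
  var db = factory.create('beta.database');
  db.open(dbName);
  db.execute('CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, body TEXT)').close();

  // Return the cached body for `url`, or null if we haven't seen it.
  function lookup(url) {
    var rs = db.execute('SELECT body FROM cache WHERE url = ?', [url]);
    try {
      return rs.isValidRow() ? rs.field(0) : null;
    } finally {
      rs.close();
    }
  }

  // cachedGet(url, callback): callback(body) on success, callback(null) on failure.
  return function cachedGet(url, callback) {
    var cached = lookup(url);
    if (cached !== null) {
      callback(cached);
      return;
    }
    var req = factory.create('beta.httprequest');
    req.open('GET', url);  // same-origin rules: only the worker's own server
    req.onreadystatechange = function () {
      if (req.readyState !== 4) return;  // 4 = response complete
      if (req.status === 200) {
        db.execute('INSERT OR REPLACE INTO cache (url, body) VALUES (?, ?)',
                   [url, req.responseText]).close();
        callback(req.responseText);
      } else {
        callback(null);
      }
    };
    req.send();
  };
}
```

Because the database belongs to the worker's origin, every application that loads this worker from the same URL would share the one cache.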

This seems like powerful mojo.

Building an RPC mechanism on top of the message send APIs

One of the downers, for most people, with the current WorkerPool APIs, is
going to be the message sending paradigm. It's pretty low-level and raw.
The great thing is that asynchronous message sends are a type of atomic building
block upon which other forms of IPC can easily be built. The
QNX operating system is famously built up on
this core concept,
slightly expanded.

To build your RPC-styled worker, create a JavaScript file containing the services, implemented as plain
old functions, along with a list of the functions you want to expose. Here's an example of
some math services:
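A minimal sketch of what such a file might look like. The message format (`{id, name, args}` in, `{id, result}` or `{id, error}` out) and the names `exported` and `dispatch` are hypothetical; the Gears calls used are the WorkerPool's `onmessage`, `sendMessage`, and `allowCrossOrigin`. The messaging is only wired up when `google.gears.workerPool` exists, so `dispatch` can also be tried outside a browser.

```javascript
// math-service.js: sketch of an RPC-style Gears worker.

// The services: plain old functions.
function sum(a, b) { return a + b; }
function product(a, b) { return a * b; }

// What this worker exposes, by name (an object rather than a bare list,
// so the boiler-plate can look functions up without touching globals).
var exported = { sum: sum, product: product };

// ---- boiler-plate below; hypothetical message format ----
// request: {id, name, args}   reply: {id, result} or {id, error}

function dispatch(request) {
  if (!Object.prototype.hasOwnProperty.call(exported, request.name)) {
    return { id: request.id, error: 'no such function: ' + request.name };
  }
  try {
    var result = exported[request.name].apply(null, request.args || []);
    return { id: request.id, result: result };
  } catch (e) {
    return { id: request.id, error: String(e) };
  }
}

// Only hook up messaging when actually running inside a Gears worker.
if (typeof google !== 'undefined' && google.gears && google.gears.workerPool) {
  var wp = google.gears.workerPool;
  wp.allowCrossOrigin();  // opt in to being used by pages from other origins
  wp.onmessage = function (messageText, senderId, messageObject) {
    // Reply to whoever sent the request; the id lets them match it up.
    wp.sendMessage(dispatch(messageObject.body), senderId);
  };
}
```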

At the bottom of this file is some boiler-plate code that deals with the messaging
interface. Each incoming message is a serialized version of a
function invocation: the function name, the arguments, and an identifier indicating
which invocation this was (needed to match up return values later). The boiler-plate
code cracks open the message, reflectively calls the function, then sends the
function result back as a message to the original sender.

On the client end, in your main HTML / JavaScript code, you'll be using the
service like this:
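A guess at how a constructor like `ggw_services_Service` could be put together and used; this is not the original code. It assumes the worker replies with `{id, result}` or `{id, error}` messages, and, as a simplification, that the caller passes the list of function names (the `functionNames` and injectable `workerPool` parameters are hypothetical). On a real page the pool comes from `google.gears.factory.create('beta.workerpool')`.

```javascript
// Hypothetical page-side proxy in the spirit of ggw_services_Service.
function ggw_services_Service(url, functionNames, workerPool) {
  // On a real page: a fresh Gears WorkerPool. Injectable for testing.
  var pool = workerPool || google.gears.factory.create('beta.workerpool');
  var workerId = pool.createWorkerFromUrl(url);
  var pending = {};  // invocation id -> callback awaiting a reply
  var nextId = 0;

  pool.onmessage = function (messageText, senderId, messageObject) {
    var reply = messageObject.body;
    var callback = pending[reply.id];
    delete pending[reply.id];
    if (callback) {
      callback(reply.result, reply.error);
    }
  };

  function makeProxy(name) {
    // First parameter is the callback; the rest are the real arguments.
    return function (callback) {
      var id = nextId++;
      pending[id] = callback;
      var args = Array.prototype.slice.call(arguments, 1);
      pool.sendMessage({ id: id, name: name, args: args }, workerId);
    };
  }

  this.services = {};
  for (var i = 0; i < functionNames.length; i++) {
    this.services[functionNames[i]] = makeProxy(functionNames[i]);
  }
}

// The call-back: gets the result (or an error) when the worker replies.
function print_sum(sum, error) {
  if (error) {
    console.log('error: ' + error);
    return;
  }
  console.log('sum: ' + sum);
}

// With Gears on the page:
//   var math = new ggw_services_Service('math-service.js', ['sum', 'product']);
//   math.services.sum(print_sum, 2, 3);
```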

In this code, we instantiate the services with the ggw_services_Service
constructor, passing it the URL of the JavaScript code we want to run as a
worker (the service implementation file described above). The
resulting object exposes proxies for the service's exported functions
in its services field. You call these just like normal
functions, with one trick: because the message sends are one-way, and you'll probably
want to get a result back, the first parameter can be a call-back function, which
is invoked when the service method returns its value. In this case, that would
be the print_sum function.

Neat. But crude. Doing this right would require a bit more infrastructure, as well
as making sure you catch all the sorts of error conditions that can happen. The end
result won't be (shouldn't be) as transparent as the example above, but you can
probably get pretty close.
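One small piece of that infrastructure, sketched under assumptions: a page-side wrapper that gives an RPC call-back a deadline, so a worker that never replies (bad URL, crashed worker, lost message) still produces an error. `withTimeout` is a hypothetical helper; it assumes call-backs take `(result, error)` and that it runs on the page, where `setTimeout` is available.

```javascript
// Hypothetical helper: give an RPC call-back a deadline.
// Whichever comes first (the reply or the timeout) wins; the other is ignored.
function withTimeout(callback, ms) {
  var done = false;
  var timer = setTimeout(function () {
    if (done) return;
    done = true;
    callback(undefined, 'timed out after ' + ms + 'ms');
  }, ms);
  return function (result, error) {
    if (done) return;  // late reply after the timeout: drop it
    done = true;
    clearTimeout(timer);
    callback(result, error);
  };
}
```

A caller would pass `withTimeout(callback, 5000)` wherever it would otherwise pass `callback` as the first proxy argument.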

Notes

Because a worker shares no code or data with anything else, debugging gets
hard: you can't access the DOM, you can't access the document, you can't call
alert(). In Firebug, you can see the message text from errors, which is useful,
especially when you throw them yourself. But the source of the code isn't identified,
just that the error occurred in "worker 0" or the like.
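Since a worker has no alert() or DOM, one workaround is to send log lines back to the page as ordinary messages, tagged with the worker's name so the page can tell which worker said what. A sketch, with `makeWorkerLogger` and the `{log: ...}` message shape as hypothetical names; `send` is injected so it can wrap `google.gears.workerPool.sendMessage` in a real worker.

```javascript
// Hypothetical worker-side logger: ship tagged log lines to the page.
// `send` delivers one message object, e.g. a wrapper around
// google.gears.workerPool.sendMessage(message, parentId) in a real worker.
function makeWorkerLogger(workerName, send) {
  return function log(text) {
    send({ log: '[' + workerName + '] ' + String(text) });
  };
}

// Inside a worker, the page's id can be learned from any incoming message:
//   var wp = google.gears.workerPool, parentId = null;
//   var log = makeWorkerLogger('math-service', function (m) {
//     if (parentId !== null) wp.sendMessage(m, parentId);
//   });
//   wp.onmessage = function (text, senderId, msg) {
//     parentId = senderId;
//     log('got a message from ' + senderId);
//   };
```

On the page, the pool's onmessage handler would check for a `log` field and print it before treating the message as anything else.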

Speaking of Firebug, I was able to pretty consistently lock up Firefox
while debugging my example. Given my browser coding naiveté, I don't know
whether this was me, Firebug, Gears, or some combination of them. But it got old fast.

It's not clear what the best way is to handle security credentials for HTTP requests
from within workers. My gut tells me that some Gears APIs to manage sensitive
data like passwords and keys would be useful. Storing credentials in a server-specific
database doesn't sound great, but doesn't sound terrible either. But it's not even
clear how a worker would go about prompting a user for credentials, and do it safely.

The current story for worker code is that all the worker code has to be in
a single file. Not great. It would be nice to have an API like loadScript()
or some such that would allow you to add additional JavaScript code to your context.
In lieu of that, you can always XHR GET your additional code, and eval()
it into the context. Icky, but should work.
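The XHR-GET-and-eval workaround might look like this inside a worker, using the Gears HttpRequest module. `loadScript` here is a hypothetical helper, not a Gears API; `factory` stands for `google.gears.factory` and is a parameter only so the sketch can be tried with a fake. It uses an indirect eval so the loaded file's top-level declarations land in the worker's global scope; that is ES5 behavior, and older engines may differ.

```javascript
// Hypothetical loadScript() for a Gears worker: GET the code, eval it globally.
// `factory` is google.gears.factory in a real worker.
// `done(err)` is called with null on success or an Error on failure.
function loadScript(factory, url, done) {
  var req = factory.create('beta.httprequest');
  req.open('GET', url);  // same-origin rules apply, as for any worker request
  req.onreadystatechange = function () {
    if (req.readyState !== 4) return;  // 4 = response complete
    if (req.status !== 200) {
      done(new Error('loadScript: HTTP ' + req.status + ' for ' + url));
      return;
    }
    try {
      // Indirect eval: runs as global code, so top-level `var` and
      // function declarations in the loaded file become globals.
      (0, eval)(req.responseText);
    } catch (e) {
      done(e);
      return;
    }
    done(null);
  };
  req.send();
}
```

Since the request is asynchronous, code that depends on the loaded file has to wait for `done(null)` before using it.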

One of the nice things about the stark bleakness of the context in
which the workers run is that it makes these workers applicable to other
environments. For instance, it's not a huge stretch to consider porting
the Gears APIs to Rhino, allowing
reuse of the workers from within Java. Very cool for Java, where loading
"live code" is typically not something that is done, for various reasons, but
this makes it relatively straightforward.

In terms of higher-level frameworks for this stuff, I think I'd start looking
at OSGi and its
Bundle and Service concepts.
I suspect there's a pretty good fit there.
Combined with the previous thought of reusing workers in Java itself, why not design it
around OSGi, so that I could design a worker that could easily
be spun into an OSGi bundle itself, and so be directly consumable by OSGi-friendly code without
those consumers having to deal with "yucky" JavaScript. Having access to JavaScript in Java
isn't always considered a "plus" by Java programmers, but they can be easily fooled.