The "Spellserver": A Generic Remote-Code-Execution Host

A "Spellserver" is my name for a very generic server design: one in which
the server-side code is provided by the clients (but somehow
constrained by the server owner). I've been exploring ways to achieve
E-like delegation and attenuation-by-code, in which I can write a
program to enforce my restrictions (rather than pre-programming the
server with all conceivable ones). E provides this quite elegantly, but
depends upon a number of unusual platform details, many of which have
not actually been implemented yet. I'm looking for a way to achieve
similar goals but in a familiar language like Javascript, and invoked
via plain HTTP.

Background

What is E? What's so special about it? Which of its properties am I
trying to achieve in something else?

The E programming language was
developed to explore object-capability-based programming patterns. It
has been the testbed for Eventual-Sends, Promises, Promise Pipelining,
Proxies, Facets, Rights-Amplification, Auditors, and other key ideas of
the objcap movement.

"Dr. SES" is an
ongoing project to bring E's concepts to Javascript, making them
accessible to a larger community. Ten years of TC39 standardization work
have led to ECMAScript being a reasonable target for secure
object-capability-based programming.

E's Abstract Ideas

In E, objects contain references to other objects, which can be used to
invoke methods on those targets (and in fact are the only way to
communicate with them). These method invocations include arguments, and
these arguments can include other object references. Each method can
also return data (or a Promise
to some data), including object references.

Object references can span multiple computers. The
"Vat" abstraction is
a collection of objects which can make "local" (immediate, blocking)
calls to each other. When the target object is in a foreign Vat, any
method calls must use the "eventual send" operator, which returns a
Promise for the result, and guarantees that the target will be unable to
affect any execution until the caller's stack frame has been unwound. In
practice, each Vat runs on a single computer.

The abstract Vat is immortal: it never shuts down, and any object
created in one will be persisted forever (or until all references to it
are lost, at which point its continued existence is indistinguishable
from its destruction).

The Vat basically contains a persistent object store (indexed by some
inter-Vat identifier scheme), a standard execution stack, and a queue of
inbound messages (either generated from remote Vats or from local
eventual-send operations). Each immediate method call pushes a new frame
on the stack, and yields a return value when it finishes. Each
eventual-send pushes a new message on a queue (either local or in some
foreign Vat). Eventual-sends provide a Promise to the caller, and
include a reference to the corresponding Resolver object as a special
argument in the remote message. The Vat knows how to take the return
value from a message invocation and deliver it to the Resolver (as a new
message).
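The split between immediate calls and eventual sends can be sketched in Python (a toy model; the class and method names here are my own illustration, not E's actual API):

```python
from collections import deque

class Promise:
    """Toy promise, resolved later by the Vat's message loop."""
    def __init__(self):
        self.resolved = False
        self.value = None

    def resolve(self, value):
        self.resolved = True
        self.value = value

class Vat:
    def __init__(self):
        self.queue = deque()  # inbound messages, processed one turn at a time

    def immediate_call(self, target, method, *args):
        # Local call: pushes a real stack frame, returns synchronously.
        return getattr(target, method)(*args)

    def eventual_send(self, target, method, *args):
        # Queue the message and hand the caller a Promise; the target
        # cannot run until the caller's stack frame has unwound.
        p = Promise()
        self.queue.append((target, method, args, p))
        return p

    def run(self):
        # Drain the queue; each message's return value is delivered
        # to the corresponding Resolver (here, Promise.resolve).
        while self.queue:
            target, method, args, p = self.queue.popleft()
            p.resolve(getattr(target, method)(*args))

class Counter:
    def __init__(self):
        self.n = 0

    def incr(self):
        self.n += 1
        return self.n

vat = Vat()
c = Counter()
p = vat.eventual_send(c, "incr")
assert not p.resolved  # nothing runs until the current turn ends
vat.run()
assert p.value == 1
```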

Realizing E For Real

E is a
fully-functional programming environment,
implemented as a Java-based interpreter, but for various reasons it
hasn't caught on to the extent that Python, Node.js, Go, or Rust have.
Some of E's concepts have been brought into the Javascript world, but
many are not yet available outside of a real E interpreter. Porting
these ideas to a different environment would be pretty useful.

E's programming model imposes a number of requirements on any
implementation. The basic ones are the network protocols that define how
Vats on different computers should talk to each other. E includes
VatTP and
CapTP for this
purpose, but they're somewhat dated (they use Java serialization, MD5,
SHA1, DSA, and RSA).

These protocols are connection-oriented: when a TLS/TCP connection is
severed, object references that cross that connection are "broken", and
both sides can react to the break (by restarting from some persistent
identifier, and rebuilding everything that was lost). The persistent
identifiers are called
SturdyRefs,
and are basically an unguessable string, plus enough crypto bits to
identify and reach a specific TLS server.

This leads applications to distinguish between long-lived
externally-referenceable objects (which must live forever, because
there's no good way to learn when the secret string that grants access
has been forgotten by all callers), and ephemeral objects (which can be
called remotely, but the only way to reach them is via messages to one
of the long-lived objects). These long-lived objects are clear
candidates for persistence.

The Waterken system shares much heritage with E,
but provides a non-connection-oriented approach which imposes other
requirements: try-forever message delivery, and no ephemeral objects
(everything gets a persistent reference string).

The most significant constraint of E's model is orthogonal object
persistence. Vats pretend to hold all objects in RAM, forever, on a
durable and immortal computer, which nominally means everything gets
checkpointed after each message is processed. Application code doesn't
need to specify what data is persistent and what is ephemeral: in fact
objects are generally unaware of being serialized. At each checkpoint,
the Vat walks the table that maps external SturdyRefs to local objects,
then chases down everything which those objects reference, transitively.
The platform must know how to serialize each object, including cycles,
platform objects like file handles, and immutable containers that can
only be reconstructed in a specific order. E can serialize the code that
implements object methods, closures, and whatever other objects were
captured from their parent scope. This is important, since much of E's
encapsulation is based on lexically-scoped closures which capture
(otherwise-unreachable) private state variables.

If we assume that the Vat server stays up most of the time, but is
occasionally restarted, then we might persist only the long-lived
externally-referenceable objects. Ephemeral objects work normally until
the server is stopped, at which point all connections are dropped
anyways. Restarting the Vat is equivalent to breaking all external
connections, throwing out all objects that aren't reachable by one of
the persistent roots, then waiting for external callers to reconnect to
a root.

Something Like E, But Different

I want to use something like E, but with a familiar language
(Javascript, specifically Dr. SES), and which can run on a regular
non-immortal computer. I'd like an HTTP-based non-connection-oriented
transport, in which messages can be stored and forwarded from one node
to another safely (which implies signatures and encryption, rather than
transport-based security).

I don't know how to make Javascript do orthogonal persistence, and I'm
not sure it's a good idea anyways (in my experience with Twisted's old
"TAP" files, it was awfully easy to accidentally reference
non-serializable objects, or very large objects, and saving raw object
state interacts badly with attempts to upgrade program code). So I'm
interested in something that does non-orthogonal serialization, where
all persistent state lives in a single "Memory" object, and application
code calls set() and get() to access it.
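A minimal sketch of that Memory object (the get/set names come from the text; the JSON-file checkpointing is just my illustration of one way to persist it):

```python
import json

class Memory:
    """All persistent state lives here. Application code calls get()
    and set(); only this object is ever checkpointed to disk."""
    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self.data = json.load(f)
        except FileNotFoundError:
            self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value
        self._checkpoint()  # persist after every mutation

    def _checkpoint(self):
        with open(self.path, "w") as f:
            json.dump(self.data, f)
```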

Offline Uninvolved Attenuated Delegation

The most interesting features of object-capability systems are
delegation and attenuation, wherein new kinds of authorities can
be created by wrapping and combining existing ones. Security analysis of
a system is much easier when you know an upper bound on the power
available to any given component, and object references make this pretty
easy to reason about.

When Alice holds a reference to some object on Bob's server, she holds
the authority to invoke that object. She can delegate that authority to
Carol by sharing her object reference (which might just be a secret
string). But the more interesting case is where Alice wants to delegate
a subset of that authority. The standard ocap way to attenuate
authority is to create a wrapper object: the wrapper can apply
restrictions on the method being called, or on the arguments passed, or
it can simply throw an exception instead of proxying the message.
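For instance, a read-only wrapper over a key-value store (a sketch; the get/set method names are invented):

```python
class ReadOnlyWrapper:
    """Attenuates a store down to read access: writes raise instead of
    being proxied through to the underlying target."""
    def __init__(self, target):
        self._target = target

    def get(self, key):
        return self._target.get(key)  # reads are forwarded unchanged

    def set(self, key, value):
        raise PermissionError("read-only reference")

store = {"balance": 100}
ro = ReadOnlyWrapper(store)
assert ro.get("balance") == 100
```

Carol, holding only `ro`, can observe the store but has no path to mutate it.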

(I'm being pretty casual about whether "Alice" means the human who runs
the computer on which some object lives, or the anthropomorphized object
itself; hopefully this won't cause confusion.)

E lets you create these wrappers easily, but by default they live on the
local server. In our example, this would result in Carol holding a
reference to some wrapper object on Alice's computer, which must be
involved whenever Carol wants to invoke that object (this impacts
availability, latency, and overall bandwidth cost, as the messages must
traverse both the Carol->Alice and Alice->Bob links, instead of just the
Carol->Bob link).

To remove Alice's involvement, the wrapper must be created on Bob's Vat:
the same Vat as the target object. Creating it on Carol's Vat would be
obviously insecure, since she could bypass it any time she wanted. Bob
already has that power (since he owns the computer where the target
lives), so we don't lose any security by asking Bob to host the wrapper
too. To build it there, either Bob's object must offer an explicit
eval() method, or E's proposed where() feature must be
implemented (a generalized tool for creating an object on the same Vat
as some initial target).

In addition, these E wrappers/proxies must be constructed on Bob's Vat
first, so Alice can acquire a reference to them, which she can later
deliver to Carol. Bob must be online to perform attenuated delegation.

I'd like this delegation to be "offline", meaning that Alice can perform
this attenuation without talking to anyone else (Bob in particular).
I'd also like Alice to be uninvolved with the eventual exercise of this
delegated authority. And I'd like the wrapper to not consume persistent
space on Bob's computer: until they are exercised, only Alice should pay
the cost of creating and maintaining the wrapper object.

What Is A Server, Anyways?

At a high level of abstraction, all servers are equal. In a zero-latency
infinite-bandwidth universally-trusted Internet (presumably managed by
frictionless uniform-density spherical cows, favorites of physics
students everywhere), it doesn't matter where any object lives. Remote
object references are every bit as good as local ones (eventual-send vs
immediate-call being the only difference). Messages are executed
instantly, regardless of argument size or the complexity of their code.

Once we acknowledge that the speed of light is finite, we start to care
about latency. We'll want to move objects closer to the data they need,
and we discover some tradeoffs. Caching/mirroring moves the data to the
code, which helps if the data is small and the code is large. SQL-style
stored procedures and Promise
pipelining moves the
code to the data, which helps in the opposite situation.

The next stage of descent from the abstractosphere reveals trust issues,
where we don't want to grant too much authority to the server that hosts
a given object. Any reference we give to an object on Alice's Vat will
be available to Alice herself, which we might not like. The subsequent
uncomfortable realization regards confidentiality: any information we
reveal to Alice's objects are also visible to Alice. This prompts the
development of tools which keep certain objects inside a security
perimeter: a set of approved Vats.

But the core proposition remains: if you like the server, and you can
move code to data or vice versa, then all servers are equal. Today's
question is how you get the code there in the first place: how to turn a
bare slab of CPU and storage into a server that's useful for your
particular purpose.

The traditional non-Vat-ish server (e.g. a LAMP stack) is highly
pre-configured. You split most web applications into a client half and a
server half, and write a bunch of code that defines how the server
operates. This server code might be bundled into a container image,
which bottoms out in Linux syscalls and a kernel TCP stack. The admin
must provision and launch a server first, then they can tell clients
where to find it (as well as granting access credentials).

The Vat-ish approach is for the server admin to evaluate a big chunk of
text, which produces a persistent object and a SturdyRef.

The direction I want to explore is this: why should we need to
pre-configure the server? Since most API calls boil down to an object
identifier, a method name, and some serialized arguments, would it be
possible to build the server on the fly in response to (roughly) the
same amount of inbound message data?

On-Demand Server Creation

Imagine a very generic server, with just the following API at the root
URL path:

@post("/")
def post(body):
    f = eval(body)
    return f()

(my pseudocode uses a Python/Flask-like syntax, but of course we'd use a
properly-confined language, where eval() doesn't grant the
client-supplied code access to anything else: it can't import a library
to write to the filesystem, or make additional network calls. It can
think furiously, but has neither voice nor hands.)

This isn't very useful yet. Perhaps if our server has a bigger CPU than
the client, this would let them offload some intensive computation. But
each request is isolated and independent.

Now suppose the client-supplied code is given a basic dictionary/map:
clients could use it to build a simple object store. One call can leave
data for a subsequent one. Entire companies revolve around providing
tools like this. The set-side code could implement clever indexing
tables, allowing the get-side code to locate data faster. Arbitrary
search criteria could be used. This could provide a database with
server-side map/reduce functions.
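A client interaction might look something like this sketch, assuming the server passes its persistent dictionary into the evaluated function (exec() stands in for the confined eval; everything else is my own illustration):

```python
# One request stores data; a later request retrieves it. Each body is
# the source of a function that receives the server's dictionary.
store_body = """
def f(memory):
    memory["users/alice"] = {"quota": 10}
    return "stored"
"""

get_body = """
def f(memory):
    return memory.get("users/alice")
"""

def serve(body, memory):
    # Toy server side, mirroring the POST handler above; exec() stands
    # in for a properly-confined eval().
    scope = {}
    exec(body, scope)
    return scope["f"](memory)

memory = {}
assert serve(store_body, memory) == "stored"
assert serve(get_body, memory) == {"quota": 10}
```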

Does it feel wrong to supply the entire server program each time? Let's
cache it. First, let's just cache the text of the program. We augment
the protocol to make the client normally deliver just the hash, but if
the server doesn't recognize that hash, the client makes a second
request (to a different endpoint) that includes the complete program. If
the client believes that the server doesn't already have a copy, it will
preemptively fill the cache before sending the invocation.
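A sketch of that two-endpoint protocol (the endpoint names and helpers are invented):

```python
import hashlib

programs = {}  # server-side cache: hash -> program text

def post_fill_cache(body):
    # Second endpoint: the client preemptively uploads the program.
    h = hashlib.sha256(body.encode()).hexdigest()
    programs[h] = body
    return h

def post_invoke(h, memory):
    # Normal endpoint: the client sends only the hash.
    body = programs.get(h)
    if body is None:
        raise KeyError("unknown hash: upload the program first")
    scope = {}
    exec(body, scope)  # stand-in for a properly-confined eval()
    return scope["f"](memory)

h = post_fill_cache("def f(memory): return 6 * 7")
assert post_invoke(h, {}) == 42
```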

Next, let's cache the compiled/evaluated program too. E defines the
notion of Auditors
which let one object ask questions about the code inside another object.
The DeepFrozen auditor asserts that a given object has no references
to any mutable state. Any mutability must be passed into the object on
each call. This makes it safe to cache the callable object.

As Clarke's maxim says, sufficiently advanced technology is
indistinguishable from magic. These programs define an intricate
process, and carry great power when invoked. The hashes are like magic
words which uniquely identify a program, so let's call each one a spell.

In the subsequent examples, I'll omit the details of this cache, but
remain confident that it can work, and with enough engineering, we won't
have to re-deliver the whole program every time.

Security of Spellcasting

Obviously, our little server is completely unprotected: anyone on the
internet can guess the IP address and port number, and supply programs
(aka "cast spells"). And callers are not isolated from each other. Every
POST could do anything it wants to the stored data: steal secrets,
corrupt state, or just delete it entirely.

All security on the internet is based upon knowledge of secrets, and
upon various ways of proving that knowledge to others. Bearer
credentials (e.g. API tokens) sound simple, but they require several
things: a TLS stack, a live TCP connection, trust in some particular set
of CA roots, and the discipline to only reveal the token to the right
destination (each secret is scoped to a particular recipient's identity).

Digital signatures are more generic. If we put aside confidentiality for
a moment, signed request messages don't require any notion of server
identity, and they can be forwarded along arbitrary paths before
eventual delivery (making them usable in asynchronous environments, or
between machines with intermittent network connectivity). Using them
safely requires something to prevent replay attacks, but bearer
credentials did too, it's just easy to forget because the obligatory TLS
connection generally did that for free.

Signatures involve four things: a Signer (which holds the private key),
a Verifier (which holds the public key), the Message, and the Signed
Message. Apart from key generation, there are only two operations: the
Signer can turn a Message into a Signed Message, and the Verifier can
turn a Signed Message into a Message (or throw an exception if the
signature was invalid). I learned this API from
djb's Ed25519 library, and I like it because
it discourages the use of unverified data: the only way to get the
Message is to go through the Verifier first. There are situations in
which detached signatures make sense, but usually as some sort of
optimization.
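The shape of that four-piece API, as a toy: here the "signature" is an HMAC with a shared secret, standing in for a real public-key scheme (PyNaCl's SigningKey/VerifyKey expose the same two operations for Ed25519):

```python
import hashlib
import hmac
import os

class Signer:
    def __init__(self, key=None):
        self._key = key or os.urandom(32)

    def verifier(self):
        # A real scheme (e.g. Ed25519) would derive a *public* key here.
        return Verifier(self._key)

    def sign(self, message):
        mac = hmac.new(self._key, message, hashlib.sha256).digest()
        return mac + message  # Signed Message = signature || message

class Verifier:
    def __init__(self, key):
        self._key = key

    def verify(self, signed):
        mac, message = signed[:32], signed[32:]
        expected = hmac.new(self._key, message, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError("bad signature")
        return message  # the only way to get the Message back out

s = Signer()
v = s.verifier()
assert v.verify(s.sign(b"hello")) == b"hello"
```

Note that the Message never escapes except through verify(), which is exactly the property that discourages use of unverified data.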

For our server, the simplest way to limit access to specific clients is
to generate a keypair, and embed the public verifier key in the server
(either we generate the private signing key and give it to the client,
or we have the client generate the signing key and tell us the
verifier). Only clients that know the corresponding private key will be
able to make calls, and they'll express this knowledge by signing each
request.

Multiple Clients

What if we want to have two clients, who use separate resources? Alice
has been given free rein on the server, and she'd like to hand out
"accounts" to her friends Bob and Bert (who have distinct keys). The
simplest way would be to give them isolated dictionaries, and use their
signing keys to distinguish their requests.
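A sketch of that dispatch, with signature checking reduced to a key lookup (all names are my own):

```python
memory = {"bob": {}, "bert": {}}
verifiers = {}  # verifying key -> account name

def post(verifying_key, body):
    # Assume the request signature was already checked against
    # verifying_key; each client sees only its own sub-dictionary.
    name = verifiers.get(verifying_key)
    if name is None:
        raise PermissionError("unknown key")
    scope = {}
    exec(body, scope)  # stand-in for a properly-confined eval()
    return scope["f"](memory[name])

verifiers["BOB_KEY"] = "bob"
verifiers["BERT_KEY"] = "bert"
post("BOB_KEY", "def f(memory): memory['greeting'] = 'hi'")
assert memory["bob"] == {"greeting": "hi"}
assert memory["bert"] == {}
```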

This approach has two notable properties:

Inflexible: Adding more clients must be done by modifying the server's
code, which can only be done by the machine's owner.

Isolated: Clients cannot interact with each other, even if they wanted
to.

Chained Programs

So how about we have the server accept two programs on each request?
The first program is controlled by the owner (Alice), and is signed by
her. The second comes from the client (signed by Bob or Bert), and gets
all its power from the first. Alice's program gets to do two things:

decide which key is allowed to sign the second program

create an attenuated power object (a wrapper around memory) to
be passed into that second program

We define the first "spell component" to be the signed serialization
of an "activation record": a 2-tuple of an attenuation program and a
verifying key. The key will be used to check the signature of the next
component. We don't currently need a verifying key in the second
activation record because there's no third program (yet!).

When Alice wants to create an account for Bob, she gets a verifying key
from Bob, and builds a program to attenuate memory down into the
Bob-specific subset. She concatenates it with the key we'll use to
recognize Bob's programs, and signs the result.
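That first component might look like this sketch (mirroring the limit_to_bert example later in the text; the signing step is reduced to a placeholder):

```python
import json

BOB_VERIFYING_KEY = "BOB_KEY"  # obtained from Bob (placeholder value)

# Alice's first-level attenuator: Bob's programs only ever see
# the "bob" subset of memory.
limit_to_bob = """
def limit_to_bob(memory):
    power = memory["bob"]
    return power
"""

# An activation record is a 2-tuple: the attenuation program, plus the
# verifying key that must have signed the next spell component.
record = json.dumps([limit_to_bob, BOB_VERIFYING_KEY])

# Alice signs the serialized record; a tagged tuple stands in for a
# real signature here.
PREFIX_FOR_BOB = ("signed-by-alice", record)
```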

The server will run limit_to_bob, create a power object (to
Alice's original specifications) just for Bob, then pass it to Bob's
function (which stores some data). It runs store_data_at_index if
and only if it was signed with the key that was delivered alongside
limit_to_bob.

When Bob wants to retrieve that data, he creates a different program:

def get_index(power):
    return power["index"]

and signs and POSTs it as before, using PREFIX_FOR_BOB again.

When Alice wants to create an account for Bert, she builds a
limit_to_bert() that looks just like limit_to_bob except that it
requires a different public verifying key, and returns a power that
holds a different subset of the memory object:

def limit_to_bert(memory):
    power = memory["bert"]
    return power

Bert gets this new prefix, and uses it for all his requests.

Suppose that Alice owns the execution host. She can write attenuator
programs Alice1, Alice2, Alice3, etc, all of which will be accepted and
executed by the host, with full power. With each program, she can attach
a single identifier that enables a client like Bob to execute a second
program with less authority. Bob may have several such programs, all of
which are signed (or otherwise protected) by the same value. So if
Alice1 is tied to Bob's key, then Alice1.Bob1 and Alice1.Bob2 are valid
sub-programs. Alice2 might be tied to Bert's key, enabling Alice2.Bert1
and Alice2.Bert2. But "Alice1.Bert1" would be rejected, because the
Alice1 program doesn't authorize Bert's key.

Delegation

Suppose Bob stores a number in his subset of memory, and wants to let
Carol increment this value (but never decrement it). If we add a third
level of programs, then Bob can get a verifying key from Carol, and give
her a second-level prefix which exposes only an increment() function.
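Bob's second-level attenuator might look like this sketch (the counter key and the wrapper's shape are invented):

```python
def expose_increment(power):
    # `power` is Bob's memory subset; Carol receives only this wrapper.
    class IncrementOnly:
        def increment(self):
            power["counter"] = power.get("counter", 0) + 1
            return power["counter"]
    return IncrementOnly()

bob_memory = {"counter": 5, "secret": "hunter2"}
carol_power = expose_increment(bob_memory)
assert carol_power.increment() == 6
# Carol has no way to decrement the value, or to read "secret".
```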

Spells are Arbitrary-Length Lists of Attenuation Programs

We can generalize this, so the server accepts an entire list of signed
programs. It lets each one decide whether it likes the
following one, executes it to create an attenuated subset of the
passed-in power, then gives that subset to the next function. The last
function is the only one which actually does anything.

To be specific, a "spell" is a list of signed activation records (the
"spell components"). Each record identifies both a program and the
verifying key for the next component.

All non-leaf programs are run with a power argument; however, we
require that none of them actually invoke it yet: they may only build
wrappers around it. No messages are sent to any power objects until
the final leaf function is executed. We might enforce this with an
Auditor of some sort, or by wrapping each power object in a proxy that
rejects invocation attempts until they are all switched on, just before
the final leaf runs.
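Evaluating a whole spell then reduces to a fold over its components (a sketch; signature checking is reduced to an equality test on key names):

```python
OWNER_KEY = "ALICE_KEY"  # the host only trusts chains rooted at the owner

def cast(spell, root_power):
    """spell: list of (program, next_verifying_key, signed_by) records.
    Each non-leaf attenuator wraps the power; only the leaf invokes it."""
    power = root_power
    expected_key = OWNER_KEY
    for program, next_key, signed_by in spell[:-1]:
        if signed_by != expected_key:
            raise PermissionError("component not signed by authorized key")
        power = program(power)  # builds a wrapper; sends no messages yet
        expected_key = next_key
    leaf, _, signed_by = spell[-1]
    if signed_by != expected_key:
        raise PermissionError("leaf not signed by authorized key")
    return leaf(power)  # only the leaf actually exercises the power

spell = [
    (lambda memory: memory["bob"], "BOB_KEY", "ALICE_KEY"),  # Alice's record
    (lambda power: power.get("x"), None, "BOB_KEY"),         # Bob's leaf
]
assert cast(spell, {"bob": {"x": 7}}) == 7
```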

All programs are DeepFrozen, the only mutability comes from the power
argument, and nothing invokes it during construction of the tree, so
evaluating these programs cannot make any changes to mutable state. The
lack of side-effects makes the results safe to cache, which is a big
win.

We'll build a tree cache: a tree in which each path from the root is a
list of components. At each node, we'll cache two things. The first is
the compiled program (the power value) for that path, which will
encapsulate a reference to the tree-parent's cached power object
(which itself references its own parent, and so on, up to the root,
which is just the original Power object, without any code). These
power objects get weaker as we get further from the root, since
there are more attenuators in the way. The other value at each node is
the verifier, which is used to recognize programs that are allowed to
use the power object.
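One node of that tree cache might look like this sketch (the field and method names are my own):

```python
class CacheNode:
    """One node per attenuation program along a spell's path.
    `power` is the result of running this node's program against the
    parent's power; `verifier` recognizes the key that may sign the
    next component; `children` is keyed by program hash."""
    def __init__(self, power, verifier):
        self.power = power
        self.verifier = verifier
        self.children = {}

    def child(self, program_hash, program, next_verifier):
        # Safe to memoize: programs are DeepFrozen and invoke nothing
        # while building wrappers, so evaluation has no side effects.
        if program_hash not in self.children:
            self.children[program_hash] = CacheNode(
                power=program(self.power), verifier=next_verifier)
        return self.children[program_hash]

root = CacheNode(power={"bob": {"x": 1}}, verifier="ALICE_KEY")
node = root.child("h1", lambda m: m["bob"], "BOB_KEY")
assert node.power == {"x": 1}
assert root.child("h1", None, None) is node  # cached: program not re-run
```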

Extensions

So that's the basic idea. There are a lot of implementation details that
need to be worked out, so stay tuned for more posts on this subject. In
particular, there are several properties that we might want to add:

Promises: The wrappers should at least be able to use Promises
internally, returning a Promise for the result. It'd even be nice to
let these Promises span multiple messages, although that probably
means we need a way to store the Resolvers in the persistent storage.

Confidentiality: In many contexts, we want privacy for the code we're
sending and the results we get back. We'd like to encrypt the programs
before sending them, in addition to signing them.

Caching and API Compression: We don't really want to send every
program in the chain, for every call. This can be compressed by
sending hashes instead of programs (and only supplying the programs
when necessary). We can compress it further by referring to entire
cached chains with a single truncated hash.

Evaluation Approaches: There are other possible ways to invoke parent
and child programs, giving one the results of the other, which might
be more natural for certain delegation patterns.

More Power: Memory is the simplest thing we might offer each program,
but there are other interesting powers to choose from. The first would
be the ability to send messages to other servers, presented as a basic
HTTP client, or using some other protocol. We might also allow
programs access to a clock, or a random-number generator, enabling
non-deterministic behavior. We can use these to prevent replay
attacks, or to expire a delegated authority after some amount of time.

Rights-Amplification: The chains we've seen so far are purely
attenuating: each child program can do no more than its parent.
"Rights Amplification" is about combining two program chains
together to do more than either one could separately. This will
involve a program taking multiple power arguments, each of which
derives from a full chain.

Creating power in response to the brand: In our current scheme, for
Carol to receive power, Bob must create a prefix for her (which
references her key). We could instead store information about Carol in
a table in the memory, and Bob could write a generic function that
would accept messages from any client listed in this table. Bob's
function would get to examine the "Brand" of Carol's message, and
would attenuate the power differently depending upon what it sees.
When the spells are confidential, Bob's program would also be
responsible for returning an Unsealer for the client program.