<code><pre>
                # ... (excerpt from the subgraph recognizer; surrounding lines are elided)
                if (unscope.fetch(obj, thunk{}) =~ varID :notNull) {
                    return genVarUse(varID)
                }
                def promIndex := builder.buildPromise()
                unscope[obj] := promIndex
                def rValue := genObject(obj)
                builder.buildDefrec(promIndex+1, rValue)
            }
            # ...
            builder.buildRoot(generate(root))
        }
    }
}
</pre></code>

The triple returned by an uncaller is very similar in structure and purpose to [http://java.sun.com/j2se/1.4.1/docs/api/java/beans/Statement.html java.beans.Statement] and to its role in the serialization performed by the [http://java.sun.com/j2se/1.4.1/docs/api/java/beans/XMLEncoder.html XMLEncoder]. This correspondence is explained further in [http://www.erights.org/data/serial/jhu-paper/related.html Related Work].

<code><pre>
def minimalUncaller implements Uncaller {
    to optUncall(obj) :nullOk[__Portrayal] {
        if (Ref.isNear(obj)) {
            obj.__optUncall()
        } else {
            # ... we can ignore the non-Near cases for now
        }
    }
}
</pre></code>

<code><pre>
? pragma.syntax(&quot;0.8&quot;)

? def makeStringBuffer := &lt;import:java.lang.makeStringBuffer&gt;
# value: &lt;makeStringBuffer&gt;
</pre></code>

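As explained [http://www.erights.org/data/serial/jhu-paper/deconstructing.html#uri-exprs earlier], the above code uses the URI-expression <tt>&lt;import:java.lang.makeStringBuffer&gt;</tt>, which is shorthand for <tt>import__uriGetter.get(&quot;java.lang.makeStringBuffer&quot;)</tt>. Accordingly, when asked to portray <tt>makeStringBuffer</tt>, the <tt>import__uriGetter</tt> returns the triple <tt>[import__uriGetter, &quot;get&quot;, [&quot;java.lang.makeStringBuffer&quot;]]</tt>, which says:

<blockquote>
<i>In order to reconstruct <tt>makeStringBuffer</tt>, send the &quot;<tt>.get</tt>&quot; message to me, the <tt>import__uriGetter</tt>, with the string <tt>&quot;<span class="litchars">java.lang.makeStringBuffer</span>&quot;</tt> as argument.</i>
</blockquote>

Loaders will normally follow this pattern, varying only the contents of the string argument.
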
Putting all this together, and remembering that <tt>deSrcKit</tt> will depict using the URI-expression shorthand when it can, we have:

<code><pre>
? def makeSurgeon := &lt;elib:serial.makeSurgeon&gt;
? def surgeon := makeSurgeon.withSrcKit(&quot;de: &quot;)

? surgeon.serialize(makeStringBuffer)
# value: &quot;de: &lt;import:java.lang.makeStringBuffer&gt;&quot;
</pre></code>

----

This is wikified from [http://www.erights.org/data/serial/jhu-paper/recog-n-build.html Part 2: "Reversing" Evaluation]

Latest revision as of 02:39, 30 January 2008

As we've seen, we make serializers, unserializers, and other transformers, such as expression simplifiers, by composing a recognizer with a builder.
The interface between the two is the DEBuilder API, explained in Appendix A: The Data-E Manual.
Since most of the API directly mirrors the Data-E grammar productions, you may safely skip these details and proceed here by example.
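The following is a minimal Python sketch of the composition. It is not the DEBuilder API itself. The toy tree format and the method names build_literal and build_call are hypothetical stand-ins for the corresponding Data-E productions. The same recognizer drives two builders: one evaluates, the other prints source text.

```python
# Hypothetical toy trees: ("lit", value) or ("call", rec_tree, verb, [arg_trees]).
# build_literal/build_call are illustrative stand-ins, not the DEBuilder API.


class EvalBuilder:
    """Builds values: each production is performed as soon as it is reported."""

    def build_literal(self, value):
        return value

    def build_call(self, rec, verb, args):
        # Python's closest analogue of a reflective call by verb name
        return getattr(rec, verb)(*args)


class SourceBuilder:
    """Builds source text describing the same productions."""

    def build_literal(self, value):
        return repr(value)

    def build_call(self, rec, verb, args):
        return f"{rec}.{verb}({', '.join(args)})"


def recognize(tree, builder):
    """Walk a tree, reporting each production to whichever builder is supplied."""
    if tree[0] == "lit":
        return builder.build_literal(tree[1])
    _, rec_tree, verb, arg_trees = tree
    rec = recognize(rec_tree, builder)
    args = [recognize(a, builder) for a in arg_trees]
    return builder.build_call(rec, verb, args)


expr = ("call", ("lit", 2), "__add__", [("lit", 3)])
print(recognize(expr, EvalBuilder()))    # 5
print(recognize(expr, SourceBuilder()))  # 2.__add__(3)
```

Swapping the builder changes what the traversal produces without touching the recognizer.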

Evaluating Data-E

The semantics of Data-E are defined by the semantics of its evaluation as an E program.
We could unserialize using the full E evaluator.
However, this is inefficient both as an implementation and as an explanation.
Instead, here is the Data-E evaluator as a builder, implementing exactly this subset of E's semantics.

pragma.syntax("0.8")

def deSubgraphKit {
    to makeBuilder(scope) :near {
        # The index of the next temp variable
        var nextTemp := 0
        # The frame of temp variables
        def temps := [].diverge()
        # The type returned by "internal" productions and passed as arguments to
        # represent built subtrees.
        def Node := any
        # The type returned by the builder as a whole.
        def Root := any
        # DEBuilderOf is a parameterized type constructor.
        def deSubgraphBuilder implements DEBuilderOf(Node, Root) {
            to getNodeType() :near { Node }
            to getRootType() :near { Root }

            /** Called at the end with the reconstructed root to obtain the value to return. */
            to buildRoot(root :Node) :Root { root }

            /** A literal evaluates to its value. */
            to buildLiteral(value) :Node { value }

            /** A free variable's name is looked up in the scope. */
            to buildImport(varName :String) :Node { scope[varName] }

            /** A temporary variable's index is looked up in the temps frame. */
            to buildIbid(tempIndex :int) :Node { temps[tempIndex] }

            /** Perform the described call. */
            to buildCall(rec :Node, verb :String, args :Node[]) :Node {
                # E.call(..) is E's reflective invocation construct. For example,
                # E.call(2, "add", [3]) performs the same call as 2.add(3).
                # This is the step where all object construction happens.
                E.call(rec, verb, args)
            }

            /**
             * Called prior to building the right-hand side of a defexpr, to allocate
             * and bind the next two temp variables to a promise and its resolver.
             *
             * @return the index of the temp holding the promise. The temp holding
             *         the resolver is understood to be this plus one.
             */
            to buildPromise() :int {
                def promIndex := nextTemp
                nextTemp += 2
                def [prom, res] := Ref.promise()
                temps[promIndex] := prom
                temps[promIndex+1] := res
                promIndex
            }

            /**
             * Called once the right-hand side of a defexpr is built; uses the
             * resolver to resolve the promise to that value.
             *
             * @return the value of the right-hand side.
             */
            to buildDefrec(resIndex :int, rValue :Node) :Node {
                temps[resIndex].resolve(rValue)
                rValue
            }

            # ... buildDefine is an optimization of buildDefrec for known non-cyclic cases.
        }
    }
    # ... other useful tools
}

As we see, the E.call(..) in buildCall(..) above is where all the object construction is done.
All the rest is plumbing to hook up the references among these objects.
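For comparison, the following Python sketch mirrors the shape of deSubgraphBuilder, under stated assumptions. getattr(rec, verb)(*args) plays the role of E.call(..), and the Promise class is a hypothetical stand-in for Ref.promise(). One difference cannot be avoided: Python cannot redirect existing references when a promise is resolved. A temp that held a promise must therefore be read through its .value attribute, whereas in E, references to the promise forward to the resolved value.

```python
class Promise:
    """Hypothetical stand-in for an E promise: a cell resolved at most once.

    Unlike E, Python cannot make existing references forward to the resolved
    value, so readers must go through .value explicitly.
    """

    _UNRESOLVED = object()

    def __init__(self):
        self.value = Promise._UNRESOLVED

    def resolve(self, value):
        if self.value is not Promise._UNRESOLVED:
            raise RuntimeError("promise already resolved")
        self.value = value


class SubgraphBuilder:
    """Python sketch of an evaluating builder in the shape of deSubgraphBuilder."""

    def __init__(self, scope):
        self.scope = scope   # name -> value, consulted by build_import
        self.temps = []      # temp frame: promise at index i, its resolver at i + 1

    def build_root(self, root):
        return root

    def build_literal(self, value):
        return value

    def build_import(self, var_name):
        return self.scope[var_name]

    def build_ibid(self, temp_index):
        return self.temps[temp_index]

    def build_call(self, rec, verb, args):
        # Analogue of E.call(rec, verb, args): the only step that constructs objects
        return getattr(rec, verb)(*args)

    def build_promise(self):
        prom = Promise()
        prom_index = len(self.temps)     # equals nextTemp, since temps grow in pairs
        self.temps.append(prom)          # temp prom_index: the promise
        self.temps.append(prom.resolve)  # temp prom_index + 1: its resolver
        return prom_index

    def build_defrec(self, res_index, r_value):
        self.temps[res_index](r_value)   # resolve the promise to the built value
        return r_value
```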

The only extra parameter to the above code, in addition to those specified by the DEBuilder API, is the scope parameter to makeBuilder(..).
Typically, we will express unserialization-time policy choices using only this hook.
With a bit of pre-planning at serialization time, this can be a surprisingly powerful hook, and will often prove adequate.
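Here is a small illustration of the scope as a policy hook. Assume a fixed, hypothetical depiction that imports the name "greeting" and calls .format("world") on it. In this Python sketch, only the scope changes between runs, yet the scope decides what is reconstructed, and a scope that omits the name refuses reconstruction outright.

```python
def reconstruct(scope):
    """Replay a fixed, hypothetical depiction: import "greeting", then call
    .format("world") on it. Only the scope varies between runs."""
    template = scope["greeting"]                  # buildImport: the scope decides what this name means
    return getattr(template, "format")("world")   # buildCall


print(reconstruct({"greeting": "hello, {}"}))       # hello, world
print(reconstruct({"greeting": "[withheld: {}]"}))  # [withheld: world]
```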

Unevaluating to Data-E

Because the keys of an unscope table may be arbitrary values, including unresolved promises, the table must be the special kind of map called a CycleBreaker.
For present purposes, we can ignore this issue.
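The underlying issue can be approximated in Python with a map keyed by object identity. This is an illustrative approximation only. It shows how a map can accept arbitrary values as keys, including unhashable ones. It is not E's CycleBreaker, and its handling of unresolved promises is not modeled here.

```python
class IdentityMap:
    """Map keyed by object identity rather than equality or hashability.

    A rough Python approximation, not E's CycleBreaker.
    """

    def __init__(self):
        # id(key) -> (key, value); storing key keeps it alive so its id stays unique
        self._entries = {}

    def __setitem__(self, key, value):
        self._entries[id(key)] = (key, value)

    def get(self, key, default=None):
        entry = self._entries.get(id(key))
        return default if entry is None else entry[1]

    def __contains__(self, key):
        return id(key) in self._entries

    def copy(self):
        clone = IdentityMap()
        clone._entries = dict(self._entries)
        return clone
```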

We are now ready for the heart of serialization -- the Data-E subgraph recognizer.
It has two parameters for expressing policy -- the uncallerList and the unscopeMap.

Since we are evaluating "in reverse", we need the inverse of a scope, which we call an unscope.
An unscope maps from arbitrary values to a description of the "variable name" presumed to hold that reference.
In the unscope table passed in as unscopeMap, each description is a normal variable-name string, as would be used to look the value up in a scope.
On each call to recognize(..), ".diverge()" makes a private copy of the unscopeMap, held in the variable unscope and used from then on.
This private unscope table gets additional mappings from values to integers representing temporary variable indices.
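The unscope bookkeeping can be sketched in Python, assuming values are keyed by identity (id(value)) because arbitrary values need not be hashable. invert_scope and Recognition are hypothetical names. dict(...) plays the role of .diverge(), so each recognition's temp mappings stay private.

```python
def invert_scope(scope):
    """Build an unscope map from a scope: id(value) -> (value, variable name)."""
    return {id(value): (value, name) for name, value in scope.items()}


class Recognition:
    """One recognize(..) pass, working on a private copy of the unscope map."""

    def __init__(self, unscope_map):
        self.unscope = dict(unscope_map)   # analogue of unscopeMap.diverge()
        self.next_temp = 0

    def lookup(self, obj):
        """Return the variable name (str), temp index (int), or None if unknown."""
        entry = self.unscope.get(id(obj))
        return None if entry is None else entry[1]

    def name_temp(self, obj):
        """Map obj to a fresh temp index (temps come in promise/resolver pairs)."""
        index = self.next_temp
        self.next_temp += 2
        self.unscope[id(obj)] = (obj, index)
        return index
```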

The uncallerList is used to obtain a portrayal of each object, as we explain below.

Below we see another bit of E syntax.
In the pattern-match expression, there is a subexpression on the left of the "=~" operator, like "obj" below, and a sub-pattern on the right.
The subexpression is evaluated to a specimen, and the pattern is asked to try matching the specimen.
If it succeeds, the pattern-match expression returns true, and any bindings defined by the match are available in the successor scope -- here, the body of the if's then-part.
When this pattern is a variable declaration, like "i :int", the pattern matches if the specimen is compatible with the declared type (i.e., is successfully coerced by the guard).
This gives us, in effect, a type-case.
This last test below is passed by "bare twine", which for present purposes just means "String".
These are all the types that can be represented literally in E and in Data-E.
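In Python, the type-case can be approximated with isinstance checks tried in order. This sketch assumes the literal types are E's int, float64, char, and String. Python has no separate character type, so the char case is left out. Python's bool must be excluded explicitly because it is a subclass of int. The sketch makes no claim about how E itself depicts booleans.

```python
def literal_type(specimen):
    """Sketch of a guard-based type-case over literal types.

    Returns (type name, value), or None if the specimen is not a literal.
    """
    if isinstance(specimen, bool):
        # bool subclasses int in Python; exclude it so True is not an int literal
        return None
    if isinstance(specimen, int):
        return ("int", specimen)
    if isinstance(specimen, float):
        return ("float64", specimen)
    if isinstance(specimen, str):
        return ("String", specimen)   # "bare twine" in the text's terms
    return None
```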

To the right of the "=~" below is a list pattern.
A list pattern is written as a list of subpatterns.
It matches a specimen list of the same length if and only if each subpattern matches the corresponding element of the specimen list.
An uncaller should respond to .optUncall(obj) with either null or a list of three elements, so the following tests that the resulting specimen wasn't null, and if it wasn't, binds these three elements to variables named rec, verb, and args [ref destructuring-bind].
More on the meaning of this uncall-triple below, under Traversal as Uncalling.
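In Python terms, the list pattern behaves roughly like the hypothetical helper below. It succeeds only on a three-element list and then yields its bindings. On any other specimen it reports failure with None instead of raising, much as =~ yields false rather than throwing.

```python
def match_uncall_triple(specimen):
    """Analogue of `specimen =~ [rec, verb, args]`.

    Returns a dict of bindings on success, or None on failure.
    """
    if isinstance(specimen, list) and len(specimen) == 3:
        rec, verb, args = specimen
        return {"rec": rec, "verb": verb, "args": args}
    return None
```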

The ":notNull" declaration below accepts any value except null.
The call map.fetch(key,func) returns the value associated with key if one is found, or func() otherwise.
The expression thunk{} evaluates to a no-argument function that returns null.
Since the values of the unscope are Strings or ints, we can use null to detect whether obj was found.
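The fetch-with-thunk idiom has a direct Python analogue, sketched below with hypothetical names and keys. The design point carries over: a temp index can be 0, so the found/not-found test must compare against None (E's null) rather than rely on truthiness.

```python
def fetch(mapping, key, func):
    """Analogue of E's map.fetch(key, func): the value if key is present, else func()."""
    try:
        return mapping[key]
    except KeyError:
        return func()


def thunk():
    """Analogue of thunk{}: takes no arguments and returns None (E's null)."""
    return None


unscope = {"obj-a": "makeList", "obj-b": 0}   # hypothetical keys; 0 is a valid temp index
var_id = fetch(unscope, "obj-b", thunk)
if var_id is not None:                          # like :notNull; `if var_id:` would wrongly reject 0
    print("found", var_id)                      # found 0
```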

During traversal, for every reference a subgraph recognizer already associates with a variable, whether from the original unscopeMap argument or because it has already been traversed, it builds a reference to that variable.
Otherwise, it first builds a new pair of temporary variables for a promise and its resolver, and associates the promise variable as naming the new reference.
In that context, it then builds code to generate a reconstruction of that reference.
Finally, using defrec it builds code to resolve the previously generated promise to the reconstructed value.
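The four steps above can be sketched in Python. The sketch assumes a hypothetical TextBuilder that emits Data-E-like text (not exact Data-E syntax) and a caller-supplied portray function that returns a (receiver, verb, args) triple. The essential move is that each new object is entered in the unscope under its promise's temp index before its parts are traversed. Any cycle leading back to the object therefore closes on that temp.

```python
class TextBuilder:
    """Hypothetical builder that emits Data-E-like text (not exact Data-E syntax)."""

    def __init__(self):
        self.next_temp = 0

    def build_literal(self, value):
        return repr(value)

    def build_import(self, var_name):
        return var_name

    def build_ibid(self, temp_index):
        return f"t__{temp_index}"

    def build_call(self, rec, verb, args):
        return f"{rec}.{verb}({', '.join(args)})"

    def build_promise(self):
        prom_index = self.next_temp
        self.next_temp += 2   # promise at prom_index, resolver at prom_index + 1
        return prom_index

    def build_defrec(self, res_index, r_value):
        return f"def t__{res_index - 1} := {r_value}"


def recognize(root, builder, unscope_pairs, portray):
    """Traverse root, naming each new object by a temp before descending into it."""
    # id(obj) -> (obj, var_id), where var_id is a scope name (str) or a temp index (int).
    # Holding obj in the entry keeps it alive, so its id() cannot be reused mid-traversal.
    unscope = {id(obj): (obj, name) for obj, name in unscope_pairs}

    def gen_var_use(var_id):
        if isinstance(var_id, str):
            return builder.build_import(var_id)
        return builder.build_ibid(var_id)

    def generate(obj):
        if isinstance(obj, (int, float, str)) and not isinstance(obj, bool):
            return builder.build_literal(obj)
        entry = unscope.get(id(obj))
        if entry is not None:                   # already associated with a variable
            return gen_var_use(entry[1])
        prom_index = builder.build_promise()    # new promise/resolver temp pair
        unscope[id(obj)] = (obj, prom_index)    # name obj before traversing its parts
        rec, verb, args = portray(obj)
        r_value = builder.build_call(generate(rec), verb, [generate(a) for a in args])
        return builder.build_defrec(prom_index + 1, r_value)   # resolve the promise

    return generate(root)


# Usage with a hypothetical list portrayal and a list maker known by name.
MAKE_LIST = object()


def portray_list(obj):
    if isinstance(obj, list):
        return (MAKE_LIST, "run", list(obj))
    raise ValueError(f"cannot portray {obj!r}")


a = [1, "x"]
a.append(a)   # a cyclic list
print(recognize(a, TextBuilder(), [(MAKE_LIST, "makeList")], portray_list))
# def t__0 := makeList.run(1, 'x', t__0)
```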

Traversal as Uncalling

Should the uncallerList ever need to become long, efficiency would demand a lookup scheme other than linear search, such as the type-based dispatch of PersistenceDelegate, to determine which uncallers are applicable.
We assume here only that any optimization is equivalent to linear search in resolving which uncaller to use when several are applicable.

Once again, most of the code above is plumbing, to hook references up correctly.
The actual traversal step where objects are "taken apart" -- the inverse of the builder's E.call(..) step -- is the underlined call to each uncaller.
Each uncaller returns either null, indicating a failure to portray the object, or a triple corresponding to the three arguments to E.call(..) -- a receiver, a verb (message name), and a list of arguments.
Such a triple portrays the object for purposes of reconstruction.
It says that a reconstruction of the object would be an E.call(..) performed in the reconstructing context using (a reconstruction of) the receiver, the verb, and (reconstructions of) the arguments.
The uncallerList functions as a search path -- each uncaller is tried until one succeeds or the list is exhausted.
If none succeed, then the recognition as a whole is terminated with a thrown exception.
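The search-path behavior is easy to state in Python. In this sketch, each uncaller is modeled, by assumption, as a plain function that returns a triple or None. (E uncallers are objects with .optUncall(..).) The receivers are placeholder strings. The first uncaller to answer wins, which is also why any faster lookup must preserve list order when several uncallers apply.

```python
class UncallError(Exception):
    """Raised when no uncaller on the search path can portray an object."""


def portray(obj, uncaller_list):
    """Try each uncaller in order; the first non-None triple wins."""
    for uncaller in uncaller_list:
        triple = uncaller(obj)   # hypothetical: a function returning a triple or None
        if triple is not None:
            return triple
    raise UncallError(f"no uncaller could portray {obj!r}")


def list_uncaller(obj):
    return ("makeList", "run", list(obj)) if isinstance(obj, list) else None


def dict_uncaller(obj):
    return ("makeDict", "fromPairs", [sorted(obj.items())]) if isinstance(obj, dict) else None
```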

The default uncallerList consists of the minimalUncaller shown below and the import__uriGetter:

[minimalUncaller, <import>]

The minimalUncaller simply asks an object to provide its own portrayal.
Our earlier generationCounter is an example of an object that overrides __optUncall() to provide its own self-portrait.
We say that such an object is transparent -- it provides this portrayal to any of its clients.
The minimalUncaller can only portray transparent objects.
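A Python counterpart of a transparent object and the minimal uncaller follows. The class and its self_portrait method are hypothetical. They are not the earlier generationCounter's actual code, only an object in the same spirit that volunteers the triple needed to rebuild it. The uncaller looks the hook up on the object's type, as Python does for special methods, so that a class is not mistaken for one of its transparent instances.

```python
class GenerationCounter:
    """Hypothetical transparent object: it volunteers its own portrayal."""

    def __init__(self, count=0):
        self.count = count

    def increment(self):
        self.count += 1

    def self_portrait(self):
        # Self-portrait: "call GenerationCounter with my current count"
        return (GenerationCounter, "__call__", [self.count])


def minimal_uncaller(obj):
    """Ask obj for its own portrayal; None if it offers none (it is not transparent)."""
    hook = getattr(type(obj), "self_portrait", None)
    return None if hook is None else hook(obj)


counter = GenerationCounter()
counter.increment()
counter.increment()
rec, verb, args = minimal_uncaller(counter)
rebuilt = getattr(rec, verb)(*args)   # replay the triple, as E.call(..) would
print(rebuilt.count)                  # 2
```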

Other uncallers are for portraying non-transparent objects.
Some, such as the import__uriGetter, are a special category of uncaller called a Loader.
These also have a .get(String) method that acts as the inverse of their .optUncall(..) method.
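For example, since StringBuffer is a safe class, it can be imported using the import__uriGetter, by writing the URI-expression <import:java.lang.makeStringBuffer> and binding the result to makeStringBuffer.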

The resulting object is a maker -- its protocol consists of (the enabled subset of) the public constructors and static methods of the class StringBuffer.
That's why we name it makeStringBuffer -- it acts mostly as a function for making StringBuffers.

Note that the makeStringBuffer reconstructed by these means isn't necessarily equivalent to the original.
Rather, import__uriGetter embodies the policy choice that the reconstruction should be whatever object is importable by the same name in the reconstruction context.
If this context represents a different version of the system, in which the object imported by this name acts differently, this policy choice would have us live with the consequences, including the possible failure to reconstruct.
This is often the right engineering decision, and corresponds closely to the decisions built into other serialization systems, such as JOSS's handling of classes [ref Shapiro].
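A loader can be sketched in Python as a name registry whose get(..) inverts its opt_uncall(..). The Loader class and the registry names are hypothetical, and a linear scan stands in for whatever lookup a real loader uses. The last lines illustrate the same-name policy. Replaying the triple against a different registry yields whatever that registry holds under the name.

```python
import io


class Loader:
    """Hypothetical loader: a name registry whose get() inverts opt_uncall()."""

    def __init__(self, table):
        self._by_name = dict(table)   # name -> object importable by that name

    def get(self, name):
        return self._by_name[name]

    def opt_uncall(self, obj):
        # Portray obj as "send get(name) to me", if obj is registered under some name
        for name, value in self._by_name.items():
            if value is obj:
                return (self, "get", [name])
        return None


here = Loader({"io.StringIO": io.StringIO})
rec, verb, args = here.opt_uncall(io.StringIO)
print(getattr(rec, verb)(*args) is io.StringIO)   # True

# Same name, different context: reconstruction yields whatever is registered there
elsewhere = Loader({"io.StringIO": io.BytesIO})
print(getattr(elsewhere, verb)(*args) is io.BytesIO)   # True
```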

We now have all the basic ingredients needed to explain and address the security issues raised by serialization.

Corresponding Concepts in Conventional Serialization

In our terminology, like Data-E, JOSS also solicits from each object not its depiction, but its portrayal in terms of other objects.
Mallet can only claim to have a reference to Alice by producing a reference to Alice, which he can only do if he actually has such a reference.
If an object simply implements Serializable and does nothing else, then its internal implementation doubles as its self-portrait.
However, an object can offer a nominated replacement -- another object to be serialized in its stead, whose portrayal thereby serves as the original object's self-portrait.
The serializer may use the nominated replacement.
Or it may appoint its own replacement, by overriding the replaceObject(..) method of ObjectOutputStream, just as our serializer can appoint its own portrayal by adding an uncaller to the uncaller list.
The resulting depiction is a literal picture only of the graph of appointed replacements.
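Python's pickle module offers a close analogue, given here only for comparison and not as a description of JOSS. An object's __reduce__ returns a callable and its arguments, much like a nominated self-portrayal. Structurally, this is closer to an uncall triple than to a replacement object. A pickler can appoint its own portrayal through a private dispatch_table, much as an overridden ObjectOutputStream.replaceObject(..) does. The Point class is hypothetical; run the sketch as a script so that pickle can find it by name.

```python
import copyreg
import io
import pickle


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __reduce__(self):
        # Nominated portrayal: "reconstruct by calling Point(x, y)"
        return (Point, (self.x, self.y))


def appointed_point(pt):
    # Serializer-appointed portrayal: depict any Point as a plain tuple instead
    return (tuple, ((pt.x, pt.y),))


def dumps_appointing(obj):
    """Pickle obj with a private dispatch table that overrides Point's own choice."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf)
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[Point] = appointed_point
    pickler.dump(obj)
    return buf.getvalue()
```

The private table takes precedence over the object's own __reduce__, so the depiction is a picture of the appointed replacements only.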

JOSS provides similar flexibility during unserialization, with objects offering a nominated resolution to take their place in the unserialized graph, with the unserializer potentially substituting an appointed resolution.
Given cyclic graphs and the non-redirectability of Java references, this cannot be implemented correctly in Java.
Using promises, we can easily implement the equivalent correctly in E for Data-E (and likewise in any other object-capability language with delayed references), but we haven't yet needed this flexibility during unserialization.
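The hazard can be reproduced in Python with hypothetical classes and a hypothetical resolve hook standing in for a nominated or appointed resolution. With direct references, the child captures the original parent before the resolution runs, so its back-reference is stale. Routing the back-reference through an explicit promise fixes this: the promise is resolved only after the resolution has chosen the final object, so the cycle closes on the replacement. The Promise class is an explicit stand-in. In E, promises forward transparently, so no .get() call is needed.

```python
class Parent:
    def __init__(self):
        self.child = None


class Child:
    def __init__(self, parent_ref):
        self.parent_ref = parent_ref


class Promise:
    """Hypothetical explicit promise cell; E's promises forward transparently instead."""

    def __init__(self):
        self._resolved = False
        self._value = None

    def resolve(self, value):
        if self._resolved:
            raise RuntimeError("promise already resolved")
        self._resolved, self._value = True, value

    def get(self):
        if not self._resolved:
            raise RuntimeError("promise not yet resolved")
        return self._value


def swap_in_replacement(p):
    """A hypothetical resolution step: substitute a fresh Parent for p."""
    q = Parent()
    q.child = p.child
    return q


def rebuild_direct(resolve):
    # Direct references: the child captures the original parent before resolution runs
    p = Parent()
    p.child = Child(p)
    return resolve(p)


def rebuild_via_promise(resolve):
    # The child captures a promise, resolved once the final parent has been chosen
    prom = Promise()
    p = Parent()
    p.child = Child(prom)
    prom.resolve(resolve(p))
    return prom.get()
```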