This proposal modifies the Transform model to address the following
requirements:
Requirements
------------
1) Check what is signed:
Looking at a signature, it should be possible to find out what was
signed. This is one of the best practices for verification: a receiver
must not blindly verify a signature without first checking whether what
was supposed to be included in the signature was really signed.
2) Support Streaming
Currently many transforms are based on nodesets; nodesets imply DOM,
and DOM requires the whole document to be loaded into memory, which is
bad for performance.
Change 1: Distinguish between selection and canonicalization transforms
-----------------------------------------------------------------------
To support the "check what is signed" requirement, we need to
distinguish between transforms that select data to be signed, and
transforms that convert that data to bytes.
Selection Transforms: XPath Filter, XPath Filter 2.0, Enveloped
Signature, Decrypt Transform
Canonicalization Transforms: C14N, exc-C14N, Base64
The XSLT transform can be used for anything; e.g. there could be an
XSLT transform to remove white space, and that particular XSLT
transform would fall into the canonicalization bucket.
The WS-Security STR Transform does both selection and canonicalization.
The WS-Security SWA attachment transforms do selection.
Change 2 : Limit transformation sequence to selection first,
canonicalization second
------------------------------------------------------------------------------------
Currently there is no limitation on the ordering of transforms, so
somebody could create a signature with
c14n, xpath
According to the processing rules, this means that the reference URI is
resolved and canonicalized into an octet stream, which is then reparsed
into XML; then XPath is applied to select the nodes, after which
another implicit c14n is performed to convert the result into an octet
stream. This is completely meaningless, and besides, XML parsing is an
expensive operation. So we would like to define strict rules on the
sequence of transforms:
* There can be no transforms after c14n (or after the WS-Security
STRTransform, which includes a c14n transform)
* No transforms after Base64, because it produces an octet stream that
is to be digested directly
* Other transforms that emit an octet stream (like the WS-Security SWA
attachment transforms) should also be the last transform
* XSLT also produces an octet stream, but that needs to be dealt with
differently because it is not canonicalized and cannot be digested
directly. Actually I would vote for removing the XSLT transform
completely: first of all it is insecure (DoS attacks are very easy),
secondly it is completely unstreamable (unless we define a very
restricted XSLT), and thirdly it loses the original nodeset, making it
impossible to determine what was really signed.
* XPath Filter or XPath Filter 2.0 should be the first transform, and
there should be only one XPath transform.
* There can be only one enveloped signature transform
* Only one Decrypt transform
* The Base64 transform should only take a single text node, or an
element with a single text node child, as input. (This restriction
eliminates the dependency on the XPath text() function, which is not
streamable because it may need to select any number of text nodes and
concatenate them.)
These rules eliminate XML parsing during transform processing, and also
make it possible to determine what is signed.
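To make the ordering rules concrete, here is a minimal sketch of how an implementation might validate a transform sequence against them. The transform names and the selection/canonicalization buckets are illustrative assumptions for this sketch, not identifiers from any existing library, and several of the finer rules (e.g. the Base64 input restriction) are not modeled.

```python
# Hypothetical sketch: checking a transform sequence against the
# proposed rules (selection transforms first, a single terminal
# canonicalization transform, XPath only as the first transform).
SELECTION = {"xpath-filter", "xpath-filter-2", "enveloped-signature", "decrypt"}
CANONICALIZATION = {"c14n", "exc-c14n", "base64", "str-transform"}

def validate_transforms(transforms):
    """Return True if the sequence obeys the proposed ordering rules."""
    for i, t in enumerate(transforms):
        if t in CANONICALIZATION:
            # Rule: nothing may follow a canonicalization transform.
            if i != len(transforms) - 1:
                return False
        elif t in SELECTION:
            # Rule: an XPath transform must come first.
            if t.startswith("xpath") and i != 0:
                return False
        else:
            return False  # unknown transform
    # Rule: at most one of each selection transform.
    for t in SELECTION:
        if transforms.count(t) > 1:
            return False
    return True

print(validate_transforms(["xpath-filter-2", "enveloped-signature", "exc-c14n"]))  # True
print(validate_transforms(["c14n", "xpath-filter"]))  # False: transform after c14n
```

This also illustrates why the rules help static analysis: a validator can classify each transform without executing any of them.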
Change 3: Use simple XPaths in XPath Transform
----------------------------------------------
XPath poses a lot of problems: first of all it is insecure (DoS attacks
are possible); secondly XPath inherently requires a DOM, and only a
limited subset of XPath can be streamed; thirdly XPath makes it very
hard to know what is signed; fourthly XPath Filter 1.0 expressions are
inside out and very difficult to write and understand (although this is
fixed in XPath Filter 2.0).
XPaths can also be specified in an XPointer URI, but since XPointers
were marked OPTIONAL while the XPath Transform was marked RECOMMENDED,
XPointers have never really been used. I propose that we just
drop/deprecate them.
To solve these XPath problems, I propose a new mechanism to specify the
XPath transform, which is essentially a restricted form of XPath
Filter 2.0. It has
* an included XPath - identifies subtrees that need to be signed
(optional - a URI can be used instead of this)
* an excluded XPath - (optional) identifies subtrees or attributes that
need to be excluded
The included XPath is similar to the "intersect" and the excluded XPath
is similar to the "subtract" of the XPath Filter 2.0.
Restrictions
* As mentioned above, if XPath is used, it should be the first
transform (and there can be only one XPath transform in the transform
list)
* If included is used, the reference URI should be "", i.e. refer to
the complete document
* The XPath expression itself is very restricted, as described below
* Unlike XPath Filter 2.0, there is only one included XPath and one
excluded XPath, and the excluded XPath overrides the included one.
I am open to the syntax, as long as we can have these included and
excluded XPaths. One idea is to preserve backwards compatibility and
just add two attributes, "included" and "excluded", to the existing
XPath transform, like this:
<Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
<XPath included="..." excluded="...">
...
</XPath>
</Transform>
So an older implementation will execute the XPath Filter 1.0 Transform,
whereas a newer implementation will just process the included and
excluded XPaths.
This proposal also makes it easy to determine what is signed. There is
only one XPath transform, and this transform has only one included
XPath, so it is easy to do static analysis of the signature to
determine which elements were signed.
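As an illustration of that static analysis, the sketch below reads the proposed "included"/"excluded" attributes off a Transform element without executing any transform. The sample XPath values and the element content are illustrative placeholders, and namespace handling is omitted for brevity.

```python
# Illustrative sketch: statically extracting the included/excluded
# XPaths from the proposed Transform syntax.
import xml.etree.ElementTree as ET

def what_is_signed(transform_xml):
    """Return (included, excluded) XPath strings, or (None, None)."""
    root = ET.fromstring(transform_xml)
    xpath_el = root.find("XPath")
    if xpath_el is None:
        return None, None
    return xpath_el.get("included"), xpath_el.get("excluded")

# Sample transform using the proposed syntax (values are made up).
transform = '''
<Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
  <XPath included="/soap:Envelope/soap:Body"
         excluded="//wsse:Security">legacy expression here</XPath>
</Transform>
'''
included, excluded = what_is_signed(transform)
print(included)  # /soap:Envelope/soap:Body
```

A caller could inspect these two XPaths and decide whether the signed content matches expectations before running verification at all.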
Streaming XPath
---------------
There are many streaming XPath implementations, and they impose
different kinds of constraints on the XPath.
I looked at XSQ implementation which Thomas had pointed out
http://www.cs.umd.edu/projects/xsq/.
and some others
http://www.idealliance.org/papers/xml2001/papers/pdf/05-01-01.pdf
http://cs.nyu.edu/~deepak/publications/icde.pdf
http://www.stanford.edu/class/cs276b/handouts/presentations/joshislezberg.ppt
http://www.idealliance.org/proceedings/xml04/papers/299/RandomAccessXML.pdf
They have varying constraints; some common ones are:
* Only forward axes, like child, descendant, following-sibling
(reverse axes are very difficult)
* Many limitations on predicates:
** no location paths in predicates
** no nested predicates
** functions on nodesets are not allowed, e.g. count(), last(), etc.
** no conversion of subtrees to strings, e.g. the text() function
Even with these restrictions, the implementations are very complex and
require state machines and static optimization.
I would like to propose a smaller subset of XPath that has even fewer
requirements. For this, imagine a streaming XML parser that is walking
through the XML tree; at any point it has in memory
* the current element,
* all the attributes of the current element,
* all ancestor elements
We assume that this parser maintains namespace definitions and also
does xml:base combination as it walks down the tree.
Text nodes can be extremely long (especially long Base64 encoded
strings, e.g. MTOM attachments), so it is possible that a text node is
split up and not all loaded into memory.
With this model, we impose the following restrictions:
* Only elements can be selected (i.e. the location path must resolve to
one or more elements, not attributes or text nodes)
* Only the descendant and child axes can be used
* Predicates can only contain relational expressions involving
attributes. The predicate can only appear in the last location step,
and it cannot use any functions.
So only simple expressions like this are allowed
/soap:Envelope/soap:Header[@actor = "ac"]
These restrictions ensure that the XPath expression can be evaluated
with only the element, its attributes and its ancestor elements. So as
a streaming parser walks down the document, it can evaluate the
included and excluded XPath expressions for every node and determine
whether the node is to be included or not.
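The evaluation model above can be sketched with a SAX parse that keeps only the ancestor chain in memory. This is an illustrative toy, not a real library API: it handles only an absolute child-axis path with attribute equality predicates, and namespace prefixes and the descendant axis are omitted for brevity.

```python
# Sketch: evaluating a restricted XPath like
#   /Envelope/Header[@actor = "ac"]
# during a streaming parse, using only the current element, its
# attributes, and its ancestors.
import xml.sax

class RestrictedXPathHandler(xml.sax.ContentHandler):
    def __init__(self, steps):
        # steps: list of (element_name, {attr_name: required_value})
        self.steps = steps
        self.stack = []      # ancestor chain of (name, attrs) pairs
        self.matches = []

    def startElement(self, name, attrs):
        self.stack.append((name, dict(attrs)))
        # A match can only occur when the depth equals the path length.
        if len(self.stack) == len(self.steps):
            ok = all(
                n == sn and all(a.get(k) == v for k, v in sa.items())
                for (n, a), (sn, sa) in zip(self.stack, self.steps)
            )
            if ok:
                self.matches.append(name)

    def endElement(self, name):
        self.stack.pop()

handler = RestrictedXPathHandler([("Envelope", {}), ("Header", {"actor": "ac"})])
xml.sax.parseString(
    b'<Envelope><Header actor="ac"/><Header actor="other"/><Body/></Envelope>',
    handler)
print(handler.matches)  # ['Header'] - only the Header with actor="ac"
```

Note that the decision for each element is made the moment its start tag is seen, which is exactly the property that makes this subset streamable.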
Reference Processing
====================
These proposed changes allow the signature to be statically analyzed
without running the transforms. A signature processing API/library
should provide a method to statically analyze the references and return
what was signed. After that, the caller of the library can determine
whether it wants to go ahead with signature verification.
Streaming verification
----------------------
These changes also allow signatures to be processed in a streaming
manner. Let us assume that we have already done an initial pass over
the document to get the signature, keys, tokens etc. (In the
WS-Security use case, all of these are present in the SOAP header, so
this first pass goes over only a small fraction of the document, not
the entire document.)
Now we set up a "canonicalization and digesting engine" for each
reference. This engine expects streaming XML events, and canonicalizes
and digests them to maintain a running digest. Then we do one pass over
the whole document, and for each node we evaluate all the XPaths/URIs
for all the references. If the node is part of a reference, we pass
that event to the corresponding canonicalization and digesting engine.
After this pass, we retrieve the digest from each engine and check
whether the digests match.
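The running-digest mechanics of such an engine can be sketched as below. This is an assumed design, not code from any real implementation: the canonicalization step is elided entirely, the events are pretend pre-canonicalized byte fragments, and SHA-256 is just an example digest algorithm.

```python
# Sketch of a per-reference "canonicalization and digesting engine"
# that maintains a running digest over streamed, already-canonicalized
# byte events, so the document never needs to be held in memory.
import hashlib

class DigestEngine:
    def __init__(self):
        self._h = hashlib.sha256()

    def push(self, canonical_bytes):
        # Called for every streamed event that belongs to this reference.
        self._h.update(canonical_bytes)

    def digest(self):
        return self._h.digest()

# One pass over the document: route each matching event to the engine.
engine = DigestEngine()
for event in [b"<Body>", b"payload", b"</Body>"]:
    engine.push(event)

# After the pass, compare against the reference's DigestValue.
expected = hashlib.sha256(b"<Body>payload</Body>").digest()
print(engine.digest() == expected)  # True
```

With one engine per reference, a single pass suffices: each event is routed to every engine whose XPath/URI matched that node, and all digests are read out at the end.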
Summary
-------
The proposal places significant restrictions on Transforms, to make it
possible to check what was signed and to perform signing/verification
operations in a streaming manner.
Pratik