Hi,
Quick summary of options:
1. Current Spec
. This is intuitive (in my opinion) because it is based on a
linear sequence of set operations.
. Typical (IMHO) use cases require 2 XPath evaluations.
However, increasingly complex filtering requirements incur
increasing cost; an arbitrarily complex expression requires
an arbitrarily large number of simple XPath expressions.
However, the standard XPath filter may be more useful for
these anyway.
. Operation can, in most cases, be commingled with c14n for
efficiency, but:
. The union operator is really ugly and unintuitive.
2. Christian's Spec
. *I* do not believe this is as intuitive; it involves labeling
nodes and then traversing the document, proceeding based
on node labels (e.g., omit-but-traverse).
. Typical use cases require 2 XPath evaluations. Increasingly
complex filtering requirements can be solved in a fixed
number (2/3) of increasingly complex XPath expressions.
. Operation can be commingled with c14n for effiency.
3. Or, we can take a variant of the current spec. I won't
detail it horrendously, but basically:
. The XPath Filter 2.1 takes, as a parameter, a sequence
of operations, each of which is characterized as a
set operation (intersect, subtract, union) and an
XPath expression.
. Operation over an input node set is as follows:
* Construct a node set N consisting of all the
nodes in the input document.
* Iterate through each of the operations.
# Evaluate the XPath expression; the result is X.
# Expand all identified nodes to include their
subtrees; the result is Y.
# Assign N = N op Y
* Use the resulting node set N as a filter to select
which nodes from the input node set will remain in the
output node set, just as the XPath 1.0 filter. This is
tantamount to intersection with the input node set.
* Implementations SHOULD note that an efficient realization
of this transform will not compute a node set at each
point in this transform, but instead make a list of
the operations and the XPath-selected nodes, and then
iterate through the input document once, in document
order, constructing a filtering node set N based upon
the operations and selected nodes.
* Implementations SHOULD note that iterating through the
document and constructing a filtering node set N can
be efficiently commingled with the canonicalization
transform if canonicalization is performed immediately
after this transform.
. With this formulation, intersection and subtraction
are IDENTICAL to the existing spec, with the only
change being that you can put them in one transform
or many.
. Union is, however, much improved (in my opinion). You
can only use it to include nodes that would be
removed by a previous operation in the same transform.
As a result, the output node set will only include
nodes from the input node set.
. Efficiency is as with the current spec. Basically this
fixes union.
I write this a while ago; thought I'd send it rather
than delete it. It's probably wasteful to propose yet
another option.
Merlin