Abstract

Canonical
XML
Version
2.0
is
a
major
rewrite
of
Canonical
XML
Version
1.1
and
Exclusive
Canonical
XML
1.0
to
address
issues
around
performance,
streaming,
hardware
implementation,
robustness,
minimizing
attack
surface,
determining
what
is
signed
and
more.
It also incorporates an update to Exclusive Canonicalization, effectively a 2.0 version, and combines the inclusive and exclusive canonicalization algorithms into a single algorithm that takes the canonicalization mode as a parameter.

Any
XML
document
is
part
of
a
set
of
XML
documents
that
are
logically
equivalent
within
an
application
context,
but
which
vary
in
physical
representation
based
on
syntactic
changes
permitted
by
XML
1.0
[
XML10
]
and
Namespaces
in
XML
1.0
[
XML-NAMES
].
This
specification
describes
a
method
for
generating
a
physical
representation,
the
canonical
form,
of
an
XML
document
that
accounts
for
the
permissible
changes.
Except
for
limitations
regarding
a
few
unusual
cases,
if
two
documents
have
the
same
canonical
form,
then
the
two
documents
are
logically
equivalent
within
the
given
application
context.
Note
that
two
documents
may
have
differing
canonical
forms
yet
still
be
equivalent
in
a
given
context
based
on
application-specific
equivalence
rules
for
which
no
generalized
XML
specification
could
account.

Canonical
XML
Version
2.0
is
applicable
to
XML
1.0.
It
is
not
defined
for
XML
1.1.

Status
of
This
Document

This
section
describes
the
status
of
this
document
at
the
time
of
its
publication.
Other
documents
may
supersede
this
document.
A
list
of
current
W3C
publications
and
the
latest
revision
of
this
technical
report
can
be
found
in
the
W3C
technical
reports
index
at
http://www.w3.org/TR/.

This
is
a
W3C
Working
Draft
of
"Canonical
XML
Version
2.0".

This
document
is
expected
to
be
further
updated
based
on
both
Working
Group
input
and
public
comments.

This
document
was
developed
by
the
XML
Security
Working
Group
.
Please send comments about this document to public-xmlsec-comments@w3.org (with public archive). A diff-marked version of this specification that highlights changes against the previous version is available. Major changes in this version:

Some of the parameters have been changed. All parameter names now start with an uppercase letter. The parameters ignoreDTD and expandEntities have been removed. The parameters xmlBaseAncestors, xmlIdAncestors, xmlLangAncestors and xmlSpaceAncestors have been combined into XmlAncestors. The parameter xsiTypeAware has been generalized to a new parameter, QNameAware.

The definition of visibly utilized has been extended to also consider namespace prefixes in elements and attributes listed in the QNameAware parameter.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Publication
as
a
Working
Draft
does
not
imply
endorsement
by
the
W3C
Membership.
This
is
a
draft
document
and
may
be
updated,
replaced
or
obsoleted
by
other
documents
at
any
time.
It
is
inappropriate
to
cite
this
document
as
other
than
work
in
progress.

1.
Introduction

1.1
Terminology

The
key
words
"
must
",
"
must
not
",
"
required
",
"
shall
",
"
shall
not
",
"
should
",
"
should
not
",
"
recommended
",
"
may
",
and
"
optional
"
in
this
document
are
to
be
interpreted
as
described
in
RFC
2119
[
RFC2119
].

A
document
subset
is
a
portion
of
an
XML
document
that
may
not
include
all
of
the
nodes
in
the
document.

canonical
form

The
canonical
form
of
an
XML
document
is the physical representation of the document produced by the method described in this specification.

canonical
XML

The
term
canonical
XML
refers
to
XML
that
is
in
canonical
form.
The
XML
canonicalization
method
is
the
algorithm
defined
by
this
specification
that
generates
the
canonical
form
of
a
given
XML
document
or
document
subset.
The
term
XML
canonicalization
refers
to
the
process
of
applying
the
XML
canonicalization
method
to
an
XML
document
or
document
subset.

subtree

Subtree
refers
to
one
XML
element
node,
and
all
that
it
contains.
In
XPath
terminology
it
is
an
element
node
and
all
its
descendant nodes.

DOM

DOM
or
Document
Object
Model
is
a
model for representing an XML document as a tree structure.
The
W3C
DOM
standard
[
DOM-LEVEL-2-CORE
]
is
one
such
DOM,
but
this
specification
does
not
require
this
particular
set
of
DOM
APIs;
any
similar
model
can
be
used
as
long
as
it
has
a
tree
representation
of
the
XML
document,
whose
root
is
a
document
node,
and
the
document
node's
descendants
are
element
nodes,
attribute
nodes,
text
nodes
etc.

DOM
parser

A
software
module
that
reads
an
XML
document
and
constructs
a
DOM
tree.

Stream
parser

A
software
module
that
reads
an
XML
document
and
constructs
a
stream
of
XML
events
like
"beginElement",
"text",
"endElement".
[StAX]
is
an
example
of
a
stream
parser.

1.2
Applications

Since
the
XML
1.0
Recommendation
[
XML10
]
and
the
Namespaces
in
XML
1.0
Recommendation
[
XML-NAMES
]
define
multiple
syntactic
methods
for
expressing
the
same
information,
XML
applications
tend
to
take
liberties
with
changes
that
have
no
impact
on
the
information
content
of
the
document.
XML
canonicalization
is
designed
to
be
useful
to
applications
that
require
the
ability
to
test
whether
the
information
content
of
a
document
or
document
subset
has
been
changed.
This
is
done
by
comparing
the
canonical
form
of
the
original
document
before
application
processing
with
the
canonical
form
of
the
document that results from the application processing.

For
example,
a
digital
signature
over
the
canonical
form
of
an
XML
document
or
document
subset
would
allow
the
signature
digest
calculations
to
be
oblivious
to
changes
in
the
original
document's
physical
representation,
provided
that
the
changes
are
defined
to
be
logically
equivalent
by
XML 1.0 or Namespaces in XML 1.0.
During
signature
generation,
the
digest
is
computed
over
the
canonical
form
of
the
document.
The
document
is
then
transferred
to
the
relying
party,
which
validates
the
signature
by
reading
the
document
and
computing
a
digest
of
the
canonical
form
of
the
received
document.
The
equivalence
of
the
digests
computed
by
the
signing
and
relying
parties
(and
hence
the
equivalence
of
the
canonical
forms
over
which
they
were
computed)
ensures
that
the
information
content
of
the
document
has
not
been
altered
since
it
was
signed.
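The signing and validation flow described above can be sketched as follows. This is a minimal illustration, not a signature implementation: it uses `xml.etree.ElementTree.canonicalize` from the Python standard library (Python 3.8 or later), which follows a later revision of Canonical XML 2.0 than this draft, so its options and defaults are not necessarily those listed in this document. The helper name `canonical_digest`, the sample documents, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

def canonical_digest(xml_text: str) -> str:
    """Return the SHA-256 hex digest of the canonical form of xml_text."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same information content, different physical representation:
# attribute order, quoting style and empty-element syntax all differ.
signed = '<order id="7" currency="EUR"><item/></order>'
received = "<order currency='EUR'   id='7'><item></item></order>"

# A change to the information content itself.
tampered = '<order id="8" currency="EUR"><item/></order>'

print(canonical_digest(signed) == canonical_digest(received))  # digests agree
print(canonical_digest(signed) == canonical_digest(tampered))  # digests differ
```

Only the digests are compared here; a real relying party would also verify the signature value computed over the digest.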

Note:
Although
not
stated
as
a
requirement
on
implementations,
nor
formally
proved
to
be
the
case,
it
is
the
intent
of
this
specification
that
if
the
text
generated
by
canonicalizing
a
document
according
to
this
specification
is
itself
parsed
and
canonicalized
according
to
this
specification,
the
text
generated
by
the
second
canonicalization
will
be
the
same
as
that
generated
by
the
first
canonicalization.

1.3
Limitations

Two
XML
documents
may
have
differing
information
content
that
is
nonetheless
logically
equivalent
within
a
given
application
context.
Although
two
XML
documents
are
equivalent
(aside
from
limitations
given
in
this
section)
if
their
canonical
forms
are
identical,
it
is
not
a
goal
of
this
work
to
establish
a
method
such
that
two
XML
documents
are
equivalent
if
and
only
if
their
canonical
forms
are
identical.
Such
a
method
is
unachievable,
in
part
due
to
application-specific
rules
such
as
those
governing
unimportant
whitespace
and
equivalent
data
(e.g.
<color>black</color>
versus
<color>rgb(0,0,0)</color>
).
There
are
also
equivalencies
established
by
other
W3C
Recommendations
and
Working
Drafts.
Accounting
for
these
additional
equivalence
rules
is
beyond
the
scope
of
this
work.
They
can
be
applied
by
the
application
or
become
the
subject
of
future
specifications.

The
canonical
form
of
an
XML
document
may
not
be
completely
operational
within
the
application
context,
though
the
circumstances
under
which
this
occurs
are
unusual.
This
problem
may
be
of
concern
in
certain
applications
since
the
canonical
form
of
a
document
and
the
canonical
form
of
the
canonical
form
of
the
document
are
equivalent.
For
example,
in
a
digital
signature
application,
it
cannot
be
established
whether
the
operational
original
document
or
the
non-operational
canonical
form
was
signed
because
the
canonical
form
can
be
substituted
for
the
original
document
without
changing
the
digest
calculation.
However,
the
security
risk
only
occurs
in
the
unusual
circumstances
described
below,
which
can
all
be
resolved
or
at
least
detected
prior
to
digital
signature
generation.

The
difficulties
arise
due
to
the
loss
of
the
following
information
not
available
in
the
data
model:

base
URI,
especially
in
content
derived
from
the
replacement
text
of
external
general
parsed
entity
references

notations
and
external
unparsed
entity
references

attribute
types
in
the
document
type
declaration

In
the
first
case,
note
that
a
document
containing
a
relative
URI
[
URI
]
is
only
operational
when
accessed
from
a
specific
URI
that
provides
the
proper
base
URI.
In
addition,
if
the
document
contains
external
general
parsed
entity
references
to
content
containing
relative
URIs,
then
the
relative
URIs
will
not
be
operational
in
the
canonical
form,
which
replaces
the
entity
reference
with
internal
content
(thereby
implicitly
changing
the
default
base
URI
of
that
content).
Both
of
these
problems
can
typically
be
solved
by
adding
support
for
the
xml:base
attribute
[
XMLBASE
]
to
the
application,
then
adding
appropriate
xml:base
attributes
to
the document
element
and
all
top-level
elements
in
external
entities.
In
addition,
applications
often
have
an
opportunity
to
resolve
relative
URIs
prior
to
the
need
for
a
canonical
form.
For
example,
in
a
digital
signature
application,
a
document
is
often
retrieved
and
processed
prior
to
signature
generation.
The
processing
should
create
a
new
document
in
which
relative
URIs
have
been
converted
to
absolute
URIs,
thereby
mitigating
any
security
risk
for
the
new
document.
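As a rough sketch of the pre-processing step just described, the following converts relative URIs to absolute ones before canonicalization. Which attributes carry URIs is application-specific; the attribute names `href` and `src`, the helper name `absolutize`, and the example URIs are assumptions for illustration. The sketch also ignores any `xml:base` attributes already present in the document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

def absolutize(root: ET.Element, base_uri: str, attr_names=("href", "src")) -> None:
    """Rewrite URI-valued attributes in place so they no longer depend on base_uri."""
    for elem in root.iter():
        for name in attr_names:
            value = elem.get(name)
            if value is not None:
                # urljoin leaves absolute URIs untouched and resolves relative ones.
                elem.set(name, urljoin(base_uri, value))

doc = ET.fromstring(
    '<doc><link href="img/logo.png"/><link href="http://example.org/x"/></doc>'
)
absolutize(doc, "http://example.com/reports/2010/index.xml")
hrefs = [link.get("href") for link in doc.iter("link")]
print(hrefs)
```

After this step the document no longer depends on the URI from which it was retrieved, so its canonical form remains operational.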

In
the
second
case,
the
loss
of
external
unparsed
entity
references
and
the
notations
that
bind
them
to
applications
means
that
canonical
forms
cannot
properly
distinguish
among
XML
documents
that
incorporate
unparsed
data
via
this
mechanism.
This
is
an
unusual
case
precisely
because
most
XML
processors
currently
discard
the
document
type
declaration,
which
discards
the
notation,
the
entity's
binding
to
a
URI,
and
the
attribute
type
that
binds
the
attribute
value
to
an
entity
name.
For
documents
that
must
be
subjected
to
more
than
one
XML
processor,
the
XML
design
typically
indicates
a
reference
to
unparsed
data
using
a
URI
in
the
attribute
value.

In
the
third
case,
the
loss
of
attribute
types
can
affect
the
canonical
form
in
different
ways
depending
on
the
type.
Attributes
of
type
ID
cease
to
be
ID
attributes.
Hence,
any
XPath
expressions
that
refer
to
the
canonical
form
using
the
id()
function
cease
to
operate.
The
attribute
types
ENTITY
and
ENTITIES
are
not
part
of
this
case;
they
are
covered
in
the
second
case
above.
Attributes
of
enumerated
type
and
of
type
ID,
IDREF,
IDREFS,
NMTOKEN,
NMTOKENS,
and
NOTATION
fail
to
be
appropriately
constrained
during
future
attempts
to
change
the
attribute
value
if
the
canonical
form
replaces
the
original
document
during
application
processing.
Applications
can
avoid
the
difficulties
of
this
case
by
ensuring
that
an
appropriate
document
type
declaration
is
prepended
prior
to
using
the
canonical
form
in
further
XML
processing.
This
is
likely
to
be
an
easy
task
since
attribute
lists
are
usually
acquired
from
a
standard
external
DTD
subset,
and
any
entity
and
notation
declarations
not
also
in
the
external
DTD
subset
are
typically
constructed
from
application
configuration
information
and
added
to
the
internal
DTD
subset.

While
these
limitations
are
not
severe,
it
would
be
possible
to
resolve
them
in
a
future
version
of
XML
canonicalization
if,
for
example,
a
new
version
of
XPath
were
created
based
on
the
XML
Information
Set
[
XML-INFOSET
]
currently
under
development
at
the
W3C.

1.4
Requirements
for
2.0

Canonical XML 2.0 solves many
of
the
major
issues
that
have
been
identified
by
implementers
with
Canonical
XML
1.0
[
XML-C14N
]
and
1.1
[
XML-C14N11
].

1.4.1
Performance

A
major
factor
in
performance
issues
noted
in
XML
Signature
is
often
Canonical XML 1.1 processing.
Canonicalization
will
be
slow
if
the
implementation
uses
the
Canonical
XML
1.1
specification
as
a
formula
without
any
attempt
at
optimization.
This
specification
rectifies
this
problem
by
incorporating
lessons
learned
from
implementation
into
the
specification.
Most
mature
canonicalization
implementations
solve
the
performance
problem
by
inspecting
the
signature
first,
to
see
if
it
can
be
canonicalized
using
a
simple
tree
walk
algorithm
whose
performance
is
similar
to
regular
XML
serialization.
If not, they fall back to the expensive nodeset-based algorithm.

The
use
cases
that
cannot
be
addressed by the simple tree walk algorithm are mostly edge cases.
This
specification
restricts
the
input
to
the
canonicalization
algorithm,
so
that
implementations
can
always
use
the
simple
tree
walk
algorithm.

C14N
1.x
uses
an
"XPath
1.0
Nodeset"
to
describe
a
document
subset.
This
is
the
root
cause
of
the
performance
problem
and
can
be
solved
by
not
using
a
nodeset. This version of the specification does not use a nodeset; it visits each node exactly once, and it visits only the nodes that are being canonicalized.

1.4.2
Streaming

A
streaming
implementation
is
required
to
be
able
to
process
very
large
documents
without
holding
them all in memory; it should be able to process the documents one chunk at a time.
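One way to picture chunk-at-a-time processing is to feed a stream parser incrementally and hash the canonical output as it is produced, so that neither the input nor the canonical form is ever held in full. The sketch below uses `XMLParser` and `C14NWriterTarget` from Python's `xml.etree.ElementTree` (Python 3.8 or later), which follow a later revision of Canonical XML 2.0 than this draft; the function name, the chunking, and the use of SHA-256 are assumptions for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

def streaming_canonical_digest(chunks) -> str:
    """Canonicalize an XML document supplied as an iterable of text chunks."""
    digest = hashlib.sha256()

    def write(text: str) -> None:
        # Called by the canonicalizer with each piece of canonical output.
        digest.update(text.encode("utf-8"))

    parser = ET.XMLParser(target=ET.C14NWriterTarget(write))
    for chunk in chunks:
        parser.feed(chunk)   # chunks may split the document at any point
    parser.close()
    return digest.hexdigest()

chunks = ['<log><entry id="1', '">first</entry><ent', 'ry id="2">second</entry></log>']
print(streaming_canonical_digest(chunks))
```

The result is the same digest that would be obtained by canonicalizing the whole document at once, but memory use is bounded by the chunk size and the depth of the document rather than by its length.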

1.4.3
Robustness

Whitespace
handling
was
a
common
cause
of
signature
breakage.
XML
libraries
allow
one
to
"pretty
print"
an
XML
document,
and
most
people
wrongly
assume
that
the
whitespace introduced by pretty printing will be removed by canonicalization, but that is not the case.
This
specification
adds
three
techniques
to
improve
robustness:

Optionally remove leading and trailing whitespace from text nodes,

Allow for QNames in content, particularly in the xsi:type attribute,

Optionally rewrite prefixes
1.4.4
Simplicity

C14N
1.x
algorithms
are
complex
and
depend
on
a
full
XPath
library.
This increases the work required for scripting languages to use XML Signatures. This specification addresses this issue by not using the complex nodeset model, and therefore not relying completely on XPath; it also introduces a minimal canonicalization mode.

2.
Canonical XML 2.0

2.1
Data
Model

The
input
to
the
canonicalization
algorithm
consists
of
an
XML
document
subset,
and
a set
of
options.
The
XML
document
subset
can
be
expressed
in
two
ways,
with
a
DOM
model
or
a
Stream
model.

In the DOM model, the XML subset is expressed as:

Inclusion List: Either the document node D or a list of one or more element nodes E1, E2, ... En. (If one element node Ei in this list is a descendant of another element node Ej, then Ei is ignored.)

Exclusion List (optional): A list of zero or more element nodes E1, E2, ... Em and a list of zero or more attribute nodes A1, A2, ... Am. These attribute nodes should not be namespace declarations or attributes in the xml namespace.

The XML subset consists of all the nodes in the Inclusion list and their descendants, minus all the nodes that are in the Exclusion list and their descendants.
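The subset rule above amounts to a simple membership test: a node belongs to the subset if it, or one of its ancestors, is in the Inclusion list, and neither it nor any ancestor is in the Exclusion list. The following sketch expresses that for element nodes only, using `xml.etree.ElementTree` as a stand-in for a DOM; attribute exclusion is omitted, and the function name `in_subset`, the parent map, and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def in_subset(node, parent_of, inclusion, exclusion=()):
    """Return True if element `node` is part of the document subset."""
    included = False
    current = node
    while current is not None:
        if any(current is e for e in exclusion):
            return False          # node lies inside an excluded subtree
        if any(current is e for e in inclusion):
            included = True       # node lies inside an included subtree
        current = parent_of.get(current)
    return included

root = ET.fromstring("<r><a><b/><c><d/></c></a><e/></r>")
# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
a, b, c, d, e = (root.find(p) for p in ("a", "a/b", "a/c", "a/c/d", "e"))

# Include the subtree rooted at <a>, exclude the subtree rooted at <c>.
members = [n.tag for n in (a, b, c, d, e) if in_subset(n, parent_of, [a], [c])]
print(members)
```

Because exclusion always wins, listing an element in both lists, or excluding a descendant of an included element, never requires re-inclusion logic.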

The
element
nodes
in
the
Inclusion
list
are
also
referred to as apex nodes.

Note: This
input
model
is
a
very
limited
form
of
the
generic
XPath
Nodeset
that
was
the
input
model
for
Canonical
XML
1.x.
It
is
designed
to
be
simple
and
allow
for
a
high
performance
algorithm,
while
still
supporting the most essential use cases. Specifically:

This model purposely does not support re-inclusion; i.e. all the exclusions are applied after all the inclusions. It is effectively a simplified form of the XPath Filter 2 model [XMLDSIG-XPATH-FILTER2] with one intersect followed by one optional subtract operation. Re-inclusion complicates the canonicalization algorithm, especially in the areas of namespace and xml attribute inheritance.

Exclusion is limited to complete subtrees and attribute nodes. Other kinds of nodes (text, comment, PI) cannot be excluded.

Attribute exclusion is also limited, such that namespace declarations and attributes from the xml namespace cannot be excluded.

Some examples of subsets that were permitted in Canonical XML 1.x, but not in this version:

A
subset
consisting
of
a
single
attribute
all
by
itself.

A
subset
consisting
of
an
attribute
without
its
owner
element.

A
subset
consisting
of
a
text
node
all
by
itself.

A
subset
consisting
of
a
text
node
without
its
parent
node.

A
subset
consisting
of
an
element
without
some
of
its
text
node
children.

Note:
Canonical
XML
2.0,
unlike
earlier
versions,
does
not
support
direct
input
of
an
octet
stream.
The
transformation
of
such
a
stream
into
the
input
model
required
by
this
specification
is
application-specific
and
should
be
defined
in
specifications
that
reference
or
make
use
of
this
one.

2.2
Parameters

Instead of separate algorithms for each variant of canonicalization, this specification takes the approach of a single algorithm subject to a variety of parameters that change its behavior to address specific use cases.

The following is a list of the logical parameters supported by this algorithm.
The
actual
serialization
that
expresses
the
parameters
in
use
may
be
defined
as
appropriate
to
specific
applications
of
this
specification
(e.g.,
the
<ds:CanonicalizationMethod>
element
in
[
XMLDSIG-CORE2
]).

Name: ExclusiveMode
Values: true or false
Description: whether to do inclusive or exclusive dealing of namespaces. In exclusive mode the InclusiveNamespaces parameter can be specified, listing the prefixes that are to be treated in an inclusive mode.
Default: false

Name: InclusiveNamespaces
Values: space separated list of prefixes
Description: list of prefixes to be treated inclusively. The special token #default indicates the default namespace.
Default: empty

Name: IgnoreComments
Values: true or false
Description: whether to ignore comments during canonicalization.
Default: true

Name: TrimTextNodes
Values: true or false
Description: whether to trim (i.e. remove leading and trailing whitespace from) all text nodes when canonicalizing. Adjacent text nodes must be coalesced prior to trimming. If an element has an xml:space="preserve" attribute, then text node descendants of that element are not trimmed regardless of the value of this parameter.
Default: false

Name: Serialization
Values: serializeXML or serializeEXI
Description: whether to do the normal XML serialization (http://www.w3.org/2010/xml-c14n2#serializeXML), or do an EXI serialization (http://www.w3.org/2010/xml-c14n2#serializeEXI), which is useful if the original document to be canonicalized is already in EXI format.
Default: serializeXML

Name: PrefixRewrite
Values: none, sequential, derived
Description: with none, prefixes are not changed; with sequential, prefixes are changed to n1, n2, n3 ...; with derived, each prefix is changed to nSuffix, where the suffix is derived by doing a digest of the namespace URI.
Default: none

Name: SortAttributes
Values: true or false
Description: whether the attributes need to be sorted before canonicalization. In some environments the order of attributes changes in transit, so sorting is important.
Default: true

Name: XmlAncestors
Values: inherit, none
Description: whether to inherit the simple inheritable attributes (xml:lang and xml:space) and combine the xml:base, i.e. similar to Canonical XML 1.1, or ignore all xml attributes in ancestors, similar to Exclusive Canonical XML 1.0.

Name: QNameAware
Description: set of nodes whose entire content must be processed as QName-valued or [CURIE]-valued for the purposes of canonicalization, including prefix rewriting and recognition of prefix "visible utilization".
Default: empty set

The defaults are chosen for equivalence to Canonical XML 1.1 with comments ignored.
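To make the sequential and derived values of the PrefixRewrite parameter more concrete, the sketch below shows one possible way of assigning prefixes. It is illustrative only: the text above does not say which digest algorithm or encoding the derived mode uses, so SHA-1 with a truncated hexadecimal suffix is an assumption, as are the function names and the idea that sequential numbers follow the order in which namespace URIs are first encountered.

```python
import hashlib

def sequential_prefixes(namespace_uris):
    """Map each distinct URI, in order of first appearance, to n1, n2, n3 ..."""
    mapping = {}
    for uri in namespace_uris:
        if uri not in mapping:
            mapping[uri] = f"n{len(mapping) + 1}"
    return mapping

def derived_prefix(namespace_uri, digest_chars=8):
    """Build an nSuffix prefix whose suffix depends only on the namespace URI."""
    # ASSUMPTION: SHA-1, hex-encoded and truncated; the draft only says "a digest".
    suffix = hashlib.sha1(namespace_uri.encode("utf-8")).hexdigest()[:digest_chars]
    return "n" + suffix

uris = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
print(sequential_prefixes(uris))
print(derived_prefix("http://example.com/a"))
```

Under these assumptions a sequential prefix depends on what else appears earlier in the canonicalized content, whereas a derived prefix is the same wherever the namespace URI occurs.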

2.2.1
Conformance
profiles

Implementations
may
not
support
all
of
these
parameters.
We
have
identified
the
following
profiles.

The
input
to
Canonicalization
should
only
be
a
single
complete
subtree
identified
by
ID.
There
is
no
XPath
involved
in
this
profile
and
hence
no
associated
complexities
on
visible
utilization
of
prefixes
in
IncludedXPath
and
ExcludedXPath

Implementations are not required to support all possible combinations of these parameters; instead, the parameters are grouped into various "named parameter sets". Implementations can choose to support one or more of these.

canonical-xml-1.1-nocomments: exclusiveMode=false, xsiTypeAware=false ... This produces exactly the same output as Canonical XML 1.1.

exclusive-canonical-xml-1.0-nocomments: exclusiveMode=true, xsiTypeAware=false ... This produces exactly the same output as Exclusive Canonical XML 1.0.

minimal-canonicalization: sortAttributes=false, ... Very little processing, required in situations where the XML content is expected to be mostly unchanged during transport.
2.3
Processing
Model

The
basic
canonicalization
process
consists
of
traversing
the
tree
and
outputting
octets
for
each
node.

Input: The XML subset consisting of an Inclusion list and an Exclusion list.

Processing

Sort
inclusion
list
by
document
order:
If
inclusion
list
only
has
the
document
node
D
there
is
nothing
to
sort.
Otherwise remove all element nodes Ei that are descendants of some other element node in the inclusion list. Then sort the remaining element nodes E1, E2, ... En by document order.

Canonicalize each subtree: For each element node Ei or document node D in the sorted list, do a depth-first traversal to visit all the descendant nodes in the subtree, and canonicalize each one of them. While traversing, if the current node is an element and that element is in the exclusion list, prune the traversal, i.e. skip over that element and all its descendants.
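The two processing steps above can be sketched as a pair of small functions: one that reduces the inclusion list to apex nodes in document order, and one that walks a subtree depth-first while pruning excluded elements. The sketch uses `xml.etree.ElementTree` in place of a DOM and visits element nodes only; the function names and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def apex_nodes(root, inclusion):
    """Drop included elements nested inside other included elements,
    and return the rest in document order."""
    wanted = {id(e) for e in inclusion}
    apexes = []

    def walk(elem):
        if id(elem) in wanted:
            apexes.append(elem)   # do not descend: nested entries are ignored
            return
        for child in elem:
            walk(child)

    walk(root)
    return apexes

def traverse(apex, exclusion=()):
    """Yield the elements of one subtree depth-first, pruning excluded subtrees."""
    excluded = {id(e) for e in exclusion}
    stack = [apex]
    while stack:
        elem = stack.pop()
        if id(elem) in excluded:
            continue              # skip this element and all its descendants
        yield elem
        stack.extend(reversed(list(elem)))   # keep children in document order

root = ET.fromstring("<r><a><b/><c><d/></c></a><e><f/></e></r>")
a, c, e = root.find("a"), root.find("a/c"), root.find("e")

apexes = apex_nodes(root, [e, c, a])            # unsorted, with a nested entry
visited = [n.tag for apex in apexes for n in traverse(apex, exclusion=[c])]
print([n.tag for n in apexes], visited)
```

Each element is visited at most once, and elements outside the subset are never visited during canonicalization, which is the property the performance requirement relies on.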

During
traversal
of
each
subtree,
generate
the
canonicalized
text
depending
on
the
node
type
as
follows:

Root Node- Ignore the byte order mark, the XML declaration, and anything from within the document type declaration. Traverse the children.

Element Nodes- The canonicalized result is an open angle bracket (<), the element QName, the result of processing the namespaces, the result of processing the attributes, a close angle bracket (>), the result of traversing the child nodes of the element, an open angle bracket (<), a forward slash (/), the element QName, and a close angle bracket (>). Note that if the prefix rewriting parameter is set, the QNames will be written with the changed prefixes.

Attribute
Nodes-
a
space,
the
node's
QName,
an
equals
sign,
an
open
quotation
mark
(double
quote),
the
modified
string
value,
and
a
close
quotation
mark
(double
quote).
The
string
value
of
the
node
is
modified
by
replacing
all
ampersands
(
&
)
with
&amp;
,
all
open
angle
brackets
(
<
)
with
&lt;
,
all
quotation
mark
characters
with
&quot;
,
and
the
whitespace
characters
#x9
,
#xA
,
and
#xD
,
with
character
references.
The
character
references
are
written
in
uppercase
hexadecimal
with
no
leading
zeroes
(for
example,
#xD
is
represented
by
the
character
reference
&#xD;
).

If
the
prefix
rewriting
parameter
is
set,
and
the
attribute
name
has
a
namespace
prefix,
the
prefix
is
changed
to
the
rewritten
prefix.
Also, with prefix rewriting enabled, the attribute content is treated specially if the attribute is among those enumerated for the QNameAware option. If so, the QName or [CURIE] value of the attribute is rewritten with the new prefix.

Namespace Nodes- Take the ordered list of namespace nodes resulting from namespace processing, and process each namespace node N in the same way as an attribute node.

Text
Nodes-
the
string
value,
except
all
ampersands
are
replaced
by
&amp;
,
all
open
angle
brackets
(
<
)
are
replaced
by
&lt;
,
all
closing
angle
brackets
(
>
)
are
replaced
by
&gt;
,
and
all
#xD
characters
are
replaced
by
&#xD;
.
If the parameter TrimTextNodes is true and no xml:space="preserve" declaration is in context, trim the leading and trailing whitespace.
E.g.
trim
<A>
<B/>
to
<A><B/>
and
trim
<A>
this
is
text
</A>
to
<A>this
is
text</A>
.

Note:
The
DOM
parser
might
have
split
up
a
long
text
node
into
multiple
adjacent
text
nodes,
some
of
which
may
be
empty.
Be aware of this when trimming the leading and trailing whitespace in such cases; the net result should be equivalent to doing so as if the adjacent text nodes were concatenated.

If
the
prefix
rewriting
parameter
is
set,
and
if
the
parent
element
node
is
among
those
enumerated
for
the
QNameAware
option,
then
the
QName
or
CURIE
value
of
the
text
node
is
rewritten
with
the
new
prefix.

Processing
Instruction
(PI)
Nodes-
The
opening
PI
symbol
(
<?
),
the
PI
target
name
of
the
node,
a
leading
space
and
the
string
value
if
it
is
not
empty,
and
the
closing
PI
symbol
(
?>
).
If
the
string
value
is
empty,
then
the
leading
space
is
not
added.
Also,
a
trailing
#xA
is
rendered
after
the
closing
PI
symbol
for
PI
children
of
the
root
node
with
a
lesser
document
order
than
the
document
element,
and
a
leading
#xA
is
rendered
before
the
opening
PI
symbol
of
PI
children
of
the
root
node
with
a
greater
document
order
than
the
document
element.

Comment
Nodes-
Nothing
if
generating
canonical
XML
without
comments.
For
canonical
XML
with
comments,
generate
the
opening
comment
symbol
(
<!--
),
the
string
value
of
the
node,
and
the
closing
comment
symbol
(
-->
).
Also,
a
trailing
#xA
is
rendered
after
the
closing
comment
symbol
for
comment
children
of
the
root
node
with
a
lesser
document
order
than
the
document
element,
and
a
leading
#xA
is
rendered
before
the
opening
comment
symbol
of
comment
children
of
the
root
node
with
a
greater
document
order
than
the
document
element.
(Comment
children
of
the
root
node
represent
comments
outside
of
the
top-level
document
element
and
outside
of
the
document
type
declaration).
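The character replacements described above for attribute nodes and text nodes, together with the coalesce-then-trim behavior of TrimTextNodes, can be sketched as follows. The function names are hypothetical, and treating only #x20, #x9, #xA and #xD as trimmable whitespace is an assumption based on the XML 1.0 definition of white space.

```python
XML_WHITESPACE = " \t\n\r"

def escape_attribute_value(value: str) -> str:
    # '&' must be replaced first so the references added later are not re-escaped.
    for raw, escaped in (("&", "&amp;"), ("<", "&lt;"), ('"', "&quot;"),
                         ("\t", "&#x9;"), ("\n", "&#xA;"), ("\r", "&#xD;")):
        value = value.replace(raw, escaped)
    return value

def escape_text(value: str) -> str:
    for raw, escaped in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ("\r", "&#xD;")):
        value = value.replace(raw, escaped)
    return value

def serialize_text(adjacent_text_nodes, trim=False, preserve_space=False) -> str:
    """Serialize a run of adjacent text nodes as one text node."""
    text = "".join(adjacent_text_nodes)       # coalesce before trimming
    if trim and not preserve_space:           # xml:space="preserve" wins over trimming
        text = text.strip(XML_WHITESPACE)
    return escape_text(text)

print(escape_attribute_value('a<b & "c"\n'))
print(serialize_text(["  this is", "", " text \n"], trim=True))
```

Coalescing first matters: trimming each fragment separately would also remove the space between "is" and "text" in the example.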

Note: although some XML models such as DOM don't distinguish namespace declarations from attributes, canonicalization needs to treat them separately. In this document, attribute nodes that are actually namespace declarations are referred to as "namespace nodes"; other attributes are called "attribute nodes".

2.4
The
Need
for
Exclusive
XML
Canonicalization

In
some
cases,
particularly
for
signed
XML
in
protocol
applications,
there
is
a
need
to
canonicalize
a
subdocument
in
such
a
way
that
it
is
substantially
independent
of
its
XML
context.
This
is
because,
in
protocol
applications,
it
is
common
to
envelope
XML
in
various
layers
of
message
or
transport
elements,
to
strip
off
such
enveloping,
and
to
construct
new
protocol
messages,
parts
of
which
were
extracted
from
different
messages
previously
received.
If
the
pieces
of
XML
in
question
are
signed,
they
need
to
be
canonicalized
in
a
way
such
that
these
operations
do
not
break
the
signature
but
the
signature
still
provides
as
much
security
as
can
be
practically
obtained.

2.4.1
A
Simple
Example

As
a
simple
example
of
the
type
of
problem
that
changes
in
XML
context
can
cause
for
signatures,
consider
the
following
document:

The
first
document
above
is
in
canonical
form.
But
assume
that
document
is
enveloped
as
in
the
second
case.
The
subdocument
with
elem1
as
its
apex
node
can
be
extracted
from
this
second
case
with
an
XPath
expression
such
as:

/descendant::n1:elem1


The
result
of
performing
inclusive
canonicalization on the resulting XML subset
is
the
following
(except
for
line
wrapping
to
fit
this
document):

Note
that
the
n0
namespace
has
been
included
by
inclusive
canonicalization because it includes namespace context. This change would break a signature over elem1 based on the first version.

2.4.2
General
Problems
with
re-Enveloping

As
a
more
complete
example
of
the
changes
in
canonical
form
that
can
occur
when
the
enveloping
context
of
a
document
subset
is
changed,
consider
the
following
document:

However,
although
elem2
is
represented
by
the
same
octet
sequence
in
both
pieces
of
external
XML
above,
the
Canonical
XML
version
of
elem2
from
the
second
case
would
be
(except
for
line
wrapping
so
it
will
fit
into
this
document)
as
follows:

Note
that
the
change
in
context
has
resulted
in
lots
of
changes
in
the
subdocument
as
serialized
by
the
inclusive
canonicalization.
In
the
first
example,
n0
had
been
included
from
the
context
and
the
presence
of
an
identical
n3
namespace
declaration
in
the
context
had
elevated
that
declaration
to
the
apex
of
the
canonicalized
form.
In
the
second
example,
n0
has
gone
away
but
n2
has
appeared,
n3
is
no
longer
elevated,
and
an
xml:space
declaration
has
appeared,
due
to
changes
in
context.
But
not
all
context
changes
have
effect.
In
the
second
example,
the
presence
at
ancestor
nodes
of
an
xml:lang
and
the
n1
prefix
namespace
declaration
have
no
effect
because
of
existing
declarations
at
the
elem2
node.

On
the
other
hand,
using
exclusive canonicalization with XmlAncestors="none",
the
physical
form
of
elem2
as
extracted
by
the
XPath
expression
above
is
(except
for
line
wrapping
so
it
will
fit
into
this
document)
as
follows:

2.5
Namespace
Processing

As
part
of
the
canonicalization
process,
while
traversing
the
subtree,
use
the
following
algorithm
to
look
at
all
the
namespace
declarations
in
an
element,
and
decide
which
ones
to
output.

2.5.1
Namespace
concepts

The
following
concepts
are
used
in
Namespace
processing:

Explicit
and
Implicit
namespace
declarations

In DOM, there is no special node for namespace declarations; they are just present as regular attribute nodes. An "explicit" namespace declaration is an attribute node whose prefix is "xmlns" and whose localName is the prefix being declared.
DOM
also
allows
declaring
a
namespace
"implicitly",
i.e.
if
a
new
DOM
element
or
attribute
is
constructed
using
the
createElementNS
and
createAttributeNS
methods,
then
DOM
adds
a
namespace
declaration
automatically
when
serializing
the
document.
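
As an illustration only, explicit declarations can be collected from a DOM element as sketched below. The sketch assumes Python's xml.dom.minidom as the DOM implementation, and the function name explicit_namespace_declarations is hypothetical. It covers explicit declarations only; how implicit declarations surface depends on the DOM implementation in use.

```python
from xml.dom.minidom import parseString


def explicit_namespace_declarations(element):
    """Return {prefix: URI} for the explicit declarations on one DOM element.

    The default namespace declaration (xmlns="...") is stored under the
    empty-string prefix, as suggested in the "Default namespace" concept.
    """
    declarations = {}
    attributes = element.attributes
    for i in range(attributes.length):
        attr = attributes.item(i)
        if attr.prefix == "xmlns":
            # xmlns:p="..." has prefix "xmlns" and localName "p"
            declarations[attr.localName] = attr.value
        elif attr.name == "xmlns":
            # xmlns="..." declares the default namespace
            declarations[""] = attr.value
    return declarations


doc = parseString('<a xmlns="urn:example:default" xmlns:p="urn:example:p" id="1"/>')
print(explicit_namespace_declarations(doc.documentElement))
```

The ordinary attribute id is skipped because it is neither named "xmlns" nor prefixed with "xmlns".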

Apex
nodes

An
apex
node
is
an
element
node
in
a
document
subset
having
no
element
node
ancestor
in
the
document
subset.

Default
namespace

The
default
namespace
is
declared
by
xmlns="..."
.
To
make
the
algorithm
simpler
this
will
be
treated
as
a
namespace
declaration
whose
prefix
value
is
""
i.e.
an
empty
string.

Visibly
utilized

This
concept
is
required
for
exclusive
canonicalization.
An
element
E
in
the
document
subset
visibly
utilizes
a
namespace
declaration,
i.e.
a
namespace
prefix
P
and
bound
value
V
,
if
any
of
the
following
conditions
are
true:

The
element
E
itself
has
a
qualified
name
that
uses
the
prefix
P
.
(Note: if an element does not have a prefix, it visibly utilizes the default namespace.)

OR
The
element
E
is
among
those
enumerated
for
the
QNameAware
option,
and
the
QName
or
CURIE
value
of
the
element
uses
the
prefix
P
(or,
lacking
a
prefix,
it
visibly
utilizes
the
default
namespace)

OR
An
attribute
A
of
that
element
has
a
qualified
name
that
uses
the
prefix
P
,
and
that
attribute
is
not
in
the
exclusion
list.
(Note: unlike elements, if an attribute doesn't have a prefix, that means it is a locally scoped attribute. It does NOT mean that the attribute visibly utilizes the default namespace.)

OR An attribute A of that element is among those enumerated for the QNameAware option, and the QName or CURIE value of the attribute uses the prefix P (or, lacking a prefix, it visibly utilizes the default namespace)

OR (TBD) Some special attribute or text nodes may contain an XPath expression, e.g. the IncludedXPath and ExcludedXPath attributes in an XML Signature 2.0 Transform. Any prefixes used in this XPath expression are considered to be visibly utilized.
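
The conditions above can be sketched as a small function. This is a hedged illustration, not part of the specification: the Elem and Attr classes, the function names and the three keyword parameters are hypothetical, QNameAware entries are simplified to plain qualified names, and the (TBD) XPath condition is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    qname: str            # e.g. "wsu:Id" or "id"
    value: str = ""


@dataclass
class Elem:
    qname: str            # e.g. "soap:Body" or "Body"
    attributes: list = field(default_factory=list)
    text: str = ""


def prefix_of(qname):
    """Return the prefix of a QName, or "" (default namespace) if it has none."""
    return qname.split(":", 1)[0] if ":" in qname else ""


def visibly_utilized_prefixes(elem, qname_aware_elements=frozenset(),
                              qname_aware_attributes=frozenset(),
                              excluded_attributes=frozenset()):
    # Condition 1: the element's own QName (no prefix means default namespace).
    used = {prefix_of(elem.qname)}
    # Condition 2: QName-valued element content.
    if elem.qname in qname_aware_elements:
        used.add(prefix_of(elem.text.strip()))
    for attr in elem.attributes:
        # Namespace declarations are not treated as attributes.
        if attr.qname == "xmlns" or attr.qname.startswith("xmlns:"):
            continue
        # Condition 3: prefixed attribute names that are not excluded.
        # Unprefixed attributes are locally scoped and add nothing.
        prefix = prefix_of(attr.qname)
        if prefix and attr.qname not in excluded_attributes:
            used.add(prefix)
        # Condition 4: QName-valued attribute content.
        if attr.qname in qname_aware_attributes:
            used.add(prefix_of(attr.value.strip()))
    return used
```

For example, an element soap:Body with attributes wsu:Id and xsi:type="xsd:string" visibly utilizes soap, wsu and xsi, and additionally xsd once xsi:type is listed as QName-aware.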

2.5.2
Namespace
processing
algorithm

Step 1: First, determine the namespaces to be output for an element E.

Find
a
list
of
namespace
declarations
that
are
in
scope
for
this
element
E
by
looking
at
both
implicit
and
explicit
namespace
declarations
in
this
element
and
its
ancestors.

If any namespace declaration in this list has already been output during the canonicalization of one of the element E's ancestors, say Ej, and has not been redeclared to a different value since then, i.e. not been redeclared by an element between Ej and E, then remove it from this list.

From this list, check whether any prefixes are to be processed in exclusive mode. This is indicated by the parameter ExclusiveMode="true" and the prefix being absent from the parameter InclusiveNamespaces. For each prefix that is to be treated in exclusive mode, check whether the prefix is visibly utilized by this element E, and if it is not, remove it.

Return the namespace declarations that remain in the list.
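
A minimal sketch of Step 1 follows, assuming the in-scope declarations, the declarations already output by ancestors, and the visibly utilized prefixes have each been computed beforehand. The function name and its parameters are hypothetical and chosen for illustration.

```python
def namespaces_to_output(in_scope, already_output, visibly_used,
                         exclusive_mode=False, inclusive_prefixes=frozenset()):
    """Return {prefix: URI} for the declarations to emit on one element.

    in_scope       -- {prefix: URI} in scope for the element
    already_output -- {prefix: URI} emitted by an ancestor and still in effect
    visibly_used   -- set of prefixes visibly utilized by the element
    """
    result = {}
    for prefix, uri in in_scope.items():
        # Already output by an ancestor with the same value: skip it.
        if already_output.get(prefix) == uri:
            continue
        # Exclusive mode drops prefixes that are neither listed as
        # inclusive nor visibly utilized by this element.
        if (exclusive_mode and prefix not in inclusive_prefixes
                and prefix not in visibly_used):
            continue
        result[prefix] = uri
    return result
```

In inclusive mode every pending declaration is returned; in exclusive mode only the pending declarations that the element visibly utilizes, or that are named in the inclusive prefix list, survive.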

Step 2: If the PrefixRewrite option is set to a value other than "none", then compute new prefixes for all the namespace declarations in this list, except the prefixes starting with "xml", as follows:

For PrefixRewrite="sequential", sort this list of namespace declarations by URI. Then assign a new prefix value "nN" to each prefix, incrementing the value of N for every prefix. The counter should be set to 0 at the beginning of the canonicalization. (E.g. if the value of this counter was 5 when the traversal reached this element, and this element had 3 prefixes to be output, then use the prefixes "n5", "n6", "n7" and set the counter to 8 after that.)

For PrefixRewrite="digest", assign a new prefix value "nD" to each prefix in this list, where D is the SHA1 digest of the URI, expressed as a hexadecimal string using the characters '0'-'9' and 'a'-'f'. Before digesting, the URI should be converted to octets using US-ASCII encoding.
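
Both rewriting modes can be sketched in a few lines. This is an illustration under stated assumptions: declarations are modelled as a {prefix: URI} dictionary, and the function names rewrite_sequential and rewrite_digest are hypothetical.

```python
import hashlib


def rewrite_sequential(declarations, counter):
    """Assign "nN" prefixes in URI order; return (mapping, updated counter)."""
    mapping = {}
    for prefix, uri in sorted(declarations.items(), key=lambda item: item[1]):
        if prefix.startswith("xml"):
            continue  # prefixes starting with "xml" are never rewritten
        mapping[prefix] = "n" + str(counter)
        counter += 1
    return mapping, counter


def rewrite_digest(declarations):
    """Assign "nD" prefixes, where D is the hex SHA1 digest of the URI."""
    return {
        prefix: "n" + hashlib.sha1(uri.encode("ascii")).hexdigest()
        for prefix, uri in declarations.items()
        if not prefix.startswith("xml")
    }
```

With a counter of 5 and three declarations, rewrite_sequential yields the prefixes n5, n6 and n7 and returns a counter of 8, matching the example in the text. A digest prefix is always 41 characters long ("n" followed by 40 hexadecimal digits) and depends only on the URI.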

The
"sequential"
mode
of
prefix
rewriting
has
the
advantage
of
a
smaller
canonicalization
output
than
the
"digest"
mode,
but
the
downside
is
that
it
may
result
in
different
namespace
prefixes
in
different
contexts,
see
the
example
below.
With
the
"digest"
mode
the
namespace
prefixes
will
be
identical
across
documents
and
contexts.
Note:
with
prefix
rewriting
the
default
namespace
is
never
output,
i.e.
it
is
also
rewritten
into
a
new
prefix.

Note: with exclusive canonicalization, namespace declarations are output only when they are utilized. This may lead to one declaration being output multiple times, and if the PrefixRewrite parameter is set to "sequential", it may be rewritten to a different value every time.

Step 3: If SortAttributes="true", which is the default, then sort this list of namespaces as follows: In the case of PrefixRewrite="none", sort the namespace declarations in ascending lexicographic order of prefixes (the default namespace declaration has no prefix, so it is lexicographically least). In the case of PrefixRewrite="sequential" or PrefixRewrite="digest", sort them in ascending order of namespace URI.
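
The sort key selection in Step 3 can be sketched as follows, assuming declarations are held as (prefix, URI) pairs and that string comparison by code point is an acceptable reading of "lexicographic". The function name is hypothetical.

```python
def sort_namespace_declarations(declarations, prefix_rewrite="none"):
    """Sort (prefix, URI) pairs by prefix, or by URI when prefixes are rewritten."""
    if prefix_rewrite == "none":
        # The default namespace has the empty prefix "", which sorts first.
        return sorted(declarations, key=lambda decl: decl[0])
    return sorted(declarations, key=lambda decl: decl[1])
```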

Note
how
the
"wsu"
prefix
declaration
is
present
in
wsse:Security,
but
is
not
utilized.
So
exclusive
canonicalization
will
"push
the
declaration
down"
into
<UserName>
and
<Timestamp>
where
it
is
really
used,
i.e.
the
wsu
declaration
will
be
output
twice,
once
in
<UserName>
and
another
in
<Timestamp>,
as
shown
above.

With
digest
prefix
rewriting
the
wsu
namespace
is
emitted
twice
as
well,
but
it
is
the
same
every
time.
The
downside
is
that
the
prefixes
are
very
long.

2.6
Attribute
processing

Note:
namespace
declarations
are
not
considered
as
attributes;
they
are
processed
separately
as
namespace
nodes.

Processing
the
attributes
of
an
element
E
consists
of
the
following
steps:

If E is an apex node, then examine all of E's ancestors for the nearest occurrences of simple inheritable attributes in the xml namespace, such as xml:lang and xml:space, that are not already present in E's attributes. Then temporarily add these attributes to E's attribute list. (Do this step only if the parameter XmlAncestors is set to "inherit".)

The xml:base attribute is not a simple inheritable attribute and requires special processing beyond a simple redeclaration. Collect the values of xml:base for all of E's ancestors, starting with the document root element and including E itself, into an ordered list. If there are two or more values in the list, then combine them two at a time starting from the beginning, using the join-URI-references function. E.g. if the list has X1, X2, ... Xm, then join X1 and X2 first, then join the result with X3, and so on. (Do this step only if the parameter XmlAncestors is set to "inherit".)

Ignore
any
attributes
that
are
present
in
the
exclusion
list.
However
note
that
namespace
nodes
and
xml:
attributes
cannot
be
excluded.

Sort all the attributes in increasing lexicographic order with namespace URI as the primary key and local name as the secondary key (an empty namespace URI is lexicographically least).

If the PrefixRewrite option is set to a value other than "none", modify the QNames for the attribute name to use the new prefixes. Also, if the attribute is among those enumerated for the QNameAware option, then change its QName or CURIE value to use the new prefix.
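
The exclusion and sorting steps above can be sketched as follows. This assumes each attribute is modelled as a (namespace URI, local name, value) tuple with "" for an unqualified attribute, and that the exclusion list holds (namespace URI, local name) pairs; both representations and the function name are hypothetical.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def attributes_to_output(attributes, excluded=frozenset()):
    """Drop excluded attributes, then sort by (namespace URI, local name)."""
    kept = [
        attr for attr in attributes
        # xml: attributes cannot be excluded, so they are always kept.
        if attr[0] == XML_NS or (attr[0], attr[1]) not in excluded
    ]
    # An empty namespace URI compares lowest, so unqualified attributes come first.
    return sorted(kept, key=lambda attr: (attr[0], attr[1]))
```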

2.7
join-URI-References
function

The
join-URI-References
function
takes
xml:base
attribute
values
from
all
the
ancestor
elements
and
combines
them
to
create
a
value
for
an
updated
xml:base
attribute.
A
simple
method
for
doing
this
is
similar
to
that
found
in
sections
5.2.1,
5.2.2
and
5.2.4
of
RFC
3986
with
the
following
modifications:

The
scheme
component
is
not
required
in
the
base
URI
(Base).
(i.e.
Base.scheme
may
be
null)

Replace
a
trailing
".."
segment
with
"../"
segment
before
processing.

Section
5.2.4.
"Remove
Dot
Segments"
is
modified
as
follows:

Keep
leading
"../"
segments

Replace
multiple
consecutive
"/"
characters
with
a
single
"/"
character.

Append
a
"/"
character
to
a
trailing
".."
segment

The
"Remove
Dot
Segments"
algorithm
is
modified
to
ensure
that
a
combination
of
two
xml:base
attribute
values
that
include
relative
path
components
(i.e.,
path
components
that
do
not
begin
with
a
'/'
character)
results
in
an
attribute
value
that
is
a
relative
path
component.
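
The modifications can be illustrated with a simplified sketch. It is an approximation under loud assumptions: it handles path-only xml:base values (no scheme, authority, query or fragment), it has not been checked against the table in the "Remove Dot segments" appendix, and all function names are hypothetical.

```python
import re
from functools import reduce


def _slash_trailing_dotdot(path):
    """Replace a trailing ".." segment with "../"."""
    return path + "/" if path == ".." or path.endswith("/..") else path


def remove_dot_segments(path):
    """Remove "." and ".." segments while keeping leading "../" segments."""
    path = re.sub(r"/{2,}", "/", path)       # collapse runs of "/"
    path = _slash_trailing_dotdot(path)
    absolute = path.startswith("/")
    segments = path.split("/")
    ends_with_dir = segments[-1] in ("", ".")
    out = []
    for seg in segments:
        if seg in ("", "."):
            continue
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()                     # ".." cancels the previous segment
            elif not absolute:
                out.append("..")              # keep leading ".." of a relative path
        else:
            out.append(seg)
    result = "/".join(out)
    if absolute:
        result = "/" + result
    if ends_with_dir and out:
        result += "/"
    return result


def join_uri_references(base, ref):
    """Merge a reference path onto a base path, then remove dot segments."""
    base = _slash_trailing_dotdot(base)
    if not ref:
        merged = base
    elif ref.startswith("/") or "/" not in base:
        merged = ref
    else:
        merged = base[: base.rfind("/") + 1] + ref
    return remove_dot_segments(merged)


def combine_xml_base(values):
    """Fold an ordered, non-empty list of xml:base values two at a time."""
    return reduce(join_uri_references, values)
```

For instance, combining the relative values "a/b/", "../c/" and "d" in that order gives "a/c/d", and joining "a/" with "../../b" gives "../b", so a relative result stays relative.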

Canonical XML 2.0 supports a set of parameters, as enumerated in Canonicalization Parameters. All parameters are optional and have default values. When used in conjunction with the <ds:CanonicalizationMethod> element, each parameter is expressed with a dedicated child element. They can be present in any order. A schema definition for each parameter follows:

3.2
Use
of
Canonical
XML
2.0
in
XML
Encryption
1.1

Canonical
XML
2.0
may
also
be
used
in
XML
Encryption
1.1,
with
changes
as
noted
in
the
non-normative
section
"Serializing
XML"
of
XML
Encryption
1.1
[
XMLENC-CORE1
].

4.
Pseudocode

This
section
presents
the
entire
canonicalization
algorithm
in
pseudocode.
It
is
not
normative.

4.1
canonicalize()

Top
level
canonicalize
function.
canonicalize(list of subtrees, list of exclusion elements and attributes, properties)
{
    put the exclusion elements and attributes in a hash table for easier lookup
    sort the multiple subtrees by document order
    for each subtree
        canonicalizeSubtree(subtree)
}

hasBeenOutput
a
boolean
flag
that
indicates
whether
that
namespace
declaration
has
been
output

newPrefix
the
rewritten
value
of
the
prefix.

At the beginning of the canonicalization, initialize this to contain only one entry: the default namespace mapped to an empty URI, with hasBeenOutput = true.
A
prefix
value
of
""
can
be
used
to
denote
the
default
namespace.

xmlattribContext
is
a
hash
table
of
name
->
value
.

canonicalizeSubtree(node)
{
    initialize namespaceContext to contain the default prefix, mapped
    to an empty URI, and hasBeenOutput to true
    if (node is the document node or a document root element)
    {
        // (whole document is being processed, no ancestors to worry about)
        call processNode(node, namespaceContext)
    }
    else
    {
        starting from the element, walk up the tree to collect a list of
        ancestors
        for each of these ancestor elements starting with the document
        root, but not including the element itself
            addNamespaces(ancestorElem, namespaceContext)
        initialize xmlattribContext to empty
        for each of these ancestor elements starting with the document
        root, and also including the element itself
            addXMLAttributes(ancestorElem, xmlattribContext)
        if there are any attributes in xmlattribContext
            temporarily add/replace these XML attributes in node
        processNode(node, namespaceContext)
        restore the original XML attributes
    }
}

4.5
processElement()

Process
an
Element
Node.
processElement(element, namespaceContext)
{
    if this element exists in the exclusion hash table
        return
    make a copy of xmlattribContext and namespaceContext
    // (by copying, any changes made can be undone when this function returns)
    nsToBeOutputList = processNamespaces(element, namespaceContext)
    output('<')
    if PrefixRewrite is sequential or digest, temporarily modify the QName to have the new prefix value as determined from the namespaceContext
    output(element QName)
    for each of the namespaces in the nsToBeOutputList
        output this namespace declaration
    sort the non-namespace attributes by URI first, then by attribute name
    output each of these attributes with its original QName, or a modified QName if PrefixRewrite is sequential or digest
    output('>')
    loop through all child nodes and call
        processNode(child, namespaceContext)
    output('</')
    output(element QName)
    output('>')
    restore xmlattribContext and namespaceContext
}

4.6
processText()

Process
a
Text
Node.
processText(textNode)
{
    if this text node is outside document root
        return
    in the text replace
        all ampersands by &amp;,
        all open angle brackets (<) by &lt;,
        all closing angle brackets (>) by &gt;,
        and all #xD characters by &#xD;.
    if TrimTextNodes is true and there is no xml:space="preserve" declaration in scope
        trim leading and trailing space
    output(text)
}

Note:
The
DOM
parser
might
have
split
up
a
long
text
node
into
multiple
adjacent
text
nodes,
some
of
which
may
be
empty.
In
that
case
be
careful
when
trimming
the
leading
and
trailing
space
-
the net result should be the same as if the adjacent text nodes were concatenated into one.
4.9
addNamespaces()

Add
namespaces
from
this
element
to
the
namespace
context.
This
function
is
called
for
every
ancestor
element,
and
also
at
every
element
of
the
subtrees
(minus
the
exclusion
elements).
addNamespaces(element, namespaceContext)
{
    for each of the explicit and implicit namespace declarations in the element
    {
        if there is already a declaration for this prefix, and this
        declaration is different from the existing declaration
            overwrite the URI, and set hasBeenOutput to false
        if there is no entry for this prefix
            add an entry for this prefix and URI, and set hasBeenOutput to false
    }
}
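
A runnable counterpart of this function might look as follows. It assumes the namespace context is a dictionary from prefix to a small record with "uri" and "has_been_output" keys, and that the element's declarations have already been gathered into a {prefix: URI} dictionary; these representations are hypothetical.

```python
def add_namespaces(declarations, namespace_context):
    """Merge one element's namespace declarations into the context in place."""
    for prefix, uri in declarations.items():
        entry = namespace_context.get(prefix)
        if entry is None:
            # New prefix: record it as not yet output.
            namespace_context[prefix] = {"uri": uri, "has_been_output": False}
        elif entry["uri"] != uri:
            # Redeclared to a different value: it must be output again.
            entry["uri"] = uri
            entry["has_been_output"] = False
        # Same prefix and same URI: leave the entry untouched.


# Initial context: the default namespace mapped to an empty URI, marked as output.
context = {"": {"uri": "", "has_been_output": True}}
add_namespaces({"a": "urn:a"}, context)
```

A redeclaration with an identical URI leaves has_been_output unchanged, which is what prevents the same declaration from being emitted again on a descendant in inclusive mode.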

4.10
processNamespaces()

Process
the
list
of
namespaces
for
this
element.
processNamespaces(element, namespaceContext)
{
    addNamespaces(element, namespaceContext)
    initialize nsToBeOutputList to empty list
    for each prefix in the namespaceContext for which hasBeenOutput is false
    {
        if ExclusiveMode and this prefix is not in the inclusiveNamespacesList
        {
            if the prefix is visibly utilized by this element
                add the prefix to the nsToBeOutputList and set
                hasBeenOutput to true
        }
        else
            add the prefix to the nsToBeOutputList and set hasBeenOutput to true
    }
    if (PrefixRewrite is none)
    {
        sort the nsToBeOutputList by the prefix
    }
    else if (PrefixRewrite is sequential)
    {
        sort the nsToBeOutputList by URI
        assign new prefix values "nN" to each prefix in this
        nsToBeOutputList where N represents an incremented counter value,
        i.e. n0, n1, n2 ..
        // the counter should be set to 0 at the beginning of the canonicalization
        // note: prefix numbers are assigned in the order that the
        // prefixes are present in nsToBeOutputList
    }
    else if (PrefixRewrite is digest)
    {
        sort the nsToBeOutputList by URI
        assign new prefix values "nD" to each prefix in this nsToBeOutputList where
        D represents the SHA1 digest of the URI represented as a hex string
    }
    return nsToBeOutputList
}

4.11
addXMLAttributes()

Combine/modify the 3 special xml attributes: xml:lang, xml:space and xml:base.
addXMLAttributes(element, xmlattribContext)
{
    for each of the xml: attributes of this element
    {
        case xml:lang attribute
            if XmlAncestors is inherit then store this attribute value, else do nothing
        case xml:space attribute
            if XmlAncestors is inherit then store this attribute value, else do nothing
        case xml:base attribute
            if XmlAncestors is inherit, and there is a previous value of xml:base
                then do a "join-URI-References" to combine the old value and the new value, and store the result
            else if XmlAncestors is inherit
                store this attribute value // first xml:base value encountered
            else do nothing
    }
}

5.
Output
rules

The document is encoded in UTF-8.

Line breaks are normalized to #xA on input (automatically done by a DOM parser).

Attribute values are normalized.

Character and parsed entity references are replaced.

CDATA sections are replaced with their character content.

The XML declaration and document type declaration are removed.

Empty elements are converted to start-end tag pairs.

Whitespace outside of the document element and within start and end tags is normalized.

Special characters in attribute values and character content are replaced by character references.

Default attributes are added to each element.

6.
Other
ideas
considered

Qnames
in
content:
Have
another
parameter
listing
other
element
/
attribute
names
that
can
have
QNames,
besides
xsi:type.
Or
simply
search
all
text
content
for
QName.

Significant
white
space:
Have a parameter listing elements in which whitespace is significant. Instead of listing individual element names, an entire target namespace URI can be specified, e.g. in many elements in the xhtml namespace whitespace is significant.
7.
Processing
model
for
Streaming
XML
parsers

Unlike DOM parsers, which represent an XML document as a tree of nodes, streaming parsers represent an XML document as a stream of events such as "start-element", "end-element" and "text". A document subset can also be represented as a stream of events. This stream of events is in exactly the same order as a tree walk, so the above canonicalization algorithm can also be used to canonicalize an event stream.
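
As a rough illustration of event-driven processing, the sketch below uses Python's xml.sax module as the streaming parser, which is an assumption made for this example. It is deliberately incomplete: attributes, namespace handling and all canonicalization parameters are ignored, and the class and function names are hypothetical. It only shows that serializing events in arrival order reproduces the tree-walk order.

```python
import xml.sax


def escape_text(text):
    """Escape the special characters of character content."""
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\r", "&#xD;"))


class EventSerializer(xml.sax.ContentHandler):
    """Writes start-element, end-element and text events as they arrive."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        # An empty element still produces a start and an end event,
        # so it is naturally written as a start-end tag pair.
        self.parts.append("<" + name + ">")

    def endElement(self, name):
        self.parts.append("</" + name + ">")

    def characters(self, content):
        # CDATA content and resolved character references arrive here as text.
        self.parts.append(escape_text(content))


def serialize_events(xml_bytes):
    handler = EventSerializer()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.parts)


print(serialize_events(b"<a><b/>x &amp; y<c>z</c></a>"))
```

The parser may deliver one run of text as several "text" events; because each chunk is escaped and appended in order, the concatenated output is unaffected.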

A.
Remove
Dot
segments

The
following
informative
table
outlines
example
results
of
the
modified
Remove
Dot
Segments
algorithm
described
in
Section
2.7
join-URI-References
function.

B.
References

Dated
references
below
are
to
the
latest
known
or
appropriate
edition
of
the
referenced
work.
The
referenced
works
may
be
subject
to
revision,
and
conformant
implementations
may
follow,
and
are
encouraged
to
investigate
the
appropriateness
of
following,
some
or
all
more
recent
editions
or
replacements
of
the
works
cited.
It
is
in
each
case
implementation-defined
which
editions
are
supported.