DOM Level 3 Load and Save Issues List

This document contains a list of issues raised during the
Last Call period of the DOM Level 3 Load and Save
specification. All comments or issues regarding the
specification or this document must be reported to
www-dom@w3.org (public archives) before July 31, 2003.
After this date, we cannot guarantee that your comments
will be taken into account before moving to Candidate
Recommendation.

Acknowledgment cycle

This interface was called DOMBuilder in the earlier
version(s) of the spec. Is there any specific reason why
the name was changed to DOMParser? The name change to
"DOMParser" is confusing to our users, since we already
have a public class called DOMParser
(oracle.xml.parser.v2.DOMParser), and a quick Google
search suggests that Xerces has one too (namely
org.apache.xerces.parser.DOMParser). If there is no
specific reason for changing the name to DOMParser, we
would prefer that the name be changed back to
DOMBuilder.

Alternatively, the interface could be renamed to
DOMParserLS or DOMBuilderLS (consistent with
DOMImplementationLS, DocumentLS, etc.).

Acknowledgment cycle

"Asynchronous DOMParser objects are expected to also
implement the events::EventTarget interface so that
event listener can be registered on asynchronous
DOMParser objects."

It would be much cleaner and clearer if DOMParser extended
the events::EventTarget interface, instead of expecting
the implementation to also implement and support
EventTarget. It could be argued that a synchronous
DOMParser is not required to implement
events::EventTarget and so should not be forced to
implement it. In that case, a possible solution is to
have a generic DOMParser interface and two other
interfaces, DOMParserSynchronous and
DOMParserAsynchronous, which both extend DOMParser.
DOMParserAsynchronous could then extend the
events::EventTarget interface.
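A minimal sketch of the proposed split, in Java. DOMParser, DOMParserSynchronous and DOMParserAsynchronous below are hypothetical interfaces written only for illustration (the parseURI signature is an assumption); the one real type is org.w3c.dom.events.EventTarget from the JDK's java.xml module. The point is that listener support becomes something the compiler can see, rather than an expectation stated in prose.

```java
import org.w3c.dom.Document;
import org.w3c.dom.events.EventTarget;

// Declared first so the single-file source launcher picks it as the main class.
class ParserHierarchyDemo {
    // True when every object of the given parser type is statically an EventTarget.
    static boolean acceptsListeners(Class<? extends DOMParser> type) {
        return EventTarget.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        System.out.println(acceptsListeners(DOMParserSynchronous.class));  // false
        System.out.println(acceptsListeners(DOMParserAsynchronous.class)); // true
    }
}

// Hypothetical interfaces illustrating the proposal; not defined by any spec.
interface DOMParser {
    // Hypothetical signature, kept minimal for the sketch.
    Document parseURI(String uri);
}

// A synchronous parser carries no obligation to accept event listeners.
interface DOMParserSynchronous extends DOMParser {}

// An asynchronous parser inherits addEventListener, removeEventListener and
// dispatchEvent from EventTarget, so the requirement lives in the type system.
interface DOMParserAsynchronous extends DOMParser, EventTarget {}
```

With this split, code that registers listeners can simply require a DOMParserAsynchronous parameter instead of testing with instanceof at run time.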

Acknowledgment cycle

The spec is not very clear about when progress events are
fired. The spec should probably describe some scenarios
in which a progress event should be fired, or include a
sentence saying that signaling of progress events is
implementation dependent.

Transition history

The spec has been updated to clearly state that this is
implementation dependent. In addition, the spec now
includes an example of how an implementation *might*
dispatch progress events, but that is only an example.

Acknowledgment cycle

It is not clear how parseWithContext interacts with the
DOMBuilderFilter/DOMParserFilter and with the action type
passed to it. Which one takes precedence? Or is the
filter ignored, as if it accepted every node?

Transition history

The WG found numerous problems with the way this error
was defined. The spec now defines an implementation
dependent "unbound-prefix-in-entity" warning on
DOMParser, and a fatal
"unbound-prefix-in-entity-reference" error in
DOMSerializer.

Acknowledgment cycle

1.1 says DOMParserFilter filters only elements, while
1.3 says all kinds of nodes (e.g. attributes and text
nodes) can be filtered. Which is right? The preferred
answer is that of 1.3. Please fix this in the spec.
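For comparison, the Java binding shipped in the JDK (package org.w3c.dom.ls, where DOMParserFilter appears as LSParserFilter) lets getWhatToShow() select node types other than elements. The sketch below assumes that binding and obtains the implementation through JAXP's DocumentBuilder; it drops comment nodes while keeping elements and text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class CommentFilterDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse xml, dropping comment nodes through an LSParserFilter.
    static Document parseWithoutComments(String xml) {
        DOMImplementationLS ls = newLS();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new LSParserFilter() {
            // Elements are always kept; this filter only targets comments.
            public short startElement(Element element) {
                return FILTER_ACCEPT;
            }
            // Invoked for the node types selected by getWhatToShow().
            public short acceptNode(Node node) {
                return node.getNodeType() == Node.COMMENT_NODE ? FILTER_REJECT : FILTER_ACCEPT;
            }
            // Show comment nodes to the filter, not elements or text.
            public int getWhatToShow() {
                return NodeFilter.SHOW_COMMENT;
            }
        });
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    // Count comment nodes anywhere under n.
    static int countComments(Node n) {
        int count = n.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countComments(c);
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parseWithoutComments("<doc><!-- draft --><p>text</p><!-- todo --></doc>");
        System.out.println("comments left: " + countComments(doc));
    }
}
```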

Transition history

Acknowledgment cycle

1.1 says DOMSerializerFilter can be used to filter out
nodes, but 1.3 says that only elements can be
filtered. Why doesn't this interface include attributes?
An example use case: in XForms the 'relevant' attribute
can be set to false on an attribute, which removes it
from the serialization. Please fix this so that
attributes and text nodes can also be filtered out.

Acknowledgment cycle

Why are these interfaces optional? If the claim is right
that they are just convenience methods, they should be
trivial to implement. For users it will be a pain to
check whether an implementation supports these
interfaces. Please fix this by making them mandatory.

Transition history

Acknowledgment cycle

In several places (1.2.3, 1.2.4, DOMInput, DOMOutput),
it is said that UTF-16 is defined in [Unicode] and
Amendment 1 of [ISO/IEC 10646]. The last part is
obsolete: UTF-16 was defined in Amd 1 of 10646:1993, but
was integrated into an Appendix of 10646:2000. Just say
"...in [Unicode] and in [ISO/IEC 10646]".

Acknowledgment cycle

In interface DOMParser, in the first bullet after the
third paragraph, it is wrong to claim that CDATA sections
are structure. It also seems wrong to set the expectation
that CDATA sections will show up after parsing when in
fact parsers are not required to report them.

Acknowledgment cycle

In interface DOMParser, in the description of the
"unbound-namespace-in-entity" warning, how can an
unbound prefix be found in an entity *declaration*?
Perhaps you mean in an entity's replacement text?

Acknowledgment cycle

In interface DOMInput, it says "The DOMParser will use
the DOMInput object to determine how to read data. The
DOMParser will look at the different inputs specified in
the DOMInput in the following order to know which one to
read from, the first one through which data is available
will be used: "

It is not clear how the DOMParser does that, i.e. how it
determines if data is available. Is there an
expectation that, say, DOMInput.characterStream will be
null if data is not available there? What about
stringData? Null or empty? Is this binding-specific?
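In the final Recommendation the interface became LSInput, and the rule was sharpened to use the first input that is neither null nor an empty string, checking characterStream, byteStream, stringData, systemId and publicId in that order. A small sketch, assuming the JDK's org.w3c.dom.ls binding, sets two competing sources on one LSInput to show which one the parser reads.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class InputPrecedenceDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse the input and report the name of the root element that came back.
    static String rootName(DOMImplementationLS ls, LSInput input) {
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        return parser.parse(input).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        DOMImplementationLS ls = newLS();
        LSInput input = ls.createLSInput();
        // Two competing sources on the same LSInput.
        input.setStringData("<from-string/>");
        input.setCharacterStream(new StringReader("<from-stream/>"));
        System.out.println(rootName(ls, input));
    }
}
```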

Acknowledgment cycle

In interface DOMSerializer, the statement "For all other
types of nodes the serialized form is not specified, but
should be something useful to a human for debugging or
diagnostic purposes." seems a bit weak. It should be
possible to specify more, especially for Element nodes.

Transition history

The WG discussed this but decided not to attempt to
clarify this further in the spec. Instead, the WG chose
to replace the above sentence with "For all other types
of nodes the serialized form is implementation
dependent.".

Acknowledgment cycle

In interface DOMSerializer, method writeURI(), it would
be desirable to specify in more detail how to write to a
URI, at least for very common schemes such as HTTP(S)
and mailto.

For HTTP, it would seem desirable to be able to choose
which method (POST or PUT) is used. POST is supposed to
be used when submitting forms, which XForms does with
XML data. PUT is supposed to be used for uploading data,
here an XML document. The DOM user should be able to
specify which to use, perhaps through an additional
parameter to the method.

The spec should also specify to include a Content-Type
header with a media type (which? need a parameter to the
method?) and a charset parameter.

Transition history

The DOM WG discussed this issue and decided to specify
that when writing to an HTTP URI, an HTTP PUT is always
performed. For other types of URIs, the mechanism for
writing the data to the URI is implementation
dependent. The WG did not want to extend the API to let
the user specify a content type, though it did decide to
state in the spec that the implementation is responsible
for associating the appropriate media type with the
serialized data. As for the charset, use
DOMSerializer.write() and specify the charset in the
DOMOutput. (DOMSerializer.writeURI() is now simply a
convenience method that acts as if write() were called
with the URI passed in the DOMOutput argument.)
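A sketch of that resolution, assuming the JDK's org.w3c.dom.ls binding, where writeURI() appears as LSSerializer.writeToURI(Node, String): the same destination can be handed to write() through an LSOutput's systemId, with the charset set on that LSOutput. The URL is a made-up placeholder, and the demo builds the output object without sending anything.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;

class WriteUriDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build the LSOutput that writeToURI(node, uri) is described as using,
    // plus the explicit charset that writeToURI itself cannot take.
    static LSOutput outputFor(DOMImplementationLS ls, String uri, String encoding) {
        LSOutput out = ls.createLSOutput();
        out.setSystemId(uri);      // destination URI
        out.setEncoding(encoding); // output charset chosen by the caller
        return out;
    }

    // Same destination as writeToURI(node, uri), but with a caller-chosen charset.
    static boolean writeToUri(DOMImplementationLS ls, Node node, String uri, String encoding) {
        return ls.createLSSerializer().write(node, outputFor(ls, uri, encoding));
    }

    public static void main(String[] args) {
        DOMImplementationLS ls = newLS();
        // Hypothetical upload URL; nothing is written in this demo.
        LSOutput out = outputFor(ls, "http://example.org/upload/doc.xml", "UTF-8");
        System.out.println(out.getSystemId() + " " + out.getEncoding());
    }
}
```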

Acknowledgment cycle

In interface DOMOutput, the descriptions of encoding and
systemId seem to have been more or less copied from
DOMInput, without fully taking into account that output,
not input, is involved. Setting the encoding indicates an
intention, not knowledge of the encoding of some
existing data.

Transition history

The WG discussed this and concluded that, if anything,
the systemId is relative to the caller's current
location; whether that is possible, and what it means,
is implementation dependent. Therefore, the spec remains
unchanged.

Acknowledgment cycle

Interface DOMParser: character normalization checking is
now controlled by the "check-character-normalization"
parameter of DOMConfiguration, defined in Core. The fact
that the "true" value (do check) is marked as [optional]
(not the default, and not even required to be
implemented) is not acceptable. Whereas Charmod says that
normalization SHOULD be checked, users cannot even find
out whether the "true" value is implemented.
Furthermore, if the default is false, the
DocumentLS.load() and loadXML() methods automatically do
the wrong thing and have no way to do the right thing.

Transition history

Users *are* able to check if the "true" value is
implemented or not. Using the DOMConfiguration object, a
user can call
config.canSetParameter("check-character-normalization",
true), and that will tell them if the implementation
supports character normalization checking. The DOM Level
3 Load and Save (and DOM Level 3 Core) specs do not
*require* that implementations *must* support character
normalization.
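The check described above looks roughly like this in the JDK's Java binding (assumed here; LSParser is the later name for DOMParser). The sketch asks the parser's DOMConfiguration before setting the optional value, so an application can react to a missing feature instead of failing with an exception.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

class NormalizationCheckDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Ask the parser's DOMConfiguration whether a boolean value is supported.
    static boolean canSet(LSParser parser, String name, boolean value) {
        return parser.getDomConfig().canSetParameter(name, Boolean.valueOf(value));
    }

    public static void main(String[] args) {
        LSParser parser = newLS().createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        String name = "check-character-normalization";
        if (canSet(parser, name, true)) {
            parser.getDomConfig().setParameter(name, Boolean.TRUE);
            System.out.println(name + " enabled");
        } else {
            // The optional value is not implemented; the application can detect this.
            System.out.println(name + " not supported; input will not be checked");
        }
    }
}
```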

In the discussion of interface DOMSerializer (above the
IDL definition), it would be nice if character
references were specified to be hexadecimal (preferred)
or decimal, with the choice fixed by the spec rather
than left implementation dependent. Similarly (still
within DOMSerializer), it would be better to specify
that attribute values are always serialized in quotes
(or apostrophes, you choose), with escaping as
necessary.

Transition history

The DOM WG has discussed this before and has always
decided against doing it. If you want canonicalized
output, set the "canonical-form" parameter; if not,
you will get implementation dependent output.
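A sketch of the route the WG suggests, assuming the JDK's org.w3c.dom.ls binding: request "canonical-form" on the serializer's DOMConfiguration only when canSetParameter reports that the value is supported, and otherwise accept implementation dependent output.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSSerializer;

class CanonicalFormDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse an XML string with a synchronous LSParser.
    static Document parse(DOMImplementationLS ls, String xml) {
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null).parse(input);
    }

    // Serialize node, asking for canonical form only where it is supported;
    // otherwise details such as quoting and character references are up to the implementation.
    static String serialize(DOMImplementationLS ls, Node node) {
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter("canonical-form", Boolean.TRUE)) {
            config.setParameter("canonical-form", Boolean.TRUE);
        }
        return serializer.writeToString(node);
    }

    public static void main(String[] args) {
        DOMImplementationLS ls = newLS();
        System.out.println(serialize(ls, parse(ls, "<a x='1'>&#233;</a>")));
    }
}
```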

Acknowledgment cycle

Reluctantly accepted. Given the apparently zero
implementation burden of choosing one way or the other
in the spec, one wonders why the WG resists this. Of
course, the benefit is not great either, but given the
rather severe under-specification of serializing
anything but Documents and Entities, any amount of
predictability would seem desirable...

We would appreciate at least some text encouraging
implementers to use hex for character references,
since that is what all character encoding standards
use.

One of the reasons this request was rejected is that
the WG wants existing DOM serializers to be wrappable
in an LS serializer without changes to the existing
serializer (which may or may not be under the control
of whoever wraps it in an LSSerializer interface),
while still being able to claim compliance (which
would not be possible if the existing serializer wrote
character references in a way that did not follow what
the LS spec requires).

Text encouraging implementers to use hex for character
references was inserted.
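To make the two forms concrete, a small Java sketch (the helper names hexRef and decimalRef are hypothetical) formats the same code point both ways. Both are valid XML character references; the hexadecimal one reuses the digits of the U+XXXX notation found in code charts, which is the argument for preferring it.

```java
class CharRefDemo {
    // Hexadecimal form: digits match the U+XXXX notation of code charts.
    static String hexRef(int codePoint) {
        return String.format("&#x%X;", codePoint);
    }

    // Decimal form: equally valid XML, but the digits differ from the chart notation.
    static String decimalRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        int eAcute = 0xE9; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        System.out.println(hexRef(eAcute));     // &#xE9;
        System.out.println(decimalRef(eAcute)); // &#233;
    }
}
```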

In DOMSerializer, the content of the encoding
pseudo-attribute of the XML (or text) declaration is
underspecified. It should be specified that this MUST
be the encoding actually used for output, whatever
source determined it.

Acknowledgment cycle

In DOMSerializer, method writeURI(): there is no way to
control the encoding used for output. The method itself
has no encoding parameter, and the order of priority is
Document.actualEncoding followed by
Document.xmlEncoding. Since Document.actualEncoding is
read-only, the user has no way to specify the output
encoding, unless Document.actualEncoding happens to be
null. The method should take an additional "encoding"
parameter (nullable, to fall back to actualEncoding and
xmlEncoding).

Transition history

DOMSerializer.writeURI() is merely a convenience method
(and is now defined as such). If you need to pass
encoding information when writing to an IRI, use
DOMSerializer.write() and set the encoding on the
DOMOutput.
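A minimal sketch of that workaround, assuming the JDK's org.w3c.dom.ls binding, where the read-only actualEncoding appears as Document.getInputEncoding(): the caller picks the output encoding on the LSOutput, and the serializer's XML declaration should then name that encoding rather than the document's own.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSOutput;

class OutputEncodingDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize node to bytes in the caller's chosen encoding.
    static byte[] serialize(DOMImplementationLS ls, Node node, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput out = ls.createLSOutput();
        out.setByteStream(bytes);
        out.setEncoding(encoding); // caller's choice, not the read-only document encoding
        ls.createLSSerializer().write(node, out);
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        DOMImplementationLS ls = newLS();
        LSInput input = ls.createLSInput();
        input.setStringData("<a>caf\u00E9</a>");
        Document doc = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null).parse(input);
        // The document's own encoding information is read-only.
        System.out.println("input encoding: " + doc.getInputEncoding());
        byte[] latin1 = serialize(ls, doc, "ISO-8859-1");
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```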

Acknowledgment cycle

Please reconsider this one. It seems to be asking for
non-compatibility of code. I think a minimum of one
encoding should be required for all implementations,
preferably UTF-8; and I really don't think it would be
that onerous to require all three.

While this is sufficient for strict interoperability,
it is not for compatibility of code. If there is not
at least one required encoding, it is not possible to
write a DOM program that will work over any DOM
implementation. We insist that at least UTF-8 be
required. Furthermore, since XML 1.0 did it back in
1998, it cannot be so onerous to require all 3.
Please reconsider.

In DocumentLS.load(), it is said that 'the parameters
used in the DOMParser interface are assumed to have
their default values with the exception that the
parameters "entities", "normalize-characters",
"check-character-normalization" are set to "false".',
which is strange, as the last two of these parameters
default to false anyway.
"check-character-normalization" should default to true
(see the other comment).
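The JDK's org.w3c.dom.ls binding has no DocumentLS, so the sketch below (an assumption, not the spec's definition of load()) configures an LSParser the way the quoted text describes. Only "entities" actually changes, since it defaults to true; setting the other two to false restates their defaults, which is the redundancy noted above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

class LoadDefaultsDemo {
    // Obtain the JDK's Load and Save implementation via JAXP.
    static DOMImplementationLS newLS() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parameters that the quoted text says load() forces to false.
    static final String[] FORCED_FALSE = {
        "entities", "normalize-characters", "check-character-normalization"
    };

    // Defaults everywhere, except the parameters above, set to false where supported.
    static LSParser parserLikeLoad(DOMImplementationLS ls) {
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        DOMConfiguration config = parser.getDomConfig();
        for (String name : FORCED_FALSE) {
            if (config.canSetParameter(name, Boolean.FALSE)) {
                config.setParameter(name, Boolean.FALSE);
            }
        }
        return parser;
    }

    public static void main(String[] args) {
        LSParser parser = parserLikeLoad(newLS());
        System.out.println("entities = " + parser.getDomConfig().getParameter("entities"));
    }
}
```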