1.1. Overview of the DOM Core Interfaces

This section defines a minimal set of objects and
interfaces for accessing and manipulating document objects.
The functionality specified in this section (the
Core functionality) should be sufficient to allow
software developers and web script authors to access and
manipulate parsed HTML and XML content inside conforming
products. The DOM Core API also allows population
of a Document object using only DOM API calls; creating
the skeleton Document and saving it persistently is left
to the product that implements the DOM API.

1.1.1. The DOM Structure Model

The DOM presents documents as a hierarchy of Node
objects that also implement other, more specialized interfaces. Some
types of nodes may have child nodes of various types, and others are
leaf nodes that cannot have anything below them in the document
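structure. The node types, and which node types they may have as
children, are as follows:

Document: Element (maximum of one), ProcessingInstruction, Comment, DocumentType

DocumentFragment: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference

DocumentType: no children

EntityReference: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference

Element: Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference

Attr: Text, EntityReference

ProcessingInstruction: no children

Comment: no children

Text: no children

CDATASection: no children

Entity: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference

Notation: no children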

The DOM also specifies a NodeList interface to handle
ordered lists of Nodes, such as the children of a
Node, or the elements returned by the
Element.getElementsByTagName method, and also a
NamedNodeMap interface to handle unordered sets of nodes
referenced by their name attribute, such as the attributes of an
Element. NodeLists and
NamedNodeMaps in the DOM are "live", that is, changes to
the underlying document structure are reflected in all relevant
NodeLists and NamedNodeMaps. For example, if
a DOM user gets a NodeList object containing the children
of an Element, then subsequently adds more children to
that element (or removes children, or modifies them), those changes are
automatically reflected in the NodeList without further
action on the user's part. Likewise changes to a Node in
the tree are reflected in all references to that Node in
NodeLists and NamedNodeMaps.
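The "live" behaviour can be observed through the Java binding (org.w3c.dom) that ships with the JDK. The following is a minimal sketch, assuming the JDK's built-in parser obtained through JAXP's DocumentBuilderFactory (a Java-specific bootstrap, not part of DOM Level 1); the class and method names are hypothetical. It reads the length of a single NodeList object before and after a child is appended, without fetching the list again.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LiveNodeListDemo {
    // Parses an XML string with the JDK's default DOM parser (bootstrapped via JAXP).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the length of ONE NodeList object before and after the tree changes.
    static int[] lengthsBeforeAndAfterAppend() {
        Document doc = parse("<root><a/><b/></root>");
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes(); // fetched once, never re-fetched
        int before = children.getLength();
        root.appendChild(doc.createElement("c")); // mutate the element, not the list
        int after = children.getLength();         // the same list reflects the new child
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] n = lengthsBeforeAndAfterAppend();
        System.out.println(n[0] + " -> " + n[1]);
    }
}
```

With the JDK's parser this prints "2 -> 3": the NodeList is a view onto the element's children rather than a copy.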

1.1.2. Memory Management

Most of the APIs defined by this specification are
interfaces rather than classes. That means that
an actual implementation need only expose methods with
the defined names and specified operation, not actually
implement classes that correspond directly to the interfaces.
This allows the DOM APIs to be implemented as a thin veneer on top
of legacy applications with their own data structures, or
on top of newer applications with different class hierarchies.
This also means that ordinary constructors (in the Java or C++
sense) cannot be used to create DOM objects, since the
underlying objects to be constructed may have little relationship
to the DOM interfaces. The conventional solution to this in
object-oriented design is to define factory methods
that create instances of objects that implement the various
interfaces. In the DOM Level 1, objects implementing some
interface "X" are created by a "createX()" method on the
Document interface; this is because all DOM objects live
in the context of a specific Document.
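As a sketch of the factory-method pattern in the Java binding: assuming JAXP is used to obtain the initial empty Document (DOM Level 1 leaves that step to the implementation), every other node below comes from a create method on that Document. The Java names do not follow "createX()" literally for every interface; Text nodes, for instance, come from createTextNode. The class name is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class FactoryMethodsDemo {
    // Builds <greeting><!--note-->Hello</greeting> using only factory methods.
    static Document build() {
        Document doc;
        try {
            // Bootstrapping step: Java-specific (JAXP), outside DOM Level 1.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element greeting = doc.createElement("greeting"); // Element factory
        Comment note = doc.createComment("note");         // Comment factory
        Text hello = doc.createTextNode("Hello");         // Text factory
        greeting.appendChild(note);
        greeting.appendChild(hello);
        doc.appendChild(greeting);
        return doc;
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        // getTextContent (a later DOM Level, available in the JDK) skips comments.
        System.out.println(root.getTagName() + ": " + root.getTextContent());
    }
}
```

Every node created this way already knows its owning Document, which is why the factories live on Document rather than being free-standing constructors.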

The DOM Level 1 API does not define a standard
way to create DOMImplementation or Document
objects; actual DOM implementations must provide
some proprietary way of bootstrapping these DOM interfaces, and
then all other objects can be built from the Create methods on
Document (or by various other convenience methods).

The Core DOM APIs are designed to be compatible with a wide
range of languages, including both general-user scripting languages and
the more challenging languages used mostly by professional programmers.
Thus, the DOM
APIs need to operate across a variety of memory management
philosophies, from language platforms that do not expose memory
management to the user at all, through those (notably Java) that
provide explicit constructors but provide an automatic garbage
collection mechanism to automatically reclaim unused memory,
to those (especially C/C++) that generally require the
programmer to explicitly allocate object memory, track where
it is used, and explicitly free it for re-use. To ensure a
consistent API across these platforms, the DOM does not
address memory management issues at all,
but instead leaves these for the
implementation. Neither of the explicit language bindings
devised by the DOM Working Group (for ECMAScript and Java)
requires any memory management methods, but DOM bindings for
other languages (especially C or C++) probably will require
such support. These extensions will be the responsibility of
those adapting the DOM API to a specific language, not the DOM
WG.

1.1.3. Naming Conventions

While it would
be nice to have attribute and method names that are short,
informative, internally consistent, and familiar to users of
similar APIs, the names also should not clash with the names
in legacy APIs supported by DOM implementations.
Furthermore, both OMG IDL and ECMAScript have
significant limitations in their ability to disambiguate names
from different namespaces that make it difficult to avoid naming
conflicts with short, familiar names. So, DOM names tend to be
long and quite descriptive in order to be unique across all
environments.

The Working Group has also attempted to be internally
consistent in its use of various terms, even though these may
not be common distinctions in other APIs. For example, we use
the method name "remove" when the method changes the
structural model, and the method name "delete" when the method
gets rid of something inside the structure model. The thing
that is deleted is not returned. The thing that is removed may
be returned, when it makes sense to return it.
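The remove/delete convention is visible in the Java binding: removeChild changes the structure model and returns the detached node, while deleteData edits character data inside a node and returns nothing. This is a minimal sketch using the JDK's DOM implementation; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class RemoveVsDeleteDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {text of the removed node, text left in the list afterwards}.
    static String[] run() {
        Document doc = parse("<list><item>one</item><item>two</item></list>");
        Element list = doc.getDocumentElement();

        // "remove": a structural change; the detached node is handed back.
        Node removed = list.removeChild(list.getFirstChild());

        // "delete": a change inside the structure; deleteData is void.
        Text two = (Text) list.getFirstChild().getFirstChild();
        two.deleteData(0, 1); // "two" becomes "wo"

        return new String[] {removed.getTextContent(), list.getTextContent()};
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + " / " + r[1]);
    }
}
```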

1.1.4. Inheritance vs Flattened Views of the API

The DOM Core APIs present two somewhat different sets of
interfaces to an XML/HTML document; one presenting an "object
oriented" approach with a hierarchy of inheritance, and a
"simplified" view that allows all manipulation to be done via
the Node interface without requiring casts (in
Java and other C-like languages) or query interface calls in
COM environments. These operations are fairly expensive in Java and
COM, and the DOM may be used in performance-critical
environments, so we allow significant functionality using just the
Node interface. Because many other users will find the
inheritance hierarchy easier to understand than the
"everything is a Node" approach to the DOM, we also
support the full higher-level interfaces for those who prefer a more
object-oriented API.

In practice, this means that there is a certain amount of
redundancy in the API. The Working Group considers the
"inheritance" approach the primary view of the API, and the
full set of functionality on Node to be "extra"
functionality that users may employ, but that does not eliminate
the need for methods on other interfaces that an
object-oriented analysis would dictate. (Of course, when the
O-O analysis yields an attribute or method that is
identical to one on the Node interface, we don't
specify a completely redundant one). Thus, even though there
is a generic nodeName attribute on the Node
interface, there is still a tagName attribute on the
Element interface; these two attributes must
contain the same value, but the Working Group considers it
worthwhile to support both, given the different constituencies
the DOM API must satisfy.
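Both views can be exercised side by side in Java. The sketch below, using the JDK's DOM implementation with hypothetical class and method names, reads an element's name once through the flattened Node interface (no cast) and once through Element after a type check and cast; the two values are identical. It also shows that non-element nodes have fixed nodeName values such as "#text".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NameViewsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {nodeName via Node, tagName via Element, nodeName of the first child}.
    static String[] names(String xml) {
        Node root = parse(xml).getDocumentElement(); // held only as the generic Node type

        String generic = root.getNodeName(); // flattened view: no cast required

        String specific = null;
        if (root.getNodeType() == Node.ELEMENT_NODE) {
            specific = ((Element) root).getTagName(); // inheritance view: cast first
        }

        String childName = root.getFirstChild().getNodeName(); // a Text node
        return new String[] {generic, specific, childName};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", names("<para>text</para>")));
    }
}
```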

1.1.5. The DOMString type

To ensure interoperability, the DOM specifies the
DOMString type as follows:

A DOMString is a sequence of 16-bit
quantities. This may be expressed in IDL terms as:

typedef sequence<unsigned short> DOMString;

Applications must encode DOMString using UTF-16
(defined in Appendix C.3 of [UNICODE] and Amendment 1 of
[ISO-10646]). The UTF-16 encoding was chosen because of its widespread
industry practice. Please note that for both HTML and XML, the document
character set (and therefore the notation of numeric character
references) is based on UCS-4. A single numeric character reference in
a source document may therefore in some cases correspond to two array
positions in a DOMString (a high surrogate and a low
surrogate).

Note: Even though the DOM defines the name of the string type to
be DOMString, bindings may use different names. For
example, in Java, DOMString is bound to the
String type because it also uses UTF-16 as its
encoding.
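The surrogate-pair case can be checked directly in Java, where DOMString maps to String. This sketch, using the JDK's parser and hypothetical names, parses a single numeric character reference to U+1D11E (a character outside the Basic Multilingual Plane) and inspects the resulting text: it occupies two 16-bit positions but counts as one code point.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class Utf16Demo {
    // Text produced by one numeric character reference to U+1D11E.
    static String gClefText() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<r>&#x1D11E;</r>")))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s = gClefText();
        System.out.println(s.length());                      // 16-bit units: 2
        System.out.println(s.codePointCount(0, s.length())); // characters: 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))
                && Character.isLowSurrogate(s.charAt(1)));   // a surrogate pair
    }
}
```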

Note: As of August 1998, the OMG IDL specification included a
wstring type. However, that definition did not meet the
interoperability criteria of the DOM API since it relied on encoding
negotiation to decide the width of a character.

1.1.6. Case sensitivity in the DOM

The DOM has many interfaces that imply string matching.
HTML processors generally assume an uppercase (less often,
lowercase) normalization of names for such things as
elements, while XML is explicitly case sensitive. For the
purposes of the DOM, string matching takes place on a character
code by character code basis, on the 16-bit values of a
DOMString. As such, the DOM assumes that any
normalizations will take place in the processor,
before the DOM structures are built.
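Code-unit matching has two visible consequences, sketched below with the JDK's DOM implementation (class and method names are hypothetical): in XML, getElementsByTagName distinguishes "item", "Item" and "ITEM"; and a precomposed "e with acute" (U+00E9) does not match "e" followed by a combining acute accent (U+0301), because no Unicode normalization happens at match time.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CaseSensitivityDemo {
    // Sample with three case variants and one element name using precomposed U+00E9.
    static final String SAMPLE = "<root><item/><Item/><ITEM/><caf\u00E9/></root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of elements whose name matches exactly, 16-bit unit by 16-bit unit.
    static int count(String name) {
        return parse(SAMPLE).getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) {
        System.out.println(count("item"));       // exact case only
        System.out.println(count("caf\u00E9"));  // same code units as the source
        System.out.println(count("cafe\u0301")); // canonically equivalent, different units
    }
}
```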

This then raises the issue of exactly what normalizations
occur. The W3C I18N working group is in the process of
defining exactly which normalizations are necessary for applications
implementing the DOM.

1.2. Fundamental Interfaces

The interfaces within this section are considered
fundamental, and must be fully implemented by all
conforming implementations of the DOM, including all HTML DOM
implementations.

DOM operations only raise exceptions in "exceptional"
circumstances, i.e., when an operation is impossible
to perform (either for logical reasons, because data is lost, or
because the implementation has become unstable). In general, DOM methods
return specific error values in ordinary
processing situations, such as out-of-bounds errors when using
NodeList.

Implementations may raise other exceptions under other circumstances.
For example, implementations may raise an implementation-dependent
exception if a null argument is passed.

Some languages and object systems do not support the concept of
exceptions. For such systems, error conditions may be indicated using
native error reporting mechanisms. For some bindings, for example, methods
may return error codes similar to those listed in the corresponding method
descriptions.
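The split between ordinary error values and exceptions is easy to see in the Java binding. In this sketch (JDK DOM implementation; class and method names are hypothetical), an out-of-range NodeList index simply yields null, whereas trying to give a Text node a child is impossible and raises a DOMException whose code is HIERARCHY_REQUEST_ERR.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ErrorReportingDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Ordinary situation: an invalid index returns null instead of throwing.
    static Node itemOutOfRange() {
        NodeList kids = parse("<r><a/></r>").getDocumentElement().getChildNodes();
        return kids.item(kids.getLength() + 5);
    }

    // Impossible operation: Text nodes cannot have children.
    static short appendToTextErrorCode() {
        Document doc = parse("<r>hi</r>");
        Node text = doc.getDocumentElement().getFirstChild();
        try {
            text.appendChild(doc.createElement("x"));
            return 0; // not reached with a conforming implementation
        } catch (DOMException e) {
            return e.code; // expected: DOMException.HIERARCHY_REQUEST_ERR
        }
    }

    public static void main(String[] args) {
        System.out.println(itemOutOfRange());
        System.out.println(appendToTextErrorCode() == DOMException.HIERARCHY_REQUEST_ERR);
    }
}
```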

The DOMImplementation interface provides a
number of methods for performing operations that are independent
of any particular instance of the document object model.

The DOM Level 1 does not specify a way of creating a
document instance, and hence document creation is an operation
specific to an implementation. Future Levels of the DOM specification
are expected to provide methods for creating documents directly.

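The hasFeature method tests whether the DOM implementation implements
a specific feature.

Parameters

feature

The package name of the feature to
test. In Level 1, the legal values are "HTML" and
"XML" (case-insensitive).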

version

This is the version number of the package name to
test. In Level 1, this is the string "1.0".
If the version is not specified, supporting any version of the
feature will cause the method to return true.

Return Value

true if the feature is implemented in the specified
version, false otherwise.
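A sketch of querying hasFeature in Java, assuming the DOMImplementation is obtained from a JAXP DocumentBuilder (a Java-specific route; the class and method names here are hypothetical). Answers are implementation-dependent; the JDK's built-in implementation supports later DOM Levels as well, so it also accepts versions beyond "1.0". The calls below show the Level 1 query, case-insensitive feature names, and an unspecified (null) version.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

class HasFeatureDemo {
    static DOMImplementation implementation() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean supports(String feature, String version) {
        return implementation().hasFeature(feature, version);
    }

    public static void main(String[] args) {
        System.out.println(supports("XML", "1.0"));
        System.out.println(supports("xml", "1.0")); // feature names are case-insensitive
        System.out.println(supports("XML", null));  // no version: any version counts
    }
}
```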

DocumentFragment is a "lightweight" or
"minimal" Document object. It is very common to want to be able to
extract a portion of a document's tree or to create a new fragment of
a document. Imagine implementing a user command like cut or
rearranging a document by moving fragments around. It is
desirable to have an object which can hold such fragments and it
is quite natural to use a Node for this purpose. While it is
true that a Document object could fulfil this role,
a Document object can potentially be a heavyweight
object, depending on the underlying implementation. What is really
needed for this is a very lightweight object.
DocumentFragment is such an object.

Furthermore, various operations -- such as inserting nodes as
children of another Node -- may take
DocumentFragment objects as arguments; this
results in all the child nodes of the DocumentFragment
being moved to the child list of this node.

The children of a DocumentFragment node are zero
or more nodes representing the tops of any sub-trees defining
the structure of the document. DocumentFragment nodes do not
need to be well-formed XML documents (although they do need to
follow the rules imposed upon well-formed XML parsed entities,
which can have multiple top nodes).
For example, a DocumentFragment might have only one child and
that child node could be a Text node. Such a structure model
represents neither an HTML document nor a well-formed XML document.

When a DocumentFragment is inserted into a
Document (or indeed any other Node that may take children)
the children of the DocumentFragment and not the DocumentFragment
itself are inserted into the Node. This makes the DocumentFragment
very useful when the user wishes to create nodes that are siblings;
the DocumentFragment acts as the parent of these nodes so that the
user can use the standard methods from the Node
interface, such as insertBefore() and
appendChild().
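The sibling-building pattern described above looks like this in Java. This is a minimal sketch with the JDK's DOM implementation (class and method names are hypothetical): two item elements are collected under a DocumentFragment, and a single appendChild then moves both into the target element, leaving the fragment empty.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

class DocumentFragmentDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {children of the target after insertion, children left in the fragment}.
    static int[] run() {
        Document doc = newDocument();
        Element list = doc.createElement("list");
        doc.appendChild(list);

        // Collect two siblings under a lightweight, parentless container.
        DocumentFragment frag = doc.createDocumentFragment();
        frag.appendChild(doc.createElement("item"));
        frag.appendChild(doc.createElement("item"));

        // The fragment's children are moved; the fragment itself is not inserted.
        list.appendChild(frag);
        return new int[] {list.getChildNodes().getLength(), frag.getChildNodes().getLength()};
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]);
    }
}
```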

The Document interface represents the entire
HTML or XML document. Conceptually, it is the root of the
document tree, and provides the primary access to the
document's data.

Since elements, text nodes, comments, processing instructions,
etc. cannot exist outside the context of a
Document, the Document interface also
contains the factory methods needed to create these objects.
The Node objects created have an ownerDocument
attribute which associates them with the Document within whose
context they were created.

The doctype attribute is the Document Type Declaration (see
DocumentType) associated with this document. For HTML documents, as
well as XML documents without a document type declaration, this
returns null. The DOM Level 1 does not support editing the Document
Type Declaration; therefore doctype cannot be altered in any way.

The createElement method creates an element of the type specified.
Note that the instance returned implements the Element interface, so
attributes can be specified directly on the returned
object.

Parameters

tagName

The name of the element type to
instantiate. For XML, this is case-sensitive. For HTML, the
tagName parameter may be provided in any case,
but it must be mapped to the canonical uppercase form by
the DOM implementation.
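A minimal Java sketch of createElement on an XML Document (JDK DOM implementation; class and method names are hypothetical). It shows that the XML tag name keeps its case exactly, that an attribute can be set straight on the returned Element, and that the new element is tied to its Document through ownerDocument while having no parent until it is inserted.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CreateElementDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates a detached element in a fresh XML document and sets one attribute on it.
    static Element create(String name, String id) {
        Document doc = newDocument();
        Element e = doc.createElement(name); // XML: name used exactly as given
        e.setAttribute("id", id);            // attributes set directly on the result
        return e;
    }

    public static void main(String[] args) {
        Element e = create("MyElement", "e1");
        System.out.println(e.getTagName());                // case preserved
        System.out.println(e.getOwnerDocument() != null);  // associated with its Document
        System.out.println(e.getParentNode() == null);     // not yet in the tree
    }
}
```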

The Node interface is the primary datatype for the
entire Document Object Model. It represents a single node in the
document tree. While all objects implementing the
Node interface expose methods for dealing with
children, not all objects implementing the Node
interface may have children. For example, Text
nodes may not have children, and adding children to such nodes
results in a DOMException being raised.

The attributes nodeName, nodeValue
and attributes are
included as a mechanism to get at node information without
casting down to the specific derived interface. In cases where
there is no obvious mapping of these attributes for a specific
nodeType (e.g., nodeValue for an Element
or attributes
for a Comment), this returns null. Note that the
specialized interfaces may contain
additional and more convenient mechanisms to get and set the relevant
information.
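The null results for unmapped attributes can be checked with the JDK's DOM implementation. In this sketch (class and method names are hypothetical), nodeValue is null for an Element and attributes is null for a Comment, while the same generic attributes return real data where a mapping exists.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class GenericAttributesDemo {
    // Root element of <r a="1"><!--note--></r>.
    static Element root() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<r a=\"1\"><!--note--></r>")))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element r = root();
        Node comment = r.getFirstChild();
        System.out.println(r.getNodeValue());              // no mapping for an Element
        System.out.println(comment.getAttributes());       // no mapping for a Comment
        System.out.println(comment.getNodeValue());        // the comment text
        System.out.println(r.getAttributes().getLength()); // the element's one attribute
    }
}
```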

The parentNode attribute is the parent of this node. All nodes
except Document, DocumentFragment, and
Attr may have a parent. However, if a
node has just been created and not yet added to the tree, or if it has
been removed from the tree, this is null.

The childNodes attribute is a NodeList that contains all
children of this node. If there are no children, this is a
NodeList containing no nodes. The content of the
returned NodeList is "live" in the
sense that, for instance, changes to the children of the node object
that it was created from are immediately reflected in the nodes
returned by the NodeList accessors; it is not a
static snapshot of the content of the node. This is true for every
NodeList, including the ones returned by the
getElementsByTagName method.

The cloneNode method returns a duplicate of this node, i.e., serves
as a generic copy constructor for nodes. The duplicate node has no
parent (parentNode returns null).

Cloning an Element copies
all attributes and their values, including those generated by the
XML processor to represent defaulted attributes, but this method does
not copy any text it contains unless it is a deep clone, since the text
is contained in a child Text node. Cloning any other type of
node simply returns a copy of this node.

Parameters

deep

If true, recursively clone the subtree under the
specified node; if false, clone only the node itself (and
its attributes, if it is an Element).
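The difference between shallow and deep cloning of an Element, sketched with the JDK's DOM implementation (class and method names are hypothetical): both clones copy the class attribute, only the deep clone copies the Text child, and neither clone has a parent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CloneNodeDemo {
    // Root element of <p class="x">text</p>.
    static Element source() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<p class=\"x\">text</p>")))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = source();
        Element shallow = (Element) p.cloneNode(false); // node and its attributes only
        Element deep = (Element) p.cloneNode(true);     // the whole subtree
        System.out.println(shallow.getAttribute("class"));   // attributes always copied
        System.out.println(shallow.hasChildNodes());         // Text child not copied
        System.out.println(deep.getTextContent());           // Text child copied
        System.out.println(deep.getParentNode() == null);    // clones have no parent
    }
}
```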

Objects implementing the NamedNodeMap interface are
used to represent collections of nodes that can be accessed by name. Note
that NamedNodeMap does not inherit from
NodeList; NamedNodeMaps are not maintained in
any particular order. Objects contained in an object implementing
NamedNodeMap may also be accessed by an ordinal index, but
this is simply to allow convenient enumeration of the contents of a
NamedNodeMap, and does not imply that the DOM specifies an
order to these Nodes.

The setNamedItem method adds a node using its nodeName attribute.
As the nodeName attribute is used to
derive the name which the node must be stored under, multiple
nodes of certain types (those that have a "special" string
value) cannot be stored, as the names would clash. This is seen
as preferable to allowing nodes to be aliased.

Parameters

arg

A node to store in a named node map. The node will
later be accessible using the value of the
nodeName attribute of the node. If a node with that
name is already present in the map, it is replaced
by the new one.

Return Value

If the new Node replaces an existing node with the
same name the previously existing Node is returned,
otherwise null is returned.
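The replace-and-return behaviour of setNamedItem, sketched with an element's attribute map in the JDK's DOM implementation (class and method names are hypothetical). Storing a first "id" Attr returns null; storing a second Attr with the same nodeName replaces it and returns the first one, and the map still holds a single entry.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class SetNamedItemDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {result of 1st setNamedItem, result of 2nd, the first Attr, the element}.
    static Node[] run() {
        Document doc = newDocument();
        Element el = doc.createElement("e");
        NamedNodeMap attrs = el.getAttributes();

        Attr first = doc.createAttribute("id");
        first.setValue("one");
        Attr second = doc.createAttribute("id");
        second.setValue("two");

        Node r1 = attrs.setNamedItem(first);  // no "id" stored yet
        Node r2 = attrs.setNamedItem(second); // same nodeName: replaces the first
        return new Node[] {r1, r2, first, el};
    }

    public static void main(String[] args) {
        Node[] r = run();
        System.out.println(r[0] == null);
        System.out.println(r[1] == r[2]);
        System.out.println(((Element) r[3]).getAttribute("id"));
    }
}
```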

The CharacterData interface extends Node with a set of attributes
and methods for accessing character data in the DOM. For clarity
this set is defined here rather than on each object that uses these
attributes and methods. No DOM objects correspond directly to
CharacterData, though Text and others do inherit the interface
from it. All offsets in this interface start from 0.

The data attribute holds the character data of the node
that implements this interface. The DOM implementation may not
put arbitrary limits on the amount of data that may be stored in a
CharacterData node. However, implementation limits may
mean that the entirety of a node's data may not fit into a single
DOMString. In such cases, the user may call
substringData to retrieve the data in appropriately sized
pieces.

The replaceData method replaces the characters starting at the
specified character offset with the specified string.

Parameters

offset

The offset from which to start replacing.

count

The number of characters to replace. If the sum of
offset and count exceeds
length, then all characters to the end of the data
are replaced (i.e., the effect is the same as a
deleteData method call with the same range, followed
by an appendData method invocation).
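A small Java sketch of replaceData on a Text node (JDK DOM implementation; class and method names are hypothetical). It covers an in-range replacement, a replacement at offset 0, and a count that runs past the end of the data, and compares the last case with the deleteData-then-appendData sequence it is equivalent to.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Text;

class ReplaceDataDemo {
    static Text text(String data) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createTextNode(data);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String replace(String data, int offset, int count, String arg) {
        Text t = text(data);
        t.replaceData(offset, count, arg);
        return t.getData();
    }

    // The two-step equivalent for a count that reaches past the end of the data.
    static String deleteThenAppend(String data, int offset, int count, String arg) {
        Text t = text(data);
        t.deleteData(offset, count); // removes everything from offset to the end
        t.appendData(arg);
        return t.getData();
    }

    public static void main(String[] args) {
        System.out.println(replace("Hello, world", 7, 5, "DOM"));
        System.out.println(replace("Hello, world", 0, 5, "Howdy"));
        System.out.println(replace("Hello, world", 7, 100, "DOM"));
        System.out.println(deleteThenAppend("Hello, world", 7, 100, "DOM"));
    }
}
```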

The Attr interface represents an attribute in an Element object.
Typically the allowable values for the attribute are defined in a document
type definition.

Attr objects inherit the Node
interface, but since they are not actually child nodes of the element
they describe, the DOM does not consider them part of the document
tree. Thus, the Node attributes parentNode,
previousSibling, and nextSibling have a
null value for Attr objects. The DOM takes the
view that attributes are properties of elements rather than having a
separate identity from the elements they are associated with;
this should make it more efficient to implement
such features as default attributes associated with all elements of a
given type. Furthermore, Attr
nodes may not be immediate children of a DocumentFragment.
However, they can be associated with Element nodes contained within
a DocumentFragment.
In short, users and implementors of the DOM need to be aware that
Attr nodes have some things in
common with other objects inheriting the Node interface,
but they also are quite distinct.

The attribute's effective value is determined as follows: if this
attribute has been explicitly assigned any value, that value is the
attribute's effective value; otherwise, if there is a declaration for
this attribute, and that declaration includes a default value, then
that default value is the attribute's effective value; otherwise, the
attribute does not exist on this element in the structure model until
it has been explicitly added. Note that the nodeValue
attribute on the Attr instance can also be used to
retrieve the string version of the attribute's value(s).

In XML, where the value of an attribute can contain entity references,
the child nodes of the Attr node provide a representation in
which entity references are not expanded. These child nodes may be either
Text or EntityReference nodes. Because the
attribute type may be unknown, there are no tokenized attribute values.

The specified flag is true if this attribute was explicitly given a
value in the original document; otherwise, it is false.
Note that the implementation is in charge of this attribute, not the
user. If the user changes the value of the attribute (even if it ends up
having the same value as the default value) then the specified
flag is automatically flipped to true. To re-specify the
attribute as the default value from the DTD, the user must delete the
attribute. The implementation will then make a new attribute available
with specified set to false and the default value
(if one exists).

In summary:

If the attribute has an assigned value in the document then
specified is true, and the value is the
assigned value.

If the attribute has no assigned value in the document and has
a default value in the DTD, then specified is false,
and the value is the default value in the DTD.

If the attribute has no assigned value in the document and has
a value of #IMPLIED in the DTD, then the attribute does not appear
in the structure model of the document.
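The specified flag can be observed by declaring a default in an internal DTD subset. This is a sketch, assuming the JDK's non-validating parser supplies DTD defaults from the internal subset (as XML processors are required to); the class and method names are hypothetical. The defaulted lang attribute reports specified as false and has no parent node; after setAttribute assigns the same value explicitly, specified becomes true.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SpecifiedFlagDemo {
    // The internal subset gives the lang attribute a default value of "en".
    static final String XML =
            "<!DOCTYPE root [<!ELEMENT root EMPTY><!ATTLIST root lang CDATA \"en\">]><root/>";

    static Element root() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = root();
        Attr lang = root.getAttributeNode("lang");
        System.out.println(lang.getSpecified());          // value came from the DTD default
        System.out.println(lang.getValue());              // the default value
        System.out.println(lang.getParentNode() == null); // Attr nodes are not in the tree
        root.setAttribute("lang", "en");                  // same value, now explicit
        System.out.println(root.getAttributeNode("lang").getSpecified());
    }
}
```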

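Assume the following XML document:

<elementExample id="demo">
  <subelement1/>
  <subelement2><subsubelement/></subelement2>
</elementExample>

When represented using DOM, the top node is an Element node
for "elementExample", which contains two child Element
nodes, one for "subelement1" and one
for "subelement2". "subelement1" contains no
child nodes.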

Elements may have attributes associated with them; since the
Element interface inherits from Node, the generic
Node interface method getAttributes may be used
to retrieve the set of all attributes for an element. There are methods on
the Element interface to retrieve either an Attr
object by name or an attribute value by name. In XML, where an attribute
value may contain entity references, an Attr object should be
retrieved to examine the possibly fairly complex sub-tree representing the
attribute value. On the other hand, in HTML, where all attributes have
simple string values, methods to directly access an attribute value can
safely be used as a convenience.

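The tagName attribute is the name of the element. For the
"elementExample" element described above, tagName has the value
"elementExample". Note that this is
case-preserving in XML, as are all of the operations of the DOM.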
The HTML DOM returns the tagName of an HTML element
in the canonical uppercase form, regardless of the case in the
source HTML document.
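The element structure described above can be checked in Java. This sketch uses the JDK's parser and hypothetical names; the markup is assumed for illustration (the nested subsubelement is a stand-in for whatever subelement2 contains), and inter-element whitespace is omitted so that no Text nodes appear among the children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementStructureDemo {
    // Assumed markup; no whitespace between tags, so no Text children.
    static final String XML = "<elementExample><subelement1/>"
            + "<subelement2><subsubelement/></subelement2></elementExample>";

    static Element root() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts only Element children, skipping any other node types.
    static int elementChildCount(Element e) {
        int n = 0;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Element root = root();
        System.out.println(root.getTagName());
        System.out.println(elementChildCount(root));
        System.out.println(root.getFirstChild().hasChildNodes());
    }
}
```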

The setAttribute method adds a new attribute. If an attribute with
that name is already present in the element, its value is changed to
be that of the value parameter. This value is a simple string; it is
not parsed as it is being set, so any markup (such as syntax to be recognized as an entity
reference) is treated as literal text, and needs to be appropriately
escaped by the implementation when it is written out. In order to assign
an attribute value that contains entity references, the user must create
an Attr node plus any Text and
EntityReference nodes, build the appropriate subtree, and
use setAttributeNode to assign it as the value of an
attribute.
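The "not parsed" rule is visible when a value that looks like an entity reference is set and the document is then serialized. This sketch uses the JDK's DOM and its built-in identity Transformer (javax.xml.transform, which is outside the DOM itself); the class and method names are hypothetical. The attribute stores the literal characters "a &lt; b", and the serializer escapes the ampersand on output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SetAttributeDemo {
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {value as stored, serialized document}.
    static String[] run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element e = doc.createElement("e");
        doc.appendChild(e);
        e.setAttribute("title", "a &lt; b"); // stored verbatim, not parsed as markup
        return new String[] {e.getAttribute("title"), serialize(doc)};
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0]);
        System.out.println(r[1]);
    }
}
```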

The normalize method puts all Text nodes in the full depth of the
sub-tree underneath this Element into a "normal" form
where only markup (e.g., tags, comments, processing instructions, CDATA
sections, and entity references) separates Text nodes,
i.e., there are no adjacent Text nodes. This can be used
to ensure that the DOM view of a document is the same as if it were
saved and re-loaded, and is useful when operations (such as XPointer
lookups) that depend on a particular document tree structure are to be
used.
This method has no parameters.
This method returns nothing.
This method raises no exceptions.
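A minimal Java sketch of normalize (JDK DOM implementation; class and method names are hypothetical). Two adjacent Text nodes are appended to one element by hand, which is possible through the API but has no markup representation; normalize merges them into a single node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    // An element holding two adjacent Text nodes.
    static Element twoTextNodes() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("Hello, "));
        p.appendChild(doc.createTextNode("world"));
        return p;
    }

    public static void main(String[] args) {
        Element p = twoTextNodes();
        System.out.println(p.getChildNodes().getLength()); // before: two Text nodes
        p.normalize();
        System.out.println(p.getChildNodes().getLength()); // after: one Text node
        System.out.println(p.getFirstChild().getNodeValue());
    }
}
```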

The Text interface represents the textual
content (termed character
data
in XML) of an Element or Attr.
If there is no markup inside an element's content, the text is contained
in a single object implementing the Text interface that
is the only child of the element. If there is markup, it is parsed into
a list of elements and Text nodes that form the list of
children of the element.

When a document is first made available via the DOM, there is
only one Text node for each block of text. Users may create
adjacent Text nodes that represent the
contents of a given element without any intervening markup, but
should be aware that there is no way to represent the separations
between these nodes in XML or HTML, so they will not (in general)
persist between DOM editing sessions. The normalize()
method on Element merges any such adjacent Text
objects into a single node for each block of text; this is
recommended before employing operations that depend on a particular
document structure, such as navigation with XPointers.

The splitText method breaks this Text node into two Text nodes at the
specified offset, keeping both in the tree as siblings. This node then
contains only the content up to the offset point, and a new Text node,
which is inserted as the next sibling of this node, contains all the
content at and after the offset point.
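A short Java sketch of splitText (JDK DOM implementation; class and method names are hypothetical). Splitting "Hello, world" at offset 7 leaves "Hello, " in the original node and returns a new node holding "world", already inserted as its next sibling.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class SplitTextDemo {
    // Returns {original node after the split, new node returned by splitText}.
    static Text[] split(String data, int offset) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        Text head = doc.createTextNode(data);
        p.appendChild(head);               // give the node a parent so both stay in the tree
        Text tail = head.splitText(offset);
        return new Text[] {head, tail};
    }

    public static void main(String[] args) {
        Text[] t = split("Hello, world", 7);
        System.out.println("[" + t[0].getData() + "][" + t[1].getData() + "]");
        System.out.println(t[0].getNextSibling() == t[1]);
    }
}
```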

This represents the content of a comment, i.e., all the
characters between the starting '<!--' and
ending '-->'. Note that this is the definition
of a comment in XML, and, in practice, HTML, although some HTML
tools may implement the full SGML comment structure.

IDL Definition

interface Comment : CharacterData {
};

1.3. Extended Interfaces

The interfaces defined here form part of the DOM Level 1
Core specification, but objects that expose these
interfaces will never be encountered in a DOM implementation
that deals only with HTML. As such, HTML-only DOM
implementations do not need to have objects that implement
these interfaces.

CDATA sections are used to escape blocks of text containing
characters that would otherwise be regarded as markup. The only
delimiter that is recognized in a CDATA section is the "]]>" string
that ends the CDATA section. CDATA sections cannot be
nested. The primary purpose is for including
material such as XML fragments, without needing to escape all
the delimiters.

The DOMString attribute of the
Text node holds the text that is contained by the CDATA
section. Note that this may contain characters
that need to be escaped outside of CDATA sections and that, depending on
the character encoding ("charset") chosen for serialization, it may be
impossible to write out some characters as part of a CDATA section.

The CDATASection interface inherits the
CharacterData interface through the Text
interface. Adjacent CDATASection nodes are not merged by
use of the Element.normalize() method.
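The contrast between CDATA sections and plain Text under normalize can be sketched in Java (JDK DOM implementation; class and method names are hypothetical). The element below holds two adjacent CDATA sections followed by two adjacent Text nodes; after normalize, only the Text nodes have been merged, leaving three children.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CdataNormalizeDemo {
    // Two adjacent CDATA sections followed by two adjacent Text nodes.
    static Element build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element e = doc.createElement("e");
        e.appendChild(doc.createCDATASection("a < b"));
        e.appendChild(doc.createCDATASection("c & d"));
        e.appendChild(doc.createTextNode("x"));
        e.appendChild(doc.createTextNode("y"));
        return e;
    }

    public static void main(String[] args) {
        Element e = build();
        e.normalize();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            System.out.println(kids.item(i).getNodeName() + ": " + kids.item(i).getNodeValue());
        }
    }
}
```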

Each Document has a doctype attribute
whose value is either null or a DocumentType
object. The DocumentType interface in the DOM Level 1 Core
provides an interface to the list of entities that are defined
for the document, and little else, because the effect of
namespaces and the various XML schema efforts on DTD
representation is not clearly understood as of this writing.
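A sketch of reading the doctype attribute and its entity list in Java, assuming the JDK's parser records general entities declared in the internal subset (class and method names are hypothetical). A document with a DOCTYPE yields a DocumentType whose entities map contains the declared entity; a document without one yields null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DocumentTypeDemo {
    // Declares one internal general entity named "sig".
    static final String WITH_DTD = "<!DOCTYPE note [<!ENTITY sig \"Regards\">]><note/>";

    static DocumentType doctype(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentType dt = doctype(WITH_DTD);
        System.out.println(dt.getName());                          // name from the DOCTYPE
        System.out.println(dt.getEntities().getLength());          // declared entities
        System.out.println(dt.getEntities().getNamedItem("sig") != null);
        System.out.println(doctype("<note/>"));                    // no DOCTYPE: null
    }
}
```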

This interface represents a notation declared in the DTD. A notation
either declares, by name, the format of an unparsed entity (see section 4.7
of the XML 1.0 specification), or is used for formal declaration of
Processing Instruction targets (see section 2.6 of the XML 1.0
specification). The nodeName attribute inherited from
Node is set to the declared name of the notation.

The DOM Level 1 does not support editing Notation
nodes; they are therefore readonly.

This interface represents an entity, either parsed or
unparsed, in an XML document. Note that this models the entity
itself, not the entity declaration. Entity
declaration modeling has been left for a later Level of the DOM
specification.

The nodeName attribute that is inherited from
Node contains the name of the entity.

An XML processor may choose to completely expand entities before
the structure model is passed to the DOM; in this case there will
be no EntityReference nodes in the document tree.
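In JAXP the choice between expanding entities and keeping EntityReference nodes is exposed as DocumentBuilderFactory.setExpandEntityReferences, a Java-specific switch that is not part of DOM Level 1. This sketch assumes the JDK's built-in parser; the class and method names are hypothetical. With expansion on, the element contains only Text; with it off, the first child is an EntityReference named "who" whose children carry the replacement text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EntityReferenceDemo {
    static final String XML = "<!DOCTYPE r [<!ENTITY who \"world\">]><r>&who;</r>";

    // Parses XML with entity expansion switched on or off.
    static Element root(boolean expand) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setExpandEntityReferences(expand);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node expanded = root(true).getFirstChild();
        Node kept = root(false).getFirstChild();
        System.out.println(expanded.getNodeType() == Node.TEXT_NODE);
        System.out.println(kept.getNodeType() == Node.ENTITY_REFERENCE_NODE);
        System.out.println(kept.getNodeName() + " -> " + kept.getTextContent());
    }
}
```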

XML does not mandate that a non-validating XML processor read
and process entity declarations made in the external subset or
declared in external parameter entities. This means
that parsed entities declared in the external subset
need not be expanded by some classes of applications, and that
the replacement value of the entity may not be available. When the
replacement value is available, the corresponding
Entity node's child list represents the structure of
that replacement text. Otherwise, the child list is empty.

The resolution of the children of the Entity (the
replacement value) may be lazily evaluated; actions by the user (such as
reading the childNodes attribute of the
Entity node) are assumed to trigger the evaluation.

The DOM Level 1 does not support editing Entity
nodes; if a user wants to make changes to the contents of an
Entity, every related EntityReference node
has to be replaced in the structure model by a clone of the
Entity's contents, and then the desired changes must be made
to each of those clones instead. All the descendants of an
Entity node are readonly.

EntityReference objects may be inserted into the
structure model when an entity reference is in the source document,
or when the user wishes to insert an entity reference. Note that
character references and references to predefined entities are
considered to be expanded by the HTML or XML
processor so that characters are represented by their Unicode
equivalent rather than by an entity reference. Moreover, the XML
processor may completely expand references to entities while building the
structure model, instead of providing EntityReference
objects. If it does provide such objects, then for a given
EntityReference node, it may be that there is no
Entity node representing the referenced entity;
but if such an Entity exists, then the child list of the
EntityReference node is the same as that of the
Entity node. As with the Entity node, all
descendants of the EntityReference are readonly.

The resolution of the children of the EntityReference (the
replacement value of the referenced Entity) may be lazily
evaluated; actions by the user (such as reading the
childNodes attribute of the EntityReference node)
are assumed to trigger the evaluation.