The Document Class is in most cases the result of a parsing process. But sometimes it is necessary to create a Document from scratch. The DOM Document Class provides functions that conform to the DOM Core naming style.

It inherits all functions from XML::LibXML::Node as specified in the DOM specification. This enables access to the nodes besides the root element on document level - a DTD for example. The support for these nodes is limited at the moment.

While nodes are generally bound to a document in the DOM concept, it is suggested that one should always create a node not bound to any document. There is no need to actually include the node in the document, but once the node is bound to a document, it is reasonably certain that all strings have the correct encoding. If an unbound text node with an ISO encoded string is created (e.g. with $CLASS->new()), the toString function may not return the expected result.

To prevent such problems, it is recommended to pass all data to XML::LibXML methods as character strings (i.e. UTF-8 encoded, with the UTF8 flag on).
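As a minimal sketch of creating a document from scratch and feeding it character strings, the following builds a one-element document. The element name and the text are purely illustrative, and `use utf8` is assumed so that the literal in the source is a character string.

```perl
use strict;
use warnings;
use utf8;    # string literals below are character strings
use XML::LibXML;

# Create an empty document and declare its output encoding.
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

# Create the root element through the document, so it is bound to it.
my $root = $doc->createElement('greeting');    # illustrative name
$doc->setDocumentElement($root);

# Pass the text as a Perl character string, not as pre-encoded bytes.
$root->appendTextNode('Grüße');

# On a document node, toString returns bytes in the document encoding.
my $xml = $doc->toString;
```

Because the text is handed over as characters, the library takes care of encoding it on output.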

Returns the URI (or filename) of the original document. For documents obtained by parsing a string or a filehandle without using the URI parsing argument of the corresponding parse_* function, the result is a generated string unknown-XYZ where XYZ is some number; for documents created with the constructor new, the URI is undefined.

The value can be modified by calling setURI method on the document node.
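A short sketch of reading and changing the URI, assuming a document created with the constructor. The file URI used here is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

# For a constructor-created document the URI is undefined.
my $before = $doc->URI;

# Assign a URI explicitly (hypothetical location).
$doc->setURI('file:///tmp/example.xml');
my $after = $doc->URI;
```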

Returns the encoding in which the XML will be returned by $doc->toString(). This is usually the original encoding of the document as declared in the XML declaration and returned by $doc->encoding. If the original encoding is not known (e.g. if the document was created in memory or parsed from XML without a declared encoding), 'UTF-8' is returned.

This method changes the encoding declared in the XML declaration of the document. The value also determines the encoding in which the document is serialized to XML by $doc->toString(). Call setEncoding() with no argument to remove the encoding declaration.
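The interplay of the encoding accessors might look like the sketch below. It assumes a document created without an encoding argument, and ISO-8859-1 is just an example target encoding.

```perl
use strict;
use warnings;
use XML::LibXML;

# No encoding is passed, so none is declared.
my $doc = XML::LibXML::Document->new('1.0');
$doc->setDocumentElement($doc->createElement('root'));

# With no declared encoding, the serialization encoding falls back to UTF-8.
my $default = $doc->actualEncoding;

# Declare an encoding; this also changes what toString emits.
$doc->setEncoding('ISO-8859-1');
my $declared = $doc->encoding;
my $xml      = $doc->toString;
```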

This function returns the numerical value of the standalone attribute in a document's XML declaration. It returns 1 if standalone="yes" was found, 0 if standalone="no" was found and -1 if standalone was not specified (the default on creation).

Through this method it is possible to alter the value of a document's standalone attribute. Set it to 1 to set standalone="yes", to 0 to set standalone="no", or to -1 to remove the standalone attribute from the XML declaration.
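The three states described above can be exercised as in this small sketch; the root element name is illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
$doc->setDocumentElement($doc->createElement('root'));

# A freshly created document has no standalone attribute.
my $initial = $doc->standalone;

# Request standalone="yes" in the XML declaration.
$doc->setStandalone(1);
my $with_yes = $doc->toString;

# Remove the attribute again.
$doc->setStandalone(-1);
my $without = $doc->toString;
```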

libxml2 allows reading of documents directly from gzipped files. In this case the compression variable is set to the compression level of that file (0-8). If XML::LibXML parsed a different source or the file wasn't compressed, the returned value will be -1.

If one intends to write the document directly to a file, it is possible to set the compression level for a given document. This level can be in the range from 0 to 8. To keep XML::LibXML from compressing the output, use -1 (the default).

Note that this feature will only work if libxml2 is compiled with zlib support and toFile() is used for output.
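A hedged sketch of writing a compressed file follows. The temporary directory and file name are hypothetical, and whether the output is actually gzipped depends on the zlib support of the libxml2 build in use; without it, the file is expected to be written uncompressed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
$doc->setDocumentElement($doc->createElement('root'));

# Hypothetical output location inside a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.xml.gz";

# Ask for compression; this only takes effect with zlib support and toFile().
$doc->setCompression(6);
$doc->toFile($path);

# Report the level the document now carries.
my $level = $doc->compression;
```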

toString is a DOM serializing function, so the DOM Tree is serialized into an XML string, ready for output.

IMPORTANT: unlike toString for other nodes, on document nodes this function returns the XML as a byte string in the original encoding of the document (see the actualEncoding() method)! This means you can simply do:

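open my $out_fh, '>', $file or die "Cannot open $file: $!";
binmode $out_fh;    # the output is already encoded bytes
print {$out_fh} $doc->toString;
close $out_fh or die "Cannot close $file: $!";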

regardless of the actual encoding of the document. See the section on encodings in XML::LibXML for more details.

The optional $format parameter controls the indentation of the output. It is expected to be an integer and, if given, can take one of three values:

If $format is 0, then the document is dumped as it was originally parsed.

If $format is 1, libxml2 will add ignorable white space, so that the nodes' content is easier to read. Existing text nodes will not be altered.

If $format is 2 (or higher), libxml2 will act as with $format == 1, but it adds a leading and a trailing line break to each text node.

libxml2 uses a hard-coded indentation of 2 space characters per indentation level. This value cannot be altered at run time.
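To illustrate the difference between the first two $format values, this sketch serializes the same small document twice. The input is parsed from a string without any whitespace between elements, so there are no existing text nodes that would interfere with indentation.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<root><child>text</child></root>',
);

# $format == 0: serialized as parsed, with no added whitespace.
my $flat = $doc->toString(0);

# $format == 1: ignorable whitespace is added for readability.
my $pretty = $doc->toString(1);

print $flat, $pretty;
```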

This function is similar to toString(), but it writes the document directly to a filehandle or a stream. A byte stream in the document encoding is passed to the file handle. Do NOT apply any :encoding(...) or :utf8 PerlIO layer to the filehandle! See the section on encodings in XML::LibXML for more details.
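A minimal sketch of writing to a filehandle is shown below. It uses a temporary file from File::Temp as a stand-in for a real destination, and sets the handle to raw mode so that no encoding layer touches the bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<root><child/></root>');

# A throwaway file stands in for the real output destination.
my ($fh, $path) = tempfile(UNLINK => 1);

# Keep the handle free of :encoding(...) or :utf8 layers.
binmode $fh;

# Write the document as bytes; the second argument is the $format value.
$doc->toFH($fh, 0);
close $fh or die "Cannot close $path: $!";
```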

This is an exception-throwing equivalent of is_valid. If the document is not valid, it throws an exception containing the error. This allows much better error reporting than the plain true or false result of is_valid.
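The contrast between the two validation calls can be sketched as follows, assuming a document whose internal DTD subset requires a child element that is missing. The DTD and element names are made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# The DTD demands a <child> inside <root>, which the document lacks.
my $source = <<'XML';
<!DOCTYPE root [
  <!ELEMENT root (child)>
  <!ELEMENT child EMPTY>
]>
<root/>
XML

my $doc = XML::LibXML->load_xml(string => $source);

# is_valid only reports success or failure.
my $ok = $doc->is_valid;

# validate throws, so the reason for the failure can be captured.
my $error;
unless (eval { $doc->validate; 1 }) {
    $error = $@;
}
```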

This function creates and adds an internal subset to the given document. Because the function automatically adds the DTD to the document there is no need to add the created node explicitly to the document.
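As a sketch, assuming the arguments are the root element name, the public identifier and the system identifier (in that order), an internal subset can be attached like this. The XHTML identifiers are used only as a familiar example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
$doc->setDocumentElement($doc->createElement('html'));

# The DTD node is attached to the document by the call itself.
my $dtd = $doc->createInternalSubset(
    'html',
    '-//W3C//DTD XHTML 1.0 Transitional//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd',
);

my $xml = $doc->toString;
```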

If a node is not part of a document, it can be imported to another document. As specified in the DOM Level 2 Specification, the node will not be altered or removed from its original document ($node->cloneNode(1) will get called implicitly).

NOTE: Don't try to use importNode() to import sub-trees that contain an entity reference - even if the entity reference is the root node of the sub-tree. This will cause serious problems to your program. This is a limitation of libxml2 and not of XML::LibXML itself.
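A small sketch of importing a node from one document into another; the element names are illustrative. The source document keeps its node, and the target receives a deep copy.

```perl
use strict;
use warnings;
use XML::LibXML;

my $src = XML::LibXML->load_xml(string => '<a><item>1</item></a>');
my $dst = XML::LibXML->load_xml(string => '<b/>');

my ($item) = $src->documentElement->getChildrenByTagName('item');

# importNode returns a copy owned by $dst; it is not yet in the tree.
my $copy = $dst->importNode($item);
$dst->documentElement->appendChild($copy);
```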

If a node is not part of a document, it can be adopted by another document. As specified in the DOM Level 3 Specification, the node will not be altered, but it will be removed from its original document.

After a document has adopted a node, the node, its attributes and all its descendants belong to the new document. Because the node no longer belongs to the old document, it will be unlinked from its old location first.

NOTE: Don't try to use adoptNode() to adopt sub-trees that contain entity references - even if the entity reference is the root node of the sub-tree. This will cause serious problems to your program. This is a limitation of libxml2 and not of XML::LibXML itself.
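For comparison with importNode(), this sketch adopts a node instead of copying it; the element names are again illustrative. The node leaves its original document and can then be placed in the new one.

```perl
use strict;
use warnings;
use XML::LibXML;

my $src = XML::LibXML->load_xml(string => '<a><item>1</item></a>');
my $dst = XML::LibXML->load_xml(string => '<b/>');

my ($item) = $src->documentElement->getChildrenByTagName('item');

# adoptNode unlinks the node from $src and hands it over to $dst.
$dst->adoptNode($item);
$dst->documentElement->appendChild($item);
```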

Returns the element that has an ID attribute with the given value. If no such element exists, this returns undef.

Note: the ID of an element may change while manipulating the document. For documents with a DTD, the information about ID attributes is only available if DTD loading/validation has been requested. For HTML documents parsed with the HTML parser, ID detection is done automatically. In XML documents, all "xml:id" attributes are considered to be of type ID. You can test ID-ness of an attribute node with $attr->isId().

In versions 1.59 and earlier this method was called getElementsById() (plural) by mistake. Starting from 1.60 this name is maintained as an alias only for backward compatibility.
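A brief sketch of an ID lookup, relying on the xml:id rule described above so that no DTD is needed. The ID values and element names are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<list><item xml:id="a1">first</item><item>second</item></list>',
);

# Look up an element by the value of its ID attribute.
my $found = $doc->getElementById('a1');

# An unknown ID yields undef.
my $missing = $doc->getElementById('no-such-id');
```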

This function causes libxml2 to stamp all elements in a document with their document position index which considerably speeds up XPath queries for large documents. It should only be used with static documents that won't be further changed by any DOM methods, because once a document is indexed, XPath will always prefer the index to other methods of determining the document order of nodes. XPath could therefore return improperly ordered node-lists when applied on a document that has been changed after being indexed. It is of course possible to use this method to re-index a modified document before using it with XPath again. This function is not a part of the DOM specification.

This function returns the number of elements indexed, -1 if an error occurred, or -2 if this feature is not available in the running libxml2.
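A sketch of indexing a static document before running XPath queries on it is given below. The document is tiny and only stands in for a large one; the return value is checked against the error codes listed above.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<r><a/><b/></r>');

# Stamp all elements with their document position.
my $count = $doc->indexElements;

if ($count == -2) {
    warn "element indexing is not available in this libxml2\n";
}
elsif ($count == -1) {
    warn "element indexing failed\n";
}

# The document is not modified after indexing, so XPath order stays reliable.
my @names = map { $_->nodeName } $doc->findnodes('//*');
```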