<?xml version="1.0" encoding="UTF-8"?>
<book lang="en-us">
<title>XML::LibXML</title>
<bookinfo>
<authorgroup>
<author>
<firstname>Matt</firstname>
<surname>Sergeant</surname>
</author>
<author>
<firstname>Christian</firstname>
<surname>Glahn</surname>
</author>
<author>
<firstname>Petr</firstname>
<surname>Pajas</surname>
</author>
</authorgroup>
<edition>1.64</edition>
<copyright>
<year>2001-2007</year>
<holder>AxKit.com Ltd; 2002-2006 Christian Glahn; 2006-2007 Petr Pajas</holder>
</copyright>
</bookinfo>
<chapter>
<title>Introduction</title>
<titleabbrev>README</titleabbrev>
<para>This module implements a Perl interface to the Gnome
libxml2 library which provides
interfaces for parsing and manipulating XML files. This
module allows Perl programmers to make use of the highly
capable validating XML parser and the high performance DOM
implementation.</para>
<sect1>
<title>Important Notes</title>
<para>XML::LibXML was almost entirely reimplemented between version 1.40 and version 1.49. This may cause problems on some production machines. With
version 1.50 a lot of compatibility fixes were applied, so programs written for XML::LibXML 1.40 or earlier should run with version 1.50 again.</para>
<para>In 1.59, a new callback API was introduced. This new API is not compatible with the previous one.
See the XML::LibXML::InputCallback manual page for details.</para>
<para>In 1.61 the XML::LibXML::XPathContext module, previously distributed separately, was merged in.</para>
</sect1>
<sect1>
<title>Dependencies</title>
<para>Prior to installation you MUST have installed the libxml2 library. You can get the latest libxml2 version from</para>
<para>http://xmlsoft.org/</para>
<para>Without libxml2 installed this module will neither build nor run.</para>
<para>Also XML::LibXML requires the following packages:</para>
<itemizedlist>
<listitem>
<para>XML::LibXML::Common - general functions used by various XML::LibXML modules</para>
</listitem>
<listitem>
<para>XML::SAX - DOM building support from SAX</para>
</listitem>
<listitem>
<para>XML::NamespaceSupport - DOM building support from SAX</para>
</listitem>
</itemizedlist>
<para>These packages are required. If one is missing some tests will fail.</para>
<para>Again, libxml2 is required to make XML::LibXML work. The library is not just required to build XML::LibXML; it has to be accessible at
run-time as well. Because of this you need to make sure libxml2 is installed properly. To test this, run the xmllint program on your system. xmllint
is shipped with libxml2 and therefore should be available.
For building the module you will also need the libxml2 header files, which in binary
distributions (.rpm, .deb, etc.) usually reside in a package named libxml2-devel or similar.</para>
</sect1>
<sect1>
<title>Installation</title>
<para>To install XML::LibXML just follow the standard installation routine for Perl modules:</para>
<orderedlist>
<listitem>
<para>perl Makefile.PL</para>
</listitem>
<listitem>
<para>make</para>
</listitem>
<listitem>
<para>make test</para>
</listitem>
<listitem>
<para>make install # as superuser</para>
</listitem>
</orderedlist>
<para>Note that XML::LibXML is an XS based Perl extension and you need a C compiler
to build it.</para>
<para>Note also that you should rebuild XML::LibXML if you upgrade libxml2
in order to avoid problems with possible binary incompatibilities between releases of the library.</para>
<sect2>
<title>Notes on libxml2 versions</title>
<para>XML::LibXML requires at least
libxml2 2.6.16 to compile and pass all tests and
at least 2.6.21 is required for XML::LibXML::Reader.
For some older OS versions this means that an
update of the pre-built packages is required.</para>
<para>Although libxml2 claims binary compatibility between
its patch levels, it is a good idea to recompile XML::LibXML and
XML::LibXML::Common and run its tests after an upgrade of libxml2.
</para>
<para>If your libxml2 installation is not within your $PATH,
you can pass the XMLPREFIX=$YOURLIBXMLPREFIX parameter to Makefile.PL
so that it can determine the correct libxml2 version in use, e.g.
</para>
<programlisting> perl Makefile.PL XMLPREFIX=/usr/brand-new </programlisting>
<para>will ask '/usr/brand-new/bin/xml2-config' about your real libxml2 configuration.</para>
<para>Try to avoid setting INC and LIBS directly on the
command-line. If they are set, Makefile.PL does not check
the libxml2 version for compatibility with XML::LibXML.</para>
</sect2>
<sect2>
<title>Which version of libxml2 should be used?</title>
<para>XML::LibXML is tested against several versions of
libxml2 before it is released. As a result, some versions
of libxml2 are known not to work properly with
XML::LibXML. The Makefile.PL keeps a blacklist of
the incompatible libxml2 versions.</para>
<para>If Makefile.PL detects one of the incompatible versions,
it notifies the user. It may still happen that
XML::LibXML builds and passes its tests with such
a version, but that does not mean everything
is OK. There will be no support at all for blacklisted versions!</para>
<para>As of XML::LibXML 1.61, only versions 2.6.16 and higher are supported.
XML::LibXML will probably not compile with libxml2 versions earlier than
2.5.6. Versions prior to 2.6.8 are known to be broken for various reasons,
and versions prior to 2.6.16 exhibit problems with namespaced attributes
and therefore do not pass the XML::LibXML regression tests.
</para>
<para>It may happen that an unsupported version of libxml2
passes all tests under certain conditions. This is no
reason to assume that it will work without problems.
If Makefile.PL marks a version of libxml2 as incompatible or broken,
it does so for a good reason.</para>
</sect2>
<sect2>
<title>Notes for Microsoft Windows</title>
<para>Thanks to Randy Kobes there is a pre-compiled PPM package available on</para>
<para>http://theoryx5.uwinnipeg.ca/ppmpackages/</para>
<para>Usually it takes a little time to build the package for the latest release.</para>
</sect2>
<sect2>
<title>Notes for Mac OS X</title>
<para>Due to the refactoring of the module, XML::LibXML will not
run with some earlier versions of Mac OS X. It appears that this is related
to special linker options for that OS prior to version
10.2.2. Since the developers do not have full access to this OS,
help and patches from OS X gurus are highly
appreciated.</para>
<para>It is confirmed that XML::LibXML builds and runs
without problems on Mac OS X 10.2.6 and later.</para>
</sect2>
<sect2>
<title>Notes for HPUX</title>
<para>XML::LibXML requires libxml2 2.6.16 or
later. A usable binary libxml2 package for HPUX
and XML::LibXML may not exist. If
HPUX cc does not compile libxml2
correctly, you will be forced to recompile perl with
gcc (unless you have already done that).</para>
<para>Additionally, I received the following note from Rozi Kovesdi:</para>
<programlisting>Here is my report if someone else runs into the same problem:
Finally I am done with installing all the libraries and XML Perl
modules
The combination that worked best for me was:
gcc
GNU make
Most importantly - before trying to install Perl modules that depend on
libxml2:
must set SHLIB_PATH to include the path to libxml2 shared library
assuming that you used the default:
export SHLIB_PATH=/usr/local/lib
also, make sure that the config files have execute permission:
/usr/local/bin/xml2-config
/usr/local/bin/xslt-config
they did not have +x after they were installed by 'make install'
and it took me a while to realize that this was my problem
or one can use:
perl Makefile.PL LIBS='-L/path/to/lib' INC='-I/path/to/include'</programlisting>
</sect2>
</sect1>
<sect1>
<title>Contact</title>
<para>For bug reports, please use the CPAN request tracker on http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML</para>
<para>For suggestions etc. you may contact the maintainer directly at "pajas at ufal dot mff dot cuni dot cz", but in general, it is recommended to use the mailing list given below.
</para>
<para>For suggestions etc., and other issues
related to XML::LibXML you may use the perl XML mailing list
(<email>perl-xml@listserv.ActiveState.com</email>),
where most XML-related Perl modules are discussed.
In case of problems you should check the archives of that
list first. Many problems are already discussed there. You
can find the list's archives and subscription options at
http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml</para>
</sect1>
<sect1>
<title>Package History</title>
<para>Versions &lt; 0.98 were maintained by Matt Sergeant</para>
<para>Versions &gt;= 0.98 and &lt; 1.49 were maintained by Matt Sergeant and Christian Glahn</para>
<para>Versions &gt;= 1.49 are maintained by Christian Glahn</para>
<para>Versions &gt; 1.56 are co-maintained by Petr Pajas</para>
<para>Versions &gt;= 1.59 are provisionally maintained by Petr Pajas</para>
</sect1>
<sect1>
<title>Patches and Developer Version</title>
<para>As XML::LibXML is open source software, help and
patches are appreciated. If you find a bug in the current
release, make sure this bug still exists in the developer
version of XML::LibXML. This version can be downloaded
from its Subversion repository, e.g. via</para>
<para>svn co svn://axkit.org/XML-LibXML/trunk</para>
<para>Note that this account does not allow direct commits.</para>
<para>Please consider all regression tests as correct. If
any test fails it is most certainly related to a
bug.</para>
<para>If you find documentation bugs, please fix them in
the libxml.dbk file, stored in the docs directory.</para>
</sect1>
<sect1>
<title>Known Issues</title>
<para>The push-parser implementation causes memory leaks.</para>
</sect1>
</chapter>
<chapter>
<title>License</title>
<titleabbrev>LICENSE</titleabbrev>
<para>This is free software, you may use it and distribute it under the same terms as Perl itself.</para>
<para>Copyright 2001-2003 AxKit.com Ltd, All rights reserved.</para>
<sect1>
<title>Disclaimer</title>
<para>THIS PROGRAM IS DISTRIBUTED IN THE HOPE THAT IT WILL
BE USEFUL, BUT WITHOUT ANY WARRANTY; WITHOUT EVEN THE
IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.</para>
</sect1>
</chapter>
<chapter>
<title>Perl Binding for libxml2</title>
<titleabbrev>XML::LibXML</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
my $parser = XML::LibXML-&gt;new();
my $doc = $parser-&gt;parse_string(&lt;&lt;'EOT');
&lt;some-xml/&gt;
EOT</programlisting>
</sect1>
<sect1>
<title>Description</title>
<para>This module is an interface to the gnome libxml2 DOM and SAX parser and the DOM tree. It also provides an XML::XPath-like findnodes()
interface, providing access to the XPath API in libxml2. The module is split into several packages which are not described in this section.</para>
<para>For further information, please check the following documentation:</para>
<variablelist>
<varlistentry>
<term>XML::LibXML::Parser</term>
<listitem>
<para>Parsing XML Files with XML::LibXML</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::DOM</term>
<listitem>
<para>XML::LibXML DOM Implementation</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::SAX</term>
<listitem>
<para>XML::LibXML direct SAX parser</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Reader</term>
<listitem>
<para>Reading XML with a pull-parser</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Document</term>
<listitem>
<para>XML::LibXML DOM Document Class</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Node</term>
<listitem>
<para>Abstract Base Class of XML::LibXML Nodes</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Element</term>
<listitem>
<para>XML::LibXML Class for Element Nodes</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Text</term>
<listitem>
<para>XML::LibXML Class for Text Nodes</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Comment</term>
<listitem>
<para>XML::LibXML Comment Nodes</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::CDATASection</term>
<listitem>
<para>XML::LibXML Class for CDATA Sections</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Attr</term>
<listitem>
<para>XML::LibXML Attribute Class</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::DocumentFragment</term>
<listitem>
<para>XML::LibXML's DOM L2 Document Fragment Implementation</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Namespace</term>
<listitem>
<para>XML::LibXML Namespace Implementation</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::PI</term>
<listitem>
<para>XML::LibXML Processing Instructions</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Dtd</term>
<listitem>
<para>XML::LibXML DTD Support</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::RelaxNG</term>
<listitem>
<para>XML::LibXML frontend for RelaxNG schema validation</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Schema</term>
<listitem>
<para>XML::LibXML frontend for W3C Schema schema validation</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::XPathContext</term>
<listitem>
<para>API for evaluating XPath expressions</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXMLguts</term>
<listitem>
<para>Internals of the Perl Layer for libxml2 (not done yet)</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>Version Information</title>
<para>Sometimes it is useful to find out which version
of libxml2 XML::LibXML was compiled for. In most cases this
is for debugging or to check whether a given installation provides
all the functionality required by a package. The functions
XML::LibXML::LIBXML_DOTTED_VERSION and
XML::LibXML::LIBXML_VERSION provide this version
information. Both functions simply pass through the values
of the similarly named macros of libxml2.
Similarly, XML::LibXML::LIBXML_RUNTIME_VERSION returns
the version of the (usually dynamically) linked libxml2.
</para>
<variablelist>
<varlistentry>
<term>XML::LibXML::LIBXML_DOTTED_VERSION</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the version string of the
libxml2 version XML::LibXML was compiled
for. This will be "2.6.2" for "libxml2
2.6.2".</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::LIBXML_VERSION</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$Version_ID = XML::LibXML::LIBXML_VERSION;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the version id of the libxml2
version XML::LibXML was compiled for. This
will be "20602" for "libxml2 2.6.2". Do not confuse
this version id with
$XML::LibXML::VERSION. The latter contains the
version of XML::LibXML itself while the former
contains the version of libxml2 XML::LibXML
was compiled for.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::LIBXML_RUNTIME_VERSION</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns a version string of the libxml2
which is (usually dynamically) linked by
XML::LibXML. This will be "20602" for libxml2
released as "2.6.2" and something like
"20602-CVS2032" for a CVS build of
libxml2.</para>
<para>XML::LibXML issues a warning if the version
of libxml2 dynamically linked to it is less than the version of libxml2
which it was compiled against.
</para>
</listitem>
</varlistentry>
</variablelist>
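<para>As a minimal sketch of how the three functions might be used together, the following script prints the compile-time and run-time libxml2 versions and reports when the linked library is older. The variable names and the regular expression that strips a suffix such as "-CVS2032" are illustrative assumptions, not part of XML::LibXML.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Version of libxml2 that XML::LibXML was compiled for
my $dotted   = XML::LibXML::LIBXML_DOTTED_VERSION();   # e.g. "2.6.2"
my $compiled = XML::LibXML::LIBXML_VERSION();          # e.g. 20602

# Version of the libxml2 linked at run time; it may carry a suffix
my $runtime = XML::LibXML::LIBXML_RUNTIME_VERSION();
my ($runtime_id) = $runtime =~ /^(\d+)/;

print "XML::LibXML $XML::LibXML::VERSION, compiled for libxml2 $dotted ($compiled)\n";
print "libxml2 linked at run time: $runtime\n";
print "run-time libxml2 is older than the compile-time version\n"
    if defined $runtime_id &amp;&amp; $runtime_id &lt; $compiled;</programlisting>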
</sect1>
<sect1>
<title>Related Modules</title>
<para>The modules described in this section are not part of the XML::LibXML package itself. As they support some additional features, they are
mentioned here.</para>
<variablelist>
<varlistentry>
<term>XML::LibXSLT</term>
<listitem>
<para>XSLT Processor using libxslt and XML::LibXML</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Common</term>
<listitem>
<para>Common functions for XML::LibXML related Classes</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::Iterator</term>
<listitem>
<para>XML::LibXML Implementation of the DOM Traversal Specification</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>XML::LibXML and XML::GDOME</title>
<para>Note: <emphasis>THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL</emphasis></para>
<para>Although both modules make use of libxml2's XML capabilities, their DOM implementations are not compatible. It is still
possible to exchange nodes from one DOM to the other. The concept of this exchange is similar to the function cloneNode(): the particular
node is copied at a low level to the opposite DOM implementation.</para>
<para>Since the DOM implementations cannot coexist within one document, one is forced to copy each node that should be used. Because you are always
keeping two nodes, this may have a considerable impact on a machine's memory usage.</para>
<para>XML::LibXML provides two functions to export or import GDOME nodes: import_GDOME() and export_GDOME(). Both functions take two parameters: the
node and a flag for recursive import. The flag works as in cloneNode().</para>
<para>The two functions allow you to export and import XML::GDOME nodes explicitly. XML::LibXML also allows the transparent import of
XML::GDOME nodes in functions such as appendChild(), insertAfter() and so on. While native nodes are automatically adopted in most functions,
XML::GDOME nodes are always cloned in advance. Thus, if the original node is modified after the operation, the node in the XML::LibXML document will
not reflect the change.</para>
<variablelist>
<varlistentry>
<term>import_GDOME</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$libxmlnode = XML::LibXML-&gt;import_GDOME( $node, $deep );</funcsynopsisinfo>
</funcsynopsis>
<para>This explicitly clones an XML::GDOME node into an XML::LibXML node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>export_GDOME</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$gdomenode = XML::LibXML-&gt;export_GDOME( $node, $deep );</funcsynopsisinfo>
</funcsynopsis>
<para>This clones an XML::LibXML node into an XML::GDOME node.</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>CONTACTS</title>
<para>For bug reports, please use the CPAN request tracker on http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML</para>
<para>For suggestions etc., and other issues
related to XML::LibXML you may use the perl XML mailing list
(<email>perl-xml@listserv.ActiveState.com</email>),
where most XML-related Perl modules are discussed.
In case of problems you should check the archives of that
list first. Many problems are already discussed there. You
can find the list's archives and subscription options at
<ulink url="http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml">http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml</ulink>.
</para>
</sect1>
</chapter>
<chapter>
<title>Parsing XML Data with XML::LibXML</title>
<titleabbrev>XML::LibXML::Parser</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
my $parser = XML::LibXML-&gt;new();
my $doc = $parser-&gt;parse_string(&lt;&lt;'EOT');
&lt;some-xml/&gt;
EOT
my $fdoc = $parser-&gt;parse_file( $xmlfile );
my $fhdoc = $parser-&gt;parse_fh( $xmlstream );
my $fragment = $parser-&gt;parse_balanced_chunk( $xml_wb_chunk );</programlisting>
</sect1>
<sect1>
<title>Parsing</title>
<para>An XML document is read into a data structure such as a DOM tree by a piece of software called a parser. XML::LibXML currently provides four
different parser interfaces:</para>
<itemizedlist>
<listitem>
<para>A DOM Pull-Parser</para>
</listitem>
<listitem>
<para>A DOM Push-Parser</para>
</listitem>
<listitem>
<para>A SAX Parser</para>
</listitem>
<listitem>
<para>A DOM based SAX Parser.</para>
</listitem>
</itemizedlist>
<sect2>
<title>Creating a Parser Instance</title>
<para>XML::LibXML provides an OO interface to the libxml2 parser functions. Thus you have to create a parser instance before you can parse any
XML data.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser = XML::LibXML-&gt;new();</funcsynopsisinfo>
</funcsynopsis>
<para>There is nothing much to say about the constructor. It simply creates a new parser instance.</para>
<para>Although libxml2 uses mainly global flags to alter the behaviour of the parser, each XML::LibXML parser instance has its own
flags or callbacks and does not interfere with other instances.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>DOM Parser</title>
<para>One of the common parser interfaces of XML::LibXML is the DOM parser. This parser reads XML data into a DOM-like data structure, so each
tag can be accessed and transformed.</para>
<para>XML::LibXML's DOM parser is capable of parsing not only XML data, but also (strict) HTML files. There are three ways to parse
documents: as a string, as a Perl filehandle, or as a filename/URL. The return value from each is an XML::LibXML::Document object, which is a DOM
object.</para>
<para>All of the functions listed below will throw an exception if the document is invalid. To prevent this from terminating your program, wrap
the call in an eval{} block.</para>
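<para>A minimal sketch of this pattern, assuming a deliberately broken input string chosen for illustration, could look like this:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new();

# The closing tag of &lt;item&gt; is missing, so the parser throws an exception
my $doc = eval { $parser-&gt;parse_string('&lt;root&gt;&lt;item&gt;&lt;/root&gt;') };

if ( my $err = $@ ) {
    # The program keeps running and can report or handle the problem
    warn "could not parse document: $err";
}
else {
    print $doc-&gt;toString();
}</programlisting>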
<variablelist>
<varlistentry>
<term>parse_file</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;parse_file( $xmlfilename );</funcsynopsisinfo>
</funcsynopsis>
<para>This function parses an XML document from a file or network;
$xmlfilename can be either a filename or a URL.
Note that for parsing files, this function is the fastest choice,
about 6-8 times faster than parse_fh().
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_fh</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;parse_fh( $io_fh );</funcsynopsisinfo>
</funcsynopsis>
<para>parse_fh() parses an IOREF or a subclass of IO::Handle.</para>
<para>Because the data comes from an open handle, libxml2's parser does not know about the base URI of the document. To set the
base URI one should use parse_fh() as follows:</para>
<programlisting>my $doc = $parser-&gt;parse_fh( $io_fh, $baseuri );</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_string</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;parse_string( $xmlstring);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is similar to parse_fh(), but it parses an XML document that is available as a single string in memory. Again,
you can pass an optional base URI to the function.</para>
<programlisting>my $doc = $parser-&gt;parse_string( $xmlstring, $baseuri );</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_html_file</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;parse_html_file( $htmlfile, \%opts );</funcsynopsisinfo>
</funcsynopsis>
<para>Similar to parse_file() but parses HTML (strict) documents;
$htmlfile can be a filename or a URL.
</para>
<para>An optional second argument can be
used to pass some options to the HTML
parser as a HASH reference. Possible
options are:
encoding and URI for libxml2 &lt;
2.6.27, and for later versions of
libxml2 additionally: recover,
suppress_errors, suppress_warnings,
pedantic_parser, no_blanks, and
no_network.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_html_fh</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;parse_html_fh( $io_fh, \%opts );</funcsynopsisinfo>
</funcsynopsis>
<para>Similar to parse_fh() but parses HTML (strict) streams.</para>
<para>
An optional second argument can be used
to pass some options to the HTML parser
as a HASH reference. Possible options
are: encoding and URI for libxml2 &lt;
2.6.27, and for later versions of
libxml2 additionally: recover,
suppress_errors, suppress_warnings,
pedantic_parser, no_blanks, and
no_network. Note: encoding option may
not work correctly with this function
in libxml2 &lt; 2.6.27 if the HTML file
declares charset using a META tag.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_html_string</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;parse_html_string( $htmlstring, \%opts );</funcsynopsisinfo>
</funcsynopsis>
<para>Similar to parse_string() but parses HTML (strict) strings.</para>
<para>An optional second argument can be used to pass some options to the
HTML parser as a HASH reference. Possible options are:
encoding and URI for libxml2 &lt; 2.6.27, and for later versions of libxml2 additionally:
recover, suppress_errors, suppress_warnings, pedantic_parser,
no_blanks, and no_network.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>Parsing HTML may cause problems, especially if
the ampersand ('&amp;') is used. This is a common
problem if HTML code is parsed that contains links to
CGI-scripts. Such links cause the parser to throw
errors. In such cases libxml2 still parses the entire
document as if there were no error, but the error causes
XML::LibXML to stop the parsing process. However, the
document is not lost. Such HTML documents should be
parsed using the <emphasis>recover</emphasis> flag. By
default recovering is deactivated.</para>
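<para>As a hedged sketch of the case described above, the following script parses an HTML string whose link contains a bare ampersand, with recovering switched on through the parser's recover() method. The HTML snippet and its URL are made up for illustration, and the parser may still print the errors it recovered from.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical HTML with an unescaped '&amp;' in a CGI-style link
my $html = '&lt;html&gt;&lt;body&gt;'
         . '&lt;a href="/cgi-bin/search?q=xml&amp;page=2"&gt;next page&lt;/a&gt;'
         . '&lt;/body&gt;&lt;/html&gt;';

my $parser = XML::LibXML-&gt;new();
$parser-&gt;recover(1);    # keep the document even if errors are reported

my $doc = $parser-&gt;parse_html_string($html);
print $doc-&gt;toStringHTML();</programlisting>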
<para>The functions described above are implemented to
parse well-formed documents. In some cases a program
gets well-balanced XML instead of well-formed
documents (e.g. an XML fragment from a database). With
XML::LibXML it is not necessary to wrap such fragments
in the code, because XML::LibXML is also capable of
parsing well-balanced XML fragments.</para>
<variablelist>
<varlistentry>
<term>parse_balanced_chunk</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$fragment = $parser-&gt;parse_balanced_chunk( $wbxmlstring );</funcsynopsisinfo>
</funcsynopsis>
<para>This function parses a well balanced XML string into a XML::LibXML::DocumentFragment.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_xml_chunk</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$fragment = $parser-&gt;parse_xml_chunk( $wbxmlstring );</funcsynopsisinfo>
</funcsynopsis>
<para>This is the old name of parse_balanced_chunk(). Because it may cause confusion with the push parser interface, this function
should not be used anymore.</para>
</listitem>
</varlistentry>
</variablelist>
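<para>A short sketch of how a well-balanced fragment might be parsed and attached to a document follows. The element names and the sample string are illustrative assumptions.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new();

# Two sibling elements without a common root: well balanced, not well formed
my $fragment = $parser-&gt;parse_balanced_chunk(
    '&lt;item&gt;one&lt;/item&gt;&lt;item&gt;two&lt;/item&gt;'
);

# Build a small document and append the fragment's children to its root
my $doc  = XML::LibXML::Document-&gt;new( '1.0', 'UTF-8' );
my $root = $doc-&gt;createElement('list');
$doc-&gt;setDocumentElement($root);
$root-&gt;appendChild($fragment);

print $doc-&gt;toString();</programlisting>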
<para>By default XML::LibXML does not process XInclude tags within an XML document (see options section below). XML::LibXML allows a
document to be post-processed to expand XInclude tags.</para>
<variablelist>
<varlistentry>
<term>process_xincludes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;process_xincludes( $doc );</funcsynopsisinfo>
</funcsynopsis>
<para>After a document is parsed into a DOM structure, you may want to expand the document's XInclude tags. This function processes
the given document structure and expands all XInclude tags (or throws an error) by using the flags and callbacks of the given parser
instance.</para>
<para>Note that the resulting tree contains some extra nodes (of type XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully
processing the document. These nodes indicate where data was included into the original tree. If the document is serialized, these
extra nodes will not show up.</para>
<para>Remember: A document with processed XIncludes differs from the original document after serialization, because the original
XInclude tags will not get restored!</para>
<para>If the parser flag "expand_xincludes" is set to 1, you need not post-process the parsed document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>processXIncludes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;processXIncludes( $doc );</funcsynopsisinfo>
</funcsynopsis>
<para>This is an alias for process_xincludes, provided under a Java-like function name.</para>
</listitem>
</varlistentry>
</variablelist>
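<para>The following sketch shows one way post-processing of XIncludes might look. It writes a small file into a temporary directory and includes it by its absolute path, which is assumed to work as an href on Unix-like systems. The file name, element names and the use of File::Temp are illustrative choices.</para>
<programlisting>use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML;

# Create the file that will be included (hypothetical content)
my $dir  = tempdir( CLEANUP =&gt; 1 );
my $part = File::Spec-&gt;catfile( $dir, 'part.xml' );
open my $out, '&gt;', $part or die "cannot write $part: $!";
print {$out} '&lt;part&gt;included text&lt;/part&gt;';
close $out or die "cannot close $part: $!";

my $xml = &lt;&lt;"EOT";
&lt;doc xmlns:xi="http://www.w3.org/2001/XInclude"&gt;
  &lt;xi:include href="$part"/&gt;
&lt;/doc&gt;
EOT

my $parser = XML::LibXML-&gt;new();
my $doc    = $parser-&gt;parse_string($xml);

# Expand the XInclude tags after parsing
$parser-&gt;process_xincludes($doc);
print $doc-&gt;toString();</programlisting>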
</sect2>
<sect2>
<title>Push Parser</title>
<para>XML::LibXML provides a push parser interface. Rather than pulling the data from a given source, the push parser waits for the data to be
pushed into it.</para>
<para>This allows one to parse large documents without waiting for the parser to finish. The interface is especially useful if a program needs
to pre-process the incoming pieces of XML (e.g. to detect document boundaries).</para>
<para>While the XML::LibXML parse_*() functions require the data to be well-formed XML, the push parser will take any arbitrary string that contains
some XML data. The only requirement is that all the pushed strings together form a well-formed document. With the push parser interface a
program can interrupt the parsing process as required, where the parse_*() functions do not give enough flexibility.</para>
<para>Unlike the pull parser implemented in parse_fh() or parse_file(), the push parser is not able to find out about the document's end
itself. Thus the calling program needs to indicate explicitly when the parsing is done.</para>
<para>In XML::LibXML this is done by a single function:</para>
<variablelist>
<varlistentry>
<term>parse_chunk</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;parse_chunk($string, $terminate);</funcsynopsisinfo>
</funcsynopsis>
<para>parse_chunk() tries to parse a given chunk of data, which is not necessarily well-balanced data. The function takes two
parameters: the chunk of data as a string and, optionally, a termination flag. If the termination flag is set to a true value (e.g. 1),
the parsing will be stopped and the resulting document will be returned, as the following example shows:</para>
<programlisting>my $parser = XML::LibXML-&gt;new;
for my $string ( "&lt;", "foo", ' bar="hello world"', "/&gt;") {
$parser-&gt;parse_chunk( $string );
}
my $doc = $parser-&gt;parse_chunk("", 1); # terminate the parsing</programlisting>
</listitem>
</varlistentry>
</variablelist>
<para>Internally XML::LibXML provides three functions that control the push parser process:</para>
<variablelist>
<varlistentry>
<term>start_push</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;start_push();</funcsynopsisinfo>
</funcsynopsis>
<para>Initializes the push parser.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>push</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;push(@data);</funcsynopsisinfo>
</funcsynopsis>
<para>This function pushes the data stored inside the array to libxml2's parser. Each entry in @data must be a normal scalar!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>finish_push</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc = $parser-&gt;finish_push( $recover );</funcsynopsisinfo>
</funcsynopsis>
<para>This function returns the result of the parsing process. If $recover is 1, the push parser can be used to restore broken or
non-well-formed XML documents. If this function is called without a parameter, it will complain about documents that are not
well-formed, as the following example shows:</para>
<programlisting>eval {
$parser-&gt;push( "&lt;foo&gt;", "bar" );
$doc = $parser-&gt;finish_push(); # will report broken XML
};
if ( $@ ) {
# ...
}</programlisting>
<para>This can be annoying if the closing tag is missed by accident. The following code will restore the document:</para>
<programlisting>eval {
$parser-&gt;push( "&lt;foo&gt;", "bar" );
$doc = $parser-&gt;finish_push(1); # will return the data parsed
# unless an error happened
};
print $doc-&gt;toString(); # returns "&lt;foo&gt;bar&lt;/foo&gt;"</programlisting>
<para>Of course finish_push() will return nothing if there was no data pushed to the parser before.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>DOM based SAX Parser</title>
<para>XML::LibXML provides a DOM based SAX parser. The SAX parser is defined in XML::LibXML::SAX::Parser. As it is not a stream based parser, it
parses documents into a DOM and traverses the DOM tree instead.</para>
<para>The API of this parser is exactly the same as any other Perl SAX2 parser. See XML::SAX::Intro for details.</para>
<para>Aside from the regular parsing methods, you can access the DOM tree traverser directly, using the generate() method:</para>
<programlisting>my $doc = build_yourself_a_document();
my $saxparser = XML::LibXML::SAX::Parser-&gt;new( ... );
$saxparser-&gt;generate( $doc );</programlisting>
<para>This is useful for serializing DOM trees, for example trees on which you have done prior processing, or trees that are the result of XSLT
processing.</para>
<para><emphasis>WARNING</emphasis></para>
<para>This is NOT a streaming SAX parser. As I said above, this parser reads the entire document into a DOM and serialises it. Some people
couldn't read that in the paragraph above so I've added this warning.</para>
<para>If you want a streaming SAX parser, look at the XML::LibXML::SAX man page.</para>
</sect2>
</sect1>
<sect1>
<title>Serialization</title>
<para>XML::LibXML provides some functions to serialize nodes and documents. The serialization functions are described on the XML::LibXML::Node
manpage or the XML::LibXML::Document manpage. XML::LibXML checks three global flags that alter the serialization process:</para>
<itemizedlist>
<listitem>
<para>skipXMLDeclaration</para>
</listitem>
<listitem>
<para>skipDTD</para>
</listitem>
<listitem>
<para>setTagCompression</para>
</listitem>
</itemizedlist>
<para>Of these three flags, only setTagCompression is available for all serialization functions.</para>
<para>Because XML::LibXML does not set these flags itself, one has to define them locally, as the following example shows:</para>
<programlisting>local $XML::LibXML::skipXMLDeclaration = 1;
local $XML::LibXML::skipDTD = 1;
local $XML::LibXML::setTagCompression = 1;</programlisting>
<para>If skipXMLDeclaration is defined and not '0', the XML declaration is omitted during serialization.</para>
<para>If skipDTD is defined and not '0', an existing DTD would not be serialized with the document.</para>
<para>If setTagCompression is defined and not '0' empty tags are displayed as open and closing tags rather than the shortcut. For example
the empty tag <emphasis>foo</emphasis> will be rendered as <emphasis>&lt;foo&gt;&lt;/foo&gt;</emphasis> rather than <emphasis>&lt;foo/&gt;</emphasis>.</para>
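<para>As a minimal sketch of how these flags might be scoped (the sample document string is made up for illustration), the flags can be
localized inside a block so that they only affect the serialization calls made there:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new();
my $doc    = $parser-&gt;parse_string( '&lt;root&gt;&lt;empty/&gt;&lt;/root&gt;' );

{
    # the flags are only in effect until the end of this block
    local $XML::LibXML::skipXMLDeclaration = 1;
    local $XML::LibXML::setTagCompression  = 1;
    print $doc-&gt;toString(), "\n";
}

# outside the block the default serialization applies again
print $doc-&gt;toString(), "\n";</programlisting>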
</sect1>
<sect1>
<title>Parser Options</title>
<para>LibXML options are global (unfortunately this is a limitation of the underlying implementation, not this interface). They can be set
using either $parser-&gt;option(...) or XML::LibXML-&gt;option(...); both are treated in the same manner. Note that even two parser processes will share
some of the same options, so be careful out there!</para>
<para>Every option returns the previous value, and can be called without parameters to get the current value.</para>
<variablelist>
<varlistentry>
<term>validation</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;validation(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Turn validation on (or off). Defaults to off.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>recover</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;recover(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Turn the parser's recover mode on (or off). Defaults to off.</para>
<para>This allows one to parse broken XML data into memory. This switch will only work with XML data rather than HTML data. Also the
validation will be switched off automatically.</para>
<para>The recover mode helps to recover documents that are almost well-formed very efficiently, for example a document that
forgets to close the document tag (or any other tag inside the document). The recover mode of XML::LibXML has problems restoring
documents that are more like well-balanced chunks.</para>
<para>XML::LibXML will only parse until the first fatal error occurs, reporting recoverable parsing errors as warnings. To suppress
these warnings use $parser-&gt;recover_silently(1); or, equivalently, $parser-&gt;recover(2).</para>
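<para>A minimal sketch of recover mode in use follows; the broken input string is made up for illustration. The call is still wrapped in
eval, because a fatal error can occur even in recover mode.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new();
$parser-&gt;recover(1);    # recoverable errors are reported as warnings

# the closing tag of &lt;bar&gt; is missing in this illustrative input
my $doc = eval { $parser-&gt;parse_string( '&lt;foo&gt;&lt;bar&gt;text&lt;/foo&gt;' ) };

if ( $doc ) {
    print $doc-&gt;toString();
}
else {
    warn "could not recover the document: $@";
}</programlisting>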
</listitem>
</varlistentry>
<varlistentry>
<term>recover_silently</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;recover_silently(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Turns the parser warnings off (or on). By default, warnings are on.</para>
<para>This allows switching off the warnings printed to STDERR when parsing documents with recover(1).</para>
<para>Please note that calling recover_silently(0) also turns the parser recover mode off and calling recover_silently(1) automatically
activates the parser recover mode.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>expand_entities</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;expand_entities(0);</funcsynopsisinfo>
</funcsynopsis>
<para>Turn entity expansion on or off, enabled by default. If entity expansion is off, any external parsed entities in the document are
left as entities. Probably not very useful for most purposes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>keep_blanks</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;keep_blanks(0);</funcsynopsisinfo>
</funcsynopsis>
<para>Allows you to turn off XML::LibXML's default behaviour of maintaining white-space in the document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pedantic_parser</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;pedantic_parser(1);</funcsynopsisinfo>
</funcsynopsis>
<para>You can make XML::LibXML more pedantic if you want to.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>line_numbers</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;line_numbers(1);</funcsynopsisinfo>
</funcsynopsis>
<para>If this option is activated, XML::LibXML will store the line number of each node. This gives more information about where a validation error
occurred. It can also be used to find out the position of a node after parsing (see also XML::LibXML::Node::line_number()).</para>
<para>By default line numbering is switched off (0).</para>
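<para>The following sketch shows how the stored line number might be read back after parsing. The sample document and the XPath
expression are made up for illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new();
$parser-&gt;line_numbers(1);    # must be set before parsing

my $doc = $parser-&gt;parse_string( "&lt;root&gt;\n  &lt;child/&gt;\n&lt;/root&gt;" );

# report the line on which each child element started
foreach my $node ( $doc-&gt;findnodes( '//child' ) ) {
    print $node-&gt;nodeName(), " starts on line ", $node-&gt;line_number(), "\n";
}</programlisting>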
</listitem>
</varlistentry>
<varlistentry>
<term>load_ext_dtd</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;load_ext_dtd(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Load external DTD subsets while parsing.</para>
<para>This flag is also required for DTD validation, for completing attributes,
and for expanding entities, regardless of whether the document has an internal subset.
Thus, switching off external DTD loading will disable entity expansion,
validation, and attribute completion for internal subsets as well.</para>
<para>If you leave this parser flag untouched, everything will work,
because the default is 1 (activated).</para>
</listitem>
</varlistentry>
<varlistentry>
<term>complete_attributes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;complete_attributes(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Complete the elements' attribute lists with the ones defaulted from the DTDs. By default, this option is enabled.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>expand_xinclude</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;expand_xinclude(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Expands XInclude tags immediately while parsing the document. This flag ensures that the parser callbacks are used while parsing
the included document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>load_catalog</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;load_catalog( $catalog_file );</funcsynopsisinfo>
</funcsynopsis>
<para>Will use $catalog_file as a catalog during all parsing processes. Using a catalog will significantly speed up parsing processes if
many external resources are loaded into the parsed documents (such as DTDs or XIncludes).</para>
<para>Note that catalogs will not be available if an external entity handler was specified. At the current state it is not possible to
make use of both types of resolving systems at the same time.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>base_uri</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;base_uri( $your_base_uri );</funcsynopsisinfo>
</funcsynopsis>
<para>When parsing strings or file handles, XML::LibXML doesn't know the base URI of the document. To make relative
references such as XIncludes work, one has to set a separate base URI, which is then used for the parsed documents.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>gdome_dom</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;gdome_dom(1);</funcsynopsisinfo>
</funcsynopsis>
<para>THIS FLAG IS EXPERIMENTAL!</para>
<para>Although quite powerful, XML::LibXML's DOM implementation is limited if one needs or wants full DOM level 2 or level 3 support.
XML::GDOME is based on libxml2 as well but provides a rather complete DOM implementation by wrapping libgdome. This allows you to make
use of XML::LibXML's full parser options and XML::GDOME's DOM implementation at the same time.</para>
<para>To make use of this function, one has to install libgdome and configure XML::LibXML to use this library. For this you need to
rebuild XML::LibXML!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>clean_namespaces</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;clean_namespaces( 1 );</funcsynopsisinfo>
</funcsynopsis>
<para>libxml2 2.6.0 and later allows stripping redundant namespace declarations from the DOM tree. To do this, one has to set
clean_namespaces() to 1 (TRUE). By default no namespace cleanup is done.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>no_network</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parser-&gt;no_network(1);</funcsynopsisinfo>
</funcsynopsis>
<para>Turn networking support off (or on). Networking is enabled by default, so calling no_network(1) disables it. If networking is off, all
attempts to fetch non-local resources (such as a DTD or external entities) will fail (unless custom callbacks are defined). It may be
necessary to use $parser-&gt;recover(1) for processing documents requiring such resources while networking is off.</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>Error Reporting</title>
<para>XML::LibXML throws exceptions during parsing, validation or XPath processing (and some other occasions). These errors can be caught by using
<emphasis>eval</emphasis> blocks. The error then will be stored in <emphasis>$@</emphasis>.</para>
<para>XML::LibXML throws errors as they occur and does not wait for the user to test
for them. This is a very common misunderstanding in the use of XML::LibXML. If the eval is omitted, XML::LibXML will always halt your script by
"croaking" (see Carp man page for details).</para>
<para>Also note that an increasing number of functions throw errors if bad data is passed. If you cannot ensure that valid data is passed to XML::LibXML, you should wrap
calls to these functions in eval blocks.</para>
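<para>A minimal sketch of this pattern, using a deliberately broken input string made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new();

# the &lt;bar&gt; element is never closed, so parsing throws an exception
my $doc = eval { $parser-&gt;parse_string( '&lt;foo&gt;&lt;bar&gt;&lt;/foo&gt;' ) };

if ( $@ ) {
    # the error message is available in $@
    warn "parsing failed: $@";
}
else {
    print $doc-&gt;toString();
}</programlisting>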
<para>Note: since version 1.59, get_last_error() is no longer available in XML::LibXML for thread-safety reasons.</para>
</sect1>
</chapter>
<chapter>
<title>XML::LibXML direct SAX parser</title>
<titleabbrev>XML::LibXML::SAX</titleabbrev>
<sect1>
<title>Description</title>
<para>XML::LibXML provides an interface to libxml2's direct SAX interface. Through this interface it is possible to generate SAX events directly while
parsing a document. While using the SAX parser, XML::LibXML will not create a DOM Document tree.</para>
<para>Such an interface is useful if very large XML documents have to be processed and no DOM functions are required. By using this interface it is
possible to read data stored within an XML document directly into the application data structures without loading the document into memory.</para>
<para>The SAX interface of XML::LibXML is based on the famous XML::SAX interface. It uses the generic interface as provided by XML::SAX::Base.</para>
<para>In addition to the generic functions, which are only able to process entire documents, XML::LibXML::SAX provides <emphasis>parse_chunk()</emphasis>.
This method generates SAX events from well-balanced data such as is often provided by databases.</para>
<para><emphasis>NOTE:</emphasis> At the moment XML::LibXML provides only an incomplete interface to libxml2's native SAX implementation. The
current implementation has not been tested in a production environment. It may cause significant memory problems or show wrong behaviour. If you run into
specific problems using this part of XML::LibXML, let me know.</para>
</sect1>
</chapter>
<chapter>
<title>Building DOM trees from SAX events.</title>
<titleabbrev>XML::LibXML::SAX::Builder</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML::SAX::Builder;
my $builder = XML::LibXML::SAX::Builder-&gt;new();
my $gen = XML::Generator::DBI-&gt;new(Handler =&gt; $builder, dbh =&gt; $dbh);
$gen-&gt;execute("SELECT * FROM Users");
my $doc = $builder-&gt;result();</programlisting>
</sect1>
<sect1>
<title>Description</title>
<para>This is a SAX handler that generates a DOM tree from SAX events. Usage is as above. Input is accepted from any SAX1 or SAX2 event generator.</para>
<para>Building DOM trees from SAX events is quite easy with XML::LibXML::SAX::Builder. The class is designed as a SAX2 final handler, not as a
filter!</para>
<para>Since SAX is strictly stream oriented, you should not expect anything to return from a generator. Instead you have to ask the builder instance
directly to get the document built. XML::LibXML::SAX::Builder's result() function holds the document generated from the last SAX stream.</para>
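<para>As a sketch that avoids the database dependency of the synopsis, the builder can also be fed by the DOM based SAX generator
XML::LibXML::SAX::Parser. The sample document is made up for illustration, and passing the handler through the Handler argument of new()
assumes the usual XML::SAX::Base convention.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::SAX::Parser;
use XML::LibXML::SAX::Builder;

# an illustrative source document
my $doc = XML::LibXML-&gt;new()-&gt;parse_string( '&lt;users&gt;&lt;user name="alice"/&gt;&lt;/users&gt;' );

my $builder   = XML::LibXML::SAX::Builder-&gt;new();
my $generator = XML::LibXML::SAX::Parser-&gt;new( Handler =&gt; $builder );

# walk the DOM tree and send the SAX events to the builder
$generator-&gt;generate( $doc );

# ask the builder, not the generator, for the document it built
my $copy = $builder-&gt;result();
print $copy-&gt;toString();</programlisting>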
</sect1>
</chapter>
<chapter>
<title>XML::LibXML DOM Implementation</title>
<titleabbrev>XML::LibXML::DOM</titleabbrev>
<sect1>
<title>Description</title>
<para>XML::LibXML provides a lightweight interface to
<emphasis>modify</emphasis> a node of the document tree
generated by the XML::LibXML parser. This interface
follows the DOM Level 3 specification as far as
possible. In addition to the specified functions,
XML::LibXML supports some functions that are handier to
use in the Perl environment.</para>
<para>One also has to remember that XML::LibXML is an
interface to libxml2 nodes which actually reside on the
C-Level of XML::LibXML. This means each node is a
reference to a structure that is not a Perl hash or
array. The only way to access the values of these structures is
through the DOM interface provided by XML::LibXML. This
also means that one <emphasis>can't</emphasis> simply
inherit from an XML::LibXML node and add new member variables as
if they were hash keys.</para>
<para>The DOM interface of XML::LibXML does not intend to
implement a full DOM interface as is done by XML::GDOME
and used for full-featured applications. Rather, it
offers a simple way to build or modify documents that are
created by XML::LibXML's parser.</para>
<para>Another goal of the XML::LibXML interface is to
make the interfaces of libxml2 available to the Perl
community. This also includes some workarounds for
features where libxml2 assumes more control over the
C-Level than most Perl users have.</para>
<para>One of the most important aspects of the XML::LibXML
DOM interface is that the interfaces try to follow the
DOM Level 3 specification rather strictly. This means the
interface functions are named as the DOM specification
says and not what widespread Java interfaces claim to be
standard. However, for several functions that would otherwise have
only the single interface that conforms to the DOM spec,
XML::LibXML provides an additional Java-style alias
interface.</para>
<para>Also there are some function interfaces left over
from early stages of XML::LibXML for compatibility
reasons. These interfaces are for compatibility reasons
<emphasis>only</emphasis>. They might disappear in one of
the future versions of XML::LibXML, so a user is requested
to switch over to the official functions.</para>
<para>More recent versions of perl (e.g. 5.6.1 or higher)
support special flags to distinguish between UTF-8 and so
called binary data. XML::LibXML provides for these
versions functionality to make efficient use of these
flags: If a document has set an encoding other than UTF-8
all strings that are not already in UTF-8 are implicitly
encoded from the document encoding to UTF-8. On output
these strings are commonly returned as UTF-8 unless a user
does request explicitly the original (aka. document)
encoding.</para>
<para>Older versions of perl (such as 5.00503 or less) do
not support these flags. If XML::LibXML is built for these
versions, all strings have to be encoded to UTF-8 manually
before they are passed to any DOM functions.</para>
<para><emphasis>NOTE:</emphasis> XML::LibXML's magic
encoding may not work on all platforms. Some platforms
are known to have a broken iconv(), which is partly used
by libxml2. To test if your platform works correctly with
your language encoding, build a simple document in the
particular encoding and try to parse it with
XML::LibXML. If your document gets parsed without causing
any segmentation faults, bus errors or whatever your OS
throws, the encoding is most likely handled correctly. An example of such a test can be found in test
19encoding.t of the distribution.</para>
<para><emphasis>Namespaces and XML::LibXML's DOM implementation</emphasis></para>
<para>XML::LibXML's DOM implementation is
limited by the DOM implementation of libxml2
which treats namespaces slightly differently than
required by the DOM Level 2 specification.
</para>
<para>According to the DOM Level 2 specification,
namespaces of elements and attributes should be
persistent, and nodes should be permanently bound to
namespace URIs as they get created; it should be
possible to manipulate the special attributes used for
declaring XML namespaces just as other attributes
without affecting the namespaces of other nodes.
In DOM Level 2, the application is responsible
for creating the special attributes consistently and/or for correct
serialization of the document.
</para>
<para>
This is inconvenient, causes problems in serialization
of DOM to XML, and, most importantly, seems almost impossible
to implement on top of libxml2.
</para>
<para>
In libxml2, the namespace URI and prefix of a node are
provided by a pointer to a namespace declaration
(appearing as a special xmlns attribute in the XML
document). If the prefix or namespace URI of the
declaration changes, the prefix and namespace URI of all
nodes that point to it change as well. Moreover, in
contrast to DOM, a node (element or attribute) can only
be bound to a namespace URI if there is some namespace
declaration in the document to point to.
</para>
<para>
Therefore the current DOM implementation in XML::LibXML tries
to treat namespace declarations as a compromise between
reason, common sense, limitations of libxml2, and the DOM
Level 2 specification.
</para>
<para>In XML::LibXML, special attributes declaring XML namespaces
are often created automatically, usually when
a namespaced node is attached to a document
and no existing declaration of the namespace and prefix is in
scope to be reused.
In this respect, the
XML::LibXML DOM implementation differs from the DOM
Level 2 specification, according to which special
attributes for declaring the appropriate XML namespaces
should not be added when a node with a namespace prefix
and namespace URI is created.
</para>
<para>
Namespace declarations are also created when
XML::LibXML::Document's
createElementNS() or createAttributeNS() functions are used. If
a namespace is not declared on the documentElement, the
namespace will be locally declared for the newly created
node. In the case of attributes this may look a bit confusing,
since these nodes cannot have namespace declarations
themselves. In this case the namespace is internally applied
to the attribute and later declared on the node the
attribute is appended to (if required).</para>
<para>The following example may explain this a bit:</para>
<programlisting> my $doc = XML::LibXML-&gt;createDocument;
my $root = $doc-&gt;createElementNS( "", "foo" );
$doc-&gt;setDocumentElement( $root );
my $attr = $doc-&gt;createAttributeNS( "bar", "bar:foo", "test" );
$root-&gt;setAttributeNodeNS( $attr ); </programlisting>
<para>This piece of code will result in the following document:</para>
<programlisting> &lt;?xml version="1.0"?&gt;
&lt;foo xmlns:bar="bar" bar:foo="test"/&gt;</programlisting>
<para>The namespace is declared on the document element
during the setAttributeNodeNS() call.
</para>
<para>Namespaces can also be declared explicitly by using XML::LibXML::Element's setNamespace() function.
Since 1.61, they can also be manipulated with the functions
setNamespaceDeclPrefix() and setNamespaceDeclURI() (not available in DOM).
Changing a URI or prefix of an existing namespace declaration
affects the namespace URI and prefix of all nodes which point to it
(that is, the nodes in its scope).
</para>
<para>It is also important to repeat the specification:
While working with namespaces you should use the namespace
aware functions instead of the simplified versions. For
example you should <emphasis>never</emphasis> use
setAttribute() but setAttributeNS().</para>
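<para>A minimal sketch of the namespace aware calls follows. The namespace URI, the prefix and the attribute name are made up for
illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# illustrative namespace URI
my $ns = "http://example.org/ns";

my $doc  = XML::LibXML-&gt;createDocument( "1.0", "UTF-8" );
my $root = $doc-&gt;createElementNS( $ns, "ex:root" );
$doc-&gt;setDocumentElement( $root );

# set and read the attribute through the namespace aware functions
$root-&gt;setAttributeNS( $ns, "ex:id", "42" );
print $root-&gt;getAttributeNS( $ns, "id" ), "\n";

print $doc-&gt;toString();</programlisting>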
</sect1>
</chapter>
<chapter>
<title>XML::LibXML DOM Document Class</title>
<titleabbrev>XML::LibXML::Document</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Document nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>The Document Class is in most cases the result of a parsing process. But sometimes it is necessary to create a Document from scratch. The DOM
Document Class provides functions that conform to the DOM Core naming style.</para>
<para>It inherits all functions from <function>XML::LibXML::Node</function> as specified in the DOM specification. This enables access to the nodes
besides the root element on document level - a <function>DTD</function> for example. The support for these nodes is limited at the moment.</para>
<para>While generally nodes are bound to a document in the DOM concept it is suggested that one should always create a node not bound to any document.
There is no need of really including the node to the document, but once the node is bound to a document, it is quite safe that all strings have the
correct encoding. If an unbound text node with an ISO encoded string is created (e.g. with $CLASS-&gt;new()), the <function>toString</function> function
may not return the expected result.</para>
<para>All this seems like a limitation as long as UTF-8 encoding is assured. If ISO encoded strings come into play it is much safer to use the node
creation functions of <emphasis>XML::LibXML::Document</emphasis>.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dom = XML::LibXML::Document-&gt;new( $version, $encoding );</funcsynopsisinfo>
</funcsynopsis>
<para>alias for createDocument()</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createDocument</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dom = XML::LibXML::Document-&gt;createDocument( $version, $encoding );</funcsynopsisinfo>
</funcsynopsis>
<para>The constructor for the document class. As parameters it takes the version string and (optionally) the encoding string. Simply calling
<emphasis>createDocument</emphasis>() will create the document:</para>
<programlisting>&lt;?xml version="your version" encoding="your encoding"?&gt;</programlisting>
<para>Both parameters are optional. The default value for <emphasis>$version</emphasis> is <function>1.0</function>, of course. If the
<emphasis>$encoding</emphasis> parameter is not set, the encoding will be left unset, which means UTF-8 is implied.</para>
<para>Calling <emphasis>createDocument</emphasis>() without any parameters will result in the following declaration:</para>
<programlisting>&lt;?xml version="1.0"?&gt; </programlisting>
<para>Alternatively one can call this constructor directly from the XML::LibXML class level, to avoid some typing. This will not have any
effect on the class instance, which is always XML::LibXML::Document.</para>
<programlisting> my $document = XML::LibXML-&gt;createDocument( "1.0", "UTF-8" );</programlisting>
<para>is therefore a shortcut for</para>
<programlisting>my $document = XML::LibXML::Document-&gt;createDocument( "1.0", "UTF-8" );</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>encoding</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$strEncoding = $doc-&gt;encoding();</funcsynopsisinfo>
</funcsynopsis>
<para>returns the encoding string of the document.</para>
<programlisting>my $doc = XML::LibXML-&gt;createDocument( "1.0", "ISO-8859-15" );
print $doc-&gt;encoding; # prints ISO-8859-15</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>actualEncoding</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$strEncoding = $doc-&gt;actualEncoding();</funcsynopsisinfo>
</funcsynopsis>
<para>returns the encoding in which the XML will be returned by $doc-&gt;toString().
This is usually the original encoding of the document as declared
in the XML declaration and returned by $doc-&gt;encoding.
If the original encoding is not known (e.g. if created in memory or parsed from an
XML document without a declared encoding), 'UTF-8' is returned.
</para>
<programlisting>my $doc = XML::LibXML-&gt;createDocument( "1.0", "ISO-8859-15" );
print $doc-&gt;actualEncoding; # prints ISO-8859-15</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>setEncoding</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc-&gt;setEncoding($new_encoding);</funcsynopsisinfo>
</funcsynopsis>
<para>This method allows changing the declaration of
encoding in the XML declaration of the document.
The value also affects the encoding in which the
document is serialized to XML by $doc-&gt;toString().
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>version</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$strVersion = $doc-&gt;version();</funcsynopsisinfo>
</funcsynopsis>
<para>returns the version string of the document</para>
<para><emphasis>getVersion()</emphasis> is an alternative form of this function.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>standalone</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc-&gt;standalone</funcsynopsisinfo>
</funcsynopsis>
<para>This function returns the numerical value of the standalone attribute of a document's XML declaration. It returns <emphasis>1</emphasis> if
standalone="yes" was found, <emphasis>0</emphasis> if standalone="no" was found and <emphasis>-1</emphasis> if standalone
was not specified (default on creation).</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setStandalone</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc-&gt;setStandalone($numvalue);</funcsynopsisinfo>
</funcsynopsis>
<para>Through this method it is possible to alter the value of a document's standalone attribute. Set it to <emphasis>1</emphasis> to set
standalone="yes", to <emphasis>0</emphasis> to set standalone="no" or set it to <emphasis>-1</emphasis> to remove the
standalone attribute from the XML declaration.</para>
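<para>A short sketch of both accessors, using a made-up document with a single root element:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;createDocument( "1.0", "UTF-8" );
$doc-&gt;setDocumentElement( $doc-&gt;createElement( "root" ) );

# request standalone="yes" in the XML declaration
$doc-&gt;setStandalone( 1 );
print "standalone is now: ", $doc-&gt;standalone(), "\n";
print $doc-&gt;toString();

# remove the standalone attribute again
$doc-&gt;setStandalone( -1 );
print $doc-&gt;toString();</programlisting>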
</listitem>
</varlistentry>
<varlistentry>
<term>compression</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $compression = $doc-&gt;compression;</funcsynopsisinfo>
</funcsynopsis>
<para>libxml2 allows reading of documents directly from gzipped files. In this case the compression variable is set to the compression level
of that file (0-8). If XML::LibXML parsed a different source or the file wasn't compressed, the returned value will be
<emphasis>-1</emphasis>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setCompression</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc-&gt;setCompression($ziplevel);</funcsynopsisinfo>
</funcsynopsis>
<para>If one intends to write the document directly to a file, it is possible to set the compression level for a given document. This level
can be in the range from 0 to 8. If XML::LibXML should not try to compress use <emphasis>-1</emphasis> (default).</para>
<para>Note that this feature will <emphasis>only</emphasis> work if libxml2 is compiled with zlib support and toFile() is used for output.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>toString</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$docstring = $dom-&gt;toString($format);</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>toString</emphasis> is a DOM serializing function,
so the DOM Tree can be serialized into a XML string, ready for output.</para>
<para>IMPORTANT: unlike toString for other nodes, on document nodes
this function returns the XML as a byte string in the original encoding of the
document (see the actualEncoding() method)!</para>
<para>The optional <emphasis>$format</emphasis> parameter sets the indenting of the output. This parameter is expected to be an
<function>integer</function> value, that specifies that indentation should be used. The format parameter can have three different values if
it is used:</para>
<para>If $format is 0, then the document is dumped as it was originally parsed.</para>
<para>If $format is 1, libxml2 will add ignorable white spaces, so the nodes' content is easier to read. Existing text nodes will not be
altered.</para>
<para>If $format is 2 (or higher), libxml2 will act as with $format == 1, but it adds a leading and a trailing line break to each text node.</para>
<para>libxml2 uses a hard-coded indentation of 2 space characters per indentation level. This value cannot be altered at run-time.</para>
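<para>To compare the format levels, one might serialize the same document several times, as in this sketch. The sample document is made
up for illustration; it contains no whitespace text nodes, so that the added indentation is visible.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new()-&gt;parse_string( '&lt;a&gt;&lt;b&gt;&lt;c/&gt;&lt;/b&gt;&lt;d&gt;text&lt;/d&gt;&lt;/a&gt;' );

# serialize once per format level and show the results one after another
foreach my $format ( 0, 1, 2 ) {
    print "format $format:\n";
    print $doc-&gt;toString( $format ), "\n";
}</programlisting>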
</listitem>
</varlistentry>
<varlistentry>
<term>toStringC14N</term>
<listitem>
<para><funcsynopsis><funcsynopsisinfo>$c14nstr = $doc-&gt;toStringC14N($comment_flag,$xpath); </funcsynopsisinfo></funcsynopsis>
See the documentation in XML::LibXML::Node.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>toStringEC14N</term>
<listitem>
<para><funcsynopsis><funcsynopsisinfo>$ec14nstr = $doc-&gt;toStringEC14N($comment_flag,$xpath,$inclusive_prefix_list); </funcsynopsisinfo></funcsynopsis>
See the documentation in XML::LibXML::Node.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$str = $doc-&gt;serialize($format); </funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toString(). This function name was added to be more consistent
with libxml2.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize_c14n</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$c14nstr = $doc-&gt;serialize_c14n($comment_flag,$xpath); </funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toStringC14N().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize_exc_c14n</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$ec14nstr = $doc-&gt;serialize_exc_c14n($comment_flag,$xpath,$inclusive_prefix_list); </funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toStringEC14N().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>toFile</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$state = $doc-&gt;toFile($filename, $format);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is similar to toString(), but it writes the document directly to a file. This function is very useful if one
needs to store large documents.</para>
<para>The format parameter has the same behaviour as in toString().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>toFH</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$state = $doc-&gt;toFH($fh, $format);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is similar to toString(), but it writes the document directly to a filehandle or a stream.</para>
<para>The format parameter has the same behaviour as in toString().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>toStringHTML</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$str = $document-&gt;toStringHTML();</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>toStringHTML</emphasis> serializes the tree to a string as HTML. With this method indenting is automatic and managed by
libxml2 internally.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize_html</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$str = $document-&gt;serialize_html();</funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toStringHTML().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>is_valid</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$bool = $dom-&gt;is_valid();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns either TRUE or FALSE depending on whether the DOM Tree is a valid Document or not.</para>
<para>You may also pass in an XML::LibXML::Dtd object, to validate against an external DTD:</para>
<programlisting> if (!$dom-&gt;is_valid($dtd)) {
warn("document is not valid!");
}</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>validate</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dom-&gt;validate();</funcsynopsisinfo>
</funcsynopsis>
<para>This is the exception-throwing equivalent of is_valid(). If the document is not valid, it throws an exception containing the error,
which allows much better error reporting than the plain true or false result of is_valid().</para>
<para>Again, you may pass in a DTD object.</para>
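<para>As a rough sketch of the difference between the two methods, the example below validates a deliberately invalid document against a
DTD built with <function>XML::LibXML::Dtd-&gt;parse_string()</function>. The DTD and the document are made up for illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD: a note must contain exactly one to element.
my $dtd = XML::LibXML::Dtd-&gt;parse_string(
    '&lt;!ELEMENT note (to)&gt; &lt;!ELEMENT to (#PCDATA)&gt;'
);

# This document uses an undeclared element, so it is not valid.
my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;note&gt;&lt;from&gt;me&lt;/from&gt;&lt;/note&gt;');

# is_valid() only answers yes or no.
print "document is not valid\n" unless $doc-&gt;is_valid($dtd);

# validate() dies with the error message, so wrap it in eval.
eval { $doc-&gt;validate($dtd) };
print "validation failed: $@\n" if $@;</programlisting>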
</listitem>
</varlistentry>
<varlistentry>
<term>documentElement</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$root = $dom-&gt;documentElement();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the root element of the Document. A document can have just one root element to contain the document's data.</para>
<para>Optionally one can use <emphasis>getDocumentElement</emphasis>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setDocumentElement</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dom-&gt;setDocumentElement( $root );</funcsynopsisinfo>
</funcsynopsis>
<para>This function enables you to set the root element for a document. The function supports the import of a node from a different document
tree.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createElement</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$element = $dom-&gt;createElement( $nodename );</funcsynopsisinfo>
</funcsynopsis>
<para>This function creates a new Element Node bound to the DOM with the name <function>$nodename</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createElementNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$element = $dom-&gt;createElementNS( $namespaceURI, $qname );</funcsynopsisinfo>
</funcsynopsis>
<para>This function creates a new Element Node bound to the DOM with the name <function>$qname</function> and placed in the given
namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createTextNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text = $dom-&gt;createTextNode( $content_text );</funcsynopsisinfo>
</funcsynopsis>
<para>The equivalent of <emphasis>createElement</emphasis>, but it creates a <emphasis>Text Node</emphasis> bound to the DOM.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createComment</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$comment = $dom-&gt;createComment( $comment_text );</funcsynopsisinfo>
</funcsynopsis>
<para>The equivalent of <emphasis>createElement</emphasis>, but it creates a <emphasis>Comment Node</emphasis> bound to the DOM.</para>
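<para>A minimal sketch of how the create functions fit together when a document is built from scratch. The element name and the text are
chosen only for illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document-&gt;new('1.0', 'UTF-8');

# Created nodes are bound to $doc but are not part of the tree yet.
my $root = $doc-&gt;createElement('greeting');
$doc-&gt;setDocumentElement($root);

$root-&gt;appendChild( $doc-&gt;createComment(' generated ') );
$root-&gt;appendChild( $doc-&gt;createTextNode('Hello &amp; welcome') );

# The ampersand in the text node is escaped on serialization.
print $doc-&gt;toString;</programlisting>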
</listitem>
</varlistentry>
<varlistentry>
<term>createAttribute</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attrnode = $doc-&gt;createAttribute($name [,$value]);</funcsynopsisinfo>
</funcsynopsis>
<para>Creates a new Attribute node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createAttributeNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attrnode = $doc-&gt;createAttributeNS( $namespaceURI, $name [,$value] );</funcsynopsisinfo>
</funcsynopsis>
<para>Creates an Attribute bound to a namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createDocumentFragment</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$fragment = $doc-&gt;createDocumentFragment();</funcsynopsisinfo>
</funcsynopsis>
<para>This function creates a DocumentFragment.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createCDATASection</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$cdata = $dom-&gt;createCDATASection( $cdata_content );</funcsynopsisinfo>
</funcsynopsis>
<para>Similar to createTextNode and createComment, this function creates a CDataSection bound to the current DOM.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createProcessingInstruction</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $pi = $doc-&gt;createProcessingInstruction( $target, $data );</funcsynopsisinfo>
</funcsynopsis>
<para>Creates a processing instruction node.</para>
<para>Since this method is quite long one may use its short form <emphasis>createPI()</emphasis>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>createEntityReference</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $entref = $doc-&gt;createEntityReference($refname);</funcsynopsisinfo>
</funcsynopsis>
<para>If a document has a DTD specified, one can create entity references by using this function. If one wants to add an entity reference to
the document, this reference has to be created by this function.</para>
<para>An entity reference is unique to a document and cannot be passed to other documents as other nodes can be passed.</para>
<para><emphasis>NOTE:</emphasis> Text content containing something that looks like an entity reference will not be expanded to a real
entity reference unless it is a predefined entity.</para>
<programlisting> my $string = "&amp;foo;";
$some_element-&gt;appendText( $string );
print $some_element-&gt;textContent; # prints "&amp;amp;foo;"</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>createInternalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dtd = $document-&gt;createInternalSubset( $rootnode, $public, $system);</funcsynopsisinfo>
</funcsynopsis>
<para>This function creates and adds an internal subset to the given document. Because the function automatically adds the DTD to the document
there is no need to add the created node explicitly to the document.</para>
<programlisting> my $document = XML::LibXML::Document-&gt;new();
my $dtd = $document-&gt;createInternalSubset( "foo", undef, "foo.dtd" );</programlisting>
<para>will result in the following XML document:</para>
<programlisting>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE foo SYSTEM "foo.dtd"&gt; </programlisting>
<para>By setting the public parameter it is possible to set PUBLIC DTDs to a given document. So</para>
<programlisting>my $document = XML::LibXML::Document-&gt;new();
my $dtd = $document-&gt;createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );
</programlisting>
<para>will cause the following declaration to be created on the document:</para>
<programlisting>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN"&gt;</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>createExternalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dtd = $document-&gt;createExternalSubset( $rootnode, $public, $system);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is similar to <function>createInternalSubset()</function> but this DTD is considered to be external and is therefore not
added to the document itself. Nevertheless it can be used for validation purposes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>importNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$document-&gt;importNode( $node );</funcsynopsisinfo>
</funcsynopsis>
<para>If a node is not part of a document, it can be imported to another document. As specified in DOM Level 2 Specification the Node will
not be altered or removed from its original document (<function>$node-&gt;cloneNode(1)</function> will get called implicitly).</para>
<para><emphasis>NOTE:</emphasis> Don't try to use importNode() to import sub-trees that contain an entity reference - even if the entity
reference is the root node of the sub-tree. This will cause serious problems to your program. This is a limitation of libxml2 and not of
XML::LibXML itself.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>adoptNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$document-&gt;adoptNode( $node );</funcsynopsisinfo>
</funcsynopsis>
<para>If a node is not part of a document, it can be imported to another document. As specified in DOM Level 3 Specification the Node will
not be altered but it will be removed from its original document.</para>
<para>After a document adopted a node, the node, its attributes and all its descendants belong to the new document. Because the node does
not belong to the old document, it will be unlinked from its old location first.</para>
<para><emphasis>NOTE:</emphasis> Don't try to use adoptNode() to import sub-trees that contain entity references - even if the entity
reference is the root node of the sub-tree. This will cause serious problems to your program. This is a limitation of libxml2 and not of
XML::LibXML itself.</para>
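<para>The difference between importNode() and adoptNode() can be sketched as follows, assuming two small documents made up for this
example: the first call copies a node, the second one moves it.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new;
my $src = $parser-&gt;parse_string('&lt;a&gt;&lt;first/&gt;&lt;second/&gt;&lt;/a&gt;');
my $dst = $parser-&gt;parse_string('&lt;target/&gt;');

my ($first)  = $src-&gt;findnodes('/a/first');
my ($second) = $src-&gt;findnodes('/a/second');

# importNode() copies: the original stays in $src, the copy belongs to $dst.
my $copy = $dst-&gt;importNode($first);
$dst-&gt;documentElement-&gt;appendChild($copy);

# adoptNode() moves: the node is unlinked from $src and now belongs to $dst.
$dst-&gt;adoptNode($second);
$dst-&gt;documentElement-&gt;appendChild($second);

print $src-&gt;toString;   # still contains first, no longer contains second
print $dst-&gt;toString;   # contains both first and second</programlisting>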
</listitem>
</varlistentry>
<varlistentry>
<term>externalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $dtd = $doc-&gt;externalSubset;</funcsynopsisinfo>
</funcsynopsis>
<para>If a document has an external subset defined it will be returned by this function.</para>
<para><emphasis>NOTE</emphasis> Dtd nodes are not ordinary nodes in libxml2. The support for these nodes in XML::LibXML is still limited. In
particular one may not want to use common node functions on doctype declaration nodes!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>internalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $dtd = $doc-&gt;internalSubset;</funcsynopsisinfo>
</funcsynopsis>
<para>If a document has an internal subset defined it will be returned by this function.</para>
<para><emphasis>NOTE</emphasis> Dtd nodes are not ordinary nodes in libxml2. The support for these nodes in XML::LibXML is still limited. In
particular one may not want to use common node functions on doctype declaration nodes!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setExternalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc-&gt;setExternalSubset($dtd);</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>EXPERIMENTAL!</emphasis></para>
<para>This method sets a DTD node as an external subset of the given document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setInternalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$doc-&gt;setInternalSubset($dtd);</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>EXPERIMENTAL!</emphasis></para>
<para>This method sets a DTD node as an internal subset of the given document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>removeExternalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $dtd = $doc-&gt;removeExternalSubset();</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>EXPERIMENTAL!</emphasis></para>
<para>If a document has an external subset defined it can be removed from the document by using this function. The removed dtd node will be
returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>removeInternalSubset</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $dtd = $doc-&gt;removeInternalSubset();</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>EXPERIMENTAL!</emphasis></para>
<para>If a document has an internal subset defined it can be removed from the document by using this function. The removed dtd node will be
returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getElementsByTagName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my @nodelist = $doc-&gt;getElementsByTagName($tagname);</funcsynopsisinfo>
</funcsynopsis>
<para>Implements the DOM Level 2 function.</para>
<para>In SCALAR context this function returns an <function>XML::LibXML::NodeList</function> object.</para>
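<para>A short sketch of the two calling contexts, using a made-up document:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;list&gt;&lt;item&gt;a&lt;/item&gt;&lt;item&gt;b&lt;/item&gt;&lt;/list&gt;'
);

# List context: a plain Perl list of element nodes.
my @items = $doc-&gt;getElementsByTagName('item');
print scalar(@items), "\n";    # prints 2

# Scalar context: an XML::LibXML::NodeList object.
my $nodelist = $doc-&gt;getElementsByTagName('item');
print $nodelist-&gt;size, "\n";   # prints 2</programlisting>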
</listitem>
</varlistentry>
<varlistentry>
<term>getElementsByTagNameNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my @nodelist = $doc-&gt;getElementsByTagNameNS($nsURI,$tagname);</funcsynopsisinfo>
</funcsynopsis>
<para>Implements the DOM Level 2 function.</para>
<para>In SCALAR context this function returns an <function>XML::LibXML::NodeList</function> object.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getElementsByLocalName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my @nodelist = $doc-&gt;getElementsByLocalName($localname);</funcsynopsisinfo>
</funcsynopsis>
<para>This function fetches all nodes in the given document that have the given local name.</para>
<para>In SCALAR context this function returns an <function>XML::LibXML::NodeList</function> object.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getElementById</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $node = $doc-&gt;getElementById($id);</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the element that has an ID attribute
with the given value. If no such element exists,
this returns undef.</para>
<para>Note: the ID of an element
may change while manipulating the document.
For documents with a DTD, the information about ID attributes
is only available if DTD loading/validation has been requested.
For HTML documents parsed with the HTML
parser ID detection is done
automatically. In XML documents, all "xml:id"
attributes are considered to be of type ID.
You can test ID-ness of an attribute node
with $attr-&gt;isId().
</para>
<para>In versions 1.59 and earlier this method was
called getElementsById() (plural) by
mistake. Starting from 1.60 this name is
maintained as an alias only for backward compatibility.
</para>
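<para>A minimal sketch of an ID lookup without a DTD, assuming the running libxml2 treats <literal>xml:id</literal> attributes as IDs as
described above. The document is made up for illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;root&gt;&lt;item xml:id="first"&gt;one&lt;/item&gt;&lt;/root&gt;'
);

# getElementById() returns undef if no element carries this ID.
my $node = $doc-&gt;getElementById('first');
if (defined $node) {
    print $node-&gt;textContent, "\n";
}
else {
    print "no element with ID 'first'\n";
}</programlisting>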
</listitem>
</varlistentry>
<varlistentry>
<term>indexElements</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dom-&gt;indexElements();</funcsynopsisinfo>
</funcsynopsis>
<para>This function causes libxml2 to stamp all elements in a document with their document position index which considerably speeds up XPath
queries for large documents. It should only be used with static documents that won't be further changed by any DOM methods, because once
a document is indexed, XPath will always prefer the index to other methods of determining the document order of nodes. XPath could therefore
return improperly ordered node-lists when applied on a document that has been changed after being indexed. It is of course possible to use
this method to re-index a modified document before using it with XPath again. This function is not a part of the DOM specification.</para>
<para>This function returns the number of elements indexed, -1 if an error occurred, or -2 if this feature is not available in the running libxml2.</para>
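<para>The return values can be handled as in the following sketch, which assumes a small static document that is not modified after
indexing:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;r&gt;&lt;a/&gt;&lt;b/&gt;&lt;/r&gt;');

# Index once, after all DOM changes are done and before running XPath queries.
my $indexed = $doc-&gt;indexElements();

if ($indexed == -2) {
    warn "element indexing is not available in this libxml2\n";
}
elsif ($indexed == -1) {
    warn "indexing failed\n";
}
else {
    print "indexed $indexed elements\n";
}</programlisting>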
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>Abstract Base Class of XML::LibXML Nodes</title>
<titleabbrev>XML::LibXML::Node</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;</programlisting>
</sect1>
<para>XML::LibXML::Node defines functions that are common to
all Node Types. A LibXML::Node should never be created
standalone, but as an instance of a high level class such as
LibXML::Element or LibXML::Text. The class itself should
provide only common functionality. In XML::LibXML each node is
part either of a document or a document-fragment. Because of
this there is no node without a parent. This may cause
confusion with "unbound" nodes.</para>
<variablelist>
<varlistentry>
<term>nodeName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$name = $node-&gt;nodeName;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the node's name. This function is
aware of namespaces and returns the full name of
the current node (<function>prefix:localname</function>).
</para>
<para>Since 1.62 this function also returns the correct
DOM names for node types with constant names, namely:
#text, #cdata-section, #comment, #document,
#document-fragment.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setNodeName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setNodeName( $newName );</funcsynopsisinfo>
</funcsynopsis>
<para>In very limited situations, it is useful to change a node's name. In the DOM specification this should throw an error. This function is
aware of namespaces.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>isSameNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$bool = $node-&gt;isSameNode( $other_node );</funcsynopsisinfo>
</funcsynopsis>
<para>Returns TRUE (1) if the given nodes refer to
the same node structure, otherwise FALSE (0) is
returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>isEqual</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$bool = $node-&gt;isEqual( $other_node );</funcsynopsisinfo>
</funcsynopsis>
<para>Deprecated version of isSameNode().</para>
<para><emphasis>NOTE</emphasis> isEqual will change behaviour to follow the DOM specification.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nodeValue</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$content = $node-&gt;nodeValue;</funcsynopsisinfo>
</funcsynopsis>
<para>If the node has any content (such as stored in a <function>text node</function>) it can get requested through this function.</para>
<para><emphasis>NOTE:</emphasis> Element Nodes have no content per definition. To get the text value of an Element use textContent()
instead!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>textContent</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$content = $node-&gt;textContent;</funcsynopsisinfo>
</funcsynopsis>
<para>This function returns the content of all text nodes in the descendants of the given node as specified in DOM.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nodeType</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$type = $node-&gt;nodeType;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the node's type. The possible types are described in the libxml2 <emphasis>tree.h</emphasis> documentation. The return
value of this function is a numeric value. Therefore it differs from the result of Perl's ref function.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>unbindNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;unbindNode();</funcsynopsisinfo>
</funcsynopsis>
<para>Unbinds the Node from its siblings and Parent, but not from the Document it belongs to. If the node is not inserted into the DOM
afterwards, it will be lost after the program terminates. From a low level view, the unbound node is stripped from the context it is in and
inserted into a (hidden) document-fragment.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>removeChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$childnode = $node-&gt;removeChild( $childnode );</funcsynopsisinfo>
</funcsynopsis>
<para>This will unbind the Child Node from its parent <function>$node</function>. The function returns the unbound node. If
<function>$childnode</function> is not a child of the given Node the function will fail.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>replaceChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$oldnode = $node-&gt;replaceChild( $newNode, $oldNode );</funcsynopsisinfo>
</funcsynopsis>
<para>Replaces the <function>$oldNode</function> with the <function>$newNode</function>. The <function>$oldNode</function> will be unbound
from the Node. This function differs from the DOM L2 specification in one case: if the new node is not part of the document, the node will
be imported first.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>replaceNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;replaceNode($newNode);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is very similar to replaceChild(), but it replaces the node itself rather than a childnode. This is useful if a node
found by an XPath function should be replaced.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>appendChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$childnode = $node-&gt;appendChild( $childnode );</funcsynopsisinfo>
</funcsynopsis>
<para>The function will add the <function>$childnode</function> to the end of <function>$node</function>'s children. The function should
fail if the new childnode is already a child of <function>$node</function>. This function differs from the DOM L2 specification in one
case: if the new node is not part of the document, the node will be imported first.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>addChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$childnode = $node-&gt;addChild( $childnode );</funcsynopsisinfo>
</funcsynopsis>
<para>As an alternative to appendChild() one can use the addChild() function. This function is a bit faster, because it avoids all DOM
conformity checks. Therefore this function is quite useful if one builds XML documents in memory where the order and ownership (<function>ownerDocument</function>)
is assured.</para>
<para>addChild() uses libxml2's own xmlAddChild() function. Thus it has to be used with extra care: If a text node is added to a node
and the node itself or its last childnode is also a text node, the node to add will be merged with the one already available. The added
node will be removed from memory after this action. Because Perl is not aware of this action, the Perl instance is still available.
XML::LibXML will catch the loss of a node and refuse to run any function called on that node.</para>
<programlisting> my $t1 = $doc-&gt;createTextNode( "foo" );
my $t2 = $doc-&gt;createTextNode( "bar" );
$t1-&gt;addChild( $t2 ); # is OK
my $val = $t2-&gt;nodeValue(); # will fail, script dies</programlisting>
<para>Also addChild() will not check if the added node belongs to the same document as the node it will be added to. This could lead to
inconsistent documents and in worse cases even to memory violations, if one does not keep track of this issue.</para>
<para>Although this sounds like a lot of trouble, addChild() is useful if a document is built from a stream, such as happens sometimes in
SAX handlers or filters.</para>
<para>If you are not sure about the source of your nodes, you had better stay with appendChild(), because this function is more user friendly in
the sense of being more error tolerant.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>addNewChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node = $parent-&gt;addNewChild( $nsURI, $name );</funcsynopsisinfo>
</funcsynopsis>
<para>Similar to <function>addChild()</function>, this function uses low level libxml2 functionality to provide a faster interface for DOM
building. <emphasis>addNewChild()</emphasis> uses <function>xmlNewChild()</function> to create a new node on a given parent element.</para>
<para>addNewChild() has two parameters $nsURI and $name, where $nsURI is an (optional) namespace URI. $name is the fully qualified element
name; addNewChild() will determine the correct prefix if necessary.</para>
<para>The function returns the newly created node.</para>
<para>This function is very useful for DOM building, where a created node can be directly associated with its parent. <emphasis>NOTE</emphasis>
this function is not part of the DOM specification and its use will limit your code to XML::LibXML.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>addSibling</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;addSibling($newNode);</funcsynopsisinfo>
</funcsynopsis>
<para>addSibling() allows adding an additional node to the end of a nodelist, defined by the given node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>cloneNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$newnode = $node-&gt;cloneNode( $deep );</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>cloneNode</emphasis> creates a
copy of <function>$node</function>. When $deep is
set to 1 (true) the function will copy all
childnodes as well. If $deep is 0 only the current
node will be copied. Note that in case of element,
attributes are copied even if $deep is 0.
</para>
<para>Note that the behavior of this function for $deep=0
has changed in 1.62 in order to be consistent with the DOM spec
(in older versions attributes and namespace information
was not copied for elements).</para>
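<para>The effect of the $deep flag can be sketched as follows, assuming XML::LibXML 1.62 or later and a made-up element:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;p class="x"&gt;&lt;b&gt;bold&lt;/b&gt;&lt;/p&gt;');
my $p   = $doc-&gt;documentElement;

# Shallow copy: the element and its attributes, but no child nodes.
my $shallow = $p-&gt;cloneNode(0);

# Deep copy: the element, its attributes and the whole subtree.
my $deep = $p-&gt;cloneNode(1);

print $shallow-&gt;toString, "\n";
print $deep-&gt;toString, "\n";</programlisting>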
</listitem>
</varlistentry>
<varlistentry>
<term>parentNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$parentnode = $node-&gt;parentNode;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the Parent Node of the current node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nextSibling</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$nextnode = $node-&gt;nextSibling();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the next sibling, if any.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>previousSibling</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$prevnode = $node-&gt;previousSibling();</funcsynopsisinfo>
</funcsynopsis>
<para>Analogous to <emphasis>nextSibling</emphasis>, this function returns the previous sibling, if any.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>hasChildNodes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$boolean = $node-&gt;hasChildNodes();</funcsynopsisinfo>
</funcsynopsis>
<para>If the current node has Childnodes this function returns TRUE (1), otherwise it returns FALSE (0, not undef).</para>
</listitem>
</varlistentry>
<varlistentry>
<term>firstChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$childnode = $node-&gt;firstChild;</funcsynopsisinfo>
</funcsynopsis>
<para>If a node has childnodes this function will return the first node in the childlist.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>lastChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$childnode = $node-&gt;lastChild;</funcsynopsisinfo>
</funcsynopsis>
<para>If the <function>$node</function> has childnodes this function returns the last child node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>ownerDocument</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$documentnode = $node-&gt;ownerDocument;</funcsynopsisinfo>
</funcsynopsis>
<para>Through this function it is always possible to access the document the current node is bound to.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getOwner</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node = $node-&gt;getOwner;</funcsynopsisinfo>
</funcsynopsis>
<para>This function returns the node the current node is associated with. In most cases this will be a document node or a document fragment
node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setOwnerDocument</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setOwnerDocument( $doc );</funcsynopsisinfo>
</funcsynopsis>
<para>This function binds a node to another DOM. This method unbinds the node first, if it is already bound to another document.</para>
<para>This function is the reverse call of XML::LibXML::Document's adoptNode() function. Because of this it has the same limitations
with Entity References as adoptNode().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>insertBefore</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;insertBefore( $newNode, $refNode );</funcsynopsisinfo>
</funcsynopsis>
<para>The method inserts <function>$newNode</function> before <function>$refNode</function>. If <function>$refNode</function> is undefined,
the newNode will be set as the new last child of the parent node. This function differs from the DOM L2 specification in one case: if the
new node is not part of the document, the node will be imported first, automatically.</para>
<para>$refNode has to be passed to the function even if it is undefined:</para>
<programlisting> $node-&gt;insertBefore( $newNode, undef ); # the same as $node-&gt;appendChild( $newNode );
$node-&gt;insertBefore( $newNode ); # wrong</programlisting>
<para>Note that the reference node has to be a direct child of the node the function is called on. Also, $newNode is not allowed to be an
ancestor of the new parent node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>insertAfter</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;insertAfter( $newNode, $refNode );</funcsynopsisinfo>
</funcsynopsis>
<para>The method inserts <function>$newNode</function> after <function>$refNode</function>. If <function>$refNode</function> is undefined,
the newNode will be set as the new last child of the parent node.</para>
<para>Note that $refNode has to be passed explicitly even if it is undef.</para>
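<para>A small sketch of both insertion methods, using a made-up list element with one child as the reference node:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML-&gt;new-&gt;parse_string('&lt;list&gt;&lt;b/&gt;&lt;/list&gt;');
my $list = $doc-&gt;documentElement;
my $ref  = $list-&gt;firstChild;    # the b element

$list-&gt;insertBefore( $doc-&gt;createElement('a'), $ref );
$list-&gt;insertAfter(  $doc-&gt;createElement('c'), $ref );

# An undefined reference node appends the new node as the last child.
$list-&gt;insertBefore( $doc-&gt;createElement('d'), undef );

print $list-&gt;toString, "\n";     # &lt;list&gt;&lt;a/&gt;&lt;b/&gt;&lt;c/&gt;&lt;d/&gt;&lt;/list&gt;</programlisting>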
</listitem>
</varlistentry>
<varlistentry>
<term>findnodes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;findnodes( $xpath_expression );</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>findnodes</emphasis> evaluates the XPath expression (XPath 1.0) on the current node and returns the resulting node set as an array. In scalar context it returns an <function>XML::LibXML::NodeList</function> object.</para>
<para><emphasis>NOTE ON NAMESPACES AND XPATH</emphasis>:</para>
<para>A common mistake about
XPath is to assume that node tests consisting of an
element name with no prefix match elements in the default
namespace. This assumption is wrong - by XPath
specification, such node tests can only match elements
that are in no (i.e. null) namespace.
</para>
<para>
So, for example, one cannot match the root element of an
XHTML document with <code>$node-&gt;find('/html')</code>
since <literal>'/html'</literal> would only match if the
root element <literal>&lt;html&gt;</literal> had no
namespace, but all XHTML elements belong to the namespace
http://www.w3.org/1999/xhtml. (Note that
<literal>xmlns="..."</literal> namespace declarations can
also be specified in a DTD, which makes the situation even worse, since
the XML document looks as if there was no default namespace).
</para>
<para>There are several possible ways to deal with namespaces in XPath:
</para>
<itemizedlist>
<listitem>
<para>
The recommended way is to use the
<function>XML::LibXML::XPathContext</function> module
to define an explicit context
for XPath evaluation, in which a document independent
prefix-to-namespace mapping can be defined. For
example:
</para>
<programlisting>my $xpc = XML::LibXML::XPathContext-&gt;new;
$xpc-&gt;registerNs('x', 'http://www.w3.org/1999/xhtml');
$xpc-&gt;find('/x:html',$node);</programlisting>
</listitem>
<listitem><para>
Another possibility is to use prefixes declared
in the queried document (if known).
If the document declares a prefix for the
namespace in question (and the context node is in the
scope of the declaration),
<function>XML::LibXML</function> allows you to use the
prefix in the XPath expression, e.g.:
</para>
<programlisting>$node-&gt;find('/x:html');</programlisting>
</listitem>
</itemizedlist>
<para>See also XML::LibXML::XPathContext-&gt;findnodes.</para>
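<para>The namespace pitfall can be reproduced with the following sketch, which assumes a minimal XHTML-like document. The prefix
<literal>x</literal> is an arbitrary choice and exists only in the XPath context, not in the document.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;body/&gt;&lt;/html&gt;'
);

# An unprefixed name only matches elements in no namespace: no result.
my @plain = $doc-&gt;findnodes('/html');

# Register a prefix for the XHTML namespace and use it in the expression.
my $xpc = XML::LibXML::XPathContext-&gt;new($doc);
$xpc-&gt;registerNs('x', 'http://www.w3.org/1999/xhtml');
my @with_ns = $xpc-&gt;findnodes('/x:html');

print scalar(@plain), " ", scalar(@with_ns), "\n";   # prints "0 1"</programlisting>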
</listitem>
</varlistentry>
<varlistentry>
<term>find</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$result = $node-&gt;find( $xpath );</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>find</emphasis> evaluates the XPath 1.0 expression using the current node as the context of the expression, and returns the
result depending on what type of result the XPath expression had. For example, the XPath "1 * 3 + 52" results in an
<function>XML::LibXML::Number</function> object being returned. Other expressions might return an <function>XML::LibXML::Boolean</function>
object, or an <function>XML::LibXML::Literal</function> object (a string). Each of those objects uses Perl's overload feature to "do
the right thing" in different contexts.</para>
<para>See also XML::LibXML::XPathContext-&gt;find.</para>
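<para>The result classes can be inspected with a sketch like the following, which assumes a made-up document and prints the class of each
result:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML-&gt;new-&gt;parse_string('&lt;r&gt;&lt;n&gt;1&lt;/n&gt;&lt;n&gt;2&lt;/n&gt;&lt;/r&gt;');
my $root = $doc-&gt;documentElement;

my $count  = $root-&gt;find('count(n)');       # XML::LibXML::Number
my $exists = $root-&gt;find('boolean(n)');     # XML::LibXML::Boolean
my $string = $root-&gt;find('string(n[1])');   # XML::LibXML::Literal
my $nodes  = $root-&gt;find('n');              # XML::LibXML::NodeList

print ref($_), "\n" for $count, $exists, $string, $nodes;</programlisting>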
</listitem>
</varlistentry>
<varlistentry>
<term>findvalue</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>print $node-&gt;findvalue( $xpath );</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>findvalue</emphasis> is exactly equivalent to:</para>
<programlisting> $node-&gt;find( $xpath )-&gt;to_literal; </programlisting>
<para>That is, it returns the literal value of the results. This enables you to ensure that you get a string back from your search, allowing
certain shortcuts. This could be used as the equivalent of XSLT's &lt;xsl:value-of select="some_xpath"/&gt;.</para>
<para>See also XML::LibXML::XPathContext-&gt;findvalue.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>childNodes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@childnodes = $node-&gt;childNodes;</funcsynopsisinfo>
</funcsynopsis>
<para><emphasis>childNodes</emphasis> implements a more intuitive interface to the child nodes of the current node. It enables you to pass
all children directly to a <function>map</function> or <function>grep</function>. If this function is called in scalar context, a
<function>XML::LibXML::NodeList</function> object will be returned.</para>
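<para>The following sketch shows both calling contexts; the sample document is an assumption made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document: two elements with a text node in between
my $doc  = XML::LibXML-&gt;new-&gt;parse_string('&lt;r&gt;&lt;a/&gt;text&lt;b/&gt;&lt;/r&gt;');
my $root = $doc-&gt;documentElement;

# List context: feed the children straight into grep and map
my @names = map  { $_-&gt;nodeName }
            grep { $_-&gt;isa('XML::LibXML::Element') } $root-&gt;childNodes;
print "@names\n";                 # prints "a b"

# Scalar context: an XML::LibXML::NodeList object is returned
my $list = $root-&gt;childNodes;
print $list-&gt;size, "\n";          # prints "3" (a, the text node, b)</programlisting>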
</listitem>
</varlistentry>
<varlistentry>
<term>toString</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$xmlstring = $node-&gt;toString($format,$docencoding);</funcsynopsisinfo>
</funcsynopsis>
<para>This is the equivalent of <function>XML::LibXML::Document::toString</function> for a single node. This means a node and all its
child nodes will be dumped into the result string.</para>
<para>In addition to the $format flag of XML::LibXML::Document, this version accepts the optional $docencoding flag. If this flag is set,
this function returns the string in its original encoding (the encoding of the document) rather than UTF-8.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>toStringC14N</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$c14nstring = $node-&gt;toStringC14N($with_comments, $xpath_expression);</funcsynopsisinfo>
</funcsynopsis>
<para>The function is similar to toString(). Instead of simply serializing the document tree, it transforms it as specified in the
XML-C14N Specification (see <ulink url="http://www.w3.org/TR/xml-c14n">http://www.w3.org/TR/xml-c14n</ulink>).
Such a transformation is known as canonicalization.</para>
<para>If $with_comments is 0 or not defined, the resulting document will not contain any comments that exist in the original document. To
include comments in the canonicalized document, $with_comments has to be set to 1.</para>
<para>The parameter $xpath_expression defines the node set of nodes that should be visible in the resulting document. This can be used to
filter out some nodes. Note that only the nodes that are part of the node set will be included in the resulting document. Their
child nodes will not exist in the resulting document unless they are also part of the node set defined by the XPath expression.</para>
<para>If $xpath_expression is omitted or empty, toStringC14N() will include all nodes in the given sub-tree.</para>
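<para>A minimal sketch of the effect of $with_comments, assuming a made-up sample document:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document with unsorted attributes and a comment
my $doc  = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;r b="2" a="1"&gt;&lt;!-- note --&gt;&lt;e/&gt;&lt;/r&gt;');
my $root = $doc-&gt;documentElement;

my $plain = $root-&gt;toStringC14N(0);   # comments are dropped
my $with  = $root-&gt;toStringC14N(1);   # comments are kept

print "$plain\n$with\n";</programlisting>
<para>Both strings are in canonical form, so they are suitable for byte-wise comparison or for computing a digest.</para>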
</listitem>
</varlistentry>
<varlistentry>
<term>toStringEC14N</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$ec14nstring = $node-&gt;toStringEC14N($with_comments, $xpath_expression, $inclusive_prefix_list);</funcsynopsisinfo>
</funcsynopsis>
<para>The function is similar to toStringC14N() but follows
the XML-EXC-C14N Specification (see <ulink url="http://www.w3.org/TR/xml-exc-c14n">http://www.w3.org/TR/xml-exc-c14n</ulink>)
for exclusive canonicalization of XML.</para>
<para>The first two arguments are as above. If $inclusive_prefix_list is
used, it should be an ARRAY reference listing
namespace prefixes that are to be handled in the manner described by the Canonical XML Recommendation (i.e. preserved in the output even if the namespace is not used). See the specification for details.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$str = $doc-&gt;serialize($format); </funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toString(). This function name was added to be more consistent
with libxml2.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize_c14n</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$c14nstr = $doc-&gt;serialize_c14n($comment_flag,$xpath); </funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toStringC14N().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serialize_exc_c14n</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$ec14nstr = $doc-&gt;serialize_exc_c14n($comment_flag,$xpath,$inclusive_prefix_list); </funcsynopsisinfo>
</funcsynopsis>
<para>An alias for toStringEC14N().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>localname</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$localname = $node-&gt;localname;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the local name of a tag. This is the part after the colon.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>prefix</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$nameprefix = $node-&gt;prefix;</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the prefix of a tag. This is the part before the colon.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>namespaceURI</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$uri = $node-&gt;namespaceURI();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the URI of the current namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>hasAttributes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$boolean = $node-&gt;hasAttributes();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns 1 (TRUE) if the current node has any attributes set, otherwise 0 (FALSE) is returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>attributes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@attributelist = $node-&gt;attributes();</funcsynopsisinfo>
</funcsynopsis>
<para>This function returns all attributes and namespace declarations assigned to the given node.</para>
<para>Because XML::LibXML does not implement namespace declarations and attributes the same way, you have to test which kind of node you
are handling when accessing the function's result.</para>
<para>If this function is called in array context the attribute nodes are returned as an array. In scalar context the function will return a
<function>XML::LibXML::NamedNodeMap</function> object.</para>
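<para>One possible way to perform this test is sketched below. The sample document is made up for illustration, and the sketch assumes that namespace declarations are returned as <function>XML::LibXML::Namespace</function> objects:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample element with one namespace declaration and one attribute
my $doc  = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;r xmlns:x="http://example.org/x" id="1"/&gt;');
my $root = $doc-&gt;documentElement;

my ( @declarations, @attributes );
for my $node ( $root-&gt;attributes ) {
    if ( $node-&gt;isa('XML::LibXML::Namespace') ) {
        # namespace declaration: prefix and URI
        push @declarations, $node-&gt;getLocalName . ' =&gt; ' . $node-&gt;getData;
    }
    else {
        # ordinary attribute: name and value
        push @attributes, $node-&gt;nodeName . ' = ' . $node-&gt;value;
    }
}
print "declarations: @declarations\n";
print "attributes:   @attributes\n";</programlisting>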
</listitem>
</varlistentry>
<varlistentry>
<term>lookupNamespaceURI</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$URI = $node-&gt;lookupNamespaceURI( $prefix );</funcsynopsisinfo>
</funcsynopsis>
<para>Find a namespace URI by its prefix starting at the current node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>lookupNamespacePrefix</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$prefix = $node-&gt;lookupNamespacePrefix( $URI );</funcsynopsisinfo>
</funcsynopsis>
<para>Find a namespace prefix by its URI starting at the current node.</para>
<para><emphasis>NOTE</emphasis> Only the namespace URIs are meant to be unique. The prefix is only document related. Also the document might
have more than a single prefix defined for a namespace.</para>
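<para>A brief sketch of both lookup directions; the namespace URI and the sample document are assumptions made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical document declaring the prefix "x" on the root element
my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;a xmlns:x="http://example.org/x"&gt;&lt;x:b/&gt;&lt;/a&gt;');
my ($child) = $doc-&gt;documentElement-&gt;childNodes;

# The declaration on the parent is in scope for the child element
my $uri    = $child-&gt;lookupNamespaceURI('x');
my $prefix = $child-&gt;lookupNamespacePrefix('http://example.org/x');

print "$prefix is bound to $uri\n";   # prints "x is bound to http://example.org/x"</programlisting>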
</listitem>
</varlistentry>
<varlistentry>
<term>normalize</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;normalize;</funcsynopsisinfo>
</funcsynopsis>
<para>This function normalizes adjacent text nodes. This function is not as strict as libxml2's xmlTextMerge() function, since it will
not free a node that is still referenced by the Perl layer.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getNamespaces</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nslist = $node-&gt;getNamespaces;</funcsynopsisinfo>
</funcsynopsis>
<para>If a node has any namespaces defined, this function will return these namespaces. Note that this will not return all namespaces that
are in scope, but only the ones declared explicitly for that node.</para>
<para>Although getNamespaces is available for all nodes, it only makes sense if used with element nodes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>removeChildNodes</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;removeChildNodes();</funcsynopsisinfo>
</funcsynopsis>
<para>This function is not specified for any DOM level: It removes all child nodes from a node in a single step. Unlike the libxml2
function itself (xmlFreeNodeList), this function will not immediately remove the nodes from memory. This prevents memory
violations if there are nodes still referred to from the Perl level.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nodePath</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;nodePath();</funcsynopsisinfo>
</funcsynopsis>
<para>This function is not specified for any DOM level: It returns a canonical structure based XPath for a given node.</para>
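<para>As a small sketch (the sample document is made up for illustration), the returned path can be passed back to <function>findnodes</function> to locate the same node again:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document with two sibling elements of the same name
my $doc   = XML::LibXML-&gt;new-&gt;parse_string('&lt;r&gt;&lt;a/&gt;&lt;a/&gt;&lt;/r&gt;');
my @nodes = $doc-&gt;documentElement-&gt;childNodes;

my $path = $nodes[1]-&gt;nodePath;
print "$path\n";                               # prints "/r/a[2]"

# Evaluating the path against the document yields the original node
my ($again) = $doc-&gt;findnodes($path);
print "same node\n" if $again-&gt;isSameNode( $nodes[1] );</programlisting>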
</listitem>
</varlistentry>
<varlistentry>
<term>line_number</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$lineno = $node-&gt;line_number();</funcsynopsisinfo>
</funcsynopsis>
<para>This function returns the line number where the tag was found during parsing. For a node that is added to the document, the line number is 0.
Problems may occur if a node from one document is passed to another one.</para>
<para>Note: line_number() is special to XML::LibXML and not part of the DOM specification.</para>
<para>If the line_numbers flag of the parser was not activated before parsing, line_number() will always return 0.</para>
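<para>A minimal sketch, assuming a made-up three-line sample document, that activates the parser flag before parsing:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML-&gt;new;
$parser-&gt;line_numbers(1);          # must be set before parsing

# Hypothetical sample document; the a element sits on the second line
my $doc = $parser-&gt;parse_string("&lt;r&gt;\n  &lt;a/&gt;\n&lt;/r&gt;");
my ($a) = $doc-&gt;documentElement-&gt;getChildrenByTagName('a');

print $a-&gt;line_number, "\n";       # prints "2"</programlisting>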
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML Class for Element Nodes</title>
<titleabbrev>XML::LibXML::Element</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Element nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node = XML::LibXML::Element-&gt;new( $name );</funcsynopsisinfo>
</funcsynopsis>
<para>This function creates a new node unbound to any DOM.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setAttribute</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setAttribute( $aname, $avalue );</funcsynopsisinfo>
</funcsynopsis>
<para>This method sets or replaces the node's attribute <function>$aname</function> to the value <function>$avalue</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setAttributeNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setAttributeNS( $nsURI, $aname, $avalue );</funcsynopsisinfo>
</funcsynopsis>
<para>Namespace-aware version of <function>setAttribute</function>, where
<function>$nsURI</function> is a namespace URI,
<function>$aname</function> is a qualified name,
and <function>$avalue</function> is the value.
The namespace URI may be null (empty or undefined)
in order to create an attribute which has no namespace.
</para>
<para>
The current implementation differs from DOM in the following aspects:
</para>
<para>
If an attribute with the same local name and namespace URI already exists
on the element, but its prefix differs from the prefix of <function>$aname</function>,
then this function is supposed to change the prefix (regardless
of namespace declarations and possible collisions).
However, the current implementation does rather the opposite.
If a prefix is declared for the namespace URI in the scope
of the attribute, then the already declared prefix is used,
disregarding the prefix specified in <function>$aname</function>.
If no prefix is declared for the namespace, the function tries
to declare the prefix specified in <function>$aname</function>
and dies if the prefix is already taken by some other namespace.
</para>
<para>According to DOM Level 2 specification, this method can also be used to
create or modify special attributes used for declaring XML namespaces
(which belong to the namespace "http://www.w3.org/2000/xmlns/" and
have prefix or name "xmlns"). This should work since version 1.61,
but again the implementation differs from DOM specification in the following:
if a declaration of the same namespace prefix already exists
on the element, then changing its value via this method
automatically changes the namespace of all elements and attributes
in its scope. This is because in libxml2 the namespace URI of an element
is not static but is computed from a pointer to a namespace declaration attribute.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getAttribute</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$avalue = $node-&gt;getAttribute( $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>If <function>$node</function> has an attribute with the name <function>$aname</function>, the value of this attribute will get
returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getAttributeNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$avalue = $node-&gt;getAttributeNS( $nsURI, $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>Retrieves an attribute value by local name and namespace URI.</para>
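<para>The sketch below pairs this method with <function>setAttributeNS</function>. The element name and attribute value are made up for illustration; note that the setter takes a qualified name while the getter takes the local name:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $xlink = 'http://www.w3.org/1999/xlink';

# Hypothetical document with a single root element
my $doc  = XML::LibXML::Document-&gt;new( '1.0', 'UTF-8' );
my $elem = $doc-&gt;createElement('a');
$doc-&gt;setDocumentElement($elem);

# Set with a qualified name, read back with local name and namespace URI
$elem-&gt;setAttributeNS( $xlink, 'xlink:href', 'other.xml' );
my $href = $elem-&gt;getAttributeNS( $xlink, 'href' );

print "$href\n";                   # prints "other.xml"</programlisting>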
</listitem>
</varlistentry>
<varlistentry>
<term>getAttributeNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attrnode = $node-&gt;getAttributeNode( $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>Retrieve an attribute node by name. If no attribute with a given name exists, <function>undef</function> is returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getAttributeNodeNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attrnode = $node-&gt;getAttributeNodeNS( $namespaceURI, $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>Retrieves an attribute node by local name and namespace URI. If no attribute with a given localname and namespace exists, <function>undef</function> is returned.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>removeAttribute</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;removeAttribute( $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>The method removes the attribute <function>$aname</function> from the node's attribute list, if the attribute can be found.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>removeAttributeNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;removeAttributeNS( $nsURI, $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>Namespace version of <function>removeAttribute</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>hasAttribute</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$boolean = $node-&gt;hasAttribute( $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>This function tests if the named attribute is set for the node. If the attribute is specified, TRUE (1) will be returned, otherwise the
return value is FALSE (0).</para>
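<para>The non-namespace attribute methods can be combined as in the following sketch; the element and attribute names are made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical unbound element
my $item = XML::LibXML::Element-&gt;new('item');

$item-&gt;setAttribute( 'id', '42' );
print "id is ", $item-&gt;getAttribute('id'), "\n"    # prints "id is 42"
    if $item-&gt;hasAttribute('id');

$item-&gt;removeAttribute('id');
print "id removed\n" unless $item-&gt;hasAttribute('id');</programlisting>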
</listitem>
</varlistentry>
<varlistentry>
<term>hasAttributeNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$boolean = $node-&gt;hasAttributeNS( $nsURI, $aname );</funcsynopsisinfo>
</funcsynopsis>
<para>Namespace version of <function>hasAttribute</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getChildrenByTagName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;getChildrenByTagName($tagname);</funcsynopsisinfo>
</funcsynopsis>
<para>The function gives direct access to all child elements of the current node with a given tagname, where
tagname is a qualified name, that is, in case of namespace usage it may consist of a prefix and local
name. This function makes things a lot easier if one needs
to handle big data sets. A special tagname '*' can be used to match any name.</para>
<para>If this function is called in SCALAR context, it returns the number of elements found.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getChildrenByTagNameNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;getChildrenByTagNameNS($nsURI,$tagname);</funcsynopsisinfo>
</funcsynopsis>
<para>Namespace version of <function>getChildrenByTagName</function>. A special nsURI '*' matches any namespace URI,
in which case the function behaves just like <function>getChildrenByLocalName</function>.</para>
<para>If this function is called in SCALAR context, it returns the number of elements found.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getChildrenByLocalName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;getChildrenByLocalName($localname);</funcsynopsisinfo>
</funcsynopsis>
<para>The function gives direct access to all child elements of the current node with a given local name. It makes things a lot easier if one needs
to handle big data sets. A special <function>localname</function> '*' can be used to match any local name.</para>
<para>If this function is called in SCALAR context, it returns the number of elements found.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getElementsByTagName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;getElementsByTagName($tagname);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is part of the spec. It
fetches all descendants of a node with a given tagname,
where <function>tagname</function> is a qualified name,
that is, in case of namespace usage it may consist of a prefix and
local name.
A special <function>tagname</function> '*' can be used to match any tag name.
</para>
<para>In SCALAR context this function returns a <function>XML::LibXML::NodeList</function> object.</para>
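<para>The difference between this function and <function>getChildrenByTagName</function> can be sketched as follows, using a made-up sample document:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: one p is a child of r, the other a grandchild
my $doc  = XML::LibXML-&gt;new-&gt;parse_string('&lt;r&gt;&lt;p/&gt;&lt;d&gt;&lt;p/&gt;&lt;/d&gt;&lt;/r&gt;');
my $root = $doc-&gt;documentElement;

my @children    = $root-&gt;getChildrenByTagName('p');   # direct children only
my @descendants = $root-&gt;getElementsByTagName('p');   # all descendants

print scalar(@children), " child, ",
      scalar(@descendants), " descendants\n";         # prints "1 child, 2 descendants"</programlisting>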
</listitem>
</varlistentry>
<varlistentry>
<term>getElementsByTagNameNS</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;getElementsByTagNameNS($nsURI,$localname);</funcsynopsisinfo>
</funcsynopsis>
<para>Namespace version of <function>getElementsByTagName</function> as found in the DOM spec.
A special <function>localname</function> '*' can be used to match any local name
and <function>nsURI</function> '*' can be used to match any namespace URI.</para>
<para>In SCALAR context this function returns a <function>XML::LibXML::NodeList</function> object.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getElementsByLocalName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>@nodes = $node-&gt;getElementsByLocalName($localname);</funcsynopsisinfo>
</funcsynopsis>
<para>This function is not found in the DOM specification. It is a mix of getElementsByTagName and getElementsByTagNameNS. It will fetch all
tags matching the given local-name. This allows one to select tags with the same local name across namespace borders.</para>
<para>In SCALAR context this function returns a <function>XML::LibXML::NodeList</function> object.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>appendWellBalancedChunk</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;appendWellBalancedChunk( $chunk );</funcsynopsisinfo>
</funcsynopsis>
<para>Sometimes it is necessary to append an XML tree that is available as a string to a node. <emphasis>appendWellBalancedChunk</emphasis> will do the trick
for you, but only if the string is <function>well-balanced</function>.</para>
<para><emphasis>Note that appendWellBalancedChunk() is only left for compatibility reasons</emphasis>. Implicitly it uses</para>
<programlisting> my $fragment = $parser-&gt;parse_xml_chunk( $chunk );
$node-&gt;appendChild( $fragment );</programlisting>
<para>This form is more explicit and makes it easier to control the flow of a script.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>appendText</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;appendText( $PCDATA );</funcsynopsisinfo>
</funcsynopsis>
<para>Alias for appendTextNode().</para>
</listitem>
</varlistentry>
<varlistentry>
<term>appendTextNode</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;appendTextNode( $PCDATA );</funcsynopsisinfo>
</funcsynopsis>
<para>This wrapper function lets you add a string directly to an element node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>appendTextChild</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;appendTextChild( $childname , $PCDATA );</funcsynopsisinfo>
</funcsynopsis>
<para>Somewhat similar to <function>appendTextNode</function>: it lets you append a child element that contains only a <function>text node</function>,
directly by specifying the name and the text content.</para>
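<para>A short sketch contrasting the two functions; the element names and text are made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical unbound element
my $person = XML::LibXML::Element-&gt;new('person');

$person-&gt;appendTextChild( 'name', 'Ada' );   # adds &lt;name&gt;Ada&lt;/name&gt;
$person-&gt;appendTextNode('!');                # adds a bare text node

print $person-&gt;toString, "\n";   # prints "&lt;person&gt;&lt;name&gt;Ada&lt;/name&gt;!&lt;/person&gt;"</programlisting>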
</listitem>
</varlistentry>
<varlistentry>
<term>setNamespace</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setNamespace( $nsURI , $nsPrefix, $activate );</funcsynopsisinfo>
</funcsynopsis>
<para>setNamespace() allows one to apply a
namespace to an element. The function takes three
parameters: the namespace URI, which is required,
and two optional values: the prefix, which
is the namespace prefix as it should be used in
child elements or attributes, and the
additional activate parameter. If the prefix is not given,
or is undefined or empty, this function tries to create a
declaration of the default namespace.
</para>
<para>The activate parameter controls whether the namespace
is applied to the element itself. If
this parameter is set to FALSE (0), a new namespace
declaration is simply added to the element
while the element's namespace itself is not
altered. By default, however, activate is set to TRUE (1).
In this case the namespace
is used as the node's effective
namespace. This means the namespace prefix is
added to the node name and if there was a
namespace already active for the node, it will
be replaced (but its declaration is not removed from the document).
A new namespace declaration is only created if necessary
(that is, if the element is already in the scope
of a namespace declaration associating the prefix
with the namespace URI, then this declaration is reused).
</para>
<para>The following example may clarify this:</para>
<programlisting> my $e1 = $doc-&gt;createElement("bar");
$e1-&gt;setNamespace("http://foobar.org", "foo");</programlisting>
<para>results in</para>
<programlisting> &lt;foo:bar xmlns:foo="http://foobar.org"/&gt;</programlisting>
<para>while</para>
<programlisting> my $e2 = $doc-&gt;createElement("bar");
$e2-&gt;setNamespace("http://foobar.org", "foo",0);</programlisting>
<para>results only in</para>
<programlisting> &lt;bar xmlns:foo="http://foobar.org"/&gt;</programlisting>
<para>By using $activate == 0 it is possible to
create multiple namespace declarations on a single
element.</para>
<para>The function fails if it is required to
create a declaration associating the prefix
with the namespace URI but the element already
carries a declaration with the same prefix but
different namespace URI.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setNamespaceDeclURI</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setNamespaceDeclURI( $nsPrefix, $newURI );</funcsynopsisinfo>
</funcsynopsis>
<para>EXPERIMENTAL IN 1.61 !</para>
<para>This function directly manipulates
an existing namespace
declaration on an element. It takes
two parameters: the prefix by which it
looks up the namespace declaration and
a new namespace URI which replaces its previous
value.</para>
<para>It returns 1 if the namespace declaration
was found and changed, 0 otherwise.</para>
<para>All elements and attributes (even those previously
unbound from the document) for which the
namespace declaration determines their namespace
belong to the new namespace after
the change.
</para>
<para>If the new URI is undef or empty, the nodes
have no namespace and no prefix after the change.
Namespace declarations
once nulled in this way do not
further appear in the serialized output
(but do remain in the document for internal integrity
of libxml2 data structures).
</para>
<para>This function is NOT part of any DOM API.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setNamespaceDeclPrefix</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node-&gt;setNamespaceDeclPrefix( $oldPrefix, $newPrefix );</funcsynopsisinfo>
</funcsynopsis>
<para>EXPERIMENTAL IN 1.61 !</para>
<para>This function directly manipulates
an existing namespace
declaration on an element. It takes
two parameters: the old prefix by which it
looks up the namespace declaration and
a new prefix which is to replace the old one.</para>
<para>The function dies with an error
if the element is in the scope of
another declaration whose prefix equals
the new prefix, or if the change would
result in a declaration with a non-empty prefix but
empty namespace URI.
Otherwise, it returns 1 if the namespace declaration
was found and changed and 0 if not found.</para>
<para>All elements and attributes (even those previously
unbound from the document) for which the
namespace declaration determines their namespace
change their prefix to the new value.
</para>
<para>If the new prefix is undef or empty,
the namespace declaration becomes
a declaration of a default namespace.
The corresponding nodes drop their namespace prefix
(but remain in the, now default, namespace).
In this case the function fails, if the containing element
is in the scope of another default namespace declaration.
</para>
<para>This function is NOT part of any DOM API.</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML Class for Text Nodes</title>
<titleabbrev>XML::LibXML::Text</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Text nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>In contrast to the DOM specification, XML::LibXML implements the text node as the base class of all character data nodes. Therefore there exists no
CharacterData class. This allows one to use all methods that are available for text nodes for Comments or CDATA-sections as well.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text = XML::LibXML::Text-&gt;new( $content ); </funcsynopsisinfo>
</funcsynopsis>
<para>The constructor of the class. It creates an unbound text node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>data</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$nodedata = $text-&gt;data;</funcsynopsisinfo>
</funcsynopsis>
<para>Although there exists the <function>nodeValue</function> attribute in the Node class, the DOM specification defines data as a separate
attribute. <function>XML::LibXML</function> implements these two attributes not as different attributes, but as aliases, just as
<function>libxml2</function> does. Therefore</para>
<programlisting> $text-&gt;data;</programlisting>
<para>and</para>
<programlisting> $text-&gt;nodeValue;</programlisting>
<para>will have the same result and are not different entities.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setData($string)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;setData( $text_content );</funcsynopsisinfo>
</funcsynopsis>
<para>This function sets or replaces the text content of a node. The node has to be of the type "text", "cdata" or
"comment".</para>
</listitem>
</varlistentry>
<varlistentry>
<term>substringData($offset,$length)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;substringData($offset, $length);</funcsynopsisinfo>
</funcsynopsis>
<para>Extracts a range of data from the node. (DOM Spec) This function takes the two parameters $offset and $length and returns the
sub-string, if available.</para>
<para>If the node contains no data or $offset refers to a non-existing string index, this function will return <emphasis>undef</emphasis>.
If $length is out of range <function>substringData</function> will return the data starting at $offset instead of causing an error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>appendData($string)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;appendData( $somedata );</funcsynopsisinfo>
</funcsynopsis>
<para>Appends a string to the end of the existing data. If the current text node contains no data, this function has the same effect as
<function>setData</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>insertData($offset,$string)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;insertData($offset, $string);</funcsynopsisinfo>
</funcsynopsis>
<para>Inserts the parameter $string at the given $offset of the existing data of the node. This operation will not remove existing data;
the data from $offset onwards is moved behind the inserted string.</para>
<para>The $offset has to be a positive value. If $offset is out of range, <function>insertData</function> will have the same behaviour as
<function>appendData</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>deleteData($offset, $length)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;deleteData($offset, $length);</funcsynopsisinfo>
</funcsynopsis>
<para>This method removes a chunk from the existing node data at the given offset. The $length parameter tells how many characters should
be removed from the string.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>deleteDataString($string, [$all])</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;deleteDataString($remstring, $all);</funcsynopsisinfo>
</funcsynopsis>
<para>This method removes a chunk from the existing node data. Since the DOM interface is rather cumbersome if you already know <function>which</function>
string to remove from a text node, this method allows more perlish code :)</para>
<para>The function takes two parameters: <emphasis>$string</emphasis> and, optionally, the <emphasis>$all</emphasis> flag. If $all is not set, or is
<emphasis>undef</emphasis> or <emphasis>0</emphasis>, <function>deleteDataString</function> will remove only the first occurrence of
$string. If $all is <emphasis>TRUE</emphasis>, <function>deleteDataString</function> will remove all occurrences of <emphasis>$string</emphasis>
from the node data.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>replaceData($offset, $length, $string)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;replaceData($offset, $length, $string);</funcsynopsisinfo>
</funcsynopsis>
<para>The DOM style version to replace node data.</para>
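<para>The offset-based methods can be sketched as follows. The text content is made up for illustration, and the comments show the node data expected under the zero-based offsets of the DOM specification:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical unbound text node
my $text = XML::LibXML::Text-&gt;new('Hello World');

my $word = $text-&gt;substringData( 0, 5 );   # 'Hello'

$text-&gt;insertData( 5, ',' );               # data: 'Hello, World'
$text-&gt;deleteData( 5, 1 );                 # data: 'Hello World'
$text-&gt;replaceData( 6, 5, 'Perl' );        # data: 'Hello Perl'

print $text-&gt;data, "\n";</programlisting>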
</listitem>
</varlistentry>
<varlistentry>
<term>replaceDataString($oldstring, $newstring, [$all])</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;replaceDataString($oldstring, $newstring, $all);</funcsynopsisinfo>
</funcsynopsis>
<para>The more programmer friendly version of replaceData() :)</para>
<para>Instead of giving an offset and a length, one can specify the exact string (<emphasis>$oldstring</emphasis>) to be replaced. Additionally,
the <emphasis>$all</emphasis> flag allows replacing all occurrences of <emphasis>$oldstring</emphasis>.</para>
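<para>A brief sketch of the string-based methods and the effect of the optional flag; the text content is made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical unbound text node
my $text = XML::LibXML::Text-&gt;new('one two one');

$text-&gt;replaceDataString( 'one', '1' );       # first occurrence only: '1 two one'
$text-&gt;replaceDataString( 'one', '1', 1 );    # all occurrences:       '1 two 1'
$text-&gt;deleteDataString(' two');              # remove a known string: '1 1'

print $text-&gt;data, "\n";</programlisting>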
</listitem>
</varlistentry>
<varlistentry>
<term>replaceDataRegEx( $search_cond, $replace_cond, $reflags )</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$text-&gt;replaceDataRegEx( $search_cond, $replace_cond, $reflags );</funcsynopsisinfo>
</funcsynopsis>
<para>This method replaces the node's data using a <function>simple</function> regular expression. Optionally, this function accepts
flags that will be added to the replace statement.</para>
<para><emphasis>NOTE:</emphasis> This is a shortcut for</para>
<programlisting> my $datastr = $node-&gt;getData();
$datastr =~ s/somecond/replacement/g; # 'g' is just an example for any flag
$node-&gt;setData( $datastr );</programlisting>
<para>This function can make things easier to read for simple replacements. For more complex variants it is recommended to use the code
snippet above.</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML Comment Class</title>
<titleabbrev>XML::LibXML::Comment</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Comment nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>This class provides all functions of <emphasis>XML::LibXML::Text</emphasis>, but for comment nodes. This can be done, since only the output of the
node types is different, but not the data structure. :-)</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node = XML::LibXML::Comment-&gt;new( $content );</funcsynopsisinfo>
</funcsynopsis>
<para>The constructor is the only provided function for this package. It is required, because <emphasis>libxml2</emphasis> treats text nodes
and comment nodes slightly differently.</para>
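<para>As a minimal sketch (the comment text is made up for illustration), a comment node created this way accepts the text node methods:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical unbound comment node
my $comment = XML::LibXML::Comment-&gt;new(' draft ');

# Methods of XML::LibXML::Text are available for comments, too
$comment-&gt;appendData('v2 ');

print $comment-&gt;toString, "\n";   # prints "&lt;!-- draft v2 --&gt;"</programlisting>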
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML Class for CDATA Sections</title>
<titleabbrev>XML::LibXML::CDATASection</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to CDATA nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>This class provides all functions of <emphasis>XML::LibXML::Text</emphasis>, but for CDATA nodes.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node = XML::LibXML::CDATASection-&gt;new( $content );</funcsynopsisinfo>
</funcsynopsis>
<para>The constructor is the only provided function for this package. It is required, because <emphasis>libxml2</emphasis> treats the
different text node types slightly differently.</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML Attribute Class</title>
<titleabbrev>XML::LibXML::Attr</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Attribute nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>This is the interface to handle Attributes like ordinary nodes. The naming of the class relies on the W3C DOM documentation.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attr = XML::LibXML::Attr-&gt;new($name [,$value]);</funcsynopsisinfo>
</funcsynopsis>
<para>Class constructor. If you need to work with ISO encoded strings, you should <emphasis>always</emphasis> use the <function>createAttribute</function>
function of <emphasis>XML::LibXML::Document</emphasis>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getValue</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$string = $attr-&gt;getValue();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the value stored for the attribute. If undef is returned, the attribute has no value, which is different from being
<function>not specified</function>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>value</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$string = $attr-&gt;value;</funcsynopsisinfo>
</funcsynopsis>
<para>Alias for <emphasis>getValue()</emphasis></para>
</listitem>
</varlistentry>
<varlistentry>
<term>setValue</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attr-&gt;setValue( $string );</funcsynopsisinfo>
</funcsynopsis>
<para>This is needed to set a new attribute value. If ISO encoded strings are passed as parameter, the node has to be bound to a document,
otherwise the encoding might be done incorrectly.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getOwnerElement</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$node = $attr-&gt;getOwnerElement();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the node the attribute belongs to. If the attribute is not bound to a node, undef will be returned. Note that, overriding the underlying
implementation, the <emphasis>parentNode</emphasis> function returns undef rather than the owner element.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setNamespace</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$attr-&gt;setNamespace($nsURI, $prefix);</funcsynopsisinfo>
</funcsynopsis>
<para>This function tries to bind the attribute to a given namespace.
If <function>$nsURI</function> is undefined or empty,
the function discards any previous association of the attribute with a namespace.
If the namespace was not previously declared in the context of the
attribute, this function will fail.
In this case you may wish to call setNamespace() on the ownerElement.
If the namespace URI is non-empty and
declared in the context of the attribute, but only with a different
(non-empty) prefix, then the attribute is still bound to the namespace
but gets a different prefix than <function>$prefix</function>.
The function also fails if the prefix is empty but the namespace URI
is not (because unprefixed attributes should by definition belong to
no namespace).
This function returns 1 on success, 0 otherwise.
</para>
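<para>A minimal sketch of the successful case, assuming a document in which the namespace is already declared on the owner element. The
URI <literal>http://example.org/x</literal>, the prefix <literal>x</literal> and the sample document are made up for this illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: the namespace is declared, but the attribute is unprefixed
my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;root xmlns:x="http://example.org/x" a="1"/&gt;' );
my $attr = $doc-&gt;documentElement-&gt;getAttributeNode('a');

# The namespace is declared in the context of the attribute,
# so the call is expected to succeed and return 1
my $ok = $attr-&gt;setNamespace( 'http://example.org/x', 'x' );
print "bound: ", ( $ok ? "yes" : "no" ), "\n";

# Inspect the result
print $doc-&gt;documentElement-&gt;toString(), "\n";</programlisting>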
</listitem>
</varlistentry>
<varlistentry>
<term>isId</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$bool = $attr-&gt;isId;</funcsynopsisinfo>
</funcsynopsis>
<para>Determine whether an attribute is of type
ID. For documents with a DTD, this information
is only available if DTD loading/validation has been requested.
For HTML documents parsed with the HTML
parser ID detection is done
automatically. In XML documents, all "xml:id"
attributes are considered to be of type ID.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>serializeContent($docencoding)</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$string = $attr-&gt;serializeContent;</funcsynopsisinfo>
</funcsynopsis>
<para>This function is not part of the DOM API. It returns attribute content
in the form in which it serializes into XML, that is
with all meta-characters properly quoted and with raw
entity references (except for entities expanded during parse time).
Setting the optional $docencoding flag to 1 enforces document
encoding for the output string (which is then passed to Perl as a
byte string). Otherwise the string is passed to Perl as (UTF-8 encoded)
characters.
</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML's DOM L2 Document Fragment Implementation</title>
<titleabbrev>XML::LibXML::DocumentFragment</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;</programlisting>
</sect1>
<para>This class is a helper class as described in the DOM Level 2 Specification. It is implemented as a node without a name. All adding, inserting and
replacing functions are aware of document fragments.</para>
<para>In addition, <emphasis>all</emphasis> unbound nodes (all nodes that do not belong to any document sub-tree) are implicit members of document fragments.</para>
</chapter>
<chapter>
<title>XML::LibXML Namespace Implementation</title>
<titleabbrev>XML::LibXML::Namespace</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Namespace nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>Namespace nodes are returned both by $element-&gt;findnodes('namespace::foo') and by $node-&gt;getNamespaces().</para>
<para>The namespace node API is not part of any current DOM API, and so it is quite minimal. It should be noted that namespace nodes are
<emphasis>not</emphasis> a subclass of XML::LibXML::Node. However, Namespace nodes act a lot like attribute nodes, and similarly named methods will
return what you would expect if you treated the namespace node as an attribute. Note that in order to fix several inconsistencies between the API and the documentation, the behavior of some functions has been changed in 1.64.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>my $ns = XML::LibXML::Namespace-&gt;new($nsURI);</funcsynopsisinfo>
</funcsynopsis>
<para>Creates a new Namespace node. Note that this is not a 'node' in the sense of an attribute or an element node. Therefore you cannot
call all XML::LibXML::Node functions on it. All functions available for this node are listed below.</para>
<para>Optionally you can pass the prefix to the namespace constructor as a second parameter. If this second parameter is omitted, you will create a so-called
default namespace. Note that the newly created namespace is not bound to any document or node; therefore you should not expect it to be
available in an existing document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>declaredURI</term>
<listitem>
<para>Returns the URI for this namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>declaredPrefix</term>
<listitem>
<para>Returns the prefix for this namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nodeName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>print $ns-&gt;nodeName();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns "xmlns:prefix", where prefix is the prefix for this namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>name</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>print $ns-&gt;name();</funcsynopsisinfo>
</funcsynopsis>
<para>Alias for nodeName()</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getLocalName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$localname = $ns-&gt;getLocalName();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the local name of this node as if it were an attribute, that is, the prefix associated with the namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getData</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>print $ns-&gt;getData();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the URI of the namespace, i.e. the value of this node as if it were an attribute.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getValue</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>print $ns-&gt;getValue();</funcsynopsisinfo>
</funcsynopsis>
<para>Alias for getData()</para>
</listitem>
</varlistentry>
<varlistentry>
<term>value</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>print $ns-&gt;value();</funcsynopsisinfo>
</funcsynopsis>
<para>Alias for getData()</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getNamespaceURI</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$known_uri = $ns-&gt;getNamespaceURI();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the string "http://www.w3.org/2000/xmlns/"</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getPrefix</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$known_prefix = $ns-&gt;getPrefix();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the string "xmlns"</para>
</listitem>
</varlistentry>
</variablelist>
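<para>The accessors listed above can be tried out on an unbound namespace node. This is only a sketch; the URI
<literal>http://example.org/ns</literal> and the prefix <literal>ex</literal> are assumptions chosen for the example, and the comments restate
the descriptions given above rather than verified output.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical URI and prefix
my $ns = XML::LibXML::Namespace-&gt;new( 'http://example.org/ns', 'ex' );

print $ns-&gt;declaredURI(),    "\n";   # the namespace URI
print $ns-&gt;declaredPrefix(), "\n";   # the prefix
print $ns-&gt;nodeName(),       "\n";   # "xmlns:" followed by the prefix
print $ns-&gt;getLocalName(),   "\n";   # the prefix, seen as an attribute's local name
print $ns-&gt;getData(),        "\n";   # the URI, seen as an attribute's value</programlisting>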
</chapter>
<chapter>
<title>XML::LibXML Processing Instructions</title>
<titleabbrev>XML::LibXML::PI</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
# Only methods specific to Processing Instruction nodes listed here,
# see XML::LibXML::Node manpage for other methods</programlisting>
</sect1>
<para>Processing instructions are implemented in XML::LibXML with read and write access. The PI data is the PI without the PI target (as specified in
XML 1.0 [17]) as a string. This string can be accessed with getData as implemented in XML::LibXML::Node.</para>
<para>The write access takes into account that many processing instructions have attribute-like data. Therefore, in addition to the interface
conforming to the DOM specification, setData() accepts a set of named parameters. So the code segment</para>
<programlisting>my $pi = $dom-&gt;createProcessingInstruction("abc");
$pi-&gt;setData(foo=&gt;'bar', foobar=&gt;'foobar');
$dom-&gt;appendChild( $pi );</programlisting>
<para>will result in the following PI in the DOM:</para>
<programlisting>&lt;?abc foo="bar" foobar="foobar"?&gt;</programlisting>
<para>This is how it is specified in the DOM specification. This three-step interface temporarily creates a node in Perl space. This can be avoided by
using the insertProcessingInstruction() method. Instead of the three calls described above, the call</para>
<programlisting>$dom-&gt;insertProcessingInstruction("abc",'foo="bar" foobar="foobar"');</programlisting>
<para>will have the same result as above.</para>
<para>XML::LibXML::PI's implementation of setData() differs a bit from the standard version available in XML::LibXML::Node:</para>
<variablelist>
<varlistentry>
<term>setData</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$pinode-&gt;setData( $data_string );
$pinode-&gt;setData( name=&gt;string_value [...] );</funcsynopsisinfo>
</funcsynopsis>
<para>This method changes the content data of a PI. In addition to the interface specified for DOM Level 2, the method provides a
named parameter interface to set the data. This parameter list is converted into a string before it is appended to the PI.</para>
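<para>Both calling conventions can be compared side by side. The following is a sketch only; the PI target and the pseudo-attribute names are
hypothetical. A single name/value pair is used for the named form, because this sketch makes no assumption about the order in which several
pairs would be written.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $dom = XML::LibXML::Document-&gt;new( '1.0', 'UTF-8' );
my $pi  = $dom-&gt;createProcessingInstruction('xml-stylesheet');

# DOM-style interface: the data is given as one string
$pi-&gt;setData('type="text/xsl" href="style.xsl"');
print $pi-&gt;getData(), "\n";

# Named parameter interface: the list is converted into a string
$pi-&gt;setData( type =&gt; 'text/css' );
print $pi-&gt;getData(), "\n";</programlisting>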
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML DTD Handling</title>
<titleabbrev>XML::LibXML::Dtd</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;</programlisting>
</sect1>
<para>This class holds a DTD. You may parse a DTD from either a string, or from an external SYSTEM identifier.</para>
<para>No support is available as yet for parsing from a filehandle.</para>
<para>XML::LibXML::Dtd is a sub-class of Node, so all the methods available to nodes (particularly toString()) are available to Dtd objects.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dtd = XML::LibXML::Dtd-&gt;new($public_id, $system_id);</funcsynopsisinfo>
</funcsynopsis>
<para>Parse a DTD from the system identifier, and return a DTD object that you can pass to $doc-&gt;is_valid() or $doc-&gt;validate().</para>
<programlisting> my $dtd = XML::LibXML::Dtd-&gt;new(
"SOME // Public / ID / 1.0",
"test.dtd"
);
my $doc = XML::LibXML-&gt;new-&gt;parse_file("test.xml");
$doc-&gt;validate($dtd);</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>parse_string</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$dtd = XML::LibXML::Dtd-&gt;parse_string($dtd_str);</funcsynopsisinfo>
</funcsynopsis>
<para>The same as new() above, except that the DTD is parsed from a string. Note that parsing from a string may fail if the DTD contains external parameter-entity references with relative URLs.</para>
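<para>A minimal sketch of validating against a DTD held in a string. The one-line DTD and the sample document are assumptions made for
this example.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD: a single element containing only text
my $dtd = XML::LibXML::Dtd-&gt;parse_string('&lt;!ELEMENT note (#PCDATA)&gt;');
my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;note&gt;hello&lt;/note&gt;');

# is_valid() returns a boolean instead of dying like validate()
print $doc-&gt;is_valid($dtd) ? "valid\n" : "invalid\n";</programlisting>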
</listitem>
</varlistentry>
<varlistentry>
<term>getName</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$name = $dtd-&gt;getName();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the name of the DTD; i.e., the name immediately following the DOCTYPE keyword.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>publicId</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$publicId = $dtd-&gt;publicId();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the public identifier of the external subset.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>systemId</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$systemId = $dtd-&gt;systemId();</funcsynopsisinfo>
</funcsynopsis>
<para>Returns the system identifier of the external subset.</para>
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML::LibXML Class for Input Callbacks</title>
<titleabbrev>XML::LibXML::InputCallback</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;

my $input_callbacks = XML::LibXML::InputCallback-&gt;new();
$input_callbacks-&gt;register_callbacks([ $match_cb1, $open_cb1,
$read_cb1, $close_cb1 ] );
$input_callbacks-&gt;register_callbacks([ $match_cb2, $open_cb2,
$read_cb2, $close_cb2 ] );
$input_callbacks-&gt;register_callbacks( [ $match_cb3, $open_cb3,
$read_cb3, $close_cb3 ] );
$parser-&gt;input_callbacks( $input_callbacks );
$parser-&gt;parse_file( $some_xml_file );</programlisting>
</sect1>
<sect1>
<title>Description</title>
<para>You may get unexpected results if you are trying to load external documents during libxml2 parsing if the location of the resource is not an
HTTP, FTP or relative location but, for example, an absolute path. To get around this limitation, you may add your own input handler to open, read and
close particular types of locations or URI classes. Using these input callback handlers, you can, for example, handle your own custom URI schemes.</para>
<para>The input callbacks are used whenever LibXML has to get something other than externally parsed entities from somewhere. They are implemented
using a callback stack on the Perl layer in analogy to libxml2's native callback stack.</para>
<para>The XML::LibXML::InputCallback class transparently registers the input callbacks for libxml2's parser processes.</para>
<sect2>
<title>How does XML::LibXML::InputCallback work?</title>
<para>The libxml2 library offers a callback implementation as global functions only. To work around the troubles resulting from having only global
callbacks (for example, if the same global callback stack is manipulated by different applications running together in a single Apache
Web-server environment), XML::LibXML::InputCallback comes with an object-oriented and a function-oriented part.</para>
<para>Using the function-oriented part, the global callback stack of libxml2 can be manipulated. Those functions can be used as an interface to the
callbacks on the C and XS layer. In the object-oriented part, operations for working with the "pseudo-localized" callback stack are
implemented. Currently, you can register and de-register callbacks on the Perl layer and initialize them on a per-parser basis.</para>
<sect3>
<title>Callback Groups</title>
<para>The libxml2 input callbacks come in groups. One group contains a URI matcher (<emphasis>match</emphasis>), a data stream constructor (<emphasis>open</emphasis>),
a data stream reader (<emphasis>read</emphasis>), and a data stream destructor (<emphasis>close</emphasis>). The callbacks can be
manipulated on a per group basis only.</para>
</sect3>
<sect3>
<title>The Parser Process</title>
<para>The parser process works on an XML data stream, in which links to other resources can be embedded. These can be links to external
DTDs or XIncludes, for example. Those resources are identified by URIs. The callback implementation of libxml2 assumes that one callback
group can handle a certain set of URIs and a certain URI scheme. By default, callback handlers for <emphasis>file://*</emphasis>,
<emphasis>file://*.gz</emphasis>, <emphasis>http://*</emphasis> and <emphasis>ftp://*</emphasis> are registered.</para>
<para>Callback groups in the callback stack are processed from top to bottom, meaning that callback groups registered later will be
processed before the earlier registered ones.</para>
<para>While parsing the data stream, the libxml2 parser checks whether a registered callback group will handle a URI; if none does, the URI
will be interpreted as <emphasis>file://URI</emphasis>. To handle a URI, the <emphasis>match</emphasis> callback has to return
'1'. If that happens, the handling of the URI will be passed to that callback group. Next, the URI will be passed to the
<emphasis>open</emphasis> callback, which should return a <emphasis>reference</emphasis> to the data stream if it successfully opened the
file, '0' otherwise. If opening the stream was successful, the <emphasis>read</emphasis> callback will be called repeatedly until it
returns an empty string. After the read callback, the <emphasis>close</emphasis> callback will be called to close the stream.</para>
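<para>The match/open/read/close sequence described above can be sketched with an in-memory resource table. This is an illustration under stated
assumptions, not the module's own example: the <literal>myscheme:</literal> URI scheme, the <literal>%docs</literal> hash and its content are
hypothetical, and the data stream is represented by a reference to a string that the read callback consumes piece by piece.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical in-memory resources, keyed by URI
my %docs = ( 'myscheme:greeting' =&gt; '&lt;greeting&gt;hello&lt;/greeting&gt;' );

my $input_callbacks = XML::LibXML::InputCallback-&gt;new();
$input_callbacks-&gt;register_callbacks( [
    # match: return 1 if this group handles the URI
    sub { return $_[0] =~ /^myscheme:/ ? 1 : 0 },
    # open: return a reference to the data stream, 0 on failure
    sub {
        my $uri = shift;
        return 0 unless exists $docs{$uri};
        my $copy = $docs{$uri};
        return \$copy;
    },
    # read: hand out up to $length characters; '' signals the end
    sub {
        my ( $ref, $length ) = @_;
        return substr( $$ref, 0, $length, '' );
    },
    # close: nothing to release for a string
    sub { return 1 },
] );

my $parser = XML::LibXML-&gt;new();
$parser-&gt;input_callbacks($input_callbacks);

my $doc = $parser-&gt;parse_file('myscheme:greeting');
print $doc-&gt;documentElement-&gt;textContent(), "\n";</programlisting>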
</sect3>
<sect3>
<title>Organisation of callback groups in XML::LibXML::InputCallback</title>
<para>Callback groups are implemented as a stack (array); each entry holds a reference to an array of the callbacks. To the libxml2
library, the XML::LibXML::InputCallback callback implementation appears as one single callback group. The Perl implementation, however, makes it
possible to manage different callback stacks on a per libxml2-parser basis.</para>
</sect3>
</sect2>
<sect2>
<title>Using XML::LibXML::InputCallback</title>
<para>After object instantiation using the parameter-less constructor, you can register callback groups.</para>
<programlisting>my $input_callbacks = XML::LibXML::InputCallback-&gt;new();
$input_callbacks-&gt;register_callbacks([ $match_cb1, $open_cb1,
$read_cb1, $close_cb1 ] );
$input_callbacks-&gt;register_callbacks([ $match_cb2, $open_cb2,
$read_cb2, $close_cb2 ] );
$input_callbacks-&gt;register_callbacks( [ $match_cb3, $open_cb3,
$read_cb3, $close_cb3 ] );
$parser-&gt;input_callbacks( $input_callbacks );
$parser-&gt;parse_file( $some_xml_file );</programlisting>
</sect2>
<sect2>
<title>What about the old callback system prior to XML::LibXML::InputCallback?</title>
<para>In XML::LibXML versions prior to 1.59, i.e. without the XML::LibXML::InputCallback module, you could define your callbacks either
globally or locally. You can still do that using XML::LibXML::InputCallback, and in addition you can define the callbacks on a per-parser
basis!</para>
<para>If you use the old callback interface through global callbacks, XML::LibXML::InputCallback will treat them with a lower priority than the
ones registered using the new interface. The global callbacks will not override the callback groups registered using the new interface. Local
callbacks are attached to a specific parser instance, therefore they are treated with highest priority. If the <emphasis>match</emphasis>
callback of the callback group registered as local variable is identical to one of the callback groups registered using the new interface, that
callback group will be replaced.</para>
<para>Users of the old callback implementation whose <emphasis>open</emphasis> callback returned a plain string, will have to adapt their code
to return a reference to that string after upgrading to version &gt;= 1.59. The new callback system can only deal with the
<emphasis>open</emphasis> callback returning a reference!</para>
</sect2>
</sect1>
<sect1>
<title>Interface Description</title>
<sect2>
<title>Global Variables</title>
<variablelist>
<varlistentry>
<term>$_CUR_CB</term>
<listitem>
<para>Stores the current callback and can be used as shortcut to access the callback stack.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>@_GLOBAL_CALLBACKS</term>
<listitem>
<para>Stores all callback groups for the current parser process.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>@_CB_STACK</term>
<listitem>
<para>Stores the currently used callback group. Used to prevent parser errors when dealing with nested XML data.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Global Callbacks</title>
<variablelist>
<varlistentry>
<term>_callback_match</term>
<listitem>
<para>Implements the interface for the <emphasis>match</emphasis> callback at C-level and for the selection of the callback group
from the callbacks defined at the Perl-level.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>_callback_open</term>
<listitem>
<para>Forwards the <emphasis>open</emphasis> callback from libxml2 to the corresponding callback function at the Perl-level.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>_callback_read</term>
<listitem>
<para>Forwards the read request to the corresponding callback function at the Perl-level and returns the result to libxml2.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>_callback_close</term>
<listitem>
<para>Forwards the <emphasis>close</emphasis> callback from libxml2 to the corresponding callback function at the Perl-level.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Class methods</title>
<variablelist>
<varlistentry>
<term>new()</term>
<listitem>
<para>A simple constructor.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>register_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])</term>
<listitem>
<para>The four callbacks <emphasis>have</emphasis> to be given as array reference in the above order <emphasis>match</emphasis>,
<emphasis>open</emphasis>, <emphasis>read</emphasis>, <emphasis>close</emphasis>!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>unregister_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])</term>
<listitem>
<para>With no arguments given, <function>unregister_callbacks()</function> will delete the last registered callback group from the
stack. If four callbacks are passed as array reference, the callback group to unregister will be identified by the
<emphasis>match</emphasis> callback and deleted from the callback stack. Note that if several identical <emphasis>match</emphasis>
callbacks are defined in different callback groups, ALL of them will be deleted from the stack.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>init_callbacks()</term>
<listitem>
<para>Initializes the callback system before a parsing process.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>cleanup_callbacks()</term>
<listitem>
<para>Resets global variables and the libxml2 callback stack.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>lib_init_callbacks()</term>
<listitem>
<para>Used internally for callback registration at C-level.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>lib_cleanup_callbacks()</term>
<listitem>
<para>Used internally for callback resetting at the C-level.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
</sect1>
<sect1>
<title>Example callbacks</title>
<para>The following is a purely fictitious example that uses a MyScheme::Handler object that responds to methods similar to an IO::Handle.</para>
<programlisting>
# Define the four callback functions
sub match_uri {
    my $uri = shift;
    return $uri =~ /^myscheme:/; # trigger our callback group for 'myscheme' URIs
}

sub open_uri {
    my $uri = shift;
    my $handler = MyScheme::Handler-&gt;new($uri);
    return $handler;
}

# The returned $buffer will be parsed by the libxml2 parser
sub read_uri {
    my $handler = shift;
    my $length = shift;
    my $buffer;
    read($handler, $buffer, $length);
    return $buffer; # $buffer will be an empty string '' if read() is done
}

# Close the handle associated with the resource.
sub close_uri {
    my $handler = shift;
    close($handler);
}

# Register them with an instance of XML::LibXML::InputCallback
my $input_callbacks = XML::LibXML::InputCallback-&gt;new();
$input_callbacks-&gt;register_callbacks([ \&amp;match_uri, \&amp;open_uri,
                                       \&amp;read_uri, \&amp;close_uri ] );

# Register the callback group at a parser instance
$parser-&gt;input_callbacks( $input_callbacks );

# $some_xml_file will be parsed using our callbacks
$parser-&gt;parse_file( $some_xml_file );
</programlisting>
</sect1>
</chapter>
<chapter>
<title>RelaxNG Schema Validation</title>
<titleabbrev>XML::LibXML::RelaxNG</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
$doc = XML::LibXML->new->parse_file($url);</programlisting>
</sect1>
<para>The XML::LibXML::RelaxNG class is a tiny frontend to libxml2's RelaxNG implementation. Currently it supports only schema parsing and document
validation.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$rngschema = XML::LibXML::RelaxNG-&gt;new( location =&gt; $filename_or_url );
$rngschema = XML::LibXML::RelaxNG-&gt;new( string =&gt; $xmlschemastring );
$rngschema = XML::LibXML::RelaxNG-&gt;new( DOM =&gt; $doc );</funcsynopsisinfo>
</funcsynopsis>
<para>The constructor of XML::LibXML::RelaxNG is called with exactly one of three parameters. The parameter tells the class from which
source it should generate a validation schema. It is important that each schema has only a single source.</para>
<para>The location parameter parses a schema from the filesystem or a URL.</para>
<para>The string parameter parses the schema from the given XML string.</para>
<para>The DOM parameter parses the schema from a pre-parsed XML::LibXML::Document.</para>
<para>Note that the constructor will die() if the schema does not meet the constraints of the RelaxNG specification.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>validate</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>eval { $rngschema-&gt;validate( $doc ); };</funcsynopsisinfo>
</funcsynopsis>
<para>This function validates a (parsed) document
against the given RelaxNG schema. The argument of this function should be an XML::LibXML::Document object.
If this function succeeds, it will return 0, otherwise
it will die() and report the errors found. Because of this, validate() should always be called inside an eval block.</para>
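<para>A minimal sketch of the eval pattern, assuming a tiny schema given as a string. The schema and the two sample documents are made up
for this illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical schema: a single 'note' element containing text
my $rng_source = &lt;&lt;'RNG';
&lt;element name="note" xmlns="http://relaxng.org/ns/structure/1.0"&gt;
  &lt;text/&gt;
&lt;/element&gt;
RNG

my $rngschema = XML::LibXML::RelaxNG-&gt;new( string =&gt; $rng_source );
my $parser    = XML::LibXML-&gt;new();
my %result;

for my $xml ( '&lt;note&gt;hello&lt;/note&gt;', '&lt;memo/&gt;' ) {
    my $doc = $parser-&gt;parse_string($xml);
    # validate() dies on failure, so the outcome is read from $@
    eval { $rngschema-&gt;validate($doc); };
    $result{$xml} = $@ ? 'invalid' : 'valid';
    print "$xml: $result{$xml}\n";
}</programlisting>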
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XML Schema Validation</title>
<titleabbrev>XML::LibXML::Schema</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML;
$doc = XML::LibXML->new->parse_file($url);</programlisting>
</sect1>
<para>The XML::LibXML::Schema class is a tiny frontend to libxml2's XML Schema implementation. Currently it supports only schema parsing and
document validation.</para>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>$xmlschema = XML::LibXML::Schema-&gt;new( location =&gt; $filename_or_url );
$xmlschema = XML::LibXML::Schema-&gt;new( string =&gt; $xmlschemastring );</funcsynopsisinfo>
</funcsynopsis>
<para>The constructor of XML::LibXML::Schema is called with exactly one of two parameters. The parameter tells the class from which
source it should generate a validation schema. It is important that each schema has only a single source.</para>
<para>The location parameter parses a schema from the filesystem or a URL.</para>
<para>The string parameter parses the schema from the given XML string.</para>
<para>Note that the constructor will die() if the schema does not meet the constraints of the XML Schema specification.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>validate</term>
<listitem>
<funcsynopsis>
<funcsynopsisinfo>eval { $xmlschema-&gt;validate( $doc ); };</funcsynopsisinfo>
</funcsynopsis>
<para>This function validates a (parsed) document against the given XML Schema.
The argument of this function should be an XML::LibXML::Document object.
If this function succeeds, it will return 0, otherwise it
will die() and report the errors found. Because of this, validate() should always be called inside an eval block.</para>
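<para>The error report can be kept for later inspection by copying <literal>$@</literal> right after the eval. The following sketch assumes
a hypothetical one-element schema and a deliberately invalid document.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical schema: a 'count' element holding an integer
my $xsd_source = &lt;&lt;'XSD';
&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"&gt;
  &lt;xs:element name="count" type="xs:integer"/&gt;
&lt;/xs:schema&gt;
XSD

my $xmlschema = XML::LibXML::Schema-&gt;new( string =&gt; $xsd_source );
my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;count&gt;not a number&lt;/count&gt;');

eval { $xmlschema-&gt;validate($doc); };
my $error = $@;    # copy at once, before anything else can reset $@

if ($error) {
    print "validation failed: $error\n";
}
else {
    print "document is valid\n";
}</programlisting>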
</listitem>
</varlistentry>
</variablelist>
</chapter>
<chapter>
<title>XPath Evaluation</title>
<titleabbrev>XML::LibXML::XPathContext</titleabbrev>
<para>
The XML::LibXML::XPathContext
class provides an almost complete
interface to libxml2's XPath implementation.
With XML::LibXML::XPathContext it is possible to
evaluate XPath expressions in the context
of an arbitrary node, context size, and context position,
with a user-defined namespace-prefix mapping,
custom XPath functions written in Perl, and
even a custom XPath variable resolver.
</para>
<sect1>
<title>Examples</title>
<sect2>
<title>Namespaces</title>
<para>This example demonstrates the <function>registerNs()</function> method.
It finds all paragraph nodes in an XHTML document.</para>
<programlisting>my $xc = XML::LibXML::XPathContext-&gt;new($xhtml_doc);
$xc-&gt;registerNs('xhtml', 'http://www.w3.org/1999/xhtml');
my @nodes = $xc-&gt;findnodes('//xhtml:p');</programlisting>
</sect2>
<sect2>
<title>Custom XPath functions</title>
<para>This example demonstrates the <function>registerFunction()</function> method
by defining a function filtering nodes based on a Perl regular expression:</para>
<programlisting>sub grep_nodes {
my ($nodelist,$regexp) = @_;
my $result = XML::LibXML::NodeList-&gt;new;
for my $node ($nodelist-&gt;get_nodelist()) {
$result-&gt;push($node) if $node-&gt;textContent =~ $regexp;
}
return $result;
};
my $xc = XML::LibXML::XPathContext-&gt;new($node);
$xc-&gt;registerFunction('grep_nodes', \&amp;grep_nodes);
my @nodes = $xc-&gt;findnodes('//section[grep_nodes(para,"\bsearch(ing|es)?\b")]');</programlisting>
</sect2>
<sect2>
<title>Variables</title>
<para>This example demonstrates the <function>registerVarLookupFunc()</function>
method. We use XPath variables to recycle results of previous evaluations:</para>
<programlisting>sub var_lookup {
my ($data,$varname,$ns)=@_;
return $data-&gt;{$varname};
}
my $areas = XML::LibXML-&gt;new-&gt;parse_file('areas.xml');
my $empl = XML::LibXML-&gt;new-&gt;parse_file('employees.xml');
my $xc = XML::LibXML::XPathContext-&gt;new($empl);
my %variables = (
A =&gt; $xc-&gt;find('/employees/employee[@salary&gt;10000]'),
B =&gt; $areas-&gt;find('/areas/area[district="Brooklyn"]/street'),
);
# get names of employees from $A working in an area listed in $B
$xc-&gt;registerVarLookupFunc(\&amp;var_lookup, \%variables);
my @nodes = $xc-&gt;findnodes('$A[work_area/street = $B]/name');
</programlisting>
</sect2>
</sect1>
<sect1>
<title>Methods</title>
<variablelist>
<varlistentry>
<term>new</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>my $xpc = XML::LibXML::XPathContext-&gt;new();</funcsynopsisinfo></funcsynopsis>
<para>Creates a new XML::LibXML::XPathContext object
without a context node.</para>
<funcsynopsis><funcsynopsisinfo>my $xpc = XML::LibXML::XPathContext-&gt;new($node);</funcsynopsisinfo></funcsynopsis>
<para>Creates a new XML::LibXML::XPathContext object with
the context node set to <literal>$node</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>registerNs</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;registerNs($prefix, $namespace_uri)</funcsynopsisinfo></funcsynopsis>
<para>Registers namespace <literal>$prefix</literal> to
<literal>$namespace_uri</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>unregisterNs</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;unregisterNs($prefix)</funcsynopsisinfo></funcsynopsis>
<para>Unregisters namespace <literal>$prefix</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>lookupNs</term>
<listitem><funcsynopsis><funcsynopsisinfo>$uri = $xpc-&gt;lookupNs($prefix)</funcsynopsisinfo></funcsynopsis>
<para>Returns namespace URI registered with
<literal>$prefix</literal>. If <literal>$prefix</literal>
is not registered to any namespace URI returns
<literal>undef</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>registerVarLookupFunc</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;registerVarLookupFunc($callback, $data)</funcsynopsisinfo></funcsynopsis>
<para>Registers the variable lookup function
<literal>$callback</literal>. The registered function is
executed by the XPath engine each time an XPath variable
is evaluated. It takes three arguments:
<literal>$data</literal>, the variable name, and the variable
namespace URI, and must return one value: a number, a string, or
any <literal>XML::LibXML::</literal> object that can be a result
of <literal>find</literal>: Boolean, Literal, Number, Node
(e.g. Document, Element, etc.), or NodeList. For
convenience, simple (non-blessed) array references
containing only <literal>XML::LibXML::Node</literal> objects can be
used instead of an <literal>XML::LibXML::NodeList</literal>.</para>
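<para>A minimal sketch of a lookup callback, assuming the argument
order described above (registered data first). The sample document
and the variable name <literal>limit</literal> are made up for
illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; the variable name "limit" is an assumption of this sketch.
my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;items&gt;&lt;item price="5"/&gt;&lt;item price="15"/&gt;&lt;item price="25"/&gt;&lt;/items&gt;');
my $xpc = XML::LibXML::XPathContext-&gt;new($doc);

# The callback gets the registered data, then the variable name and namespace URI.
sub lookup {
    my ($data, $name, $ns_uri) = @_;
    return $data-&gt;{$name};
}
$xpc-&gt;registerVarLookupFunc(\&amp;lookup, { limit =&gt; 10 });

# $limit is resolved through lookup() while the expression is evaluated.
my @expensive = $xpc-&gt;findnodes('/items/item[@price &gt; $limit]');
print scalar(@expensive), " item(s) cost more than the limit\n";</programlisting>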
</listitem>
</varlistentry>
<varlistentry>
<term>getVarLookupData</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>$data = $xpc-&gt;getVarLookupData();</funcsynopsisinfo></funcsynopsis>
<para>
Returns the data that have been associated with a
variable lookup function during a previous call to
<literal>registerVarLookupFunc</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getVarLookupFunc</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>$callback = $xpc-&gt;getVarLookupFunc();</funcsynopsisinfo></funcsynopsis>
<para>
Returns the variable lookup function previously registered with
<literal>registerVarLookupFunc</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>unregisterVarLookupFunc</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>$xpc-&gt;unregisterVarLookupFunc();</funcsynopsisinfo></funcsynopsis>
<para>Unregisters variable lookup function and the associated lookup data.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>registerFunctionNS</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;registerFunctionNS($name, $uri, $callback)</funcsynopsisinfo></funcsynopsis>
<para>Registers an extension function
<literal>$name</literal> in the <literal>$uri</literal>
namespace. <literal>$callback</literal> must be a CODE
reference. The arguments of the callback function are
either simple scalars or <literal>XML::LibXML::*</literal> objects
depending on the XPath argument types. The function is
responsible for checking the number and types of its
arguments. The callback must return a single
value of one of the following types: a simple scalar
(number, string) or an arbitrary <literal>XML::LibXML::*</literal>
object that can be a result of <literal>find</literal>: Boolean,
Literal, Number, Node (e.g. Document, Element, etc.), or
NodeList. For convenience, simple (non-blessed) array
references containing only <literal>XML::LibXML::Node</literal>
objects can be used instead of an
<literal>XML::LibXML::NodeList</literal>.</para>
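<para>As an illustration, the following sketch registers a
namespaced extension function. The namespace URI
<literal>urn:example:ext</literal>, the prefix
<literal>ext</literal> and the function name
<literal>upper</literal> are hypothetical names chosen for this
example:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;doc&gt;&lt;name&gt;alice&lt;/name&gt;&lt;/doc&gt;');
my $xpc = XML::LibXML::XPathContext-&gt;new($doc);

# The prefix is registered as well, so that XPath expressions can name the function.
$xpc-&gt;registerNs('ext', 'urn:example:ext');
$xpc-&gt;registerFunctionNS('upper', 'urn:example:ext', sub {
    # The callback itself is responsible for checking its arguments.
    die "upper() expects exactly one argument\n" unless @_ == 1;
    my ($text) = @_;   # a plain scalar, because string() is applied in the XPath below
    return uc $text;
});

my $shout = $xpc-&gt;findvalue('ext:upper(string(/doc/name))');
print "$shout\n";</programlisting>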
</listitem>
</varlistentry>
<varlistentry>
<term>unregisterFunctionNS</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;unregisterFunctionNS($name, $uri)</funcsynopsisinfo></funcsynopsis>
<para>
Unregisters extension function <literal>$name</literal>
in <literal>$uri</literal> namespace. Has the same
effect as passing <literal>undef</literal> as
<literal>$callback</literal> to
registerFunctionNS.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>registerFunction</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;registerFunction($name, $callback)</funcsynopsisinfo></funcsynopsis>
<para>Same as <literal>registerFunctionNS</literal> but
without a namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>unregisterFunction</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;unregisterFunction($name)</funcsynopsisinfo></funcsynopsis>
<para>Same as <literal>unregisterFunctionNS</literal> but
without a namespace.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>findnodes</term>
<listitem><funcsynopsis><funcsynopsisinfo>@nodes = $xpc-&gt;findnodes($xpath)</funcsynopsisinfo></funcsynopsis>
<funcsynopsis><funcsynopsisinfo>@nodes = $xpc-&gt;findnodes($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
<funcsynopsis><funcsynopsisinfo>$nodelist = $xpc-&gt;findnodes($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
<para>Evaluates the XPath expression on the current context node and
returns the result as a list. In scalar context
returns an <literal>XML::LibXML::NodeList</literal> object. Optionally, a
node may be passed as a second argument to set the
context node for the query.</para>
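<para>The three call forms can be sketched as follows; the sample
document is made up for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;list&gt;&lt;i&gt;a&lt;/i&gt;&lt;i&gt;b&lt;/i&gt;&lt;other&gt;&lt;i&gt;c&lt;/i&gt;&lt;/other&gt;&lt;/list&gt;');
my $xpc = XML::LibXML::XPathContext-&gt;new($doc-&gt;documentElement);

my @items    = $xpc-&gt;findnodes('i');           # list context: a list of nodes
my $nodelist = $xpc-&gt;findnodes('i');           # scalar context: an XML::LibXML::NodeList
my ($other)  = $xpc-&gt;findnodes('other');
my @nested   = $xpc-&gt;findnodes('i', $other);   # evaluated relative to &lt;other&gt;

printf "%d direct, %d in list object, %d nested\n",
    scalar(@items), $nodelist-&gt;size, scalar(@nested);</programlisting>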
</listitem>
</varlistentry>
<varlistentry>
<term>find</term>
<listitem><funcsynopsis><funcsynopsisinfo>$object = $xpc-&gt;find($xpath )</funcsynopsisinfo></funcsynopsis>
<funcsynopsis><funcsynopsisinfo>$object = $xpc-&gt;find($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
<para>Evaluates the XPath expression using the current node
as the context of the expression, and returns a result whose
type depends on the type of the XPath expression.
For example, the XPath <literal>1 * 3 +
52</literal> results in an <literal>XML::LibXML::Number</literal> object
being returned. Other expressions might return an
<literal>XML::LibXML::Boolean</literal> object, or an
<literal>XML::LibXML::Literal</literal> object (a string). Each of those
objects uses Perl's overload feature to "do the right
thing" in different contexts. Optionally, a node may be
passed as a second argument to set the context node for
the query.</para>
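<para>A short sketch of the different result types, using a made-up
document:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;list&gt;&lt;i&gt;a&lt;/i&gt;&lt;i&gt;b&lt;/i&gt;&lt;/list&gt;');
my $xpc = XML::LibXML::XPathContext-&gt;new($doc);

my $number = $xpc-&gt;find('1 * 3 + 52');        # numeric expression
my $bool   = $xpc-&gt;find('count(//i) &gt; 1');    # boolean expression
my $string = $xpc-&gt;find('string(//i[1])');    # string expression
my $nodes  = $xpc-&gt;find('//i');               # node-set expression

# ref() shows which class each result was wrapped in.
print ref($_), "\n" for $number, $bool, $string, $nodes;
# value() gives the plain Perl number behind the Number object.
print $number-&gt;value + 1, "\n";</programlisting>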
</listitem>
</varlistentry>
<varlistentry>
<term>findvalue</term>
<listitem><funcsynopsis><funcsynopsisinfo>$value = $xpc-&gt;findvalue($xpath )</funcsynopsisinfo></funcsynopsis>
<funcsynopsis><funcsynopsisinfo>$value = $xpc-&gt;findvalue($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
<para>Is exactly equivalent to:</para>
<programlisting>$xpc-&gt;find( $xpath )-&gt;to_literal;</programlisting>
<para>That is, it returns the literal value of the
results. This enables you to ensure that you get a string
back from your search, allowing certain shortcuts. This
could be used as the equivalent of &lt;xsl:value-of
select="some_xpath"/&gt;. Optionally, a node may be
passed in the second argument to set the context node for
the query.</para>
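<para>For instance, assuming a small made-up document:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string(
    '&lt;book&gt;&lt;title&gt;Perl and XML&lt;/title&gt;&lt;price&gt;30&lt;/price&gt;&lt;/book&gt;');
my $xpc = XML::LibXML::XPathContext-&gt;new($doc);

# Absolute path, evaluated against the context node given to new().
my $title = $xpc-&gt;findvalue('/book/title');

# Relative path, evaluated against an explicitly passed context node.
my ($book) = $xpc-&gt;findnodes('/book');
my $price  = $xpc-&gt;findvalue('price', $book);

print "$title costs $price\n";</programlisting>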
</listitem>
</varlistentry>
<varlistentry>
<term>setContextNode</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;setContextNode($node)</funcsynopsisinfo></funcsynopsis>
<para>Set the current context node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getContextNode</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>my $node = $xpc-&gt;getContextNode;</funcsynopsisinfo></funcsynopsis>
<para>Get the current context node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setContextPosition</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;setContextPosition($position)</funcsynopsisinfo></funcsynopsis>
<para>
Set the current context position. By default, this
value is -1 (and evaluating the XPath function
<literal>position()</literal> in the initial context
raises an XPath error), but it can be set to any value up
to the context size. This is mainly useful for making the
XPath engine return a given position when the
<literal>position()</literal> XPath function is
called. Setting this value to -1 restores the default
behavior.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getContextPosition</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>my $position = $xpc-&gt;getContextPosition;</funcsynopsisinfo></funcsynopsis>
<para>Get the current context position.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setContextSize</term>
<listitem><funcsynopsis><funcsynopsisinfo>$xpc-&gt;setContextSize($size)</funcsynopsisinfo></funcsynopsis>
<para>
Set the current context size. By default, this value is -1 (and
evaluating the XPath function <literal>last()</literal> in
the initial context raises an XPath error), but it can be
set to any non-negative value. This is mainly useful for
making the XPath engine return the given value when the
<literal>last()</literal> XPath function is called. If the
context size is set to 0, the position is automatically also
set to 0. If the context size is positive, the position is
automatically set to 1. Setting the context size to -1
restores the default behavior.</para>
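<para>The interplay of context size and position can be sketched
like this (the values 5 and 3 are arbitrary):</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML-&gt;new-&gt;parse_string('&lt;root/&gt;');
my $xpc = XML::LibXML::XPathContext-&gt;new($doc);

$xpc-&gt;setContextSize(5);        # the position is set to 1 as a side effect
$xpc-&gt;setContextPosition(3);    # must not exceed the context size

my $position = $xpc-&gt;findvalue('position()');
my $last     = $xpc-&gt;findvalue('last()');
print "position() = $position, last() = $last\n";

$xpc-&gt;setContextSize(-1);       # back to the default behavior</programlisting>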
</listitem>
</varlistentry>
<varlistentry>
<term>getContextSize</term>
<listitem>
<funcsynopsis><funcsynopsisinfo>my $size = $xpc-&gt;getContextSize;</funcsynopsisinfo></funcsynopsis>
<para>Get the current context size.</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>Bugs And Caveats</title>
<para>
XML::LibXML::XPathContext objects
<emphasis>are</emphasis> reentrant, meaning that you can call
methods of an XML::LibXML::XPathContext even from XPath
extension functions registered with the same object or from a
variable lookup function. On the other hand, you should
avoid registering new extension functions, namespaces, or a
variable lookup function from within an extension function or a
variable lookup function, because this behavior is
untested.
</para>
</sect1>
<sect1>
<title>Authors</title>
<para>Ilya Martynov and Petr Pajas, based on
XML::LibXML and XML::LibXSLT code by Matt Sergeant and
Christian Glahn.</para>
</sect1>
<sect1>
<title>Historical remark</title>
<para>Prior to XML::LibXML 1.61 this module was distributed separately
for maintenance reasons.
</para>
</sect1>
</chapter>
<chapter>
<title>XML::LibXML::Reader - interface to libxml2 pull parser</title>
<titleabbrev>XML::LibXML::Reader</titleabbrev>
<sect1>
<title>Synopsis</title>
<programlisting>use XML::LibXML::Reader;</programlisting>
<programlisting>my $reader = XML::LibXML::Reader-&gt;new(location =&gt; "file.xml")
or die "cannot read file.xml\n";
while ($reader-&gt;read) {
processNode($reader);
}</programlisting>
<programlisting>
sub processNode {
my $reader = shift;
printf "%d %d %s %d\n", ($reader-&gt;depth,
$reader-&gt;nodeType,
$reader-&gt;name,
$reader-&gt;isEmptyElement);
}
</programlisting>
<para>or</para>
<programlisting>
my $reader = XML::LibXML::Reader-&gt;new(location =&gt; "file.xml")
or die "cannot read file.xml\n";
$reader-&gt;preservePattern('//table/tr');
$reader-&gt;finish;
print $reader-&gt;document-&gt;toString(1);
</programlisting>
</sect1>
<sect1>
<title>DESCRIPTION</title>
<para>This is a Perl interface to libxml2's pull-parser implementation
xmlTextReader
<emphasis>http://xmlsoft.org/html/libxml-xmlreader.html</emphasis>.
This feature requires at least libxml2-2.6.21.
Pull parsers (such as StAX in Java or XmlReader in C#) use an iterator
approach to parse an XML file. They are easier to program than
event-based parsers (SAX) and much more lightweight than
tree-based parsers (DOM), which load the complete tree into
memory.</para>
<para>The Reader acts as a cursor going forward on the document
stream and stopping at each node on the way. At every point,
DOM-like methods of the Reader object allow you to examine the
current node (name, namespace, attributes, etc.).</para>
<para>The user's code keeps control of the progress and simply
calls the <literal>read()</literal> function repeatedly to
progress to the next node in the document order. Other
functions provide means for skipping complete sub-trees, or
nodes until a specific element, etc.</para>
<para>At any time, only a very limited portion of the
document is kept in memory, which makes the API more
memory-efficient than using DOM. However, it is also possible
to mix Reader with DOM. At every point the user may copy the
current node (optionally expanded into a complete sub-tree)
from the processed document to another DOM tree, or
instruct the Reader to collect a sub-document in the form of a DOM
tree consisting of selected nodes.</para>
<para>Reader API also supports namespaces, xml:base, entity
handling, and DTD validation. Schema and RelaxNG validation
support will probably be added in some later revision of the
Perl interface.</para>
<para>The naming of methods compared to libxml2 and C#
XmlTextReader has been changed slightly to match the
conventions of XML::LibXML. Some functions have been changed
or added with respect to the C interface.</para>
</sect1>
<sect1>
<title>CONSTRUCTOR</title>
<para>Depending on the XML source, the Reader object can be created with one of the following:</para>
<programlisting>
my $reader = XML::LibXML::Reader-&gt;new( location =&gt; "file.xml", ... );
my $reader = XML::LibXML::Reader-&gt;new( string =&gt; $xml_string, ... );
my $reader = XML::LibXML::Reader-&gt;new( IO =&gt; $file_handle, ... );
my $reader = XML::LibXML::Reader-&gt;new( FD =&gt; fileno(STDIN), ... );
my $reader = XML::LibXML::Reader-&gt;new( DOM =&gt; $dom, ... );
</programlisting>
<para>where ... are (optional) reader options described below in Parsing options.
The constructor recognizes the following XML sources:</para>
<sect2>
<title>Source specification</title>
<variablelist>
<varlistentry>
<term>location</term>
<listitem>
<para>Read XML from a local file or URL.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>string</term>
<listitem>
<para>Read XML from a string.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>IO</term>
<listitem>
<para>Read XML from a Perl IO filehandle.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>FD</term>
<listitem>
<para>Read XML from a file descriptor (bypasses Perl I/O
layer, only applicable to filehandles for regular
files or pipes). Possibly faster than IO.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>DOM</term>
<listitem>
<para>Use reader API to walk through a pre-parsed
XML::LibXML::Document.</para>
</listitem>
</varlistentry>
</variablelist>
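<para>A minimal sketch using two of the sources listed above, a
string and a pre-parsed DOM document; the XML content is made up
for illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $xml = '&lt;root&gt;&lt;child/&gt;&lt;/root&gt;';

# Stream directly from a string ...
my $from_string = XML::LibXML::Reader-&gt;new(string =&gt; $xml)
    or die "cannot create reader from string\n";

# ... or walk a document that has already been parsed into a DOM tree.
my $dom = XML::LibXML-&gt;new-&gt;parse_string($xml);
my $from_dom = XML::LibXML::Reader-&gt;new(DOM =&gt; $dom)
    or die "cannot create reader from DOM\n";

for my $reader ($from_string, $from_dom) {
    $reader-&gt;read;                 # advance to the first node
    print $reader-&gt;name, "\n";     # name of the document element
}</programlisting>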
</sect2>
<sect2>
<title>Parsing options</title>
<variablelist>
<varlistentry>
<term>URI</term>
<listitem>
<para>can be used to provide baseURI when parsing
strings or filehandles.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>encoding</term>
<listitem>
<para>override document encoding.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>RelaxNG</term>
<listitem>
<para>can be used to pass either a XML::LibXML::RelaxNG
object or a filename or URL of a RelaxNG schema to the
constructor. The schema is then used to validate the
document as it is processed.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Schema</term>
<listitem>
<para>can be used to pass either a XML::LibXML::Schema
object or a filename or URL of a W3C XSD schema to the
constructor. The schema is then used to validate the
document as it is processed.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>recover</term>
<listitem>
<para>recover on errors (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>expand_entities</term>
<listitem>
<para>substitute entities (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>load_ext_dtd</term>
<listitem>
<para>load the external subset (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>complete_attributes</term>
<listitem>
<para>default DTD attributes (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>validation</term>
<listitem>
<para>validate with the DTD (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>suppress_errors</term>
<listitem>
<para>suppress error reports (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>suppress_warnings</term>
<listitem>
<para>suppress warning reports (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>pedantic_parser</term>
<listitem>
<para>pedantic error reporting (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>no_blanks</term>
<listitem>
<para>remove blank nodes (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>expand_xinclude</term>
<listitem>
<para>Implement XInclude substitution (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>no_network</term>
<listitem>
<para>Forbid network access (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>clean_namespaces</term>
<listitem>
<para>remove redundant namespaces declarations (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>no_cdata</term>
<listitem>
<para>merge CDATA as text nodes (0 or 1)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>no_xinclude_nodes</term>
<listitem>
<para>do not generate XINCLUDE START/END nodes (0 or 1)</para>
</listitem>
</varlistentry>
</variablelist>
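<para>Options are passed to the constructor together with the source.
The following sketch, with a made-up document, combines two of
the options listed above and collects the names of all elements
encountered:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $xml = "&lt;root&gt;\n  &lt;a/&gt;\n  &lt;b/&gt;\n&lt;/root&gt;";
my $reader = XML::LibXML::Reader-&gt;new(
    string     =&gt; $xml,
    no_blanks  =&gt; 1,    # ask the parser to remove blank nodes
    no_network =&gt; 1,    # forbid network access while parsing
) or die "cannot create reader\n";

my @elements;
while ($reader-&gt;read == 1) {
    push @elements, $reader-&gt;name
        if $reader-&gt;nodeType == XML_READER_TYPE_ELEMENT;
}
print join(', ', @elements), "\n";</programlisting>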
</sect2>
</sect1>
<sect1>
<title>METHODS CONTROLLING PARSING PROGRESS</title>
<variablelist>
<varlistentry>
<term>read ()</term>
<listitem>
<para>Moves the position to the next node in the stream,
exposing its properties.</para>
<para>Returns 1 if the node was read successfully, 0 if
there are no more nodes to read, or -1 in case of
error.</para>
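<para>A typical read loop, sketched with a made-up document, keeps
the return value so that the end of the document can be told apart
from an error:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader-&gt;new(string =&gt; '&lt;a&gt;&lt;b&gt;text&lt;/b&gt;&lt;/a&gt;')
    or die "cannot create reader\n";

my ($status, $visited) = (0, 0);
while (($status = $reader-&gt;read) == 1) {
    # Every start tag, text node and end tag is reported as one step.
    printf "%d %s\n", $reader-&gt;depth, $reader-&gt;name;
    $visited++;
}
die "parse error\n" if $status == -1;
print "visited $visited nodes\n";</programlisting>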
</listitem>
</varlistentry>
<varlistentry>
<term>readAttributeValue ()</term>
<listitem>
<para>Parses an attribute value into one or more Text and
EntityReference nodes.</para>
<para>Returns 1 in case of success, 0 if the reader was not positioned on an attribute node or all the attribute values have been read, or -1 in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>readState ()</term>
<listitem>
<para>Gets the read state of the reader. Returns the state
value, or -1 in case of error. The module exports
constants for the Reader states, see STATES
below.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>depth ()</term>
<listitem>
<para>The depth of the node in the tree, starts at 0 for
the root node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>next ()</term>
<listitem>
<para>Skip to the node following the current one in the
document order while avoiding the sub-tree if any.
Returns 1 if the node was read successfully, 0 if there
are no more nodes to read, or -1 in case of error.</para>
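<para>The difference from <literal>read()</literal> can be sketched
as follows; the element names are hypothetical:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader-&gt;new(
    string =&gt; '&lt;root&gt;&lt;skip&gt;&lt;deep/&gt;&lt;/skip&gt;&lt;keep/&gt;&lt;/root&gt;')
    or die "cannot create reader\n";

$reader-&gt;read;    # positioned on &lt;root&gt;
$reader-&gt;read;    # positioned on &lt;skip&gt;
$reader-&gt;next;    # jumps over the sub-tree of &lt;skip&gt;
print $reader-&gt;name, "\n";</programlisting>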
</listitem>
</varlistentry>
<varlistentry>
<term>nextElement (localname?,nsURI?)</term>
<listitem>
<para>Skip nodes following the current one in the document
order until a specific element is reached. The element's
name must be equal to the given localname if defined, and
its namespace must be equal to the given nsURI if defined.
Either of the arguments can be undefined (or omitted, in
case of the latter or both).</para>
<para>Returns 1 if the element was found, 0 if there are no more nodes to read, or -1 in case of error.</para>
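<para>For example, the following sketch visits every
<literal>book</literal> element regardless of its depth; the
document and its element names are made up:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $xml = '&lt;library&gt;&lt;book id="b1"/&gt;&lt;dvd id="d1"/&gt;'
        . '&lt;shelf&gt;&lt;book id="b2"/&gt;&lt;/shelf&gt;&lt;/library&gt;';
my $reader = XML::LibXML::Reader-&gt;new(string =&gt; $xml)
    or die "cannot create reader\n";

my @ids;
# Each call moves forward to the next &lt;book&gt; start tag, if there is one.
while ($reader-&gt;nextElement('book') == 1) {
    push @ids, $reader-&gt;getAttribute('id');
}
print join(', ', @ids), "\n";</programlisting>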
</listitem>
</varlistentry>
<varlistentry>
<term>skipSiblings ()</term>
<listitem>
<para>Skip all nodes on the same or lower level until the
first node on a higher level is reached. In particular,
if the current node occurs in an element, the reader
stops at the end tag of the parent element, otherwise it
stops at a node immediately following the parent
node.</para>
<para>Returns 1 if successful, 0 if end of the document is reached, or -1 in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nextSibling ()</term>
<listitem>
<para>It skips to the node following the current one in
the document order while avoiding the sub-tree if
any.</para>
<para>Returns 1 if the node was read successfully, 0 if
there are no more nodes to read, or -1 in case of
error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nextSiblingElement (name?,nsURI?)</term>
<listitem>
<para>Like nextElement but only processes sibling elements
of the current node (moving forward using
<literal>nextSibling ()</literal> rather than
<literal>read ()</literal>, internally).</para>
<para>Returns 1 if the element was found, 0 if there is no
more sibling nodes, or -1 in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>finish ()</term>
<listitem>
<para>Skip all remaining nodes in the document, reaching end of the document.</para>
<para>Returns 1 if successful, 0 in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>close ()</term>
<listitem>
<para>This method releases any resources allocated by the
current instance and closes any underlying input. It
returns 0 on failure and 1 on success. This method is
automatically called by the destructor when the reader
is forgotten, therefore you do not have to call it
directly.</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>METHODS EXTRACTING INFORMATION</title>
<variablelist>
<varlistentry>
<term>name ()</term>
<listitem>
<para>Returns the qualified name of the current node, equal to (Prefix:)LocalName.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>nodeType ()</term>
<listitem>
<para>Returns the type of the current node. See NODE TYPES below.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>localName ()</term>
<listitem>
<para>Returns the local name of the node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>prefix ()</term>
<listitem>
<para>Returns the prefix of the namespace associated with the node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>namespaceURI ()</term>
<listitem>
<para>Returns the URI defining the namespace associated with the node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>isEmptyElement ()</term>
<listitem>
<para>Check if the current node is empty. This is a bit
bizarre in the sense that &lt;a/&gt; will be considered
empty while &lt;a&gt;&lt;/a&gt; will not.</para>
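<para>The distinction can be sketched with a made-up document
containing both forms:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader-&gt;new(string =&gt; '&lt;r&gt;&lt;a/&gt;&lt;b&gt;&lt;/b&gt;&lt;/r&gt;')
    or die "cannot create reader\n";

my %empty;
for my $name ('a', 'b') {
    $reader-&gt;nextElement($name);                       # move to the start tag
    $empty{$name} = $reader-&gt;isEmptyElement ? 1 : 0;   # only &lt;a/&gt; counts as empty
    printf "%s: %s\n", $name, $empty{$name} ? 'empty' : 'not empty';
}</programlisting>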
</listitem>
</varlistentry>
<varlistentry>
<term>hasValue ()</term>
<listitem>
<para>Returns true if the node can have a text value.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>value ()</term>
<listitem>
<para>Provides the text value of the node if present or undef if not available.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>readInnerXml ()</term>
<listitem>
<para>Reads the contents of the current node, including
child nodes and markup. Returns a string containing the
XML of the node's content, or undef if the current node
is neither an element nor attribute, or has no child
nodes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>readOuterXml ()</term>
<listitem>
<para>Reads the contents of the current node, including
child nodes and markup.</para>
<para>Returns a string containing the XML of the node
including its content, or undef if the current node is
neither an element nor attribute.</para>
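<para>A short sketch contrasting this method with
<literal>readInnerXml()</literal>; the document is made up and
the exact serialization may vary between libxml2
versions:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader-&gt;new(
    string =&gt; '&lt;doc&gt;&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;&lt;/doc&gt;')
    or die "cannot create reader\n";

$reader-&gt;nextElement('p');              # position the reader on &lt;p&gt;
my $inner = $reader-&gt;readInnerXml;      # markup inside &lt;p&gt; only
my $outer = $reader-&gt;readOuterXml;      # the &lt;p&gt; element together with its content
print "inner: $inner\n";
print "outer: $outer\n";</programlisting>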
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>METHODS EXTRACTING DOM NODES</title>
<variablelist>
<varlistentry>
<term>document ()</term>
<listitem>
<para>Provides access to the document tree built by the
reader. This function can be used to collect the
preserved nodes (see <literal>preserveNode()</literal>
and preservePattern).</para>
<para>CAUTION: Never use this function to modify the tree
unless reading of the whole document is
completed!</para>
</listitem>
</varlistentry>
<varlistentry>
<term>copyCurrentNode (deep)</term>
<listitem>
<para>This function is similar to the DOM function
<literal>copyNode()</literal>. It returns a copy of the
currently processed node as a corresponding DOM object.
Use deep = 1 to obtain the full sub-tree.</para>
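<para>This makes it possible to stream through a large document
while handling each record with the usual DOM and XPath methods.
A minimal sketch, assuming a made-up feed of
<literal>entry</literal> elements:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $xml = '&lt;feed&gt;&lt;entry&gt;&lt;title&gt;One&lt;/title&gt;&lt;/entry&gt;'
        . '&lt;entry&gt;&lt;title&gt;Two&lt;/title&gt;&lt;/entry&gt;&lt;/feed&gt;';
my $reader = XML::LibXML::Reader-&gt;new(string =&gt; $xml)
    or die "cannot create reader\n";

my @titles;
while ($reader-&gt;nextElement('entry') == 1) {
    my $entry = $reader-&gt;copyCurrentNode(1);   # deep copy as a DOM element
    push @titles, $entry-&gt;findvalue('title');  # ordinary DOM/XPath access
}
print join(', ', @titles), "\n";</programlisting>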
</listitem>
</varlistentry>
<varlistentry>
<term>preserveNode ()</term>
<listitem>
<para>This tells the XML Reader to preserve the current
node in the document tree. A document tree consisting of
the preserved nodes and their content can be obtained
using the method <literal>document()</literal> once
parsing is finished.</para>
<para>Returns the node or undef in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>preservePattern (pattern,\%ns_map)</term>
<listitem>
<para>This tells the XML Reader to preserve all nodes
matched by the pattern (which is a streaming XPath
subset). A document tree consisting of the preserved
nodes and their content can be obtained using the method
<literal>document()</literal> once parsing is
finished.</para>
<para>An optional second argument can be used to provide a
HASH reference mapping prefixes used by the XPath to
namespace URIs.</para>
<para>The XPath subset available with this function is
described at</para>
<programlisting>http://www.w3.org/TR/xmlschema-1/#Selector</programlisting>
<para>and matches the production</para>
<programlisting>Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )</programlisting>
<para>Returns a positive number in case of success and -1
in case of error</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>METHODS PROCESSING ATTRIBUTES</title>
<variablelist>
<varlistentry>
<term>attributeCount ()</term>
<listitem>
<para>Provides the number of attributes of the current
node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>hasAttributes ()</term>
<listitem>
<para>Whether the node has attributes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getAttribute (name)</term>
<listitem>
<para>Provides the value of the attribute with the
specified qualified name.</para>
<para>Returns a string containing the value of the
specified attribute, or undef in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getAttributeNs (localName, namespaceURI)</term>
<listitem>
<para>Provides the value of the specified
attribute.</para>
<para>Returns a string containing the value of the
specified attribute, or undef in case of error.</para>
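<para>The lookups by qualified name and by namespace can be
compared in a short sketch; the namespace URI
<literal>urn:example:x</literal> is a made-up example:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader-&gt;new(
    string =&gt; '&lt;item xmlns:x="urn:example:x" id="7" x:id="42"/&gt;')
    or die "cannot create reader\n";
$reader-&gt;read;    # position the reader on &lt;item&gt;

my $plain     = $reader-&gt;getAttribute('id');                     # attribute without namespace
my $qualified = $reader-&gt;getAttribute('x:id');                   # looked up by qualified name
my $by_ns     = $reader-&gt;getAttributeNs('id', 'urn:example:x');  # local name plus namespace URI
print "$plain $qualified $by_ns\n";</programlisting>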
</listitem>
</varlistentry>
<varlistentry>
<term>getAttributeNo (no)</term>
<listitem>
<para>Provides the value of the attribute with the
specified index relative to the containing
element.</para>
<para>Returns a string containing the value of the
specified attribute, or undef in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>isDefault ()</term>
<listitem>
<para>Returns true if the current attribute node was
generated from the default value defined in the
DTD.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>moveToAttribute (name)</term>
<listitem>
<para>Moves the position to the attribute with the
specified qualified name.</para>
<para>Returns 1 in case of success, -1 in case of error, 0
if not found</para>
</listitem>
</varlistentry>
<varlistentry>
<term>moveToAttributeNo (no)</term>
<listitem>
<para>Moves the position to the attribute with the
specified index relative to the containing
element.</para>
<para>Returns 1 in case of success, -1 in case of error, 0
if not found</para>
</listitem>
</varlistentry>
<varlistentry>
<term>moveToAttributeNs (localName,namespaceURI)</term>
<listitem>
<para>Moves the position to the attribute with the
specified local name and namespace URI.</para>
<para>Returns 1 in case of success, -1 in case of error, 0
if not found</para>
</listitem>
</varlistentry>
<varlistentry>
<term>moveToFirstAttribute ()</term>
<listitem>
<para>Moves the position to the first attribute associated
with the current node.</para>
<para>Returns 1 in case of success, -1 in case of error, 0
if not found</para>
</listitem>
</varlistentry>
<varlistentry>
<term>moveToNextAttribute ()</term>
<listitem>
<para>Moves the position to the next attribute associated
with the current node.</para>
<para>Returns 1 in case of success, -1 in case of error, 0
if not found</para>
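<para>Together with <literal>moveToFirstAttribute()</literal> and
<literal>moveToElement()</literal>, this allows iterating over all
attributes of an element. A minimal sketch with a made-up
element:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader-&gt;new(
    string =&gt; '&lt;img src="a.png" alt="A" width="10"/&gt;')
    or die "cannot create reader\n";
$reader-&gt;read;    # position the reader on &lt;img&gt;

my %attr;
if ($reader-&gt;moveToFirstAttribute == 1) {
    do {
        $attr{ $reader-&gt;name } = $reader-&gt;value;
    } while ($reader-&gt;moveToNextAttribute == 1);
    $reader-&gt;moveToElement;    # return to the element that owns the attributes
}
print "$_=$attr{$_}\n" for sort keys %attr;</programlisting>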
</listitem>
</varlistentry>
<varlistentry>
<term>moveToElement ()</term>
<listitem>
<para>Moves the position to the node that contains the
current attribute node.</para>
<para>Returns 1 in case of success, -1 in case of error, 0
if not moved</para>
</listitem>
</varlistentry>
<varlistentry>
<term>isNamespaceDecl ()</term>
<listitem>
<para>Determine whether the current node is a namespace
declaration rather than a regular attribute.</para>
<para>Returns 1 if the current node is a namespace
declaration, 0 if it is a regular attribute or other
type of node, or -1 in case of error.</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>OTHER METHODS</title>
<variablelist>
<varlistentry>
<term>lookupNamespace (prefix)</term>
<listitem>
<para>Resolves a namespace prefix in the scope of the
current element.</para>
<para>Returns a string containing the namespace URI to
which the prefix maps or undef in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>encoding ()</term>
<listitem>
<para>Returns a string containing the encoding of the
document or undef in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>standalone ()</term>
<listitem>
<para>Determine the standalone status of the document
being read. Returns 1 if the document was declared to be
standalone, 0 if it was declared to be not standalone,
or -1 if the document did not specify its standalone
status or in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>xmlVersion ()</term>
<listitem>
<para>Determine the XML version of the document being
read. Returns a string containing the XML version of the
document or undef in case of error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>baseURI ()</term>
<listitem>
<para>The base URI of the node. See the XML Base W3C
specification.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>isValid ()</term>
<listitem>
<para>Retrieve the validity status from the parser.</para>
<para>Returns 1 if valid, 0 if not, and -1 in case of
error.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>xmlLang ()</term>
<listitem>
<para>The xml:lang scope within which the node
resides.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>lineNumber ()</term>
<listitem>
<para>Provide the line number of the current parsing
point.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>columnNumber ()</term>
<listitem>
<para>Provide the column number of the current parsing
point.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>byteConsumed ()</term>
<listitem>
<para>This function provides the current index of the
parser relative to the start of the current entity. The
index is counted in bytes, starting at zero at the beginning
and, when parsing a file, finishing at the size of the file
in bytes. The function is of constant
cost if the input is UTF-8 but can be costly if run on
non-UTF-8 input.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>setParserProp (prop =&gt; value, ...)</term>
<listitem>
<para>Change the parser processing behaviour by changing
some of its internal properties. The following
properties are available with this function:
<literal>load_ext_dtd</literal>, <literal>complete_attributes</literal>,
<literal>validation</literal>, <literal>expand_entities</literal>.</para>
<para>Since some of the properties can only be changed
before any read has been done, it is best to set the
parsing properties at the constructor.</para>
<para>Returns 0 if the call was successful, or -1 in case
of error</para>
</listitem>
</varlistentry>
<varlistentry>
<term>getParserProp (prop)</term>
<listitem>
<para>Get the value of an internal parser property. The
following property names can be used: <literal>load_ext_dtd</literal>,
<literal>complete_attributes</literal>, <literal>validation</literal>,
<literal>expand_entities</literal>.</para>
<para>Returns the value, usually 0 or 1, or -1 in case of
error.</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>DESTRUCTION</title>
<para>XML::LibXML takes care of the reader object destruction
when the last reference to the reader object goes out of
scope. The document tree is preserved, though, if either of
$reader-&gt;document or $reader-&gt;preserveNode was used and
references to the document tree exist.</para>
</sect1>
<sect1>
<title>NODE TYPES</title>
<para>The reader interface provides the following constants for
node types (the constant symbols are exported by default or if
tag <literal>:types</literal> is used).</para>
<programlisting>
XML_READER_TYPE_NONE =&gt; 0
XML_READER_TYPE_ELEMENT =&gt; 1
XML_READER_TYPE_ATTRIBUTE =&gt; 2
XML_READER_TYPE_TEXT =&gt; 3
XML_READER_TYPE_CDATA =&gt; 4
XML_READER_TYPE_ENTITY_REFERENCE =&gt; 5
XML_READER_TYPE_ENTITY =&gt; 6
XML_READER_TYPE_PROCESSING_INSTRUCTION =&gt; 7
XML_READER_TYPE_COMMENT =&gt; 8
XML_READER_TYPE_DOCUMENT =&gt; 9
XML_READER_TYPE_DOCUMENT_TYPE =&gt; 10
XML_READER_TYPE_DOCUMENT_FRAGMENT =&gt; 11
XML_READER_TYPE_NOTATION =&gt; 12
XML_READER_TYPE_WHITESPACE =&gt; 13
XML_READER_TYPE_SIGNIFICANT_WHITESPACE =&gt; 14
XML_READER_TYPE_END_ELEMENT =&gt; 15
XML_READER_TYPE_END_ENTITY =&gt; 16
XML_READER_TYPE_XML_DECLARATION =&gt; 17
</programlisting>
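<para>These constants are meant to be compared with the value
returned by <literal>nodeType()</literal>. A minimal sketch that
counts the node types seen in a made-up document:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Reader;    # exports the XML_READER_TYPE_* constants

my $reader = XML::LibXML::Reader-&gt;new(
    string =&gt; '&lt;a&gt;&lt;!-- note --&gt;&lt;b&gt;text&lt;/b&gt;&lt;/a&gt;')
    or die "cannot create reader\n";

my %seen;
while ($reader-&gt;read == 1) {
    my $type = $reader-&gt;nodeType;
    if    ($type == XML_READER_TYPE_ELEMENT)     { $seen{element}++ }
    elsif ($type == XML_READER_TYPE_END_ELEMENT) { $seen{end_element}++ }
    elsif ($type == XML_READER_TYPE_TEXT)        { $seen{text}++ }
    elsif ($type == XML_READER_TYPE_COMMENT)     { $seen{comment}++ }
}
print "$_: $seen{$_}\n" for sort keys %seen;</programlisting>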
</sect1>
<sect1>
<title>STATES</title>
<para>The following constants represent the values returned by
<literal>readState()</literal>. They are exported by default,
or if tag <literal>:states</literal> is used:</para>
<programlisting>
XML_READER_NONE =&gt; -1
XML_READER_START =&gt; 0
XML_READER_ELEMENT =&gt; 1
XML_READER_END =&gt; 2
XML_READER_EMPTY =&gt; 3
XML_READER_BACKTRACK =&gt; 4
XML_READER_DONE =&gt; 5
XML_READER_ERROR =&gt; 6
</programlisting>
</sect1>
<sect1>
<title>VERSION</title>
<para>0.02</para>
</sect1>
<sect1>
<title>AUTHORS</title>
<para>Heiko Klein, &lt;H.Klein@gmx.net&gt; and Petr Pajas,
&lt;pajas@matfyz.cz&gt;</para>
</sect1>
<sect1>
<title>SEE ALSO</title>
<para>http://xmlsoft.org/html/libxml-xmlreader.html</para>
<para>http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html</para>
</sect1>
</chapter>
</book>