The PHP 4 DOMXML extension has undergone some serious transformation since PHP 5 and is now a lot easier to use. Unlike SimpleXML, DOM can, at times, be cumbersome and unwieldy. However, it is often a better choice than SimpleXML. Please join me and find out why.

Since SimpleXML and DOM objects are interoperable, you can use the former for simplicity and the latter for power. How to exchange data between the two extensions is explained at the bottom of the article.
The DOM extension is especially useful when you want to modify XML documents, as SimpleXML, for example, offers no method for removing nodes from an XML document. For this article's code examples we will use the same foundation that we used in the Parsing XML with SimpleXML post.
We will use this very site's Google sitemap file, which can be downloaded here. The sitemap.xml file contains an XML list of the pages of php-coding-practices.com for easy indexing by Google.

Loading and Saving XML Documents

The DOM extension, just like SimpleXML, provides two ways to load xml documents - either by string or by filename:
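A minimal sketch of both loading variants, plus the matching save calls. The sitemap content, the namespace URI and the temporary file are assumptions made for this example, not the real sitemap.xml:

```php
<?php
// Hypothetical stand-in for the real sitemap.xml (URLs are placeholders).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
</urlset>
XML;

// 1) Load from a string.
$fromString = new DOMDocument();
$fromString->loadXML($xml);

// 2) Load from a file (a temporary file keeps the sketch self-contained).
$path = tempnam(sys_get_temp_dir(), 'sitemap');
file_put_contents($path, $xml);
$fromFile = new DOMDocument();
$fromFile->load($path);

// Saving mirrors loading: saveXML() returns a string, save() writes a file.
$asString = $fromFile->saveXML();
$bytesWritten = $fromFile->save($path);

unlink($path);
```

Both documents end up with the same root element, so which variant you pick only depends on where your XML comes from.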

Notice that the sitemap xml file contains a namespace already, which we register using DomXPath::registerNamespace():

<?xml version="1.0" encoding="UTF-8"?>

We really have to register that namespace with the DomXPath object or else it will not know where to search. ;) You can also register multiple namespaces, but more on that later. Notice that we use text() within the xpath query to get the actual text contents of the nodes.
If you want to learn the ins and outs of the xpath language, I recommend reading the W3C XPath Reference.
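The query described above can be sketched roughly like this. The inline sitemap, the namespace URI and the prefix "sm" are assumptions chosen for the example; the prefix only has to match the one used inside the query:

```php
<?php
// Hypothetical sitemap excerpt; URLs and namespace URI are placeholders.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind an arbitrary prefix to the document's default namespace.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// text() selects the text nodes inside each <loc> element.
$locations = [];
foreach ($xpath->query('//sm:url/sm:loc/text()') as $textNode) {
    $locations[] = $textNode->nodeValue;
}

// The same query without the prefix finds nothing, because unprefixed
// names in XPath 1.0 only match elements that are in no namespace.
$unprefixedMatches = $xpath->query('//url/loc/text()')->length;
```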

The code is pretty self-explanatory. First we create a new url element as well as some sub-elements. Then we append those sub-elements to the url element, which we in turn append to the document's root element. Note that the root element can be accessed via the $dom->documentElement property. The output:

Now it was certainly not as easy as it would have been had we used SimpleXML. In return, the DOM extension provides many more methods and therefore more power. For example, you can associate a namespace with an element during creation using DomDocument::createElementNS(). I will provide some example code on that later in the article.
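The element-creation steps described above might look like the following sketch. To keep it short it assumes a simplified sitemap without a namespace declaration, and the sub-element values are made up:

```php
<?php
// Simplified, namespace-free stand-in for the sitemap (an assumption).
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML('<urlset><url><loc>http://example.com/</loc></url></urlset>');

// Create the new url element and some sub-elements.
$url        = $dom->createElement('url');
$loc        = $dom->createElement('loc', 'http://example.com/new');
$changefreq = $dom->createElement('changefreq', 'weekly');
$priority   = $dom->createElement('priority', '0.5');

// Append the sub-elements to the url element ...
$url->appendChild($loc);
$url->appendChild($changefreq);
$url->appendChild($priority);

// ... and the url element to the document's root element.
$dom->documentElement->appendChild($url);

$urlCount = $dom->getElementsByTagName('url')->length;
```

Note that createElement() only creates a node; nothing shows up in the document until appendChild() attaches it somewhere.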

Adding Attributes To Nodes

Here we set a fictitious meta:level attribute with the value 3 on our url element from above.
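A sketch of how such an attribute could be set. Since the meta prefix has to be bound to some namespace, the example uses DomElement::setAttributeNS() with a hypothetical URI; both the URI and the simplified document are assumptions:

```php
<?php
// Simplified, namespace-free stand-in for the sitemap (an assumption).
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML('<urlset><url><loc>http://example.com/</loc></url></urlset>');

$url = $dom->getElementsByTagName('url')->item(0);

// Hypothetical namespace URI for the "meta" prefix.
$metaNs = 'http://example.com/meta';

// Sets meta:level="3" and declares the prefix if it is not yet in scope.
$url->setAttributeNS($metaNs, 'meta:level', '3');

// Reading it back works via the namespace URI and the local name.
$level = $url->getAttributeNS($metaNs, 'level');
$serialized = $dom->saveXML($url);
```

For attributes without a prefix, the plain DomElement::setAttribute() and getAttribute() pair does the same job with less ceremony.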

Moving Data

Moving data is not as obvious as you might expect, as the DOM extension does not provide a dedicated method for it. Instead we have to use DomNode::insertBefore(). As an example, suppose we want to move our new url from above just before the very first url:
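A minimal sketch of such a move, assuming a small inline sitemap in place of the real file. The URLs, the namespace URI and the "sm" prefix are placeholders:

```php
<?php
// Hypothetical sitemap with two url entries; the second one is "new".
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/first</loc></url>
  <url><loc>http://example.com/new</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$result = $xpath->query('//sm:url');

// insertBefore() is called on the common parent (urlset). Because the
// node already belongs to the document, it is moved rather than copied.
$result->item(1)->parentNode->insertBefore($result->item(1), $result->item(0));

$firstLoc = $xpath->query('//sm:url[1]/sm:loc/text()')->item(0)->nodeValue;
$urlCount = $xpath->query('//sm:url')->length;
```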

DomNode::insertBefore() takes two parameters, the new node and the reference node, and inserts the new node before the reference node. In our example, we insert the second url ($result->item(1)) before the first one ($result->item(0)).
I hear you asking why we call DomNode::insertBefore() on the $result->item(1)->parentNode node. Couldn't we simply call it on $result->item(0)? No, of course not: the method has to be called on the parent of the reference node, which is the root element urlset, and not on a specific url (look at our XPath query).
We could use the following code, which is perfectly valid and gets us the same result, though:

The important thing here is that you have to pass true to DomNode::cloneNode() (the default is false), so that it copies all descendant nodes as well. If we had left it at false, we would have gotten an empty <url></url> node, which is not desirable. ;)
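The clone-based variant could be sketched as follows: clone the node deeply, insert the copy, then remove the original. The inline sitemap, the namespace URI and the "sm" prefix are again assumptions for the example:

```php
<?php
// Hypothetical sitemap with two url entries; the second one is "new".
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/first</loc></url>
  <url><loc>http://example.com/new</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$result = $xpath->query('//sm:url');
$original = $result->item(1);

// Deep copy: true copies all descendants as well.
$clone = $original->cloneNode(true);
// A shallow copy (false) has no children, hence the empty <url></url>.
$shallowIsEmpty = !$original->cloneNode(false)->hasChildNodes();

// Insert the copy before the first url, then drop the original.
$parent = $original->parentNode;
$parent->insertBefore($clone, $result->item(0));
$parent->removeChild($original);

$firstLoc = $xpath->query('//sm:url[1]/sm:loc/text()')->item(0)->nodeValue;
$urlCount = $xpath->query('//sm:url')->length;
```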

Modifying Node Data

When modifying node data, you want to modify the CDATA within a node. You can use xpath again to find the node you want to edit and then simply supply a new value to its data property:

This code transforms the location data of the second url to uppercase letters.
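A sketch of what that modification might look like, assuming an inline two-entry sitemap with placeholder URLs and the arbitrary prefix "sm":

```php
<?php
// Hypothetical sitemap excerpt; URLs and namespace URI are placeholders.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// Select the text node inside the <loc> of the second url.
$textNode = $xpath->query('//sm:url[2]/sm:loc/text()')->item(0);

// Assigning to the data property replaces the character data in place.
$textNode->data = strtoupper($textNode->data);

$secondLoc = $xpath->query('//sm:url[2]/sm:loc')->item(0)->nodeValue;
```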

Removing Data From XML Documents

There are three types of data that you might want to remove from XML documents: elements, attributes and CDATA. The DOM extension provides a method for each of them: DomNode::removeChild(), DomElement::removeAttribute() and DomCharacterData::deleteData(). We will use a custom XML document and not our sitemap to demonstrate their behavior. This makes it easier for you to come back to this article and see at first glance how these methods work. Thank Nikos if you want to. ;)

<text>This is some other really cool text!</text>
<text type="misc"></text>
<text type="output">This is text!</text>

In this example we start by retrieving all text nodes from a document. Then we remove some data from that document. Simple.
In fact we remove the first node altogether as well as the attribute of the second node. Finally we truncate the character data of the third node, using XPath to query the corresponding text() node.
Note that DomCharacterData::deleteData() requires a starting offset and a length parameter. Since we want to truncate the data in our example we supply 0 and the length of the CDATA node.
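The three removals described above can be sketched like this. The sample document is a hypothetical one modelled on the article's text nodes; its root element name and exact contents are assumptions:

```php
<?php
// Hypothetical sample document with three <text> elements.
$xml = <<<XML
<?xml version="1.0"?>
<texts>
  <text type="misc">This is some text!</text>
  <text type="misc">This is some other really cool text!</text>
  <text type="output">This is text!</text>
</texts>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// The query result is a snapshot, so the indexes stay stable below.
$texts = $xpath->query('//text');

// 1) Remove the first element altogether (via its parent).
$first = $texts->item(0);
$first->parentNode->removeChild($first);

// 2) Remove the attribute of the second element.
$texts->item(1)->removeAttribute('type');

// 3) Truncate the character data of the third element:
//    offset 0, count = full length of the text node.
$chars = $xpath->query('text()', $texts->item(2))->item(0);
$chars->deleteData(0, $chars->length);

$remaining = $dom->getElementsByTagName('text');
```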

DOM And Working With Namespaces

DOM is very capable of handling namespaces on its own. Most of the time you can ignore them and pass attribute and element names with the appropriate prefix directly to most DOM functions.
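As a small illustration, here is a sketch that builds a document with DomDocument::createElementNS(). The "meta" prefix, its URI and the note element are hypothetical; the sitemap namespace URI is assumed as well:

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');

// Assumed namespace URIs; the "meta" one is purely hypothetical.
$sitemapNs = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$metaNs    = 'http://example.com/meta';

// Elements in the default (sitemap) namespace carry no prefix.
$root = $dom->createElementNS($sitemapNs, 'urlset');
$dom->appendChild($root);
$url = $dom->createElementNS($sitemapNs, 'url');
$root->appendChild($url);

// A prefixed element: the qualified name is "prefix:localName".
$note = $dom->createElementNS($metaNs, 'meta:note', 'hello');
$url->appendChild($note);

// Namespace-aware lookup uses the URI and the local name, not the prefix.
$found = $dom->getElementsByTagNameNS($metaNs, 'note')->length;
```

The prefix is only a shorthand; what identifies the namespace is the URI, which is why the lookup at the end never mentions "meta".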

Interfacing With SimpleXML

As I mentioned at the start of our little DOM journey, it is very easy to exchange loaded documents between SimpleXML and DOM. Therefore, you can take advantage of both systems' strengths - SimpleXML's simplicity and DOM's power.

You can import a SimpleXML object into DOM by using PHP's dom_import_simplexml() function:
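A minimal sketch, assuming a tiny namespace-free document with a placeholder URL. It converts the SimpleXML element into a DOM node and then copies that node into a separate DomDocument:

```php
<?php
// Hypothetical input document.
$sxe = simplexml_load_string(
    '<urlset><url><loc>http://example.com/</loc></url></urlset>'
);

// Returns a DOMElement that still belongs to SimpleXML's own document.
$node = dom_import_simplexml($sxe);

// To use it in another document, import a (deep) copy first.
$dom  = new DOMDocument('1.0', 'UTF-8');
$copy = $dom->importNode($node, true);
$dom->appendChild($copy);

$loc = $dom->getElementsByTagName('loc')->item(0)->nodeValue;
```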

DomDocument::importNode() creates a copy of the node and associates it with the current document. Its second parameter - a boolean value - determines if the method will recursively import the subtree or not.

You can also import a DOM object into SimpleXML using simplexml_import_dom():
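A short sketch of the opposite direction, again assuming a tiny namespace-free document with a placeholder URL:

```php
<?php
// Hypothetical input document.
$dom = new DOMDocument();
$dom->loadXML('<urlset><url><loc>http://example.com/</loc></url></urlset>');

// Wrap the DOM document in a SimpleXMLElement.
$sxe = simplexml_import_dom($dom);

// From here on, SimpleXML's property syntax is available.
$loc = (string) $sxe->url[0]->loc;
```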

Conclusion

DOM is certainly a very powerful way of dealing with XML documents. While it provides a good interface for basically every task one could dream of, it often takes quite a lot of lines of code to accomplish a task. SimpleXML's interface is of course a little easier, but less powerful.

In particular, the fact that SimpleXML is rather incapable of removing data makes DOM the way to go for more complicated XML document processing. DOM's power in dealing with namespaces makes it a valuable tool when dealing with large amounts of data where naming conflicts are likely.

In fact, we covered only a small portion of DOM's power. There are many other associated classes with several useful methods. For example, we have not covered how to append character data. Check the DOM function reference for more information.

Thanks for staying with me on the DOM boat till the end of our journey! I hope you enjoyed it - please mind the gap between the boat and the pier when leaving.

[...] Parsing XML With The DOM Library | PHP Coding Practices - Become an expert PHP Programmer - Great article that goes deep into the DOMXML extensions of PHP, showing you how to do serious manipulation of XML documents. Includes loading, parsing and writing XML docs, using XPath queries, adding nodes, removing data and more. [...]

noob said
on Jun 29, 2007:

good stuff on namespaces; didn't see anything about validating a document (dtd, schema) or error handling

Hi Tim,
I'm wondering how do you format the xml output? Like in the 8th panel above (titled "XML") the output is nicely formatted with each node on a separate line. My output is always in one very very long line. Any tips are appreciated.
Thank you!

Ok, I found a nice solution to formatting the XML output - it is using the newline \n (and tab \t) characters. The idea is to append these before and after the text-node value, as shown in simple example below:
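A minimal sketch of that idea, with made-up element names. The whitespace is added as explicit text nodes around the child element:

```php
<?php
$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('list'));

// Newline + tab before the child, newline after it.
$root->appendChild($dom->createTextNode("\n\t"));
$root->appendChild($dom->createElement('item', 'one'));
$root->appendChild($dom->createTextNode("\n"));

// Serialize only the root element (no XML declaration).
$out = $dom->saveXML($root);
```

DomDocument also has a formatOutput property that asks the serializer to indent the output on its own, which may save you from adding whitespace nodes by hand.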

"The DomDocument::loadHTML() method will automatically add a DTD (Document Type Definition) and add the missing end-tag for the opened p-tag. Cool, isn't it?"
NO, IT IS NOT! It sucks big time, like anything that should be controlled but isn't! I'm still struggling to avoid JUST THIS automatic insertions, because I need to work on bits of HTML, without the DTD, HEAD, BODY tags.
What would be really cool is to find a way of using loadHTML, do your thing then output with saveHTML WITHOUT having to put up with these unwelcomed and uncontrollable additions by the PHP DOM. Jesus!

I agree with YoDaddy! I too am trying to work with HTML fragments, and I am finding the "wonderful" auto DTD insertion a blinding headache! I've been at this for over 17 hours now and I still can't find a solution. If anyone has any suggestions then I would greatly welcome them!

Still a pretty cool article though, I did use it to start learning manipulation of XML with PHP's DOM a few months ago :)

Hmm, I don't see a solution either, other than manually editing the file later. The API obviously does not allow it. So one would need to write a wrapper around saveHTML() that removes the unwelcome additions.

Ben: You say you have been at this for 17 hours, did you not try to manually edit the file after?

In your examples, the namespace declaration is repeated (in both the root node and in the element where it is used). Having the namespace declaration repeated in every element is a lot of overhead when there are many elements. Any idea how to make it only appear in the root node?

Actually for me, it only shows up in every node where it's used, but not in the root. Using PHP 5.1.3