
Parsing Child Nodes with the DOM XML extension in PHP 5

In this last chapter of the series, I’m going to teach you how to handle the child nodes of an XML document by way of two simple methods, called hasChildNodes() and removeChild() respectively. So let’s not waste any more time on preliminaries and learn how to put them to good use.

During your life as a PHP developer, it’s quite possible that you’ve already built web applications that work with XML documents that needed to be parsed in one form or another. Of course, based on your own experience, you know that this process can be challenging, since handling XML data requires frequently traversing document nodes, dealing with attributes, copying elements from one place to another, and so forth.

Fortunately, PHP comes packaged with a powerful extension, called DOM XML, that can be used to work on XML documents by using the API provided by the Document Object Model (commonly known as DOM). Thus, if you wish to learn the most important methods provided by this XML library, then this series of articles may be what you’re looking for.

Welcome to the final part of the series “A quick overview of the DOM XML extension in PHP 5.” Made up of seven approachable tutorials, the series walks through some of the most relevant features bundled with this helpful PHP extension, with the aim of making XML parsing painless.

Having already established the objective of this series of articles, now I’m going to briefly rehash the items that were treated in the preceding tutorial, so you can link them more easily with what I plan to discuss in this last installment. In the aforementioned article, I demonstrated how to retrieve the attributes corresponding to a certain number of nodes within an XML document via the “getAttribute()” method.

In addition to teaching you how to work with the previous method, I showed you another one, called “hasAttribute()”, which came in handy for determining whether a particular element of an XML document contained an attribute or not. And lastly, I finished the tutorial by teaching you how to clone different nodes using the “cloneNode()” method, which is a process that hopefully was quite easy to understand.

So far, so good. At this stage, you’re armed with the right pointers to start using some of the most useful methods that come included with the DOM XML extension. However, the question that comes up now is: are there any additional features that still remain uncovered? Actually, this library has many other methods that can be useful for parsing XML documents, but in this last part of the series, I’ll be covering only a couple more. If you want to have a full reference guide on the DOM XML extension, the best place to go is the official PHP website.

Let’s get started now!

{mospagebreak title=Reintroducing a few methods of the DOM XML extension}

Before you start learning a brand new pair of methods aimed at handling the child nodes of a specified XML document, first I’d like to reintroduce some methods that were discussed in the last tutorial of the series. This way, you can fill some gaps regarding their concrete utilization.

Below I’ve included two basic hands-on examples that demonstrate how to use the “getAttribute()” and “hasAttribute()” methods to work with the attributes of a sample “headlines.xml” file. The definition of that XML file comes first, followed by the corresponding code samples, so take a close look at them, please:

// definition of 'headlines.xml' file

<?xml version="1.0" encoding="iso-8859-1"?>

<headlines>

<headline id="economics">

<image>image1.jpg</image>

<url>Link for headline 1 goes here</url>

<text>Text for headline 1 goes here</text>

</headline>

<headline id="sports">

<image>image2.jpg</image>

<url>Link for headline 2 goes here</url>

<text>Text for headline 2 goes here</text>

</headline>

<headline id="jetset">

<image>image3.jpg</image>

<url>Link for headline 3 goes here</url>

<text>Text for headline 3 goes here</text>

</headline>

<headline id="technology">

<image>image4.jpg</image>

<url>Link for headline 4 goes here</url>

<text>Text for headline 4 goes here</text>

</headline>

<headline id="art">

<image>image5.jpg</image>

<url>Link for headline 5 goes here</url>

<text>Text for headline 5 goes here</text>

</headline>

</headlines>

// example on using the 'getAttribute()' method

$dom = new DOMDocument();

$dom->load('headlines.xml');

$headlines = $dom->getElementsByTagName('headline');

foreach ($headlines as $headline) {

    echo 'ID attribute of current node is the following: ' . $headline->getAttribute('id') . '<br />';

}

/* displays the following

ID attribute of current node is the following: economics

ID attribute of current node is the following: sports

ID attribute of current node is the following: jetset

ID attribute of current node is the following: technology

ID attribute of current node is the following: art

*/

// example on using the 'hasAttribute()' method

$dom = new DOMDocument();

$dom->load('headlines.xml');

$headlines = $dom->getElementsByTagName('headline');

foreach ($headlines as $headline) {

    if ($headline->hasAttribute('id')) {

        echo 'ID attribute of current node is the following: ' . $headline->getAttribute('id') . '<br />';

    }

}

/* displays the following

ID attribute of current node is the following: economics

ID attribute of current node is the following: sports

ID attribute of current node is the following: jetset

ID attribute of current node is the following: technology

ID attribute of current node is the following: art

*/

As demonstrated by the previous examples, checking for the existence of a particular attribute that belongs to a specific XML node and getting its corresponding value is a straightforward process that can be accomplished with minimal effort by using the “hasAttribute()” and “getAttribute()” methods shown above.

In this case, the methods in question are used to parse the node attributes of a trivial XML file, but of course, the same business logic can be applied when working with more complex XML data.
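To make the difference between the two methods a bit more tangible, here is a minimal sketch. It assumes a small inline document (loaded with “loadXML()” instead of the sample file) in which the second <headline> element deliberately lacks an “id” attribute; the variable names are only illustrative.

```php
<?php
// Assumption: this inline document stands in for 'headlines.xml';
// the second <headline> has no id attribute on purpose.
$xml = '<headlines><headline id="economics"/><headline/></headlines>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$results = array();
foreach ($dom->getElementsByTagName('headline') as $headline) {
    // getAttribute() returns an empty string when the attribute is absent,
    // so hasAttribute() is what tells "missing" apart from "empty".
    $results[] = $headline->hasAttribute('id')
        ? $headline->getAttribute('id')
        : '(no id)';
}

echo implode(', ', $results), "\n";
```

Since “getAttribute()” gives back an empty string for a missing attribute, checking with “hasAttribute()” first is a reasonable habit whenever an attribute is optional.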

With the two earlier code examples covered, this is an appropriate moment to continue exploring some other helpful features offered by the DOM XML extension. So in accordance with the plan that I outlined in the introduction, in the next section of this article, I’m going to show you how to determine whether or not a specified node within an XML document has child nodes.

As you might have guessed, to see the complete details about how this will be achieved, simply click on the linked title that appears below and keep reading.

{mospagebreak title=Working with the hasChildNodes() method}

So far, you learned how to use the functionality provided by the DOM XML extension to iterate over a certain number of nodes contained within a simple XML document and retrieve their respective attributes. Nonetheless, as I suggested before, the library has a few additional methods that permit you to work directly with the child nodes of a specified element, as you’ve probably done hundreds of times, when using JavaScript to manipulate web documents on the client-side.

Speaking more specifically, the first method that I’m going to show you, with regard to handling child nodes, is one called “hasChildNodes().” It is tasked with determining whether a concrete node of an XML document also contains sub nodes (or child elements).

To illustrate how the “hasChildNodes()” method does its thing, first I’m going to include the definition of the sample “headlines.xml” file that you saw in the previous section. Here it is:

// definition of 'headlines.xml' file

<?xml version="1.0" encoding="iso-8859-1"?>

<headlines>

<headline id="economics">

<image>image1.jpg</image>

<url>Link for headline 1 goes here</url>

<text>Text for headline 1 goes here</text>

</headline>

<headline id="sports">

<image>image2.jpg</image>

<url>Link for headline 2 goes here</url>

<text>Text for headline 2 goes here</text>

</headline>

<headline id="jetset">

<image>image3.jpg</image>

<url>Link for headline 3 goes here</url>

<text>Text for headline 3 goes here</text>

</headline>

<headline id="technology">

<image>image4.jpg</image>

<url>Link for headline 4 goes here</url>

<text>Text for headline 4 goes here</text>

</headline>

<headline id="art">

<image>image5.jpg</image>

<url>Link for headline 5 goes here</url>

<text>Text for headline 5 goes here</text>

</headline>

</headlines>

Now that you’ve recalled how the above XML file looks, let me go one step further and show you a short script that traverses the file in question and checks whether each of its <headline> elements has any child nodes.

The source code of this brand new script is the following:

// example on using the 'hasChildNodes()' method

$dom = new DOMDocument();

// load XML data from existing file

$dom->load('headlines.xml');

// get <headline> nodes

$headlines = $dom->getElementsByTagName('headline');

foreach ($headlines as $headline) {

    if ($headline->hasChildNodes()) {

        echo 'This node has children!<br />';

    }

}

/* displays the following

This node has children!

This node has children!

This node has children!

This node has children!

This node has children!

*/

Despite the simplicity of the above code sample, I’m reasonably sure that it’s been useful enough to demonstrate the logic that drives the “hasChildNodes()” method. On this specific occasion, since each <headline> element within the XML document does wrap a group of child nodes, the script reflects this condition by echoing a message to the browser for every one of them. Grasping how this method does its thing definitely isn’t rocket science.
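One detail worth keeping in mind is that “hasChildNodes()” counts every kind of child node, including the whitespace-only text nodes that the parser keeps by default. The following sketch, which assumes a small inline document and a hypothetical helper called “countElementChildren()”, shows how an element that looks empty can still report children:

```php
<?php
// Assumption: a small inline document with three differently shaped
// <headline> elements; the ids and the helper name are illustrative only.
$xml = <<<XML
<headlines>
  <headline id="a"><text>Hi</text></headline>
  <headline id="b">
  </headline>
  <headline id="c"/>
</headlines>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml); // whitespace text nodes are preserved by default

// Hypothetical helper: counts only the children that are element nodes.
function countElementChildren(DOMNode $node)
{
    $count = 0;
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $count++;
        }
    }
    return $count;
}

$report = array();
foreach ($dom->getElementsByTagName('headline') as $headline) {
    $id = $headline->getAttribute('id');
    $report[$id] = array(
        $headline->hasChildNodes(),      // true for any child, text included
        countElementChildren($headline), // element children only
    );
    echo $id, ': hasChildNodes=', var_export($report[$id][0], true),
        ', element children=', $report[$id][1], "\n";
}
```

Here the “b” element contains nothing but a line break and indentation, yet “hasChildNodes()” still returns true for it, so counting element nodes explicitly is one way to answer the narrower question of whether a node has child elements.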

Well, at this point you should congratulate yourself, since you learned one more method in the vast arsenal provided by the DOM XML extension. Nonetheless, if you’re used to working with XML documents on a frequent basis, then it’s highly possible that you’re wondering if this extension has the capability to remove an element’s child node. Luckily, the answer is an emphatic yes! And this will be the last topic that I’m going to discuss in this tutorial.

Deleting a child node from a given XML document is a process that must be performed by way of yet another method of the DOM XML extension, whose name is “removeChild().”

To see how it will be implemented in the context of a concrete example, please click on the link below and keep reading. We’re almost done!

{mospagebreak title=The removeChild() method}

If you’ve ever used the DOM API with JavaScript to dynamically remove several elements of a web page, then you’ll find the “removeChild()” method of the DOM XML extension very easy to grasp, since its functionality is identical to that of its client-side counterpart.

So, to make things a bit easier and faster, I’m going to use the same “headlines.xml” file that you saw earlier to exemplify how the “removeChild()” method works. That being said, here’s the familiar definition of this basic XML file:

// definition of 'headlines.xml' file

<?xml version="1.0" encoding="iso-8859-1"?>

<headlines>

<headline id="economics">

<image>image1.jpg</image>

<url>Link for headline 1 goes here</url>

<text>Text for headline 1 goes here</text>

</headline>

<headline id="sports">

<image>image2.jpg</image>

<url>Link for headline 2 goes here</url>

<text>Text for headline 2 goes here</text>

</headline>

<headline id="jetset">

<image>image3.jpg</image>

<url>Link for headline 3 goes here</url>

<text>Text for headline 3 goes here</text>

</headline>

<headline id="technology">

<image>image4.jpg</image>

<url>Link for headline 4 goes here</url>

<text>Text for headline 4 goes here</text>

</headline>

<headline id="art">

<image>image5.jpg</image>

<url>Link for headline 5 goes here</url>

<text>Text for headline 5 goes here</text>

</headline>

</headlines>

And here’s a simple script that uses the “removeChild()” method to delete every child node (the <image>, <url> and <text> elements, plus the whitespace text nodes between them) of each parent <headline> node.

The corresponding code sample looks like this:

// example on removing child nodes using the 'removeChild()' method

$dom = new DOMDocument();

// load XML data from existing file

$dom->load('headlines.xml');

// get <headline> nodes

$headlines = $dom->getElementsByTagName('headline');

foreach ($headlines as $headline) {

    while ($headline->hasChildNodes()) {

        // remove the first remaining child node

        $headline->removeChild($headline->childNodes->item(0));

    }

}

Do you realize how easy it is to delete child nodes from an XML document? I guess you do! As you saw, in the above hands-on example, I used a combination of the “removeChild()” method and the brand new “childNodes” property: the “while” loop keeps removing the first child (item 0) of each <headline> node until none remain, so every <headline> element ends up empty.

Of course, this is merely a basic demonstration of how to implement the “removeChild()” method in a useful manner. But you’re completely free to develop your own testing examples regarding the use of this one and the rest of the methods discussed in this article.
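For cases where only the <image> elements should go and the rest of the children should stay, a sketch along the following lines may help. It assumes a shortened inline version of the sample document, and it copies the node list into a plain array before removing anything, which is one possible way to avoid modifying a live list while looping over it.

```php
<?php
// Assumption: a shortened inline version of 'headlines.xml'.
$xml = <<<XML
<headlines>
  <headline id="economics"><image>image1.jpg</image><text>Text 1</text></headline>
  <headline id="sports"><image>image2.jpg</image><text>Text 2</text></headline>
</headlines>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// The list returned by getElementsByTagName() is live, so removing nodes
// while iterating over it directly can cause elements to be skipped.
// Copying it into a plain array first sidesteps that problem.
$images = iterator_to_array($dom->getElementsByTagName('image'));

foreach ($images as $image) {
    // removeChild() has to be called on the node's actual parent
    $image->parentNode->removeChild($image);
}

$remaining = $dom->getElementsByTagName('image')->length;

// Print the resulting document; the <text> elements are still in place
echo $dom->saveXML();
```

Calling “saveXML()” at the end is simply a convenient way to inspect the result; to persist the changes, the “save()” method of DOMDocument could be used with a file name instead.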

Final thoughts

It’s hard to believe, but we’ve come to the end of this series. In general terms, the experience has been instructive and also fun, since you learned how to use some of the most relevant methods that come packaged with the DOM XML PHP extension.

As I stated at the beginning, this is only a brief guide on what you can do with the DOM API when working with XML documents. If you’re looking for a full reference guide on this library, the best place to go is the official PHP website.