Introduction to the TreeWalker object of DOM

The TreeWalker object is a powerful DOM2 object that lets you easily
filter through the nodes in a document and create custom collections out of
them. OK, this is sounding geeky already, but for geeky jobs that require
parsing the document tree, it doesn't hurt at all to get familiar with this
object. While scripting, you may have come across the need to retrieve all
elements in a webpage with a specific CSS class name or, for an XML document,
all elements that carry a particular attribute value. The TreeWalker object
makes light work of such tasks. In this tutorial, I'll provide an
introductory look at the TreeWalker object of DOM2, which is supported in
Firefox and Opera 8+, though not in IE6 or IE7 (as of beta 3).

Before I continue, note that there is a cousin to the
TreeWalker object called NodeIterator, which I'll cover in a future
tutorial.

document.createTreeWalker() method

The TreeWalker object can come off as mysterious and
complicated to some, but it really is realized through just a single method:
document.createTreeWalker(). With the four parameters it accepts, this method
reduces to a single call what might otherwise take many times as much
conventional code, such as filtering all nodes in the document that are of a
certain element type and carry a particular attribute. But before we get to
all that, here's a basic description of document.createTreeWalker():
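In outline, the call has the shape sketched below. The wrapper createWalker()
and its parameter names are hypothetical and used only to label the four
arguments; the arguments themselves are the ones this tutorial describes.

```javascript
// Hypothetical helper that names each of the four arguments.
//   doc            - the document that owns the nodes
//   root           - the node at which traversal starts
//   whatToShow     - a NodeFilter.SHOW_* constant limiting the node types returned
//   filter         - optional filtering function, or null for no extra filtering
//   expandEntities - whether entity references should be expanded
function createWalker(doc, root, whatToShow, filter, expandEntities) {
  return doc.createTreeWalker(root, whatToShow, filter || null, !!expandEntities);
}

// In a browser, a call might look like this:
// var walker = createWalker(document, document.body, NodeFilter.SHOW_ELEMENT, null, false);
```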

While there are more than a dozen NodeFilter constants that let you
limit the type of nodes returned by TreeWalker, you'll probably work with
just a few of them most of the time. NodeFilter.SHOW_ELEMENT, for
example, returns all element nodes.
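The SHOW_ constants are bit flags, so several can be combined with a bitwise
OR. The sketch below copies a few of their numeric values from the DOM Level 2
Traversal specification so that it runs without a browser. The SHOW object and
the isShown() helper are hypothetical names for illustration, not part of the
DOM.

```javascript
// Numeric values as defined by the DOM Level 2 Traversal specification.
// In a browser these are available as NodeFilter.SHOW_ELEMENT and so on.
var SHOW = { ALL: 0xFFFFFFFF, ELEMENT: 0x1, TEXT: 0x4, COMMENT: 0x80 };

// Hypothetical helper: would a node with this nodeType pass the whatToShow mask?
// Each node type N is represented by bit (N - 1) of the mask.
function isShown(whatToShow, nodeType) {
  return (whatToShow & (1 << (nodeType - 1))) !== 0;
}

// Combine two constants to ask for element nodes and text nodes together.
var elementsAndText = SHOW.ELEMENT | SHOW.TEXT;
```

Here isShown(elementsAndText, 1) and isShown(elementsAndText, 3) are both true
(nodeType 1 is an element, 3 is a text node), while a comment node
(nodeType 8) does not pass.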

Ok, so you're dying to see a demonstration of
document.createTreeWalker(), a very rudimentary one to start:
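A minimal sketch of such a call follows. The markup in the comment is an
assumption (only the ID "contentarea" and the element types P, SPAN, and B are
given in this tutorial), and createContentWalker() is a hypothetical wrapper
that takes the document and NodeFilter objects as arguments so it can also be
exercised outside a browser.

```javascript
// Assumed markup; only the ID and the element types come from the text:
// <div id="contentarea">
//   <p>Some <span>text</span></p>
//   <b>Bold text</b>
// </div>

// Hypothetical helper: creates an elements-only walker rooted at "contentarea".
function createContentWalker(doc, nodeFilter) {
  var rootnode = doc.getElementById("contentarea");
  // root node, node types to show, no extra filter, do not expand entity references
  return doc.createTreeWalker(rootnode, nodeFilter.SHOW_ELEMENT, null, false);
}

// In a browser that contains the markup above:
if (typeof document !== "undefined" && document.getElementById("contentarea")) {
  var walker = createContentWalker(document, NodeFilter);
}
```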

In this example, I set the root node, where TreeWalker begins
traversing, to the container with ID "contentarea". The second
parameter of the method specifies that TreeWalker should only crawl element
nodes (versus text nodes, comment nodes, etc.) within the container. The third
parameter, set to null, means no additional filtering should be done (not
yet!). The fourth parameter controls whether entity references should be
expanded, and is set to false. With all the parameters in place, "walker"
now references all elements (P, SPAN, and B) within the DIV, along with the
DIV itself.

TreeWalker traversal methods

Having created a filtered list of nodes using
document.createTreeWalker(), you can then process these filtered nodes using
TreeWalker's traversal methods:

TreeWalker traversal methods

Method             Description
firstChild()       Travels to and returns the first child of the current node.
lastChild()        Travels to and returns the last child of the current node.
nextNode()         Travels to and returns the next node within the filtered collection of nodes.
nextSibling()      Travels to and returns the next sibling of the current node.
parentNode()       Travels to and returns the current node's parent node.
previousNode()     Travels to and returns the previous node of the current node.
previousSibling()  Travels to and returns the previous sibling of the current node.

TreeWalker traversal properties

Property           Description
currentNode        Returns the current position/node of TreeWalker. Read/write, allowing you to
                   explicitly set the current position of TreeWalker to a particular node
                   within the nodes returned.

Don't confuse the above methods with the standard DOM
element properties/methods; the ones above work exclusively within the
TreeWalker object to let you navigate the filtered nodes.

Using the same example as above, let's see how to use the
traversal methods to walk through the returned nodes:
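A minimal sketch of such a walk, assuming a walker created over the
"contentarea" DIV with NodeFilter.SHOW_ELEMENT. The helpers collectTagNames()
and firstChildOfRoot() are hypothetical names; they take the walker as an
argument so the stepping logic stands apart from the browser-only setup.

```javascript
// Hypothetical helper: steps through every remaining node, collecting tag names.
function collectTagNames(walker) {
  var names = [];
  while (walker.nextNode()) { // moves the walker; returns null once no nodes remain
    names.push(walker.currentNode.tagName);
  }
  return names;
}

// Hypothetical helper: moves the walker back to the root, then to its first child.
function firstChildOfRoot(walker, rootnode) {
  walker.currentNode = rootnode; // reset TreeWalker pointer to point to root node
  return walker.firstChild();
}

// Browser usage, guarded so the listing does nothing where no DOM exists:
if (typeof document !== "undefined" && document.getElementById("contentarea")) {
  var rootnode = document.getElementById("contentarea");
  var walker = document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false);
  alert(collectTagNames(walker).join(", ")); // tag names of the elements inside the DIV
  var first = firstChildOfRoot(walker, rootnode);
  if (first) alert(first.tagName);
}
```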

As you use the traversal methods to step through the
nodes, TreeWalker not only returns the node in question, but travels to it.
This is why after stepping through the nodes using:

while (walker.nextNode())
//code here

I reset TreeWalker's position back to its root node before
trying to retrieve the firstChild of the filtered collection:

walker.currentNode=rootnode //reset TreeWalker pointer to point to root node

This is necessary because, after the while loop, TreeWalker has
its pointer directed at the very last node of the collection (the B element),
which has no firstChild. And even if it did, that node would be the B
element's first child, not the first child of the entire filtered
collection!

Ok, another example of traversal in TreeWalker to solidify
our understanding of it:
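A minimal sketch, assuming the root container is the same "contentarea" DIV
used earlier. The helper name collectText() is hypothetical; it concatenates
the nodeValue of every text node the walker visits.

```javascript
// Hypothetical helper: joins the text of every node the walker steps onto.
function collectText(walker) {
  var text = "";
  while (walker.nextNode()) {
    text += walker.currentNode.nodeValue; // nodeValue of a text node is its text
  }
  return text;
}

// Browser usage, guarded so the listing does nothing where no DOM exists:
if (typeof document !== "undefined" && document.getElementById("contentarea")) {
  var rootnode = document.getElementById("contentarea");
  // SHOW_TEXT limits the walker to text nodes only
  var textWalker = document.createTreeWalker(rootnode, NodeFilter.SHOW_TEXT, null, false);
  alert(collectText(textWalker));
}
```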

In this example, I traverse all text nodes of the root
container to get its entire textual content.

You're free to use standard DOM element properties/methods
on top of the TreeWalker traversal methods, though the returned information
reflects that node's relationship to the entire document, not just
the filtered results. An example should drive this point home:
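A minimal sketch of the contrast. The markup and the ID "mylist" are
assumptions, and countFilteredChildren() is a hypothetical helper that counts
children the way the walker sees them, for comparison with the DOM's own
childNodes property.

```javascript
// Assumed markup (the ID "mylist" is hypothetical):
// <ul id="mylist">
//   <li>Item 1</li>
//   <li>Item 2</li>
//   <li>Item 3</li>
// </ul>

// Hypothetical helper: counts the current node's children as the walker sees them.
function countFilteredChildren(walker) {
  var start = walker.currentNode;
  var count = 0;
  if (walker.firstChild()) {
    count = 1;
    while (walker.nextSibling()) {
      count++;
    }
  }
  walker.currentNode = start; // put the walker back where it was
  return count;
}

// Browser usage, guarded so the listing does nothing where no DOM exists:
if (typeof document !== "undefined" && document.getElementById("mylist")) {
  var list = document.getElementById("mylist");
  var listWalker = document.createTreeWalker(list, NodeFilter.SHOW_ELEMENT, null, false);
  alert(listWalker.currentNode.childNodes.length); // DOM property: ignores the walker's filtering
  alert(countFilteredChildren(listWalker));        // walker's view: element children only
}
```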

You may have expected 3 to be alerted; after all, there are
only 3 elements within the UL list. However, "childNodes" is a DOM property,
not TreeWalker's, and returns information about a node oblivious to any
filtering that may have been done by TreeWalker! That's why 7 is returned,
the total number of nodes, including text nodes, that the UL contains. The
same concept applies to DOM methods that you may invoke on a node returned
by TreeWalker.

Having learned to navigate the nodes filtered by
document.createTreeWalker(), it's time to see how to refine the filtering
process itself. Recall that the third parameter of document.createTreeWalker()
accepts an optional reference to a filtering function. Let's look at that
next.