If you have used UNIX, Microsoft Windows, Linux, or any other operating system, you are probably familiar with one of the most common operations: locating a file. The primary characteristic of a file is its location, which is analogous to the address of a house. By that analogy, to locate a house in a city, you must follow a path. In an operating system, every location starts from a drive such as a hard drive, an optical drive (CD, DVD, BD), or a USB drive. Inside that drive are directories, also called folders. A directory can contain child directories, also called sub-directories or sub-folders, and those can have sub-directories of their own. At the end of the path are files. A file can belong to any directory, including the root.

You are probably familiar with how hard it can be to locate an element in an XML file. Fortunately, the MSXML library provides various mechanisms through some classes and their methods. Such techniques are available only if you are using MSXML or the .NET Framework. To provide a common mechanism to locate the elements in an XML document, the W3C created the XPath language.

Introduction to XPath and .NET

Microsoft supports XPath in various ways, as a language in its own right, in the MSXML library,
and in the .NET Framework. In this lesson, we will use or address XPath as it is defined by W3C and as it is, or
can be, used in the .NET Framework or in a .NET Framework-based application.

To support XPath, the .NET Framework provides the System.Xml.XPath namespace that
contains various classes. As you should know already, the .NET Framework primarily supports the XML standards
through the System.Xml namespace that contains such valuable classes as XmlDocument,
XmlElement, XmlNode, and XmlNodeList (and many other important classes).
The XmlDocument class gives access to the whole contents of an XML document, starting with the
root node, which is represented by the DocumentElement property.

As you may know already, to open an XML file you can pass
its path to the Load() method of the XmlDocument class. Here is an example:

If the path is invalid or the file cannot be found, the
Load() method throws a FileNotFoundException exception at run time. Otherwise, that is, if the file is found, you can use
the document as you see fit. For example, you can access its root through the DocumentElement property:
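The two steps described above, loading a file and reaching its root, can be sketched as follows. This is a minimal sketch and not the lesson's original listing: the temporary file name, its one-video content, and the names xdVideos, xeRoot, and loadFailed are assumptions made so the code runs on its own.

```fsharp
open System
open System.IO
open System.Xml

// Hypothetical sample content, written to a temporary file so the sketch is self-contained
let path = Path.Combine(Path.GetTempPath(), "videos-sample.xml")
File.WriteAllText(path, "<videos><video><title>Sample</title></video></videos>")

// Load() takes the path of the file to open
let xdVideos = XmlDocument()
xdVideos.Load(path)

// DocumentElement is the root element of the document
let xeRoot : XmlElement = xdVideos.DocumentElement
printfn "Root element: %s" xeRoot.Name

// A file that does not exist raises FileNotFoundException at run time
let missingPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".xml")
let loadFailed =
    try
        XmlDocument().Load(missingPath)
        false
    with
    | :? FileNotFoundException -> true
```

Catching the exception, as done for loadFailed, is one way to keep the program running when the file is absent. Checking File.Exists() before calling Load() is another.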

From that root, you can "scan" (or navigate) the XML document for any reason.
The XmlDocument.DocumentElement property is of type XmlElement, which is derived
from XmlLinkedNode, itself a child of the XmlNode class.

An XPath Expression

Introduction

As you have used operating systems before, you are familiar with the way the address of a file is formulated. In Microsoft Windows, an example of the location of a folder is:

H:\Programming\Lessons\

This gives access to the Lessons sub-folder that is a child of the Programming folder created in the H drive. An example of the location of a file is:

H:\Programming\Lessons\Lesson01.htm

This gives access to the Lesson01.htm file located in the Lessons sub-folder that is a child of the Programming folder created in the H drive. You can pass such a location to a method of a class, for example if you are performing file processing using one of the classes of the
System.IO namespace. A file location is considered an expression. Such an expression contains words (for example Lessons) and operators (such as . or : or \). In the same way, the XPath language uses expressions to specify a path.
Internally, a program called the XPath parser, or just the parser, receives the expression and analyzes it.
When the parser has finished doing its job, it hands its result back to the .NET Framework, which runs your F# code.

To give you the ability to use XPath, the XmlNode class is equipped with a method named
SelectNodes. This method is overloaded with two versions. The
signature of one of the versions is:

member SelectNodes : xpath:string -> XmlNodeList

As you can see, the XmlNode.SelectNodes() method takes an XPath expression passed as a string. This can be illustrated as follows:

If the XmlNode.SelectNodes() method succeeds in what it is supposed to do, it returns the nodes that match the expression (elements, attributes, text nodes, and so on) as an
XmlNodeList collection. This can be done as follows:
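As a rough illustration of the call and of its return value, the sketch below passes an expression to SelectNodes() and walks the resulting collection. The inline document is a stand-in assumed for this example (it is not the lesson's Videos.xml), and the names xnlTitles and titles are hypothetical.

```fsharp
open System.Xml

// Stand-in document; element names and values are assumed for illustration
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title><director>Jonathan Lynn</director></video>
  <video><title>The Day After Tomorrow</title><director>Roland Emmerich</director></video>
</videos>""")

// SelectNodes() takes the XPath expression as a string and returns an XmlNodeList
let xnlTitles : XmlNodeList = xdVideos.DocumentElement.SelectNodes("/videos/video/title")

// XmlNodeList is a non-generic collection, so each item is cast to XmlNode
let titles =
    xnlTitles
    |> Seq.cast<XmlNode>
    |> Seq.map (fun node -> node.InnerText)
    |> List.ofSeq

for title in titles do
    printfn "Title: %s" title
```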

Since XPath is a language, it has rules that an expression must follow.
An expression that follows the rules is referred to as well-formed. In our use
of XPath, two sets of rules apply: first, the rules of the XPath language, followed by
the rules of the F# language. The primary rule of XPath is that the expression you
formulate must be valid (be well-formed). If the expression violates a rule
(that is, if the expression is not well-formed), the parser concludes that the
expression is not valid and stops any processing. As far as the F# compiler is
concerned, the expression is an ordinary string, so the error is not detected when
the code is compiled. Instead, the SelectNodes() method throws an exception named
XPathException at run time.

The XPathException is defined in the System.Xml.XPath namespace. If you want, you can catch that exception and take appropriate measures (of course, this class has a
Message property).
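A minimal sketch of catching that exception is shown below. The one-video document and the trySelect helper are assumptions introduced for this example. The second expression is deliberately malformed (its square bracket is never closed).

```fsharp
open System.Xml
open System.Xml.XPath

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("<videos><video><title>Sample</title></video></videos>")

// Hypothetical helper: the number of matches, or the message of the XPathException
let trySelect (xpath: string) =
    try
        Ok (xdVideos.DocumentElement.SelectNodes(xpath).Count)
    with
    | :? XPathException as ex -> Error ex.Message

let wellFormed = trySelect "/videos/video"   // valid expression
let malformed = trySelect "/videos/video["   // unbalanced square bracket

match malformed with
| Ok count -> printfn "Found %d node(s)" count
| Error message -> printfn "Invalid expression: %s" message
```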

If the XPath expression is valid, the parser hands the job to another program referred to as an interpreter. Its job is to produce the result requested by the XPath expression. The interpreter starts "scanning" the document from the node on which the method was called, such as the
XmlDocument.DocumentElement object. We will see various types
of operations that can be requested. For example, you may ask the interpreter to look for a certain element. Another operation may consist of comparing two values. The interpreter checks the document from beginning to end. If it doesn't find any node that responds to the expression, no exception is thrown. In this case, the
XmlNode.SelectNodes() method returns an empty XmlNodeList collection (its Count property is 0).

Starting from the beginning of the document, if the interpreter finds a node that responds to the XPath expression, it adds it to its list of results and continues checking the nodes in the document. When it reaches the end of the document, it has its final list. That list is stored in an
XmlNodeList collection and becomes the value returned by the
XmlNode.SelectNodes() method. You can then use that list as you see fit. Since the XmlNodeList class holds a collection, you can use a for
... in loop to "scan" it. Consider the following XML document whose file is named Videos.xml:

As you may know already, every XML document starts with a root node. If you don't know the name of the root (this could happen if you are using an XML file created by someone else and you are not familiar with the document's structure), you can pass / as the argument to the
XmlNode.SelectNodes() method. Here is an example:

Both the XPath language and the .NET Framework provide various means to present the result of an XPath expression. As we will see in the next sections, the XPath language provides various operators such as the forward slash / to specify the path, or where to start considering the path, to a node. On the other hand, the
XmlNode class is equipped with the InnerText, the
InnerXml, and the OuterXml properties that produce various results as we will see.

As you should know already, the XmlNode.InnerText property produces the value of an element, that is, the combined text of the node and all its child nodes. The
XmlNode.InnerXml property produces the XML markup of the child nodes only, without the start and end tags of the element itself. The
XmlNode.OuterXml property produces the markup of the node itself, including the child nodes, if any, of the element. Here is an example that uses the
XmlNode.OuterXml property:
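To compare the three properties side by side, the sketch below reads them from the same node. The inline document is a stand-in assumed for this example, and xnCategories is a hypothetical name.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video>
    <title>The Day After Tomorrow</title>
    <categories><genre>Drama</genre><genre>Environment</genre><genre>Science Fiction</genre></categories>
  </video>
</videos>""")

// Take the first (and only) categories node
let xnCategories = xdVideos.DocumentElement.SelectNodes("/videos/video/categories").Item(0)

let innerText = xnCategories.InnerText   // text of all child nodes, combined
let innerXml = xnCategories.InnerXml     // markup of the child nodes only
let outerXml = xnCategories.OuterXml     // markup of the node and its child nodes

printfn "InnerText: %s" innerText
printfn "InnerXml:  %s" innerXml
printfn "OuterXml:  %s" outerXml
```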

You can use this technique to locate a node. You can check the results to find a particular node you are looking for.

Accessing the Child Nodes of an Element

A child is a node that appears immediately below another node in the ancestry lineage. To access the child nodes of an element, follow its name with /*. With this operator, if the immediate child node of the element is:

A simple node made of a name and value, only the value of that node is included in the produced result

A node that itself has child nodes, the result depends on the property you use:

If you use code that produces only the values (such as the
XmlNode.InnerText property), the result would include the values of the child nodes all treated as one combined object. An example would be DramaEnvironmentScience Fiction

If you use code that produces the nodes as objects (such as the
XmlNode.InnerXml property), the result would include
the whole XML code of its child nodes as a combined object. An example would be <genre>Drama</genre><genre>Environment</genre><genre>Science Fiction</genre>

As stated already, if the last element of the expression you passed includes a child node that itself has child nodes, all those child nodes would be combined and produced as one object. If you want to get the individual nodes, pass their name after that of the element. Here is an example:
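The difference between /* and a path that names the nodes can be sketched as follows. The inline document is a stand-in assumed for this example, and the select helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video>
    <title>The Day After Tomorrow</title>
    <director>Roland Emmerich</director>
    <categories><genre>Drama</genre><genre>Environment</genre><genre>Science Fiction</genre></categories>
  </video>
</videos>""")

// Hypothetical helper: run an expression and return the matching nodes as a list
let select (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath) |> Seq.cast<XmlNode> |> List.ofSeq

// /* produces every child of video; categories comes back as one combined node
let childNames = select "/videos/video/*" |> List.map (fun n -> n.Name)
let childTexts = select "/videos/video/*" |> List.map (fun n -> n.InnerText)

// Naming the nodes after their parent produces them individually
let genres = select "/videos/video/categories/genre" |> List.map (fun n -> n.InnerText)

printfn "%A" childNames
printfn "%A" childTexts
printfn "%A" genres
```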

A grand-child is a node that appears down after the child node in the ancestry lineage. To access the grandchildren of a node, separate its name and that of the grand-child name with /*/. Here is an example:

let xnlVideos = xeVideo.SelectNodes("/videos/*/director")

This example is based on the root and it produces the same result as seen above. Otherwise, if you are starting from another level, make sure you specify it. Here is an example:

let xnlVideos = xeVideo.SelectNodes("/videos/video/*/actor")

Accessing Specific Nodes

If you have many elements that share the same name in your XML document, to access those elements, pass their name preceded by //. Here is an example:

Notice that the results in each section, those belonging to the same parent node, are treated as one object. If you pass the common name of the nodes that are at the end of their ancestry, they would be treated individually. Consider the following example:

In the same way, you can use the // operator to specify where to start considering a path to a child or grand-child node. Here is an example:

let xnlVideos = xeVideo.SelectNodes("//video/*/actor")

In the same way, you can combine operators (separators) to get to a node. For example, to get the child of a node X that itself is a grandchild, simply follow that X node with / and the name of the child. Here is an example:

let xnlVideos = xeVideo.SelectNodes("//videos/*/cast-members/actor")
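The /*/ and // forms can be tried against a small document, as in the sketch below. The document is a stand-in assumed for this example (two videos, only one with a cast-members section), and the texts helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title><director>Jonathan Lynn</director></video>
  <video><title>The War of the Roses</title><director>Danny DeVito</director>
    <cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor><actor>Danny DeVito</actor></cast-members>
  </video>
</videos>""")

// Hypothetical helper: the text of every node matched by an expression
let texts (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath)
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

let directors = texts "/videos/*/director"               // grandchildren of the root
let actorsFromRoot = texts "/videos/video/*/actor"       // grandchildren of each video
let actorsAnywhere = texts "//actor"                     // every actor, at any depth
let actorsCombined = texts "//videos/*/cast-members/actor"

printfn "%A" directors
printfn "%A" actorsAnywhere
```

In this stand-in document the last three expressions reach the same actor nodes, because every actor sits in a cast-members section that is a child of a video.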

XPath and Arrays

Introduction

If you pass an expression that accesses many nodes on the same level and the expression produces many results, as you know already, the results are stored in an
XmlNodeList collection. The XPath standard treats those results as an ordered set of nodes in which, as in an array, each node can be accessed by its position.

Accessing a Node by its Position

When accessing the results of an XPath expression, each result uses a specific position in the resulting array. The position corresponds to the index of arrays seen in
F#, except that the indexes start at 1. The first result is positioned at index 1, the second at index 2, until the end.

To access an individual result of the XPath expression, use the square brackets as done in C-based languages. Here is an example:

You can apply the square brackets [] to any element. This means that by default, every element has a built-in array.

If you pass a value higher than the number of nodes in the elements, the XmlNode.SelectNodes() method would return an empty list; no exception would be thrown.

Using these features of arrays applied to XPath, you can access any node. For example, you can apply the square brackets to a certain node and access it by its position. If the node at that position has child nodes, you can use the name of those child nodes and apply the square brackets on that name to indicate what node you want to access. Consider the following example:
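A sketch of positional access is shown below. The two-video document is a stand-in assumed for this example, and the texts helper is hypothetical. The second expression first gets the 2nd video, then the 3rd actor of that video.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title>
    <cast-members><actor>Eddie Murphy</actor><actor>Lane Smith</actor><actor>Sheryl Lee Ralph</actor></cast-members>
  </video>
  <video><title>The War of the Roses</title>
    <cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor><actor>Danny DeVito</actor></cast-members>
  </video>
</videos>""")

// Hypothetical helper: the text of every node matched by an expression
let texts (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath)
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

// Positions start at 1, not 0
let secondTitle = texts "/videos/video[2]/title"
let thirdActorOfSecond = texts "/videos/video[2]/cast-members/actor[3]"

printfn "%A" secondTitle
printfn "%A" thirdActorOfSecond
```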

As mentioned already, all the child nodes of an element are stored in an array. In fact, if you pass the name of the root as your XPath expression, this is equivalent to applying [1] to it. Here is an example:

By using arrays, the XPath language makes it possible to get a collection of nodes based on their common name. To get only the nodes that have a specific section, pass the name of that section to the square brackets of the parent element. From our XML document, imagine you want only the videos that have a section named cast-members. Here is an example of getting them:
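Filtering on the presence of a section can be sketched as follows. The document is a stand-in assumed for this example, select is a hypothetical helper, and producer is a made-up name that does not appear in the document.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch; only two videos have a cast-members section
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title></video>
  <video><title>The War of the Roses</title>
    <cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor></cast-members>
  </video>
  <video><title>The Day After Tomorrow</title>
    <cast-members><actor>Dennis Quaid</actor><actor>Jake Gyllenhaal</actor></cast-members>
  </video>
</videos>""")

// Hypothetical helper: run an expression and return the matching nodes as a list
let select (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath) |> Seq.cast<XmlNode> |> List.ofSeq

// Only the videos that have a cast-members child
let withCast = select "/videos/video[cast-members]/title" |> List.map (fun n -> n.InnerText)

// A name that no video contains gives an empty result
let withProducer = select "/videos/video[producer]"

printfn "%A" withCast
printfn "Videos with a producer section: %d" (List.length withProducer)
```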

Notice that, in our XML document, only some of the videos include the list of actors. If you pass a name that doesn't exist, the interpreter would produce an empty result (no exception would be thrown).

This would produce all cast-members sections of all videos (only the cast-members section). If you want to get only a specific child node, assign its value to the name in the square brackets. Here is an example:

From our XML file, this would produce only the videos that have a categories section but that categories section must have a keywords section as child, which excludes a video where the categories and their
keywords are children on the same level:

Accessing the Grand-Children by Position

Notice that our XML document has some videos that have a cast-members section. Instead of getting
all of them, you can ask the interpreter to return only the one at a specific position. This is done by adding a second pair of square brackets and passing the desired index. Here is an example:

The XPath language provides many functions. Some functions
are made to perform some of the operations we have applied already. Some
other functions are meant to produce some results we have not gotten so far.

The Position of a Node

We have seen that we can use the square brackets to get to the position of a node. The XPath language has a function named
position that can be used to access a node based on its position. To use it, in the square brackets of the node that is considered the parent, assign the desired position to
position().

As you may know already, the first child of a node has an index of 1. Therefore, to get the first child node, assign 1 to the
position() function. Here is an example:
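A minimal sketch of the position() function follows. The three-video document is a stand-in assumed for this example, and the texts helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title></video>
  <video><title>The War of the Roses</title></video>
  <video><title>The Day After Tomorrow</title></video>
</videos>""")

// Hypothetical helper: the text of every node matched by an expression
let texts (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath)
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

// [position() = 1] is the long form of [1]
let firstByFunction = texts "/videos/video[position() = 1]/title"
let firstByIndex = texts "/videos/video[1]/title"

// position() can also take part in a comparison
let firstTwo = texts "/videos/video[position() <= 2]/title"

printfn "%A" firstByFunction
printfn "%A" firstTwo
```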

To help you get to the last child node of a node, XPath proposes a function named
last. Therefore, to get the last child node, pass
last() as the index of the node that acts as the parent. Here is an example:

open System
open System.IO
open System.Xml

// Path to the XML file; adjust it to where Videos.xml is stored
let strVideosFile = "Videos.xml"
let xdVideos = XmlDocument()

if File.Exists strVideosFile then
    use fsVideo = new FileStream(strVideosFile, FileMode.Open, FileAccess.Read)
    xdVideos.Load(fsVideo)
    let xeVideo = xdVideos.DocumentElement
    // last() selects the last video child node of the root
    let xnlVideos = xeVideo.SelectNodes("/videos/video[last()]")
    // XmlNodeList is a non-generic collection, so each item is cast to XmlNode
    for xnVideo in Seq.cast<XmlNode> xnlVideos do
        printfn "Video: %s" xnVideo.OuterXml

This would produce:

As an alternative, you can assign last() to
position(). Here is an example:

Notice that the result includes all nodes from a parent that has a cast-members section. If you want to get only the last node that has that section, include the whole path in parentheses excluding the square brackets and their index. Here is an example:
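The effect of the parentheses can be sketched as follows. The two-video document is a stand-in assumed for this example, and the texts helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title>
    <cast-members><actor>Eddie Murphy</actor><actor>Lane Smith</actor><actor>Sheryl Lee Ralph</actor></cast-members>
  </video>
  <video><title>The War of the Roses</title>
    <cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor><actor>Danny DeVito</actor></cast-members>
  </video>
</videos>""")

// Hypothetical helper: the text of every node matched by an expression
let texts (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath)
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

// Without parentheses, last() is evaluated inside each cast-members section
let lastPerVideo = texts "/videos/video/cast-members/actor[last()]"
let lastPerVideoLongForm = texts "/videos/video/cast-members/actor[position() = last()]"

// With parentheses, last() applies to the combined list of all actors
let lastOverall = texts "(/videos/video/cast-members/actor)[last()]"

printfn "%A" lastPerVideo
printfn "%A" lastOverall
```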

You may have noticed that, in our XML document, some parent nodes include some child nodes that are not found in other parent nodes. For example, the first, the second, and the third videos of our XML document have a <format> child node but the fourth video does not. On the other hand, the second, the third, and the fourth videos have a <categories> child node while the first does not. You can ask the XmlNode.SelectNodes() method (in fact the XPath interpreter) to produce in its results only the parent nodes that have a combination of two certain child nodes. To get such a result, add a first pair of square brackets that includes the name of one of the child nodes. Include a second pair of square brackets with the other name. Here is an example:

The "and" operator is used to check that two child nodes are found under a parent node. To use it, create the square brackets and in them, provide the names of two child nodes separated by the
and operator. Here is an example:

The not() function reverses a condition. When the name of a child node is passed to it in the square brackets, it
produces only the nodes that don't have the child node whose name was passed:
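The three filters discussed in this part (two pairs of square brackets, the and operator, and the not() function) can be compared in one sketch. The four-video document is a stand-in assumed for this example: it only mirrors the layout described above (the first three videos have a format, the last three have categories). The titles helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title><format>DVD</format></video>
  <video><title>The War of the Roses</title><format>DVD</format><categories><genre>Comedy</genre></categories></video>
  <video><title>The Day After Tomorrow</title><format>BD</format><categories><genre>Drama</genre></categories></video>
  <video><title>Duplex</title><categories><genre>Comedy</genre></categories></video>
</videos>""")

// Hypothetical helper: the titles of the videos matched by a predicate
let titles (predicate: string) =
    xdVideos.DocumentElement.SelectNodes("/videos/video" + predicate + "/title")
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

let bothByBrackets = titles "[format][categories]"       // two pairs of square brackets
let bothByAnd = titles "[format and categories]"         // the and operator
let withoutFormat = titles "[not(format)]"               // the not() function

printfn "%A" bothByBrackets
printfn "%A" withoutFormat
```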

Sets Combinations

All the expressions we have seen so far produced only one
set of results each. To let you combine two sets of results into one, the XPath language
provides the | operator. It must be applied between two expressions. Here is an example:
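A sketch of the | operator is shown below. The two-video document is a stand-in assumed for this example, and the select helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title><director>Jonathan Lynn</director><length>112</length></video>
  <video><title>The Day After Tomorrow</title><director>Roland Emmerich</director><length>124</length></video>
</videos>""")

// Hypothetical helper: run an expression and return the matching nodes as a list
let select (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath) |> Seq.cast<XmlNode> |> List.ofSeq

// The titles and the directors, combined into a single list of results
let combined = select "/videos/video/title | /videos/video/director"

for node in combined do
    printfn "%s: %s" node.Name node.InnerText
```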

The XPath language provides some operators that can be used
to perform Boolean operations. The comparisons are used to find the
Boolean relationship between two values. Only some operators are allowed. If
you use an operator that is not valid, the SelectNodes() method would throw an XPathException exception at run time.
The valid Boolean operators and the rules are:

= Both values are the same. This operator can be applied to strings (and characters or symbols) or numeric values. Here is an example:

> or &gt; The left value is greater than the right value.
This operator can be applied only to numeric values. If the operator is applied to
a string that does not represent a number, the XmlNode.SelectNodes() method would return an empty list;
no exception would be thrown. If you want to perform such comparisons as to find out
if one string is alphabetically lower than another, call the Compare()
method of the String class in your F# code.

>= The left value is greater than, or equal to, the right value.
This operator follows the numeric and string rules of the > operator

< or &lt; The left value is lower than the right value

<= The left value is lower than, or equal to, the right value

!= The values are different. This operator can be applied to strings and/or numeric values
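The comparison operators listed above can be tried as in the sketch below. The document, including the length values, is a stand-in assumed for this example, and the titles helper is hypothetical. Inside an XML file < and > would be written &lt; and &gt;, but in an F# string they can be typed directly.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch; the lengths are illustrative values
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title><director>Jonathan Lynn</director><length>112</length></video>
  <video><title>The War of the Roses</title><director>Danny DeVito</director><length>116</length></video>
  <video><title>The Day After Tomorrow</title><director>Roland Emmerich</director><length>124</length></video>
</videos>""")

// Hypothetical helper: the titles of the videos matched by a predicate
let titles (predicate: string) =
    xdVideos.DocumentElement.SelectNodes("/videos/video" + predicate + "/title")
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

let byDirector = titles "[director = 'Danny DeVito']"     // = applied to a string
let longer = titles "[length > 112]"                      // > applied to a number
let atLeast = titles "[length >= 112]"
let otherDirectors = titles "[director != 'Danny DeVito']"

// > applied to text that is not a number: empty result, no exception
let byText = titles "[title > 'A']"

printfn "%A" byDirector
printfn "%A" longer
printfn "%A" byText
```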

Logical conjunction consists of combining two Boolean
operations that must both produce true results. To create a logical
conjunction, create some square brackets applied to the parent name of a
node and create the conjunction operation in the square brackets. There
are two main ways you can perform the operation:

To apply the same criterion to two different nodes, in the square
brackets, type the name of the first node and apply the desired Boolean
operation to it, followed by the name of the other node with
the same Boolean operation applied to it. Separate the operations with the
and operator. For example, from our XML, we have two videos
directed by Danny DeVito. That person is an actor in one of the videos
but is not an actor in the other. Imagine that you want to get only
the video where he is an actor. You can apply the ='Danny DeVito'
comparison to both the director and the actor node. Here is an example:

To apply different criteria to the same node, in the square
brackets, type the name of the node and apply the desired Boolean
operation to it, followed by the name of the same node with
the other Boolean operation applied to it. Separate the operations with the
and operator. For example, from our XML, we have two videos
directed by Danny DeVito. Imagine that you want to get only one of
them. You can create one of the operations as director='Danny DeVito'
and the other operation as a condition that makes one video different from
the other. Here is an example:

In the same way, you can create as many conjunctions
as you want, by separating the operations with the and operator.
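Both kinds of conjunction can be sketched as follows. The two-video document, including the length values and the placement of the actors inside a cast-members section, is a stand-in assumed for this example. The titles helper is hypothetical. String comparisons in XPath are case-sensitive, so the name must be spelled exactly as in the document.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The War of the Roses</title><director>Danny DeVito</director><length>116</length>
    <cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor><actor>Danny DeVito</actor></cast-members>
  </video>
  <video><title>Duplex</title><director>Danny DeVito</director><length>89</length>
    <cast-members><actor>Ben Stiller</actor><actor>Drew Barrymore</actor></cast-members>
  </video>
</videos>""")

// Hypothetical helper: the titles of the videos matched by a predicate
let titles (predicate: string) =
    xdVideos.DocumentElement.SelectNodes("/videos/video" + predicate + "/title")
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

// Same criterion applied to two different nodes
let directsAndActs =
    titles "[director = 'Danny DeVito' and cast-members/actor = 'Danny DeVito']"

// Different criteria applied to the same node
let mediumLength = titles "[length > 100 and length < 120]"

printfn "%A" directsAndActs
printfn "%A" mediumLength
```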

Logical Disjunction

Logical disjunction consists of applying the same
criterion to two nodes or applying different criteria to the same node
so that at least one of the operations needs to be true. To create a
logical disjunction, apply the square brackets to the parent node.
As seen for the conjunction, there are two main ways to
use a logical disjunction:

To get a result where the same operation is applied to two
different nodes but at least one of both comparisons produces a true
result, in your XPath expression, type the name of the parent node
followed by square brackets. In the square brackets, type the name
of one of the nodes and apply the desired Boolean operation to it,
followed by the name of the other node and apply the same Boolean
operation to it. Separate the operations with the or operator. Here is an
example that gets videos in which either Eddie Murphy or
Danny DeVito star:

To get a result where different operations are applied to the same node
but at least one of both comparisons produces a true result, in your
XPath expression, type the name of the parent node followed by square
brackets. In the square brackets, type the name of the node and apply the
desired Boolean operation to it, followed by the name of the same node
with the other Boolean operation applied to it. Separate the operations with
the or operator. Here is an example that gets videos directed
by either Danny DeVito or Roland Emmerich:
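The two disjunctions described above can be sketched as follows. The four-video document is a stand-in assumed for this example, with the actors placed inside a cast-members section, and the titles helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title><director>Jonathan Lynn</director>
    <cast-members><actor>Eddie Murphy</actor><actor>Lane Smith</actor></cast-members></video>
  <video><title>The War of the Roses</title><director>Danny DeVito</director>
    <cast-members><actor>Michael Douglas</actor><actor>Danny DeVito</actor></cast-members></video>
  <video><title>Duplex</title><director>Danny DeVito</director>
    <cast-members><actor>Ben Stiller</actor><actor>Drew Barrymore</actor></cast-members></video>
  <video><title>The Day After Tomorrow</title><director>Roland Emmerich</director>
    <cast-members><actor>Dennis Quaid</actor><actor>Jake Gyllenhaal</actor></cast-members></video>
</videos>""")

// Hypothetical helper: the titles of the videos matched by a predicate
let titles (predicate: string) =
    xdVideos.DocumentElement.SelectNodes("/videos/video" + predicate + "/title")
    |> Seq.cast<XmlNode>
    |> Seq.map (fun n -> n.InnerText)
    |> List.ofSeq

// Videos in which either Eddie Murphy or Danny DeVito stars
let starring =
    titles "[cast-members/actor = 'Eddie Murphy' or cast-members/actor = 'Danny DeVito']"

// Videos directed by either Danny DeVito or Roland Emmerich
let directedBy =
    titles "[director = 'Danny DeVito' or director = 'Roland Emmerich']"

printfn "%A" starring
printfn "%A" directedBy
```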

By using the Boolean search operators (and,
or), the not() function, and the logical
conjunction/disjunction operations, you can create
tremendous combinations to include and exclude some nodes in the XPath
expression and get the desired results.

An axis is a technique of locating a node or a series of
nodes using a path. To address this issue, the XPath language provides some keywords that can be
used in the expressions. As we are going to see, some keywords are
optional and some other keywords are necessary or useful. If you decide to
include one of those
keywords in your XPath expression, you must use only a valid keyword. If you use a keyword that is not
recognized, the SelectNodes() method will throw an XPathException exception at run time.

When used, an axis keyword is followed by ::, followed by
the name of a node or an operator.

Child Nodes

The child keyword is used to indicate that a child node must be accessed.
The child keyword is followed by ::. If you want to see all child nodes,
follow :: with *. Here is an example:

That code would produce all nodes that are direct children
of video. If a child node includes its own child nodes, the node and all its
children would be considered as one. To get only specific child nodes, follow ::
by the name of the child node to access. Here is an example that accesses the director child nodes:

The descendants of a node are its child(ren) and grand-child(ren), down to the
last grand-child of that node. To get the descendants of a node, follow that node with / and the
descendant keyword, then :: and the name of the descendants to get (or * to get all of them).
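The child and descendant keywords can be compared in a short sketch. The one-video document is a stand-in assumed for this example, and the select helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The War of the Roses</title><director>Danny DeVito</director>
    <cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor></cast-members>
  </video>
</videos>""")

// Hypothetical helper: run an expression and return the matching nodes as a list
let select (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath) |> Seq.cast<XmlNode> |> List.ofSeq

// child::* stops at the direct children of video
let children = select "/videos/video/child::*" |> List.map (fun n -> n.Name)
let directors = select "/videos/video/child::director" |> List.map (fun n -> n.InnerText)

// descendant::* goes down to the last level
let descendants = select "/videos/video/descendant::*" |> List.map (fun n -> n.Name)
let actors = select "/videos/video/descendant::actor" |> List.map (fun n -> n.InnerText)

printfn "%A" children
printfn "%A" descendants
```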

The Previous Sibling of a Node

The previous sibling of a node is a node that comes before it in the tree
while both nodes are on the same level. To get the collection of nodes that come before a
certain node, precede the name of the nodes to find with the preceding keyword. Here is an example:

If your expression includes the preceding keyword, the result
would include every matching node that comes before the referenced node in the document, on any
level, except the ancestors of that node. The referenced node itself is never part of the result.
If you want to get only the previous siblings, in other words the matching nodes of the same
level that have the same parent, use the preceding-sibling keyword.

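The Next Sibling of a Node

The following keyword is used to get the node(s)
that come after the referenced one in the document, on any level, excluding its own descendants.
To get only the next nodes of the same tree level, use the following-sibling keyword. Consider the
following expression: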
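The four keywords that look before and after a node can be compared in one sketch. The three-video document is a stand-in assumed for this example, and the select helper is hypothetical. Every expression starts from the second video.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video><title>The Distinguished Gentleman</title></video>
  <video><title>The War of the Roses</title></video>
  <video><title>The Day After Tomorrow</title></video>
</videos>""")

// Hypothetical helper: run an expression and return the matching nodes as a list
let select (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath) |> Seq.cast<XmlNode> |> List.ofSeq

// Siblings only: video nodes of the same level, before or after the second video
let previousSiblings =
    select "/videos/video[2]/preceding-sibling::video/title" |> List.map (fun n -> n.InnerText)
let nextSiblings =
    select "/videos/video[2]/following-sibling::video/title" |> List.map (fun n -> n.InnerText)

// preceding and following also reach the nodes nested in those siblings
let allPreceding = select "/videos/video[2]/preceding::*" |> List.map (fun n -> n.Name)
let allFollowing = select "/videos/video[2]/following::*" |> List.map (fun n -> n.Name)

printfn "%A" previousSiblings
printfn "%A" nextSiblings
printfn "preceding: %d node(s), following: %d node(s)" allPreceding.Length allFollowing.Length
```

In this stand-in document, preceding::* finds the first video and its title, while the videos root is left out because it is an ancestor of the second video.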

As we have seen so far, both the XPath language and the .NET Framework provide various means to access an attribute. For example, the XPath language uses the @ sign to access an XML attribute while the XmlNode class is equipped with the OuterXml, the InnerXml, and the InnerText properties. Here is an example that uses the XmlNode.InnerXml property to access the attribute:

Notice that the XmlNode.InnerXml property applied to an attribute produces only the text of the attribute. In the same way, you can use the
XmlNode.InnerText property to get the same result. On the other hand, the
XmlNode.OuterXml property produces the name and value of an attribute. Here is an example:
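The three properties can be compared on an attribute as in the sketch below. The document and its id attribute are assumptions made for this example (the attribute used in the lesson's file is not shown here), and the select helper is hypothetical.

```fsharp
open System.Xml

// Stand-in document assumed for this sketch; the id attribute is hypothetical
let xdVideos = XmlDocument()
xdVideos.LoadXml("""<videos>
  <video id="v1"><title>The Distinguished Gentleman</title></video>
  <video id="v2"><title>The War of the Roses</title></video>
</videos>""")

// Hypothetical helper: run an expression and return the matching nodes as a list
let select (xpath: string) =
    xdVideos.DocumentElement.SelectNodes(xpath) |> Seq.cast<XmlNode> |> List.ofSeq

// @ selects the attribute nodes themselves
let ids = select "/videos/video/@id"
let innerTexts = ids |> List.map (fun a -> a.InnerText)   // value only
let innerXmls = ids |> List.map (fun a -> a.InnerXml)     // value only
let outerXmls = ids |> List.map (fun a -> a.OuterXml)     // name and value

// An attribute can also serve as a condition in the square brackets
let secondTitle = select "/videos/video[@id = 'v2']/title" |> List.map (fun n -> n.InnerText)

printfn "%A" innerTexts
printfn "%A" outerXmls
printfn "%A" secondTitle
```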