

Parsing XML files is an unglamorous task that can be time consuming and tricky. In the days before .NET, programmers were forced to read XML as a text file line by line and then use string functions and possibly regular expressions. This was a slow and error-prone process, and just not very much fun.

While I was writing .NET test automation that had test case data stored in XML files, I discovered that the .NET Framework provides powerful new ways of parsing XML. But in conversations with colleagues, I also discovered that there are a variety of opinions on which way of parsing XML files is the best.

I set out to determine how many different ways there are to parse XML using .NET and to understand the pros and cons of each technique. After some experimentation, I learned that there are five fundamentally different ways to parse XML, and that the “best” method depends both on the particular development situation you are in and on the style of programming you prefer.

In the sections that follow, I will demonstrate how to parse a testCases.xml file using five different techniques. Each technique is based on a different .NET Framework class and its associated methods:

XmlTextReader

XmlDocument

XPathDocument

XmlSerializer

DataSet

After I explain each technique so you can modify my examples to suit your needs, I will give you guidance on which technique should be used in which situation. Knowing these five methods for parsing XML files will be a valuable addition to your .NET skill set. I’m assuming that you’re familiar with C#, VS.NET, the creation and use of class libraries, and have a working knowledge of XML files.

The XML File to Parse and the Goal

Let’s examine the testCases.xml file that we will use for all five parsing examples. The file contents are shown in Listing One.

Note that each of the three test cases has five data items: id, kind, arg1, arg2, and expected. Some of the data is stored as XML attributes (id and kind), and arg1 and arg2 are stored as XML elements two levels deep relative to the root node (suite). Extracting attribute data and dealing with nested elements are key tasks regardless of which parsing strategy we use.

The goal is to parse our XML test cases file and extract the data into memory in a form that we can use easily. The memory structure we will use for four of the five parsing methods is shown in Listing Two. (The method that employs an XmlSerializer object requires a slightly different memory structure and will be presented later.)

Because four of the five techniques will use these definitions, for convenience we can put the code in a .NET class library named “CommonLib.” A TestCase object will hold the five data parts of each test case, and a Suite object will hold a collection of TestCase objects and provide a way to display it.
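Listing Two itself is not reproduced here, so the following is only a minimal sketch of what such a CommonLib might contain. The field names (id, kind, arg1, arg2, expected, items) and the Display() method come from the code fragments in this article; the use of a generic List, the output format, and the Demo class are my assumptions.

```csharp
using System;
using System.Collections.Generic;

namespace CommonLib
{
    // Holds the five data items of one test case.
    public class TestCase
    {
        public string id;
        public string kind;
        public string arg1;
        public string arg2;
        public string expected;
    }

    // Holds a collection of TestCase objects and can display them.
    public class Suite
    {
        // Assumption: a generic List; the original listing may use another collection type.
        public List<TestCase> items = new List<TestCase>();

        public void Display()
        {
            foreach (TestCase tc in items)
            {
                Console.WriteLine("{0} {1} {2} {3} {4}",
                    tc.id, tc.kind, tc.arg1, tc.arg2, tc.expected);
            }
        }
    }

    // Hypothetical driver, included only so the sketch runs on its own.
    public static class Demo
    {
        public static void Main()
        {
            var s = new Suite();
            s.items.Add(new TestCase { id = "001", kind = "bvt", arg1 = "4", arg2 = "7", expected = "11.00" });
            s.Display();
        }
    }
}
```

Public fields keep the sketch short; properties would work equally well for four of the five techniques.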

Once the XML data is parsed and stored, the result can be represented as shown in Figure 1. The data can now be easily accessed and manipulated.

Figure 1 XML data stored in memory

Parsing XML with XmlTextReader

Of the five ways to parse an XML file, the most traditional technique is to use the XmlTextReader class. The example code is shown in Listing Three.

After creating a new C# Console Application Project in Visual Studio .NET, we add a Project Reference to the CommonLib.dll file that contains definitions for TestCase and Suite classes. We start by creating a Suite object to hold the XML data and an XmlTextReader object to parse the XML file.

The key to understanding this technique is to understand the Read() and ReadElementString() methods of XmlTextReader. To an XmlTextReader object, an XML file is a sequence of nodes. For example,

The Read() method advances one node at a time. Unlike many Read() methods in other classes, System.Xml.XmlTextReader.Read() does not return significant data; it returns only a bool indicating whether another node was read. The ReadElementString() method, on the other hand, returns the data between the begin and end tags of the element named by its argument, and advances to the next node after the end tag. Because XML attributes are not nodes, we have to extract attribute data using the GetAttribute() method.
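To make the "sequence of nodes" idea concrete, here is a small sketch that lists every node the reader visits. The XML fragment is hypothetical (shaped like one inputs block of testCases.xml), and the class and method names are mine, not part of the original listings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeSequenceDemo
{
    // Returns one "NodeType:Name" entry per node (or "Text:value" for text nodes).
    public static List<string> ListNodes(string xml)
    {
        var nodes = new List<string>();
        using (var xtr = new XmlTextReader(new StringReader(xml)))
        {
            xtr.WhitespaceHandling = WhitespaceHandling.None;
            while (xtr.Read()) // Read() only reports whether another node was found
            {
                string label = xtr.NodeType == XmlNodeType.Text ? xtr.Value : xtr.Name;
                nodes.Add(xtr.NodeType + ":" + label);
            }
        }
        return nodes;
    }

    public static void Main()
    {
        // Hypothetical fragment, not taken from the article's data file.
        string xml = "<inputs><arg1>4</arg1><arg2>7</arg2></inputs>";
        foreach (string n in ListNodes(xml))
        {
            Console.WriteLine(n);
        }
    }
}
```

Begin tags, text, and end tags each show up as separate nodes, while attributes never appear in the sequence, which is why GetAttribute() is needed.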

Figure 2 shows the output of running this program. You can see that we have successfully parsed the data from testCases.xml into memory.

Figure 2 Output from the XmlTextReader technique

The statement xtr.WhitespaceHandling = WhitespaceHandling.None; is important because without it you would have to Read() over newline characters and blank lines.

The main loop control structure that I used is not elegant but is more readable than the alternatives:
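The loop from Listing Three is not reproduced here. The sketch below is one possible way to structure such a loop, not necessarily the one I used in the listing. The sample data, the TextReaderParser class, and the trimmed-down TestCase class are assumptions made so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class TestCase
{
    public string id, kind, arg1, arg2, expected;
}

public static class TextReaderParser
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "<testcase id='003' kind='func'><inputs><arg1>9</arg1><arg2>9</arg2></inputs><expected>18.00</expected></testcase>" +
        "</suite>";

    public static List<TestCase> Parse(string xml)
    {
        var items = new List<TestCase>();
        using (var xtr = new XmlTextReader(new StringReader(xml)))
        {
            xtr.WhitespaceHandling = WhitespaceHandling.None;
            while (xtr.Read())
            {
                // Skip everything until the reader sits on a <testcase> begin tag.
                if (xtr.NodeType != XmlNodeType.Element || xtr.Name != "testcase")
                    continue;

                var tc = new TestCase();
                tc.id = xtr.GetAttribute("id");     // attributes are not nodes
                tc.kind = xtr.GetAttribute("kind");
                xtr.Read();                          // now on <inputs>
                xtr.Read();                          // now on <arg1>
                tc.arg1 = xtr.ReadElementString("arg1"); // leaves reader on <arg2>
                tc.arg2 = xtr.ReadElementString("arg2"); // leaves reader on </inputs>
                xtr.Read();                          // now on <expected>
                tc.expected = xtr.ReadElementString("expected");
                items.Add(tc);
            }
        }
        return items;
    }

    public static void Main()
    {
        foreach (TestCase tc in Parse(SampleXml))
        {
            Console.WriteLine("{0} {1} {2} {3} {4}", tc.id, tc.kind, tc.arg1, tc.arg2, tc.expected);
        }
    }
}
```

Note how the code depends on the exact order of elements in the file; a missing or reordered element makes ReadElementString() throw.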

Parsing an XML file with XmlTextReader has a traditional, pre-.NET feel. You walk sequentially through the file using Read(), and extract data with ReadElementString() and GetAttribute(). Using XmlTextReader is straightforward and effective and is appropriate when the structure of your XML file is relatively simple and consistent. Compared to other techniques we will see in this article, XmlTextReader operates at a lower level of abstraction, meaning it is up to you as a programmer to keep track of where you are in the XML file and Read() correctly.

Parsing XML with XmlDocument

The second of five ways to parse an XML file is to use the XmlDocument class. The example code is shown in Listing Four.

        XmlNode n = node.SelectSingleNode("inputs"); // get the one <inputs> node
        tc.arg1 = n.ChildNodes.Item(0).InnerText;
        tc.arg2 = n.ChildNodes.Item(1).InnerText;
        tc.expected = node.ChildNodes.Item(1).InnerText;
        s.items.Add(tc);
      } // foreach <testcase> node
      s.Display();
    } // Main()
  } // class Class1
} // ns Run

XmlDocument objects are based on the notion of XML nodes and child nodes. Instead of sequentially navigating through a file, we select sets of nodes with the SelectNodes() method or individual nodes with the SelectSingleNode() method. Notice that because XML attributes are not nodes, we must get their data with an Attributes.GetNamedItem() method applied to a node.

After loading the XmlDocument, we fetch all the test case nodes at once with:

In this statement, n is the <inputs> node; ChildNodes.Item(0) is the first element of <inputs>, i.e., <arg1> and InnerText is the value between <arg1> and </arg1>.
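Putting the calls described in this section together, a self-contained sketch might look like the following. The "/suite/testcase" XPath expression, the sample data, and the DocumentParser and TestCase definitions are assumptions; only the calls shown in the Listing Four fragment come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public class TestCase
{
    public string id, kind, arg1, arg2, expected;
}

public static class DocumentParser
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "<testcase id='003' kind='func'><inputs><arg1>9</arg1><arg2>9</arg2></inputs><expected>18.00</expected></testcase>" +
        "</suite>";

    public static List<TestCase> Parse(string xml)
    {
        var items = new List<TestCase>();
        var xd = new XmlDocument();
        xd.LoadXml(xml);

        // Fetch all <testcase> nodes at once (assumed XPath expression).
        XmlNodeList nodes = xd.SelectNodes("/suite/testcase");
        foreach (XmlNode node in nodes)
        {
            var tc = new TestCase();
            tc.id = node.Attributes.GetNamedItem("id").Value;
            tc.kind = node.Attributes.GetNamedItem("kind").Value;

            XmlNode n = node.SelectSingleNode("inputs");     // the one <inputs> node
            tc.arg1 = n.ChildNodes.Item(0).InnerText;         // <arg1>
            tc.arg2 = n.ChildNodes.Item(1).InnerText;         // <arg2>
            tc.expected = node.ChildNodes.Item(1).InnerText;  // <expected> follows <inputs>
            items.Add(tc);
        }
        return items;
    }

    public static void Main()
    {
        foreach (TestCase tc in Parse(SampleXml))
        {
            Console.WriteLine("{0} {1} {2} {3} {4}", tc.id, tc.kind, tc.arg1, tc.arg2, tc.expected);
        }
    }
}
```

Indexing child nodes by position is compact but assumes a fixed element order; selecting each child by name with SelectSingleNode() would be more tolerant of reordering.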

The output from running this program is shown in Figure 3. Notice it is identical to the output from running the XmlTextReader technique and, in fact, all the other techniques presented in this article.

Figure 3 Output from the XmlDocument technique

The XmlDocument class is modeled on the W3C XML Document Object Model and has a different feel to it than many .NET Framework classes that you are familiar with. Using the XmlDocument class is appropriate if you need to extract data in a nonsequential manner, or if you are already using XmlDocument objects and want to maintain a consistent look and feel to your application’s code.

Let me note that in discussions with my colleagues, there was often some confusion about the role of the XmlDataDocument class. It is derived from the XmlDocument class and is intended for use in conjunction with DataSet objects. So, in this example, you could use the XmlDataDocument class but would not gain anything.

Parsing XML with XPathDocument

The third technique to parse an XML file is to use the XPathDocument class. The example code is shown in Listing Five.

            while (tcSubChild.MoveNext()) // each part (<arg1>, <arg2>) of <inputs>
            {
              if (tcSubChild.Current.Name == "arg1")
                tc.arg1 = tcSubChild.Current.Value;
              else if (tcSubChild.Current.Name == "arg2")
                tc.arg2 = tcSubChild.Current.Value;
            }
          }
          else if (tcChild.Current.Name == "expected")
            tc.expected = tcChild.Current.Value;
        }
        s.items.Add(tc);
      } // each testcase node
      s.Display();
    } // Main()
  } // class Class1
} // ns Run

Using an XPathDocument object to parse XML has a hybrid feel that is part procedural (as in XmlTextReader) and part functional (as in XmlDocument). You can select parts of the document using the Select() method of an XPathNavigator object and also move through the document using the MoveNext() method of an XPathNodeIterator object.

After loading the XPathDocument object, we get what is in essence a reference to the first <testcase> node into an XPathNodeIterator object with:
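Since the head of Listing Five is not reproduced here, the following sketch shows one way the whole technique could be assembled. The "/suite/testcase" expression, the sample data, and the NavigatorParser and TestCase definitions are assumptions; the inner loops mirror the Listing Five fragment above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public class TestCase
{
    public string id, kind, arg1, arg2, expected;
}

public static class NavigatorParser
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "<testcase id='003' kind='func'><inputs><arg1>9</arg1><arg2>9</arg2></inputs><expected>18.00</expected></testcase>" +
        "</suite>";

    public static List<TestCase> Parse(string xml)
    {
        var items = new List<TestCase>();
        var xpd = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = xpd.CreateNavigator();

        // Iterator over all <testcase> nodes (assumed XPath expression).
        XPathNodeIterator testcases = nav.Select("/suite/testcase");
        while (testcases.MoveNext())
        {
            var tc = new TestCase();
            tc.id = testcases.Current.GetAttribute("id", "");
            tc.kind = testcases.Current.GetAttribute("kind", "");

            XPathNodeIterator tcChild = testcases.Current.SelectChildren(XPathNodeType.Element);
            while (tcChild.MoveNext()) // <inputs> and <expected>
            {
                if (tcChild.Current.Name == "inputs")
                {
                    XPathNodeIterator tcSubChild = tcChild.Current.SelectChildren(XPathNodeType.Element);
                    while (tcSubChild.MoveNext()) // <arg1> and <arg2>
                    {
                        if (tcSubChild.Current.Name == "arg1")
                            tc.arg1 = tcSubChild.Current.Value;
                        else if (tcSubChild.Current.Name == "arg2")
                            tc.arg2 = tcSubChild.Current.Value;
                    }
                }
                else if (tcChild.Current.Name == "expected")
                {
                    tc.expected = tcChild.Current.Value;
                }
            }
            items.Add(tc);
        }
        return items;
    }

    public static void Main()
    {
        foreach (TestCase tc in Parse(SampleXml))
        {
            Console.WriteLine("{0} {1} {2} {3} {4}", tc.id, tc.kind, tc.arg1, tc.arg2, tc.expected);
        }
    }
}
```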

The XPathDocument class is optimized for XPath data model queries. So using it is particularly appropriate when the XML file to parse is deeply nested or has a complex structure. You might also consider using XPathDocument if other parts of your application code use that class so that you maintain a consistent coding look and feel.

Parsing XML with XmlSerializer

The fourth technique we will use to parse an XML file is the XmlSerializer object. The example code is shown in Listing Six.

Using the XmlSerializer class is significantly different from using any of the other classes because the in-memory data store is different from the CommonLib.Suite we used for all other examples. In fact, observe that pulling the XML data into memory is accomplished in a single statement:
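As a rough illustration of the single-statement idea, here is a minimal sketch. The Suite, TestCase, and Inputs classes below are hand-written guesses at a shape that matches testCases.xml; they are not the classes in Listings Seven and Eight, and the sample data is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hand-written stand-ins for the serializer-friendly classes (assumptions).
[XmlRoot("suite")]
public class Suite
{
    [XmlElement("testcase")]
    public TestCase[] items;
}

public class TestCase
{
    [XmlAttribute] public string id;
    [XmlAttribute] public string kind;
    public Inputs inputs;
    public string expected;
}

public class Inputs
{
    public string arg1;
    public string arg2;
}

public static class SerializerParser
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "<testcase id='003' kind='func'><inputs><arg1>9</arg1><arg2>9</arg2></inputs><expected>18.00</expected></testcase>" +
        "</suite>";

    public static Suite Parse(string xml)
    {
        var xs = new XmlSerializer(typeof(Suite));
        using (var sr = new StringReader(xml))
        {
            // The whole document is pulled into memory in this one call.
            return (Suite)xs.Deserialize(sr);
        }
    }

    public static void Main()
    {
        Suite s = Parse(SampleXml);
        foreach (TestCase tc in s.items)
        {
            Console.WriteLine("{0} {1} {2} {3} {4}",
                tc.id, tc.kind, tc.inputs.arg1, tc.inputs.arg2, tc.expected);
        }
    }
}
```

All of the mapping knowledge lives in the class definitions and their attributes, which is why the parsing code itself is so short.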

I created a class library named “SerializerLib” to hold the definition for a Suite class that corresponds to the testCases.xml file so that the XmlSerializer object can store the XML data into it. The trick, of course, is to set up this Suite class.

Creating the Suite class is done with the help of the xsd.exe command-line tool. You will find it in your Program Files\Microsoft Visual Studio .NET\FrameworkSDK\bin folder. I used xsd.exe to generate a Suite class and then modified it slightly by changing some names and adding a Display() method.

The screen shot in Figure 4 shows how I generated the file testCases.cs, which contains a Suite definition that you can use directly or modify as I did. Listings Seven and Eight show the classes generated by XSD and my modified classes in the SerializerLib library.

Using the XmlSerializer class gives a very elegant solution to the problem of parsing an XML file. Compared with the other four techniques in this article, XmlSerializer operates at the highest level of abstraction, meaning that the algorithmic details are largely hidden from you. But this gives you less control over the parsing and lends an air of magic to the process.

Most of the code I write is test automation, and using XmlSerializer is my default technique for parsing XML. XmlSerializer is most appropriate for situations not covered by the other four techniques in this article: fine-grained control is not required, the application program does not use other XmlDocument objects, the XML file is not deeply nested, and the application is not primarily an ADO.NET application (as we will see in our next example).

Parsing XML with DataSet

The fifth and final method we will use to parse an XML file into memory uses the DataSet class. The example code is shown in Listing Nine.

        tc.arg1 = (children[0]["arg1"]).ToString(); // there is only 1 row in children
        tc.arg2 = (children[0]["arg2"]).ToString();
        s.items.Add(tc);
      }
      s.Display();
    } // Main()
  } // class Class1
} // ns

We start by reading the XML file directly into a System.Data.DataSet object using the ReadXml() method. A DataSet object can be thought of as an in-memory relational database. The XML data ends up in two tables, “testcase” and “inputs,” that are related through a relation “testcase_inputs.” The key to using this DataSet technique is knowing how to find out where the XML data ends up inside the DataSet object.

Although we could create a custom DataSet object with completely known characteristics, it is much quicker to let the ReadXml() method do the work and then examine the result. I wrote a helper function DisplayInfo() that accepts a DataSet as an argument and displays the information we need to extract the data from the DataSet's tables.
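Listing Ten is not reproduced here, so the sketch below is only my guess at what a helper in the spirit of DisplayInfo() could look like: it lists each table with its columns and each relation with its parent and child tables. The Describe() name, the output format, and the sample data are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

public static class DataSetInfo
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "</suite>";

    // Describes the tables, columns, and relations of a DataSet as text lines.
    public static List<string> Describe(DataSet ds)
    {
        var lines = new List<string>();
        foreach (DataTable t in ds.Tables)
        {
            var cols = new List<string>();
            foreach (DataColumn c in t.Columns)
            {
                cols.Add(c.ColumnName);
            }
            lines.Add("Table " + t.TableName + ": " + string.Join(", ", cols));
        }
        foreach (DataRelation r in ds.Relations)
        {
            lines.Add("Relation " + r.RelationName + ": " +
                r.ParentTable.TableName + " -> " + r.ChildTable.TableName);
        }
        return lines;
    }

    public static void Main()
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(SampleXml)); // let ReadXml() infer the structure
        foreach (string line in Describe(ds))
        {
            Console.WriteLine(line);
        }
    }
}
```

Running a helper like this once against your own file tells you the table, column, and relation names that the extraction code needs.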

To keep the main parse program uncluttered, I put DisplayInfo() into a class library named “InfoLib.” The code is shown in Listing Ten. The output from running the parse program is shown in Figure 5.

Figure 5 Output from the DataSet technique

The first table, “testcase,” holds the data that is one level deep from the XML root: id, kind, and expected. The second table, “inputs,” holds data that is two levels deep: arg1 and arg2. In general, if your XML file is n levels deep, ReadXml() will generate n tables.

Extracting the data from the parent test case table is easy. We just iterate through each row of the table and access by column name. To get the data from the child table inputs, we get an array of rows using the GetChildRows method:

tc.arg1 = (children[0]["arg1"]).ToString(); // there is only 1 row in children
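The head of Listing Nine is not reproduced here, so here is a self-contained sketch of the whole extraction under the same table and relation names described above. The sample data and the DataSetParser and TestCase definitions are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

public class TestCase
{
    public string id, kind, arg1, arg2, expected;
}

public static class DataSetParser
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "<testcase id='003' kind='func'><inputs><arg1>9</arg1><arg2>9</arg2></inputs><expected>18.00</expected></testcase>" +
        "</suite>";

    public static List<TestCase> Parse(string xml)
    {
        var items = new List<TestCase>();
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));

        // Parent table: one row per <testcase>, accessed by column name.
        foreach (DataRow row in ds.Tables["testcase"].Rows)
        {
            var tc = new TestCase();
            tc.id = row["id"].ToString();
            tc.kind = row["kind"].ToString();
            tc.expected = row["expected"].ToString();

            // Child table: follow the relation to the matching <inputs> row.
            DataRow[] children = row.GetChildRows("testcase_inputs");
            tc.arg1 = children[0]["arg1"].ToString(); // there is only 1 row in children
            tc.arg2 = children[0]["arg2"].ToString();
            items.Add(tc);
        }
        return items;
    }

    public static void Main()
    {
        foreach (TestCase tc in Parse(SampleXml))
        {
            Console.WriteLine("{0} {1} {2} {3} {4}", tc.id, tc.kind, tc.arg1, tc.arg2, tc.expected);
        }
    }
}
```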

Using the DataSet class to parse an XML file has a very relational database feel. Compared with other techniques in this article, it operates at a middle level of abstraction. The ReadXml() method hides a lot of details but you must traverse through relational tables.

Using DataSet to parse XML files is particularly appropriate when your application program is using ADO.NET classes so that you maintain a consistent look and feel. Using a DataSet object has high overhead and would not be a good choice if performance is an issue. Because each level of an XML file generates a table, if your XML file is deeply nested then using DataSet would not be a good choice.

Further Discussion

There are several related issues not yet covered: namespaces, generalization, error handling, validation, filtering, and performance. In the context of parsing XML data files, XML namespaces are a mechanism to prevent name clashes. Each of the techniques we’ve used can deal with namespaces. The MSDN Library will give you all the information you need to handle XML files with namespaces.

The techniques we have seen were not written to be particularly general. If you have a different XML structure, you will have to write different code. There is always a trade-off between writing code for a specific situation and making the code more generalized.

The code in this article does not have any error handling. Parsing XML files is quite error prone and in a production scenario, you would need to add lots of try-catch blocks to create a robust parser.
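As a small illustration of the kind of guard I mean, the sketch below wraps the XmlDocument load in a try-catch and reports where a malformed file went wrong. The SafeLoader class and TryLoad() method are hypothetical names, and a production parser would need to cover more failure modes (missing files, missing elements, bad data) than this one does.

```csharp
using System;
using System.Xml;

public static class SafeLoader
{
    // Tries to load XML text; on failure returns false and a short description.
    public static bool TryLoad(string xml, out XmlDocument doc, out string error)
    {
        doc = new XmlDocument();
        error = null;
        try
        {
            doc.LoadXml(xml);
            return true;
        }
        catch (XmlException ex)
        {
            // XmlException carries the position of the problem in the input.
            doc = null;
            error = "line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        XmlDocument doc;
        string error;

        // Hypothetical inputs: one well formed, one with a missing end tag.
        string good = "<suite><testcase id='001' kind='bvt' /></suite>";
        string bad = "<suite><testcase id='001' kind='bvt'></suite>";

        Console.WriteLine(TryLoad(good, out doc, out error) ? "good: loaded" : "good: " + error);
        Console.WriteLine(TryLoad(bad, out doc, out error) ? "bad: loaded" : "bad: " + error);
    }
}
```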

Additionally, I didn’t address XML validation with schema files, but once again, in a production environment you would need to generate XML schema files and validate your XML data files against them before attempting to parse. It is possible to add validation to your parsing code, but I recommend validating before parsing.

In every example, we have read all the XML data into memory. In many cases, you will want to filter and just read in some data. All the techniques in this article can be modified to provide front-end filtering. The XPathDocument class has especially nice filtering capabilities by way of XPath syntax.
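To give a feel for XPath-based filtering, the sketch below selects only the test cases of one kind. The sample data, the FilterDemo class, and the SelectIds() method are assumptions; the predicate syntax [@kind='...'] is standard XPath.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class FilterDemo
{
    // Hypothetical data shaped like the article's testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
        "<testcase id='002' kind='bvt'><inputs><arg1>5</arg1><arg2>5</arg2></inputs><expected>10.00</expected></testcase>" +
        "<testcase id='003' kind='func'><inputs><arg1>9</arg1><arg2>9</arg2></inputs><expected>18.00</expected></testcase>" +
        "</suite>";

    // Returns the ids of the test cases whose kind attribute matches.
    public static List<string> SelectIds(string xml, string kind)
    {
        var ids = new List<string>();
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        // The XPath predicate does the filtering.
        // Assumption: kind contains no quote characters.
        XPathNodeIterator it = nav.Select("/suite/testcase[@kind='" + kind + "']");
        while (it.MoveNext())
        {
            ids.Add(it.Current.GetAttribute("id", ""));
        }
        return ids;
    }

    public static void Main()
    {
        foreach (string id in SelectIds(SampleXml, "bvt"))
        {
            Console.WriteLine(id);
        }
    }
}
```

Only the matching nodes are ever visited by the iterator, so the rest of the parsing code does not need any filtering logic of its own.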

If performance is an issue (usually when you are parsing many small XML files), you will have to run some timing measurements to determine if your chosen technique is fast enough. Performance is too tricky to allow many general statements, and the only way to know if your performance is acceptable is to try your code. As a guideline, however, XmlTextReader has the best performance characteristics.

A Key Skill

XML data files are a key component of Microsoft’s .NET developer environment. The ability to parse data from XML files into memory is a key skill in a .NET setting. Each of the five techniques, based on the XmlTextReader, XmlDocument, XPathDocument, XmlSerializer, and DataSet classes, is significantly different in terms of coding mechanics, coding mind set, and scenarios for usage. The .NET Framework gives you great flexibility in parsing XML data files and makes this essential task much easier and less error prone than using non-.NET techniques.