XML Handling Part 1 - Introduction, Reading an XML File

Date posted: 13/07/2013


XML files show up in most applications these days, so it's a good idea to know how to make proper use of them. This is the first part of my series of XML articles: it introduces XML files and shows you how to read their contents. In the next article we will go through further XML operations, such as editing.

What is XML?

XML (eXtensible Markup Language) is a file format, just like aspx or doc. Just as a doc file's content is laid out so that a document editor can display it properly, an XML file's content follows its own rules so that an XML reader can process it. In other words, an XML file is expected to have a specific form of content. Here is an example of an XML file:

<?xml version="1.0" encoding="utf-8" ?>
<book>
  <title>The Hobbit</title>
  <author>J. R. R. Tolkien</author>
  <price>5.30</price>
</book>

The first line is the XML declaration. It states that the content that follows is written according to the XML 1.0 standard and encoded in UTF-8.
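The declaration is not only there for human readers: an XML reader sees it as the first node of the document. As a small illustration, here is a sketch built around a hypothetical DeclarationInspector helper that reads from any TextReader (a StringReader for an in-memory string, a StreamReader for a file). It shows that the declaration's version and encoding can be read like ordinary attributes.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: reports what the XML declaration says, if there is one.
public static class DeclarationInspector
{
    public static void ReadDeclaration(TextReader source, out string version, out string encoding)
    {
        version = null;
        encoding = null;
        using (var reader = new XmlTextReader(source))
        {
            // When present, the declaration is always the first node of the document.
            if (reader.Read() && reader.NodeType == XmlNodeType.XmlDeclaration)
            {
                // The declaration's settings are exposed like ordinary attributes.
                version = reader.GetAttribute("version");
                encoding = reader.GetAttribute("encoding");
            }
        }
    }
}
```

For a document without a declaration, both values simply stay null.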

What follows is the interesting part.

Reading the content above, you may have already guessed that it describes a book: it has a title, an author and a price.

You may also have noticed that it is written in a strict way: each piece of stored information must be placed between two tags that name what the information represents.

<title>The Hobbit</title>

In the previous line <title> is the opening tag and </title> is the closing tag.

If that is so, what about the <book> tag? Why is its closing tag at the bottom? Because it represents the book as a whole: the title, the author and the price are nested inside it, since together they describe one book.

Now let's add one more book to the XML. Since an XML document must have exactly one root element, we also add an enclosing <books> tag that contains all the books. Take a look.

<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book>
    <title>The Hobbit</title>
    <author>J. R. R. Tolkien</author>
    <price>5.30</price>
  </book>
  <book>
    <title>The Hobbit</title>
    <author>J. R. R. Tolkien</author>
    <price>5.30</price>
  </book>
</books>

Notice the hierarchical structure: the document contains <books>, <books> contains two <book> elements, and each <book> contains a title, an author and a price.
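To make this hierarchy visible, here is a minimal sketch that walks a document with XmlTextReader and indents each element name by its depth in the tree. The XmlTreePrinter class and its DescribeHierarchy method are hypothetical names, and the XML is passed in as a string for convenience.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: prints one line per element, indented by tree depth.
public static class XmlTreePrinter
{
    public static string DescribeHierarchy(string xml)
    {
        var output = new StringBuilder();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Only opening tags are printed; the declaration, text,
                // whitespace and closing tags are skipped.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // Depth is 0 for the root, 1 for its children, and so on.
                    output.Append(new string(' ', reader.Depth * 2));
                    output.AppendLine(reader.Name);
                }
            }
        }
        return output.ToString();
    }
}
```

For the two-book XML above, the output starts with books, then book indented by two spaces, then title, author and price indented by four.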

This nesting can go as deep as you need. An XML file can contain any combination of tags as long as its syntax is correct, that is, as long as the file is well-formed.

For example, the following XML is not acceptable:

<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book>
    <title>The Hobbit</title>
    <author>J. R. R. Tolkien</author>
    <price>5.30</price>
</books>
  </book>

because the outer tag <books> is closed before the inner tag <book>. Tags must be closed in the reverse order in which they were opened.
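An XML parser rejects such a document outright. The following sketch, built around a hypothetical IsWellFormed helper, simply reads every node and reports the XmlException that XmlTextReader throws when it meets a syntax error such as the misnested tags above. It only checks syntax (well-formedness), not whether the content makes sense.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: returns true if the XML is well-formed,
// otherwise false plus the parser's error message.
public static class WellFormednessChecker
{
    public static bool IsWellFormed(string xml, out string error)
    {
        error = null;
        try
        {
            using (var reader = new XmlTextReader(new StringReader(xml)))
            {
                // Reading every node is enough to trigger the parser's checks.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException ex)
        {
            // e.g. a closing tag that does not match the last opened one
            error = ex.Message;
            return false;
        }
    }
}
```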

Information can also be stored inside an opening tag, the way serial_num is stored here:

<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book serial_num="E-2365">
    <title>The Hobbit</title>
    <author>J. R. R. Tolkien</author>
    <price>5.30</price>
  </book>
</books>

Before we go on, let's go over some basic XML terms.

An XML document forms a tree, and a node is simply a node of that tree. For example, the following two snippets

<title>The Hobbit</title>

and

<book serial_num="E-2365">
  <title>The Hobbit</title>
  <author>J. R. R. Tolkien</author>
  <price>5.30</price>
</book>

are both nodes, even though the second one is more complex, since it contains other nodes of its own.

An attribute is data stored inside an opening tag as a name="value" pair, such as serial_num="E-2365" in <book serial_num="E-2365">.

An element is data enclosed between an opening and a closing tag, such as <title>The Hobbit</title>.

From the above, you may have realized that data within an XML file is stored either as attributes or as elements. Most data is stored as elements. Attributes are best reserved for metadata, that is, data about the XML nodes themselves (such as an identifier) rather than the information they hold.
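The two forms are also read differently. In the sketch below (BookReader and ReadFirstBook are hypothetical names), the attribute is taken straight from the opening <book> tag with GetAttribute, while the element's value only arrives in the Text node that follows the <title> tag.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: reads the first book's serial_num (an attribute)
// and title (an element).
public static class BookReader
{
    public static void ReadFirstBook(string xml, out string serialNum, out string title)
    {
        serialNum = null;
        title = null;
        string currentElement = null; // name of the last opening tag seen
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    currentElement = reader.Name;
                    // Attribute: the value sits inside the opening tag itself.
                    if (reader.Name == "book" && serialNum == null)
                        serialNum = reader.GetAttribute("serial_num");
                }
                else if (reader.NodeType == XmlNodeType.Text
                         && currentElement == "title" && title == null)
                {
                    // Element: the value comes as a separate Text node.
                    title = reader.Value;
                }
            }
        }
    }
}
```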

Why should I use XML?

XML's main purpose is to store information, and it is very good at it. You saw how easy it was to add a new entity in our example. We could have added anything, even data we had no idea about when we first created the XML file. XML is so flexible that it makes other data storage methods look rigid by comparison.

At the beginning, we mentioned that an XML file stores its content in a particular, standardized way. XML is not specific to .NET or Java. Because the format is platform-independent, I can create an XML file in my ASP.NET application and pass it to a friend of mine who uses PHP to display its content. XML can be used practically anywhere.

Reading an XML file

So far, so good. We know what XML is and what it is for. Now let's write a C# example that reads an XML file. The following XML represents the way SMS messages are stored on a cell phone:

<?xml version="1.0" encoding="utf-8" ?>
<SmsDataSet xmlns="http://tempuri.org/SMS.xsd">
  <Sms>
    <Id>0</Id>
    <Numbers>+306931234567</Numbers>
    <Body>Good morning!</Body>
    <SmsType>0</SmsType>
    <Time>2012-02-05T21:11:19.075+02:00</Time>
    <ThreadId>3</ThreadId>
    <Status>2</Status>
    <ChatType>0</ChatType>
  </Sms>
  <Sms>
    <Id>1</Id>
    <Numbers>+306931234567</Numbers>
    <Body>How are you?</Body>
    <SmsType>0</SmsType>
    <Time>2012-02-07T07:47:48.005+02:00</Time>
    <ThreadId>3</ThreadId>
    <Status>2</Status>
    <ChatType>0</ChatType>
  </Sms>
  <Sms>
    <Id>2</Id>
    <Numbers>+306931234567</Numbers>
    <Body>Bon voyage!</Body>
    <SmsType>0</SmsType>
    <Time>2012-02-09T20:24:19.069+02:00</Time>
    <ThreadId>3</ThreadId>
    <Status>2</Status>
    <ChatType>0</ChatType>
  </Sms>
</SmsDataSet>

We would like to create a web form that shows all SMS bodies (the conversation). We could of course do much more with the data, but since the point here is how to extract it rather than what we do with it afterwards, we will keep things simple.

To read an XML file, we need an XmlTextReader object (from the System.Xml namespace). This reader gives us the XML's content one node at a time, moving forward only. This is one way to initialize it:

XmlTextReader reader = new XmlTextReader("XMLFile_path.xml");
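XmlTextReader has other constructors too. Besides a file path or URL, it accepts a TextReader, which is handy for experimenting with an in-memory string. The sketch below is illustrative only (ReaderFactory and its methods are hypothetical names). It sets WhitespaceHandling.None so the reader skips the whitespace between tags, and uses MoveToContent to jump over the declaration straight to the root element. Note that in an ASP.NET page a relative path like the one above is typically not resolved against the web application's folder, so the full path is usually built with Server.MapPath first.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helpers showing two ways to create an XmlTextReader.
public static class ReaderFactory
{
    // From a file path or URL, as in the example above.
    public static XmlTextReader FromFile(string path)
    {
        return new XmlTextReader(path);
    }

    // From an in-memory string, useful for quick experiments and tests.
    public static XmlTextReader FromString(string xml)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        // Do not return whitespace-only nodes between tags.
        reader.WhitespaceHandling = WhitespaceHandling.None;
        return reader;
    }

    // Skips the declaration (and any comments) and returns the root element's name.
    public static string RootName(XmlReader reader)
    {
        reader.MoveToContent();
        return reader.Name;
    }
}
```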

The method we are going to write, GetHTMLOutputFromXML, reads the given XML and returns its SMS bodies separated by line breaks.

First it creates an XmlTextReader object named reader that points to our XML file. Then it loops over every node the reader returns, using

while (reader.Read())

But what is a node here? It can be anything the XML contains: the XML declaration (XmlNodeType.XmlDeclaration), an opening tag (XmlNodeType.Element), a piece of text (XmlNodeType.Text), a closing tag (XmlNodeType.EndElement), whitespace and more. What we are interested in is the Element node, which represents an opening tag. For example, when the reader reaches a Body tag, the current node is of type Element and its Name is "Body".
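To see which nodes Read() actually stops on, the following sketch (NodeTypeLister is a hypothetical name) records the type of every node together with its name, or its value for text nodes. Whitespace between tags is skipped with WhitespaceHandling.None to keep the list short.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: lists each node as "NodeType:Name" ("NodeType:Value" for text).
public static class NodeTypeLister
{
    public static List<string> ListNodes(string xml)
    {
        var nodes = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            while (reader.Read())
            {
                // Text nodes have no name, so show their content instead.
                string label = reader.NodeType == XmlNodeType.Text ? reader.Value : reader.Name;
                nodes.Add(reader.NodeType + ":" + label);
            }
        }
        return nodes;
    }
}
```

For <?xml version="1.0"?><Sms><Body>Good morning!</Body></Sms> it returns XmlDeclaration:xml, Element:Sms, Element:Body, Text:Good morning!, EndElement:Body and EndElement:Sms, which shows that the message text is a separate node following its opening tag.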

So what's the trick? We loop over the XML until we find an Element node, and when we do, we remember its name. The next iteration brings us the Text node that holds the element's content. That is when we check whether the remembered tag is the one we are looking for. If it is, we append the node's value to a StringBuilder; if not, nothing happens (it was just a tag we don't need).
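Put together, the steps above could look like the following minimal sketch. It is a reconstruction under assumptions, not necessarily the author's exact code: the SmsReader class is hypothetical, the method takes a file path (or a TextReader) instead of hard-coding it, it also forgets the remembered tag at each closing tag, and each body is HTML-encoded with WebUtility.HtmlEncode and followed by a <br /> tag, since the result is meant to be shown in a Literal.

```csharp
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical sketch of GetHTMLOutputFromXML as described in the text.
public static class SmsReader
{
    public static string GetHTMLOutputFromXML(string xmlPath)
    {
        return ReadBodies(new XmlTextReader(xmlPath));
    }

    public static string GetHTMLOutputFromXML(TextReader xml)
    {
        return ReadBodies(new XmlTextReader(xml));
    }

    private static string ReadBodies(XmlTextReader reader)
    {
        var output = new StringBuilder();
        try
        {
            string currentElement = null; // name of the tag we are inside
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // Remember the tag; its text arrives in the next node.
                    currentElement = reader.Name;
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    currentElement = null;
                }
                else if (reader.NodeType == XmlNodeType.Text && currentElement == "Body")
                {
                    // Encode the message so it is safe to render as HTML.
                    output.Append(WebUtility.HtmlEncode(reader.Value));
                    output.Append("<br />");
                }
            }
        }
        finally
        {
            // Always release the file, even if the XML is malformed.
            reader.Close();
        }
        return output.ToString();
    }
}
```

In a page's code-behind this would be called as, for example, XMLOutputLitID.Text = SmsReader.GetHTMLOutputFromXML(Server.MapPath("~/SMS.xml")); where "~/SMS.xml" stands in for wherever the file actually lives.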

Repeating this process, we go through every tag in the XML. The output can then be shown in an ASP.NET Literal control like this:

<asp:Literal runat="server" ID="XMLOutputLitID" />

XMLOutputLitID.Text = GetHTMLOutputFromXML();

Here's what we get:

Good morning!

How are you?

Bon voyage!

We should never forget to close the reader when we are done with it. Wrapping the reading code in a try block and calling reader.Close() in the finally block is a reliable way to make sure the file is released, even if an exception is thrown along the way.
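C#'s using statement is a more compact alternative to writing the finally block by hand: it calls Dispose() when the block is left, even through an exception, and disposing an XmlReader closes it. The sketch below (SmsBodyReader is a hypothetical name) combines it with XmlReader.Create, the factory method Microsoft recommends over constructing XmlTextReader directly since .NET Framework 2.0, and with ReadElementContentAsString, which reads an element's text and moves past its closing tag in one call.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical alternative: a using block instead of try/finally.
public static class SmsBodyReader
{
    public static List<string> ReadBodies(TextReader source)
    {
        var bodies = new List<string>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(source, settings))
        {
            while (!reader.EOF)
            {
                // LocalName is the tag name without any namespace prefix.
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Body")
                {
                    // Already advances past </Body>, so no extra Read() here;
                    // otherwise an adjacent <Body> would be skipped.
                    bodies.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return bodies;
    }
}
```

To read straight from a file, XmlReader.Create also accepts a path, in which case the reader opens the file itself and closes it when disposed.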

Summary

XML is a file format designed to store information. Its flexibility, together with its platform independence, makes it one of the most popular ways to store data. We can read an XML file with an XmlTextReader object: by looping over its nodes, we can spot the information we are looking for and use it however we want.