In the previous lesson, Storing objects in CSV format in C# .NET, part 2, we wrote a database using text files, or
more accurately, using files in the CSV format. In today's tutorial, we're going
to focus on the XML format. First, we'll describe it, then we'll introduce the
classes that the .NET framework provides for reading and writing
these files. We'll cover writing today and leave reading for the next
lesson.

The XML format

We're about to go over lots of terms. If you don't understand some of
them, don't worry, we'll go into as much detail as possible.

XML (eXtensible Markup Language) is a markup language developed by the W3C (the
organization that is responsible for Web standards). XML is very universal and
is supported by a number of languages and applications. The word extensible
indicates the ability to create your own languages on top of XML, one of which
is XHTML for creating websites. XML is a self-describing language,
meaning that its structure lets us determine what each value means.
In a CSV file, we can only guess what the number 8 in the third column means,
whereas in XML, it'd be immediately clear that it's the number of articles that
a user has written. The disadvantage is that XML files are larger, but that
isn't a problem in most cases. Personally, I almost always choose the XML
format. It's a good choice for saving a program's configuration, high scores for
game players, or a small user database. Thanks to XSD schemas, we can
also validate XML files so that we catch errors in the data before they cause
problems at run time.

XML can be processed in different ways, usually either by reading/writing it
continuously or through a DOM object structure. Some tools (including the .NET
libraries) have gone so far as to let us work with XML much like a database and
run SQL-like queries on it (LINQ to XML in .NET). As you can imagine, this saves
a lot of work. Another language for querying XML files is XPath.

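XML competes with JSON, which is simpler but less popular in business
applications. Unlike an XML document, which must end with the closing tag of its
root element, JSON records written one per line can be appended to the end of a
file (e.g. a log) without loading the entire document.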

XML is very often used to exchange data between different systems (e.g.
desktop applications and web applications on a server). Therefore, as we already
mentioned, there are many libraries for it and practically every tool is able to
work with it. This includes web services, SOAP, and so on. However, we
won't deal with any of them now.

Last time, we saved a list of users to a CSV file. We saved their name, age,
and date of registration. The values were next to each other, separated by
semicolons. Each line represented a user. The file's contents looked like
this:

John Smith;22;3/21/2000
James Brown;31;10/30/2012

Anyone who isn't directly involved wouldn't know what any of that means,
would they? Here is the equivalent to that file in the XML format:
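A minimal sketch of such a file follows. The users root element, the user elements, and the age attribute come from the description in this lesson; the element names name and registered are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<users>
  <user age="22">
    <name>John Smith</name>
    <registered>3/21/2000</registered>
  </user>
  <user age="31">
    <name>James Brown</name>
    <registered>10/30/2012</registered>
  </user>
</users>
```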

Now everyone can tell what is stored in the file. I saved age as an attribute
just to demonstrate that XML is able to do things like that. Otherwise, it'd be
saved as an element along with the name and registration date. Individual items
are called elements. I'm sure you're all familiar with HTML, which is based on
the same fundamentals as XML. Elements are usually paired, meaning that we
write the opening tag followed by the value and then the closing tag
with a slash. Elements can contain other elements, so the document has a tree
structure. This lets us save an entire hierarchy of objects into a single XML
document.

At the beginning of an XML file, there is a header. The document has to
contain exactly one root element in order for it to be well-formed. Here, it's
the users element, which contains the other nested elements. Attribute values
are written after the attribute name, in quotation marks.

As you can probably tell, the file got bigger, which is the price paid for it
being readable. If the user had more than three properties, you'd be able to see
just how messy the CSV format can get, and how worthwhile the XML format is.
Personally, as I gain more and more experience, I prefer solutions that are
clear and simple, even if that means that they take up more space. This applies
not only to files but to source code as well. There is nothing worse than when
a programmer looks at their code after a year and has no idea what the eighth
parameter in a CSV file is when there are 100 numbers per line. Even worse is
a five-dimensional array, which is super fast, whereas if they had designed an
object structure instead, they wouldn't have to write this functionality ever
again. However, that last part was a bit of a tangent.

XML in .NET

We'll focus on two fundamental approaches to working with XML files - the
continuous approach (the SAX parser) and the object-oriented approach (DOM).
Today's lesson and the next one will be dedicated to SAX, after which we'll get
to DOM. Again, there are more ways to work with XML files using the .NET
framework. Some are old and only present for backward compatibility's sake. I've
spent quite a lot of time working with XML files in .NET, so I've only included
the most modern approaches and simple constructs.

Parsing XML via SAX

SAX (Simple API for XML) is essentially a simple extension of a text file
reader. Writing is relatively simple. We write the elements and attributes one
after another, in the same order as they appear in the file (we
ignore the tree structure in this approach). .NET provides the XmlWriter class,
which relieves us from having to deal with the fact that XML is a text file. We
only work with the elements, or more accurately, nodes (more on that later).

Reading works much like writing. We read the XML as a text file, node by node,
from top to bottom. The reader gives us what are known as nodes (their kind is
described by the XmlNodeType enumeration) as it encounters them. A node can be
an element, an attribute, or a value. We receive nodes in a loop in the same
order that they're written in the file. We use the XmlReader class to read XML
files. Both classes are in the System.Xml namespace.

The advantage to the SAX approach is its high speed and low memory
requirements. We'll see the disadvantages once we compare this approach to the
DOM object-oriented approach later on.

Writing XML files

Let's create a simple XML file. We'll use the example with the users above
for it. We already worked with the User class last time. Just to be sure, I'll
show it here once more. Create a new project, a console application, name it
XmlSaxWriting, and add a new class to the project:
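A minimal sketch of such a class, assuming it holds the three values mentioned earlier (name, age, and registration date). The property names Name, Age, and Registered, as well as the constructor signature, are assumptions made for illustration and may differ from the class used in the previous lesson.

```csharp
using System;

namespace XmlSaxWriting
{
    // Simple data holder for one user (member names are assumed).
    class User
    {
        public string Name { get; private set; }
        public int Age { get; private set; }
        public DateTime Registered { get; private set; }

        public User(string name, int age, DateTime registered)
        {
            Name = name;
            Age = age;
            Registered = registered;
        }

        // Makes the user readable when printed.
        public override string ToString()
        {
            return Name;
        }
    }
}
```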

For simplicity's sake, we'll write the code right in the Main() method. All
we're really doing is testing out SAX's functionality. At this point, you should
already know how to design object-oriented applications properly.

Don't forget to add using System.Xml.

We'll create the XmlWriter using its static factory method, Create().
There are other ways to do it, but this one is the most appropriate.
The object will be wrapped in a using block. Of course, XML can also
store just a single object (e.g. some settings). Here, we'll learn how to
store a list of several objects. If you only want to store one object, you'll
only need to make very minor changes.
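Before writing anything, we need some users to store. The following is a minimal sketch of a test list using the two users from the CSV example. It assumes a User class with a (name, age, registration date) constructor; the variable name users is also just an assumption.

```csharp
// Placed inside Main(). Requires at the top of the file:
// using System;
// using System.Collections.Generic;

// Test data matching the two users from the CSV example.
List<User> users = new List<User>();
users.Add(new User("John Smith", 22, new DateTime(2000, 3, 21)));
users.Add(new User("James Brown", 31, new DateTime(2012, 10, 30)));
```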

Now we have something to write. We'll have the XML output be nicely formatted
and indented according to its tree structure. Unfortunately, indentation isn't
enabled by default, so we'll have to turn it on by passing an XmlWriterSettings
class instance. We'll set its Indent property to true:
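A short sketch of that step. The variable name settings is chosen to match the Create() call used in this lesson.

```csharp
// Placed inside Main(). Requires: using System.Xml;

// Ask the writer to put each element on its own line and indent nested ones.
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
```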

Done. Next, we'll create an instance of the XmlWriter class
using the factory Create() method. We'll work in a
using block. We pass the file path and the settings as parameters to
the method:

using (XmlWriter xw = XmlWriter.Create(@"file.xml", settings))
{
}

Now, let's get to the actual writing. First, let's add in the document
header:

xw.WriteStartDocument();

Next (as you should know by now) comes the root element, which
contains the rest of the XML. We use the WriteStartElement() and
WriteEndElement() methods for writing elements. The first method takes the name
of the element we're opening as a parameter. The second method determines the
element name on its own from the document context, so it doesn't take any
parameters. Let's open the root element, which is the users element in our
case:

xw.WriteStartElement("users");

Next, we'll move on to writing individual users so the code can be placed in
a foreach loop.

We write the value to the element using the WriteValue() method, which takes
the value as a parameter. Similarly, we can add an element attribute using the
WriteAttributeString() method, whose parameters are the attribute name and its
value. The value is always of the string type, so we have to convert the age to
a string in our case. Looping and writing the user elements looks
like this (without the nested elements):
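A sketch of that loop, meant to sit inside the using block after the users root element has been opened. It assumes xw is the XmlWriter, users is a collection of User objects, and each User exposes an Age property; those names are assumptions.

```csharp
// Inside the using block, after xw.WriteStartElement("users").
foreach (User u in users)
{
    // Opens <user ...>
    xw.WriteStartElement("user");
    // Attribute values are strings, so the age is converted explicitly.
    xw.WriteAttributeString("age", u.Age.ToString());
    // Closes the user element; the writer knows which element is open.
    xw.WriteEndElement();
}
```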

We'll add one more WriteEndElement() to close the root element and
WriteEndDocument() to close the whole document. Like with text files, we have to
empty the buffer using the Flush() method. The entire application code now looks
like this:
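The following is a sketch of the whole program under the assumptions used so far: a User class with Name, Age, and Registered properties and two sample users taken from the CSV example. The User class is included only to keep the listing self-contained; in the project it would live in its own file.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

namespace XmlSaxWriting
{
    // Assumed shape of the User class from the previous lesson.
    class User
    {
        public string Name { get; private set; }
        public int Age { get; private set; }
        public DateTime Registered { get; private set; }

        public User(string name, int age, DateTime registered)
        {
            Name = name;
            Age = age;
            Registered = registered;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Test data
            List<User> users = new List<User>();
            users.Add(new User("John Smith", 22, new DateTime(2000, 3, 21)));
            users.Add(new User("James Brown", 31, new DateTime(2012, 10, 30)));

            // Indented output
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;

            using (XmlWriter xw = XmlWriter.Create(@"file.xml", settings))
            {
                xw.WriteStartDocument();         // header
                xw.WriteStartElement("users");   // root element

                foreach (User u in users)
                {
                    xw.WriteStartElement("user");
                    xw.WriteAttributeString("age", u.Age.ToString());
                    xw.WriteEndElement();        // closes user
                }

                xw.WriteEndElement();            // closes users
                xw.WriteEndDocument();
                xw.Flush();
            }
        }
    }
}
```

After running it, file.xml (created in the program's working directory) should contain the header, the users root element, and one empty user element per user, carrying only the age attribute.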

We can see that the XmlWriter recognized that there is no value in the user
element, only an attribute, and generated the element as unpaired. Now, let's
add 2 additional elements into the user element, namely their name and
the registration date properties:
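A sketch of the extended loop using the long form, WriteStartElement(), WriteValue(), and WriteEndElement(). The element names name and registered and the User properties Name and Registered are assumptions; ToShortDateString() is used here only as one possible way to format the date, and its output depends on the current culture.

```csharp
// Inside the using block, after xw.WriteStartElement("users").
foreach (User u in users)
{
    xw.WriteStartElement("user");
    xw.WriteAttributeString("age", u.Age.ToString());

    // <name>...</name>
    xw.WriteStartElement("name");
    xw.WriteValue(u.Name);
    xw.WriteEndElement();

    // <registered>...</registered> (culture-dependent date format)
    xw.WriteStartElement("registered");
    xw.WriteValue(u.Registered.ToShortDateString());
    xw.WriteEndElement();

    xw.WriteEndElement(); // closes user
}
```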

Neither of these elements includes additional elements or attributes. Elements
of this sort (that only hold text values) can be written using a single
WriteElementString() method, whose parameters are the element's name and the
value it needs to include:
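A sketch of the same loop with the shorter call. As before, xw, users, the element names name and registered, and the User properties are assumptions made for illustration.

```csharp
// Inside the using block, after xw.WriteStartElement("users").
foreach (User u in users)
{
    xw.WriteStartElement("user");
    xw.WriteAttributeString("age", u.Age.ToString());

    // Each call writes the opening tag, the text value, and the closing tag.
    xw.WriteElementString("name", u.Name);
    xw.WriteElementString("registered", u.Registered.ToShortDateString());

    xw.WriteEndElement(); // closes user
}
```

Since the user element now has child elements, the writer emits it as a paired element rather than an empty one. Note that ToShortDateString() formats the date according to the current culture, so a fixed format may be a safer choice if the file is meant to be read on other machines.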

