Introduction

One of the questions frequently asked in the forums here on CodeProject is how to add elements to an XML file. At first read this seems like a trivial task, but it really isn't. The quickest way to code it is to open the XML file in an XmlDocument object, add the rows, and call the Save method. But how much memory does that take, and how fast is it? This article explores the options available when appending to an XML file.

I've recently updated this article to show what happens when using .NET 2.0. There really is a big difference in speed and memory usage between .NET 1.1 and .NET 2.0.

Setup

To test, we'll need a large XML file, some idea of how much memory we're using, and a timer. The timer is pretty simple since we don't really need a low-level performance timer if our XML file is big enough. Getting a big XML file is pretty easy:
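The generator code didn't survive in this copy of the article; a sketch that produces a comparable file might look like this. The element names "root" and "child" are assumptions based on the rest of the article, not the author's original code:

```csharp
using System;
using System.Text;
using System.Xml;

class GenerateTestFile
{
    public static void Main()
    {
        // Write 500,000 rows under a single document element. The element
        // names "root" and "child" are assumptions; the article's original
        // generator was not preserved.
        XmlTextWriter writer = new XmlTextWriter("test.xml", Encoding.UTF8);
        writer.Formatting = Formatting.Indented;
        writer.WriteStartDocument();
        writer.WriteStartElement("root");
        for (int i = 0; i < 500000; i++)
        {
            writer.WriteElementString("child",
                "This is row number " + i.ToString() + " of the test document.");
        }
        writer.WriteEndElement();
        writer.Close();
    }
}
```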

Now we have a file called test.xml with 500,000 rows in it. The file will probably come out to 23 megs. The next thing we need is an idea of how much memory is being used. We can use WMI to grab the amount of heap memory being used for our process. That is accomplished with this class:
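The class itself isn't reproduced here. A rough stand-in that reads the ".NET CLR Memory" performance counter through System.Diagnostics (rather than raw WMI, which the original used) could look like this; the class and member names are made up:

```csharp
using System;
using System.Diagnostics;

// Static helper that reports the CLR's committed heap bytes for this
// process. The original article queried WMI; this sketch reads the
// equivalent ".NET CLR Memory" performance counter instead.
public class MemoryProbe
{
    private static PerformanceCounter counter = new PerformanceCounter(
        ".NET CLR Memory", "# Total committed Bytes",
        Process.GetCurrentProcess().ProcessName);

    public static long CommittedBytes
    {
        get { return counter.RawValue; }
    }

    public static void Report(string label)
    {
        Console.WriteLine("Total committed bytes {0} : {1}", label, CommittedBytes);
    }
}
```

Note that this counter only exists where the .NET Framework performance counters are installed, which is one reason readings like these are environment-dependent.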

Please note that the code is written for .NET 1.1, so all you pattern purists out there ready to write in about the static class can calm down.

Getting an idea of how much memory you're using in .NET is pretty difficult. The CLR grabs a big chunk of memory up front and manages internally how your program uses it. This is why we need a big XML document: we want to be sure the CLR is actually grabbing more memory. Any readings from this performance counter have to be taken with a grain of salt, because there is always that buffer zone and you never know when garbage collections are happening. This makes the testing environment a bit unstable, so we can't run one test right after another without closing the application.

Option 1: XmlDocument

The first approach we'll take is the simplest approach. We simply open the existing XML file in an XmlDocument object, append the row(s), and save it to the original filename. The code would look something like this:

XmlDocument doc = new XmlDocument();
doc.Load("test.xml");
XmlElement el = doc.CreateElement("child");
el.InnerText = "This row is being appended to the end of the document.";
doc.DocumentElement.AppendChild(el);
doc.Save("test.xml");

As you can see, they've improved the speed and memory usage in .NET 2.0.

Option 2: Copy to Memory, Add Row(s), Write to File

In this approach, we will use a MemoryStream to hold the XML document. We'll read from the current file until we get just before the end and simultaneously write into the MemoryStream. Then we add the rows, stick the end element tag for the document element on the end, rewind the stream, and write the whole thing out to the original file. The gist of it is in this code:
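That code is not preserved in this copy of the article; a self-contained sketch of the approach, with the copy loop inlined and the row element name assumed, might look like this:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class AppendViaMemoryStream
{
    public static void Main()
    {
        MemoryStream ms = new MemoryStream();
        XmlTextReader reader = new XmlTextReader("test.xml");
        XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);

        // Copy everything up to (but not including) the root's end tag.
        reader.MoveToContent();
        string root = reader.Name;
        writer.WriteStartDocument();
        writer.WriteStartElement(root);
        reader.Read();
        while (!reader.EOF &&
               !(reader.NodeType == XmlNodeType.EndElement && reader.Name == root))
        {
            writer.WriteNode(reader, true);   // copies the node/subtree, advances
        }
        reader.Close();

        // Append the new row, close the root, and flush into the stream.
        writer.WriteElementString("child",
            "This row is being appended to the end of the document.");
        writer.WriteEndElement();
        writer.Flush();

        // Rewind and write the whole buffer back over the original file.
        ms.Position = 0;
        FileStream fs = new FileStream("test.xml", FileMode.Create);
        ms.WriteTo(fs);
        fs.Close();
    }
}
```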

The Copy method is the one doing all the work, and it's actually why most people stick with the XmlDocument approach: copying an XML document is not clear-cut. The copy method I came up with may not handle every XML document, but it should handle a significant majority of them fairly well:
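The Copy method itself didn't survive in this copy of the article; a sketch of the idea, streaming node by node with the naive end-element check, might look like the following. Namespaces and some node types (entities, processing instructions) are glossed over, so treat this as an illustration rather than bulletproof code:

```csharp
using System.Xml;

public class XmlCopier
{
    // Streams the input document to the writer node by node, stopping just
    // before the closing tag of the document element so that new rows can
    // be appended by the caller.
    public static void Copy(XmlReader reader, XmlWriter writer)
    {
        reader.MoveToContent();
        string rootName = reader.Name;

        writer.WriteStartDocument();
        writer.WriteStartElement(rootName);
        writer.WriteAttributes(reader, false);   // don't lose root attributes
        reader.MoveToElement();

        while (reader.Read())
        {
            // The naive stop condition: the first end element whose name
            // matches the document element. See the caveat below.
            if (reader.NodeType == XmlNodeType.EndElement && reader.Name == rootName)
                break;

            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                {
                    bool empty = reader.IsEmptyElement;  // capture before attrs
                    writer.WriteStartElement(reader.Name);
                    writer.WriteAttributes(reader, false);
                    reader.MoveToElement();
                    if (empty) writer.WriteEndElement();
                    break;
                }
                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
            }
        }
    }
}
```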

One thing to take note of if you're copying this code: it looks at the name of the document element and then tries to find an end element with the same name. That end element is the indicator that tells the code it's close to the end of the file and needs to begin appending rows. It is entirely possible that your document element's name is also used by another element within your XML file. If I weren't so lazy, I'd keep a counter that is incremented every time a start element with a matching name appears and decremented on the matching end elements, and only stop when the counter says the document element itself is closing.
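That counter could be sketched as a small helper along these lines (hypothetical code, not the article's; the caller drives the node-by-node loop and checks this on every node):

```csharp
using System.Xml;

public class SafeStopper
{
    // Returns true only when the reader is positioned on the end tag that
    // closes the document element itself, even if nested elements reuse
    // its name. "depth" must start at 0 and be carried across calls.
    public static bool IsRootEndElement(XmlReader reader, string rootName,
                                        ref int depth)
    {
        if (reader.NodeType == XmlNodeType.Element
            && reader.Name == rootName && !reader.IsEmptyElement)
        {
            depth++;            // a nested element reuses the root's name
            return false;
        }
        if (reader.NodeType == XmlNodeType.EndElement && reader.Name == rootName)
        {
            if (depth == 0)
                return true;    // this one closes the document element
            depth--;
            return false;
        }
        return false;
    }
}
```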

.NET 1.1 Results

Total committed bytes before opening file : 663472
Total committed bytes after creating MemoryStream : 663472
Total committed bytes after writing to MemoryStream : 30433160
Total committed bytes after writing to file : 30433160
Time to append a row : 3.156351

The MemoryStream approach uses 28 megs of memory to store the entire file, which is understandable: the generated XML file is about 23 megs, and a MemoryStream is simply a stream wrapped around a byte array.

.NET 2.0 Results

Total committed bytes before opening file : 1454072
Total committed bytes after creating MemoryStream : 1454072
Total committed bytes after writing to MemoryStream : 28729336
Total committed bytes after writing to file : 28729336
Time to append a row : 2.659330

Once again, a speed and memory usage improvement in .NET 2.0. Notice that the speed advantage between this approach and the XmlDocument approach has narrowed.

Option 3: Writing to a Temporary File

The MemoryStream from the previous option is kind of like a temporary file, but in memory. So, if we instead just write to a temporary file, then we can get rid of the extra memory usage. We can also eliminate an extra run through the document because the temporary file can be renamed to match the original file's name.

The code for this is below. Compare it to Option 2 to see the differences mentioned above:
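That code is also missing from this copy; a sketch of the temp-file variant (the temp file name is an assumption) might be:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class AppendViaTempFile
{
    public static void Main()
    {
        // The temp file name is an assumption; the original code isn't shown.
        string temp = "test.tmp";

        XmlTextReader reader = new XmlTextReader("test.xml");
        XmlTextWriter writer = new XmlTextWriter(temp, Encoding.UTF8);

        // Same copy as Option 2, but the destination is a file, not memory.
        reader.MoveToContent();
        string root = reader.Name;
        writer.WriteStartDocument();
        writer.WriteStartElement(root);
        reader.Read();
        while (!reader.EOF &&
               !(reader.NodeType == XmlNodeType.EndElement && reader.Name == root))
        {
            writer.WriteNode(reader, true);
        }
        reader.Close();

        // Append the row and close out the document.
        writer.WriteElementString("child",
            "This row is being appended to the end of the document.");
        writer.WriteEndElement();
        writer.Close();

        // No rewind-and-rewrite pass: just swap the temp file into place.
        File.Delete("test.xml");
        File.Move(temp, "test.xml");
    }
}
```

The swap at the end is what eliminates the second pass through the document that Option 2 needs.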

.NET 1.1 Results

Total committed bytes before opening files : 663472
Total committed bytes after opening files : 663472
Total committed bytes after writing to file : 5513136
Time to append a row : 2.578208

Not only is this faster than both the previous methods, it also has an extremely small memory footprint. We're looking at less than 5 megs of committed memory.

.NET 2.0 Results

Total committed bytes before opening files : 1454072
Total committed bytes after opening files : 1454072
Total committed bytes after writing to file : 5832696
Time to append a row : 1.582042

We actually have a slightly larger memory footprint with this approach than in .NET 1.1, but the time has dropped by almost a second.

The downside of all this is that it can be hard to control temporary files. You want to make sure a file does not already exist with the same name and you need permissions to create, delete, and rename files in the directory you're working in. To make your appending code robust, you have to take into account all the problems that go with using a temporary file. Plus, it feels kinda kludgy.

Option 4: Custom Xml Serializable Classes

Another way to handle this is to use a custom class that serializes to XML using the XmlSerializer. This was suggested to me by CP'ian BoneSoft. To handle the records, I created a very simple custom class that looks like this:
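The class itself isn't preserved in this copy. Given the shape of the test file (a root element full of child text elements), a minimal version might look like this; the class name and element names are assumptions:

```csharp
using System.Collections.Specialized;
using System.Xml.Serialization;

// Maps the assumed test document: a "root" element containing repeated
// "child" text elements.
[XmlRoot("root")]
public class XmlRecords
{
    // XmlSerializer fills this with the inner text of each child node.
    [XmlElement("child")]
    public StringCollection Rows = new StringCollection();
}
```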

While custom classes can come in all different shapes and sizes, this was pretty much the simplest way I could come up with of storing the data. The XmlSerializer class will automatically fill in the StringCollection with the inner text of each child node. Here's how the code looks for the test:
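The test code isn't preserved either. Here is a sketch that bundles an assumed version of the serializable class so it stands on its own; all names are illustrative:

```csharp
using System;
using System.Collections.Specialized;
using System.IO;
using System.Xml.Serialization;

// Assumed shape of the serializable class the article describes.
[XmlRoot("root")]
public class XmlRecords
{
    [XmlElement("child")]
    public StringCollection Rows = new StringCollection();
}

class AppendViaSerialization
{
    public static void Main()
    {
        XmlSerializer serializer = new XmlSerializer(typeof(XmlRecords));

        // Deserialize the whole file into the custom class...
        FileStream input = new FileStream("test.xml", FileMode.Open);
        XmlRecords records = (XmlRecords)serializer.Deserialize(input);
        input.Close();

        // ...append the row...
        records.Rows.Add("This row is being appended to the end of the document.");

        // ...and serialize everything back out over the original file.
        FileStream output = new FileStream("test.xml", FileMode.Create);
        serializer.Serialize(output, records);
        output.Close();
    }
}
```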

.NET 1.1 Results

Total committed bytes before deserialization : 663472
Total committed bytes after deserialization : 54046560
Total committed bytes after writing to file : 54046560
Time to append a row : 3.500112

As you can see, this method is faster than the XmlDocument and uses less memory. However, it still doesn't compete with the other two options. That's not to say you shouldn't pursue something like this: memory consumption could be a lot less as the XML gets more complicated. A custom class could have an enum or flag that expands into a much larger piece of XML, or could compress child nodes into a much smaller space. In such cases the memory consumption could drop below that of the MemoryStream. The best way to put it: your results may vary.

.NET 2.0 Results

Total committed bytes before deserialization : 1900536
Total committed bytes after deserialization : 41680888
Total committed bytes after writing to file : 41680888
Time to append a row : 2.039076

This shows a significant improvement in speed: this approach is now faster than all but Option 3. It also uses significantly less memory than in .NET 1.1. Another thing to notice is that the initial memory usage is a bit higher than in the other .NET 2.0 tests; I ran this test several times to be sure of it.

Option 5: DataSet

.NET 1.1 Results

We all know that the DataSet is a heavyweight object, and it also imposes some restrictions on the XML it can read. But just how heavyweight is it? When doing the testing for this option, I found that the typical test XML I was using, with 500,000 rows, was far too large for the DataSet to handle; it ended up taking on the order of hours to load. So I had to decrease the number of rows. The most I could get it to reasonably handle was 20,000 rows. Below are the results for a DataSet with 20,000 rows, but please, please, please don't misinterpret this as being competitive with the other options. If there's one thing you should take away from this article, it's that using a DataSet in .NET 1.1 is a bad idea.
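The DataSet test code isn't reproduced in the article; its core, sketched against the assumed test file layout (a root full of simple child text elements, whose table and column names the DataSet infers itself), is roughly:

```csharp
using System;
using System.Data;

class AppendViaDataSet
{
    public static void Main()
    {
        // Read the entire document into a DataSet, letting it infer the
        // schema, then append a row and write everything back out.
        DataSet ds = new DataSet();
        ds.ReadXml("test.xml");

        // The repeated "child" elements are inferred as a table whose single
        // column holds their text. Names are inferred by the DataSet, so
        // index access is used here rather than guessed names.
        DataTable table = ds.Tables[0];
        DataRow row = table.NewRow();
        row[0] = "This row is being appended to the end of the document.";
        table.Rows.Add(row);

        ds.WriteXml("test.xml");
    }
}
```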

Total committed bytes before reading into DataSet : 663472
Total committed bytes after reading into DataSet : 13160368
Total committed bytes after writing to file : 10149808
Time to append a row : 18.359963

(These results are for 20,000 records, not 500,000 like the other tests.)

My experiments showed that the DataSet gets dramatically worse as the number of rows increases. For the 20,000 rows, it ends up using about 12 megs; even a linear extrapolation to 500,000 rows (multiply by 25) puts it at about 300 megs. A 300 meg DataSet next to a 60 meg XmlDocument is definitely a heavyweight object. It's also slow and clunky. Bottom line: don't use it.

.NET 2.0 Results

Well, whatever it was in .NET 1.1 that made the DataSet unusable has most certainly been fixed in .NET 2.0. Not only is the DataSet now able to handle all 500,000 rows, it can do it in a fairly reasonable amount of time. Granted, it's still slow, but now it's at least an option.

Total committed bytes before reading into DataSet : 1454072
Total committed bytes after reading into DataSet : 140300280
Total committed bytes after writing to file : 140300280
Time to append a row : 13.250085

(These results are for 500,000 records.)

Suddenly the DataSet doesn't feel so heavy anymore. Sure, it doesn't compare speed-wise with the other options, but it's an incredible improvement over .NET 1.1. It's still a memory hog, using about 140 megs to load our 500,000 row XML file, but much less of one than in 1.1.

Summary

This graph shows the performance monitor's take on the whole thing for .NET 1.1:

The graph for .NET 2.0 is pretty similar with the big change being that the DataSet actually works now:

Kudos to Microsoft for doing such an excellent job of tuning the code to get such large speed and memory usage improvements in .NET 2.0. Speed improvements on the order of seconds are very significant. They also made the DataSet a lot lighter and faster. Developers should feel less anxiety about passing DataSets around.

This article was not meant to tell you the only correct way to append to a large XML file. It was meant to show you all the different options and explain the pitfalls with real data. A peer might tell you to not use an XmlDocument because it uses too much memory, but they might not know exactly how much. I wanted to know for sure. If you're ever in a forum or having an argument with a colleague about the finer points of appending to XML files, you can link them to this article. The only option here which is not viable is the DataSet in 1.1. The rest depend on your situation and needs.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

I'm not sure why the .NET performance counters are not there. It could be that you're not an administrator on your machine and don't have access to the performance counters. Can you bring up the performance console under administrative tools? In there, you can add a counter and then you'll be able to see a list of all the counters that are available.

The problem was attributes: the reader seemed to just skip the attributes in my XML documents. So I had to modify the code a bit so that it also works for XML documents with attributes. I also fixed an encoding problem.

FileInfo fi = new FileInfo(@"filename.xml"); // whatever your file path is
XmlTextReader xtr = new XmlTextReader(fi.OpenRead()); // it has to be an XmlTextReader, or an exception was thrown when I later tried to delete the file

Thank you for this article. I was looking into adding lines to a large XML file -- a log file -- so it was good to see these different options compared.

But in the course of investigating other options, I found something else suitable for my needs that I thought I'd share: XML fragments. It was described at: http://www.tkachenko.com/blog/archives/000053.html

It won't work in the general case, but it will for a lot of cases where you're adding a series of elements, such as the example presented in this article.
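In .NET 2.0, this fragment approach boils down to something like the following sketch (the file name and element names are made up). Because a fragment file has no single document element, appending is a pure file append with nothing to copy or rewrite:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class AppendFragment
{
    public static void Main()
    {
        // Fragment conformance: the writer doesn't require (or emit)
        // a single root element wrapping the whole file.
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        FileStream fs = new FileStream("log.xml", FileMode.Append);
        XmlWriter writer = XmlWriter.Create(fs, settings);
        writer.WriteElementString("entry", "appended log line");
        writer.Close();
        fs.Close();
    }
}
```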

By comparison, this is what I had for Test #1:

Mem before opening XmlDocument: 606208
Mem after opening XmlDocument : 61599744
Mem after writing XmlDocument : 67313664
Time to append a row: 14.390625
Done.

As you can see, this is dramatically better, as long as your particular XML structure allows for it.

Cool, I never thought of that. It's definitely going to be a vast speed improvement not to have to worry about the document element. As long as you're reading it through your own .NET code, it should be fine. Unfortunately, when your sysadmin double-clicks the XML file, it won't open in a standard XML viewer, but at least it will open in Notepad or WordPad.

I suspect that a lot of people will be interested in using xml for logging and this is a good suggestion. Thanks!

I'm really not quite sure that that's going to help anything. The actual numbers for time and memory are not important by themselves. They're only important in relation to the other tests. The article should give you the idea of how one method compares to others in relative speed and memory usage. I hope that the article was not misleading in this sense.

Yes, that's a good option. Speed and memory usage would be extremely subjective though. For every possible permutation of serializable classes I could put up, someone would say "well what about this..." But yes, this is definitely a good way to handle it. I would still call this option 4 though because the previously mentioned option 4 is not really an option.

An option 5 could be to just use plain DataSets, but there are limitations on the XML, and it's a larger memory footprint.

I've also read that .NET 2.0's XPathDocument is read/write. Maybe that can be an option 6. I'll have to update the article soon.

Yeah, I hate using a DataSet for XML. And I usually have motives for generating a serializable model other than just appending data. But at the same time, XSD.exe sucks bad. I'm a little too picky about my code to be content with the slop it produces (part of my motivation to build my own code generation tools).

This article did need to be written; too many people are looking for this information. I just wanted to mention serialization as an option. Good job, well written, and pretty thorough.

Thanks. I incorporated your suggestion and updated the article. Serializing is definitely a good option. It has the potential of using less memory than the MemoryStream and should be a bit faster than XmlDocument for most situations. The temp file is still the fastest, but has its own problems. Turns out the XPathDocument's XPathNavigator implementation is not read/write, only the XmlDocument's is. So there's no option 6.

Ya know, I never thought to test the various options. I like working with serialization so much that I don't bother with other methods unless there's a specific reason. So it's pretty interesting to see the results of these tests. I gotta say, I'm not surprised about the DataSet though.

Option 3 uses the same code as Option 2 for copying the document; it goes through the Copy() method. I'm not sure what you mean about SAX, though. Do you mean the open source project SAX.NET? SAX is a read-only, forward-only parser, just like the XmlReader class in the .NET Framework, so I'm not sure what benefit could be gained.