Introduction

Working with XML using Microsoft's .NET Framework version 2.0 and earlier is a cumbersome task. The available API follows the W3C DOM model and is document-centric. Everything begins with the document: you can't create elements without a document, and even an XML fragment has to be created from, and belongs to, a document.

In the latest release of the .NET Framework, version 3.5, this has changed: with LINQ to XML, XML is element-centric. Thanks to language features such as object initializers and anonymous types, creating XML is very easy. Add the query capabilities of LINQ, and we have an easy-to-use yet powerful tool for working with XML.
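To make the contrast concrete, here is a minimal sketch that builds the same small element both ways. The element names (employee, name, id) and the value "Jane" are hypothetical sample data, not taken from the article's demo. The DOM version must create every node through an XmlDocument, while the LINQ to XML version creates a free-standing XElement.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class XmlCreationDemo
{
    // .NET 2.0 (DOM) style: every node is created by, and owned by, an XmlDocument.
    public static string BuildWithDom()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement employee = doc.CreateElement("employee");
        employee.SetAttribute("id", "1");

        XmlElement name = doc.CreateElement("name");
        name.InnerText = "Jane";
        employee.AppendChild(name);

        doc.AppendChild(employee);
        return doc.OuterXml;
    }

    // .NET 3.5 (LINQ to XML) style: an XElement stands on its own, no document required.
    public static string BuildWithLinqToXml()
    {
        XElement employee = new XElement("employee",
            new XAttribute("id", 1),
            new XElement("name", "Jane"));
        return employee.ToString(SaveOptions.DisableFormatting);
    }
}

class Program
{
    static void Main()
    {
        // Both lines print: <employee id="1"><name>Jane</name></employee>
        Console.WriteLine(XmlCreationDemo.BuildWithDom());
        Console.WriteLine(XmlCreationDemo.BuildWithLinqToXml());
    }
}
```

The markup is identical. The difference is that the LINQ to XML version needs no owning document and describes the tree in one nested expression.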

In this article, I will explore some of the XML- and LINQ-related features of .NET Framework 3.5. This is, of course, not an exhaustive discussion of either subject, merely an introduction and a stepping stone to further learning and exploration.

The LINQ part

When discussing LINQ to XML, or LINQ to whatever, the first thing that needs to be discussed is, of course, LINQ.

LINQ

Language-Integrated Query, or LINQ, is an extension introduced in version 3.5 of the .NET Framework that makes queries and set operations first-class citizens of .NET languages such as C#. It has been further defined as, "A set of general purpose standard query operators that allow traversal, filter, and projection operations to be expressed in a direct yet declarative way in any .NET language."

Getting started
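The discussion in this section revolves around a query like the following. This is a sketch: the names array and its contents are hypothetical, chosen to match the s.StartsWith("P") expression used throughout this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GettingStarted
{
    // Hypothetical sample data.
    public static readonly string[] Names = { "John", "Paul", "George", "Ringo" };

    public static IEnumerable<bool> StartsWithP()
    {
        // The compiler infers the type of name as IEnumerable<bool>:
        // one Boolean per element of Names, not the matching names themselves.
        var name = Names.Select(s => s.StartsWith("P"));
        return name;
    }
}

class Program
{
    static void Main()
    {
        foreach (bool b in GettingStarted.StartsWithP())
        {
            Console.WriteLine(b);   // False, True, False, False
        }
    }
}
```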

Two things are noticeable here: the var keyword and the strange-looking => operator.

Var keyword

var is not a new data type but a keyword, introduced with C# 3.0 in .NET 3.5, for declaring implicitly typed local variables. Although it looks similar to the Variant type in classic VB or var in JavaScript, it isn't the same. In VB and JavaScript, such a variable can hold just about anything, and its type is only known at run time. In C#, however, var is more of a placeholder: the actual type is fixed at compile time, inferred from the expression the variable is initialized with.

In the example above, the compiler infers name to be an IEnumerable<bool>; at run time it refers to an internal iterator object, System.Linq.Enumerable.SelectIterator<string,bool>.

var name = "Hello, World";

In this example, though, name is inferred to be a string.

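This type inference is convenient when the exact type returned by a query is long or hard to spell out, and it is required when a query projects into an anonymous type, which has no name you could write. Because the variable is still strongly typed, you can use it without first casting it to another type.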
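The anonymous-type case is where var becomes indispensable. In this sketch (the names array is again hypothetical), the projection creates objects of an unnamed type, yet the compiler still knows their properties, so no cast is needed.

```csharp
using System;
using System.Linq;

static class VarDemo
{
    public static string Describe()
    {
        // Hypothetical sample data.
        string[] names = { "John", "Paul", "George", "Ringo" };

        // Each projected item is an instance of a compiler-generated anonymous type
        // with Name and Length properties. The type has no name we could write,
        // so the variable must be declared with var.
        var query = names.Select(s => new { Name = s, Length = s.Length });

        // Still statically typed: no cast is needed to read Name and Length.
        var first = query.First();
        return first.Name + ":" + first.Length;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(VarDemo.Describe());   // John:4
    }
}
```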

Lambda expressions

Lambda expressions take their name from the lambda calculus, which mathematician Alonzo Church introduced in the 1930s as a formal notation for expressing computation. In C# 3.0 (part of .NET 3.5), they are a convenient way for developers to define functions that can be passed as arguments, and they are an evolution of the anonymous methods introduced in C# 2.0 (.NET 2.0).

The => operator separates the input parameters on the left from the body of the expression on the right.

In this example, each string in the names array is represented by the variable s. It's not necessary to declare a data type because it is inferred from the type of the collection, names in this case.

These two statements would be somewhat analogous:

var name = names.Select(s => s);
foreach(string s in names) { }

The body of the expression, s.StartsWith("P"), simply calls a string method that returns a Boolean value. Select is an extension method (more on that shortly) that takes a Func delegate as its parameter.
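The step from anonymous methods to lambdas is easiest to see side by side. This sketch writes the same test both ways, using Predicate<string> because that delegate type already existed in .NET 2.0. The class and field names are hypothetical.

```csharp
using System;

static class LambdaDemo
{
    // C# 2.0 anonymous method: the parameter type must be spelled out.
    public static readonly Predicate<string> AnonymousMethod =
        delegate (string s) { return s.StartsWith("P"); };

    // C# 3.0 lambda: the type of s is inferred from Predicate<string>,
    // and a single-expression body needs no braces or return statement.
    public static readonly Predicate<string> Lambda = s => s.StartsWith("P");
}

class Program
{
    static void Main()
    {
        foreach (string name in new[] { "Paul", "John" })
        {
            // Both delegates give the same answer for every input.
            Console.WriteLine(name + ": " + LambdaDemo.AnonymousMethod(name) + " / " + LambdaDemo.Lambda(name));
        }
    }
}
```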

Func and Action

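Func and Action are not methods but families of generic delegate types in the System namespace. Action<T> already existed in .NET 2.0; .NET 3.5 added the Func delegates along with the parameterless and multi-parameter versions of Action.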

Func<TSource, TResult>

This represents a delegate that takes an argument of type TSource and returns a value of type TResult.

Action<T>

Action<T>, on the other hand, represents a delegate that takes an argument of type T and does not return a value.
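Here is a small sketch of both delegate types. It assumes that pOnly, the name passed to Single in the next section, is a Func<string, bool> that tests for a leading "P"; its exact definition in the demo code is not shown. The Action<string> is handed to Array.ForEach, which has accepted an Action<T> since .NET 2.0. CountMatches is a hypothetical helper.

```csharp
using System;
using System.Linq;

static class FuncActionDemo
{
    // Func<string, bool>: takes a string and returns a bool.
    public static readonly Func<string, bool> pOnly = s => s.StartsWith("P");

    public static int CountMatches(string[] names)
    {
        int count = 0;

        // Action<string>: takes a string and returns nothing; it updates the
        // captured local variable count as a side effect.
        Action<string> countIfMatch = s => { if (pOnly(s)) count++; };

        Array.ForEach(names, countIfMatch);
        return count;
    }
}

class Program
{
    static void Main()
    {
        string[] names = { "John", "Paul", "Pete", "Ringo" };

        // The same Func can be passed to a LINQ operator such as Select.
        Console.WriteLine(string.Join(", ", names.Select(FuncActionDemo.pOnly)));
        Console.WriteLine(FuncActionDemo.CountMatches(names));   // 2
    }
}
```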

Sequences

If you run the demo code for this article, you will notice that none of the examples above returns a single value. Instead, each returns a collection of Boolean values indicating whether each element in the input collection matched the specified expression. In LINQ, such a collection (an IEnumerable<T>) is referred to as a sequence.

If we wanted the single value that matched the expression, we would use the Single extension method, which throws an InvalidOperationException unless exactly one element matches.

string name = names.Single(pOnly);

Notice here that the name variable is typed as a string. Although we could still use var, we know that the return value is, or should be, a string.
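The following sketch contrasts Single with two related operators, Where and FirstOrDefault. The helper names are hypothetical, and pOnly is assumed to be the same "starts with P" predicate. Where returns the matching elements themselves (rather than Booleans, as Select does), and FirstOrDefault returns null instead of throwing when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceDemo
{
    // Assumed to match the article's "starts with P" predicate.
    static readonly Func<string, bool> pOnly = s => s.StartsWith("P");

    // Single: returns the one matching element; throws InvalidOperationException
    // if no element or more than one element matches.
    public static string SingleP(string[] names)
    {
        return names.Single(pOnly);
    }

    // Where: returns the matching elements themselves as a sequence.
    public static List<string> AllP(string[] names)
    {
        return names.Where(pOnly).ToList();
    }

    // FirstOrDefault: returns the first match, or null if nothing matches.
    public static string FirstPOrNull(string[] names)
    {
        return names.FirstOrDefault(pOnly);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(SequenceDemo.SingleP(new[] { "John", "Paul", "Ringo" }));    // Paul
        Console.WriteLine(SequenceDemo.AllP(new[] { "Paul", "Pete", "John" }).Count);  // 2

        try
        {
            SequenceDemo.SingleP(new[] { "Paul", "Pete" });
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Single found more than one match");
        }
    }
}
```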

Extension Methods

Extension methods are a C# 3.0 feature, shipped with .NET 3.5, that lets developers add methods to existing types without modifying the code of the original type. This is useful when you want to provide additional functionality but don't have access to the code base, such as when using third-party libraries.

Extension methods are static methods defined in non-generic, top-level static classes. The first parameter of such a method is typed as the data type being extended and carries the this modifier. Notice that this is used here as a parameter modifier, not as a reference to the current object.

When this class is compiled, the compiler marks it and its extension methods with System.Runtime.CompilerServices.ExtensionAttribute. When the class's namespace is in scope, IntelliSense can read this information and determine which methods apply based on the data type.

As we can see here, in the first example, IntelliSense knows that the ToInt method applies to strings, and that only DoubleToDollars applies to doubles.
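The ToInt and DoubleToDollars methods themselves are not listed, so here is one possible implementation, written purely for illustration. The class name, parsing with the invariant culture, and formatting dollars as a "$" prefix followed by two decimals are all assumptions, not necessarily what the demo does.

```csharp
using System;
using System.Globalization;

// Extension methods must live in a non-generic, top-level static class.
public static class ConversionExtensions
{
    // The "this" modifier on the first parameter makes ToInt callable
    // as though it were an instance method of string.
    public static int ToInt(this string value)
    {
        return int.Parse(value, CultureInfo.InvariantCulture);
    }

    // Hypothetical formatting rule: "$" followed by the value with thousands
    // separators and two decimal places, e.g. 1234.5 becomes "$1,234.50".
    public static string DoubleToDollars(this double value)
    {
        return "$" + value.ToString("N2", CultureInfo.InvariantCulture);
    }
}

class Program
{
    static void Main()
    {
        int count = "42".ToInt();
        double price = 1234.5;
        Console.WriteLine(count + 1);                // 43
        Console.WriteLine(price.DoubleToDollars());  // $1,234.50
    }
}
```

Because ToInt's first parameter is a string, IntelliSense offers it only on strings, and DoubleToDollars only on doubles.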

Query expression and methods

There are two ways to write LINQ queries: query expression syntax and method (dot-notation) syntax. The former resembles a SQL query, except that the from clause comes first and the select clause comes last.

These two statements produce the same results because the compiler translates query expressions into method calls. There are several ways to produce results with methods; each of the following produces the same results.
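As a stand-in for such statements, this sketch writes one query in both forms over a hypothetical names array. The method-syntax version mirrors the Where, OrderBy, and Select calls the compiler generates from the query expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QuerySyntaxDemo
{
    // Hypothetical sample data.
    static readonly string[] names = { "John", "Paul", "George", "Ringo" };

    // Query expression syntax: the from clause comes first, select comes last.
    public static List<string> WithQueryExpression()
    {
        var query = from s in names
                    where s.Length > 4
                    orderby s
                    select s.ToUpperInvariant();
        return query.ToList();
    }

    // Method (dot-notation) syntax: the calls the query expression is translated into.
    public static List<string> WithMethods()
    {
        return names.Where(s => s.Length > 4)
                    .OrderBy(s => s)
                    .Select(s => s.ToUpperInvariant())
                    .ToList();
    }
}

class Program
{
    static void Main()
    {
        // Both print: GEORGE, RINGO
        Console.WriteLine(string.Join(", ", QuerySyntaxDemo.WithQueryExpression()));
        Console.WriteLine(string.Join(", ", QuerySyntaxDemo.WithMethods()));
    }
}
```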

Constructing a document in this way is possible because of the functional construction feature in LINQ to XML. Functional construction is simply a means of creating an entire document tree in a single statement.
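Here is a sketch of functional construction along the lines described: an XDocument holding an XDeclaration, an XComment, and an employees element with four employee children. The employee names and the comment text are hypothetical. Generating the children with Select, rather than writing four XElement calls by hand, is a stylistic choice that works because XElement also accepts an IEnumerable as content.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class FunctionalConstructionDemo
{
    public static XDocument BuildEmployees()
    {
        // Hypothetical sample data for the four employees.
        string[] names = { "Alice", "Bob", "Carol", "Dave" };

        // The whole tree is built in a single expression.
        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XComment("Employee list"),
            new XElement("employees",
                // Select yields one <employee> per name; XElement adds every item
                // of an IEnumerable passed as content.
                names.Select((name, index) =>
                    new XElement("employee",
                        new XAttribute("id", index + 1),
                        new XElement("name", name)))));
    }
}

class Program
{
    static void Main()
    {
        XDocument doc = FunctionalConstructionDemo.BuildEmployees();
        // ToString() omits the XML declaration; Save() would write it.
        Console.WriteLine(doc);
    }
}
```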

public XElement(XName name, params object[] content)

As this constructor shows, XElement takes a params array of objects, so any number of child elements, attributes, and other content can be passed in one call. In the example above, the employees element is constructed from four XElements, one for each employee, each of which is in turn constructed from XAttributes and other XElements.

In the above example, we could have replaced XDocument with XElement if we removed the XDeclaration and XComment objects. This is because the XDocument constructor used here takes an XDeclaration instance as its first argument, rather than the XName that the XElement constructor takes.

public XDocument(XDeclaration declaration, params object[] content)

Another thing to note when running the demo is how both documents are printed to the console window.

As we can see, the old method just streams the contents of the document to the console. The LINQ to XML method does that too; however, because its ToString() indents the output by default, the result is nicely formatted with no extra effort.
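The difference in output can be checked with a short sketch. It loads the same hypothetical one-employee document into both APIs. OuterXml returns the markup on a single line, while XElement.ToString() adds line breaks and two-space indentation.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class OutputDemo
{
    // Hypothetical one-employee document.
    public const string Xml =
        "<employees><employee id=\"1\"><name>Alice</name></employee></employees>";

    // XmlDocument.OuterXml returns the markup as stored: here, a single line.
    public static string WithDom()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.OuterXml;
    }

    // XElement.ToString() indents child elements by default.
    public static string WithLinqToXml()
    {
        return XElement.Parse(Xml).ToString();
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(OutputDemo.WithDom());
        Console.WriteLine(OutputDemo.WithLinqToXml());
    }
}
```

To get single-line output from LINQ to XML as well, call ToString(SaveOptions.DisableFormatting).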

Conclusion

XML is a fantastic construct that has become deeply ingrained in just about everything. The ability to easily construct, query, transform, and manipulate XML documents is invaluable; it improves both the speed with which applications can be built and the quality of those applications.

This article is not an exhaustive investigation of LINQ to XML; many other articles, snippets, and blogs have been written on the subject. It is mainly a taste of, and a familiarization with, what is possible using .NET 3.5.
