
2009-06-03

C# 3.0 Tutorial -8:Linq

LINQ

Introducing Linq

Linq is short for Language Integrated Query. If you are used to using SQL to query databases, you are going to have something of a head start with Linq, since they have many ideas in common. Before we dig into Linq itself, let's step back and look at what makes SQL different from C#.

Imagine we have a list of orders. For this example, we will imagine they are stored in memory, but they could just as well be in a file on disk. We want to get a list of the costs of all orders that were placed by the customer identified by the number 84. If we set about implementing this in C# before version 3.0, or in a range of other popular languages, we would probably write a loop that walks through the orders one at a time, checks each order's customer number and collects the costs of the matching orders.
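As a rough sketch of that imperative approach, here is such a loop. The Order class, its field names, the decimal type for Cost and the second customer number (7) are assumptions for illustration; the loop itself uses only C# 2.0 features, while Main uses C# 3.0 initializers just to build the test data briefly.

```csharp
using System;
using System.Collections.Generic;

// Assumed Order class: field names follow the text, the decimal Cost is a guess.
class Order
{
    public int OrderID;
    public int CustomerID;
    public decimal Cost;
}

static class ImperativeExample
{
    // Imperative style: we spell out every step of the algorithm ourselves.
    public static List<decimal> CostsForCustomer(List<Order> orders, int customerId)
    {
        List<decimal> found = new List<decimal>();
        foreach (Order o in orders)
        {
            if (o.CustomerID == customerId)
                found.Add(o.Cost);
        }
        return found;
    }

    static void Main()
    {
        // Test data (customer 7 is made up).
        List<Order> orders = new List<Order>
        {
            new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
            new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
            new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
        };

        foreach (decimal cost in CostsForCustomer(orders, 84))
            Console.WriteLine(cost);
    }
}
```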

Here we are describing how to achieve the result we want by breaking the task into a series of instructions. This approach, which is very familiar to us, is called imperative programming. It relies on us to pick a good algorithm and not make any mistakes in the implementation of it; for more complex tasks, the algorithm is more complex and our chances of implementing it correctly decrease.

If we had the orders stored in a table in a database and we used SQL to query it, we would write something like:

SELECT Cost FROM Orders WHERE CustomerID = 84

Here we have not specified an algorithm, or how to get the data. We have just declared what we want and left the computer to work out how to do it. This is known as declarative programming.

Linq brings declarative programming features into imperative languages. It is not specific to C#: it has also been implemented in Visual Basic 9.0 (the "Orcas" release of VB.Net, which shipped with Visual Studio 2008), amongst other languages. In this series we are focusing on C# 3.0, but the principles will carry over to other languages.

Understanding A Simple Linq Query

Let's jump straight into a code example. First, we'll create an Order class, then make a few instances of it in a List as our test data. With that done, we'll use Linq to get the costs of all orders for customer 84.
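A minimal sketch of such a program follows. The original listing is not reproduced here, so the Order class, the decimal type for Cost and the made-up customer number 7 are assumptions; the query itself is the one walked through below.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Order class (not the article's original listing).
class Order
{
    public int OrderID;
    public int CustomerID;
    public decimal Cost;
}

static class SimpleQuery
{
    // Test data built with collection and object initializers.
    public static List<Order> MakeOrders()
    {
        return new List<Order>
        {
            new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
            new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },   // customer 7 is made up
            new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
        };
    }

    // The Linq query: we declare what we want, not how to loop for it.
    public static IEnumerable<decimal> CostsForCustomer84(List<Order> Orders)
    {
        var Found = from o in Orders
                    where o.CustomerID == 84
                    select o.Cost;
        return Found;
    }

    static void Main()
    {
        foreach (var Cost in CostsForCustomer84(MakeOrders()))
            Console.WriteLine(Cost);
    }
}
```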

Let's walk through the Main method. First, we use collection and object initializers to create a list of Order objects that we can run our query over. Next comes the query - the new bit. We declare the variable Found and request that its type be inferred for us by using the "var" keyword.

We then run across a new C# 3.0 keyword: "from".

    from o in Orders

This is the keyword that always starts a query. You can read it a little bit like a "foreach": it takes a collection of some kind after the "in" keyword and makes what is to the left of the "in" keyword refer to a single element of the collection. Unlike "foreach", we do not have to write a type.

Following this is another new keyword: "where".

    where o.CustomerID == 84

This introduces a filter, allowing us to pick only some of the objects from the Orders collection. The "from" made the identifier "o" refer to a single item from the collection, and we write the condition in terms of this. If you type this query into the IDE yourself, you will notice that it has worked out that "o" is an Order, and IntelliSense works as expected.

The final new keyword is "select".

    select o.Cost

This comes at the end of the query and is a little like a "return" statement: it states what we want to appear in the collection holding the results of the query. As well as primitive types (such as int), you can instantiate any object you like here. In this case, though, Found simply ends up being a sequence of costs: an IEnumerable<T> whose element type is the type of the Cost field.

You may be thinking at this point, "hey, this looks like SQL, but kind of backwards and twisted about a bit". That is a pretty good summary. I suspect many who have written a lot of SQL will find that "select comes last" a little grating at first. The other important thing to remember is that all of the conditions are expressed in C# syntax, not SQL syntax. That means "==" for equality testing, rather than "=" as in SQL. Thankfully, in most cases that mistake will lead to a compile-time error anyway.

A Few More Simple Queries

We may wish our query to return not only the Cost, but also the OrderID for each result that it finds. To do this we take advantage of anonymous types.

Writing select new { OrderID = o.OrderID, Cost = o.Cost } defines an anonymous type that holds an OrderID and a Cost. This is where we start to see the power and flexibility that anonymous types offer; without them we would need to write custom classes for every possible set of results we wanted. Remembering the projection initializer syntax from earlier in the series, we can shorten this to select new { o.OrderID, o.Cost } and obtain the same result.

Note that you can perform whatever computation you wish inside the anonymous type initializer. For example, we may wish to return the Cost of the order with an additional sales tax of 10% added on to it.
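The three projections described above can be tried out with the following sketch. The Order class, the decimal Cost type and the data (including the made-up customer 7) are assumptions, and CostWithTax is a hypothetical member name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class AnonymousTypeQueries
{
    // Assumed test data, as in the earlier sketch.
    public static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    // Explicit member names and the projection-initializer shorthand produce
    // the same anonymous type, so the two result sequences compare equal.
    public static bool ShorthandMatchesLongForm()
    {
        var LongForm = from o in Orders
                       where o.CustomerID == 84
                       select new { OrderID = o.OrderID, Cost = o.Cost };
        var ShortForm = from o in Orders
                        where o.CustomerID == 84
                        select new { o.OrderID, o.Cost };
        return LongForm.SequenceEqual(ShortForm);
    }

    // Any expression can go in the initializer; here 10% sales tax is added.
    public static List<decimal> CostsWithTax()
    {
        var Found = from o in Orders
                    where o.CustomerID == 84
                    select new { o.OrderID, CostWithTax = o.Cost * 1.1m };
        return Found.Select(r => r.CostWithTax).ToList();
    }

    static void Main()
    {
        Console.WriteLine(ShorthandMatchesLongForm());   // True
        foreach (decimal c in CostsWithTax())
            Console.WriteLine(c);
    }
}
```

Using decimal here keeps the tax arithmetic exact, which is usually what you want for money.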

Conditions can be more complex too, and are built up in the usual C# way, just as you would do in an "if" statement. Here we apply an extra condition that we only want to see orders valued over a hundred pounds.
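A sketch of such a compound condition, reusing the assumed Order class and data from the earlier sketches (the threshold of 100 comes from the text):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class CompoundCondition
{
    // Assumed test data.
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    // Conditions combine with && and || exactly as in an "if" statement.
    public static List<int> BigOrdersFor84()
    {
        var Found = from o in Orders
                    where o.CustomerID == 84 && o.Cost > 100
                    select o.OrderID;
        return Found.ToList();
    }

    static void Main()
    {
        foreach (int id in BigOrdersFor84())
            Console.WriteLine("Order " + id);   // prints "Order 1"
    }
}
```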

We can also sort the results by adding an "orderby" clause before the "select". After the "orderby" keyword, we write the expression that the objects will be sorted on. In this case, it is a single field. Notice this is different from SQL, where there are two words: "ORDER BY". I have added the keyword "ascending" at the end, though this is actually the default. The result is that we now get the orders in order of increasing cost, cheapest to most expensive. To get the most expensive first, we would have used the "descending" keyword.
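A sketch of both sort directions over the same assumed data (the Order class, decimal Cost and customer 7 are assumptions, as before):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class OrderingExample
{
    // Assumed test data.
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    // Cheapest first; "ascending" is the default and could be left out.
    public static List<decimal> CostsAscending()
    {
        var Found = from o in Orders
                    orderby o.Cost ascending
                    select o.Cost;
        return Found.ToList();
    }

    // Most expensive first.
    public static List<decimal> CostsDescending()
    {
        var Found = from o in Orders
                    orderby o.Cost descending
                    select o.Cost;
        return Found.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", CostsAscending().Select(c => c.ToString()).ToArray()));
        Console.WriteLine(string.Join(" ", CostsDescending().Select(c => c.ToString()).ToArray()));
    }
}
```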

While I said earlier that the ordering condition is based on fields in the objects involved in the query, it actually doesn't have to be: any expression will do. For example, sorting on a random number gives the results in a random order.
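One way to do this, sketched under the same assumed data, is to sort on a value from System.Random. The seed parameter is a hypothetical addition so that runs can be repeated. This relies on LINQ to Objects computing each element's sort key once before sorting (which is how the .NET implementation behaves), so each order gets a single random key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class RandomOrderExample
{
    // Assumed test data.
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    // The sort key does not have to be a field: here it is a random number.
    public static List<int> ShuffledIds(int seed)
    {
        Random Rand = new Random(seed);
        var Found = from o in Orders
                    orderby Rand.Next()
                    select o.OrderID;
        return Found.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", ShuffledIds(Environment.TickCount).Select(i => i.ToString()).ToArray()));
    }
}
```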

We would like to produce a list featuring all orders, stating the ID and cost of the order along with the name of the customer. To do this we need to involve both the list of orders and the list of customers in our query. This is achieved using the "join" keyword. Let's replace our query and output code with the following.

    // Query.
    var Found = from o in Orders
                join c in Customers on o.CustomerID equals c.CustomerID
                select new { c.Name, o.OrderID, o.Cost };

    // Display results.
    foreach (var Result in Found)
        Console.WriteLine(Result.Name + " spent " +
            Result.Cost.ToString() + " in order " +
            Result.OrderID.ToString());

The output of running this program is:

    Pedro spent 159.12 in order 1
    Emma spent 18.5 in order 2
    Pedro spent 2.89 in order 3

We use the "join" keyword to indicate that we want to refer to another collection in our query. We then once again use the "in" keyword to declare an identifier that will refer to a single item in the collection; in this case it has been named "c". Finally, we need to specify how the two collections are related. This is achieved using the "on ... equals ..." syntax, where we name a field from each of the collections. In this case, we have stated that the CustomerID of an Order maps to the CustomerID of a Customer.

When the query is evaluated, the matching objects in the Customers collection are located for each object in the Orders collection. Note that if several customers had the same ID, there would be more than one matching Customer object per Order object, and we would get extra results. For example, change Vladimir to also have a CustomerID of 84. The output of the program would then be:

    Pedro spent 159.12 in order 1
    Vladimir spent 159.12 in order 1
    Emma spent 18.5 in order 2
    Pedro spent 2.89 in order 3
    Vladimir spent 2.89 in order 3

Notice that Vladimir never featured in the results before, since he had not ordered anything.
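The join and the duplicate-match behaviour can be reproduced with the sketch below. The Customer class and the IDs of Emma (7) and Vladimir (102) are assumptions, chosen so that the output matches the listings above; passing 84 for Vladimir reproduces the second listing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

// Assumed Customer class; the article's own definition is not shown.
class Customer { public int CustomerID; public string Name; }

static class JoinExample
{
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    // Emma's ID (7) and Vladimir's usual ID (102) are made up for this sketch.
    public static List<Customer> MakeCustomers(int vladimirId)
    {
        return new List<Customer>
        {
            new Customer { CustomerID = 7, Name = "Emma" },
            new Customer { CustomerID = 84, Name = "Pedro" },
            new Customer { CustomerID = vladimirId, Name = "Vladimir" }
        };
    }

    // Pairs each order with every customer that has the same CustomerID.
    public static List<string> Report(List<Customer> Customers)
    {
        var Found = from o in Orders
                    join c in Customers on o.CustomerID equals c.CustomerID
                    select new { c.Name, o.OrderID, o.Cost };

        // InvariantCulture keeps the "159.12" formatting independent of the locale.
        return Found.Select(r => r.Name + " spent " +
                                 r.Cost.ToString(CultureInfo.InvariantCulture) +
                                 " in order " + r.OrderID).ToList();
    }

    static void Main()
    {
        foreach (string line in Report(MakeCustomers(102)))
            Console.WriteLine(line);
        Console.WriteLine();
        // Giving Vladimir ID 84 as well yields an extra row per matching order.
        foreach (string line in Report(MakeCustomers(84)))
            Console.WriteLine(line);
    }
}
```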

Getting All Permutations With Multiple "from"s

It is possible to write a query that gets every combination of the objects from two collections. This is achieved by using the "from" keyword multiple times.

    var Found = from o in Orders
                from c in Customers
                select new { c.Name, o.OrderID, o.Cost };

Earlier I suggested that you could think of "from" as being a little bit like a "foreach". You can also think of multiple uses of "from" a bit like nested "foreach" loops; we are going to get every possible combination of the objects from the two collections. Therefore, the output will be:

    Emma spent 159.12 in order 1
    Pedro spent 159.12 in order 1
    Vladimir spent 159.12 in order 1
    Emma spent 18.5 in order 2
    Pedro spent 18.5 in order 2
    Vladimir spent 18.5 in order 2
    Emma spent 2.89 in order 3
    Pedro spent 2.89 in order 3
    Vladimir spent 2.89 in order 3

Which is not especially useful. You may have spotted that you could have used "where" in conjunction with the two "from"s to get the same result as the join, by writing where o.CustomerID == c.CustomerID after the second "from".

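However, don't do this. The two "from"s generate every possible combination of order and customer, and the "where" clause then throws most of them away: for n orders and m customers, that is n × m pairs to build and test. This is a waste of computation. A join, on the other hand, never produces the non-matching pairs in the first place; with Linq over in-memory collections, it builds a hash-based lookup of the joined collection's keys and matches each order against it directly.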
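The equivalence, and the size of the cross product, can be checked with a sketch under the same assumed classes and data (customer IDs 7 and 102 are made up): the double "from" yields 3 × 3 = 9 pairs before any filtering, while the filtered version returns exactly the join's rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }
class Customer { public int CustomerID; public string Name; }

static class CrossJoinExample
{
    // Assumed data, as in the earlier sketches.
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };
    static List<Customer> Customers = new List<Customer>
    {
        new Customer { CustomerID = 7, Name = "Emma" },
        new Customer { CustomerID = 84, Name = "Pedro" },
        new Customer { CustomerID = 102, Name = "Vladimir" }
    };

    // Two "from"s: every (order, customer) pair.
    public static int AllPairsCount()
    {
        var Found = from o in Orders
                    from c in Customers
                    select new { c.Name, o.OrderID, o.Cost };
        return Found.Count();
    }

    // Filtering the pairs gives the same rows as the join, but only after
    // generating and testing every pair.
    public static bool WhereVersionMatchesJoin()
    {
        var ViaWhere = from o in Orders
                       from c in Customers
                       where o.CustomerID == c.CustomerID
                       select new { c.Name, o.OrderID, o.Cost };
        var ViaJoin = from o in Orders
                      join c in Customers on o.CustomerID equals c.CustomerID
                      select new { c.Name, o.OrderID, o.Cost };
        return ViaWhere.SequenceEqual(ViaJoin);
    }

    static void Main()
    {
        Console.WriteLine(AllPairsCount());            // 9
        Console.WriteLine(WhereVersionMatchesJoin());  // True
    }
}
```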

Grouping

Another operation that you may wish to perform is categorizing objects that have the same value in a given field. For example, we might want to categorize orders by CustomerID. The result we expect back is a list of groups, where each group has a key (in this case, the CustomerID) and a list of matching objects.

This query, from o in Orders group o by o.CustomerID, looks somewhat different from the others that we have seen so far in that it does not end with a "select". The "from" part is the same as we're used to. The second part introduces the new "group" and "by" keywords. After the "by" we name the field that we are going to group the objects by. Before the "by" we put what we would like to see in the resulting per-group collections. In this case, we write "o" so as to get the entire object. If we had only been interested in the Cost field, however, we could have written group o.Cost by o.CustomerID.

You are not restricted to just a single field or the object itself; you could, for example, instantiate an anonymous type there instead.
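A minimal runnable sketch of both kinds of grouping, under the same assumed Order class and data. Groups come back in the order their keys first appear, so customer 84 comes before the made-up customer 7.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class GroupingExample
{
    // Assumed test data.
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    // group o by o.CustomerID: each group has a Key and holds whole Order objects.
    public static List<string> OrdersByCustomer()
    {
        var Found = from o in Orders
                    group o by o.CustomerID;

        List<string> lines = new List<string>();
        foreach (var Group in Found)
        {
            string ids = string.Join(", ", Group.Select(o => o.OrderID.ToString()).ToArray());
            lines.Add(Group.Key + ": orders " + ids);
        }
        return lines;
    }

    // group o.Cost by o.CustomerID: each group now holds only the costs.
    public static List<decimal> CostsForCustomer(int customerId)
    {
        var Found = from o in Orders
                    group o.Cost by o.CustomerID;
        return Found.First(g => g.Key == customerId).ToList();
    }

    static void Main()
    {
        foreach (string line in OrdersByCustomer())
            Console.WriteLine(line);   // "84: orders 1, 3" then "7: orders 2"
    }
}
```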

Query Continuations

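At this point you might be wondering if you can follow a "group ... by ..." with a "select". The answer is yes, but not directly. Both "group ... by ..." and "select" are special in so far as they produce a result, and you must terminate a Linq query with one or the other. If you try to write a "select" straight after a "group ... by ...", the compiler will reject it. Instead, you add the "into" keyword after the grouping, followed by a new identifier: this names the groups and starts a continuation of the query, which can then use "where", "orderby" and finally "select" as usual.

For reference, here is the anonymous type query from earlier, with renamed fields and written for an ASP.NET page that outputs through Response.Write:

    // Query to return not only the Cost, but also the OrderID for each result that it finds.
    var Found2 = from o in Orders
                 where o.CustomerID == 84
                 select new { MyOrderID = o.OrderID, MyCost = o.Cost };

    foreach (var Result in Found2)
        Response.Write("Found 2->Cost: " + Result.MyCost.ToString() +
            " Order ID:" + Result.MyOrderID.ToString() + "<br/>");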
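A minimal sketch of a query continuation, under the same assumed Order class and data: "into g" names each group, and the rest of the query then works on the groups. The identifier g and the Count and Total member names are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class ContinuationExample
{
    // Assumed test data.
    static List<Order> Orders = new List<Order>
    {
        new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
        new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
        new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
    };

    public static List<string> Totals()
    {
        // "into g" ends the grouping and continues the query over the groups.
        var Found = from o in Orders
                    group o by o.CustomerID into g
                    orderby g.Key
                    select new { CustomerID = g.Key, Count = g.Count(), Total = g.Sum(x => x.Cost) };

        return Found.Select(r => r.CustomerID + ": " + r.Count + " order(s), total " +
                                 r.Total.ToString(CultureInfo.InvariantCulture)).ToList();
    }

    static void Main()
    {
        foreach (string line in Totals())
            Console.WriteLine(line);
    }
}
```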

Now that we have looked at the practicalities of using Linq, I am going to spend a little time looking at how it works. Don't worry if you don't understand everything in this section; it's here for those who like to dig a little deeper.

Throughout the series I have talked about how all of the language features introduced in C# 3.0 somehow help to make Linq possible. While anonymous types have shown up pretty explicitly and you can see from the lack of type annotations we have been writing that there is some type inference going on, where are the extension methods and lambda expressions?

There's a concept in language design and implementation called "syntactic sugar". We use this term to describe syntax that isn't compiled directly, but is first transformed by the compiler into other, more primitive syntax, which is then compiled. This is exactly what happens with Linq: your queries are transformed into a sequence of method calls and lambda expressions.

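The C# 3.0 specification goes into great detail about these transformations. In practice, you probably don't need to know about them, but let's look at one example to help us understand what is going on. Our simple query from earlier, from o in Orders where o.CustomerID == 84 select o.Cost, is transformed into Orders.Where(o => o.CustomerID == 84).Select(o => o.Cost).

And this is what actually gets compiled. Here the use of lambda expressions becomes clear. The lambda passed to the Where method is called on each element of Orders to determine whether it should be in the result or not. Where returns a sequence of the matching elements, on which we then call the Select method. Select calls the lambda it is passed on each of those elements to produce the values of the final sequence, which is assigned to Found. Note that with Linq over in-memory collections, none of this work actually happens until Found is enumerated (for example, by a foreach loop): the query is evaluated lazily, each time it is enumerated. Beautiful, huh?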
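The translation can be checked with a small sketch (same assumed Order data as before) that runs the query syntax and the hand-written method calls side by side. It also shows that the query is evaluated lazily: an order added after the query is defined still shows up when the query is enumerated. The fourth order is a made-up addition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int OrderID; public int CustomerID; public decimal Cost; }

static class TranslationExample
{
    // Fresh assumed test data for each call.
    static List<Order> MakeOrders()
    {
        return new List<Order>
        {
            new Order { OrderID = 1, CustomerID = 84, Cost = 159.12m },
            new Order { OrderID = 2, CustomerID = 7, Cost = 18.5m },
            new Order { OrderID = 3, CustomerID = 84, Cost = 2.89m }
        };
    }

    // Query syntax and the method calls it becomes give the same results.
    public static bool QueryMatchesMethodCalls()
    {
        List<Order> Orders = MakeOrders();
        var Found = from o in Orders
                    where o.CustomerID == 84
                    select o.Cost;
        var Translated = Orders.Where(o => o.CustomerID == 84).Select(o => o.Cost);
        return Found.SequenceEqual(Translated);
    }

    // Nothing runs until the query is enumerated, so a later Add is seen.
    public static int CountAfterAddingOrder()
    {
        List<Order> Orders = MakeOrders();
        var Found = Orders.Where(o => o.CustomerID == 84).Select(o => o.Cost);
        Orders.Add(new Order { OrderID = 4, CustomerID = 84, Cost = 10m }); // made-up extra order
        return Found.Count();
    }

    static void Main()
    {
        Console.WriteLine(QueryMatchesMethodCalls()); // True
        Console.WriteLine(CountAfterAddingOrder());   // 3
    }
}
```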

Finally, a note on extension methods. Both Where and Select, along with a range of other methods, have been implemented as extension methods in the System.Linq.Enumerable class. The type they use for "this" is IEnumerable<T>, meaning that any collection that implements that interface can be used with Linq. Without extension methods, it would not have been possible to achieve this level of code re-use.
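To make the idea concrete, here is a hypothetical, much simplified Where-style filter written as an extension method on IEnumerable<T>. MyWhere is a made-up name and this is not the real System.Linq code; it only illustrates how one extension method, taking a lambda, can serve every collection type that implements the interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified Where-style filter (NOT the real System.Linq code).
static class MyExtensions
{
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;   // hand back matching items one at a time
        }
    }
}

static class ExtensionDemo
{
    static void Main()
    {
        // Works on an array...
        int[] Numbers = { 1, 2, 3, 4, 5, 6 };
        foreach (int n in Numbers.MyWhere(x => x % 2 == 0))
            Console.WriteLine(n);

        // ...and on a List<string>, with no extra code.
        List<string> Names = new List<string> { "Emma", "Pedro", "Vladimir" };
        foreach (string s in Names.MyWhere(x => x.Length > 4))
            Console.WriteLine(s);
    }
}
```

Because MyWhere is an iterator, it also evaluates lazily, in the same spirit as the real Where.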

DLinq and XLinq

In this article I have demonstrated Linq working with objects instantiated from classes that we implemented ourselves and stored in built-in collection classes that implement IEnumerable<T>. However, the query syntax is translated into ordinary method calls (Where, Select and so on), which are resolved like any other call, whether to extension methods or to instance methods. This means that it is possible to write alternative implementations of Linq that follow the same syntax but perform different operations.
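As a sketch of that idea, the hypothetical Box<T> type below is not a collection at all, but because it has a method named Select with a suitable signature, the compiler accepts query syntax over it. Box, AddOne and the values used are invented for illustration.

```csharp
using System;

// Hypothetical single-value container, not part of any library.
// It is not IEnumerable<T>, but it has a Select method, which is all
// that a "from ... select ..." query needs.
class Box<T>
{
    public readonly T Value;
    public Box(T value) { Value = value; }

    public Box<U> Select<U>(Func<T, U> selector)
    {
        return new Box<U>(selector(Value));
    }
}

static class QueryPatternDemo
{
    public static Box<int> AddOne(Box<int> b)
    {
        // The compiler rewrites this as b.Select(x => x + 1).
        return from x in b select x + 1;
    }

    static void Main()
    {
        Console.WriteLine(AddOne(new Box<int>(41)).Value);   // prints 42
    }
}
```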

Two examples of such alternative implementations are DLinq and XLinq, which shipped with C# 3.0 and the .NET Framework 3.5 under the names LINQ to SQL and LINQ to XML. DLinq enables the same language integrated query syntax to query databases by translating the Linq into SQL. XLinq enables queries over XML documents.
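For a flavour of the XML side, here is a small sketch using LINQ to XML (the System.Xml.Linq namespace). The XML document, its element and attribute names, and customer 7 are made up to mirror the in-memory orders used earlier.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlQuery
{
    // Made-up XML mirroring the in-memory orders used earlier.
    const string Xml =
        "<orders>" +
        "<order id='1' customer='84' cost='159.12' />" +
        "<order id='2' customer='7' cost='18.5' />" +
        "<order id='3' customer='84' cost='2.89' />" +
        "</orders>";

    // The same query shape as before, but over XML elements; the explicit
    // casts convert attribute text to int and decimal.
    public static decimal[] CostsForCustomer(int customerId)
    {
        XElement root = XElement.Parse(Xml);
        var Found = from e in root.Elements("order")
                    where (int)e.Attribute("customer") == customerId
                    select (decimal)e.Attribute("cost");
        return Found.ToArray();
    }

    static void Main()
    {
        foreach (decimal cost in CostsForCustomer(84))
            Console.WriteLine(cost);
    }
}
```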

Conclusion

Linq brings declarative programming to the C# language and will refine and unify the way that we work with objects, databases, XML and whatever else anyone writes the appropriate extension methods for. It builds upon the language features that we have already seen in the previous parts of the series, while hiding some of them away under syntactic sugar. While the query language has a range of differences from SQL, there are enough similarities to make knowledge of SQL useful to those who have it. However, its utility goes far beyond providing yet another way to work with databases.

Closing Thoughts On C# 3.0

This brings us to the end of this four-part series on C# 3.0. Here is a quick recap of all that we have seen.

- Type inference removes much of the tedium of writing out type annotations again and again.
- Lambda expressions make higher order programming syntactically light.
- Extension methods provide another path to better code re-use when correctly applied.
- Object and collection initializers, along with anonymous types, make building up large data structures much less effort.
- Linq gives us declarative programming abilities over objects, databases and XML documents.

When I saw C# 1.0, I strongly doubted that C# was going to be a language I would ever be excited about. I have been pleasantly surprised, and writing this series has been a lot of fun. I hope that it has been informative and enjoyable to read, and that it will help you to make powerful use of the new language features. I greatly look forward to being able to use them in my own day-to-day development and seeing how other people use them.

Of course, knowing about something and doing it yourself are two entirely different things; if you haven't already done so, grab yourself the Visual Studio 2008 trial or the free Express Edition. Only then will you become comfortable with the new features and be able to use them effectively in your own development. Happy hacking, and have fun!

Reference: www.programmersheaven.com
