January 19, 2008

First of all, I want to apologize for the amount of time it has taken me to put up a new blog post. I typically try to write once a week, but my life has been crazy as of late. The outcome of all of the madness is that I will be packing up my family and moving from Portland, Oregon to Denver, Colorado at the end of this month. (My wife has been accepted into a Master’s degree program there!) Because of this, posts for the next couple of months may be sporadic as well. (I’m looking for work in the Denver area. If anybody has any leads, let me know…)

In celebration of my move, I thought that I would put together an example that demonstrates two different ways that WCF serializes objects. (This posting will simply show different ways the engine can work. I hope to actually wire it into the WCF pipeline in a future post.) Since I spent last night packing all of my books into boxes, I thought that maybe I’d find a way to organize them. The goal of this application is to create a manifest that allows me to quickly find any book that I’ve packed away in a moving box.

As always, the complete code is at the end of the blog post. The amount of code with this post is a little bit greater than usual. I’ll try to include the code directly related to the topic within the text, but I would recommend copying the code into Visual Studio if you want to follow along.

Since my main objective is to manage books and boxes, I’m going to begin by simply creating a very basic object for each. My Book object will contain two different properties (Title and Author), and my Box object will simply contain an Id. Since we will be using the WCF engine to do our serialization we will need to mark the classes and properties we wish to expose to the serialization engine with the [DataContract] and the [DataMember] attributes.
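Here is a minimal sketch of what those two classes might look like. The auto-properties and the choice of an int for the box Id are my assumptions; the complete listing may differ.

```csharp
using System.Runtime.Serialization;

// Sketch only: property types and auto-properties are assumptions.
[DataContract]
public class Book
{
    // Only members marked [DataMember] are written by the serializer.
    [DataMember]
    public string Title { get; set; }

    [DataMember]
    public string Author { get; set; }
}

[DataContract]
public class Box
{
    [DataMember]
    public int Id { get; set; }
}
```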

I now need a method to associate books with boxes. In order to keep my Book and Box object as loosely coupled as possible, I will add a Manifest object to the project which will be responsible for maintaining the relationship between the two. The Manifest object will expose three properties. The Books property will contain a collection of all of the Books on my bookshelf. The Boxes property will contain all of the Boxes I’m using to move. The LineItems property will contain information as to which book is in which box.

Since we are using WCF, the serializer will require both a getter and a setter for every property it serializes. I don’t really want to give consumers of my class the ability to blow away my boxes collection, so I have explicitly created a private setter. This allows WCF the access it needs, while denying direct access to other consumers.

LineItems are added to the collection by calling the AddLineItem method. This method adds the book to the book collection, adds the box to the box collection, and creates a relationship between the two by adding a new LineItem to the LineItems collection. It is important to note that the association created in the LineItems collection does not contain new instances of books and boxes; rather, it contains references to the master copies in the Books and Boxes collections. (In other words, the association is by reference.)

I’ve overridden the ToString on the manifest object to display the complete list of books and boxes that it contains. There is also a CreateShippingManifest method which will be used to create a generic manifest of three books contained in two boxes. (See code below.)
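To make the description concrete, here is a rough sketch of the Manifest and its helpers. The LineItem class shape, the List-based collections, and the duplicate checks in AddLineItem are assumptions on my part, so treat this as an illustration rather than the exact listing.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.Text;

[DataContract]
public class Book
{
    [DataMember] public string Title { get; set; }
    [DataMember] public string Author { get; set; }
}

[DataContract]
public class Box
{
    [DataMember] public int Id { get; set; }
}

// Hypothetical shape: one row tying a book to the box that holds it.
[DataContract]
public class LineItem
{
    [DataMember] public Book Book { get; set; }
    [DataMember] public Box Box { get; set; }
}

[DataContract]
public class Manifest
{
    public Manifest()
    {
        Books = new List<Book>();
        Boxes = new List<Box>();
        LineItems = new List<LineItem>();
    }

    // Private setters: the serializer can still populate the collections,
    // but other callers cannot replace them.
    [DataMember] public List<Book> Books { get; private set; }
    [DataMember] public List<Box> Boxes { get; private set; }
    [DataMember] public List<LineItem> LineItems { get; private set; }

    public void AddLineItem(Book book, Box box)
    {
        if (!Books.Contains(book)) Books.Add(book);
        if (!Boxes.Contains(box)) Boxes.Add(box);
        // The line item stores references to the master objects, not copies.
        LineItems.Add(new LineItem { Book = book, Box = box });
    }

    public override string ToString()
    {
        var text = new StringBuilder();
        foreach (LineItem item in LineItems)
        {
            text.AppendLine("Box " + item.Box.Id + " " + item.Book.Title);
        }
        return text.ToString();
    }

    // Three books in two boxes, matching the sample output in this post.
    public static Manifest CreateShippingManifest()
    {
        var box1 = new Box { Id = 1 };
        var box2 = new Box { Id = 2 };
        var manifest = new Manifest();
        manifest.AddLineItem(new Book { Title = "A Brief History of Time", Author = "Stephen Hawking" }, box1);
        manifest.AddLineItem(new Book { Title = "Guards Guards", Author = "Terry Pratchett" }, box1);
        manifest.AddLineItem(new Book { Title = "The Reptile Room", Author = "Lemony Snicket" }, box2);
        return manifest;
    }
}
```

Because every LineItem points at the same Box instance held in Boxes, changing an Id in one place shows up everywhere the manifest is printed.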

Box 1 A Brief History of Time
Box 1 Guards Guards
Box 2 The Reptile Room

Now, we’ll make an assumption that I made a mistake in entering a box label into the program. Instead of box 1, I meant the Id to be box 100. We’ll make the change to the box id, and the manifest will automatically pick up the results.

Box 100 A Brief History of Time
Box 100 Guards Guards
Box 2 The Reptile Room

Now, let’s add some code to serialize and deserialize our manifest. This is actually a fairly simple process. The first method will serialize the data to disk (as Xml); the second method will read the file and deserialize it into a new object.
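A minimal sketch of that pair of methods could look like the following. The ManifestFile class name and the generic signatures are hypothetical; the only library calls used are the default DataContractSerializer constructor, WriteObject, and ReadObject.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical helper; the names are placeholders for illustration.
public static class ManifestFile
{
    // Writes the object graph to disk as Xml using the default serializer settings.
    public static void Serialize<T>(T graph, string path)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (FileStream stream = File.Create(path))
        {
            serializer.WriteObject(stream, graph);
        }
    }

    // Reads the Xml file back and builds a brand new object graph from it.
    public static T Deserialize<T>(string path)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (FileStream stream = File.OpenRead(path))
        {
            return (T)serializer.ReadObject(stream);
        }
    }
}
```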

Object after deserialization
Box 1 A Brief History of Time
Box 1 Guards Guards
Box 2 The Reptile Room
Box 1 A Brief History of Time
Box 1 Guards Guards
Box 2 The Reptile Room

But wait a minute! What happened here? We clearly changed the Id of the first box to be 100, yet the results still state that our box is Box 1! The code looks very similar to what we used in the non-serialized objects. The reason that this happens is because of the way the default instance of the DataContractSerializer writes out the results. If you take a look at the Xml file that is created from our serializer, you’ll see results similar to the following:

The Xml does not contain the associations between the manifest items and the books and boxes that we set up so carefully in our code. Changing the box Id in the master collection no longer changes the box id in all of the children.

We can very easily fix this by updating the DataContractSerializer instantiation in our Serialize method. The important parameters in the new constructor are the third and fifth. The third parameter (maxItemsInObjectGraph) sets the maximum number of objects the serializer will read or write in a single object graph. If that number is exceeded, an exception will be raised. The fifth parameter (preserveObjectReferences) indicates that the associations between objects should be preserved.
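The effect of that flag can be seen in a small round trip. Note the swap: this sketch does not use the six-argument constructor described above. It sets the same two options through DataContractSerializerSettings, which is available on .NET Framework 4.5 and later. The TinyManifest type and the in-memory stream are hypothetical stand-ins for the real manifest and file.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Box
{
    [DataMember] public int Id { get; set; }
}

// Hypothetical two-field graph: both members may point at the same Box.
[DataContract]
public class TinyManifest
{
    [DataMember] public Box Master { get; set; }
    [DataMember] public Box Linked { get; set; }
}

public static class ReferenceDemo
{
    public static TinyManifest RoundTrip(TinyManifest source, bool preserveObjectReferences)
    {
        // Assumption: settings object instead of the constructor overload in the text.
        var settings = new DataContractSerializerSettings
        {
            MaxItemsInObjectGraph = int.MaxValue,
            PreserveObjectReferences = preserveObjectReferences
        };
        var serializer = new DataContractSerializer(typeof(TinyManifest), settings);

        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, source);
            stream.Position = 0; // rewind before reading the Xml back
            return (TinyManifest)serializer.ReadObject(stream);
        }
    }
}
```

With the flag off, Master and Linked come back as two separate Box objects. With it on, they come back as one shared object, so a change to the master copy is visible through both.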

Now, when we rerun our demo, we see that the Id of the box is correctly updated.

Object after deserialization
Box 1 A Brief History of Time
Box 1 Guards Guards
Box 2 The Reptile Room
Box 100 A Brief History of Time
Box 100 Guards Guards
Box 2 The Reptile Room

Examining the Xml now shows the z:Id and z:Ref attributes in place to reassociate the data.

So, there you have it. With WCF you can serialize by value or by reference. Pretty neat stuff. This has kind of been a marathon post, so I hope you could follow along. Let me know if you find anything that isn’t clear!
Code Safe!
MW

January 1, 2008

Recently, we discovered that we needed a doubly linked list to chain objects together in our code. .NET has made this an incredibly easy process, as it provides a generic LinkedList class which manages the creation of the list, as well as the insertion and deletion of the nodes.

Our project also requires that we persist our linked list to a database. The task seems easy enough. All that needed to be done was to create a table which contains our data, as well as pointers to the previous and next nodes. In other words, our initial table structure looked like this:

This structure seems to work on the surface, but we very quickly realized two very critical problems.

The first problem is that inserting the data into this data structure requires two passes. On the first pass, we insert all of the records into the database. Only after all of the records have been inserted can we assign links to both the parent and child records in the ParentId and ChildId columns.

foreach (var link in theChain)
{
    // Insert the record.
}
foreach (var link in theChain)
{
    // Update the record to include the parent and child pointers.
}

The second problem is that the data can fall out of sync with itself. For example, what happens if the data ends up looking like this due to some misbehaving code? Id 1 believes that the child record should be Id 2, but Id 2 believes that it is the top of its own chain.

Id: 1 ParentId: null ChildId: 2
Id: 2 ParentId: null ChildId: 3

Both of these problems can be solved by treating the doubly linked list as a singly linked list in the database. If you have the links of a chain going in one direction, you should be able to determine the links going the other way. We initially avoided this option, because we thought the query to retrieve the data would be extremely complex. (Query the parent with a union of the child, maybe into a temporary table. Ugh.)

While on a walk yesterday, though, I came up with the idea of simply writing a query with an additional join that would return the data with the links in both directions. Our database would no longer need the ChildId column. If we order our data so that parents always fall above their children (the natural state of a linked list), we can insert all of this data in a single pass. Since there is no ChildId, the data can’t become inconsistent.

Id int
ParentId int (FK to Id)
Description nvarchar(30)

When retrieving data to recreate the LinkedList in code, we can get both parent and child ids by joining the LinkedList table to itself.
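The database query itself is not reproduced here, but the same self-join idea can be sketched in memory with LINQ. The LinkRow and LinkWithChild classes are hypothetical stand-ins for the table rows; the outer join keeps the last link in the chain even though it has no child.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types mirroring the Id / ParentId / Description columns.
public class LinkRow
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public string Description { get; set; }
}

public class LinkWithChild
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public int? ChildId { get; set; }
    public string Description { get; set; }
}

public static class ChainQuery
{
    // Joins the rows to themselves: a row's child is the row whose ParentId is its Id.
    public static List<LinkWithChild> WithChildIds(IEnumerable<LinkRow> rows)
    {
        List<LinkRow> table = rows.ToList();
        var query =
            from row in table
            join child in table on (int?)row.Id equals child.ParentId into children
            from child in children.DefaultIfEmpty() // null when the row is the tail
            select new LinkWithChild
            {
                Id = row.Id,
                ParentId = row.ParentId,
                ChildId = child == null ? (int?)null : child.Id,
                Description = row.Description
            };
        return query.ToList();
    }
}
```

Since the child id is derived at query time rather than stored, there is nothing for it to fall out of sync with.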

It’s always a neat experience when an elegant solution comes out of the blue to solve a complex problem. I’m amazed at how often walking away and letting the subconscious mind work will lead to a better solution than when it is being actively developed. Seems like a good New Year’s resolution will be to walk more. Leads to a healthier me, and healthier code.

December 22, 2007

Merry Christmas everyone! I hope this holiday finds you happy and healthy with your loved ones! We’ve made the journey north to Washington to be with our families, and the kids are very excited for Christmas this year.

In celebration of Christmas, I thought that I would share with you a coded Christmas tree. I learned of this Christmas tree back in college when I was taking a math methods class, and first programmed it on my trusty TI-85 graphing calculator.

To build this tree, we’re going to play a simple game. It has four steps:

1. Define three points that represent the vertices of a triangle.

2. Starting from one of the vertices, move half the distance to a randomly chosen vertex, and draw a new point.

3. Starting at the new point, move half the distance to a randomly chosen vertex and draw a new point.

4. Repeat step 3 until you get bored.

Let’s implement the steps in order. We’ll simply use a Windows Forms project and paint the results directly on the form itself.

Our first step is the definition of the vertices. We’ll declare three points forming our triangle as member variables on our form.

Point _initialPoint1 = new Point(200, 0);
Point _initialPoint2 = new Point(0, 400);
Point _initialPoint3 = new Point(400, 400);

Next, we will need a method to draw our individual points on the form itself. My DrawPoint method accepts a point and a graphics object. Accepting the graphics object as a parameter prevents us from continually having to create and dispose the graphics object.

To implement steps two and three, we will need a method which, given a point, will calculate half the distance to one of the original vertices and return a new point. You’ll notice we had to create a new member variable called _random. I initially was creating a new Random object within the method itself, but I was getting decidedly unrandom results. When the Random object is created, it uses a seed value from the system time. My method was getting called faster than the time was changing, so I was seeing repeated “random” numbers. By moving the object creation outside of the method, the object is seeded only once, and the values no longer repeat.
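Here is a rough sketch of that midpoint step with the drawing left out. The ChaosGame class, the NextPoint and Generate names, and the vertex array are my own placeholders; the coordinates are the three points declared earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical helper: generates the points without painting them.
public class ChaosGame
{
    private readonly Point[] _vertices =
    {
        new Point(200, 0),
        new Point(0, 400),
        new Point(400, 400)
    };

    // Created once, so the generator is seeded a single time.
    private readonly Random _random = new Random();

    // Moves half the distance from the current point toward a random vertex.
    public Point NextPoint(Point current)
    {
        Point target = _vertices[_random.Next(_vertices.Length)];
        return new Point((current.X + target.X) / 2, (current.Y + target.Y) / 2);
    }

    // Starts at the first vertex and repeats the midpoint step 'count' times.
    public List<Point> Generate(int count)
    {
        var points = new List<Point>();
        Point current = _vertices[0];
        for (int i = 0; i < count; i++)
        {
            current = NextPoint(current);
            points.Add(current);
        }
        return points;
    }
}
```

In a form, each returned point could be handed to the DrawPoint method described above from within the paint handler.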

Excellent. Let’s build and run our Christmas tree program and see what comes out.

Isn’t that neat? This code generates a well-known fractal called a Sierpinski triangle. An entertaining (non-code) alternative to creating the triangle is to write out Pascal’s triangle, and shade in all of the odd numbers. Pretty neat stuff!

December 14, 2007

Last week, I blogged about creating a jukebox using LINQ to query the filesystem. We used a quick-and-dirty query to join properties from three different objects (two directory objects and a file object) into an anonymous type which represented the music folder hierarchy on my machine. It was a pretty neat first attempt at using LINQ, but I want my jukebox to do more.

My goal for this blog is to append additional information to the individual tracks in my library. To do this, I will store comments on individual tracks in an xml file. Each xml element will contain the filename of the track to which the comment applies, as well as the comment itself. Using the filename as a key, we will use LINQ to join the additional Xml information into the data generated from the directory structure.

The xml file is structured as follows. (The file attribute is truncated for space in this posting.)

The first step in this process will be to load the Xml. LINQ introduces a whole new series of objects for dealing with Xml data. When using LINQ, XDocuments are used to hold Xml data. These can be loaded directly from a file in much the same way you would load the more familiar XmlDocument object.

We’re now going to loop through each of the Track elements and extract both the file name and the comment from the data. We will start at the document’s root element (the Root property of the XDocument) and drill down using the Elements method. This new method returns an IEnumerable of XElement objects. Using each of these elements, we will create a new anonymous type from the element’s attributes and store a collection of them in the variable comments.
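A small sketch of that extraction step follows. The Tracks root element, the comment attribute name, and the sample values are assumptions made for illustration. The sample is parsed from a string so the snippet stands alone; XDocument.Load(path) would read the same structure from a file.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CommentReader
{
    // Hypothetical sample data; the real file's layout may differ.
    public const string SampleXml =
        "<Tracks>" +
        "<Track file=\"song1.mp3\" comment=\"Great opener\" />" +
        "<Track file=\"song2.mp3\" comment=\"Skip this one\" />" +
        "</Tracks>";

    // Returns a file name to comment lookup built from the Track elements.
    public static Dictionary<string, string> ReadComments(XDocument document)
    {
        var comments =
            from track in document.Root.Elements("Track")
            select new
            {
                File = (string)track.Attribute("file"),
                Comment = (string)track.Attribute("comment")
            };

        // Anonymous types cannot leave the method, so copy into a dictionary.
        return comments.ToDictionary(c => c.File, c => c.Comment);
    }
}
```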

For those of you following along at home, you’ll be quick to point out that this is not quite the result we’re looking for. This LINQ query returns an INNER JOIN style result. I have 50 or so tracks on my laptop, but only two entries in my Xml file. Using this query, I can only access the two tracks that exist in the Xml. What we really want is a LEFT OUTER JOIN. We wish to include all tracks, even if they don’t have a comment associated with them.

It takes a little bit to convince LINQ to do a flattened OUTER JOIN. First, we will do a group join on our Xml and store our results in a temporary variable (tracksAndComments) using the DefaultIfEmpty() method. DefaultIfEmpty will force a null into the right hand side of the join if no data matches the key. Next, we will export this data into a new anonymous type to be stored in mergedResults. We will use a ternary operator to replace any null tracksAndComments objects with an empty string.
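As a sketch of that pattern, the query below joins a plain list of file names to a list of comments. The TrackComment class and the Merge method are hypothetical simplifications of the jukebox types, and the result is flattened to strings only so it can leave the method.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one entry from the Xml comments file.
public class TrackComment
{
    public string File { get; set; }
    public string Comment { get; set; }
}

public static class JukeboxJoin
{
    // Keeps every track, pairing it with its comment or an empty string.
    public static List<string> Merge(IEnumerable<string> trackFiles,
                                     IEnumerable<TrackComment> comments)
    {
        var mergedResults =
            from file in trackFiles
            join comment in comments on file equals comment.File into matches
            from tracksAndComments in matches.DefaultIfEmpty() // null when no comment matches
            select new
            {
                TrackFile = file,
                Comment = tracksAndComments == null ? "" : tracksAndComments.Comment
            };

        return mergedResults.Select(r => r.TrackFile + ": " + r.Comment).ToList();
    }
}
```

Dropping the "into matches" clause and the second from turns this back into the inner join, which is why only the commented tracks appeared before.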

December 9, 2007

Earlier this week, the Portland Area Dot Net Users’ Group had an installation party for Visual Studio 2008. During the event, they had a contest to see who could come up with the best LINQ sample. The winner would receive a customized Zune. While I had not yet used LINQ, I decided to throw my hat in the ring with the following query:

As you’ve probably already guessed, I didn’t leave with the fancy new piece of hardware. I did receive a chuckle from the judge, though.

After everything had ended, I came up with an idea that may have been a serious contender. A cool entry would have been to try and model an mp3 player’s functionality using LINQ. I decided to create a program that would rip through the music structure on my PC and display the results by artist, album and track.

On my laptop, I have an iTunes folder. (Can you say iTunes on an essentially Microsoft blog?) This folder arranges music into three different levels. The topmost folder contains one folder for each artist I have music for on my PC. Each artist folder contains one folder for each album I have by that artist. Each album folder, in turn, contains a list of tracks that I have available to play on my PC.

My ultimate goal is to take this hierarchical structure and flatten it into a listbox view similar to the following:

Traditional (pre-LINQ) programming would have achieved this through a simple nested-loop construct. Starting at the top level folder, loop through the subfolders populating the ListBox’s ListItems as you go.

This method works well enough, but LINQ gives us a much more elegant solution. In the above code, we have three different objects. The first object maintains artist directories, the second maintains album directories, and the third maintains track information. By using LINQ, we can create an anonymous type which will hold only the pieces of data we are interested in dealing with. The following method is the LINQ equivalent to the above code.

The from clauses in this query retrieve data from each of the individual folders and merge them into a flattened hierarchy. The select statement then creates a new anonymous type which contains four properties: Track, Album, Artist, and TrackFile. Not only has the data been reduced to only the data we care about, it has been renamed to make more sense for our application. Finally, we loop through the data returned in the query, adding the values into the ListView.
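A query along these lines might look like the sketch below. The MusicLibrary class, the musicRoot parameter, and the "*.mp3" filter are assumptions, and the results are returned as strings instead of being added to a ListView so that the snippet stands alone.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MusicLibrary
{
    // Flattens artist/album/track folders into one row per track.
    public static List<string> Flatten(string musicRoot)
    {
        var tracks =
            from artist in new DirectoryInfo(musicRoot).GetDirectories()
            from album in artist.GetDirectories()
            from track in album.GetFiles("*.mp3") // assumption: only mp3 files
            select new
            {
                Track = Path.GetFileNameWithoutExtension(track.Name),
                Album = album.Name,
                Artist = artist.Name,
                TrackFile = track.FullName
            };

        return tracks
            .Select(t => t.Artist + " / " + t.Album + " / " + t.Track)
            .ToList();
    }
}
```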

So, really, what is so amazing? Is the second method really that much better than the first?

What I really think will set LINQ apart is the fact that it is platform agnostic when it comes to querying data. The same syntax can be used to query databases, objects, and Xml. Furthermore, one is able to extract and merge exactly what one needs from these different types and combine them into specialized types on the fly. Sorting and filtering data in LINQ is very simple. Want to see only music by the Big Horn Brass with the tracks in descending order? No problem.
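For instance, a filter and a descending sort take one clause each. The TrackInfo class and the method name below are hypothetical; the where and orderby clauses are the point.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row, as produced by a query over the music folders.
public class TrackInfo
{
    public string Artist { get; set; }
    public string Album { get; set; }
    public string Track { get; set; }
}

public static class LibraryQueries
{
    // Only the given artist's tracks, sorted by track name in descending order.
    public static List<TrackInfo> ByArtistDescending(IEnumerable<TrackInfo> library, string artist)
    {
        var query =
            from t in library
            where t.Artist == artist
            orderby t.Track descending
            select t;
        return query.ToList();
    }
}
```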

The biggest drawback I see at the moment is the syntax for non-trivial queries. I’d originally wanted to merge in some Xml comments to a couple of tracks using the equivalent of a LEFT OUTER JOIN, but never quite got it to work right. The syntax for Xml seems to be entirely new and is not immediately intuitive to someone who has used the old model. (Granted, I’ve probably played with LINQ a total of three hours, now, so I can’t complain too much.) I’ve picked up a LINQ book, and will work to figure that one out. I suspect that the syntax will become easier with time and practice.