2005.11.30

Ever wonder about the exact difference between data types in SQL Server? This handy reference explains them and gives some advice on their use.

I hadn't realized before that SQL Server will create GUIDs with the "uniqueidentifier" type. That's what I get for using O/R mapping abstraction layers all the time. Speaking of that, I've been using the Wilson ORMapper lately and really like it.

2004.12.08

I recently mentioned that Improving is planning an upcoming product release. The product is an open source toolkit for object-relational mapping and database-related code generation, Or. We view it as a competitor to NHibernate. Although it probably won't be as feature-rich near-term, it will definitely hit a useful sweet spot, and give Improving a platform to discuss a number of design decisions, corners of the .NET Framework, and, of course, object-relational mapping techniques.

As part of the release, we wanted to have sample applications that demonstrate its use (in lieu of quality documentation that should follow). We decided tactically that rather than create a canned application, it would make more sense to start with some existing, open-source sample applications, port them to use the framework, and then do a before/after comparison to see whether, and how, the changes improved the application (gratuitous marketing: Improving. It's what we do.) For the first release, we thought we would start with Microsoft's own sample TimeTracker application from ASP.NET and Dottext, the popular, open source blog server application. Todd is hacking up the TimeTracker, and I am taking on the .Text code base.

Now.

Before I get into this, I think the following disclaimer needs to be shared. Design is full of trade-offs, and many designs are [thank you, Mr. Weinberg] how they are because they got that way. It is probably reasonable to assume that, with the exception of a few choice pieces of code, there is always a way to improve on an existing design. That is one of the core tenets of our brand, Improving -- there is *always* room for Improving. I respect and applaud Microsoft for releasing the sample TimeTracker app as open source to provide a concrete example of how to use ASP.NET. Even more than that, I respect and applaud Scott Watermasysk for giving his code to the community, even though he reserved the right to take it back and, under new employment, appears to have done so. We are giving our code to the community as well with the release of Or. We fully expect that someone not so close to our code could look at it and find tons of opportunities for Improving the design, and we welcome it. If you have great ideas that you want to share, email or at improvingtech.com and ask to become a committer. We are going to be very picky about who gets on the committer list, so we expect you to prove yourself with real live code.

Now.

Back to the application I have been focusing on - .Text. .Text is a blogging platform that supports a variety of syndication formats and multiple blogs. It is backed by a SQL Server database and has a very consistent design, which makes it fairly easy to understand and modify. All of these things, at first glance, make it appear to be a good candidate for the port to Or. As I dug deeper, however, I found a number of twists that made the port challenging, and I was distracted by a number of coding techniques that I don't typically practice. For example, I almost never create typesafe collections. I want to be able to support typesafe collections in Or, but I haven't really taken the time to test things out. I thought I would dig a little deeper in .Text to understand how it was using them, and the result of that investigation is this post.

Typesafe collections provide a [go figure] typesafe data structure for managing multiple instances of the same type of object. I really didn't see typesafe collections much before I developed with Microsoft technologies. To me, typesafe collections have the potential to provide a number of benefits. First, you can restrict the type of objects that are passed into a collection. This ensures that the objects you get out are also of that type. Second, if you go further in your implementation of typesafe collections than the average [.NET] implementation, you can eke a small bit of performance out of your collection by backing it with a typesafe array, allowing you to avoid the cost of casting to and from your target contained type, which adds at least a castclass IL instruction (isinst if you use the as operator), which, relative to some other IL ops, is fairly expensive. If the majority of the time in your application is spent iterating over collections of objects, this could actually have a pretty significant impact on the responsiveness. This has never [ever... EVER] been the bottleneck for me, but your mileage may vary. I can imagine plenty of CPU-intensive cases that might yield such results, and I know close personal friends who have had to deal with similar problems in Java many moons ago, but we all know that *that* was because it was implemented in Java, right guys? Come on? You're with me, aren't you? [Of course not].
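
To make the second point concrete, here is a minimal sketch of a collection backed by a typed array instead of an ArrayList of object. It is not taken from .Text; Keyword and KeywordArrayCollection are hypothetical names, and the growth policy is just one reasonable choice.

```csharp
using System;

// Hypothetical element type, standing in for whatever the collection holds.
public class Keyword
{
    public string Word;
    public Keyword(string word) { Word = word; }
}

// A typesafe collection backed by a Keyword[] rather than an ArrayList of object.
public class KeywordArrayCollection
{
    private Keyword[] items = new Keyword[4];
    private int count;

    public int Count { get { return count; } }

    public void Add(Keyword keyword)
    {
        if (count == items.Length)
        {
            // Out of room: grow by doubling and copy the existing elements over.
            Keyword[] bigger = new Keyword[items.Length * 2];
            Array.Copy(items, bigger, count);
            items = bigger;
        }
        items[count++] = keyword;
    }

    // No cast on the way out: the array element is already typed as Keyword.
    public Keyword this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            return items[index];
        }
    }
}
```

The price is that you now own the resizing and bounds checking that ArrayList would otherwise do for you.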

For what it's worth, my experience has shown that the real bottleneck is almost always I/O, and it is typically disk first, network second, unless you're doing something really stupid, er, I mean, naive, er, I mean innocent, like accessing properties on distributed objects, which everyone should know by now is a bad idea. There are some who would say distributed objects at all are a bad idea. I definitely prefer [text-based] message passing à la services or REST. Back to our story.

As I dug deeper on the typesafe collections, I noticed a number of things. First, the code that accesses the collections in .Text almost always looks like the following:
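
The listing that belonged here did not survive, so what follows is only a sketch of the pattern the next paragraph describes: a null check, a Count check, and then an indexed loop. Keyword, KeywordCollection, and TotalLength are hypothetical names, not the actual .Text code.

```csharp
using System;
using System.Collections;

// Hypothetical element type.
public class Keyword
{
    public string Word;
    public Keyword(string word) { Word = word; }
}

// A bare-bones typesafe collection in the CollectionBase style.
public class KeywordCollection : CollectionBase
{
    public int Add(Keyword value) { return List.Add(value); }

    // The downcast from object happens here, inside the indexer.
    public Keyword this[int index] { get { return (Keyword)List[index]; } }
}

public class KeywordProcessor
{
    // Hypothetical consumer: sums the lengths of all keywords.
    public static int TotalLength(KeywordCollection keywords)
    {
        int total = 0;
        // Defensive null check, then a cheap Count check, then an indexed loop.
        if (keywords != null && keywords.Count > 0)
        {
            for (int i = 0; i < keywords.Count; i++)
            {
                Keyword keyword = keywords[i];
                total += keyword.Word.Length;
            }
        }
        return total;
    }
}
```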

Without actually being responsible for designing the code, I can only presume that the code is written this way for performance reasons. First, the null check is a paranoid-but-reasonable defensive measure. Second, the Count property from CollectionBase (which ends up delegating to an internal ArrayList, for what it is worth) is pretty darn cheap, as method calls go. Compare that with this slightly cleaner, syntactic-sugar-coated version that assumes that the GetKeywords() method will at least return an empty collection (which it does):
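
That listing is missing as well. A sketch of the foreach form, assuming a hypothetical GetKeywords() stand-in that returns an empty list rather than null, might look like this:

```csharp
using System;
using System.Collections;

// Hypothetical element type.
public class Keyword
{
    public string Word;
    public Keyword(string word) { Word = word; }
}

public class KeywordSource
{
    // Hypothetical stand-in for GetKeywords(): never returns null,
    // only a (possibly empty) list.
    public static IList GetKeywords(params string[] words)
    {
        ArrayList list = new ArrayList();
        foreach (string word in words) list.Add(new Keyword(word));
        return list;
    }

    public static int TotalLength(IList keywords)
    {
        int total = 0;
        // foreach obtains an enumerator and casts each element to Keyword for us.
        foreach (Keyword keyword in keywords)
        {
            total += keyword.Word.Length;
        }
        return total;
    }
}
```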

This cleaner version hides the fact that under the covers, an Enumerator object is created, which is certainly more expensive than a null check and a Count check. Both implementations hide the fact that the reference is downcast, i.e. there is a castclass instruction [and its friends], with the typesafe collection having its downcast occurring in the indexer. I could just as easily do the same checks before the foreach version to gain the modicum of performance in the null case. Once again, I feel compelled to state that none of this has ever been material to an application I have written, but if you think it may be, PAYF. Again, if the typesafe implementation is backed by a typesafe array, a little more performance can be had. Given that, it would appear that the only benefit left to having a typesafe collection would be so that we could ensure that we are only allowing particular types to enter the collection. So, we get to this point and we have to ask ourselves, are the hundred or so lines of code necessary to build a typesafe collection in the manner that it is done in .Text worth the effort to gain the security of ensuring that we only add types we expect to add? For me, the answer is "No." This is largely due to the fact that I trust myself to only add in types that are homogeneous at some level in a hierarchy below object. I rarely have a situation where I have lots of different code adding objects to the same collection. If I do, the collection is usually an instance variable on another class, and I can protect the collection by implementing a mutator for the collection along the lines of:
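
The original mutator listing is gone; the following is a sketch of the idea under my own assumptions. Entry, AddKeyword, and the read-only Keywords view are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections;

// Hypothetical element type.
public class Keyword
{
    public string Word;
    public Keyword(string word) { Word = word; }
}

// Hypothetical owner of the collection.
public class Entry
{
    // A plain, untyped list kept private to the owning class.
    private readonly ArrayList keywords = new ArrayList();

    // The only way in: the parameter type keeps the list homogeneous.
    public void AddKeyword(Keyword keyword)
    {
        if (keyword == null) throw new ArgumentNullException("keyword");
        keywords.Add(keyword);
    }

    // Callers get a read-only wrapper, so they cannot bypass the mutator.
    public IList Keywords
    {
        get { return ArrayList.ReadOnly(keywords); }
    }
}
```

One typed method on the owning class does the job that a hundred-line collection class would otherwise do.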

I went through and deleted all of the typesafe collections and modified the code that accessed them to use foreach statements, and the code looked to me to be considerably cleaner and the number of classes and files that needed to be maintained was reduced. I *did* have to add back in a PagedList implementation and an ImageCollection, because they seemed to actually differentiate themselves. For the ImageCollection, I didn't actually extend CollectionBase, but instead just held on to an IList instance. So, the moral of the story is that typesafe collections can add value to a system, but they do come at a cost, and that cost should be weighed relative to the amount of effort necessary not only to [generate] the code, but also to maintain and especially to understand the system. Fewer extraneous types means fewer things to understand means a shorter ramp. I have included an example of a typical typesafe collection implementation from the .Text source code to demonstrate the amount of code we are talking about. It isn't a huge amount of code, until you realize that there are 14 other classes of identical size. Fortunately, the code is very much boilerplate, and there are many templating systems, e.g. CodeSmith, QuickCode, CodeRush, etc. that allow you to generate said boilerplate.
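
For the "hold on to an IList" approach mentioned above, a sketch might look like the following. This is not the actual ImageCollection from the port; the Image type here is a hypothetical placeholder, and whatever behavior set the real class apart is not shown.

```csharp
using System;
using System.Collections;

// Hypothetical element type (not System.Drawing.Image).
public class Image
{
    public string File;
    public Image(string file) { File = file; }
}

// Wraps an IList instead of extending CollectionBase.
public class ImageCollection : IEnumerable
{
    private readonly IList images = new ArrayList();

    public int Count { get { return images.Count; } }

    public void Add(Image image) { images.Add(image); }

    public Image this[int index] { get { return (Image)images[index]; } }

    // Implementing IEnumerable is enough to keep foreach working for callers.
    public IEnumerator GetEnumerator() { return images.GetEnumerator(); }
}
```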

2004.08.23

I agree that implementation how-to comments are usually redundant (if appropriate, a single comment that names the algorithm being used is good), but I like interface assumptions documented.

*Unless* I am building a closed-source library, framework, or product that someone else is going to program against, and I correspondingly need good documentation for the developers, I typically want to go one of two places to understand interface assumptions: either the regression test suite (you have one, right?), or the code itself.

For example, consider the following pseudo-code (pseudo so I don't have to worry about it compiling):

I do think it is worth noting at this point that I am talking about my personal preference and coding style. If my team agrees to a particular granularity for commenting, I am going to stick to it, if only for the esprit de corps. If I am on a distributed development team, and the disconnectedness forces higher ceremony and more verbose documentation (apart from the source code), then I am all Visio and Word. I just prefer face to face communication, tests, and code as the ultimate arbiter of truth. And I still long for the "Hide the comments" option.

2004.08.21

I don't like source code comments. Most of the time, they come in one of two flavors: out-of-date or clueless. One of the biggest wishes I have on my list for Visual Studio is a "completely hide comments all of the time" option.

Comments should explain why. Only add comments when it isn't obvious from the code. Granted, the person who wrote the code is usually the least appropriate person to figure out what is or isn't obvious about the code they have written. Also, you might have to comment a bit more for closed source libraries and products that require developer documentation. Caveat Lector.

Comments should not explain what or how. Your source code should. Please do me a favor, do yourself a favor, and learn to like reading source code. Personally, if I comment source code, I usually tend to comment the key extensibility points in the architecture using a class comment at the top of the source file. I try hard to avoid most other comments. Of course, you might hate my software because of that, but I don't really care that much.

Unless you are writing a tutorial on how to use a particular language, please don't add comments that explain how the language works. For example,
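
The example that stood here was lost. As a stand-in, here is a made-up method (Counter.SumTo is hypothetical) in which every comment merely restates what the language already says:

```csharp
using System;

public class Counter
{
    public static int SumTo(int n)
    {
        // Declare an integer variable named total and set it to zero.
        int total = 0;
        // Loop from 1 to n, adding one to i each time through.
        for (int i = 1; i <= n; i++)
        {
            // Add i to total.
            total += i;
        }
        // Return total to the caller.
        return total;
    }
}
```

Delete all four comments and nothing is lost. The comment that follows is a different kind of thing: it explains why the code is the way it is.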

// Look -- this is a *performance critical section.* I know it looks like a serious hack, but great care
// was taken to ensure this would meet such and such automated benchmark that you can run as follows.
// Please ensure you don't f it up.

One thing that particularly peeves me is including licensing information, open-source or otherwise (read: useless information), in source files. It isn't that I mind the license being included or the fact that it is open-source. I love open-source software. I love to read open-source software as a way to learn various techniques or libraries. Please, please, dear open source community, please either start including links to the license or put it at the end of the source file rather than the beginning.