Clear API Design

When designing object models, there are often times when a Dictionary is the best choice for rapid lookup and access of items. However, when attempting to make those object models as intuitive and simple as possible, sometimes the fact that a dictionary is being used is an implementation detail and needn’t be exposed externally.

The consumers of our API may not care about iterating through key-value pairs, but now they have to remember to use this Values property or face the wrath of red squiggles and compiler errors when they forget. Of course, there’s a pattern we could use to hide this dictionary inner goo from the outside.
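The original listing for that pattern isn't reproduced here, but a minimal sketch of it presumably looks something like this (Database and Table are illustrative names, not the original code):

```csharp
using System.Collections.Generic;

public class Table
{
    public Table(string tableName) { TableName = tableName; }
    public string TableName { get; private set; }
}

public class Database
{
    // The dictionary is an implementation detail, kept private.
    private Dictionary<string, Table> tables = new Dictionary<string, Table>();

    // Externally, consumers see only a simple sequence of tables.
    public IEnumerable<Table> Tables
    {
        get { return tables.Values; }
    }

    // Internally, we still get fast keyed lookup.
    internal Table GetTable(string name) { return tables[name]; }
}
```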

Now from the outside, we can foreach over db.Tables, but inside we can use the Dictionary for fast access to elements by key.

The Need for a Dictionary-List Hybrid

This is an either-or approach: that is, it assumes that the API consumer is better off with an IEnumerable collection and won’t have any need for keyed access to data (or even adding data, in this case). How can we have the best of both worlds, with the ability to write this kind of code?
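The desired usage is presumably along these lines (a reconstruction, with db and Table as illustrative names):

```csharp
// Enumerate like a list...
foreach (Table table in db.Tables)
    Console.WriteLine(table.TableName);

// ...but also look up (and add) by key, like a dictionary.
Table vendors = db.Tables["Vendors"];
db.Tables.Add(new Table("Customers"));
```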

This is a hybrid of a Dictionary and a List. (Don’t confuse this with a HybridDictionary, which is purely a dictionary with runtime-adapting storage strategies.) It provides an IEnumerable<T> enumerator (instead of the IEnumerable<KeyValuePair<K, T>> enumerator that Dictionary exposes), as well as an indexer for convenient lookup by key.

There’s another aspect of working with dictionaries that has always bugged me:

db.Tables.Add("Vendors", new Table("Vendors"));

This is repetitive, plus it says the same thing twice. What if I misspell my key in one of these two places? What I’d really like is to tell my collection which property of the Table class to use, and have it fill in the key for me. How can I do that? Well, I know I can select a property value concisely (in a compiler-checked and refactoring-friendly way) with a lambda expression. So perhaps I can supply that expression in the collection’s constructor. I decided to call my new collection KeyedList<K, T>, which inherits from Dictionary so I don’t have to do all the heavy lifting. Here’s how construction looks:

Tables = new KeyedList<string, Table>(t => t.TableName);

Now I can add Table objects to my collection, and the collection will use my lambda expression to fill in the key for me.

Tables.Add(new Table("Vendors"));

How does this work, exactly? Here's a first cut at our KeyedList class:
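The original listing isn't included here, but a first cut in the spirit described would look something like this sketch:

```csharp
using System;
using System.Collections.Generic;

// A Dictionary that fills in keys itself, using a caller-supplied
// key-selection function.
public class KeyedList<K, T> : Dictionary<K, T>, IEnumerable<T>
{
    private Func<T, K> getKeyForItem;

    public KeyedList(Func<T, K> getKeyForItem)
    {
        this.getKeyForItem = getKeyForItem;
    }

    // Add an item; its key is computed from the item itself.
    public void Add(T item)
    {
        Add(getKeyForItem(item), item);  // base Dictionary.Add
    }

    // foreach yields values (T), not KeyValuePair<K, T>;
    // the base class indexer still provides lookup by key.
    public new IEnumerator<T> GetEnumerator()
    {
        return Values.GetEnumerator();
    }
}
```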

This is still pretty simple, but I can think of one thing that it’s missing (aside from a more complete IList<T> implementation). With a collection class like this, which has tightly integrated knowledge of the relationship between an item’s key property and the key in the Dictionary, what happens when we change that key property on the item? Suddenly it no longer matches the dictionary key, and we have to remember to update the dictionary in an explicit, separate step whenever this happens. That’s a great opportunity to forget something and introduce a bug into our code. How could our KeyedList class track and update this for us?

Unfortunately, there’s no perfect solution. “Data binding” in .NET is weak in my opinion: it requires our classes to implement INotifyPropertyChanged to participate, and even then we only get the name of the property that changed (supplied as a string), with no idea what the old value was unless we store it somewhere ourselves. Automatically injecting data binding code into all classes isn’t practical, of course, even using AOP (since many BCL classes, for example, reside in signed assemblies). Hopefully a future CLR will be able to perform some tricks, such as intelligently and dynamically modifying the classes in which other classes’ data binding code declares interest, so we can have effortless and universal data binding.

Now back to reality. I want to mention that although my code typically works just as well in Compact Framework as it does in Full Framework, I’m going off the reservation here. I’m going to be using expression trees, which are not supported in Compact Framework at all.

Expressions

The Expression class (in System.Linq.Expressions) is really neat. With it, you can wrap a delegate type to create an expression tree, which you can explore and modify, and at some point even compile into a function which you can invoke. The best part is that lambda expressions can be assigned to Expression types in the same way that they can be assigned to normal delegates.

Func<int> func = () => 5;
Expression<Func<int>> expr = () => 5;

The first line defines a function that returns an int, and a function is supplied as a lambda that returns the constant 5. The second line defines an expression tree of a function that returns an int. This extra level of indirection allows us to take a step back and look at the structure of the function itself in a precompiled state. The structure is a tree, which can be arbitrarily complex. You can think of this as a way of modeling the expression in a data structure. While func can be executed immediately, expr requires that we compile it by calling the Compile method (which generates IL for the method and returns Func<int>).
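For completeness, invoking the two forms side by side looks like this:

```csharp
int x = func();              // invoke the delegate directly
int y = expr.Compile()();    // compile the expression tree, then invoke the result
```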

The two definitions are equivalent in what they describe, and so are the two ways of calling them: func() and expr.Compile()() produce the same result, even though the compiled-expression call looks funky.

Synchronizing Item & Dictionary Keys

So why do we need expressions? Because we need to know the name of the property we’ve supplied in our KeyedList constructor. You can’t extract that information out of a function (supplied as a lambda expression or otherwise). But expressions contain all the metadata we need. Note that for this synchronization to work, it requires that the items in our collection implement INotifyPropertyChanged.

This is tedious work, and though there are some patterns and code snippets I use to ease the burden a little, it’s still a lot of work to go through to implement such a primitive ability as data binding.

In order to get at the expression metadata, we’ll have to update our constructor to ask for an expression:
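A sketch of that change (field names are mine, not necessarily the original's): we keep the expression for later inspection, and compile it once so adds stay fast.

```csharp
private Expression<Func<T, K>> getKeyExpression;
private Func<T, K> getKeyForItem;

// Ask for an Expression so we can examine its tree; compile it once
// so computing keys on Add remains cheap.
public KeyedList(Expression<Func<T, K>> getKeyExpression)
{
    this.getKeyExpression = getKeyExpression;
    this.getKeyForItem = getKeyExpression.Compile();
}
```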

One caveat about this approach: our shadowing method Add will unfortunately not be called if accessed through a variable of the base class. That is, if you assign a KeyedList object to a Dictionary object, and call Add from that Dictionary variable, the Dictionary.Add method will be called and not KeyedList.Add, so synchronization of keys will not work properly in that case. It’s extremely rare that you’d do such a thing, but I want to point it out regardless. As inheritor of a base class, I would prefer the derived class be in fuller control of these behaviors, but we work with what we have. I’ll actually take advantage of this later on in a helper method.

Finally, the tricky part. We need to examine our lambda’s expression tree and extract the property or field name from it. We’ll compare that to the property name reported to us as changed. The comparison is actually done between two MemberInfo variables, which is possible because reflection ensures that only one MemberInfo object will exist for each member. The MemberExpression object, which inherits from Expression, possesses a Member property, and the other we get from typeof(T).GetMember. Here’s what that looks like:
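A sketch of that handler, assuming the constructor stored the lambda in a getKeyExpression field and its compiled form in getKeyForItem (and that items are reference types, as INotifyPropertyChanged implementers generally are):

```csharp
// Called when an item we previously added reports a property change.
// Requires System.ComponentModel, System.Reflection, System.Linq.Expressions.
void item_PropertyChanged(object sender, PropertyChangedEventArgs e)
{
    // The member our lambda reads (e.g. t => t.TableName yields TableName)...
    MemberInfo keyMember = ((MemberExpression)getKeyExpression.Body).Member;

    // ...compared with the member reported as changed. This relies on
    // reflection handing back the same MemberInfo instance per member.
    MemberInfo changedMember = typeof(T).GetMember(e.PropertyName)[0];
    if (keyMember != changedMember)
        return;

    // The key property changed: find the stale entry, then re-add the
    // item under its freshly computed key.
    T item = (T)sender;
    K oldKey = default(K);
    bool found = false;
    foreach (KeyValuePair<K, T> pair in (IEnumerable<KeyValuePair<K, T>>)this)
    {
        if (object.ReferenceEquals(pair.Value, item))
        {
            oldKey = pair.Key;
            found = true;
            break;
        }
    }
    if (found)
    {
        Remove(oldKey);
        Add(getKeyForItem(item), item);  // the base Dictionary.Add
    }
}
```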

This code makes an important assumption: namely, that a lambda expression will be used, and that it will contain a single field or property access. It does not support composite or calculated keys, such as (t.SchemaName + "." + t.TableName), though supporting them would be possible. I’m currently working on a method that recursively explores an Expression tree and checks for member access anywhere in the tree, to support scenarios like this. For now, and for the purpose of this article, we’ll stick to the simple case of single member access.

I found that having access to the list of KeyValuePairs was actually useful in my code, and to keep the PropertyChanged handler concise, I added a new KeyValuePairs property to expose the base Dictionary’s enumerator, which you can find in the complete listing of the KeyedList class toward the end of this article. I now have two iterators; and the way I’ve flipped it around, the default iterator of the base class has become a secondary, named iterator of KeyedList.

Here is a test program to demonstrate the functionality and flexibility of the KeyedList class.
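The original test program isn't reproduced here, but a small program in the same spirit might read as follows, assuming the complete KeyedList (with the key-synchronization behavior) is in scope and using a hypothetical Table class that raises PropertyChanged:

```csharp
using System;
using System.ComponentModel;

// Illustrative item type that participates in change notification.
public class Table : INotifyPropertyChanged
{
    private string tableName;
    public Table(string tableName) { this.tableName = tableName; }

    public string TableName
    {
        get { return tableName; }
        set
        {
            tableName = value;
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs("TableName"));
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;
}

class Program
{
    static void Main()
    {
        var tables = new KeyedList<string, Table>(t => t.TableName);
        tables.Add(new Table("Vendors"));
        tables.Add(new Table("Customers"));

        // List-style enumeration...
        foreach (Table table in tables)
            Console.WriteLine(table.TableName);

        // ...and dictionary-style lookup.
        Console.WriteLine(tables["Vendors"].TableName);

        // Changing the key property should re-key the dictionary entry.
        tables["Vendors"].TableName = "Suppliers";
        Console.WriteLine(tables["Suppliers"].TableName);
    }
}
```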

Conclusion

As with the non-binary Tree data structure I created in this article, I prefer to work with more intelligent object containers that establish tighter integration between themselves and the elements they contain. I believe this reduces mental friction, both for author and consumer of components (or the author and consumer roles when they are the same person), and allows a single generic data structure to be used where custom collection classes are normally defined. Additionally, by using data structures that expose more flexible surface areas, we can often reap the benefits of having powerful lookup features without locking ourselves out of the simple List facades that serve so well in our APIs.

The Achilles’ heel of this solution is a weak and un-guaranteed data binding infrastructure, combined with the lack of support for Expressions in Compact Framework. In other words, items have to play along, and it’s not platform universal.

Clearly, more work needs to be done. Event handlers need to be unhooked when items are removed, optimizations could be done to speed up key synchronization, and more complex expressions could be supported without too much trouble. Ultimately what I’d like to see is a core collection class with the ability to add and access multiple indexes (such as with a database), instead of presuming that a Dictionary with its solitary key is all we can or should use. The hashed key of a Dictionary seems like a great adornment to an existing collection class, rather than a hard-coded stand-alone structure. But I think this is a good start toward addressing some of the fundamental shortcomings of these existing approaches, and hopefully demonstrating the value of more intelligent collection and container classes.

Other Implementations

I was at a loss for a while as to what to call this collection class; I considered DictionaryList (and ListDictionary), as well as HashedList, before arriving at KeyedList. To my amazement, I found several other implementations of the same kind of data structure with the same name, so it must be a good name. The implementations here and here are more complete than mine, but neither auto-assigns keys with a key-selection function nor updates dictionary keys using data binding, which ultimately is what I’m emphasizing here. Hint: it wouldn’t be tough to combine what I have here with either of them.

It’s hard to argue that code like this is succinct, elegant, and easy to read and write. It’s a common enough scenario to check for membership in a set that it warrants support in a language, although from inquiries I’ve made of the Microsoft C# compiler team, the syntax presents some parsing complexity and therefore implementation difficulty.
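Pascal tests membership directly with its in operator (along the lines of if c in ['A', 'E', 'I', 'O', 'U'] then ...). The straightforward C# translation being dissected here presumably looks like this reconstruction:

```csharp
// Membership test via Array.IndexOf, comparing against -1:
if (Array.IndexOf(new char[] { 'A', 'E', 'I', 'O', 'U' }, c) != -1)
{
    // ...
}
```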

But this is a lot of work (relative to the task): calling a static member on a type, instantiating an array with the new operator, comparing the result to -1 (which is a detail of our approach rather than the essence of our intention), and obscuring the value we want tested (c, the primary actor in the conditional operation) in the middle of a busy syntax.
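An alternative uses Linq's Contains extension method (a reconstruction, assuming System.Linq is imported):

```csharp
using System.Linq;

if (new char[] { 'A', 'E', 'I', 'O', 'U' }.Contains(c))
{
    // ...
}
```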

Using the Contains extension method is better, as it more directly expresses our intention, but we’re still creating the array explicitly and using symbols on the keyboard that slow us down–if we had a whole bunch of these to write all at once, it would create some micro-irritation. As with the Array.IndexOf example, we’re also reversing the flow of thought from the original Pascal example, specifying a set and then checking for membership, instead of starting with a member and testing if it belongs to a set. The end result is the same, but I personally like Pascal’s direction and flow.

It’s pretty difficult to match the simplicity of Pascal’s in operator. But in the spirit of stubbornness and mild obsession, I came up with this extension method to reverse the direction of Contains:
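The method itself isn't shown above; a sketch consistent with the description (generic, params-based, and looping directly rather than delegating to Contains) might be:

```csharp
public static class ObjectExtensions
{
    // IN-operator-style membership test: c.In('A', 'E', 'I', 'O', 'U')
    public static bool In<T>(this T obj, params T[] values)
    {
        foreach (T value in values)
            if (object.Equals(value, obj))
                return true;
        return false;
    }
}
```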

With this generic extension method defined on Object, I can now write strongly-typed IN-operator-style conditional expressions like so:

if (c.In('A', 'E', 'I', 'O', 'U')) // ...

if (column.DataType.In("nchar", "nvarchar", "ntext")) // ...

I could have implemented the method more simply, like this:

return values.Contains(obj);

… but that involves another method call, and I’d rather not introduce any additional overhead for a commonly-used, global operator if it’s this easy not to.

One obligatory cautionary note: I don’t recommend adding too many extension methods on Object because of the potential for cluttering Intellisense (which currently doesn’t have an option for filtering out extension methods), but this is one of those general-purpose operators that I’d like to always have available; thus the exception.

In setting up virtual machines for development, I’ve repeatedly run into trust issues accessing solutions on network shares. Many blogs advise using the .NET 1.1 Configuration tool, which is no longer shipped with Visual Studio. You can still get it by installing the old .NET Framework 1.1 SDK first, and then going through a series of installations to bring your machine up to date with the remaining versions and toolsets. I went through the process once, and it’s very undesirable, especially if you build or rebuild development machines more often than you’d like to admit.

So in my latest round of setups, I came across Robert McLaws’ article on the proper caspol syntax for establishing Full Trust for a specific network share, based on this Microsoft article whose title is overly specific. I’ll reiterate that command here for your convenience:
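From the descriptions, it follows the standard caspol membership-group syntax, roughly like the sketch below; ComputerName and ShareName are placeholders, the framework path will vary by version, and the exact slash count is the detail to verify against Robert's article:

```
%windir%\Microsoft.NET\Framework\v2.0.50727\caspol.exe -m -ag 1.2 -url file:////ComputerName/ShareName/* FullTrust
```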

To Robert’s point, who would have thought to include four forward slashes?

Be aware that you’ll get an access error in Vista with UAC on, unless you run with elevated privileges.

I’ve done this on Windows Vista 32bit and it seems to be working great. Even better, I don’t need to use a VMWare Virtual Disk (which itself has some kind of trust or compatibility issue with Visual Studio, due to being VMFS instead of NTFS), or a Physical Disk, which prevents snapshots unless you first disconnect the disk. I talked about these VM setup issues in this article.

A lot has been said and written about dynamic programming, metaprogramming, and language syntax extensions–not just academically over the past few decades, but also as a recently growing buzz among the designers and users of mainstream object-oriented languages.

Dynamic Programming

After a scene-setting tour through the history and evolution of C#, Anders addressed how C# 4.0 would allow much simpler interoperation between C# and dynamic languages. I’ve been following Charlie Calvert’s Language Futures website, where they’ve been discussing these features early on with the development community. It’s nice to see how seriously they take the feedback they’re getting, and I really think it’s going to have a positive impact on the language as a whole. Initial thoughts revolved around creating a new block for dynamic code, something like: dynamic { DynamicStuff.SomeUndefinedProperty = "whatever"; }.

But at the PDC we saw that instead dynamic will be a type for our dynamic objects, and so dynamic lookup of members will only be allowed for those variables. Anders’ demo showed off interactions with JavaScript and Python, as well as Office via COM, all without the ugly Type.Missing parameters (optional parameter support also played a part in that). Other ideas revolved around easing Reflection access and enabling dynamic access to XML document nodes.

Meta-Programming

At the end of his talk, Anders showed a stunning demo of metaprogramming working within C#. It was an early prototype, so not all language features were supported, but it worked similarly to Eval: the code was constructed inside a string and then compiled at runtime. Yet it was flexible and powerful enough that he could create delegates to functions that he Eval’ed up into existence. Someone in the audience asked how this was different from Lisp macros, to which Anders replied: “This is basically Lisp macros.”

Before you get too excited (or worried) about this significant bit of news, Anders made no promises about when metaprogramming would be available, and he subtly suggested that it may very well be a post-4.0 feature. As he said in the Future of Programming Panel, however: “We’re rewriting the compiler in managed code, and I’d say one of the big motivators there is to make it a better metaprogramming system, sort of open up the black box and allow people to actually use the compiler as a service…”

Regardless of when it arrives, I hope they will give serious consideration to providing syntax checking of this macro or meta code, instead of treating it blindly at compile-time as a “magic string”, as has so long plagued the realm of data access. After all, one of the primary advantages of Linq is to enable compile-time checking of queries, to enforce not only strict type checking, but to also more fundamentally ensure that data sources and their members are valid. The irregularity of C#’s syntax, as opposed to Lisp, will make that more difficult (thanks to Paul for pointing this out), but I think most developers will eventually agree it’s a worthwhile cause. Perhaps support for nested grammars in the generic sense will set the stage for enabling this feature.

Language Syntax Extensions

If metaprogramming is about making the compiler available as a service, language extensions are about making the compiler service transparent and extensible.

The majority (but not all) of the language design panel stressed caution in evolving and customizing language syntax and discussed the importance of syntax at length, but they’ve been considering the demands of the development community seriously. At times Anders vacillated between trying to offer alternatives and admitting that, in the end, customization of language syntax by developers would prevail; and that what’s important is how we go about enabling those scenarios without destroying our ability to evolve languages usefully, avoiding their collapse from an excess of ambiguity and inconsistency in the grammar.

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute. And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming. And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before. I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0. And he’s right: if you squint, it almost looks like new syntax. The problem is that programmers don’t want to squint at their code. As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look. This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language.

One idea that came up several times (and which I alluded to above) is the idea of allowing nested languages, in a similar way that Linq comprehensions live inside an isolated syntactic context. C++ developers can redefine many operators in flexible ways, and this can lead to code that’s very difficult to read. This can perhaps be blamed on the inability of the C++ language to provide alternative and more comprehensive syntactic extensibility points. Operators are what they have to work with, so operators are what get used for all kinds of things, which change per type. But their meaning gets so overloaded, literally, that they lose any obvious (context-free) meaning.

But operators don’t have to be non-alphabetic tokens, and the addition of new keywords or symbols could be introduced in limited contexts, such as a modifier for a member definition in a type (to appear alongside visibility, overload, override, and shadowing keywords), or within a delimited block of code such as an r-value, or a curly-brace block for new flow control constructs (one of my favorite ideas and an area most in need of extensions). Language extensions might also be limited in scope to specific assemblies, only importing extensions explicitly, giving library authors the ability to customize their own syntax without imposing a mess on consumers of the library.

Another idea would be to allow the final Action delegate parameter of a function to be expressed as a curly-brace-delimited code block following the function call, in lieu of specifying the parameter within parentheses, and removing the need for a semicolon. For example, with a method defined like this:
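For illustration, here is a hypothetical method (Repeat is my name, not from any proposal) whose final parameter is an Action, along with the trailing-block form the idea would enable:

```csharp
public static void Repeat(int count, Action body)
{
    for (int i = 0; i < count; i++)
        body();
}

// How we must call it today:
Repeat(3, () => { Console.WriteLine("working..."); });

// The proposed extension would allow a curly-brace block in place of
// the final delegate parameter, with no trailing semicolon:
//
//     Repeat (3)
//     {
//         Console.WriteLine("working...");
//     }
```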

As Dr. T points out to me, however, the tricky part will consist of supporting local returns: in other words, when you call return inside that delegate’s code block, you really expect it to return from the enclosing method, not the one defined by the delegate parameter. Support for continue or break would also make for a more intuitive fit. If there’s one thing Microsoft does right, it’s language design, and I have a lot of confidence that issues like this will continue to be recognized and ultimately implemented correctly. In reading their blogs and occasionally sharing ideas with them, it’s obvious they’re as passionate about the language and syntax as I am.

The key for language extensions, I believe, will be to provide more structured extensibility points for syntax (such as control flow blocks), instead of opening up the entire language for arbitrary modification. As each language opens up some new aspect of its syntax for extension, a number of challenges will surface that will need to be dealt with, and it will be critical to solve these problems before continuing on with further evolution of the language. Think of all we’ve gained from generics, and the challenges of dealing with a more complex type system we’ve incurred as a result. We’re still getting updates in C# 4.0 to address shortcomings of generics, such as issues regarding covariance and contravariance. Ultimately, though, generics were well worth it, and I believe the same will be said of metaprogramming and language extensions.

Looking Forward

I’ll have much more to say on this topic when I talk about Oslo and MGrammar. The important points to take away from this are that mainstream language designers are taking these ideas to heart now, and there are so many ideas and options out there that we can and will experiment to find the right combination (or combinations) of both techniques and limitations to make metaprogramming and language syntax extensions useful, viable, and sustainable.