Today we botched a deployment to production. We discovered an unforeseen complication related to the production network topology, which means that (to simplify) transactions wouldn’t work.

I am happy about this. Not because of the failed production deployment, but because of how it happened. We worked on deploying the app for half an hour, found the problem, then spent the next few hours figuring out the exact cause and our options for fixing it. We actually came up with four ways of resolving the issue, ranging from the really hackish to the one I believe is appropriate, which would take some time to do using the proper process for production changes.

So, why am I happy?

We spent time on that from noon to 3 PM (look Ma, no midnight deployment), and we actually aren’t going live any time soon. The reason for deploying to production was to practice, to see what strange new things would show up in production. They did show up, and we will fix them at our leisure, without trying to solve a hairy problem while staring at the screen with bloodshot eyes and caffeine poisoning. At no point was anyone breathing down our necks, nor were there IT guys to argue with, emergency change requests, or any of the usual crap that comes with deploying to production if you aren’t being careful.

Practice the way you play, and since pushing to production is usually such a mess anyway, you want to practice it as soon as possible. And not just by deploying to another environment (we already did that last week) to test your deployment process. You want to deploy to the actual production environment, so you can get permissions, firewalls, subnets and all the rest sorted out up front.

Another example, which is naturally asynchronous, is the way most web sites create a new user. This is done in a two-stage process. First, the user fills in their details, and initial validation is done on the user information (most often, checking that the user name is not already taken).

Then we send an email to the user and ask them to validate their email address. Only when they validate their account do we create a real user and let them log in to the system. If a certain period of time elapses (24 hours to a few days, usually), we need to undo any action that we performed and make that user name available again.

When we want to solve the problem with messaging, we run into an interesting problem. The process of creating the user is a multi-message process, in which we have to maintain the current state. Not only that, but we also need to deal with the timing issues built into the problem.

It gets a bit more interesting when you consider the cohesiveness of the problem. Let us consider a typical implementation.

First, we have the issue of the CreateUser page:
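The original snippet is not reproduced here; as a rough illustration, a synchronous CreateUser handler might look something like this (all type and member names are hypothetical, not from the original post):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the synchronous CreateUser page logic --
// names here are illustrative, not the actual code from the post.
public class CreateUserPage
{
    // user name -> (email, validation token)
    private readonly Dictionary<string, Tuple<string, Guid>> reservations =
        new Dictionary<string, Tuple<string, Guid>>();

    public Guid CreateUser(string userName, string email)
    {
        // initial validation: no duplicate user names
        if (reservations.ContainsKey(userName))
            throw new InvalidOperationException("Duplicate user name: " + userName);

        var token = Guid.NewGuid();
        reservations[userName] = Tuple.Create(email, token);

        // sending the email inline blocks the response to the user --
        // exactly the latency problem discussed below
        SendValidationEmail(email, token);
        return token;
    }

    private static void SendValidationEmail(string email, Guid token)
    {
        Console.WriteLine("To {0}: follow .../validate?token={1}", email, token);
    }
}
```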

Then we have the process of actually validating the user:
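Again the original code is elided; a hedged sketch of the validation step, with hypothetical names and an in-memory store standing in for the real persistence:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the validation step -- names are illustrative.
public static class ValidateUser
{
    // reservations: user name -> expected validation token
    // users: the store of real, fully created users
    public static bool TryValidate(
        Dictionary<string, Guid> reservations,
        HashSet<string> users,
        string userName, Guid token)
    {
        Guid expected;
        if (!reservations.TryGetValue(userName, out expected) || expected != token)
            return false; // unknown user name or wrong token

        // only now do we create the real user and allow login
        reservations.Remove(userName);
        users.Add(userName);
        return true;
    }
}
```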

And, lest we forget, we have a scheduled job to remove expired user account reservations:
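The job itself isn't shown in this version of the post; a minimal sketch of what such a cleanup pass might look like (names and the in-memory store are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the scheduled cleanup job -- names are illustrative.
public static class ExpiredReservationsJob
{
    // reservations: user name -> when the name was reserved
    public static int RemoveExpired(
        Dictionary<string, DateTime> reservations, TimeSpan maxAge, DateTime now)
    {
        var expired = reservations
            .Where(kvp => now - kvp.Value > maxAge)
            .Select(kvp => kvp.Key)
            .ToList();

        // deleting the reservation makes the user name available again
        foreach (var name in expired)
            reservations.Remove(name);

        return expired.Count;
    }
}
```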

We have the logic for this single feature scattered across three different places, which execute at different times and likely reside in different projects.

Not only that, but if we want to make the experience pleasant for the user, we have a lot to deal with. Sending an email is slow. You don’t want to make it part of a synchronous process, if only because of the latency it adds before showing a response to the user. It is also an unreliable process. And we haven’t even started to discuss error handling yet.

For that matter, sending an email is not something that you should just new up an SmtpClient for. You have to make sure that someone doesn’t use your CreateUser page to bomb someone else’s mailbox, you need to keep track of emails for regulatory reasons, you need to detect invalid emails (from the SMTP response), etc.

Let us see how we can do this with async messaging. First, we will tackle registering the user and sending the email to validate their address:

When the user clicks on the link in their email, we have the following set of interactions:

And, of course, we need to expire the reserved username:

In the diagrams, everything that happens in the App Server is happening inside the context of a single saga. This is a single class that manages all the logic relating to the creation of a new user. That is good in itself, but I gain a few other things from this approach.
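The diagrams themselves aren't reproduced here, but the single saga class described above can be sketched roughly as follows. The message types, the IBus interface, and the timeout mechanism are all hypothetical stand-ins for whatever messaging infrastructure is actually in use:

```csharp
using System;

// Hypothetical messages -- stand-ins for the real contract types.
public class RegisterUser { public string UserName, Email; }
public class UserValidated { public string UserName; }
public class ReservationExpired { public string UserName; }

// Minimal stand-in for the messaging infrastructure.
public interface IBus
{
    void Send(string destination, object message);
    void RequestTimeout(TimeSpan delay, object timeoutMessage);
}

// One class holds all the state and logic for creating a new user.
public class CreateUserSaga
{
    private readonly IBus bus;
    public string UserName { get; private set; }
    public bool Completed { get; private set; }

    public CreateUserSaga(IBus bus) { this.bus = bus; }

    // 1. the registration form was submitted
    public void Handle(RegisterUser msg)
    {
        UserName = msg.UserName;
        // queue the validation email; we do not wait for SMTP inline
        bus.Send("email-service", new { msg.Email });
        // ask to be called back if nothing happens within the window
        bus.RequestTimeout(TimeSpan.FromHours(24),
            new ReservationExpired { UserName = msg.UserName });
    }

    // 2. the user clicked the validation link
    public void Handle(UserValidated msg)
    {
        bus.Send("user-service", new { Create = msg.UserName });
        Completed = true;
    }

    // 3. the timeout fired before validation -- release the user name
    public void Handle(ReservationExpired msg)
    {
        if (!Completed)
            bus.Send("user-service", new { Release = msg.UserName });
    }
}
```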

Robustness in the face of errors and fast response times are one thing, of course, but there are other things that I am not touching on here. In the previous example, I showed a very simplistic approach to handling the behavior, where everything happens inline. This is done because, frankly, I didn’t have the time to sit and draw all the pretty boxes for the sync example.

In the real world, we would want pretty much the same separation in the sync example as well. And now we are running into even more “interesting” problems, especially if we started out with everything running locally. The sync model makes it really easy to ignore the fallacies of distributed computing. The async model puts them in your face, and makes you deal with them explicitly.

The level of complexity that you have to deal with in async messaging remains more or less constant when you bring in the issues of scalability, fault tolerance and distribution. The same most certainly cannot be said of the sync model.

Another advantage of this approach is that we are using the actor model, which makes it very explicit who is doing what and why, and allows us to work on each of those pieces independently.

The end result is a system composed of easily disassembled parts. It is easy to understand what is going on because the interactions between the parts of the system are explicit, well understood and not easily bypassed.

Steve from iMeta has been doing a lot of work on the HQL AST parser. For a long time, that has been a really troublesome pain point for us, since it is a prerequisite for a lot of other features. It is also one of the two parts of NHibernate that really need significant refactoring, because the way they are built right now makes it hard to do stuff (the second being the semantic model for the mapping, which the Fluent NHibernate guys are working on).

Just to give you two features that should make you drool, which depend on this work:

This is something that several members of the NHibernate project have tried doing in the past, but the scope of the work is very big and requires full-time work for an extended period of time. iMeta has been sponsoring Steve’s work on NHibernate, which makes this possible.

I have been going through the code, and I am literally jumping up and down in excitement. It is still very early, but it is already clear that Steve has taken us much farther than before, and it is possible to see the end. The most significant milestone has been reached: we are currently able to execute a (very simple) query and get the results back:

If you book before Feb 28th, just put SM1368-622459-33L in the Promo Code field and pay just £350 (normal price £1000). I hear that the tickets are going fast, so if you would like to secure a place and claim your discount, do hurry up!

As a bonus, you can stay for the next week and enroll in the NHibernate course that I am also giving.

The last event where we did something similar was Kaizen Conf, and we are going to try to reproduce the same thing at the Seattle ALT.NET conference in a week as well.

I can’t talk about what will happen next week, but the previous workshops were a tremendous success. We are going to run 12 sessions during this event, and I find myself in the problematic position of wanting to be in at least three places at the same time, while I can only manage two (maybe I need to let Barid out as well*).

A while ago on the NHibernate mailing list, we got a report that NHibernate makes use of a dictionary with an enum as the key, and that this causes a big performance issue.

The response was almost unanimous, I think: “what?! how can that be?!!?”. Several people went in and tried to figure out what was going on there. The answer is totally non-obvious: Dictionary<K,V> forces boxing for any value type that is used as the key.

That sounds completely contradictory to what you would expect; after all, one of the major points of generics was the elimination of boxing, so what happened?

Well, the issue is that Dictionary<K,V> has to compare the keys, and for that it must make some assumptions about the actual key. The comparison is abstracted into EqualityComparer<T>, and that is where the actual problem starts. EqualityComparer<T> has special cases for the common types (anything that implements IEquatable<T>, which most of the usual suspects do) to speed this up.

The problem is that the fallback is an ObjectComparer, and that, of course, will box any value type.

And enum does not implement IEquatable<T>…
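The usual workaround, sketched below with a made-up enum, is to hand Dictionary<K,V> a custom IEqualityComparer<T> that works on the enum directly, so the lookup never falls back to the boxing object comparer:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum -- any enum key hits the same problem.
public enum JoinType { InnerJoin, LeftOuterJoin, RightOuterJoin }

// Compares the enum values directly, so the dictionary never boxes
// the key just to call Object.Equals / Object.GetHashCode.
public sealed class JoinTypeComparer : IEqualityComparer<JoinType>
{
    public bool Equals(JoinType x, JoinType y) { return x == y; }
    public int GetHashCode(JoinType obj) { return (int)obj; }
}

public static class Example
{
    public static string Lookup()
    {
        // pass the comparer explicitly instead of relying on
        // EqualityComparer<JoinType>.Default
        var map = new Dictionary<JoinType, string>(new JoinTypeComparer());
        map[JoinType.InnerJoin] = "inner join";
        return map[JoinType.InnerJoin];
    }
}
```

(In later versions of the runtime the default enum comparer was fixed to avoid boxing, but the explicit comparer sidesteps the issue regardless.)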

Omer has good coverage of the subject, with really impressive results. Take a look at his numbers.

I am not going to steal his thunder, but I suggest going over and reading the code; it is very elegant.

I needed to get a list of the subqueues of a queue, and there is no way of actually doing that in managed code. I decided that I could still use P/Invoke and just make the appropriate unmanaged calls. In hindsight, it would have been significantly simpler to just build an unmanaged DLL in C++/CLI, but I was already in too deep, and I can never remember how to properly compile a C++ app anymore.

I finally had to slap myself a couple of times and remind myself that I was a C++ developer for a long time, and I damn well should remember how to treat memory like it was something both sacred and abused. I tested this on a Windows 2008 64-bit machine. During this project, I also found out that there is basically no information at all about this sort of thing, so I am putting the code here.

I had to do this yesterday, and it was fun enough to post about. Not only that, but the solution that I came up with has enough of a twist that I think it would make this hard for many people.

Here is the challenge:

Given a zip file, produce an executable that, when run on another machine, will unzip the content of the zip file on that machine. (Nitpicker corner: yes, there are many ways to cheat this, but don’t try to cheat.) You may use any library and process that you want, but the end result must be a standalone file that can be copied to a separate machine and, when executed, will extract the content of the zip file.

You can assume that on that separate machine you will have the .NET framework installed, but you cannot depend on the presence of external libraries. I’ll note that the default compression streams in the .NET framework are not applicable here, since they can’t handle zip files (which contain a directory structure); they can only handle zip streams.
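One way to approach it, sketched below, is to embed the zip file into the generated executable as a resource and have a small stub extract it at runtime. This is a hedged sketch, not the author's actual solution: the resource name "payload.zip" is made up, and it assumes SharpZipLib (mentioned below) has been merged into the single exe so there are no external dependencies:

```csharp
using System.IO;
using System.Reflection;
using ICSharpCode.SharpZipLib.Zip;

// Hypothetical self-extracting stub. Assumes the zip was embedded as a
// resource named "payload.zip" and SharpZipLib was merged into the exe
// (e.g. with ILMerge), so the file stands alone on the target machine.
public static class SelfExtractor
{
    public static void Extract(Stream zipStream, string targetDir)
    {
        using (var zip = new ZipInputStream(zipStream))
        {
            ZipEntry entry;
            var buffer = new byte[4096];
            while ((entry = zip.GetNextEntry()) != null)
            {
                var path = Path.Combine(targetDir, entry.Name);
                if (entry.IsDirectory)
                {
                    Directory.CreateDirectory(path);
                    continue;
                }
                var dir = Path.GetDirectoryName(path);
                if (!string.IsNullOrEmpty(dir))
                    Directory.CreateDirectory(dir);
                using (var output = File.Create(path))
                {
                    int read;
                    while ((read = zip.Read(buffer, 0, buffer.Length)) > 0)
                        output.Write(buffer, 0, read);
                }
            }
        }
    }

    public static void Main()
    {
        var asm = Assembly.GetExecutingAssembly();
        Extract(asm.GetManifestResourceStream("payload.zip"), ".");
    }
}
```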

It took me roughly 30 minutes, most of which were spent dealing with the ICSharpCode.SharpZipLib API, which is not something that I had dealt with before.