While testing Rhino Service Bus, I ran into several pretty annoying issues. The most consistent one is that the actual work done by the bus happens on another thread, so we have to have some synchronization mechanisms built into the bus just so we can get consistent tests.

In some tests, this is not really needed, because I can utilize the existing synchronization primitives in the platform. Here is a good example of that:

Here, the synchronization happens on line 8. Peek() will wait until a message arrives in the queue, so we don’t need to manage that ourselves.

This is not always possible, however, and this actually breaks down for more complex cases. For example, let us inspect this test:

 1: [Fact]
 2: public void Can_ReRoute_messages()
 3: {
 4:     using (var bus = container.Resolve<IStartableServiceBus>())
 5:     {
 6:         bus.Start();
 7:         var endpointRouter = container.Resolve<IEndpointRouter>();
 8:         var original = new Uri("msmq://foo/original");
 9:
10:         var routedEndpoint = endpointRouter.GetRoutedEndpoint(original);
11:         Assert.Equal(original, routedEndpoint.Uri);
12:
13:         var wait = new ManualResetEvent(false);
14:         bus.ReroutedEndpoint += x => wait.Set();
15:
16:         var newEndPoint = new Uri("msmq://new/endpoint");
17:         bus.Send(bus.Endpoint,
18:             new Reroute
19:             {
20:                 OriginalEndPoint = original,
21:                 NewEndPoint = newEndPoint
22:             });
23:
24:         wait.WaitOne();
25:         routedEndpoint = endpointRouter.GetRoutedEndpoint(original);
26:         Assert.Equal(newEndPoint, routedEndpoint.Uri);
27:     }
28: }

Notice that we are doing explicit synchronization in the test, on lines 14 and 24. ReroutedEndpoint is an event that we added for the express purpose of allowing us to write this test.
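The same pattern can be shown in isolation. What follows is only a minimal sketch, not Rhino Service Bus code: FakeBus, its Rerouted event, and SendAndWait are hypothetical names made up for illustration. It also assumes you would rather have the test fail after a timeout than hang forever, which the test above does not do.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a bus that does its work on another thread.
public class FakeBus
{
    public event Action<string> Rerouted;

    public void Send(string message)
    {
        // The handler runs on a thread-pool thread, not on the caller's thread.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            var handler = Rerouted;
            if (handler != null)
                handler(message);
        });
    }
}

public static class SyncSketch
{
    // Returns true if the event fired within the timeout, false otherwise.
    public static bool SendAndWait(FakeBus bus, string message, TimeSpan timeout)
    {
        // Deliberately not disposed: a handler that fires after the timeout
        // would otherwise call Set() on a disposed handle.
        var wait = new ManualResetEvent(false);

        // Subscribe before sending so the signal cannot be missed.
        bus.Rerouted += x => wait.Set();
        bus.Send(message);

        // Block the test thread until the background work signals, or give up.
        return wait.WaitOne(timeout);
    }
}
```

A test would call something like SyncSketch.SendAndWait(bus, "reroute", TimeSpan.FromSeconds(5)) and assert that the result is true before checking any state that the background thread was supposed to change.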

I remember the big debates several years ago on whether it is okay to change your code to make it more testable. I haven’t heard this issue raised in a while, so I guess that the argument was decided.

As a side note, in order to get rerouting to work, we had to change the way that Rhino Service Bus viewed endpoints. That was a very invasive change, and we did it in less than two hours, by simply making the change and fixing the tests where they broke.

For NH Prof, I need to have some licensing solution so people would be reminded that after 30 days of using the trial, they should pay. Initially, I bought a licensing component. That didn’t work out, and I now find myself in the position of having to write the licensing infrastructure for NH Prof.

Any second that I put into the licensing infrastructure is a second that I can’t put into actually making the product itself useful. More than that, in order to produce a good licensing story, you need to invest a lot of time writing some tricky code so hackers would have a harder time breaking it.

I got some advice on the matter from friends, for which I am very grateful, even if the whole topic is depressing.

Now, just to make things more complicated, licensing is actually a big topic. I got requests from users regarding the licensing. Those range from being able to use a license on several machines, to supporting floating licenses, to removing a license from a machine.

Yesterday I added a bug to Rhino Service Bus. It was a nasty one, and a slippery one. It relates to threading nastiness with MSMQ, and the details aren’t really interesting.

What is interesting is how I fixed it. You can see the commit log here:

At some point, it became quite clear that trying to fix the bug isn’t going to work. I reset the repository back to before I introduced that bug (gotta love source control!) and started reintroducing my change in a very controlled manner.

I am back with the same functionality that I had before I reset the code base, and with no bug. Total time to do this was about 40 minutes or so. I spent quite a bit longer than that just trying to fix the bug in place.

I explicitly don’t want to go over the exact scenario that this relates to. I want to talk about a general sentiment that I got from several people from Microsoft a few times, which I find annoying.

It can be summed up pretty easily by this quote:

You all know that we work on the Agile process here, right? We get something out (perhaps a little early) and then improve it. Codeplex is for open source and continuous improvement with community feedback.

The context is a response to a critique about an unacceptable level of quality in something Microsoft put out. Again, I do not want to discuss the specifics. I want to discuss the sentiment. I got answers in a similar spirit from several Microsoft people recently, and I find it annoying in the extreme.

Agile doesn’t mean that you start with crap, call it organic fertilizer and try to tell me that it will improve in the future. Quality is supposed to be built in. It is the scope that you grow incrementally, not the product quality.

I actually find the open source comment to be even more annoying. Open source does not mean that you get someone else to do your dirty work. And if you take something and call it open source, it doesn’t mean that you are not going to get called on the carpet for the quality of whatever you released.

Calling it open source does not mean that the community is accountable for its quality.

Those are not actually new features, if you want to be strict about it. There is a whole bunch of things in NH Prof that already exist, but are only now starting to have an exposed UI.

I believe that NH Prof’s ability to analyze and detect problems in your NHibernate usage is one of the most valuable features that it has. Heaven knows that I spent enough time on that thing to make it understand everything that I learned about NHibernate in 5 years of usage (how did it get to be that long?).

The problem is that NH Prof is not self aware yet, and assumptions about what is good practice or not cannot be made in a vacuum. They must be made in context, which NH Prof lacks. As such, it is possible that you’ll find yourself inundated with alerts that aren’t valid for your scenario.

A typical example would be a project that uses MySQL, where you cannot use NHibernate’s batching (which isn’t supported on MySQL). In that case, the batching alert is not only invalid, it is actually annoying. You can globally disable an alert from the settings dialog:

But that is like whacking flies with a rifle. It will kill all the alerts of that type. What about when you want to ignore a specific alert in a specific circumstance?

NH Prof supports this as well:

The profiler is smart enough to ignore the same alert from the same source in the future.

Of course, we can also remove ignored alerts from the settings dialog.

Recently I found myself facing several pretty tough problems. Solving the problem for the generic case would have been hard, if not impossible. In all of those cases, I was able to redefine the problem to make the solution trivially simple.

Problem #1 – SQL display in NH Prof

We had an issue with NH Prof regarding scrolling a big list. The problem was that the performance for big lists isn’t that great. WPF has a solution for that, virtual lists, which means that we only get to bind to the visible portion of the list, which significantly improves the system performance. The problem is that when you do this, you lose smooth scrolling, and then you get into a bit of a situation when you have large SQL statements. The UI doesn’t work in a nice way.

I wanted to have both. We figured out a couple of ways to do that, but I kept having this nagging feeling that I was being stupid. Eventually I realized that I had a problem in the problem specification. We do not need to display the large SQL statement in the list. It makes no sense from a UI perspective anyway. I was just coasting along on inertia without thinking, and I ran into this issue.

Before:

After:

There is no value in the long statement. We are stripping a lot of information away from the statements anyway, to make it easy to understand what is going on there at a glance. The previous version just put an additional burden on the user, who had to try to understand what was going on in the mess. If they want the detailed view, we have that, and it is formatted, nice and easy to read.

Problem #2 – Heterogeneous load balancing

When building Rhino Service Bus load balancing support (what NServiceBus calls distributor / grid), I ran into a major issue of inelegance. I had initially thought that each node in the grid would tell the balancer what messages the node can handle, and that on arrival, the balancer would inspect the message and dispatch it to an available endpoint.

The problem was that I didn’t like this. It required too many moving parts (each node keeps telling the balancer which messages it could handle, updating several dispatch lists on each message, etc). It was a complex solution, and I didn’t like where I was heading.

Again, I had a problem with the solution complexity because the problem was stated in a problematic fashion. I didn’t need to support a heterogeneous system, because I don’t have one at the moment. I can specify that a load balancer is going to always front a homogeneous set of nodes, and reduce the problem to a dequeue & send.
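To make the "dequeue & send" idea concrete, here is a rough sketch under stated assumptions. It is not the Rhino Service Bus implementation: HomogeneousBalancer, NodeReady, MessageArrived and the send callback are all hypothetical names, and the idea that nodes announce when they are ready for work is an assumption added for the illustration. The point is only that once every node can handle every message, the balancer no longer needs per-message-type dispatch lists.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every node can handle every message, so the
// balancer only tracks which nodes are free and which messages are waiting.
// Not thread safe; a real balancer would need locking or a single worker thread.
public class HomogeneousBalancer
{
    private readonly Queue<string> readyNodes = new Queue<string>();
    private readonly Queue<string> pendingMessages = new Queue<string>();
    private readonly Action<string, string> send; // (nodeEndpoint, message)

    public HomogeneousBalancer(Action<string, string> send)
    {
        this.send = send;
    }

    // A node announces that it can take one more message.
    public void NodeReady(string nodeEndpoint)
    {
        readyNodes.Enqueue(nodeEndpoint);
        Dispatch();
    }

    // A message arrives at the balancer; its type is never inspected.
    public void MessageArrived(string message)
    {
        pendingMessages.Enqueue(message);
        Dispatch();
    }

    // Dequeue & send: pair the oldest waiting message with the oldest free node.
    private void Dispatch()
    {
        while (readyNodes.Count > 0 && pendingMessages.Count > 0)
            send(readyNodes.Dequeue(), pendingMessages.Dequeue());
    }
}
```

Compare this with the heterogeneous design described earlier, where the balancer would have had to keep a list of capable nodes per message type and update those lists on every message.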

Rethinking the problem can often tell you that you are trying to solve more than you should. By reducing the scope of the problem by a degree that is often meaningless to the desired business requirement, we can drastically simplify the solution and the implementation.