My initial naive implementation of IBus.Request set up a new response subscription each time Request was called. Obviously this is inefficient. It would be much nicer if I could identify when Request is called more than once with the same callback and re-use the subscription.

The question I had was: how can I uniquely identify each callback? It turns out that action.Method.GetHashCode() reliably identifies a unique action. I can demonstrate this with the following code:
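The original sample is reconstructed here from the description below; RunAction and the shape of the cache are my assumptions.

```csharp
using System;
using System.Collections.Generic;

public class ActionCacheDemo
{
    // Cache keyed on the hash code of the action delegate's Method
    private static readonly Dictionary<int, Action> actionCache =
        new Dictionary<int, Action>();

    public static void RunAction(Action action)
    {
        var key = action.Method.GetHashCode();
        if (!actionCache.ContainsKey(key))
        {
            Console.WriteLine("New action, adding to cache with key {0}", key);
            actionCache[key] = action;
        }
        actionCache[key]();
    }

    public static void Main()
    {
        for (var i = 0; i < 3; i++)
        {
            // two distinct action delegates, both closing over i
            RunAction(() => Console.WriteLine("First action, i = {0}", i));
            RunAction(() => Console.WriteLine("Second action, i = {0}", i));
        }
        // the cache ends up with two entries: each lambda compiles to a
        // single method, so its hash code is stable across iterations
        Console.WriteLine("Cache size: {0}", actionCache.Count);
    }
}
```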

Here, I’m creating an action cache keyed on the action method’s hash code. Then I’m calling RunAction a few times with two distinct action delegates. Note that they also close over a variable, i, from the outer scope.

I just had a need for a delay task: a simple method I can call that turns a Func&lt;T&gt; into a Task&lt;T&gt; that executes after a given delay.

The starting point for any Task creation based on an external asynchronous operation, like a Timer callback, is the TaskCompletionSource class. It provides methods to transition the task it creates to different states. You call SetResult when the operation completes, SetException if the operation fails, and SetCanceled if you want to cancel the task.

I simply create a new TaskCompletionSource and a Timer where the callback calls SetResult with the result of the given Func<T>. If the Func<T> throws, we simply catch the exception and call SetException. Finally we start the timer and return the Task.
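The original code is elided, so here is a sketch of that description; the method name and signature are my own choices.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskUtils
{
    // Turn a Func<T> into a Task<T> that executes after the given delay
    public static Task<T> Delay<T>(int millisecondsDelay, Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>();
        Timer timer = null;
        timer = new Timer(_ =>
        {
            try
            {
                // transition the task to completed with the func's result
                tcs.SetResult(func());
            }
            catch (Exception e)
            {
                // if the func throws, transition the task to faulted
                tcs.SetException(e);
            }
            finally
            {
                timer.Dispose();
            }
        });
        // start the timer: fire once after the delay, then never again
        timer.Change(millisecondsDelay, Timeout.Infinite);
        return tcs.Task;
    }
}
```

Calling `TaskUtils.Delay(1000, () => "hello")` returns immediately with a Task&lt;string&gt; that completes a second later.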

Thursday, July 14, 2011

I’ve started thinking about the best patterns for implementing error handling in EasyNetQ. One of the aims of EasyNetQ is to remove as many infrastructure concerns from the application developer as possible. This means that the API should correctly handle any exceptions that bubble up from the application layer.

One of the core requirements is that we shouldn’t lose messages when the application throws. The question then becomes: where should the message, that the application was consuming when it threw, go? There seem to be three choices:

1. Put the failed message back on the queue it was consumed from.

2. Put the failed message on an error queue.

3. A combination of 1 and 2.

Option 1 has the benefit that it’s the out-of-the-box behaviour of AMQP. In the case of EasyNetQ, I would simply catch any exceptions, log them, and just send a noAck command back to RabbitMQ. Rabbit would put the message at the back of the queue and then resend it when it got to the front.

Another advantage of this technique is that it gives competing consumers the opportunity to process the message. If you have more than one consumer on a queue, Rabbit will send the messages to them in turn, so you get this behaviour out of the box.

The drawback of this method is that there’s the possibility of the queue filling up with failed messages. The consumer would just be cycling around throwing exceptions, and any messages that it might be able to consume would be slowed down by having to wait their turn amongst a long queue of failed messages.

Another problem is that it’s difficult to manually inspect the messages and selectively delete or retry them.

Option 2 is harder to implement. When an error occurs I would wrap the failed message in a special error message wrapper. This can include details about the type and location of the exception and other information such as stack traces. I would then publish the error message to an error exchange. Each consumer queue should have a matching error exchange. This gives the opportunity to bind generic error queues to all error exchanges, but also to have special case error consumers for particular queues.

I would need to write an error queue consumer to store the messages in a database. I would then need to provide the user with some way to inspect the messages alongside the error that caused them to arrive in the error queue so that they could make an ignore/retry decision.

I could also implement some kind of wait-and-retry function on the error queue, but that would also add additional complexity.

Option 2 has the advantage that the original queue remains clear of failing messages. Failed messages and the error condition that caused the failure can be inspected together, and failed messages can be manually ignored or retried.

With the failed messages sitting in a database, it would also be simple to create a mechanism where those messages could be replayed on a developer machine to aid in debugging.

I’m moving towards thinking that a combination of 1 and 2 might be the best strategy. When a message fails initially, we simply noAck it and it goes back to the queue. AMQP provides a Redelivered flag, so when the message is consumed a second time we can be aware that it’s a retry. Unfortunately there doesn’t seem to be a retry count in AMQP, so the best we can do is allow for a single retry. This has the benefit that it gives a competing consumer a chance to process the message.

No retry count is a problem. One option some people use is to roll their own ‘nack’ mechanism. In this case, when an error occurs in the consumer, rather than sending a ‘nack’ to Rabbit and relying on the built-in behaviour, the client ‘acks’ the message to remove it from the queue, and then re-publishes it via the default exchange back to the originating queue. Doing this gives the client access to the message and allows a ‘retry count’ header to be set.
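The ack-and-republish idea might look something like this sketch. It assumes the RabbitMQ .NET client’s IModel; the header name and method shape are my own invention, and the exact client API varies between versions.

```csharp
using System.Collections.Generic;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public class RetryRepublisher
{
    // header name is a convention of this sketch, not part of AMQP
    const string RetryCountHeader = "x-retry-count";

    public void AckAndRepublish(
        IModel channel, BasicDeliverEventArgs delivery, string queueName)
    {
        // 'ack' the message to remove it from the queue
        channel.BasicAck(delivery.DeliveryTag, false);

        // read the current retry count, defaulting to zero, and increment it
        var headers = delivery.BasicProperties.Headers
            ?? new Dictionary<string, object>();
        var retryCount = headers.ContainsKey(RetryCountHeader)
            ? (int)headers[RetryCountHeader]
            : 0;
        headers[RetryCountHeader] = retryCount + 1;

        var properties = channel.CreateBasicProperties();
        properties.Headers = headers;

        // republish via the default exchange ("") straight back to the
        // originating queue, now carrying the retry count
        channel.BasicPublish("", queueName, properties, delivery.Body);
    }
}
```

The consumer can then inspect the header and route the message to the error queue once the count passes its limit.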

After the single retry we fall back to Option 2. The message is passed to the error queue on the second failure.

I would be very interested in hearing how other people have implemented error handling with AMQP/RabbitMQ.

Wednesday, July 13, 2011

I had an interesting problem with the Managed Extensibility Framework yesterday. I’m using the DirectoryCatalog to load assemblies from a given directory. Pretty standard stuff. When I tested my host on my developer machine, it got the ‘works on my machine’ badge, but when I ran the host on one of our servers, it ignored all the assemblies.

Nothing loaded …

Hmm …

It turns out, after much digging and help from my Twitter crew, that the assembly loader that MEF’s DirectoryCatalog uses ignores any files that have a URL Zone set. I described these zones in detail in my previous post here:

Because we copy our plugins from a file share, Windows was marking them as belonging to the Intranet Zone. Thus the odd only-when-deployed behaviour.

How you deal with this depends on whether you think that files marked in this way represent a security threat or not. If you do, the best policy is to detect any assemblies in your DirectoryCatalog directory that have a Zone set and log them. You can do that with the System.Security.Policy.Zone class:
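The original code is missing, but a sketch of such a check might look like this (the class and method names are my own):

```csharp
using System;
using System.IO;
using System.Security;
using System.Security.Policy;

public class ZoneChecker
{
    // Log any assemblies in the plugin directory that have a URL Zone
    // set; MEF's DirectoryCatalog will silently ignore these
    public static void LogZonedAssemblies(string directoryPath)
    {
        foreach (var file in Directory.GetFiles(directoryPath, "*.dll"))
        {
            var zone = Zone.CreateFromUrl(file);
            if (zone.SecurityZone != SecurityZone.MyComputer)
            {
                Console.WriteLine("{0} is in zone {1} and will be ignored",
                    file, zone.SecurityZone);
            }
        }
    }
}
```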

If you don’t consider files copied from elsewhere a security concern, but rather a feature of your operating procedure, then you can clear the Zone flags from all the assemblies in the directory with the help of Richard Deeming’s Trinet.Core.IO.Ntfs library. I wrote a little class using this:
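Something along these lines; the extension-method names are my reading of Richard’s library, so check them against the actual API:

```csharp
using System.IO;
using Trinet.Core.IO.Ntfs; // Richard Deeming's library

public class ZoneCleaner
{
    // Remove the Zone.Identifier alternate data stream (which holds the
    // URL Zone flag) from every assembly in the given directory
    public static void UnblockAssemblies(string directoryPath)
    {
        foreach (var path in Directory.GetFiles(directoryPath, "*.dll"))
        {
            var file = new FileInfo(path);
            if (file.AlternateDataStreamExists("Zone.Identifier"))
            {
                file.DeleteAlternateDataStream("Zone.Identifier");
            }
        }
    }
}
```

Running this over the plugin directory before creating the DirectoryCatalog makes the deployed behaviour match the developer machine.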

I spent most of yesterday investigating some weird behaviour in MEF, which I’ll discuss in another post. I was saved by Twitter in the guise of @Grumpydev, @jordanterrell and @SQLChap who came to the rescue and led me down a very interesting rabbit hole, to a world of URL Zones and Alternate Data Streams. Thanks chaps!

If you download a file from the internet on Windows 2003 or later, right click, and select properties, you’ll see something like this:

The file is ‘blocked’, which means that you will get various dialogues if you try to, say, run an executable with this flag set.

Any file on NTFS can have a ‘Zone’, as the flag is called. The values are described in this enumeration:
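This is presumably the BCL’s SecurityZone enumeration, from the System.Security namespace:

```csharp
// System.Security.SecurityZone in the BCL
public enum SecurityZone
{
    NoZone = -1,     // no zone information set
    MyComputer = 0,  // local machine
    Intranet = 1,    // local intranet, e.g. a file share
    Trusted = 2,     // trusted sites
    Internet = 3,    // downloaded from the internet
    Untrusted = 4    // restricted sites
}
```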

If you want to create, view and delete ADSs in .NET you will need to resort to P/Invoke; there is no support for them in the BCL. Luckily for us, Richard Deeming has done the work for us and created a set of classes that wrap the NTFS API. You can read about it here and get the code from GitHub here.

Using Richard’s library, you can list the ADSs for a file and their values like this:
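A sketch of such a listing; again, the extension-method and property names are my assumptions about the library’s API:

```csharp
using System;
using System.IO;
using Trinet.Core.IO.Ntfs; // Richard Deeming's library

public class AdsLister
{
    // List each alternate data stream attached to the file,
    // with its name and size
    public static void ListStreams(string path)
    {
        var file = new FileInfo(path);
        foreach (var stream in file.ListAlternateDataStreams())
        {
            Console.WriteLine("{0} - {1} bytes", stream.Name, stream.Size);
        }
    }
}
```

For a file downloaded from the internet you would expect to see a `Zone.Identifier` stream in the output.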

ADSs are very interesting, and open up a whole load of possibilities. Imagine storing application specific metadata in an ADS for example. I’d be very interested to hear if anyone has used them in this way.

Monday, July 11, 2011

RabbitMQ comes with a nice .NET client called, appropriately enough, ‘RabbitMQ DotNet Client’. It does a good job of implementing the AMQP protocol in .NET and comes with excellent documentation, which is good because there are some interesting subtleties in its usage. This is because AMQP is designed with flexibility in mind and supports a mind-boggling array of possible messaging patterns. But as with any API, with flexibility comes complexity.

The aim of EasyNetQ, my simple messaging API for RabbitMQ on .NET, is to hide much of this complexity and provide an interface that is very simple to use. But in order to make it simple I have had to take away much of the flexibility of AMQP and instead provide a strongly opinionated view of one way of using RabbitMQ with .NET.

Today I’m going to discuss how Subscriptions work with the RabbitMQ DotNet Client (RDC) and some of the choices that I’ve made in EasyNetQ.

You create a subscription using the RDC with the AMQP command ‘basic consume’. You pass in the name of the queue you want to consume from.

channel.BasicConsume(ackNackQueue, noAck, consumer);

If you use the default QueueingBasicConsumer, the RabbitMQ server then takes messages from the queue you specified and sends them over the network to the RDC. The RDC has a dedicated worker thread that listens to a TCP socket and pulls the messages off as they arrive and places them on a shared thread-safe queue. The client application, in my case EasyNetQ, pulls messages off the shared queue on its own thread and processes them as required. Once it has processed the message it can acknowledge that it has completed by sending an AMQP ‘basic ack’ command. At that point the RabbitMQ server removes the message from its queue.
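That consume-and-ack loop looks something like this sketch, using the older client API where QueueingBasicConsumer exposes a shared queue (ProcessMessage stands in for the application’s work):

```csharp
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

// the RDC's worker thread fills consumer.Queue from the TCP socket;
// this loop runs on the client application's own thread
var consumer = new QueueingBasicConsumer(channel);
channel.BasicConsume(queueName, false, consumer); // noAck = false

while (true)
{
    // blocks until the shared thread-safe queue has a message
    var delivery = (BasicDeliverEventArgs)consumer.Queue.Dequeue();

    ProcessMessage(delivery.Body); // application-level processing

    // 'basic ack': tell RabbitMQ it can now remove the message
    channel.BasicAck(delivery.DeliveryTag, false);
}
```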

Now, what happens if messages are arriving faster than the user application can process them? The shared queue will gradually fill up with messages and eventually the process will run out of memory. That’s a bad thing. To fix this, you can limit the number of messages that RabbitMQ will send to the RDC before they are acknowledged with the Quality of Service prefetchCount setting.

channel.BasicQos(0, prefetchCount, false);

The default value for prefetchCount is zero, which means that there is no limit. If you set prefetchCount to any other positive value, that will be the maximum number of messages that the RDC’s queue will hold at any one time. Setting the prefetchCount to a reasonably high number will allow RabbitMQ to stream messages across the network more efficiently.

What happens if the shared queue is full of messages and my client application crashes? Won’t all the messages be lost? No, because messages are only removed from the RabbitMQ queue when the user application sends the basic ack message. The messages queued in the RDC’s shared queue are not acknowledged and so will not yet have been removed from the RabbitMQ queue.

However, if when you call ‘basic consume’ you pass in true for ‘noAck’ then the messages will be removed from the RabbitMQ queue as they are transmitted across the network. You would use this setting if you’re not worried about losing some messages, but need them to be transmitted as efficiently as possible.

For EasyNetQ, I’ve made the default settings as follows: 1000 messages for the prefetchCount and noAck set to false. I’m assuming that most users will value reliability over performance. Eventually I hope to provide a dial with settings like ‘high throughput, low reliability’ and ‘low throughput, high reliability’, but for now I’m going for reliability.

I’d be very interested to hear from anyone who’s using RabbitMQ with .NET and how they have configured these settings.

Sunday, July 10, 2011

This question came up at the last Brighton ALT.NET Beers. It proved almost impossible to discuss in words without seeing some code, so here’s my attempt to explain closures in C#. Wikipedia says:

In computer science, a closure (also lexical closure, function closure or function value) is a function together with a referencing environment for the nonlocal names (free variables) of that function. Such a function is said to be "closed over" its free variables. The referencing environment binds the nonlocal names to the corresponding variables in scope at the time the closure is created, additionally extending their lifetime to at least as long as the lifetime of the closure itself.

So a closure is a function that ‘captures’ or ‘closes over’ variables that it references from the scope in which it was created. Yes, hard to picture, but actually much easier to understand when you see some code.
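The sample itself is missing from this copy of the post; from the description that follows, it would have looked something like:

```csharp
using System;

public class ClosureExample
{
    public static void Main()
    {
        var x = 1;

        // the lambda 'closes over' x from the enclosing scope;
        // x is captured and added to the delegate's environment
        Action action = () => Console.WriteLine("x = {0}", x);

        action(); // prints "x = 1"
    }
}
```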

Here we first define a variable ‘x’ with a value of 1. We then define an anonymous function delegate (a lambda expression) of type Action. Action takes no parameters and returns no result, but if you look at the definition of ‘action’, you can see that ‘x’ is used. It is ‘captured’ or ‘closed over’ and automatically added to action’s environment.

When we execute action it prints out the expected result. Note that the original ‘x’ can be out of scope by the time we execute action and it will still work.

It’s interesting to look at ‘action’ in the debugger. We can see that the C# compiler has created a Target class for us and populated it with x:

Tuesday, July 05, 2011

I’ve recently been introduced to a code base that illustrates a very common threading anti-pattern. Say you’ve got a batch of data that you need to process, but processing each item takes a significant amount of time. Doing each item sequentially means that the entire batch takes an unacceptably long time. A naive approach to solving this problem is to create a new thread to process each item. Something like this:
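A sketch of the anti-pattern (ProcessItem and the Item type are stand-ins for whatever slow per-item work the real code base does):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class NaiveBatchProcessor
{
    // Anti-pattern: spawn one new thread per item in the batch
    public void ProcessBatch(IEnumerable<Item> items)
    {
        foreach (var item in items)
        {
            var itemToProcess = item; // avoid capturing the loop variable
            new Thread(() => ProcessItem(itemToProcess)).Start();
        }
    }

    void ProcessItem(Item item)
    {
        // slow, usually IO-bound, per-item work goes here
    }
}

public class Item { }
```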

The problem with this is that each thread takes significant resources to setup and maintain. If there are hundreds of items in the batch we could find ourselves short of memory.

It’s worth considering why ProcessItem takes so long. Most business applications don’t do processor-intensive work. If you’re not protein folding, the reason your process is taking a long time is usually because it’s waiting on IO – communicating with the database or web services somewhere, or reading and writing files. Remember, IO operations aren’t somewhat slower than processor-bound ones, they are many, many orders of magnitude slower. As Gustavo Duarte says in his excellent post What Your Computer Does While You Wait:

Reading from L1 cache is like grabbing a piece of paper from your desk (3 seconds), L2 cache is picking up a book from a nearby shelf (14 seconds), and main system memory is taking a 4-minute walk down the hall to buy a Twix bar. Keeping with the office analogy, waiting for a hard drive seek is like leaving the building to roam the earth for one year and three months.

You don’t need to keep a thread around while you’re waiting for an IO operation to complete. Windows will look after the IO operation for you, so long as you use the correct API. If you are writing these kinds of batch operations, you should always favour asynchronous IO over spawning threads. Most (but not all unfortunately) IO operations in the Base Class Library (BCL) have asynchronous versions based on the Asynchronous Programming Model (APM). So, for example:
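The elided example presumably showed the APM’s Begin/End pair; here is a sketch using FileStream.BeginRead standing in for the generic ‘BeginMyIoOperation’:

```csharp
using System;
using System.IO;

public class AsyncIoExample
{
    // Asynchronous read using the APM Begin/End pattern on FileStream
    public static void BeginReadFile(string path)
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, 4096, useAsync: true);
        var buffer = new byte[stream.Length];

        stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
        {
            // this callback runs on a CLR thread-pool thread
            // when the IO operation completes
            var bytesRead = stream.EndRead(asyncResult);
            Console.WriteLine("Read {0} bytes", bytesRead);
            stream.Dispose();
        }, null);

        // the calling thread returns immediately; no thread is
        // blocked while Windows performs the IO
    }
}
```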

Your main thread doesn’t block when you call BeginMyIoOperation, so you can run hundreds of them in short order. Eventually your IO operations will complete and the callback you defined will be run on a worker thread in the CLR’s thread pool. Profiling your application will show that only a handful of threads are used while your hundreds of IO operations happily run in parallel. Much nicer!

Of course all this will become much easier with the async features of C# 5, but that’s no excuse not to do the right thing today with the APM.

Code Rant

Notepad, thoughts out loud, learning in public, misunderstandings, mistakes, undiluted opinions. I'm Mike Hadlow, an itinerant developer. I live (and try to work) in Brighton on the south coast of England. Please don't mistake me for an expert in anything. I love technology and programming, but make no claims to be any good at it. Much of what you read here may be poorly thought out, wrong, or just plain dangerous.