A blog about one man's journey through code… and some pictures of the Peak District

Category Archives: Message Queue

In the beginning

When I first started programming (on the Spectrum ZX81), you would typically write a program such as this (apologies if it’s syntactically incorrect, but I don’t have a Speccy interpreter to hand):

10 PRINT "Hello World"
20 GOTO 10

You could type that in to a computer at Tandys and chuckle as the shop assistants tried to work out how to stop the program (sometimes, you might not even use “Hello World”, but something more profane).

However, no matter how long it took them to work out how to stop the program, they only paid for the electricity the computer used while it was on. Further, there will only ever be a finite and, presumably (I never tried), predictable number of “Hello World” messages produced in an hour.

The N-Tier Revolution

Fast forward a few years, and everyone’s talking about N-Tier computing. We’re writing programs that run on servers. Some of those servers are big and expensive, but pretty much the same statement is true. No matter how big and complex your program that you run on the server, it’s your server (well, actually, it probably belongs to a company that you work for in some capacity). For example, if you have a poorly written SQL Server Procedure that scans an entire table, the same two statements are still true: no matter how long it takes to run, the price for execution is consistent, and the amount of time it takes to run is predicable (although, you may decide that if it’s slow, speeding it up might be a better use of your time than calculating exactly how slow).

Using Other People’s Computers

And now we come to cloud computing… or do we? The chronology is a little off on this, and that’s mainly because everyone keeps forgetting what cloud computing actually is. You’re renting time on somebody else’s computer. If I was twenty years older, I might have started this post by saying that “this was what was done back in the 70’s and 80’s in a manner of speaking. But I’m not, so we’ll jump back to the mid to late 90’s: and SETI. Anyone who had a computer back in those days of dial-up connections and 14.4K modems will remember that SETI (search for extra-terrestrial intelligence) were distributing a program for everyone to run on their computer instead of toaster screensavers*.

Wait – isn’t that what cloud computing is?

Yes – but SETI did it in reverse. You would dial-up, download a chunk of data and your machine would process it as a screensaver.

Everyone was happy to do that for free: because we all want to find aliens. But what if there had been a cost associated to each byte of data processed; clearly something similar was in the mind of Amazon when they started with this idea.

Back to the Paradigm Shift

So, the title, and gist of this post is that the two statements that have been consistent right through from programs written on the ZX Spectrum to programs written in Turbo C and Pascal, to programs written in C# running on a dedicated server, has now changed. So, here are the four big changes brought about by the cloud **:

1. Development and Debugging

If you write a program, you can no-longer be sure of the cost, or the execution time. This isn’t a scare post: both are broadly predictable; but the paradigm has now changed. If you’re in the middle of writing an Azure function, or a GCP BigQuery query, and it doesn’t work, you can’t just close your laptop and go for dinner while you have a think, because while you do, nodes are lighting up all over the world trying to complete your task. The lights are dimming in Seattle while your Azure function crashes again and again.

2. Scale and Accessibility

The second big change is the way that your code is scaled. For example, you might be used to parallelising slow code so that you can make use of all available threads; however, in our new world, if you do that, you may actually be making it harder for your cloud platform of choice to scale your code.

Because you pay per minute of computing time (typically this is more expensive than storage), code that is unnecessarily slow or inefficient may not cause your system to slow down – what it will probably do is to cost you more money to run it at the same speed.

In some cases, you might find it’s more efficient to do the opposite of what you have thus-far believed to be accepted industry best practice. For example, it may prove more cost efficient to de-normalise data that you need to access.

3. Logging

This isn’t exactly new: you should always have been logging in your programs – it’s just common sense. However, there’s a new emphasis here: you’re not running this on your own server (you’re not even running it on a customer’s server) – it’s somebody else’s. That means that if it crashes, there’s a limited amount of investigation that you can do. As a result, you need to log profusely.

4. Microservices and Message Busses

IMHO, there are two reason why the Microservice architecture has become so popular in the cloud world: it’s easy – that is, spinning up a new Azure function endpoint takes minutes.

Secondly, it’s more scaleable. Microservices make your code more scaleable because it’s easy for the cloud provider to instantiate two instances of your program, and then three, and then a million. If your program does one small thing, then only that small thing needs to be instantiated. If your program does twenty different things, it can still scale, but it’ll cost more, because it will require more processing power.

Finally, instead of simply calling the service that you need, there is now the option to place a message on a queue; apart from separating your program into definable responsibility sectors, this means that, when your cloud provider of choice scales your service out, all the instances can pick up a message and deal with it.

Summary

So, what’s the conclusion: is cloud computing good, or bad? In five years time, the idea that someone in your organisation will know the spec of the machine your business critical application is running on seems a little far-fetched. The days of having a trusted server that has a load of hugely important stuff on, but nobody really knows what, and has been running since 1973 are numbered.

Obviously, there’s a price to pay for everything. In the case of the cloud, it’s complexity – not of the code, and not really of the combined system, but typically you are introducing dozens of moving parts to a system. If you decide to segregate your database, you might find that you have several databases, tens, or even hundreds of independent processes and endpoints, you could even spread that across multiple cloud providers. It’s not that hard to lose track of where these pieces are living and what they are doing.

In this post, I discussed creating an Azure service bus that sends an e-mail as an action once a message has expired; and in this post, I covered Azure functions and setting a basic one up.

These two pieces of functionality seem to be crying out to be together. After all, if your functionality to send an e-mail is in the cloud, you don’t have to worry about your server being down (which, if your message has expired, is a real possibility).

Create the Azure Function

The first thing to do is to create the Azure function to send an e-mail. Remember that we’ll be hooking into the service bus, and so we’ll create the function a little differently.

The first few steps are the same, though:

The new function is here:

We’ll create a custom function again:

Although this looks familiar from the last post, the next part does differ slightly. This time, we’ll set up a Service Bus Trigger:

This requires the connection string to your service bus…

As you can see above, the service bus connection is blank, and there are no possible entries… onto App Settings:

App Settings

On the App Settings tab, you can configure settings that pertain to your Azure Function App. Select “Manage App Settings”. Here we can set-up a connection string:

Now, we should be able to see that from the Function:

Does it work?

What does this function do out of the box?

Well, having populated the queue with 50 messages that time out after 30 seconds, the function kicked in and started logging that it was picking up messages after 30 seconds – so that’s a promising sign!

The messages are processed and removed from the dead letter queue. This process happens so quickly, it’s easy (as I did) to interpret this as a bug (i.e. messages are not being dead-lettered). However, as we can see from the function logs – they are.

This did, however, leave me with a concern that the messages were being disposed of before they had been successfully processed. To check this, I changed the function slightly:

So, it crashes correctly:

And here, safe and sound, are 50 freshly dead-lettered messages:

Function Code

Now we have a function, we need to make it send an e-mail… so we’ll need some code. Let’s start with what we created here.

Debugging Azure

A quick side note on debugging Azure. There are a number of resources with details of how this should work on the web, and I’ll probably have a later post of my own experiences, but it’s a pretty flaky experience, and I ended up using trial and error to determine the issue.

So, the problem was just that I was referencing an unknown variable (messageText). I’m unsure exactly why I needed to travel to the mountains of Mordor to determine this – a simple error message in the online text would have sufficed.

The other issue that I faced was a security challenge; however, once I’d persuaded Azure that this really was me, everything sprung into life:

Credit Considerations

Unlike in previous posts where I’ve identified the Azure cost to be negligible, functions are the fastest way to use up credit I have found so far. Especially functions such as I’ve created here. I left the (non-working) function above active, but failing all night, and it used up over £40 worth of credit, continually trying, and failing, to process the dead-letter queue… I think the lights might even have dimmed in Redmond for a split second! The moral of the story is is: be careful when you’re debugging this – you can’t just leave at the end of the night with a function that doesn’t work, but is still active.

Summary

This concept is extremely compelling. I can have a service bus queue that is processed and monitored by an Azure function. If aliens land and steal the entire office, all the servers, dev PCs and programmers, this function will continue to run. There is obviously a mindset shift here, and it doesn’t make sense to move everything into this kind of model, but consider the possibilities; imagine a system that books holidays: it processes the customer request and adds it to a queue; the aeroplane booking system picks that from the queue and books the ticket on the plane, the car hire system takes the message to book a car, once they’re all complete they add respective messages to say so (but remain agnostic of each other), finally, if any one part of the system fails, an Azure function could sit there monitoring and cancel the whole lot. I’ve never worked in this kind of industry, so there’s a lot that I’ve probably not considered, but the essence is that you can have active functionality on (even catastrophic) failure – which is a brand new concept.

A message queue has, in its architecture, two main points of failure; the first is the situation where a message is added to a queue, but never read (or at least not read within a specified period of time); this is called a Dead Letter, and it is the subject of this post. The second is where the message is corrupt, or it breaks the reading logic in some way; that is known as a Poison Message.

There are a number of reasons that a message might not get read in the specified time: the service reading and processing the messages might not be keeping up with the supply, it might have crashed, the network connection might have failed.

One possible thing to do at this stage, is to have a process that automatically notifies someone that a message has ended up in the dead letter queue.

What happens to the message by default?

I’ve added a message to the queue using the default timeout of 5 minutes; here it is happily sitting in the queue:

Looking at the properties of the queue, we can determine that the “TimeToLive” is, indeed, 5 minutes:

In addition, you can see that, by default, the flag telling Service Bus to move the message to a dead letter queue is not checked. This means that the message will not be moved to the dead letter queue.

5 Minutes later:

Nothing has happened to this queue, except time passing. The message has now been discarded. It seems an odd behaviour; however, as with ReadAndDelete Locks there may be reasons that this behaviour is required.

Step Two – Dead Letters

If you want to actually do something with the expired message, the key is a concept called “Dead Lettering”. The following code will direct the Service Bus to put the offending message into the “Dead Letter Queue”:

Step Three – Doing something with this…

Okay – so the message hasn’t been processed, and it’s now sat in a queue specially designed for that kind of thing, so what can we do with it? One possible thing is to create a piece of software that monitors this queue. This is an adaptation of the code that I originally created here:

If this was all that we had to monitor the queue, then somebody’s job would need to be to watch this application. That may make sense, depending on the nature of the business; however, we could simply notify the person in question that there’s a problem. Now, if only the internet had a concept of an offline messaging facility that works something akin to the postal service, only faster…

In this post I discussed receiving, processing and acknowledging a message using the Azure Service Bus. There are two ways to acknowledge a message received from the queue (which are common to all message broker systems that I’ve used so far). That is, you either take the message, process it, and then go back to the broker to tell it you’re done (explicit acknowledgement); or, you remove the queue and then process it (implicit acknowledgement).

Explicit Acknowledgement / PeekLock

If the message is not processed within a period of time, then it will be unlocked and returned to the queue to be picked up by the next client request.

The code for this is as follows (it is also the default behaviour in Azure Service Bus):

Implicit Acknowledgement / ReadAndDelete

Here, if the message is not processed within a period of time, or fails to process, then it is likely lost. So, why would you ever use this method of acknowledgement? Well, speed is the main reason; because you don’t need to go back to the server, you potentially increase the whole transaction speed; furthermore, there is clearly work involved for the broker in maintaining the state of a message on the queue, expiring the message lock, etc.

In this post. I documented how to create a new application using Azure Service Bus and add a message to the queue. In this post, I’ll cover how to read that post from the queue, and how to deal with acknowledging the receipt.

The Code

The code uses a lot of hard coded strings and static methods, and this is because it makes it easier to see exactly what it happening and when. This is not intended as an example of production code, more as a cut-and-paste repository.

Reading a Message

Most of the code that we’ve written can simply be re-hashed for the receipt. First, initialise the queue as before:

Then run this to clear the queue. By default, client.Receive has a default timeout, so it will pause for a few seconds before returning if there are no messages. This timeout is a very useful feature. Most of this post was written on a train with a flaky internet connection, and this mechanism provided a resilient way to allow communications to continue when the connection was available.

I’ve added some times to this, too. It’s processing around 10 / second – which is not astoundingly quick. It’s worth mentioning again that this post was written largely on a train, but still, 10 messages per second means that 10K messages will take around 15 mins. It is faster when you have a reliable (non-mobile) internet connection, but still. Anyway, back to cost. 10K messages still showed up as a zero cost.

But, Azure is a paid service, so this has to start costing money. This time, I’m adding 1000 character string to send as a message, and sending that 100,000 times.

After this, the balance was the same; however, the following day, it dropped slightly to £36.94. So, as far as I can tell, the balance is updated based on some kind of job that runs each day (which means that the balance might not be updated in real-time).

Conceptually, message queues work in a similar fashion: you send a message to an exchange, the exchange allows people to read the message, and you have functionality that ensures the original message arrives with the destined recipient. Acknowledgement of a message is basically a way to ensure that delivery. Having said that, the two main message brokers that I’ve been investigating deal with this in a slightly different manner (albeit, the same things happen in the end).

If you want to follow through, it might be an idea to use the code from my first post as a starting point.

BasicConsume() has a parameter called “noAck”. It took me a while to work this out, but noAck means that it doesn’t expect an acknowledgement; that is, it will automatically acknowledge receipt. So, noAck = True mean automatically acknowledge, and noAck = False means manually acknowledge. That not entirely uncomplicated.

I’ve left the error and the commented out BasicAck in on purpose. If you run this, unlike with ActiveMQ, where you will get a message at a time, you will get all messages in the queue (because it’s event based). Add in the BasicAck() to acknowledge the queue and you’re good to go.

If you add in the error at this stage, you can see that, in exactly the same way as ActiveMQ, it will only acknowledge the correct message. What you will also see here is we have the poison message scenario that I discussed in this post on ActiveMQ.

IMHO, this is where RabbitMQ beats ActiveMQ hands down. The following code is the simplest version of dealing with a poison message:

So, we have a problem, and we issue a Nack. What the Nack does is allow you to re-queue the message. The code above allows the queue to process, but just moves all the bad ones to the back. The obvious problem here is that it the problem isn’t transient, we’ll keep coming back to them. It does, however, get around the problem – the queue is no longer blocked.

The solution, as it was with ActiveMQ, is to put them in a “dead letter queue”; however, unlike ActiveMQ, this is remarkably easy. Firstly, let’s refactor our queue creation a little:

You’ll notice that we go to a new helper method called: “CreateDeadLetterQueue()” and return a dictionary; which is, in turn, passed through to our new queue. The CreateDeadLetterQueue() function looks like this:

There’s effectively two steps. Firstly we need a dead letter exchange, and this needs a routing key (in this case, “dead-letter”). Next, we declare the DeadLetterQueue with the same routing key. Finally, return the argument list, which allows the linking of the main queue to the dead letter queue.

Now we are going to change the receive code so that it doesn’t re-queue:

There are a number of considerations here; firstly, what if the message that you read errors – we want to retry; but secondly, what happens is the message repeatedly errors (this type of message is known as a poison message).

As you can see, there is now an issue in the code; for some reason, it is repeatedly throwing an error entitled “Test”. I can’t work out why (maybe I’ll post a question on StackOverflow later), but when I run that, despite crashing, the message is read, and the queue is now empty.

Obviously, this is an issue: if that message was “DebitBankAccountWith200000” then someone is going to wish that the person that wrote this code hadn’t automatically acknowledged it.

Firstly, how do we stop the auto acknowledge?

There are basically two alternatives to auto acknowledge (there are more, but we’ll only look at two here): client acknowledge, and transactional acknowledgement. I’ll leave transactional acknowledgement for another day.

Client Acknowledge

This method is basically the manual version. You’re telling ActiveMQ that you will, or will not acknowledge the message yourself. Now, let’s alter the receive code slightly:

As you can see, I’ve changed two main parts here; the first is that I’ve changed that AcknowledgmentMode to ClientAcknowledge and I’ve added a call to the acknowledge method on the message.

Now let’s re-run the send and receive and see what happens to the queue.

Unfortunately, I still haven’t worked out why it’s crashing, but here’s the queue; still safely with the message:

We had an error, it crashed, but because it was never acknowledged, it’s still safe and sound in the queue. When we run the receive again, hopefully the bug will have magically disappeared and the message will successfully process.

Poison Messages

The concept of a poison message is where the issue with the message, resides in the message; the situation described above is not a poison message because the message is fine; but code is erroring. Once the code above is fixed, the message can be processed; however, let’s have a look at a different error scenario; here’s some new receive code:

This time, we have some code that actually processes the message and, based on the contents, does something; in this case, it throws an error where the message doesn’t start with ‘t’. So, the rules are simple; messages start with ‘t’. Let’s run the send code again and try some messages:

Following this year’s trip to DDD North I got speaking to someone about Rabbit MQ. I’d previously done some research on Active MQ, however, given that there are, realistically, only two options in this market, I thought I should have a look at the competition.

Why the Thread.Sleep? The reason is that, because the message consumer is event based, you need to stop tidying up the objects until you’re done. Obviously, sleeping for 1 second has its issues. I suspect this could be better achieved using a TaskCompletionSource, of even just a Console.ReadLine(), but for the purposes of this post, it suffices.

Topics

The concept of topics seems to be the same as ActiveMQ; however, the implementation is different, and there were a few dead-ends to go down first.

Notice the routing key; this (broadly speaking) needs to match the sender’s routing key. The purpose seems to be to allow a complex mapping between publisher and subscriber.

Conclusion

In comparison to ActiveMQ, RabbitMQ seems to be a more actively maintained code base; a quick comparison of the GitHub repo of each reveals more activity in Rabbit. Rabbit also seems a little cleaner in the way it’s run and installed; however, it also seems to have more moving parts.

I’ve recently been experimenting with message queues. I’ve used MSMQ in the past, but never with any complexity, and so I thought I’d spend some time investigating ActiveMQ. There are a number of articles and courses out there, but for some reason, C# seems to be the poor relation. So, here’s a C# programmer’s guide to installing and running ActiveMQ.

The examples that they supply do work out of the box, and they are not a bad place to start, if you don’t want to read the rest of this post.

Queues versus Topics

It is important for reasons that will become apparent shortly, to understand how and why these two concepts differ before writing any code. Let’s start with Topics. These are effectively a way to communicate between two end points; the important thing here is that there must be both for it to work. When you publish a topic message, it is published to any “listeners”. If your app wasn’t listening then that’s hard luck. The use cases here are situations whereby a message might be time sensitive; for example, a stock price had just changed or a server needs the client to refresh because there is more data. There are such things as durable topics, but for now, let’s leave topics as described here.

Queues on the other hand have a persistent nature. Once you add a message to the queue, it will remain there until it is handled. Use cases for this might include a notification to send an e-mail, a chat program, or a request to place a sales order. The queue will be read on a first in, first out basis, and so you can load balance a queue: that is, you can have n listeners, and they will all process the messages in order from the queue. If you were to do this with the topic, they would all receive the same message at the same time.

Publish and Subscribe to a Topic

Start off by creating a new console application: you might want to call it something like SendMessage, or Blancmange. Then, add the ActiveMQ NuGet package.

The eagle eyed amongst you might notice that the code is almost identical; you need the same NuGet package for both the publisher and subscriber.

Topics Caveat

Okay, there’s a hugely important caveat here, which a smarter man than me would have instantly realised: if you run the subscriber now, nothing will happen. This is because the topic messages are only sent to active subscribers. In order for the above code to work, the subscriber needs to be running when the messages are sent.

So. Providing that you have an active subscriber when you publish your message, the above code will send whatever you type into the console to the subscriber.

Queues

So, the code for using queues looks very similar, but is conceptually different. Here’s the SendMessage code: