Automatically dead-letter expired messages in Azure Service Bus & how it works

TL;DR - Automatically expiring messages in Service Bus is a nice feature, but be aware that it does not "automatically" move expired messages to the dead-letter queue. In some scenarios this can reduce reliability.

Lately my team and I have been using Service Bus for Windows Server in our projects, hosting it on our on-premises infrastructure.
Besides the name, it is pretty much the same as Azure Service Bus, except that you're in charge of installation & administration. On top of that, you're stuck with an "old" version of Azure Service Bus - currently v2.1 - which has a more limited feature set than the latest Azure Service Bus, for example no auto-forwarding from a dead-letter queue to another entity.

Our scenario was built around the concept of queues and messages that should only live for a certain amount of time, also called time-to-live (TTL).
Once a message has exceeded its time-to-live, it needs to expire and be moved to the dead-letter queue. This is a common messaging scenario.

Getting started with Message expiration & dead-lettering

When you create a queue in Service Bus, you are actually creating two queues - your requested queue, called 'YourQueueName', and a dead-letter queue. Once a message has expired, it will be moved to this dead-letter queue if you enable this on the queue.

Enabling expiration of messages on your queue

Enabling this functionality on your queue is very easy - You create a QueueDescription based on your queue name and you enable EnableDeadLetteringOnMessageExpiration, that's it!
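As a minimal sketch of that step, assuming the .NET SDK from the WindowsAzure.ServiceBus NuGet package (Microsoft.ServiceBus namespace); the class name, method name, connection string and queue name are placeholders of mine:

```csharp
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class QueueSetup
{
    // connectionString and queueName are placeholders for your own values.
    public static void CreateQueueWithDeadLettering(string connectionString, string queueName)
    {
        NamespaceManager namespaceManager =
            NamespaceManager.CreateFromConnectionString(connectionString);

        var queueDescription = new QueueDescription(queueName)
        {
            // Send expired messages to the dead-letter queue instead of discarding them.
            EnableDeadLetteringOnMessageExpiration = true
        };

        // Only create the queue when it is not there yet.
        if (!namespaceManager.QueueExists(queueName))
        {
            namespaceManager.CreateQueue(queueDescription);
        }
    }
}
```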

But hold on, it currently uses the default Time-to-Live for each message in the queue, which is 10675199 days (TimeSpan.MaxValue) and thus practically infinite. You can specify this time window yourself by changing the DefaultMessageTimeToLive property.
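A short sketch of a queue description with its own default Time-to-Live could look like this. The helper name and its parameters are hypothetical; the two properties are the ones discussed above.

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public static class QueueDescriptions
{
    // Hypothetical helper: builds a description with dead-lettering on expiration
    // and a queue-level default Time-to-Live.
    public static QueueDescription WithExpiration(string queueName, TimeSpan defaultTimeToLive)
    {
        return new QueueDescription(queueName)
        {
            EnableDeadLetteringOnMessageExpiration = true,
            // Applies to every message that does not specify a shorter TimeToLive itself.
            DefaultMessageTimeToLive = defaultTimeToLive
        };
    }
}
```

You would pass the result, for instance QueueDescriptions.WithExpiration("invoices", TimeSpan.FromMinutes(5)), to NamespaceManager.CreateQueue.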

Queue Time-to-Live vs Message Time-to-Live

You can specify the Time-to-Live on a queue and on an individual message, and you can also combine the two. This allows you to define a default at the queue level and assign a different one to specific messages where needed.

Bear in mind that the shortest Time-to-Live window will be applied, whether it is your queue's default or the message's Time-to-Live!

Here is a small example - Imagine you're in a scenario where you send different types of messages to one queue, but certain messages need to expire sooner than others.
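A sketch of such a sender, assuming the same .NET SDK; the class name, message bodies and the 30-second value are made up for illustration:

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public static class ExpiringSender
{
    public static void SendMessages(string connectionString, string queueName)
    {
        QueueClient queueClient =
            QueueClient.CreateFromConnectionString(connectionString, queueName);

        // No TimeToLive set on the message: the queue's DefaultMessageTimeToLive applies.
        var regularMessage = new BrokeredMessage("Regular message");
        queueClient.Send(regularMessage);

        // Explicit TimeToLive on the message: it wins only if it is shorter
        // than the queue's default.
        var urgentMessage = new BrokeredMessage("Urgent message")
        {
            TimeToLive = TimeSpan.FromSeconds(30)
        };
        queueClient.Send(urgentMessage);

        queueClient.Close();
    }
}
```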

Processing expired messages from the dead-letter queue

As mentioned before, all your expired messages will be moved to the dead-letter queue of your queue and sit there until you process them.

Processing them is the same as processing messages from your normal queues, except that the name, also called the entity path, is different and follows this pattern -

QueueName/$DeadLetterQueue

If you are using the Azure SDK, you can even generate the name for your dead-letter queue with one line of code -

string dlqEntityName = QueueClient.FormatDeadLetterPath(queueName);

Note - It is important to know that there could also be other messages in the DLQ, for example poison messages, but this is out of scope for this article.
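Putting this together, a minimal sketch of draining the dead-letter queue might look like the code below. The class and method names are mine, and reading the "DeadLetterReason" property is based on my understanding that Service Bus stamps it on dead-lettered messages; treat that part as an assumption.

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public static class DeadLetterProcessor
{
    public static void Drain(string connectionString, string queueName)
    {
        string dlqEntityName = QueueClient.FormatDeadLetterPath(queueName);
        QueueClient dlqClient =
            QueueClient.CreateFromConnectionString(connectionString, dlqEntityName);

        BrokeredMessage message;
        // Receive returns null when no message arrives within the timeout.
        while ((message = dlqClient.Receive(TimeSpan.FromSeconds(5))) != null)
        {
            // Assumption: the broker records why the message was dead-lettered here.
            object reason;
            message.Properties.TryGetValue("DeadLetterReason", out reason);
            Console.WriteLine("Dead-lettered message {0}, reason: {1}", message.MessageId, reason);

            // Default receive mode is PeekLock, so the message must be completed explicitly.
            message.Complete();
        }

        dlqClient.Close();
    }
}
```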

How does Service Bus expiration work?

Now that we've seen how we can enable expiration and assign the time your messages should be alive, we can take a look at how it works based on a fictional scenario.

We have a backend system that is sending invoices to our client application over a Service Bus queue. This allows the customer to determine when he wants to receive the next invoice, but to make sure that each invoice is paid within the required amount of time, we want to expire the invoice and receive it back at the backend.

As I was building this scenario I noticed that Service Bus does not actively monitor your queue for expired messages; instead, it waits until you perform a Receive. Only then will it move all messages that have expired by that time to your dead-letter queue. It is important to know that a Peek is not sufficient - the messages will not be expired!
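To observe this yourself, a rough sketch could compare both calls. It assumes a non-sessionful test queue with dead-lettering on expiration enabled and some messages whose Time-to-Live has already elapsed; the class and method names are hypothetical.

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public static class ExpirationProbe
{
    public static void PeekThenReceive(string connectionString, string queueName)
    {
        QueueClient queueClient =
            QueueClient.CreateFromConnectionString(connectionString, queueName);

        // Peek only browses the queue; per the behavior described above,
        // expired messages stay where they are.
        BrokeredMessage peeked = queueClient.Peek();
        Console.WriteLine("Peeked: {0}", peeked == null ? "nothing" : peeked.MessageId);

        // A Receive is what makes the broker move expired messages to the dead-letter queue.
        BrokeredMessage received = queueClient.Receive(TimeSpan.FromSeconds(1));
        if (received != null)
        {
            // We only wanted to trigger expiration, so release the live message again.
            // Note that Abandon increments the message's DeliveryCount.
            received.Abandon();
        }

        queueClient.Close();
    }
}
```

Checking the dead-letter queue after each of the two calls should show the difference.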

In our scenario this means that when the customer doesn't request a new invoice for a very long time, the messages will not expire and thus will not be processed by the backend.

Although this is a big problem in our scenario, since we are relying on Service Bus, it does make sense - it saves Service Bus from permanently monitoring our messages, which would result in a higher load and lower scalability.

Implementing the "Monitoring" pattern

We had to redesign our scenario because of this behavior. Since we couldn't rely on the automatic expiration of messages, we stepped away from TTL and, thanks to Clemens Vasters, implemented a pattern I call the "Monitoring" pattern.

Instead of only using one queue we are now using two queues - Our InvoiceQueue and a MonitorQueue.

Every time we send a message to our InvoiceQueue, we also send a dummy message to the MonitorQueue. That dummy message contains metadata about the original invoice and has a ScheduledEnqueueTimeUtc set to the current UTC time plus the required TTL timespan. This enqueues our dummy message, but it will not be visible to receivers until the specified time.
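The sending side of this pattern can be sketched as follows. The class name, the "InvoiceSessionId" property name and the message body are my own placeholders, and setting SessionId assumes the InvoiceQueue was created with RequiresSession enabled, in line with the side note on sessions further down.

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public static class InvoiceSender
{
    public static void SendInvoice(
        QueueClient invoiceQueue, QueueClient monitorQueue, string invoiceId, TimeSpan timeToLive)
    {
        // The actual invoice, placed in its own session so it can be looked up later.
        var invoiceMessage = new BrokeredMessage("Invoice " + invoiceId)
        {
            SessionId = invoiceId
        };
        invoiceQueue.Send(invoiceMessage);

        // The dummy message carries only metadata about the invoice.
        var monitorMessage = new BrokeredMessage();
        monitorMessage.Properties["InvoiceSessionId"] = invoiceId;

        // Enqueued now, but invisible to receivers until the invoice's deadline has passed.
        monitorMessage.ScheduledEnqueueTimeUtc = DateTime.UtcNow.Add(timeToLive);
        monitorQueue.Send(monitorMessage);
    }
}
```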

Once our backend system picks up one of the dummy messages from the MonitorQueue, it uses the metadata to check whether the message still exists on the InvoiceQueue.
If it is still present, the backend removes it from the InvoiceQueue and performs the required logic, because the customer failed to process it in time. If the message is already gone, it just removes the dummy message from the MonitorQueue and moves on to the next one.

One small side note is that we used Service Bus sessions for the messages in the InvoiceQueue. This allows the backend system to retrieve the session ID from the dummy message's metadata and request that specific session on the InvoiceQueue.
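The backend side could then be sketched roughly as below. This is not our production code: the "InvoiceSessionId" property name, the timeouts and HandleOverdueInvoice are hypothetical, and treating a TimeoutException as "invoice already handled" is an assumption you should verify against your own setup.

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public static class InvoiceMonitor
{
    public static void ProcessNextMonitorMessage(QueueClient invoiceQueue, QueueClient monitorQueue)
    {
        BrokeredMessage monitorMessage = monitorQueue.Receive(TimeSpan.FromSeconds(5));
        if (monitorMessage == null)
        {
            return; // No deadline has passed yet.
        }

        var sessionId = (string)monitorMessage.Properties["InvoiceSessionId"];
        try
        {
            // Request exactly the session that belongs to this invoice.
            MessageSession session =
                invoiceQueue.AcceptMessageSession(sessionId, TimeSpan.FromSeconds(5));
            try
            {
                BrokeredMessage invoice = session.Receive(TimeSpan.FromSeconds(5));
                if (invoice != null)
                {
                    // Still on the queue: the customer did not pick it up in time.
                    invoice.Complete();
                    HandleOverdueInvoice(sessionId);
                }
            }
            finally
            {
                session.Close();
            }
        }
        catch (TimeoutException)
        {
            // Assumption: no such session could be accepted, so the invoice is already gone.
        }

        // Either way, this dummy message has done its job.
        monitorMessage.Complete();
    }

    // Hypothetical placeholder for the backend's own logic.
    private static void HandleOverdueInvoice(string invoiceId)
    {
        Console.WriteLine("Invoice {0} was not processed in time.", invoiceId);
    }
}
```

Any other exception, for example when the customer currently holds the session lock, leaves the dummy message uncompleted so that it is retried later.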

Don't believe me? Try it yourself!

I've prepared a sample application that shows you the behavior with Azure Service Bus and Service Bus for Windows Server.

Conclusion

Automatically expiring messages in Service Bus is a nice feature, but you have to be aware that it does not "automatically" move expired messages to the dead-letter queue. Once you know a Receive() is required, you can build your solution around this and handle expired messages more reliably.