26 Jul 2011

Error handling in Camel for JMS consumer endpoint

As of version 2.0, Camel uses the DefaultErrorHandler out of the box. It offers more Java-like error handling: any exception that occurs while routing a message is propagated back to the caller, and the Exchange ends immediately. See the Camel error handling documentation for more details.

This is great for most use cases but not the best choice for every scenario. For illustration purposes, let's consider the example of using Camel as a JMS bridge (although the following applies to any Camel route starting with a JMS consumer endpoint).
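A minimal sketch of such a bridge route in Camel's Spring XML. It assumes that two JMS components are registered under the bean ids activemq and webspheremq, and it forwards messages from the ActiveMQ queue GatewayToWebSphereMQ to the WebSphere MQ queue FromActiveMQ:

```xml
<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <!-- consume from ActiveMQ; "activemq" is an assumed component bean id -->
    <from uri="activemq:queue:GatewayToWebSphereMQ"/>
    <!-- produce to WebSphere MQ; "webspheremq" is an assumed component bean id -->
    <to uri="webspheremq:queue:FromActiveMQ"/>
  </route>
</camelContext>
```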

I leave out the Spring bean definitions of the activemq and webspheremq JMS components for simplicity. This route forwards any message sent to the ActiveMQ queue "GatewayToWebSphereMQ" to a queue called "FromActiveMQ" in WebSphere MQ.

Now let's see what happens if WebSphere MQ is down for whatever reason. If a new message is put on the GatewayToWebSphereMQ queue, it will be picked up by the camel-activemq component. This component uses AUTO_ACKNOWLEDGE mode by default. That means Camel receives the message from the broker and acks it straight away, before routing the message any further. From the broker's point of view, the message has been consumed, and it is now entirely in the hands of Camel.

So after acking the message, Camel tries to route it, i.e. send it to WebSphere MQ. But the other end is down, so no TCP connection can be established. Instead, some sort of socket exception is thrown. This is where Camel's error handling comes into play.

From Camel 2.0 onwards, the DefaultErrorHandler is called. The default behavior of this error handler is to propagate the error back to the caller. In this example that isn't possible, as the camel-jms component has already consumed and acked the message. It can neither raise the exception to the broker nor put the message back at the head of the queue. Instead, the message gets discarded (without being stored anywhere) after the error is logged. So the message is basically lost. This is certainly not ideal if you cannot afford to lose messages.

There are a few solutions:

1) By default, the DefaultErrorHandler does not try to redeliver the message. You can configure the error handler with a different redelivery policy so that it attempts to redeliver the message a couple of times before giving up, in the hope that WebSphere MQ will have restarted within that time frame. However, when it finally gives up on the retries, the message is still discarded and lost.

2) If you can't afford to lose messages, the better solution is to use a different Camel error handler, namely the Dead Letter Channel. This handler moves the message to a configurable dead letter queue if it cannot be routed. Here is a sample configuration for a dead letter channel:
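A sketch of what such a configuration can look like in Spring XML for Camel 2.x, assuming the bean ids shown (they are illustrative) and the activemq component from the route above. It uses three redeliveries, a 5000 ms delay, and the queue "ActiveMQ.DLQ" as the dead letter destination:

```xml
<!-- Dead letter channel; the bean ids are illustrative assumptions -->
<bean id="deadLetterErrorHandler"
      class="org.apache.camel.builder.DeadLetterChannelBuilder">
  <!-- where messages go once all redelivery attempts have failed -->
  <property name="deadLetterUri" value="activemq:queue:ActiveMQ.DLQ"/>
  <property name="redeliveryPolicy" ref="redeliveryPolicyConfig"/>
</bean>

<bean id="redeliveryPolicyConfig"
      class="org.apache.camel.processor.RedeliveryPolicy">
  <!-- retry three times, waiting 5 seconds between attempts -->
  <property name="maximumRedeliveries" value="3"/>
  <property name="redeliveryDelay" value="5000"/>
</bean>

<!-- apply the error handler to all routes in this context -->
<camelContext errorHandlerRef="deadLetterErrorHandler"
              xmlns="http://camel.apache.org/schema/spring">
  <route>
    <from uri="activemq:queue:GatewayToWebSphereMQ"/>
    <to uri="webspheremq:queue:FromActiveMQ"/>
  </route>
</camelContext>
```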

This example configures the dead letter channel with a custom redelivery policy. Camel will now retry every message three times with a 5-second delay. If delivery is still unsuccessful after that, the message is moved to the queue "ActiveMQ.DLQ" in ActiveMQ. If the connectivity problem with WebSphere MQ is only short-lived, a properly configured redelivery policy may prevent any messages from being moved to the dead letter queue. You can configure any other dead letter queue, and in fact it does not have to be a JMS queue: "seda:errorqueue" will also work.

This configuration of a Camel error handler will never lose any persistent messages! However, you will need a strategy for what to do with messages that end up on a dead letter queue (e.g. manually re-route them back to the original queue after the connection to WebSphere MQ has been restored).

3) A third possible solution would be to use a transacted Camel route. For transacted routes, the TransactionErrorHandler is used. The camel-jms endpoint is a transaction-capable endpoint, and both ActiveMQ and WebSphere MQ support transactions. The entire Camel route above could therefore span a single transaction. If any errors are encountered within the transaction (e.g. while trying to send the message to WebSphere MQ), the transaction is rolled back and the message is returned to the original queue (in fact, it never leaves the queue). You then need an appropriate redelivery policy configuration inside ActiveMQ (e.g. try to redeliver the message up to 5 times before moving it to a dead letter queue "ActiveMQ.DLQ"). Further, as the Camel route involves two different JMS endpoints, you would need to configure Camel for XA transactions, involving an XA transaction manager. XA transactions also have an impact on performance and might not always be needed. The Camel Transaction Guide on FuseSource.com has some really good chapters on configuring Camel for XA transactions.
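As a rough sketch only, a transacted version of the route might look like the following in Spring XML. It assumes that an XA-capable transaction manager and transaction-aware activemq and webspheremq components are already defined elsewhere in the Spring context; that wiring is the hard part and is not shown. The redeliveryPolicy bean id is illustrative, and the bean still has to be set on the ActiveMQ connection factory's redeliveryPolicy property.

```xml
<!-- ActiveMQ client-side redelivery policy: up to 5 redeliveries after a rollback.
     Wire this into the ActiveMQ connection factory (not shown). -->
<bean id="redeliveryPolicy" class="org.apache.activemq.RedeliveryPolicy">
  <property name="maximumRedeliveries" value="5"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <from uri="activemq:queue:GatewayToWebSphereMQ"/>
    <!-- marks the route as transacted; Camel looks up the transaction
         policy or transaction manager in the Spring registry -->
    <transacted/>
    <to uri="webspheremq:queue:FromActiveMQ"/>
  </route>
</camelContext>
```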

The second solution outlined above might be the easiest solution that guarantees messages won’t get lost.

Wouldn't the messages that are acked by Camel be lost (in the second option) if the server goes down or if a sysadmin stops the Camel/ActiveMQ process for maintenance purposes, etc.? It would be very tedious and error-prone for ops guys to wait and make sure that there are no messages in Camel before they do any maintenance work. Can you think of a solution for this problem? Would CLIENT_ACKNOWLEDGE mode work in ActiveMQ-to-ActiveMQ routing scenarios?

Sorry, did not spot the comments until today. @Vinicius: The cost is that your failed msg is redelivered a few times before it is moved to the DLQ. Given that in such a scenario you often cannot afford to lose msgs, the cost is low.

@Ozan: If you take Camel down while it processes msgs, then yes, the msg would be lost. To secure against that, use transactions. Btw, it's possible to shut down the Camel Context or an individual Camel route gracefully so that it waits for outstanding requests to complete before shutting down. The JMX shutdown operations on the Camel context and the Camel route work that way.

@Anonymous: When using transactions, you use the transactional error handler. It will try to redeliver your msgs 5 times before giving up and moving them to a DLQ. The number of retries is configurable. It does not make sense to retry forever on a msg that might always fail.