Availability and scalability of IBM MQ in containers

This is a technical guide to architecting the availability and scalability of IBM MQ servers and client applications in a containerized cloud environment. There is a particular focus on running containers in Kubernetes and its many commercial offerings, including IBM Cloud Private and the IBM Cloud Container Service. IBM MQ has been supported in Docker containers for several years, but the powerful new features offered by cloud platforms like Kubernetes bring about new ways of approaching solution architecture.

Important: Running IBM MQ in containers is a rapidly evolving subject. This information is correct as of June 2018.

Availability

Types of availability

It is important to separately consider “message” and “service” availability. With IBM MQ on distributed platforms, a message is stored on exactly one queue manager, so if that queue manager becomes unavailable, you temporarily lose access to the messages it holds. To achieve high “message” availability, you need to be able to recover a queue manager as quickly as possible. You can achieve “service” availability by having multiple instances of queues for client applications to use, for example, by using multiple MQ cluster queues with the same name on different queue managers. Therefore, horizontal scaling is useful to improve scalability and service availability.

MQ availability technologies

There are three main ways to make MQ highly available in cloud environments:

Multi-instance queue manager, which is an active-standby pair, using a shared, networked filesystem.

Replicated data queue manager, which replicates data under the control of MQ.

Single resilient queue manager, which offers a simple approach for HA in the cloud, using networked storage.

Multi-instance queue managers

Multi-instance queue managers are the “traditional” way of thinking about high availability with MQ, and involve an “active” and a “standby” system. A multi-instance system consists of two servers where a queue manager could run, and a single shared filesystem. The queue manager’s data is held on the shared filesystem. The queue manager is only active on one server at a time, with the other server waiting in a standby mode. This system has a few key disadvantages:

It’s sensitive to specific filesystems, because of its reliance on file locking to manage take-over by the standby.

It requires additional resources and MQ license costs for the standby.

There are two components to manage.

Edit (2018/06/05): Despite the disadvantages, multi-instance queue managers are still a viable choice in containers.

Replicated data queue managers

A replicated data queue manager system consists of three servers where a queue manager could run, and an MQ-managed block storage device which is synchronously replicated to each server. The queue manager is only active on one server at a time, with the other two servers waiting in a standby mode, while receiving replicated data.

It is possible to use this system with containers, but it is generally not recommended, due to the use of Linux kernel modules. In most cases, users running containers will not have sufficient access to the host server to manage kernel modules.

Single resilient queue manager

There are new ways of thinking about high availability (HA) in cloud environments. A “single resilient queue manager” is where you have a single instance of a queue manager, and the cloud environment monitors it and replaces the VM or container as necessary. A queue manager can be thought of in two parts: the data stored on disk, and the running processes which allow access to the data. Any queue manager can be moved to a different virtual machine, or run in a different container, as long as it keeps the same data and the same network address. Most cloud environments provide the ability to keep IP addresses (for example, Kubernetes Services), and also offer highly available network-attached storage.

As with multi-instance queue managers, this system relies heavily on the availability of the storage system. MQ can’t be more available than the storage it is using. If you want to tolerate an outage of an entire availability zone, you need to use cloud storage which replicates to another zone. For example, Amazon’s EFS replicates data in this way.

The recovery times for a “single resilient queue manager” can be similar to using a multi-instance queue manager. The “standby” process for a multi-instance queue manager is already running, but it’s not a full queue manager. The time taken to fully start the standby is similar to regular queue manager startup. The main difference with the “single resilient queue manager” system is that the cloud provider potentially has to download the image to run the queue manager on a new worker node (this could be a virtual machine or container host). This download time is typically short for a small container image, but longer in the case of a VM image. The download time can be reduced by pre-pulling images, as described in the following section.

Note that it is important to have a highly available cloud environment. For example, in Kubernetes, the master components such as the scheduler need to be highly available, as they are responsible for re-instating a failed queue manager.

Edit (2018/06/05): In Kubernetes, a pod failure will typically be recovered quickly, but the failure of an entire node is handled differently. If the Kubernetes master loses contact with a worker node, it waits for its pod-eviction-timeout (set to five minutes by default) before evicting all the pods running on that node. This is intended to prevent mass pod evictions in the case of a network problem, but can cause a slower recovery in the case of a real node failure.

Edit (2018/09/27): This shows that container environments are still evolving with their HA capabilities, especially when it comes to stateful workloads such as messaging. So it may be preferable to overlay traditional HA capabilities such as multi-instance queue managers if you have doubts over your container environment’s capabilities. This is an area we are continuing to investigate.
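To make the difference between a pod failure and a node failure concrete, here is a rough back-of-the-envelope sketch. Only the five-minute eviction timeout comes from the text above; the detection, image pull, and startup figures are made-up placeholders for illustration, so substitute values measured in your own environment.

```python
from dataclasses import dataclass

# Default pod-eviction-timeout mentioned in the text (five minutes).
POD_EVICTION_TIMEOUT_S = 5 * 60


@dataclass
class RecoveryEstimate:
    detection_s: float      # time for the platform to notice the failure
    eviction_wait_s: float  # extra wait before pods are evicted (node failure only)
    image_pull_s: float     # time to download the image to the new node
    startup_s: float        # queue manager startup time

    def total(self) -> float:
        """Sum of all phases, in seconds."""
        return (self.detection_s + self.eviction_wait_s
                + self.image_pull_s + self.startup_s)


# Placeholder numbers, NOT measurements: a pod restarted on the same node
# needs no eviction wait and no image pull.
pod_failure = RecoveryEstimate(detection_s=10, eviction_wait_s=0,
                               image_pull_s=0, startup_s=20)

# Same placeholders for a node failure, plus the eviction wait and a pull.
node_failure = RecoveryEstimate(detection_s=10,
                                eviction_wait_s=POD_EVICTION_TIMEOUT_S,
                                image_pull_s=30, startup_s=20)

print(f"pod failure:  {pod_failure.total():.0f}s")
print(f"node failure: {node_failure.total():.0f}s")
```

With these placeholder figures, the eviction wait dominates the node-failure case, which is why it deserves attention when you plan recovery times.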

Improving message availability

You can improve message availability by reducing the recovery time as much as possible. For example:

Frequent health checks – If you can detect a failure quickly, you can take action more quickly. Balance this with performance concerns, because health checks use resources.

Pre-pulling images – when using the “single resilient queue manager” pattern, part of the recovery time in a cloud environment is the time taken to download the software image to the host server. In container terms this is often a “docker pull”. You can remove this delay by making sure the image is already available on all eligible hosts. Pre-pulling images is not a first-class feature in Kubernetes, but it can be accomplished. One way to pre-pull images is as follows. Run a Daemon Set using your MQ image on every worker node which is eligible for MQ deployment, but override the “entrypoint” of the container so that it doesn’t run a queue manager. This causes Kubernetes to pull the MQ image to each node. Because the container runs in a Daemon Set, Kubernetes restarts it whenever it exits or fails. If you also set the image pull policy to “Always”, then Kubernetes pulls the image again every time it needs to run the container. If you make the container entrypoint (say) sleep for thirty minutes, then the MQ image is pulled again (if necessary) every thirty minutes.

Quiesce messaging load before a planned outage – in the case of a planned outage, you can redirect traffic away in advance, to reduce the number of messages which will be held on a particular queue manager. For example, you can do this by suspending the queue manager from an MQ cluster, or by lowering the priority of the channels or queues in the cluster. This has two benefits. First, messages held on queues during an outage cannot be processed during that time, so it’s better to have them processed through alternative queue managers. Second, the time taken to restart a queue manager can increase when many messages are stored on it or being processed by it, so storing fewer messages allows a quicker recovery.

Redirect existing messages before a planned outage – in the case of a planned outage, you could attempt to drain a queue manager which has already been quiesced. One way of doing this, with MQ clusters, is to use the “amqsclm” sample to re-send all messages back into the cluster. This will re-distribute the messages to other queue managers.
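The pre-pulling approach described above can be sketched as a small Python script that builds a Daemon Set manifest and prints it as JSON (which kubectl accepts as well as YAML). This is a minimal sketch, not a tested deployment: the name mq-prepull, the image reference, and the optional node selector are hypothetical, and it assumes the MQ image contains a sleep command.

```python
import json
from typing import Dict, Optional


def prepull_daemonset(image: str,
                      sleep_seconds: int = 30 * 60,
                      node_selector: Optional[Dict[str, str]] = None) -> dict:
    """Build a DaemonSet manifest that only exists to keep `image` cached."""
    labels = {"app": "mq-prepull"}  # hypothetical label
    pod_spec = {
        "containers": [{
            "name": "prepull",
            "image": image,
            # Re-check the registry every time the container is (re)started.
            "imagePullPolicy": "Always",
            # Override the image entrypoint so no queue manager is started.
            # Assumes a `sleep` binary exists in the image.
            "command": ["sleep", str(sleep_seconds)],
        }],
    }
    if node_selector:
        # Restrict the pre-pull to nodes eligible for MQ deployment.
        pod_spec["nodeSelector"] = node_selector
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "mq-prepull"},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


manifest = prepull_daemonset("registry.example.com/my-mq:latest",
                             node_selector={"mq-eligible": "true"})
print(json.dumps(manifest, indent=2))
```

When the sleep ends, the container exits, Kubernetes restarts it, and the “Always” pull policy triggers a fresh check for the image, which gives the thirty-minute refresh cycle.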

Scalability

The normal techniques for scaling MQ are not changed when you move to the cloud, but you do get more options. Standard, well-proven techniques, such as using MQ clusters to workload balance messages across multiple queue managers, continue to work as normal. Running in containers also makes it easier to “vertically” scale, by changing the resources used by a container, and potentially re-scheduling to another worker node if necessary. However, you may be interested in using your cloud platform to “horizontally” scale MQ, by using multiple MQ servers with similar or identical configuration. This has potential benefits both for scalability and for availability (as a so-called “active-active” availability solution). Horizontal scaling is often used in cloud environments, particularly for stateless workloads, but requires careful architecture when applied to stateful workloads. This is particularly true when you’re sending high-value messages using MQ.

Dynamic versus static scaling

It is easier to deploy a horizontally scaled set of MQ servers which is fixed in size than to dynamically change the number of MQ servers. Scaling IBM MQ up by adding more queue managers is fairly straightforward, by using either an IBM MQ cluster of queue managers, or a load-balanced set of identical queue managers, depending on what messaging patterns you use. Scaling down is more complex, because you almost always want to remove a queue manager in a controlled manner, to be sure no messages remain on the queue manager at the time it is deleted.

Scaling down in many cloud environments is especially difficult, because you need to distinguish between a server which is shutting down but will be restarted elsewhere, and a server which is shutting down because it is being deleted. This isn’t a problem with stateless workloads, but is usually a problem with stateful workloads like MQ.

In addition, you need to consider how use of dynamic scaling can affect MQ clusters. If you delete a queue manager that is a member of an MQ cluster without first performing the recommended steps to remove that queue manager, information about that queue manager will be held by the MQ cluster for up to 90 days, but this shouldn’t have any adverse effects. However, the same is not true for MQ objects such as cluster queues, which can cause problems if they are not removed carefully. For example, messages may stay on cluster transmission queues waiting for deleted queue managers to return.

In summary, it is not recommended to reduce the size of the “set” after initial deployment, because doing so deletes queue managers and can result in loss of messages. In Kubernetes terms, this means that you should not change the number of replicas in a Stateful Set after initial deployment. If you want to change the number of replicas, you must have a proven process in place to ensure that new replicas are correctly integrated into your solution.
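As a small illustration of what reducing the replica count means in practice, the sketch below lists which pods a Kubernetes Stateful Set would remove. It relies on two Stateful Set behaviours: pods are named with the set name followed by an ordinal, and scale-down removes the highest ordinals first. The set name “mq” is hypothetical, and the helper is only an aid for a deployment checklist, not part of any MQ or Kubernetes tooling.

```python
from typing import List


def pods_removed_by_scale_down(set_name: str, current: int, desired: int) -> List[str]:
    """Return the pods (queue managers) that a replica change would delete.

    The list is in deletion order (highest ordinal first). An empty list
    means the change does not delete any queue manager.
    """
    if current < 0 or desired < 0:
        raise ValueError("replica counts must not be negative")
    if desired >= current:
        return []
    return [f"{set_name}-{ordinal}"
            for ordinal in range(current - 1, desired - 1, -1)]


# Going from five replicas to three would delete these queue managers,
# so each of them would need to be drained first.
print(pods_removed_by_scale_down("mq", current=5, desired=3))
```

A deployment pipeline could refuse any change for which this list is non-empty unless a drain procedure has been completed for every queue manager on it.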

Message ordering

If you use horizontal scaling, messages can be processed concurrently, which means messages might be received out of sequence. IBM MQ provides features to allow groups of messages to be handled in small ordered batches, when being sent between queue managers. You can also choose to manage this yourself in your application.
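If you choose to manage ordering in your application, one common approach is a resequencer: the sender adds a sequence number to each message, and the receiver buffers anything that arrives early. The following is a minimal sketch under that assumption. The (sequence number, body) tuple format is invented for the example and is not an MQ API.

```python
from typing import Dict, Iterable, Iterator, Tuple


def resequence(messages: Iterable[Tuple[int, str]],
               first_seq: int = 1) -> Iterator[str]:
    """Yield message bodies in sequence order, buffering early arrivals.

    `messages` yields (sequence_number, body) pairs in arrival order.
    Bodies are released only once every earlier sequence number has
    been seen.
    """
    pending: Dict[int, str] = {}
    next_seq = first_seq
    for seq, body in messages:
        pending[seq] = body
        # Release as many consecutive messages as are now available.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1


arrived = [(2, "debit"), (1, "open account"), (4, "close"), (3, "credit")]
print(list(resequence(arrived)))
```

Note that the buffer here lives in memory, so it is lost if the consumer restarts. A production design would need to persist it, or be able to re-read the messages.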

Client versus server bindings

IBM MQ client applications should generally be scaled separately from IBM MQ servers, which means running them in separate virtual machines or containers. Having single-purpose containers is also considered a best practice, and brings the maximum benefits of containerization, including dependency isolation, resource isolation, security isolation, and ease of access to logs. For MQ, this means that cloud topologies typically use IBM MQ client connections to communicate between applications and queue managers.

There are some cases where server bindings may be required, though. For example, applications that use a queue manager as the transaction coordinator of global (XA) transactions are required to connect using server bindings. In addition, there are a number of cases where server bindings may be desirable, for improved performance, or simply because applications have been developed in that way.

There are several IBM-provided solutions which require server bindings, either for the reasons given above or due to a need for access to locally held MQ information:

IBM Integration Bus uses server bindings to provide global transaction support on distributed platforms, as well as for certain nodes.

You can use any of these technologies from within the same container. If you want to use them from a separate container (for example, for increased isolation, or easier error log management), then there are some technical requirements:

A shared IPC namespace (shared memory), because in server bindings mode, MQ clients connect to the server using shared memory.

A shared process (PID) namespace, because in server bindings mode, the MQ server monitors the process ID (pid) of the client, to determine whether it is still running.

Docker supports using a shared IPC and PID namespace from Docker V1.12 (July 2016) onwards. However, Kubernetes only introduced an alpha-quality feature to use a shared PID namespace, from Kubernetes V1.10 (March 2018).
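The two namespace requirements can be expressed as follows. This is a sketch only: the container, pod, and image names are hypothetical, and other things a server-bindings application needs are out of scope here. The Docker function uses the --ipc=container:NAME and --pid=container:NAME options of docker run. The Kubernetes function uses the shareProcessNamespace pod field, which was alpha at the time of writing and may need to be enabled in your cluster. Containers in the same pod already share an IPC namespace.

```python
from typing import List


def docker_run_args(app_image: str, qmgr_container: str) -> List[str]:
    """Arguments to start an application container that joins the IPC and
    PID namespaces of an already-running queue manager container."""
    return [
        "docker", "run", "--detach",
        # Shared memory for server bindings. Depending on the Docker version,
        # the target container may itself need to be started with
        # --ipc=shareable for this to be allowed.
        f"--ipc=container:{qmgr_container}",
        # Lets the queue manager see the application's process ID.
        f"--pid=container:{qmgr_container}",
        app_image,
    ]


def shared_namespace_pod(name: str, qmgr_image: str, app_image: str) -> dict:
    """Pod manifest with the queue manager and application as two containers
    sharing a PID namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "shareProcessNamespace": True,
            "containers": [
                {"name": "qmgr", "image": qmgr_image},
                {"name": "app", "image": app_image},
            ],
        },
    }


print(" ".join(docker_run_args("example/my-app:1", "qm1")))
```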

Load balancing

In order to horizontally scale, you need more than one queue manager which is able to offer the same messaging service (such as a queue). Once you have this, you need a way to assign clients to each queue manager in a balanced way. The simplest load balancing is connection load balancing, where a server (queue manager) is chosen at connection time. MQ also provides more sophisticated load balancing, on a per-message basis, through the use of MQ clusters.

Discovering and connecting to a queue manager

How do MQ clients locate a queue manager to use? There are three main approaches for an initial connection:

Manual/fixed – Clients can be configured at deployment time to use a specific queue manager for the lifetime of that deployment. For example, this might be done by having a queue manager and client application in the same Kubernetes Pod. It could also be as simple as supplying the connection details through an environment variable, a Kubernetes ConfigMap, or VCAP_SERVICES in Cloud Foundry.

Client Channel Definition Table (CCDT) – Clients can be configured to retrieve queue manager information using an MQ CCDT. CCDTs offer a rich set of connection information, including channel identifiers, TLS cipher specs, maximum message size, and more. The CCDT needs to be updated whenever new queue managers are created, or old ones deleted. The CCDT can be provided as a file (for example, using a Kubernetes ConfigMap), or made accessible on the network via HTTP or FTP (for example, located using a Kubernetes Service). Note that if you use the “queue manager group” feature of CCDTs, and your client application connects using a “*” in the queue manager identifier, this is similar to using an L4 load balancer, except it avoids the problems with JMS.

L4 load balancer – Clients can be configured to use a network load balancer, which selects the destination for TCP/IP connections. This is often known as an “OSI layer 4” or “L4” load balancer, and is commonly offered in cloud environments. For example, a typical Kubernetes Service will spray TCP/IP connections to one or more servers, in round-robin order. OSI layer 7 load balancing is discussed later, as it doesn’t apply to the initial connection.

There are a number of restrictions you face when using an L4 load balancer (or a CCDT with a “*” in the queue manager identifier), which need to be carefully considered before you use this approach:

Manual/fixed use of multiple channels – In most cloud environments, addressing information for services consists of simply a TCP/IP address and port. MQ provides additional flexibility: behind a single TCP/IP address and port, you have one or more channel definitions. This is typically used to configure security settings for certain connections, such as TLS ciphers and certificates. The CCDT mechanism allows you to define channel information as part of the connection information, but if you use an L4 load balancer, you need to specify channels in a manual/fixed way (for example, with an environment variable at deployment time). This is fine if you have a small number of different channel configurations to deal with, but becomes cumbersome if you configure lots of different channels.

Avoid JMS APIs when using an L4 load balancer – MQ’s Java Message Service (JMS) implementation uses multiple TCP/IP connections, which could be incorrectly distributed across different queue managers. Using a CCDT resolves this problem.

Restricted use of request/response pattern – A common messaging pattern is where a client sends a message and waits for a response before continuing. This creates state in the application, and can result in messages which don’t get consumed if/when the client fails and gets restarted (which can be common in cloud environments). If you want to use request/response messaging, then you should either persist any information required to process a response (for example, in a database), or place all information needed to process the response in the message itself.

Avoid server affinity – See the next section
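The JMS restriction above is easier to see with a toy model. The sketch below simulates a round-robin L4 load balancer in front of three queue managers and a client that opens two TCP/IP connections for what it considers one logical session. It is a simplified model written for this article, with hypothetical queue manager names, and involves no real networking or MQ APIs.

```python
from itertools import cycle
from typing import Iterable, List


class RoundRobinBalancer:
    """Toy L4 load balancer: each new TCP connection goes to the next backend."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._next_backend = cycle(list(backends))

    def connect(self) -> str:
        """Return the backend (queue manager) chosen for a new connection."""
        return next(self._next_backend)


def open_connections(balancer: RoundRobinBalancer, count: int) -> List[str]:
    """Model one client opening `count` TCP connections through the balancer."""
    return [balancer.connect() for _ in range(count)]


balancer = RoundRobinBalancer(["QM1", "QM2", "QM3"])

# A client that needs two connections for one logical session ends up
# talking to two different queue managers.
session = open_connections(balancer, 2)
print(session, "-> split across queue managers:", len(set(session)) > 1)
```

The balancer has no way of knowing that the two connections belong together, which is why a CCDT, where the client itself picks one queue manager, avoids the problem.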

Server affinity

You have an architecture with “server affinity” if a client that is disconnected for any reason needs to reconnect to the same queue manager. Some MQ features which can create server affinity include:

If you use features which require server affinity, and you want to use an L4 load balancer or a CCDT with a “*”, then the client is responsible for persisting the information it needs to reconnect to the correct queue manager. For example, the client might query the queue manager name, and write it to a database or key-value store.
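A minimal sketch of the client-side bookkeeping described above follows, using SQLite from the Python standard library as the store. The table layout, the client identifier, and the queue manager names are all hypothetical. The actual MQ connection call is deliberately left out, because it depends on the client library you use.

```python
import sqlite3
from typing import Optional


class AffinityStore:
    """Remembers which queue manager each client last connected to."""

    def __init__(self, path: str = ":memory:") -> None:
        # Use a file path (or a shared database) in practice so that the
        # record survives a client restart; ":memory:" is for the demo only.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS affinity ("
            "client_id TEXT PRIMARY KEY, qmgr TEXT NOT NULL)")
        self._db.commit()

    def remember(self, client_id: str, qmgr: str) -> None:
        """Record the queue manager name the client queried after connecting."""
        self._db.execute(
            "INSERT OR REPLACE INTO affinity (client_id, qmgr) VALUES (?, ?)",
            (client_id, qmgr))
        self._db.commit()

    def lookup(self, client_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT qmgr FROM affinity WHERE client_id = ?",
            (client_id,)).fetchone()
        return row[0] if row else None


def queue_manager_to_request(store: AffinityStore, client_id: str) -> str:
    """Ask for the remembered queue manager, or "*" (any) on first connect."""
    return store.lookup(client_id) or "*"


store = AffinityStore()
print(queue_manager_to_request(store, "orders-app-1"))  # first connect
store.remember("orders-app-1", "QM2")
print(queue_manager_to_request(store, "orders-app-1"))  # after a reconnect
```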

Heterogeneous or homogeneous sets of queue managers

Many cloud environments make it easy to “horizontally scale” multiple replicas of a container or VM. This gives you a “homogeneous set” of queue managers, with an identical configuration. This typically makes it easy for the cloud to provide L4 load balancing, and also allows for cloud-managed rolling upgrades of your queue managers. For example, Kubernetes offers the Stateful Set, and will manage upgrading each replica (queue manager) in turn.

MQ clustering – L7 load balancing

MQ clusters offer more sophisticated load balancing, by choosing a cluster queue when it is opened, on a per-message basis, or when creating a message group. This can happen within a single TCP/IP connection. This load balancing is part of the MQ clustering feature, and requires the queue managers to be interconnected. It is often used in conjunction with one or more “gateway” queue managers, which act as a front-end for routing messages to back-end queue managers.

This load balancing is based on many things, but is primarily based on a set of weightings and priorities, and whether queue managers are known to be available or not.
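To give a feel for how priorities, weightings, and availability can interact, here is a deliberately simplified model. It is not MQ’s actual cluster workload algorithm, which considers more inputs than this. The sketch keeps only available destinations, then only those with the highest priority, and spreads messages across them in proportion to their weight using a smooth weighted round-robin. All names and numbers are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Destination:
    qmgr: str
    priority: int   # higher wins
    weight: int     # share of traffic among equal-priority destinations
    available: bool = True


def candidates(destinations: List[Destination]) -> List[Destination]:
    """Available destinations that share the highest priority."""
    live = [d for d in destinations if d.available]
    if not live:
        return []
    top = max(d.priority for d in live)
    return [d for d in live if d.priority == top]


def distribute(destinations: List[Destination], message_count: int) -> List[str]:
    """Choose a queue manager per message using smooth weighted round-robin."""
    chosen = candidates(destinations)
    if not chosen:
        raise RuntimeError("no available destination")
    total = sum(d.weight for d in chosen)
    credit = {d.qmgr: 0 for d in chosen}
    result = []
    for _ in range(message_count):
        for d in chosen:
            credit[d.qmgr] += d.weight
        best = max(chosen, key=lambda d: credit[d.qmgr])
        credit[best.qmgr] -= total
        result.append(best.qmgr)
    return result


cluster = [
    Destination("QM1", priority=5, weight=2),
    Destination("QM2", priority=5, weight=1),
    Destination("QM3", priority=1, weight=10),                   # lower priority
    Destination("QM4", priority=9, weight=1, available=False),   # known to be down
]
print(distribute(cluster, 6))
```

In this model QM4 is skipped because it is unavailable, QM3 is never used while higher-priority destinations exist, and QM1 receives twice as many messages as QM2.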

Summary

The following is a summary of the recommended practices and guidance for using MQ in containers in production:

Availability – use the “single resilient queue manager” system to achieve high availability. This is where the cloud environment will replace a failed instance with a new one, instead of having a “warm” standby queue manager already running. This requires network-attached storage as is typical in most container systems. Zone/region availability is largely dictated by the replication available for your chosen underlying storage system. Where even greater availability of the messaging service is required, introduce multiple “single resilient queue managers” to provide a continuously available messaging solution.

Scalability

For applications where you don’t know (or can’t control) precisely which features are used, you should scale using a heterogeneous set of queue managers using a CCDT and/or an MQ cluster, instead of using cloud-controlled scaling. In Kubernetes, this means you should use a Stateful Set with exactly one replica, and deploy multiple sets. Each Stateful Set will provide a highly available queue manager, and the MQ cluster will allow the scaling.

For other applications, you may be able to use cloud-provided scaling, but you must avoid some MQ features. Use a fixed-size homogeneous set of queue managers (for example, using a Kubernetes Stateful Set), and don’t reduce the size of the set after you deploy. You can use either an L4 load balancer (for example, a Kubernetes Service), or a CCDT (which is required for applications using JMS).
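The first recommendation, several Stateful Sets with exactly one replica each, might be generated along these lines. Treat it as a sketch under assumptions: the queue manager names, image reference, storage size, and the /mnt/mqm mount path are illustrative and must match how your own MQ image is built. MQ cluster configuration is not shown.

```python
import json
from typing import List


def single_qmgr_statefulset(name: str, image: str, storage: str = "2Gi") -> dict:
    """StatefulSet manifest for one resilient queue manager (one replica)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,  # exactly one queue manager per set
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "qmgr",
                    "image": image,
                    # Assumed location of queue manager data in the image.
                    "volumeMounts": [{"name": "data", "mountPath": "/mnt/mqm"}],
                }]},
            },
            # Network-attached storage that follows the queue manager
            # when it is re-scheduled to another node.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def heterogeneous_set(names: List[str], image: str) -> List[dict]:
    """One single-replica StatefulSet per queue manager name."""
    return [single_qmgr_statefulset(n, image) for n in names]


manifests = heterogeneous_set(["qm-orders", "qm-billing"],
                              "registry.example.com/my-mq:latest")
print(json.dumps(manifests, indent=2))
```

Scaling in this pattern means adding or removing a whole Stateful Set, along with the matching CCDT or MQ cluster change, rather than changing a replica count.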

You can successfully deviate from this guidance with the correct level of architectural consideration, as discussed in this blog entry.