I have recently received a couple of queries from customers regarding the use of multiple cluster transmission queues with IBM MQ. One customer wanted to know how best to update an existing cluster to use them. The other had a cluster-sender channel that wouldn’t start because they’d accidentally deleted its transmission queue, and wanted to know how best to recover. That problem is actually quite easy to resolve and, fortunately, it was only on a test system so the customer wasn’t unduly impacted. Given these queries I thought a blog post on this subject might be useful. This post describes the multiple cluster transmission queue feature, why you might wish to configure multiple transmission queues and how to do so. I also explain how to resolve some problems you might encounter. There is quite a lot of information in this post, so I’ve divided it into sections so you can jump straight to the content that interests you.

Introducing multiple cluster transmission queues...

A transmission queue is used by MQ to store messages until they can be transmitted over the network to their destination. Regular sender channels have a dedicated transmission queue and they send all messages put to their transmission queue to their remote receiver. Cluster-sender channels are more cooperative. The default behaviour is for all cluster-sender channels to share a single transmission queue called SYSTEM.CLUSTER.TRANSMIT.QUEUE. The correlation ID of messages put to this queue identifies the cluster-sender channel over which they should be sent. Cluster-sender channels use MQGET by correlation ID to remove only those messages they should send. I mention this because many users don’t realise this difference compared to other transmission queues. It is actually extremely important that cluster channels work this way. If they didn’t then a single message at the head of the queue would block all cluster communication if its destination queue manager was unavailable.

Although a single transmission queue for all cluster communication is simple for administrators to understand and is sufficient for many users, it does have some drawbacks. Therefore, support for multiple cluster transmission queues was introduced on Windows, Linux and UNIX in version 7.5. IBM i and z/OS did not have a version 7.5 offering, so the same feature was introduced in version 8 on those platforms. Many people assume that this capability was introduced to improve performance. Some users may notice an improvement if they are constrained by queue contention and/or message buffering, but the use of multiple transmission queues predominantly provides the following benefits:

Separation of message traffic
When a single transmission queue is used it is possible for messages destined for one channel to interfere with those for another. For example, if messages cannot be sent over one or more channels then a shared transmission queue can eventually become full.

Management of messages
Administrators often like to use queue attributes such as MAXDEPTH to manage available resources. When all cluster channels share a single transmission queue these attributes become less useful, especially when a queue manager is a member of multiple clusters and the transmission queue is used to service multiple applications.

Monitoring
When a single transmission queue is used it is not possible to use queue monitoring to track the number of messages processed by each channel, although channel statistics provide some of the same information. Administrators must also perform investigative work to identify why the depth of a single transmission queue is growing when it is used by multiple applications and channels. If message traffic is separated it is much easier for administrators to determine the cause and what is affected.

Configuring multiple cluster transmission queues

The transmission queue a regular sender channel uses is configured using the XMITQ channel attribute. A similar attribute cannot be used for cluster communication because most channels are automatically defined based on the cluster-receiver definition of each remote endpoint. It would be undesirable and difficult to manage if remote definitions affected the transmission queue local channels use. It might also cause problems for back-level queue managers coexisting in the same cluster. Therefore, an alternative means to configure the transmission queue each cluster-sender channel should use has been implemented.

A new queue manager attribute called DEFCLXQ has been introduced, which stands for ‘default cluster transmission queue’. This attribute has two permissible values, SCTQ and CHANNEL. The value SCTQ, which is the default for backwards compatibility, indicates that by default cluster-sender channels use SYSTEM.CLUSTER.TRANSMIT.QUEUE. The value CHANNEL indicates that by default each cluster-sender channel uses a dynamically created transmission queue called SYSTEM.CLUSTER.TRANSMIT.<channel-name>. Using the value CHANNEL provides administrators with a simple option to use a separate queue for each channel. The queue manager automatically creates and deletes transmission queues as necessary to serve cluster channels.
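For example, the following MQSC command sets this queue manager default (the change takes effect for each channel when it next starts):

ALTER QMGR DEFCLXQ(CHANNEL)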

For many users the use of DEFCLXQ alone will be sufficient. However, it is recognised that in large clusters a separate transmission queue for every channel might be too granular. Administrators might also prefer to use a different naming convention for the transmission queues. On z/OS, administrators might wish to control which storage class (page set) and buffer pool is associated with each queue. Therefore, a new queue attribute called CLCHNAME has also been introduced. Instead of defining the transmission queue on the channel, the CLCHNAME attribute allows an administrator to define on a transmission queue which cluster channels should use it. The attribute supports wildcards in any position, allowing many channels to use the same manually defined queue. For example, a value of ABC.* matches any channel whose name starts with ABC followed by a dot. If the common naming convention of <cluster>.<queue-manager> is used for cluster channels, this makes it easy for administrators to configure a separate transmission queue for each cluster a queue manager is a member of, or a separate transmission queue for specific remote queue managers.
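As a sketch, a manually defined transmission queue serving every channel whose name starts with ABC. might be created with MQSC along these lines (the queue name is illustrative):

DEFINE QLOCAL(ABC.XMITQ) USAGE(XMITQ) CLCHNAME('ABC.*')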

The transmission queue a channel uses is determined by searching for a matching CLCHNAME value. If multiple matches are found then the most specific match takes precedence. If no match is found then the value of the DEFCLXQ attribute is used to determine which queue to use.

Consider the following example, assuming no other transmission queues have a non-blank CLCHNAME value:
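The original example configuration is not reproduced here, but definitions along the following lines (reconstructed from the behaviour described below, so treat the names and values as illustrative) would match it:

DEFINE QLOCAL(CLUSTER.XMITQ1) USAGE(XMITQ) CLCHNAME('AAA.*')

DEFINE QLOCAL(CLUSTER.XMITQ2) USAGE(XMITQ) CLCHNAME('AAA.BBB')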

A channel called AAA.BBB uses CLUSTER.XMITQ2 because that transmission queue has a specific CLCHNAME value that matches the name of the channel.

A channel called AAA.CCC uses CLUSTER.XMITQ1 because that transmission queue has a generic CLCHNAME value that matches the name of the channel and there is not a more specific match.

The transmission queue used by a channel called XXX.YYY depends on the value of the DEFCLXQ queue manager attribute because no CLCHNAME value matches its name. It will use either SYSTEM.CLUSTER.TRANSMIT.QUEUE or a permanent-dynamic transmission queue called SYSTEM.CLUSTER.TRANSMIT.XXX.YYY.

Switching transmission queue

The transmission queue associated with a cluster-sender channel can potentially be modified by any of the following actions:

Altering the value of the DEFCLXQ queue manager attribute

Manually defining a transmission queue with a non-blank value for the CLCHNAME attribute

Altering the value of the CLCHNAME attribute on an existing transmission queue

Deleting a transmission queue that has a non-blank value for the CLCHNAME attribute

To avoid channels switching transmission queue when they are running, or when multiple configuration changes are made in quick succession, no immediate action is taken by the queue manager when a DEFINE, ALTER or DELETE command is processed. Instead, each channel queries the transmission queue it should use when it starts. If a configuration change has been made since it was last active a switch of its transmission queue is initiated. The process used to switch transmission queue is:

The channel opens the new transmission queue for input and starts getting messages from it (using get by correlation ID)

A background process is initiated by the queue manager to move any messages queued for the channel from its old transmission queue to its new transmission queue. While messages are being moved any new messages for the channel are queued to the old transmission queue to preserve sequencing. This process might take a while to complete if there are a large number of messages for the channel on its old transmission queue, or new messages are rapidly arriving.

When no committed or uncommitted messages remain queued for the channel on its old transmission queue then the switch is completed. New messages are now put directly to the new transmission queue.

Further changes to the transmission queue configuration for a cluster-sender channel do not take effect while the channel is switching, even if the channel is restarted. The existing switch must complete first to avoid messages being dispersed over more than two queues. This is important to remember should you wish to back out the change that resulted in a switch occurring.

Administrators might not wish cluster-sender channels to switch transmission queue when they next start, because this might be at a time when application workload is high. When workload is high there is an inherent race between messages arriving and the queue manager moving them from the old to the new transmission queue in order to complete the switch operation. Although the queue manager will eventually win out, CPU consumption (and potentially I/O) will increase during this time. Administrators might also want to avoid a lot of channels switching simultaneously, so that the queue manager does not have to spawn many processes to accomplish this.

To help avoid this eventuality, MQ provides a command to switch the transmission queue of one or more channels that are not running. On distributed platforms the command is called runswchl. On z/OS the CSQUTIL utility can be used to process a SWITCH CHANNEL command instead. Using these commands administrators can explicitly switch one or more channels, either manually or using a script or job. The command processes each channel in turn, rather than all in parallel, and waits for each switch to complete before starting the next. This is particularly useful because it avoids administrators having to monitor the status of background switching operations.

It is also a good idea to explicitly set the status of the channels that are to be switched to STOPPED beforehand, to avoid them being started while the command is running. If a channel is running it will be skipped by the command. Each channel may be started once the switch of its transmission queue has been initiated, even if the moving-messages phase has not yet completed; this helps avoid an extended outage for the channel. Messages will be sent by the channel as soon as they have been moved by the queue manager to its new transmission queue.
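For example, to switch a single channel while it is stopped, an administrator on a distributed platform might run something like the following (queue manager and channel names are illustrative):

STOP CHANNEL(CLUS1.QM2)

runswchl -m QM1 -c CLUS1.QM2

The first command is MQSC; the second is run from the command line and returns once the switch for that channel has completed.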

Monitoring the status of switch operations

It is important for administrators to be able to understand the state of systems they manage. To understand the status of switch operations administrators can perform the following actions:

Monitor the queue manager error log (AMQERR01.LOG) where messages are output to indicate the following stages during the operation:

The switch operation has started

The moving of messages has started

Periodic updates on how many messages are left to move (if the switch operation does not complete quickly)

The moving of messages has completed

The switch operation has completed

On z/OS, these messages are output to the queue manager job log, not the channel initiator job log, although a single message is output by a channel to the channel initiator job log if it initiates a switch when starting.

The DISPLAY CLUSQMGR command can be used to query the transmission queue that each cluster-sender channel is currently using

The runswchl command (or CSQUTIL on z/OS) can be run in query mode to ascertain the switching status of one or more channels. The output of this command identifies the following for each channel:

Whether the channel has a switch operation pending

Which transmission queue the channel is switching from and to

How many messages remain on the old transmission queue

This is a really useful command because in one invocation an administrator can determine the status of every channel, the impact a configuration change has had and whether all switch operations have completed.
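For example, the following invocation (names illustrative; note the quoting to stop the shell expanding the asterisk) reports the switching status of every cluster-sender channel without changing anything:

runswchl -m QM1 -c '*' -q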

Potential issues

Here is a list of some issues that might be encountered when switching transmission queue, their cause and most likely resolution.

Insufficient access to transmission queues on z/OS

Symptom: A cluster-sender channel on z/OS might report it is not authorized to open its transmission queue.

Cause: The channel is switching, or has switched, transmission queue and the channel initiator has not been granted authority to access the new queue.

Resolution: Grant the channel initiator the same access to the channel’s transmission queue that is documented for the transmission queue SYSTEM.CLUSTER.TRANSMIT.QUEUE. When using DEFCLXQ a generic profile for SYSTEM.CLUSTER.TRANSMIT.** avoids this problem occurring whenever a new queue manager joins the cluster.
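As a sketch, with RACF and a subsystem ID of QM01 (both the subsystem ID and the channel initiator user ID here are illustrative), such a generic profile might be set up along these lines:

RDEFINE MQQUEUE QM01.SYSTEM.CLUSTER.TRANSMIT.** UACC(NONE)

PERMIT QM01.SYSTEM.CLUSTER.TRANSMIT.** CLASS(MQQUEUE) ID(CHINUSER) ACCESS(UPDATE)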

Moving of messages fails

Symptom: Messages stop being sent by a channel and they remain queued on the channel’s old transmission queue

Cause: The queue manager has stopped moving messages from the old transmission queue to the new transmission queue because an unrecoverable error occurred. For example, the new transmission queue might have become full or its backing storage exhausted.

Resolution: Review the error messages written to the queue manager’s error log (job log on z/OS) to determine the problem and resolve its root cause. Once resolved, restart the channel to resume the switching process, or stop the channel then use runswchl instead (CSQUTIL on z/OS).

A switch does not complete

Symptom: The queue manager repeatedly issues messages that indicate it is moving messages. The switch never completes because there are always messages remaining on the old transmission queue.

Cause 1: Messages for the channel are being put to the old transmission queue faster than the queue manager can move them to the new transmission queue. This is likely to be a transient issue during peak workload, because if it were commonplace then it is unlikely the channel would be able to transmit the messages over the network fast enough.

Cause 2: There are uncommitted messages for the channel on the old transmission queue.

Resolution: Resolve the units of work for any uncommitted messages, and/or reduce/suspend the application workload, to allow the moving message phase to complete.

Accidental deletion of a transmission queue

Symptom 1: Channels unexpectedly switch due to the removal of a matching CLCHNAME value.

Symptom 2: A put to a cluster queue fails with MQRC_UNKNOWN_XMIT_Q.

Symptom 3: A channel abnormally ends because its transmission queue does not exist.

Symptom 4: The queue manager is unable to move messages to complete a switch operation because it cannot open either the old or the new transmission queue.

Cause: The transmission queue currently used by a channel, or its previous transmission queue if a switch has not completed, has been deleted.

Resolution: Redefine the transmission queue. If it is the old transmission queue that has been deleted then an administrator may alternatively complete the switch operation using runswchl with the -n parameter (or CSQUTIL with MOVEMSGS(NO) on z/OS). Use the -n parameter with caution because, if it is used inappropriately, messages for the channel can be orphaned on the old transmission queue. In this scenario it is safe because, as the queue does not exist, there cannot be any messages to orphan.
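For example, if channel CLUS1.QM2 (an illustrative name) is switching from a transmission queue that has been deleted, the following completes the switch without attempting to move messages:

runswchl -m QM1 -c CLUS1.QM2 -n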

How do I connect a new active/active MQ environment to an existing single HA queue manager?

How can I create an active/active MQ connection across company boundaries without using a cluster?

How could I create a split data-center active/active MQ topology?

So I've tried to put together an overview in the slideshare below, covering the most common scenarios I've found myself scribbling on a whiteboard recently.
See what you think. If anyone is interested in a worked example of a particular scenario, please let me know.

Recent discussions I’ve had have made me realise that there are many MQ clusters out there that have cluster sender channels manually defined where they shouldn’t be, and possibly some missing where they really should be. This could be down to remnants of previous configurations or simple misunderstandings of what’s needed. So I thought I’d try to clarify things a little. There are basically two simple rules, so this should be quick…

A full repository queue manager must have a manually defined cluster sender channel pointing to every other queue manager that is also a full repository for each of the clusters that this queue manager is a full repository for.

For each cluster that a queue manager is only a partial repository for there should only be one manually defined cluster sender channel pointing at one of the queue managers that hosts a full repository for that cluster.

A little more info on these…

1. Full repositories

1. A full repository queue manager must have a manually defined cluster sender channel pointing to every other queue manager that is also a full repository for each of the clusters that this queue manager is a full repository for.

This is necessary so that cluster information learnt by one full repository is replicated across the other full repositories. Obviously, there will normally only be one other full repository (see here), but occasionally you may have a third or fourth during a full repository migration phase, in which case you must have manually defined cluster sender channels between each of them.
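As a sketch, for two full repositories FR1 and FR2 (illustrative names) in cluster CLUS1, FR1 would have a definition along these lines, with the mirror-image channel (CLUS1.FR1) defined on FR2:

DEFINE CHANNEL(CLUS1.FR2) CHLTYPE(CLUSSDR) TRPTYPE(TCP) CONNAME('fr2.example.com(1414)') CLUSTER(CLUS1)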

2. Partial repositories

For each cluster that a queue manager is only a partial repository for there should only be one manually defined cluster sender channel pointing at one of the queue managers that hosts a full repository for that cluster.

Many people manually define a cluster sender channel to more than one full repository queue manager, either because they believe this is necessary or because they think it will dictate which full repositories are to be used. Neither is really true…

Once a partial repository queue manager has communicated with a full repository (over the first manual cluster sender channel it detects), it will be told by that full repository of a second full repository to use for HA purposes; this is then remembered over queue manager restarts. So it doesn’t need an explicit channel definition to a second full repository; the first full repository sorts that all out automatically.
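So a partial repository needs just its own cluster receiver channel plus one manually defined cluster sender channel, for example (illustrative names again):

DEFINE CHANNEL(CLUS1.PR1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) CONNAME('pr1.example.com(1414)') CLUSTER(CLUS1)

DEFINE CHANNEL(CLUS1.FR1) CHLTYPE(CLUSSDR) TRPTYPE(TCP) CONNAME('fr1.example.com(1414)') CLUSTER(CLUS1)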

And despite appearances at first glance, if you do have more than two full repositories and you do manually define that second channel, there is no guarantee that the two full repositories this queue manager uses will be the two pointed to by the two manual cluster sender definitions. There are edge conditions where this is not the case, as this was never the intended use. So to save future confusion, just keep it simple and have the single defined cluster sender channel.

In both of the above cases, if multiple clusters are involved with the same full repository queue managers, it doesn’t matter whether there is a single cluster sender channel definition using a cluster namelist to list all the clusters or individual cluster sender channel definitions for each cluster; the same rules apply.

It’s also worth pointing out that it’s possible for a queue manager to be a full repository for some clusters and a partial repository for others, that’s why the above rules are described in terms of clusters rather than queue managers as a whole.

So, is your setup correct?

If you’re responsible for maintaining the queue managers in a cluster, especially one that’s grown over time, it can be hard to work out from channel configuration alone what is right and what is wrong, especially if there are multiple overlapping clusters to consider.

When MQ detects an invalid configuration it will report it in the queue manager error logs, where you’ll see this message (CSQX427E on z/OS):

AMQ9427: CLUSSDR channel does not point to a repository queue manager.

But it’s easy to miss those, so how do you check an already configured system? Obviously you can check all the channel and repository configuration on each queue manager and work it out. Another way to check for a correct configuration, which doesn’t require you to cross-reference multiple queue managers’ definitions, is to inspect each queue manager’s cluster queue manager entries through the MQSC command DISPLAY CLUSQMGR (this also applies when using MQExplorer, but I’ll concentrate on MQSC here).

Each CLUSQMGR entry represents how the local queue manager sees the other queue managers in each of the clusters it is a member of. You get a separate entry per cluster irrespective of whether namelists have been used on the cluster channels or not. The entry contains two particularly useful attributes for this discussion, QMTYPE and DEFTYPE.

QMTYPE simply shows if the other queue manager is a full repository (‘REPOS’) for the cluster or a partial repository (‘NORMAL’).

DEFTYPE shows you how the relationship between the queue managers has been established, based on what cluster channels have been defined.

DEFTYPE has a number of rather cryptic values, CLUSSDR, CLUSSDRA, CLUSSDRB and CLUSRCVR. We’re working on improving the MQ documentation for these, but in the meantime, I’ll summarise them here, so a short diversion…

DEFTYPE values

CLUSRCVR: This is the entry for the local queue manager in each cluster for which it has a cluster receiver channel defined.

Any CLUSSDR* value means this entry represents a remote queue manager in a cluster. The different values however help you understand how the local queue manager came to know about it:

CLUSSDRA: This is a remote cluster queue manager that the local queue manager has no manually defined cluster sender channel for, but has been told about by someone else: either by the remote queue manager itself (typically because this queue manager is a full repository) or by a full repository that knows this queue manager needs to communicate with it for some reason.

CLUSSDRB: This means the local queue manager has a manually defined cluster sender channel which has been used to establish contact with the target queue manager and that queue manager has accepted it from the point of view of the cluster. The target could be a full or a partial repository, although as I’ve already said you really only want it to be a full repository at the other end.

CLUSSDR: This means the local queue manager has manually defined a cluster sender channel to the remote queue manager but the initial cluster handshake between them has not yet completed. This may be because the channel has never started, perhaps because the target is not running or the configuration is incorrect. It could also mean the channel has successfully started but the target queue manager did not like the cluster information provided, for example a cluster name was set in the cluster sender channel definition that does not match the target’s cluster membership. Once the handshake has been performed the DEFTYPE should change to CLUSSDRB, so in a healthy system CLUSSDR should only be a transitory state.

Anyway, where were we? Oh yes, working out if our cluster channel configuration is correct or not. The simplest way I’ve found is to check that any CLUSSDRB queue managers are pointing at a full repository and that if the queue manager is only a partial repository, it only has the one CLUSSDRB per cluster. So issuing the following on a queue manager allows you to see that…

DISPLAY CLUSQMGR(*) WHERE(DEFTYPE EQ CLUSSDRB) QMTYPE

Then check that all the QMTYPEs in the results are set to REPOS. Any that say NORMAL indicate a manual cluster sender pointing to a partial repository which is incorrect and needs some investigation.

Next, check which clusters the local queue manager is a full repository for. You can use the CLUSQMGR entries again to do that, by looking at the QMTYPE for each entry with a DEFTYPE of CLUSRCVR…

DISPLAY CLUSQMGR(*) WHERE(DEFTYPE EQ CLUSRCVR) QMTYPE

(Alternatively you can look at the REPOS and REPOSNL attributes of the queue manager)

With this full repository information, cross-check the CLUSQMGR results from above to make sure that for every cluster that this queue manager is not a full repository for, there is only a single CLUSQMGR returned for that cluster with a DEFTYPE of CLUSSDRB.

If this queue manager is a full repository for any clusters you now need to issue the following command…

DISPLAY CLUSQMGR(*) WHERE(QMTYPE EQ REPOS) DEFTYPE

Now, for each entry that matches a cluster that this queue manager is a full repository for, you need to make sure the DEFTYPE is set to CLUSSDRB and not CLUSSDRA (ignoring the local CLUSRCVR entries). If there are any CLUSSDRAs you need to manually define a cluster sender channel to that queue manager.

Once you’ve done this you should have the right cluster sender channel configuration for that queue manager.

But before I go, if you issue the following command and get any results back it means there are cluster sender channels defined that have not been correctly established so you should look into why that’s the case (check the configuration, the availability of the other end and for any error messages written by either end mentioning that channel/queue manager)…

DISPLAY CLUSQMGR(*) WHERE(DEFTYPE EQ CLUSSDR)

Logically thinking…

For those of you who like to think in pseudo code, or want to automate the checking, try this…

First build a list of all clusters that the local queue manager is a full repository (FR) for.

<e.g. DISPLAY CLUSQMGR(*) WHERE(DEFTYPE EQ CLUSRCVR)>

For each CLUSQMGR with a DEFTYPE of CLUSSDR*:
<e.g. DISPLAY CLUSQMGR(*) WHERE(DEFTYPE NE CLUSRCVR)>

    If DEFTYPE=CLUSSDRB:
        If QMTYPE=NORMAL:
            Error: You shouldn’t have a manual cluster sender to a partial repository.
        If the local queue manager is a partial repository for this cluster:
            If we have already seen a CLUSSDRB for this cluster:
                Error: You should only have one manual cluster sender defined.

    If DEFTYPE=CLUSSDRA:
        If QMTYPE=REPOS and the local queue manager is a FR for this cluster:
            Error: All FR to FR channels should be manually defined.

    If DEFTYPE=CLUSSDR:
        Error: The cluster handshake has not completed.

And finally...

If you do want to investigate this with MQExplorer you need to look at the “Cluster-sender Channels” tab for each queue manager listed under the “Queue Manager Clusters” folders. Then check the “Queue manager type” (i.e. QMTYPE) and “Definition type” (i.e. DEFTYPE) for each of them.

That was longer than I was expecting, so thanks for reading to the end, I hope it helps to clarify things.

If you are familiar with IBM MQ Clustering, you will already know that keeping your Full Repositories running smoothly is key to maintaining a happy and healthy cluster. You will therefore have come across best practice advice to dedicate specific hosts to running these queue managers, and then leave them to get on with the business of running the cluster without also running application workloads.

However, sooner or later you may need to relocate your cluster repositories, perhaps because

You are migrating to a new release of the product

Your choice of environment (OS, hardware) has changed

Workload has grown and you want to separate out systems which were previously co-located

…etc. or maybe a combination of the above.

The usual recommendation when ‘replacing’ a queue manager in a cluster is to avoid relying on the new system being an exact replica of the old. Because clusters are asynchronous, and to some extent keep a historical record of members and the objects they have advertised, it is usually better to add new systems which are completely unique, even if they are to serve the same purpose as an older system being decommissioned. This, then, is also the approach taken here.

Basic Steps

Create a new queue manager which is to start hosting a full repository for the cluster, with a unique name.

Introduce this queue manager into the cluster as normal – defining a sender channel on this queue manager to an existing FR, and a receiver channel by which other queue managers will contact this queue manager.

Now promote this queue manager to hold a full repository for the cluster – remember at this point to also define cluster sender channels to all other full repositories, and from them to this queue manager. It is critical for all full repositories to be fully interconnected, to ensure a full picture of the cluster state reaches this new repository.

Note 1: Assuming you normally run with the recommended 2 FRs, you now have 3 Full Repositories for this cluster for a short period. This is perfectly acceptable as long as all remain fully interconnected.
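As a sketch, if the new queue manager is FR3 joining cluster CLUS1, which currently has full repositories FR1 and FR2 (all names illustrative), steps 2 and 3 on FR3 might translate into MQSC along these lines, with matching CLUSSDR definitions to FR3 added on FR1 and FR2:

DEFINE CHANNEL(CLUS1.FR3) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) CONNAME('fr3.example.com(1414)') CLUSTER(CLUS1)

DEFINE CHANNEL(CLUS1.FR1) CHLTYPE(CLUSSDR) TRPTYPE(TCP) CONNAME('fr1.example.com(1414)') CLUSTER(CLUS1)

DEFINE CHANNEL(CLUS1.FR2) CHLTYPE(CLUSSDR) TRPTYPE(TCP) CONNAME('fr2.example.com(1414)') CLUSTER(CLUS1)

ALTER QMGR REPOS(CLUS1)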

At this point it is good to take a checkpoint – display cluster objects (queues, topics, queue managers) on the new full repository to confirm that all knowledge of the cluster has been transferred to the new cache. If entries which you believe should be present are missing:

Check all your CLUSSDR channels between FRs are correctly defined and able to start

Check there is not a build-up of messages still being processed on the SYSTEM.CLUSTER.COMMAND.QUEUEs (or any transmit queues in the cluster).

Check for any errors in the queue manager error logs.

Now you are ready to decommission an existing full repository. This is as simple as modifying the REPOS or REPOSNL attribute on the queue manager. Migration complete!
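For a queue manager that is a full repository for a single cluster via the REPOS attribute, that is simply (in MQSC):

ALTER QMGR REPOS(' ')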

Note 2: At some point before or after step 5 you will need to visit all partial repositories and other FRs, removing any CLUSSDR definitions which point to this ex-Full Repository and replacing them with a definition pointing to one which is still active. This should be done as soon as is convenient after these changes, but it is not critical for it to be carried out instantly. However, if while in this state you attempt to ‘bootstrap’ the cluster in any way (for example issuing REFRESH CLUSTER) you will experience problems, so it is desirable to complete this process in a timely manner.

If necessary, you can now repeat these steps to migrate the other full repository (or repositories).

Varying the process

In the real world, there are a number of factors which can mean the above process has to be varied slightly:

If you must keep the same name for the new full repository, it may be preferable to temporarily drop down to one FR, and ensure the old queue manager is completely forgotten (using RESET CLUSTER) before adding the new system with the same name to the cluster. Running with one FR for a short maintenance window is not a concern, since even if that fails you will not experience immediate problems as long as no applications attempt to access queues which they have not used before in this period. Again – this is NOT the recommended route, as in practice using the same name for different actual queue managers has been seen to lead to confusion and hard to spot errors, which can occur many days or weeks after the change in configuration.

As a corollary to (1), one of the most common reasons for requiring the same queue manager name is that application queues are also hosted on the full repository. Consider taking this opportunity to separate application and repository tasks onto separate queue managers (even if still hosted on the same system). The new full repository (with a new name) can then be introduced as a separate step from flushing out and replacing the system whose name is being kept.

In general, if you prefer temporarily dropping to one FR rather than running with three - for example, simply to avoid the brief period with one extra queue manager active - a window with one full repository is acceptable (see (1)).

I hope this post has been helpful both in outlining best practice and the reasoning behind it – as ever the very first step is to try out your planned process on a test system and ensure the precisely tailored version you intend to use is fully tested and documented for your own reference. If you found this useful, and in case you missed them, you can always check out my profile page for links to previous blog entries relating to MQ Clustering.

Part of the 'Clustering FAQ' series, looking at how to keep an eye on the 'health' of your cluster tasks. (Note – this information is intended as guidance for use with current WMQ versions 7.1 and 7.5. While many of the details are the same for all versions of WMQ Clusters there will be some differences, so you should use caution if referring to this document when working with earlier product versions).

What is the repository process?

All queue managers participating in a cluster maintain a local cache of information about the cluster, whether they are full repositories with a complete picture of the cluster, or partial repositories which keep just a working subset. See previous blog posts and the Infocenter for more information about the contents of this cache. In this post I’m going to be discussing the repository manager process, which is the component of the queue manager responsible for keeping the cache up to date and sharing information with other queue managers in the cluster.

All queue managers, even those not using clustering, will have a repository manager process – though if they are doing no cluster work, this process should be almost entirely idle and have a very small footprint. On platforms other than z/OS, the repository manager is a standalone process connected to the queue manager, called amqrrmfa. On WMQ for z/OS, the repository manager runs as a task within the CHIN address space. Therefore when the chinit is not running on z/OS, the cluster cache will not be kept up to date.

If you have a significant cluster deployment, alongside other monitoring of your system it is a very good idea to monitor the health of the repository manager – problems here could end up causing application issues ranging from not getting the workload balancing you expect to complete non-delivery of messages. There are a number of ways to look at what the repository manager is doing, and I’ll go through some of the main ones here.

CPU Usage

As a very basic starting point, you may wish to use system tools to watch the CPU usage of the repository task. The repository usually spends most of its time idle, with the following exceptions.

Periodically, the repository manager runs a ‘maintenance’ of the local cache. This includes checking all locally defined objects to see if they should be readvertised to the cluster, and looking for remotely defined entries which are no longer in use and can be garbage collected. Typically this should take a matter of seconds or in very large clusters a few minutes. After running a maintenance cycle, the repository manager schedules another to run 1 hour later. So although this will normally be seen as an hourly ‘heartbeat’ of activity, in large environments where a maintenance run takes a little longer this will not be completely regular.

The repository manager is always listening for updates from other queue managers, or local requests for information about the cluster. These are normally handled very quickly, so CPU usage will barely register in most cases. The exception might be when bootstrapping a new queue manager into a cluster, or carrying out administrative work such as a REFRESH – at these times there may be a large amount of work to do causing a period of increased activity.

The performance of the repository task itself is not normally something that requires much consideration, even in substantial cluster deployments. One exception to this, which can be very visible to end users, is when a repository task is so overloaded that cluster queries on behalf of applications cannot be processed in a timely manner. This will be seen as an MQRC_CLUSTER_RESOLUTION_ERROR in the application. If these are occurring frequently, gathering CPU information and looking at what is happening on the repository queues (see below) may help track down the source of the problem (or ultimately be useful documentation if raising a PMR with IBM service is required).

Memory Usage

On distributed platforms, the cluster cache resides in one or more blocks of shared memory referenced by the repository manager process, the queue manager execution controller, and any other MQ processes which need to query information about the cluster. Typically this will be by far the largest memory usage associated with amqrrmfa, though of course there will also be a certain amount of local storage use, as for any application. Similarly on WMQ for z/OS, the cluster cache storage is mapped into both the CHIN and MSTR address spaces – note that today this is 31-bit addressed ‘below the bar’ storage in both cases, so a large local cache will affect the memory available for other usage such as buffer pools in the queue manager address space.

The ‘System Resources’ section of previous post http://tinyurl.com/bigclusters gives some guidance on how much memory may be consumed for a particular cluster configuration.

The Repository Queues

All of the work done by the repository manager affects a small set of local queues, and by monitoring these queues we can learn a lot about what the repository manager is currently doing. It is a good idea to configure queue statistics on some of these queues if you are concerned/curious about the activities and performance of the repository process. For example, to gather statistics about the repository command queue at 10 minute intervals you could issue:

ALTER QL(SYSTEM.CLUSTER.COMMAND.QUEUE) STATQ(ON)

ALTER QMGR STATINT(600)

You will need to restart the queue manager to release handles on the queues before these commands take effect. The amqsmon sample (see Infocenter) can be used as a simple means to format the statistics messages generated – your preferred monitoring tools may have options to process these in a variety of more advanced ways.
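For example, a minimal invocation to format any statistics messages on the queue manager might be (queue manager name illustrative):

amqsmon -m QM1 -t statistics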

Here are brief descriptions of the important queues to consider:

The SYSTEM.CLUSTER.REPOSITORY.QUEUE – the contents of the local cache are persisted to this queue so that if the queue manager is restarted at any time it does not need to rediscover everything it knew about the cluster. This queue will be updated every time the cache is modified for any reason – the size of the data on the queue is not that interesting, as checkpointing mechanisms mean that this will jump around, but a high frequency of PUTs and GETs to the queue indicates a lot of activity in the cache.

The SYSTEM.CLUSTER.COMMAND.QUEUE – all work except periodic maintenance is queued for the repository process here. This includes local requests for information (the first time an application accesses a cluster resource) and data sent from other repositories in the cluster. Most of the time this queue should be empty – a backlog of messages either indicates a very busy period (perhaps someone else in the cluster has issued a REFRESH, in which case high CPU would be another symptom) or possibly a problem with the local repository manager task.

The SYSTEM.CLUSTER.TRANSMIT.QUEUE – traditionally, all messages destined for other queue managers in the cluster flowed through this queue, and this includes repository maintenance messages exchanged between queue managers. At times when application traffic is low, monitoring this queue can give an indication of the communication happening between repositories. If you have chosen to configure multiple transmission queues (available from version 7.5) you will know which additional queues need monitoring – although this may be more work, it can allow finer levels of detail to be seen.

The SYSTEM.CLUSTER.HISTORY.QUEUE is mentioned here only for completeness. This queue is primarily to assist the IBM service team in diagnosing problems and it is not usually useful to configure monitoring here.

Error Handling and Troubleshooting

The cluster repository process can be interrupted for a number of reasons: local system resource problems (e.g. lack of memory), configuration problems (such as someone accidentally disabling access to one of the SYSTEM.CLUSTER queues), bad messages being sent to the input COMMAND queue maliciously or due to queue manager errors elsewhere, or of course program failures in the repository process itself.

If the repository process has stopped processing for one of these reasons, the cluster cache will not be kept up to date, and eventually applications will stop being able to put messages to cluster destinations – even before then there is the risk that updates will not have been received and the wrong routing information will be in use. This is therefore a serious situation, and it is very important to monitor the error logs/chinit log for your queue manager for repository errors and take the appropriate action.

For most types of error there is a ‘grace period’ of retrying cluster operations before the repository manager and eventually the queue manager will shut down. See the clustering and best practices section of the Infocenter for more details of what to look for and what actions can be taken in these situations.

Note that you should never deliberately interrupt the repository manager by attempting to kill the process unless explicitly recommended by IBM service, nor attempt to restart it ‘manually’ (independently from queue manager restart). The process is tightly coupled to the queue manager (via shared memory etc.) and this can cause serious problems, particularly in more recent versions of the product, as well as leaving the cache in an unknown state and out of sync with the rest of the cluster.

Ever seen the advice for ‘Best Practice’ in a WMQ cluster to only have two Full Repositories and wondered why? This post is for you.

To recap, one of the fundamental components of a cluster is the Full Repository (FR). These are queue managers like any other, but chosen to be a ‘bootstrapping’ point for a cluster by holding a complete image of all the queue managers and objects (channels, queues, topics) shared in that cluster. When a new object is defined, the definition is sent up to the FRs. Then, when an application on a PR (partial repository) uses an object for the first time, the information is fetched from a FR and cached at the point of use. In both of these situations, two full repositories are used to push to or fetch from, so that if one is temporarily unavailable, another should be able to honour the update or request.

For high availability, administrators often feel that more than two FRs might be helpful. So what happens in our cluster if we introduce a 3rd or 4th FR?

When a queue manager advertises an object definition, it picks two of the full repositories to inform as before (which two is based on the normal workload balancing algorithm). These, upon receipt, will forward on to any other Full Repositories and each other. It is assumed that object definitions change relatively infrequently - you probably don’t have applications defining thousands of queues per second and advertising them to the cluster - so this intercommunication is not a big overhead and ensures all FRs are consistent (they will compare sequence numbers of the updates to make sure the latest and greatest definition flows and is stored everywhere). This also means that the partial repository, which is more likely to be running on a relatively low powered host, only needs to start two channels to advertise.

When a queue manager requests a definition, the process is almost reversed. The request is sent to two FRs, which should already know about any definitions because of the above ‘advertising’ flow. There is therefore no need to go and fetch from other Full Repositories. This is helpful as applications coming and going and accessing new/different objects is more likely to be a regular occurrence, and as applications move around we can expire definitions from some local PR caches and add them to others. Inter FR chatter is therefore minimised when this happens.

So far, so good. However, there is another piece to the puzzle which is where we run into issues. When we request information about an object, the FRs which we contacted remember that we asked for that information (this is sometimes referred to as a ‘cluster subscription’ – not to be confused with publish/subscribe application subscriptions.) These subscriptions mean that if the object changes, is deleted, or another instance is defined somewhere (for instance a secondary queue for workload balancing), the partial repository gets notified. These subscriptions only exist on the FRs we originally contacted with our request.

So what happens now if we have three or four FRs and two of them become unavailable? As always, any cached definitions at PRs are still valid and can be used for at least the next 60 days, so from the outset most applications will be fine. However, presumably we added additional FRs to try and provide high availability even for new and changed definitions during this outage.

Because we have more than two FRs, our object definition publications and our request subscriptions will have been balanced across the pool. For some given objects on some PRs, the only subscriptions will be on the two which are now out of action. So although new ‘advertising’ and new requests can be processed perfectly well by the remaining FRs, updates on those particular objects will never be received by the unlucky PRs.

The end result of this for an administrator tends to be that everything ‘seems to be working’ until we hit a problem and complete confusion ensues – although we can see we have two FRs available (and that everyone seems to be talking to them ok), some particular changed, added, or deleted object definition has not flowed everywhere we expected. At this point, if the failed FRs cannot be recovered soon, the best option is probably to clear out the cache on the PR using the REFRESH command, which will force it to remake its subscriptions to the available FRs – but this isn’t an ideal situation to be in.

Hopefully this clarifies why the recommendation is always to keep to two Full Repositories – as long as these are kept appropriately separate from each other, they should provide a sufficient level of reliability and do not usually need to both have ‘9 9s’ availability for smooth functioning of the cluster (because of the caching nature of PRs). In the rare situation where this is genuinely not seen as sufficient, options include using HA Clustering or Multi Instance queue managers to increase the availability of the FR queue managers. I hope it is clear from this post that it is not sufficient to try and make up for unreliable systems hosting the full repositories through sheer ‘weight of numbers’. If you REALLY want to go ahead with three or more FRs the option remains, but it’s important to bear in mind the above information in planning your ‘outage’ response.

This subject is another from the ‘Frequently asked questions’ bag for WMQ Clusters. This one comes in various sub categories:

Can I mix different versions of WMQ in a single cluster?

What are the steps to migrate a cluster queue manager from one version to another?

What about special considerations for Full Repositories?

And I’ll try to tackle these all together here.

The first thing to understand is that WebSphere MQ Clustering doesn’t try to be too smart – anything cluster related that a particular repository process doesn’t understand, it just ignores. That might sound philosophically closed minded – but it’s great news for an administrator trying to deal with an estate of hundreds of queue managers in a cluster. Keeping them all at the same level would be an impossible task, so communication between cluster queue managers works on a lowest common denominator basis. Administration messages include all the information which the sender knows about, and if the recipient expected more, it fills in the blanks with defaults. Any excess information is discarded.

In theory then, any WMQ Queue manager will happily participate in a cluster with any other version from the day clusters were introduced (but of course, if a release is out of support you’d try it at your own risk). However, there are a few more things to think about.

If the Full Repository doesn’t understand some fields of a structure, say a new property of a queue, or a whole new object type (for example Topic objects when they were introduced in Version 7), it won’t be able to store and forward the new properties or objects. For this reason, we always recommend upgrading FRs first if at all possible. Don’t worry if you can’t do this for some reason – the cluster will still work fine, but you need to be careful not to make use of any new cluster function until you do get round to it. If you do try to make use of new features, bear in mind the above principle – WMQ will ignore what it doesn’t understand, so things might look as though they’re working… until you realise that that new workload balancing parameter isn’t having the effect you wanted!

One specific situation in which you might deliberately continue with FRs at a lower release than PRs is when FRs are hosted on z/OS. At time of writing the highest level released on this platform is 7.1, while WMQ for Distributed is at version 7.5. In this particular case, there is no concern with allowing the FRs to remain at 7.1, as there are no changes to the structures exchanged between repositories in these two releases. This is because the majority of the changes in 7.5 simply relate to the repackaging of the FTE (now Managed File Transfer) and AMS components. Even the one cluster related change, allowing the use of multiple cluster transmit queues in 7.5, can safely be exploited in this configuration as all changes are local to the individual queue manager, and not flowed through the Full Repositories.

When it comes to migrating an individual queue manager in a cluster, whether it’s a Full Repository or not, the process is much the same as for any other migration: the only additional consideration is that you might want to ‘SUSPEND’ the queue manager from the cluster before taking it down, which will give other Queue Managers a hint to avoid sending work to cluster queues hosted here. However, bear in mind that any message sent to this particular queue manager by name, or sent BIND_ON_OPEN will have to wait on the transmission queues until it becomes available again. Don’t forget to take a backup as a rollback strategy (and if you do have to use it, remember that you need to REFRESH the queue manager in the cluster after restoring). On z/OS make sure that you have any backwards migration PTFs in place.
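For example, before stopping the queue manager (cluster name illustrative):

SUSPEND QMGR CLUSTER(CLUS1)

and once it is back up at the new version:

RESUME QMGR CLUSTER(CLUS1)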

After starting the queue manager using the new product version, remember to un-SUSPEND the queue manager, and no further action is required. No REFRESH CLUSTER for example is needed as part of a clean upgrade.

For your Full Repositories, always take them down and bring them back up one at a time (so that there is always one available for business as usual). Don’t worry too much about the fact that that means you will only have one for a period – this is why you had two in the first place (and the cluster can continue processing for weeks without any FRs present in the worst case – only first time MQOPENs and new or altered object definitions will be affected.)

One of the most common questions asked regarding WebSphere MQ Clusters is "How big is a big cluster?”. I thought it would be useful to put together a post side-stepping this question once and for all (at great length - but for those who make it to the end, there may be a sort of answer as a reward).

There are all sorts of factors which will affect how a cluster scales, and before we can begin to tackle what makes a cluster 'big' we need to think about each in turn.

Full Repositories

Much has been said elsewhere about choosing Full Repositories (FRs) for a cluster which we won't repeat here, but some basic best practices bear repeating which will particularly affect cluster scalability:

There should be exactly two FRs for every cluster (probably a topic for a separate post another day)

They need to be able to connect to every queue manager in the cluster, ideally simultaneously, so must at least be able to support that many channel pairs (see below, ‘Channel Considerations’).

Avoid using the same FRs for multiple clusters

For large or busy clusters, avoid hosting application queues on the FRs.

System resources

The same considerations apply for all queue managers to a degree, but by far the highest overhead from clustering falls on the Full Repositories (which will need to persist a complete record of every object in the cluster).

In either Full or Partial Repository caches, there are implications for memory requirements, CPU and disk usage on these queue managers. To give a very broad idea, memory/disk usage will be on the order of at least 1KB for every clustered object (queues, queue managers, and topics) discovered.

There will then also be an overhead of around 0.5kB for every different 'subscription' for those objects (created when a given queue manager notifies the cluster that it is making use of a particular cluster resource). As you can see, the subscription information means that where resources are actually being accessed in the cluster - where applications connect - makes a big difference to the load on the Full Repositories in coordinating this information. Grouping applications accessing the same queues together will mean less of this to maintain, and avoiding overlapping clusters will have a dramatic positive effect, see the Topology Complexity section later.

I do not want to recommend trying to be precise with these measurements, but if a quick estimate based on the above was in the region of 1GB for a given environment it might be considered 'very large'. There is an upper limit of 2GB for the cluster cache, but the practical limit is typically lower depending on platform, version of WMQ and the like. CPU usage, particularly at hourly cluster maintenance intervals, will also increase with larger numbers of objects to maintain.

Channel considerations

How many active channels a queue manager can support will vary by platform/version/hardware etc. - some information on the overheads per channel instance can be found in the published performance report support packs.

As mentioned above, there are situations in which Full Repositories will need to contact every queue manager in the cluster to share information. Examples are when object definitions change, or when processing a REFRESH (for example, if a queue manager has to be restored from backup). If not all channels can start simultaneously this is not necessarily a disaster - the information can be forwarded when some other channels have stopped, freeing resources. However, for smooth functioning of the cluster it is better to be sure that all cluster state is being shared in a timely manner.

For other queue managers, channel requirements will vary dramatically depending on how your applications and queues are distributed and configured - (see also the next section). Publish Subscribe clusters have particularly high inter queue manager communication requirements - it is best to assume that every queue manager will need to talk to every other queue manager in a 'Pub/Sub Cluster', so typically these will not be able to scale to the same degree.

Workload and application design

Clusters are very deliberately designed so that most information is only shared on a 'need to know' basis with partial repositories. This means that clusters can scale quite well to large numbers (thousands) of queue managers where actual communication between individual queue managers is quite 'sparse'. A good (and typical) example of this is a 'star' topology where a few heavy duty servers in a datacenter host the Full Repositories and certain back-end application queue managers, and a much larger number of smaller servers - often 'in-store' or 'branch' installations - connect back to this hub.

In this situation, most of the queue managers do not need to cache large amounts of cluster data, and need only run a few pairs of channels. To help further regulate this kind of configuration there are some specific workload balancing parameters - in particular CLWLMRUC - which can control how many servers a particular 'satellite' queue manager should use to service application requests.
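For example, to restrict a satellite queue manager to its two most recently used outbound cluster channels, a sketch in MQSC would be (choose a value appropriate to your own topology):

ALTER QMGR CLWLMRUC(2)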

Topology complexity

Placing queue managers in multiple clusters puts significant extra strain on each in the overlapping zone - the 'subscription' overhead discussed in 'System Resources' above is multiplied by the number of overlapping clusters involved. In particular, it is best to avoid one Full Repository hosting multiple large clusters for this reason, or defining large numbers of objects in 'gateway' queue managers which participate in multiple clusters.

So, if we keep all the above in mind, can we come up with a very rough definition of a large cluster? The answer is no - there isn't one definition of 'large'. However, we can make some statements about what might be considered large for a particular 'type' of cluster. Please bear in mind that these are only very approximate guidelines and may vary massively in light of any of the factors above, hardware environment etc. Some deployments will certainly have larger numbers today configured in such a way that they are not seeing any issues.

With those provisos, a 'big WMQ Cluster' might range from:

A publish subscribe cluster where applications can come and go as they please, dynamically creating topics which are used for many-to-many communications: 5 queue managers

50 overlapping clusters (maybe to provide dedicated application channels) involving the same small set of queue managers and a few hundred queues in total: 10 queue managers

A tightly managed publish/subscribe cluster with a few administratively controlled topics and carefully sized systems which can support all required channel activity: 100 queue managers

A point to point (queued only), many-to-many mesh in a single cluster: 500 queue managers

A point to point, carefully managed star topology, with high powered FRs and concentrated application servers: 3000 queue managers

As a final note, however much planning and preparation is done in advance, maintaining a large cluster deployment and continuing to scale outwards is also going to involve continuous monitoring of the system over time to identify bottlenecks and potential problems before they occur. While there are many tools and pieces of documentation out there to help with this, monitoring the cluster repository processes in particular may need to be the subject of a further post: watch this space…