Sybase Replication Server: Out of Mutexes? wtf?

Those of us who use Sybase’s Replication Server were long ago pacified into believing that there really isn’t much to be done about Replication Server’s quirks except endure them.

In today’s edition of Sybase RepServer Quirks we take a look at the “out of mutexes” error. A mutex is an exclusive lock on a resource. If you receive an error about RepServer running out of mutexes, you just don’t have enough configured in RepServer. Simple, huh?

Of course it is. Ahh.. but then how do you know how many mutexes your particular Sybase RepServer needs? How many grains of sand are on the moon? According to Sybase you should know this already, but since you can’t read minds (I hope you can’t, because I’m thinking of donuts and not RepServer), I’ll pass on what Sybase says when pressed about it (Greg Carter @ Sybase):

> Just to elaborate a bit more on this question of the number of mutexes; as
> you probably know the mutex requirements for RepServer increased
> dramatically with the SMP feature. I have since struggled to come up with a
> formula for estimating mutex requirements so that you may properly set the
> “num_mutexes” configuration. While in the latest iteration of this formula
> I have satisfied myself that all mutexes have been accounted for, still the
> estimate it provides seems to fall short in some cases.
>
> Recent investigations by Connectivity seem to indicate that the problem may
> not be with sizing mutex requirements, but rather with sizing message queue
> (“num_msgqueues”) requirements. It appears that Open Server may be using
> the total of the settings for “num_mutexes” and “num_msgqueues” as the
> upper bound for the creation of these two objects together. So it may be
> that even though “num_mutexes” has been sized properly, if “num_msgqueues”
> is too low then you may see a message regarding the failure to create a
> mutex or the failure to create a message queue depending on which one was
> being created at the time that upper bound was surpassed.
>
> The moral here is that until Open Server resolves this issue you need to
> verify the sizing of both “num_mutexes” and “num_msgqueues” in the event
> that either error message appears since you can not rely on the message to
> indicate which one is low.
>
> For your convenience I’ll include here the latest formulas for estimating
> mutex and message queue requirements. Note that the one for mutexes may not
> agree completely with the one that is given in the 12.6 SMP White Paper – I
> have not compared them.
>
> Mutex requirements for the optimized binary:
> num_mutexes = 75 + Num(partitions) + 4*Num(DSI/S) + 3*Num(DSI/E) +
> 2*Num(Dist) + 2*Num(RepAgent Exec) + 2*Num(RSI User) + 5*Num(Queues) +
> 5*Num(SQT Cache) + Num(rs_subscriptions rows) + Num(RSSD tables) +
> Setting(cm_max_connections) +
> 2*MAX(Admin connections) + 2*Num(Other Connections) + Num(Origins) +
> 2*Num(Threads) + MAX(subscription (de)mat)
>
> Where
> – “Other Connections” are connection to this RepServer including ID Server
> connections, RSM connections, etc.
> – “Origins” are the different origins (or primary databases) that could
> possibly have transactions flowing through this RepServer, whether by a
> RepAgent or by a route (intermediate included)
> – “Threads” includes every thread RepServer may start. These are the
> “Global” thread, the “Initialization” thread, threads for each of the
> daemons (dAIO, dCM, dVersion, dRec, dSub, dStats, dAlarm), RepAgent User
> threads, SQM Writer threads, SQT threads, Distributor threads, DSI/S and
> DSI/E threads, RSI User and RSI threads.
> – “subscription (de)mat” is the number of asynchronous subscription
> management requests for materializing or dematerializing that may be taking
> place at any moment.
>
> Note: For the diagnostic binary you will need to double the figure
> determined with the above formula.
>
> Message queue requirements for the optimized and diagnostic binary:
> num_msgqueues = 10 + Num(DSI/S) + Num(DSI/E) + Num(Queues) + Num(Dist)
>
> Thanks,
> G.Carter

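To make the two formulas above easier to apply, here is a minimal Python sketch that simply adds up the terms as written in the email. The class, field, and function names are my own invention for illustration (they are not RepServer settings or commands), and the sample counts are made up; plug in the numbers from your own RepServer. Since the error message can’t be trusted to say which setting is low, the last helper checks both.

```python
from dataclasses import dataclass


@dataclass
class RepServerCounts:
    """Inputs to the formulas; all field names are hypothetical labels."""
    partitions: int
    dsi_s: int                   # DSI scheduler threads
    dsi_e: int                   # DSI executor threads
    dist: int                    # Distributors
    repagent_exec: int
    rsi_user: int
    queues: int
    sqt_caches: int
    subscription_rows: int       # rows in rs_subscriptions
    rssd_tables: int
    cm_max_connections: int      # value of the cm_max_connections setting
    max_admin_connections: int
    other_connections: int       # ID Server, RSM, etc.
    origins: int
    threads: int                 # every thread RepServer may start
    max_subscription_demat: int  # concurrent (de)materialization requests


def estimate_num_mutexes(c: RepServerCounts, diagnostic_binary: bool = False) -> int:
    """Sum the terms of the mutex formula; double it for the diagnostic binary."""
    total = (
        75
        + c.partitions
        + 4 * c.dsi_s
        + 3 * c.dsi_e
        + 2 * c.dist
        + 2 * c.repagent_exec
        + 2 * c.rsi_user
        + 5 * c.queues
        + 5 * c.sqt_caches
        + c.subscription_rows
        + c.rssd_tables
        + c.cm_max_connections
        + 2 * c.max_admin_connections
        + 2 * c.other_connections
        + c.origins
        + 2 * c.threads
        + c.max_subscription_demat
    )
    return 2 * total if diagnostic_binary else total


def estimate_num_msgqueues(c: RepServerCounts) -> int:
    """Same formula for the optimized and diagnostic binaries."""
    return 10 + c.dsi_s + c.dsi_e + c.queues + c.dist


def undersized_settings(c: RepServerCounts, num_mutexes: int, num_msgqueues: int,
                        diagnostic_binary: bool = False) -> list:
    """Return the names of the settings that fall below their estimates."""
    low = []
    if num_mutexes < estimate_num_mutexes(c, diagnostic_binary):
        low.append("num_mutexes")
    if num_msgqueues < estimate_num_msgqueues(c):
        low.append("num_msgqueues")
    return low


# Made-up sample counts, purely for illustration.
sample = RepServerCounts(
    partitions=2, dsi_s=10, dsi_e=10, dist=5, repagent_exec=5, rsi_user=1,
    queues=20, sqt_caches=15, subscription_rows=100, rssd_tables=80,
    cm_max_connections=64, max_admin_connections=5, other_connections=3,
    origins=5, threads=60, max_subscription_demat=2,
)
print("num_mutexes estimate:", estimate_num_mutexes(sample))
print("num_msgqueues estimate:", estimate_num_msgqueues(sample))
print("settings that look low:", undersized_settings(sample, 1024, 32))
```
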
Neither this explanation nor the two equations are anywhere in the manuals. I’ve opened feature request 485482 for RepServer to handle this automagically, as there really is NO NEED for a RepServer admin to have to worry about this. If you are running into this problem, or ever have, give Sybase a holler and tell them to fix this bug.

For some Linux 12.6 replication servers, there’s a hard-coded mutex limit (see CR 574371) that we hit with about 160 DSI connections open. You can set num_mutexes higher, and RepServer will accept and display the higher setting, then ignore it. As far as I can tell, there’s no way to see the actual number of mutexes currently in use. This hard-coded limit is supposed to have been fixed in RepServer 15.2 ESD1.

The opinions expressed within are the sole rantings of a raving lunatic and in no way reflect the rantings, fits, tantrums, errors, corrections, allocutions, or aimless thoughts of Sybase or its employees or of TeamSybase or ISUG. Any resemblance to reasonable thought, or any official or published opinion of Sybase, TeamSybase or ISUG is merely coincidental, and should be totally ignored.