
Support Alerts 2005

November 21, 2005 – Caché Cluster Journal Corruption

InterSystems Worldwide Response Center (WRC) has identified a defect that could result in incorrect journal records in a Caché cluster.

This risk only exists for shared disk clusters on HP Tru64 and HP OpenVMS. This risk exists in all currently released versions of Caché starting with 5.0.0.

The defect introduces out-of-order cluster sequence numbers into a journal file. This occurs when a process running on a non-master node of a cluster has used ^%NOJRN to disable journaling and then issues $Increment().

Data in Caché databases is unaffected until a corrupted journal file is applied to a database. Journals are applied primarily in one of three ways:

Automatically during cluster failover.

During manual journal restore which normally occurs in conjunction with a restore of backups.

Journal files are the mechanism used by Caché shadowing to move data from the primary server to the shadow system. If shadowing is running, journal files are constantly being automatically applied to the shadow.

The correction to this defect is identified as HYY1173 and is targeted for Caché maintenance kit 5.0.21. It can also be requested in an Ad Hoc distribution from the WRC. After applying the correction, a backup should be taken to ensure that your disaster recovery procedures no longer require restoring the affected journals. Any shadows should also be resynchronized with a copy of the cluster databases.

October 18, 2005 – Caché and Exec-Shield feature of Red Hat Linux

InterSystems Worldwide Response Center (WRC) has diagnosed several reported Caché problems as conflicts with a new Red Hat Linux feature introduced in Red Hat Enterprise Linux Update 3. The feature is Exec-shield, and a full description can be found in the release notes available at the Red Hat website.

This alert applies to all released versions of Caché and all versions of Red Hat that include the Exec-shield feature.

Reported problems include:

Caché startup aborted with the error message “Error 22 attaching shared memory”. This error has been reported with Caché executables linked to custom code using clink.

“Shared memory segment does not exist” errors when logging into Caché.

Note that the Exec-shield feature is enabled by default so that installing the Red Hat update is enough to introduce the risk for Caché.

The Worldwide Response Center recommends disabling the Exec-shield feature of Red Hat for installations where Caché is installed. Instructions for disabling the feature are included with the Red Hat release notes referenced above.
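For illustration, disabling the feature typically amounts to a kernel tunable change such as the one below. The tunable name is our assumption based on RHEL documentation of that era; verify it against the Red Hat release notes referenced above before applying.

```
# /etc/sysctl.conf — assumed tunable name; confirm with the Red Hat
# release notes before applying
kernel.exec-shield = 0
```

The setting can then be applied with ‘sysctl -p’ or by rebooting.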

September 30, 2005 – Caché Dead Job Cleanup – Update

The modification of the default setting to disable dead job cleanup is introduced in Caché 5.0.20 rather than in Caché 5.0.16 as stated in the earlier alert (this earlier alert has been modified to reflect this new information).

InterSystems continues to recommend that dead job cleanup remain disabled in all Caché 5.0.x production systems.

September 20, 2005 – Lock Deletion Problem with ECP Clients

InterSystems has corrected a defect that can cause application locks to be deleted incorrectly.

This problem exists only in ECP configurations using versions of Caché prior to 5.0.20 (any platform).

When it is necessary to delete locks for a process running on an ECP client, the correct procedure is to use Control Panel or ^LOCKTAB to delete the locks on the ECP CLIENT. The defect is that Control Panel allows deletion of these locks on the ECP SERVER. This has two negative effects:

1) ECP SERVER-based locks are deleted for all processes running on the ECP CLIENT, not just the target process.
2) Lock inconsistency can occur with other ECP clients.

Until the correction is in place, steps should be taken to ensure that locks for ECP client-based processes are never manually deleted on the ECP server.

Note: The need to manually delete locks in a production environment should be very rare. If this is a common occurrence, please contact the Worldwide Response Center (WRC) to help investigate why it is necessary.

The correction for this defect is identified as TCS015. The correction removes the ability from Control Panel to delete ECP server-based locks for an ECP client-based process. The WRC (Worldwide Response Center) recommends upgrading to at least Caché 5.0.20 to protect against this defect. TCS015 can also be requested in an Ad Hoc distribution of Caché.

September 14, 2005 – SQL: Implicit Join / TLEVEL

InterSystems has identified and fixed two defects within Caché SQL.
Both problems were introduced with Caché version 5.0.18.

Caché Versions:
5.0.18
5.0.19

Defects:
#1 Implicit Join is broken

A problem has been corrected where an UPDATE statement with an implicit join in the WHERE clause using arrow syntax could result in an <UNDEFINED> error running the query or no rows being updated when there should have been.

#2 TLEVEL not being adjusted properly

A problem has been fixed where multiple SQL INSERT/UPDATE statements within a single method or procedure, enclosed within a transaction in AUTO_COMMIT mode, may cause the filer to commit a transaction that was started by the application code rather than by the filer.

Sample:

TSTART
…
&sql(insert…)
…
TCOMMIT

TLEVEL is not correct!

Corrections:
The correction for defect #1 is identified as DPV2502, for #2 as DPV2506.
Both corrections are included in released versions starting with 5.0.20.
They are also available from the WRC in Ad Hoc distributions.

August 18, 2005 – Data Integrity on UNIX/Linux Platforms

InterSystems has identified a defect that could result in data integrity problems on UNIX/Linux platforms.

This problem exists on all versions of Caché prior to 5.0.16. Caché is at risk on all UNIX/Linux platforms.

The problem is triggered when the operating system returns an error code to Caché indicating that an fsync() system call has failed. Caché may improperly handle the error code with the result being a corrupted database, WIJ or Journal file. The most likely scenario for this to occur is when a storage device becomes unavailable while the OS continues to run.

The correction for this defect is identified as JO1907 and is included in released versions starting with Caché 5.0.16. It is also available from the WRC in Ad Hoc distributions. With JO1907, Caché will detect and properly handle an error code indicating a failed fsync().

August 15, 2005 – Data Inconsistency With More Than 31 ECP Connections

InterSystems has corrected a defect that could cause data inconsistency between ECP server and client or between ECP clients. This defect could also cause Transaction Processing to function improperly.

All currently released versions of Caché (5.0.18 is the current release) are at risk for this defect. This problem occurs only on 64-bit platforms.

This defect is triggered only when a Caché instance is serving more than 31 ECP connections. To determine whether your system is at risk, run the following on your ECP server:

>w $system.ECP.MaxServerConnections()

The value returned is the number of ECP connections configured.

This value can be modified in Configuration Manager -> Advanced -> Network -> This System as an ECP Server -> Max # of ECP Clients.

Note that simply configuring a system to support more than 31 connections is not sufficient to trigger this defect. The defect is triggered only when a system is actually serving more than 31 connections. The number of connections being served can be determined with the following call:

>w $system.ECP.NumServerConnections()

InterSystems strongly recommends that any installation which meets the above conditions should reduce the number of ECP connections configured to less than 32 until the correction is installed. The correction is identified as GK438 and can be requested from the WRC in an Ad Hoc distribution of Caché. GK438 is currently targeted for inclusion in Caché 5.0.19 Maintenance Kit.

If you have any questions regarding this alert, please contact the Worldwide Response Center (WRC) at support@intersystems.com.

July 13, 2005 – Caché in Failover Clusters

InterSystems WRC has identified an issue with Caché running on failover clusters that may prevent proper Caché startup during certain kinds of failover scenarios. This issue affects Caché 5.0.13 and above in any failover cluster. Caché releases prior to 5.0.13 and true clusters on OpenVMS and Tru64 UNIX are not affected.

When Caché is not shut down cleanly, the cache.ids file is not deleted. This is expected behavior, but a change in Caché 5.0.13 related to information in this file causes a problem with startup on the failover node.

For OpenVMS and UNIX/Linux, InterSystems recommends that failover scripts be modified to delete the cache.ids file before starting Caché in a failover situation.

For Windows platforms, InterSystems recommends upgrading to Caché 5.0.17 or higher and modifying the failover configuration to delete the cache.ids file before starting Caché in a failover situation. Caché 5.0.17 includes a change identified as LRS948, which is required under Windows to fix a potential timing issue with deleting the cache.ids file on startup.

Examples:

Under UNIX/Linux

If your instance was installed in directory /usr/cachesys, you would add a command such as:

rm -f /usr/cachesys/mgr/cache.ids
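As a fuller sketch, a failover start script would run this cleanup just before invoking the instance start command. The demo below uses a scratch directory so it is self-contained; in a real script the instance directory would be used (e.g. /usr/cachesys as above), and the start command, shown commented, depends on your environment.

```shell
# Demo of the pre-start cleanup step. INSTDIR is a scratch directory
# here; in production it would be the Caché instance directory.
INSTDIR=$(mktemp -d)
mkdir -p "$INSTDIR/mgr"
touch "$INSTDIR/mgr/cache.ids"     # simulate a stale file left by a crash

rm -f "$INSTDIR/mgr/cache.ids"     # the cleanup recommended in this alert

[ ! -e "$INSTDIR/mgr/cache.ids" ] && echo "cache.ids removed"
# ...the script would then start the instance, e.g.:
# ccontrol start <instance>
```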

Under OpenVMS

If your instance was installed in directory DISK$:[CACHESYS], you would add a command such as:

DEL DISK$:[CACHESYS.MGR]CACHE.IDS;*

Under Windows

During Cluster Group definition in the Cluster Administrator, create a Generic Application resource named, for example, DelFile, whose purpose is to delete cache.ids. It should have a dependency on the Physical Disk resource that describes the disk containing the Caché instance (and thus its cache.ids file), and the Generic Service resource that describes the Cache Controller should depend on this new Generic Application resource. The Advanced property for DelFile should have “Do not restart” checked and “Affect the group” unchecked. The Parameters property should contain the location of the Caché Manager’s directory for the instance and the delete command. Suppose, for example, that Caché is installed in R:\Cachesys; the Parameters box should contain:

If you have any questions regarding this alert, please contact the Worldwide Response Center (WRC) at support@intersystems.com.

May 31, 2005 – Advisory Regarding Caché and AIX Default Parameters

The Worldwide Response Center (WRC) Performance Team has identified several AIX parameters that can adversely affect performance. The settings and recommendations are detailed below. If there are any questions about the impact to your system InterSystems recommends that you consult with the WRC Performance Team or with your AIX supplier before making any changes.

These recommendations are independent of Caché versions and apply to both JFS and Enhanced JFS (aka JFS2) file systems.

IO Pacing Parameter

This has caused severe performance problems on Caché production systems.

The current setting for IO Pacing high and low water marks can be viewed/changed by issuing the ‘smitty chgsys’ command.

Currently, ISC is using the following recommendation for determining the appropriate high water mark. It is prudent to verify this recommendation with IBM as it may change in the future.

High water mark = (4 * n) + 1
where n is the maximum number of spindles any one file (database, journal, or WIJ) spans. For example, if a CACHE.DAT file is stored on a storage array and the LUN (or file system) where it resides consists of 16 spindles/drives, the high water mark would be (4*16)+1 = 65.

InterSystems recommends that the low water mark be set to 75% of the high water mark.

*Note: If not in a HACMP cluster, the high and low water marks should both be set to 0 (zero), so that IO Pacing is disabled.
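Putting the formula and the 75% recommendation together, the arithmetic for the 16-spindle example can be sketched as follows. The integer truncation of the low water mark is our assumption; confirm the final values with IBM or the WRC.

```shell
# Worked example of the water-mark arithmetic from this advisory.
n=16                        # spindles the widest file (DB, journal, WIJ) spans
high=$(( 4 * n + 1 ))       # high water mark = (4 * n) + 1
low=$(( high * 75 / 100 ))  # 75% of the high water mark, truncated
echo "high=$high low=$low"
```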

Further details on IO Pacing can be found on IBM’s web pages at the following address:

File System Mount Option

For optimal performance, CACHE.DAT, CACHE.WIJ, and Caché journal files should always be placed on file systems mounted with the rbrw (release-behind for reads and writes) option.

Memory Management Parameters

Depending on how many file systems you have and the activity on them, the number of memory structures available to JFS or JFS2 could be limited and cause delay in IO operations waiting for memory structures to become available.

To monitor these metrics, you can issue a ‘vmstat -v’ command, wait 2 minutes, and then issue another ‘vmstat -v’ command. The output will look similar to the following:

For instructions on how to increase any of these parameters, please refer to the IBM documentation located on the following URLs:

When increasing these parameters from the default values, the first increase should be an additional 50% of the current value. Then check the ‘vmstat -v’ output again. Remember to run vmstat twice, two minutes apart. If that field is still increasing, then increase again by the same amount. Continue this step until the field stops increasing between vmstat reports. As always when changing system settings, it is prudent to keep a record of the original settings.
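The comparison of two snapshots can be scripted. The sketch below diffs one counter from two saved ‘vmstat -v’ outputs; the sample numbers are illustrative, and the field name shown is one of the AIX counters (adjust the pattern to the field you are watching).

```shell
# Sketch: compare one counter across two 'vmstat -v' snapshots taken
# two minutes apart. The sample files stand in for real vmstat output.
cat > vmstat_t0.txt <<'EOF'
              123 filesystem I/Os blocked with no fsbuf
EOF
cat > vmstat_t1.txt <<'EOF'
              150 filesystem I/Os blocked with no fsbuf
EOF
a=$(awk '/blocked with no fsbuf/ {print $1}' vmstat_t0.txt)
b=$(awk '/blocked with no fsbuf/ {print $1}' vmstat_t1.txt)
echo "delta=$((b - a))"   # a growing delta suggests more buffers are needed
rm -f vmstat_t0.txt vmstat_t1.txt
```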

*Note: Remember to change both the current and the reboot values, and check the ‘vmstat –v’output regularly because IO patterns may change over time (hours, days, weeks).

If you have any questions regarding this advisory, please contact the Worldwide Response Center at support@intersystems.com.

May 11, 2005 – Cached Query Storage Corruption

A problem has been corrected where under certain (rare) circumstances the same Cached Query routine could be used by two different SQL statements.

Where:
All versions of Caché 5.0.x

Symptoms:
We store information about SQL statements that have been run on the server. This information is used to identify what routine should be run if a query is requested more than once. If this information gets corrupted you most likely will get <NOROUTINE>, <NOLINE>, or application Data Type errors.

Cause:
This can happen if someone does a Purge of Cached Queries on a busy system that makes use of TSTART and TCOMMIT in its SQL application. Caché also does automatic purges when DDL statements are run, when classes are compiled, and each morning to clean up old unused queries.

Solution:
InterSystems has a patch available that will prevent this problem from happening. If you would like to receive this patch please contact Support and reference Dev Change DPL2422. This correction will be part of Caché 5.1 and will be included in Caché 5.0.17.

If you have any questions regarding this advisory, please contact the Worldwide Response Center at support@intersystems.com.

May 10, 2005 – Application Distribution with Caché

How to Package Your Server-Side Application within the Caché Installation Kit

Many Application Partners (APs) package their server-side application into an OS-specific installation script that installs both Caché and their application. The downside of this is that the APs will most likely need to write and maintain a separate installation script for each OS that they support.

Here is an alternative solution that uses the files cbootuser.rsa and update.rsa to install the server-side portion of an application during Caché installation. This avoids most of the OS specific code that one must write using the previous method.

cbootuser.rsa – This file should contain an export of a routine called ^cbootuser and should be placed in the ‘install’ directory of the Caché installation kit. Upon installation of Caché, if this file exists, it is imported and the ^cbootuser routine is executed. At this point of the installation, ^cbootuser will have access to the variable “SrcDir” (see note below). “SrcDir” will contain the complete path to the installation kit, which is necessary to locate any user code that needs to be imported.

update.rsa – This file should also be placed in the ‘install’ directory of the Caché installation kit. However, it does not need to contain any specific routines as cbootuser.rsa does. It can hold any number of routines that you wish to be imported during Caché installation. This is sufficient for .int or .mac code but will fall short if you have classes that you want to import from XML files. The ‘install’ directory can be found at the following locations on each platform:

Windows: Install\

UNIX: dist\Cache\install\misc

VMS: [.DIST.CACHE.INSTALL.MISC]

Caché 5.0.x has a limitation that requires more work to be done to import classes from an XML file. The following steps work around this limitation by using cbootuser.rsa, update.rsa, and a special ^ZSTUINSTALL routine designed to run only once.

Create a cbootuser routine whose only purpose is to save the value of “SrcDir”. It can simply contain 1 line of code that does this. Then export this routine to the cbootuser.rsa file.

Create a ^ZSTUINSTALL routine that does all of the work of your installation.

In routine ^ZSTU or SYSTEM^%ZSTART add a call to your ^ZSTUINSTALL routine. ^ZSTU and SYSTEM^%ZSTART are called as part of the Caché installation. Your ^ZSTUINSTALL will have access to anything that ^ZSTU or SYSTEM^%ZSTART would normally have access to. For example, it can make use of $System.OBJ.Load() and ^SYS(“SrcDir”) to load your classes from XML files which you included in the Caché installation kit. It is important to have ^ZSTU only call ^ZSTUINSTALL the first time it is run. The easiest way to ensure this happens is to use a global as a flag. For example, set ^INSTALLING=1 from ^cbootuser and then set it back to 0 after ^ZSTU or SYSTEM^%ZSTART executes ^ZSTUINSTALL. Have ^ZSTU or SYSTEM^%ZSTART only call ^ZSTUINSTALL if ^INSTALLING=1.

Then export ^ZSTU, ^%ZSTART, ^ZSTUINSTALL, and any routines you want imported during installation to the update.rsa file.

NOTE: There is a defect in Caché 5.0.x where SrcDir accidentally gets newed and is not available to ^cbootuser. The fix for this is ALE675 and it will be needed in order to use this method. ALE675 will be delivered in a future maintenance kit.

Note: In Caché 5.1 the limitation regarding class importation is removed and all code and classes can be imported using ^cbootuser.

If you have any questions regarding this advisory, please contact the Worldwide Response Center at support@intersystems.com.

April 5, 2005 – Updated GMHEAP Requirements in Caché 5.0.13

Enhancements to shadowing functionality and performance in Caché 5.0.13 require additional amounts of the Caché GMHEAP resource. Systems which were sufficiently configured prior to 5.0.13 may be underconfigured after upgrading.

This additional requirement applies to Caché 5.0.13 and above on all operating systems and platforms.

The following recommendation was included in the 5.0.13 release notes but several customers have still encountered problems.

Warning: Since the journal reader and database updaters communicate via shared memory allocated from the generic memory heap, it is important that one configures the generic heap as large as possible for the optimal performance of shadow dejournaling. The minimum requirement is 4 pages for shadowing. Shadowing may fail to start if there is insufficient space in the general memory heap.

Note: To take full advantage of the improvements to shadowing, the Generic Memory Heap size in a Caché configuration should be increased by 2MB per CPU on systems where a shadow is expected to run. Increases larger than 2MB per CPU may improve performance even further, but this is dependent on the nature of the shadowing demands.

The new version of ^GMHEAPREQ also implements an entry point to be called programmatically:

It can be loaded into any namespace using the %RI utility. If running this utility on versions of Caché prior to 5.0, please ignore the <SYNTAX> errors during the import process. This area of code is specific to 5.0+ and will not be executed in prior versions.

If you have any questions regarding this advisory, please contact the Worldwide Response Center at support@intersystems.com.

After reviewing the methods the Worldwide Response Center (WRC) uses to connect to client systems, I have selected Enexity SecureLink as our new standard for connecting to client systems. InterSystems has always been committed to providing excellent service in a way that puts the customer in control. SecureLink fits nicely into this philosophy and I believe it is the most secure, powerful and easy-to-use remote access tool available today.

SecureLink provides many benefits to our clients. You can decide when to grant access, who is allowed access, and how much access is given during a support call. SecureLink sessions are always initiated by you, from inside your network, and never require any modification to the firewall. SecureLink has detailed auditing and monitoring capabilities and is fully encrypted. I believe SecureLink will allow us to resolve complex issues more quickly, easily, and securely, virtually eliminating the need to deal with VPN complexity and delays, particularly at a time when you need assistance quickly.

Using SecureLink requires only a one-time download of a small component to one server on your network, preferably your workstation. Failure to install the component may result in service delay on your next call, so I would encourage you to download the component as soon as possible. The entire process should take about five minutes. After you have downloaded the component, contact the Worldwide Response Center and we will make a test connection to your server and demonstrate how it might be used in a support situation.

To learn more about SecureLink and its many benefits, visit the remote access section of our WRC Support Page. If you have any questions, please contact us at your convenience.

March 29, 2005 – Caché Dead Job Cleanup

InterSystems has identified a weakness in dead job cleanup functionality that has been a contributing factor to three catastrophic failures of Caché on production systems.

This weakness affects all Caché 5.0.x versions on all platforms.

The likelihood of encountering this problem is extremely low but the impact can be very high. Until a remedy is available InterSystems recommends disabling dead job cleanup functionality. Disabling is accomplished by adding the following call to the ^ZSTU routine. This will be the default starting with the Caché 5.0.20 maintenance release.

set %SW=20,%VAL=1 d INT^SWSET

Dead job cleanup is intended to prevent problems when processes improperly exit Caché. Issues such as process or systems hangs due to resources not being released and reduction in available license units are possible side effects of improperly exited processes. One possible cause of a dead job is a process being terminated at the OS level. Specific to Windows only, a process that exits due to an access violation can also leave a dead job.

In the case of an access violation a message will be logged to the cconsole.log file. InterSystems Worldwide Response Center (WRC) should always be notified of such an occurrence so that root cause can be investigated.

A problem exists in the cspbroker.jar file released with Caché 5.0.13 (all platforms). CSP applications which use Java-dependent #server hyperevents will not function properly. Under the default CSP configuration, all #server hyperevents require Java.

In Caché 5.0.13, the CSP technology gives the developer the choice of running #server hyperevents via a Java applet as in previous versions. However, it also allows #server hyperevents to run without Java.

There are two ways to easily avoid this problem:

In CSP application configuration, choose to implement hyperevents using XMLHttpRequest. XMLHttpRequest is an object available in all recent browsers that allows synchronous communication between client and server without a Java applet.

To choose this option, please do the following:

1) Click on the Caché Cube and select Caché Configuration Manager.

2) Select the CSP tab.

3) Expand Applications by clicking [+].

4) Choose the CSP application you wish to configure and expand it by clicking [+].

5) Under your application name, select HyperEvent Implementation.

6) Click the Change button.

7) Select Use XMLHttpRequest object.

8) Click OK.

9) Repeat steps 4-8 for each of your CSP applications.

10) Click OK. Activate the changes.

Alternately, if you wish to continue using a Java applet as the #server hyperevent mechanism, InterSystems has a corrected jar that can be downloaded from our FTP site.