Every now and then I run across an article or technical publication that interests me, and I bookmark or download it for later reading. Recently I read an article posted on aixmind.com that caught my attention, and I wanted to share it.

This document describes how to extend the default core dump facility on AIX with the chcore and syscorepath commands.

The default core dump facility on AIX requires minimal configuration, and by default will create a binary file named core in the current working directory for the process that has terminated abnormally. If a core file is successfully created, a CORE_DUMP entry will also be written into the error report. The chcore and syscorepath commands can be used to provide additional functionality, and fix problems that can sometimes occur with the default core dump facility.

Note: See the technote How to Diagnose Core Dump Failure for more information about problems that can occur when using the default core dump facility.

Both chcore and syscorepath allow you to designate a directory where all core files will be written. The chcore command also allows you to enable core file compression and unique core file naming, so that each core file written into the directory has a unique file name. With both commands (provided the chcore -n on option is used), core files are created with unique names in the format core.pid.ddhhmmss, where pid is the process ID, dd is the day of the month, hh is the hour in 24-hour format, mm is minutes, and ss is seconds. For security reasons, the default core dump facility will not write a core file for an executable that has the suid or sgid permissions set, but syscorepath will allow core dumps from these types of binaries to be created. In general you should use the newer chcore command to extend the functionality of the default core dump facility, and use syscorepath only if you need the suid/sgid feature not available with chcore.
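The unique naming format described above can be taken apart with plain shell string handling. The following is only a sketch: parse_core_name is a hypothetical helper (not an AIX command), and the sample file name is made up for illustration.

```shell
# Hypothetical helper: split a unique core file name of the form
# core.pid.ddhhmmss into its fields. Not part of AIX.
parse_core_name() {
    name=$1
    # Require "core.", something, a dot, then exactly eight characters.
    case "$name" in
        core.*.????????) ;;
        *) echo "not a unique core file name: $name" >&2; return 1 ;;
    esac
    rest=${name#core.}      # pid.ddhhmmss
    pid=${rest%%.*}         # text before the first dot
    stamp=${rest#*.}        # text after the first dot
    # Both parts must consist of digits only.
    case "$pid$stamp" in
        *[!0-9]*) echo "not a unique core file name: $name" >&2; return 1 ;;
    esac
    dd=$(printf '%s' "$stamp" | cut -c1-2)
    hh=$(printf '%s' "$stamp" | cut -c3-4)
    mm=$(printf '%s' "$stamp" | cut -c5-6)
    ss=$(printf '%s' "$stamp" | cut -c7-8)
    echo "pid=$pid day=$dd time=$hh:$mm:$ss"
}
```

For example, parse_core_name core.12345.15093042 reports process 12345, day 15 of the month, at 09:30:42. Note that the format carries no month or year, so the file's modification time is still needed to date a core file fully.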

Core File Settings

There are only a few configurable core file settings. The most important of these is the fullcore setting. If enabled, fullcore will increase the size of each core file written on the system by including additional information such as shared memory segments that might be needed in order to find the cause of a problem, and additional data needed when debugging multi-threaded programs.

To enable full core:
chdev -l sys0 -a fullcore=true

To disable full core:
chdev -l sys0 -a fullcore=false

To check the current setting for full core:
lsattr -l sys0 -a fullcore -E

To disable core file creation:
ulimit -c 0

Note: Core file creation in a particular user account can be disabled by setting the core ulimit to 0. A core ulimit greater than 0 sets the maximum size of the core file, which AIX expresses in 512-byte blocks. System ulimits are set and viewed with the ulimit command, and in /etc/security/limits.
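A quick way to see whether the current account can produce core files at all is to interpret the value that ulimit -c reports. The sketch below assumes the usual shell behavior of printing either a number or the word unlimited; describe_core_limit is a hypothetical helper name.

```shell
# Hypothetical helper: interpret a value as printed by "ulimit -c".
# Usage: describe_core_limit "$(ulimit -c)"
describe_core_limit() {
    case "$1" in
        0)          echo "core files disabled" ;;
        unlimited)  echo "core file size unlimited" ;;
        *[!0-9]*|'') echo "unrecognized limit: $1" >&2; return 1 ;;
        *)          echo "core file size limited to $1" ;;
    esac
}
```

A result of "core files disabled" means no core file will be written for that account, regardless of any repository configured elsewhere.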

chcore and lscore Commands

The chcore command can be used to configure directories as core dump repositories, and the lscore command displays the current core file settings. Unlike the default core file facility, chcore allows you to have complete control over where core dump files will be written. With chcore you can set up a default directory that can be used as a core file repository for all accounts on the system, including root. However, each user, again including root, can override the default directory and designate a directory in their home directory as a core file repository for core files generated from processes owned by that account.

A central core file repository can be set up for all accounts on the system, but each account may override the default and install a custom repository. Select a file system with plenty of free space for this repository. All accounts will need read and write access to this directory. To prevent users from deleting core files created by other users, the sticky bit should be enabled. Permissions on this directory should normally be 1777, the same as for /tmp.

To set up a system-wide core file repository with unique core file naming, run the following commands as root:
cd /path/to/filesystem
mkdir corefiles
chmod 1777 ./corefiles
chcore -p on -n on -l ./corefiles -d

Note: Add the option -c on to turn on core file compression. The settings will only be effective for newly logged in accounts and will persist across reboots.
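Before pointing chcore at a shared repository, it can be worth confirming that the directory really ended up with mode 1777. The following sketch wraps the mkdir and chmod steps and checks the result; prepare_core_repo is a hypothetical helper name, and the check assumes that ls -ld prints the mode string first, as it does on common UNIX systems.

```shell
# Hypothetical helper: create a shared core file repository and verify
# that it is world-writable with the sticky bit set (mode 1777).
prepare_core_repo() {
    repo=$1
    mkdir -p "$repo" || return 1
    chmod 1777 "$repo" || return 1
    # The first ten characters of "ls -ld" are the file type and mode;
    # a trailing "t" marks the sticky bit.
    mode=$(ls -ld "$repo" | cut -c1-10)
    if [ "$mode" != "drwxrwxrwt" ]; then
        echo "unexpected mode $mode on $repo" >&2
        return 1
    fi
    echo "$repo ready ($mode)"
}
```

Once the helper reports the directory as ready, the chcore command shown earlier can be run against it.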

To override the system-wide default repository and set up a repository for a specific user, run the following commands as the user:
mkdir ~/corefiles
chcore -p on -n on -l ~/corefiles

Note: Add the option -c on to turn on core file compression. The settings will only be effective for newly logged in accounts and will persist across reboots.

To list the system-wide default core file settings:
lscore -d

To list the core file settings for a specific user:
lscore user

syscorepath Command

The syscorepath command is less capable than chcore but has the ability to allow creation of core files for suid and sgid programs. This command will only configure a system-wide repository for core files; it does not have the capability to configure user specific repositories. Settings made with this command do not persist across reboots. This command can only be executed by root.

The default core dump facility on AIX is very basic, and can occasionally cause problems when creating core files. The chcore and syscorepath commands can be used to gain more control over how and where core files are written to avoid these problems. The newer chcore command is preferred over the less capable syscorepath command, but syscorepath allows the creation of core dumps for suid and sgid executables, and may be used together with chcore.

Overview

Even in a perfectly engineered world, things can break. Hardware that is not redundant can fail, or software can encounter a condition that requires intervention. You can automate some of this intervention. For example, you can enable your DB2 server to automatically collect diagnostic data when it encounters a significant problem. Eventually, however, a human being must look at the data to diagnose and resolve the issue. When the need arises, you can use several DB2 troubleshooting tools that provide highly granular access to diagnostic data.

The information and scenarios in this paper show how you can use the DB2 troubleshooting tools to diagnose problems on your server.

In large database environments, the collection of diagnostic data can introduce an unwanted impact to the system. This paper shows how you can minimize this impact by tailoring the values of a few basic troubleshooting configuration parameters such as diagpath, DUMPDIR, and FODCPATH and by collecting data more selectively.

The result? When things do break, you are well prepared to make troubleshooting as quick and painless as possible.

The following DB2 troubleshooting scenarios are covered in this paper:

Troubleshooting high processor usage spikes

Troubleshooting sort overflows

Troubleshooting locking issues

For each scenario, this paper shows you how to identify the problem symptoms, how to collect the diagnostic data with minimal impact to your database environment, and how to diagnose the cause of the problem.

The target audience for this paper is database and system administrators who have some familiarity with operating system and DB2 commands.

This paper applies to DB2 V10.1 FP2 and later, but many of the features that are described here are available in earlier DB2 versions as well.

AIX error notification allows you to run your own shell scripts or programs automatically in response to specified errors appearing in the AIX error log. These can be hardware errors, software errors, and operator messages that are logged in the AIX error log. Each time an error is logged in the system error log, the error notification daemon checks the defined notification objects. If the error log entry matches the selection criteria of an object, the daemon runs that object's notify method.
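The matching step can be pictured with a small shell sketch. It is only an illustration of the idea, not the daemon's actual implementation: the function names are hypothetical, only three of the selection fields (class, type, label) are modeled, the value All is treated as a wildcard, and DISK_ERR1 is used purely as a sample error label.

```shell
# Illustrative only: a criterion matches when it is "All" or equals the value.
field_matches() {
    [ "$1" = "All" ] || [ "$1" = "$2" ]
}

# entry_matches OBJ_CLASS OBJ_TYPE OBJ_LABEL ENTRY_CLASS ENTRY_TYPE ENTRY_LABEL
# Every criterion of the notification object must match the log entry.
entry_matches() {
    field_matches "$1" "$4" &&
    field_matches "$2" "$5" &&
    field_matches "$3" "$6"
}

# Report what would happen for one object and one error log entry.
notify_if_match() {
    if entry_matches "$@"; then
        echo "run notify method"
    else
        echo "skip"
    fi
}
```

For instance, notify_if_match Hardware PERM All Hardware PERM DISK_ERR1 selects the entry, while an object restricted to the Software class skips the same entry.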

Notification objects define error conditions through a wide range of selection criteria. AIX already contains some predefined notification objects. This section explains how to add a PowerPath notification object to detect an "all path down condition."

This section includes the following information:

◆ “Defining custom automatic error notification”

◆ “Configuring PowerPath custom automatic error notification”

◆ “Creating a script”

◆ “Testing error notification objects”

◆ “Additional error notification objects”

Defining custom automatic error notification

Before you configure the automatic error notification, you must have a valid PowerHA topology and resource configuration.

Automatic error notification should be configured only when the cluster is not running. The following entries will need to be populated with the correct values, as shown in Table 13.

Entry

Description

Notification Object Name

Enter a user-defined name that identifies the error notification object.

Persist across system restart?

Set this field to Yes if you want to use this notification object persistently. Set this field to No if you want to use this object until the next reboot.

Process ID for use by Notify Method

You can specify a process ID for the notify method to use or you can leave it blank. Objects that have a process ID specified should have the Persist across system restart field set to No.

Select Error Class

Choose the appropriate error class. Valid values are:

◆ None: No error class match

◆ All: All error classes

◆ Hardware: Hardware error class

◆ Software: Software error class

◆ Errlogger: Operator notifications, messages from

Select error type

Identify the severity of error log entries. Valid values are:

◆ None: No entry type to match

◆ All: Match all error types

◆ PEND: Impending loss of availability

◆ PERM: Permanent

◆ PERF: Performance degradation

◆ TEMP: Temporary

◆ UNKN: Unknown error type

Match Alertable Errors?

This field is provided for use by the alert agents of network management applications. Choose None to ignore this entry. Valid values are:

◆ All: Match all alertable errors

◆ TRUE: Matches alertable errors

◆ FALSE: Matches non-alertable errors

Select Error Label

Select an error label associated with the accurate error identifier from the /usr/include/sys/errids.h file. Press F4 for a listing. If your application supports the AIX system error log, you can specify the application-specific error label.

Resource Name

The name of the failing resource. For the hardware error class, this is the device name. For the software class, this is the name of the failing executable. Specify All to match all error labels.

Resource Class

For the hardware error class, the resource class is the device class. It is not applicable for software errors. Specify All to match all resources classes.

Resource Type

Enter the device type by which a resource is known in the devices object for the hardware error class. Specify All to match all resource types.

Notify Method

Enter the full path name of the executable file, shell script, or command to be run whenever an error is logged that matches the defined criteria. You can pass the following variables to the executable:

Once the custom PowerPath notification is added, you will need to create the pp_cl_failover script to include any specific actions you would like the cluster to take. That script is created in the following directory: /usr/es/sbin/cluster/diag.

This example creates a notification script to message the users and halt the node. Create the script to include the following entries and make sure the permissions are set to 500.

The PERSISTENT RESERVE IN / OUT commands have been around for quite some time and are clearly documented in the T10 SCSI Primary Commands standards. Only in the past year or so have I seen higher customer acceptance of them within products such as VIOS, GPFS and DB2 pureScale.

devrsrv Command

Purpose

Syntax

Description

The devrsrv command enables you to query and break the single-path and persistent reservations on the device. The command enables you to run the persistent reserve in (prin) and persistent reserve out (prout) service actions.

The query subcommand queries and displays the current reservation status of the device. The release subcommand releases the reservation on the device by using the single-path reservation.

The prin subcommand displays all the registered reservation keys, reservation key holder, and capabilities information. The prout subcommand requests a service action that reserves a device for the exclusive or shared use of a particular I/O path to the device. The prout subcommand supports the following service actions:

RELEASE: Releases the specified persistent reservation for the device.
CLEAR: Clears all the reservation keys and all the persistent reservations.
PREEMPT: Preempts the persistent reservations or removes registrations, or both.
PREEMPT AND ABORT: Preempts persistent reservations or removes registrations, or both, and aborts all tasks for all preempted I/O paths to the device.
REGISTER AND IGNORE EXISTING KEY: Registers the new key value in place of the old key value.

Flags

Item

Description

-c

Specifies the following subcommands:

query: Queries and displays the status of reservations on a device.
release: Releases the device with the single-path reservation by using SCSI-2.
prin: Specifies the persistent reservation in service action.
prout: Specifies the persistent reservation out service action.

-s

Specifies the service action for persistent reservations. The valid service actions for the prin subcommand follow:

0: READ KEYS
1: READ RESERVATION
2: REPORT CAPABILITIES

The valid service actions for the prout subcommand follow:

2: RELEASE
3: CLEAR
4: PREEMPT
5: PREEMPT AND ABORT
6: REGISTER AND IGNORE EXISTING KEY

-r

Specifies the reservation key. The -r flag is required for the REGISTER, PREEMPT, PREEMPT AND ABORT, and RELEASE service actions.

-k

Specifies the service action reservation key. The -k flag is required for the REGISTER, PREEMPT, and PREEMPT AND ABORT service actions.

-f

Breaks the reservation that is held by another I/O path or host. For single-path reservations, the devrsrv command issues a SC_FORCED_OPEN action to break the reservation. For persistent reservations, the devrsrv command issues a prout subcommand along with the CLEAR service action to clear the persistent reservation and the registrations.
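When scripting around devrsrv, it can help to translate the numeric -s values listed for the prout subcommand back into their names, for example in log messages. The lookup below is a sketch that covers only the values documented in this section; prout_action_name is a hypothetical helper name.

```shell
# Hypothetical helper: map a prout service action number (as passed to -s)
# to its name. Only the values listed in this section are covered.
prout_action_name() {
    case "$1" in
        2) echo "RELEASE" ;;
        3) echo "CLEAR" ;;
        4) echo "PREEMPT" ;;
        5) echo "PREEMPT AND ABORT" ;;
        6) echo "REGISTER AND IGNORE EXISTING KEY" ;;
        *) echo "unlisted prout service action: $1" >&2; return 1 ;;
    esac
}
```

Any other value is rejected with a non-zero return code rather than guessed at.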

Examples

The following examples cover different scenarios.

Query operation

To query the reservation status of the hdisk0 device when it is not reserved by any host, enter the following command:

# devrsrv -c query -l hdisk0

Device Reservation State Information
==================================================
Device Name : hdisk0
Device Open On Current Host? : NO
ODM Reservation Policy : SINGLE PATH RESERVE
Device Reservation State : NO RESERVE

The output shows that the device is not opened on the current host and the Object Data Manager (ODM) reservation policy is SINGLE PATH RESERVE. This indicates that the reservation policy is set in the ODM for this device. The device reservation state indicates the reservation that is present on the device. You can find the device reservation state by running a sequence of SCSI commands.
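In a script, the reservation state can be pulled out of the query output with awk. The sketch below reads the output on standard input and assumes the field label appears exactly as shown in the sample above; reservation_state is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of the "Device Reservation State"
# field from devrsrv query output read on standard input.
# Returns non-zero if the field is not present.
reservation_state() {
    awk '
        /^Device Reservation State[ \t]*:/ {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label and the colon
            sub(/[ \t]+$/, "")         # drop trailing blanks
            print
            found = 1
        }
        END { exit !found }
    '
}
```

On an AIX host this could be used as devrsrv -c query -l hdisk0 | reservation_state, which for the sample output above prints NO RESERVE. The header line "Device Reservation State Information" is not matched because no colon follows the word State there.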

To query the reservation status of the hdisk1 device when it is reserved by a host, enter the following command:

# devrsrv -c query -l hdisk1

The device is reserved by using the single path reservation by a host.

Device Reservation State Information
==================================================
Device Name : hdisk1
Device Open On Current Host? : NO
ODM Reservation Policy : SINGLE PATH RESERVE
Device Reservation State : SINGLE PATH RESERVE

To query the reservation status of the hdisk2 device when it is reserved on the same host, enter the following command:

To release the persistent reservation and to remove all the registrations from a device server that uses the CLEAR service action by using a registered I/O path with key 555, enter the following command:

# devrsrv -c release -l hdisk0
Device Reservation State Information
==================================================
Device Name : hdisk0
Device Open On Current Host? : YES
ODM Reservation Policy : SINGLE PATH RESERVE
Device Reservation State : SINGLE PATH RESERVE
Device is currently Open on this host by a process.Do you want to continue y/n:y
Command Successful
Reservation cleared on the device. Query operation may not work properly.
Close the application that holds the reservation and retry.

Scenario 2: The current host is not the owner of the reservation.

# devrsrv -c query -l hdisk0
Device Reservation State Information
==================================================
Device Name : hdisk0
Device Open On Current Host? : NO
ODM Reservation Policy : SINGLE PATH RESERVE
Device Reservation State : SINGLE PATH RESERVE
Because the current host does not own the reservation on the device,
try the force option if you want to break the reservation.

# devrsrv -f -l hdisk0

The device is already reserved by using the single-path reservation by another host.

Device Reservation State Information
==================================================
Device Name : hdisk0
Device Open On Current Host? : NO
ODM Reservation Policy : SINGLE PATH RESERVE
Device Reservation State : SINGLE PATH RESERVE
Reservation will be cleared on the device. Do you want to continue y/n:y

After running the release command successfully, the query option must display NO RESERVE as the device reservation state.

Power Systems have a long history of being the best -- record-setting performance year after year, new innovations for processor and virtual machine performance and efficiency, growing from 16% share in UNIX revenue to 53% share, running many of the world's largest ERP implementations, and helping competitive-installed clients migrate over 3,400 systems to Power in just the last three years.
