11/01/96, 4FAX# 1770
System Crashes in AIX
Special Notices
About This Document
Collecting The Dump
  Locate The Dump
  Packaging The Dump
  Verify The Dump
  Copy The Dump
Gathering Preliminary Crash Information
Reader's Comments
SPECIAL NOTICES
Information in this document is correct to the best of our
knowledge at the time of this writing. Please send feedback
by fax to "AIXServ Information" at (512) 823-4009.
Please use this information with care. IBM will not be
responsible for damages of any kind resulting from its use.
The use of this information is the sole responsibility of
the customer and depends on the customer's ability to eval-
uate and integrate this information into the customer's
operational environment.
ABOUT THIS DOCUMENT
This document discusses how to take a user-initiated or
system-initiated dump from your machine and package it for
distribution to your support organization for problem
determination.
This document assumes that an IBM software support problem
record has already been opened and that the customer knows
the reference number for that problem. A problem record can
be opened in two ways:
1. Open a software problem with IBM AIX SupportLine. This
can be achieved by calling 1-800-237-5511 option 3.
2. Submit this problem through IBM Program Services.
Retrieve FAX document #1760 from 1-800-IBM-4FAX
(1-800-426-4329) for details on how to use this service.
For more in-depth coverage of this subject, the following
IBM documents are recommended:
o AIX Version 4.1 Software Problem Debugging and Reporting
for the RISC System/6000. (GG24-2513)
o Common Diagnostics and Service Guide (SA23-2687)
o Diagnostic Information For Micro Channel Bus Systems
(SA23-2765)
o Diagnostic Information for Multiple Bus Systems
(SA38-0509)
o Problem Solving Guide and Reference (SA23-2204),
(SA23-2606)
o Managing System Dump Devices (FAX# 6221 from
1-800-IBM-4FAX)
This document applies to AIX versions 3.2, 4.1, and 4.2.
COLLECTING THE DUMP
Our objective is to remove the dump from the dump device.
It can then be moved to another machine or packaged and sent
to IBM for analysis.
The steps to do this are as follows:
1. Locate the dump
2. Package the dump
3. Verify the dump
4. Copy the dump
LOCATE THE DUMP
To locate the dump, issue the command:
sysdumpdev -L
The following output shows a good dump:
Device name: /dev/hd6
Major device number: 10
Minor device number: 1
Size: 16197120 bytes
Date/Time: Fri Nov 18 02:58:14 CST 1994
Dump status: 0
Dump copy filename: /var/adm/ras/vmcore.0
In this case we have located a valid dump that was safely
saved by the system into the /var/adm/ras directory.
The "Dump status:" line indicates whether the dump was
successful. The possible return codes and their meanings
are:
SUCCESSFUL DUMP =  0
DUMP DISABLED   = -1
PARTIAL DUMP    = -2
DUMP FAILED     = -3
If the dump status indicates a "PARTIAL DUMP", the most
common cause is that the dump device is not large enough
to hold a complete dump for this machine. Increase the
size of the system dump device. Follow the steps in FAX
document #6221 ("Managing System Dump Devices") for a
complete description of how to do this.
If the dump status indicates a "DUMP FAILED", an error
occurred during the system dump process. The data in the
system dump device may still hold information to help
determine the cause of the system problem. Continue with
the steps in this document to determine whether usable
information is present.
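As a minimal sketch, and not part of the official procedure,
the status check described above could be scripted as
follows. The function names "dump_status_meaning" and
"dump_status_from_output" are illustrative assumptions, and
the sample text is taken from the "sysdumpdev -L" output
shown earlier.

```shell
#!/bin/sh
# Sketch only: translate the "Dump status:" code reported by
# "sysdumpdev -L" into the meanings listed in this document.
# The function names are hypothetical, not AIX commands.

dump_status_meaning() {
    case "$1" in
        0)  echo "SUCCESSFUL DUMP" ;;
        -1) echo "DUMP DISABLED" ;;
        -2) echo "PARTIAL DUMP" ;;
        -3) echo "DUMP FAILED" ;;
        *)  echo "UNKNOWN STATUS" ;;
    esac
}

# Read "sysdumpdev -L" output on stdin and print the status code.
dump_status_from_output() {
    awk -F': *' '/^Dump status:/ { print $2; exit }'
}

# Demonstration on sample output from this document.
sample='Device name: /dev/hd6
Size: 16197120 bytes
Dump status: 0'
code=$(printf '%s\n' "$sample" | dump_status_from_output)
echo "Dump status $code: $(dump_status_meaning "$code")"
```

On a live system, the same functions could presumably be fed
directly with "sysdumpdev -L | dump_status_from_output".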
The following case shows the command output when the dump
copy failed. Presumably the dump is available on the
external media device, for example, tape.
Device name: /dev/hd6
Major device number: 10
Minor device number: 1
Size: 15914496 bytes
Date/Time: Thu Nov 17 19:59:01 CST 1994
Dump status: 0
0481-195 Failed to copy the dump from /dev/hd6 to
/var/adm/ras.
0481-198 Allowed the customer to copy the dump to
external media.
PACKAGING THE DUMP
The "snap" command will automatically collect the required
files pertaining to the system dump.
To gather the dump together with kernel and general system
information, run this command:
snap -Dfkg
This will make a directory in /tmp called "ibmsupt" where
the kernel, crash, and general system information about the
machine will be stored.
If you are running AIX 4.1.x or 4.2.x and your dump device
is the primary paging space (hd6), then upon reboot from a
crash, the system will attempt to copy the dump from the
dump device to a filesystem (usually "/var/adm/ras"). If
there is not enough space in this filesystem, it will prompt
you to copy this to tape, unless you have set the machine to
discard the dump if no space is available.
==========================================================
IMPORTANT: If you have the system copy this to external
media, the information on this media is generally NOT suffi-
cient to determine the cause of a system crash. The dump
needs to be transferred back to the system to be packaged
with other system files to be complete.
==========================================================
If the dump was copied to tape on reboot, then the "snap"
command above will return an error about there being no good
dump. The dump is no longer accessible to the system, since
it is on external media.
In this case, ignore the warning messages, and proceed to
copying the dump from the external media back to the system.
Space must be available in the /tmp filesystem to copy this
dump. Refer to the "Size:" line in the "sysdumpdev -L"
output to determine how much space the dump needs. In the
last "sysdumpdev -L" output shown above, the dump size was
"15914496 bytes", or roughly 16 MB, so 16 MB of space must
be available in /tmp. If it is not, either clear 16 MB of
space from /tmp or increase the size of /tmp to make that
much space available.
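The space calculation above can be sketched in shell as
follows. This is an illustration only: "bytes_to_kb" and
"enough_space" are hypothetical helper names, and the free
space figure is invented for the example.

```shell
#!/bin/sh
# Sketch only: decide whether /tmp has room for a dump of a given size.
# bytes_to_kb and enough_space are hypothetical helper names.

# Round a byte count up to whole kilobytes (1 KB = 1024 bytes).
bytes_to_kb() {
    echo $(( ($1 + 1023) / 1024 ))
}

# Succeed if free_kb is at least the space needed for dump_bytes.
enough_space() {
    dump_bytes=$1
    free_kb=$2
    [ "$free_kb" -ge "$(bytes_to_kb "$dump_bytes")" ]
}

# Worked example with the size reported by "sysdumpdev -L".
needed_kb=$(bytes_to_kb 15914496)
echo "Dump needs ${needed_kb} KB in /tmp"

# On a live system, the free space would come from "df -k /tmp".
# The column holding the free space differs between systems, so it
# is passed in by hand here (20000 KB is an invented figure).
if enough_space 15914496 20000; then
    echo "Enough space"
else
    echo "Not enough space"
fi
```

Passing the free space in as an argument keeps the sketch
independent of the exact "df" output layout on your system.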
If this information is not available, proceed with the fol-
lowing steps and take appropriate actions if enough space is
not available.
For example, a dump saved to the /dev/rmt0 device is
restored with these commands:
cd /tmp/ibmsupt/dump
tctl -b0 -Bnf /dev/rmt0 read | tar -xvf-
mv dump_file dump
If the dump was copied to diskette instead of tape, run
these commands:
cd /tmp/ibmsupt/dump
tar -xvf /dev/fd0
mv dump_file dump
VERIFY THE DUMP
Quickly check that the dump is valid and readable before
submitting it to IBM for analysis.
First, use this command to make sure that the following
files are in the /tmp/ibmsupt/dump directory:
# ls -og /tmp/ibmsupt/dump
total 11352
-rw-r--r-- 1 4530888 Jul 02 16:12 dump.Z
-rw-r--r-- 1 180 Jul 02 16:11 dump.snap
-rw-r--r-- 1 1270849 Jul 02 16:11 unix.Z
NOTE: the file "dump.Z" may be just "dump", and the file
"unix.Z" may be just "unix". Both files should be greater
than 0 bytes in size. In this case, "dump.Z" is 4530888
bytes and "unix.Z" is 1270849 bytes.
Use the steps in GATHERING PRELIMINARY CRASH INFORMATION to
verify that there is a good dump.
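The file check described in this section could be automated
with a short sketch like the one below. The helper names
"has_nonempty" and "check_snap_dump" are assumptions made
for illustration; they are not part of "snap".

```shell
#!/bin/sh
# Sketch only: confirm that the dump and kernel files exist and are
# not empty. Both function names are hypothetical.

# Succeed if "$1/$2" or "$1/$2.Z" exists with a size greater than 0.
has_nonempty() {
    [ -s "$1/$2" ] || [ -s "$1/$2.Z" ]
}

# Check the "dump" and "unix" files in the given directory
# (default: /tmp/ibmsupt/dump). Returns nonzero if either is
# missing or empty.
check_snap_dump() {
    dir=${1:-/tmp/ibmsupt/dump}
    rc=0
    for name in dump unix; do
        if has_nonempty "$dir" "$name"; then
            echo "ok: $name"
        else
            echo "missing or empty: $name"
            rc=1
        fi
    done
    return $rc
}
```

To use it, run "check_snap_dump /tmp/ibmsupt/dump" after the
"snap" command has completed. It does not replace the "crash"
verification described later in this document.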
COPY THE DUMP
Use one of the following methods to transfer the dump
information from this machine to another site.
1. Sending a Testcase via Tape
a. With the following command, determine the block size
used to back up the tape:
lsattr -E -l rmt# | grep block_size
Sample output:
block_size 512 BLOCK size (0=variable length) True
NOTE: In this example, the block size is 512. A block
size of 0 is not recommended.
b. Place a blank tape in the tape drive.
c. Copy the testcase to the tape with this command:
/usr/sbin/snap -o /dev/rmt#
d. Label the tape.
tape block_size = xxx
VERY IMPORTANT: If the person sending in this
testcase is not the person who reported the problem,
be sure to include the name of the person who
reported it. If the proper information is not on
the package, then it takes valuable time to process
and delays solving your problem.
e. Use the following address to send the tape via
standard mail or overnight delivery.
NOTE: This address can change, so please verify
this address by either having the latest version of
this document or verifying with your IBM software
support organization.
IBM Corp. / Zip 2900 / Bldg 42
Attn: AIX Testcase Dept. J66S
11400 Burnet Road
Austin, TX 78758-3493
NOTE: Only Federal Express makes morning deliveries
(Monday - Friday) directly to our building (the
address must show Bldg. 42).
NO overnight delivery service delivers directly to
our building on Saturday. If you specify Saturday
delivery without making special arrangements with
AIX Support, there will be a delay of several days.
Using other carriers may also cause delays.
2. Sending Testcases Electronically via Internet or IBM
VM
To create a compressed tar image of the testcase,
run:
snap -c
This will create a file called "snap.tar.Z" in the
/tmp/ibmsupt directory.
The size limit for a single compressed testcase to
be transferred by FTP is 50 MB.
Sample file names for the tarred and compressed
file:
ad1000.tar.Z (item_number.tar.Z)
1x234.001.tar.Z (problem_report_#.branch_office_#.tar.Z)
1x234.1234567.tar.Z (problem_report_#.customer_#.tar.Z)
FTP the file to our testcase repository:
ftp 198.17.57.67 (or "ftp testcase.boulder.ibm.com")
login: anonymous
password: your e-mail address
(e.g., "customer@wallyworld.com")
bin (change to binary transfer mode)
cd aix
put 1x234.001.tar.Z (for example)
ls -l
quit
For IBM VM, the testcase should be uploaded in
binary format.
Please name the file PPPPPBBB TARZBIN, where PPPPP
is the PMR number and BBB is the branch office
number (without the "b").
node = "AUSVMR" userid = "V3DEFECT"
NOTE: Because of the size of a dump, it may be
faster to send it by overnight mail than to send it
over VM.
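The naming conventions and the FTP size limit described
above can be sketched in shell as follows. All three
function names are hypothetical, and the sketch assumes
that "50 MB" means 50 x 1024 x 1024 bytes, which this
document does not state.

```shell
#!/bin/sh
# Sketch only: build testcase file names in the formats shown in
# this document. All function names are hypothetical.

# FTP name: problem_report_#.branch_office_#.tar.Z
ftp_testcase_name() {
    echo "$1.$2.tar.Z"
}

# VM name: PPPPPBBB TARZBIN (PMR number, then branch office number).
vm_testcase_name() {
    echo "$1$2 TARZBIN"
}

# Succeed if the file named by $1 is within the 50 MB FTP limit.
# Assumption: 1 MB is taken as 1024 * 1024 bytes.
within_ftp_limit() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -le $((50 * 1024 * 1024)) ]
}

ftp_testcase_name 1x234 001    # prints 1x234.001.tar.Z
vm_testcase_name 1x234 001     # prints 1x234001 TARZBIN
```

For example, "within_ftp_limit /tmp/ibmsupt/snap.tar.Z"
could be run before starting the FTP session to decide
whether to send a tape instead.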
Reclaiming Space in /tmp
Once the testcase has been sent, you can reclaim the
space in /tmp with the "snap -r" command.
GATHERING PRELIMINARY CRASH INFORMATION
First, use the "script" command to generate a file that will
contain all of the information to be returned by the fol-
lowing commands.
Example:
script /tmp/crash.info
Now, run the following command:
sysdumpdev -L
After that, gather information from the dump itself. If you
have already moved the dump information into the
/tmp/ibmsupt directory using the steps in COLLECTING THE
DUMP, then run "cd /tmp/ibmsupt/dump" and run "crash dump".
The dump file may be compressed (i.e., dump.Z). If it is
and space is available in /tmp, uncompress it first (i.e.,
"uncompress dump.Z"). Otherwise, use the information
returned by the "sysdumpdev -L" command above to determine
the location of the dump. If the dump is on the machine,
then run the "crash" command on that location.
Example:
#sysdumpdev -L
Device name: /dev/hd6
Major device number: 10
Minor device number: 1
Size: 16197120 bytes
Date/Time: Fri Nov 18 02:58:14 CST 1994
Dump status: 0
Dump copy filename: /var/adm/ras/vmcore.0
#crash /var/adm/ras/vmcore.0
If this command comes back with an error like:
WARNING: dumpfile does not appear to match namelist.
then you likely have a bad dump. This usually occurs if the
unix image in your boot logical volume does not match the
unix image in /unix. If this is the case, use the "bosboot"
command to rebuild all boot images on your machine, reboot,
and wait for the condition to occur again.
This situation can also occur if you are mirroring the
"rootvg" volume group and have more than one boot image
(e.g., hd5, hd51, hd52, hd5x). Remember to rebuild the
boot image on all boot logical volumes whenever any upgrades
or fixes are installed or any system configuration changes
are made.
If you are mirroring your "rootvg" volume group, you should
not mirror your dump devices. This can also cause system
dumps to be incomplete or corrupt. To avoid this, separate
non-mirrored dump devices should be created for both the
primary and secondary dump devices.
"crash" should eventually come back with a ">" prompt.
After this prompt appears, run the following commands:
stat
trace -m
od *vmmerrlog 9 a
At this point, you can exit out of "crash" with "quit", and
then exit out of the "script" command with "exit". The file
you created with the "script" command will contain
everything that was displayed above.
You should examine this file to verify that the date of this
dump reported by both the "sysdumpdev -L" command and the
"stat" command within "crash" correspond to the same time
frame. If they do not, then you may actually be looking at a
system crash from an earlier time. Contact your local
software support organization if there are problems
regarding this.
You can arrange to fax or mail this information to your
support organization for this problem, if applicable. If
this was a user-initiated dump or you are just verifying the
dump, then the above information is only used to verify that
the dump is readable. The full dump would need to be sent
to correctly diagnose any problems.
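As an aid to the date comparison described above, the
following sketch pulls the two timestamps out of the
"script" capture so they can be compared by eye. The
function name "crash_times" is a hypothetical helper, and
the sample lines are taken from separate examples in this
document, so they deliberately do not match.

```shell
#!/bin/sh
# Sketch only: extract the "sysdumpdev -L" date and the "stat" time
# of crash from a "script" capture. crash_times is a hypothetical name.

# Read the capture on stdin and print each timestamp with a label.
# Carriage returns, which "script" may record, are stripped first.
crash_times() {
    tr -d '\r' | awk '
        /^[ \t]*Date\/Time:/ {
            sub(/^[ \t]*Date\/Time:[ \t]*/, "")
            print "sysdumpdev: " $0
        }
        /^[ \t]*time of crash:/ {
            sub(/^[ \t]*time of crash:[ \t]*/, "")
            print "crash stat: " $0
        }
    '
}

# Demonstration on sample lines from this document.
sample='Date/Time: Fri Nov 18 02:58:14 CST 1994
> stat
time of crash: Wed Mar 20 08:46:01 CST 1996'
printf '%s\n' "$sample" | crash_times
```

On a real capture, this could be run as
"crash_times < /tmp/crash.info". If the two printed times
are far apart, the dump may be from an earlier crash.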
Example:
# script /tmp/crash.out
Script started, file is /tmp/crash.out
# crash /dev/hd7
Using /unix as the default namelist file.
Reading in Symbols .........................
> stat
sysname: AIX
nodename: unix
release: 1
version: 4
machine: 000011653000
time of crash: Wed Mar 20 08:46:01 CST 1996
age of system: 8 hr., 53 min.
> trace -m
MST STACK TRACE:
0x2ff3b400 (excpt=1000b844:4000f0b0:40002f36:1000b844:107)
(intpri=b)
IAR: 00009064 .disable_lock + 60: mtsr 0xf,r8
*LR: 00044e9c .selnotify + 3c
*2ff3a818: 00044f6c .selnotify + 10c
2ff3a878: 018c3ccc [pse:select_wakeup_on_events] + 11f24
2ff3a8c8: 018bb284 [pse:osrq_wakeup] + 94dc
2ff3a918: 018c3bdc [pse:sth_rput] + 11e34
2ff3a978: 018ba3d8 [pse:csq_lateral] + 8630
2ff3a9b8: 018b5ebc [pse:puthere] + 4114
> od *vmmerrlog 9 a
00000000: 00000000 00000000 00000000 00000000 |................|
00000010: 00000000 00000000 00000000 00000000 |................|
00000020: 00000000 |....|
> quit
# exit
Script done, file is /tmp/crash.out
READER'S COMMENTS
Please fax this form to (512) 823-4009, attention "AIXServ Informa-
tion". You may also e-mail comments to: elizabet@austin.ibm.com.
These comments should include the same customer information requested
below.
Use this form to tell us what you think about this document. If you
have found errors in it, or if you want to express your opinion about
it (such as organization, subject matter, appearance) or make sug-
gestions for improvement, this is the form to use.
If you need technical assistance, contact your local branch office,
point of sale, or 1-800-CALL-AIX (for information about support offer-
ings). These services may be billable. Faxes on a variety of sub-
jects may be ordered free of charge from 1-800-IBM-4FAX. Outside the
U.S. call 415-855-4329 using a fax machine phone.
When you send comments to IBM, you grant IBM a nonexclusive right to
use or distribute your comments in any way it believes appropriate
without incurring any obligation to you.
NOTE: If you have a problem report or item number, supplying that
number may help us determine why a procedure did or did not work in
your specific situation.
Problem Report or Item #: Branch Office or Customer #:
Be sure to print your name and fax number below if you would like a
reply:
Name: Fax Number:
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
END OF DOCUMENT (how.to.dump.krn, 4FAX# 1770)