Tuesday, 21 January 2014

Purging trace and dump files with 11g ADRCI

In previous versions of Oracle prior to 11g, we had to use our own housekeeping scripts to purge the udump, cdump and bdump directories.

In Oracle 11g, we now have the ADR (Automatic Diagnostic Repository) which is defined by the diagnostic_dest parameter.

So how are unwanted trace and core dump files cleaned out in 11g automatically?

This is done by the MMON background process.

There are two time attributes which are used to manage the retention of information in ADR. Both attributes correspond to a number of hours after which the MMON background process purges the expired ADR data.

LONGP_POLICY (long term) defaults to 365 days and relates to things like Incidents and Health Monitor warnings.

SHORTP_POLICY (short term) defaults to 30 days and relates to things like trace and core dump files

The ADRCI command show control will show us what the current purge settings are as shown below.

Oracle's Automatic Diagnostic Repository (ADR) is a location on the file system used to collect various diagnostic related files, such as the Database alert log, trace files, and the new 11g HealthMonitor report files. The ADR location is not limited just to the Oracle Database it can be used as the diagnostic repository for other Oracle products. The command line tool used to 'manage' the diagnostic data is ADRCI.

Two terms that help 'understand' the adrci tool are Problem and Incident.

Problem – The critical error.

Incident – A single occurrence of the error. Diagnostic data (eg trace files etc) are collected and related to an incident.

To Start the ADRCI command line tool:
adrci

To end an adrci session use either exit or quit:
adrci>exit
adrci>quit

To see the list of available commands:
adrci>help
adrci>help <command name>
eg
adrci>help set home

Setting the Home
The structure of the ADR allows it to be the repository for many database instances, each instance having its own sub-directories within ADR base directory. The default behaviour for the adrci tool to 'act' on all the 'homes' within the ADR. You can also 'set' the home to limit the source of diagnostic data to manage.

Viewing Incidents.
To view the incidents of the current ADR home use the adrci>show incident command. Two arguments that control the output are the -p “predicate string” and the -mode {BASIC | BRIEF | DETAIL} indicator. For example:

The show incident -p “incident_id=7489” -mode detail will include in the output the location of the related trace files for the selected incident.

Viewing Trace Files
To display a list of trace files within the currently set Home use adrci> show tracefile. Two useful methods to control the tracefile output are -i <incident_id> and the use of % wildcard. Use the -i <incident_id> to list tracefiles related to the incident. Use the %wildcard to limit the tracefiles to those that match a particular filename. For example:

adrci> show tracefile
or
adrci> show tracefile %dw00%
or
adrci> show tracefile -i 7489

Packing it (For Oracle Support or for yourself)
There are multiple ways to achieve this, here I am creating an empty 'package' and adding incident data to it. The steps are
1. Create a package (A logical container for the diagnostic data)
2. Add diagnostic data into the package (from an incident or manually by adding trace files)
3. Generate a 'physical' package (file on the filesystem)
4. Finalize the package

Friday, 10 January 2014

object lock of WRI$_OPTSTAT_HISTHEAD_HISTORY

The old statistics are purged automatically at regular intervals based on the statistics history retention setting and the time of recent statistics gathering performed in the system. Retention is configurable using the ALTER_STATS_HISTORY_RETENTION procedure. The default value is 31 days.

The SYSAUX and SYSTEM tablespaces have been continually growing due tothe databases where a high amount of
import/exports and RMAN are taking place.SELECT occupant_name "Item", space_usage_kbytes/1048576 "Space Used (GB)", schema_name "Schema", move_procedure "Move Procedure"FROM v$sysaux_occupantsORDER BY 1/Item Space Used (GB) Schema Move ProcedureSM/OPTSTAT 12.45 SYSTo resolve this I set the stats retention period to 5 days.SQL> exec dbms_stats.alter_stats_history_retention(5);

To find out the oldest available stats you can issue the following:SQL> select dbms_stats.get_stats_history_availability from dual;GET_STATS_HISTORY_AVAILABILITY---------------------------------------------------------------------------28-SEP-13 00.00.00.000000000 +04:00To find out a list of how many stats are gathered for each day between the retention the current date and the oldest stats history issue the following:SQL> select trunc(SAVTIME),count(1) from WRI$_OPTSTAT_HISTHEAD_HISTORY group by trunc(SAVTIME) order by 1;TRUNC(SAV COUNT(1)--------- ----------28-SEP-13 292014029-SEP-13 84368330-SEP-13 51983401-OCT-13 95883602-OCT-13 315805203-OCT-13 28704-OCT-13 125395205-OCT-13 73236106-OCT-13 50718607-OCT-13 18941608-OCT-13 261909-OCT-13 149110-OCT-13 28711-OCT-13 12632412-OCT-13 13955613-OCT-13 18106814-OCT-13 483215-OCT-13 25802716-OCT-13 115217-OCT-13 28718-OCT-13 2783921 rows selected.What has happened here is that the job run by MMON every 24hrs has checked the retention period and tried to run a purge of all stats older than the retention period. As the job has not compeleted within 5 minutes because of the high number of stats collected on each day, the job has given up and rolled back. Therefore the stats are not being purged.As each day continues the SYSAUX table is continuing to fill up because the job fails each night and cannot purge old stats.To resolve this we have to issue a manual purge to clear down the old statistics. This can be UNDO tablespace extensive so it’s best to keep an eye on the amount of UNDO being generated. I suggest starting with the oldest and working fowards.To manually purge the stats issue the following:SQL> exec dbms_stats.purge_stats(to_date('28-SEP-13','DD-MON-YY'));PL/SQL procedure successfully completed.OR

exec DBMS_STATS.PURGE_STATS(SYSDATE-5);

Purge stats older than 5 days

(best to do this in stages if there is

a lot of data (sysdate-30,sydate-25 etc)

Then I tried rebuilding the stats indexes and tables as they would now be fragmented.

First, the correct place for this is probably in /etc/cron.d not in /etc/crontab. If you really want to keep it where it is now, I'd suggest looking in /var/log/cron and making sure that it is executing at all. I'd look at `aureport -a` and see if anything is logged as an selinux denial around the time you expect this to execute and then use `ausearch -a nnn` where nnn is the number from the far right hand end of the aureport output line(s). Trying it in permissive mode by running `setenforce 0` would be a good test of this.