Thursday, 30 October 2014

ORA-01552: cannot use system rollback segment for non-system tablespace 'USERS'

1) Find the status of your rollback segments with the following query. You will find that your undo tablespace is offline. Alter the tablespace online.
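A minimal sketch of the check and the fix (the tablespace name UNDOTBS1 is an assumption; substitute your own undo tablespace name):

SQL> select segment_name, tablespace_name, status from dba_rollback_segs;
SQL> select tablespace_name, status from dba_tablespaces;
SQL> alter tablespace UNDOTBS1 online;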

Wednesday, 30 July 2014

ORA-00445: background process "m001" did not start after 120 seconds
Incident details in: /app/u01/app/oracle/diag/rdbms/incident/incdir_3721/db1_mmon_7417_i3721.trc
ERROR: Unable to normalize symbol name for the following short stack (at offset 2):
Tue Jun 21 03:03:06 2011
ORA-00445: background process "J003" did not start after 120 seconds

or

Waited for process W002 to initialize for 60 seconds

The system appears to be running very slowly and defunct processes can appear.

Changes: Red Hat 5, kernel 2.6.18-194.el5 #1 SMP Tue Mar 16, Oracle 11.2.0.2 Single Instance, IBM: Linux on System z.

Cause: Recent Linux kernels have a feature called Address Space Layout Randomization (ASLR). ASLR is activated by default on some of the newer Linux distributions; it is designed to load shared memory objects at random addresses. In Oracle, multiple processes map a shared memory object at the same address across the processes.

With ASLR turned on Oracle cannot guarantee the availability of this shared memory address. This conflict in the address space means that a process trying to attach a shared memory object to a specific address may not be able to do so, resulting in a failure in shmat subroutine.

However, on a subsequent retry (using a new process) the shared memory attachment may work, so the result is a random, intermittent set of failures.

Solution

It should be noted that this problem has only been positively diagnosed on Red Hat 5 with Oracle 11.2.0.2. As per unpublished BUG:8527473, it is also likely that this issue will reproduce on generic Linux platforms running any Oracle 11.2.0.x on Red Hat/OEL kernels which have ASLR.

ASLR also exists in the SLES 10 and SLES 11 kernels, and by default ASLR is turned on; however, to date no problem has been seen on SuSE.

You can verify whether ASLR is being used as follows:

# /sbin/sysctl -a | grep randomize
kernel.randomize_va_space = 1

If the parameter is set to any value other than 0, ASLR is in use. To permanently disable ASLR on Red Hat 5, add or modify these parameters in /etc/sysctl.conf:

kernel.randomize_va_space=0
kernel.exec-shield=0

You need to reboot for kernel.exec-shield parameter to take effect. Note that both kernel parameters are required for ASLR to be switched off.
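A sketch of the change, run as root and assuming neither parameter is already present in /etc/sysctl.conf (if they are, edit them in place instead of appending):

# echo "kernel.randomize_va_space=0" >> /etc/sysctl.conf
# echo "kernel.exec-shield=0" >> /etc/sysctl.conf
# /sbin/sysctl -p
# /sbin/sysctl -a | grep randomize

Then reboot so kernel.exec-shield takes effect.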

Monday, 7 April 2014

Bulk Collect

Executing SQL statements in PL/SQL programs causes a context switch between the PL/SQL engine and the SQL engine. Too many context switches may degrade performance dramatically. To reduce the number of these context switches we can use a feature named bulk binding. Bulk binding lets us transfer rows between the SQL engine and the PL/SQL engine as collections. Bulk binding is available for SELECT, INSERT, DELETE and UPDATE statements.

Bulk collect is the bulk binding syntax for SELECT statements. All examples below are simplified versions of live cases I have seen.

One of the things I usually come across is that developers tend to use cursor FOR loops to process data. They declare a cursor, open it, fetch from it row by row in a loop, and process each row as it is fetched.
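A minimal sketch of that row-by-row pattern (the table big_table and its column id are hypothetical stand-ins):

declare
  cursor c is select id from big_table;
begin
  for r in c loop
    null; -- process each row here; one fetch (and context switch) per row
  end loop;
end;
/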

As can be clearly seen, bulk collecting the rows shows a huge performance improvement over fetching row by row.
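A sketch of the same work done with BULK COLLECT (using the same hypothetical big_table), where all rows arrive in a single fetch:

declare
  type t_ids is table of big_table.id%type;
  l_ids t_ids;
begin
  select id bulk collect into l_ids from big_table;
  for i in 1 .. l_ids.count loop
    null; -- process l_ids(i); the fetch itself cost one context switch
  end loop;
end;
/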

The above method (which fetched all the rows) may not be applicable to all cases. When there are many rows to process, we can limit the number of rows to bulk collect, process those rows and fetch again. Otherwise process memory gets bigger and bigger as you fetch the rows.
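A sketch of the LIMIT variant, which caps memory by fetching in batches (the batch size of 1000 is an arbitrary assumption; tune it for your case):

declare
  cursor c is select id from big_table;
  type t_ids is table of big_table.id%type;
  l_ids t_ids;
begin
  open c;
  loop
    fetch c bulk collect into l_ids limit 1000;
    for i in 1 .. l_ids.count loop
      null; -- process l_ids(i)
    end loop;
    exit when c%notfound; -- test after processing, or the last partial batch is lost
  end loop;
  close c;
end;
/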

-- NOT IN doesn't get correct results.
-- That's because of how IN treats NULLs and three-valued logic.
-- NULL is treated as an unknown, so if there's a NULL in the t2.t1id column,
-- NOT IN will return either NOT TRUE or NOT UNKNOWN, and neither can be TRUE.
-- When there's a NULL in the t1id column of the t2 table, the NOT IN query
-- will always return an empty set.
SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2);

no rows selected

-- NOT EXISTS gets correct results

SELECT t1.* FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id);

title 6         5
title 5         5

We can see that it is best to use NOT EXISTS here, because it always behaves as the user would expect.
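If NOT IN is still preferred, a common workaround is to exclude the NULLs explicitly in the subquery; this sketch assumes the same t1/t2 tables as above:

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2 WHERE t1id IS NOT NULL);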

Tuesday, 21 January 2014

Purging trace and dump files with 11g ADRCI

In versions of Oracle prior to 11g, we had to use our own housekeeping scripts to purge the udump, cdump and bdump directories.

In Oracle 11g, we now have the ADR (Automatic Diagnostic Repository) which is defined by the diagnostic_dest parameter.

So how are unwanted trace and core dump files cleaned out in 11g automatically?

This is done by the MMON background process.

There are two time attributes which are used to manage the retention of information in ADR. Both attributes correspond to a number of hours after which the MMON background process purges the expired ADR data.

LONGP_POLICY (long term) defaults to 365 days and relates to things like Incidents and Health Monitor warnings.

SHORTP_POLICY (short term) defaults to 30 days and relates to things like trace and core dump files.
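The policies can be changed with the set control command; a sketch that shortens the short-term retention to 15 days (note the value is given in hours, so 15 days = 360 hours):

adrci> set control (SHORTP_POLICY = 360)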

The ADRCI command show control will show us what the current purge settings are as shown below.
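A sketch of what show control returns at the defaults (values are in hours: 720 hours = 30 days, 8760 hours = 365 days; the ADR home path and ADRID shown are hypothetical, and trailing columns are omitted):

adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         ...
-------------------- -------------------- -------------------- ---
1335663986           720                  8760                 ...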

Oracle's Automatic Diagnostic Repository (ADR) is a location on the file system used to collect various diagnostic files, such as the database alert log, trace files, and the new 11g Health Monitor report files. The ADR location is not limited to the Oracle Database; it can be used as the diagnostic repository for other Oracle products. The command line tool used to manage the diagnostic data is ADRCI.

Two terms that help 'understand' the adrci tool are Problem and Incident.

Problem – The critical error.

Incident – A single occurrence of the error. Diagnostic data (eg trace files etc) are collected and related to an incident.

To Start the ADRCI command line tool:
adrci

To end an adrci session use either exit or quit:
adrci>exit
adrci>quit

To see the list of available commands:
adrci>help
adrci>help <command name>
eg
adrci>help set home

Setting the Home
The structure of the ADR allows it to be the repository for many database instances, each instance having its own sub-directories within the ADR base directory. The default behaviour of the adrci tool is to act on all the homes within the ADR. You can also set the home to limit the source of diagnostic data to manage.
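A sketch of listing the homes and then narrowing to one of them (the home path diag/rdbms/orcl/orcl is hypothetical; use one printed by show homes):

adrci> show homes
adrci> set home diag/rdbms/orcl/orcl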

Viewing Incidents.
To view the incidents of the current ADR home, use the adrci> show incident command. Two arguments that control the output are the -p "predicate string" and the -mode {BASIC | BRIEF | DETAIL} indicator. For example:

adrci> show incident -p "incident_id=7489" -mode detail

will include in the output the location of the related trace files for the selected incident.

Viewing Trace Files
To display a list of trace files within the currently set Home use adrci> show tracefile. Two useful methods to control the tracefile output are -i <incident_id> and the use of % wildcard. Use the -i <incident_id> to list tracefiles related to the incident. Use the %wildcard to limit the tracefiles to those that match a particular filename. For example:

adrci> show tracefile
or
adrci> show tracefile %dw00%
or
adrci> show tracefile -i 7489

Packing it (For Oracle Support or for yourself)
There are multiple ways to achieve this, here I am creating an empty 'package' and adding incident data to it. The steps are
1. Create a package (A logical container for the diagnostic data)
2. Add diagnostic data into the package (from an incident or manually by adding trace files)
3. Generate a 'physical' package (file on the filesystem)
4. Finalize the package
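The four steps above, sketched as an adrci session (the incident id 7489 and the target directory /tmp are assumptions; the package number is echoed back by ips create package):

adrci> ips create package
adrci> ips add incident 7489 package 1
adrci> ips generate package 1 in /tmp
adrci> ips finalize package 1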

Friday, 10 January 2014

object lock of WRI$_OPTSTAT_HISTHEAD_HISTORY

The old statistics are purged automatically at regular intervals based on the statistics history retention setting and the time of recent statistics gathering performed in the system. Retention is configurable using the ALTER_STATS_HISTORY_RETENTION procedure. The default value is 31 days.

The SYSAUX and SYSTEM tablespaces have been continually growing on databases where a high number of imports/exports and RMAN operations take place.

SELECT occupant_name "Item",
       space_usage_kbytes/1048576 "Space Used (GB)",
       schema_name "Schema",
       move_procedure "Move Procedure"
FROM v$sysaux_occupants
ORDER BY 1
/

Item        Space Used (GB)   Schema   Move Procedure
SM/OPTSTAT  12.45             SYS

To resolve this I set the stats retention period to 5 days:

SQL> exec dbms_stats.alter_stats_history_retention(5);

To find out the oldest available stats you can issue the following:

SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
28-SEP-13 00.00.00.000000000 +04:00

To list how many stats are stored for each day between the oldest stats history and the current date, issue the following:

SQL> select trunc(SAVTIME), count(1) from WRI$_OPTSTAT_HISTHEAD_HISTORY group by trunc(SAVTIME) order by 1;

TRUNC(SAV   COUNT(1)
--------- ----------
28-SEP-13    2920140
29-SEP-13     843683
30-SEP-13     519834
01-OCT-13     958836
02-OCT-13    3158052
03-OCT-13        287
04-OCT-13    1253952
05-OCT-13     732361
06-OCT-13     507186
07-OCT-13     189416
08-OCT-13       2619
09-OCT-13       1491
10-OCT-13        287
11-OCT-13     126324
12-OCT-13     139556
13-OCT-13     181068
14-OCT-13       4832
15-OCT-13     258027
16-OCT-13       1152
17-OCT-13        287
18-OCT-13      27839

21 rows selected.

What has happened here is that the purge job run by MMON every 24 hours has checked the retention period and tried to purge all stats older than it. Because of the high number of stats collected each day, the job has not completed within 5 minutes, so it has given up and rolled back; the stats are therefore not being purged. As each day passes, the SYSAUX tablespace continues to fill up because the job fails each night and cannot purge the old stats.

To resolve this we have to issue a manual purge to clear down the old statistics. This can be UNDO-intensive, so it is best to keep an eye on the amount of UNDO being generated. I suggest starting with the oldest day and working forwards. To manually purge the stats, issue the following:

SQL> exec dbms_stats.purge_stats(to_date('28-SEP-13','DD-MON-YY'));

PL/SQL procedure successfully completed.

OR

exec DBMS_STATS.PURGE_STATS(SYSDATE-5);

This purges stats older than 5 days (best done in stages if there is a lot of data: SYSDATE-30, SYSDATE-25, etc.).

Then I tried rebuilding the stats indexes and tables as they would now be fragmented.
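A sketch of the rebuild for the table covered in this post (index names vary by version, so list them from dba_indexes first rather than trusting the two shown here, which are typical 11g names):

SQL> alter table sys.WRI$_OPTSTAT_HISTHEAD_HISTORY move tablespace sysaux;
SQL> select index_name from dba_indexes where table_name = 'WRI$_OPTSTAT_HISTHEAD_HISTORY';
SQL> alter index sys.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST rebuild;
SQL> alter index sys.I_WRI$_OPTSTAT_HH_ST rebuild;

Moving the table marks its indexes UNUSABLE, which is why the rebuilds must follow the move.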
