So the archiver also wants to archive to my $ORACLE_HOME/dbs directory. It cannot do that because there is not enough space on that filesystem, but the real question is: why does it want to write to that destination? When I do a show parameter, I don't get the answer:

Friday, December 12, 2008

I had to create a Data Guard instance on test, so that we can evaluate Transparent Application Failover (TAF) for the app. I wanted to test the same procedure to build Data Guard that I will use for Production, and that meant that I had to do a RMAN backup to ASM. Our Storage team loaned me LUNs of different sizes for my temporary ASM on test.

As you can see, disk 13 only had 70 MB of space available. I removed all the disks of varying sizes and only kept the disks of 69052 MB in size. The total size of the FRA came down to 8493396 MB, but the RMAN backup completed successfully.

Lesson Learned: ASM spreads file extents evenly across all the disks in a disk group. An ORA-17505 error can still be encountered due to imbalanced free space between disks. The reason is that a single disk lacking sufficient free space makes it impossible to do any allocation in the disk group, because every file must be evenly allocated across all disks.
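A quick way to spot this kind of imbalance is to compare free space per disk. The query below is a generic sketch (not from the original troubleshooting session):

```sql
-- Free space per disk: one disk with very little FREE_MB can trigger
-- ORA-17505 even when the disk group as a whole has plenty of space
SELECT g.name AS diskgroup, d.name AS disk, d.total_mb, d.free_mb
FROM   v$asm_disk d, v$asm_diskgroup g
WHERE  d.group_number = g.group_number
ORDER  BY g.name, d.free_mb;
```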

One of the big selling points of ASM is the ability to reconfigure storage online. I had to remove 10 disks from a +ASM test system that had to be redeployed on another server. The steps seemed easy enough until I ran into a problem:

You can happily drop disks in a disk group and ASM will seamlessly migrate the data to the remaining disks in the disk group. The prompt returns immediately, but the job (of migrating the data) is not yet done. To monitor progress, use the following SQL:

SQL> select * from v$asm_operation;

When the job is done, the SQL will return no rows. The status of the disks is also updated.
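For reference, the drop-and-monitor sequence looks something like this (DATA and DISK_05 are placeholder names, not the actual ones used here):

```sql
ALTER DISKGROUP data DROP DISK disk_05;

-- monitor the rebalance; no rows means it has finished
SELECT operation, state, est_minutes FROM v$asm_operation;

-- after the rebalance the dropped disk should show HEADER_STATUS = FORMER
SELECT name, header_status, mode_status, state FROM v$asm_disk;
```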

Is it safe to physically remove these disks? +ASM still knows about these disks and Oracle processes are still attached to them. I logged an SR with Oracle Support and got the following feedback:

Bug 7516653 - DROPPED ASM DISK STILL HELD BY ASM PROCESSES
Closed as duplicate of:
Bug 7225720 - ASM DOES NOT CLOSE OPEN DESCRIPTORS EVEN AFTER APPLYING THE Patch 4693355

Fixed in 11.2

Please perform the following workaround:
1. Create a dummy diskgroup using this disk:
SQL> create diskgroup test external redundancy disk ;
2. Drop this diskgroup:
SQL> drop diskgroup test;
3. Check if the disk is still held by any process. If the disk is still held, the ASM instance will have to be restarted.

Wednesday, December 10, 2008

Using the DBA_HIST_XXX views from the AWR (Automatic Workload Repository), it has become easier for a DBA to track changes in workload metrics over time. Through an SR I logged with Oracle Support, they supplied me with a SQL statement to track such changes for a single SQL statement.

The view DBA_HIST_SQLSTAT displays historical information about SQL statistics. Each statistic is stored in two separate columns:
metric_TOTAL for the total value of the statistic since instance startup.
metric_DELTA for the change in the statistic's value between the BEGIN_INTERVAL_TIME and the END_INTERVAL_TIME stored in the DBA_HIST_SNAPSHOT view.

You can also query DBA_HIST_SQL_PLAN to compare the execution plans if PLAN_HASH_VALUE has changed.
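The SR-supplied statement is not reproduced here, but a query along these lines (with &sql_id as a placeholder) shows how the DELTA columns join up with the snapshot intervals:

```sql
SELECT s.snap_id,
       sn.begin_interval_time,
       s.plan_hash_value,
       s.executions_delta,
       s.elapsed_time_delta,
       s.buffer_gets_delta
FROM   dba_hist_sqlstat  s,
       dba_hist_snapshot sn
WHERE  s.snap_id         = sn.snap_id
AND    s.dbid            = sn.dbid
AND    s.instance_number = sn.instance_number
AND    s.sql_id          = '&sql_id'
ORDER  BY s.snap_id;
```

A changing PLAN_HASH_VALUE between snapshots is usually the first thing to look for.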

# Now run the SQL script to generate a rm script
sqlplus -s /nolog <<EOF
connect / as sysdba
set heading off feedback off timing off
spool /usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh
select 'rm '||NAME from v\$archived_log where REGISTRAR='RFS' and APPLIED='YES' and DELETED='NO' and COMPLETION_TIME < (SYSDATE-1);
spool off
exit
EOF

# Now run the generated rm script
chmod 740 /usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh
/usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh

Notes: columns V$ARCHIVED_LOG.REGISTRAR and APPLIED
If REGISTRAR='RFS' and APPLIED is NO, then the log has arrived at the standby but has not yet been applied.
If REGISTRAR='RFS' and APPLIED is YES, the log has arrived and been applied at the standby database.

Column V$ARCHIVED_LOG.DELETED
Indicates whether an RMAN DELETE command has physically deleted the archived log file from disk (YES) or not (NO).

RMAN EXPIRED
Removes only files whose status in the repository is EXPIRED. RMAN marks backups and copies as expired when you run a CROSSCHECK command and the files are absent or inaccessible. To determine which files are expired, run a LIST EXPIRED command.

RMAN NOPROMPT
Beginning in Oracle9i, RMAN's default behavior is to prompt for confirmation when you run DELETE EXPIRED; NOPROMPT suppresses that prompt. In prior releases, RMAN did not prompt.
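Putting it together, a typical cleanup sequence for expired archived logs, using the standard RMAN commands, would be:

```
RMAN> CROSSCHECK ARCHIVELOG ALL;
RMAN> LIST EXPIRED ARCHIVELOG ALL;
RMAN> DELETE NOPROMPT EXPIRED ARCHIVELOG ALL;
```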

Wednesday, November 19, 2008

We are getting our hands dirty with ASM and will be moving all our databases from RAW devices to ASM in the next couple of months. My worst fear came true when a Unix Sys Admin used a disk already allocated to ASM for a local filesystem (luckily this did not happen on a production server!). When ASM realised this, it immediately issued an ALTER DISKGROUP PCASDG1 DISMOUNT FORCE command. By the time the Unix Sys Admin returned the disk, it was unrecognizable by ASM as the disk headers had been overwritten. The diskgroup was unrecoverable and unusable.

SQL> drop diskgroup PCASDG1 including contents;
drop diskgroup PCASDG1 including contents
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15001: diskgroup "PCASDG1" does not exist or is not mounted

SQL> alter diskgroup PCASDG1 check;
alter diskgroup PCASDG1 check
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "PCASDG1" does not exist or is not mounted

I had to recreate the whole diskgroup from scratch. I followed Metalink Note 387103.1 to recreate the diskgroup. Basically you clean up the ASM disk header of all the disks in the diskgroup. This also removes the whole diskgroup from ASM.
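Cleaning the header essentially amounts to zeroing the start of each disk with dd. The command below is a generic sketch, not copied from the note; the device path and block count are placeholders, and the command is destructive, so triple-check the device before running anything like it:

```
# DESTRUCTIVE: wipes the ASM disk header so the disk can be reused
dd if=/dev/zero of=/dev/raw/raw1 bs=1024k count=100
```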

Monday, November 10, 2008

I had the privilege to present at the SAOUG (South African Oracle User Group) in Sun City last month. The conference was well organized, the location was great and we had close to 1000 attendees. I presented on Oracle Data Guard.

Thanks to Karen Morton (one of my favorite bloggers) for making me aware of SlideShare. To access my presentation, just click on the slideshare image on the right side and it will take you to my presentation on slideshare. Happy viewing!

Thursday, October 30, 2008

We have Call Centres all over the country. The Call Centre's IVR (interactive voice response, a phone technology that allows a computer to detect voice and touch tones during a normal phone call) is used for routing calls to the correct call centre. The subscriber data is stored in one location and I was asked to replicate that data to two databases in other parts of the country. The main reasons are to have the data closer to the IVR system and also to relieve the main production system of all these queries.

Decisions: Use one-directional replication. Capture the changes at the source. Do the instantiation (first copy of data) with DataPump.

Set specific initialization parameters on the databases participating in the Streams setup:

ALTER SYSTEM SET STREAMS_POOL_SIZE=200M SCOPE=BOTH;
ALTER SYSTEM SET AQ_TM_PROCESSES=1 SCOPE=BOTH;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=10 SCOPE=BOTH;
ALTER SYSTEM SET "_job_queue_interval"=1 SCOPE=SPFILE;
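For context, the capture side of a one-directional setup on a single table can be sketched like this; strmadmin, streams_queue and scott.emp are placeholder names, and the propagation and apply sides at the destination are configured with the same DBMS_STREAMS_ADM package:

```sql
BEGIN
  -- queue that the capture process will stage changes into
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'strmadmin.streams_queue_table',
    queue_name  => 'strmadmin.streams_queue');

  -- capture DML changes for one table at the source
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name   => 'scott.emp',
    streams_type => 'capture',
    streams_name => 'capture_stream',
    queue_name   => 'strmadmin.streams_queue',
    include_dml  => TRUE,
    include_ddl  => FALSE);
END;
/
```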

Monday, September 15, 2008

We had to set up SQL Server replication to an Oracle database. The 2 tables in question are huge. The small table has 63 million rows and the big one has 1.5 billion rows. The big table has 50 date-ranged partitions.

I used SQL Loader direct path without any indexes in place from csv files to do the initial load.

The first thing to do was to prepare the database for Direct Loading by executing the script $ORACLE_HOME/rdbms/admin/catldr.sql.

I also had to set my session’s DATE format to be the same as the format in the CSV file.

$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'

To get the fastest possible load performance on the 50 partitions I followed these rules:
There were no indexes on the table.
The table was set to NOLOGGING.
The database was put in NOARCHIVELOG mode.

And the following SQL*Loader options were used:
DIRECT=TRUE to invoke Direct Loading. This effectively bypasses most of the RDBMS processing.
UNRECOVERABLE to turn off database logging.
PARALLEL=TRUE and APPEND in order to run multiple load jobs concurrently for the different partitions.
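Putting those options together, a load job for one partition might look like this sketch; the file names, credentials and columns are made up for illustration:

```
-- control file for one partition; UNRECOVERABLE turns off redo for the load
UNRECOVERABLE
LOAD DATA
INFILE 'big_table_p01.csv'
APPEND
INTO TABLE big_table PARTITION (p01)
FIELDS TERMINATED BY ','
(id, created_date DATE 'YYYY-MM-DD HH24:MI:SS', payload)
```

and the invocation, one such job per partition running concurrently:

```
sqlldr userid=scott/tiger control=big_table_p01.ctl direct=true parallel=true
```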

Wednesday, September 3, 2008

We found a lot of sessions waiting for Log File Sync and in turn found that LGWR was waiting for Log File Parallel Write.

We have 2 dedicated disks for online redo logs. They are multiplexed, with members of the same group on different disks. When a log switch occurs, one member of the redo log group being archived gets READ requests from the ARCH process while both members of the CURRENT group are receiving WRITE requests from LGWR. We decided to add 2 more disks and split the even-numbered groups from the odd-numbered groups, so that we can separate the READS and WRITES.

We use RAW devices, so after the Unix Admins created the new RAW volumes on the new disks I simply added the members:

Creating Redo Log Members for the even groups on the 2 new disks:

ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2c' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2d' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO4c' TO GROUP 4;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO4d' TO GROUP 4;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO6c' TO GROUP 6;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO6d' TO GROUP 6;

The status of the new log member is shown as INVALID. This is normal and it will change to active (blank) when it is first used.

SQL> select GROUP#,STATUS,TYPE,MEMBER from v$logfile order by group#, member;

Before dropping redo log members you should be aware of the following rules:
You can drop a redo log member only if it is not part of an active or current group.
Make sure the group to which a redo log member belongs is archived before dropping the member.
If the member you want to drop is the last valid member of the group, you cannot drop the member until the other members become valid.

So after a few log switches (ALTER SYSTEM SWITCH LOGFILE) I was ready to drop the old members. Note that V$LOG.STATUS showed ACTIVE for groups 1 and 2. This means the log is active but is not the current log: it is needed for crash recovery, may be in use for block recovery, and may or may not be archived. This was the result of all the log switches I did. Once their STATUS changed to INACTIVE I continued dropping the old members.
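Dropping an old member is then a single statement per member; the member path below is hypothetical, just following the naming convention used above:

```sql
-- force a switch and confirm the group is INACTIVE before dropping
ALTER SYSTEM SWITCH LOGFILE;
SELECT group#, status FROM v$log;

ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2a';
```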

Tuesday, July 8, 2008

I was recently called in to take over a performance problem. I quickly determined that only a portion of the database was affected, but it was the most visible part, affecting the business's ability to offer a certain feature to millions of our customers. Some queries that should take less than 5 seconds suddenly took minutes to complete.

Of course the App Vendor claimed that nothing changed around that particular query in years, so it had to be a database problem.

Looking at OEM's Average Active Sessions graph on the Performance tab, I could clearly see a huge red stripe (red means the wait is Application related). Drilling down I found waits for an event called "enq: UL - contention" to be the problem. A Metalink search returned no hits and a Google search returned hundreds of forum websites with DBAs asking "What is enq: UL - contention?"; most of them did not have any replies. The only reference to this event in the manual was found in Oracle® Database Reference 10g Release 2 (10.2) (B14237-01) Appendix E: Oracle Enqueue Names. It simply stated "UL, User-defined Locks". To me that meant that the app could be using DBMS_LOCK, but the Vendor said they don't and app support told me that these queries do not write to the database.

I queried V$SESSION for sessions waiting for EVENT=”enq: UL – contention” to find the SID of the BLOCKING_SESSION, but the BLOCKING_SESSION kept on changing. One blocking session would have confirmed an App issue and the short term solution would have been to kill that process.
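The query I was running was along these lines (a sketch, not the exact statement):

```sql
SELECT sid, serial#, blocking_session, sql_id, seconds_in_wait
FROM   v$session
WHERE  event = 'enq: UL - contention';
```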

Then I logged a priority 1 SR and was asked to provide system state dumps. Eventually BUG 4367986 was identified as a possibility. The patch was applied, but it made no difference!

OK, I clearly had to try something else, so I decided to trace one session and go through the TKPROF output. I found that the app-related SQL statements all executed in less than 5 seconds, but one statement stood out at 60 seconds elapsed time. That SQL statement was:

Schema ctxsys means Oracle Text is in use, and the Vendor then confirmed that Oracle Text indexes were built on an app-related audit table almost 2 months ago. The TKPROF output also showed INSERTs into the app audit table, so we were finally on the right track.

This information was uploaded to the SR and I had to check the SYNC intervals via a select from dr$index, which confirmed that the 2 audit indexes had SYNC ON COMMIT. We changed the sync interval to MANUAL, just to prove that the problem was related to SYNC ON COMMIT. Once we changed it to MANUAL, the wait event disappeared.
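For anyone hitting the same thing: as I understand it, switching an Oracle Text index from SYNC (ON COMMIT) to manual syncing can be done with a metadata replace; idx_app_audit is a placeholder index name:

```sql
ALTER INDEX idx_app_audit REBUILD PARAMETERS ('replace metadata sync (manual)');

-- with manual sync, refresh the index on your own schedule
EXEC CTX_DDL.SYNC_INDEX('idx_app_audit');
```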