Questions and Answers

Redo Generation

Log buffer size

I am wondering if I have set my log buffer too high (it is 10M).
We have a lot of batch transactions that do 50M to 100M per transaction, and when those happen, it's not too bad.
But most often I see the LGWR grabbing 75%-100% of the CPU and doing I/O in intermittent chunks.
It will churn out 1M, then wait for 2 or 3 secs, and spit out another 1M for example.
The DBWR does much better it seems.
Bandwidth is smaller but I see a steady stream of I/O.
Is this due to my log buffer being set too high, or do I have another problem?

A large log buffer is bad for performance not only because of the inconsistent CPU usage,
but also because of the increased log file sync time.
A large log buffer allows LGWR to be lazy,
and then when a user commits it has to flush the entire log buffer up to and including the commit marker.
In the case of a 10M log buffer, that could be as much as 3M of I/O to perform before the user gets a response back.
I've seen instances where this has added seconds to the commit response time.
I've also seen it exacerbating free buffer waits
because DBWR has to sync the log buffer before it can write blocks for which the redo has not yet been flushed.
Log buffer sizing requires balance.
Neither extreme is good. I have found values between 120K and 160K to work best.
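The effect on commit latency can be sketched numerically. This is a rough model, not Oracle code: it assumes LGWR writes in the background only when the buffer is about one-third full (the default _log_io_size behaviour discussed later in this chapter), so up to roughly a third of the buffer can be unflushed when a commit arrives.

```python
# Sketch (not Oracle code): worst-case redo that a commit may have to wait
# for, assuming LGWR writes on its own only when the buffer is ~1/3 full.

def worst_case_sync_bytes(log_buffer_bytes, threshold_fraction=1.0 / 3.0):
    """Upper bound on unflushed redo at commit time: just under the
    background-write threshold can be sitting in the buffer."""
    return int(log_buffer_bytes * threshold_fraction)

def sync_time_seconds(bytes_to_flush, device_mb_per_sec):
    """Rough I/O time to flush that much redo, ignoring latency."""
    return bytes_to_flush / (device_mb_per_sec * 1024 * 1024)

# A 10M log buffer can leave ~3.3M to flush before the commit returns.
pending = worst_case_sync_bytes(10 * 1024 * 1024)
```

On slow devices, flushing a few megabytes per commit is where the "seconds added to commit response time" comes from.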

Enlarging the redo logs

If I want to enlarge the redo logs, do I have to create new larger redo logs and then drop the older smaller ones?
Is there some way of making the redo logs bigger?

In general, yes: you add new, larger log file groups and then drop the smaller groups once they are inactive. However, if you need to reuse the same raw partitions for your log files, you can use ALTER DATABASE BACKUP CONTROLFILE TO TRACE
and then edit the CREATE CONTROLFILE statement to adjust the size of the redo logs and specify a resetlogs open of the database.
You then shutdown cleanly, run the edited CREATE CONTROLFILE script and take a new baseline backup.
Not for the faint-hearted, but perfectly safe really, if you know what you're doing.

Redo buffers in V$SGA

When I query V$SGA in one of our 8.0.5 databases, I notice that the value for "Redo Buffers"
does not agree with log_buffer as set in the init.ora (and reflected in V$PARAMETER).
V$SGA shows 81920; V$PARAMETER shows 8192, and V$SGASTAT shows 65536.
The values yielded by V$SGA and V$SGASTAT don't even agree with each other.
What's going on here?

There is a minimum size for the log buffer in Oracle8, and if the parameter is set to something less, then it is silently enlarged.
The minimum is 4 times the largest possible db_block_size on the platform - in your case 4 * 16K.

On platforms that support memory protection there are also guard pages on each side of the log buffer.
These are one memory protection unit in size - 8K each in your case - which accounts for the V$SGA figure: 65536 + 2 * 8192 = 81920 bytes.
Oracle turns off permission on the guard pages with an mprotect call to set the permission to PROT_NONE,
so that Oracle software bugs or hacker tricks like corrupting the stack will not be able to corrupt the log buffer.

Split blocks during hot backups

Can you explain about split blocks and the impact of _log_blocks_during_backup = FALSE in online backups?

A "split block" is one that was being written by Oracle at the same time that it was being read for the backup.
Say the first SCN on the block was 100, and the write was for SCN 200.
Then the backup may contain the first half of the block as at SCN 100, but the second half as at SCN 200.

By default, whole blocks are logged to the redo stream before the first change to any block during backups to allow for the recovery of split blocks.
For example, in this case the change would log the entire block as at SCN 100 into the redo stream, followed by the change vectors for the change.
Then during recovery roll forward, the whole block can be over-written with the correct image before applying the change vectors.

The flags field in the buffer header for each buffer contains an indication of whether the block has been logged.
If the flag is set, the block is not logged for subsequent changes.
However, if a buffer is reused, and the block is then read back into the cache for further modification,
the whole block will be logged once again prior to such modification.
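The flag logic described above can be modelled in a few lines. This is a hedged sketch, not Oracle internals: the class and function names are hypothetical stand-ins for the buffer header flag and the redo generation decision.

```python
# Sketch (not Oracle internals): whole-block logging during hot backup.
# The first change to a cached block logs the entire block image; a flag
# in the buffer header then suppresses further whole-block logging until
# the buffer is reused and the block is read back in.

class BufferHeader:
    def __init__(self):
        self.block_logged = False   # hypothetical stand-in for the real flag

def redo_for_change(buf, in_backup, block_size, change_vector_size):
    """Return bytes of redo generated for one change to this buffer."""
    redo = change_vector_size
    if in_backup and not buf.block_logged:
        redo += block_size          # log the whole block image first
        buf.block_logged = True
    return redo

def reuse_and_reread(buf):
    """Buffer reused for another block, then our block read back in:
    the flag is lost, so the next change logs the whole block again."""
    buf.block_logged = False
```

This also shows why datafile redo volume rises sharply during hot backups of actively modified tables.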

Redo wastage

Can you explain the redo wastage statistic in V$SYSSTAT and how to tune it?

When LGWR writes, it writes all the available redo blocks including the current block, which is normally only partially full.
Therefore, before LGWR releases the redo allocation latch prior to writing,
it must advance the SGA variable containing the index into the log buffer for redo generation to the beginning of the next log buffer block.
The number of bytes skipped in the last log block of each redo write is accumulated in the redo wastage statistic.
Redo wastage is not a problem.
However, high redo wastage is a symptom of a very active LGWR, which might be a problem.
LGWR is expected to be highly active in many instances,
but it can be overactive if the log buffer (more correctly, the _log_io_size) is too small, or if the commit rate is too high.
If LGWR is overactive, it will place unnecessary load on the redo latches, and also on the I/O subsystem.
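The accounting described above can be sketched as follows, assuming the standard 512-byte redo log block; the function names are illustrative only.

```python
# Sketch (not Oracle code) of how redo wastage accrues. Each LGWR write
# includes the partly filled current log block, and redo generation then
# resumes at the next block boundary; the skipped bytes in that last block
# accumulate as the redo wastage statistic. 512-byte log blocks assumed.

LOG_BLOCK_SIZE = 512

def wastage_for_write(bytes_in_last_block):
    """Bytes skipped at the tail of the last block of one redo write."""
    if bytes_in_last_block % LOG_BLOCK_SIZE == 0:
        return 0
    return LOG_BLOCK_SIZE - (bytes_in_last_block % LOG_BLOCK_SIZE)

def total_wastage(redo_bytes_per_write):
    """Accumulate wastage over a series of LGWR writes, as V$SYSSTAT does."""
    return sum(wastage_for_write(n) for n in redo_bytes_per_write)
```

Note that more frequent, smaller writes mean more partially filled last blocks, which is why high wastage is a symptom of an overactive LGWR rather than a problem in itself.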

The following diagram illustrates redo wastage within the first few blocks of a sample redo log file.

redo log space requests statistic

Every day our production database has more than 10 redo log space requests.
Oracle suggested that this number should be near 0.
I resized the log files, and increased the log buffer to 1M.
What else can I do?

This idea of increasing the log_buffer to get space requests down to 0 is both ineffective and bad for performance.
There is always a risk of space requests at log switches and a few other points when redo generation is disabled entirely.
It does not matter how big your log buffer is then.
You will get space request waits if you try to generate redo at that time.
Also, a large log buffer is actually much worse for performance than the occasional space request,
even if that is a genuine log buffer space wait.

Raw log files and redo latching

We are considering moving the redo logs to raw disks but I am not sure if this will reduce the number of redo copy latch sleeps?
Will the reduced write time mean the latch is freed up sooner and the number of sleeps will reduce accordingly?

Moving the logs to raw is a good thing to do, but not for the reason of affecting latching. It will not.

Commit frequency

When you work on a large block of records, if you commit once for every record,
then you get a lot of overhead and worse performance than if you commit say every 50 records.
What exactly is the optimal number of records to process between commits?

Commits place load on LGWR and the redo allocation latch.
The higher your commit frequency, the harder these are to tune.
Therefore your commit frequency should be LOW.

Why not use DBMS_UTILITY.GET_TIME to commit every 5 minutes (30000 centiseconds)?
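In PL/SQL this is done by comparing DBMS_UTILITY.GET_TIME readings; the same pattern is sketched below in Python for clarity. The commit and process_record callables are hypothetical stand-ins for real work.

```python
# Time-based commit batching, as suggested above. In PL/SQL you would
# compare DBMS_UTILITY.GET_TIME readings (centiseconds) instead of a clock.
# commit and process_record are hypothetical stand-ins for real work.

def batch_process(records, commit, process_record, clock,
                  commit_interval_secs=300):
    """Commit on elapsed time rather than on a fixed record count."""
    last_commit = clock()
    for rec in records:
        process_record(rec)
        if clock() - last_commit >= commit_interval_secs:
            commit()
            last_commit = clock()
    commit()   # final commit for the tail of the batch
```

Committing on elapsed time keeps the commit rate, and hence the load on LGWR and the redo allocation latch, constant regardless of row size or processing speed.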

You say that a 2M log buffer is too big, but what about these log buffer space waits?

These log buffer space waits are secondary, not primary, waits.
The primary waits in this case were the log file switch completion waits.
Redo generation is disabled during log switches, and latent demand for log buffer space builds up.
To tune this, you should be working to reduce and speed up log switches, not increasing the log buffer.

Note also that you have spent a lot more time waiting for log file sync waits than for log buffer space,
and the reason your syncs are so slow is that you have too large a log buffer.
If you are not going to set _log_io_size to get the best of both worlds,
then you should size log buffer to minimize the total time for log buffer space and log file sync waits.
You are well above that balance point with a 2M log buffer.

Large log buffer

What is the problem with a large log buffer if you have sufficient memory to support it?
The majority of the tuning manuals suggest increasing log_buffer
until redo log space requests in V$SYSSTAT are (close to) zero.

The problem is that it increases the average log file sync wait time.
That is the wait that users get when they press COMMIT.
DBWn also waits on log file sync when checkpointing, and possibly at other times.

redo log space requests is the wrong metric on which to base a decision to increase the log buffer.
There are lots of reasons for space requests, only one of which is that the log buffer was full.
Unless you are getting log buffer space waits in V$SYSTEM_EVENT, then your log buffer is not too small.
Even then it may not be, because space waits can be a side effect of slow log switches.

log_buffer in V$SGASTAT

There are two entries for log_buffer in V$SGASTAT.
One is the same as the parameter setting.
Any idea where the other one comes from?

The log buffer consists of blocks and for each block a header is needed in the variable area of the shared pool.
The second entry represents those headers.
However, this will not be seen on all platforms, because
what is actually shown in V$SGASTAT, and what just gets bundled into miscellaneous is quite variable.
I don't understand it myself (yet).

Redo generation for locks

We have a program that generates about 300M of redo, which seems excessive.
It applies an exclusive table lock repeatedly inside a loop.
The logic is

lock table in exclusive mode
do some updates and inserts
for certain types of errors
rollback
else
commit

When I try to duplicate this on our development system, I find that I only get 2 or 3M of redo.
Is there an initialization parameter that affects how this type of transaction impacts redo logs?

There is no relationship between the table lock and the redo generation.
However, there is a very close relationship between row-level locks and redo generation.
The state of the freelist(s) on the table and the location of the updates
could account for large amounts of redo if each change is hitting a different block, because of the block level overhead for row level locking.
If it is possible to rebuild the table, possibly in a sensibly sorted order,
then you should see a significant reduction in the amount of redo generated.

_log_io_size

You've been writing quite a lot about _log_io_size.
According to Gurry and Corrigan, the default changed from 1/3
to 2/3 of the log buffer at 7.3.
However, it seems to be 0 in X$KSPPSV.
Where can we see what it really is -- maybe in another X$ table or with a dump?
Could it be operating system dependent?

I believe that Mark got that bit of information from an internal Oracle person,
but so far as I have been able to determine, it is not correct.
I checked at the time against 7.3.2.3 on both HP-UX and Solaris
and found that it was still 1/3, at least on those platforms.
The test consists of switching into a new logfile, then doing a large delete while using a large cache, and taking
the ratio of (redo size + redo wastage + 16 * redo blocks written) / redo writes.
If there are any log file sync waits, the test is invalid and should be repeated with a larger cache.
I have not done this test for a while, but I see a lot of statistics, and I've not seen anything to suggest
that any value other than the good old 1/3 of log_buffer
is being used for _log_io_size anywhere unless explicitly set.
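The arithmetic of that test can be written out directly. The inputs are the V$SYSSTAT statistics named in the text; the 16 bytes per redo block is the per-block header overhead assumed by the formula above.

```python
# The arithmetic of the _log_io_size test described above. Inputs are the
# V$SYSSTAT statistics; 16 bytes per redo block is the per-block header
# overhead assumed by the formula in the text.

def avg_redo_write_bytes(redo_size, redo_wastage,
                         redo_blocks_written, redo_writes):
    return (redo_size + redo_wastage + 16 * redo_blocks_written) / redo_writes

def apparent_threshold_fraction(avg_write_bytes, log_buffer_bytes):
    """With no commit-driven writes, the average write size approximates
    the background-write threshold (_log_io_size)."""
    return avg_write_bytes / log_buffer_bytes
```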

Also, I've just checked the 8.1.5 documentation,
and in chapter 20 of the Tuning Guide under the heading "Tuning LGWR I/O" it says,
"LGWR starts writing if the buffer is one third full, or when it is posted by a foreground process such as a COMMIT".

log file switch completion

The relatively low value for the log buffer space event combined with a high log file sync value
leads me to believe that the log_buffer parameter (currently 512K) is too large.
I'm thinking perhaps 256K (or 128K) may be more appropriate.

What is the meaning of log file parallel write?
Also, what can be done about log file switch completion?

The log file switch completion happens because redo generation is disabled during a log switch,
and so processes that want to generate redo have to wait, even though the log buffer is entirely empty during the switch.
If you have too many controlfiles, or if log switches are slow for some other reason,
then there can be enough build-up of demand for log buffer space
that you get some log buffer space waits shortly after redo generation has been enabled again.
This is probably what you are seeing here.

Your guess that the large log buffer is probably contributing to your log file sync time may not be right,
based on the fact that the average log file sync time
is of the same order of magnitude as the log file parallel write time (this is just LGWR doing the I/O).
That means that LGWR does not normally do multiple I/O operations for each sync.
Given that you have a large log buffer, this ratio problem means that you have a high commit rate.
If so, there is nothing you can do about the log file sync waits
without changing the application to commit less often.
Reducing the log buffer will probably only help slightly.
However, I would probably do it anyway, if I were tuning this system.
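The diagnostic used in this answer can be sketched as a simple comparison of average wait times; the factor of 3 below is an illustrative cut-off, not an Oracle-defined threshold.

```python
# Sketch of the diagnostic above: compare the average log file sync wait
# with the average log file parallel write time. If they are of the same
# order of magnitude, LGWR is not doing multiple writes per sync, and a
# high sync load points at the commit rate rather than the log buffer.

def avg_wait_ms(total_wait_ms, waits):
    return total_wait_ms / waits if waits else 0.0

def multiple_writes_per_sync(sync_ms, write_ms, factor=3.0):
    """Illustrative cut-off: syncs much slower than a single write imply
    that each commit is waiting for several LGWR writes."""
    return sync_ms > factor * write_ms
```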

log file sync waits

We are benchmarking an insert-intensive application, using Oracle 8.0.5 on HP-UX.
We would like to get every ounce of performance possible.
Presently our big issue is log file sync waits.
What can be done about this?
A full report.txt is attached.

Firstly, try to limit commits to an absolute minimum in the application.
Commit only where it is essential for the application logic.
Also, consider running LGWR as a real-time priority process.
Use rtprio with a priority of 60.
If possible, use hardware mirroring for the online log files, rather than Oracle log file multiplexing.
Also, check to make sure that the online log files are raw and on dedicated disks.
Use our hold_logs_open.sh script to improve log switch performance.
There are a lot of other issues here, but that should keep you busy for a few days.
Note that the rtprio change is very important.

Rapid log switching

I have a problem with my database getting stuck on log switch checkpoint waits.
I have increased db_block_checkpoint_batch to 500, but it has not helped.
A report.txt file is attached.

The real problem here is that you are switching logs every 15 seconds.
I suggest that you create some much bigger log files, and then drop the little ones that you are presently using.
The new log files should be 100 times bigger than they are now.
In keeping with this change the log_checkpoint_interval parameter should be multiplied by 100 too.

Before and after row update triggers

Comparing before row and after row update triggers, there is a big difference in the number of current mode block gets, and a corresponding big difference in CPU usage and elapsed time.

This is a very interesting question.
I was previously unaware of this performance difference, and it has taken me several hours to get to the bottom of it using dumps and traces.

For a before row update trigger, Oracle first takes a row lock on the target row, then executes the trigger, then performs the update.
This does two current mode block gets and generates two redo records per row.
For an after row update trigger, Oracle first locks and updates the row, and then executes the trigger.
This halves the work done, if the trigger makes no database changes, as in your case.

Upon reflection, it makes sense that Oracle would need to take a row lock on the target row before executing a before row update trigger.
Otherwise, it would be possible for the row to be locked by another transaction while the trigger is executing,
which would lead to an unreasonable risk of deadlocks.

The moral of the story is: Always use after row triggers in preference to before row triggers, where the choice is otherwise indifferent.

_wait_for_sync

Oracle Support are working with us on a major project, and I notice that they have set _wait_for_sync to FALSE.
I guess that this is intended to prevent log file sync wait problems.
However, the log files are not raw, and there is a large Unix buffer cache (6G) to buffer the writes.

That _wait_for_sync setting is a disaster waiting to happen: with it set to FALSE, commits return without waiting for the redo to reach disk, so an instance failure can silently lose committed transactions. You should demand a written explanation.

Performance pauses

We're doing some initial benchmarking of a new application.
What we see is that a checkpoint stalls the database for a couple of seconds, and since this is a real-time system that cannot be tolerated.
Where would you suggest we start digging?
Redo logs are stored on a separate disk with direct-io enabled.

I have hit something like this before, but it was the log switch itself that was freezing things, rather than the checkpoint.
The solution was to have a sleeping process hold open file descriptors on all the log files to speed up the open() system call.
Have a look at our tip on Tuning Log Switches for more explanation and a script.

Real-time LGWR

You suggested that we make LGWR a real-time priority process to speed up our data loads.
HP-UX has priority levels 0 (highest) to 127 (lowest).
Should we use the lowest?
Also, a Unix guy here is concerned about this.

Yes, 127 is fine, just as long as it is in the real-time class so that there is no priority degradation.
I understand your Unix guy being concerned.
If there were a bug in LGWR, it could chew up all of one CPU.
On a single CPU machine, "best practice" is to have a higher priority real time shell running on the console - just in case.
On your multi-CPU machine that is not really necessary.

_log_io_size

I am interested in setting the _log_io_size parameter in order to try to cut back on the number of log file sync waits.
What should it be set to under Oracle 8.0.5 on Solaris 2.5?

On raw, the answer would be a little less than 128 blocks (64K).
For file systems, 15 blocks is often the best setting, as long as that does not make LGWR overactive.
But before you play,
find the average size of redo writes by dividing redo blocks written by redo writes from V$SYSSTAT.
If that number is smaller than the proposed setting you are unlikely to see any benefit.
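The sanity check described above is simple arithmetic on two V$SYSSTAT statistics; the function names below are illustrative.

```python
# The sanity check described above: the current average redo write size,
# in 512-byte log blocks, from two V$SYSSTAT statistics.

def avg_write_blocks(redo_blocks_written, redo_writes):
    return redo_blocks_written / redo_writes

def likely_to_help(proposed_blocks, redo_blocks_written, redo_writes):
    """Per the advice above: if the average write is already smaller than
    the proposed _log_io_size, the setting is unlikely to change anything."""
    return avg_write_blocks(redo_blocks_written, redo_writes) >= proposed_blocks
```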

Redo log file size

How do you set about sizing the redo logs for an OLTP system when no accurate tests have been carried out on the database?
The redo logs are currently set to 10M.
How can I be confident that this is enough or too much?
I know that I can monitor the activity, but is there a rule of thumb?

My rule of thumb is 500M, but it is somewhat arbitrary.
Have a look at our log_switch_interval.sql script.
If you switch more frequently than every 30 minutes, that is too fast.
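The check that log_switch_interval.sql performs can be sketched as follows, assuming a list of log switch times (for example, from V$LOG_HISTORY) expressed in seconds.

```python
# Sketch of the check that log_switch_interval.sql performs: the average
# interval between log switches, flagged against the 30-minute rule of
# thumb above. switch_times is an ascending list of switch times in seconds.

def avg_switch_interval_secs(switch_times):
    gaps = [b - a for a, b in zip(switch_times, switch_times[1:])]
    return sum(gaps) / len(gaps)

def switching_too_fast(switch_times, threshold_secs=30 * 60):
    return avg_switch_interval_secs(switch_times) < threshold_secs
```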

redo copy sleep rate

Your book states that the redo copy sleep rate is normally high.
Does this mean that a higher sleep rate is better?
Using your latch_sleeps.sql script, our most intense system has a 107% sleep rate,
whereas our least intense system has a 0% sleep rate.
Is there a generic rule that any sleep rate above a certain threshold needs immediate attention?

It is normally high, but not normatively high.
It is quite reasonable for the sleep rate to be 0% on a lightly loaded instance.
The intent of the comment is to stop people worrying about numbers like 107%.
There is fuller explanation of the issues in the tip on Redo Latch Tuning.
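The ratio behind these figures can be sketched as follows, assuming the sleep rate is computed as sleeps per latch miss, as in latch_sleeps.sql. Because a single miss can sleep more than once, the rate can legitimately exceed 100%, which is why a figure like 107% is nothing to worry about in itself.

```python
# Sketch of the latch sleep rate, assuming it is computed as sleeps per
# miss (as in latch_sleeps.sql). A single miss can sleep more than once,
# so the rate can legitimately exceed 100%.

def sleep_rate_percent(sleeps, misses):
    if misses == 0:
        return 0.0
    return 100.0 * sleeps / misses
```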