
Tuning ‘log file sync’ wait events

In this blog entry, we will discuss strategies and techniques to resolve ‘log file sync’ waits. This entry is intended to show an approach based upon scientific principles, not necessarily a step-by-step guide. Let’s first understand how LGWR is central to the commit mechanism.

Commit mechanism and LGWR internals

At commit time, the process creates a redo record [containing commit opcodes] and copies that redo record into the log buffer. Then that process signals LGWR to write the contents of the log buffer. LGWR writes from the log buffer to the log file and signals the user process back, completing the commit. A commit is considered successful only after the LGWR write completes.

Of course, there are minor deviations from this general concept, such as latching, commits from PL/SQL blocks, or IMU-based commit generation, but the general philosophy remains the same.

Signals, semaphore and LGWR

The following section introduces the internal workings of commit and LGWR interaction on UNIX platforms. There are minor implementation differences among UNIX flavors, and platforms like NT/XP use post-wait drivers instead of semaphores, etc. This section is meant to introduce the internals, not necessarily to dive deeply into them. truss was used to trace LGWR and the user process for the explanation here.

When a session commits, a redo record is created and copied into the log buffer. Then that process posts the LGWR semaphore using a semctl call, if LGWR is not already active. The process then goes to sleep on its own semaphore with a semtimedop call. In the truss output, the semaphore set id was 9, but the semnum was 118, which corresponds to the user process I was tracing.

The first semctl call posts LGWR. Then the process sleeps on the semtimedop call.

A commit is not complete until LGWR writes the log buffer, including the commit redo records, to the log files. In a nutshell, after posting LGWR to write, the user or background process waits for LGWR to signal back, with a 1-second timeout. The user process charges this wait time as the ‘log file sync’ event.
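As a quick first check, sessions currently stuck in this wait can be listed from v$session_wait. A minimal sketch (exact columns vary slightly by version):

select sid, seq#, state, seconds_in_wait
from v$session_wait
where event = 'log file sync';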

Root causes of ‘log file sync’ waits essentially boil down to a few scenarios, and the following is not an exhaustive list by any means!

1. LGWR is unable to complete writes fast enough for one of the following reasons:
(a) Disk I/O performance to the log files is not good enough. Even though LGWR can use asynchronous I/O, redo log files are opened with the DSYNC flag, and buffers must be flushed to the disk (or at least written to the disk array cache, in the case of a SAN) before LGWR can mark the commit as complete.

(b) LGWR is starving for CPU resources. If the server is very busy, LGWR can starve for CPU too. This leads to slower response from LGWR, increasing ‘log file sync’ waits. After all, these system calls and I/O calls must use CPU. In this case, ‘log file sync’ is a secondary symptom, and resolving the root cause of the high CPU usage will reduce ‘log file sync’ waits.

(c) Due to memory starvation issues, LGWR can be paged out. This too can lead to slower response from LGWR.

(d) LGWR is unable to complete writes fast enough due to file system or UNIX buffer cache limitations.

2. LGWR is unable to post the processes fast enough, due to excessive commits. It is quite possible that there is no starvation for CPU or memory and that I/O performance is decent enough. Still, if there are excessive commits, LGWR has to perform many writes and semctl calls, and this can increase ‘log file sync’ waits. This can also result in a sharp increase in the ‘redo wastage’ statistic (see the query sketch after this list).

3. IMU undo/redo threads. With private strands, a process can generate a few megabytes of redo before committing. LGWR must write all of the redo generated so far, and the committing process must wait on ‘log file sync’, even if the redo generated by other processes is small.

4. LGWR is suffering from other database contention, such as enqueue waits or latch contention. For example, we have seen LGWR freeze due to CF enqueue contention. This scenario is possible, however unlikely.
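For scenario 2 above, commit rate and redo wastage can be gauged from v$sysstat. A minimal sketch; these counters are cumulative from instance startup, so take two snapshots and work with the deltas:

select name, value
from v$sysstat
where name in ('user commits', 'redo size',
               'redo wastage', 'redo blocks written');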

It is worthwhile to identify and understand the root cause, and then resolve it.

1. First, make sure the ‘log file sync’ event is indeed a major wait event. For example, in the statspack report for 60 minutes below, ‘log file sync’ is indeed an issue. Why? The statspack report covers 3,600 seconds and there are 8 CPUs in the server, so approximately 28,800 CPU seconds are available. There is just one database in this server, so approximate CPU usage is 7,034/28,800, roughly 24%.

But 27,021 seconds were spent waiting. On average, 27,021/3,600 = 7.5 processes were waiting on the ‘log file sync’ event. So, this is a major bottleneck for application scalability.

When LGWR is waiting (via a semtimedop call) for posts from user sessions, that wait time is accounted as the ‘rdbms ipc message’ event. This event can normally be ignored. The next most-waited event is the ‘io done’ event. After submitting async I/O requests, LGWR waits until the I/O calls complete, since LGWR writes are effectively synchronous. [Asynchronous and synchronous are not contradictory terms when it comes to I/O! Google it; there is enormous information about this already.]

It is worth noting that v$session_event is a cumulative counter from instance startup and hence can be misleading. The difference between two snapshots of this view, for the same session, can be quite useful.
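A minimal sketch of that snapshot approach, assuming LGWR can be identified by its program name; run it twice a few seconds apart and subtract the time_waited_micro values:

select s.sid, e.event, e.total_waits, e.time_waited_micro
from v$session s, v$session_event e
where s.sid = e.sid
and s.program like '%(LGWR)%'
order by e.time_waited_micro desc;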

Tanel Poder has written an excellent tool for this. Using that tool, we can print a 1-second snapshot for the LGWR session (sid 3). Some of the output has been removed to improve clarity. From the table below, 813 milliseconds were spent waiting on the ‘io done’ event in a 1-second interval. That’s 81%.
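The invocation looks roughly like this (a hedged example; snapper’s argument syntax has varied between versions — the arguments here are what to sample, snapshot length in seconds, snapshot count, and the sid):

SQL> @snapper all 1 1 3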

3. Confirm that LGWR is waiting on that event with SQL*Trace as well. Tracing LGWR can deteriorate performance further, so careful consideration must be given before turning on SQL trace for LGWR. Packages such as dbms_system / dbms_support, or oradebug, can be used to turn on the 10046 event at level 12.
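For example, with oradebug from a SYSDBA session (a sketch; the OS pid 2345 is hypothetical and should be looked up in v$process for LGWR):

oradebug setospid 2345
oradebug event 10046 trace name context forever, level 12
-- collect a short, representative window, then:
oradebug event 10046 trace name context off
oradebug tracefile_name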

A few trace lines are shown below. In this specific case, LGWR is waiting on I/O events. The output shows that LGWR submitted 2 write calls with 16 redo blocks total. Waits for I/O completion are counted under the ‘io done’ event, and between the two calls LGWR waited 1,600 microseconds, or 1.6 ms. Performance of the write itself is not entirely bad.

a. From the above statistics, we can see that 5,632 user calls per second were made and 495 commits were executed per second, on average.

b. 3,000 redo blocks (of 512 bytes each on Solaris) are written by LGWR per second, approximately 1.5 MB/sec.

c. 1.4 MB/sec of redo was generated, which is approximately 11 Mbps.

d. For 3000 redo blocks, 38,155 commits were created.

Essentially, redo size is high, but not abnormal. However, 500 commits per second is on the higher side.

5. Knowledge of the application will be useful here. If the commit frequency is high, trace a few sessions and understand why there are so many commits. For example, in the following trace file, there is a commit after *every* SQL statement, and that can be a root cause of these waits.
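To pick candidate sessions to trace, a sketch against v$sesstat (‘user commits’ is a standard statistic name):

select s.sid, s.username, st.value as commits
from v$sesstat st, v$statname n, v$session s
where st.statistic# = n.statistic#
and n.name = 'user commits'
and st.sid = s.sid
order by st.value desc;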

6. If the commit frequency is high, then LGWR could essentially be spending its time signalling user processes. Just like any other process, the UNIX scheduler can kick LGWR off the CPU, and this can cause foreground processes to wait on the ‘log file sync’ event. In Solaris, prstat microstate accounting (-mL) is quite useful for breaking down how much time LGWR spends waiting for CPU.

8. In a few scenarios, ‘log file sync’ waits are intermittent and hard to catch, and statistics might need to be captured with more granularity to understand the issue. For example, in the case below, the instance froze intermittently, and we wrote a small script to gather statistics from v$sysstat every 20 seconds and spool them to a file. We were also collecting iostat output using the Oracle-supplied tool LTOM.
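A minimal sketch of such a collector (it assumes select privilege on the v$ views and execute on dbms_lock; spool the SQL*Plus output to a file):

set serveroutput on size unlimited
begin
  for i in 1 .. 90 loop   -- roughly 30 minutes at 20-second intervals
    for r in (select name, value from v$sysstat
              where name in ('redo size', 'redo blocks written', 'user commits')) loop
      -- output appears when the block completes; the timestamps preserve ordering
      dbms_output.put_line(to_char(sysdate, 'HH24:MI:SS') || ' ' || r.name || ' ' || r.value);
    end loop;
    dbms_lock.sleep(20);
  end loop;
end;
/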

Between 4:29:43 and 4:30:14, the ‘redo blocks written’ statistic did not change (the delta is 0), meaning LGWR did not write any redo blocks. But the ‘redo size’ statistic grew by roughly 7 MB in that instance over the same window, and the number of processes waiting on the ‘log file sync’ event increased to 100. So we know that the freeze occurred because LGWR wasn’t able to write any blocks during this time frame.

In a RAC cluster, this problem is magnified, since both CR- and CUR-mode buffers need a log flush to complete before those blocks can be transferred to another instance’s cache.

Looking closely at the I/O statistics, we could see that the average service time was quite high for a few devices during that time period (we later mapped these devices back to the log file systems). This led to a more granular time frame, and finally it turned out to be a hardware switch issue that was intermittently freezing I/O.

9. UNIX tools such as truss, tusc, and strace can be used to debug these scenarios if the above techniques are not sufficient, but tools like truss and strace should be used only as a last resort. In Solaris 10, dtrace can be used effectively to debug I/O- or processor-specific issues, and dtrace is safer by design. Brendan Gregg has a DTrace toolkit and a wealth of information here

Guidelines for resolving a few root causes

Finding and understanding root cause is essential to resolve a performance issue.

1. If I/O bandwidth is the issue, then doing anything other than improving I/O bandwidth is not useful. Switching to a file system that provides better write throughput is one option. RAW devices are another. Reducing the number of log file members in a group is yet another, as it reduces the number of write calls, but this option comes with a cost (a query to check member counts follows this list).

2. If CPU starvation is the issue, then relieving CPU starvation is the correct step to resolve it. Increasing the priority of LGWR is a workaround.

3. If the commit rate is high, then decreasing commits is the correct step, but in a few cases where that is not possible, increasing the priority of LGWR (using nice) or moving LGWR to the RT priority class might provide some relief.

4. Solid state disk devices can also be used if the redo rate is extreme. In that case, too, it is preferable to decrease the redo size first.

5. If excessive redo size is the root cause, redo size can be reduced using various techniques. More information can be found here
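For guideline 1, a quick sketch to see how many members each log group has:

select group#, count(*) as members
from v$logfile
group by group#
order by group#;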

In summary, finding and resolving the root cause of the ‘log file sync’ event will improve application response time. This blog entry is available as a pdf file here

53 Responses to “Tuning ‘log file sync’ wait events”

Hans-Peter Sloot said

I am currently investigating a performance problem on an APPS database.
The average log file sync wait time is about 10 to 20 times the log file parallel write time.

I thought that I was hitting bug 2669566 (it’s 9.2.0.8), but snapper shows that the log file parallel write time must be correct.

I always thought that a commit always resulted in 1 log file parallel write. Statspack, however, shows that the number of log file parallel writes is more than 2 times the number of log file syncs.

Do you have any clue why log file sync is so much higher than log file parallel write?
If 1 log file sync results in about 2 log file parallel writes, I would expect the ratio to be a bit higher than 2, but not about 10.

Hello Hans-Peter
Thanks for visiting my blog.
LGWR processing is much more than just writes. It is quite possible that another phase of LGWR processing has a bottleneck.
Since this version is 9.2.0.8, ASH can’t be used. Do you have any other issues in the OS? High CPU usage? Paging, swapping, etc.? Is LGWR getting enough CPU?
What OS are you using? Depending on that, you could potentially use a couple of methods:

1. SQLTRACE lgwr to see if there are any other issues. [Remember, SQLTRACE can slow down LGWR.]
2. Truss or strace lgwr to see if there are any bottlenecks. [Very dangerous, since truss or strace can cause LGWR to crash and, with it, the instance.]
3. If you are in Solaris 10, then dtrace will be a much better option to debug LGWR.
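To quantify the gap you describe, a sketch comparing the two events instance-wide (cumulative since startup, so deltas between runs are more telling):

select event, total_waits,
       time_waited_micro / greatest(total_waits, 1) / 1000 as avg_ms
from v$system_event
where event in ('log file sync', 'log file parallel write');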

Observations and questions:
1. The number of commits/s does not match the number of log file syncs/s?
2. The number of log file parallel writes/s is more than 2 times the number of log file syncs/s.
Almost everyone assumes that, because of piggybacking, there is 1 parallel write for 1 or more log file syncs.

Since there are 2 lfpw per sync, I would expect the log file sync time to be about 2 times the lfpw time.
But it is way, way higher: 153 ms instead of 2 × 8 ms.
I will see if I can get root privileges.

Do you have an explanation for the difference between the commits/s and log file syncs/s?

Kay Guo said

The best article I’ve read so far on log file sync. Thanks, Riyaj. I have one follow-up question: I have a data guard setup with high waits on log file sync. When querying
select sid, event, time_waited, time_waited_micro
from v$session_event
where sid=4 -- sid of LGWR
order by 3

Hi Kay
Thank you for reading my blog and sorry for the late reply.
I don’t really know your dataguard setup, so I have more questions than answers :-(
Are you using SYNC or ASYNC mode? Looking at your output, it looks like you may be able to improve DG performance by increasing the network buffer size.
Can you also please post a few lines from statspack? And more importantly, how good are your network throughput and latency?

Kay Guo said

sorry for the late response. I signed up for email notification if/when you replied to the thread, but never seemed to get the email.

My main goal is to find out whether ‘LGWR wait on LNS’ is part of ‘log file sync’ wait. Based on the following observations, I think the answer is yes. Please let me know if it’s incorrect or inaccurate.

Our data guard setup is LGWR ASYNC. My interpretation of this issue is that, whether it’s LGWR SYNC (where LGWR waits for the acknowledgement of the RFS process on the remote server before posting back to the foreground/user session that the write has completed) or LGWR ASYNC (where LGWR waits for the acknowledgement of the LNS process on the local server before posting back to the user session), network latency does contribute to log file sync. In the case of ASYNC, the impact is indirect, as LGWR will only get delayed if the LNS buffer is full.

Metalink note 34592.1 seems to confirm this interpretation: step 3 (time for the log write I/O to complete) also includes the time for the network write and the RFS/redo write to the standby redo logs.

From your statspack, I think you are correct. Even though LGWR ASYNC mode prevents the primary from encountering ‘log file sync’ waits due to network latency, it doesn’t necessarily prevent them when LGWR-LNS communication and messaging has latencies. In your case, the statspack output indicates that you may be running into issues with LNS buffer flushing or LNS communication not completing quickly enough. I guess truss or dtrace of LGWR will confirm this.

I guess LNS process tuning is in order to resolve this intermittent issue. Of course, network latency is another area to look at.

Kay Guo said

Thanks.. LGWR-LNS communication and messaging latency is a more accurate way of putting it. In our case, this latency happened to be caused by network. Noted your point that there could be other reasons that slow down LNS too. Thanks for your help.

A_R_T_Y_E_M said

Thanks for this post, and sorry for my bad English.
My top waits are:
control file sequential read;
log file sync;
enq: CF - contention.
Do you have an idea where the bottleneck is, and what to do?
Every hour, RMAN runs ‘backup archivelog all delete input all’ (to another server over NFS). RMAN works without a catalog.
Also, the DB is monitored by Grid Control (user DBSNMP).
Thanks.

Hi
I am afraid we need more information to debug this issue. Can you please post the top 5 wait events from a statspack or AWR report here, and also let us know the duration of the statspack or AWR report? That should be a good starting point.

Hello
First of all, an AWR duration of 15 hours is too long. An AWR report is an aggregated report, and so it is very hard to decipher details based upon data aggregated over that long a period.

What is it that you are trying to tune? A specific business process?

I will try to help.
1. It looks like write performance to the log files is not great. A 12 ms average for log file parallel write is not good enough. A couple of things you can do to improve throughput to the log files:
a) Try using direct and async I/O. I am not sure whether the underlying file system supports direct I/O or not; you might want to research that. I do see that filesystemio_options is set to setall, which means that Oracle will try to use async and direct I/O if the file system supports it.
Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
—————————- ————– —— ———– ——- ———
..
log file parallel write 90,505 .0 1,114 12 1.2
log file sequential read 82,154 .0 1,057 13 1.1
..
control file parallel write 48,544 .0 326 7 0.6
…
control file single write 8,255 .0 154 19 0.1
..
LGWR-LNS wait on channel 4,380 100.0 47 11 0.1
..

2. The same goes for the control files. Control file single writes are taking an average of 19 ms. Many of the control file writes are performed under the protection of the CF enqueue, so the CF enqueue waits may be related to this issue.
3. It looks like LGWR ASYNC is used for log transport to the standby. You might want to consider tuning the LNS process also; even simply increasing the buffer size might be a good start.
4. _log_io_size is set to 256, and I am not sure why. On most platforms that will trigger LGWR after 128 KB of redo, which might make LGWR hyperactive. I don’t know your application well enough to recommend something here. Check the ratio between redo size and user commits + user rollbacks to see if that parameter value is warranted (see the sketch after this list).
5. Last but not least, there are bugs in 10.2.0.3 with LGWR and log file sync events. In some cases, LGWR is not efficient enough when the workload is not very high. Read note 34592.1 for further information. You might want to think about a 10.2.0.4 upgrade. Of course, test your upgrade thoroughly :-)
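A sketch of the ratio check from point 4 (again, these are cumulative counters, so deltas between two snapshots are more meaningful):

select (select value from v$sysstat where name = 'redo size') /
       nullif( (select value from v$sysstat where name = 'user commits')
             + (select value from v$sysstat where name = 'user rollbacks'), 0)
       as redo_bytes_per_txn
from dual;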

In summary, this comes down to basic things: improve write throughput to the redo log files and control files, and eliminate bugs with better software, etc.

It is also critical to review this again with an AWR report of 30 minutes or so.

Chan said

We had slow performance on the database, and the requests from the app server were queued up. I looked at the AWR and ASH reports. I saw log file sync as one of the top 5 wait events in AWR, and it is listed as a blocking session in the ASH report. The blocking session with log file sync by LGWR was there in the same time period as the problem. I do not see high CPU usage or user commits during that time.

I forgot to point out that the AWR report has an OS_CPU_WAIT_TIME of 133,200, while USER_TIME is 769,745. So, at some point during this AWR report duration, there were waits for CPUs. It is possible that LGWR was also affected by this high CPU usage on the server. That, in turn, can lead to massive waits on the ‘log file sync’ event.

User commits are low; so are user calls, and redo blocks written is 0. Only redo synch time, redo synch writes, and redo entries are huge.

(And the redo size is a negative number. Metalink says this is a bug in pre-10g releases (ours is 9.2.0.7) and advises referring to ‘redo blocks written’ instead.)

I had checked the CPU usage. In the morning, there was a request set that launched 5 child threads, and they all sucked up the CPU to the tune of more than 50%, so I waited for their completion. But even after they completed and the CPU usage came down to under 10%, the issue remained the same.

I discussed this with our OS sysadmin, and he involved the SAN guys, who confirmed that, though the file system is active, it’s quite normal.

Hello Muthu
I am really not sure what problem you are trying to address here. Are you trying to tune that concurrent request, or are you trying to identify why there is high CPU usage?

If you are trying to tune the request, I guess the redo size needs to be reduced, or LGWR and its components need to be tuned. Since there are 5 requests running, all of them may be generating a lot of redo. Is this a custom program or a standard seeded program? Is there a possibility of reducing redo size by dropping a few indices on that table and rebuilding them later? Look for opportunities to tune along those lines.

Also, all these inserts: are they inserts with bind variables, or ‘insert into table select * from table1’ forms of SQL?

How many log members do you have? What type of file system? How big are the log files?

Sorry to ask you more questions, but I would like to understand the root cause before giving you some suggestions.

Muthu said

Thanks for the quick update. I’m trying to address the concurrent request that took forever due to the ‘log file sync’ wait event. I referred to the other request set and its CPU usage (which is not what people are worrying about) because you had asked to check the CPU usage (guideline #2).

Here’re the answers to your questions:

This is a standard Oracle report (Journal Entries Report). It’s a good point to drop/rebuild the indices to reduce the redo generation; I’d never thought about that. Since this is not a real production environment, I think we can consider it.

It’s a straight Insert statement with bind variables.

We’ve got 4 log members (4 groups with one member each) of 1.5 GB each. Ours is a JFS2 file system.

The issue here is that this is the break-fix environment, which is refreshed every night from a copy of production.

Hello Muthu
Sorry, it took a while to respond.
Do you have tkprof output for this report run? Is the parent job slow, or is the report slow? What is the short_name?
There have been many performance issues reported for the Journal Entries report. One prominent bug I see is that the parent job deletes from ar_journal_interim; the bug fix was to truncate that table instead of deleting (bug 2830651).
I guess we will have to look at the tkprof output files to see where the slowness is. If you are trying to tune the instance for log file sync issues, converting the delete statement to a truncate statement might help.

Great article, thanks. I’ve been playing around with strace and lsof +fG on my Linux server, tracing the lgwr process, but cannot find any evidence that redo data is opened/written in DSYNC mode as you mention above:

Recommendation 1: Host Configuration
Estimated benefit is 29.38 active sessions, 65.66% of total activity.
———————————————————————
Action
Investigate the possibility of improving the performance of I/O to the
online redo log files.
Rationale
The average size of writes to the online redo log files was 176 K and
the average time per write was 255 milliseconds.
Rationale
The total I/O throughput on redo log files was 636 K per second for
reads and 1.2 M per second for writes.
Rationale
The redo log I/O throughput was divided as follows: 0% by RMAN and
recovery, 66% by Log Writer, 0% by Archiver, 0% by Streams AQ and 33% by
all other activity.

Symptoms That Led to the Finding:
———————————
Wait class “Commit” was consuming significant database time.
Impact is 29.38 active sessions, 65.66% of total activity.

Finding 2: I/O Throughput
Impact is 12.78 active sessions, 28.55% of total activity.
———————————————————-
The throughput of the I/O subsystem was significantly lower than expected.

Recommendation 1: Host Configuration
Estimated benefit is 12.78 active sessions, 28.55% of total activity.
———————————————————————
Action
Consider increasing the throughput of the I/O subsystem. Oracle’s
recommended solution is to stripe all data files using the SAME
methodology. You might also need to increase the number of disks for
better performance.
Rationale
During the analysis period, the average data files’ I/O throughput was
1.3 M per second for reads and 945 K per second for writes. The average
response time for single block reads was 89 milliseconds.

Recommendation 2: Host Configuration
Estimated benefit is 2.4 active sessions, 5.36% of total activity.
——————————————————————
Action
The performance of some data and temp files was significantly worse than
others. If striping all files using the SAME methodology is not
possible, consider striping these file over multiple disks.
Rationale
For file /oracle/PR1/sapdata2/dat_6/dat.data6, the average response time
for single block reads was 265 milliseconds, and the total excess I/O
wait was 22709 seconds.
Related Object
Database file
"/oracle/PR1/sapdata2/dat_6/dat.data6"
Rationale
For file /oracle/PR1/sapdata2/dat_10/dat.data10, the average response
time for single block reads was 279 milliseconds, and the total excess
I/O wait was 20281 seconds.
Related Object
Database file
"/oracle/PR1/sapdata2/dat_10/dat.data10"
Rationale
For file /oracle/PR1/sapdata1/dat_5/dat.data5, the average response time
for single block reads was 249 milliseconds, and the total excess I/O
wait was 20268 seconds.
Related Object
Database file
"/oracle/PR1/sapdata1/dat_5/dat.data5"
Rationale
For file /oracle/PR1/sapdata4/dat_21/dat.data21, the average response
time for single block reads was 124 milliseconds, and the total excess
I/O wait was 20230 seconds.
Related Object
Database file
"/oracle/PR1/sapdata4/dat_21/dat.data21"
Rationale
For file /oracle/PR1/sapdata3/dat_12/dat.data12, the average response
time for single block reads was 251 milliseconds, and the total excess
I/O wait was 20001 seconds.
Related Object
Database file
"/oracle/PR1/sapdata3/dat_12/dat.data12"

Symptoms That Led to the Finding:
———————————
Wait class “User I/O” was consuming significant database time.
Impact is 13.61 active sessions, 30.41% of total activity.

Hello Yousuf
LGWR can submit concurrent asynchronous I/O requests to multiple redo log members if asynchronous I/O is enabled. If not, LGWR will execute write system calls in succession. And yes, writes to both members must complete before the commit is declared successful.

The application is several perl processes processing incoming messages on several message queues. There are 4 perl processes (per queue) polling every 10 ms (an average of every 2.5 ms). The response time requirement is 100 ms, but we see log file sync waits in the trace above 300 ms:

Hi Lasse
Background elapsed time and background cpu time come from the v$sess_time_model view. Background processes charge time to the background% statistics instead of the DB time and DB CPU statistics.
Background elapsed time ~= background cpu time + background wait time.
In your example, 31% of the time is spent on CPU and 76% is spent waiting on an idle wait event (rdbms ipc message).
Is your log file sync wait time consistently that high? Can you print the histogram using this script: https://orainternals.files.wordpress.com/2008/07/event_histogram_delta-sql.doc.
Also, check the LGWR trace file to see if the adaptive log file sync feature is in play. Of course, it would be great if you could post the RDBMS version and OS version.
I have seen numbers not adding up due to various rounding issues with the statistics. In your case, the 1-second window seems to be causing the rounding issues. Can you increase the time window to 10 seconds to see if you can eliminate rounding errors?
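If running the linked script is not convenient, the underlying view can be queried directly on 10g and later (a sketch; v$event_histogram is cumulative since startup, so compare two snapshots):

select wait_time_milli, wait_count
from v$event_histogram
where event = 'log file sync'
order by wait_time_milli;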


Juergen said

I have the following performance problem: log file sync increases while switching a single tablespace into backup mode (the impact is not visible for “alter database begin backup”).
In the application (SAP), all work processes are in status “commit” ;-) The biggest impact is switching the UNDO tablespace into backup mode. I think this also depends on our db_cache_size of 155 GB.

Under normal operation, the db file sequential read time is 3 ms and the log file sync 7 ms, and I also see no “private strand flush”/“cannot allocate new log”/“checkpoint not complete” messages in the alert log.

Hello Juergen
Thank you for visiting my blog. ‘alter database begin backup’ and ‘alter tablespace begin backup’ commands work differently.
What database/OS version?
Begin backup commands trigger a checkpoint so that all blocks of the tablespace’s files are written to disk, which makes DBWR work harder; DBWR activity, in turn, can keep LGWR busy and contribute to log file sync waits.
To debug this, I would start by understanding what DBWR and LGWR are doing at that time. Since the application is suffering from commit issues, focus on LGWR: review the statistics of the LGWR session, strace LGWR to see if it is stuck on I/O, etc.
If you don’t mind, gmail me a text AWR report to rshamsud; of course, remove all sensitive information.
Thanks,
Riyaj

Hi
Thanks for reading. I uploaded a zip file containing all scripts to the blog entry.
I have not tested them on recent versions; however, I do expect them to work. If you find issues, please let me know.
Thanks..