December 4, 2006

Resizing the SGA

I’m still a bit twitchy (i.e. nervous) about using the automatic SGA tuning – which means setting the sga_target and letting Oracle decide when to move memory between (typically) the db cache and the shared pool. I’ve come across four systems so far – two by direct contact, two by indirect report – which have had hanging problems which magically disappeared when the sga_target was disabled. It has to be said that all four systems had large SGAs – 8GB to 64GB – and they were all busy with lots of active sessions, so small office systems are probably quite safe using the sga_target.

Whilst it’s hard to be specific about why the problems occur, it seems possible that some sort of race condition arises where the code decides that the db cache needs to grow moments after it has started to shrink it to allow the shared pool to grow. (Or vice versa, of course.)

If you think it’s worth keeping an eye on what the sga_target is doing to your system, you might want to run a quick query against the v$sga_resize_ops view from time to time, to see if there are times when a couple of the heaps swap memory back and forth in rapid sequence. The following SQL was a quick script I hacked together to do this on the last site where random hangs were occurring:
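The script itself hasn’t survived in this copy of the note, so what follows is only a reconstruction of the sort of thing it would have been – note, in particular, the start_time column alias produced by a to_char() conversion, with the order by clause resolving to that alias rather than to the underlying view column:

```sql
rem  Hypothetical reconstruction - the original script is not shown here.

set linesize 90
set pagesize 60
set trimspool on

column component    format a28
column oper_type    format a13
column initial_size format 999,999,999,999
column final_size   format 999,999,999,999

select
        to_char(start_time,'dd hh24:mi:ss')     start_time,
        component,
        oper_type,
        oper_mode,
        initial_size,
        final_size,
        status
from
        v$sga_resize_ops
order by
        start_time
;
```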

Note: I only realised some days after I wrote this note that my output was appearing in the correct order only because of timing. The order by clause is using my column alias of start_time, not the underlying table (view) column start_time. If I had had any data from the previous month in v$sga_resize_ops then my choice of character conversion would have made it appear at the end of the output.

At the linesize of 90 that I had set, this produced two lines of output per row – the following is from my laptop, not from one of the big systems:

Another symptom that you might watch out for at the same time is memory in the shared pool being used for data blocks – as Tanel Poder has pointed out, when v$sgastat is showing memory in the class “KGH: NO ACCESS”, this is a case of shared pool granules partially freed for use by the db cache and waiting for the rest of the granule content to be unpinned so that the entire granule can become a cache granule.
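A quick way to watch for that symptom is to pick the line out of v$sgastat by name – a sketch only, using the literal name quoted above:

```sql
select
        pool, name, bytes
from
        v$sgastat
where
        name = 'KGH: NO ACCESS'
;
```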

Related

I’ve been using automatic SGA tuning in all of my databases (10.1.0.3, 10.1.0.5, 10.2.0.2) for a couple of years now and have not had a problem with it. Typically there are no SGA size changes for a few weeks, then a flurry of about a dozen small changes over a couple of minutes, then another month with the same configuration. My DBs rarely have more than about 100 connections, though.

I have had an entire DB hang when changing the value of SGA_TARGET, scope=both, so I don’t do that anymore.

This is an interesting post, as at a previous place where I worked we saw strange activity to do with ASMM.

Basically it looked like, in certain situations, it would try to grow the shared pool, which obviously meant that it had to take memory from some other pool. By looking at v$sga_resize_ops we could see that the periods of poor performance coincided with resizing operations.

As far as I could see, it reached a point where it was trying to grow the shared pool but was unable to shrink another pool; at this stage the database would appear to hang.

We managed to get around it by setting lower limits for each of the pools, so that ASMM would not try to take any of them below its threshold. This seemed to work for us.
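The workaround described above amounts to setting explicit values for the individual pool parameters while leaving sga_target in place – with ASMM enabled these act as lower bounds rather than fixed sizes. The figures below are purely illustrative:

```sql
rem  Illustrative values only - pick sizes appropriate to your own system.
rem  With sga_target set, these act as minimum sizes, not fixed sizes.

alter system set shared_pool_size = 600m scope=spfile;
alter system set db_cache_size    = 2g   scope=spfile;
alter system set large_pool_size  = 200m scope=spfile;
```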

We also saw problems with our 10.2.0.1 system immediately after migrating from 8i. Even though we did masses of testing before migration and had ASMM switched on during the testing, we didn’t see any problems until we went live.

It looked like we migrated, the overnight batch ran, and then the shared pool shrank right down while the buffer cache grew and grew. Our shared pool got stuck at a size that was 50% of what we needed when we were running 8i! Then the OLTP daytime users came on line and the pools couldn’t successfully resize; the result was that our parse rates went through the roof and performance was very, very bad. This wasn’t the only issue we had, mind, but it is one to be careful of. I would point out that most of our other systems, with smaller workloads and fewer attached sessions, did not show the problem. Thanks again for the help.

[…] Jonathan Lewis discusses his experience with automatic SGA sizing. I have an Oracle10g database that utilizes automatic SGA sizing. I have not experienced any problems to date, however this installation used by a development team is small and surely doesn’t see a high amount of activity other systems might. […]

I’m wondering if someone could shed some light on the values of both the STATUS and OPER_MODE columns, in context, within v$sga_resize_ops.

I’ve been doing some load testing using ASMM. I see both the shared pool and the db buffer cache resize themselves under load, as I would anticipate, but I’m never sure if they actually do resize, as the OPER_MODE column often indicates ‘deferred’ while STATUS indicates ‘complete’. This seems a little incongruent to the ignorant.
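One way to get a feel for the combinations of those two columns that actually appear on a given system is simply to aggregate the view by them – a sketch only, since the set of values varies with version:

```sql
select
        component,
        oper_type,
        oper_mode,
        status,
        count(*)        ops
from
        v$sga_resize_ops
group by
        component, oper_type, oper_mode, status
order by
        component, oper_type
;
```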

We have had problems with ASMM. We had over 900 grows and shrinks per day, which left the pools effectively the same size, but there were many ORA-04031 errors. We have now set the shared pool manually, with an increase of 20%, and the ORA-04031 errors seem to have disappeared.
The grows and shrinks persist at about 600 per day. I can’t see the benefit of constantly exchanging 16M of memory, and I see similar behaviour in other databases. Would you advise turning ASMM off?

Albert-Frank, I would raise an SR with Oracle about this and give them a few days to diagnose the problem – especially since you seem to have avoided the critical issue. But if you get any clue that this constant fluctuation is causing a performance problem, I would turn the feature off to see if it made any noticeable difference.
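For anyone wanting to quantify that sort of churn before deciding, a rough daily summary can be pulled from the same view – a sketch only, and bear in mind that the view keeps just a limited history of recent operations:

```sql
select
        trunc(start_time)       day,
        component,
        oper_type,
        count(*)                ops
from
        v$sga_resize_ops
group by
        trunc(start_time), component, oper_type
order by
        day, component, oper_type
;
```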

[…] Infrastructure — Jonathan Lewis @ 8:29 pm UTC Apr 16,2007 Some time ago, I wrote about resizing the SGA and the problems of automatic SGA management. In the follow-up, one of the suggestions was to […]

Just found this post and thought I’d let you know that we are getting intermittent ORA-04031 errors using ASMM on 10.1.0.3 on Solaris. This always coincides with memory being moved from the buffer cache to the large pool (we’re using shared server too). I’m intending to turn off ASMM to resolve this.

We are running 10.2.0.3 on HP Itanium and will have to support a few thousand sessions in production. We have been using automatic memory management for two years. I did not have SGA issues with a limited number of sessions and a limited SGA of, let’s say, 500MB – 800MB. We scaled up our development SGA to support thousands of connections by configuring shared server. Since we are running in shared server mode I configured an SGA of 2GB with threshold values for the shared pool, large pool and buffer cache.

Today I faced an ORA-04031 at the shared pool level; at that moment my shared pool was about 900MB, but the trace files show me that the “KGH: NO ACCESS” component of the shared pool was about 600MB. If the assumption above – “this is a case of shared pool granules partially freed for use by the db cache and waiting for the rest of the granule content to be unpinned so that the entire granule can become a cache granule” – is true, which I would like to believe, I conclude there was a lack of free contiguous memory in my shared pool because part of it was not yet available to it.

More and more I’ve been thinking of turning automatic SGA (re)sizing off and moving back to the old-fashioned manual sizing of the various components. I wonder where I can get some information on the various shared pool and library cache components to get some in-depth knowledge of them.
Regards
Guy

Metalink Note:396940.1 mentions Bug 5045507 related to “KGH : NO ACCESS” increase in v$sgastat.
The only possible workaround is to disable ASMM, which is what I will do.
Personally I think the likelihood of occurrence may be linked to a non-zero value for session_cached_cursors combined with a huge number of sessions. ASMM has never been my favourite 10g feature, and this experience moves it onto my list of least appreciated features of 10g.
Regards
Guy

I have to confirm that in highly turbulent database environments (highly concurrent, many transactions, many sessions – quite typical OLTP) ASMM is unstable.
We set it after moving to 10gR2, but had to go back to “static” memory settings. There were some quite critical problems – a few times some of the databases crashed, with rather laconic trace files, and sometimes one or more ORA-04031 errors; in practice there was nothing useful to send to Oracle Support. After the change of parameters we have enjoyed stable operation (no more mysterious crashes).

We suspect issues in our heavy OLTP application with AMM, particularly in regard to dynamic shared pool shrinks. When AMM was turned on we had nearly constant adjustments, and under heavy load we were getting lots of shared pool contention in the form of “cursor: pin S wait on X” waits. These waits would eventually render our DB inoperable due to excessive CPU utilization. We found that flushing the shared pool alleviated the issue temporarily. We worked a ticket with Oracle for a few months and started monitoring the SQL area, shared pool free memory, and dynamic shrinks. It turned out that our debilitating events corresponded to successive shared pool shrinks. We’ve fixed the shared_pool_size and db_cache_size by setting sga_target = 0, and our issues seem to have disappeared.

I don’t like the expression “best practice” – a better term would probably be “commonly used defense mechanism”.

It’s been 6 years since I wrote the note, so things have moved on and the threat is diminished – but I would still be inclined to include some minimum figures for the shared_pool_size and db_cache_size if using the sga_target.

11g has introduced the memory_target, of course – and my reluctance to use that matches my previous reluctance to use the sga_target.