January 21, 2007

Shared SQL

Shared pool is where the SQL code and the data dictionary lives. Isn’t this structure ‘mostly’ READ_ONLY? Why latch/lock? Why do you care to latch? What will be corrupted in the library cache? Why does Oracle have to “protect” it, as you say above?

It’s a good question, so I’ve decided to write a note about it, rather than replying directly to the comment.

A lot of the work done by Oracle uses linked lists. For example, each “cache buffers chain” is a linked list, and the reason you grab the cache buffers chains latch is to make sure that the chain (linked list) that you want to traverse does not get damaged (by someone else adding or removing an entry) as you walk along it.

So the latching for data blocks has very little to do with updating data, it’s essentially about finding out whether the block you want is in memory, or getting it safely into memory if it isn’t already there.

The latches for the library cache and shared pool are the same. You acquire a library cache latch when you want to walk along a linked list of objects in the library cache to find, or insert, an object. (In the case of the library cache, you tend to talk about library cache hash buckets, rather than library cache chains).

The shared pool latch covers the lists of chunks of free memory; if you need a chunk of free memory you acquire the latch so that you can take a piece of memory off a list and use it. If you free up some memory (by kicking something out of the library cache) you grab the shared pool latch so that you can safely add the free chunk to the correct chain in the shared pool. (Of course, you will previously have grabbed the library cache latch to knock the object out of a library cache hash bucket to free the memory it was using)

There are lots more details and, for a little more information, you might want to look at a couple of the notes that I originally wrote for the Dizwell Knowledge Base on this and related topics.

Related

“by someone else adding or removing an entry”
I’m trying to get a mental picture of this. In a perfect (OLTP) world, would most of the changes to linked list entries (data blocks and SQL area) involve movement up and down the lists in accordance with ‘least recently used’ rather than things being removed or added to the list ?

Gary, you have to remember that there are several different classes of linked lists running through the buffer headers array, and a couple running through the library cache.

For example, you may have 128,000 cache buffers chains (linked lists) which make it possible for your session to find a particular block very quickly if it is in memory – compute the id of the chain it’s on and the linked list is very short. At the same time, there may only be 16 LRU chains (linked lists) running through your whole buffer and (virtually) every buffer is on one of those chains.

So when you kick a block out of memory, you go to the end of an LRU chain to disconnect the buffer from it – but you then have to work out which cache buffers chain it’s on and remove it from that chain as well.

If I see the report , Library cache lock is second(114 minutes). I was using one of your stats pack analysis link from your jl comp page . Looking deep into the report I found we are having a hard parse of 1.69/ second.

I am wondering does this hard parse alone is constituting for the library_cache_lock or some other factor can also constitute. If so, what should I be looking for………your guidence in this regard would be really helpful.

I took the snap shot for this hour , because one of my batch program when I run in development it runs for 35 minutes and when I run during the above hour in production it is taking more than an hour ( to be exact 1 hour 20 minutes).

CT – which specific version of 10.2, it is a little odd (though not signficant) that your percentages sum to 50%.

It seems unlikely, though not impossible, that a small number of “hard parses” would cause the time loss. However, it is a possible that a single operation hitting a silly bug could jam your batch job for ages. (Are you, for example, running with automatic SGA management – there might be a previously unrecorded bug in the code that caused a single parse call to hang for ages).

Bottom line, though, it is not possible to draw any conclusions about your batch job from this summary. After all, the batch job might have voluntarily lost its time by yielding the CPU in the resource manager code path; it might have had to wait for 22 of the ‘lost’ minutes because of TM lock waits; it might have had to compete more for latches, and therefore used more CPU

Since you have a batch job to watch, a low-impact strategy would be to add selects from v$mystat and v$sesstat (for the current SID) to capture the work done and time lost by the session. Otherwise you just need to trace the batch job (10046 level 8 ) to see exactly where the time was spent.

Looking in to the other batch pgms running at the same time I found sequence cache are set for 20 where as we generate 35,000 unique numbers. I am not sure whether this is causing the latch problem……..I need to change the cache and see how they are performing.

I have tuned my batch program with the help of 10046 level 8.
.

I did not understand what do you mean by this

“After all, the batch job might have voluntarily lost its time by yielding the CPU in the resource manager”

CT – read about the Resource Manager and the package dbms_resource_manager. A session which is subject to resource management will go into ‘resmgr’ waits when the CPU is under pressure. Note – the fact that this is happening suggests that at some stage in the batch, you were under CPU pressure – which may be why your other task had trouble running, and may have exacerbated any latch problem you had.

Precede the code with <code>, follow it with </code>, and replace all spaces with &nbsp;.

Dear Mr.Lewis
Thank you very much for the guidance. I believe you are right ,it is the resource manager where I have to look and might be I should talk to our DBA who is controlling this.

Thought you might be interested in the observation what I made after changing the sequence cache, Here it is….

I changed the cache for the few sequences, last night snap shot shows me the library cache lock waits went down to 695 (from previous nights 1816) and wait time has gone down to 2023 (from previous nights 5307). But my program tokk the same time to complete (Not so lucky last night).

once again Thanks for your time . I am working in oracle (or any data base) for last 1 and 1/2 years so not quite knowledgeable , please bear with me if I throw any stupid questions .

While working in the telecommunications closet at Evco Insurance, you walk around the equipment rack and notice one hub with 2 ports whose collision lights are blinking almost constantly. Being a conscientious network professional, you point out this problem to the network manager.
Question:
What troubleshooting step would be a wise follow-up to this discovery?