Montag, 18. Mai 2009

The whole idea to create a 'before select trigger' is based on a comment Martin Jensen dropped during his Seminar about Advanced Materialized Views in Vienna this week. So this all is not my idea, I just wanted to test the suggestion to get used to it.The whole functionality utilizes FGAC.As someothershave also noticed, the function is called twice per every execution, so I have to reduce the effective execution of my action.

So let's start with - a log_table:

create table log_table(datum date,c varchar2(80)) ;

Now something which can write into this table.

The package is necessary to hold my package variable tot_calls and eliminate every 2nd execution of the function (otherwise it would fire twice, not recommended in most cases, I guess).

The autonomous_transaction might be necessary in most cases, but if someone find business-cases where it's recommended not to be autonomous, just drop this line and have the trigger under full transaction control.

Anything more to say?Of course, as allways, this is only a proove of concept.As allways, most parts are borrowed from several sites.Don't use this blindly anywhere. I totally left everything out, which could confuse me. Like exception handling, hard testing, theoretical and practical security checks, performance baselines, etc....and at the end, the most global citation from Martin Jensen: 'If it's not tested, it does not work'

Samstag, 9. Mai 2009

An hour after I finished my work on explaining and supporting one poor session, I got a call from my honoured colleague again.

'The session stopped doing any sorts in v$session_longops and also no more segments on the indices are written. What's going on?'

My first guess was 'Did it just finish?', but that was not the case. The table still seemed to be empty (so no commit yet). The session was still active, but SQL_HASH_VALUE was NULL. Also sampling v$session_wait showed diferent events, so the process was doing anything. We just did not know, what.

Even I didn't know, what's going on, I knew something is going on. I couldn't pin it down with the knowledge I have using plain oracle methods, so I decided to switch to another method. First OStackProf came to my mind, but unfortunately I was sitting on an Apple Computer at this moment, and I didn't want to care about VBS on Apple. Not a big deal at all, because after a short crawling through Tanles box of marvelous miracles, I grabbed os_explain.

Let's have a look what the process is doing.

After running

pstack 11460 | ./os_explain

some times (5 or 6 where enough) it was clear there is no ordinary execution plan running. We got 2 stacks ( with little changes in the end, but this didn't count). Putting these side by side for easier camparison:

To decipher these names (ok, in this particular case it wasn't too difficult, even for me) we had a look in Note:175982.1 - ORA-600 Lookup Error Categories. kcbgcur will be more or less Kernel Cache Buffer Current (don't know about the g) and ktundo is about undo and rollback management.

This still didn't answer all my questions, but what can a session be doing if it doesn't execute a SQL, (there is no sign for PL/SQL, java or similar currently executed by the session) and there is a lot of work performed in Cache and Undo? The next idea was: Rollback!

v$transaction came into my mind. And a little bit googling led me to this query:

Yes, USED_UREC was decreasing! So the poor session, which worked for mor than 12 hours now, did a rollback.

Some phone calls later, the operations-guy who started the whole script told us 'Yes, about noon there where some hanging situations on my PC, but Toad looks ok again, I'm just waiting for the script to finish'.

Poor guy, and lucky me for enabling Dead connection Detection in sqlnet.ora. Otherwise the whole statement would have run until the end and started the rollback THEN.

This was the last chapter in the odyssey of my little poor session. After the rollback finished, it just closed (as the client connection died before).

Some words at the end?

I assume, the script which caused the problem was never tested properly.

Even with limited knowledge, a more or less systematic aproach is possible.

There is a lot of free information and tools out there. Just try to know about them before you need them.

I should try to learn more about sessions in rollback.

No fancy Oracle-features where needed. No statspack, no AWR (as the DB is 9i - did I mention that?). Not even dtrace.

It was a great day for me, working with a DB again and not only talking about DBs in endless meetings.

Donnerstag, 7. Mai 2009

Today a single session made my day!In the later morning I got a call form a colleague. I would call him a kind of ApplicationDBA, even there is no such definition in our company. He asked my, why a particular session is doing a lot of sorts in gv$session_longops. He also tole me, what the session was doing:this script was started in the early morning by an operations-guy:

In fact, I did not know what's going on. But that's enough teason to start digging.And as the session did run for some time now, I was pretty sure it would continue to do so and therefore give me enough time to search and think. (Yes, some Services where unavailable, but he agreed not to kill any session or reboot an instance without knowing what's going on).

My first search was the actual SQL (v$session sql_hash_value => v$sql_text).It was something like

INSERT /*+ APPEND */ INTO MV ASSELECT .... FROMUSER.TABLE@LINK;

I told my colleague about this, but he replied 'Yes, but I checked the MV, it's empty'.Ok, this even I could explain by a simpe 'ASIC' - or in this phone call 'you will see the rows after the commit, and this can last'.This answer was acceptable, but nevertheless, the legitimate question was 'how long will it run; and wehre is it standing right now?'How to answer this?

v$session_longops

Does only give informations about the 'current' longop, not about the total transaction.

But as we assumed, there might be indices written right now (vsession_longops showed sort and sort merges), the next step was to check anything we can get about the indices.Even we could not access them directly, we could check for some footprints:

By sampling this query we saw some full populated indices, and one just growing. So even for us it was clear what's going on: the indices are calculated and populated.

The next question I had to face was 'why is the index-creation so slow?'Hmmm. Who can answer this?manual sampling vsession_wait gave some hints:some gc... events and latches on the buffer where top.(Did I mention, this is on a 2-node RAC? ;-) )

It was time to switch to the other node and check for running sessions which has also cache fusion waits, and could interfer with our session of the day.

After some sampling and searching (vsession, vsession_wait, later on vsql_text) and additional clicking in Grid control (yes, it can be useful!) I found a possible culprit:USER1 run a lot of queries on a view which contains USER.MV.I didn't proove, but I guess, Instance2 held the master for MV as there where 100 sessions hanging on the View. But I didn't care of - GV$GCSPFMASTER_INFO - I just wanted to get rid of all the cache-fusion waits until the MV is created.Some might have seen the solution for this yet, it's not hidden.Our Silver Bullet of The Day was a simple:

REVOKE SELECT ON MV FROM USER1;

(had to be run for several times, as there where DDL-timeouts...))

Afterwards, the index-creation run quite fast - fast enough for us.

It's not the end of the story, but enough for now,just to leave enough space for Part 2 ;-)

Some infos at the end:

Many parts I can not explain in more detail, simply as I didn't log everything I did and therefore have no exact queries, output or whatever. (Maybe this is the most important area to improve myselve!)

I also used Tanel Poders latchprofx and snapper to collect well sampled data about the session.