Coskan’s Approach to Oracle – What I learned about Oracle
https://coskan.wordpress.com
oratop from MOS
https://coskan.wordpress.com/2013/08/02/oratop-from-mos/ - Fri, 02 Aug 2013

This is just a pointer post to a tool which I discovered today: ORATOP, a tool to monitor databases in real time, like top on Unix flavours. It is pretty much like MOATS (the tool from Tanel Poder and Adrian Billington, with a RAC version by Jagjeet Singh), but with a slightly better output (at least for my taste).

It doesn’t need anything to be installed, which is the best part of the tool.

oratop – utility for near real-time monitoring of databases, RAC and Single Instance (Doc ID 1500864.1)

Sample output is like the below. It can be a good addition to your performance tuning toolkit.

Direct Path Reads Again
https://coskan.wordpress.com/2013/04/19/direct-path-reads-again/ - Fri, 19 Apr 2013

Note: This post is about diagnosing a real-life scenario caused by the direct path read decision change in 11.2.0.3, which is already covered by Tanel Poder in his awesome post Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven).
If you have already read it and know how to approach the scenario, you can easily skip this post and wait another year until I write another one while I'm working at the bank.

Story

The number of Oracle databases in our estate is high and they sit on shared storage arrays, which means that if one of them is naughty on I/O it has a potential impact on the others.
To be proactive, from time to time I run a bunch of queries against all of the databases in a loop, to check the wait profiles from dba_hist_active_sess_history and v$active_session_history
and to see if we can optimize some of the work and offload that unnecessary I/O onto unused CPU, by utilizing more memory or by getting rid of direct path reads when necessary.

The query is really simple and it works for my needs.
All I do in the first query is get count(*) from ASH grouped by event. This way I can get the overall wait profile within ASH.
The second query gets count(*) for wait_class I/O grouped by sql_id, sql_plan_hash_value, force_matching_signature and event. With this SQL I can get the top sql_ids for I/O waits (not the top by elapsed time, which is what really matters for end users).

I'm not saying this is the right approach, but practically it works for the purpose.

SQL (the RR columns are ratio_to_report percentages):

prompt ************************************
prompt **** ASH OVERALL WAIT PROFILE
prompt ************************************
-- how far back the in-memory ASH data goes
SELECT MIN(sample_time) min_ash_available, sysdate-MIN(sample_time) available_duration FROM v$active_session_history;
-- top wait events (CPU included) over the last day, with ratio-to-report percentage
select * from (
select NVL(event,'CPU') event, count(*) cnt,
round((ratio_to_report(sum(1)) over ()*100),1) rr
from gv$active_session_history
WHERE user_id<>0
AND sample_time<trunc(SYSDATE+1) AND sample_time>trunc(sysdate-1)
group by event
order by 2 desc
) where rownum<10;
prompt ************************************
prompt **** ASH I/O by SQL_ID
prompt ************************************
COLUMN force_matching_signature FOR 999999999999999999999999999
-- top I/O contributors by sql_id / plan hash value, with ratio-to-report percentage
select * from (
select
sql_id, sql_plan_hash_value, force_matching_signature,
NVL(event,'CPU') Event,
count(*) cnt,
round((ratio_to_report(sum(1)) over ()*100),1) rr
from gv$active_session_history
where
1=1
AND wait_class LIKE '%I/O'
--AND event IS null
and user_id<>0
AND sql_id IS NOT NULL
group by
sql_id, sql_plan_hash_value, event, force_matching_signature
order by 5 desc
) where rownum<30;

For one of the databases, one which is not looked after by me, the output was like this:

From this output we can say that, for the period I checked, overall sessions are 63% on CPU and 20% on direct path read. If we check the ASH I/O we can see that the same SQLs with different plans (804t23t68dbmw, 9wthcvamy608f) contribute in total 77% of that 20% direct path read. The next step is to check how much I/O per run the SQL (804t23t68dbmw) is doing.

131MB per run (PIO_MB_PE) looks small at first, but if you check the total reads and executions it looks a bit scary (note: I use I/O_INTER_GB as we have Exadata environments as well; in this case it is not relevant). I checked how many times a day it runs and how much in total is being read.

It looks like our table (all partitions) is in the cache: 15.2K blocks out of a total of 15.3K blocks.

SQL> select count(distinct block#) from v$bh
2 where objd in (select data_object_id from dba_objects where object_name='PROBLEM_TABLE' and object_type like 'TABLE%');
COUNT(DISTINCTBLOCK#)
---------------------
15260

So, the table is small enough to fit in the cache and it is already in the cache, so why are we scanning it via direct path read?

To know the answer you need to be a good blog follower like me and immediately recall
Tanel's awesome post "Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)",
which describes a change new in 11.2.0.3, which is our version.

To make sure this behaviour change was the cause, I ran the test with _direct_read_decision_statistics_driven=TRUE (the default) and then FALSE.
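The comparison was along these lines. This is a minimal sketch: the FULL hint, the session-level parameter change and the table name PROBLEM_TABLE are illustrative of the test, not the exact statements I ran.

-- default: the direct path read decision is driven by optimizer statistics,
-- which are stale after the nightly truncate, so the scan goes direct
alter session set "_direct_read_decision_statistics_driven" = true;
select /*+ full(t) */ count(*) from problem_table t;   -- waits on direct path read

-- pre-11.2.0.3 behaviour: the decision is driven by the actual segment block count
alter session set "_direct_read_decision_statistics_driven" = false;
select /*+ full(t) */ count(*) from problem_table t;   -- buffered reads (db file scattered read)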

It looks like the table partitions are truncated after the nightly stats gathering and then start to get loaded until the next truncate.

When we checked with the business, they also confirmed the behaviour.

So what is next?

As the filter is on the partition key, which needs full partition access, indexing is not a solution.
Increasing the SGA would help, but why would we, if the table can fit in the cache most of the time (until it grows past a certain point during the day)?
Changing the parameter session-wide is not possible via a trigger.
Changing the parameter for the whole database is also not something we want to implement.

The last option is to have representative statistics gathered in the database when the table is about half full, and then lock them (or set the stats manually, like Tanel does).
This way we can avoid using undocumented parameters, and we still get buffered reads up to a certain point throughout the day (Oracle may decide to change the path if the table gets bigger in size).
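A minimal sketch of that last option, assuming an illustrative owner name (the real numbers would come from the half-full state of the table):

-- Hedged sketch: gather stats while the table is roughly half full, then lock them
-- so the nightly maintenance job cannot overwrite them (APP_OWNER is illustrative).
begin
  dbms_stats.gather_table_stats(ownname => 'APP_OWNER', tabname => 'PROBLEM_TABLE');
  dbms_stats.lock_table_stats  (ownname => 'APP_OWNER', tabname => 'PROBLEM_TABLE');
end;
/
-- Or, as in Tanel's post, set the block count manually instead of gathering:
-- exec dbms_stats.set_table_stats(ownname => 'APP_OWNER', tabname => 'PROBLEM_TABLE', numblks => 15300);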

Well, that's about it. Finally I managed to write a blog post after 11 months, right before a year ended (banking is killing productivity and increasing spam comments).

I have to admit, when the scripts are ready it takes about 10 minutes to sort the problem, but it took 2 hours to share it, so I hope you enjoyed it.

Morals of the story: check your systems regularly, read blogs, and god bless Tanel :)

Plan stability through Upgrade to 11G - Why is my plan changed? - Auto Adjusted Dynamic Sampling
https://coskan.wordpress.com/2012/05/31/plan-stability-through-upgrade-to-11g-why-is-my-plan-changed-auto-adjusted-dynamic-sampling/ - Thu, 31 May 2012

Today's story is about how dynamic sampling can affect your plan stability during an upgrade, and how you can recover from it as long as you know what you are doing.

A little bit of background: I'm currently in the middle of a migrate-and-upgrade cycle for a 12TB warehouse, from 10.2.0.3 on Solaris to 11.2.0.3 on Linux.
Plan stability on this system is maintained by locking the main driving table statistics and hinting the queries in case the CBO does not pick the right driving table.
The system uses temporary tables a lot for staging operations, and they rely on dynamic sampling level 2, which is enough for them. They also use parallelism for the heavy batch operations. The good part of the system is that, since they use partitioning heavily, they can simulate a close-to-real workload on a cut-down version of the database. What we are doing is taking the cut-down 10.2.0.3 export from Solaris, importing it into 11.2.0.3 on Linux, and running the batches for the close-of-business date with the same memory and other settings. The initial load test was done with OFE (optimizer_features_enable)=11.2.0.3, and we saw many parallel queries start to suffer. Our aim was not to check what was slow, just to collect the necessary data, so we did not tune anything. Then we did another test with OFE=10.2.0.3 and saw that 10.2.0.3 did way better for the parallel queries, but it also suffered from other issues which are not the subject of the current post. We have many other runs with different options for OFE and new stats, but they will also be covered later if I come up with something interesting. Today's story is why our parallel queries suddenly went bad on OFE=11.2.0.3.

After this background now I can start the actual story.

Once I checked the plans I saw something interesting: Oracle had suddenly started to use dynamic sampling level 7 without any manual setting. I asked about it on Twitter and thankfully Timur Akhmadeev pointed me in the right direction in the manual (RTFM rocks).

About where dynamic sampling is used

Oracle 10GR2 documentation says

14.5.6 Estimating Statistics with Dynamic Sampling
This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter.
For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2 or higher.
The default value is 2. See "Dynamic Sampling Levels" for information about the sampling levels that can be set.

Oracle 11GR2 documentation says

13.6.2.2 When the Optimizer Uses Dynamic Sampling
During compilation, the optimizer decides whether to use dynamic sampling based on a number of factors,
including whether the statements use parallel processing or serial processing.
For parallel statements, the optimizer automatically decides whether to use dynamic sampling and which level to use.
The decision depends on the size of the tables and the complexity of the predicates. The optimizer expects parallel statements to be resource-intensive,
so the additional overhead at compile time is worth it to ensure the best plan. The database ignores the OPTIMIZER_DYNAMIC_SAMPLING setting
unless set to a nondefault value, in which case the value is honored.
For serially processed SQL statements, the dynamic sampling level depends on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter and
is not triggered automatically by the optimizer. Serial statements are typically short-running, so that any overhead at compile time
could have a huge impact on their performance.

Did you spot the difference? In 10gR2 the decision on the level depends completely on the optimizer_dynamic_sampling parameter (at system, session or hint level),
however in 11gR2, for parallel statements, the optimizer "automatically" decides whether to use dynamic sampling and which level to use.

Since 2005, when my Oracle career started, I have learned one thing: when something is "automagically" adjusted and decided, there is always a big possibility of less stability.

Below is the difference in a simple case where I create a table, do not gather statistics, keep optimizer_dynamic_sampling=2, and run explain plan with a parallel hint.
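A rough reproduction of that kind of test (the table name and parallel degree are illustrative, not the original test case):

-- create a table without gathering statistics; optimizer_dynamic_sampling stays at its default of 2
create table ds_test as select * from dba_objects;

-- parallel statement: on OFE 11.2.0.1 and above the plan Note section reports an
-- auto-adjusted dynamic sampling level (level=6 in my case) instead of the default 2
explain plan for select /*+ parallel(t 4) */ count(*) from ds_test t;
select * from table(dbms_xplan.display);

-- serial statement: the Note section stays at the level implied by optimizer_dynamic_sampling
explain plan for select count(*) from ds_test t;
select * from table(dbms_xplan.display);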

As you can see, Oracle adjusted my default dynamic sampling level to 6 on OFE 11.2.0.3 when the statement is parallel, and keeps it at level 2 when the statement is serial or OFE is lower than 11.2.0.1. Since this is a simple case there is no effect, but when you have complex joins for which you are sure about the plan, you can suddenly start to see changes and get heavily hit by the feature. I can assure you I had the issue with tables that have statistics as well, so it is not only the case for tables that need dynamic sampling.

Since I have introduced you to the feature, the next question is how we can control it. I think the optimizer guys made a mistake here and did not give any control via an optimizer parameter. How do I know? If you follow what I have done in the optimizer features post you can see there is no way of controlling this feature via a parameter. So we only have optimizer bug fixes as a controller.
Let's check the bug fixes related to dynamic sampling.

For this I use the SQL below (bugfix.sql):

column sql_feature for a34
column ofe for a10
column description for a64
-- list optimizer bug fixes matching a keyword, newest optimizer_feature_enable first
select bugno, optimizer_feature_enable ofe, sql_feature, description, value
from v$system_fix_control
where (sql_feature like upper('%&1%') or upper(description) like '%&1%')
order by 2 desc nulls last, 1 asc;
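The fix control that came back from this query for the parallel dynamic sampling behaviour can then be switched off at session level. This is only a hedged sketch: the bug number below is what I recall for this feature, so verify it against the query output on your own version before relying on it.

-- Assumption: 7452863 is the "dynamic sampling for parallel statements" fix control
-- reported by the query above; setting it to 0 disables the automatic adjustment.
alter session set "_fix_control" = '7452863:0';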

As you can see, dynamic sampling is back to the default level 2. You can use this fix control until Oracle comes up with a better idea, for your parallel queries where you want plan stability and you are happy with your default dynamic sampling level.

Attention: I must also warn about a slightly unexpected behaviour; this automagical adjustment works a bit weirdly. When I set dynamic sampling to a level higher than 2, Oracle accepts the new level as the adjusted value, like below.

Update: After reading the blog posts Dominic shared in the comments, I'm now sure the feature is only active with the default optimizer_dynamic_sampling=2.

Carlos Sierra started blogging
https://coskan.wordpress.com/2012/04/25/carlos-sierra-started-blogging/ - Wed, 25 Apr 2012

Just a quick post to let people know that Carlos Sierra has started blogging in his Oracle SQL Tuning Notepad. He is the developer of many tools like "coe_xplain.sql, followed by profiler.sql, Trace Analyzer (TRCANLZR), SQLTXPLAIN (SQLT) and SQL Health-Check (SQLHC)", which are available in MOS.

I have never seen or talked to him personally, but I have been a big fan and user of his tools for a long time, and I'm sure he will write plenty of tuning stories. I would like to say welcome to the Oracle blogosphere and wish him good luck with his blogging journey.

When dbms_xplan.display_sql_plan_baseline fails to show the plan
https://coskan.wordpress.com/2012/04/11/when-dbms_xplan-display_sql_plan_baseline-fails-to-show-the-plan/ - Wed, 11 Apr 2012

Last week my colleague (Paul Reed - my new content provider with issues on his databases) had a problem where he needed to identify which baseline was using an index, on a plan where a collection table (like SELECT column FROM TABLE(:B1)) was used.

When he tried DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE he got "ERROR: An error has happened, error code: 22905", which is related to the bug below.

When he asked me if there was a workaround, I was under the impression that dba_sql_plan_baselines has the outline information, so I asked him to check if he could get the outline info.
He checked, and made us realise the truth that the CBO developers did not make the outline information available to the public within the dba_sql_plan_baselines view (I'm still wondering why not??).

We checked the other options (dba_hist_active_sess_history.SQL_PLAN_OPERATION) and found out what we needed, but this was not enough for me.
While I was thinking about how we could get the outline, my colleague managed to get the answer accidentally, which is the main reason I'm writing this blog post.

To keep the baselines safe he was backing them up to a staging table with the DBMS_SPM.pack_stgtab_baseline procedure,
and when he selected from the staging table he saw that the comp_data column is actually the outline information.

Here is how he did it, with a little bit of polishing.
(I will be using the baseline generated in this post - dbms_xplan.display_sql_plan_baseline works for this one, but that is not important for the purposes of the post!!)
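A minimal sketch of the steps, assuming an illustrative staging table name and SQL handle (not the actual values from the original output):

-- 1. create a staging table and pack the baseline into it
exec dbms_spm.create_stgtab_baseline(table_name => 'SPM_STAGE', table_owner => user);

declare
  l_plans pls_integer;
begin
  l_plans := dbms_spm.pack_stgtab_baseline(table_name  => 'SPM_STAGE',
                                           table_owner => user,
                                           sql_handle  => 'SQL_1234567890abcdef');
end;
/

-- 2. the outline hints are in the COMP_DATA column of the staging table
select comp_data
from   spm_stage
where  sql_handle = 'SQL_1234567890abcdef';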

It is simple for this query, but it can be hard to guess the plan from this output. What you can do is paste the outline information into the SQL as hints, like below.
The only line you need to remove is IGNORE_OPTIM_EMBEDDED_HINTS, to avoid ignoring hints.
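Something along these lines, with purely illustrative outline hints (the real ones come from COMP_DATA):

-- hypothetical example: outline data from COMP_DATA pasted back as a hint block,
-- with IGNORE_OPTIM_EMBEDDED_HINTS removed so embedded hints are honoured
select /*+
         BEGIN_OUTLINE_DATA
         INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."ID"))
         OUTLINE_LEAF(@"SEL$1")
         ALL_ROWS
         DB_VERSION('11.2.0.1')
         OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
         END_OUTLINE_DATA
       */
       *
from   t1
where  id = :b1;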

As you can see, with this method we can find the outline for the baseline and, using that outline, the plan inside the baseline (99% of the time - the SQL should be the same and so should the underlying objects).

You can use this method (until Oracle provides the outlines) when you hit the bugs below as well.
As long as you have a baseline it can be used. I am also thinking about a couple of other uses, but I will keep them for future posts, so stay tuned.

Adaptive Cursor Sharing with SQL Plan Baselines - Bind Sensitiveness
https://coskan.wordpress.com/2012/03/23/adaptive-cursor-sharing-with-sql-plan-baselines-bind-sensitiveness/ - Fri, 23 Mar 2012

Today I saw a post from my friend Dominic Brooks (orastory), with whom I work at the same place on different environments (I wish he were working on the environments I look after - together we could have made a great performance team :(). His post is about how Adaptive Cursor Sharing and SQL Plan Baselines work together. The post looks long but it is easy to read and enlightening about the issue, with one small addition/correction/observation needed. I think he covered it well enough but finished his tests a bit too soon. I will not repeat what he has done, but I need to cover the additional observation/correction part. After the introduction, about half way through, he wrote a section "How then do ACS and SQL Plan Baselines work together?" (I wish he used some markup for subtopics - that's why we have bold and headings, Dom). At the end of that section he says "We're using one of our baseline plans but we've lost our bind-awareness", which is not the complete truth according to my tests on 11.2.0.1.

I assume the table is created, the index is created, stats are gathered, two SQL plan baselines are generated and the cursor is flushed, as below.

The statement is not bind aware, which is "normal", as the optimizer group already wrote in their optimizer blog, and the cursor is still shareable.
The optimizer group said:

"Oracle monitors the behavior of the queries, and determined that the different bind values caused the data volumes
manipulated by the query to be significantly different. Based on this difference, Oracle "adapts" its behavior so that
the same plan is not always shared for this query.
Hence on the next execution, a new plan is generated based on the current bind value."

Now, on the second run of the table scan, a new cursor is generated again, which is again bind aware and also uses the right full table scan baseline. Please also look at the first cursor, which is not shareable anymore.

Oracle is doing the right thing by using the second child (child number 2) with the right baseline.
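To watch this yourself, a query along these lines against v$sql shows the bind-sensitivity flags per child cursor (a generic check, not Dominic's or my exact script):

-- show each child cursor's bind sensitivity/awareness and whether it is still shareable
select child_number, plan_hash_value, executions,
       is_bind_sensitive, is_bind_aware, is_shareable
from   v$sql
where  sql_id = '&sql_id'
order  by child_number;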

Long story short, I personally think they work perfectly fine together, but I also wish Oracle gave an option to hold this runtime monitoring info in SYSAUX, for environments where people can accommodate more data in SYSAUX. That would save a lot of time for the initial loads.

To do a fair comparison I needed to do the tests on 11.2.0.1. I don't have 11.2.0.2 right now, but I'm assuming it should be the same (I think I have lost my appetite for installing a new Oracle version just for the fun of it).

Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backups (on Exadata!!)
https://coskan.wordpress.com/2012/02/06/reduce-transportable-tablespace-downtime-using-cross-platform-incremental-backups-on-exadata/ - Mon, 06 Feb 2012

Just a quick note to point to a MOS note which I found out about yesterday. I already tweeted about it and it has been retweeted many times by my followers, but for a wider audience I decided to write a blog post as well.

As you may already know, to migrate databases between different endian formats, using RMAN transportable tablespaces is an option, but for this option to work you need downtime, since the tablespaces need to be read only, and this downtime can be of very long duration depending on the data size.

To sort this problem for migrations to the Exadata platform (the method is only supported on Exadata), Oracle has now brought incremental backup capability to TTS. Until Oracle makes this a public document (or somebody posts about the steps) you need MOS access to see the method.

In my humble opinion, Oracle had better decide to support this on other platforms too, since the method should not have anything Exadata-specific about it other than being supported there (I don't know of any RMAN method which needs Exadata-specific features).

Session Snapper on Duty 2 - (how to support your answers, with proof)
https://coskan.wordpress.com/2012/01/13/session-snapper-on-duty-2-how-to-support-your-answers-with-proof/ - Fri, 13 Jan 2012

Warning: This long post is for educational and quality purposes, especially targeting junior DBAs and DBAs who do not know how to use Session Snapper (the greatest Oracle tuning tool yet) by Tanel Poder. If you feel comfortable with Snapper and you think your relationship with your clients just rocks, then there is no need to read the post, but you may still give it a go.

The story started with a long email thread forwarded by a developer, asking the question below for a system I am responsible for but do not feel perfectly comfortable about how things really work in.

Hi Josh (note to reader: Coskan is pronounced as Joshkahn so I use Josh at office)
Any ideas on how we can work with dbas to find out why our database is only committing at a range of
2-30 sec for 100 commits instead of the ideal 100 commits a sec for a third party process?

I answered

From my point of view
I will be requesting
1- The time interval when this process runs
2- The SID information, if you have it, for when this process runs
3- If it gives the same results on UAT as well, then can we run the operation while tracing is on (which needs a Change Request - hassle on production)
Then we can continue to understand what the problem can be

Developer replied

It runs 7/24 and UAT testing is not possible since we can't point the feeds to UAT but username is USER_XXX
if it helps.

Long story short, all I understood was that, in the simplest words, they want to be able to process more data in the same amount of time, so I replied with the long mail below.

BEGINNING OF THE EMAIL

I did some sampling on your sessions to understand what is happening and I think I can come up with rough conclusions about improvement suggestions.

When we check the stats, you have done 326 commits in 500 seconds, which makes 0.65 commits per second, but if we take the 180 seconds of idle events (SQL*Net message to client + SQL*Net message from client) out, it is 326 commits in 320 (500-180) seconds of active time, which is roughly 1 commit per second.

When we look at the wait profile we can see 276 seconds of "db file sequential read" (physical read) over the 320-second non-idle period, and 36 seconds of "log file sync", which we will come back to later.

If we focus on the "db file sequential read" we can see it is 86% (276/320) of the active time, and there are 3 main contributing SQLs for this case.

There is nothing much we can do about the index unique scans other than changing the table structure, but I think there is room for improvement on the index range scan for sql_id ggjkmtv8cxxh3, a delete statement which is waiting on db file sequential read for 31% (500*20% / 320) of the active time.

The reason the delete is doing an index range scan is that we are trying to access the 3-column composite index PK_TABLE_T by using only 1 column, which is COL1. If the business logic lets you give the other 2 columns as filters as well, we will get much more throughput with less LIO and PIO.
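In other words, something like the following (the table and bind names are illustrative; the real index is PK_TABLE_T on three columns):

-- current access: only the leading index column is supplied, so the delete
-- has to do an INDEX RANGE SCAN on PK_TABLE_T
delete from some_table where col1 = :b1;

-- desired access: all three index columns supplied, so the delete can do
-- an INDEX UNIQUE SCAN and touch far fewer blocks per row
delete from some_table where col1 = :b1 and col2 = :b2 and col3 = :b3;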

This time it is 94 seconds idle with 406 (500-94) seconds of processing time, which gives us 0.84 (343/406) commits per second, and 392 seconds of wait
(364 "db file sequential read" + 28 "log file sync").

When we look at the SQLs below, this time we can see that 160 (500*32%) of the 364 seconds (43% of active time) is being spent again on an index range scan (sql_id 29zz9n1b9sppr) on a delete statement, again due to the lack of composite index column filtering.

I again believe we can improve the access if we are able to give all columns as filters and maintain an index unique scan.

I think I managed to make the index range scan issue clear (for more information see the bonus material below). Now let's see what the log file sync is all about and why we are seeing it.

The log file sync event has the definition below in the Oracle documentation.

log file sync

When a user session commits, the session’s redo information needs to be flushed to the redo logfile. The user session will post the LGWR to write the log buffer to the redo log file. When the LGWR has finished writing, it will post the user session.

Wait Time: The wait time includes the writing of the log buffer and the post.

Parameter   Description
buffer#     The number of the physical buffer in the redo log buffer that needs to be synchronized

Potential causes

Wait Event      General Area           Possible Causes                    Look for / Examine
log file sync   I/O, over-committing   Slow disks that store the online   Check the disks that house the online redo
                                       logs; un-batched commits           logs for resource contention. Check the
                                                                          number of transactions (commits + rollbacks)
                                                                          each second, from V$SYSSTAT.

The first rule of tuning log file sync is to do fewer commits, which means instead of committing after every DML or every single operation like

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
commit;
end loop;
end;
/

you should do the commits in a batch, like

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
end loop;
commit;
end;
/

To prove how you are doing commits, I just get the SQL statistics before, then take a sampling of the session, then get the SQL statistics after, and check whether the number of user commits during the sampling interval matches the increase in the SQL executions.

----note to blog reader: this is how I did it, no rocket science (once you arm yourself with the right tools), just run the block as a whole
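In outline, the check looks something like this (the SID and sql_ids are placeholders; the sampling itself was done with Tanel's snapper.sql):

-- execution counts before the sampling interval
select sql_id, executions from v$sql where sql_id in ('&sql1', '&sql2');

-- sample the session for the interval, e.g. with snapper
-- @snapper stats 300 1 &sid

-- execution counts after the sampling interval
select sql_id, executions from v$sql where sql_id in ('&sql1', '&sql2');

-- if the "user commits" delta reported for the interval equals the executions delta,
-- the application is committing once per execution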

As you can see, both SQLs increase by 12 (after minus before execution counts) and the user commits statistics delta is also 12, which means there is a commit per execution, and that leads to the log file sync event.

After proving that the app is not doing batch commits, I wondered whether this is something avoidable or not in terms of business logic. Doing batch commits will definitely help to increase your overall throughput, plus overall system health.

So, in my humble opinion, we have two options, without me seeing any of the code logic.

1- Providing unique index access for the delete operations by giving all columns of the index
2- Decreasing the number of commits and try to do batch commit.

Timings for the tests are as follows, for deleting 100K-row sets, as you can see:

                         Elapsed   Improvement
nonbatch-commit-range       84          0%
nonbatch-commit-unique      47         44%
batch-commit-range          55         34%
batch-commit-unique         35         58%

END OF THE EMAIL

Note to blog reader (in case you are still reading): I hope you found something interesting about how I used the right tool at the right time,
and how I answered the development questions without talking bollocks, but with numbers and proof. One thing I need to admit: there were only 2 sessions running this operation, which made my life easy.

If you wonder about the duration of the whole diagnosis: it took 15 minutes in total to find out what was going on and 1 hour to write the mail, but that 1 hour saved at least 3-4 unnecessary email ping-pongs, just because nobody could prove their facts with numbers.

Quick tip: if you are sending any output to anybody, use the Courier New font and make everybody's eyes happy.

On demand AWR Report Generator, for Many Node Cluster
https://coskan.wordpress.com/2011/11/18/awr-generator-on-demand-for-many-node-cluster/ - Fri, 18 Nov 2011

update (10/10/2012): I have changed the script to make it work better with hourly requests as it was causing issues. Thanks to Dominic Brooks for the trick.
update (11/10/2012): Added ADDM and Global report generation.

Since I got a turbo boost (over 1000 hits a day - wooow) from Tanel Poder the Great, I'd better write something to deserve the fame. This one is a real quick one which made my life easier.

In my second week at my current company, I had to cover Exadata for 2 weeks, which was a complete nightmare due to some problems. I had to work with Oracle Support a lot, and most of the time they asked for AWR reports from 8 nodes for an x-hour duration. The first day I was not armed with anything automatic, so I tried to do it manually, but it was a proper nightmare for 8 nodes. Then I said this is a complete rubbish way of wasting my time and I need to automate the process, otherwise my life will be miserable. Finally I came up with something after a couple of attempts.

The script below is the result of these automation attempts.

update (11/10/2012): the not-equal signs may have been corrupted during paste in (next_snap_id<>0 AND previous_snap_id<>0)


Nothing rocket science, just using dbms_workload_repository.awr_report_html with some junior-level analytics.

All you have to do is give the date, the interval (only tested for half-hour or one-hour intervals) and the folder to save to, and the rest will be done by the script.

It will create a gen_awr.sql script which you run, then go have a coffee, and when you come back your AWR reports are ready to be sent.

so the sequence is

@gen_snap ---assume you save it with this name
@gen_awr
--coffee break
--zip+upload
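The core of the generated gen_awr.sql is just spooling dbms_workload_repository.awr_report_html per instance and snapshot pair. A minimal hand-written sketch only (the dbid, instance number and snapshot IDs are placeholders; the real script derives them from dba_hist_snapshot for the requested window):

set pagesize 0 linesize 8000 trimspool on termout off
spool awr_&inst_num._&begin_snap._&end_snap..html
select output
from   table(dbms_workload_repository.awr_report_html(&dbid, &inst_num, &begin_snap, &end_snap));
spool off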

I would like to automate it further by doing the same for the RAC comparison report or a group of instances, but I think I am too lazy, or maybe too busy with some other non-automatable rubbish (I also had some issues with the RAC comparison report formatting from the DBMS_WORKLOAD_REPOSITORY package, so I gave up on that one).

Hope this tiny script saves your time as much as it saves mine.

Where is the sql_id of active session?
https://coskan.wordpress.com/2011/11/11/where-is-the-sql_id-of-active-session/ - Fri, 11 Nov 2011

In this post I will give information on quick ways to find the sql_ids of sessions for which you can't see any entry when you look at v$session.

I usually see this behaviour when a SQL is burning CPU in a nested loop and, after a long time of running, the sql_id column is suddenly not updated anymore. When I tweeted about it, Tanel Poder answered (if I recall it right - damn Twitter does not have a proper search and history) something like "this happens when Oracle does too many recursive calls and loses track of the sql_id" (Tanel, if you are reading, please correct me if I am wrong about what you said).

On Metalink there are two notes I could find

Reason for Apparently Incorrect or Null Sql_id in v$sql [ID 406452.1]

This is caused by an architectural change in the 10g release: from 10.2 onwards, the v$sql columns (sql_id, sql_address and so on) will only show the top level user cursor, if any (ie it will not display the recursive cursors),.
When using a remote procedure call, such as when a rpc call from a client (eg forms client) requests that a plsql object is executed on the server, there is no top level sql and, consequently, there is no sql_id in v$session.
BUG:5146994 - CANNOT FIND RECURSIVE SQL IN V$SQLAREA FOR A RUNNING SNAPSHOT REFRESH JOB
BUG 5724661 Abstract: WRONG VALUES FOR SQL_ID IN V$SESSION

On Null SQL_ID For Active Sessions In V$Session [ID 1343503.1]

From 10.2 onwards sql_id will only show the top level user cursor ( if any).
Sometimes,Session does a plsql rpc call ie a client ( eg forms client) can request a plsql object to be executed at the server. In this case there is no top level sql and as such we don't see the sql_id in v$session.
Currently, oracle don't store the current sql thru any v$ views. The only way to find that is thru event 10046.
This is covered in Unpublished bug 5528670: "NULL VALUES FOR SQL_ID, SQL_HASH_VALUE, SQL_ADDRESS AND OTHERS IN V$SESSION", Which was closed as "Not a Bug".

When I have this issue I sometimes get it even for a SQL which I run from SQL*Plus without a PL/SQL call, so I think the limitation described in these notes is a bit misleading, or I am interpreting the information wrong. Anyway, finding the reasons needs too much tracing, research, bla bla bla, which you will never have time to go through when you hit the issue on a production system, so I'd better cut to the chase and start explaining how to find the sql_id.

The notes say there is no v$ view, which is a bit misleading, and they also say 10046 is the only way to find it, which is another documentation error, ignoring method 2 below.

On 10g there are 2 ways that I know will usually work, one of which needs a tuning and diagnostics pack licence and the other does not.

On 11g there is one more way, which again needs a tuning and diagnostics pack licence.
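The licence-needing way relies on checking the long-running cursors in gv$sql together with ASH. A minimal sketch of the gv$sql side (columns chosen for illustration, not the exact query behind the original output):

-- cursors that are still being executed right now, longest running first
select inst_id, sql_id, plan_hash_value, users_executing, executions,
       round(elapsed_time/1e6) elapsed_seconds, last_active_time
from   gv$sql
where  users_executing > 0
order  by elapsed_time desc;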

The elapsed times plus the users_executing column tell us that these SQLs have been running for a very long time, and they are 99.999% the ones we are looking for.

There are tiny challenges with this approach, but nothing too challenging.

I have used it a lot in RAC environments, on a system where multiple users and child cursors are also running,
so if you are careful enough to compare apples with apples (same instance, same plan hash value, right session serial, longer elapsed time cursor) you are very unlikely to be misled.

The code used for last_sql is like below (modified a bit to be RAC aware).

select inst_id, sample_time, session_id, session_serial#, sql_id from (
  -- most recent ASH samples for the session that still carry a sql_id
  select inst_id, sample_time, session_id, session_serial#, sql_id
  from gv$active_session_history
  where sql_id is not null
  and session_id = &1 and inst_id = &2
  order by sample_time desc
) where rownum < 10;

Method 2 - Oradebug dump errorstack (10g-11g, no licence needed) - the most reliable and certain way; it is 100% accurate, and the good part is that it does not need any input from me, since Tanel Poder has already covered how to go through an error stack in his post on how to read errorstack output.

I tried 10046 and never saw any sql_id information written during the 10-minute period I gathered the trace, so I'm not 100% sure whether it needs the whole call to finish before it writes the sql_id information. I personally won't use 10046 for this case.

Method 3 - SQL Monitoring (11g onwards - licence needed). This is my favourite; it is damn quick and 99.9% of the time it works, provided the SQL was running long enough to get monitored before it lost track. It is damn simple, like this piece of code:

select distinct inst_id,sid,status,sql_id,sql_plan_hash_value,sql_child_address,sql_exec_id
from gv$sql_plan_monitor
where sid in (&1) and status='EXECUTING';

The second sql_id, 2kgnch9h5hbkz, is from the PL/SQL parent call which was calling 3p1b93fq81x0f, so they both get monitored by v$sql_monitor.

Apart from all these, if you have a diagnostics pack licence you can always use OEM/Grid Control to check back through the session's history, but compared with querying v$sql_monitor it is as slow as driving an electric car when you have a Ferrari.

As you can see, there are more ways than those listed if your session loses track of its sql_ids. If you have more ways (especially with tracing - like dumping the SQL, if there is an option like that) I would really like to hear about them.

Update after first comment

Method-4 Using v$open_cursor

In the first comment Michael Dinh mentioned that he uses Toad's open cursors option to hunt down the SQLs, but he was not sure about SQL*Plus.
I had a chance to check on one of our systems. It looks like v$open_cursor gives us information which is not as clear as the other methods, but it can still be very
useful for Standard Edition or for users without the diagnostics pack, and it can also be easier than an error stack, which usually needs a change request on PRD systems.
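A minimal sketch of the kind of check (the SID is a placeholder):

-- cursors the session currently has open; cross-check the candidates against
-- what you already know about the workload to pick the one actually running
select sid, sql_id, sql_text
from   v$open_cursor
where  sid = &1
order  by sql_id;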