Disclaimer!

FatDBA or Oracle ‘Ant’ is an independent web-blog/site.The experiences, Test cases, views, and opinions expressed in this website are my own and does not reflect the views or opinions of my employer.

This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content.
Product and company names mentioned in this website may be the trademarks of their respective owners.

Posts Tagged ‘Tuning’

I see a lot of confusion, mix-ups and perplexity in between the DBAs on few of the metrics under ‘Instance Efficiency % section’ in AWR reports. Specially for one of the metric “Parse CPU to Parse Elapsd %”. In today’s post i will try to explain the metrics in detail which will help you to understand it in depth and clear the muddiness.

Few of the lines you might have read about this metric i.e.“we should always look for as low as possible numbers for this metric …”
“Try to achieve the impossible value of zero for this one “
– Even some of the Metalink notes are misleading too.

Scenario:

Below is the snippet from one of my test box.

Note: Just taking a look at the instance efficiency ratios can be very dangerous and i advise to first start with Load Profile, top 5 waits and there on …

Okay, so my definition of this statistic differ from what you judge after reading the name of the metric – This datum signals the delay/wait in parsing of SQL queries during the snap interval.
In our example the value is 1.37% this means that for every CPU second spend parsing we spent about 72.99 (100/1.37) Seconds of clock time. It can happen due to various reasons i.e Latch or any contention between the sessions etc.

The ideal value for this stat should also be 100% like rest of the ratios (Yes, that’s correct!!).

Let’s see how this value was calculated, what all it considers while deducing that final figure of 1.37%.
It takes the “parse time cpu/parse time elapsed * 100” to get the ‘Parse CPU to Parse Elapsed’ figure.

Statistic Total per Second per Trans
------------------------------------------------------------------------------------------------
parse time cpu 398 0.11 0.01
parse time elapsed 29,055 8.07 0.81

So, in short each time there is a drop in this metric from 100%, means the database was waiting for something which slowed down the parse times.
If you want to dig in deep then you should trace the session using 10046 tracing with level 8 to see where the other % of parse time is being spent and leaving this for readers to test.

Like this:

Few days while working on Shared Pool issue where we were getting ORA-4031 on one of the 11gR2 database i discovered something interesting and new regarding the SGA duration management. Here i wont discuss the problem that i faced and how we fixed but would try to show the architectural changes that has happened with 12c that can now fix these errors/issues.

Let me explain what are ‘Durations’ first. The shared pool is made up of a number of granules. The shared pool then split into sub-pools if you have a large enough SGA, and each sub-pool consists of a number of non-overlapping granules. In 11g each sub-pool also split into four sub-sub-pools known as durations.

What was there before 12c arrived —
Starting from Oracle 10g each sub-pool in SGA was divided in to four durations.
Let’s check the distribution by generating the Heap Dump for shared pool, here i used oradebug with level 2 (This provides you the full SGA Summary or you can try with level 2050 to get full summary with contents).

So above stats shows that we have three sub-pools [Sub pool (1,0), (1,1), (1,2), (1,3) ….. (3,0), (3,1), (3,2), (3,3)] of SGA Heaps with Four Durations each heap. And every duration has its own size, free space and reserved free space. This type of distribution possibly causes the ORA 4031 even when you have enough free space in other durations and this is what the actual cause in my earlier case, but here we won’t discuss how we fixed that.

Okay so now lets do the same with 12c database. Lets generate the Heap Dump for SGA in 12c database and see the distributions of durations here. Once again we will use the oradebug to dump heapdump with Level 2.

So, here i am with the maiden post of this year.
Today i will discuss about a script that might be handy and useful while you are investigating any performance problem with the database.
In fact this is one of my favorite script which i always considers to run at a point when we have identified the leading wait event and the time-frame of the spike or for any transient variation in system behavior. This script helps you to measure the source and contribution or the impact of any specific wait event.

The script is a join between ASH views and historical views (dba_hist_active_sess_history AND dba_hist_snapshot) and provides you stats based on Average Number of Active Sessions (AAS) were waiting on this event during the period and with what total contribution in terms of percentage.

Lets work on a use case.
Suppose you have a system with huge User IOs happening, as conformed by any report or tool (AWR, ASH, Via any Script, OEM etc.)

For example using AWR we found one of the User IO wait class event ‘db file scattered read’ was leading the ‘Top Foreground Wait Events’ charts on a specific time when we have the high User IO load on system.

Okay so result shows one of the SQL statement with SQL id 5av23g8w7f3ka was responsible for more than 45% of these waits.
So, now you have the evidence and can start troubleshooting this specific statement to reduce the USER IOs.

Like this:

Hi Fellas,
Starting from Oracle 12c there is a new feature added which collects the statistics when you perform the Bulk Loads when using any of the two methods:
– CREATE TABLE AS SELECT (CTAS)
– INSERT INTO … SELECT (Into an empty table using DPR or Direct Path Read).

Yup, stats were collected!
Same way stats will be auto collected during the other type of Bulk Load method (INSERT INTO .. SELECT).

There may be times when you want to disable this feature, situations like.
– Long/Huge Insert operations which is taking huge time on STATS GATHERING.
– With an extremely large dataset where you don’t want to collect stats.

In order to achieve that we have the option, with the use of a hint which instructs oracle to not gather table statistics.

Now, something interesting i would like to discuss …..Is there any other condition when the stats won’t be collected automatically except barring it using NO_GATHER_OPTIMIZER_STATISTICS Hint ?

Lets try to do some conventional bulk loading using INSERT INTO .. SELECT method.
To do some tests – I am intentionally commenting few of the columns both the tables. I’ve commented DATE_VAL column of newly created table TABLE1 and DATE_VALUE of the table selected SAMPLE.

Yes, the the stats were collected this time when we’ve included all the columns of the tables.
I haven’t seen any documentation on this restriction on this new feature of Oracle 12c. Hope Oracle adds this soon this to their documentation 🙂 …..

Would discuss a scenario that i faced some time back with one of the customer where one of the query (Part of Monthly application maintenance) started taking huge amount of time to complete.
During the analysis I’ve found it doing some extremely expensive Full Table Scans on a fat table when it shouldn’t. Let me elaborate what was the issue.

After reading the execution plan and the extended traces of Optimizer (10053) to understand the decisions made by the CBO.

Traces gave me a hint that the optimizer was able to accurately estimate the cardinality for all of the predicates as long as string is 32 bytes or less. But at one of the operation step in execution plan the estimations were wrong and crooked. I saw the CBO decided to ‘Return each of the row’ irrespective of strings and it was actually happening because the histogram contains only the first 32 characters of the string data.

So the histogram was considering only first 32 characters of the column value and the optimizer this particular value i.e “siebelm01-vip@2002:101:11f::101:1” is happening 5776 times in the table. So it ignored the index and used full table scan. Oracle will only consider the first 32 characters in the column in generating histograms and in such cases what we faced as a aftermath, the optimizer will make wrong decisions.

So as a immediate solution to the problem I’ve deleted the histogram on this specific column and this plan fixed this query plan change issue.

In my last post “How to fix SQL Plan issues using OUTLINE DATA.” where we learned how to fix wrong cost estimates by the CBO using OUTLINE DATA of optimal plan available. But there are few issues with the approach and only worked with ad-hoc statements, as failed to handle situations when

– How to deal with situations when the SQL is coming from bundled applications and cannot be modified ?
– How to look for less number of hints leading to better execution plan?
– How, if it’s not possible to use the SQL BASELINES.

As discussed in the same last post where we added the full list of OUTLINE DATA, it seems little complicated, dirty and disordered to add the full outline set to the SQL statement.
ex: select /*+ …full outline here…*/

So, how to add required set of HINTS to force the better plan for the SQL.

Lets do a test to explain this subject in detail.

SQL TEXT: select count(*) from dixit.test;Problem Statement: Query taking huge time to complete the COUNT of rows for TEST table and doing a FULL TABLE SCAN despite of Indexes.

Now after querying the DBA_HIST_SQLSTAT view i saw a total of 2 execution plans available and query is somehow picking the wrong plan or the costliest plan.
So, lets grab the OUTLINE data of the better plan first.

Like this:

Hi Mates,
Today would like to discuss about the ‘SQL Plan Stability’, specially situations when the performance of a SQL statement degrades and performs poor in one environment and working absolutely good in other spheres.
I experienced such issues a lot during past few years while working as a performance consultant for many of the projects, but recently i encountered similar issue and would like to discuss about the same.
In short the situation is given below for that problematic SQL

Same request they say takes in SIT 268 sec:
• I run in SIT again –> 500 secs
• I run also in DEV –> 20 secs
• I run in preSIT –> 5 secs

I see around 5 different PHV’s or plans coming for this statement and last two 18733286 and 2202641467 are the worst plans here, first one with PHV 1663132818 is the BEST among all others.
1663132818 is with very less elapsed time, CPU Usage, IO time, LIOs, Disk reads, Rows Processed and Cost.
** From DBA_HIST_SQLSTAT view.

Let’s collect the OUTLINE and QUERY Block details of the statement, and this can be collected usingselect * from table(dbms_xplan.display_awr(‘SQL_ID’, PHV, format => ‘ADVANCED’));

In order to fix or correct the cost estimates for CBO to adopt a good execution plan we can follow below steps.
• Run the script coe_xfr_sql_profile.sql as SYSDBA user providing the sql_id and the good Plan Hash Value (PHV)
SQL> START coe_xfr_sql_profile.sql SQL_ID PHV

• Above step generates a script named in the format: “coe_xfr_sql_profile_SQL_ID_PLAN_HASH_VALUE.sql” i.e. including the sql_id and plan hash value.
Example: “coe_xfr_sql_profile_ SQLID_PHV.sql”

• If you like to use the profile for all literals, you can modify the force_match parameter in the generated script from false to true: force_match => TRUE
This will ensure the profile will be enabled even when different literals are used in the SQL query.

• Run the generated script as SYSDBA user in order to correct the optimizer cost estimates used in the execution plan for the sql_id and encourage it to use the desired plan obtained from the stated plan hash value.

• Next we can check if profile is accepted successfullyselect SQL_ID, SQL_PROFILE,PLAN_HASH_VALUE from V$SQL where SQL_ID=’ SQL ID’;

• In case if profile is required to be disabled/dropped then do following steps.
The profile can be disabled or dropped as follows using the the SQL_PROFILE name returned from the query above:

Below is the execution plan for the SQL (Lets forgot about the behemoth elapsed time and Cost and Rows Processed in the plan for a minute 🙂 ) ….

So the above plan doesn’t show any estimations or Cardinality details what it considered during the creation of the plan, But starting from 10g we have GATHER_PLAN_STATISTICS hint. The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement.

These execution statistics are then shown next to the original Optimizer estimates in the execution plan if you use the function DBMS_XPLAN.DISPLAY_CURSOR to display the plan. You also have to set the FORMAT parameter to ‘ALLSTATS LAST’ (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>’ALLSTATS LAST’)).

Like this:

Hi Fellas,
Back and this time with some performance tuning scopes for Cassandra DB during the ‘Compaction’ process.
Before i proceed, would like to explain a bit about the compaction in Cassandra and what exactly is this and why a necessary evil …

Compaction in Cassandra refers to the operation of merging multiple SSTables into a single new one. Typically, compaction is done in a database for two primary reasons:

– To reduce the storage usage.
– To improve read performance by merging keys and obtaining a consolidated index.

For example, in Apache Cassandra, data files are merged periodically to form compacted SSTables.

There is a good chance of contention happening in database due to Compaction activity as the Compaction increases I/O contention on SSTable data read. Writing data in Cassandra database is generally fast and the write impacts may not be seen but reading data from SSTables will be slow in case when I/O contention increases due to compaction activities and degrades the performance of the database.

First would like to discuss how to identify the compaction related contentions on the database.
– We can use the “nodetool tablestats” or the old “nodetool cfstats” command to
monitor or watch-keep SSTables.
Below is a sample result from one of the Cassandra database server, here we need to check
– Check if the count is keep on growing, because that points out that there may be contention between reading SST
and the compaction process.
– Read generally slows down due to an obvious reason of data distributed or fragmented across many SSTs and
Compaction running continuous in the background.

Solution to the problem
1. First one is quite simple – Avoid merging of update/delete requests.
2. Reduce the frequency of in-memory objects (In Memtables) flush.

This can be done by increasing the size of the memtables to avoid or stop database to perform frequent flushes.
– Less number of flushes leads to fewer SSTs compaction.
– Less Compaction reduces the I/Contentions and this in turn improve reads.
– There are couple of parameters that you can adjust in your cassandra.yaml file to control the flushing.
i.e. memtable_flush_after_mins, memtable_throughput_in_mb , memtable_operations_in_millions.

3. One more solution but that only applies on systems where this stress in IO is not much frequent, we can reduce
the “thread priority” which reduces the IOs.
As lowering the priority slows down the compaction writes but only applies if it doesn’t happen frequently.

Recently during one Performance Problem i have attached one better SQL Profile to the statement which includes to add DOP (Degree Of Parallelism) to reduce the impacts of a definite FTS (Full Table Scan) but found one NOTE coming during the execution plan generation which reads“automatic DOP: skipped because of IO calibrate statistics are missing”