Introduction
Although Oracle is usually smart enough, it is always helpful to double-check. Even in cases where you expect Oracle to be aware of NULLs and to apply the necessary filters itself, the result can be unexpected.
Below are several cases where you would benefit from a manually added filter that might look unnecessary and redundant.

This means that the output may not be in chronological order. This is not specific to Digger; it is how DTrace works.

When a DTrace script is executed, there are two parts to DTrace: the in-kernel part and the DTrace consumer process in userland. When a DTrace probe fires, the traced data is placed in a per-CPU buffer. Periodically, the DTrace consumer reads the buffers (in round-robin fashion) and processes the data for final output. While the data from any single CPU enters its buffer in order, the probes fire asynchronously with respect to all the CPUs.

If probe1 fired on CPU3, then probe2 fired on CPU2, and then probe3 fired on CPU1, it is possible to get output such as:
probe3
probe2
probe1

The most common case is when a process (thread) migrates from one CPU to another and the DTrace consumer reads and prints the current CPU's buffer before the previous one. It does not happen every time, but you can see it occasionally.

So, do not panic if some part of the output is not chronological in a multi-CPU environment. It is expected.

If the output order is important, it is strongly recommended to print a column such as a timestamp or relative timestamp, so you can be sure the output is chronological, or else fix the order manually or with tools like sort(1). The CPU# column is also helpful because it shows where a process (thread) migrated from one CPU to another.
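For example, assuming the trace output has a numeric timestamp in the first column (the exact Digger output format may differ; the sample data below is made up for illustration), sort(1) can restore chronological order:

```shell
# Hypothetical out-of-order trace output: "<timestamp> <CPU#> <probe>".
cat > trace.out <<'EOF'
3001 1 probe3
1001 3 probe1
2001 2 probe2
EOF

# A numeric sort on the first field restores chronological order.
sort -n -k1,1 trace.out
```

After sorting, the probes appear in firing order (probe1, probe2, probe3) regardless of which CPU buffer was read first.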

2. How big is the performance impact of Digger?

It depends on the usage. When you trace everything – all application functions or all library calls – the performance impact can be significant and the amount of printed information can be huge.

Digger allows you to restrict:
a) the traced area – where tracing is enabled
b) the traced contents – which functions are traced

For example, in order to trace kcb* functions and all *read* syscalls inside the function qertbFetch, the command line should be:

This helps to trace only relevant information – only the required functions. It reduces the performance impact and the probability of drops.
So, if you need to decrease the impact of tracing, just try to restrict the traced area and contents.

3. An important bug is fixed.
There was an issue related to the replacement of tab characters with spaces when generating the WordPress HTML page with source code (thanks to Zhenx Li for letting me know). The issue could lead to the error “syntax error at line n: `end of file’ unexpected” when the source of the tool was copied and pasted. Links on HTML pages with source code are now replaced with direct download links.

There are many cases where Oracle revisits a buffer in the buffer cache many times within one database call. In such cases it can pin the buffer, keep it pinned, and simply read the pinned buffer on subsequent visits. This avoids redundant logical reads.

There are two statistics for this: “buffer is pinned count” and “buffer is not pinned count”.
The concept is simple.
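For example, the current session's values of these counters can be queried from V$MYSTAT (a minimal sketch; it requires SELECT privileges on the V$ views):

```sql
-- Session-level values of the buffer pinning statistics.
select sn.name, ms.value
  from v$mystat   ms
  join v$statname sn
    on sn.statistic# = ms.statistic#
 where sn.name in ('buffer is pinned count',
                   'buffer is not pinned count');
```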

I would like to introduce a new tool – Digger.
This tool allows you to see the tree of a process's calls – application calls, library calls, system calls (and even kernel functions and OS scheduler activity) – with additional information such as function arguments, return values, and CPU and elapsed time.
It is not Oracle-specific; the tool can be used for tracing any Unix process (DTrace is required).

Note: this post is not about one specific statement, but about the importance of column projection, which should not be ignored, especially in cases such as operations requiring workareas, data access optimization, Exadata offloading, and others.

Let’s consider a merge of two simple tables.

merge into t1
using t2 on (t1.id = t2.id)
when matched
then update set n = 1;
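The column projection sections discussed below can be displayed with dbms_xplan. For example, right after executing the statement in the same session (a sketch; ALLSTATS requires the gather_plan_statistics hint or statistics_level = all):

```sql
-- Show the plan of the last executed statement, including the
-- "Column Projection Information" section.
select *
  from table(dbms_xplan.display_cursor(
               format => 'ALLSTATS LAST +PROJECTION'));
```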

So, our statement executes for more than 8 minutes. All this time is consumed on step 3 – HASH JOIN. Notice the amount of temp space used, 4346 MB (there is an old oddity: the value in Used-Tmp should be multiplied by 1000), and, as a result, the high number of I/O operations on temp space (columns Reads and Writes – 556K physical reads and 556K physical writes via direct path read temp and direct path write temp), because the size of the workarea (hash_area_size) was not sufficient (Used-Mem = 30 MB). Note: obviously, in your case the time and amount of I/O can be completely different; in particular, they depend on the hash area size, which in my case was about 30 MB.

How can we improve the performance of the statement and especially hash join?
Should we increase hash area size?

Maybe, but first let’s take a look at the same execution plan with column projection:

Pay attention to the column projection of steps 3, 4, and 5. It contains ALL columns from the tables T1 and T2, although the query uses only some of them. The part of the execution plan below step 2 is equivalent to

It looks like a bug or a “not implemented yet” feature. In our case it means that all columns from the table T1 (especially the fat column PAD) will be retrieved and stored in the workarea.

This can dramatically affect performance:

For operations using workareas it means an increased dataset size. If the amount of available memory is not sufficient, it leads to additional I/O in temp space.
In our particular case, all columns from the table T1 are retrieved and put into the workarea. If the condition in the statement were something like

t1.id between t2.id and t2.id + 100

then a Hash Join would be impossible, because it must be based on an equality condition. In that case we can expect a Merge Join, and then both datasets must be sorted.

It also affects data access optimization. If there were indexes containing all the columns used in the query, such as t1(id, n) or t2(id, n), then an Index Fast Full Scan would still be impossible here, because the indexes do not contain the redundantly required columns (such as PAD).
If there were a condition making Index Range Scan an appropriate access path, then a TABLE ACCESS BY INDEX ROWID step would still be needed to fetch the rest of the unnecessary columns.

If a row is chained, it leads to additional logical (and possibly physical) I/O to read the redundant columns.

If the query were run on Exadata and one or both Full Table Scans were offloaded, there would be no column projection optimization: the amount of data returned from the storage cells to the PGA would be redundant and would contain unnecessary columns from the table.

Now the column projection does not contain unnecessary columns; the amount of required memory and, as a result, the amount of temp space (Used-Tmp ~ 16 MB) and I/O (2032 physical reads and 2032 physical writes via direct path read temp and direct path write temp) were significantly reduced, and the statement executed in about 20 seconds instead of 8 minutes.

Notice that in this case Oracle takes the table T2 as the build input. This is because the RowID from the table T1 is required to perform the MERGE.
The part of the execution plan below step 2 now looks like