
After reading my blog entry about a performance issue due to excessive HCC decompression (Accessing HCC compressed objects using index access path), a reader asked me about the CPU profiling method I mentioned in that entry. I started responding to that comment and realized that the response was too big for a comment. So, in this blog entry, I will cover the basics of CPU profiling in Linux. Other platforms provide similar utilities; for example, Solaris provides DTrace.

Tool Box

There are many ways to do CPU profiling; two techniques are:
(a) Using the Linux perf utility.
(b) Using pstack output and grouping the samples by the top functions in the call stack. Tanel has a great blog entry about a variant of this method: Tanel’s blog entry
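Technique (b) can be sketched as a short pipeline. This is a hedged illustration, not Tanel’s actual script; the inline sample frames and function names are hypothetical stand-ins for real pstack output.

```shell
# Group stack samples by function name. In practice you would feed this
# repeated samples of a live process, e.g.:
#   for i in $(seq 1 20); do pstack <pid>; sleep 1; done | ...
# Here an inline sample (hypothetical addresses/frames) stands in.
printf '%s\n' \
  '#0  0x00000001 in BZ2_decompress ()' \
  '#0  0x00000002 in BZ2_decompress ()' \
  '#0  0x00000003 in kcbgtcr ()' |
awk '{ for (i = 1; i <= NF; i++) if ($i == "in") print $(i+1) }' |
sort | uniq -c | sort -rn
```

The count-and-sort pipeline is exactly the “group by function_call, order by count desc” idea that perf report automates.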

I will discuss the perf utility briefly in this entry.

perf utility

There are two parts to perf utility usage: perf record samples the function calls executed by a process or processes, writing its output, by default, to a file named perf.data in the current directory. perf report reads the perf.data file and prints a human-readable summary of the top function calls. Think of the perf report command as a “group by function_call, order by count desc” operation.

You can use the perf utility to watch the whole server, one process, or a subset of processes. In this example, I recreated the issue in my database session, so we will use the command perf record -p <pid> to record the function call samples of the dedicated server process associated with my session.

-- Login to the database using your favorite tool.
-- Find the pid of the Oracle dedicated server process. (You could skip this step if you already know the pid of the process you want to probe. Note that you can use this method to probe an application performance issue too, not just a database performance issue.)
select spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
-- In another window, login to the server and become root (sudo, direct login etc)
cd /tmp; perf record -p <pid>
-- Recreate the issue in the first window.
@lu.sql
exit;
-- Press Control+C to stop the perf record command as well.

perf report

So, the perf record command mentioned above would have written the file perf.data. We can use the perf report command to identify the function calls most frequently seen executed by the process. (Partial output shown for brevity.)
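The original output is not reproduced here; the snippet below is a hedged reconstruction of what a perf report summary typically looks like. The symbol names come from the discussion that follows, while the exact percentages and column layout are illustrative only.

```
$ perf report --stdio      # illustrative, not actual output
# Overhead  Command  Shared Object  Symbol
  35.06%    oracle   oracle         [.] BZ2_decompress
  25.02%    oracle   oracle         [.] unRLE_obuf_to_output_FAST
  ...
```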

From the output of the perf report command, you can infer that the process was seen executing the BZ2_decompress function in ~35% of samples. The BZ2_decompress function is executed to decompress segments compressed with HCC archive compression. The next function call in the list (25% of samples), unRLE_obuf_to_output_FAST, is also associated with bzip2 decompression (refer to unRLE_obuf_to_output_FAST).

Using the perf tool, we were able to quickly identify the function calls consuming the CPU cycles; in this case, HCC archive high decompression. We can then correlate this data with the SQL Monitor output, identify the line number spending the time, and understand the reason behind the high CPU usage (and high elapsed time).

The perf tool has many more options. Read the wiki for further details: perf wiki

My earlier blog entry about shared pool durations got an offline response from one of my readers:
“So, you say that durations, aka mini-heaps, have been introduced from 10g onwards. I have been using Steve Adams’ script shared_pool_free_lists.sql. Is that not accurate anymore?”


Shared pool free lists

I have great respect for Steve Adams. In many ways, he has been a great virtual mentor, and his insights are remarkable.

Coming back to the question: I have used Steve’s script before, and it is applicable prior to Oracle version 9i. In 9i, sub-heaps were introduced, and shared pool durations were then introduced in Oracle version 10g. So, his script may not be applicable from version 9i onwards. We will probe this further in this blog entry.

This is the problem with writing about internals: they tend to change from version to version, and in many cases our work can become obsolete in future releases (including this blog entry!).

In an Oracle-l thread, a question was raised: How to find objects creating nologging changes?

What is a ‘nologging’ change?

The redo logging mechanism plays a critical role in media recovery. Media recovery relies on the generated archivelog files to roll the database forward. Standby database (Data Guard) recovery also relies on archivelog files. It is possible to make DML changes with minimal logging, aka nologging changes or direct mode inserts. For example, insert /*+ append */ can be used to populate rows into a table without generating much redo. This can invalidate the standby database and might trigger rebuilding some or all parts of it.
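As a minimal sketch of such a direct mode insert (the table and row source below are hypothetical, not from the original post):

```sql
-- Hypothetical demo table; NOLOGGING asks Oracle to skip full redo
-- generation for direct-path operations against it.
create table t_demo (n number, v varchar2(30)) nologging;

-- The append hint requests a direct-path insert: blocks are formatted
-- and written directly to disk, generating only a small invalidation
-- redo record instead of full change redo.
insert /*+ append */ into t_demo
select level, 'row ' || level
from   dual
connect by level <= 1000;
commit;
```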

Nologging changes generate minimal redo, since the blocks are pre-formatted and written directly to disk. A redo record is generated invalidating the range of affected blocks. This invalidation redo record is far smaller; for example, hundreds of blocks can be invalidated with just a single redo record. Of course, recovery is severely affected, as the changes performed with nologging operations can NOT be reapplied/recovered.

Internals of nologging changes

Since nologging is all about redo records, dumping the redo log file or archivelog file is a concrete way to see what happens under the hood. Let’s consider an example to explain the internals of nologging changes.

The prior online redo log file contains the nologging changes from table t4 above. We could dump that log file in its entirety, but we will dump only the layer specific to direct mode changes, to limit the trace file size. Layer 19 is for nologging changes. The following script will dump the last online redo log file for layer 19.
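The script itself did not survive here; the following is a hedged sketch of such a dump. The log file path is a placeholder you would resolve from v$logfile, and the dump command is an undocumented diagnostic whose syntax can vary by version.

```sql
-- Find the member(s) of the prior online redo log group (placeholder
-- group#; pick the group that was CURRENT during the nologging load).
select member from v$logfile where group# = <prior log group#>;

-- Dump only layer 19 (nologging / direct-load invalidation records)
-- from that file; the output goes to the session trace file.
alter system dump logfile '/path/to/redo_log_file.log' layer 19;
```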

OP:19.2 indicates that the layer is 19 and the opcode is 2 for this change vector. The vector also specifies BLKS=1 and a DBA of 1038832. Essentially, this change vector says “invalidate the range of blocks starting at that DBA, for 1 block”. This could cover a larger range of blocks too.

For nologging changes, a block range invalidation redo record is generated, and the blocks are formatted, populated with rows, and written directly to disk. But standby database and media recovery rely on redo records to replay changes. When the recovery code encounters the direct loader invalidation redo record above, it writes a warning to the alert log and simply marks that range of blocks as invalid. Any subsequent attempt to access those blocks will throw an ORA-1578 block corruption error.

Just to reiterate: standby and media recovery are affected by nologging changes, but the current database is still fine and there is no corruption in the primary database. If we back up the primary database again, or rebuild the affected tablespace in the standby database, we avoid the corruption. The point is that if there is a business need for nologging changes, consider taking a backup immediately after them. Of course, if a standby database is involved, then that tablespace needs to be re-copied from production.

Back to our problem

We need to find the object causing the nologging changes. The field OBJ:72852 shows the object_id in decimal. We can query dba_objects to find the object_name.
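For example, a lookup of that object_id would look like this (the owner and name returned would, of course, depend on the database):

```sql
-- OBJ:72852 from the change vector is the object id in decimal,
-- per the discussion above.
select owner, object_name, object_type
from   dba_objects
where  object_id = 72852;
```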