Confessions of an Oracle Database Junkie - Arup Nanda
The opinions expressed here are mine and mine alone. They may not necessarily reflect that of my employers and customers - both past or present. The comments left by the reviewers are theirs alone and may not reflect my opinion whether implied or not. None of the advice is warranted to be free of errors and ommision. Please use at your own risk and after thorough testing in your environment.

Pages

Saturday, August 09, 2008

Resource Manager I/O Calibration in 11g

You are considering a new platform for the Oracle database; or perhaps worried about the I/O capabilities of the existing platform. How do you know if the I/O drivers, LUN creation, etc. are capableof handling the I/O load? This is a very important question; I/O is the single most important thing in an Oracle database that can cripple an application, however large or small.

Here comes the Calibrate I/O tool in Oracle Database 11g. There are some really cool features of 11g that never made into the blitzkrieg of Oracle Marketing. This is one of them. This is a procedure in the DBMS_RESOURCE_MANAGER package. It drives some I/O to the database files to test the throughput and much more metrics.

So, what is the big deal? This type of tests can be done by simple unix system as well; or even by little apps from the storage vendors.

The important difference is that the tool uses Oracle database software libraries to drive the I/O; not some generic I/O generator. This capability is important - you are testing the I/O for an Oracle database, right?

So, let's get on with the how-it-works part of the post. You need to have asynch I/O capability otherwise this will fail with the following message:

DECLARE*

ERROR at line 1:ORA-56708: Could not find any datafiles with asynchronous i/o capability

ORA-06512: at "SYS.DBMS_RMIN", line 453ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1153ORA-06512: at line 7

To enable asynch I/O, set two values in the init.ora file (or spfile) and recycle the instances:

disk_asynch_io = true

filesystemio_options = asynch

The first one is default; so you may not have to change it. The default of the second one is NONE; so you will probably have to change it. After setting these values and recycling the instance, confirm at the file level:

Caution: even if the above query returns ASYNC_ON, you may not be able to run the calibrate process. The culprit is the number of asynch I/O slots available in the system. If there are no free slots, the files will not use asynch I/O and you will get ORA-56708. To find out the maximum avalable asynch I/O slots:

$ cat /proc/sys/fs/aio-max-nr65536

To find out how many are being used:

$ cat /proc/sys/fs/aio-nr65536

Well, that sort of tells it all. Of the 65536, all are being used; so we are getting ORA-56708. The reason was I was runninganother database on the server, which was consuming all those slots. After shutting down the other database, I checked the utilization again:

$ cat /proc/sys/fs/aio-nr

35456

The value dropped substantially and allowed my calibration to proceed.

Now start the calibration process. The precedure CALIBRATE_IO of the package DBMS_RESOURCE_MANAGER accepts two parameters:

NUM_PHYSICAL_DISKS - the number of physical disks you have. Remember: this is the number for disks, not LUNs.

Since this has OUT parameters, we will have to write a little PL/SQL program to get the values.

set serveroutput on

declare

l_latency integer;

l_iops integer;

l_mbps integer;

begin

dbms_resource_manager.calibrate_io (

18, /* # of disks */

10, /* max latency */

l_iops, /* I/O Ops/sec */

l_mbps, /* MBytes/sec */

l_latency /* actual latency */

);

dbms_output.put_line ('I/O Ops/sec = ' l_iops);

dbms_output.put_line ('Actual Latency = ' l_latency);

dbms_output.put_line('MB/sec = ' l_mbps);

end;

/

When this is going on, you can check the status:

SQL> select * from v$io_calibration_status

2 /

STATUS-------------CALIBRATION_TIME----------------IN PROGRESS

This shows that the operation is still going on. After some time you will see the status change to READY. Here is the output from my system (a RAC database with 2 nodes, each with 4 CPUs, with a Dell NAS, with 28 disks).

I/O Ops/sec = 122Actual Latency = 15MB/sec = 13

PL/SQL procedure successfully completed.

So, it shows us that the actual latency is somewhat like 15 milliseconds. The system can sustain a maximum of 122 I/O Ops per second with a throuput of 13 MB/sec.

You can check the results of the I/O Calibration run later from the data dictionary view DBA_RSRC_IO_CALIBRATE. Let's check this view. Since it has only one row and there are many columns, I used the very handy Print Table utility by Tom Kyte.

In addition to the columns, this view has a lot more columns. Here is the output for a single datafile:

SQL> exec print.tbl('select * from v$iostat_file where file_no = 1')

FILE_NO : 1

FILETYPE_ID : 2

FILETYPE_NAME : Data File

SMALL_READ_MEGABYTES : 267

SMALL_WRITE_MEGABYTES : 11

LARGE_READ_MEGABYTES : 230

LARGE_WRITE_MEGABYTES : 0

SMALL_READ_REQS : 33235

SMALL_WRITE_REQS : 1083

SMALL_SYNC_READ_REQS : 33013

LARGE_READ_REQS : 256

LARGE_WRITE_REQS : 2

SMALL_READ_SERVICETIME : 2650538

SMALL_WRITE_SERVICETIME : 10676

SMALL_SYNC_READ_LATENCY : 2647154

LARGE_READ_SERVICETIME : 2745759

LARGE_WRITE_SERVICETIME : 10

ASYNCH_IO : ASYNC_ON

RETRIES_ON_ERROR : 0

-----------------

Using these merics you can decide if the I/O is sufficient or still needs more tweaking.

Recently I used it on a system that required Oracle 10g. But I installed 11g on that server and ran this calibration test, which unearthed some inherent issues with the I/O system. Once rectified, and reconfirmed with the calibration tool, it was ready for the 10g installation.