Archive

Every so often a DSS query that usually takes 10 minutes ends up taking over an hour. (or one that takes an hour never seems to finish)

Why would this happen?

When investigating the DSS query, perhaps with wait event tracing, one finds that the query which is doing full table scans and should be doing large multi-block reads and waiting for “db file scattered read” is instead waiting for single block reads, ie “db file sequential read”. What the heck is going on?

Sequential reads during a full table scan scattered read query is a classic sign of reading rollback and reading rollback can make that minute(s) full table scan take hours.

What can happen especially after over night jobs, is that if an overnight job fails to finished before the DSS query is run and if that overnight job does massive updates without committing till then end, then the DSS query will have to rollback any changes made by the updates to the tables the DSS query is accessing.

How do we quickly identify if this our issue?

ASH is good at identify it. On the other hand it’s often impractical to whip up from scratch an ASH query and that’s where ashmasters on Github comes in. This ASH query and others are on Github under ashmasters.

What is right deep verses left deep? Good question. In join trees (not VST) the object on the left is acted upon first then the object on the right. Below are left deep and right deep examples of the same query, showing

query text

join tree

join tree modified to more clearly show actions

VST showing the same actions

All of this boils down to the point that a right deep HJ can return rows earlier than a left deep HJ. A left deep HJ has to wait for each join to finished completely so that the result set can be hashed before the next step can probe it. On the other hand, in a right deep HJ, it’s not the result sets that are being hashed, but a table at each level, thus each table can be hashed without waiting for intermediary results and once these hashes are complete a probed row can flow all the way through the join tree, from bottom to top, similar to how a nested loop can start giving results early. The Left Deep HJs only have two open work areas at a time where as the Right Deep can have multiple work areas open. One of the key thing to keep in mind is how much data we have to hash. If the intermediate result sets are large (and/or growing each level) then that represents more work each step of the way.

Normally I don’t think in left deep verses right deep because it doesn’t change the the path through the join tree. On the other hand it does change whether we are hashing a result set or we are hashing a table. If hashing a table then the results can be returned, in theory, more quickly.

For NL joins there are only left deep join. The object on the left always probes the object on the right. The object on the left is always accessed first ( there is no need to modify the object on the left first and probe from the right with NL).

Besides left deep and right deep, there are also bushy joins which Oracle doesn’t do unless forces to through sub-queries or views as in the following:

Finally, the above VST diagrams were modified to be more easily compared to the join tree diagrams. Below are the VST diagrams as displayed by default in Embarcadero’s DB Optimizer. DB Optimizer shows the tables all in a line because the tables are one to one relationships. Keeping the VST diagram layout constant, it is easier to see the differences in execution paths:

Note on hinting and order of tables in explain plan:

For NESTED LOOPS and HASH JOINS the hint takes one argument, which is the second table that shows up in the explain plan.

For NESTED LOOPS the second table is the table probed into i.e. the second in order of access.

For HASH Joins the second table is the table doing the probing into the hash result set. Hashing is the first operation which creates the hash result set that the second table probes into.

For NESTED LOOPS if order is “LEADING(X,Y) then nested loops hint can only be onY, ie USE_NL(Y)

For HASH JOINS if the order is “LEADING(X,Y) then the hash join hint can only be on Y, ie USE_HASH(Y)

Introduction

Unfortunately, what Oracle calls “Network Waits” most often have little to do with Network but and almost exclusively to do with the time it takes to pack messages for the network before they are sent.

Client = you, the tool, sqlplus, application

Not the client, the other side = the shadow process is communicating to the client

Of the three waits, only “more data” is possibly related to network issues and that’s not even clear, the other two are simply the time it takes to pack a message before sending it.

The first two “SQL*Net message from client’ are in the middle of cursor processing and are considered non-idle waits.

The third “SQL*Net message from client” is between cursors and considered an idle event, ie we are waiting for the next command from the client.

SQL*Net message to client

Time it takes to pack a message to be sent to the client
Doesn’t include network timing
see Tanel Poder’s analysis of SQL*Net message to client

SQL*Net more data to client

Same as SQL*Net message to client except this is for data that spans SDU packets.

Wait represents the time it takes to pack data.

Doesn’t include network timing

SQL*Net more data from client

The only SQL*Net wait that can indicate a possible NETWORK problem

Client is sending data to shadow that spans packets (think large data inserts, possibly large code blocks, large SQL statements)

Shadow waits for next packet.

Can indicate network latency.

Can indicate a problem with the client tool

Here is an example with ASHMON where the application server died mid-stream on inserts. The shadow processes were left waiting for completion of the message. You can see the regular load on the database on the left, then just past the middle the load crashes, and all that’s left is waits on “SQL*Net more data from client”

Possibly set SDU=32768 as well as setting RECV_BUF_SIZE and SEND_BUF_SIZE to 65536.

Unfortunately Oracle doesn’t give much information about debugging unless you are trace. If you don’t trace, the SQL won’t be captured because from Oracle’s point of view the problem statement isn’t an acceptable SQL statement so there is no SQL ID to track down.

DBLINK SQL*Net Waits

These waits are the same as

SQL*Net message to dblink

SQL*Net more data from dblink
SQL*Net more data to dblink
SQL*Net break/reset to dblink

Analysis and Tuning

There isn’t much to do on the Oracle side for tuning. You can try optimizing the SDU and SEND_BUF_SIZE and RECV_BUF_SIZE.

For actually getting information on network speeds you will have to use something like

ping

tnsping

network sniffer

SDU

The default SDU can be set in the sqlnet. ora

If it’s not set, the default is 2048

The max is 32768

The default,or the value in sqlnet.ora, can be overridden in the tnsnames. ora and the listener.ora. The client and server negotiate the size aggreeing on the smaller of the two settings.

(TDU – Transmission Data Unit – see note 44694.1 The TDU parameter has been deprecated in the Oracle Net v8.0 and beyond and is ignored. It is only mentioned here for backward compatibility.)

Buffer Busy Waits usually happen on Oracle 10 and 11 mainly because of insert contention into tables or Indexes. There are a few other rare cases of contention on old style RBS segments, file headers blocks and freelists.

Before Oracle 10 and 11 there was one other major reason which was readers waiting for readers, ie one user does a phyiscal IO of a block into memory and a second user want to read that block. The second user waits until the IO is finished by the first user. Starting in 10g this wait has been given the name “read by other session“. Before Oracle 10g this was also a “buffer busy wait”.

The easiest way to analyse the bottleneck and find a solution is to use ASH (active session History) available in Oracle 10g with the diagnostics pack license or using Simulated ASH for free or using a product like DB Optimizer.

Data block class, which can be found in ASH, is the most important piece of information in analysing buffer busy waits. If we know the block class we can determine what kind of bottleneck:

If CLASS=

data block

IF OTYPE =

INDEX , then the insert index leaf block is probably hot, solutions are

Hash partition the index

Use reverse key index

TABLE, then insert block is hot,solutions

Use free lists

Put Object in ASSM tablespace

Segment header – If “segment header” occurs at the same time as CLASS= “data block” on the same object and the object is of OTYPE= “TABLE” then this is just a confirmation that the TABLE needs to use free lists or ASSM.

File Header Block – Most likely extent allocation problems, look at extent size on tablespace and increase the extent size to there are few extent allocations and less contention on the File Header Block.

free lists – Add free list groups to the object

undo header – Not enough UNDO segments, if using old RBS then switch to AUM

undo block – Hot spot in UNDO, application issue

How do we find the block class? With a quick query on the ASH data like:

If we find that CLASS=datablock, then we will want more information to diagnose, such as the object type “OTYPE” , object name and what kind of tablespace the object is stored in. The following query provides that information:

On version 8 and 9, the p3 value has a different meaning. Instead of meaning the block type (which is the best thing to know) it means the kind of buffer busy wait. There are only two values that matter to us, values in

100 range = read waits (basically just an IO wait)

Reader blocking Reader, ie one reader is reading a block in and another person wants to read this block and waits on a buffer busy wait p3=130.

200 range = write contetion (same as in 10g)

Writers blocking other writers for example while doing inserts either because of no free lists on the table or because everyone is inserting into the same index block.

If you have set up ASH style collection with S-ASH or have a product like DB Optimizer you can run a query like:

select

count(*) cnt,

o.object_name obj,

o.object_type otype,

ash.CURRENT_OBJ#,

ash.SQL_ID,

decode(substr(ash.p3,1,1),1,'read',2,'write',p3) p3

from v$active_session_history ash,

all_objects o

where event='buffer busy waits'

and o.object_id (+)= ash.CURRENT_OBJ#

group by o.object_name, o.object_type, ash.sql_id, ash.p3,ash.CURRENT_OBJ#

order by cnt

/

And see what kind of buffer busy waits there are and what the objects are:

CNT OBJ OTYPE CURRENT_OBJ# SQL_ID P3

--- ------- ------- ------------ ---------- ------

1 -1 1375352856 read

2 -1 996767823 read

2 -1 2855119862 write

17 -1 1375352856 write

89 TOTO1 TABLE 296030 1212617343 write

109 296022 1212617343 write

Often the Current_obj# is -1 so we can’t figure out what the object is . There is an alternative method

col block_type for a18

col objn for a25

col otype for a15

col event for a15

col blockn for 999999

col segment_name for a20

col partition_name for a15

col owner for a15

set timing on

/*

drop table myextents;

l

create table myextents as select * from dba_extents;

l

*/

select

count(*),

ext.owner,

ext.segment_name,

ext.partition_name,

ext.segment_type,

decode(substr(ash.p3,1,1),1,'read',2,'write',p3) p3

--ash.p1,

--ash.p2

from v$active_session_history ash,

myextents ext

where

event = 'buffer busy waits'

and ( current_obj# = -1 or current_obj#=0 or current_obj# is null )

--and sample_time > sysdate - &minutes/(60*24)

--and session_state='WAITING'

and ext.file_id(+)=ash.p1 and

ash.p2 between ext.block_id and ext.block_id + ext.blocks

group by

ext.owner,

ext.segment_name,

ext.partition_name,

ext.segment_type,

p3

--ash.p1,

--ash.p2,

--ash.sql_id

Order by count(*)

/

Because querying DBA_EXTENTS is a slow operation, I made a copy of DBA_EXTENTS which will be faster to query.

CNT OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE P3

--- ------ -------------- --------------- ------------- --------

1 SYS _SYSSMU2$ TYPE2 UNDO read

1 SYS _SYSSMU3$ TYPE2 UNDO write

This second option of getting the object from P1 and P2 (file and block) should probably be done only with the users consent, because we would have to create a copy of the dba_extent table which might take a long time if it’s big.

No ASH ?

If you don’t have ASH data you will have to do some guess work.

Block Class (block type)

The first step in finding out the source of buffer busy waits is looking at

V$waitstats

This will tell us what kind of datablocks we have contention on.

File with contention

You can also get an idea of what file contains the object with the buffer busy waits by looking at:

X$KCBFWAIT

Object with contention

Starting in version 9i there is the table

v$segstat

That will list the objects with buffer busy waits.

If you are on version 7 or 8 good luck finding the object without setting up ASH style data collection.

Why do buffer busy waits happen?

To put it most succinctly, buffer busy waits happen because two users want to change a block at the same time. Two users can change the same block, or even same row “at the same time” ie without committing, but that’s different from the actual operation of modifying the block. The modification on the block in RAM, or computer memory, can only be done by one process at at time in order to avoid memory corruptions. Different users can modify different blocks at the same time but only one user or process can modify a the same block at a time.

In order to really understand what’s going on we have to take a look at how Oracle manages memory and block access and modifications.

Here is the layout of

Above is a diagram shows some of the essential parts of Oracle in regards to performance tuning.

In the machine memory are

Oracle’s SGA, or System Global Area, a memory that is shared between Oracle users

LGWR – log writer process

DBWR – database writer process

User1,2,3 … – user processes, in this case “shadow processes”

On the machine file system are

Redo log files

Data files

The SGA is composed of (among other things)

Log Buffer

Library Cache

Buffer Cache

What’s important for understanding buffer busy waits is how the buffer cache is managed. Here is view of the buffer cache with more components:

In order to access a block, a user (shadow process) has to get a latch (cache buffer chains latch) which protects buckets or linked lists of buffer headers. Once the header desired if found the latch is released. The buffer headers point to the actual data block in memory. Before modifying a block in memory a user has to lock the buffer header. The buffer header is locked any time a modification is made whether it is reading a block into memory or modifying a block that is already in memory. Usually the header is locked only for a brief amount of time but when there is a lot of concurrent access the buffer header can become a bottleneck.

BBW when readling data – read by other session

A buffer busy can happen on oracle 7,8 and 9 when one user is reading a block into memory and a second user wants to read that block. Instead of the second user trying to read that block into memory as well, they just wait for the first user to finish. Starting in Oracle 10, this kind of wait was renames “read by other session”

BBW on insert

If multiple concurrent users are inserting into a table that doesn’t have free lists or is not in an ASSM tablespace then all users will end up inserting into the same block, the first one on the free list and this block will become the hot block

by adding free lists or moving the table to an ASSM tablespace we will alleviate the bottleneck.

Multiple free lists:

The other option is ASSM or Automatic Segment Space Management which is set at the tablespace level.

In this case free block information is kept in Level 1 BMB (or bitmapped blocks). These Level 1 BMBs are chosen by a hash on the users process ID thus distributing the inserts across the table.

The inserts would look something like this (somewhat exaggerated drawing)

the ASSM BMB blocks take up more space in the table , about 1 extra block for every 16 data blocks and there is overhead first looking in the header/level 3 BMB block then going to the Level 2 then level 1 and finally to the datablock but all in all ASSM is worth reduced costs of management verses free lists.

Identifying and creating ASSM tablespaces

Which tablespaces are ASSM or not?

select

tablespace_name,

extent_management LOCAL,

allocation_type EXTENTS,

segment_space_management ASSM,

initial_extent

from dba_tablespaces

TABLESPACE_NAME LOCAL EXTENTS ASSM

--------------- ---------- --------- ------

SYSTEM LOCAL SYSTEM MANUAL

UNDOTBS1 LOCAL SYSTEM MANUAL

SYSAUX LOCAL SYSTEM AUTO

TEMP LOCAL UNIFORM MANUAL

USERS LOCAL SYSTEM AUTO

EXAMPLE LOCAL SYSTEM AUTO

DATA LOCAL SYSTEM MANUAL

creating an ASSM tablespace:

create tablespace data2

datafile '/d3/kyle/data2_01.dbf'

size 200M

segment space management auto;

BBW on index (because of insert)

If users are inserting data that has a rising key value, especially a monotonically rising value, then all the new inserts will have to update the leading edge leaf block of the index and with high concurrent inserts this can cause buffer busy waits.

Solutions

Hash partition the index

Reverse Key Index

BBW on old style RBS

IF block class > 18 it’s an old style RBS segment

Select CURRENT_OBJ#||' '||o.object_name objn,

o.object_type otype,

CURRENT_FILE# filen,

CURRENT_BLOCK# blockn,

ash.SQL_ID,

w.class ||' '||to_char(ash.p3) block_type

from v$active_session_history ash,

(select rownum class#, class from v$waitstat ) w,

all_objects o

where event='buffer busy waits'

and w.class#(+)=ash.p3

and o.object_id (+)= ash.CURRENT_OBJ#

Order by sample_time;

OBJN OTYPE FILEN BLOCKN SQL_ID BLOCK_TYPE

----------- ------ ------ ------ ------------- ------------

54962 TOTO1 TABLE 16 45012 8gz51m9hg5yuf data block

54962 TOTO1 TABLE 16 161 8gz51m9hg5yuf segment header

0 14 9 8gz51m9hg5yuf 87

0 14 9 8gz51m9hg5yuf 87

IF the block is of class > 18, the there will be no object name, so we have to look it up ourselves to be sure:

select segment_name,

segment_type

from dba_extents

where

&P2 between

block_id and block_id + blocks – 1

and

file_id = &P1 ;

Plug in 14 for P1 the file # and 9 for P2 the block number:

SEGMENT_NAME SEGMENT_TYPE

-------------- --------------

R2 ROLLBACK

solution

move to new AUM or Automatic Undo Mangement

alter system set undo_management=auto scope=spfile;

BBW on a file header

The ASH data has two different fields that indicate the file # and block # when the wait is a buffer busy wait.

For a buffer busy wait

File # = p1 *and* File # = current_file#

Block # = P2 *and* Block # = current_block#

if p1 != current_file# or p2 != current_block# then use p1 and p2. They are more reliable.

If you are getting buffer busy waits on the file header block for a tempfile (datafile in a temporary tablespace) then try increasing the “next extent” size in the temporary tablespace.

This wait can happen when lots of extents are being allocated in the temporary tablespace.

What Would ADDM do?

Interstingly enough the ADDM page doesn’t show the new load that has recently come on the system but the analysis is there. I clicked on the next to bottom line in the page, “Read and write contention on database blocks was consuming significant database time.

Oracle 12c introduces the new Pluggable Database (PDB) functionality into the Oracle database. What’s the advantage of PDBs? PDBs eliminate the heavy memory overhead of starting up a full Oracle instance requiring a new SGA and full set of background processes. Instead, we startup one container database (CDB) and then PDBs all share resources of the CDB. With a CDB & PDBs, there is one instance, one set of background processes, one SGA and these are shared among the PDBs. Starting a PDB only requires about 100MB of memory as opposed to half a GB normally required to start an Oracle instance. Because of this reduced memory cost one can go from running 50 instances of Oracle on 20GB of memory to running 250 PDBs in that same 20GB of memory. (numbers given by Oracle benchmarks)

One of the main use cases for PDBs is cloning a database for use in development and QA. With PDBs, one can easily give every developer in a development team their own copy of a database, except for one thing. Each of those PDBs, even though they take up hardly any memory, still require a full set of datafiles. Creation of these datafiles is slow and costly. That’s where Delphix comes in.

With Delphix, having 10 copies of the same database takes up less than the size of the original database! How is that done? It’s done by compressing the original copy and then sharing all the duplicate blocks in that copy among all the clones.

Thus where PDBs give you database for free at the memory level, Delphix gives you free databases at the disk footprint level and the combination of the two gives you practically free databases!

Oracle marketing slide showing 50 separate databases can run in 20 GB of RAM and 250 PDBs can run in 20 GB of RAM. “5x more scalable”

Video example of how easy it is with Delphix to link to a PDB in one CDB and the provision it, thin clone, to another CDB on a different machine:

Oracle OEM 12c introduces a new feature that enables the creation of Oracle database thin clones by leveraging file system snapshot technologies from either ZFS or Netapp. The OEM adds a graphic interface to the process of making database thin clones. The feature that enables database thin cloning in OEM is called Snap Clone and is part of OEM’s Cloud Control Self Service for data cloning. Snap Clone is available via the feature Database as a Service (DBaaS). Snap clone leverages the copy on write technologies available in some storage systems for database cloning. Support is initially available for NAS storage and specifically on ZFS Storage and NetApp Storage.

In order to use Snap Clone, one has to install the source database such that the source database data files are on a ZFS storage or Netapp array and have the storage managed by agents on a LINUX machine and then one can thin clone data files on that same storage array.

Snap Clone offers role based access, so storage admin can login in and only have access to areas they are responsible for as well as limiting access to source databases, clones and resource by end users.

Setting Snap Clone

The prerequisites for getting start with Snap Clone are having available storage on ZFS Storage Appliance or Netapp storage array as well as having access to a master test database. A master test databse is a database that has a sanatized version of a production database such that it is either a subset and or masked. The test master database has to be registered with OEM. After the test master is registered with OEM, Snap Clone can be setup. To set up snap clone, come into Oracle Cloud Control 12c as “cloud administrator” role with “storage adminstator” priviledge or super administrator and register the storage. To register the storage navigate to “ setup -> provisining patching -> storage registration”.

Navigate to “ setup -> provisining patching -> storage registration”

Click “Register” tab, and choose storage, either Netapp or ZFS,

Supply storage information

Name: Storage array name registered in DNS

Vendor

Protocol: http or https

Storage Credentials: credentials for interacting with storage

Install agents on a separate LINUX machine to manage the Netapp or ZFS storage. An agent has to run on Linux host to manage the storage. Supply the

Agent host

Host credentials

Pick a database to make the test master

Put the test master on ZFS storage or Netapp storage

Register the ZFS storage or Netapp storage with OEM

Enable Snap Clone for the test master database

Set up a zone – set max CPU and Memory for a set of hosts and the roles that can see these zones

Set up a pool – a pool is a set of machines where databases can be provisioned

Set up a profile – a source database that can be used for thin cloning

Set up a service template – reference values such as a init.ora for database to be created

Figure 1. Shows the entry page in OEM 12c Cloud Control. From here go to the top right and choose setup, then provisining patching then storage registration as shown above.

Once on the Storage Registration page, choose “Register” and then choose the storage, either Netapp or Sun ZFS.

Register the Storage

Figure 5. Storage Registration Page

To register the storage supply the following information

Name: Storage array name registered in DNS

Vendor

Protocol: http or https

Storage Credentials: credentials for interacting with storage

All of which is documented in cloud administration guide.

Define agents used to manage storage

Then define agents used to manage storage. Agents have are required to run on a LINUX host. More than one agen can define to provide redundancy. The agents will be the path by which OEM communicates with the storage. For each agent, supply the following information

Host name

Credential type

Credentials

And finally define the frequency with which the agent synchronizes with the storage to gather the sorage hardware details such as information on aggregates shares volumes.

After the storage information, agent information and agent synchronization information has been filled out, then hit “submit” button in the top right. Hitting the submit button will return the UI back to the “Storage Registration”. On the “Storage Registration”, click on the storage appliance listed in the top, then click on the contents tab on the bottom half of the page. This will list all the volumns and aggregates in the storage appliance.

Looking at volumns on Storage Array

For each aggregate one can set storage ceilings. Click on the aggregate or FlexVol and the click “Edit Storage Ceilings” tab.

Choosing a Database Test Master

On the database tab is a list of databases that can be used for cloning. OEM detects the database automatically on the hosts it is managing. OEM will also automatically correlate databases that have storage on the storage array added storage registration. OEM looks for all databases that have files on the registered storage. Click on database, then the show files tabs which will show the files and volumes for this database.

Figure 7. List of files by volumn for database

Figure 8. Enable Snap Clone for databases that will be used as test masters.

Nominating a database as test master requires enabling snap clone. To enable snap clone for a database, click on the chosen database, then click “Enable Snap Clone” tab just above the list of databases. This will automatically validate that all the volumes are flex clone enabled (in the case of Netapp).

Setting up Zones

The next step is to configure zone which can be used to organize cloud resources

Middleware and Database Cloud page

Setting up a Zone

In order to see the zones defined, click on the number next to the title “Paas Infrastructure Zones” in the top left under General Information.

Figure 11. PaaS Infrastructure Zones

To create a zone, click the tab “Create”.

Figure 12. first page of wizard to create a PaaS Infrastructure Zone, give a meaningful name and description of the zone and define maximum CPU utilizaiton and memory allocation.

In the first page of the “PaaS Infrastructure Zone”, give zones a meaningful name and description. Define constraints such as maximum host CPU and memory allocations.

Figure 13. Second page of the “PaaS Infrastructure Zone” wizard, add hosts that are available to the zone.

Next define hosts that are members of the zone and provide credentials that operate across all members of this zone

Figure 14. Third page of the “PaaS Infrastructure Zone” wizard, limit which roles can see the zone.

Next define what roles can see and access this szone. The visibiliy of the zone can be limited to a certain class of users via roles like Dev, QA etc

Figure 15. Final review page for “PaaS Infrastructure Zone” wizard

Finally review settings and click submit

Figure 16. Showing the Confirmation that the PaaS Infranstructure Zone has been successfully created.

Creating Database Pool and Profiles

The remaining steps required to enable snap clone is to create a database pools which is a collection of servers or nodes that have database software installed. The remaining part of the setup is done by a differnet user who is the administrator for database as a service.

Log in as DBAAS_ADMIN.

For the next part navigate to the menu “Setup -> Cloud -> Database”.

Figure 17. Middleware and Database Cloud page

Figure 18. Navigate to “Setup -> Cloud -> Database”.

Figure 19. Database Cloud Self Service Portal Setup. To create a database pool choose the “Create” button in the center of the page and from the pull down, choose “For Database”.

To create a new pool click on the “Create” button in the center of the page, and chose “For Database” from the pull down menu that appears.

Figure 20. Choose “Create -> For Database”

Figure 21. Edit pool page. Provide a meaningful name a descrpition of the pool. Add Oracle home directories in the bottom of the page. At the very bottom of the page set a constraint on the number of databases instances that can be created in the pool. On the top right, set the host credentials.

Set

Name and description

Oracle Home

Maximum number of databases per host

Credentials

In the “Edit Pool” page, at the top left of the screen, provide a meaningful name and description for the pool. In the middle of the screen add Oracle homes that will be used for databse provisioning. Every member of a database pool is required to be homogeneous. Homogenous requires that the platform and Oracle version is the same across all the hosts and Oracle homes in the pool. All the Oracle installations also have to be of the same type either single instance or RAC. In the top right add the Oracle home provide oracle credentials and root credentials. Finally at the bottom of the page a constraint can be set on the number of database instances that can be started in this pool.

Figure 22. Set request limits on the pool

The next page sets the request settings. The first restriction sets how far in advanced can requrest can be made. Second restricts how long a request can be kept which is the archive retension. After the archive retention time the requests will be deleted. Finally is the request duration which is the maximum duration for which the request can be made.

Figure 23. Set memory and storage quotas per role for the pool. The quotas cover memory, storage, database requests and schema requests.

The above page configures quotas. Quota is allocated to each and every self service user. The quotas controls the amount fo resources users have access to. Quotas are assigned to a role and users inherit quota values from the role. Click “Create” in the middle of the screen.

Figure 24. Editing the quotas on a pool for a role.

The popup dialogue has for these entries

Role name

memory GB

storage GB

number of database request

Figure 25. Profiles and Service Templates

Profiles and service templates

.A profile is use to capture information about the source database which can then be used for provisioning.

A service template is a standardized service definition for a database configuration that is offered to the self service users. A collection of service templates forms the service catalogue. A service template will provision databsae with or without seed data. To capture an ideal configuration, the easist thing to do is to point at an existing database and fetch information of interest from that database. The information from the database can be captured using a profile.

To create a profile click on the “Create” button under “Profiles”

Creating a profile

Figure 26. Specify a reference target for a profile.

Click the magnifying glass to search for a source database.

Figure 27. Search for a reference target database

Pick a refrence target by clicking on it, the click the “Select” button in bottom right.

Figure 28. Creating a Database Provisioning Profile

To pick a database for use in thin cloning, choose the check box “Data Content” with suboption selected fro “Structured Data” with sub-option selected for “Create” with sub-option selected for “Storage Snapshot”. This option is only enabled only when the “enable snapshot” option is enabled on the storage registration page. Disable option capture oracle home.

Provide credentials for the host machines Oracle account and for the database login.

The “Content Option” step is not needed for the above selections.

Figure 29. Give the profile a meaniful name and description

Next provide credentials for Oracle home and Oracle databse, then provide a meaningful name for the profile as well as a location. The profile will be userful when creating a service template.

Figure 30. Review of create profile options.

Next review the summary and click subit which will connect to storage and take snapshots of the storage

Figure 31. Shows a zoom into the menus to choose

To create a new service template choose a profile and in this case use “thin provisioning for reference DB” profile. Now to create a new service template click “create” and choose “for database”. Service templates are part of the service catalogue and exposed to the self service users.

Figure 32. Provide a meaningful name and description for the service template.

Provide a meaningful name and description. For the rest of service template provide information about the databses that will be created from the snapshots such as providing database type, rac or single instance, for rac provide number of nodes. Provide the SID prefix to appended to the SIDs generated for the clones, provide the Domain Name and the port.

Figure 33. Provide a storage area for writes to use.

The cloning operation only creates a read only copy thus it is required to provide write space elsewhere in order to allow writing to the thin clone.

click on the edit button

click on volumne, then edit button

Figure 34. set diretory and maximum space usage for the write location

Provide the mount point prefix and amount of writeable space wish to allocate

Users of the thin clone databses can also be allowed to take further snapshots. These snapshots can be used as a mechinism to rollback changes, The number of thiese snapshtos can limited just below storage size section:

Figure 35. set the number of snapshots that can be taken of a a thin clone

Figure 36. set the initial passwords for database accounts on the thin clone.

next provide credentials for administrative accounts

SYS

SYSMAN

DBSMNP

for all other non-administartive accounts can choose to leave them as is or change them all to one password you can modify certain init.ora parameters for exmaple memory

Figure 37. Modify any specific init.ora parameters for the thin clone

Figure 38. Set any pre and post provision scripts to be run at the creation of a thin clone.

custom scripts can be provide as pre or post creation steps this can be very useful if you want to register databses with OID or certian actions that are specific to your organization

Figure 39. Set the zone and pool for the thin clone

Figure 40. set the roles that can use the service template.

you associate this srvice template with a zone and a template this insures that the service template can actualy work on the group of resources that you have identified and can limit the visibility of the service tempalte usein roles

Figure 41. review of the service template creation requites

finally we review the summary and click submit

Creating a Thin Clone

Figure 42. 12c Cloud Control

Figure 43. 12c Cloud Control Self Service Portal

Contents of the Database Cloud Self Service Potal screen

Left hand side

Notification – any instances that are about to expire

Usage

databases (number provisioned out of maximum)

schema services

Memory

Storage

Right side

Top

Databases

Bottom

requests – requests that created the database services and the database instances

Figure 44. To clone a database, choose the “Request” then “Database” menu.

Figure 45. From the list choose a Self Service Template. In this case “SOEDB Service Template”

Options are

RMAN backups which are full clones

empty databases

snap clone which are thin clones

Figure 46. Fill out the clone Service Request

request wizard asks for

request name

select zone – collection of servers

select a start and end time

provide a user name and password, new user and password

Users do not get system access to the databases but instead get a slightly less privilege user who becomes the owner of the database

Here are the tuning metrics tables (SQL stats are not in “metric” tables per say)

(*DBA_HIST_…_HISTORY views are sort of confusing. AFAI remember they were storing alert history, but apparently they are used for adaptive thresholds – an area for future investigation)

I’ve noticed a number of people posting queries using DBA_HIST_SYSSTAT instead of DBA_HIST_SYSMETRIC_SUMMARY which leads me to believe that there is some confusion or lack of information on the metric tables.

Oracle 10g introduced metric tables which compute deltas and rates of statistics thus hugely simplifying the ability to answer simple questions like “what is the I/O rate on my databases right now.” This question, before 10g, was surprisingly tedious to answer. To answer the question one would have to query v$sysstat for example:

Select value from v$sysstat where name=’physical reads’;

but querying v$sysstat just once fails to answer the question but instead answers the question “How much I/O has been done since the database was started”. To answer the original question one would have to query v$sysstat twice and take the delta between the two values:

Take value at time A

Take value at time B

Delta = (B-A)

and/or get Rate = (B-A)/elapsed time

Getting these deltas and rates could be a pesky task especially working with a customer over the phone. Then 10g Oracle introduced metric tables which answer the questions in one single query using

Notice that the query returns 2 rows. The first row is the the last minute (ie 59.59 seconds) and the second row is the last 15 seconds (ie 15.15 seconds). Oracle collects both the deltas and rates for 60 second and 15 second intervals.

Oracle has the average, maximum, minimum for the values for the last hour in

For trending data over multiple days, the view DBA_HIST_SYSMETRIC_SUMMARY can be used. The view has a history of all theSystem Metrics Long Duration statistics. If you store multiple databases in the same AWR repository you can check the statistics available to a particular DBID with the view DBA_HIST_METRIC_NAME.

Easy query

The view DBA_HIST_SYSMETRIC_SUMMARY can be queried easily for trending metrics, such as the simple query for bytes read by the database per second:

Its a bit disconcerting to note that the above two queries don’t return the exact same data on my laptop. If it was roughly the same that would be fine, and in general the stats are the similar but there are cases where they differ dramatically. I don’t see anything obvious in the way the queries are written. Possibly has to do with database bounces or the way the database is affected by the laptop’s sleep and hibernate modes. Will have to look into this farther.

One trick to make the data easy to load into Excel is to use the html output format and spool to a file with an “.html” extension

The first line of the table is the classic wait event and statistic views. The following lines are the metric views. The metric views were introduced in Oracle 10g.

Why Metrics are good

Metric views compute deltas and rates which hugely simplifying the ability to answer simple questions like “what is the I/O rate on my databases right now?” This question, before 10g, was surprisingly tedious to answer. To answer the question one would have to query v$sysstat for example:

Select value from v$sysstat where name=’physical reads’;

but querying v$sysstat just once fails to answer the question but instead answers the question “How much I/O has been done since the database was started?” To answer the original question one would have to query v$sysstat twice and take the delta between the two values:

Take value at time A

Take value at time B

Delta = (B-A)

and/or get Rate = (B-A)/elapsed time

Getting these deltas and rates could be a pesky task especially working with a customer over the phone. Then 10g Oracle introduced metric tables which answer the questions in one single query .

Using Metrics with Waits

The metric views apply to wait events as well as statistics. In a future posting we will go over statistics. In this posting we will go over wait events. The number of views available to analyze wait events can be confusing. The point of this post is to clarify what the different views available are and how they can be used.

The wait event views are (at system level)

V$SYSTEM_EVENT – wait events cumulative since startup

V$EVENTMETRIC – wait event deltas last 60 seconds

DBA_HIST_SYSTEM_EVENT – wait events by snapshot (hour) for last week, cumulative since startup

The wait events are rolled up in to groups called wait classes. For wait class we have the following views:

V$SYSTEM_WAIT_CLASS – cumulative since start up

V$WAITCLASSMETRIC – last 60 seconds deltas

V$WAITCLASSMETRIC_HISTORY – 60 seconds deltas for last hour

Note: DBA_HIST_WAITCLASSMETRIC_HISTORY is used for alerts and or baselines not everyday values.

Use Wait Event Metrics for Latency

I use wait event metrics for I/O latencies.

It may be surprising that I don’t mention using waits to identify bottlenecks and load on the system. For bottlenecks and load on the system the data in V$ACTIVE_SESSION_HISTORY (ASH) is probably better for a few reasons. One the data in ASH is mult-dimesional so it can be grouped by SQL and Session Also CPU information is derivable from ASH. CPU information is not in the event/waitclass views but is in ASH along with the waits.

The second part, the latencies, specifically I/O latencies, are only available in the wait event and waitclass views (and the filestat views on a per file basis)

User I/O latency with WAIT CLASS

One use of wait metrics is determining the average read I/O for all the various kinds of read I/O and read sizes:

One issue with V$WAITCLASSMETRIC is that the field WAIT_CLASS name is not in the view, so we either have to use the WAIT_CLASS_ID (the hash of the name) as above or join to V$SYSTEM_WAIT_CLASS as below

For specific I/O latencies there are two choices – v$eventmetric and v$system_event. With v$system_event it requires running multiple queries and taking the deltas but the deltas are are already calculated in v$eventmetric. Here is an example of getting I/O latencies from v$eventmetric

What are the sizes of the I/O requests?

One issue with looking at I/O latencies is determining the I/O sizes. It would be awesome if there was a view with I/O counts, sizes and latencies in one place. ASH does have this information but ASH data is weighted to the longer latencies and sizes and not the average. The average sizes have to be gotten from system statistics. The I/O sizes for ‘db file sequential read’ are single block reads so are single value that can be determined , but the other read events can vary in size. To get a general idea of I/O sizes one could just average across all I/O using the system statistics

but the big drawback with wait event and/or wait class views is that they lack information on CPU load. CPU load can be found in the system statistics but it’s just easier to do it all in one query using v$active_session_history. Here is a query using ASH to calculate AAS load on the database over the last 60 seconds:

If one is seeing waits for enq: TX – row lock contention then there could be a lot of reasons. One distinguishing factor is the lock mode. If the lock mode is exclusive (mode 6) then it’s most likely a classic row lock where two sessions are trying to modify the same row. On the other hand if the lock mode is share (mode 4) it’s typically going to be

Primary/Unique Key: inserting a unique key when someone else has already inserted that key but not committed

Foreign Key: Inserting a foreign when then parent value has been inserted but not committed or deleted and not commited (not to be confused with locks due to un-indexed foreign key which cause a “enq: TM – contention” wait not a TX wait)

Bitmap Index: bitmap index chunk contention

Now how to tell which of these is happening? Well here is a query on ASH (I’ve commented out some of the useful fields to limit the output) that will generate output to help us distinguish between the 3 different cases