/***
StyleSheet for use when a translation requires any css style changes.
This StyleSheet can be used directly by languages such as Chinese, Japanese and Korean which need larger font sizes.
***/
/*{{{*/
body {font-size:0.8em;}
#sidebarOptions {font-size:1.05em;}
#sidebarOptions a {font-style:normal;}
#sidebarOptions .sliderPanel {font-size:0.95em;}
.subtitle {font-size:0.8em;}
.viewer table.listView {font-size:0.95em;}
/*}}}*/

To get started with this blank TiddlyWiki, you'll need to modify the following tiddlers:
* SiteTitle & SiteSubtitle: The title and subtitle of the site, as shown above (after saving, they will also appear in the browser title bar)
* MainMenu: The menu (usually on the left)
* DefaultTiddlers: Contains the names of the tiddlers that you want to appear when the TiddlyWiki is opened
You'll also need to enter your username for signing your edits: <<option txtUserName>>

<<<
0) It would be ideal if you could create a separate database for your tests
1) If you really want a quick IO test, then do calibrate_io and then iperf2/netperf (see the calibrate_io sketch after this block)
http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_resmgr.htm#ARPLS67598
2) If you have a bit of time, then do Orion
You can quickly do a -run dss or -run oltp
But you can also explore the attached oriontoolkit.zip
3) If you have a lot of time, then do SLOB and test the large IOs
For OLTP test
http://kevinclosson.net/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/
If it’s DW, then you need to test the large IOs. See attached IOsaturationtoolkit-v2.tar.bz2
http://karlarao.tiddlyspot.com/#[[cpu%20-%20SillyLittleBenchmark%20-%20SLOB]]
Kyle also has some good reference on making use of FIO https://github.com/khailey/fio_scripts/blob/master/README.md
<<<
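A minimal sketch of the quick calibrate_io run mentioned in item 1 above (the disk count and latency arguments are placeholders to adjust for the target storage; the procedure needs the SYSDBA privilege, timed_statistics=TRUE, and async IO enabled):
{{{
SET SERVEROUTPUT ON
DECLARE
  l_max_iops PLS_INTEGER;
  l_max_mbps PLS_INTEGER;
  l_latency  PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 12,   -- placeholder: spindles backing the diskgroup
    max_latency        => 10,   -- placeholder: acceptable latency in ms
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops='||l_max_iops||' max_mbps='||l_max_mbps||' latency='||l_latency);
END;
/
}}}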

http://www.cyberciti.biz/tips/what-is-devshm-and-its-practical-usage.html
http://superuser.com/questions/45342/when-should-i-use-dev-shm-and-when-should-i-use-tmp
http://download.oracle.com/docs/cd/B28359_01/server.111/b32009/appi_vlm.htm
Tanel mentioned he used it as persistent storage when he was doing a migration on one database that needed to do fast writes, so he put the redo log on /dev/shm.. this is dangerous because if the server crashes you have to do a restore/recover.. data residing in /dev/shm is not persistent across an OS reboot..

http://www.freelists.org/post/oracle-l/SQL-High-version-count-because-of-too-many-varchar2-columns,12
http://t31808.db-oracle-general.databasetalk.us/sql-high-version-count-because-of-too-many-varchar2-columns-t31808.html
SQLs With Bind Variable Has Very High Version Count (Doc ID 258742.1)
{{{
event="10503 trace name context forever, level "
For eg., if the maximum length of a bind variable in the application is 128, then
event="10503 trace name context forever, level 128"
The EVENT 10503 was added as a result of BUG:2450264
This fix introduces the EVENT 10503 which enables users to specify a character bind buffer length.
Depending on the length used, the character binds in the child cursor can all be created
using the same bind length;
skipping bind graduation and keeping the child chain relatively small.
This helps to alleviate a potential cursor-sharing problem related to graduated binds.
The level of the event is the bind length to use, in bytes.
It is relevant for binds of types:
Character (but NOT ANSI Fixed CHAR (type 96 == DTYAFC))
Raw
Long Raw
Long
* There really is no limit for the EVENT 10503 but for the above datatypes.
For non-PL/SQL calls, the maximum bind buffer size is 4001 (bytes). For PL/SQL,
the maximum bind buffer size is 32K.
* Specifying a buffer length which is greater than the pre-set maximum will cause the
pre-set maximum to be used. To go back to using the pre-set lengths, specify '0' for the buffer
length.
Test the patch and event in development environment before implementing in the production environment.
}}}
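To check whether graduated binds are actually inflating version counts before touching the event, a quick look at the shared pool helps (a generic sketch against the standard views; the 20-version threshold is arbitrary):
{{{
-- parent cursors with many children
select sql_id, version_count, executions, substr(sql_text,1,60) sql_text
from v$sqlarea
where version_count > 20
order by version_count desc;

-- reasons Oracle reports for not sharing a given parent's child cursors
select * from v$sql_shared_cursor where sql_id = '&sql_id';
}}}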

{{{
The only difference it would make on the databases that have DBV and TDE configured is that when
DBAs try to create a user, it has to go through the dvadmin user. Other databases that don't have the
DV schemas created and configured will still behave as is.
Below is a sample of creating a user in a DBV environment
SYS@dbv_1> SYS@dbv_1> select username from dba_users order by 1;
USERNAME
------------------------------
ANONYMOUS
APEX_030200
APEX_PUBLIC_USER
APPQOSSYS
BI
CTXSYS
DBSNMP
DIP
DVADMIN
DVF
DVOWNER
DVSYS
SYS@dbv_1> conn / as sysdba
SYS@dbv_1> create user karlarao identified by karlarao;
create user karlarao identified by karlarao
*
ERROR at line 1:
ORA-01031: insufficient privileges
SYS@dbv_1> conn dvadmin/<password>
Connected.
DVADMIN@dbv_1> create user karlarao identified by karlarao;
User created.
}}}

http://www.evernote.com/shard/s48/sh/1a9c1779-94ec-4e5a-a26f-ba92ea08988e/3bb10603e76f4fb346d7df4328882dcd
Also check out this thread at oracle-l for options on 10GbE on V2 http://www.freelists.org/post/oracle-l/Exadata-V2-Compute-Node-10GigE-PCI-card-installation

Making use of Smart Scan made the run times faster, lowered CPU utilization, and allowed the system to accommodate more databases
http://www.evernote.com/shard/s48/sh/b1f43d49-1bcd-4319-b274-19a91cf338ac/f9f554d2d03b3f20db591d5e68392cbf

{{{
Glenn Fawcett
http://glennfawcett.files.wordpress.com/2013/06/ciops_data_x3-2.jpg
---
It wasn’t actually SLOB, but that might be interesting.
I used a mod of my blkhammer populate script to populate a bunch of tables OLTP style to
show how WriteBack is used. As expected, Exadata is real good on
“db file sequential read”… in the sub picosecond range if I am not mistaken :)
---
That was just a simple OLTP style insert test that spawns a bunch of PLSQL. Yes for sure
there were spills to disk... But the benefit was the coalescing of blocks. DBWR is flushing really
mostly random blocks, but the write back flash is pretty huge these days. I was seeing average
iosize to disk being around 800k but only about 8k to flash.
}}}

http://kevinclosson.wordpress.com/2012/02/12/how-many-non-exadata-rac-licenses-do-you-need-to-match-exadata-performance/
{{{
kevinclosson
February 14, 2012 at 8:52 pm
Actually, Matt, I see nothing wrong with what the rep said. A single Exadata database grid host can drive a tremendous amount of storage throughput but it can only eat 3.2GB/s since there is but a single 40Gb HCA port active on each host. A single host can drive the storage grid nearly to saturation via Smart Scan…but as soon as the data flow back to the host approaches 3.2GB/s the Smart Scan will start to throttle. In fact a single session (non-Parallel Query) can drive Smart Scan to well over 10GB/s in a full rack but, in that case you’d have a single foreground process on a single core of WSM-EP so there wouldn’t be sufficient bandwidth to ingest much data..about 250MB/s can flow into a single session performing a Smart Scan. So the hypothetical there would be Smart Scan is churning through, let’s say, 10GB/s and Smart Scan is whittling down the payload by about 9.75GB/s through filtration and projection. Those are very close to realistic numbers I’ve just cited but I haven’t measured those sort of “atomics” in a year so I’m going by memory. Let’s say give or take 5% on my numbers.
}}}
http://forums.theregister.co.uk/forum/1/2011/12/12/ibm_vs_oracle_data_centre_optimisation/
{{{
Exadata: 2 Grids, 2 sets of roles.
>The Exadata storage nodes compress database files using a hybrid columnar algorithm so they take up less space and can be searched more quickly. They also run a chunk of the Oracle 11g code, pre-processing SQL queries on this compressed data before passing it off to the full-on 11g database nodes.
Exadata cells do not compress data. Data compression is done at load time (in the direct path) and compression (all varieties not just HCC) is code executed only on the RAC grid CPUS. Exadata users get no CPU help from the 168 cores in the storage grid when it comes to compressing data.
Exadata cells can, however, decompress HCC data (but not the other types of compressed data). I wrote "can" because cells monitor how busy they are and are constantly notified by the RAC servers about their respective CPU utilization. Since decompressing HCC data is murderously CPU-intensive the cells easily go processor-bound. At that time cells switch to "pass-through" mode shipping up to 40% of the HCC blocks to the RAC grid in compressed form. Unfortunately there are more CPUs in the storage grid than the RAC grid. There is a lot of writing on this matter on my blog and in the Expert Oracle Exadata book (Apress).
Also, while there are indeed 40GB DDR Infiniband paths to/from the RAC grid and the storage grid, there is only 3.2GB/s usable bandwidth for application payload between these grids. Therefore, the aggregate maximum data flow between the RAC grid and the cells is 25.6GB/s (3.2x8). There are 8 IB HCAs in either X2 model as well so the figure sticks for both. In the HP Oracle Database Machine days that figure was 12.8GB/s.
With a maximum of 25.6 GB/s for application payload (Oracle's iDB protocol as it is called) one has to quickly do the math to see the mandatory data reduction rate in storage. That is, if only 25.6 GB/s fits through the network between these two grids yet a full rack can scan combined HDD+FLASH at 75 GB/s then you have to write SQL that throws away at least 66% of the data that comes off disk. Now, I'll be the first to point out that 66% payload reduction from cells is common. Indeed, the cells filter (WHERE predicate) and project columns (only the cited and join columns need shipped). However, compression changes all of that.
If scanning HCC data on a full rack Exadata configuration, and that data is compressed at the commonly cited compression ratio of 10:1 then the "effective" scan rate is 750GB/s. Now use the same predicates and cite the same columns and you'll get 66% reduced payload--or 255GB/s that needs to flow over iDB. That's about 10x over-subscription of the available 25.6 GB/s iDB bandwidth. When this occurs, I/O is throttled. That is, if the filtered/projected data produced by the cells is greater than 25.6GB/s then I/O wanes. Don't expect 10x query speedup because the product only has to perform 10% the I/O it would in the non-compressed case (given a HCC compression ratio of 10:1).
That is how the product works. So long as your service levels are met, fine. Just don't expect to see 75GB/s of HCC storage throughput with complex queries because this asymmetrical MPP architecture (Exadata) cannot scale that way (for more info see: http://bit.ly/tFauDA )
}}}
http://kevinclosson.wordpress.com/2011/11/23/mark-hurd-knows-cios-i-know-trivia-cios-may-not-care-about-either-hang-on-im-booting-my-cell-phone/#comment-37527
{{{
kevinclosson
November 28, 2011 at 7:09 pm
“I can see the shared nothing vs shared everything point in a CPU + separate storage perspective.”
…actually, I don’t fester about with the shared-disk versus shared nothing as I really don’t think it matters. It’s true that Real Application Clusters requires shared disk but that is not a scalability hindrance–so long as one works out the storage bandwidth requirements–a task that is not all that difficult with modern storage networking options. So long as ample I/O flow is plumbed into RAC it scales DW/BI workloads. It is as simple as that. On the other hand, what doesn’t scale is asymmetry. Asymmetry has never scaled as would be obvious to even the casual observer. As long as all code can run on all CPUs (symmetry) scalability is within reach. What I’m saying is that RAC actually has better scalability characteristics when running with conventional storage than with Exadata! That’s a preposterous statement to the folks who don’t actually know the technology, as well as those who are dishonest about the technology, but obvious to the rest of us. It’s simple computer science. One cannot take the code path of query processing, chop it off at the knees (filtration/projection) and offload that to some arbitrary percentage of your CPU assets and pigeon-hole all the rest of the code to the remaining CPUs and cross fingers.
A query cannot be equally CPU-intensive in all query code all the time. There is natural ebb and tide. If the query plan is at the point of intensive join processing it is not beneficial to have over fifty percent of the CPUs in the rack unable to process join code (as is the case with Exadata).
To address this sort of ebb/tide imbalance Oracle has “released” a “feature” referred to as “passthrough” where Exadata cells stop doing their value-add (filtration and HCC decompression) for up to about 40% of the data flowing off storage when cells get too busy (CPU-wise). At that point they just send unfiltered, compressed data to the RAC grid. The RAC grid, unfortunately, has less CPU cores than the storage grid and has brutally CPU-intensive work of its own to do (table join, sort, agg). “Passthrough” is discussed in the Expert Oracle Exadata (Apress) book.
This passthrough feature does allow water to find its level, as it were. When Exadata falls back to passthrough mode the whole configuration does indeed utilize all CPU and since idle CPU doesn’t do well to increase query processing performance this is a good thing. However, if Exadata cells stop doing the “Secret Sauce” (a.k.a., Offload Processing) when they get busy then why not just build a really large database grid (e.g., with the CPU count of all servers in an Exadata rack) and feed it with conventional storage? That way all CPU power is “in the right place” all the time. Well, the answer to that is clearly RAC licensing. Very few folks can afford to license enough cores to run a large enough RAC grid to make any of this matter. Instead they divert some monies that could go for a bigger database grid into “intelligent storage” and hope for the best.
}}}
http://www.snia.org/sites/default/education/tutorials/2008/fall/networking/DrorGoldenberg-Fabric_Consolidation_InfiniBand.pdf
3.2 GB/s unidirectional
theoretical limit 3.2 GB/s measured due to server IO limitations
http://www.it-einkauf.de/images/PDF/677C777.pdf
{{{
INFINIBAND PHYSICAL-LAYER CHARACTERISTICS
The InfiniBand physical-layer specification supports three data rates, designated 1X, 4X, and 12X, over both copper and fiber optic media.
The base data rate, 1X single data rate (SDR), is clocked at 2.5 Gbps and is transmitted over two pairs of wires—transmit and receive—and
yields an effective data rate of 2 Gbps full duplex (2 Gbps transmit, 2 Gbps receive). The 25 percent difference between data rate and
clock rate is due to 8B/10B line encoding that dictates that for every 8 bits of data transmitted, an additional 2 bits of transmission
overhead is incurred.
}}}
infiniband cabling issues
{{{
InfiniBand cable presents a challenge within this environment because the cables are considerably thicker, heavier, and shorter in length
to mitigate the effects of cross-talk and signal attenuation and achieve low bit error rates (BERs). To assure the operational integrity and
performance of the HPC cluster, it is critically important to maintain the correct bend radius, or the integrity of the cable can be
compromised such that the effects of cross-talk introduce unacceptable BERs.
To address these issues, it is essential to thoroughly plan the InfiniBand implementation and provide a good cable management solution
that enables easy expansion and replacement of failed cables and hardware. This is especially important when InfiniBand 12X or DDR
technologies are being deployed because the high transmission rates are less tolerant to poor installation practices.
}}}
http://www.redbooks.ibm.com/abstracts/tips0456.html
A single PCI Express serial link is a dual-simplex connection using two pairs of wires, one pair for transmit and one pair for receive, and can only transmit one bit per cycle. Although this sounds limiting, it can transmit at the extremely high speed of 2.5 Gbps, which equates to a burst mode of 320 MBps on a single connection. These two pairs of wires are called a lane.
{{{
Table: PCI Express maximum transfer rate
Lane width Clock speed Throughput (duplex, bits) Throughput (duplex, bytes) Initial expected uses
x1 2.5 GHz 5 Gbps 400 MBps Slots, Gigabit Ethernet
x2 2.5 GHz 10 Gbps 800 MBps
x4 2.5 GHz 20 Gbps 1.6 GBps Slots, 10 Gigabit Ethernet, SCSI, SAS
x8 2.5 GHz 40 Gbps 3.2 GBps
x16 2.5 GHz 80 Gbps 6.4 GBps Graphics adapters
}}}
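The x8 row above is also where the 3.2 GB/s per-host figure in the Exadata discussion keeps coming from (Kevin's note further below says the 3.2 is established by the PCIe slot the HCA sits in); a quick arithmetic check, assuming the Gen1 numbers in this table:
{{{
-- 8 lanes x 400 MBps duplex (the x1 row above) = 3.2 GBps for an x8 slot
select 8 * 400 as x8_duplex_mbps from dual;
}}}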
http://www.aiotestking.com/juniper/2011/07/when-using-a-40-gbps-switch-fabric-how-much-full-duplex-bandwidth-is-available-to-each-slot/
{{{
When using a 40 Gbps switch fabric, how much full duplex bandwidth is available to each slot?
A.
1.25 Gbps
}}}
Sun Blade 6048 InfiniBand QDR Switched Network Express Module Introduction
http://docs.oracle.com/cd/E19914-01/820-6705-10/chapter1.html
{{{
IB transfer rate (maximum)
40 Gbps (QDR) per 4x IB port for the Sun Blade X6275 server module and 20 Gbps (DDR) per 4x IB port for the Sun Blade X6270 server module. There are two 4x IB ports per server module.
1,536 Gbps aggregate throughput
}}}
''email with Kevin''
<<<
on Exadata the 3.2 is established by the PCI slot the HCA is sitting in. I don't scrutinize QDR IB these days. It would be duplex...would have to look it up.
<<<
wikipedia
<<<
http://en.wikipedia.org/wiki/InfiniBand where it mentions that "The SDR connection's signalling rate is 2.5 gigabit per second (Gbit/s) in each direction per connection"
<<<
''The flash and HCA cards use PCIe x8''
http://jarneil.wordpress.com/2012/02/02/upgradingdowngrading-exadata-ilom-firmware/
http://husnusensoy.files.wordpress.com/2010/10/oracle-exadata-v2-fast-track.pptx
{{{
IB Switches
3 x 36-port managed switches as opposed to Exadata v1 (2+1).
2 “leaf”
1 “spine” switches
Spine switch is only available for Full Rack because it is for connecting multiple full racks side by side.
A subnet manager running on one switch discovers the topology of the network.
HCA
Each node (RAC & Storage Cell) has a PCIe x8 40 Gbit HCA with two ports
Active-Standby Intracard Bonding.
}}}
F20 PCIe Card
{{{
Not a SATA/SAS SSD drive but an x8 PCIe device providing a SATA/SAS interface.
4 Solid State Flash Disk Modules (FMod), each 24 GB in size
256 MB Cache
SuperCap Power Reserve (Energy Storage Module, ESM) provides the write-back operation mode.
ESM should be enabled for optimal write performance
Should be replaced every two years.
Can be monitored using various tools like ILOM
Embedded SAS/SATA configuration will expose 16 (4 cards x 4 FMod) Linux devices.
/dev/sdn
4K sector boundary for FMods
Each FMod consists of several NAND modules; best performance can be reached with multithreading (32+ threads/FMod, etc.)
}}}

{{{
How To Avoid ORA-04030/ORA-12500 In 32 Bit Windows Environment
Doc ID: Note:373602.1
How to convert a 32-bit database to 64-bit database on Linux?
Doc ID: Note:341880.1
-- PAE/AWE
Some relief may be obtained by setting the /3GB flag as well as the /PAE flag in Oracle. This at least assures that up to 2 GB of memory is available for the Large Pool,
the Shared Pool, the PGA, and all user threads, after the AWE_WINDOW_SIZE parameter is taken into account. However, Microsoft recommends that the /3GB flag not be set if
the /AWE flag is set. This is due to the fact that the total amount of RAM accessible for ALL purposes is limited to 16 GB if the /3GB flag is set. RAM above 16 GB simply
"disappears" from the view of the OS. For PowerEdge 6850 servers that can support up to 64 GB of RAM, a limitation to only 16 GB of RAM is unacceptable.
As noted previously, the model used for extended memory access under a 32-bit Operating System entails a substantial performance penalty. However, with a 64-bit OS, a flat linear model for memory is used, with no need for PAE to access memory above 4 GB. Improved performance will be experienced for database SGA sizes greater than 3 GB, due to elimination of PAE overhead.
MAXIMUM OF 4 GB OF ADDRESSABLE MEMORY FOR THE 32 BIT ARCHITECTURE. THIS IS A MAXIMUM PER PROCESS. THAT IS, EACH PROCESS MAY ALLOCATE UP TO 4 GB OF MEMORY
2GB for OS
2GB for USER THREADS
1st workaround on 4GB limit:
- To expand the total memory used by Oracle above 2 GB, the /3GB flag may be set in the boot.ini file.
With the /3GB flag set, only 1 GB is used for the OS, and 3 GB is available for all user threads, including the Oracle SGA.
2nd workaround on 4GB limit:
- use the PAE, Intel 32-bit processors such as the Xeon processor support PAGING ADDRESS EXTENSIONS for large memory support
MS Windows 2000 and 2003 support PAE through ADDRESS WINDOWING EXTENSIONS (AWE). PAE/AWE may be enabled by setting the /PAE flag in the boot.ini file.
The "USE_INDIRECT_BUFFERS=TRUE" parameter must also be set in the Oracle initialization file. In addition, the DB_BLOCK_BUFFERS parameter must be used
instead of the DB_CACHE_SIZE parameter in the Oracle initialization file. With this method, Windows 2000 Server and Windows Server 2003 versions can support
up to 8 GB of total memory.
Windows Advanced Server and Data Center versions support up to 64 GB of addressable memory with PAE/AWE.
- One limitation of AWE is that only the Data Buffer component of the SGA may be placed in extended memory. Threads for other
SGA components such as the Shared Pool and the Large Pool, as well as the PGA and all Oracle user sessions must still fit inside
a relatively small memory area. THERE IS AN AWE_WINDOW_SIZE REGISTRY KEY PARAMETER THAT IS USED TO SET THE SIZE OF A KIND OF "SWAP" AREA IN THE SGA. <-- swap area in SGA
This "swap" area is used for mapping data blocks in upper memory to a lower memory location. By default,
this takes an additional 1 GB of low memory. This leaves only 2 GB of memory for everything other than the Buffer cache, assuming
the /3GB flag is set. If the /3GB flag is not set, only 1 GB of memory is available for the non-Buffer Cache components.
- Note that the maximum addressable memory was limited to 16 GB of RAM
Some relief may be obtained by setting the /3GB flag as well as the /PAE flag in Oracle. This at least assures that up to 2 GB of memory is available
for the Large Pool, the Shared Pool, the PGA, and all user threads, after the AWE_WINDOW_SIZE parameter is taken into account. However, Microsoft
recommends that the /3GB flag not be set if the /AWE flag is set. This is due to the fact that the total amount of RAM accessible for ALL purposes
is limited to 16 GB if the /3GB flag is set. RAM ABOVE 16 GB SIMPLY "DISAPPEARS" FROM THE VIEW OF THE OS. For PowerEdge 6850 servers that can support
up to 64 GB of RAM, a limitation to only 16 GB of RAM is unacceptable.
This will give you (/3GB is set):
3-4GB for Buffer Cache
1GB for the swap area
2GB for everything other than the Buffer Cache
1GB for OS
This will give you (/3GB is not set):
3-4GB for Buffer Cache
1GB for the swap area
1GB for everything other than the Buffer Cache
2GB for OS
- Performance Tuning Corporation Benchmark:
This will give you (/3GB is set):
11GB for Buffer Cache
.75GB for the swap area (AWE_MEMORY_WINDOW..minimum size that allowed the database to start)
2.25GB for everything other than the Buffer Cache
1GB for OS
This will give you (/3GB is not set):
11GB for Buffer Cache
.75GB for the swap area (AWE_MEMORY_WINDOW..minimum size that allowed the database to start)
1.25GB for everything other than the Buffer Cache
2GB for OS
}}}
Using Large Pages for Oracle on Windows 64-bit (ORA_LPENABLE) http://blog.ronnyegner-consulting.de/2010/10/19/using-large-pages-for-oracle-on-windows-64-bit-ora_lpenable/

http://karlarao.wordpress.com/2009/12/31/50-sql-performance-optimization-scenarios/
{{{
ORACLE SQL Performance Optimization Series (1)
1. The types of Oracle optimizer
2. How tables are accessed
3. Sharing SQL statements
ORACLE SQL Performance Optimization Series (2)
4. Choose the most efficient table-name order in the FROM clause (rule-based optimizer only)
5. The order of join conditions in the WHERE clause
6. Avoid using '*' in the SELECT clause
7. Reduce the number of accesses to the database
ORACLE SQL Performance Optimization Series (3)
8. Use the DECODE function to reduce processing time
9. Consolidate simple, unrelated database accesses
10. Removing duplicate records
11. Replace DELETE with TRUNCATE
12. Use COMMIT as much as possible
ORACLE SQL Performance Optimization Series (4)
13. Counting the number of records
14. Replace HAVING clauses with WHERE clauses
15. Reduce the number of queries against the table
16. Use internal functions to improve SQL efficiency
ORACLE SQL Performance Optimization Series (5)
17. Use table aliases (Alias)
18. Replace IN with EXISTS
19. Replace NOT IN with NOT EXISTS
ORACLE SQL performance optimization Series (6)
20. Replace EXISTS with table joins
21. Replace DISTINCT with EXISTS
22. Identifying 'inefficiently executed' SQL statements
23. Use the TKPROF tool to examine SQL performance
ORACLE SQL Performance Optimization Series (7)
24. Analyze SQL statements with EXPLAIN PLAN
ORACLE SQL Performance Optimization Series (8)
25. Use indexes to improve efficiency
26. Index operations
ORACLE SQL Performance Optimization Series (9)
27. Choosing the driving (base) table
28. Multiple equally ranked indexes
29. Equality comparisons and range comparisons
30. Ambiguous index ranking
ORACLE SQL Performance Optimization Series (10)
31. Forcing an index to be ignored
32. Avoid calculations on indexed columns.
33. Automatic index selection
34. Avoid using NOT on indexed columns
35. Replace ">" with ">="
ORACLE SQL Performance Optimization Series (11)
36. Replace OR with UNION (for indexed columns)
37. Replace OR with IN
38. Avoid IS NULL and IS NOT NULL on indexed columns
ORACLE SQL Performance Optimization Series (12)
39. Always use the leading column of a composite index
40. Oracle internal operations
41. Replace UNION with UNION ALL (if possible)
42. Using hints (Hints)
ORACLE SQL Performance Optimization Series (13)
43. Replace ORDER BY with WHERE
44. Avoid changing the type of indexed columns
45. WHERE clauses to be careful with
ORACLE SQL Performance Optimization Series (14)
46. Joins that need multiple scans
47. Use the more selective index under the CBO
48. Avoid resource-intensive operations
49. GROUP BY optimization
50. Using dates
51. Use explicit cursors (CURSORs)
52. Optimizing EXPORT and IMPORT
53. Separate tables and indexes
ORACLE SQL Performance Optimization Series (15)
Is EXISTS / NOT EXISTS necessarily more efficient than IN / NOT IN?
ORACLE SQL Performance Optimization Series (16)
Why are the results of my query against a view wrong?
ORACLE SQL Performance Optimization Series (17)
Which way of writing a paging query is more efficient?
ORACLE SQL Performance Optimization Series (18)
Is COUNT(rowid) / COUNT(pk) more efficient?
ORACLE SQL Performance Optimization Series (19)
Oracle data type implicit conversions
ORACLE SQL Performance Optimization Series (20)
Three things to pay attention to when using indexes
Using Oracle hints (HINT) (Part 1) (21)
Using Oracle hints (HINT) (Part 2) (22)
Function-based indexes analyzed (Part 1) (23)
Function-based indexes analyzed (Part 2) (24)
How to implement efficient paging queries (25)
Implementing SELECT TOP N in Oracle (26)
}}}
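As a quick illustration of items 18/19 above, here is the usual IN-to-EXISTS rewrite (sketched on the classic SCOTT emp/dept demo tables):
{{{
-- IN form: the subquery result set is built and probed
select e.ename
from emp e
where e.deptno in (select d.deptno from dept d where d.loc = 'CHICAGO');

-- EXISTS form: a correlated probe against each outer row
select e.ename
from emp e
where exists (select 1 from dept d where d.deptno = e.deptno and d.loc = 'CHICAGO');
}}}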

Average Active Sessions (AAS) is a metric of database load. This value should not go above the CPU count; if it does, the database is either working very hard or waiting a lot for something.
''The AAS relative to the CPU count is used as a yardstick for a possible performance problem (I suggest reading Kyle's stuff about this):''
{{{
if AAS < 1
-- Database is not blocked
AAS ~= 0
-- Database basically idle
-- Problems are in the APP not DB
AAS < # of CPUs
-- CPU available
-- Database is probably not blocked
-- Are any single sessions 100% active?
AAS > # of CPUs
-- Could have performance problems
AAS >> # of CPUS
-- There is a bottleneck
}}}
''AAS Formula''
--
{{{
* AAS is either dbtime/elapsed
* or count/samples
* in the case of dba_hist_ count is count*10 since they only write out 1/10 samples (19751*10)/600 = 329.18
}}}
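A minimal sketch of both calculations (the 10-minute window and the 10x multiplier for DBA_HIST_ follow the notes above; the DBA_HIST_ query assumes the window has already been flushed to the repository):
{{{
-- AAS from ASH (1-second samples), last 10 minutes
select count(*) / (10*60) as aas
from v$active_session_history
where sample_time > sysdate - 10/1440;

-- AAS from DBA_HIST_ACTIVE_SESS_HISTORY (only 1 in 10 samples is persisted, hence *10)
select count(*) * 10 / (10*60) as aas
from dba_hist_active_sess_history
where sample_time > sysdate - 10/1440;
}}}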

<<showtoc>>
This Tiddler will show you a new interesting metric included in the performance graph of Enterprise Manager 11g.. which is the ''CPU Wait'' or ''CPU + CPU Wait''
a little background..
I did an IO test with the intention of bringing the system down to its knees and characterizing the IO performance at that level of stress. I wanted to know the IO performance of my R&D server http://www.facebook.com/photo.php?pid=5272015&l=d5f2be4166&id=552113028 (which I intend to run lots of VMs on), which has 8GB memory, an Intel Core2 Quad Q9500, and 5 x 1TB short-stroked disks (on the outer 100GB area). From those disks I was able to build an LVM stripe that produced about 900+ IOPS & 300+ MB/s on my ''Orion'' and ''dbms_resource_manager.calibrate_io'' runs, and I validated those numbers against the database I created by actually running ''256 parallel sessions'' doing SELECT * on a 300GB table http://goo.gl/PYYyH (the same disks are used, but as ASM disks on the next 100GB area - short stroked).
I'll start off by showing you how AAS is computed.. Then detail on how it is being graphed and show you the behavior of AAS on IO and CPU bound workload..
The tools I used for graphing the AAS:
* Enterprise Manager 11g
** both the real time and historical graphs
* ASH Viewer by Alexander Kardapolov http://j.mp/dNidrB
** this tool samples from the ASH itself and graphs it.. so it allows me to check the correctness and compare it with the ''real time'' graph of Enterprise Manager
* MS Excel and awr_topevents.sql
** this tool samples from the DBA_HIST views and graphs it.. so it allows me to check the correctness and compare it with the ''historical'' graph of Enterprise Manager
Let's get started..
!
! How AAS is computed
AAS is the abstraction of database load and you can get it by the following means...
!!!! 1) From ASH
<<<
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZtyRXwwiOI/AAAAAAAABLA/BYOUYtXO1Vo/AASFromASH.png]]
<<<
!!!! 2) From DBA_HIST_ACTIVE_SESS_HISTORY
* In the case of DBA_HIST_ ''sample count'' is sample count*10 since they only write out 1/10 samples
<<<
[img[picturename| https://lh4.googleusercontent.com/_F2x5WXOJ6Q8/TZtyRcp7m_I/AAAAAAAABLI/sLqztbLY3Mw/AASFromDBA_HIST.png]]
<<<
!!!! 3) From the AWR Top Events
* The Top Events section unions the output of ''dba_hist_system_event'' (all the events) and the ''CPU'' from the time model (''dba_hist_sys_time_model''), then filters only the ''top 5'', and does this across the SNAP_IDs
** To get the ''high level AAS'' you have to divide DB Time / Elapsed Time
** To get the ''AAS for the Top Events'', you have to divide the ''time'' (from event or cpu) by ''elapsed time''
* You can see below that we are having ''the same'' AAS numbers compared to the ASH reports
<<<
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TZtyRdPqm3I/AAAAAAAABLE/o23FMIG1yeQ/AASFromAWRTop.png]]
<<<
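Roughly, the high-level AAS per SNAP_ID can be pulled like this (a simplified, single-instance sketch of the DB Time / Elapsed Time idea above; DB time is stored in microseconds):
{{{
select s.snap_id,
       round( (t.value - lag(t.value) over (order by s.snap_id)) / 1e6
            / ((cast(s.end_interval_time as date)
              - cast(s.begin_interval_time as date)) * 86400), 2) as aas
from dba_hist_snapshot s
join dba_hist_sys_time_model t
  on  t.snap_id         = s.snap_id
  and t.dbid            = s.dbid
  and t.instance_number = s.instance_number
  and t.stat_name       = 'DB time'
order by s.snap_id;
}}}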
!
! How AAS is being graphed
I have a dedicated blog post on this topic.. http://karlarao.wordpress.com/2010/07/25/graphing-the-aas-with-perfsheet-a-la-enterprise-manager/
So we already know how we get the AAS and how it is graphed.. ''so what's my issue?''
''Remember I mentioned this on the blog post above.. ?''
<<<
"So what’s the effect? mm… on a high CPU activity period you’ll notice that there will be a higher AAS on the Top Activity Page compared to Performance Page. Simply because ASH samples every second and it does that quickly on every active session (the only way to see CPU usage realtime) while the time model CPU although it updates quicker (5secs I think) than v$sysstat “CPU used by this session” there could still be some lag time and it will still be based on Time Statistics (one of two ways to calculate AAS) which could be affected by averages."
<<<
I'll expound on that with test cases included.. ''see below!''
!
! AAS behavior on an IO bound load
* This is the graph of an IO bound load using ASH Viewer, this will be similar to the graph you will see on ''real time'' view of the Enterprise Manager 11g
<<<
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TZ9Cp2Kc8aI/AAAAAAAABN0/1konJAJZMUo/highio-3.png]]
[img[picturename| https://lh4.googleusercontent.com/_F2x5WXOJ6Q8/TZt3yMWQUCI/AAAAAAAABLM/8d-I2RqvF3I/AASIObound.png]]
<<<
* This is the graph of the same workload using MS Excel and the script awr_topevents.sql, this will be the similar graph you will see on the ''historical'' view of the Enterprise Manager 11g
<<<
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TZ9FJ6cXxRI/AAAAAAAABN4/eWRs8SQd0ws/highio-4.png]]
<<<
As you can see from the images above and the numbers below.. the database is doing a lot of ''direct path read'' and we don't have a high load average. However, when you look at the OS statistics for this IO-intensive workload, you will see high IO WAIT on the CPU.
Looking at the data below from AWR and ASH.. ''we see no discrepancies''.. Now let's compare this to the workload below, where the database server is CPU bound and has a really high load average.
''AAS Data from AWR''
<<<
[img[picturename| https://lh4.googleusercontent.com/_F2x5WXOJ6Q8/TZ8-nKFw2pI/AAAAAAAABNk/oozsoEgnmeE/highio-1.png]]
<<<
''AAS Data from ASH''
<<<
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TZ8-nFhmP7I/AAAAAAAABNo/x5kIF-HuhnY/highio-2.png]]
<<<
!
! AAS behavior on a CPU bound load
This is the Enterprise Manager 11g graph of a CPU bound load
<<<
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZt8ZxUeEUI/AAAAAAAABLY/gmclSmutRVg/AASCPUbound.png]]
<<<
This is the ASH Viewer graph of a CPU bound load
* The dark green color you see below (18:30 - 22:00) is actually the ''CPU Wait'' metric that you are seeing on the Enterprise Manager graph above
* The light green color on the end part of the graph (22:00) is the ''Scheduler wait - resmgr: cpu quantum''
* The small hump on the 16:30-17:30 time frame is the IO bound load test case
<<<
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TZ6emvgui7I/AAAAAAAABNI/fxVzQryIwKc/highcpu-4.png]]
<<<
Below are the data from AWR and ASH of the same time period ''(21:50 - 22:00)''.. see the high level and drill down numbers below
... it seems like when the database server is ''high on CPU/high on runqueue'' or the ''"wait for CPU"'' appears.. the AAS numbers from the AWR and ASH reports don't match anymore. I would expect ASH to be bigger because it has fine grained samples of 1 second. But as you can see (below)..
* the ASH top events correctly accounted the CPU time ''(95.37 AAS)'' which was tagged as ''CPU + Wait for CPU''
* while the AWR CPU seems to be idle ''(.2 AAS)''.
And what's even more interesting is
* the high level AAS on AWR is ''356.7''
* while on the ASH it is ''329.18''
that's a huge gap! Well that could be because of
* the high DB Time ''(215947.8)'' on AWR
* compared to what Sample Count ASH has ''(197510)''.
Do you have any idea why this is happening? Interesting, right?
''AAS Data from AWR''
<<<
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZ6BdKu23hI/AAAAAAAABMw/Nuwg_qTt6m8/highcpu-1.png]]
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZ6BdrU46FI/AAAAAAAABM4/6Inv_8_Z5dc/highcpu-2.png]]
<<<
''AAS Data from ASH''
<<<
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TZ8rp2UTbWI/AAAAAAAABNg/6VBzvJxxApM/highcpu-3.png]]
<<<
''A picture is worth a thousand words...'' - To clearly explain this behavior of ''CPU not properly accounted'' I'll show you the graph of the data samples
__''AWR Top Events with CPU "not properly" accounted''__
<<<
* This is the high level AAS we are getting from the ''DB Time/Elapsed Time'' from the AWR report across SNAP_IDs.. this output comes from the script ''awr_genwl.sql'' (AAS column - http://goo.gl/MUWr) notice that there are AAS numbers as high as 350 and above.. the second occurrence of 350+ is from the SNAP_ID 495-496 mentioned above..
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZ61tG_iQ0I/AAAAAAAABNY/iKAy7j4Y534/highcpu-5.png]]
* Drilling down on the AAS components of that high level AAS, we have to graph the output of ''awr_topevents.sql''... given that this is still the same workload, you see here that only the ''Direct Path Read'' is properly accounted, and when you look at the CPU time it seems to be idle... thus giving a lower AAS than the image above..
* Take note that SNAP_ID 495 the AWR ''CPU'' seems to be idle (.2 AAS) which is what is happening on this image
* Also on the 22:00 period, the database stopped waiting on CPU and started to wait on ''Scheduler''.. and then it matched again the high level AAS from the image above (AAS range of 320).. Interesting right?
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZ53u_cLWLI/AAAAAAAABMY/9QP2C4S7AUI/highcpu-6.png]]
* We will also have this same behavior on Enterprise Manager 11g when we go to the ''Top Activity page'' and change the ''Real Time'' to ''Historical''... see the similarities with the graph from MS Excel? So when you go ''Real Time'' you are actually pulling from ASH.. then when you go ''Historical'' you are just pulling the Top Timed events across SNAP_IDs and graphing them.. but when you have issues like CPU time not being properly accounted you'll see a really different graph, and if you are not careful and don't know what it means you may end up with bad conclusions..
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TZ6fz5UzkVI/AAAAAAAABNM/9xL8IukSM4A/highcpu-10.png]]
<<<
__''AWR Top Events with CPU "properly" accounted''__
<<<
* Now, this is really interesting... the graph shown below is from the ''Performance page'' and is also ''Historical'' but produced a different graph from the ''Top Activity page''...
* Why and how did it account for the ''CPU Wait''? where did it pull the data that the ''Top Activity page'' missed?
* This is an improvement in Enterprise Manager! So I'm curious how this is happening...
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TZ6ogMsAp0I/AAAAAAAABNQ/b9dTIxATxoY/highcpu-11.png]]
<<<
__''ASH with CPU "properly" accounted (well.. I say, ALWAYS!)''__
From the graph above & below where the CPU is properly accounted, you see the AAS is consistent at the range of 320..
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TZ53uvK1xkI/AAAAAAAABMU/7HThzn4uoEo/highcpu-7.png]]
What makes ASH different is the proper accounting of the ''CPU'' AAS component, unlike the chart coming from awr_topevents.sql (mentioned in the AWR Top Events with CPU "not properly" accounted section above) where there's no CPU accounted at all... this could be a problem with the DBA_HIST_SYS_TIME_MODEL ''DB CPU'' metric: when the database server is high on runqueue and there are already scheduling issues in the OS, ''ASH is even more reliable'' at accounting for all the CPU time..
Another thing that bothers me is why the ''DB Time'', when applied to the AAS formula, gives a much higher AAS value than the ASH does. That could also mean that ''the DB Time is another reliable source'' when the database server is high on runqueue..
If this is the case, from a pure AWR perspective... what I would do is get the output of ''awr_genwl.sql''.. then run ''awr_topevents.sql''..
Then if I see that my AAS is high in awr_genwl.sql together with a really high "OS Load" and "CPU Utilization", and comparing it with the output of awr_topevents.sql shows a big discrepancy, that would give me an idea that I'm experiencing the same issue mentioned here, and I would investigate further with the ASH data to solidify my conclusions..
If you are curious about the output of the Time model statistics on SNAP_ID 495-496,
the CPU values found there do not help either because they are low..
{{{
DB CPU = 126.70 sec
BG CPU = 4.32 sec
OS CPU (osstat) = 335.71 sec
Statistic Name Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time 215,866.2 100.0
DB CPU 126.7 .1
parse time elapsed 62.8 .0
hard parse elapsed time 60.0 .0
PL/SQL execution elapsed time 33.9 .0
hard parse (sharing criteria) elapsed time 9.7 .0
sequence load elapsed time 0.6 .0
PL/SQL compilation elapsed time 0.2 .0
connection management call elapsed time 0.0 .0
repeated bind elapsed time 0.0 .0
hard parse (bind mismatch) elapsed time 0.0 .0
DB time 215,947.9
background elapsed time 1,035.5
background cpu time 4.3
-------------------------------------------------------------
}}}
''Now we move on by splitting the ASH AAS components into their separate areas..''
* the ''CPU''
* and ''USER IO''
see the charts below..
This just shows that there is something about ASH properly accounting the ''CPU + WAIT FOR CPU'' whenever the database server is high on runqueue or OS load average... as well as the ''DB Time''
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TZ53wDeLd4I/AAAAAAAABMc/G5lodk6IAqE/highcpu-8.png]]
this is the ''USER IO'' AAS.. same as what is accounted in awr_topevents.sql
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TZ53wKTIMVI/AAAAAAAABMg/dAihs-LYGfY/highcpu-9.png]]
So the big question for me is...
How do ASH and the Enterprise Manager Performance page account for the "CPU + WAIT FOR CPU"? Even if you drill down on V$ACTIVE_SESSION_HISTORY you will not find this metric. So I'm really interested in where they pull the data.. :)
''update''... and then I asked a couple of people, and I had a recent problem on a client site running on Exadata where I was troubleshooting their ETL runs. I was running 10046 for every run and found out that my unaccounted-for time is due to the CPU wait that is shown on this tiddler. So using Mr. Tools, and given that I'm having a similar workload.. I had an idea that the unaccounted-for time is the CPU wait. See the write up here http://www.evernote.com/shard/s48/sh/3ccc1e38-b5ef-46f8-bc75-371156ade4b3/69066fa2741f780f93b86af1626a1bcd , and I was right all along ;)
''AAS investigation updates: Answered questions + bits of interesting findings''
http://www.evernote.com/shard/s48/sh/b4ecaaf2-1ceb-43ea-b58e-6f16079a775c/cb2e28e651c3993b325e66cc858c3935
''I've updated the awr_topevents.sql script to show CPU wait to solve the unaccounted DB Time issue'' see the write up on the link below:
awr_topevents_v2.sql - http://www.evernote.com/shard/s48/sh/a64a656f-6511-4026-be97-467dccc82688/de5991c75289f16eee73c26c249a60bf
Thanks to the following people for reading/listening about this research, and for the interesting discussions and ideas around this topic:
- Kyle Hailey, Riyaj Shamsudeen, Dave Abercrombie, Cary Millsap, John Beresniewicz
''Here's the MindMap of the AAS investigation'' http://www.evernote.com/shard/s48/sh/90cdf56f-da52-4dc5-91d0-a9540905baa6/9eb34e881a120f82f2dab0f5424208bf

<<<
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 16777216 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
select total_size,awr_flush_emergency_count from v$ash_info;
<<<
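A minimal sketch of acting on that advice (note that _ASH_SIZE is an underscore parameter, so this is normally done only with Oracle Support's blessing, and only if it is dynamically modifiable in your version; 32 MB below is just an example value, double the 16 MB reported above):
{{{
-- current size and emergency flush count, per the alert text above
select total_size, awr_flush_emergency_count from v$ash_info;

-- example: grow the ASH buffer to 32 MB
alter system set "_ash_size" = 33554432;
}}}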

Visualizing Active Session History (ASH) Data With R http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
also talks about TIME_WAITED – micro, only the last sample is fixed up, the others will have TIME_WAITED=0
thanks to John Beresniewicz for this info. http://dboptimizer.com/2011/07/20/oracle-time-units-in-v-views/

DAVE ABERCROMBIE research on AAS and ASH
http://aberdave.blogspot.com/search?updated-max=2011-04-02T08:09:00-07:00&max-results=7

{{{
ASM considerations on SinglePath and MultiPath across versions (OCR,VD,DATA)
In general you gotta have a facility/mechanism for:
* multipathing -> persistent naming -> ASM
on 10gR2, 11gR1 for your OCR and VD you must use the following:
*
* clustered filesystem (OCFS2) or NFS
* raw devices (RHEL4) or udev (RHEL5)
on 11gR2, for your OCR and VD you must use the following:
*
* clustered filesystem or NFS
* ASM (mirrored at least 3 disks)
-----------------------
Single Path
-----------------------
If you have ASMlib you will go with this setup
*
* ASMlib -> ASM"
If you don't have ASMlib and Powerpath you will go with this setup
*
* 10gR2 and 11g
* raw devices
* udev -> ASM
* 11gR2
* udev -> ASM
-----------------------
Multi Path
-----------------------
If you have ASMlib and Powerpath you will go with this setup
*
* 10gR2, 11g, 11gR2
* "powerpath -> ASMlib -> ASM"
If you don't have ASMlib and Powerpath you will go with this setup
*
* 10gR2
* "dm multipath (dev mapper) -> raw devices -> ASM"
* 11g and 11gR2
* "dm multipath (dev mapper) -> ASM"
you can also be flexible and go with
*
* "dm multipath (dev mapper) -> ASMlib -> ASM"
-----------------------
Notes
-----------------------
kpartx confuses me..just do this..
- assign and share luns on all nodes.
- fdisk the luns and update partition table on all nodes
- configure multipath
- use </dev/mapper/<mpath_alias>
- create asm storage using above devices
https://forums.oracle.com/forums/thread.jspa?threadID=2288213
}}}
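After the devices are stacked (multipath -> ASMlib/udev -> ASM), a quick way to confirm which path and discovery library each ASM disk ended up on (a generic sketch; the interesting columns are PATH, LIBRARY, and HEADER_STATUS):
{{{
select group_number, name, label, library, path, header_status, mount_status
from v$asm_disk
order by group_number, name;
}}}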
http://www.evernote.com/shard/s48/sh/0012dbf5-6648-4792-84ff-825a363f68d3/a744de57fdb99349388e21cdd9c6059a

http://www.pythian.com/news/1078/oracle-11g-asm-diskgroup-compatibility/
http://www.freelists.org/post/oracle-l/Does-ocssdbin-started-from-11gASM-home-support-diskgroups-mounted-by-10g-ASM-instance,5
{{{
Hi Sanjeev,
I'd like to clear some info first.
1st)... the ocssd.bin
the CSS is created when:
- you use ASM as storage
- when you install Clusterware (RAC, but Clusterware has its separate
home already)
For Oracle Real Application Clusters installations, the CSS daemon
is installed with Oracle Clusterware in a separate Oracle home
directory (also called the Clusterware home directory). For
single-node installations, the CSS daemon is installed in and runs
from the same Oracle home as Oracle Database.
You could identify the Oracle home directory being used to run the CSS daemon:
# cat /etc/oracle/ocr.loc
The output from this command is similar to the following:
[oracle@dbrocaix01 bin]$ cat /etc/oracle/ocr.loc
ocrconfig_loc=/oracle/app/oracle/product/10.2.0/asm_1/cdata/localhost/local.ocr
local_only=TRUE
The ocrconfig_loc parameter specifies the location of the Oracle
Cluster Registry (OCR) used by the CSS daemon. The path up to the
cdata directory is the Oracle home directory where the CSS daemon is
running (/oracle/app/oracle/product/10.2.0/asm_1 in this example). To
confirm you could grep the css deamon and see that it's running on
that home
[oracle@dbrocaix01 bin]$ ps -ef | grep -i css
oracle 4950 1 0 04:23 ? 00:00:00
/oracle/app/oracle/product/10.2.0/asm_1/bin/ocssd.bin
oracle 5806 5609 0 04:26 pts/1 00:00:00 grep -i css
Note:
If the value of the local_only parameter is FALSE, Oracle Clusterware
is installed on this system.
2nd)... ASM and Database compatibility
I'll supply you with some references..
Note 337737.1 Oracle Clusterware - ASM - Database Version Compatibility
Note 363254.1 Applying one-off Oracle Clusterware patches in a mixed
version home environment
and Chapter 4, page 116-120 of Oracle ASM (under the hood & practical
deployment guide) 10g & 11g
In the book it says that there are two types of compatibility settings
between ASM and the RDBMS:
1) instance-level software compatibility settings
- the COMPATIBLE parameter (mine is 10.2.0), this defines what
software features are available to the instance. Setting the
COMPATIBLE parameter in the ASM instance
to 10.1 will not enable you to use 11g ASM new features (variable
extents, etc.)
2) diskgroup-specific settings
- COMPATIBLE.ASM and COMPATIBLE.RDBMS which are persistently stored
in the ASM diskgroup metadata..these compatibility settings are
specific to a diskgroup and control which
attributes are available to the ASM diskgroup and which are
available to the database.
- COMPATIBLE.RDBMS, which defaults to 10.1 in 11g, is the minimum
COMPATIBLE version setting of a database that can mount the
diskgroup.. once you advanced it, it cannot be reversed
- COMPATIBLE.ASM, which controls the persistent format of the on-disk
ASM metadata structures. The ASM compatibility defaults to 10.1 in 11g
and must always be greater than or equal to the RDBMS compatibility
level.. once you advanced it, it cannot be reversed
The combination of the compatibility parameter setting of the
database, the software version of the database, and the RDBMS
compatibility setting of a diskgroup determines whether a database
instance is permitted to mount a given diskgroup. The compatibility
setting also determines which ASM features are available for a
diskgroup.
An ASM instance can support different RDBMS clients with different
compatibility settings, as long as the database COMPATIBLE init.ora
parameter setting of each database instance is greater than or equal
to the RDBMS compatibility of all diskgroups.
You could also read more here...
http://download.oracle.com/docs/cd/B28359_01/server.111/b31107/asmdiskgrps.htm#CHDDIGBJ
So the following info will give us some background on your environment
cat /etc/oracle/ocr.loc
ps -ef | grep -i css
cat /etc/oratab
select name, group_number, value from v$asm_attribute order by 2;
select db_name, status,software_version,compatible_version from v$asm_client;
select name,compatibility, database_compatibility from v$asm_diskgroup;
I hope I did not confuse you with all of this info.
- Karl Arao
http://karlarao.wordpress.com
}}}

#
However, the Unbreakable Enterprise Kernel is optional,
and Oracle Linux continues to include a Red Hat compatible kernel, compiled directly from Red Hat
Enterprise Linux source code, for customers who require strict RHEL compatibility. Oracle also
recommends the Unbreakable Enterprise Kernel when running third party software and third party
hardware.
# Performance improvements
latencytop?
# ASMlib and virtualization modules in the kernel
Updated Kernel Modules
The Unbreakable Enterprise Kernel includes both OCFS2 1.6 as well as Oracle ASMLib, the kernel
driver for Oracle’s Automatic Storage Management feature. There is no need to install separate RPMs
to implement these kernel features. Also, the Unbreakable Enterprise Kernel can be run directly on
bare metal or as a virtual guest on Oracle VM, both in hardware virtualized (HVM) and paravirtualized (PV) mode, as it implements the paravirt_ops instruction set and includes the xen_netfront and
xen_blkfront drivers.
#
Unbreakable Enterprise Kernel itself already includes ocfs2 and oracleasm
Questions:
1) Since it will be a new kernel, what if I have a third party module like EMC Powerpath? I'm sure I'll have to reinstall it once I use the new
kernel. But, once reinstalled.. will it be certified with EMC (or vice versa)?
2) Also, Oracle says that if you have to maintain compatibility with a third party module, you can use the old vanilla kernel. The question is, since the
ASMlib module is already integrated in the Unbreakable Kernel, once I use the non-Unbreakable kernel do they still have the old style RPM
(oracleasm-`uname -r` - kernel driver) for providing the ASMlib module?
OR
if it's not supported at all and I'm
ASMLIB has three components.
1. oracleasm-support - user space shell scripts
2. oracleasmlib - user space library (closed source)
3. oracleasm-`uname -r` - kernel driver <-- kernel dependent
###############################################################################################3

-- from this thread http://www.freelists.org/post/oracle-l/ASM-and-EMC-PowerPath
!
! The Storage Report (ASM -> Linux -> EMC)
Below is a sample of the storage info that you should have; it clearly shows the relationship between the Oracle layer (ASM), Linux, and the SAN storage. This info is very useful for you and the storage engineer, so you know which is which in case of catastrophic problems..
Very useful for storage activities like:
* SAN Migration
* Add/Remove disk
* Powerpath upgrade
* Kernel upgrade
//(Note: The images below might be too big for your current screen resolution; for a better view just right-click and download the images, or ''double click'' on this page to see the full path of the images..)//
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TaFARe_nqfI/AAAAAAAABOI/jXAshWxpfw8/powerpath1.png]]
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TaFARSJDiLI/AAAAAAAABOE/SoDU7jrddUQ/powerpath2.png]]
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TaFAUat0HcI/AAAAAAAABOM/qe5qoeF3wTw/powerpath3.png]]
!
! What info do you need to produce the report?
''You need the following:''
* AWR time series output (my scripts http://karlarao.wordpress.com/scripts-resources)
* output of the command ''powermt display dev=all'' (run as root)
* RDA
* SAR (because I just love looking at the performance data)
* sysreport (run as root)
''You have to collect'' this on each server / instance and properly arrange them per folder so you won't have a hard time documenting the bits of info you need on the Excel sheet
[img[picturename| https://lh4.googleusercontent.com/_F2x5WXOJ6Q8/TaFK4F_SexI/AAAAAAAABPQ/VjSQm0_uUUM/powerdevices4.png]]
''Below is the drill down on each folder''. The data you'll see is from two separate RAC clusters, each with its own SAN storage. The project I'm working on here is to migrate/consolidate them into a single (newly purchased) SAN. So I need to collect all this data to help plan the activity and mitigate the risks/issues. The collection of performance data is also a must, to verify that the IO requirements of the databases can be handled by the new SAN. On this project I have verified that the capacity exceeds the current requirements.
* AWR
<<<
per server
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TaFDmcOnhBI/AAAAAAAABOk/lxo8_tbLqX4/powerdevices5-awr.png]]
> per instance
> [img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TaFKbFENV1I/AAAAAAAABPE/nUCFo_HOjHY/powerdevices5-awr2.png]]
>> awr output on each instance
>> [img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TaFKbZB6nnI/AAAAAAAABPI/8MVhDN5Q_rI/powerdevices5-awr3.png]]
<<<
* powermt display dev=all
<<<
[img[picturename| https://lh5.googleusercontent.com/_F2x5WXOJ6Q8/TaFDmG2XayI/AAAAAAAABOg/0Lo8QoDbm_A/powerdevices6-powermt.png]]
<<<
* RDA
<<<
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TaFDmsNLT2I/AAAAAAAABOs/sHa-KUryYFo/powerdevices7-rda.png]]
<<<
* SAR
<<<
[img[picturename| https://lh6.googleusercontent.com/_F2x5WXOJ6Q8/TaFDmcOnhBI/AAAAAAAABOk/lxo8_tbLqX4/powerdevices5-awr.png]]
> sample output
> [img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TaFDmyLyKAI/AAAAAAAABOw/f5UgyqVu09I/powerdevices8-sar.png]]
<<<
* sysreport
<<<
[img[picturename| https://lh4.googleusercontent.com/_F2x5WXOJ6Q8/TaFDm08TuYI/AAAAAAAABO0/bZzSwZX6Vqc/powerdevices8-sysreport.png]]
> sample output
> [img[picturename| https://lh4.googleusercontent.com/_F2x5WXOJ6Q8/TaFDnQpFRHI/AAAAAAAABO4/SiDKdZE9kOY/powerdevices8-sysreport2.png]]
<<<
!
! Putting it all together
On the Excel sheet, you have to fill in the following sections
* From RDA
** ASM Library Information
** ASM Library Disk Information
** Disk Partitions
** Operating System Setup->Operating System Packages
** Operating System Setup->Disk Drives->Disk Mounts
** Oracle Cluster Registry (Cluster -> Cluster Information -> ocrcheck)
* From ''powermt'' command
** Logical Device IDs and names
* From sysreport
** raw devices (possible for OCR and Voting Disk)
** fstab (check for OCFS2 mounts)
* Double check from OS commands
** Voting Disk (''crsctl query css votedisk'')
** ls -l /dev/
** /etc/init.d/oracleasm querydisk <device_name>
''Below is the output from the various sources...'' this will show you how to map an ''ASM disk'' to a particular ''EMC power device'' (follow the ''RED ARROWS'').. you have to do it for all the ASM disks, and the method is the same for accounting for the ''raw devices'', ''OCFS2'', and ''OCR'' mappings to their respective EMC power devices..
To do the correlated report of the ASM, Linux, and SAN storage.. follow the ''BLUE ARROWS''..
You will also see below that with this proper accounting and correlation across the ASM, Linux, and EMC storage levels you will never go wrong, and you have definitive information that you can share with the EMC Storage Engineer, which they can also ''double check''.. that way both the ''DBAs and the Storage guys will be on the same page''.
[img[picturename| https://lh3.googleusercontent.com/_F2x5WXOJ6Q8/TaFvF-b2-tI/AAAAAAAABPk/cdnpkq6yLeg/emcreport10.png]]
Notice above that ''emcpowerr'' and ''emcpowers'' have no allocations. So what does that mean? Can we allocate these devices now? ... mm ''no!'' ... ''stop''... ''move back''... ''think''...
I will do the following:
* Run this query to check if it's recognized as ''FOREIGN'' or ''CANDIDATE''
{{{
set lines 400
col name format a20
col label format a20
col path format a20
col redundancy format a20
select a.group_number, a.name, a.header_status, a.mount_status, a.state, a.total_mb, a.free_mb, a.label, path, a.redundancy
from v$asm_disk a
order by 1,2;
GROUP_NUMBER NAME HEADER_STATU STATE TOTAL_MB FREE_MB LABEL PATH REDUNDANCY
------------ -------------------- ------------ -------- ---------- ---------- -------------------- -------------------- --------------------
}}}
* I've taken some precautions during my data gathering by checking the ''fstab'' and the ''raw devices config'', and found that ''there are no pointers to the two devices''..
** I have obsessive-compulsive tendencies about making sure these devices are not used by some other service. If these EMC power devices were accidentally used for something else, let's say as a filesystem.. Oracle will still allow you to do the ADD/DROP operation on them, wiping out all the data on those devices!
* Another thing I would do is validate with my storage engineer or the in-house DBA whether these disks were provisioned for the purpose of expanding the disk group.
If everything is okay, I can safely say they are candidate disks for expanding the space of my current disk group and go ahead with the activity.
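When it does come to that, a minimal sketch of the ADD DISK step plus the post-add checks (the ''DATA'' disk group name and the /dev/emcpower* paths are placeholders for illustration, not taken from this environment):
{{{
-- as SYSASM on the ASM instance; disk group and device names are placeholders
alter diskgroup DATA add disk
  '/dev/emcpowerr1',
  '/dev/emcpowers1'
  rebalance power 4;

-- watch the rebalance; no rows returned means it is done
select group_number, operation, state, power, sofar, est_work, est_minutes
  from v$asm_operation;

-- the new disks should now show up as MEMBER / CACHED
select group_number, name, header_status, mount_status, path, total_mb, free_mb
  from v$asm_disk
 order by group_number, name;
}}}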
! From Matt Zito (former EMC solutions architect)
<<<
Hey guys,
I haven't gotten this email address straightened out on Oracle-L yet, but I figured I'd drop you a note, and you could forward it on to the list if you cared to.
The doc you read is correct, powerpath will cheerfully work with any of the devices you send IOs to, because the kernel driver intercepts requests for all devices and routes them through itself before dishing them down the appropriate path.
However, setting scandisks to the emcpower has the administrative benefits of making sure the disks don't show up twice. However, even if ASM picks the first of the two disks, it will still be load-balanced successfully.
Thanks,
Matt Zito
(former EMC solutions architect)
<<<

''10mins AWR snap interval, 144 samples in a day, 1008 samples in 7days, 4032 samples in 4weeks, 52560 samples in 1year''
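To get that 10-minute interval (and enough retention to keep the samples), the snapshot settings can be changed with DBMS_WORKLOAD_REPOSITORY; the 30-day retention below is just an example value:
{{{
begin
  dbms_workload_repository.modify_snapshot_settings(
    interval  => 10,       -- minutes between snapshots
    retention => 43200);   -- minutes to keep, 43200 = 30 days
end;
/
-- verify
select snap_interval, retention from dba_hist_wr_control;
}}}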
''Good chapter on HOW to read AWR reports'' http://filezone.orapub.com/FF_Book/v4Chap9.pdf
{{{
Understand each field of AWR (Doc ID 884046.1)
AWR report is broken into multiple parts.
1)Instance information:-
This provides the instance name, number, snapshot IDs, the total time the report covers, and the database time during that elapsed time.
Elapsed time = end snapshot time - start snapshot time
Database time = work done by the database during that elapsed time (CPU and I/O both add to database time). If this is much less than the elapsed time, the database is mostly idle. Database time does not include time spent by the background processes.
2)Cache Sizes : This shows the size of each SGA region after AMM has changed them. This information
can be compared to the original init.ora parameters at the end of the AWR report.
3)Load Profile: This important section shows key rates expressed per second and per transaction. It is very important for understanding how the instance is behaving, and it has to be compared to a baseline report to understand the expected load on the machine and the delta during bad times.
4)Instance Efficiency Percentages (Target 100%): This section shows how close to 100% the vital ratios are, such as buffer cache hit, library cache hit, parses, etc. These can be taken as indicators, but should not be a cause of worry if they are low, since the ratios could be low or high depending on database activity and not because of a real performance problem. They are not stand-alone statistics and should be read only for a high-level view.
5)Shared Pool Statistics: This summarizes changes to the shared pool during the snapshot
period.
6)Top 5 Timed Events: This is the section most relevant for analysis. It shows what % of database time each wait event accounted for. Up to 9i, this was the way to back-calculate the total database time for the report, as there was no database time column in 9i.
7)RAC Statistics: This part is seen only in the case of a cluster instance. It provides important indications of the average time taken for block transfers, block receives, and messages, which can point to performance problems in the cluster rather than in the database.
8)Wait Class: This depicts which wait class was the area of contention and where we need to focus: was it network, concurrency, cluster, I/O, application, configuration, etc.
9)Wait Events Statistics Section: This section shows a breakdown of the main wait events in the
database including foreground and background database wait events as well as time model, operating
system, service, and wait classes statistics.
10)Wait Events: This AWR report section provides more detailed wait event information for foreground
user processes which includes Top 5 wait events and many other wait events that occurred during
the snapshot interval.
11)Background Wait Events: This section is relevant to the background process wait events.
12)Time Model Statistics: Time mode statistics report how database-processing time is spent. This
section contains detailed timing information on particular components participating in database
processing.This gives information about background process timing also which is not included in database time.
13)Operating System Statistics: This section is important from an OS/server contention point of view. It shows the main external resources including I/O, CPU, memory, and network usage.
14)Service Statistics: The service statistics section gives information about services and their load in terms of CPU seconds, I/O seconds, number of buffer reads, etc.
15)SQL Section: This section displays top SQL, ordered by important SQL execution metrics.
a)SQL Ordered by Elapsed Time: Includes SQL statements that took significant execution
time during processing.
b)SQL Ordered by CPU Time: Includes SQL statements that consumed significant CPU time
during its processing.
c)SQL Ordered by Gets: These SQLs performed a high number of logical reads while
retrieving data.
d)SQL Ordered by Reads: These SQLs performed a high number of physical disk reads while
retrieving data.
e)SQL Ordered by Parse Calls: These SQLs experienced a high number of reparsing operations.
f)SQL Ordered by Sharable Memory: Includes SQL statements cursors which consumed a large
amount of SGA shared pool memory.
g)SQL Ordered by Version Count: These SQLs have a large number of versions in shared pool
for some reason.
16)Instance Activity Stats: This section contains statistical information describing how the database
operated during the snapshot period.
17)I/O Section: This section shows the all-important I/O activity. It provides the time it took to do one I/O, e.g. Av Rd(ms), and the I/Os per second, e.g. Av Rd/s. This should be compared to the baseline to see if the rate of I/O has always been like this or whether there is a deviation now.
18)Advisory Section: This section show details of the advisories for the buffer, shared pool, PGA and
Java pool.
19)Buffer Wait Statistics: This important section shows buffer cache waits statistics.
20)Enqueue Activity: This important section shows how enqueue operates in the database. Enqueues are
special internal structures which provide concurrent access to various database resources.
21)Undo Segment Summary: This section gives a summary about how undo segments are used by the database.
Undo Segment Stats: This section shows detailed history information about undo segment activity.
22)Latch Activity: This section shows details about latch statistics. Latches are a lightweight
serialization mechanism that is used to single-thread access to internal Oracle structures. A latch should be judged by its sleeps: the sleepiest latch is the one under contention, not the one with the most requests. So run through the sleep breakdown part of this section to arrive at the latch under the highest contention.
23)Segment Section: This portion is important for making a guess about which segment and which segment type the contention could be in. Tally this with the top 5 wait events.
Segments by Logical Reads: Includes top segments which experienced high number of
logical reads.
Segments by Physical Reads: Includes top segments which experienced high number of disk
physical reads.
Segments by Buffer Busy Waits: These segments have the largest number of buffer waits
caused by their data blocks.
Segments by Row Lock Waits: Includes segments that had a large number of row locks on
their data.
Segments by ITL Waits: Includes segments that had a large contention for Interested
Transaction List (ITL). The contention for ITL can be reduced by increasing INITRANS storage
parameter of the table.
24)Dictionary Cache Stats: This section exposes details about how the data dictionary cache is
operating.
25)Library Cache Activity: Includes library cache statistics, which are needed in case you see library cache waits in the top 5 timed events. You might want to see if reloads/invalidations are causing the contention or if there is some other issue with the library cache.
26)SGA Memory Summary: This tells us the difference in the respective pools at the start and end of the report. It can be an indicator for setting a minimum value for each pool when sga_target is being used.
27)init.ora Parameters: This section shows the original init.ora parameters for the instance during
the snapshot period.
There would be more Sections in case of RAC setups to provide details.
}}}
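Once you know how to read it, you can also pull the text report straight from SQL when running $ORACLE_HOME/rdbms/admin/awrrpt.sql interactively isn't convenient; the dbid, instance number, and snap ids below are placeholders:
{{{
-- pick a begin/end snapshot pair
select snap_id, begin_interval_time, end_interval_time
  from dba_hist_snapshot
 order by snap_id;

-- text AWR report between the two snap_ids
select output
  from table(dbms_workload_repository.awr_report_text(
               l_dbid     => 1111111111,   -- dbid from v$database (placeholder)
               l_inst_num => 1,            -- instance_number from v$instance
               l_bid      => 1234,         -- begin snap_id (placeholder)
               l_eid      => 1235));       -- end snap_id (placeholder)
}}}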
''A SQL Performance History from AWR''
http://www.toadworld.com/BLOGS/tabid/67/EntryId/125/A-SQL-Performance-History-from-AWR.aspx <-- This could also be possible to graph using my awr_topsqlx.sql
''miTrend AWR Report / StatsPack Gathering Procedures Instructions'' https://community.emc.com/docs/DOC-13949 <-- EMCs tool with nice PPT and paper, also talks about "burst" periods for IO sizing, raid adjusted IOPS, EFDs IOPS
http://pavandba.files.wordpress.com/2009/11/owp_awr_historical_analysis.pdf

http://karlarao.wordpress.com/scripts-resources/
<<<
AWR Tableau Toolkit – create your own performance data warehouse and easily characterize the workload, CPU, IO of the entire cluster (30 instances) with months of perf data in less than 1 hour updated 20120912
I no longer update this toolkit. This served as a version 1 of a more comprehensive tool called eAdam which I started with Carlos Sierra (the main developer), Frits Hoogland, and Randy Johnson at Enkitec.
<<<
see [[eAdam]]

Master Note on AWR Warehouse (Doc ID 1907335.1)
Oracle® Database 2 Day + Performance Tuning Guide 12c Release 1 (12.1)
http://docs.oracle.com/database/121/TDPPT/tdppt_awr_warehouse.htm#TDPPT145
Analyze Long-term Performance Across Enterprise Databases Using AWR Warehouse
https://apex.oracle.com/pls/apex/f?p=44785:24:0::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:9887,29
blogs about it
http://www.dbi-services.com/index.php/blog/entry/oracle-oem-cloud-control-12104-awr-warehouse
http://dbakevlar.com/2014/07/awr-warehouse-status/
! the workflow
''source''
when awr warehouse is deployed on a target.. it deploys jobs, and this is an agent push only
source exports the awr dump as a job on the local filesystem, this doesn't have to be the same directory path as the warehouse
if you have a 12-month retention, it will not do it all at once; it exports in 2GB chunks every 3 hours
it first exports the oldest snap_id
''oms''
oms does an agent-to-agent talk when the source is transferring the dump file and then puts it on the "warehouse stage directory"
oms agent checks the directory every 5 mins and if there's a new file that file gets imported to the warehouse
''warehouse''
warehouse has this ETL job and databases are partitioned per DBID
there's a mapping table to map the database to DBID
ideally, if you want to put other data in the warehouse, put it in a separate schema (and use views as well), because a warehouse upgrade or patch may wipe it out
as long as you have a diag pack license on the source, you are good with the warehouse license, this is pretty much like the licensing scheme of OMS
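Since everything in the warehouse is keyed (and partitioned) by DBID, the standard DBA_HIST views still apply there; a minimal sketch to see which source databases have landed and their snapshot coverage (nothing warehouse-specific is assumed, just the usual AWR views):
{{{
select t.dbid, di.db_name, t.first_snap, t.last_snap, t.snaps
  from (select dbid,
               min(begin_interval_time) first_snap,
               max(end_interval_time)   last_snap,
               count(*)                 snaps
          from dba_hist_snapshot
         group by dbid) t
       left join (select distinct dbid, db_name
                    from dba_hist_database_instance) di
         on di.dbid = t.dbid
 order by di.db_name;
}}}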
<<<
Karl Arao
kellyn, question on awr warehouse, so it is recommended to be on a separate database. and on that separate database it requires a separate diag pack license?
so let's say you have a diag and tuning pack on the source... and then you've got to have diag pack on the target awr warehouse?
thanks!
Kellyn Pot'vin
11:09am
Kellyn Pot'vin
No, your diag pack from the source db grants the limited EE license for the awrw
Does that make sense?
Karl Arao
11:09am
Karl Arao
i see
essentially it's free
because source would have diag and tuning pack anyway
Kellyn Pot'vin
11:10am
Kellyn Pot'vin
It took them while to come up with that, but same for em12c omr
Now, if you add anything, rac or data guard, then you have to license that
Karl Arao
11:11am
Karl Arao
yeap which is also the case on omr
Kellyn Pot'vin
11:12am
Kellyn Pot'vin
Exactly
Karl Arao
11:12am
Karl Arao
now, can we add anymore tables on the awr warehouse
let's say i want that to be my longer retention data for my metric extension as well
Kellyn Pot'vin
11:13am
Kellyn Pot'vin
Yes, but no additional partitioning and it may impact patches/upgrades
Karl Arao
11:13am
Karl Arao
sure
Kellyn Pot'vin
11:14am
Kellyn Pot'vin
I wouldn't do triggers or partitions
Views are cool
Karl Arao
11:16am
Karl Arao
we are going to evaluate this soon, just upgraded to r4
Kellyn Pot'vin
11:16am
Kellyn Pot'vin
Otn will post a new article of advance awr usage next week from me
Karl Arao
11:16am
Karl Arao
another question,
on my source i have 12months of data
Kellyn Pot'vin
11:17am
Kellyn Pot'vin
And now the exadata team is asking how to incorporate it as part of healthcheck design
Karl Arao
11:17am
Karl Arao
will it ETL that to the warehouse
like 1 shot
Kellyn Pot'vin
11:17am
Kellyn Pot'vin
That is my focus in dev right now
Karl Arao
11:17am
Karl Arao
that's going to be 160GB of data
and with exp warehouse
there's going to be an impact for sure
Kellyn Pot'vin
11:18am
Kellyn Pot'vin
No, it has throttle and will take 2gb file loads in 3hr intervals, oldest snapshots first
Karl Arao
11:18am
Karl Arao
I'm just curious on the etl
what do you mean 3hours intervals ?
2GB to finish in 3hours
Kellyn Pot'vin
11:19am
Kellyn Pot'vin
Then go back to 24 hr interval auto after any catchup, same on downtime catchup
<<<

Mmm.. It's a long story, just check out this blog post.. http://karlarao.wordpress.com/2010/04/10/my-personal-wiki-karlarao-tiddlyspot-com/ :)
Also check out my Google profile here https://plus.google.com/102472804060828276067/about to know more about my web/social media presence

http://gjilevski.wordpress.com/2010/03/14/creating-oracle-11g-active-standby-database-from-physical-standby-database/
''Oracle Active Data Guard: What’s Really Under the Hood?'' http://www.oracle.com/technetwork/database/features/availability/s316924-1-175932.pdf
''Read only and vice versa''
http://www.adp-gmbh.ch/ora/data_guard/standby_read_only.html
http://juliandyke.wordpress.com/2010/10/14/oracle-11gr2-active-data-guard/
http://www.oracle-base.com/articles/11g/data-guard-setup-11gr2.php#read_only_active_data_guard
! to be in Active DG (for normal managed recovery, remove the "read only" step)
{{{
startup mount
alter database open read only;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE disconnect;
}}}
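A quick sanity check that you really ended up in Active Data Guard and that redo apply is running (the expected values are what 11gR2 reports):
{{{
-- should show PHYSICAL STANDBY / READ ONLY WITH APPLY
select database_role, open_mode from v$database;

-- the MRP process should be applying redo
select process, status, thread#, sequence#
  from v$managed_standby
 where process like 'MRP%';
}}}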
!''Snapper on Standby Database''
<<<
On the standby site, if the database is open read only with apply you should be able to run snapper on it or do ash queries as well
Check out some commands here http://karlarao.tiddlyspot.com/#snapper
And if you want to loop it and leave it running and check the data the next day you can do this http://karlarao.tiddlyspot.com/#snapperloop (sections “snapper loop showing activity across all instances (must use snapper v4)” and “process the snap.txt file as csv input”)
Some commands you can use and things to check is attached as well. But I would start with
@snapper ash 5 1 all@*
Just to see what’s going on during the slow period
<<<
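If snapper isn't handy, a rough equivalent you can run directly on the read-only standby is an in-memory ASH query; this sketch looks at the last 30 minutes of samples:
{{{
-- top events on the standby from in-memory ASH, last 30 minutes
select nvl(event, 'ON CPU') event, session_state, count(*) samples
  from v$active_session_history
 where sample_time > sysdate - 30/1440
 group by nvl(event, 'ON CPU'), session_state
 order by samples desc;
}}}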
! triggers on ADG
Using Active Data Guard Reporting with Oracle E-Business Suite Release 12.1 and Oracle Database 11g (Doc ID 1070491.1)
<<<
Section 7: Database Triggers
ADG support delivers three schema level database triggers as follows:
Logon and Logoff
These triggers are a key component of the simulation testing. The logon trigger enables the read-only violation trace, whereas the logoff trigger records the actual number of violations. If these triggers are not enabled, the trace errors and V$ data are not recorded, in other words, the simulations are treated as having no errors.
Servererror
The error trigger is only executed if an ORA-16000 is raised, which is read-only violation (the trigger does nothing on the primary). The error count for the concurrent program is incremented only if standby_error_checking has been enabled as described in 4.2 General Options. If the error trigger is not enabled, report failures will not be recorded and failures will not lead to run_on_standby being disabled.
<<<
http://www.toadworld.com/platforms/oracle/b/weblog/archive/2014/04/27/oracle-apps-r12-offloading-reporting-workload-with-active-data-guard.aspx
! resource management on ADG
Configuring Resource Manager for Oracle Active Data Guard (Doc ID 1930540.1)
<<<
Configuring a resource plan on a physical standby database requires the plan to be created on primary database.
I/O Resource Manager helps multiple databases and workloads within the databases share the I/O resources on the Exadata storage. In a data guard environment, IORM can help protect the I/O latency for the redo apply I/Os from the standby database.
Critical I/Os from standby database backgrounds such as Managed Recovery Process (MRP) or Logical Standby Process (LSP) are automatically prioritized by enabling IORM on the Exadata storage. Database resource plans enabled on the standby databases are automatically pushed to the Exadata storage. Enabling IORM enforces database resource plans on the storage cells to minimize the latency for the critical redo-apply I/Os. To enable IORM, set the IORM objective to 'auto' on the Exadata storage cells.
Bug 12601274: Updates to consumer group mappings on the primary database are not reflected on the standby database. This bug is fixed in 11.2.0.4 and 12.1.0.2. On older releases, the updates are only reflected on the standby upon a restart of the standby database.
<<<
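Since the plan has to be defined on the primary, here's a minimal sketch of such a plan built with DBMS_RESOURCE_MANAGER (the plan and consumer group names are made up for illustration; mapping your reporting sessions/services to the group is a separate step):
{{{
begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_plan(
    plan    => 'ADG_REPORTING_PLAN',                 -- placeholder name
    comment => 'limit reporting load on the standby');
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'ADG_REPORTS',                 -- placeholder name
    comment        => 'read-only reporting sessions');
  dbms_resource_manager.create_plan_directive(
    plan             => 'ADG_REPORTING_PLAN',
    group_or_subplan => 'ADG_REPORTS',
    comment          => 'reporting only gets CPU at level 2',
    mgmt_p2          => 50);
  dbms_resource_manager.create_plan_directive(
    plan             => 'ADG_REPORTING_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'everything else',
    mgmt_p1          => 100);
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
end;
/
-- then set it as the active plan
alter system set resource_manager_plan = 'ADG_REPORTING_PLAN';
}}}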

http://www.oracle-base.com/articles/11g/AwrBaselineEnhancements_11gR1.php <-- a good HOWTO
http://neerajbhatia.files.wordpress.com/2010/10/adaptive-thresholds.pdf
http://oracledoug.com/serendipity/index.php?/archives/1496-Adaptive-Thresholds-in-10g-Part-1-Metric-Baselines.html
http://oracledoug.com/serendipity/index.php?/archives/1497-Adaptive-Thresholds-in-10g-Part-2-Time-Grouping.html
http://oracledoug.com/serendipity/index.php?/archives/1498-Adaptive-Thresholds-in-10g-Part-3-Setting-Thresholds.html
http://oracledoug.com/metric_baselines_10g.pdf <-- ''GOOD STUFF''
http://oracledoug.com/adaptive_thresholds_faq.pdf <-- ''GOOD STUFF''
http://www.cmg.org/conference/cmg2007/awards/7122.pdf
http://optimaldba.com/papers/IEDBMgmt.pdf
http://www.oracle-base.com/articles/11g/AwrBaselineEnhancements_11gR1.php
Strategies for Monitoring Large Data Centers with Oracle Enterprise Manager http://gavinsoorma.com/wp-content/uploads/2011/03/monitoring_large_data_centers_with_OEM.pdf
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1525205200346930663
<<<
{{{
You Asked
In version 11g of Oracle database, there is a new feature whereby current performance data (obtained from AWR snapshots) can be compared against an AWR baseline and an alarm triggered if a given metric exceeds a certain threshold. From what I understand, there are 3 types of thresholds : fixed value, percent of maximum and significance level. The first type (fixed value) is very easy to understand - alarms are triggered whenever the metric in question exceeds certain fixed values specified for the warning and critical alerts (without reference to the baseline). The 2nd type (percent of maximum) presumably means that an alert is triggered whenever the current value of the metric exceeds the specified percent of the maximum value of the metric that was observed in the whole baseline period (if I understood this correctly - correct me if I'm wrong).
However, the 3rd type (significance level) is not at all easy to understand. The Oracle documentation is not at all clear on that point, nor could I find any Metalink notes on the subject. I also tried searching the OTN forums, to no avail. Could you please explain, in very simple terms, when exactly an alarm would be triggered if "significance level" is specified for the threshold type, if possible by giving a simple example. There are apparently 4 levels of such thresholds (high, very high, severe and extreme).
and we said...
I asked Graham Wood and John Beresniewicz for their input on this, they are the experts in this particular area
they said:
Graham Wood wrote:
> Sure,
> Copying JB as this is his specialty area, in case I don't get it right. :-)
>
> The basic idea of using significance level thresholds for alerting is that we are trying to detect outliers in the distribution of metric values, rather than setting a simple threshold value.
>
> By looking at the historical metric data from AWR we can identify values for 25th, 50th (median), 75th, 90th, 95th and 99th percentiles. Using a curve fitting algorithm we also extrapolate the 99.9th and 99.99th percentiles. We derive these percentiles based on time grouping, such as day, night, and hour of day.
>
> In the adaptive baselines feature in 11g we allow the user to specify the alert level, which equates to one of these percentile values:
> High 95th percentile
> Very High 99th percentile
> Severe 99.9th percentile
> Extreme 99.99th percentile
>
> Using the AWR history (actually the SYSTEM_MOVING_WINDOW baseline) the database will automatically determine the threshold level for a metric that corresponds to the selected significance level for the current time period.
>
> Setting a significance level of Extreme means that we would only alert on values that we would only expect to see once in a 10000 observations (approximately once in every years for hourly thresholds).
>
> Cheers, Graham
JB wrote:
Shorter answer:
---------------
The significance level thresholds are intended to produce alert threshold values for key performance metrics that represent the following:
"Automatically set threshold such that values observed above the threshold are statistically unusual (i.e. significant) at the Nth percentile based on actual data observed for this metric over the SYSTEM_MOVING_WINDOW baseline."
The premise here is that systems with relatively stable performance characteristics should show statistical stability in core performance metric values, and when unusual but high-impact performance events occur we expect these will be reflected in highly unusual observations in one or more (normally statistically stable) metrics. The significance level thresholds give users a way to specify alerting in terms of "how unusual" rather than "how much".
Longer (original) reply:
-----------------------------
Hi Tom -
Graham did a pretty good job, but I'll add some stuff.
Fixed thresholds are set explicitly by user, and change only when user unsets or sets a different threshold. They are based entirely on user understanding of the underlying metrics in relation to the underlying application and workload. This is the commonly understood paradigm for detecting performance issues: trigger an alert when metric threshold is crossed. There are numerous issues we perceived with this basic mechanism:
1) "Performance" expectations, and thus alert thresholds, often vary by application, workload, database size, etc. This results in what I call the MxN problem, which is that M metrics over N systems becomes MxN threshold decisions each of which can be very specific (i.e. threshold decisions not transferable.) This is potentially very manually intensive for users with many databases.
2) Workload may vary predictably on system (e.g. online day vs. batch night) and different performance expectations (and thus alert thresholds) may pertain to different workloads, so one threshold for all workloads is inappropriate.
3) Systems evolve over time and thresholds applicable for the system supporting 1,000 users may need to be altered when system supports 10,000 users.
The adaptive thresholds feature tries to address these issues as follows:
A) Thresholds are computed by the system based on a context of prior observations of this metric on this system. System-and-metric-specific thresholds are developed without obliging user to understand the specifics (helps relieve the MxN problem.)
B) Thresholds are periodically recomputed using statistical characterizations of metric values over the SYSTEM_MOVING_WINDOW baseline. Thus the thresholds adapt to slowly evolving workload or demand, as the moving window moves forward.
C) Metric statistics for adaptive thresholds are computed over grouping buckets (which we call "time groups") that can accommodate the common workload periodicities (day/night, weekday/weekend, etc.) Thresholds resets can happen as frequently as every hour.
So the net-net is that metric alert thresholds are determined and set automatically by the system using actual metric observations as their basis and using metric-and-system-independent semantics (significance level or pct of max.)
JB
From Tom - Thanks both!
}}}
<<<
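The SYSTEM_MOVING_WINDOW referred to above is just the default moving-window baseline (Extreme = 99.99th percentile means roughly 1 in 10,000 hourly samples, i.e. a bit over a year of hourly observations). Its window size, and any static baselines you want for comparison, are managed through DBMS_WORKLOAD_REPOSITORY; note that the AWR retention must be at least as long as the window you ask for, and the snap ids below are placeholders:
{{{
begin
  -- widen the moving window the adaptive thresholds are computed over (days)
  dbms_workload_repository.modify_baseline_window_size(window_size => 30);

  -- pin a known-good period as a static baseline for comparison
  dbms_workload_repository.create_baseline(
    start_snap_id => 1234,              -- placeholder snap ids
    end_snap_id   => 1333,
    baseline_name => 'GOOD_WEEK_BASELINE');
end;
/
select baseline_name, baseline_type, moving_window_size
  from dba_hist_baseline;
}}}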
''Oracle By Example:''
http://docs.oracle.com/cd/E11882_01/server.112/e16638/autostat.htm#CHDHBGJD
metric baseline http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/10g/r2/metric_baselines.viewlet/metric_baselines_viewlet_swf.html
create baseline http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/11g/r2/11gr2_baseline/11gr2_baseline_viewlet_swf.html
OEM system monitoring http://www.oracle.com/webfolder/technetwork/tutorials/obe/em/emgc10gr2/quick_start/system_monitoring/system_monitoring.htm
**Creating the Monitoring Template
**Creating the User-Defined Metrics
**Setting the Metric Baseline
SQL baseline http://www.oracle.com/webfolder/technetwork/tutorials/demos/db/10g/r2/sql_baseline.viewlet/sql_baseline_viewlet_swf.html
''Proactive Database Monitoring'' http://docs.oracle.com/cd/B28359_01/server.111/b28301/montune001.htm
''15 User-Defined Metrics'' http://docs.oracle.com/cd/B16240_01/doc/em.102/e10954/udm2.htm
''3 Cluster Database'' http://docs.oracle.com/cd/B19306_01/em.102/b25986/rac_database.htm
http://oracledoug.com/serendipity/index.php?/archives/1302-Oracle-Workload-Metrics.html
http://oracledoug.com/serendipity/index.php?/archives/1470-Time-Matters-Throughput-vs.-Response-Time.html
http://docs.oracle.com/cd/B14099_19/manage.1012/b16241/Monitoring.htm#sthref333
http://carymillsap.blogspot.com/2008/12/performance-as-service-part-2.html
-- notes and ideas about R2 and adaptive thresholds
[img[ https://lh5.googleusercontent.com/-JNRShrEzpiQ/T4W3xfgGzII/AAAAAAAABiY/VBPKaiA-zus/s800/AdaptiveThresholds.JPG ]]
http://en.wikipedia.org/wiki/Control_chart
http://en.wikipedia.org/wiki/Exponential_smoothing
http://www.sciencedirect.com/science/article/pii/S0169207003001134
http://prodlife.wordpress.com/2013/10/14/control-charts/