I think I’ve posted before about how deep a good DBA should dig into solving issues, as opposed to fixing them as soon as possible and moving on to the next urgent task.

Well, a friend of mine, Neil Chandler, has just posted on this topic, giving his reasons why you don’t run a 10046 trace on production. Neil raises some good points about how difficult it can be to get permission to do something as intrusive as a 10046 trace on a production system as well as the fact that most problems can be solved way before you get down to the level of tracing. Especially if it is not your job to go around solving the problems that have stumped the in-house team, which is the lot of many people who are recognised as being very good with Oracle.

That leads me onto a slightly different topic. For every one of those technicians who’s names you know there are at least a dozen who are just as good but not as visible. Some of us choose to make more “noise” {blogging, presenting, writing articles} so we become visible. Others also support the user community but in a less noticeable way, some even positively choosing to keep a lower profile. Neil is one of those. He’s a very good Oracle and SQL*Server DBA and also very knowledgeable about Unix OS’s and SANs/storage {though he would maybe argue he is not – don’t believe him}. When he comes along to the pub his is an opinion worth listening to {once he’s finished demeaning me that is, and I’m sure he will give me a hard time about this article about him next time we meet} and when I’ve asked his opinion he has never failed to help. Neil also supports the UKOUG, he’s been deputy chair of one of the SIGs since it’s inception – but he refuses to be chair, has kept away from presenting and he keeps in the background.

IOT 6 part A covered the sort of insert work used to create an IOT or push a large number of rows into it via an “insert into…select from” type format. ie Bulk Load.
In this, part B, I am going to test inserting a small number of rows (10,000) into a table already holding a million rows. Something more like an hourly load or the inserts that would occur over a working day – a more OLTP-like test

I’m going to do the tests against four things. A “non-insert” test to ensure my data creation is not taking a lot of resource {the Null test}, then against my three versions of test table – a normal heap table, my IOT and my Partitioned IOT. I run each test on each segment type three times, the minimum number needed to help identify variation that could invalidate the test.

You will probably be surprised by the results….

All three tables have the same column structure, the heap has indexes on the primary key and a unique constraint, the IOT and partitioned IOT are organized on a modified version of the Primary key and have the second index for the unique key. Below is the creation statement for the partitioned IOT (the IOT is the same without the partition clause). You can get full creation details from IOT part 2:

Insert 10,000 records for random accounts, in time order for the same day. This would be like processing data for or during a “day”

Generating the 10,000 records in order of account, to better match the IOT structure, again for a single day. For any given account 0,1,2 or 3 records are created, to be a little more “real life”. This would be like loading a pre-processed set of data.

All 10,000 records are created for the one account, in datetime order, as an extreme example of inserting into an IOT data that is ordered to perfectly match the order of the IOT.

The code is written ‘badly’, ie a single record is prepared and inserted into the table at a time, within a loop of 10000 iterations. This is to avoid the bulk-processing that may benefit the heap table and also to better match the reality of an OLTP system. I do however only commit after all records are created for any given test, which is maybe a mistake but that’s what I did.

Now, before I go any further, does anyone want to guess by how much inserting into the heap table will be faster than inserting into the IOT (partitioned or otherwise)? Also, whether the partitioned IOT will be faster or slower than the IOT and what, if any, impact there will be by the ordering of data?

To allow you to think about that and not see the results by accident, I’ll stick in one of the insert harnesses here so you can see how I am doing it. Skip over if you just want the results

First, the preparation of the data and cycling around the loop, the no-insert tests, are negligible. {The first no_insert test is taking 2 seconds due to the parsing and collecting dictionary information overhead of the first thing done}. We can ignore the overhead of the test harness.

Much more significantly:

Each iteration of each test, things get faster {with one odd exception}

For Random and semi-ordered inserts, inserting into the IOT is slower than the Heap by a factor of 300% to 400% (3 times to 4 times slower) – what we all probably expected.

Inserting into the IOT is Faster than the heap when the insert order matches the structure of the IOT in the perfect_IOT test. Not just the same but 40% faster

The partitioned IOT is as Fast or Faster than the the Heap table, never mind the IOT. Over all the partitioned IOT takes 50% TO 75% THE TIME OF THE HEAP to be populated.

Hands up everyone expecting this? NB My hand would be down if I had not done similar tests before.

Why do we see these results?

I’ll just mention that the reason all the tests get quicker with each iteration is that the computer is doing the same thing over and over again and gets used to it, the relevant circuits in the chip are “oiling in”. Of course not, how silly – but it seems to be how non-IT people think of it🙂 Basically more and more of each segment is getting cached and that reduces physical IO and speed things up. You see this a lot in testing and you need to be constantly vigilant about it. You might think I should really run the tests until they are steady-state, but that is a very false situation. On real system you don’t tend to get the same thing run over and over and over again on a system with nothing else running.

Much more interestingly, why is the partitioned IOT winning? Well, for the Heap table data is being poured into the leading edge of the table, which takes a little processing and some blocks to be created. However, there are also two indexes being maintained, the Primary Key and that Unique key. For each insert oracle has to find the right spot in each index (root block ->branch block->leaf block->check for row) and insert the new record, occasionally splitting a full leaf block.

For the IOT and the Partitioned IOT there is no table to maintain. In effect there are just two indexes to maintain. For the normal IOT this benefit is outweighed by the cost of working with a less space efficient Primary Key index. As there is the whole of the row in each index entry there are fewer index entries possible per leaf block and so more block need to be read in from disc and more splits are needed. Thus the inserts into the IOT are slower.
But for the partitioned IOT not only is there no table to maintain but the table (really the Primary Key) is partitioned – on date. We are creating data for a single day at a time. All the relevant blocks will be in the one partition (and it’s the latest and smallest partition) and quickly all cached in memory and processed in memory. It’s the efficient use of the cache again. For both the IOT and the Heap, the primary key segment being processed is larger and the inserts are spread across the whole index.

Ordering data to match the index order (ordered_by_account tests) helps for all three segment types – heap, IOT and partitioned IOT. We have two indexes and the data is being ordered for both in effect, as one is on the columns we are ordering the data creation for and the other, the unique key, is generated by an ascending sequence. The gain is about 25, 30% and is due to working through the indexes in order as opposed to skipping about. If the indexes were significantly larger than the buffer cache or on a highly active system where the turnover in the cache was very high (indeicated by a low buffer cache hit ratio🙂 ) then the benefit would be higher.
The perfect ordering test (perfect_iot_order_match) is a special case. Not only is the data being processed in order of the indexes but only one part of the index is being updated – for a single account. In this situation the maintaining of the index is so efficient that the time saved not processing the heap table means both the IOT and IOT_P win out over the heap. So in this special case, inserts into IOTs are faster than inserts into a heap table.

And finally a quick summary:

Inserting records into an IOT carries about a 200-400% overhead, depending on the number of indexes

Ordering the data to match the IOT gives a 25-30% advantage, but it similarly aids update of the primary key on the heap

Partitioning the IOT such that a smaller chunk of the index is processed can be highly beneficial. I don’t cover it specifically in the above but you can probably appreciate that partitioning on a heap can be beneficial to processing indexes on the heap also.

If you can organise to create your IOT such that data is created only in one or a few distinct points in the structure (you might have to tweak you data load too of course), insert performance might match or even beat that for a heap table with the same indexes

Using methods to maximise efficient use of your buffer cache and reduce IO is a very powerful performance tuning aid, not just for select

BTW I am conscious of the fact that the above may surprise many and you might wonder if my data or tests are flawed. I have stripped out most of the code from this post as it was just becoming massive and confusing, but contact me and I will happily supply all the example scripts and data.

In my last post, IOT part 6, inserts and updates slowed down, I made the point that IOT insert performance on a relatively small Oracle system was very slow, much slower than on a larger system I had used for professional testing. A major contributing factor was that the insert was working on the whole of the IOT as data was created. The block buffer cache was not large enough to hold the whole working set (in this case the whole IOT) once it grew beyond a certain size. Once it no longer fitted in memory, Oracle had to push blocks out of the cache and then read them back in next time they were needed, resulting in escalating physical IO.

I’ve just done another test which backs up this claim. I altered my test database so that the block buffer cache was larger, 232MB compared to 100MB in my first tests. The full IOT is around 200MB

Bottom line, the creation of the IOT was greatly sped up (almost by a factor of 4) and the physical IO dropped significantly, by a factor of 20. As a result, the creation of the IOT was almost as fast as the partitioned IOT. It also shows that the true overhead on insert of using an IOT is more like a factor of 2 to 4 as opposed 6 to 8.

You can see some of the details below. Just to help you understand them, it is worth noting that I had added one new, larger column to the test tables (to help future tests) so the final segments were a little larger (the IOT now being 210MB as opposed to 180MB in the first tests) and there was a little more block splitting.

The heap table creation was faster with more memory available. I’m not really sure why, the cpu effort was about the same as before and though there was some reduction in physical IO with the larger cache, I suspect it might be more to do with both the DB and the machine having been recently restarted.

All three tests are doing a little more “work” in the second run due to that extra column and thus slightly fewer rows fitting in each block (more branch node and leaf node splits), but this just highlights even more how much the IOT performance has improved, which correlates with a massive drop in physical IO for the IOT creation. If you check the session logical reads they are increased by a very small, consistent amount. Physical read IO requests have dropped significantly and, in the case of the IOT, plummeted.

I believe the 90:10 leaf node splits are consistent as that will be the maintaining of the secondary index on ACCO_TYPE and ACCO_ID, which are populated in order as the data is created (derived from rownum).

What this second test really shows is that the efficiency with which you are able to make use of the database cache is incredibly significant. Efficiently accessing data via good indexes or tricks like IOTs and hash tables is important but it really helps to also try and consider how data is going to be recycled within the cache or used, pushed out and then reused. A general principle for batch-type work seems to me to be that if you can process it in chunks that can sit in memory, rather than the whole working set, there are benefits to be gained. Of course, partitioning can really help with this.

{If anyone is wondering why, for the heap table, the number of physical IO requests has dropped by 70% but the actual number of bytes has dropped by only 30%, I’m going to point the finger to some multi-block read scan going on, either in recursive code or, more likely, my code that actually gathers those stats! That would also help explain the drop in user IO wait time for the heap run.}

Just for completeness, here is a quick check of my SGA components for the latest tests, just to show I am using the cache size I claim. All of this is on Oracle 11.1 enterprise edition, on a tired old Windows laptop. {NB new laptop arrived today – you have no idea how hard it has been to keep doing this blog and not play with the new toy!!!}. If anyone wants the test scripts in full, send me a quick email and I’ll provide them.:

A negative impact of using Index Organized Tables is that inserts are and updates can be significantly slowed down. This post covers the former and the reasons why – and the need to always run tests on a suitable system. (I’m ignoring deletes for now – many systems never actually delete data and I plan to cover IOTs and delete later)

Using an IOT can slow down insert by something like 100% to 1000%. If the insert of data to the table is only part of a load process, this might result in a much smaller overall impact on load, such as 25%. I’m going to highlight a few important contributing factors to this wide impact spread below.

If you think about it for a moment, you can appreciate there is a performance impact on data creation and modification with IOTs. When you create a new record in a normal table it gets inserted at the end of the table (or perhaps in a block marked as having space). There is no juggling of other data.
With an IOT, the correct point in the index has to be found and the row has to be inserted at the right point. This takes more “work”. The inserting of the new record may also lead to an index block being split and the extra work this entails. Similar extra work has to be carried out if you make updates to data that causes the record to move within the IOT.
Remember, though, that an IOT is almost certainly replacing an index on the heap table which, unless you are removing indexes before loading data and recreating them after, would have to be maintained when inserting into the Heap table. So some of the “overhead” of the IOT would still occur for the heap table in maintaining the Primary Key index. Comparing inserts or updates between a heap table with no indexes and an IOT is not a fair test.

For most database applications data is generally written once, modified occasionally and read many times – so the impact an IOT has on insert/update is often acceptable. However, to make that judgement call you need to know

what the update activity is on the data you are thinking of putting into an IOT

the magnitude of the impact on insert and update for your system

the ratio of read to write.

There is probably little point putting data into an IOT if you constantly update the primary key values (NB see IOT-5 as to why an IOT’s PK columns might not be parts of a true Primary Key) or populate previously empty columns or hardly ever read the data.

There is also no point in using an IOT if you cannot load the data fast enough to support the business need. I regularly encounter situations where people have tested the response of a system once populated but fail to test the performance of population.

Now to get down to the details. If you remember the previous posts in this thread (I know, it has been a while) then you will remember that I create three “tables” with the same columns. One is a normal heap table, one is an Index Organized Table and one is a partitioned Index Organized Table, partitioned into four monthly partitions. All tables have two indexes on them, the Primary Key index (which is the table in the case of the IOTs) and another, roughly similar index, pre-created on the table. I then populate the tables with one million records each.

These are the times, in seconds, to create 1 million records in the the HEAP and IOT tables:

This is the average of three runs to ensure the times were consistent. I am using Oracle V11.1 on a machine with an Intel T7500 core 2 Duo 2.2GHz, 2GB memory and a standard 250GB 5000RPM disk. The SGA is 256MB and Oracle has allocated around 100MB-120MB to the buffer cache.

We can see that inserting the 1 million rows into the IOT takes 860% the time it does with a heap table. That is a significant impact on speed. We now know how large the impact is on Insert of using an IOT and presumably it’s all to do with juggling the index blocks. Or do we?

This proof-of-concept (POC) on my laptop {which you can also run on your own machine at home} did not match with a proof-of-concept I did for a client. That was done on V10.2.0.3 on AIX, on a machine with 2 dual-core CPUS with hyper-threading (so 8 virtual cores), 2GB SGA and approx 1.5GB buffer cache, with enterprise-level storage somewhere in the bowels of the server room. The results on that machine to create a similar number of records were:

In this case the IOT inserts required 135% the time of the Heap table. This was consistent with other tests I did with a more complex indexing strategy in place, the IOT overhead was around 25-35%. I can’t go into too much more detail as the information belongs to the client but the data creation was more complex and so the actual inserts were only part of the process – this is how it normally is in real life. Even so, the difference in overhead between my local-machine POC and the client hardware POC is significant, which highlights the impact your platform can have on your testing.

So where does that leave us? What is the true usual overhead? Below are my more full results from the laptop POC.

Note that with the partitioned IOT the creation took 341 second, the performance ratio to a heap table is only 198% and is much better than the normal IOT. Hopefully you are wondering why!

I’m running this test on a windows laptop and other things are going on. The timings for Run_Quiet are where I took steps to shut down all non-essential services and applications. This yielded a significant increase for all three object types but the biggest impact was on the already-fastest Heap table.

The final set of figures is for a “mistake”. I created the partitions wrong such that half the data went into one partition and the rest into another and a tiny fraction into a third, rather than being spread over 4 partitions evenly. You can see that the Heap and normal IOT times are very similar to the Run_Normal results (as you would expect as these test are the same) but for the partitioned IOT the time taken is half way towards the IOT figure.

We need to dig into what is going on a little further to see where the effort is being spent, and it turns out to be very interesting. During my proof-of-concept on the laptop I grabbed the information from v$sesstat for the session before and after each object creation so I could get the figures just for the loads. I then compared the stats between each object population and show some of them below {IOT_P means Partitioned IOT}.

The first section shows that all three used similar amounts of CPU, the IOT and partitioned IOT being a little higher. Much of the CPU consumed was probably in generating the fake data.The DB Time of course pretty much matches the elapsed time well as the DB was doing little else.
It is interesting to see that the Heap insert uses array inserts which of course are not available to the IOT and IOT_P as the data has to be inserted in order. {I think Oracle inserts the data into the heap table as an array and then updates the indexes for all the entries in the array – and I am only getting this array processing as I create the data as an array from a “insert into as select” type load. But don’t hold me to any of that}.

In all three cases there are two indexes being maintained but in the case of the IOT and IOT_P, the primary key index holds the whole row. This means there has to be more information per key, less keys per block and thus more blocks to hold the same data {and more branch blocks to reference them all}. So more block splits will be needed. The second section shows this increase in branch node and leaf block splits. Double the branch blocks and triple the leaf block splits. This is probably the extra work you would expect for an IOT. Why are there more leaf block splits for the partitioned IOT? The same data of volume ends up taking up more blocks in the partitioned IOT – 200MB for the IOT_P in four partitions of 40-60MB as opposed to a single 170MB for the IOT. The larger overall size of the partition is just due to a small overhead incurred by using partitions and also a touch of random fluctuation.

So for the IOT and IOT_P there is about three times the index-specific work being done and a similar increase in related statistics such as enqueues, but not three times as it is not just index processing that contribute to these other statistics. However, the elapsed time is much more than three times as much. Also, the IOT_P is doing more index work than the IOT but it’s elapsed time is less. Why?

The fourth section shows why. Look at the file io wait times. This is the total time spent waiting on IO {in millionths of a second} and it is significantly elevated for the IOT and to a lesser degree for the IOT_P. Physical IO is generally responsible for the vast majority of time in any computer system where it has not been completely avoided.
Session logical reads are only slightly elevated, almost negligably so but the number of physical reads to support it increases from 123 for the Heap table insert to 81,458 for the IOT and 3,068 for the IOT_P. A clue as to why comes from the hot buffers moved to head of LRU and free buffer requested statistics. There is a lot more activity in moving blocks around in the buffer cache for the IOT and IOT_P.

Basically, for the IOT, all the blocks in the primary key segment are constantly being updated but eventually they won’t all fit in the block buffer cache – remember I said the IOT is eventually 170MB and the buffer cache on my laptop is about 100MB – so they are flushed down to disk and then have to be read back when altered again. This is less of a problem for the IOT_P as only one partition is being worked on at a time (the IOT_P is partitioned on date and the data is created day by day) and so more of it (pretty much all) will stay in memory between alterations. The largest partition only grows to 60MB and so can be worked on in memory.
For the heap, the table is simply appended to and only the indexes have to be constantly updated and they are small enough to stay in the block buffer cache as they are worked on.

This is why when I got my partitioning “wrong” the load took so much longer. More physical IO was needed as the larger partition would not fit into the cache as it was worked on – A quick check shows that logical reads and in fact almost all statistics were very similar but 26,000 IO requests were made (compared to 81,458 for the IOT and 3,068 for the correct IOT_P).

Of course, I set my SGA size and thus the buffer cache to highlight the issue on my laptop and I have to say even I was surprised by the magnitude of the impact. On the enterprise-level system I did my client’s proof of concept on, the impact on insert was less because the buffer cache could hold the whole working set, I suspect the SAN had a considerable cache on it, there was ample CPU resource to cope with the added latching effort and the time taken to actually create the data inserted was a significant part of the workload, reducing the overall impact of the slowness caused by the IOT.

{Update, in This little update I increase my block buffer cache and show that physical IO plummets and the IOT insert performance increases dramatically}.

This demonstrates that a POC, especially one for what will become a real system, has to be a realistic volume on realistic hardware.
For my client’s POC, I still did have to bear in mind the eventual size of the live working set and the probably size of the live block buffer cache and make some educated guesses.

It also explains why my “run_quiet” timings showed a greater benefit for the heap table than the IOT and IOT_P. A windows machine has lots of pretty pointless things running that take up cpu and a bit of memory, not really IO so much. I reduced the CPU load and it benefits activity that is not IO, so it has more impact on the heap table load. Much of the time for the IOT and IOT_P is taken hammering the disk and that just takes time.

So, in summary:

Using an IOT increases the index block splitting and, in turn, enqueues and general workload. The increase is in proportion to the size of the IOT compared to the size of the replaced PK.

The performance degredation across the whole load process may well be less than 50% but the only way to really find out is to test

You may lose the array processing load that may benefit a heap table load if you do the load via an intermediate table.

With an IOT you may run into issues with physical IO if the segment (or part of the segment) you are loading into cannot fit into the buffer cache (This may be an important consideration for partitioning or ordering of the data loaded)

If you do a proof of concept, do it on a system that is as similar to the real one as you can

Just seeing the elapsed time difference between test is sometimes not enough. You need to find out where that extra time is being spent

I’ve thrown an awful lot at you in this one post, so I think I will stop there. I’ve not added the script to create the test tables here, they are in IOT-5 {lacking only the grabbing of the v$sesstat information}.

I had a real treat this summer during my “time off” in that I got to review Jonathan Lewis’s up-coming new book. I think it’s going to be a great book. If you want to know how Oracle actually holds it’s data in memory, how it finds records already in the cache and how it manages to control everything so that all that committing and read consistency really works, it will be the book for you.

{Update, Jonathan has confirmed that, unexpected hiccups aside, Oracle Core: Essential Internals for DBAs and Developers should be available from October 24, 2011}

Jonathan got in touch with me around mid-May to say he was working on the draft of his new book, one that would cover “how does Oracle work”, the core mechanics. Would I be willing to be one of his reviewers? Before anyone comments that there is not likely to be much about core Oracle that I know and Jonathan does not, he did point out that he had already lined up someone to be his technical reviewer, ie someone he expected to know as much as he and help spot actual errors. The technical reviewer is the most excellent Tanel Poder, who posted a little mention of it a couple of months back.

I was to act more like a typical reader – someone who knew the basics and wanted to learn more. I would be more likely to spot things he had assumed we all know but don’t, or bits that did not clearly explain the point if you did not already know the answer. ie an incomplete geek. I figured I could manage that🙂.

It was a lot harder work than I expected and I have to confess I struggled to supply back feedback as quickly as Jonathan wanted it – I was not working but I was very busy {and he maybe did not poke me with a sharp stick for feedback soon enough}. As anybody who has had to review code specifications or design documents will probably appreciate, you don’t just read stuff when you review it, you try and consider if all the information is there, can it be misunderstood and, if you find that you don’t understand a section, you need to work out if the fault is with you, with the way it is written or with what is written. When I read a technical {or scientific} document and I do not fully understand it, I usually leave it a day, re-read it and if it still seems opaque, I just move on. In this case I could not do that, I had to ensure I understood it or else tell Jonathan why I thought I did not understand it. If there are sections in the end book that people find confusing, I’ll feel I let Jonathan down.

Just as tricky, on the one hand, as I’ve been using Oracle for so long and I do know quite a lot about Oracle {although clearly not enough in the eyes of the author🙂 } I had to try and “not know” stuff to be able to decide if something was missing. On the other, when I wanted to know more about something was I just being a bit too nerdy? I swung more towards the opinion that if I wanted to know more, others would too.

I have to say that I really enjoyed the experience and I learnt a lot. I think it might change how I read technical books a little. I would run through each chapter once to get the feel of it all and then re-read it properly, constantly checking things in both version 11 and 10 of Oracle as I read the drafts and would not let myself skip over anything until I felt I really understood it. As an example, I’ve never dug into internal locks, latches and mutexes much before and now that I’ve had to learn more to review the book, I have a much better appreciation of some issues I’ve seen in the wild.

Keep an eye out for the book, it should be available by the end of this year and be called something like “Oracle Core” {I’ll check with Jonathan and update this}. I won’t say it will be an easy read – though hopefully a little easier as a result of my input – as understanding things always takes some skull work. But it will certainly be a rewarding read and packed full of information and knowledge.

Oracle Tuning is all about technical stuff. It’s perhaps the most detail-focused and technical aspect of Oracle Administration there is. Explain Plans, Statistics, the CBO, database design, Physical implementation, the impact of initialisation variables, subquery factoring, sql profiles, pipeline functions,… To really get to grips with things you need to do some work with 10046 and 10053 traces, block dumps, looking at latching and queueing…

But I realised a good few years ago that there is another, very important aspect and one that is very often overlooked. People and their perception. The longer I am on an individual site, the more significant the People side of my role is likely to become.

Here is a little story for you. You’ll probably recognise it, it’s one that has been told (in many guises) before, by several people – it’s almost an IT Urban Myth.

When I was but a youth, not long out of college, I got a job with Oracle UK (who had a nice, blue logo back then) as a developer on a complex and large hospital system. We used Pyramid hardware if I remember correctly. When the servers were put in place, only half the memory boards and half the CPU boards were initiated. We went live with the system like that. Six months later, the users had seen the system was running quite a bit slower than before and started complaining. An engineer came in and initiated those other CPU boards and Memory boards. Things went faster and all the users were happy. OK, they did not throw a party but they stopped complaining. Some even smiled.

I told you that you would recognise the story. Of course, I’m now going to go on about the dishonest vendor and what was paid for this outrageous “tuning work”. But I’m not. This hobbling of the new system was done on purpose and it was done at the request of “us”, the application developers. Not the hardware supplier. It was done because some smart chap knew that as more people used the system and more parts of it were rolled out, things would slow down and people would complain. So some hardware was held in reserve so that the whole system could have a performance boost once workload had ramped up and people would be happy. Of course, the system was now only as fast as if it had been using all the hardware from day one – but the key difference was that rather than having unhappy users as things “were slower than 6 months ago”, everything was performing faster than it had done just a week or two ago, and users were happy due to the recent improvement in response time. Same end point from a performance perspective, much happy end point for the users.

Another aspect of this Human side of Tuning is unstable performance. People get really unhappy about varying response times. You get this sometimes with Parallel Query when you allow Oracle to reduce the number of parallel threads used depending on the workload on the server {there are other causes of the phenomena such as clashes with when stats are gathered or just random variation in data volumes}. So sometimes a report comes back in 30 minutes, sometimes it comes back in 2 hours. If you go from many parallel threads to single threaded execution it might be 4 hours. That really upsets people. In this situation you probably need to look at if you can fix the degree of parallelism that gives a response time that is good enough for business reasons and can always be achieved. OK, you might be able to get that report out quicker 2 days out of 5, but you won’t have a user who is happy on 3 days and ecstatic with joy on the 2 days the report is early. You will have a user who is really annoyed 3 days and grumbling about “what about yesterday!” on the other 2 days.

Of course this applies to screens as well. If humans are going to be using what I am tuning and would be aware of changes in performance (ie the total run time is above about 0.2 seconds) I try to aim for stable and good performance, not “outright fastest but might vary” performance. Because we are all basically grumpy creatures. We accept what we think cannot be changed but if we see something could be better, we want it!

People are happiest with consistency. So long as performance is good enough to satisfy the business requirements, generally speaking you just want to strive to maintain that level of performance. {There is one strong counter-argument in that ALL work on the system takes resource, so reducing a very common query or update by 75% frees up general resource to aid the whole system}.

One other aspect of Human Tuning I’ll mention is one that UI developers tend to be very attuned to. Users want to see something happening. Like a little icon or a message saying “processing” followed soon by another saying “verifying” or something like that. It does not matter what the messages are {though spinning hour glasses are no longer acceptable}, they just like to see that stuff is happening. So, if a screen can’t be made to come back in less than a small number of seconds, stick up a message or two as it progresses. Better still, give them some information up front whilst the system scrapes the rest together. It won’t be faster, it might even be slower over all, but if the users are happier, that is fine. Of course, Oracle CBO implements this sort of idea when you specify “first_n_rows” as the optimizer goal as opposed to “all_rows”. You want to get some data onto an interactive screen as soon as possible, for the users to look at, rather than aim for the fastest overall response time.

After all, the defining criteria of IT system success is that the users “are happy” -ie accept the system.

This has an interesting impact on my technical work as a tuning “expert”. I might not tune up a troublesome report or SQL statement as much as I possibly can. I had a recent example of this where I had to make some batch work run faster. I identified 3 or 4 things I could try and using 2 of them I got it to comfortably run in the window it had to run in {I’m being slightly inaccurate, it was now not the slowest step and upper management focused elsewhere}. There was a third step I was pretty sure would also help. It would have taken a little more testing and implementing and it was not needed right now. I documented it and let the client know about it, that there was more that could be got. But hold it in reserve because you have other things to do and, heck, it’s fast enough. {I should make it clear that the system as a whole was not stressed at all, so we did not need to reduce system load to aid all other things running}. In six months the step in the batch might not be fast enough or, more significantly, might once more be the slowest step and the target for a random management demand for improvement – in which case take the time to test and implement item 3. (For those curious people, it was to replace a single merge statement with an insert and an update, both of which could use different indexes).

I said it earlier. Often you do not want absolute performance. You want good-enough, stable performance. That makes people happy.

One of the drawbacks of IOTs is that they have to be organised by the primary key of the table. If your table does not have a primary key, it cannot be Index Organized.

I would argue that any table that holds persistent data (ie it is not transient data about to be loaded into the database proper or a temporary working set) should have a Primary Key. If I am working on a system and come across a table without a Primary Key I immediately challenge it. {There are occasional, valid reasons for a persistent table to lack a PK, but I confess I am struggling right now to come up with one – but I digress}. I’m a big fan of database-enforced referential integrity.

The problem is, if you you are making a table into an Index Organized Table so that the records are clustered to match how you process the data, it could well be that the primary key is not related to how you want to order the data. Let me give you an example. {Oh, and for brevity, I’ll put the SQL statements to create the examples at the end of this post}.

This is a classic parent-child relationship, each account has a set of transactions. I’ve expanded on my prior example by:

changing the parent to be called ACCOUNT and giving it a two-part Primary Key, ACCO_TYPE and ACCO_ID.

Changing the child to be called TRANSACTION and given it a Primary Key of TRAN_TYPE and TRAN_ID.

In a real system I would create a foreign key from TRANSACTION.ACCO_TYPE,ACCO_ID to the ACCOUNT table primary key.

Note that the Primary Key on the TRANSACTION table is NOT based on the account columns. Maybe in theory the primary key on the transaction table would be the account columns and the cre_date – if the cre_date held a datetime AND two records could not be created on the same second. If we used a timestamp then you might be able to argue no record would be created in the same fraction of a second – except that often transactions get given a fixed time. Midnight springs to mind (consider when you would add the accrued interest on a savings account). So, a new surrogate Primary Key is intoduced, a transaction type and ID. TRAN_TYPE and TRAN_ID are the primary key of the TRANSACTION table.

I’d say that I see such two-part primary keys more often then single column primary keys these days. Possibly because so many databases receive information from other systems or even applications on the same database.

As before, I create 10,000 parent records (ACCOUNT) and 10,000 random child records (TRANSACTION_HEAP) each day for 100 days.

Also as before, I want to select information grouped by account. I want all the transactions for an account, not all transactions on a day or for a range of transaction IDs. Hopefully this is a scenario most of you will recognise.

Selecting a sum of one of the non-indexed columns and a count of records for a given account takes quite a bit of effort on the part of the HEAP table:

Of course, it has to do a full table scan as my Primary Key is on two columns that have nothing to do with the query. I can repeat this statement as often as I like, it takes the same number of physical reads and consistent gets as it is not caching the information.

I add an index on the ACCO_TYPE, ACCO_ID and CRE_DATE columns and re-run the query:

I ran it twice to get rid of the parse overhead, but the first time it did a load of physical reads to support those 120 consistent gets.

I could recreate the TRANSACTION_HEAP table as an IOT of course – but it will be organized by the TRAN_TYPE and TRAN_ID columns. That is useless to me. Even if I add a secondary index on the ACCO_TYPE, ACCO_ID and CRE_DATE columns it will at best be no better than the above HEAP table and, because the secondary index will hold rowid guesses and will sometimes have to use the primary key information to walk down the index, it will be worse. {I am not sure I have explained that bit yet about row guesses. Post 6?}

So, if you want the information organized in an order that is not helped by the Primary Key of the table, an IOT is useless to you. You cannot achieve that physical record grouping by the IOT method.

I am going to do something else though. I’m going to sort of change the rules to work around the issue.

As far as the physical implementation is concerned, a Primary Key is in effect just a unique index and two rules. The rules are that all the columns in the Primary Key must be mandatory and there can only be one PK on a table. I can have as many unique indexes as I like, so long as the key combinations lead to no duplicate rows. I can alter my Primary Key – it is not set in stone.

Before I go any further I am going to stress that I am about to abuse the concept of the Primary Key. I’d need to do a seperate blog to fully justify saying what a Primary Key is, but part of the concept is that no column must be derivable from other columns in the PK and it must be the minimum number of columns required to make the key unique.

We want to group the data by the account columns and the creation date. So let’s define a Primary Key that is ACCO_TYPE, ACCO_ID, CRE_DATE and whatever else we need to guarantee the key is unique. In our case that would be TRAN_TYPE and TRAN_ID – the current Primary Key! If I knew I would always want all records for the account, I could drop the CRE_DATE out of my fake Primary Key, but I know that the creation date is very often important. You may want activity for the last month, last quarter, a stated date or even an exact datetime. For all those cases, including the CRE_DATE column is highly beneficial.

5 consistent gets. It has walked down the IOT and scanned 3 blocks to collect that data. Our IOT based on an abused Primary Key does the job of supporting range scans efficiently, with the benefits to the Block Buffer Cache I refered to in IOT4

That “Primary Key” I created is NOT a real Primary key. It is not the minimum number of columns I need to uniquely identify a column. My Primary key is on ACCO_TYPE, ACCO_ID, CRE_DATE,TRAN_TYPE and TRAN_ID – the account, the datetime of the transaction and the transaction. What if I was to alter the datetime by a second? I could create a record with the same account, the same transaction_id as an existing record but a second into the future. That is just wrong. After all, the whole point of the TRAN_TYPE and TRAN_ID is to uniquely identify a record. If created the new record I stated above, there would be two records for the one TRAN_TYPE/TRAN_ID.

I protect against this ability to create incorrect records by creating a UNIQUE KEY against the table also, against columns TRAN_TYPE and TRAN_ID. This is unique index TRIO_TRAN_UQ as displayed in the information above. A Primary Key is usually the referenced parent of any referential integrity, ie foreign keys, between this table and any children. However, a Unique Key can also be the target of Referential Integrity. I cannot create a record in TRANSACTION_IOT with the same TRAN_TYPE/TRAN_ID as already exists due to this unique constraint:

So, I have my IOT to support querying code and I have my Unique Constraint to police my original Primary Key and be used as the target for any Foreign Key requirements I might need. This is not a perfect solution – the design will look a little strange to anyone who looks at this database and the Unique Key is supported by a secondary index on an IOT which can have some issues. But it does work.

My “primary key” is no longer a true Primary Key. It is just a tool for allowing me to organise the data physically in a way that will support my application. That is what I meant about changing the rules.

I am willing to abuse a Primary Key in this way because of the performance benefits. It is a solution for a system where most of the query access is against a set of records which would be scatter-gunned across a table if you did not use some sort of physical grouping. If you are reading this and thinking “oh, I am not sure about you doing that to a Primary Key Martin” then you are probably OK to consider this solution. If you can’t see a problem with it then you are either very used to turning off referential integrity and understand the consequences – or you simply do not understand what RI does for your database. If you are in the latter camp, do not even consider doing this. If you are one of those people who works on data warehouse and for whom is it just part of the DW process to turn off RI as that is what you do for data warehouses – DON’T do this!

OK, I’m nearly at the end of this topic but I want to touch on partitioning. You can range partitition an Index Organized Table from 9i I think. It is certainly supported in Oracle 10 upwards. Partitioning is important in this technique because a unique index must contain the partition key if the index is to be locally partitioned – otherwise the index must be global, ie the one index object references all the partitions across the table.

Below is my table creation statement for the IOT organized by the account, creation date and transaction. The table is ranged partitioned by CRE_DATE, into months.

You can see the definition of my fake Primary Key and the fact that it does not have a tablespace defined for it – as the ‘organization index’ statement lower down causes the table to be an IOT and the segment will go into the “table” tablespace.
I then state my Unique Index to police the integrity of my table – TIP_TRAN_UQ
I then state the partition clause, ‘partition by range (cre_date)’ followed by my initial partition definitions. It’s as simple as that to partition an IOT.

What gets created? A set of four segments for the IOT, which are primary key index segments of course, not table segments:

Note that the SEG (type) is “IP” – my script decodes the type into a short mnemonic and IP is Index Partition. You can see the tablespaces those segments are in and the size of the segments. What about that unique index I created?

It is a single segment, a normal index. I cannot have it as a locally partitioned index as it is a unique index and lacks the partitioning key in it’s definition.

This could be a problem. The usual reason you partition a table is because it is too large to comfortably be held as a single segment {and also for the benefit of partition exclusion, but you don’t usually need that on small tables!}. This means that the global index to support that primary key is going to be large. Now, I made a “mistake” when I created my partitioned IOT – I did not create a partition for this month, some data has gone into the MAXVALUE partition (see the size of the segment above, 34K and 49 extents). If I split that last partition to create a new partition for this month and a new MAXVALUE partition, I will invalidate the global index and I will have to rebuild it. Very large indexes can take a long time and a heck of a lot of temporary space to gather and sort the data. That could be an ongoing maintenance nightmare.

In a recent implementation I did using IOTs I did not create a global unique index to replace the original foreign key. I create a non-unique, locally partitioned index to support some queries using those columns and the table had no children so no Foreign Keys were needed. But there was something else I needed to do as I had removed the referential integrity rules for that table. Remember I sad I am a fan of database enforced referential integrity? Now I “know” the application will not create data that will break the removed Primary Key rule, I “know” I documented what I had done. And I know that in 12 months time there will almost certainly be data that will have duplicate values for that Primary Key if it is not enforced somehow, because it always happends. I need to implement a little script to regularly check for duplicate TRAN_TYPE/TRAN_ID conmbinations being created. If you remove RI from a relational database, you should replace it in some way. Otherwise, you will pretty soon have a non-relational database.

That’s it for this topic. The below is my example script for creating most of the above, in case anyone wants it or wants to verify what I have said.

It is where Oracle works through a SQL array. I’ve usually seen this within PL/SQL where an array variable is used within a procedure or package, being passed in containing a set of accounts or customers that are of interest and is used with a cursor. But you might also see it as the step processing the output of a pipelined function (one that returns a set of “rows”) being cast into a table. See this example by Tom Kyte. Tom also suggests that it is called a “Pickler” fetch as the data is pickled – packed and formatted. I never knew that, I just thought someone in Oracle development was having a giggle and it was “Pickled” as it was preserved from the PL/SQL side of the SQL engine. It seems that I was a little off-target with that.

{My thanks to Timur (see comments) who corrected me when I said it was a PL/SQL array. It is not, it is a SQL object}.

The CBO is not very good at looking “inside” such arrays to determine the cardinality of that step. This can cause performance issues.

Because using them involves a few steps and potentially involves grants, it is possible for them to be “temporarily removed” during testing and so their impact is not seen.

They can leak memory. I think.

I’m now going to cover each of those points in turn. If you just wanted to know what a pickler fetch is and I’ve answered that for you, I suggest you go back to whatever you were doing before you started reading this🙂

By “not very good at looking inside pickler fetches” I mean that the CBO seems to make a poor “base it on the first seen array” (V11 up) or utterly fails to identify how many records are inside the SQL array (V10 down), depending on the Oracle version. From an Oracle 10.2.0.4 system I’ve got the two following examples:

Note the cost and expected rows for the Pickler Fetch step. Or rather, the lack of them.

This would manifest itself in the following way in OEM screens:

Note the cardinality of the Pickler Fetch step is 0 {sorry, I cut off the column headings}. This resulted in this step having a cost of 0 and all the subsequent steps of having an expected cardinality of one and having very low costs – between 1 and 3 {Again, not shown, sorry}.

The end result of this was that the CBO struggled to accuratley cost any given plan and came up with several, usually quite terrible, plans that it swapped between as other table stats varied. The CBO was picking between very complex plans with total costs of 100 or 200 or so! Any difference was “significant”.

Please note, OPTIMIZER_DYNAMIC_SAMPLING was set to 4 on this system and I tried hints and session settings to higher levels and they did not prompt the CBO to look into the array, on 10.2.0.4 at least.

In 11.1 things seem to be better, as is shown in the explain plan at the top of this post. The step has a cost. I have to confess, I have not tested this very much on 11 {and if anyone has, feel free to correct me/add enlightenment via comments or pointers to other sources}, but it seems to set the cardinality to the number of elements the Pickler Fetch finds in the first itteration. Unless it uses the same sort of trick Oracle 11 now uses for bind variables (detecting when the value supplied is out of range and generates a new plan) this is going to lead to the old and much ‘loved’ issue of the plan being fixed by the first execution, irrespective of how suitable that plan is.

How do you fix this issue? Well, I resort to the cardinality hint. Usually the number of records being passed into the array is not too variable and any half-decent value is better than nothing in Oracle 10 and before. As for in 11, I like stating the value rather than risking a variable ‘first seen at parsing time’ setting. It is a judgement call. The below is from 11.1 but I’ve used it extensively in 10.2, where the impact is much more significant:

Note the change of ROWS to 11 in step 5. In V10 this is a change from blank to 11 and in real situations, do not be at all suprised if the plan changes dramatically – away from nested loop access and more to hash joins. {I should note, the cardinality hint is not documented in Oracle 10 or 11 and any use you make of it in live code is your responsibility. Sorry about that}.

What about my second point, about testing them? Well, as an example of testing Pickler processing of SQL arrays, which are defined SQL types, this is what I had to do to run my basic test:

I had to create a table type, which is the SQL array, and this was based on an object type which I had to create first {you can have table types based on standard SQL types but very often they are based on a “row” object}. After creating the stored procedure, I had to define and populate the array with a set of records which I then passed in to my procedure call . {If you want to repeat this yourself, check out my postings on IOTs to get the table creation statement for table CHILD_HEAP}.
Now, I created those types so I had access to them. If those types do not belong to you you have to be granted execute on the types to reference them. Not select, execute. Some sites have a pretty strict attuitude to granting execute on anything and types seem to get forgotten when the execute priviledges against packages and procedures are set up. In a recent situation I had, I was forced to do some testing work on Live and it had taken people with big sticks to get me select access on data. Execute privileges were totally refused. Calmly explaining why it was needed and how it was acceptable fell on not so much deaf as bricked-up ears.

So, for testing, the reference to an array passed in is often replaced by a little sub-select. After all, quite often what is being passed in for a pickler fetch is actually a set of records {but a subset of the rows} from a table that has been collected by a previous processing step.
As an example of such a change:

All that has changed is that we now have a little sub-select rather than the casting of the SQL array into a table and, heck, as the developer might say, those were the records that would have been passed in, the code still works as expected and the same data comes back. No identifying which object types you need, no getting the execute permissions, no populating it yourself in the test harness, we can just swap back in the array later.

{I apologise to good developers, who even now are throwing imaginary darts at me. I know Dawn, you would certainly not do this. But I’ve seen it a couple of times. Developers have enough on their plate to go worrying about esoteric aspects of the CBO}

But the thing is, Oracle can look at that select and evaluate it’s cost and get an expected cardinality. The pickler fetch version has the issues I’ve just covered. I had to deal with a situation just like this last year, it does happen. In dev it was fine, in pre-live testing it was not.

What about memory leaks? Well, I had lots of issues with SQL arrays and memory leaks with oracle 10.1. and 10.2 at one client site and there is a documented bug in oracle 8 with pickler fetch and memory leaks but I have to confess, a quick metalink search did not find any hits for Oracle 10 and 11. So maybe you should not trust me on that one. In the situation I saw the arrays were massive, several MB at times, and so if you are using SQL arrays to pass in a modest list of eg accounts or customers, it is not going to be an issue anyway.

You know, this was just going to be a quick post on something I’ve been meaning to mention for months, not a small essay🙂.

So far I have covered the basics of Index Organized Tables, created some examples and shown how IOTs can help reduce the cost of single row selects and then how they can greatly aid range scans. Follow the above links for details.

Now I’d like to show that the use of IOTs has the potential to make your block buffer cache (BBC) far more efficient. Going to disc is very,very slow compared to going to memory {NB solid state storage improves this situation but does not remove it}. The block buffer cache has always been critical to oracle SQL Select performance as it allows you to access data in memory rather than disc and in general the more block buffer cache you have the faster your system will be.
{I am of the opinion that the BBC is even more important now than ever. As hard discs get larger we are seeing fewer and fewer spindles per GB of storage and, in essence, disc storage is effectively getting slower – because more data is hosted on the same number of spindles and those spindles are not themselves getting faster – I digress, for more details see posts Big Discs are Bad and IOPs and Form Factors}

In the scenario I’ve covered in my previous posts on IOTs we have a system where child data is coming in for many parents every day for 100 days. With a heap table the data pours into the growing end of the table, usually a record or two per parent each day and no guarantee that if two records come in, they will be put into the same block.

So, when you select a child record for a parent you get the situation shown below:

For many systems, the Block Buffer Cache is holding a lot of data no queries asked for -collateral data

When oracle needs to collect a record from the table, it has to read the whole block. Oracle only reads in tablespace data in whole blocks. That record comes with many other records in it that you did not ask for or want. I refer to this as Collateral Data – innocent bystander data that has got pulled into the BBC just because it was in the same block as required data. The larger the block size, the more collateral data there is.

To get all the child records for the parent, you need to read all those individual table blocks holding one or two records of interest. For our 100 child records you will probably need to read in close to 100 table blocks. Your Block Buffer Cache is filling up with of blocks where only one row out of each block is “of interest”. If that is one row out of 80 in a block, you are effectively wasting 98.75% of the space that table takes up in the block buffer cache.

With an IOT the situation is very different. We have already seen in my previous post on reducing IO that for a range scan on the IOT, oracle does not need to go and collect records from blocks scattered throughout the table. It simply collects the IOT leaf blocks holding the relevant data. Not only does this require less IO, it also results in the fetched blocks mostly holding the required data. The percentage of collateral data is greatly reduced:

IOTs are a powerful tool in reducing collateral data and using the BBC more efficiently

Thus instead of 100 table blocks that mostly hold collateral data, you have 2 or 3 blocks holding mostly the data you are interested in. Your wastage, the collateral data, is about 33-50%. With my example tables from post IOT2, it is indeed an average of 2-3 IOT blocks holding all the data for a single parent and 100 heap table blocks holding the same data.

You can think of it another way.

With my IOT I use only 3% of the memory to cache a single set of records for a parent compared to that needed with a normal HEAP table.

Let’s extend that idea a little. Let’s say I have 100,000 customer and 5% of the customers are active.
Each customer has on average 500 * 200-byte activity records for 100K of data
Each 8K Heap Table block holds 40 records, a very inefficient IOT holds 20 records per block {I’ll go into the details of this in a later post, but I am being a little unfair on the IOT here}.
There is some grouping of records for the customer so that those 500 records are over 400 blocks.
The whole working set of 400 blocks * (100,000*0.05) customers *8K = 2,000,000K

Now replace the table with an IOT keyed on customer ID.
Those 400 records would be spread over (400/20)+2 blocks. The +2 is the start and end of the range.
That is 22 *(100,000*0.05) *8K = 110,000K

Both would also need the overhead of an index structure to be cached also, for the Heap table it is the Primary Key index, for the IOT it is the rest of the IOT structure.

So you would need 2GB or so of Block Buffer Cache dedicated to caching the working set held as a heap table and 110MB of Block Buffer Cache dedicated to caching the IOT equivalent.

How many of you spotted that the space needed to “cache the heap table working set” is actually twice the size of the table? Well, that is because with a 5% working set and 40 records per block, there is a fair chance that some of those Collateral Data records in each block are for the 5% working set. Over all, almost every block will hold data for two or three active users. This is one of the complications of working out how efficient you block buffer cache is likely to be and I’ll have to leave that to another post. In reality you would need to hold 99% of the heap table in memory to cache the whole working set, so still 1GB. The IOT is still far more efficient.

I’ll just finish by saying that on one project I worked on we would have needed several hundred GB of Block Buffer Cache to hold the working set of the main tables if held as heap tables. That volume of memory was simply not available. With IOTs this reduced to about 40GB. This was available. The majority of this working set was able to stay in their SGA and it meant that so long as the instance stayed up, that working set of data mostly stayed in cache. I was able to see on the live system that processing of the data for these active customers was mostly being supported by consistent gets and less than 1% disk reads.

So, by using IOTS I reduced consistent gets dramatically, made more efficient use of the block buffer cache and, as a result of that, reduced the number of physical reads needed to support the consistent gets.

For as long as I have been working with Oracle technology {which is now getting towards 2 decades and isn’t that pause for thought} there has been a constant search for Performance Silver Bullets – some trick or change or special init.ora parameter {alter system set go_faster_flag=’Y’} you can set to give you a guaranteed boost in performance. For all that time there has been only one.

There are a few performance Bronze Bullets…maybe Copper Bullets. The problem is, though, that the Oracle database is a complex piece of software and what is good for one situation is terrible for another. Often this is not even a case of “good 90% of the time, indifferent 9% of the time and tragic 1% of the time”. Usually it is more like 50%:30%:20%.

Cartridge with copper bullet &spent round

I’ve just been unfair to Oracle software actually, a lot of the problem is not with the complexity of Oracle, it is with the complexity of what you are doing with Oracle. There are the two extremes of OnLine Transaction Processing (lots of short running, concurrent, simple transactions you want to run very quickly by many users) and Data Warehouse where you want to process a vast amount of data by only a small number of users. You may well want to set certain initialisation parameters to favour quick response time (OLTP) or fastest processing time to completion (DW). Favouring one usually means a negative impact on the other. Many systems have both requirements in one… In between that there are the dozens and dozens of special cases and extremes that I have seen and I am just one guy. People get their database applications to do some weird stuff.

Partitioning is a bronze bullet. For many systems, partitioning the biggest tables makes them easier to manage, allows some queries to run faster and aids parallel activity. But sometimes (more often than you might think) Partitioning can drop rather than increase query or DML performance. In earlier versions of Oracle setting optimizer_index_caching and optimizer_index_cost_adj was often beneficial and in Oracle 9/8/7 setting db_file_multiblock_read_count “higher” was good for DWs….Go back to Oracle 7 and doing stuff to increase the buffer cache hit ratio towards 98% was generally good {and I will not respond to any comments citing Connors magnificent “choose your BCHR and I’ll achieve it” script}.
You know what? There was an old trick in Oracle 7 you could maybe still look at as a bronze bullet. Put your online redo logs and key index tablespaces on the fastest storage you have and split your indexes/tables/partitions across the faster/slower storage as is fit. Is all your storage the same speed? Go buy some SSD and now it isn’t….

Cartridge with Wooden Bullet

Then there are bronze bullets that you can use that very often improve performance but the impact can be catastrophic {Let’s call them wooden bullets🙂 }. Like running your database in noarchivelog mode. That can speed up a lot of things, but if you find yourself in the situation of needing to do a recovery and you last cold backup is not recent enough – catastrophe. A less serious but more common version of this is doing things nologging. “oh, we can just re-do that after a recovery”. Have you done a test recovery that involved that “oh, we can just do it” step? And will you remember it when you have a real recovery situation and the pressure is on? Once you have one of these steps, you often end up with many of them. Will you remember them all?

How many of you have looked at ALTER SYSTEM SET COMMIT_WRITE=’BATCH,NOWAIT’? It could speed up response times and general performance on your busy OLTP system. And go lose you data on crash recovery. Don’t even think about using this one unless you have read up on the feature, tested it, tested it again and then sat and worried about could possibly go wrong for a good while.

That last point is maybe at the core of all these Performance Bronze Bullets. Each of these things may or may not work but you have to understand why and you have to understand what the payback is. What could now take longer or what functionality have I now lost? {hint, it is often recovery or scalability}.

So, what was that one Silver Bullet I tantalizingly left hanging out for all you people to wait for? You are not going to like this…

Look at what your application is doing and look at the very best that your hardware can do. Do you want 10,000 IOPS a second and your storage consists of less than 56 spindles? Forget it, your hardware cannot do it. No matter what you tune or tweak or fiddle with. The one and only Performance Silver Bullet is to look at your system and your hardware configuration and work out what is being asked and what can possibly be delivered. Now you can look at:

What is being asked of it. Do you need to do all of that (and that might involve turning some functionality off, if it is a massive drain and does very little to support your business).

Are you doing stuff that really is not needed, like management reports that no one has looked at in the last 12 months?

Is your system doing a heck of a lot to achieve a remarkably small amount? Like several hundred buffer gets for a single indexed row? That could be a failure to do partition exclusion.

Could you do something with physical data positioning to speed things up, like my current blogging obsession with IOTs?

You can also look at what part of your hardware is slowing things down. Usually it is spindle count/RAID level, ie something dropping your IOPS. Ignore all sales blurb from vendors and do some real-world tests that match what you app is or wants to do.

It’s hard work but it is possibly the only Silver Bullet out there. Time to roll up our sleeves and get cracking…

{Many Thanks to Kevin Closson for providing all the pictures – except the Silver Bullet, which he only went and identified in his comment!}