Saturday, April 25, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 1

This is the first part of a series of posts covering one of the fundamentals of the cost based optimizer in 9i and later. Understanding how the different system statistics modes work is crucial to making the most of the cost based optimizer, so I'll attempt to provide detailed explanations and examples of the formulas and arithmetic used. Finally I'll show (again) that using multiple block sizes for "tuning" purposes is generally a bad idea, along with detailed examples of why I think this is so.

One of the deficiencies of the traditional I/O based costing was that it simply counted the number of I/O requests, drawing no distinction between single-block I/O and multi-block I/O.

System statistics were introduced in Oracle 9i to allow the cost based optimizer to take into account that single-block I/Os and multi-block I/Os should be treated differently in terms of costing and to include a CPU component in the cost calculation.

The system statistics tell the cost based optimizer (CBO), among other things, the time it takes to perform a single-block read request and a multi-block read request. Given this information the optimizer ought to be able to come to estimates that better fit the particular environment the database is running in, and additionally to apply an appropriate costing to multi-block read requests, which usually take longer than single-block read requests. Given the information about the time it takes to perform the read requests, the calculated cost can also be turned into a time estimate.

The cost calculated with system statistics is still expressed in the same units as with traditional I/O based costing, which is in units of single-block read requests.

Although the mode using system statistics is also known as "CPU costing", despite the name the system statistics have their most significant impact on the I/O cost calculated for full table scans, due to the different measure MREADTIM used for multi-block read requests.

Starting with Oracle 10g you actually have the choice of three different modes of system statistics, also known as CPU costing:

- Default NOWORKLOAD system statistics
- Gathered NOWORKLOAD system statistics
- Gathered WORKLOAD system statistics

The important point to understand here is that starting with Oracle 10g system statistics are enabled by default (using the default NOWORKLOAD system statistics), and you can only disable them by either downgrading your optimizer (using the OPTIMIZER_FEATURES_ENABLE parameter) or by using the undocumented parameter "_optimizer_cost_model", respectively the CPU_COSTING and NOCPU_COSTING hints.

This initial part of the series will focus on the default NOWORKLOAD system statistics introduced with Oracle 10g.

Default NOWORKLOAD system statistics

The default NOWORKLOAD system statistics measure only the CPU speed (CPUSPEEDNW); the two remaining values used for NOWORKLOAD system statistics, IOSEEKTIM (seek time) and IOTFRSPEED (transfer speed), use default values (10 milliseconds seek time and 4096 bytes per millisecond transfer speed).

Using these default values for the I/O part the SREADTIM (single-block I/O read time) and MREADTIM (multi-block I/O read time) values are synthesized for cost calculation by applying the following formula:

SREADTIM = IOSEEKTIM + db_block_size / IOTFRSPEED

MREADTIM = IOSEEKTIM + mbrc * db_block_size / IOTFRSPEED

where "db_block_size" represents your database standard block size in bytes and "mbrc" is either the value of "db_file_multiblock_read_count" if it has been set explicitly, or a default of 8 if left unset. From 10.2 on this is controlled internally by the undocumented parameter "_db_file_optimizer_read_count". This means that in 10.2 and later the "mbrc" used by the optimizer to calculate the cost can be different from the "mbrc" actually used at runtime when performing multi-block read requests. If you leave the "db_file_multiblock_read_count" unset in 10.2 and later then Oracle uses a default of 8 for cost calculation but uses the largest possible I/O request size depending on the platform, which is usually 1MB (e.g. 128 blocks when using a block size of 8KB). In 10.2 and later this is controlled internally by the undocumented parameter "_db_file_exec_read_count".

Assuming a default block size of 8KB (8192 bytes) and "db_file_multiblock_read_count" left unset, this results in the following calculation:

SREADTIM = 10 + 8192 / 4096 = 10 + 2 = 12ms

MREADTIM = 10 + 8 * 8192 / 4096 = 10 + 16 = 26ms
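The synthesized values can be reproduced with a short sketch (plain Python, using the default IOSEEKTIM of 10 ms and IOTFRSPEED of 4096 bytes/ms mentioned above):

```python
# Synthesize SREADTIM/MREADTIM the way default NOWORKLOAD
# system statistics do (defaults: IOSEEKTIM = 10 ms,
# IOTFRSPEED = 4096 bytes per ms).
IOSEEKTIM = 10      # seek time in ms
IOTFRSPEED = 4096   # transfer speed in bytes per ms

def sreadtim(block_size):
    return IOSEEKTIM + block_size / IOTFRSPEED

def mreadtim(block_size, mbrc):
    return IOSEEKTIM + mbrc * block_size / IOTFRSPEED

# 8KB blocks, db_file_multiblock_read_count unset -> mbrc defaults to 8
print(sreadtim(8192))      # 12.0 ms
print(mreadtim(8192, 8))   # 26.0 ms
# A larger mbrc increases MREADTIM, and with it the MREADTIM/SREADTIM ratio
print(mreadtim(8192, 16))  # 42.0 ms
```

Note how only MREADTIM depends on the "mbrc" value, which is why different "db_file_multiblock_read_count" settings change the relative cost of full table scans.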

These values will then be used to calculate the I/O cost of single block and multi-block read requests according to the execution plan (number of single-block reads + number of multi-block reads * MREADTIM / SREADTIM), which means that the I/O cost with system statistics aka. CPU costing is expressed in units of single block reads.
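As an illustration only (my own sketch of the multi-block part of the formula, not the optimizer's exact code path), the I/O cost of a full table scan over a hypothetical segment of 10,000 blocks can be estimated like this:

```python
import math

def fts_io_cost(blocks, block_size, mbrc, ioseektim=10, iotfrspeed=4096):
    """Estimated I/O cost of a full table scan under default NOWORKLOAD
    system statistics, expressed in units of single-block reads."""
    sreadtim = ioseektim + block_size / iotfrspeed
    mreadtim = ioseektim + mbrc * block_size / iotfrspeed
    num_multiblock_reads = math.ceil(blocks / mbrc)
    return num_multiblock_reads * mreadtim / sreadtim

# Hypothetical table of 10,000 blocks of 8KB, mbrc = 8 for cost calculation:
# 1250 multi-block reads, each weighted by 26/12
print(round(fts_io_cost(10000, 8192, 8)))  # 2708
```

The weighting by MREADTIM / SREADTIM is what expresses the multi-block reads in units of single-block reads.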

You can derive from the above formula that with system statistics the cost of a full table scan operation is going to be more expensive by approximately the factor MREADTIM / SREADTIM compared to the traditional I/O based costing used by default pre-10g, so system statistics usually tend to favor index access a bit more.

Note that the above factor MREADTIM / SREADTIM is not entirely correct, since the traditional I/O costing introduces an efficiency reduction factor for higher MBRC settings, presumably to reflect that the larger the number of blocks per I/O request, the higher the chance that the full request size cannot be used due to blocks already being in the buffer cache or extent boundaries being hit.

So with an MBRC setting of 8 the adjusted MBRC actually used for calculation is 6.59. Using e.g. a very high setting of 128 for the MBRC will actually use 40.82 for calculation. So the higher the setting, the more the MBRC used for calculation is reduced.
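The exact internal adjustment table is undocumented; a commonly cited curve-fit approximation (from published analyses of the traditional costing model, so treat it as an estimate rather than Oracle's actual code) reproduces these figures closely:

```python
def adjusted_mbrc(mbrc):
    # Curve-fit approximation of the efficiency reduction applied to
    # MBRC under traditional (NOCPU) I/O based costing. The exact
    # internal values are undocumented; this is an estimate only.
    return 1.6765 * mbrc ** 0.6581

print(round(adjusted_mbrc(8), 2))    # ~6.59
print(round(adjusted_mbrc(128), 2))  # ~40.9, close to the 40.82 quoted above
```

The sub-linear exponent is what makes the reduction grow with the MBRC setting.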

The following test case shall demonstrate the difference between traditional I/O costing, CPU costing and the factor MREADTIM / SREADTIM when using different "db_file_multiblock_read_count" settings. The test case was run against 10.2.0.4 Win32.

Note that the test case removes your current system statistics so you should be cautious if you have non-default system statistics at present in your database.

-- Use default NOWORKLOAD system statistics
-- for test but ignore CPU cost component
-- by using an artificially high CPU speed
begin
  dbms_stats.delete_system_stats;
  dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
end;
/

-- In order to verify the formula against the
-- optimizer calculations
-- don't increase the table scan cost by one
-- which is done by default from 9i on
alter session set "_table_scan_cost_plus_one" = false;

So as you can see the I/O costs for a full table scan are significantly different when using default NOWORKLOAD system statistics. You can also see that the SREADTIM and MREADTIM values derived are quite different when using different "db_file_multiblock_read_count" settings. Furthermore the difference between traditional I/O based costing and the CPU costing is not the factor MREADTIM / SREADTIM as suggested by the formula, but is reduced by the adjustment applied to the MBRC when using traditional I/O costing.
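The reduced difference can be sketched by putting both costing models side by side (my own sketch under the same default NOWORKLOAD assumptions, with the undocumented MBRC adjustment replaced by a curve-fit approximation, so the ratios are estimates):

```python
import math

def cpu_costing_fts(blocks, block_size, mbrc):
    # Full table scan I/O cost under default NOWORKLOAD system statistics
    sreadtim = 10 + block_size / 4096
    mreadtim = 10 + mbrc * block_size / 4096
    return math.ceil(blocks / mbrc) * mreadtim / sreadtim

def traditional_fts(blocks, mbrc):
    # Traditional I/O based costing with the efficiency-reduced MBRC
    # (curve-fit approximation; the exact internal table is undocumented)
    adjusted = 1.6765 * mbrc ** 0.6581
    return math.ceil(blocks / adjusted)

blocks, bs = 10000, 8192
for mbrc in (8, 16, 32, 128):
    ratio = cpu_costing_fts(blocks, bs, mbrc) / traditional_fts(blocks, mbrc)
    factor = (10 + mbrc * bs / 4096) / (10 + bs / 4096)
    # The observed ratio stays noticeably below the naive MREADTIM/SREADTIM factor
    print(mbrc, round(ratio, 2), round(factor, 2))
```

For mbrc = 8, for example, the cost ratio comes out around 1.78 rather than the naive 26/12 = 2.17, illustrating the effect of the MBRC adjustment in the traditional model.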

The next part of the series will cover the remaining available System Statistics mode.

14 comments:

Good idea for a series of posts! How long do you think it will be before you publish the more interesting scenarios? ;)

On a side note, I polled about 20 or so DBA consultants, and so far, none of them have reported working on a system that has set System Stats. They could all be too busy to respond, but I suspect that there are few systems using this feature, even though it's been available for a long time.

I find the NOWORKLOAD part already quite interesting :-), in particular the change in MREADTIM calculation when having different MBRC settings and the obvious significant cost change compared to traditional I/O based costing.

Regarding the system statistics usage: The point I wanted to make here is that in 10g you work with system statistics, always, period (except if you downgrade your optimizer_features_enable to 9i or use undocumented parameters).

So the DBAs you've asked are either not aware of this or took your question as if they had explicitly gathered system statistics or not.

A lot of trouble when migrating from pre-10g to 10g or 11g comes from the fact that people are not aware that without any changes to the application/database the cost calculation of the optimizer is by default significantly different due to the default NOWORKLOAD system statistics, not to mention the default statistics gathering job.

To answer your question: The work is already done, it's just that I realized that is was far too much for a single post, so I decided to split it into several ones.

Glad to hear you've got more almost ready to go. I'm looking forward to the rest of the series.

I wasn't clear in my statement. My question to the DBAs was whether any of the guys had worked with databases where System Stats had been set to something other than their default values. Although it may well have been misinterpreted, as I wasn't that clear in my question to them either.

Anyway, I did get a couple of responses - but for the most part, the answer was a pretty consistent "no, we are not working with databases where Systems Stats have been changed from their default values".

One guy reported having worked on a system where System Stats were gathered once per week - but I guess the results must have not changed too dramatically because he said they didn't have any stability issues (that he was aware of) due to the changing System Stats.

I did hear Jonathan Lewis make a comment about System Stats at the Hotsos Symposium a month or so ago. He said something to the effect that he had come to the conclusion that leaving them at the defaults seemed to work pretty well. He mentioned something about the calculated ratio between the mreadtime and sreadtime ending up being pretty reasonable and representative for a lot of systems. It was just a comment in passing, so not a lot of detail.

Anyway, I do think it's not a well understood feature, so I'll be interested to read the rest of your postings on the topic.

Jonathan's comment - as always - highlights the core message: The most important part of the system statistics is the ratio between the SREADTIM and MREADTIM values since this primarily drives the cost of the full table scan operation in comparison to index access paths.

That is the reason why I tried to explain how the SREADTIM and MREADTIM values are synthesized when using NOWORKLOAD system statistics and how they change when using different "db_file_multiblock_read_count" settings resulting in different cost calculations of the FTS operation.

As it can be seen from the formulas used the block size also influences the SREADTIM and MREADTIM calculation, so there are several parameters that determine how the optimizer arrives at these values.

So Jonathan is probably right that the defaults assumed are reasonable in most cases and most people won't care as long as everything runs smoothly but if there is a need for troubleshooting understanding how the values are derived can become a crucial part of the analysis.

Kerry, maybe you just polled the wrong DBAs... ;-) In fact, I encourage my customers to make use of system statistics where it makes sense. E.g., at an investment bank, system stats have been in production for a few years now, although scarcely (3 out of about 20 systems I do consulting for). Still not in production are alternating system stats - I'd like to see them in a data warehouse, applied to nightly batches and daily reports. I'm working on it...

[a little later] OK, now that I read your clarification: No, I did not tamper with the default stats either.

Can you clarify what you mean, then, by encouraging the use of system statistics if you're not using custom system statistics?

My point is that in 10g system statistics are enabled by default, so there is not much to "encourage", and in 9i usually only individual WORKLOAD system statistics get used (more on this topic in the remaining parts of the series) and NOWORKLOAD system statistics are only used very rarely.

Regarding the alternating system statistics: I remember a comment by Jonathan Lewis here that he has come to the conclusion that he doesn't recommend that anymore, although he liked the idea in the beginning.

The simple answer to your question regarding the usage of ASSM in test cases is that with ASSM the size of the object is slightly larger due to the additional blocks required for managing the segment space.

Using MSSM the size is easier to control, so my table here using PCTFREE 99 and generating 10,000 rows results in a table having exactly 10,000 blocks. You would notice that the same table created in an ASSM tablespace has more than 10,000 blocks, in my test database it's approx. 10,130 blocks making the maths more complex when demonstrating the formulas used for cost calculations.

That said I know that Oracle recommends ASSM, but if you follow carefully publications about ASSM you'll notice that it has - like all features - pros and cons, and you should be aware of them in order to decide whether ASSM is really beneficial for you or not:

Some pros:

- Possibly reduced contention in highly concurrent OLTP systems
- No special treatment required in RAC environments (MSSM might require manual tuning using multiple freelists or freelist groups, and you need to understand how it works to tune it properly)

Some cons:

- May negatively affect the clustering factor of indexes
- Segments are slightly larger; this applies in particular to small segments and is negligible for larger segments
- No manual tuning possible
- Relatively new technology (OK, it's been there since 9i...), therefore still some bugs, in particular when using non-standard block sizes != 8KB

So if you e.g. have a single-instance system that doesn't have to deal with highly concurrent operations using MSSM should just be fine and I don't see an issue in using an old, but mature technology.

I don't see a problem using MSSM on my machine either (except one or two more tablespace creations for demos), but when every expert uses MSSM just for simple demos, people might think ASSM is evil. That was the reason I asked the question.

Do you suggest manual segment space management (MSSM) to your clients if they are not tight-fisted about disk space?

I should have pointed out that most of the databases at this site still run on 9i. It's a bank, after all, which means that usually you aren't on the leading edge. This includes using new features, although nobody there complained about either the default gathering of system stats or the nightly schema stats job in 10g... ;-)

Thanks for pointing me to Jonathan's article, I'll take a closer look at it.

Have a sunny weekend, Uwe (sitting in the garden working on the second edition of the Oracle Survival Guide).

I think there is a misunderstanding/confusion and may be my wording was not clear enough:

The "adjusted" MBRC is referring to the NOCPU_COSTING aka. traditional I/O-based costing where the MBRC defined by "db_file_multiblock_read_count" (or "_db_file_optimizer_read_count" in 10g and later) is "adjusted" by a fixed factor to cater for the fact that you might not get that large number of blocks at runtime as outlined in the post.

It's not about the MBRC that is used as part of the CPU_COSTING I/O calculation. Your example is simply using the formula provided in a different way, so I think we agree on that.