Sunday, May 24, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 4

I've already mentioned it several times on my blog, but I would like to take the chance here again to stress the point that the cost based optimizer does a bad job when calculating the cost of full table scans of objects residing in non-default blocksize tablespaces. It looks like this feature was introduced to support transportable tablespaces, but it obviously hasn't been tested very thoroughly when it comes to cost calculation.

Each of the different modes has its deficiencies when dealing with objects in non-default blocksizes. The somewhat odd thing is that the traditional I/O based costing does the best job, while all system statistics based calculations are utterly wrong.

Traditional I/O based costing

The traditional I/O based costing simply scales the MBRC up or down according to the non-default blocksize so that the I/O read request size stays the same. If you have, for example, an MBRC of 8 and a default blocksize of 8KB and now calculate the cost for an object residing in a 2KB tablespace, the MBRC will be multiplied by 4, resulting in an MBRC of 32. The I/O cost will nevertheless be different, due to the different adjustment applied to the higher MBRC setting: the adjusted MBRC for 32 is 16.39, whereas the adjusted MBRC for 8 is 6.59, so the calculated cost for the full table scan of the object residing in the 2KB tablespace will be higher. Likewise, for an object in a 16KB non-default tablespace the MBRC will be reduced to 4 to arrive at the same I/O read size again. Since the adjusted MBRC for an MBRC of 4 is 4.17, the calculated cost will actually be lower for the object residing in the 16KB tablespace.
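To illustrate the scaling, here is a quick Python sketch of the calculation described above. The power-law fit used for the adjusted MBRC (roughly 1.6765 * MBRC^0.6581) is a commonly cited approximation of the optimizer's adjustment table, not an official formula, so treat its constants as assumptions:

```python
# Sketch of traditional I/O costing for full table scans in
# non-default blocksize tablespaces (constants are approximations).

def adjusted_mbrc(mbrc):
    # Commonly cited power-law fit of the adjusted MBRC values,
    # e.g. adjusted_mbrc(8) ~ 6.59, adjusted_mbrc(4) ~ 4.17.
    return 1.6765 * mbrc ** 0.6581

def fts_cost(blocks, mbrc, block_size, default_block_size=8192):
    # The MBRC is scaled so the I/O request size stays the same:
    # 8 * 8KB = 32 * 2KB = 4 * 16KB = 64KB per multi-block read.
    scaled_mbrc = mbrc * default_block_size // block_size
    return blocks / adjusted_mbrc(scaled_mbrc)

# The same 80MB segment in 2KB, 8KB (default) and 16KB tablespaces:
cost_2k  = fts_cost(40_000, 8, 2048)   # scaled MBRC 32, adjusted ~16.4
cost_8k  = fts_cost(10_000, 8, 8192)   # MBRC 8, adjusted ~6.59
cost_16k = fts_cost( 5_000, 8, 16384)  # scaled MBRC 4, adjusted ~4.17

print(round(cost_2k), round(cost_8k), round(cost_16k))
```

Running this reproduces the ordering described above: the 2KB variant gets the highest cost and the 16KB variant the lowest, although the segment size is identical.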

System statistics

Things get more complicated when using NOWORKLOAD or WORKLOAD system statistics. Recall that the formula to calculate the I/O cost of a full table scan with system statistics is:

Number of blocks / MBRC * MREADTIM / SREADTIM

And in the case of NOWORKLOAD system statistics, MREADTIM and SREADTIM are synthesized using the following formulas:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTRFSPEED

MREADTIM = IOSEEKTIM + DB_BLOCK_SIZE * MBRC / IOTRFSPEED
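Using the documented default values of IOSEEKTIM = 10 ms and IOTRFSPEED = 4096 bytes/ms, these formulas can be evaluated with a few lines of Python; the sketch below assumes an 8KB default block size and an MBRC of 8:

```python
# Synthesized NOWORKLOAD timings and the resulting I/O cost
# (defaults: IOSEEKTIM = 10 ms, IOTRFSPEED = 4096 bytes/ms).
IOSEEKTIM = 10       # ms
IOTRFSPEED = 4096    # bytes per ms
DB_BLOCK_SIZE = 8192
MBRC = 8

sreadtim = IOSEEKTIM + DB_BLOCK_SIZE / IOTRFSPEED         # 12 ms
mreadtim = IOSEEKTIM + DB_BLOCK_SIZE * MBRC / IOTRFSPEED  # 26 ms

# I/O cost of a 10,000 block full table scan:
io_cost = 10_000 / MBRC * mreadtim / sreadtim

print(sreadtim, mreadtim, round(io_cost))  # 12.0 26.0 2708
```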

Now if the object resides in a non-default blocksize tablespace, an inconsistent adjustment is applied to these formulas: the MBRC is scaled according to the block size, but the DB_BLOCK_SIZE used to synthesize SREADTIM and MREADTIM remains the default block size.

So obviously something is odd in the above formulas: synthesizing SREADTIM and MREADTIM from a scaled MBRC combined with a non-adjusted default DB_BLOCK_SIZE results in a large variation in cost, so a full table scan in a smaller blocksize is much more expensive than in the default blocksize, and likewise a full table scan in a larger blocksize is much cheaper. Unfortunately, this doesn't reflect the runtime behaviour at all: Oracle actually scales the I/O read request size accordingly, meaning the runtime difference is usually negligible, while the calculated cost is dramatically different.
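The inconsistency just described can be sketched numerically. The following Python snippet simply follows the description above (scaled MBRC, unscaled default DB_BLOCK_SIZE) together with the default NOWORKLOAD values; it is an illustration of the described behaviour, not a reimplementation of the optimizer:

```python
# NOWORKLOAD I/O cost with the inconsistent non-default blocksize
# adjustment: scaled MBRC, but unscaled default DB_BLOCK_SIZE.
IOSEEKTIM, IOTRFSPEED = 10, 4096   # documented defaults
DEFAULT_BS, MBRC = 8192, 8

def noworkload_cost(blocks, block_size):
    scaled_mbrc = MBRC * DEFAULT_BS // block_size
    # DB_BLOCK_SIZE is NOT adjusted to the non-default block size:
    sreadtim = IOSEEKTIM + DEFAULT_BS / IOTRFSPEED
    mreadtim = IOSEEKTIM + DEFAULT_BS * scaled_mbrc / IOTRFSPEED
    return blocks / scaled_mbrc * mreadtim / sreadtim

# The same 80MB segment in three block sizes:
print(round(noworkload_cost(40_000, 2048)))   # 2KB:  7708
print(round(noworkload_cost(10_000, 8192)))   # 8KB:  2708
print(round(noworkload_cost(5_000, 16384)))   # 16KB: 1875
```

Although all three scans read the same 80MB with the same 64KB multi-block read requests at runtime, the synthesized cost varies by almost a factor of three between the 2KB and the 16KB tablespace.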

WORKLOAD system statistics:

MBRC as measured/set

SREADTIM as measured/set

MREADTIM as measured/set

I/O cost = Number of blocks / MBRC * MREADTIM / SREADTIM

This is even worse than the NOWORKLOAD result above, because the calculated I/O cost simply differs by the factor of number of blocks in the non-default block size divided by number of blocks in the default block size: e.g. an object residing in a 2KB block size will have an I/O cost four times higher than the same object residing in the 8KB default blocksize, since the MBRC is not adjusted at all for the calculation.
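A minimal sketch of this factor, assuming for illustration the same SREADTIM = 12 ms and MREADTIM = 26 ms as synthesized above:

```python
# WORKLOAD I/O cost: MBRC, SREADTIM and MREADTIM are taken as
# measured/set and not adjusted for the non-default block size,
# so the cost scales purely with the number of blocks.
MBRC, SREADTIM, MREADTIM = 8, 12, 26   # assumed measured values

def workload_cost(blocks):
    return blocks / MBRC * MREADTIM / SREADTIM

cost_8k = workload_cost(10_000)   # 80MB segment in default 8KB blocks
cost_2k = workload_cost(40_000)   # same segment in 2KB blocks

print(round(cost_8k), round(cost_2k))  # 2708 10833 -- a factor of 4
```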

Some calculation examples

The following examples attempt to summarize what you get if you're using an object in a non-default blocksize tablespace:

1. I/O Cost of a full table scan of a 10,000 blocks / 80MB segment residing in a default blocksize of 8KB (in case of system statistics excluding the CPU cost) and using an MBRC of 8:

WORKLOAD system statistics: same as above, provided you use the same values for MREADTIM and SREADTIM; otherwise as measured.

2. I/O Cost of a full table scan of a 40,000 blocks / 80MB segment residing in a non-default blocksize of 2KB (in case of system statistics excluding the CPU cost) and using a default blocksize of 8KB and a MBRC of 8:

3. I/O Cost of a full table scan of a 5,000 blocks / 80MB segment residing in a non-default blocksize of 16KB (in case of system statistics excluding the CPU cost) and using a default blocksize of 8KB and a MBRC of 8:
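Since the concrete figures depend on the assumptions discussed so far, here is a reconstruction of the three examples in Python, combining the formulas from the previous sections (the adjusted MBRC approximation, the default NOWORKLOAD values, and WORKLOAD timings assumed equal to the synthesized ones); the numbers are illustrations derived from these assumptions, not official figures:

```python
# Reconstructed I/O costs for the three 80MB-segment examples above.
IOSEEKTIM, IOTRFSPEED, DEFAULT_BS, MBRC = 10, 4096, 8192, 8

def traditional(blocks, bs):
    scaled = MBRC * DEFAULT_BS // bs
    # Approximate power-law fit of the adjusted MBRC values:
    return blocks / (1.6765 * scaled ** 0.6581)

def noworkload(blocks, bs):
    scaled = MBRC * DEFAULT_BS // bs
    sread = IOSEEKTIM + DEFAULT_BS / IOTRFSPEED
    mread = IOSEEKTIM + DEFAULT_BS * scaled / IOTRFSPEED
    return blocks / scaled * mread / sread

def workload(blocks):
    # SREADTIM/MREADTIM as measured; here assumed 12 ms / 26 ms:
    return blocks / MBRC * 26 / 12

for label, blocks, bs in (("2KB",  40_000, 2048),
                          ("8KB",  10_000, 8192),
                          ("16KB",  5_000, 16384)):
    print(label, round(traditional(blocks, bs)),
          round(noworkload(blocks, bs)), round(workload(blocks)))
```

Note how only the traditional costing keeps the three variants in the same ballpark, while both system statistics modes penalize the smaller blocksize heavily.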

And here is a final example that shows that simply by moving an object to a non-default blocksize tablespace a change in the execution plan can be observed: an index scan is used instead of the full table scan, since the cost of the full table scan operation is significantly increased. The SQL trace reveals, however, that at runtime there is no significant difference between the full table scan executed in the default blocksize tablespace and the one in the non-standard blocksize tablespace.

Don't be misled by the different number of consistent gets; this is just due to the different block size. As can be seen from the raw trace files, the multi-block reads in the 2KB block size read 32 blocks at a time, whereas in the 8KB default block size 8 blocks are read at a time, so the number of I/O requests will be similar for a segment of the same size.

>>> Yes, but consistent gets have an expense, mostly CPU, setting locks & latches. Less gets is less CPU overhead, the question is the size of the change. <<<

That is basically true, but the point I tried to make here was a different one: quite often larger block sizes are recommended because they supposedly offer "faster I/O" simply due to the fact that the blocks are larger.

As the test case demonstrates, this is not true, because the I/O request size for multi-block reads is always the one defined by the default block size.

Regarding your point about fewer locks & latches: usually when performing multi-block I/O most of the time is spent waiting on the I/O, and the difference in latch activity between the different block sizes is negligible, although it might be measurable.

If you have a system that performs so many consistent gets that you get contention due to latch activity, moving the objects to a larger block size will probably offer only a small relief.

Very often this is caused by inefficient execution plans (e.g. NESTED LOOP operations with millions of iterations of random table lookups or full table scans) and tuning these plans will be the more scalable solution in most cases.

Another point to consider when using larger block sizes: if you have a workload that also consists of random table row access by rowid (usually when performing NESTED LOOP operations and/or index scans), you might be unlucky and the table rows accessed are scattered across many blocks, which requires more space in the buffer cache with the larger block size, competing with other operations for the buffer cache.

Of course, you might also be lucky and the number of blocks accessed is lower with the larger block size than with a smaller one, potentially reducing the number of consistent gets (e.g. in the case of an index range scan).

So as you can see, there is no black and white, but as always: It depends!