Counting the Cost #2 May 19, 2009

Let’s continue where I left off. I had shown that select count(*), count(1), count(pk) were equivalent under Oracle 10 onwards (and I am pretty sure Oracle 9 and 11, but you would need to check).

So, looking at this simple select count(whatever) from a table, what else can we establish? Please look back at the prior posting to see the test table and indexes I created and basic environment info, but to summarise:-

The Execution plan has a new hash plan value and looks at a new index.
This is because we have told Oracle to count the new column. What we are actually asking is “give me a count of all rows with a value set for the column ID_2.”
The column is not mandatory, ie set to not-null (check the tables description above), there could be many null values set for this column.
(Many null values do not break the UNIQUE constraint. Null, in Oracle terms, is a value that could be anything. Anything does not match anything. Thus you can have 0, 1 or many null values for a non-mandatory but unique column. A Primary key column, incidentally, MUST be mandatory.).
A count on the column will count all non-null values. So it uses the index on that column.
Let’s run it again, to get stats that do not include any parse overhead.:

52 consistent gets, which is more than for the scans of index T_PK. The cost is also one higher, 12 as opposed to 11. This shows that the Cost Based Optimiser is correctly identifying that count(ID_2) will take more effort than count(*), and this is borne out by the number of consistent gets.
Index T_UQ is a larger index (I did check). Oracle can’t use the smaller primary key index T_PK as this index does not hold information on column ID_2.

Now, let us select count of a column that is not unique but indexed. Column NUM_2. Remember, what we are actually asking is “give me a count of all rows with a value set for the column NUM_2.
What we see is:

There is a new plan hash value and plan. Again, it needs to check for rows with the column specified set to a value, and there is an index to support this, T_NONUQ and it does a fast scan of it. Second run:

44 consistent gets and a cost of 10. Hang on, that is less effort than count(*), count(1) or count(id). It is a smaller index. Why did the CBO not use this index to get the count of records in the table? Because, we just asked Oracle to count the rows with a value set for the NUM_2 column. The column is not mandatory, Oracle cannot guarantee there are no null values, so a count of the entries in the index would not be guaranteed to be the number of records in the table.

If you know a little more about Oracle, you may wonder if Oracle could check the column stats to see how many null values there are and, if there are none, us that smaller index for the count(*). But null values may have been created since the stats were gathered. If the CBO tried to use this trick, it could easily give the wrong answer, and then how would you feel about that licence fee?

As a further aside, I once came across a claim that counting a column with a unique index would be faster than counting a column with a non-unique index. They could have been right, depending on the size of the indexes, but they were not doing what they thought they were. They were counting the number of records with a value set for that column.

For this reason I would shy away from using count(column) to count the records in a table. If the column is mandatory, you will be counting the number of records, but your code is saying “count how often that column has a value”. As a pedant I feel it’s just wrong. As an experienced DBA who has seen mandatory columns made non-mandatory in live systems, I feel it is also slightly dangerous. Having also seen indexes, even unique indexes, be dropped from live systems, the performance impact could be significant, as you will soon see.

OK, I have been careful so far to count only columns with indexes on them. Let us count a column than has no index on it.

There is a new plan hash value and plan. Now a full table scan is being used and the cost has shot up. The CBO has no index holding information on this column, so it has to scan the whole table. Let’s do the second run:

A cost of 240 (the estimated effort) and buffer gets of 1058 (the actual effort). That is a lot more than scanning an index that is considerably smaller than the table. And now you can see why dropping an index that was on a column someone was counting (rightly or wrongly) can have an impact on performance.

Finally for now, let’s count a column that is sparsely populated, column NUM_4. Oh, I have only included the second run, so any parse overhead is gone:

A new plan hash value and a new plan. The CBO is using the index on the sparse column. The cost is 3 and the consistent gets only 7, a lot lower than we have seen before.. And the number of records is 1,999, not 19,999. We have counted the number of records with a value set for NUM_4. It needs so many fewer consistent gets as the index is a lot smaller.

This can be used as a performance trick, if you regularly want to count a small number of records in a large table, for example active orders in an orders table, which holds orders going back years and 99.9% of them are complete:

Have a column called ACTIVE, a CHAR(1) column.
Set it to a value, eg ‘Y’, when the record is created.
Set it to null when the order is complete.
Re-set it to ‘Y’ if the order ever becomes valid again.

A count on the column will use the very small index (helped by the fact the column is a CHAR(1)) to count the records. This little index will also allow the active records to be identified in other select queries also.

Don’t forget though, that indexes on the table has to be maintained by oracle and it does take up some space, so there is a cost to having it. Also, I have not mentioned what happens to that index over time… That might be for a later posting, but if you look at Richard Footes blog, you may well find something about such indexes growing over time.

My next post on this subject will consider if there is a way to count the records in a table even quicker than count(*).

[…] etc are immaterial with modern version of Oracle – they all take the same effort by the CBO. This next post looks at how indexes are used when you count a single column in a table and why it is not always a […]