Counting the Costs May 17, 2009

Anyone who has been around the Oracle DBA or Developer world for a while is sure to have come across people debating(*) the fastest way of selecting the number of rows from a table. Different suggestions have been that count(*) is not as fast as count(pk_column), count(rowid) or count(1).

(*) by debating I mean arguing, sometimes getting angry and, on one memorable occasion, pretty much asking each other to step outside to resolve this matter Man-to-Man (why do women not feel the need for this final step? Just more sensible I guess).

Bottom line, it makes no difference whether you select count(*), count(1), count(pk) on Oracle 10 (and, I am sure, Oracle 9). However, that is an unsupported claim and you should not believe me yet. Any web site or blog making a solid statement about how Oracle works needs to provide some proof, otherwise it is just opinion.

What I find more interesting is that some people seem happy to spend more time debating the question than doing a few tests to find out for sure. Also, proving the answer to such a simple question can actually lead to some more interesting insights.

I’ve followed the lead of my betters and I have created a script to run through all the below tests. If I can bend my very new blogging skills to it, the sql*plus script can be downloaded from this link.

Below follows my proof and some observations on the way.

First I better show what version of Oracle I am using and the basic set up as far as sql performance goes (I’ve trimmed the output to remove number of rows returned, timing etc where it is not significant)

test102>– basic setup info

test102>select version,archiver,database_status from v$instance;

VERSION ARCHIVE DATABASE_STATUS
—————– ——- —————–

10.2.0.1.0 STARTED ACTIVE

test102>select substr(name,1,30) name,substr(value,1,20) value
2 from v$parameter
3 where name in (‘db_block_size’,’compatible’,’cpu_count’,’db_file_multiblock_read_count’
4 ,’optimizer_mode’,’sga_target’,’sort_area_size’)
5 order by name
6 /

Now I create a reasonably sized table with a numeric column ID to be the primary key, column NUM_1 which is a small varying numeric, a unique column ID_2,NUM_2 holding random values between 0 and 99, another small number column NUM_3 and a final numeric column NUM_4 which is sparse, only 1 in 20 records will hold a value. Finally there are three columns VC_1 to VC_3 which just hold strings to pad out the table a little.

Now I add a primary key (a supporting unique index is created automatically), a unique index, a non-unique index and an index on the sparse column. Finally, I use the old ANALYZE command so compute statistics on the table and index, so that the Oracle Optimizer has information about the table and it’s indexes.

What does this show? Well, there are 19,999 rows, it took 1 millisecond (ms) to do the count.

The plan is simple, there was a fast full scan of the index T_PK, cost was 11 and it took 1ms

The statistics show 44 recursive calls, 0 block reads (this is physical reads from disc, as far as Oracle is concerned) and 53 consistent gets, which is reads from memory. I’ll ignore the rest for now.

However, I have done one test on one statement. I was taught to be a scientist at college, one isolated test is not enough, I do not know if the results are stable. I’ll do it again.

Well the result and the plan are the same, but now the code takes less than 1ms, there are no recursive calls and consistent gets drop from 53 to 46. The recursive calls and the extra consistent gets in the first statement are due to Oracle doing some parsing work on the statement – it has checked the syntax, looked up the objects in the data dictionary and come up with an execution plan (I have to be honest, this is not quite true, I have run the statement several times already so some of that work was done before, thus the parse work in this example is a little light).

The lesson here is, treat the first execution of a statement as a special case. The second and further executions give better stats on the execution of the statement. Run the select count(*) from test_1 half a dozen times yourself and you should see the stats are stable.

Now, select count(id) from test_1. Column ID is a primary key

test102>– now count on the Primary key
test102>select count(id) from test_1;

Why is there only a difference of 1 recursive call between the two runs? The Oracle kernel has spotted that the query can be satisfied with the same plan, so it just allocates it to the cursor (ie uses it for this statement). Note, the plan hash value is the same.

Now to try count(1). For brevity I am going to trim the output of things that do not change. You have the script to run yourself so you can check I am not deleting anything significant.

test102>– count on a static value
test102>select count(1) from test_1;

Like this:

Related

At risk of debating the topic to death: My guess is that any Unique Index could be used to “count” the records. It would be interesting to see if a “smaller” UK index (it may need an explicit constraint) would be chosen over a larger (overloaded, inflated) PK index, assuming a table had those kind of indexes.

As for displaying SQL and Code-text in blogs, in the blogspot area the tag works ok, but it still removes whitespace.
Any idea of better ways to display code in "courier" font ?
How do the pros do that?

Thanks Piet.
Your points are good ones and will be touched on in the next part on this topic. And thanks for the tags hint. I know other bloggers have trouble getting code fragments to look good in blogs.

Thanks for that Sandro. The fastest way to count the rows in a table is to scan the smallest index that includes all columns – which bitmaps do. I cover this in the third post on the topic, but I was not very good at adding links between posts when I started blogging.
If you check the fourth post on this topic, I show that there is potentially an even faster way to get the number of rows in a table :-)

Oh dear.
Thanks for letting me know. I used the hotlinks service before getting my website up and running. Looks like I need to do some fixing up of old posts :-( It might take me a while to get around to it, but if you want the scripts Sandro, send me an email

If you look at posting two on this topic (there is a link on one of the above comments) then you will see that I suggest that :-)
Posting three goes on to bitmap indexes, which have an entry for every row and are often THE fastest way to count all rows in a table.

I should add a work of warning about selecting count(row) on a not-null indexed column. You are NOT counting the rows in the table, you are counting the occurences of that column being populated. It just so happens that if the column is mandatory, the two is the same. What if someone changes that column to nullable?