We are using Oracle version 11.2.0.4. We have a query that runs in ~30 seconds in one database (DB1), while the same query takes ~5 minutes in another database (DB2), with the data volume and all other configuration the same in both. I have captured the SQL Monitor report and the session stats for both databases. While checking the table/column/index statistics I noticed a non-zero CHAIN_CNT for the table in one of the databases (DB2), and I wonder whether that is what is causing the query to run slowly. However, I don't see a significant difference in the "table fetch continued row" value between the two sets of session stats, so I wanted to understand the actual cause of the slowness.

Looks like a classic case of good-ish clustering of data in one DB - you "only" had to do 33K read requests against the table blocks to look at all 118K rows (before cutting it down to one!), compared to 109K read requests against the table blocks in the other DB to look at the same 118K rows (again, before cutting it down to one!).

Otherwise, this could be a case of the table blocks being highly cached in the first DB but not so much in the second. That said, the difference in cost between the two DBs does suggest the clustering factor is playing a role.

How does DB2 come to have the same data as DB1? Was logical replication used (e.g. expdp/impdp)? That would cause rows to be stored in a different order in each DB, so your clustering factors will differ. This is a prime example of why you shouldn't use logically replicated DBs to test performance.
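One quick way to check whether row ordering differs between the two DBs is to compare the index's clustering factor against the table's block and row counts. A sketch against the dictionary views - the names TAB1/TAB1_STS_IDX are taken from this thread; add an owner filter for your schema:

```sql
-- If CLUSTERING_FACTOR is close to BLOCKS, rows are well ordered for this
-- index; if it is close to NUM_ROWS, they are effectively randomly scattered.
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
from   dba_indexes i
join   dba_tables  t
on     t.owner      = i.table_owner
and    t.table_name = i.table_name
where  i.table_name = 'TAB1';
```

Run it on both DBs and compare the two clustering factors against the same block count.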

The effect of row chaining here is going to be pretty small: only an extra 1K or so rows are chained/migrated, but you've got a difference of 70K IOs.

Rebuilding the index could save you about 300 IOs - a tiny drop in a massive ocean, and definitely not worth the effort (and risk).

The rownum=1 filter obviously means there's a degree of luck involved in the execution: it's going to stop as soon as it finds a row that matches the other two conditions (after identifying rows that match the indexed column condition). However, both DBs stop after looking at 118K rows, so I don't think this is playing a part. It does suggest the position of the first matching row is similar in the two DBs, which could hint at similar clustering - but the stats show they don't have the same clustering factors, and just because these rows are similarly positioned doesn't mean the rows leading up to them are.

Of course, the obvious suggestion is to just use a different index - if you only want one row then the index should be able to do a lot of the work for you. You will have to bear in mind the golden rule of indexing, as you have two non-equalities (https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/), but a quick guess would suggest that the sysdate filter will not do a lot of work - so something like

create index test_index_1 on tab1 (sts, cnt)

might do the trick. Hopefully you'll still get the INLIST ITERATOR for the sts filter. If you need the other column in the index then consider where it should go - although you'll already be doing a lot less work just by being able to filter down before hitting the table.
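A hypothetical sketch of how to verify the new index's plan - the predicate shapes below are guessed from this thread (an in-list on STS, non-equalities on CNT and DT, rownum = 1), so substitute the real query before drawing conclusions:

```sql
create index test_index_1 on tab1 (sts, cnt);

explain plan for
select *                    -- hypothetical shape of the problem query
from   tab1
where  sts in ('A', 'B')    -- low-cardinality status values
and    cnt <  :b1           -- non-equality
and    dt  <  sysdate       -- probably filters very little
and    rownum = 1;

select * from table(dbms_xplan.display);
```

Check the plan for an INLIST ITERATOR over the new index, and that the cnt predicate shows up as an access predicate on the index rather than a filter after the table access.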

Since the huge majority of the time is spent in touching the table blocks rather than the index (as shown in the live monitor reports), we can see that rebuilding the index will have no effect on the bulk of the execution.

Here we use Streams replication to replicate data between these two databases, so shouldn't the rows be applied to the target database in the same order as on the source, meaning the clustering factors should not differ much? I need to check whether two-way replication is used and whether that is contributing. By the way, the clustering factors are 21.9 million vs 22.2 million - I'm not sure that difference alone can have such a drastic impact on query timing? And I agree that the SQL Monitor shows plan line id 3, "TABLE ACCESS BY INDEX ROWID", consumed ~97% of the time, so an index rebuild (in case it is fragmented) may not add much benefit.

Another doubt I had: is my understanding correct that someone may have run the ANALYZE TABLE command, which is what populates the CHAIN_CNT column in dba_tables, but possibly long ago - so we would need to run it again to see the percentage of chained rows at the current moment? Then again, since the session statistics show "table fetch continued row" as 3660 vs 2423 for the two executions, does that mean row chaining has not impacted the execution time much, and we should not waste effort looking in this direction? Some teammates are suggesting a table reorg - not sure if that will help us here?
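If you want a current chained-row figure without running ANALYZE ... COMPUTE (which would overwrite the DBMS_STATS-gathered optimizer statistics), the LIST CHAINED ROWS form only populates a holding table. A sketch, assuming you can create the CHAINED_ROWS table from the standard script:

```sql
-- Create the holding table once (standard script shipped with the RDBMS).
@?/rdbms/admin/utlchain.sql

-- Record the rowids of chained/migrated rows; optimizer stats are untouched.
analyze table tab1 list chained rows into chained_rows;

select count(*) as chained_or_migrated
from   chained_rows
where  table_name = 'TAB1';
```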

This table is ~172GB in size, holding ~79 million rows, and it has 3 BLOB columns (we select two of the BLOB columns, but commenting those two columns out makes no difference to the elapsed time). There are two indexes: one on the primary key (on different columns altogether) and one - TAB1_STS_IDX - on column STS, but as the stats show, this column has very few distinct values. The other filter column, CNT, also has very few distinct values, so would it be advisable to go for a new index on (STS, CNT), considering it may not benefit other queries much and will have some impact on DML?

TABLE_NAME  COLUMN_NAME  NUM_DISTINCT  NUM_NULLS  DENSITY              AVG_COL_LEN
TAB1        DT           43433984      0          2.30234463410034E-8  8
TAB1        CNT          7             866        6.20717225848154E-9  2

Finally, I tried executing the same query multiple times, and from the second run onwards the execution time was a few seconds in both databases. The consistent gets were 102K vs 110K, but physical reads were 1 in both databases for each subsequent execution, and the query returned its result in seconds in both - I was not seeing much difference. So now I'm wondering if it's simply a caching effect? Is this caching in any way related to the presence of the BLOB columns in the table, and can it be improved?
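To see directly how much of the table each DB is holding in its buffer cache at a given moment (rather than inferring it from run times), you can count the cached blocks. A sketch using v$bh; the object names are from this thread:

```sql
-- Blocks of the table and its index currently in the buffer cache.
select o.object_name,
       count(*) as cached_blocks
from   v$bh b
join   dba_objects o
on     o.data_object_id = b.objd
where  o.object_name in ('TAB1', 'TAB1_STS_IDX')
and    b.status != 'free'
group  by o.object_name;
```

Running this on both DBs just before the first execution shows the difference in starting cache state.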

Your initial copy, before you turned on the Streams replication, could have reordered the rows unless you used physical copies of the data files.

Is the performance difference consistent? E.g. can you get it down to less than 30 seconds in DB2 just by rerunning the query? That would suggest caching; perhaps your cache sizes are also different, so you can't fit the magic number of blocks required for this query in DB2.

There is certainly a difference due to clustering but it seems to be only 10%.

Blobs won't really be touched until the row is projected out, and that only happens for one row, so I think it's fine to ignore them. Be aware, though, that LOBs make everything different, including Streams replication - it might have to do additional lookups on the source DB table, which would increase the caching of this table's blocks.

num_distinct isn’t so important when your filter is a range ( < ), what matters more is the range of values that your column goes over (the low and high values, or the histogram).
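For the range predicates you can pull the low/high values and the histogram type straight from the dictionary - a sketch (LOW_VALUE/HIGH_VALUE are raw-encoded, but the HISTOGRAM column alone tells you whether one exists):

```sql
select column_name,
       num_distinct,
       low_value,     -- raw-encoded; decode with dbms_stats.convert_raw_value
       high_value,
       histogram      -- NONE / FREQUENCY / HEIGHT BALANCED
from   dba_tab_col_statistics
where  table_name  = 'TAB1'
and    column_name in ('STS', 'CNT', 'DT');
```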

For this SQL you would certainly eliminate about 100K buffer gets just by being able to filter on cnt and the date column in that index - as I said, you may only need one of those columns. It's up to you how important the query is, but as this seems to be the target of an async replication, it's probably not too important that the DML time is minimized - the users won't experience it, and it's extremely unlikely to contribute to an unacceptable lag in your replication (but obviously test it).

I ran the SQL query three times in each database to test the caching effect, and I have published the SQL Monitor and trace output for each run. The subsequent executions in each database finish within a few seconds. So hopefully that justifies caching as the main reason behind such a large difference in the elapsed time of the first execution: DB1 is heavily used, so the table blocks are readily available in its cache, making the initial execution faster on DB1 than on DB2. It may also sometimes take a bit longer to reach the first matching row (rownum=1), and the only way to further improve the query performance would be, as you suggested, to create a composite index on (STS, CNT), depending on the importance of the query. Please correct me if my understanding is wrong here.

I confirmed sga_max_size is set to 24GB in both databases. sga_target is set to 0 (so ASMM is not actually in effect - automatic resizing needs a non-zero sga_target), and I don't see many resize operations in gv$sga_resize_ops. The max db_cache_size in gv$sga_dynamic_components is ~6GB and the shared pool is ~4GB.

As I see it, even the first execution on DB1 took ~20 seconds (with ~8K physical reads) compared to a couple of seconds (with ~2 physical reads) for the subsequent executions. So my thought was: does the caching of this table's blocks depend on how the BLOBs are stored in the table? If a BLOB is stored along with the row, the chances are the row would be flushed out of the cache quickly due to its larger size, compared to when it is stored outside the row - and is there any workaround to make this better?
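Whether the BLOBs are stored in-row (inline with the row piece, so they travel through the buffer cache with it) or out of line is visible in the dictionary - a sketch:

```sql
-- IN_ROW = 'YES' means LOB values up to roughly 4000 bytes are stored inline
-- with the row; CACHE controls whether out-of-line LOB blocks go through
-- the buffer cache at all.
select column_name, in_row, cache, chunk
from   dba_lobs
where  table_name = 'TAB1';
```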

I did run the sql query in both the databases three times each to test the caching effect.

Sorry - but no, that does NOT test 'the caching effect'. At least not the one Andrew was referring to.

Maybe you misunderstood what Andrew has tried to tell you? Because you keep wandering off into other things like LOBs, etc.

This is what you posted:

38573 physical reads -db1

109818 physical reads -db2

And this is what Andrew said:

you “only” had to do 33K read requests against the table to look at all 118k rows (before cutting it down to one!), compared to the 109K read requests against the table blocks to look at 118K rows (again, cutting it down to one!).

Andrew mentioned clustering. Your response was to do three executions on each DB.

That tests the WRONG thing - it isn't important how well things work if you cache a lot of the data in memory. That has NOTHING to do with clustering factor.

What matters is how well it works when there is NOTHING in memory. Then the clustering factor will affect how many blocks need to be PHYSICALLY read to get the data needed.

Those 'physical' reads above already give you an indication.

The test you need to run is to FLUSH the buffer cache so that NONE of the blocks are in memory and then see how many blocks get read on each DB. That will give you an indication of the clustering factor.
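On a test system the flush-and-measure cycle might look like this (do not do this on production - it penalises every other session):

```sql
-- 1. Empty the buffer cache so every block must be read from disk.
alter system flush buffer_cache;

-- 2. Run the problem query here.

-- 3. Check this session's work; compare the figures between the two DBs.
select n.name, s.value
from   v$mystat   s
join   v$statname n on n.statistic# = s.statistic#
where  n.name in ('physical reads',
                  'consistent gets',
                  'table fetch continued row');
```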

Before you do anything I suggest you read Richard Foote's articles about clustering factor - both how to calculate it and how it can affect things.

Then please follow the guidelines in the FAQ about how to post a tuning request and the info you need to provide. That info includes:

1. the DDL for the tables and indexes

2. row counts for the tables AND query predicates

3. info about stats - are they current and how they were collected

And, for your use case, the info Andrew asked about - how the INITIAL table data was populated on DB2 from DB1.
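A sketch of queries that gather the items above plus a stats-currency check (object names taken from this thread):

```sql
-- 1. DDL for the table and its indexes
select dbms_metadata.get_ddl('TABLE', 'TAB1') from dual;
select dbms_metadata.get_ddl('INDEX', 'TAB1_STS_IDX') from dual;

-- 2. Row count (also run counts with the query's actual predicates)
select count(*) from tab1;

-- 3. Are the optimizer stats current?
select table_name, num_rows, last_analyzed, stale_stats
from   dba_tab_statistics
where  table_name = 'TAB1';
```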

"Is the performance difference consistent? eg can you get it down to less than 30 seconds in DB2 just by rerunning the query? That would suggest caching, perhaps also your cache sizes are different so you can’t fit in the magic number of blocks required for this query in DB2."

"There is certainly a difference due to clustering but it seems to be only 10%."

Sorry if I misunderstood, but the above were two of Andrew's points, so I reran the query to evaluate the caching effect, and I confirmed there was no difference in SGA sizes between the two databases. I think even the initial execution of the query on DB2 is slower than on DB1 because DB1 is active/used by external users, so its table blocks are more heavily cached than DB2's (DB2 is not exposed to external users and is maintained as a kind of standby). And I went into the LOB side because I was wondering whether the LOBs impact the caching of rows/blocks due to their larger size.

And since we may not have full control over improving the clustering factor, per Andrew's suggestion, creating a composite index on (STS, CNT) should make the performance better, i.e. by making the index do the maximum work for us rather than relying on the post-filter criteria.

As this issue is on production, I will see if I can reproduce it in a lower environment by flushing the buffer cache, and post the results. Actually, I was thinking the first executions I posted for DB1 vs DB2 should show the real impact of the clustering factor difference. I have published the table/column/index stats in a previous post and the stats are up to date. I will post the DDL for the table and indexes.

I think even the initial execution of the query on DB2 is slower than on DB1 because DB1 is active/used by external users, so its table blocks are more heavily cached than DB2's (DB2 is not exposed to external users and is maintained as a kind of standby).

Sorry - but you STILL seem to have it backwards - not sure why.

1. there is only so much memory

2. the buffer cache is only so big

3. if DB1 is used by other users then table blocks are LESS cached compared to DB2.

ALL USERS will use buffer cache. So if DB1 is used by other users they will use some of the buffer cache for the work they are doing.

That means there will be LESS CACHE available for the user you are testing - not more.

That means that 'caching' would be LESS PRONOUNCED between the two machines.

In normal operations you can NOT flush the buffer cache.

In normal operations you can NOT run things multiple times.

You need to test/tune so that things run properly and meet their SLA (service level agreement) in the NORMAL scheme of operations.

Flushing the cache is just a better way to see the difference in PHYSICAL reads between the two DBs. That is it.

The effort you've made to supply relevant information, and the description of your thought processes to date is something that many users of this forum could learn from.

There are several features of the information you've supplied that suggest your ideas about "recent use" are correct. If I were sitting in your seat with your access to the data, I'd be checking the frequency histogram that almost certainly exists on the STS column to confirm the hypothesis. Everything about the figures says there's a tiny fraction of the data with values A or B, and that's the interesting fraction of the data that the users are accessing most frequently - which is why it's fairly well cached on the primary system but very poorly cached on the secondary system.
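Checking for that frequency histogram is a one-view query - a sketch; with only a handful of distinct STS values, a FREQUENCY histogram gives one endpoint row per value, and the gaps between ENDPOINT_NUMBERs show how popular each value is:

```sql
select endpoint_actual_value,   -- the column value (when stored as text)
       endpoint_value,          -- numeric encoding of the value
       endpoint_number          -- cumulative row count; deltas = frequency
from   dba_tab_histograms
where  table_name  = 'TAB1'
and    column_name = 'STS'
order  by endpoint_number;
```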

I'll be writing up some notes later today or tomorrow about interpreting the plan and the statistics you've reported, with comments on how some of the stats might be misleading and need corroboration, how some of them must be describing exactly what's happening, and how some need to be cross-referenced to make sure you have a consistent understanding of the data.