August 19, 2013

Distributed Queries – 3

A comment I’ve made many times in the past about distributed queries is that Oracle doesn’t try to retrieve histogram information from remote databases when optimizing a query. Checking back through previous posts, though, I don’t think I’ve ever mentioned it on the blog – so here’s an example demonstrating the point.

I’ll start with 11.2.0.3, with some fixed system stats (as they could make a difference to repeatability), and the following data set.

You’ll notice that the skew column is generated from carefully skewed data. I have one row with the value 1, two rows with the value 2, and so on until I end up with 80 rows with the value 80.

With this data in place, check the execution plan for a simple join – joining the table to itself on the id column selecting two different sets of data based on the value of column skew. Here’s the query, with the resulting execution plan from autotrace.

It’s a simple nested loop – for each row of a predicted 5, Oracle will use the t1_pk index to acquire a corresponding row in the second instance of the table and then check the table for the value of column skew.

Now watch what happens to the plan if I make the second instance of the table remote (in this case by simply setting up a loopback database link). I’ve got two versions of this query, one unhinted, and one with the /*+ driving_site(t2) */ hint telling Oracle to execute the query at the remote site (the one holding the table t2 referenced by the hint). If the optimizer had the same information for the distributed versions of the query as it did for the original case we would probably expect to see the same cardinalities and plans in all three cases – though the cost might vary slightly because of some estimated cost of network traffic.

The cardinality error in the first distributed join is easy to see – Oracle’s estimate for the remote predicate “skew = 80” is 41, which is exactly what we would expect if we dropped the histogram. Clearly the histogram hasn’t been pulled to the local database.

Similarly, though the mental gymnastics may be a little harder, the same cardinality error appears in the second distributed join – the “remote” database sends the “local” database a query with the predicate “skew = 5” expecting to get back 41 rows, the average number of rows per value. The remote database didn’t ask for the histogram data before optimising the join.

It’s interesting to note, by the way, that when I ran this test on 12c a couple of the plans were reported as adaptive – allowing Oracle to switch between nested loops and hash joins as the query progressed. Distributed queries tend to be a bit of a problem at the best of times – maybe adaptive plans will reduce the amount of time we need to spend re-engineering them.