February 28, 2014

Empty Hash

A little while ago I highlighted a special case with the MINUS operator (that one of the commentators extended to include the INTERSECT operator) relating to the way the second subquery would take place even if the first subquery produced no rows. I’ve since had an email from an Oracle employee letting me know that the developers looked at this case and decided that it wasn’t feasible to address it because – taking a wider view point – if the query were to run parallel they would need a mechanism that allowed some synchronisation between slaves so that every slave could find out that none of the slaves had received no rows from the first subquery, and this was going to lead to hanging problems.

The email reminded me that there’s another issue of the same kind that I discovered several years ago – I thought I’d written it up, but maybe it was on a newsgroup or forum somewhere, I can’t find it on my blog or old website). The problem can be demonstrated by this example:

You’ll notice that I’ve created a data set for table t1 where the values 12 does not appear in column rep_col; so my query will return no rows. However, for the purposes of demonstration, I’ve hinted an execution path that will scan t1 and distribute it by hash, then scan t2 to distribute that by hash before doing the join. Here’s the plan – which I’ve generated with rowsource execution statistics enabled and pulled from memory after executing the query:

Since this was 11.2.0.4 Oracle has used Bloom filtering to reduce the traffic between slave sets – but you can see that despite returning no rows from t1 (lines 7 – 10), Oracle still performed the parallel tablescan of t2 (lines 12 – 15). Thanks to the Bloom filter we don’t transfer 10,000 rows between slave sets, but we can see from the Buffers and Reads columns that we really did do the tablescan – and if we take a snapshot of instance activity we would have seen 10,000 rows fetched by tablescan at that point.

If I ran this query serially Oracle would stop after discovering that the first tablescan returned no rows – why bother scanning the probe table when the hash table is empty ? But because the query is running in parallel, a single slave that receives no data from the first tablescan cannot assume that every other slave in the same set has also received no data – there’s no cross-chat that allows the slaves to discover that every slave has no data – so the second scan goes ahead.

I was a little surprised by this when I first found it since I thought (from looking at some of the 1039x trace information) that the parallel slaves were telling the query coordinator how many rows they had acquired on each granule – which would allow the coordinator to spot the zero total. But it looks as if I was misinterpreting the trace.

On the plus side for this example – it’s probably the case that if zero is a possible volume of data returned by the query then the optimizer will have decided that it was going to get a “small” data set for the build table and therefore do a broadcast distribution – and if that happens the second tablescan won’t occur – as we see below (note the zero Reads, zero Buffers):