March 5, 2018

SEMI JOIN in modern relational databases

A semi join returns rows only from the first table provided at least one matching row is found in the second table. The difference between a conventional join and a semi join is that rows from the first table are returned once at most even when the join column of the second table contains duplicate rows. There is no ANSI SQL way to invoke a semi join. The following syntax doesn’t exist:

SQL> select
t1.*
from t1
semi join t2
on t1.n1= t2.n1

In almost all modern relational databases, the semi join logical operation is triggered via the predicate IN, ANY, SOME, EXISTS and DISTINCT. MS-SQL Server has an extra particularity which makes it using a semi join during an INTERSECT operation as we will see later in this article.

The following queries illustrate these diferent semi-join cases in a 12.2.0.1 Oracle release:

The apparition of the semi join in the above execution plan is the result of the partial join transformation introduced by Oracle as from 12cR1. Earlier versions were not able to use this partial join transformation scanning, as such, the entire content of t1 table is scanned instead of tiny 256 rows as shown below:

So far all the semi join logical operations have been executed via a HASH JOIN physical operation. This doesn’t mean that a semi joins can’t be used in a NESTED LOOPS physical operation. Here’s below the demonstratation:

4. SEMI join and uniqueness
I started this article by specifying that a semi join guaranties at most one row will be returned (per join column) from the left table even if there are duplicate rows in the join column of the second table. An example being worth a thousand words so here we go:

It is for this reason that the semi join has been implemented: do not duplicate rows from the outer table when they multiple join with rows from the inner table.

Since now we know the semi join main goal we can infer that, as long as, Oracle is sure that the right table has no duplicate values in the join column, it will fall back to a simple inner join. Let’s check:

Notice that we have no semi join anymore. Thanks to the unique index on t2(n1) table, Oracle knows that there are no duplicate rows in the join column of the outer table. It can then reliably use an inner join because rows from the outer table will at most join exactly once with rows from the inner table.

There exists another way to make Oracle falling to an inner join instead of a semi join even when there are duplicate rows in the join column of the outer table. In fact whenever Oracle uses an operation that eliminates duplicate rows from the outer table (for example sort unique or group by) it can confidently use an inner join in place of a semi join. Here’s below one case demontrated:

As you can see Oracle didn’t use a semi join because it has scanned all rows from the outer table, ordered them and eliminated duplicate rows via the SORT UNIQUE operation. As such when joining with the t1 table Oracle knows that it can join t2 rows at most exactly once. Notice in passing that while the semi join has full scanned the t1 table (10,000 rows) the inner join touched only 5 distinct index keys.

5. SEMI join and foreign key integrity

Suppose now that we want to select rows from a child table matching their parent rows in the parent table.

We have seen in this article how a semi join logical operation is triggered when using IN, ANY, SOME, EXISTS and DISTINCT in several modern relational databases. We’ve outlined as well that, in contrast to Oracle and PostgreSQL, SQL-Server can use a semi join during an INTERSECT set operation. We’ve demonstrated that while a regular joins can duplicate rows, semi joins do not produce any duplicates. They are defined to return (or not) rows only from one table of the join. We’ve also shown that, as long as the CBO is sure that the right table has no duplicate values in the join column, it will fall back to a simple inner join where extra transformations become possible and where the left table might not be fully scanned.

Advertisements

Share this:

Like this:

Related

Regarding the SQL-Server comment, though, Oracle has been able to handle conversion from intersect (and minus) to semi join (and anti join) since 10.1.x.x but the feature is not enabled unless you change the value of parameter _convert_set_to_join to true. It’s been around for such a long time that it still puzzles me that the default hasn’t changed.

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.