8.2.1.6 Nested-Loop Join Algorithms

Nested-Loop Join Algorithm

A simple nested-loop join (NLJ) algorithm reads rows from
the first table in a loop one at a time, passing each row to
a nested loop that processes the next table in the join.
This process is repeated as many times as there remain
tables to be joined.

Assume that a join between three tables
t1, t2, and
t3 is to be executed using the following
join types:

Table Join Type
t1 range
t2 ref
t3 ALL

If a simple NLJ algorithm is used, the join is processed
like this:

for each row in t1 matching range {
for each row in t2 matching reference key {
for each row in t3 {
if row satisfies join conditions, send to client
}
}
}

Because the NLJ algorithm passes rows one at a time from
outer loops to inner loops, it typically reads tables
processed in the inner loops many times.

Block Nested-Loop Join Algorithm

A Block Nested-Loop (BNL) join algorithm uses buffering of
rows read in outer loops to reduce the number of times that
tables in inner loops must be read. For example, if 10 rows
are read into a buffer and the buffer is passed to the next
inner loop, each row read in the inner loop can be compared
against all 10 rows in the buffer. This reduces by an order
of magnitude the number of times the inner table must be
read.

A join buffer is never allocated for the first
nonconstant table, even if it would be of type
ALL or
index.

Only columns of interest to a join are stored in its
join buffer, not whole rows.

The join_buffer_size
system variable determines the size of each join buffer
used to process a query.

One buffer is allocated for each join that can be
buffered, so a given query might be processed using
multiple join buffers.

A join buffer is allocated prior to executing the join
and freed after the query is done.

For the example join described previously for the NLJ
algorithm (without buffering), the join is done as follows
using join buffering:

for each row in t1 matching range {
for each row in t2 matching reference key {
store used columns from t1, t2 in join buffer
if buffer is full {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
empty join buffer
}
}
}
if buffer is not empty {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
}

If S is the size of each stored
t1, t2 combination in
the join buffer and C is the
number of combinations in the buffer, the number of times
table t3 is scanned is:

(S * C)/join_buffer_size + 1

The number of t3 scans decreases as the
value of join_buffer_size
increases, up to the point when
join_buffer_size is large
enough to hold all previous row combinations. At that point,
no speed is gained by making it larger.