Since union queries were introduced in MySQL 4.0.0, they have
been processed by executing each part of the union and adding the
result to a temporary table. Then a final query block is executed
that, if needed, filters out duplicates, do global ordering and
limit the number of output rows. The good thing about this method
is that it works for all union queries. The bad thing is that it
always uses a temporary table, even in the really simple cases,
and there’s always an extra query block to execute, even if
there’s no sorting and no duplicate filtering.

As of MySQL 5.7.3, UNION ALL doesn’t use temporary tables unless
needed for sorting. The result of each part of the union will be
sent directly back to the client, without waiting in a temporary
table or passing through an extra query block. When 5.7.3 came
out, Facebook (who filed the feature
request for this) cheered, and Morgan took the new feature for a test run. If
you haven’t already done so, I recommend reading Morgan’s blog
post that will tell you how to check if your query is optimized
by this feature.

Performance improvement

It’s hard to give an objective measure of the speedup, and we
haven’t claimed any numbers. The reason is that the most
important speedup depends on the data and the queries that are
executed. I could run some tests and present numbers, but unless
your queries look like mine, my numbers wouldn’t help. You really
need to evaluate this with your own queries and data.

The thing that has really changed a lot is the time the client
has to wait for the first row. Earlier, the client would have to
wait until the server had executed all parts of the query and
inserted the resulting rows into the temporary table before the
final query block could read the temporary table and output the
first row to the client. Now the server outputs the first row as
soon as it is found.

The speedup depends on the execution time of the remaining parts
of the union. The time saved waiting for the first row is
proportional to the total execution time of the rest of the
query. If one part of the union is significantly slower than the
others, and time to first row matters, you should consider
pushing that query block to the end of the union.

Of course, none of this will reduce the total execution time. You
may gain something due to less internal processing of the data,
especially if the temporary table grows large and is moved from
main memory to disk. Again, the performance improvement depends
on your data and queries. As a rule of thumb, the worse your
queries perform now, the more improvement you’ll see.

In a comment to the Facebook post, Robert Eisele suggested a
further improvement: to stop execution early if the global LIMIT
is reached. It’s a very good idea that would significantly reduce
the server load if the limit is small relative to the total
number of result rows. There’s no use in executing a query past
the last row the client will see, and this, as opposed to the
improvements mentioned above, will reduce the total execution
time of the query. I searched the bug database and found that we
already have a feature request for this: bug
#7888. And I have an idea of how it could be implemented.

Another aspect of the new UNION ALL processing is that there is
no longer a temporary table to fill your memory or disk while
executing the query. Judging by the comments in the feature
request, there are people that will appreciate this improvement,
too. (BTW, remember to not put your on-disk temporary tables in
main memory.)

Backporting

In a comment to the same Facebook post, Kacper Rowiński writes
what I interpret as a request to backport the feature to 5.5. I
understand that people want this, but let me give a few reasons
not to do it:

Linux distros expect GA releases to only fix bugs, not
introduce new features. Ideally, they would want only security
bug fixes.

Every new feature we backport increases the risk of
regressions, and none of us wants that.

Even though this feature is stand-alone and doesn’t directly
depend on other new features, we’ve done other changes in 5.7
that makes it non-trivial to backport this feature.

That said, MySQL is open source, so if anyone wants to backport
it and compile their own server, go ahead! Keep an eye out for
how the global limit (i.e., the limit for the whole union) is
handled. That is one of the changes in 5.7 that you’ll have to
work around.

Search

MySQL Links

Content reproduced on this site is the property of the respective copyright holders.
It is not reviewed in advance by Oracle and does not necessarily represent the opinion
of Oracle or any other party.