It's necessary to use a dummy query in this case because otherwise the <tt>order</tt> and <tt>setLimit</tt> methods would be applied to the final result set rather that the individual one.

It's necessary to use a dummy query in this case because otherwise the <tt>order</tt> and <tt>setLimit</tt> methods would be applied to the final result set rather that the individual one.

+

+

[[Category:Database]]

Revision as of 03:15, 11 May 2014

This tutorial is for Joomla! CMS Version(s)

This page covers Joomla 3.3 or later only

Although the union methods are present in Joomla 2.5 and 3.x, they do not work in any release prior to 3.3. There is a pending pull request [1] for this same functionality to be included in Joomla 2.5; testers needed.

Using a UNION in a database query is a useful way to combine the results of two or more database SELECT queries that are not necessarily linked by a database relationship. It can also be a useful performance optimisation. Using a UNION to combine the results of different queries can sometimes be significantly faster than a single query with a WHERE clause especially when the query involves joins to other large tables.

For those familiar with a little set theory a UNION does exactly what you would expect; it merges the set of results from one query with the set of results from another query to produce a set of results which is the set union of both of the individual result sets. If you want the set to be ordered then you will need to pay some particular attention to the way that is done, as will be explained later.

Contents

The basics

In order to use a UNION you need to be aware of the basic requirements established by the SQL server you are using. These are not enforced by Joomla, but if you don't comply with them you will get database errors. In particular, each SELECT query must return the same number of fields in the same order and with compatible data types in order for the UNION to be successful.

A simple example

Suppose you have want to send out a mailshot to a group of people but the database is such that the names and email addresses you want to send to are not all in the same table. To invent an arbitrary example, suppose that you want to send the mail to all customers and all suppliers and that that the names and email addresses are stored in tables called, unsurprisingly, customers and suppliers, respectively.

This query will retrieve all the customer information that we need for the mailshot:

The result set obtained from the union query will actually be a little different from running the individual queries separately because the union query will automatically eliminate duplicates. If you don't mind that the result set may contain duplicates (which mathematically speaking means it isn't a set) then using unionAll instead of union will improve performance.

Lots of ways to use union

The union (and unionAll) methods are quite flexible in what they will accept as arguments. You can pass an "old-style" string query, a JDatabaseQuery object, or an array of JDatabaseQuery objects. For example, suppose you have three tables, similar to the example above:

// The union method can be chained.$q1->union($q2)->union($q3);// The union method will accept string queries.$q1->union($q2)->union('SELECT name, email FROM shareholders');// The union method will accept an array of JDatabaseQuery objects.$q1->union(array($q2,$q3));// It doesn't matter which query object is at the "root" of the query. In this case the actual query that is produced will be different but the result set will be the same.$q2->union(array($q1,$q3));

union, unionAll and unionDistinct

There are actually three union methods available.

union produces a true set union of the individual result sets; that is, duplicates are removed. The process of eliminating duplicates may or may not incur a performance hit, depending on the data sets and database structures involved.

unionAll produces a union of the individual result sets but duplicates are not removed.

unionDistinct is identical in behaviour to union and is merely a proxy for the union method.

Using UNION instead of OR

There are some instances where using union can give a significant performance boost instead of the more commonly used alternative of an OR or an IN in a where clause.

For example, suppose you have a table of products and you want to extract just those products that belong to two particular categories. Typically this would be coded something like this:

This is because an ORDER BY clause in the individual SELECT statements implies nothing about the order in which the rows appear in the final result. A UNION produces an unordered set of rows. The query above will be syntactically correct, but the MySQL optimiser will simply ignore the ORDER BY clause on the suppliers SELECT statement and the ORDER BY clause on the customers SELECT statement will be applied to final result set rather than the individual result set.

The way around this is to add an additional column to the result set and sort on that in such a way that the results will have the desired ordering. Here's one way to do that:

Advanced ordering

However, there may be occasions when it is important to have an ORDER BY clause on the individual queries that won't be dropped by the query optimiser. Suppose you want to send a special offer to your top 10 customers and your top 5 suppliers. You will therefore want to apply a LIMIT clause in combination with the ORDER BY clause and in that case the query optimiser will not ignore the ordering. This is how you could do it: