Featured Database Articles

Three Ways to Identify Non-matching Records in MySQL

By Rob Gravelle

There are many times as a database developer and administrator that you'll want to know what entities are being excluded from a straight table join. For instance, it would be instructive for management to identify sales people who haven't received any bonuses, as these might be deemed to be the underperformers. To provide the requestor with the statistics, you'll have to determine which sales people are absent from the bonuses table. This is commonly referred to as a Non-matching query, although it is sometimes called a Subtract or Set Difference query as well. MySQL provides not one but three standard means of producing non-matching results. I will be presenting each here, along with another that you might not have considered.

Here are the two sample tables that we'll execute our queries against:

employees table:

id

dept_id

gender

name

salary

emp_id_number

1

2

m

Jon Simpson

4500

1234

2

4

f

Barbara Breitenmoser

(NULL)

9999

3

3

f

Kirsten Ruegg

5600

3462

4

1

m

Ralph Teller

5100

6543

5

2

m

Peter Jonson

5200

9747

6

2

m

Allan Smithie

4900

6853

7

4

f

Mary Parker

5600

2223

8

3

f

Kirsten Ruegg

4400

2765

bonuses table:

bonus_id

emp_id

amount

award_date

1

3

500

2009-10-28

2

5

1000

2009-01-03

3

4

250

2009-04-22

4

2

900

2010-01-14

5

1

450

2009-01-03

6

5

1000

2009-10-28

7

8

300

2007-10-01

8

8

300.5

2009-08-12

9

2

250

2009-03-16

10

2

500

2009-07-30

11

6

350

2009-02-12

12

5

300

2009-01-15

While there is only one instance of each employee in the employees table, there are potentially multiple instances of each in the bonuses table, depending on how many bonuses he or she has received. What we want to see is a list of names of employees' IDs that do not appear in the bonuses table.

Solution 1: LEFT JOIN / IS NULL

One way to select values present in one table but missing in another is to use a combination of a Left Join with an "IS NULL" test. Here's the syntax for that:

SELECT field list

FROM left_table

LEFT JOIN

right_table

ON right_table.id = left_table.id

WHERE right_table.id IS NULL

Plugging in our data produces the following Select query:

SELECT emp.name

FROM employees emp

LEFT JOIN

bonuses b

ON b.emp_id = emp.id

WHERE b.emp_id IS NULL;

Running the above statement confirms that Mary Parker is the only employee who has not received a bonus.

Solution 2: NOT IN

The second way to find non-matching records between tables is to use NOT in conjunction with the IN operator. The IN operator allows you to specify multiple values in a WHERE clause, much like a string of ORs chained together. It accepts either a comma-delimited list (value1, value2, etc...) or the results from another Select statement. The latter is what we will be using here.

The syntax for that form is as follows:

SELECT field list

FROM left_table

WHERE left_table.id NOT IN

(

SELECT right_table.id

FROM right_table

)

Here it is again with our specific criteria:

SELECT name

FROM employees

WHERE id NOT IN

(

SELECT emp_id

FROM bonuses

);

It also shows that Mary Parker is the only employee who has not received a bonus.

Solution 3: NOT EXISTS

The main difference between the IN and EXISTS predicate in subquery is the way in which the query gets executed.

IN: The inner query is executed first and the list of values obtained in the result is used by the outer query. The inner query is executed only once.

EXISTS: Each row from the outer query result is selected, and compared against the inner query. Hence, the inner query is executed as many times as there are rows in the outer query result.

Here is the syntax for the Not Exists Select statement:

SELECT field list

FROM left_table

WHERE NOT EXISTS

(

right_table.id

FROM right_table

WHERE right_table.id = left_table.id

)

We would plug in our criteria like so:

SELECT name

FROM employees emp

WHERE NOT EXISTS

(

SELECT emp_id

FROM bonuses b

WHERE b.emp_id = emp.id

);

Once again, poor Mary Parker is without a single bonus. This may not bode well for her next performance evaluation.

Bonus Solution: Group By with Union All

As promised, here is a fourth solution, using Group By with UNION ALL. The first step is to Union the two tables that we want to compare using Union ALL (the ALL retains duplicates). Then we group the result on the column that we want to compare. This will cause columns that appear in both inner SELECTS to have a count of two while unmatched records will only be counted once.

Here is the syntax for the Not Exists Select statement:

SELECT field list

FROM (

SELECT DISTINCT field list FROM left_table

UNION ALL

SELECT DISTINCT field list FROM right_table

) AS tbl

GROUP BY field list HAVING COUNT(*) = 1

Be sure to group by the ID field so that different employees with the same name won't be treated as the same person:

SELECT name

FROM (

SELECT DISTINCT id, name FROM employees

UNION ALL

SELECT DISTINCT emp_id, '' FROM bonuses

) AS tbl

GROUP BY id HAVING COUNT(*) = 1;

You guessed it; Mary Parker is once again left out in the cold.

I have no specific statistics that show how fast this style performs but anecdotal data would suggest that it compares quite favourably.

Which Style Should I Use?

Because of how the MySQL optimizer handles EXISTS, that method is about 30% less efficient than the others. If speed is your primary concern, then the LEFT JOIN / IS NULL and NOT IN styles are your best bets. Otherwise, you really can't go wrong with whichever style suits you.

Advertiser Disclosure:
Some of the products that appear on this site are from companies from which QuinStreet receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. QuinStreet does not include all companies or all types of products available in the marketplace.