SELECT / UNION question.

John McKown

I have a SELECT / UNION statement and I think that I may be doing
some
"overkill". The statement looks like:

SELECT DISTINCT *
FROM DB1
UNION
SELECT DISTINCT *
FROM DB2
;

If I read the book correctly, the UNION will eliminate all
duplicate rows so
using the DISTINCT on the SELECT is not really necessary. Does the
DISTINCT
help or hurt? Both tables are defined identically, even to the
column names.
There may well be duplicates within each table and between the
tables.

Also as an aside, my defination of a "duplicate row" is one in
which all the
columns SELECTed have the same value. Is that correct?

Terry Purcell

If you select all columns, and have a unique index on each table,
then the
DISTINCT is discarded (due to the uniqueness guaranteed by the
index).
Change it to select columns that are not within a unique index and
the
DISTINCT will perform an additional sort. You specify that there
may be
duplicates within each table, so you must not have a unique index,
so will
be performing additional sorts.

The UNION takes care of the duplicates, so the DISTINCT is
logically
redundant.

However, the DISTINCT (to remove the duplicates) before the UNION
(which
removes the final duplicates) may be a benefit. A GROUP BY that is
able to
avoid a sort is even better in this situation, than a DISTINCT that
requires
a sort.

Explain will show three sorts in this case (with DISTINCTs),
whereas without
the DISTINCTs, you will have one sort. As mentioned, this does not
always
mean better performance.

As always, test out which SQL performs better for you.

To your final question....yes...a "duplicate row" is one in which
all the
columns SELECTed have the same value.

I have a SELECT / UNION statement and I think that I may be doing
some
"overkill". The statement looks like:

SELECT DISTINCT *
FROM DB1
UNION
SELECT DISTINCT *
FROM DB2
;

If I read the book correctly, the UNION will eliminate all
duplicate rows so
using the DISTINCT on the SELECT is not really necessary. Does the
DISTINCT
help or hurt? Both tables are defined identically, even to the
column names.
There may well be duplicates within each table and between the
tables.

Also as an aside, my defination of a "duplicate row" is one in
which all the
columns SELECTed have the same value. Is that correct?