Matching and false identifications

Version of November 21, 2012

The basic matching scheme

MAST uses a simple procedure to match the same objects
on the sky across two catalogs ("crossmatching"). Typically,
the catalogs are the KIC and some Catalog A. The approach is to take
each object in KIC and compare its J2000 coordinates with all those
in Catalog A via an automated cone search. Next we take all matches
within a prescribed matching radius (given in the table on the
Explanations & Caveats page) and rank them by angular separation from the
KIC object coordinates.
Any secondary, tertiary, etc. ranked objects are rejected from this step
of the matching. The procedure is then repeated from Catalog A against
entries in the KIC, and again only primary matches (identifications) within
the search radius are retained. In our present implementation we demand
that the matches be mutually 1:1; that is, if the closest match for KIC1234
is CatA_5678, then the converse must be true as well. For all of our matches
retrieved from our Target Search form, we rely on this 1:1 matching
standard. Note that while there is no guarantee that all the matches
are "correct", the chances of having false or duplicate matches
are greatly reduced by the matching/reverse-matching steps.
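
The matching and reverse-matching steps described above can be sketched in
a few lines of Python. This is only an illustration of the logic, not the
MAST implementation: the function names, the brute-force search, and the
coordinates are all assumptions made for the example, and a real catalog
would need an indexed cone search rather than a double loop.

```python
import math

ARCSEC_PER_DEG = 3600.0

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * ARCSEC_PER_DEG

def nearest_within(target, catalog, radius_arcsec):
    """Return the ID of the closest catalog entry within the radius, or None."""
    best_id, best_sep = None, radius_arcsec
    for obj_id, (ra, dec) in catalog.items():
        sep = separation_arcsec(target[0], target[1], ra, dec)
        if sep <= best_sep:
            best_id, best_sep = obj_id, sep
    return best_id

def mutual_matches(kic, cat_a, radius_arcsec):
    """Keep only pairs that are each other's closest match within the radius."""
    pairs = {}
    for kic_id, pos in kic.items():
        a_id = nearest_within(pos, cat_a, radius_arcsec)
        # Reverse step: the Catalog A object must point back to the same KIC object.
        if a_id is not None and nearest_within(cat_a[a_id], kic, radius_arcsec) == kic_id:
            pairs[kic_id] = a_id
    return pairs

# Made-up coordinates (degrees), for illustration only.
kic = {"KIC_1": (290.0, 40.0), "KIC_2": (290.0, 40.0 + 1.5 / 3600)}
cat_a = {"A_1": (290.0, 40.0 + 0.3 / 3600), "A_2": (291.0, 41.0)}
matches = mutual_matches(kic, cat_a, radius_arcsec=2.0)
print(matches)
```

In this toy case A_1 lies within the radius of both KIC objects, but it is
closest to KIC_1, so only the pair KIC_1/A_1 survives the reverse step and
KIC_2 is left without a match.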

When these steps are implemented across all non-KIC catalogs the
results are put into a database we call the Kepler Colors
Table. The Target Search tool allows users to retrieve all results
from this table. In the CasJobs implementation the Colors Table is
called keplerObjectSearchWithColors.

Alert users of the CasJobs tool will notice that there are two
CasJobs tables for matching between KIC and GALEX catalog lists.
One is the KGGoldStandard, for which the 1:1 matching criterion
applies. A second, KGMatch, gives secondary, tertiary, etc. matches in
both directions, KIC to GALEX and GALEX to KIC. KGMatch also includes
columns of the numbers of secondary matches and reverse matches.
The intent here is to provide CasJobs users with alternative choices
that are particularly relevant for comparisons to GALEX, a survey
that is shallower than the others (it does not reach as faint in
brightness). MAST decided not to provide such secondary tables for
crossmatches to other catalogs. We made this choice, first, because the
other catalogs extend to fainter magnitudes and therefore have more
reliable identifications with respect to a common catalog (KIC). Second,
the results would add a bewildering array of additional columns to the
Target Search retrieval pages that most users do not need and that would
thus likely interfere with their work.
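
To make the idea of ranked secondary matches and match counts concrete,
here is a rough Python sketch of how such a table could be built. It is
not the KGMatch schema: the field names (rank, n_forward, n_reverse), the
search radius, and the coordinates are hypothetical and chosen only for
illustration.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0

def ranked_candidates(target, catalog, radius_arcsec):
    """All catalog entries within the radius as (id, separation), closest first."""
    hits = []
    for obj_id, (ra, dec) in catalog.items():
        sep = separation_arcsec(target[0], target[1], ra, dec)
        if sep <= radius_arcsec:
            hits.append((obj_id, sep))
    return sorted(hits, key=lambda hit: hit[1])

def match_table(kic, other, radius_arcsec):
    """One row per candidate pair, keeping primary and lower-ranked matches."""
    rows = []
    for kic_id, pos in kic.items():
        forward = ranked_candidates(pos, other, radius_arcsec)
        for rank, (other_id, sep) in enumerate(forward, start=1):
            reverse = ranked_candidates(other[other_id], kic, radius_arcsec)
            rows.append({
                "kic_id": kic_id,
                "other_id": other_id,
                "rank": rank,                 # 1 = primary, 2 = secondary, ...
                "sep_arcsec": sep,
                "n_forward": len(forward),    # candidates seen from the KIC object
                "n_reverse": len(reverse),    # KIC objects seen from the candidate
            })
    return rows

# Made-up coordinates (degrees) and an arbitrary 3" radius, for illustration only.
kic = {"KIC_1": (290.0, 40.0)}
galex = {
    "G_1": (290.0, 40.0 + 0.5 / 3600),
    "G_2": (290.0, 40.0 - 2.0 / 3600),
    "G_3": (295.0, 45.0),
}
rows = match_table(kic, galex, radius_arcsec=3.0)
for row in rows:
    print(row["kic_id"], row["other_id"], row["rank"], round(row["sep_arcsec"], 2))
```

A row with rank greater than 1, or with n_forward or n_reverse greater
than 1, marks a case where the strict 1:1 criterion could hide a plausible
counterpart.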

Similar to the case of the GALEX survey, we adopted a conservative matching
search radius of 1" for objects in our imported Sloan (SDSS/DR9) catalog.
However, we occasionally find objects out to 2" or so that should be physical
matches but are not in our Colors Table database. In this case we have
not adopted a second table for SDSS in CasJobs: first, because unlike
GALEX, the SDSS goes fainter (by >2 magnitudes) than the KIC, and second,
because only a small portion of the Kepler FOV is covered by this survey.
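
A user who has their own list of SDSS positions could look for such
near misses directly. The sketch below lists objects that fall outside a
1" radius but within 2" of a given KIC position. The function name, the
default radii as parameters, and the coordinates are assumptions made for
the example; it does not query any MAST service.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0

def near_misses(target, catalog, inner_arcsec=1.0, outer_arcsec=2.0):
    """Entries farther than the inner radius but within the outer radius, closest first."""
    hits = []
    for obj_id, (ra, dec) in catalog.items():
        sep = separation_arcsec(target[0], target[1], ra, dec)
        if inner_arcsec < sep <= outer_arcsec:
            hits.append((obj_id, sep))
    return sorted(hits, key=lambda hit: hit[1])

# Made-up coordinates (degrees), for illustration only.
kic_pos = (290.0, 40.0)
sdss = {
    "S_1": (290.0, 40.0 + 0.4 / 3600),  # inside 1": matched at the adopted radius
    "S_2": (290.0, 40.0 + 1.6 / 3600),  # between 1" and 2": a candidate near miss
    "S_3": (290.0, 40.0 + 5.0 / 3600),  # well outside both radii
}
candidates = near_misses(kic_pos, sdss)
print(candidates)
```

Only S_2 is returned here. Whether such an object is a true physical
counterpart still has to be judged case by case.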

Notes on false identifications

The matching of objects between two different astronomical catalogs cannot be
100% reliable for a variety of reasons. On one hand, a comparatively small
number of matches to the correct counterpart objects can be inadvertently
missed (for example, the secondary identifications referred to above may
be the correct ones). On the other hand, apparent dual associations can be
made between two KIC objects and an object in another catalog, even though
neither has a formal match. These are subtle points, so we will elaborate.

False and missed identifications can still occur with a 1:1 matching
criterion. Consider, first, that any catalog may contain two entries with
very small angular separations between them. One of these may or may not
be an image artifact. Where artifacts are listed, crossmatching them to a
second catalog will obviously lead to questionable results.
Second, one catalog may
not go as deep in brightness as another, making the object with the
shallower exposure hard to discern; the project pipeline may decide
not to extract it. It may not show up in the catalog, or if it does,
the faint object's coordinates may be inaccurate.
Third, the matching catalogs are constructed from images extracted from
a montage of individual observations of overlapping areas of the sky.
Under these conditions the same object can occasionally be assigned
inaccurate coordinates. As discussed below, if the positional
errors are larger than the catalog's rejection criterion for
duplicate matches, then duplicate associations can be found for objects
listed in the KIC and another catalog.

Here are two examples of how matches can go wrong:
Consider first the example given in Part 2 of the CasJobs GOhelp page.
It describes two apparent GALEX objects that match a particular KIC
object, KIC7434250, if the 1:1 match condition is ignored. In this case
both FUV and NUV GALEX identifications
have been assigned coordinates each near the same KIC object, but these
coordinates are inaccurate because in both observations the object is
detected near the edges of the observation field, where the coordinates
are often inaccurate. In fact, the coordinates from one or both of them
are in error enough in this particular case that the GALEX catalog
recognizes them falsely as two separate GALEX objects. Since our matching
relies on assumed 1:1 matches, the match to KIC7434250
is not made, and our Colors Table gives no matches.
This is an example of a missed match. It will be lost to researchers
searching for this match on the Target Search form. Note that this form
does not give information on the distances of objects from each of their
secondary matches, so the investigator has no clue that a match has been
missed. (The solution is to go to the more liberal KGMatch table in CasJobs.
This table does not require 1:1 matches, so the apparent dual matches
can be discovered.)

A second example is a close double association that leads to a possible
ambiguity but not to a missed object. If one searches on the coordinates
(292.2301245, 37.589047) and makes sure that the search radius is
set to at least 0.03 arcminutes, the retrieval page shows two KIC objects
(rows 1 and 3) and one UKIRT object (row 2). This anomaly again occurs because
of the requirement for 1:1 matching. This UKIRT object is very close
to both KIC objects, so it cannot be matched uniquely to either of
them. Our procedure causes the UKIRT object to be recognized as
an independent object, probably incorrectly. An ambiguity in the match
to KIC objects results. The alert user can then make the appropriate
decisions. Note that it is also possible that in this case one of the objects
is false. However, this cannot be demonstrated because the two KIC entries
have different KIC r magnitudes. As of this moment, the correct association
for this UKIRT object appears in limbo.

These and other examples require user judgment calls, usually settled
by more information. One way of resolving these difficult cases
is to see if matches are found from additional catalogs.
The take-away from all this is that the matches between catalogs require
simple follow-ups: they are not written in stone. Users should investigate
such cases and be aware that their judgments carry an element of risk.