Context. Before the publication of the Gaia Catalogue, the contents of the first data release have undergone multiple dedicated validation tests.

Aims. These tests aim to provide in-depth analysis of the Catalogue content in order to detect anomalies and individual problems in specific objects or in overall statistical properties, and either to filter them before the public release or to describe the different caveats on the release for an optimal exploitation of the data.

Methods. Dedicated methods using either Gaia internal data, external catalogues, or models have been developed for the validation processes. They test normal stars as well as various populations such as open or globular clusters, double stars, variable stars, and quasars. Properties of coverage, accuracy, and precision of the data are provided by the numerous tests presented here and are jointly analysed to assess the data release content.

Results. This independent validation confirms the quality of the published data, Gaia DR1 being the most precise all-sky astrometric and photometric catalogue to date. However, several limitations in terms of completeness, and astrometric or photometric quality are identified and described. Figures describing the relevant properties of the release are shown, and the testing activities carried out to validate the user interfaces are also described. Particular emphasis is placed on the statistical use of the data in scientific exploitation.

1. Introduction

This paper describes the validation of the first data release from the European Space Agency’s Gaia mission (Gaia Collaboration 2016b). In a historical perspective and following in the footsteps of the great astronomical catalogues since the first by Hipparchus of Nicaea, the Gaia Catalogue describes the state of the sky at the beginning of the 21st century. It is the heir of the massive international astronomical projects, starting in the late 19th century with the Carte du Ciel (Jones 2000), and a direct successor to the ESA Hipparcos mission (Perryman et al. 1997).

Despite the precautions taken during the acquisition of the satellite observations and when building the data processing system, it is a difficult task to ensure perfect astrometric, photometric, spectroscopic, and classification data for a catalogue of one billion sources built from the intricate combination of many data items for each entry. However, several actions have been undertaken to ensure the quality of the Gaia Catalogue through both internal and external data validation processes before each release. The results of the external validation work are described in this paper.

The Gaia DR1:

There is an exhaustive description of the Gaia operations and instruments in Gaia Collaboration (2016b) and of the Gaia processing in Gaia Collaboration (2016a); the astrometric and photometric pre-processing is also detailed in Fabricius et al. (2016). For this reason we mention here only what is strictly necessary and refer the reader to the above papers or to the Gaia documentation for details.

The Gaia satellite is slowly spinning, and measures the fluxes and observation times of all sources crossing the focal plane (their Gaia transit), sending to the ground small windows of pixels around the sources. These times correspond to 1D, along-scan positions (hereafter AL), which are used in an astrometric global iterative solution process (AGIS; Lindegren et al. 2016), which also needs to simultaneously calibrate the instruments and reconstruct the attitude of the satellite. A star crossing the focal plane is measured on nine CCDs in the astrometric instrument so the number of observations of a star can be up to nine times the number of its transits. On-board resources are able to cope with various stellar densities; however, for very dense fields above 400 000 sources per square degree, the brighter sources are preferentially selected.

The photometric instrument is composed of two prisms, a blue photometer (BP) and a red photometer (RP). This colour information is not present in the Gaia DR1; only the G-band photometry is derived from the fluxes measured in the astrometric instrument. The CCD dynamic range does not allow observations of all sources from the brightest up to G ~ 21: sources brighter than G ~ 12 would be saturated. To avoid this, time delay integration (TDI) gates are present on the CCD and can be activated for bright sources, which in practice reduces their integration time (but also complicates their calibration).

Astrometry and photometry are then derived on the ground in independent pipelines, which is part of the work developed under the responsibility of the body in charge of the data processing for the Gaia mission, the Gaia Data Processing and Analysis Consortium (DPAC; Gaia Collaboration 2016a).

This first data release contains preliminary results based on observations collected during the first 14 months of mission since the start of nominal operations in July 2014. At the start of nominal operations of the spacecraft on 25 July 2014, a special scanning law was followed, the Ecliptic Pole Scanning Law (EPSL). In EPSL mode, the spin axis of the spacecraft always lies in the ecliptic plane, such that the field-of-view directions pass the north and south ecliptic poles on each six-hour spin. Then followed the Nominal Scanning Law (NSL) with a precession rate of 5.8 revolutions per year, starting on 22 August 2014. As we note below, the EPSL mode left some imprints on the Catalogue content and scientific results.

Gaia DR1 contains a total of 1 142 679 769 sources. The astrometric content of Gaia DR1 comes in two parts. The first, the primary sources, contains positions, parallaxes, and mean proper motions for 2 057 050 of the stars brighter than about magnitude V = 11.5 (about 80% of these stars). This data set, the Tycho Gaia Astrometric Solution (TGAS), was obtained through the combination of the Gaia observations with the positions of the sources obtained by Hipparcos (ESA 1997) when available, or Tycho-2 (Høg et al. 2000b). The second part of Gaia DR1, the secondary sources, contains the positions and G magnitudes for 1 140 622 719 sources brighter than about magnitude G = 21. An annex of variable stars located around the south ecliptic pole is also part of the release thanks to the large number of observations made during the EPSL mode.

Catalogue validation:

As for any scientific project, the quality of the released data has been controlled by two complementary approaches. Verifications are done internally at each step of the processing development in order to answer the question: are we building the Catalogue correctly? Validations are done at the end to answer the question: is the final Catalogue correct?

It is fundamental to note that the first step of the validations is logically represented by the many tests implemented in the Gaia DPAC groups before producing their own data; these tests are described in dedicated publications: Lindegren et al. (2016) for the astrometry, Evans et al. (2017) for the photometry, and Eyer et al. (2016) for the variability.

To assess the Catalogue properties and as a final check before publication, the DPAC deemed it useful to implement a second and last step, a validation of the Catalogue as a whole, and it should be noted that it was a fully independent validation.

The actual Catalogue validation operations began after data from the DPAC groups had been collected and a consolidated Catalogue had been built before publication. At this step, it was not possible to rerun the data processing; it was only possible to reject some stars (if strictly necessary) and to make some cosmetic changes on the data fields. After the rejection of problematic stars, a process known as filtering, the validation was again performed, and most of the catalogue properties described in this paper refer to this post-filtering, published, final Gaia DR1 data.

The organisation of this paper is as follows. Section 2 summarises the data and models used. Section 3 describes the erroneous or duplicate entries found and partly removed. The main properties of the Gaia DR1 Catalogue are discussed in Sect. 4 for the sky coverage and completeness, with a multidimensional analysis in Sect. 5, the astrometric quality of Gaia DR1 in Sect. 6, and the photometric properties in Sect. 7. As a conclusion, recommendations for data usage are given in Sect. 8. The validation procedures employed in testing the design and interfaces of the archive systems are described in the Appendix together with some illustrations of the statistical properties of the Catalogue.

2. Data and models

2.1. Data used

2.1.1. Gaia data

Two months before the final go-ahead to publish the Gaia DR1 Catalogue, we received the official preliminary Catalogue (hereafter pre-DR1), which was validated and then subsequently filtered, as described in Sect. 3, to produce the Gaia DR1 Catalogue. Generally speaking, the validation work has had access to the same fields as are published in Gaia DR1 so that any user can reproduce most of the work indicated below. For example, we did not have access to any individual transit data or calibration data, or more generally to the main Gaia database, and this fostered developing methods independent from the work done within the Gaia groups producing the data. A few supplementary fields were, however, kindly made available for validation purposes, such as the preliminary GBP and GRP magnitudes (in order to study possible chromatic effects).

2.1.2. Simulated Gaia data

In the course of the preparation of the data validation, we also needed simulated data, mostly for testing the astrometry of the TGAS solution. For this purpose we built a simulated catalogue, called Simu-AGISLab in what follows, which contained astrometric data for the Tycho-2 stars, on top of which were added simulated TGAS astrometric errors. Simu-AGISLab used the Tycho-2 simulated proper motions, but they were deconvolved using the formula indicated in Arenou & Luri (1999, Eq. (10)) to avoid a spurious increase of their dispersion with the TGAS astrometric errors added by the simulation. The simulated parallaxes were a weighted average of deconvolved Hipparcos parallaxes (for nearby stars) and the photometric parallaxes from the Pickles & Depagne (2011) catalogue (for more distant stars). The simulated TGAS astrometric errors were produced as described in the Tycho-Gaia Astrometric Solution document (Michalik et al. 2015), based on solution algorithms described in Lindegren et al. (2012, Sect. 7.2).

In addition, global simulations of the Gaia data generated by the DPAC group devoted to this purpose were also used for validation tasks comparing models with data (see Sect. 2.3).

2.1.3. External data

The comparison of Gaia DR1 to external catalogues is a tricky task as the Gaia Catalogue is unique in many ways: it combines the angular resolution of the Hubble Space Telescope (HST) with a complete all-sky survey at optical wavelengths down to a G magnitude ≃ 21, unprecedented astrometric accuracy, and all-sky homogeneous photometric data.

However, the comparison with external catalogues is one way towards a deeper understanding of many of the parameters describing the performance of the Catalogue: overall sky coverage, spatial resolution, catalogue completeness, and of course precision and accuracy of the different types of data for the various categories of objects observed by Gaia. In addition to the Hipparcos and Tycho-2 catalogues, many other catalogues have been used, specially chosen for each of these tests. They are described in each of the relevant subsections.

The cross-match between TGAS and the external catalogues or compilations has been done directly using the Tycho-2 or Hipparcos identifiers, either provided in the publications or obtained through SIMBAD queries (Wenger et al. 2000) using the identifiers given in the original papers. For the full Gaia DR1 tests, a positional cross-match has been used.

2.2. Data integrity and consistency

Gaia DR1 is the combined work of hundreds of people divided into dozens of groups working on several complementary yet independent pipelines. In addition to testing the data themselves, therefore, we tested the data representations to ensure that all catalogue entries were valid and self-consistent. We checked that catalogue values were finite, that data were present (or missing) when expected, that all fields were in their expected ranges, that observation counts agreed with each other, that source identifiers were unique, that correlation coefficients formed a valid correlation matrix, that fluxes and magnitudes were related as expected, that the positions obtained from the equatorial, ecliptic, and galactic coordinates agreed, and so on. We also confirmed that the Gaia DR1 in different data formats indeed contained the same data.
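Three of the checks listed above can be illustrated with a short Python sketch. It is not the code used for the validation: the function names, the input layout, and the `zero_point` and tolerance parameters are assumptions made for this example, and the magnitude check assumes the usual relation mag = -2.5 log10(flux) + zero point.

```python
import math
from collections import Counter


def find_duplicate_ids(source_ids):
    """Return the set of identifiers that occur more than once."""
    counts = Counter(source_ids)
    return {sid for sid, n in counts.items() if n > 1}


def flux_mag_consistent(flux, mag, zero_point, tol=1e-3):
    """Check that mag = -2.5 log10(flux) + zero_point within tol.

    zero_point and tol are illustrative parameters, not catalogue values.
    """
    if not (math.isfinite(flux) and math.isfinite(mag)) or flux <= 0.0:
        return False
    return abs(mag - (zero_point - 2.5 * math.log10(flux))) <= tol


def is_valid_correlation_matrix(c, tol=1e-9):
    """True if c is symmetric, has a unit diagonal and is positive semi-definite."""
    n = len(c)
    for i in range(n):
        if abs(c[i][i] - 1.0) > tol:
            return False
        for j in range(i):
            if abs(c[i][j] - c[j][i]) > tol or abs(c[i][j]) > 1.0 + tol:
                return False
    # Cholesky-style factorisation: a negative pivot means the matrix
    # is not positive semi-definite.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s < -tol:
                    return False
                low[i][i] = math.sqrt(max(s, 0.0))
            elif low[j][j] > tol:
                low[i][j] = s / low[j][j]
            elif abs(s) > tol:
                return False
    return True
```

Checking each coefficient for |ρ| ≤ 1 is not sufficient for a 5-parameter solution, which is why the sketch also attempts a factorisation of the full matrix.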

All data integrity issues were fixed before the data release. For TGAS solutions we also checked individual values of proper motions and parallax looking, for example, for unrealistic tangential velocities. We then checked the uncertainties of the five astrometric parameters to make sure that they decreased with the number of observations or to see if there were Healpix pixels with an unusually high fraction of large uncertainties. All in all we were particularly interested in regions on the sky where dubious values occur with higher frequency than in typical areas, with the aim of excluding – if needed – these regions from the release. Although some poorly scanned regions were identified as problematic, in the end none were excluded.
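The tangential velocity check can be sketched as follows, using the standard relation v_t = 4.74047 μ / ϖ (km/s, with μ in mas/yr and ϖ in mas). The function names and the velocity limit used to flag a source are assumptions chosen for illustration; the threshold actually applied in the validation is not stated here.

```python
import math

# 1 au/yr expressed in km/s.
KAPPA = 4.74047


def tangential_velocity(pmra, pmdec, parallax):
    """Tangential velocity in km/s from proper motions (mas/yr) and parallax (mas).

    Returns None for non-positive parallaxes, for which this simple
    estimate is not defined.
    """
    if parallax <= 0.0:
        return None
    return KAPPA * math.hypot(pmra, pmdec) / parallax


def is_suspicious(pmra, pmdec, parallax, limit_kms=1000.0):
    """Flag sources whose tangential velocity exceeds a (hypothetical) limit."""
    v_t = tangential_velocity(pmra, pmdec, parallax)
    return v_t is not None and v_t > limit_kms
```

For example, a total proper motion of 100 mas/yr at a parallax of 10 mas gives about 47 km/s, whereas 1000 mas/yr at 1 mas gives about 4740 km/s and would be flagged.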

Sources brighter than about 12 mag are observed with “gates”, i.e. with reduced exposure time. We therefore checked that the astrometric standard uncertainties did not show rapid changes as a function of magnitude.

We found only a few minor issues in the Gaia DR1 astrometry regarding the data ranges. High values of fields like astrometric_excess_noise and astrometric_excess_noise_sig that statistically were expected for only about a thousand sources are actually present in about 205 million sources, including nearly the entire TGAS sample. These high values reflect the large errors introduced by the preliminary attitude solution for the Gaia spacecraft; a better solution will be used in future releases (Lindegren et al. 2016) and we expect that this problem will be solved. In addition, 4288 sources have positions based on only two 1D measurements, providing an astrometric solution with no degrees of freedom. These minimally constrained solutions are expected to go away as more data are collected.

We tested whether sources had enough astrometric measurements to allow for a 2- or 5-parameter solution, as appropriate. We then compared the distribution of astrometric goodness-of-fit indicators with their expected distributions.

Photometry and astrometry were derived in independent pipelines, each of which could decide to reject or downweight a number of individual observations for a given source. We therefore checked whether the number of valid observations was similar in the two pipelines. If more than half of the observations were rejected, and if the number of valid observations in each pipeline adds up to less than the total number of observations for the source, there is a problem: it is not possible to know whether the astrometric and photometric results refer to the same object or, for example, to different components of a double star. This problem affects fewer than 9000 sources in Gaia DR1 and we also expect it to be solved in future releases.
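The reasoning behind this criterion is a counting argument: if the two sets of valid observations together contain more observations than the source has in total, they must share at least one observation, whereas if they add up to less than the total they may be disjoint. A minimal sketch follows; the function and argument names are hypothetical, and "more than half rejected" is interpreted here as applying to at least one of the two pipelines, which is an assumption.

```python
def pipelines_may_disagree(n_total, n_astro_valid, n_phot_valid):
    """True if astrometry and photometry may refer to different observations.

    n_total       -- total number of observations of the source
    n_astro_valid -- observations kept by the astrometric pipeline
    n_phot_valid  -- observations kept by the photometric pipeline
    """
    # Assumption: the "more than half rejected" condition is met as soon as
    # one of the two pipelines kept fewer than half of the observations.
    half_rejected = min(n_astro_valid, n_phot_valid) < n_total / 2
    # If the two valid sets add up to less than the total, nothing
    # guarantees that they have an observation in common.
    may_be_disjoint = n_astro_valid + n_phot_valid < n_total
    return half_rejected and may_be_disjoint
```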

2.3. Galaxy models

Models contain a summary of our present knowledge about the stars in the Milky Way. This knowledge is obviously imperfect and many of the discrepancies between the models and the real Gaia data are likely due to the models themselves. At the level of our current knowledge, however, if a model reproduces the existing data with satisfactory accuracy, it can be used for Gaia validation (at the level of this accuracy). This is what we have done in the set of tests based on models. These tests may supersede the validation using external data in regions of the sky where data are too scarce, or in magnitude ranges where existing data are not accurate enough or are incomplete, or where they do not exist in large portions of sky (e.g. parallaxes).

In Gaia DR1, three kinds of tests have been performed: tests on stellar densities, tests on proper motions, and tests on parallaxes. In all the tests we analysed the distribution on the sky of the model densities and of the statistical distribution of astrometric parameters (proper motions and parallaxes) and compared them with the Gaia data. In order to establish a threshold for test results we compared the model with previous catalogues on portions of sky when available. For this first data release only the Besançon Galactic Model (Robin et al. 2003) has been used for comparisons with Gaia data.

3. Erroneous or duplicate entries

The pre-DR1 Catalogue received for validation was subject to several tests concerning possible erroneous entries. This led to the filtering of a significant number of sources (37 433 092 sources were removed, i.e. 3.2% of the input sources). As this filtering was obviously not perfect (removing actual sources while conserving erroneous ones), and had an impact on the Catalogue content, the rationale, methods used, and results are described in this section.
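The quoted fraction can be reproduced from the published source count, under the assumption (made here for illustration) that the number of input sources equals the number of published sources plus the number of removed ones:

```python
published = 1_142_679_769  # total number of sources in Gaia DR1
removed = 37_433_092       # sources removed from pre-DR1 by the filtering

# Assumption: input sources = published sources + removed sources.
input_sources = published + removed
fraction_removed = removed / input_sources

print(f"{100 * fraction_removed:.1f}% of {input_sources} input sources removed")
```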

3.1. Erroneous faint TGAS sources

3.1.1. Data before filtering

As can be seen in Fig. 1a, a significant number of objects (2381 sources) in the pre-DR1 version of TGAS had G ≳ 14 mag, i.e. they were clearly fainter than was expected for Tycho-2. This led to the study of the G photometry for these stars, and beyond for the whole catalogue.

A particular concern has been to catch coarse processing errors in the photometry. For bright sources, the exposure time in each CCD on board Gaia is reduced by activating special TDI gates on the device as the star image crosses the CCD. This shorter exposure time is then taken into account when computing the flux. However, on some rare occasions the information on gate activation did not reach the photometric pipeline. The result was artificially low fluxes in that particular transit, and for reasons beyond the scope of this paper, this could upset the processing and lead to erroneous G magnitudes.

We therefore specifically checked whether sources appeared much fainter in G than in both GBP and GRP, the preliminary versions of photometry to be published in later releases (Riello et al. 2016). In practice, the limit was set at 3 mag in order not to eliminate diffuse objects with a bright core, e.g. galaxies, which were expected to be bright in the diaphragm photometry of GBP and GRP; stars with G−GBP > 3 and G−GRP > 3, where a problem with G was suspected, were filtered (164 446 TGAS or secondary sources).

While the median number of G-band observations per source is 72 in Gaia DR1, it was also found that roughly half of the too faint TGAS sources had fewer than 10 CCD observations; on the whole, catalogue stars with fewer than 10 observations clearly behaved incorrectly. This led to the removal of all sources with fewer than 10 G observations from pre-DR1 (746 292 TGAS or secondary sources).
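The two photometric filters described in this subsection amount to a simple per-source predicate, sketched below. The function and argument names are hypothetical, and the treatment of sources without preliminary GBP or GRP magnitudes (kept, as far as the colour test is concerned) is an assumption of this example.

```python
def keep_source(g, g_bp, g_rp, n_obs, max_excess=3.0, min_obs=10):
    """Return True if a source passes both photometric filters.

    g, g_bp, g_rp -- G, GBP and GRP magnitudes (g_bp or g_rp may be None)
    n_obs         -- number of G-band CCD observations
    """
    # A source is suspected to have an erroneous G magnitude when it is
    # more than max_excess mag fainter in G than in both GBP and GRP.
    too_faint_in_g = (
        g_bp is not None
        and g_rp is not None
        and g - g_bp > max_excess
        and g - g_rp > max_excess
    )
    # Sources with fewer than min_obs G observations are removed.
    enough_observations = n_obs >= min_obs
    return not too_faint_in_g and enough_observations
```

Requiring both colour differences to exceed the limit is what protects diffuse objects with a bright core, which may exceed it in one band only.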

3.1.2. Data after filtering

Figure 1b shows the resulting magnitude distribution for TGAS in Gaia DR1, i.e. after full filtering. There is a remaining tail with 352 sources fainter than G = 13.5 mag, and the presence of such sources in TGAS calls for an explanation. We have taken a closer look at the 60 faintest TGAS stars of which the brightest has G = 14.98 mag. Of these 60 stars, 25 have a neighbour brighter than G = 13.5 mag and closer than 5′′ in Gaia DR1, suggesting that the wrong star may have been used in the TGAS solution, which is therefore not valid. Of the remaining 35 stars, just over half (18) have from one to four neighbours within 5′′. In these cases we may be dealing with spurious Tycho-2 stars. Tycho-2 (Høg et al. 2000a) was using an input star list dominated by photographic catalogues, and a blend of sources may therefore have been seen as a single bright source. It may then happen that a Tycho-2 solution was derived from the mixed signal of contaminating sources. We see that as a likely explanation for most of these cases. For stars that are isolated in Gaia DR1, spurious Tycho-2 stars cannot be excluded, but in at least one case, the faint Gaia source turns out to be a variable of the R CrB type. This star (HIP 92207) has G = 16.57 mag in Gaia DR1, but is as bright as VT = 10.29 mag in Tycho-2. This is in good agreement with available light curves. It is too early to say whether there are more high amplitude variables in the sample.

3.2. Duplicate entries

3.2.1. Gaia DR1 before filtering

Before launch, a catalogue with known optical astrometric and photometric information of sources up to magnitude G = 21 had been built in order to be used as the Initial Gaia Source List (IGSL; Smart & Nicastro 2014).

Stars from IGSL may have initially contained duplicates originating from overlapping plates, for example. Automatically generated catalogues such as Gaia DR1 may also have multiple copies of a source for a variety of reasons, including poor cross-matching of multiple observations, inconsistent handling of close doubles, or other observational or processing problems, in addition to the duplicates originating from the IGSL. To test for duplicate sources we cross-matched the Gaia catalogue against itself, identifying pairs of sources that could not possibly be real doubles, either because they fell within one pixel (59 mas) of each other or because their positions were consistent to within 5σ. Only reference epoch positions were used, with no corrections for high proper motion stars.
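The duplicate criterion can be sketched as follows. This is an illustration under stated assumptions, not the implementation used: each source is a dictionary with hypothetical keys `ra`, `dec` (degrees) and `pos_err` (a single positional uncertainty in mas), the combined uncertainty is taken as the quadrature sum of the two, and the brute-force pair loop is only suitable for small samples (a spatial index would be needed for a full catalogue).

```python
import math

MAS_PER_DEG = 3.6e6
PIXEL_MAS = 59.0  # one pixel, as quoted in the text


def separation_mas(ra1, dec1, ra2, dec2):
    """Small-angle separation in mas between two positions given in degrees."""
    # Wrap the right-ascension difference into [-180, 180) degrees.
    dra = (ra1 - ra2 + 180.0) % 360.0 - 180.0
    dra *= math.cos(math.radians(0.5 * (dec1 + dec2)))
    return math.hypot(dra, dec1 - dec2) * MAS_PER_DEG


def is_duplicate_pair(a, b, n_sigma=5.0):
    """True if two sources lie within one pixel or within n_sigma of each other."""
    sep = separation_mas(a["ra"], a["dec"], b["ra"], b["dec"])
    sigma = math.hypot(a["pos_err"], b["pos_err"])  # assumed combination rule
    return sep < PIXEL_MAS or sep < n_sigma * sigma


def duplicate_pairs(sources):
    """Return index pairs (i, j), i < j, of suspected duplicates (O(n^2))."""
    n = len(sources)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if is_duplicate_pair(sources[i], sources[j])
    ]
```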

It was found that the pre-DR1 Gaia catalogue contained 71 million sources with a counterpart within one pixel or 5σ. Most appeared in pairs, but some were clustered in groups of up to eight duplicates. Up to one third of sources around G ~ 11 mag were affected, far more than at much brighter or much fainter magnitudes.

For Gaia DR1, we removed all but one source from each group of close matches, selecting the source with the most precise parallax (if present), and breaking ties by choosing the source with the most observations, followed by the best position or photometric error. Because duplicated sources may have compromised astrometry or photometry (e.g. if a source was duplicated because of a cross-matching problem), the surviving sources were marked with the duplicated_source flag in the final catalogue (35 951 041 TGAS or secondary sources).
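The selection rule lends itself to a lexicographic sort key, as in the sketch below. The dictionary keys (`parallax_error`, `n_obs`, `pos_err`) are hypothetical names chosen for this example, and only the position error is used for the last tie-break; the photometric error mentioned in the text is left out for brevity.

```python
import math


def pick_survivor(group):
    """Select the source to keep from a group of close matches.

    Priority: smallest parallax uncertainty (sources without a parallax
    rank last), then largest number of observations, then smallest
    position uncertainty.
    """
    def key(source):
        plx_err = source.get("parallax_error")
        return (
            math.inf if plx_err is None else plx_err,
            -source["n_obs"],
            source["pos_err"],
        )

    best = min(group, key=key)
    # The survivor is flagged because its data may be compromised.
    return dict(best, duplicated_source=True)
```

Using `math.inf` for a missing parallax uncertainty ensures that any source with a parallax is preferred over one without.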

Two examples of the effect of the filtering of duplicate sources are shown in Figs. 2 and 3. The result of the filtering as done for Gaia DR1 is illustrated in Figs. 2 and 3c. The artefacts in Figs. 3a and b are the traces of the overlaps of photographic plates used in some of the surveys from which the IGSL catalogue was built, causing an excess of duplicate sources in Gaia DR1.

Number of pairs of sources vs. their angular separation in the field (l = 350°, b = 0°) before filtering (red) and after (green). The line corresponds to a random distribution up to 10′′ of the latter.

3.2.2. Gaia DR1 after filtering

Although it is estimated that about 99% of the duplicates have been removed, spurious sources may still remain in Gaia DR1. Formal uncertainties on positions of these duplicates may have been underestimated, and the 5σ criterion on positional difference used for rejection may not have been large enough. This underestimation was suspected in the following way: a pair made of a source and its duplicate actually refers to a single source which distributed a part of its observations between the two (depending on the orientation of the satellite scans). We used this property to compare the positions and magnitudes in pairs and found that uncertainties were underestimated by a factor of 2 for positions and 4 for magnitudes. While this result cannot be extrapolated to all normal stars (i.e. not duplicated), this gives at least an upper limit and justifies the presence of the duplicated_source flag.
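One way to estimate such an underestimation factor from pairs is to look at the dispersion of the normalised differences: if the two members of a pair measure the same quantity with correct, independent uncertainties σ1 and σ2, then (x1 − x2)/√(σ1² + σ2²) has unit dispersion, and a larger dispersion gives the factor by which the uncertainties are too small. The sketch below uses a robust dispersion estimate (1.4826 times the median absolute deviation); this estimator, the independence assumption, and the function name are choices made for illustration and are not necessarily those used for the result quoted above.

```python
import math
import statistics


def uncertainty_inflation(pairs):
    """Robust dispersion of normalised differences within pairs.

    pairs -- iterable of (x1, sigma1, x2, sigma2) for a source and its
             duplicate, x being a position coordinate or a magnitude.
    A value close to 1 means the formal uncertainties are realistic.
    """
    z = [(x1 - x2) / math.hypot(s1, s2) for x1, s1, x2, s2 in pairs]
    centre = statistics.median(z)
    mad = statistics.median([abs(v - centre) for v in z])
    # 1.4826 * MAD estimates the standard deviation of a Gaussian sample.
    return 1.4826 * mad
```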

A comparison with the Washington Visual Double Star Catalogue (WDS, Mason et al. 2001) confirms that some duplicates remain, as can be seen from the excess of stars with a near zero separation in the bottom left of Fig. 19b.

In high density fields, it is possible to get several stars very close to each other by chance only, i.e. optical doubles. Trying to remove more duplicates would lead to removing actual stars by mistake. The adopted filtering may actually be a reasonable compromise until the expected improvement is implemented in Gaia DR2.

4. Sky coverage and completeness of DR1

Gaia DR1 is based on 14 months of data only. As a result, some regions, especially at low ecliptic latitudes, have been poorly observed, both in terms of the number of observations and of the coverage in scanning directions; see e.g. Fig. 2 of Gaia Collaboration (2016a). Stars with fewer than five focal plane transits have been filtered out; stars with extremely blue or red colours were filtered out during the photometric calibration.

The tests presented in this section aim at a better characterisation of the object content of DR1, including TGAS, regarding the homogeneity of the sky distribution and the small-scale completeness of the Catalogue. These tests have been performed from different points of view, for various populations, and using various inputs and methods: using the characteristics of Gaia data only (internal tests), using external data (all-sky external catalogues, detailed catalogues of specific samples of stars or of specific regions of the sky), or using Galaxy models.

4.1. Limiting magnitude

The completeness of Gaia DR1 is the result of a complex interplay between high stellar densities implying a possible overlap of the images on the focal plane, the scanning law defining the number of times a region was observed, and data processing. Owing to limited telemetry resources, the star images sent to ground followed a decision algorithm which is a complex function of the magnitude. In addition, at the end of the data processing a filter was applied to discard poor solutions in the astrometry and in the photometry. As a result, the density distribution over the sky in the final Catalogue is not a simple function of the stellar density, as usually expected.

Initial, indirect information about the completeness is obtained by the limiting magnitude of the Catalogue. Sky variations of the 0.99 quantile of the G magnitude are shown in Fig. 4 for TGAS and the whole Catalogue. Concerning the latter, it appears that Gaia will easily reach G > 21 at the end of mission in a significant fraction of the sky, even if this is still very limited for Gaia DR1; it seems however that one magnitude has been lost in the underscanned regions, and two magnitudes in the Baade window. The limiting magnitude of TGAS stars also has an amplitude of two magnitudes over the sky; the brightest regions are those that also have some astrometric deficiencies, as shown below.
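A map of this kind can be built by grouping sources by sky pixel and taking the 0.99 quantile of G in each pixel. The following sketch assumes that a pixel index (for instance a HEALPix index) has already been computed for each source, and uses a nearest-rank quantile; both the input layout and the quantile convention are assumptions of this example.

```python
import math
from collections import defaultdict


def limiting_magnitude(sources, q=0.99):
    """Return {pixel: q-quantile of G} for an iterable of (pixel, g) tuples."""
    by_pixel = defaultdict(list)
    for pixel, g in sources:
        by_pixel[pixel].append(g)

    result = {}
    for pixel, mags in by_pixel.items():
        mags.sort()
        # Nearest-rank quantile: smallest value with at least a fraction q
        # of the sample at or below it.
        rank = max(1, math.ceil(q * len(mags)))
        result[pixel] = mags[rank - 1]
    return result
```

Using a high quantile rather than the maximum makes the estimate less sensitive to a few isolated faint detections in a pixel.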

4.2. Overall large-scale coverage and completeness

4.2.1. All-sky coverage and completeness of TGAS

The overall TGAS content has been tested with respect to the Tycho-2 (Høg et al. 2000b) and Hipparcos Catalogues (Perryman et al. 1997; ESA 1997) in order to detect possible duplicate entries and to characterise missing entries. TGAS contains 79% of the Hipparcos and 80% of the Tycho-2 stars. One of the reasons for the missing stars is a poor astrometric solution, as all sources with a parallax uncertainty above 1 mas were not kept in TGAS (validation tests done on preliminary data had indeed shown several problems associated with these stars). The sky distribution of the Tycho-2 sources not present in TGAS is presented in Fig. 5; it shows the impact of the Gaia scanning law via the number of observations and the orientation of the scans being correlated with the solution reliability criteria filters applied for Gaia DR1.

The detail of the histogram in Fig. 1 shows that stars fainter than 10.5 mag have suffered a higher loss than average; a likely reason is the occasional source duplication described in Sect. 3, which affects these magnitudes more. The loss is clearer for stars brighter than 6 mag, partly due to an insufficient number of bright calibration sources for the broad-band photometers, so no colour was available. The G magnitude calibration includes a colour term (Carrasco et al. 2016), so a missing colour means that no G-band photometry was produced, and the source did not enter the release. Stars brighter than about 5 mag, and a fraction of sources fainter than this, were also among the sources not kept in TGAS owing to the bad quality of their astrometric solution.

TGAS completeness has also been tested with respect to high proper motion stars: a selection of 1098 high proper motion (HPM) stars has been made with SIMBAD on stars with a Tycho or HIP identifier and a proper motion higher than 0.5 arcsec yr-1 (proper motions mainly from Tycho-2 and Hipparcos). Forty per cent of this selection is not found in the TGAS solution, in particular bright stars. All stars with a proper motion higher than 3.5 arcsec yr-1 are absent from TGAS. Stars with a proper motion higher than 1 arcsec yr-1 in TGAS have been confirmed to have a high proper motion in SIMBAD.

4.2.2. All-sky coverage of Gaia DR1 from external data

The overall sky coverage of Gaia DR1 has been tested by comparison with two deeper all-sky catalogues: 2MASS (Skrutskie et al. 2006) and UCAC4 (Zacharias et al. 2013). The tests performed here use the cross-match between Gaia DR1 and these two catalogues provided by the Gaia Archive (Marrese et al. 2016). The variation over the sky of four key parameters is checked: the number of cross-matched sources, the mean number of neighbours (stars which could have been considered as cross-matched, but for which the cross-match was not as good as for the selected source, hereafter the best neighbour), the number of Gaia stars with the same best neighbour, and the number of Gaia sources without any matches. Finally, a random subset of about 5 million sources has been selected in order to check the different properties, if present, in magnitude, colour, proper motion, goodness of fit, etc., of the above four categories of stars.

UCAC4.

Only 5% of the UCAC4 catalogue does not have a match in Gaia DR1. The sky distribution of these unmatched sources (Fig. 6a) shows the footprint of the Gaia scanning law. Conversely, 7% of the UCAC4 sources appear more than once in the cross-match table. We will refer to them as multiple-matches, but this does not mean that they are all duplicate Gaia entries as discussed in Sect. 3.2.1: the Gaia resolution is much better than that of ground-based instruments, so that multiple objects may appear where ground-based catalogues see one object only. The multiple-matches are distributed mainly in high-density regions, as expected, but their sky distribution also shows the Gaia scanning law footprint (Fig. 6b). There are 258 605 sources with G < 14 in the Gaia catalogue which do not appear in UCAC4, which is supposed to be complete to about magnitude R = 16; their sky distribution (Fig. 6c) follows the Gaia scanning law footprint and recalls the footprint of the Tycho-2 stars not in TGAS (Fig. 5). A detailed inspection of these sources indicates that a large portion of them are actually present in the UCAC4 catalogue, but that the cross-match could not be done, the positional differences being beyond the astrometric uncertainties. This may be linked to the fact that a large portion of these sources have been measured along uneven scan orientations.

2MASS.

For this test, we selected 2MASS stars with photometric quality flag AAA and magnitude J < 14 (a limit corresponding roughly to V < 20 for AV < 5). As expected, most of the missing sources are located in high extinction regions along the Galactic plane, but some extra features are also apparent and show the Gaia scanning law footprint (Fig. 7a). The 2MASS multiple-matches have a sky pattern (Fig. 7b) similar to that observed with UCAC4; the main concentration is, as expected, along the dense areas, with a fainter Gaia scanning law footprint superimposed.

Quasars.

Quasars are essential objects for various reasons and several tests verify that they have been correctly observed by Gaia and identified. The first test compares Gaia DR1 quasars with ground-based quasar compilations: the GIQC (Andrei et al. 2014), LQAC3 (Souchay et al. 2015), and SDSS DR10 (Pâris et al. 2014) catalogues. It is a check for completeness, duplication, and magnitude consistency. While the quasars were also affected by the duplicated sources issue (Sect. 3.2.1), the filtering appears to have removed the duplicates effectively. It was found that 81% of GIQC, 53% of LQAC3, and 11% of SDSS quasars are present in Gaia DR1, a ratio that reaches 93% for the LQAC3 sources with a magnitude B brighter than 20.

Galaxies.

For galaxies, the cross-match has been done with SDSS DR12 sources (Alam et al. 2015) with a galaxy spectral classification. The properties of cross-matched galaxies are compared to those of missing galaxies (magnitudes, redshift, axis-ratios, and radii). Unfortunately, only ~0.2% of the SDSS galaxies are present in Gaia DR1 because of the different filters applied. Some large resolved galaxies can still have multiple detections associated with them, tracing their shape.

4.2.3. Completeness from comparison with a Galaxy model

Relative star count differences between Gaia DR1 and the GOG18 simulation in different magnitude bins, from 12 < G < 13 to 19 < G < 20 in steps of one magnitude, in Galactic coordinates. In addition to the prominent feature of the Magellanic Clouds (absent from the Galaxy model) and inadequacies of the 3D extinction model in the Galactic plane, the Gaia incompleteness around the ecliptic plane due to the scanning law becomes clear at G > 16.

Star counts per square degree as a function of magnitude in several (l, b) directions. Crosses linked with lines are for Gaia DR1 data, filled blue circles are simulations from GOG18. Error bars represent the Poisson noise for one square degree field. The bottom row shows two regions impacted by the scanning law and the filtering of stars with a low number of observations.

Since Gaia DR1 only contains G magnitudes and positions, the validation with models consists of the comparison between the distribution of star densities over the sky and a realisation of the Besançon Galactic Model (BGM, Robin et al. 2003), namely version 18 of the Gaia Object Generator (hereafter GOG18; Luri et al. 2014). The simulation contains 2 billion stars, including single stars and multiple systems, and incorporates a model for the expected errors on Gaia photometric and astrometric parameters.

In the validation process, star counts as a function of positions and in magnitude bins have been compared with the model (Fig. 8). Systematic differences in Galactic plane fields are mostly due to 3D extinction model problems, but could also be due to other inadequacies of the model (such as local clumps not taken into account in a smooth model). These systematics are seen even in bright magnitude bins. On the other hand, differences at intermediate latitudes in the region of the Magellanic Clouds are not to be considered because these galaxies have not been included in this GOG catalogue. There is no other clear difference between data and model that could warn us about the quality of the data at magnitudes brighter than 16. However, at fainter magnitudes, some regions have significantly fewer stars than expected from the model. These regions are located specifically around l = 200−250°, b = 30−60° and l = 30−80°, b = −60 to −30°. At magnitudes fainter than 19, regions all along the ecliptic suffer from this smaller number of sources as a result of the scanning law and the filtering of objects with too few observations. In addition, at G > 16 some discrepancies appear in the outer bulge regions, which might be due to incompleteness of the data when the field is crowded (see Sect. 4.3.1 and Fig. 10).

To estimate the completeness in specific fields in greater detail, we compared histograms of star counts from Gaia DR1 and the GOG18 simulation as a function of magnitude. Figure 9 shows the histograms in some regions of the Galactic plane, at intermediate latitudes, and at the Galactic poles. In the Galactic plane (Fig. 9a) the star counts show a drop in the Gaia data at magnitudes brighter than in the model. This could be due a priori to an inadequate extinction model or model density laws, or to incompleteness in the Gaia data at faint magnitudes due to undetected or omitted sources. Since the bright magnitude counts are fairly well fitted, the second hypothesis is more probable. This is also pointed out by comparison with previous catalogues. In the outer Galaxy, the GOG18 simulation is probably too rough to model the Galactic structures, as can be seen in the fields at longitude 180° where some substructures such as the Monoceros ring or the anticentre overdensity might contribute. In Fig. 9b, the field at longitude 43−47° and latitude 0° is for two lines of sight, where the model (in blue) gives similar star counts for the two lines while the data (in black) do not. We believe that this is due to varying extinction, which is underestimated in the model for these specific fields.

Over the whole sky, up to magnitude 18, there is a small relative difference (from less than 3% at magnitude 12 to 10% at magnitude 18). Between 18 and 19 the relative difference is 15%. In the range 19 to 20, the difference is 25% on average. At high latitudes, and specifically at the Galactic poles, the agreement between the model and the data is also quite good. The regions where the Gaia data seem to suffer from incompleteness are located in the specific regions around l = 225°, b = 45° and l = 45°, b = −45°, most probably related to the filtering of sources with a low number of observations; however, the data are probably complete up to G = 16 in those regions (l = 225°, b = 45°), although the incompleteness could also occur at brighter magnitudes in some areas (at G = 14 in l = 45°, b = −45°).
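The relative differences quoted per magnitude bin correspond to a simple comparison of binned counts. A minimal sketch is given below, under the assumption that the relative difference is defined as (N_data − N_model) / N_model and that the model counts are taken as exact when propagating the Poisson noise of the data; the function name and the input format are illustrative.

```python
import math

def relative_count_difference(data_mags, model_mags, edges):
    """Return (lo, hi, rel_diff, poisson_err) for each magnitude bin.

    rel_diff = (N_data - N_model) / N_model (assumed definition);
    poisson_err = sqrt(N_data) / N_model, the model being taken as exact.
    Bins with no model star get (lo, hi, None, None).
    """
    def histogram(mags):
        counts = [0] * (len(edges) - 1)
        for m in mags:
            for k in range(len(counts)):
                if edges[k] <= m < edges[k + 1]:
                    counts[k] += 1
                    break
        return counts

    n_data, n_model = histogram(data_mags), histogram(model_mags)
    rows = []
    for k, (nd, nm) in enumerate(zip(n_data, n_model)):
        if nm == 0:
            rows.append((edges[k], edges[k + 1], None, None))
        else:
            rows.append((edges[k], edges[k + 1],
                         (nd - nm) / nm, math.sqrt(nd) / nm))
    return rows
```

With edges running from 12 to 20 in steps of one magnitude, this gives one row per bin of Fig. 8; a negative value indicates fewer stars in the data than in the model.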

These comparisons show that the Gaia data have a distribution over the sky and as a function of magnitude which is close to that expected from a Galaxy model in most regions of the sky. However, they point towards an incompleteness at magnitudes fainter than 16 in some specific areas that are less observed due to the scanning law, and because sources with a small number of observations have been filtered out. The completeness is also reduced in the Galactic plane due to undetected or omitted sources in crowded regions. This is expected to be solved in future releases where a larger number of observations will be available.

4.3. Small-scale completeness of Gaia DR1

4.3.1. Illustrations of underobserved regions

Regions that are empty due to the threshold on the number of observations are illustrated in Fig. 10a near the Galactic centre; regions like these that are underscanned are not frequent and have a limited area, below 0.1 square degree (see also Gaia Collaboration 2016a, Sect. 6.2). The field shown in Fig. 10b near the bulge suffered from limited on-board resources, which created holes in the sky coverage, as shown also for globular clusters in Fig. 13.

4.3.2. Tests with respect to external catalogues

The small-scale completeness of Gaia DR1 and its variation with the sky stellar density have been tested in comparison with two catalogues: Version 1 of the HST Source Catalogue (HSC, Whitmore et al. 2016) and a selection of fields observed by OGLE (Udalski et al. 2008).

Hubble Source Catalogue.

The HSC is a very non-uniform catalogue based on deep pencil-beam HST observations made using a wide variety of instruments (Wide Field Planetary Camera 2 (WFPC2), Wide Field Camera 3 (WFC3), and the Wide Field Channel of the Advanced Camera for Surveys (ACS)) and observing modes. The spatial resolution of Gaia is comparable to that of Hubble and the HSC is therefore an excellent tool for testing the completeness of Gaia DR1 on specific samples of stars. To check the completeness as a function of G, we computed an approximate G-band magnitude from HST F555W and F814W magnitudes (G_HST) using theoretical colour-colour relations derived following the procedure in Jordi et al. (2010).

The first test was made in a crowded field of one degree radius around Baade’s Window. Nearly 13 000 stars were considered, observed in both the F555W and F814W HST filters with either WFPC2 or WFC3.

The second test was made on samples of stars observed with one of the three HST cameras, using the red filter F814W and either F555W or F606W. Sources were selected following the recommendations of Whitmore et al. (2016) to reduce the number of artefacts. Moreover, only stars with an absolute astrometric correction flag in HST set to “yes” have been selected, leading to a typical absolute astrometric accuracy of about 0.1′′. The size of the resulting samples varies from 1600 stars for ACS-F555W to nearly 120 000 stars for ACS-F606W, going through 15 000−23 000 stars for the four other samples. The completeness of Gaia observations for these samples, position differences, and colour-colour relations have been tested.

The completeness results of both tests are presented in Fig. 11. In Baade’s Window, the completeness follows the expectations for DR1: in this very dense area, on-board limitations lead to a brighter effective magnitude limit. The all-sky result (using here 128 000 ACS stars with F606W < 20 mag) is at first more surprising, but in fact bright source observations with HST are quite rare and are done mainly in very dense areas (which need the HST resolution) such as globular clusters, which also suffer from Gaia on-board limitations. We further checked this interpretation by using individual HST observations and images around a few positions. The test performed in a low density area around the dwarf spheroidal galaxy Leo II (Lépine et al. 2011) leads to a completeness at magnitude 20 of nearly 100%, while a test in a high density area around the globular cluster NGC 7078 (Bellini et al. 2014) leads to a completeness worse than the one presented in Fig. 11.

Completeness against density in the field of three chosen GCs in different magnitude ranges. Fields such as NGC 1261 have a median of 220 observations, allowing for a much better completeness in the denser regions than NGC 6752 (40 observations).

Stellar distribution for six chosen GCs, colour-coded by number of G observations for each star. Top row: examples of holes caused by limited on-board resources or bright stars. Bottom row: in some regions patterns are visible corresponding to stripes where no stars had a sufficient number of observations.

HST observations of Globular Clusters.

We ran detailed completeness tests within globular clusters using HST data specifically reduced for the study of those crowded fields. We used 26 globular clusters for which HST photometry is available from the archive of Sarajedini et al. (2007; see Table 1). The data for all globular clusters (GCs) were acquired with the ACS and contain magnitudes in the bands F606W and F814W. The observations cover fields of 3 arcmin × 3 arcmin size. For M 4 (NGC 6121), data by Bedin et al. (2013) and Malavolta et al. (2015) taken in the HST project GO-12911 in WFC3/UVIS filters were used. For this test, the photometric transformations from the HST bands to the Gaia G band were adjusted for each cluster to fit a sample of bright stars in order to avoid issues due to variations in metallicity and extinction.

High quality relative positions and relative proper motions are available for these clusters. When artificial star experiments were available in the original HST catalogue (GCs marked with * in Table 1), the completeness of HST data has been evaluated by comparing the number of input and recovered artificial stars in each spatial bin. We find the completeness of the HST data to be well above 90% and close to 100% in all cases for stars brighter than V = 21, except for the very crowded cluster NGC 5139 (OmegaCen). The GCs are chosen to present different levels of crowding down to G ~ 22. In general, HST data cover the inner core of the clusters where the stellar densities are above 10^6 stars per square degree in almost all regions (above 30 million in many cases, and up to 110 million stars per square degree in the core of NGC 104/47 Tuc). In a few cases, lower densities are reached in the external regions. We therefore expect Gaia to be very severely incomplete in most of the regions studied in this test. The HST magnitudes were converted to Gaia G magnitudes using the same transformations as previously used between G and F814W, F606W but on the Vega photometric system.

For each GC, the total density of stars in square bins of 0.008 deg ≈ 0.5 arcmin was evaluated; then, in each bin and for each magnitude slice, we counted the number of stars present in the HST photometry and in Gaia DR1.
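The binning just described can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: positions are taken as local tangent-plane offsets in degrees, each HST star is assumed to carry a boolean flag telling whether it has a Gaia DR1 counterpart, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict

BIN_DEG = 0.008  # bin side quoted in the text (about 0.5 arcmin)

def completeness_by_density(hst_stars, mag_range, bin_deg=BIN_DEG):
    """Completeness of Gaia with respect to HST, per spatial bin.

    hst_stars: list of (x_deg, y_deg, g_mag, in_gaia) tuples (assumed layout).
    mag_range: (g_min, g_max) magnitude slice.
    Returns a sorted list of (hst_density_per_sq_deg, completeness), one
    entry per bin containing at least one HST star in the slice.
    """
    total = defaultdict(int)      # all HST stars in the bin (sets the density)
    in_slice = defaultdict(int)   # HST stars in the magnitude slice
    recovered = defaultdict(int)  # of those, stars also present in Gaia
    g_min, g_max = mag_range
    for x, y, g, in_gaia in hst_stars:
        key = (int(x // bin_deg), int(y // bin_deg))
        total[key] += 1
        if g_min <= g < g_max:
            in_slice[key] += 1
            recovered[key] += bool(in_gaia)
    area = bin_deg ** 2
    return sorted(
        (total[key] / area, recovered[key] / in_slice[key])
        for key in in_slice
    )
```

Plotting the second element against the first for several magnitude slices gives a completeness-versus-density curve of the kind shown for the clusters; the density is that of the HST data themselves and is therefore a lower limit where HST is itself incomplete.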

The completeness of Gaia DR1 is shown in Fig. 12 for three clusters as a function of the stellar density observed in the HST data. Different crowded regions present different degrees of completeness, depending on the number of observations in that region. In addition, holes are found around bright stars (typically for G< 11−12 mag) and entire stripes are missing, as illustrated in Fig. 13.

In less crowded regions, such as in the field around NGC 5053 where stellar densities are under 1 million per square degree, the completeness is very high, as shown in Fig. 14.

OGLE catalogues.

To further test the variation of the completeness with sky density, we looked at the completeness versus OGLE data using a few fields in the OGLE-III disc (Szymański et al. 2010), OGLE-III Bulge (Szymański et al. 2011), and OGLE-IV LMC (Soszyński et al. 2012) surveys. A G-band magnitude was computed from OGLE V and I magnitudes (G_OGLE) using an empirical relation derived from the matched Gaia/OGLE sources (two relations were derived, one for OGLE-III and one for OGLE-IV, due to their different filters). The stellar densities were estimated from the OGLE data themselves; therefore, they are certainly slightly underestimated. As can be seen in Fig. 15, the completeness is not only dependent on the sky density, but also on the sky position linked to the Gaia scanning law, as we saw above. In the bulge fields, the completeness may show a drop around G = 15 (as seen in Fig. 15b, confirming the feature of Fig. 11a) because the reddest stars have not been kept in Gaia DR1 (because of filtering at calibration level) and those missing stars correspond to the reddened red giant branch of the bulge (Fig. 15c).

4.4. Completeness and angular resolution

Although there are no doubts about the excellent spatial angular resolution of Gaia, the effective angular separation in Gaia DR1 can be questioned, for example due to possible cross-match problems.

4.4.1. Distribution of the distances between pairs of sources

A simple way of checking the angular resolution of a catalogue is to look at the distribution of the distances between pairs of sources. For a random star field with ρ stars per unit area, a ring of radius r centred on a given star will contain ρ2πrΔr stars, where Δr is the width of the ring. For a sample of N stars, we will have NρπrΔr unique pairs at that separation.
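The expectation NρπrΔr can be compared with a direct pair count in a simulated random field. The sketch below is illustrative only: the function names are assumptions, the pair count is a brute-force O(N²) loop suitable for small samples, and the toy field in the demonstration is a unit square, so stars near the border have truncated rings and the count falls slightly below the expectation.

```python
import math
import random

def expected_pairs(n_stars, density, r, dr):
    """Expected number of unique pairs at separation r in a ring of width dr,
    for a random field: N * rho * pi * r * dr."""
    return n_stars * density * math.pi * r * dr

def count_pairs(points, r_lo, r_hi):
    """Brute-force count of unique pairs with r_lo <= separation < r_hi."""
    n = 0
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            d = math.hypot(points[j][0] - xi, points[j][1] - yi)
            if r_lo <= d < r_hi:
                n += 1
    return n

if __name__ == "__main__":
    rng = random.Random(1)
    side, n_stars = 1.0, 1500          # toy field, arbitrary units
    field = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_stars)]
    r, dr = 0.010, 0.002
    print("counted :", count_pairs(field, r, r + dr))
    print("expected:", round(expected_pairs(n_stars, n_stars / side**2, r + dr / 2, dr), 1))
```

A deficit of counted pairs with respect to this expectation at small r is the signature of a limited effective angular resolution that is looked for in the real data.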

We have looked at two fields, a dense field of radius 2° centred at (l,b) = (330°,−4°) with 400 000 stars per square degree, and a sparse field of radius 15° (l = 260°, b = −60°) with 2900 stars per square degree, scaled to produce the same number of sources. Figure 16 shows the distribution of G magnitudes in these two fields. The slopes differ because the dense field may integrate disc stars over a greater distance, with extinction that is not that high at b = −4°, whereas the sparse field at higher latitude quickly leaves the disc and integrates the thick disc, which is less dense.

G magnitudes for a dense field (l = 330°, b = −4°, ρ = 2°) and a sparse field (l = 260°, b = −60°, ρ = 15°). The sparse field has been scaled to give about the same number of sources as the dense field.

The resulting distributions of distance between sources are shown in Fig. 17. For the dense field (left) the distribution is close to random for separations above 4′′, but drops for smaller separations with a sharp drop at 2′′. In the sparse field, which is much larger and not as uniform, the sharp drop between 2′′ and 2.5′′ is also seen, but not the drop at 3.5′′. In order to improve the uniformity of the sparse field, three small areas around galaxies and clusters were left out when deriving the distribution.

To better understand these results, we made a simple simulation of a dense, random field, starting with 500 000 stars in a square degree. We then removed sources which had very poor chances of ever getting a clean photometric observation. The photometric windows are quite large, 2.1′′ in the across-scan direction and a diagonal size of 4.1′′. If a source had either a significantly brighter neighbour within 2.1′′ or at least two such neighbours between 2.1′′ and 4.1′′, it was removed. We took neighbours brighter by more than 0.2 mag. The criterion of two bright neighbours is very simplistic and is taken to represent the cases where a star is unlikely to ever get a clean photometric observation, irrespective of the scanning direction. Figure 18a shows the resulting distribution, which reproduces many of the same characteristics seen in the real data (separations below 4′′) shown in Fig. 17a.
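The removal criterion described above can be written down compactly. The following sketch applies it to a small patch of sky; it is not the simulation used for Fig. 18. The function names are hypothetical, the neighbour search is brute force, and the magnitude distribution in the demonstration (uniform between 15 and 20) is an arbitrary assumption, so the retained fraction it prints is not expected to match the 64% quoted for the actual simulation.

```python
import math
import random

def survives(star, field, inner=2.1, outer=4.1, dmag=0.2):
    """Apply the criterion of the text to one star.

    star, field entries: (x_arcsec, y_arcsec, mag).
    The star is removed if a neighbour brighter by more than dmag lies
    within `inner` arcsec, or if at least two such neighbours lie between
    `inner` and `outer` arcsec.
    """
    x, y, mag = star
    n_outer = 0
    for ox, oy, omag in field:
        if omag >= mag - dmag:      # not significantly brighter (includes the star itself)
            continue
        d = math.hypot(ox - x, oy - y)
        if d <= inner:
            return False
        if d <= outer:
            n_outer += 1
    return n_outer < 2

def filter_field(field, **kwargs):
    """Return the stars that pass the criterion."""
    return [s for s in field if survives(s, field, **kwargs)]

if __name__ == "__main__":
    rng = random.Random(2)
    side = 120.0                                   # patch side in arcsec
    n = round(500_000 / 3600.0**2 * side**2)       # 500 000 stars per square degree
    field = [(rng.uniform(0, side), rng.uniform(0, side), rng.uniform(15, 20))
             for _ in range(n)]
    kept = filter_field(field)
    print(len(field), "stars simulated,", len(kept), "retained")
```

For a full square degree the quadratic neighbour search becomes slow, and a spatial index (grid or tree) would be needed.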

Simulation of the distribution of source-to-source distances in a dense, random field (left) after applying selection criteria similar to Gaia DR1. The fraction retained is shown in the right panel. The field has a true source density of 500 000 stars per square degree, but only 322 000 remain after applying the selection criteria.

We can therefore expect that the population of pairs closer than 2′′ consists of sources of similar brightness, where in a given transit either source had a fair chance of being detected as the brighter source and therefore got a full observation window instead of the truncated window assigned to the fainter detection in the case of overlapping windows. For a brief description of the on-board conflict resolution see e.g. Fabricius et al. (2016, Sect. 2). There is, of course, still the risk that a few of the closest pairs are in reality two catalogue instances of the same source (duplicates) as discussed in Sect. 3.2.1.

We can now further understand the drop between 2′′ and 4′′ as being due to conflicts between the photometric windows for the sources. This drop is not present in the sparse field where the chance of having two disturbing sources in the right distance range is much smaller than in a dense field.

An important lesson from the simulation is illustrated in the second panel of Fig. 18. Of the original 500 000 stars in the simulation only 322 000 (64%) survived the selection criteria described above. This has a significant impact for G ≳ 19.

Below a 2′′ separation, the dense field shows the expected small fraction of field stars of similar magnitude. However, the sparse field shows a peak below half an arcsecond, suggesting a high frequency of binaries in that area. We looked in more detail at the 73 pairs brighter than 12 mag to see if the Tycho Double Star Catalogue (TDSC, Fabricius et al. 2002) could confirm the duplicity. Of the 65 pairs found in Tycho-2, 47 are listed as doubles in TDSC, while 7 may be doubles missing in TDSC, and 11 are possibly duplicated Gaia sources. This small test thus indicates that the majority of the Gaia DR1 doubles are actual double stars.

4.4.2. Tests of the angular resolution using the WDS

The spatial resolution of the Gaia catalogue has also been tested using the Washington Visual Double Star Catalogue (WDS, Mason et al. 2001). A selection was made of sources composed of only two components, with the magnitudes for both the primary and the secondary brighter than 20 mag and a separation smaller than 10′′. We selected only the sources that had been observed at least twice with differences between the two observed separations smaller than 2′′ and magnitude differences smaller than 3 mag. In addition, we did not select sources with a note indicating an approximate position (!), a dubious double (X), uncertain identification (I) nor photometry from a blue (B) or near-IR band (K). The resulting selection contains 43 580 systems. The completeness of Gaia DR1 versus the observation of these systems shows the performance of Gaia detection and observation of double systems as a function of the separation and magnitude difference between the components.

The results are illustrated by a plot of completeness versus separation presented in Fig. 19a. As discussed above, the angular resolution of Gaia DR1 degrades rapidly below 4′′. Although the filtering of pre-DR1 removed most of the duplicated sources, the excess of points with a very small Gaia separation and a WDS separation below about 1′′ in Fig. 19b shows that a few duplicates (~0.5% of the WDS sample) may still be present.

4.5. Summary of the Catalogue completeness

Careful filtering has been done on the main Gaia database to avoid spurious stars, for example by requiring a minimum of five focal plane transits for a star to be published in Gaia DR1. Owing to the scanning law, and the resulting varying number of observations, some sky regions have a poor coverage or are, locally, not covered at all. On the positive side, particular attention has been devoted to avoiding spurious stars or ghosts which could be produced in the surroundings of bright stars, or at least our statistical tests did not detect special features due to false detections.

The limiting magnitude is therefore very inhomogeneous over the sky, and the completeness as a function of magnitude is inhomogeneous as well: starting from G = 16 some sky zones clearly appear incomplete. Dense areas are, as expected, more affected due to the window and gate conflicts and the lack of on-board resources (Gaia Collaboration 2016b). High extinction regions also suffer from an increased colour dependent completeness issue due to the removal of the very red sources by the photometric pipeline (van Leeuwen et al. 2017).

Duplicate sources, one of the main problems of pre-DR1, have mostly been removed, although not completely, and their effect on the astrometric or photometric properties of a fraction of bright stars is probably still present.

Owing to the preliminary nature of this data release the effective angular resolution of the Gaia DR1 data (not the angular resolution of the Gaia instrument itself, which is as expected) is also degraded, with a deficit of close doubles. In sparse regions, however, the spatial capabilities of Gaia may already surpass the ground-based ones.

As for TGAS, a significant fraction (20%) of Tycho-2 stars is not present, also due to the scanning coverage and to calibration problems, in particular at the bright end. A large fraction of high proper motion stars is missing, as well as a fraction of redder or fainter stars.

It thus appears that Gaia DR1 is not complete in any sense (magnitude, colour, volume, resolution, proper motion, duplicity, etc.), and any statistical analysis should be careful to produce unbiased results.

The current completeness is, however, not representative of the future Gaia capabilities. This will be corrected at the next data release, but it triggers another warning for the users preparing star lists: the source_id list present in DR2 (and further releases) may be partly different from Gaia DR1. On the one hand, the gains expected in the cross-matching performance (at small angular separations) and the higher number of transits (i.e. fewer stars with not enough observations to be published) imply that many more stars will be present in DR2. On the other hand, a significant number of source_id may disappear, caused by both splitting and merging sources.

5. Multidimensional analysis

5.1. Description of statistical methods

To understand whether the statistical properties of the Gaia DR1 data set are consistent with expectations, we compared the distribution of the data (and in particular their degree of clustering) to suitable simulations for all 2D subspaces. In the case of TGAS, the comparison data is the simulation designated as “Simu-AGISLab-CS-DM18.3cor” (Sect. 2.1.2), while for Gaia DR1 it is GOG18.

To this end, we use the Kullback-Leibler divergence (KLD)

$$p_{\rm KLD} = \int \mathrm{d}^n x \; p(x) \, \log \frac{p(x)}{q(x)}, \qquad (1)$$

where x is a (sub)space of observables, p(x) is the distribution of the observables in the data set, and q(x) is a comparison distribution. When $q(x) = \prod_i p_i(x_i)$, i.e. the product of the marginalised 1D distribution of each of the observables, the KLD gives the mutual information. This expression shows that the mutual information is sensitive to clustering or correlations in the data set, with a high degree leading to high values, while in their absence p_KLD would be zero.
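For a 2D subspace, the mutual information can be estimated from a histogram of the data. The sketch below assumes the standard definition of the KLD, the integral of p log(p/q) with q the product of the two marginal distributions, evaluated on an equal-width grid; the function name, the number of bins, and the binning scheme are illustrative assumptions and not necessarily those used for the validation.

```python
import math
from collections import Counter

def mutual_information(xs, ys, n_bins=10):
    """Histogram estimate of the mutual information of two observables,
    i.e. the KLD between the joint distribution p(x, y) and the product
    of its marginals p(x) p(y). Natural logarithm, so the unit is nats."""
    def to_bin(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0   # guard against a constant column
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    bx, by = to_bin(xs), to_bin(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        mi += p_ij * math.log(p_ij / ((px[i] / n) * (py[j] / n)))
    return mi
```

Two independent observables give a value close to zero, while two perfectly correlated binary observables give log 2, in line with the behaviour described in the text. With a finite sample the histogram estimate is biased upwards, which is one reason to compare rankings rather than absolute values.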

We thus computed pKLD for more than 300 subspaces for the data, as well as for the simulations. In both cases, we used a range for the observables defined by the data after clipping the top and bottom regions by 3σ. Since the simulated and the observed data can have different distributions without this necessarily implying a problem in the data, we preferred to work with the relative mutual information rankings. If the structure is similar in data and simulations, we expect the rankings to cluster around the one-to-one line, while if a subspace shows very different rankings this would imply very different distributions. Such a subspace (or observable) is flagged for further inspection. This is important since the number of subspaces is very large.
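The ranking comparison can be sketched as follows: each subspace is ranked by its mutual information separately in the data and in the simulation, and subspaces whose two ranks differ by more than a chosen amount are flagged. The function names and the flagging threshold are assumptions made for illustration; the text does not specify a numerical criterion.

```python
def ranks(mi_by_subspace):
    """Map each subspace name to its rank (0 = lowest mutual information).
    Ties are broken by name so that the result is deterministic."""
    ordered = sorted(mi_by_subspace, key=lambda name: (mi_by_subspace[name], name))
    return {name: position for position, name in enumerate(ordered)}

def flag_subspaces(mi_data, mi_sim, max_rank_shift):
    """Return the subspaces whose rank differs by more than max_rank_shift
    between data and simulation, largest shift first.

    mi_data, mi_sim: dicts mapping a subspace name to its mutual information.
    max_rank_shift: hypothetical threshold, to be tuned by the user.
    """
    common = set(mi_data) & set(mi_sim)
    rank_data = ranks({k: mi_data[k] for k in common})
    rank_sim = ranks({k: mi_sim[k] for k in common})
    shifts = {k: abs(rank_data[k] - rank_sim[k]) for k in common}
    flagged = [k for k in common if shifts[k] > max_rank_shift]
    return sorted(flagged, key=lambda k: (-shifts[k], k))
```

Subspaces lying on the one-to-one line have a shift of zero; only the flagged ones need to be inspected visually.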

The comparison to the simulations is sensitive to global issues (across the whole sky), while there could potentially be systematic problems in the data restricted to small localised regions of the sky. Therefore, we also compared the values of the mutual information obtained for different regions of the sky (e.g. symmetric with respect to the Galactic plane) and with a similar number of observations.

5.2. Results from the KLD statistical methods

5.2.1. TGAS and comparison to AGISLab simulations

Figure 20 shows the mutual information ranking of the 2D subspaces from the TGAS data versus the ranking of the same subspaces in the AGISLab simulation. Most subspaces with direct observables (ra, dec, etc., black points) show very similar distributions in the data and in the simulations, as evidenced by their closeness to the 1:1 line. Subspaces associated with errors (blue crosses) and with correlations between observables/errors (magenta circles) tend to deviate more. Examples of the distributions found for some of the subspaces deviating more strongly (red hexagons in Fig. 20) are given in Fig. 21.

Ranking of 2D subspaces according to their mutual information in the TGAS data (x-axis) vs. the simulation (y-axis). The black squares correspond to subspaces formed only from observables, while the blue crosses are those containing an uncertainty, and the magenta circles contain a correlation parameter. The red hexagons correspond to the subspaces shown in Fig. 21.

Examples of the subspaces showing a strong deviation from the 1:1 expected relation shown in Fig. 20, particularly in the astrometric errors (left) and correlations (right) in TGAS (top) compared to those in the simulations (bottom).

5.2.2. TGAS comparison in different sky regions

Naively, one might expect regions with a similar number of observations to have similar distributions of errors, and if symmetric with respect to the Galactic plane or centre, perhaps also a similar distribution of several of the observables. To check for the presence of systematics in the data, we selected 60 regions with a similar astrometric_n_obs_al (in the range 60 to 140), of which 40 have a symmetric counterpart and 20 a non-symmetric one. The left panel of Fig. 22 shows their distribution in Galactic coordinates. For these regions we have computed the mutual information and compared the values to those of their counterpart. The normalised deviation from the naively expected 1:1 line is plotted in the right panel of Fig. 22; it is evaluated over the various subspaces i from p and p∗, the mutual information for the region and its counterpart. Blue and red points correspond to comparisons between symmetric and non-symmetric regions, respectively. This plot shows that non-symmetric regions sometimes have different distributions. By dividing the normalised deviation (whose median value is ~30) by the number of subspaces (780 for TGAS) we obtain an estimate of the average deviation per region. In this way we found that on average there are 4% differences in the mutual information between different regions. Comparison to the results of AGISLab simulations does not reveal pairs of regions whose mutual information appears to be very different for specific subspaces.

Left: distribution of regions for which the mutual information has been computed, where the inset indicates the number of observations inside the regions. The regions are circles in l−sin b space, with the positive b region in solid and its symmetric counterpart in dashed. Regions that are compared and are not symmetric are connected by a grey line. Right: average deviation of the mutual information between a region and its counterpart, in blue for symmetric and in red for non-symmetric counterparts.

5.2.3. Gaia DR1 comparison to GOG simulations

In Fig. 23 we show the rankings obtained for the observables and their errors in the full Gaia DR1 Catalogue. Because of the smaller number of observables, only 21 subspaces exist. The relation of the mutual information in data and simulations is very close to the 1:1 line, implying similar distributions and hence a good understanding of the data as far as this global statistic can test. The observables showing the greater deviations are those related to uncertainties, and this can be understood from the fact that GOG18 models the uncertainties expected at the end of mission, rather than those obtained after 14 months of observations.

6. Astrometric quality of Gaia DR1

For the majority of the sources included in Gaia DR1, the 1 140 622 719 secondary sources, the only available astrometric parameter is the position. For the 2 057 050 primary sources, the TGAS subset, the complete set of astrometric parameters is available: position, trigonometric parallax, and proper motion. As a consequence, most tests concerning astrometry have been devoted to TGAS validation and only Sect. 6.4 deals with tests on the astrometry of the secondary sources.

We study in Sect. 6.1 the accuracy of the TGAS parallaxes, and in Sect. 6.2 their precision. In both cases, we discuss first the estimation made using internal data (Gaia only), then with external data. Table 2 gives a summary of the differences between the TGAS parallaxes and those from external catalogues that are presented in this section.

Summary of the comparison between the TGAS parallaxes and the external catalogues.

6.1. TGAS parallax accuracy

6.1.1. Parallax accuracy using quasars

In the course of the AGIS astrometric solution, about 135 000 quasars were included and solved for parallax and position, with proper motions constrained by a prior near zero mas yr^-1 (Michalik & Lindegren 2016; Lindegren et al. 2016, Sect. 4.2). These solutions were made available for validation but are not part of Gaia DR1. As the true parallax of quasars can be considered as null, the study of these parallaxes gives direct information on the properties of the parallax errors. Unfortunately, the available quasars cover only part of the sky, and in particular they give little insight into the Galactic plane.

The median zero-point of the quasar parallaxes is significantly non-zero: −0.040 ± 0.003 mas. This is close to the value for the ICRF2 QSO subsample (see Table 2) and is corroborated by other all-sky external comparisons in this table and discussed in greater detail below, and this is what we adopt as average Gaia DR1 parallax zero-point.

We selected random sky regions with 2° radius, keeping only those possessing at least 20 quasars, and computed median parallaxes in these regions. The map of the median parallaxes in these regions is shown in Fig. 24. Outside of the Galactic plane, where the lack of objects brings little information (see Fig. 26), there are large-scale spatial effects with a characteristic amplitude of about 0.3 mas (significant at 2σ). In a few exceptional small regions, the parallax bias may even reach the mas level.
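The region-based median described above can be sketched as follows. This is a minimal illustration, not the validation code itself: the function names, the tuple layout of the input, and the toy data are assumptions made for the example, and a brute-force search is used where a real analysis would use a spatial index.

```python
import math
import random
from statistics import median

def ang_sep_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees (all inputs in degrees)."""
    l1, b1, l2, b2 = map(math.radians, (lon1, lat1, lon2, lat2))
    cos_sep = (math.sin(b1) * math.sin(b2)
               + math.cos(b1) * math.cos(b2) * math.cos(l1 - l2))
    # Clamp against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))

def regional_medians(sources, centres, radius_deg=2.0, min_count=20):
    """sources: (lon, lat, parallax) tuples; centres: (lon, lat) tuples.
    Returns (lon, lat, median parallax, count) for each region that
    contains at least min_count sources."""
    out = []
    for clon, clat in centres:
        plx = [p for lon, lat, p in sources
               if ang_sep_deg(lon, lat, clon, clat) <= radius_deg]
        if len(plx) >= min_count:
            out.append((clon, clat, median(plx), len(plx)))
    return out

# Toy data (hypothetical): one well-populated region, one sparse region.
rng = random.Random(1)
sources = [(10 + rng.uniform(-1, 1), 20 + rng.uniform(-1, 1),
            rng.gauss(-0.04, 0.5)) for _ in range(50)]
sources += [(200.0, -40.0, 0.1)] * 5
result = regional_medians(sources, [(10.0, 20.0), (200.0, -40.0)])
```

Only the first region survives the 20-source cut, which mirrors why the map carries little information in sparsely populated areas.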

Fig. 24. Median parallaxes of quasars in 2° radius regions (mas), ecliptic coordinates. There is little insight in the Galactic plane, due to the lack of objects. Outside of this plane, local systematics with about 0.3 mas characteristic amplitude can be seen.

The bias variations are directly related to the number of measurements (Figs. 25a, 26a), and consequently to the standard uncertainties, also with a 0.3 mas amplitude. Parallax biases also appear to be related to the correlations between right ascension and parallax (Figs. 25b, 26b). In Figs. 24 and 26, the regions along λ ~ 0 and 180° (ecliptic pole scanning law) appear clearly.

Possible along-scan measurement problems, if scan_direction_strength_k1 is a proxy for them, may partly explain these systematics (Fig. 27a), together with possible chromaticity problems. The scan_direction_strength_k4, associated with small numbers of observations, also seems to contribute (Fig. 27b), again with a 0.3 mas amplitude.

It is important to stress that the map illustrating spatial variations of the parallax bias of the quasars, Fig. 24, cannot be used to “correct” the TGAS parallaxes. The quasars are faint, and the TGAS parallaxes, which were obtained with a differently constrained astrometric solution, may suffer from supplementary effects due to their bright magnitudes.

6.1.2. Parallax accuracy tested with very distant stars

The zero point of the parallaxes and their precision can also be tested directly by using stars in TGAS (or quasars, see previous subsection) distant enough so that their measured parallaxes can be considered as null according to the catalogue’s expected precision. The normalised parallax distribution of these sources should follow a standard normal distribution. For TGAS we have been looking for stars with ϖ < 0.1 mas. This limit has been chosen to be consistent with TGAS precision (estimated to be of the order of a few tenths of mas). For Gaia DR1, only the Magellanic Clouds contain enough confirmed members in TGAS for this test.
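As an illustration of this kind of zero-point test, the sketch below computes an inverse-variance weighted mean parallax for a distant population and the significance of its deviation from a reference zero-point. It assumes independent Gaussian errors, which is a simplification (correlations between nearby sources are ignored), and the function names are hypothetical. The numerical example uses the LMC mean and the all-sky quasar zero-point quoted in this section.

```python
import math

def weighted_mean_parallax(parallaxes, sigmas):
    """Inverse-variance weighted mean and its standard error,
    assuming independent errors (a simplifying assumption)."""
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * p for w, p in zip(weights, parallaxes)) / wsum
    return mean, 1.0 / math.sqrt(wsum)

def offset_significance(mean, mean_err, reference, reference_err=0.0):
    """Difference to a reference value in units of the combined error."""
    return (mean - reference) / math.hypot(mean_err, reference_err)

# Values quoted in the text (mas): LMC mean 0.11 +/- 0.02 compared with
# the all-sky quasar zero-point of -0.040 +/- 0.003.
z_lmc = offset_significance(0.11, 0.02, -0.040, 0.003)
```

Under these assumptions the LMC mean lies several standard errors away from the all-sky zero-point, which is the sense in which the two values are described as inconsistent.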

A mean parallax of 0.11 ± 0.02 mas has been found for the LMC and −0.12 ± 0.05 mas for the SMC with a small overestimation of the uncertainties (0.14 mas). None of these values is consistent with the all-sky zero-point and this indicates local variations of the parallax zero point across the sky, confirming the spatial variations found in Sect. 6.1.1. Further filtering of the sources has been done by comparing the parallaxes and proper motions of the stars with the mean values of the clouds (taken from SIMBAD) through a χ2 test. Using a limit p-value of 0.01 on this χ2 test removes 20% of the LMC stars (3% of the SMC). The remaining stars still show a significant parallax bias although reduced as expected. A correlation of the parallax residual with magnitude is observed in all cases (with a larger residual for the brighter stars). This dependency on magnitude and the surprisingly large number of outliers indicated by the χ2 test are similar to the Hipparcos χ2 test results (Sect. 6.2.2), suggesting that a filtering based on the covariance matrix is actually hiding Gaia related issues rather than LMC/SMC membership issues.

6.1.3. Parallax accuracy tested with distant stars

An estimation of the parallax accuracy can also be obtained with stars distant enough so that their estimated distance through period-luminosity relation or spectrophotometry is known with a precision better than σϖE < 0.1 mas, i.e. much more precise than the TGAS parallaxes. A maximum likelihood method (improved from Arenou et al. 1995, Sect. 4) has been implemented to estimate the offset and extra-dispersion that should be taken into account in order for the Gaia parallaxes to be consistent with these external distance estimates.
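A maximum likelihood estimate of an offset and an extra dispersion can be sketched as below. This is not the method of Arenou et al. (1995) but a simplified stand-in: each parallax difference is assumed Gaussian with variance equal to the summed catalogue variances plus a non-negative extra variance, and the likelihood is profiled over a grid. All names, the grid settings, and the simulated data are assumptions for illustration.

```python
import math
import random

def neg_log_like(offset, extra_var, diffs, variances):
    """Gaussian negative log-likelihood of the parallax differences."""
    total = 0.0
    for d, v in zip(diffs, variances):
        var = v + extra_var
        total += 0.5 * (math.log(2 * math.pi * var) + (d - offset) ** 2 / var)
    return total

def fit_offset_extra_dispersion(diffs, variances, max_extra=1.0, steps=201):
    """Grid search over the extra variance; for each trial value the
    optimal offset is the inverse-variance weighted mean.
    Returns (offset, extra dispersion)."""
    best = None
    for k in range(steps):
        extra_var = max_extra ** 2 * k / (steps - 1)
        w = [1.0 / (v + extra_var) for v in variances]
        offset = sum(wi * d for wi, d in zip(w, diffs)) / sum(w)
        nll = neg_log_like(offset, extra_var, diffs, variances)
        if best is None or nll < best[0]:
            best = (nll, offset, math.sqrt(extra_var))
    return best[1], best[2]

# Simulated sample (hypothetical): true offset -0.04 mas, formal
# uncertainty 0.3 mas, unmodelled extra dispersion 0.2 mas.
rng = random.Random(42)
diffs = [rng.gauss(-0.04, math.hypot(0.3, 0.2)) for _ in range(2000)]
variances = [0.3 ** 2] * 2000
offset, extra = fit_offset_extra_dispersion(diffs, variances)
```

Note that this sketch cannot return a negative extra dispersion, whereas the results quoted later in the paper allow the extra term to be quadratically subtracted when the formal uncertainties are overestimated.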

Two catalogues have been tested using the period-luminosity relation:

Cepheids.

The catalogue of Ngeow (2012) has been used. It provides distance moduli for the Cepheids using the Wesenheit function. The error on the distance modulus has been estimated by adding quadratically the dispersion around the Wesenheit function, the uncertainty on the distance modulus of the LMC used to calibrate this relation, the I-magnitude error, and the overall dispersion seen by Ngeow (2012) when comparing their distance modulus to other methods (0.2 mag). The last was needed in order to obtain distance moduli consistent with the Hipparcos parallaxes. The catalogue contains 233 Tycho-2 stars with σϖE < 0.1 mas.

RR Lyrae.

For TGAS we used the catalogue of Maintz (2005). We computed the distance modulus using the magnitude independent of extinction $K_{J-K} = K - \frac{A_K}{A_J - A_K}(J - K)$. The extinction coefficients were computed applying the Fitzpatrick & Massa (2007) extinction curve on the Castelli & Kurucz (2003) SEDs. The value of MK was derived from the period-luminosity relation of Muraveva et al. (2015, assuming a mean metallicity of −1.0 dex with a dispersion of 0.2) and the colours were derived from Catelan (2004) transformed in the 2MASS system using the transformations of Carpenter (2001). The catalogue contains 150 Tycho-2 stars with σϖE < 0.1 mas.
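The two steps involved, an extinction-independent magnitude and the conversion of a distance modulus into a parallax, can be sketched as follows. The form K − A_K/(A_J − A_K) (J − K) is the standard reddening-free construction and is assumed here; the extinction coefficients used in the check are illustrative placeholders, not the values derived in the paper, and the error propagation is first order only.

```python
import math

def reddening_free_k(j, k, a_j, a_k):
    """Extinction-independent magnitude K - A_K/(A_J - A_K) * (J - K).
    a_j and a_k are extinction coefficients per unit reference extinction."""
    return k - a_k / (a_j - a_k) * (j - k)

def parallax_from_modulus(mu, sigma_mu):
    """Convert a distance modulus (mag) into a parallax in mas, with
    first-order error propagation: sigma = parallax * ln(10)/5 * sigma_mu."""
    plx = 10 ** ((10.0 - mu) / 5.0)
    return plx, plx * math.log(10) / 5.0 * sigma_mu

# Illustrative check: adding extinction (A_J = 0.28 E, A_K = 0.11 E, with
# hypothetical coefficients) leaves the reddening-free magnitude unchanged.
k_clean = reddening_free_k(10.5, 10.0, 0.28, 0.11)
k_reddened = reddening_free_k(10.5 + 0.28 * 2.0, 10.0 + 0.11 * 2.0, 0.28, 0.11)

# A distance modulus of 15 mag known to 0.2 mag corresponds to 0.1 mas.
plx, plx_err = parallax_from_modulus(15.0, 0.2)
```

With these numbers the external parallax uncertainty is about 0.01 mas, well inside the σϖE < 0.1 mas selection used for the comparison samples.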

A parallax offset of −0.034 ± 0.012 mas and a small overestimation of the standard uncertainty are significant when the Cepheids and the RR Lyrae samples are combined (Table 2).

Fig. 28. Distribution of ϖTGAS/ϖRAVE−1 for ~ 200 000 stars matched in the RAVE catalogue to the TGAS solution. Stars along EPSL, λ ~ 180°, appear to have a systematically overestimated parallax of up to ~ 0.3 mas; stars with G magnitudes in the range 10−11.5 and colour 1.4 ≤ GBP−GRP ≤ 1.8 are the most strongly affected.

For the following catalogues, spectrophotometric distance moduli have been collected or computed.

RAVE.

We used the Kordopatis et al. (2013) catalogue with distances from Binney et al. (2014). It contains 6850 Tycho-2 stars with σϖE < 0.1 mas. A comparison with Hipparcos has shown the presence of 24% of outliers, mainly due to dwarf/giant misclassifications. Strong outliers are also seen in the comparison with TGAS, but they represent only 1% of the sample. A global parallax offset of 0.070 ± 0.005 mas is seen with a strong variation with sky position (with 0.3 mas amplitude). This is the only catalogue, together with the LMC, that presents a significant positive parallax bias (Table 2). To further study the presence of systematic effects in localised regions on the sky that could affect the RAVE results, another test has been made, this time using all the 192 655 stars in common between TGAS and RAVE. Thanks to their extended sky coverage, we could identify a systematic difference in the parallaxes in the region with ecliptic coordinates λ ~ 180°, as shown in Fig. 28. The amplitude of this effect is of the order of ~ 0.3 mas and affects the fainter and redder TGAS stars more strongly. It appears that this effect is directly correlated with the number of along-scan observations (astrometric_n_obs_al parameter) and the ecliptic scanning law followed early in the mission, and is consistent with the spatial biases found with quasars in Sect. 6.1.

APOGEE DR12.

We used the Holtzman et al. (2015) data. Distance moduli were computed using a Bayesian method on the Padova isochrones (Bressan et al. 2012, CMD 2.7) and using the magnitude independent of extinction KJ−K. The prior on the mass distribution used the IMF of Chabrier (2001), while the prior on age was chosen flat. Stars too far from the isochrones were rejected using the criterion. It led to 3100 Tycho stars with σϖE < 0.1 mas. A global parallax difference of −0.060 ± 0.006 mas was found, with a strong variation with magnitude: the brighter the star, the greater the difference.

LAMOST DR1.

The Luo et al. (2015) data were used, following the same method as for APOGEE. It leads to 451 stars with σϖE < 0.1 mas. No significant parallax difference was detected with this sample.

PASTEL.

The Soubiran et al. (2016) data were used, following the same method as for APOGEE. It leads to 917 Tycho stars with σϖE < 0.1 mas. No significant parallax difference was found except for the blue stars (J−Ks < 0.3), where there was a difference of up to 0.3 mas, most probably linked to the spectrophotometric distance determination that was less tested on these young massive stars and is more dependent on the age prior. Therefore, only stars with J−Ks > 0.3 are used in the summary Table 2.

APOKASC.

We used the distances provided by Rodrigues et al. (2014) derived using both Kepler asteroseismic and APOGEE spectroscopic parameters. It contains 984 Tycho sources with σϖE < 0.1 mas. The median σϖE of this catalogue is 0.02 mas. A global parallax difference of −0.070 ± 0.009 mas is seen, and there is a strong variation with magnitude, similar to that found with the APOGEE results. Both use the Padova isochrones, and have the Kepler region and its spectroscopic parameters in common; however, we computed the distance modulus for APOGEE, and APOKASC has a much higher precision on its distance modulus thanks to the asteroseismic parameters. The variation of the parallax difference with magnitude could come from a feature of the stellar evolution models. Both the APOKASC and APOGEE catalogues present a correlation between magnitude and colour, but in APOKASC the brighter stars are bluer than the fainter stars (due to the extinction effect on the red clump population), while in APOGEE it is the opposite (due to the more evolved giants being redder); therefore, we do not expect the colour to be able to explain the systematics we see in magnitude.

All these tests with TGAS show significant variations with sky position, but with global parallax differences lower than 0.3 mas. These tests also show a small correlation with colour (<0.2 mas), but not all in the same direction nor with the same amplitude, indicating an expected bias linked to survey parameter correlations and/or stellar isochrones/priors.

6.1.4. Parallax accuracy tested using distant clusters

This test aims to assess the internal consistency of parallaxes within a cluster, and to check the parallaxes against photometric distances in order to verify the zero-point of parallaxes.

Sky coordinates, ages, extinctions, and distances have been obtained for all clusters listed in the Dias et al. (2014) database (Mermilliod 1995). Making use of theoretical isochrones (Bressan et al. 2012), we retained 488 clusters with an age/distance/extinction combination allowing them to contain stars reaching magnitude V = 11.5 (the magnitude at which Tycho-2 becomes extremely incomplete).

All stars within a radius corresponding to a distance of 3 pc from the centre of the cluster were searched, which means that the angular size of the queried field depends on the cluster distance. Stars were selected based on their identifier in the Tycho-2 catalogue, avoiding double stars flagged in Fabricius et al. (2002). When available, a preliminary knowledge of cluster membership was used, but the final cluster membership was determined from the TGAS data itself. The method used was that of Robichon et al. (1999), which makes use of proper motions and parallaxes.

We limited the statistics to clusters more distant than 1000 pc so that the uncertainty of the photometric parallaxes is mostly better than the uncertainty of the Gaia DR1 parallaxes. For every cluster, we computed the average difference ΔP between the measured parallax of each star and the reference value (or photometric parallax) ϖref, normalised by the uncertainty. To compute these values, we need to take into account the uncertainties on the parallaxes (i.e. σϖ,ref on the reference value and σϖ on TGAS parallaxes) and the correlation among parameters of nearby stars. We denote by $S = \mathrm{diag}(\sigma_i)$ the diagonal matrix made with the standard errors $\sigma_i$, and by $C$ the correlation matrix, where $C_{ij}$ is the correlation coefficient between the parallaxes of star $i$ and star $j$, constructed as in Holl et al. (2010). The matrix $\Sigma = SCS$ is the covariance matrix of $P$. Denoting by $D$ the design matrix, here the $n$-vector $(1, 1, ..., 1)$, we can compute the mean as $\Delta P = (D^T \Sigma^{-1} D)^{-1} D^T \Sigma^{-1} P$, where $\sigma_{\Delta P}^2 = (D^T \Sigma^{-1} D)^{-1}$ is the square of its standard error.
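A mean that accounts for correlated errors can be sketched with the generalised least-squares estimator, mean = (Dᵀ Σ⁻¹ D)⁻¹ Dᵀ Σ⁻¹ P with variance (Dᵀ Σ⁻¹ D)⁻¹. That this is the estimator intended here is an assumption based on the definitions of S, C, Σ and D given above; the function name and the toy correlation matrix are hypothetical.

```python
import numpy as np

def correlated_mean(values, sigmas, corr):
    """Generalised least-squares mean of correlated measurements.
    values: per-star differences P; sigmas: standard errors;
    corr: correlation matrix C. Returns (mean, standard error)."""
    p = np.asarray(values, dtype=float)
    s = np.diag(np.asarray(sigmas, dtype=float))
    cov = s @ np.asarray(corr, dtype=float) @ s      # Sigma = S C S
    d = np.ones(len(p))                              # design vector (1, ..., 1)
    w = np.linalg.solve(cov, d)                      # Sigma^-1 D
    var = 1.0 / (d @ w)                              # (D^T Sigma^-1 D)^-1
    return float(var * (w @ p)), float(np.sqrt(var))

# Toy example: four stars with unit errors and a uniform correlation of 0.5.
n = 4
corr = np.full((n, n), 0.5) + 0.5 * np.eye(n)
mean_corr, err_corr = correlated_mean([0.1, 0.2, 0.3, 0.4], [1.0] * n, corr)
mean_ind, err_ind = correlated_mean([0.1, 0.2, 0.3, 0.4], [1.0] * n, np.eye(n))
```

In this toy case the standard error of the mean grows from 0.5 for independent errors to about 0.79 with the correlations included, which is why neglecting them would overstate the significance of a cluster offset.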

Once an average difference to the reference value (ΔP) and its associated error (σΔP) had been established for each cluster, we studied the global distribution of ΔP/σΔP, which tells us by how many standard errors the average measured parallax differs from the reference parallax. In the absence of systematics, this distribution is expected to be centred on zero, with a dispersion of 1σ. A mean value differing from zero would indicate a global offset. Conservatively, we considered that all photometric distances listed in the Dias et al. (2014) database are affected by uncertainties of 20%. No significant global parallax offset was found, but we did find an apparent systematic error varying with sky position (see Fig. 29). Most clusters with overestimated parallaxes appeared to be located in the Galactic regions with l < 260° (towards the Galactic anticentre), while most of the underestimated parallaxes were at l > 260° (see Fig. 30). The parallax offsets were −0.16 ± 0.04 mas for l > 260° and +0.13 ± 0.04 mas for l < 260°.

Fig. 29. Distribution of the differences between the mean TGAS parallaxes and those from photometric distances for the distant open clusters. Red and blue labels are attributed to the clusters defined in Fig. 30.

We investigated the possibility that this effect could be caused by uncertainties in the automatic membership procedure applied. We manually inspected the results of the membership determination and discarded a certain number of clusters for which the cluster membership could not be securely established. The final statistics were computed for a sample of 38 distant clusters with secure membership determinations. The median value of differences to the reference values for these 38 clusters is + 0.004 ± 0.02 mas, confirming no obvious global parallax offset. Splitting the sample into two groups (l > 260° and l < 260°), we find a median of −0.02 ± 0.032 mas for the l > 260° sample and +0.044 ± 0.027 mas for the l < 260° sample, which does not show a significant difference.

Unfortunately, the small number of tracers available in this experiment did not allow us to draw a map of the bias by averaging values in coordinate space. The slight variation in zero-point between the l > 260° and l < 260° groups can then be interpreted either as random variations caused by the uncertainties on the reference values or as local variations of the parallax zero point (of the order of a few tenths of mas on a scale of several degrees).

6.2. TGAS astrometric precision

6.2.1. Internal estimation of the parallax uncertainty

The quasar analysis in Sect. 6.1.1 also allowed us to study the parallax dispersion. It was found that the robust unit-weight error (the ratio of the observed dispersion over the standard uncertainty) decreased with magnitude from ~1 to about 0.8 at G = 20. It would be difficult, however, to extrapolate this overestimation of the uncertainties to the much brighter TGAS sources, so this question was studied differently.
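A robust unit-weight error of this kind can be sketched as below. The text does not state which robust estimator was used, so the scaled median absolute deviation (1.4826 × MAD, which equals the standard deviation for Gaussian data) is an assumption, as is the function name.

```python
from statistics import median

def robust_unit_weight_error(values, sigmas, expected=0.0):
    """Robust dispersion of the normalised residuals (value - expected)/sigma,
    estimated as 1.4826 * MAD. A result below 1 suggests that the
    standard uncertainties are overestimated."""
    z = [(v - expected) / s for v, s in zip(values, sigmas)]
    centre = median(z)
    mad = median(abs(zi - centre) for zi in z)
    return 1.4826 * mad

# Toy example: the same residuals with correct and with doubled uncertainties.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
uwe_unit = robust_unit_weight_error(residuals, [1.0] * 5)
uwe_inflated = robust_unit_weight_error(residuals, [2.0] * 5)
```

For quasars the expected parallax is zero, so the default `expected=0.0` applies directly; doubling the quoted uncertainties halves the unit-weight error, as in the toy example.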

The measured TGAS parallax distribution, at least its small and negative tail, can be used to estimate the parallax uncertainties without referring to the formal uncertainty, following the deconvolution procedure of Lindegren (1995). The procedure models the observed distribution as the convolution of a non-parametric true parallax distribution (subject only to the constraint that all true parallaxes are positive) with a Gaussian error kernel. The Gaussian width parameter that gives the best fit to the observed distribution has been adopted as the parallax uncertainty of the sample.

As noted by Lindegren, the estimated parallax uncertainty is usually biased, and the process of solving for the true parallax distribution, which resembles Lucy-Richardson deconvolution, suffers from overfitting as the number of iterations increases. Both effects need to be controlled. As the parallax distribution of the TGAS sample differs from that of the Hipparcos sample explored in Lindegren (1995), we performed simulations to determine the bias correction factor and number of iterations to use for TGAS data. The Simu-AGISLab simulated data (Sect. 2.1.2) were randomly sampled with new errors to produce a realistic data set large enough for testing. We used cross-validation to test the predictive accuracy of the debiased estimates, including uncertainties in the bias correction factor.

Unlike Lindegren, we found that 2−3 iterations gave much more accurate results than a few dozen, regardless of the sample being studied; the reasons for this discrepancy are not yet clear. We fit arbitrary (non-linear) functions to the bias correction factor and the accuracy of the debiased parallax uncertainty, enabling prediction of the bias correction to ~ 8% and of the accuracy of the final parallax uncertainty (i.e. the uncertainty on the uncertainties) to ~ 20%. We also found that simulation runs with small (N ~ 100) or precise (σϖ ~ 0.1 mas) data sets behaved very differently from the trends seen for larger or less precise data sets; presumably, the sharp changes at high precisions are related to the parallax distribution assumed for the TGAS catalogue.

Fig. 31. Best-fit uncertainties from deconvolution of parallaxes vs. standard uncertainties for TGAS Hipparcos stars (left) and for Tycho-2 (non-Hipparcos) stars (right) with the bisector represented. Error bars include all sources of uncertainty, including bias correction.

When modelling the observed parallax distribution, we first corrected all parallaxes for the −0.04 mas bias found in Sect. 6.1.1, though analysis with and without the correction gave indistinguishable results. We analysed the TGAS data in bins of standard uncertainty of 0.05 mas in width, and separately for each type of astrometric solution in case each group had different error properties.

We show in Fig. 31 the results of modelling the TGAS parallax dispersion compared to the standard uncertainties. As can be seen, the TGAS standard uncertainties σϖ on parallaxes appear to be accurate. More quantitatively, a weighted fit for Hipparcos stars is (0.980 ± 0.135) σϖ − (0.003 ± 0.062), while for Tycho-2 stars it is (0.973 ± 0.024) σϖ + (0.011 ± 0.011); both are consistent with a unit-weight error of 1. Assuming a unit-weight error of 1, and fitting only for an extra dispersion (quadratically added) gives −0.19 ± 0.02 mas for Hipparcos and −0.11 ± 0.01 mas for Tycho-2. This is consistent with the median value obtained with external estimates (Table 2) and it shows that the standard uncertainties appear to be slightly pessimistic (except probably for the most precise parallaxes).

6.2.2. Comparison with external astrometric data

The comparison of Gaia results with external astrometric data is not straightforward as Gaia will provide the most accurate and the most numerous astrometric data ever produced, at least in the optical domain. However, the consistency between Gaia data and carefully selected external astrometric data might be important in order to detect any statistical misbehaviour in one of the sources of data, including Gaia.

Only positions from the Hipparcos or Tycho-2 catalogues have been used as priors in TGAS. The parallaxes and/or the proper motions have not been used, so this ensures that the comparison with TGAS parallaxes and proper motions is meaningful, as they are independent from those of Hipparcos and Tycho-2. It should be noted that another independent comparison with these catalogues is presented in Annex C of Lindegren et al. (2016). For the Hipparcos and Tycho-2 proper motion tests, the global rotation between the reference frames of Hipparcos and TGAS derived in Lindegren et al. (2016) has been applied. A possible residual rotation has also been checked. For each catalogue, the distribution of the normalised residuals (Gaia − External) of each parameter, $R_N$, e.g. for the parallax $R_\varpi = (\varpi_G - \varpi_E)/\sqrt{\sigma_{\varpi_G}^2 + \sigma_{\varpi_E}^2}$, has been checked to be consistent with a normal distribution, and correlations of these residuals with magnitude, colour, and sky position have been checked too.

A χ2 test has also been performed on combined parameters X (i.e. the positions, the proper motions, or the parallaxes and proper motions) using the full covariance matrices of both the external (ΣE) and the Gaia (ΣG) catalogues to compute the normalised residuals $R_\chi = (X_G - X_E)^T (\Sigma_G + \Sigma_E)^{-1} (X_G - X_E)$; their distribution has been tested to follow a chi-squared distribution with n degrees of freedom, n being the number of parameters tested (e.g. 2 for Gaia DR1 positions, 2 for TGAS proper motions, and 3 for TGAS parallaxes and proper motions). Similarly to the 1D case, correlations with magnitude, colour, and sky distribution of those residuals have also been tested.

In all the tests, we used a p-value limit of 0.01 (i.e. we indicate that we find a bias, extra variance, or a correlation only with a confidence level higher than 99%). For the normalised residuals of an individual parameter ($R_N$), this level corresponds to $|R_N| > 2.58$, while for the χ2 residuals on two components this level corresponds to Rχ > 9.21.
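The χ2 residual and the thresholds associated with a p-value of 0.01 can be illustrated as follows. The residual is computed as (X_G − X_E)ᵀ (Σ_G + Σ_E)⁻¹ (X_G − X_E). The two-sided normal threshold comes from the standard library, and the two-component threshold uses the closed form of the chi-squared survival function for 2 degrees of freedom, exp(−x/2). The function name and the toy covariance matrices are assumptions for illustration.

```python
import math
from statistics import NormalDist
import numpy as np

def chi2_residual(x_gaia, x_ext, cov_gaia, cov_ext):
    """Quadratic form of the parameter differences, using the summed
    covariance matrices of the two catalogues."""
    d = np.asarray(x_gaia, dtype=float) - np.asarray(x_ext, dtype=float)
    cov = np.asarray(cov_gaia, dtype=float) + np.asarray(cov_ext, dtype=float)
    return float(d @ np.linalg.solve(cov, d))

p_limit = 0.01
# Two-sided limit on a single normalised residual (about 2.58).
normal_limit = NormalDist().inv_cdf(1.0 - p_limit / 2.0)
# Limit for 2 degrees of freedom: solve exp(-x/2) = p (about 9.21).
chi2_limit_2dof = -2.0 * math.log(p_limit)

# Toy example with uncorrelated unit covariances in both catalogues.
r_chi = chi2_residual([1.0, 2.0], [0.0, 0.0], np.eye(2), np.eye(2))
flagged = r_chi > chi2_limit_2dof
```

The toy source gives a residual of 2.5 and is therefore not flagged. For three components (parallax and both proper motions) the threshold is larger and has no such simple closed form.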

For the validation of TGAS, the following astrometric catalogues have been considered:

Hipparcos new reduction.

A selection of well-behaved Hipparcos stars has been done using the five-parameter solution type with a good astrometric solution (goodness of fit |F2| < 5), and without any binary flag indicated in the literature, mainly from WDS (Mason et al. 2001), CCDM (Catalogue of the Components of Double and Multiple Stars, Dommanget & Nys 2000), and SB9 (9th Catalogue of Spectroscopic Binary Orbits, Pourbaix et al. 2004). Stars also included in Tycho-2 were kept only if the proper motions from Hipparcos were consistent with those of Tycho-2 (rejection p-value: 0.001). The resulting sample includes 93 802 well-behaved stars, against which both the parallaxes and proper motions of TGAS have been tested.

A global parallax zero point difference between Gaia and Hipparcos of −0.094 ± 0.004 mas was found. The underestimation of the standard uncertainties for both parallax and proper motions is significant (extra dispersion of 0.6 mas). Small variations in the parallax and proper motion residuals are seen with sky position (Fig. 32) and magnitude (smaller than 0.1 mas, most probably due to the gates).

Fig. 32. Sky variation of the normalised residuals of the TGAS vs. Hipparcos parallaxes in ecliptic coordinates. Although correlation with the sky position is significant, no sky region indicates a normalised residual larger than 2.6.

The χ2 test with Hipparcos, using either parallax and proper motions or proper motions only, shows stronger variations across the sky (Fig. 33a), with areas showing a mean residual Rχ over 9.21 (the p-value 0.01 limit), while the residuals of the parallax or proper motion components individually stay below the p-value limit (normalised residuals below 2.58 in absolute value). Eleven per cent of the sources have a χ2 p-value < 0.01, i.e. 11 times more than expected. Moreover, a strong correlation between Rχ and G magnitude is observed (Fig. 33b). This behaviour of Rχ is also seen with the quasar positions (Sect. 6.4), indicating potential issues with the covariance matrix. Those could be due to extra correlations introduced by the attitude and calibration models not taken into account in the provided covariance matrix (Holl & Lindegren 2012).

Hipparcos and Tycho-2 stars with inconsistent proper motions.

The second sample includes the 1574 stars previously eliminated because of the inconsistency between Hipparcos and Tycho-2 proper motions. A specific test has been done on these stars: most of them are expected to be long-period binaries not detected in Hipparcos, and for which the longer time baseline of Tycho-2 could have provided a more accurate value.

The TGAS solution also has a long time baseline thanks to its Hipparcos/Tycho input position. It has therefore been tested that the TGAS solution for those stars is globally closer to the Tycho-2 solution than to the Hipparcos solution. This is indeed the case with 7% of those TGAS sources being outliers versus the Tycho-2 solution, while 50% are outliers versus the Hipparcos solution.

Tycho-2.

Only Tycho-2 stars with a normal astrometric treatment (no double star with Tycho-2 separate entries, no close known or suspected double star with photocentre treatment) have been used in this test. Owing to the different priors used for the Hipparcos and Tycho-2 stars in the TGAS solution (Hipparcos positions at the Hipparcos mean epoch, J1991.25, for Hipparcos stars; Tycho-2 positions at the effective Tycho-2 observation epoch, taken to be the mean of the α and δ epochs, for Tycho-2 stars), the test has been done once for the Tycho-2 sources not in Hipparcos and once for the Tycho-2 sources in the well-behaved Hipparcos subsample described above.

For the Tycho-2 sources in the Hipparcos well-behaved subsample an underestimation of the standard uncertainties is seen (extra dispersion of 0.6 mas yr-1, similar to what is found with the Hipparcos sample) and a correlation with magnitude and colour is found with an amplitude smaller than 0.1 mas (the residuals increasing with magnitude and colour). For the Tycho-2 sources not in Hipparcos, a strong variation of the residuals is seen with sky position (Fig. 34) with features parallel to the equatorial declinations, which correspond to the zones of the Astrographic Catalogue used to derive the Tycho-2 proper motions. A very high extra dispersion of 1.8 mas yr-1 is also observed. We most probably see here the defects of the Tycho-2 proper motions. A rotation smaller than 0.2 mas yr-1 is also observed.

VLBI compilation.

VLBI data have mainly been obtained from the USA VLBA, the Japanese VERA, and the European EVN: 90 proper motions and 44 parallaxes (including respectively 70 and 30 stars in Tycho-2). Over the years, with increasing baseline length and better calibration of the ionospheric and tropospheric delays, astrometric accuracy using VLBI at centimetre wavelengths approaches ~10 μas for parallaxes and ~1 μas yr-1 for proper motions (Reid & Honma 2014, and references therein). For proper motions, only those with a mean epoch > 2000 were considered, as calibration techniques improved drastically at that epoch, especially with new detailed maps of ionospheric delay. The compilation covers all stellar sources for which trigonometric parallaxes and proper motions have been obtained from VLBI astrometry (as quoted in the review of Reid & Honma 2014), but also stars with only proper motions obtained from VLBI positions (Boboltz et al. 2007) and VLBI proper motions of X-ray binaries with an estimation of distance obtained by other means (Miller-Jones 2014).

Thirty-six stars from this compilation are present in TGAS, including nine with parallax information. All the tests associated with this catalogue pass (parallax and proper motion bias, variance, correlations), with the exception of the full covariance matrix χ2 test which indicates that half the stars with both parallax and proper motion information available (assuming no correlation for the VLBI parameters as this information is rarely available) have a χ2 p-value higher than 0.01.

HST compilation.

The Fine Guidance Sensors (FGS) on the Hubble Space Telescope have produced high accuracy trigonometric parallaxes of astrophysically interesting objects such as Cepheids, RR Lyrae, novae, cataclysmic variables, or cluster members (Benedict & McArthur 2015; Benedict et al. 2007). The FGS field of view is small and the parallaxes of target stars have been measured with respect to reference stars which have their own parallaxes estimated by spectro-photometric measurements. The correction to absolute leads to a median error of absolute parallaxes estimated to be 0.2 mas. The present compilation covers 69 stars with parallaxes (including 43 in Tycho-2) and, for about a third of them, proper motions published up to the end of 2015.

Nineteen stars in this compilation are present in TGAS, passing all the tests.

RECONS.

The REsearch Consortium On Nearby Stars (RECONS; www.recons.org) has built a database of all systems estimated to be closer than 25 pc (parallaxes greater than 40 mas with errors smaller than 10 mas). We have used the database as published on 1 April 2015 (Henry & Jao 2015), leading to 348 stars (including 27 in Tycho-2) with trigonometric parallaxes.

Thirteen stars in this compilation are present in TGAS, passing all the tests.

Table 3. Comparison in the LMC and SMC of the correlations between astrometric parameters: the medians of the standard correlations given in the Catalogue appear consistent with the empirical values computed with the astrometric residuals.

6.2.3. Validation of the astrometric correlations

As shown above and stressed in Sect. 8.1, the correlation between astrometric parameters should not be neglected when computing covariance matrices. After having tested the formal uncertainties above, checking whether these correlations are accurate is also needed.

It is usually difficult to compute these correlations, but there are at least two different local areas, the LMC and SMC, where average proper motions and parallaxes are already known to a sufficient precision. The astrometric errors can thus be computed from the residuals between Gaia proper motions and parallaxes and the external estimation. We only used the Tycho-2 stars (not the Hipparcos sources) as the internal dispersion of the proper motions can be neglected compared to the astrometric uncertainties (~ 1 mas yr-1) in the former case, but not in the latter. In order to avoid any contamination by field stars, we used the star list indicated in Sect. 6.1.2 restricted to Tycho-2 stars and rejected all sources where one of these residuals has an absolute value that is 3 times higher than the formal uncertainty.

In each Magellanic Cloud, we then computed the medians of the formal correlations as given in the Catalogue, and we estimated the actual ones computing the empirical correlation coefficients between residuals. As shown in Table 3, the various estimations are consistent with the predictions at a p-value = 0.01. The expected internal variations of the proper motions inside the Clouds also explain the large dispersion. Although this test has been done on two regions only, it is reasonable to consider that the correlations between astrometric parameters, as given in the Catalogue, are statistically reliable.
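The comparison of an empirical correlation coefficient with the formal one can be sketched as below. The text does not specify the statistical test used, so the Fisher z-transformation applied here is an assumed stand-in, valid for approximately Gaussian residuals, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def pearson(x, y):
    """Empirical correlation coefficient between two residual series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def consistent_with(r_emp, rho_formal, n, p_limit=0.01):
    """Fisher z-test: is the empirical correlation from n sources
    compatible with the formal value at the given p-value limit?"""
    z = (math.atanh(r_emp) - math.atanh(rho_formal)) * math.sqrt(n - 3)
    return abs(z) < NormalDist().inv_cdf(1.0 - p_limit / 2.0)

# Toy checks: perfectly correlated series, then two comparisons for n = 100.
r_perfect = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
close_ok = consistent_with(0.30, 0.25, 100)
far_off = consistent_with(0.80, 0.00, 100)
```

In the procedure described above, `rho_formal` would be the median of the Catalogue correlations in a Cloud and `r_emp` the coefficient computed from the residuals of the same stars.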

There is, however, an important caveat. We do not discuss explicitly in this paper the angular correlations that are known to exist between the stars (see Fig. 24). In principle, this section should compare the full observed covariance-matrix (of all stars × 5 astrometric parameters) to the predicted one, but it is much too difficult to predict the correlations between stars for now. It is thus possible that the local comparison made here shows an agreement, while a whole sky comparison would disagree.

6.2.4. Comparisons with proper motion from distant open clusters

The aims of this test were twofold: assessing the internal consistency of proper motions within stellar clusters and looking for biases and systematics by testing the proper motion zero-points against literature values.

Following the open cluster selection described in Sect. 6.1.4, we computed the difference between the proper motion of each star and the reference value for its cluster listed in the MWSC catalogue (Kharchenko et al. 2013) and in Dias et al. (2014). This procedure is designed to take into account possible small-scale correlations between parameters. For each cluster, we obtained a mean value Δ of this difference, and its associated error σ. We flagged the objects for which the difference to the reference value is too large to be explained by the nominal uncertainties, and those with discrepantly small or large internal dispersions. The test also looks for trends in proper motions against magnitude and colour.

A global zero-point test was performed from the Δ values obtained for individual clusters, restricting the sample to objects distant enough that their internal dispersion in proper motions is negligible compared to the uncertainty on the proper motion of individual stars. The all-sky average of this quantity should be zero if no bias is present. A clustering test allows us to verify whether outliers are randomly distributed or clustered in problematic areas in the sky.

We retained 20 clusters that are sufficiently distant and present secure membership for more than 10 stars. Scaling the difference Δ according to the total uncertainty (standard uncertainties listed in TGAS and uncertainty on the literature value), we found no significant differences in proper motions. In units of uncertainty, the all-sky zero-point of μα ∗ is + 0.04 ± 0.21, and of μδ it is + 0.12 ± 0.26. We also found that outliers appear homogeneously distributed across the sky.
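As an illustration of how a zero-point "in units of uncertainty" can be formed, the sketch below scales each cluster difference Δ by a total uncertainty and averages the result. Combining the TGAS and literature uncertainties in quadrature is our assumption here, and the input values are hypothetical.

```python
import math

def normalised_zero_point(delta, sigma_tgas, sigma_lit):
    """Mean of delta / sigma_total over clusters and the standard error of that mean.
    sigma_total is assumed to be the quadrature sum of both uncertainties."""
    z = [d / math.hypot(s1, s2) for d, s1, s2 in zip(delta, sigma_tgas, sigma_lit)]
    n = len(z)
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / (n - 1)
    return mean, math.sqrt(var / n)

# Hypothetical per-cluster differences and uncertainties (mas/yr).
delta = [0.3, -0.2, 0.1, -0.4, 0.2]
sigma_tgas = [0.3, 0.4, 0.3, 0.4, 0.3]
sigma_lit = [0.4, 0.3, 0.4, 0.3, 0.4]
mean_z, err_z = normalised_zero_point(delta, sigma_tgas, sigma_lit)
```

A mean compatible with zero within its standard error, as in the values quoted above, indicates no detectable bias.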

6.2.5. Specific tests on known double and multiple systems

In addition to the above general tests, a specific test has also been done on known double and multiple systems from the Hipparcos new reduction (HIP2) and the TDSC in order to detect any possible bias between single and non-single stars. For non-Hipparcos systems, we use the component designation given in the TDSC (m_TDSC) to distinguish between primary components (A or Aa), unresolved systems (AB), and secondary components (all other entries in TDSC). For Hipparcos systems, four categories with increasing periods were distinguished: stochastic solutions (short period, solution type Sn = 1 modulo 10 in HIP2), acceleration stars with seven- or nine-parameter solutions (intermediate period, Sn = 7 or 9 modulo 10 in HIP2), secondary components (long period, separation ρ > 0 as provided in the original Hipparcos catalogue), and other double stars (the remaining non-single stars). The characteristics of these Hipparcos and Tycho systems were compared to those of the well-behaved Hipparcos sample described in Sect. 6.2.2, adding the extra criterion of passing the χ2 test comparing the parallax and proper motion between Hipparcos and TGAS. Of course, many unknown unresolved binaries may hide within these single-star samples.

A difference in behaviour between those different subsets with respect to the single-star samples was looked for, using various parameters: the parallax and proper motion residuals (TGAS-external), and the TGAS errors, goodness of fit, and excess noise (source modelling errors). Mainly acceleration solutions are expected to show large discrepancies between their proper motions in TGAS and those from Hipparcos or TDSC. Another source of discrepancy may be the fictitious difference created by the comparison of TGAS and Hipparcos proper motions for close systems for which only the photocentre was observed by Hipparcos. For example, it was found that the excess noise, which is about 0.5 mas on average except for very bright stars (Sect. 6.5), did not exhibit significant differences between single stars, primaries, and secondaries; on the contrary, unresolved systems had significantly degraded solutions with about 1.2 mas excess noise on average in the 7 ≲ G ≲ 12 mag range.

Several other tests have also been done on secondary components, checking that the separation or position angle with respect to the primary component had no adverse effect. In the past, during the validation of early preliminary Gaia data, it had been found that proper motions of many secondaries below 2′′ separation had a large discrepancy (up to an 80 mas yr-1 amplitude) compared to TDSC. Noting that 2′′ divided by the time span between Hipparcos and Gaia (2015−1991) gives about 80 mas yr-1, it was deduced that the cross-matching of some close double stars had been deficient: most probably an incorrect first epoch position had been used for the Tycho-Gaia astrometric solution (TGAS), e.g. the Tycho position for the A component was associated with the observations of the B component because it was closer to it, depending on the position angle of the system, and vice versa.
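The order of magnitude quoted above follows from simple arithmetic, reproduced here as a short check. The function name is hypothetical; the separation and epochs are those given in the text.

```python
def spurious_proper_motion(separation_arcsec, epoch_first, epoch_last):
    """Apparent proper motion (mas/yr) produced when the first-epoch position
    of one component is paired with later observations of the other."""
    return separation_arcsec * 1000.0 / (epoch_last - epoch_first)

# 2 arcsec over the 1991 to 2015 baseline.
pm_error = spurious_proper_motion(2.0, 1991, 2015)
print(f"{pm_error:.1f} mas/yr")  # prints "83.3 mas/yr"
```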

Unlike the preliminary Gaia data, the TGAS solution disregarded stars with a parallax uncertainty larger than 1 mas, which received a two-parameter astrometric solution instead. However, for close double stars which remain in TGAS, and as can be seen in Fig. 35, there are still several misidentified pairs, and it is unclear whether the misidentification originally came from Gaia or Tycho. Using this figure, it should be easy for the user to detect and reject the bad astrometric solutions for pairs (both components) depending on a) separation below 2′′, b) position angle in the bad range, c) proper motion differences above uncertainties, and possibly d) high excess noise.

Mean difference in parallax in mas between the BGMBTG2 model simulation and the TGAS data, in different rings of latitude, for five magnitude bins in VT, from 9−9.5 (left) to 11−11.5 (right).

6.3. TGAS validation from the comparison with Galaxy models

Two Besançon Galactic Model simulations have been run for TGAS validation, using slightly modified models, both in density laws and kinematics, in order to verify how the validation depends on the model inputs. Both simulations were done with the model described in Czekaj et al. (2014), where the evolutionary scheme has been updated, as well as the IMF, SFR, and evolutionary tracks. Moreover, the thick disc and halo populations have been updated, following Robin et al. (2014), with new density laws. Concerning the kinematics, we used alternatively the standard model kinematics of Robin et al. (2003, hereafter BGMBTG2) and revised kinematics from an analysis of the RAVE survey (Robin et al., in prep., hereafter BGMBTG4). BGMBTG2 and BGMBTG4 also differ by several model parameters such as the extinction model and thin disc scale length.

Using two different models allows us to evaluate what is due to acceptable model variations in the parallax and proper motion distributions. Model parameters are described in Mor et al. (2015, internal Gaia documentation GAIA-C9-TN-UB-RMC-001). The simulations contain binary systems where the second component is merged with the primary when the separation is smaller than 0.8 arcsec, the estimated resolution of the Tycho-2 catalogue. We also introduced the uncertainties expected in TGAS after 6 months of Gaia observations, following the recipes published in September 2014 after the commissioning phase.

The validation was done by comparing the proper motion and parallax distributions in the TGAS catalogue to simulated values. The sky was divided in healpix rings (Górski et al. 2005) with healpixsize 20, giving a solid angle of 8.5943 square degree in each bin, and 4800 bins in total. Then bins were grouped in rings of equal galactic latitudes in order to compare the values between latitude rings. Finally, we considered five latitude intervals (−90 to −70°, −70 to −20°, −20 to 20°, 20 to 70°, and 70 to 90°) in order to analyse the characteristics of the distributions in the plane, at intermediate latitudes and separately at the poles. For each region of the sky considered, we compared the mean and standard deviation between the model and the data for the parallax and proper motion distributions.
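The binning figures quoted above can be checked with a short sketch. It assumes that "healpixsize 20" denotes the HEALPix resolution parameter N_side = 20, which is consistent with the 4800 bins stated in the text; the helper names are hypothetical.

```python
import math

def healpix_grid(nside):
    """Number of HEALPix pixels and solid angle per pixel in square degrees."""
    npix = 12 * nside ** 2
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2
    return npix, full_sky_deg2 / npix

# Edges of the five latitude intervals used in the comparison (degrees).
LATITUDE_EDGES = [-90, -70, -20, 20, 70, 90]

def latitude_interval(b_deg):
    """Return the (low, high) latitude interval containing b_deg."""
    for low, high in zip(LATITUDE_EDGES[:-1], LATITUDE_EDGES[1:]):
        if low <= b_deg < high or (high == 90 and b_deg == 90):
            return (low, high)
    raise ValueError("latitude must lie between -90 and 90 degrees")

npix, area_deg2 = healpix_grid(20)
```

With N_side = 20 this gives 4800 pixels of about 8.594 square degrees each, in agreement with the values above.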

6.3.1. Parallaxes

Figure 36 shows the mean parallax differences between the BGMBTG2 simulation and TGAS data, as a function of latitude rings. Each panel corresponds to a magnitude interval of 0.5 mag width, starting at VT = 9.

From these comparisons we notice that, for bright stars, the mean parallax differences seem to suffer from a slight zero point offset, which also depends slightly on galactic latitude. The systematic shift between models and TGAS data is of the order of or less than 1 mas depending on the region of the sky, but it is unclear whether this originates from the data or the model.

In the standard deviation in parallax, the comparison with models shows a good agreement. The dominant factor in the simulation of the parallax standard deviation is the error model assumed to simulate the errors added in the BGM simulations. The good agreement implies that the dependency of the parallax errors on magnitude and latitude is in agreement with the expectations.

6.3.2. Proper motions

Figure 37 shows the differences in the mean proper motion along galactic longitude (μl ∗) between the BGMBTG2 and BGMBTG4 simulations and the TGAS data as a function of latitude healpix rings. Each panel corresponds to a magnitude interval of 0.5 mag width, starting at VT = 9. Both models show similar difference distributions with the data.

Figure 38 shows the differences in the mean proper motion along galactic latitude (μb). The zero point differences between models and data are at the level of the differences between the two models at bright magnitudes. However, systematic differences appear in the faintest magnitude bins, which again can be attributed either to the model or to large correlated errors in some regions of the ecliptic plane due to the scanning law. We also note that the higher noise level at the Galactic poles is due to the smaller number of sources.

6.4. Gaia DR1 positions and reference frame

For the billion+ sources of Gaia DR1, the only astrometric parameters available are the two components of the position. The astrometry of the secondary DR1 data set has been compared with the following catalogues:

URAT1 star positions

(Zacharias et al. 2015). URAT1 is a catalogue containing stellar positions of 228 276 482 stars down to R = 18.5, at epochs ranging from 2012.3 to 2014.6 with typical standard errors of 10−30 mas. Only stars distant enough to have a proper motion lower than 100 mas yr-1 even assuming a tangential velocity of 500 km s-1 were used. The Gaia-ESO and LAMOST surveys have been used to estimate the spectrophotometric distances of these stars (see method in Sect. 6.1.3), leading respectively to samples of 5384 and 136 234 stars. The cross-match between DR1, including TGAS, and URAT1 was done by position, with multiple detections within 0.2′′ removed.

Correlations with magnitude, colours, and sky positions are seen, but overall this effect stays within an amplitude of 30 mas.
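The distance cut used for the URAT1 comparison follows from the usual relation between tangential velocity, distance, and proper motion. The sketch below is only an illustration of that relation, with a hypothetical function name; the constant 4.74047 is the standard conversion factor between km s-1 and mas yr-1 × kpc.

```python
KAPPA = 4.74047  # km/s corresponding to 1 mas/yr at 1 kpc

def min_distance_kpc(v_tan_kms, pm_limit_mas_yr):
    """Smallest distance at which a star with tangential velocity v_tan_kms
    still has a proper motion below pm_limit_mas_yr."""
    return v_tan_kms / (KAPPA * pm_limit_mas_yr)

# Limits quoted in the text: 500 km/s and 100 mas/yr.
d_min = min_distance_kpc(500.0, 100.0)
```

For the quoted limits this corresponds to a spectrophotometric distance larger than about 1.05 kpc.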

ICRF2 QSO positions

(Fey et al. 2015). The second realisation of the International Celestial Reference Frame (ICRF2) contains very precise positions of 3414 compact radio astronomical objects. The positional noise floor is announced to be of about 40 μas and the directional stability of the frame axes of about 10 μas. A least-squares method using the covariance matrix of both catalogues allows us to estimate the rotation and dipolar deformation between the ICRF2 and the Gaia reference frames. Correlations of differences between Gaia DR1 and ICRF2 positions with other parameters such as magnitude and colours were tested, following the same methods as described above for stars.
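A simplified sketch of a rotation and glide estimate is given below. It is not the validation code: it uses an unweighted least-squares fit instead of the full covariance matrices, it assumes the common first-degree sign convention for rotation and glide, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(ra, dec):
    """Rows for (d_alpha*cos(delta), d_delta); columns are the rotation
    (r1, r2, r3) and glide (g1, g2, g3) components. Angles in radians."""
    sa, ca, sd, cd = np.sin(ra), np.cos(ra), np.sin(dec), np.cos(dec)
    zero = np.zeros_like(ra)
    rows_ra = np.column_stack([-ca * sd, -sa * sd, cd, -sa, ca, zero])
    rows_dec = np.column_stack([sa, -ca, zero, -ca * sd, -sa * sd, cd])
    return np.vstack([rows_ra, rows_dec])

def fit_rotation_glide(ra, dec, d_ra_cosdec, d_dec):
    """Unweighted least-squares estimate of rotation and glide vectors."""
    y = np.concatenate([d_ra_cosdec, d_dec])
    params, *_ = np.linalg.lstsq(design_matrix(ra, dec), y, rcond=None)
    return params[:3], params[3:]

# Synthetic all-sky sample with a pure glide and 0.5 mas noise (illustrative).
rng = np.random.default_rng(1)
n = 2000
ra = rng.uniform(0.0, 2.0 * np.pi, n)
dec = np.arcsin(rng.uniform(-1.0, 1.0, n))
truth = np.array([0.0, 0.0, 0.0, 0.10, -0.15, 0.05])  # mas
obs = design_matrix(ra, dec) @ truth + rng.normal(0.0, 0.5, 2 * n)
rotation, glide = fit_rotation_glide(ra, dec, obs[:n], obs[n:])
```

In a realistic application each row would be weighted by the combined covariance of the two catalogues, as described in the text.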

The test has been done both on the auxiliary quasar solution and on the main Gaia DR1 secondary solution, with the same conclusions, so that only the numbers corresponding to Gaia DR1 are provided below (the priors used in their astrometric reduction are different; Lindegren et al. 2016; Mignard et al. 2016). A total of 2292 ICRF2 quasars are found in Gaia DR1 within a 0.1′′ radius. As expected by construction (Lindegren et al. 2016), no rotation versus the ICRF2 is found, but a deformation (glide) that is lower than 0.2 mas is detected. It should be noted that this deformation is no longer significant if the cross-match radius is increased from 0.1 to 0.5′′, which adds 15 sources. The residuals of the position differences normalised using the covariance matrix of both Gaia DR1 and the ICRF2, Rχ, show too many outliers (10% with a p-value < 0.01, i.e. 10 times more than expected), and Rχ is correlated both with the magnitude and with the number of observations. This behaviour of Rχ is the same as that observed in the comparison with Hipparcos (Sect. 6.2.2).

Concerning individual sources, four known quasars were included in the Hipparcos and Tycho-2 catalogues (HIP 60936 = 3C 273, TYC 9365-284-1, TYC 259-212-1, TYC 3017-939-1). Only the first and the last are present in TGAS. 3C 273 has an astrometry consistent with null parallax and proper motion, but this is not the case for the Tycho-2 AGN, TYC 3017-939-1 (Rχ = 25.3).
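To make the outlier criterion concrete, the sketch below computes a normalised residual for a two-dimensional position difference and the corresponding p-value. The explicit form used here, the quadratic form of the difference with the sum of the two covariance matrices, is our assumption about how Rχ is built; the numerical inputs are hypothetical.

```python
import numpy as np

def normalised_residual(delta, cov_a, cov_b):
    """Quadratic form delta^T (cov_a + cov_b)^-1 delta for a 2D difference,
    and its p-value under a chi-square law with 2 degrees of freedom."""
    delta = np.asarray(delta, dtype=float)
    cov = np.asarray(cov_a, dtype=float) + np.asarray(cov_b, dtype=float)
    r_chi = float(delta @ np.linalg.solve(cov, delta))
    p_value = float(np.exp(-0.5 * r_chi))  # survival function for 2 dof
    return r_chi, p_value

# Hypothetical position difference (mas) and covariance matrices (mas^2).
r_chi, p_value = normalised_residual(
    [0.6, -0.8],
    [[0.09, 0.0], [0.0, 0.09]],
    [[0.16, 0.0], [0.0, 0.16]],
)
```

Under these assumptions a p-value below 0.01 corresponds to a residual larger than about 9.2, so that roughly 1% of sources would be expected above this limit if the uncertainties were correctly estimated.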

6.5. Quality indicators of the astrometric solution

As mentioned before, the Gaia DR1 astrometric solution applied only a single-star model to all stars; resolved doubles with small magnitude difference or astrometric binaries with noticeable orbital or acceleration motion are thus likely to lead to a bad astrometric fit. Second, as also described above, the adopted PSFs are not yet optimal for all stars (and probably not for very blue or very red stars), and the modelling of the satellite attitude can still be improved together with the geometric or CCD calibrations. There is no sensu stricto goodness of fit metrics in the catalogue, as they would actually never be good given the caveat above. However, there are astrometric_n_bad_obs_al, astrometric_n_bad_obs_ac, and astrometric_excess_noise and also its significance, astrometric_excess_noise_sig. In addition to a median floor at about 0.5 mas due to attitude, etc., the astrometric_excess_noise appears, as expected, sensitive to calibration problems for bright stars and extreme colours (Fig. 39). Outside these cases, and outside some regions (see corresponding figure in Appendix C), a star with a larger and significant excess noise is a candidate for being non-single. Taking advantage of these fields is thus suggested for a selection of “cleaner” samples.
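A "cleaner" sample can be selected along these lines. The sketch below uses the catalogue fields astrometric_excess_noise and astrometric_excess_noise_sig named above, but the thresholds and the example rows are hypothetical and should be tuned to the science case.

```python
def is_clean(row, max_excess_noise=1.0, min_significance=2.0):
    """Keep a source unless its excess noise is both large and significant.
    Both thresholds are illustrative assumptions, not recommended values."""
    noisy = (row["astrometric_excess_noise"] > max_excess_noise
             and row["astrometric_excess_noise_sig"] > min_significance)
    return not noisy

# Hypothetical catalogue rows (excess noise in mas).
sources = [
    {"source_id": 1, "astrometric_excess_noise": 0.4, "astrometric_excess_noise_sig": 1.2},
    {"source_id": 2, "astrometric_excess_noise": 1.8, "astrometric_excess_noise_sig": 35.0},
    {"source_id": 3, "astrometric_excess_noise": 1.3, "astrometric_excess_noise_sig": 0.5},
]
clean_ids = [s["source_id"] for s in sources if is_clean(s)]
```

Here the second source is rejected as a non-single candidate, while the third is kept because its excess noise is not significant.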

6.6. Summary of the astrometric validation

Gaia DR1 is the most precise all-sky astrometric survey since Hipparcos. And indeed, the quoted parallax precision in the catalogue appears correctly estimated, or even slightly pessimistic, as found by error deconvolution and when compared to external catalogues. The only exception is the comparison with Hipparcos, which then points to some underestimation of the uncertainties in Hipparcos itself.

However, the preliminary character of the astrometric solution, and in particular problems related to imperfect attitude or instrument modelling, reveals systematic errors of the same order as the random errors. A global negative parallax zero point (about −0.04 mas) is consistently found with many independent estimation methods (quasars, period-luminosity candles, spectro/astero/photometric parallaxes). This zero-point may be, however, a consequence of large-scale spatial variations related to the scanning law that may reach at least a 0.3 mas amplitude (i.e. comparable to the median precision of stars in the catalogue). This is also consistently shown independently with quasars, LMC, SMC, or RAVE data. In extreme cases, larger local biases may be expected. Correlation with magnitude is also found towards the bright end.

For the scientific exploitation, the consequence of these systematics is that local parallax averages cannot be more precise than about 0.3 mas. Any study should take into account that any source parallax is ϖ ± σϖ (rand.) ± 0.3 (syst.) mas. Because the correlations between parallaxes and the other astrometric parameters are frequently very large, systematics must be present in the other astrometric parameters as well.

Another consequence of the presence of astrometric systematics is that all luminosity or kinematical calibrations must ensure that the star samples are evenly distributed, which is in itself another issue, as completeness is difficult to ensure (see Sect. 4.5).
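The effect of the systematic floor on averaged parallaxes can be illustrated as follows. Adding the random and systematic terms in quadrature is an assumption made for this sketch; the function name and the input values are hypothetical.

```python
import math

SYSTEMATIC_FLOOR_MAS = 0.3  # systematic term quoted in the text

def mean_parallax_uncertainty(sigmas_mas, systematic_mas=SYSTEMATIC_FLOOR_MAS):
    """Uncertainty of a weighted mean parallax for a local group of stars:
    random part from inverse-variance weights, plus a systematic floor
    (combined in quadrature, by assumption)."""
    weights = [1.0 / s ** 2 for s in sigmas_mas]
    random_part = math.sqrt(1.0 / sum(weights))
    return math.hypot(random_part, systematic_mas)

# 100 stars with 0.3 mas random uncertainty each.
total = mean_parallax_uncertainty([0.3] * 100)
```

Although the random part drops to 0.03 mas for this sample, the total uncertainty remains at about 0.3 mas, which is the point made above.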

Concerning proper motions, significant differences with Tycho-2 have been found, which clearly originate from that catalogue. Some correlations with Gaia-only parameters may marginally be interpreted as originating from Gaia, but only to a much lesser extent. In particular, several components of close double systems have incorrect astrometric solutions, possibly due to incorrect cross-matches.

In TGAS, and in the whole catalogue, the astrometric deficiencies appear to be related to bright stars and the small number of observations. There is no doubt that these problems will be resolved in the next Gaia data releases.

7. Photometric quality of DR1

The photometric quality of Gaia DR1, i.e. its accuracy and precision, has been tested using both internal methods (using Gaia photometry only) and by comparisons to external catalogues.

7.1. Internal test of the photometric accuracy

Using the GBP and GRP photometry, a way was found to check internally the variation of the G magnitude zero point with magnitude, which we will also check below with the external catalogues. It is important to keep in mind, however, that the Gaia photometric data are correlated by the calibration procedures.

We randomly selected sources at high galactic latitude (|b| > 50°) with photometric quoted uncertainties in G, GBP and GRP < 0.02 mag, and a minimum of ten observations in each band. We resampled this selection to have a uniform distribution in magnitude. An empirical robust spline regression was derived which models the global (GBP−GRP)/(G−GRP) colour relation and we computed the residuals of the observed G−GRP minus the G−GRP = f(GBP−GRP) spline.

The variation of these residuals with magnitude (Fig. 40) is consistent with what we observed in the comparison with external catalogues below. First, the variations at bright magnitudes (G < 12) are most probably linked to the different gate effects and saturation issues, and also to the change in sampling of BP and RP data at G = 11.5. Second, the window size changes on board at G magnitudes 13 and 16. In very preliminary data, this induced a strong jump at G = 13, seen and corrected in the calibration process of the DPAC photometric group (Carrasco et al. 2016). In Gaia DR1 the jump at G = 13 seems nicely corrected, but a small jump at G = 16 is still visible. The increase in the residual dispersion seen in Fig. 40 at faint magnitudes is linked to the reduced precision of GBP/GRP.
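The internal test can be sketched in a few lines. This is not the code used for the validation: a plain polynomial fit stands in for the robust spline regression, the resampling uses fixed-width magnitude bins, and all names and the synthetic photometry are hypothetical.

```python
import numpy as np

def uniform_in_magnitude(g_mag, per_bin, bin_width=1.0, seed=0):
    """Indices of a subsample with at most per_bin stars per magnitude bin."""
    rng = np.random.default_rng(seed)
    bins = np.floor(np.asarray(g_mag) / bin_width).astype(int)
    chosen = []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        chosen.append(rng.choice(idx, size=min(per_bin, idx.size), replace=False))
    return np.sort(np.concatenate(chosen))

def colour_residuals(bp_rp, g_rp, degree=3):
    """Observed G-GRP minus a polynomial model f(GBP-GRP)."""
    coeffs = np.polyfit(bp_rp, g_rp, degree)
    return g_rp - np.polyval(coeffs, bp_rp)

# Synthetic photometry: more faint stars than bright ones (illustrative only).
rng = np.random.default_rng(42)
g = rng.triangular(10.0, 19.0, 19.0, size=20000)
bp_rp = rng.uniform(0.3, 2.0, size=g.size)
g_rp = 0.1 + 0.5 * bp_rp - 0.05 * bp_rp ** 2 + rng.normal(0.0, 0.01, g.size)

sel = uniform_in_magnitude(g, per_bin=200)
resid = colour_residuals(bp_rp[sel], g_rp[sel])
```

The residuals would then be examined as a function of `g[sel]` to look for magnitude-dependent zero point variations.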

Gaia G vs. GBP and GRP photometry. Residuals of G−GRP from a global G−GRP = f(GBP−GRP) spline as a function of G magnitude. The red line is a smoothed spline fit. The sample contains 10 000 stars with a uniform distribution in magnitude; therefore, the lighter grey scale indicates less dispersion in the residuals.

7.2. Internal test of the photometric precision

With only one band and its quoted precision published, validating the photometric precision without external comparisons is difficult. It is useful to remember that the published standard uncertainty of each source has been computed using the intrinsic scatter of the fluxes obtained for this source on all the CCD observations during the first 14 months of mission (Carrasco et al. 2016). Consequently, except for possible correlations, it would be logical that the quoted uncertainties are representative of the actual precisions. We made experiments using GBP and GRP in order to check that the observed variance varies as expected with the precision σG (computed from the flux precision in the Catalogue), i.e. observed variance = intrinsic variance + unit-weight variance × standard uncertainty squared. For most stars, there was no indication that their standard uncertainties were underestimated.

However, there are about 12 million stars with G standard uncertainties better than 0.5 mmag, which are thus difficult to check. There are indications that some of the best precisions may be too optimistic: the 53 most precise stars (σG < 0.1 mmag) have a median value of about 80 observations, while the 1000 most precise have about 500 observations as their median value. While the latter may explain a good precision, the former cannot, as they would otherwise beat the Poisson noise (keeping in mind that a significant fraction of DR1 sources have standard uncertainties below Poisson noise). The most precise photometry may thus contain a mix of stars with a large number of observations (as expected) and of stars with very small apparent scatter, either by chance or due to correlations, and these uncertainties should thus not be taken at face value.
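The variance relation used in this section (observed variance = intrinsic variance + unit-weight variance × standard uncertainty squared) is linear in the squared uncertainty, so its two terms can be estimated with a straight-line fit. The sketch below illustrates this on noise-free synthetic values; it is an assumption about how such a check could be implemented, not a description of the experiments actually made.

```python
import numpy as np

def fit_variance_model(sigma, observed_variance):
    """Fit observed_variance = intrinsic + unit_weight * sigma**2.
    Returns (intrinsic variance, unit-weight variance)."""
    slope, intercept = np.polyfit(np.square(sigma), observed_variance, 1)
    return float(intercept), float(slope)

# Synthetic example (mag): intrinsic variance 1e-5 and unit-weight variance 1.
sigma = np.array([0.002, 0.004, 0.006, 0.008, 0.010])
observed = 1e-5 + 1.0 * sigma ** 2
intrinsic, unit_weight = fit_variance_model(sigma, observed)
```

A unit-weight variance close to 1 would indicate that the quoted standard uncertainties are representative, while a value well above 1 would point to underestimated uncertainties.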

7.3. Photometric accuracy and precision from external catalogues

The following tests compare the photometry of Gaia DR1, including TGAS, with external photometry. We check here the distribution of a mixed colour index, Gaia magnitude minus the external catalogue magnitude, versus an external catalogue colour. An empirical robust spline regression was derived which models the global colour-colour relation. The residuals from this model were then analysed as a function of magnitude, colour, and sky position.

HST CALSPEC standard stars

(Bohlin 2007). The HST CALSPEC standard spectrophotometric database has been used to compute theoretical G magnitudes by convolving the spectra with the pre-launch nominal Gaia passband. As this passband has not yet been adapted to the real Gaia response, expected photometric differences are observed, reaching a difference of up to 0.1 mag at B−V = 1.2. This confirms that the pre-launch filter should not be used blindly by the community working on Gaia DR1 data. Instead, colour−colour transformations between Gaia and other photometric systems, available in the Gaia DR1 documentation, should be used. An updated passband will be provided with DR2.

BVRI photometric standard stars

(Landolt 1992). A total of 397 stars, mostly within the magnitude range 11.5 < V < 16.0 and in the colour range −0.3 <B−V < 2.3, with photometric scatter <0.02 mag have been selected for this test. The observed dispersion around the colour-colour relation is larger than the quoted errors. This can be explained by an intrinsic stellar variability or by an underestimation of the errors in one or both catalogues.

Hipparcos photometry.

The sample of the well-behaved Hipparcos stars (i.e. excluding known or suspected binaries; see Sect. 6.2.2) has been used here with extra filters to exclude variable stars (variability flag VA = 0) and to restrict the sample to stars with good Hipparcos photometry (σHp < 0.01 mag and σB−V < 0.02 or σV−I < 0.03 mag). Although the pre-DR1 filtering removed the strongest outliers, a number of outliers are still present in the colour-colour relations, but a large fraction of them can be filtered out using their photometric errors, as illustrated in Fig. 41a where red dots are stars with σG > 0.01 mag.

We have also selected a subset of the Hipparcos stars with low extinction (AV < 0.05 mag) using the 3D extinction map of Puspitarini et al. (2014) or, when the star reaches the limit of the map, the 2D map of Schlegel et al. (1998). This selection ensures a clean colour−colour spline relation G−Hp versus V−I. The residuals versus this global relation show a strong variation with magnitude (Fig. 41b), with an amplitude up to 0.01 mag. Such a systematic is ten times larger than the uncertainties quoted for G at magnitude 8. This is most likely due to saturation effects near gate changes or residual calibration errors linked to this.

SDSS photometry.

Here we used the tertiary standard stars of Betoule et al. (2013) calibrated to the HST-CALSPEC spectrophotometric standards with a precision of about 0.4% in griz. It covers four CFHT Deep fields and the SDSS strip 82. While the CFHT fields are in low extinction regions, for the SDSS strip only areas with a maximum E(B−V) < 0.03 according to the Schlegel et al. (1998) map are selected. The residuals versus the global colour-colour spline relation (Fig. 42) show a strong increase in the residuals at the faint end in all SDSS and CFHT fields, with an amplitude larger than the quoted uncertainties, of the order of 0.01 at G = 20. An increase in the bias at ~16 mag is also seen in the SDSS field (the SNLS is too faint to probe this magnitude) which could be due to window class change, but also to saturation in the SDSS data. We checked that the increase at the faint end is not due to the random errors alone (as the ordinate is correlated with the abscissa in Fig. 42) by checking that this increase was visible also when using all the SDSS magnitudes, in particular with z that is fully independent from the other magnitudes used for the residual computation. We note that we did similar checks for all the other external catalogues.

A confirmation of this global behaviour has been obtained with the OGLE data which were used for the completeness tests (Sect. 4.3.2). To avoid potential zero point issues, we used data from a single CCD at a time. The large extinction of those fields leads to a less well-defined colour–colour relation, but the increase in the residuals with magnitude is nevertheless also seen in the OGLE data, confirming the > 0.02 mag zero point variation with magnitude of the Gaia photometry at its faint end.

Tycho-2 photometry.

We only selected stars with photometric errors in BT and VT<0.05 mag and at high galactic latitude (| b | > 40°) in order to have a low extinction. To obtain clean colour-colour relations, the sample has been roughly separated between dwarfs and giants with a colour cut at BT−VT = 0.9 mag and an absolute magnitude cut at MG = 4.5, taking into account the parallax error at 1σ. The residuals show a variation with G magnitude, confirming the increase seen at G ~ 8 with Hipparcos and suggesting an increase at G ~ 11 as well.

2MASS photometry.

The comparison with 2MASS is more difficult owing to a sharp feature at J−Ks ~ 0.8 for the red dwarfs and the unavailability of parallaxes in Gaia DR1. To remove the red dwarf feature, we only selected stars with J−Ks < 0.7. As for Tycho-2 we only selected stars with photometric errors in J and Ks < 0.05 mag and at high galactic latitude (| b | > 40°). The residuals also show an important variation with G magnitude.

All the tests above also show a correlation of the G residuals with Gaia GBP−GRP which has not been studied in detail as this colour is not part of Gaia DR1, but this variation does not exceed ~0.01 mag. These tests also show a significant correlation between the photometric residuals and the astrometric excess noise which measures the disagreement with the astrometric model. This is expected as the astrometry and the photometry share the same PSF model and the same windows, possibly contaminated by a neighbour.

7.4. Testing G photometry using clusters

To test the photometric accuracy and precision of Gaia DR1 against published photometry of stellar clusters, we made use of a sample of high photometric quality by Taylor et al. (2008). These authors provided high precision photometry in V band (a few mmag), for five open clusters: Hyades, Praesepe, Coma Ber, NGC 752, and M 67. The photometry in this catalogue is highly homogeneous, both in data reduction and in zero point for all the clusters. In addition, we used M 4 HST photometry by Nascimbeni et al. (2014) in the F606W band, where repeated observations allowed us to reach a precision of a few mmag (for the relevant magnitudes, F606W < 21).

For all clusters, the same procedure was adopted:

The reference catalogue was checked, removing variable and multiple stars. Variability information was taken from SIMBAD. Multiplicity information was taken from the Hipparcos catalogue. For the Hyades, we also used the Kopytova et al. (2016) catalogue to remove multiple stars. In the case of M 4, variability information was taken from Nascimbeni et al. (2014). After this selection, the total number of stars is 40 in M 4 (down to G ~ 14), and 232 in the open cluster sample.

We extracted the Gaia data for each source. For the open cluster sample stars, the cross-match was straightforward because all the bright stars were observed in the Hipparcos catalogue. For M 4, at fainter magnitudes and with a high level of crowding, a more sophisticated cross-match procedure was followed taking into account proper motions (from L. Bedin, priv. comm.).

The difference between G magnitude and a reference magnitude depends on the apparent colour, and consequently it depends both on temperature and extinction. In the case of open clusters, to improve the statistics while working with homogeneous extinction levels, we grouped the five clusters according to the extinction level (from Taylor 2008, 2007a,b, 2006). The three groups are Coma and Hyades (E(B−V) < 0.01); Praesepe (E(B−V) ~ 0.1); M 67 and NGC 752 (E(B−V) = 0.1−0.14).

For each group of open clusters and for M 4, we derived separately the relation between G magnitude and the reference magnitude against colour, using a low-order spline.

We analysed the residuals of this function against the apparent G magnitude.

We show the residuals in Fig. 43, for the five open clusters together and in Fig. 44 for M 4. We fitted a high-order spline in both figures. The residuals clearly show systematics at a 10 mmag level related to the presence of gates, as discussed in Sect. 7.3, using a comparison with large external catalogues.

Residuals of the difference G−V against a low-order spline, as a function of the magnitude, for five different clusters. The V magnitude is from the Taylor et al. (2008) catalogue. The red lines mark the gate positions in magnitude. The green curve is a high-order spline fit to the data.

7.5. Photometry for variable stars

Gaia is particularly interesting for stellar variability studies since it provides a remarkable time-domain survey, which will help to better characterise the already known variables and will even detect new ones. Gaia DR1 includes light curves for a selection of Cepheids and RR Lyrae stars as described in Eyer et al. (2016) and Clementini et al. (2016). Several tests were developed to validate the data compared to ground-based surveys.

Additionally, objects with intrinsic or extrinsic variability may also affect the Gaia data analysis (Eyer & Grenon 2000). For instance, the instrument and/or the data processing can also introduce false variability that might be interpreted as real. This aspect has been taken into consideration to implement a set of tests able to verify that no significant statistical biases are present in Gaia DR1.

7.5.1. Testing variable stars light curves

We compared the data set of Cepheids and RR Lyrae stars included in Gaia DR1 against the OGLE IV SEP catalogue (Soszyński et al. 2012). We found that reported Gaia DR1 periods, average G magnitudes, and amplitudes are in agreement with the external catalogue and no particular outlier was found. OGLE also classifies stars depending on their variability, and no particular disagreement was found with the Gaia DR1 classification.

Light curves included in Gaia DR1 were also compared to the OGLE IV SEP catalogue. Since OGLE uses V and I filters, it was necessary to transform them into G magnitudes, which was possible thanks to the internal work done by the DPAC variability group. Additionally, to ease the comparison task, OGLE light curves were linearly interpolated to match the data points present in the folded Gaia light curves, as shown in Fig. 45. This is a simple approach. The magnitude transformation is not perfect and the interpolation is more difficult in regions with fewer measurements, but it has been shown to be good enough to discard the presence of extreme outliers.

Considering the whole sample, we found an average rms of 0.04 ± 0.02 mag (the average G magnitude is ~ 18.99 mag). After a visual inspection of the transformed OGLE and Gaia folded light curves with the largest rms, we did not identify any significant outlier.
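The folding, interpolation, and rms steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the reference (OGLE-like) magnitudes are assumed to be already transformed to G, and the interpolation is made periodic in phase, which the text does not specify.

```python
import bisect
import math

def fold(times, period, epoch=0.0):
    """Return phases in [0, 1) for the given observation times."""
    return [((t - epoch) / period) % 1.0 for t in times]

def interp_periodic(phase, ref_phases, ref_mags):
    """Linearly interpolate a folded reference light curve at `phase`.

    The reference curve is padded with wrapped copies of its last and
    first points so that every phase in [0, 1) is bracketed.
    """
    pairs = sorted(zip(ref_phases, ref_mags))
    xs = [pairs[-1][0] - 1.0] + [p for p, _ in pairs] + [pairs[0][0] + 1.0]
    ys = [pairs[-1][1]] + [m for _, m in pairs] + [pairs[0][1]]
    i = bisect.bisect_right(xs, phase)  # xs[i-1] <= phase < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (phase - x0) / (x1 - x0)

def rms_difference(gaia_t, gaia_mag, ref_t, ref_mag_g, period):
    """rms of (Gaia magnitude - interpolated reference magnitude)."""
    gaia_ph = fold(gaia_t, period)
    ref_ph = fold(ref_t, period)
    resid = [g - interp_periodic(ph, ref_ph, ref_mag_g)
             for ph, g in zip(gaia_ph, gaia_mag)]
    return math.sqrt(sum(r * r for r in resid) / len(resid))
```

As noted in the text, such an interpolation smooths the reference curve, so the resulting rms mainly serves to flag extreme outliers rather than to measure the true dispersion.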

Example of a folded light curve corresponding to a Gaia RR Lyrae star compared to a magnitude-converted and interpolated OGLE counterpart. The interpolation process hides the real dispersion present in OGLE, which is generally greater than in Gaia.

The determination of the light curves of variable stars does not depend only on accurate photometry; reliable time stamps for each measurement are equally fundamental. To validate this aspect, we computed and compared the time separations between the moments of maximum and minimum magnitude in the Gaia and OGLE light curves. As a complementary test, we also computed v = (tmax,Gaia − tmax,OGLE)/p, where tmax are the times of maximum magnitude in each catalogue and p the period, and we considered the decimal part of v, which should be close to 0.000 or close to 0.999 if the variable has gone through the full variability cycle an integer number of times. Both validations were executed considering the whole group of variables together since individual cases are expected to show variations due to sources not pulsating completely regularly. Based on statistical tests, we did not find any significant discrepancy in the reported times between catalogues.
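A minimal sketch of the cycle-count check follows, assuming that v is the difference between the Gaia and OGLE times of maximum divided by the period. The function names are hypothetical, and the distance to the nearest integer is added here only as a convenient single number to inspect.

```python
import math

def cycle_fraction(t_max_gaia, t_max_ogle, period):
    """Decimal part of v = (t_max_gaia - t_max_ogle) / period, in [0, 1)."""
    v = (t_max_gaia - t_max_ogle) / period
    return v - math.floor(v)

def distance_to_integer_cycle(fraction):
    """Distance of the decimal part to the nearest integer number of cycles.

    Values close to 0 mean that the two maxima are separated by an
    (almost) integer number of periods.
    """
    return min(fraction, 1.0 - fraction)
```

For a consistent time reference, the distribution of these distances over the whole sample is expected to cluster near zero, which is why the test is applied to the group of variables rather than to individual stars.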

7.5.2. Comparing distributions of variable stars to constant stars

The Hipparcos catalogue and its variability classification was used as the main reference for creating two different subsets of Gaia sources with constant and variable stars. These groups were then compared to check that

parallaxes are not affected by variability;

no correlation exists between parallaxes or parallax uncertainties and periods, amplitudes, mean G magnitude, or colours;

mean G magnitude values are within known min/max magnitudes for variable stars.

The cross-matched group formed by constant stars contained 36 661 sources with a mean G magnitude of 8.27 ± 1.11, while the variable stars group was composed of 1820 sources with a mean G magnitude of 8.26 ± 1.10. Based on statistical tests, we found that the normalised parallax difference distributions between these two groups were consistent and, for periodic stars, that no correlation was identified with periods or amplitudes. Hence, stellar variability does not seem to have a major effect on the reported Gaia DR1 parallaxes.
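The statistical tests used are not detailed here. As an illustration only, the comparison of two distributions of normalised parallax differences could be sketched with a two-sample Kolmogorov-Smirnov statistic. The choice of this test, the normalisation by the quadratic sum of the two uncertainties, and all function names are assumptions made for the example.

```python
import math

def normalised_differences(plx_a, err_a, plx_b, err_b):
    """(plx_a - plx_b) / sqrt(err_a^2 + err_b^2) for each matched source."""
    return [(a - b) / math.hypot(ea, eb)
            for a, ea, b, eb in zip(plx_a, err_a, plx_b, err_b)]

def ks_statistic(sample_1, sample_2):
    """Maximum distance between the two empirical cumulative distributions."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    i = j = 0
    d = 0.0
    while i < len(s1) and j < len(s2):
        x = min(s1[i], s2[j])
        while i < len(s1) and s1[i] <= x:
            i += 1
        while j < len(s2) and s2[j] <= x:
            j += 1
        d = max(d, abs(i / len(s1) - j / len(s2)))
    return d

def ks_rejects_at_5_percent(sample_1, sample_2):
    """Asymptotic two-sample criterion: D > 1.358 * sqrt((n + m) / (n m))."""
    n, m = len(sample_1), len(sample_2)
    critical = 1.358 * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_1, sample_2) > critical
```

Here the two samples would be the normalised differences of the constant and of the variable group; a non-rejection is compatible with variability having no major effect on the parallaxes, but does not prove it.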

7.6. Summary of the photometric validation

With very precise photometry for (much) more than one billion stars, the Gaia photometry is on the verge of becoming a standard for several decades. It is thus extremely important to understand the properties and limitations of G photometry for Gaia DR1.

It appears that systematics are present at the 10 mmag level and display a strong variation with magnitude. This is well above the standard uncertainties for bright stars and could originate from saturation and gate configuration changes. These issues will be addressed in Gaia DR2.

Concerning the photometric precision, the standard uncertainties may be underestimated for the stars with the most precise photometry, but they are probably correctly estimated for most of the other stars.

8. Conclusions and recommendations for data usage

This paper summarises the results of the validation tests applied to the first Gaia data release as a final quality control before its publication. These tests have both confirmed the global quality of the data and shown several shortcomings due to the preliminary nature of the release, which is based on a limited set of observations and was processed using initial versions of the processing pipelines (see Lindegren et al. 2016; van Leeuwen et al. 2017; Gaia Collaboration 2016a, for a more detailed discussion on these issues).

We advise the users of Gaia DR1 to keep these shortcomings in mind when using the data for scientific exploitation since they may have relevant effects on the final results. In the next sections we discuss some of the main limitations arising from these problems, but the limitations for the use of the Gaia data in any specific case should be carefully assessed as a part of the data analysis.

8.1. Effect of correlations

The astrometric data in DR1 is provided with formal uncertainties for each of the parameters (five in the case of TGAS and two in the case of the main catalogue). Although these standard uncertainties are enough when using each of the parameters in isolation, they do not contain the complete information about the error distribution of the astrometric data. Indeed, the astrometry of a star in the Gaia catalogue is the result of the Astrometric Global Iterative Solution (Lindegren et al. 2016) and therefore its parameters (whether two or five) are obtained from a joint fitting during the Source Update stage. Thus, strictly speaking, the error distributions of these parameters can only be described by a joint distribution of all of them.

For this reason DR1 provides, in addition to the standard uncertainties, a correlation matrix for the astrometric parameters: a correlation value is given in dimensionless units (values in the range [− 1,1]) for each pair of parameters. This matrix should be used for the error analysis when the astrometric parameters are jointly used. For instance, the calculation of the transverse spatial velocity of a star requires the use of its parallax (for the distance) and the proper motions in right ascension and declination; therefore, the three correlations between them will be needed for the error analysis. If the correlations are high the three uncertainties cannot be treated as being fully independent. If the correlations are not included, the dispersion of velocities could be underestimated, for instance.
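As an illustration of the transverse velocity example, the sketch below propagates the three standard uncertainties and the three correlations to first order. It is a simplified example: the function and argument names are hypothetical, the units (mas and mas yr-1) are assumed, and a linear propagation is only adequate when the relative parallax error is small (see Sect. 8.3).

```python
import math

K = 4.74047  # km/s corresponding to 1 au/yr

def transverse_velocity_with_sigma(plx, pmra, pmdec,
                                   s_plx, s_pmra, s_pmdec,
                                   r_plx_pmra, r_plx_pmdec, r_pmra_pmdec):
    """Return (v_T, sigma_v_T) in km/s using first-order error propagation.

    plx is in mas, proper motions in mas/yr; the r_* arguments are the
    dimensionless correlation coefficients between the parameters.
    """
    mu = math.hypot(pmra, pmdec)
    v_t = K * mu / plx
    # Partial derivatives of v_T with respect to (plx, pmra, pmdec).
    jac = [-K * mu / plx ** 2, K * pmra / (mu * plx), K * pmdec / (mu * plx)]
    sig = [s_plx, s_pmra, s_pmdec]
    corr = [[1.0, r_plx_pmra, r_plx_pmdec],
            [r_plx_pmra, 1.0, r_pmra_pmdec],
            [r_plx_pmdec, r_pmra_pmdec, 1.0]]
    # variance = J C J^T with C_ij = corr_ij * sig_i * sig_j
    var = sum(jac[i] * jac[j] * corr[i][j] * sig[i] * sig[j]
              for i in range(3) for j in range(3))
    return v_t, math.sqrt(var)
```

Depending on the signs of the correlations and of the derivatives, ignoring the off-diagonal terms can either underestimate or overestimate the resulting uncertainty.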

It is also important to note that in Gaia DR1, due to the limited time span and number of observations, the values of these correlations can be large. For instance, Fig. 46a shows the histogram of the μα ∗ and ϖ correlations in the TGAS data set. It is clear that the fraction of stars with high correlations is large. However, although this applies to most TGAS stars, the Hipparcos subset is strikingly different (Fig. 46b) as the precise first epoch Hipparcos positions allowed the proper motion to be decoupled from the parallax.

The Gaia DR1 covariance matrix between parameters should, however, be used with some caution. All the tests performed against external catalogues using the covariance matrix to compute the residuals Rχ indicate a much greater number of outliers than when the normalised residuals of each astrometric parameter are used independently. The abnormally high values of Rχ can be seen in Fig. 33a for the Hipparcos catalogue, and they most probably explain the mismatch of bright Gaia sources with UCAC4 (Fig. 6) as well as the high number of LMC member stars removed by a χ2 test. Moreover, a strong increase in the Rχ residual for bright sources has been seen on the Hipparcos proper motions (Fig. 33b) and on the ICRF2 QSO positions as well. This indicates that a censorship using the covariance matrix will induce a censorship on the magnitude too.

And again, in addition to the correlations between astrometric parameters, there are also correlations between stars which produce systematics at small scales (Sect. 6.6).

8.2. Censorship and truncation, completeness

As discussed in Sect. 4, Gaia DR1 is incomplete in several ways. There are global effects, small-scale effects, and also effects related to crowding, angular separation, brightness, colour, proper motion, and position that make the incompleteness of the catalogue very difficult to describe. For this reason using Gaia DR1 for star count analysis, although not impossible, should be done with great care. Especially in small fields, the complex features of the completeness caused by underscanning and lack of on-board resources (see Fig. 10) should be taken into account.

8.3. Data transformation and error distributions

In addition to the limitations described above which are due to the characteristics of Gaia DR1 and related to its preliminary nature, we want to conclude this paper with a warning to the user about potential biases introduced by the use of transformed quantities. A complete discussion of this issue is beyond the scope of this text, and we instead refer the reader to other texts.

First of all, the TGAS data set in Gaia DR1 provides an unprecedented set of stellar parallaxes, more than two million. More frequently, however, the users of these data will rather be interested in obtaining stellar distances from the parallaxes, and the first obvious idea will be just to apply the well-known relation

d = 1/ϖ, (3)

where d is the distance in parsecs and ϖ is the parallax in arcseconds. Although this relation is formally true, the presence of observational errors complicates its use for the estimation of distances from parallaxes. We use the word “estimation” on purpose because in practice this is the most we can do to obtain a distance from a parallax: build an estimator. Owing to the observational error the observed parallax will be a value around the true parallax, determined by some statistical distribution describing the error. In the case of Gaia this distribution is almost Gaussian, its width given by the standard uncertainties in the catalogue and centred (unbiased) in the true value within the limits of the systematics described in previous sections.

A discussion on how to use the observed parallaxes – understood as the realisations of the error distributions – was already presented at the time of the release of the Hipparcos catalogue in Brown et al. (1997), and a further discussion can be found in Arenou & Luri (1999). We refer the reader to these papers, which warn about the truncation of samples based on the relative parallax error and about the bias in the estimated distances if one just naively inverts the observed parallaxes.

Solving these problems is not straightforward. Simple procedures can help to some extent. For instance, never average distances obtained from inverting observed parallaxes, but rather first average the parallaxes and then invert the result (see Arenou & Luri 1999). But a proper solution would require a careful analysis of the problem in hand to define an unbiased estimator of the distances needed, for instance using a Bayesian estimator. We refer the reader to Bailer-Jones (2015) for a discussion of these methods. Besides distances, another application of parallaxes is the computation of an absolute magnitude; here again, the formal expression MG = mG − 10 + 5 log(ϖ) − AG (with ϖ in mas) has to face the non-linear use of parallaxes having an observational error.
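The difference between the two averaging orders can be illustrated with a small simulation. This is a sketch under simplifying assumptions: all stars share the same true parallax (as for members of a cluster), the errors are purely Gaussian, and the numerical values and function name are hypothetical. With parallaxes in mas, the inverse gives distances in kpc.

```python
import random
import statistics

def simulate_distance_estimators(true_plx_mas=2.0, sigma_mas=0.3,
                                 n=20000, seed=1):
    """Compare mean(1/plx) with 1/mean(plx) for noisy parallaxes.

    Returns (mean_of_inverses, inverse_of_mean), both in kpc.
    """
    rng = random.Random(seed)
    observed = [rng.gauss(true_plx_mas, sigma_mas) for _ in range(n)]
    inverses = [1.0 / p for p in observed]
    mean_of_inverses = statistics.fmean(inverses)
    inverse_of_mean = 1.0 / statistics.fmean(observed)
    return mean_of_inverses, inverse_of_mean
```

In this configuration the true distance is 0.5 kpc; averaging the inverted parallaxes gives a value that is systematically too large, while inverting the mean parallax stays much closer to the true distance. The bias grows quickly with the relative parallax error.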

Beyond the problems with using the trigonometric parallaxes discussed in the papers cited above, we also want to add a word of warning about the comparison of the Gaia DR1 parallaxes with parallaxes from other sources. In this case the properties of the error distribution in each catalogue, and their combined effect, should be properly taken into account when drawing conclusions about the comparison. We illustrate this with a couple of examples. First, to compare the Hipparcos and TGAS parallaxes one can draw a plot of the differences between them versus the Hipparcos parallaxes. The result can be seen in Fig. 47a, and to the unwary reader this figure can suggest a strong systematic difference between the two sets for small values of the parallax ϖ ≲ 2 mas. However, this behaviour is just what one can expect when drawing this figure when the two sets of parallaxes have significantly different values of the uncertainties. Figure 47b shows this using simulated data. Starting from a set of error-free (simulated) parallaxes imitating the distribution of the data set used in the previous figure, two sets of parallaxes were generated, one with uncertainties around 1 mas (Hipparcos-like) and the other with uncertainties around 0.3 mas (TGAS-like). As can be seen in the figure, although the simulation is completely bias-free and therefore without any systematic difference between the two sets of parallaxes, the figure is similar to the one from real Hipparcos data and could (wrongly) suggest the presence of systematic effects in one or another catalogue. In fact, the asymmetric top-tail in these figures is just an effect of the longer tail of negative parallaxes in the Hipparcos data when compared with the TGAS data.
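A simulation in the spirit of the one described above can be sketched as follows. It is not the simulation used for Fig. 47b: the exponential distribution of true parallaxes, its mean, the selection threshold, and the function name are assumptions made only for the example.

```python
import random
import statistics

def simulate_parallax_comparison(n=50000, seed=2, sigma_hip=1.0,
                                 sigma_tgas=0.3, mean_true_plx=5.0, cut=1.0):
    """Bias-free simulation of two catalogues with different uncertainties.

    Returns the mean of (TGAS-like - Hipparcos-like) parallax over all
    stars, and over the stars with a small Hipparcos-like parallax (< cut).
    All values are in mas.
    """
    rng = random.Random(seed)
    diffs_all, diffs_low = [], []
    for _ in range(n):
        true_plx = rng.expovariate(1.0 / mean_true_plx)
        hip = rng.gauss(true_plx, sigma_hip)    # larger uncertainty
        tgas = rng.gauss(true_plx, sigma_tgas)  # smaller uncertainty
        diff = tgas - hip
        diffs_all.append(diff)
        if hip < cut:
            diffs_low.append(diff)
    return statistics.fmean(diffs_all), statistics.fmean(diffs_low)
```

Although no systematic difference is simulated, the mean difference for the stars selected at small Hipparcos-like parallaxes is clearly positive, because this selection favours stars whose Hipparcos-like error is negative. The mean over the whole sample remains compatible with zero.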

A second example of such effects derived from the error distributions in the parallaxes is present when comparing trigonometric parallaxes versus photometric or spectroscopic parallaxes. In this case the effect does not come from the different magnitudes of the errors but from their different distributions, the first being Gaussian and the second (derived from magnitudes or spectra) being log-normally distributed. Figure 48 shows another simulation illustrating this effect. Starting from a set of error-free (simulated) parallaxes two sets of parallaxes were generated: one with log-normal errors (photometric-like) and another with normal errors, in both cases with a standard deviation of 0.3 mas. Again, the figure could suggest to the unwary reader a systematic effect, making the TGAS parallaxes smaller than the photometric ones, especially for large parallaxes (short distances). The linear fit (red line) added to the figure stresses this effect. However, as stated, the simulation is completely bias-free and therefore this effect comes purely from the properties of the error distributions of the two data sets and the complete (anti)correlation between abscissa and ordinate (see also Arenou & Luri 1999, Fig. 4). Spurious distance-related biases may then be wrongly attributed to Gaia when random errors (as well as systematic photometric errors) are not properly taken into account.

The discussions presented above on the proper use of the parallaxes also extend to the case of the G magnitude contained in Gaia DR1. The archive does not contain, on purpose, standard uncertainties for these magnitudes. Instead, errors are given for the fluxes from which these magnitudes are obtained, along with the fluxes themselves. The problem in this case is again that obtaining the desired quantity (the magnitude m) from the observed quantity (the flux F) is non-linear, m = −2.5log (F) + C0, where C0 is the zero point of the photometric band. As in the case of the parallax, this non-linearity will introduce biases if not properly taken into account, although in this case the effect is less severe because the relative errors are smaller. We note here, however, that the flux uncertainty provided in Gaia DR1 corresponds to the observed scatter which can be much lower than the systematics and may therefore not be fully representative of the actual uncertainties for bright stars.
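The flux-to-magnitude conversion and its first-order uncertainty can be sketched as follows. This is only an illustration: the function names are hypothetical, the zero point is left as a free argument rather than taken from the Catalogue, and the bias term is a second-order Taylor approximation valid for small relative flux errors.

```python
import math

POGSON = 2.5 / math.log(10.0)  # about 1.0857

def magnitude_from_flux(flux, zero_point):
    """m = -2.5 log10(F) + C0."""
    return -2.5 * math.log10(flux) + zero_point

def magnitude_sigma(flux, flux_error):
    """First-order magnitude uncertainty: (2.5 / ln 10) * sigma_F / F."""
    return POGSON * flux_error / flux

def magnitude_bias(flux, flux_error):
    """Approximate mean offset of the magnitude computed from a noisy flux.

    Second-order expansion of E[m] - m(F_true) for Gaussian flux errors;
    the offset is positive, i.e. towards fainter magnitudes.
    """
    return 0.5 * POGSON * (flux_error / flux) ** 2
```

For a relative flux error of 1% the first-order uncertainty is about 0.011 mag while the bias term is only about 0.05 mmag, which illustrates why the effect is less severe than for the parallaxes.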

8.4. Conclusion

At the end of this paper, it should be remembered that the validation, by its very nature, has focused more on the various problems found rather than on the intrinsic quality of the Catalogue. The summary of the Catalogue completeness can be found in Sect. 4.5, of the astrometry in Sect. 6.6, and conclusions about the photometry are given in Sect. 7.6.

It must nevertheless be underlined that the Gaia DR1 represents a major breakthrough since the Hipparcos Catalogue on the direct measurement of the solar neighbourhood. With 20 times more stars than Hipparcos, and a median precision that is three times better, it will provide a new basis for studies on stellar physics and galactic structure, provided the limitations shown above are accounted for.

With the promise of soon being superseded by the Gaia DR2 data, Gaia DR1 proves the ESA cornerstone mission concept, the good health of the instruments, the capabilities of the on-ground reconstruction, and the strong dedication of the community members involved in the project.

The “scan direction strength” fields in the Catalogue quantify the distribution of AL scan directions across the source, and scan_direction_strength_k1 is the degree of concentration when the sense of direction is taken into account; a value near 1 for scan_direction_strength_k4 indicates that the scans are concentrated in two nearly orthogonal directions.


If we assume, as shown in Sect. 6.1.1, a −0.04 ± 0.003 mas zero-point for Gaia DR1, an estimate of the Hipparcos zero-point (new reduction) would then be + 0.054 ± 0.005 mas. This would also be the zero-point of the first Hipparcos reduction as the average parallax difference between the two reductions is about 0. This value is then marginally consistent with the estimation done two decades ago (−0.02 ± 0.06 mas, Arenou et al. 1995) with preliminary Hipparcos data, or with the published data, −0.05 ± 0.05 mas (ESA 1997, Vol III, Chap. 20).

Acknowledgments

Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement: the Centre National d’Études Spatiales (CNES), the European Space Agency in the framework of the Gaia project.

This research has made extensive use of Aladin and of the SIMBAD and VizieR databases operated at the Centre de Données Astronomiques (Strasbourg) in France and of the software TOPCAT (Taylor 2005). This work was supported by the MINECO (Spanish Ministry of Economy) – FEDER through grant ESP2014-55996-C2-1-R, MDM-2014-0369 of ICCUB (Unidad de Excelencia “María de Maeztu”) and the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement GENIUS FP7 – 606740. We acknowledge the computer resources, technical expertise, and assistance provided by the Red Española de Supercomputación and especially the MareNostrum supercomputer at the Barcelona Supercomputing Center. A.H., M.B., and J.V. acknowledge financial support from NOVA (Netherlands Research School for Astronomy), and from NWO in the form of a Vici grant. R.L. acknowledges support from the French National Research Agency (ANR) through the STILISM project. This work has been possible thanks to the support and efficiency of A. Brown, G. Gracia, and J. Hernández, to cite only a few. In particular, we thank F. van Leeuwen for drawing our attention to the property of the Hipparcos subset with respect to Fig. 46, and L. Lindegren for many inputs.

Appendix A: Gaia archive interface validation

Appendix A.1: Testing methodology

This section discusses the validation procedures employed in testing the design and interfaces of the archive systems delivering the Gaia DR1 data to the end-user community.

The Gaia Archive was designed to fulfil the set of data access requirements gathered through a community scoping exercise. The Gaia user community was asked to suggest a number of “Gaia data access scenarios” and to enter them on the Gaia Data Access wiki pages. All scenarios received by March 2012 were considered and analysed, and presented in the DPAC Gaia data access scenarios scoping document GAIA-C9-TN-LEI-AB-026. The Gaia ESA Archive (Salgado et al. 2016) was designed to take into account these user requirements.

Within the Catalogue validation exercise a “Gaia Beta Test Group” (BTG) was constituted with a remit to perform a range of usage tests on the Gaia Data Archive and associated access clients and interfaces. The BTG is composed of members with expertise in all areas of Gaia from across the DPAC. In addition, the BTG includes members from the astronomical data centres associated with DPAC.

The BTG generated a range of archive tests, documented the results of these tests, and raised fault reports in cases where the tests failed. These issues were reported through the DPAC ticketing system; each issue was assigned to the relevant members of the Gaia Archive team.

A range of the test queries generated have subsequently been reused as part of the user documentation associated with the Gaia DR1 release; in particular, many queries have entered the Gaia DR1 Cookbook.

Appendix A.2: Testing the main Gaia DR1 archive

The main web access point to the Gaia DR1 data is available online. It was made available to the BTG at an early stage, initially populated with simulation data. Testing commenced in early 2016, with an initial focus on the web interfaces to the archive. This included queries constructed via the simple form-based archive pages or via more complex queries using the Astronomical Data Query Language (ADQL), an IVOA standard.

Issues raised included those related to the user interface and also to the archive documentation. Functionality issues covered topics such as simplifying bulk data download, using server side storage, and inconsistencies in data table schemas.

At the time of the Gaia DR1 release to the community, all raised issues classified as high priority had been fixed or resolved. Some lower priority issues will be addressed in upcoming maintenance releases and were documented at the time of the public data release.

Appendix A.3: Testing the Gaia DR1 partner archives

Gaia DR1 has also been released through a number of “partner” data centres. These provide alternative access points to the Gaia data, and additionally each provides some specific functionalities not available through the main ESA Gaia archive.

The Gaia partner archives publishing Gaia DR1 data are available at the following access points:

Each partner data centre was provided with the Gaia DR1 data in early August 2016 in advance of the Gaia DR1 data release. This enabled a range of tests of the interfaces to be carried out by the BTG. All issues found were reported to the operators of these partner data centres.

Sky map in galactic coordinates of the standard uncertainties of TGAS: parallaxes (mas, left), proper motions in right ascension ( mas yr-1, centre), and proper motions in declination (right). The precision is, however, much better for the subset of Hipparcos stars.

An overview and discussion of the contents of Gaia DR1 can be found in Gaia Collaboration (2016a), and full details are available in the archive documentation.

Appendix C.1: Selected TGAS statistics

Figure C.1a shows the star density of TGAS in galactic coordinates. In addition to the physical features, e.g. the Galactic disc, this figure also clearly shows the traces of the incompleteness discussed in previous sections; artefacts in the shape of the Gaia scanning law show regions of underdensities arising from the removal of stars with a low number of observations in underscanned regions. We again remind the reader of the incompleteness of this release discussed in Sect. 4.

Figure C.2 shows the distribution of the uncertainties in TGAS astrometry over the sky. As can be seen, the distribution of these uncertainties is quite inhomogeneous over the sky, with some large regions of small uncertainties and some regions of large uncertainties. These features are also present in the distributions of the uncertainties of other parameters, e.g. the magnitudes. Therefore, we advise the reader to always use the uncertainties given in the catalogue for the analysis of the data and to never rely on an average error. Also, as discussed in Sect. 8.1, the correlations between the astrometric parameters should be taken into account for the error analysis. These correlations can be significant in Gaia DR1 and their sky distribution is very inhomogeneous, as illustrated in Fig. C.1c. We note the large areas with significant positive or negative correlations.

Although these uncertainties and correlations represent the behaviour of most TGAS stars, it is important to note that the corresponding figures with the Hipparcos subset alone are very different, due to much smaller uncertainties and correlations (see e.g. Fig. 46b).

Appendix C.2: Selected global statistics

Figure C.4a shows the star density in galactic coordinates of the global Gaia DR1 data set. Although less prominent than in Fig. C.1a, the artefacts in the shape of the scanning law due to the incompleteness caused by the selection applied are still present, and should be taken into account for star count analysis, as already discussed.

On the other hand, Fig. C.3 illustrates the distribution of the uncertainties in magnitude and position as a function of the G magnitude. As illustrated by these figures, the behaviour of the uncertainties approximately follows the mean dependence on G expected from the mission Science Performance estimations, but also shows features due to the on-board prioritisation of the Calibration Faint Stars at every magnitude (vertical lines), some jumps due to the CCD gates (at the bright end), and a wide dispersion around these mean relations due to the varying number of observations and star colours. Again, we advise the reader to always use the uncertainties given in the catalogue for the analysis of the data and to never rely on average errors or error relations.

Comparison in the LMC and SMC of the correlations between astrometric parameters: the median of the standard correlations given in the Catalogue appears consistent with the empirical values computed with the astrometric residuals.

All Figures

Number of pairs of sources vs. their angular separation in the field (l = 350°, b = 0°) before filtering (red) and after (green). The line corresponds to a random distribution up to 10′′ of the latter.

Relative star count differences between Gaia DR1 and the GOG18 simulation in different magnitude bins, from 12 < G < 13 to 19 < G < 20 in steps of one magnitude in galactic coordinates. In addition to the prominent feature of the Magellanic Clouds (absent from the Galaxy model) and inadequacies of the 3D extinction model in the Galactic plane, the Gaia incompleteness around the ecliptic plane due to the scanning law becomes clear at G > 16.

Star counts per square degree as a function of magnitude in several (l, b) directions. Crosses linked with lines are for Gaia DR1 data, filled blue circles are simulations from GOG18. Error bars represent the Poisson noise for one square degree field. The bottom row shows two regions impacted by the scanning law and the filtering of stars with a low number of observations.

Completeness against density in the field of three chosen GCs in different magnitude ranges. Fields such as NGC 1261 have a median of 220 observations, allowing for a much better completeness in the denser regions than NGC 6752 (40 observations).

Stellar distribution for six chosen GCs, colour-coded by number of G observation for each star. Top row: examples of holes caused by limited on-board resources or bright stars. Bottom row: in some regions patterns are visible corresponding to stripes where no stars had a sufficient number of observations.

G magnitudes for a dense field (l = 330°, b = −4°, ρ = 2°) and a sparse field (l = 260°, b = −60°, ρ = 15°). The sparse field has been scaled to give about the same number of sources as the dense field.

Simulation of the distribution of source-to-source distances in a dense, random field (left) after applying selection criteria similar to Gaia DR1. The fraction retained is shown in the right panel. The field has a true source density of 500 000 stars per square degree, but only 322 000 remain after applying the selection criteria.

Ranking of 2D subspaces according to their mutual information in the TGAS data (x-axis) vs. the simulation (y-axis). The black squares correspond to subspaces formed only from observables, while the blue crosses are those containing an uncertainty, and the magenta circles contain a correlation parameter. The red hexagons correspond to the subspaces shown in Fig. 21.

Examples of the subspaces showing a strong deviation from the 1:1 expected relation shown in Fig. 20, particularly in the astrometric errors (left) and correlations (right) in TGAS (top) compared to those in the simulations (bottom).

Left: distribution of regions for which the mutual information has been computed, where the inset indicates the number of observations inside the regions. The regions are circles in l−sinb space, with the positive b region in solid and its symmetric counterpart in dashed. Regions that are compared and are not symmetric are connected by a grey line. Right: average deviation of the mutual information between a region and its counterpart, in (red) blue for (non-) symmetric counterparts.

Median parallaxes of quasars in 2° radius regions (mas), ecliptic coordinates. There is little insight in the Galactic plane, due to the lack of objects. Outside of this plane, local systematics with about 0.3 mas characteristic amplitude can be seen.

Distribution of ϖTGAS/ϖRAVE−1 for ~ 200 000 stars matched in the RAVE catalogue to the TGAS solution. Stars along EPSL, λ ~ 180°, appear to have a systematically overestimated parallax of up to ~ 0.3 mas; stars with G magnitudes in the range 10−11.5 and colour 1.4 ≤ GBP−GRP ≤ 1.8 are the most strongly affected.

Distribution of the differences between the mean TGAS parallaxes and the one from photometric distance for the distant open clusters. Red and blue labels are attributed to the clusters defined in Fig. 30.

Best-fit uncertainties from deconvolution of parallaxes vs. standard uncertainties for TGAS Hipparcos stars (left) and for Tycho-2 (non-Hipparcos) stars (right) with the bisector represented. Error bars include all sources of uncertainty, including bias correction.

Sky variation of the normalised residuals of the TGAS vs. Hipparcos parallaxes in ecliptic coordinates. Although correlation with the sky position is significant, no sky region indicates a normalised residual larger than 2.6.

Mean difference in parallax in mas between the BGMBTG2 model simulation and the TGAS data, in different rings of latitude, for five magnitude bins in VT from left to right, from 9−9.5 (left) to 11−11.5 (right).

Gaia G vs. GBP and GRP photometry. Residuals of G−GRP from a global G−GRP = f(GBP−GRP) spline as a function of G magnitude. The red line is a smoothed spline fit. The sample contains 10 000 stars with a uniform distribution in magnitude; therefore, the lighter grey scale indicates less dispersion in the residuals.

Residuals of the difference G−V against a low-order spline, as a function of the magnitude, for five different clusters. The V magnitude is from the Taylor et al. (2008) catalogue. The red lines mark the gate positions in magnitude. The green curve is a high-order spline fit to the data.

