The WFCAM pipeline/archive project

All UK-owned WFCAM data, whether public UKIDSS data or open-time private data, will
be processed through a pipeline producing standardised data products, and ingested
into an on-line science archive. The UKATC, as builders of the camera, will provide
software for operation of the instrument, and extraction of data from it. Following
this, the responsibility for the pipeline and archive system belongs jointly to the
JAC, and the two UK rolling-grant funded wide field astronomy groups, the Cambridge
Astronomy Survey Unit (CASU) of the Institute of Astronomy, Cambridge University,
and the Wide Field Astronomy Unit (WFAU) of the Institute for Astronomy, Edinburgh
University. In addition, the National Astronomical Observatory of Japan (NAOJ) at
Hilo, Hawaii, and Mitaka, Tokyo will collaborate to develop further advanced
pipeline facilities. The JAC has end-to-end responsibility. CASU is taking prime
responsibility for the pipeline processing, and WFAU for the science archive, but
the boundaries between these activities are blurred, and all teams are working
closely together. Finally, the pipeline/archive project is formally integrated with
the VISTA Data Flow System (VDFS) project, which is led by Jim Emerson at QMUL.

All WFCAM observations will be in a queued mode, using UKIRT's new Observing
Management Protocol (OMP) system. Furthermore, there will be very few possible
observing modes. For both UKIDSS and open-time observations we will be enforcing
fixed standard calibration procedures. All these simplifying factors make a standard
pipeline possible for all WFCAM data. However, no standard pipeline will ever
squeeze all the possible information out of the data, and there will always be
users, or types of question, that require different assumptions or algorithms in the
processing. We are not attempting to construct an all-purpose completely flexible
pipeline toolkit, but rather a
processing pipeline to produce pre-agreed standard data products. These should be
good enough for most purposes, but the design is optimised to produce the survey
products demanded by the UKIDSS project, with optimal stacking, mosaicing, and
source extraction, and uniform astrometric and photometric calibration across survey
fields. The Science Requirements Document (SRD) for the pipeline and archive is
under construction as we write, and is expected to be agreed with the JAC and with
the UKIDSS
consortium by the end of 2002. More advanced processing, and an on-the-fly
user-pipeline toolkit may also be constructed, but these are over and above the
commitment to the standard processing.

The processing can be seen as divided into several stages: data acquisition, the
summit pipeline, the basic pipeline, further survey-wide processing, ingestion
to the archive, refinement of calibration, and serving the data to users. At the
summit, data from all exposures within a single night at the same telescope pointing
position (including micro-steps within that position) and using the
same filter, are co-added on the spot by
the Data Acquisition System (DAS). Regardless of the macroscopic dither
pattern through the night, data from
the four arrays are kept distinct, as are data from different pointings, so that the
data written to media consist of a collection of 4096x4096 pixel co-added frames
within one night. These frames are then the analogues of traditional photographic
plates, and form the basic units of the archive from which all other data products
are constructed.
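As a concrete illustration of this grouping, the co-addition rule described above might be sketched as follows. The tuple key, the summation, and the data layout are assumptions for illustration only; the real DAS interleaves micro-stepped exposures onto the finer output grid rather than simply summing aligned images.

```python
# Illustrative sketch: exposures from one night sharing the same pointing,
# filter, and detector array are grouped and co-added into a single frame.
# (Assumed representation; the real DAS interleaves micro-stepped exposures.)
from collections import defaultdict
import numpy as np

def coadd_night(exposures):
    """exposures: iterable of (pointing_id, filter_name, array_id, image)."""
    groups = defaultdict(list)
    for pointing, filt, array, image in exposures:
        groups[(pointing, filt, array)].append(image)
    # One co-added frame per (pointing, filter, array) combination; frames
    # from the four arrays and from different pointings stay distinct.
    return {key: np.sum(frames, axis=0) for key, frames in groups.items()}
```

The point of the grouping key is simply that data from different arrays and pointings are never mixed, while repeat exposures at one position accumulate into a single frame.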
The main purpose of the summit pipeline is to generate near
real-time Data Quality Control (DQC) information from the co-added frames. It will
use fixed library frames for instrument signature removal (e.g. flat fielding) and
will do a first cut source extraction. The reduced frames, DQC, and a statistical
analysis of the source lists, will be examined in Hawaii by UKIRT and/or UKIDSS
staff and used to update a survey progress database, which then feeds back to
the observing queue.
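A minimal sketch of the kind of per-frame statistics such a DQC step might compute is given below. The particular quantities (sky level, robust sky noise, saturated-pixel fraction) and the saturation threshold are illustrative assumptions, not the agreed WFCAM DQC set.

```python
# Hypothetical per-frame Data Quality Control statistics: sky level,
# robust noise, and saturated-pixel fraction.  The choice of statistics
# and the threshold value are assumptions for illustration.
import numpy as np

def frame_dqc(frame, saturation=40000.0):
    sky = np.median(frame)
    # Robust noise estimate from the median absolute deviation (MAD);
    # 1.4826 converts MAD to an equivalent Gaussian sigma.
    noise = 1.4826 * np.median(np.abs(frame - sky))
    sat_frac = np.mean(frame >= saturation)
    return {"sky": float(sky), "noise": float(noise),
            "sat_frac": float(sat_frac)}
```

Summary numbers of this kind are cheap to compute at the summit and are exactly what a survey progress database needs to decide whether a frame must be re-observed.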

The raw data (i.e. the collection of co-added 4K frames from each night) will be
sent to the UK on a daily basis for processing with the basic pipeline in
Cambridge. The default plan is that the data will be sent on a nightly tape, but we
are discussing the possibility of sending a hot-swappable hard disk drive, as used
by the ESO NGAST system. The basic pipeline will use the same software as the summit
pipeline, but it will process the real calibration data, and will estimate a
separate PSF from
each frame. The pipeline removes instrument signature, does a first cut photometric
and astrometric calibration, and a default source extraction. The result is a
calibrated version of the collection of 4K frames, and a separate source list for
each frame.
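The instrument-signature-removal step can be sketched in its simplest form as dark subtraction followed by division by a normalised flat field. This ordering and the library-frame names are assumptions based on standard infrared practice, not a description of the actual CASU recipes, which also handle effects such as crosstalk and sky subtraction.

```python
# Simplified instrument signature removal: subtract a library dark frame,
# then divide by a flat field normalised to unit median.  (An assumed
# minimal recipe; the real pipeline treats further instrumental effects.)
import numpy as np

def remove_signature(raw, dark, flat):
    flat_norm = flat / np.median(flat)   # normalise flat to unit median
    return (raw - dark) / flat_norm
```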

A series of further processing steps is needed to make final survey products,
but these steps can only be carried out as the survey data accumulates in the
science archive, so
they need to be especially carefully planned between the CASU and WFAU teams. The
most obvious step is optimal stacking of matching frames from
different nights, and mosaicing to produce a final large pixel-map for each survey.
(Once again the first aim is a standard single pixel map, but we will also develop
the facility to build images of any given sky-area on the fly from the constituent
frames using different sampling and stacking choices). The next step is
improved PSF generation including variations over the field and within a frame
stack. Next there is improved source extraction from the stacked data, including
detection and
parameterisation of Low Surface Brightness Objects and transient events. Then we
have pairing of sources across catalogues in different filters to make YJHK
colours, and pairing with objects in external catalogues, such as the SDSS. Finally,
as large area surveys are accumulated, we will revisit astrometric and photometric
calibration looking for systematic gradients and step functions, and making external
checks, and eventually, over several years, deriving proper motions and variability
parameters.
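The catalogue-pairing step can be illustrated by a simple positional cross-match: a source in one filter's catalogue is paired with the nearest source in another's, provided they lie within a fixed matching radius. The nearest-neighbour rule and the 1-arcsecond tolerance used here are assumptions for illustration; the real pairing across YJHK catalogues and external surveys such as the SDSS will be considerably more sophisticated.

```python
# Illustrative positional cross-match between two single-filter source
# catalogues.  The matching rule and 1-arcsec radius are assumptions.
import numpy as np

def pair_catalogues(cat1, cat2, radius=1.0 / 3600.0):
    """cat1, cat2: arrays of (ra, dec) in degrees.  Returns index pairs."""
    pairs = []
    for i, (ra, dec) in enumerate(cat1):
        # Small-angle flat-sky separation, adequate at arcsec scales.
        d = np.hypot((cat2[:, 0] - ra) * np.cos(np.radians(dec)),
                     cat2[:, 1] - dec)
        j = int(np.argmin(d))
        if d[j] <= radius:
            pairs.append((i, j))
    return pairs
```

Paired indices of this kind are what turn separate single-filter source lists into multi-colour (e.g. YJHK) entries in the science archive.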

Finally, the data are ingested into a public science archive housed in Edinburgh. This
will have both interactive and batch modes, and all data in it will be well
calibrated and documented. It will contain all the final survey pixel maps and
paired source-lists, but will also contain all the constituent nightly 4K frames and
their standard one-filter source lists, as well as a browsable database of the
available frames. As a minimum, the user will be offered the ability to download
arbitrary subsets of these data, including on-the-fly mosaics of small areas using
specified combinations of frames. However, we expect to offer rather more
functionality, as discussed in the section after next.