*********
* DRAFT * 10/11/01
*********
REPORT OF THE SHARE COMMITTEE:
Study of Hubble Archive & Reprocessing Enhancements (SHARE)
G. Kriss, M. Dickinson, D. Fraquelli, A. Koekemoer, H. Bushouse,
D. Swade, M. Donahue, M. Giavalisco, T. Keyes, G. Meurer,
P. Padovani, W. Sparks
==============================================================================
INTRODUCTION
------------
The calibration of HST data is one of the primary areas where the STScI
adds value to the Hubble science program. The Hubble Archive has become, in
itself, a major resource for the astronomical community. The
operational debut of the On-the-fly Reprocessing (OTFR) system will radically
change the paradigm that we have applied to the calibration and storage of
HST data. This system will open up avenues for the STScI to further enhance
the scientific value and impact of the HST data sets stored in the archive.
Progress in the field of astronomical surveys and catalogs has been great in
the past few years and suggests possibilities for increasing the scientific
scope of our archive beyond that originally envisioned when the system was
designed. This is an appropriate time for us to evaluate the scientific
potential of such enhancements and develop a road map for implementing the
most promising of them.
In the past, the requirement to store uniformly calibrated data has
generally prevented adoption of algorithms requiring user input, or
knowledge of the astronomical scene. The OTFR concept removes those
restrictions and could allow selection of algorithms or processing paths by
the user. Our initial pipelines dealt with only one data set at a time.
Later, pre-defined associations of data sets were developed to allow
processing of related data sets, such as wavelength calibration of
astronomical spectra via internal calibration exposures. The OTFR concept
allows for post facto definition of associations of data, either permanent
or on-the-fly. This could allow application of simple associations to SIs
for which this was not originally available (e.g., WFPC2). This could also
allow processing of larger groups of data sets to provide more
scientifically valuable products, such as summed data sets or mosaics. The
archive catalog was originally conceived as simply an index into individual
observations. Already, its scientific value has been increased by
incorporation of preview images, and we are working toward
seamless access to data across missions. At this point, it would be
technically feasible to extend the Hubble archive to include scientific
services such as object catalogs, their generation from HST data sets, and
direct cross-references to other catalogs and databases. Many of these
ideas are also under contemplation for the NGST era, and it is worth
considering how the archive might smoothly evolve to provide similar
services for HST, NGST, and MAST holdings.
On this basis, Rodger Doxsey chartered the SHARE committee to
1. Evaluate and recommend general capabilities, services and enhancements
to these systems.
2. Evaluate and recommend specific new scientific services and enhancements
to these systems. These should be augmentations with clear and substantial
added benefit for the research community.
3. Provide a rough road map or order for the implementation of the
recommendations made in items 1 and 2.
4. Recommend a process for encouraging astronomical community participation
in the development of such enhancements.
5. Recommend a process for regularly assessing and prioritizing enhancements of
this type in the future.
The SHARE study group consists of
Gerard Kriss (chair) HST/STIS & NGST
Daryl Swade ESS
Howard Bushouse ESS/SSG
Megan Donahue ACDSD (& FASST Chair)
Paolo Padovani ACDSD
Dorothy Fraquelli ACDSD
Tony Keyes HST/COS
Mauro Giavalisco HST/WFC3
Mark Dickinson HST/NICMOS
Anton Koekemoer HST/WFPC2
Bill Sparks HST/ACS
Gerhard Meurer JHU/ACS
Our group met in a series of meetings from May -- September 2001 to develop
ideas for reprocessing enhancements and rank them in order of scientific
priority. After examining the timescales and resources needed to
implement each of these ideas, the group intends to develop a suggested
road map for implementing several of these ideas, with the order of
implementation based on these rankings, the resource requirements, and
the fraction of the user community that is likely to benefit from an
implementation of a given idea. External input from advisory bodies such
as the Space Telescope Users' Committee (STUC) should play an important
role in making final decisions on implementation.
POTENTIAL REPROCESSING ENHANCEMENTS
-----------------------------------
We have identified a number of possible enhancements that could be implemented
via the OTFR mechanism. We prioritized the ideas on this list on the basis
of their potential scientific value, with additional weight given to the
unique capability that the facilities and knowledge resident at STScI could
bring to bear. In this initial study, we cast a broad net to capture as many
potential ideas as possible. In some cases our suggestions might be
characterized more as "data analysis" rather than "data processing".
We note that, if considered for implementation, such "analysis" enhancements
would benefit from a higher level of testing, documentation, and scientific
peer review than normally accorded pipeline data processing routines.
In priority order, the current possibilities are
1. Improvements to the accuracy of the World Coordinate System in
the headers of calibrated data products.
2. Combining images to produce a wide-field mosaic or deep stacked image.
3. Identifying and classifying objects in a single image or a combined set
of images.
4. Producing a catalog of objects in a single image or a combined set
of images.
5. Determining photometric redshifts for objects in a region of sky which
is imaged in a number of different bands.
6. Combining spectra from several different observations or from dithered
STIS observations.
7. Using the time history of data and calibrations to enhance the reprocessing.
8. Providing better quality information for data sets in the archive.
9. Allowing users to specify customized parameters for OTFR processing,
tailoring the pipeline calibration to their scientific needs.
10. Creating "data cubes" from dithered long-slit spectroscopic observations.
We note that there was a wide dispersion in our rankings of the priority to
be given to each of these tasks. Broadly, items within the top, middle, and
bottom thirds of the above list received similar priorities.
Below we give more detailed descriptions of each of these topics.
WCS Improvements
================
Description:
The primary purpose of this enhancement is to improve the astrometry
encoded in the WCS keywords of image headers by making use of updated
measurements and astrometric information about guide stars derived
from data taken with HST.
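As a purely illustrative sketch (in Python with astropy; the file name,
extension index, and offset values are placeholders rather than an actual
pipeline interface), such a correction could reduce to shifting the WCS
reference point recorded in the FITS header:

    import astropy.io.fits as fits

    # Hypothetical pointing correction (degrees) derived from an
    # improved guide-star position; the values are placeholders.
    dra, ddec = 2.8e-5, -1.4e-5

    with fits.open("example_flt.fits", mode="update") as hdul:
        hdr = hdul[1].header      # science extension carrying the WCS
        hdr["CRVAL1"] += dra      # shift reference right ascension
        hdr["CRVAL2"] += ddec     # shift reference declination
        hdr["HISTORY"] = "WCS shifted using updated guide-star astrometry"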
Science Case:
The current astrometric accuracy of HST images is limited by the inherent
astrometric uncertainties in the Guide Star Catalog system, and is generally
no better than ~0.5 - 1 arcseconds. However, there is
often a need to obtain much higher astrometric accuracy, in particular when
comparing images at different wavebands from different telescopes (radio,
optical, X-ray) as well as different images obtained with HST, generally at
different times and in unrelated programs (e.g., narrow-band and broad-band
images of the same object). This is required when carrying out source
identifications based on multi-wavelength information, when examining
color gradients across a given object, or when determining the relative
location of morphological features seen in different bands. Ideally it would
be desirable to achieve astrometric accuracy to a level comparable to the
measurement error of unresolved sources on HST (i.e., << 0.1 arcseconds).
This issue involves two separate, but related, concepts: relative astrometric
precision between different HST images, and absolute astrometric accuracy with
respect to some other well-defined, high-resolution system (the global VLBI
reference frame is one such example).
Currently, relative astrometric accuracy between different HST images is
generally only achievable in cases where stars or other bright unresolved
objects are common to both images. For images that have few or no stars
(e.g., narrow-band or UV observations), relative astrometry can still be
attempted but becomes less certain, thereby directly impacting the science.
Absolute astrometry, for example between HST images and radio data, is
generally only possible in images that contain sources unresolved in both
radio and optical, or otherwise display precise one-to-one correspondence
in the two bands. Otherwise, if sources are resolved in one or both and
display different morphologies, absolute registration becomes uncertain.
Unique STScI Capability:
STScI has the ability to update the GSC information, as well as the
resources to carry out studies of all the guide stars for which astrometric
information may be updated. Furthermore, only STScI has the ability to
automatically incorporate the updated information into the archive data
processing pipelines, thereby potentially eliminating the need for users
to carry out any further refinements on the astrometry.
Drawbacks:
The only minor drawback is that some guide stars will have better
astrometric information than others, so this capability will produce
improvements on a non-uniform basis for different images.
Combining Images
================
Description:
This archival capability would allow users to combine images that are
offset with respect to one another, creating a single output image
(i.e., doing drizzle on-the-fly). It includes relative registration
between images, combining images that are offset by relatively small
amounts, and also potentially the creation of much larger mosaicked
images from pointings that are offset by scales comparable to that
of the detector itself.
Science Case:
The large majority of long-exposure HST images are split into two or more
exposures in order to facilitate the removal of cosmic rays, and furthermore
many programs make use of dithering to move the objects around on the
detectors, not only alleviating the effect of hot pixels but also in
some cases improving the PSF sampling through the use of non-integral
pixel offsets. Such techniques are already commonplace with WFPC2 and STIS,
and are expected to be the norm for NICMOS and ACS observations.
However, the archive currently offers only limited capability for combining
separate CR-SPLIT images, and no current capability for combining dithered
exposures. Yet many of the steps involved in combining dithered images are
repetitive and time-intensive and can potentially be automated. Furthermore,
steps that still currently involve human iteration (such as checking image
registration) may also be amenable to automation using different parameters
for different classes of images (some examples of image categories may
include sparse extragalactic fields, dense stellar regions, or extended
bright diffuse emission across the field).
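The following Python sketch conveys the flavor of such automation: it
registers dithered exposures at integer-pixel precision by a brute-force
correlation search and median-combines them to reject cosmic rays. Real
drizzle-style processing additionally handles sub-pixel shifts, geometric
distortion, and weighting; the function names and search window here are
illustrative only.

    import numpy as np

    def register_shift(ref, img, max_shift=5):
        """Find the integer-pixel (dy, dx) shift that best aligns img
        to ref via a brute-force search over a small window of trial
        shifts, scoring each trial by its correlation with ref."""
        best, best_score = (0, 0), -np.inf
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
                score = np.sum(ref * shifted)
                if score > best_score:
                    best, best_score = (dy, dx), score
        return best

    def combine_exposures(images):
        """Shift each exposure onto the first exposure's pixel grid
        and take a pixelwise median, which rejects cosmic rays that
        hit only one of the frames."""
        ref = images[0]
        aligned = [ref]
        for img in images[1:]:
            dy, dx = register_shift(ref, img)
            aligned.append(np.roll(np.roll(img, dy, axis=0), dx, axis=1))
        return np.median(aligned, axis=0)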
Unique STScI Capability:
STScI has the ability to maintain up-to-date information on geometric
distortion and image pointing (in the form of jitter files and other
telemetry information).
Drawbacks:
Although many steps in combining images using "drizzle" can now be
automated, there still remains a need for manual intervention/iteration
in some cases, and at a minimum there is a need for observers to check
that the images have been combined correctly. Furthermore, the parameters
for cross-correlation and combination must often be fine-tuned according
to the nature of the images themselves (for example, depending upon
whether there are many bright stars or only a few faint diffuse objects)
and care would need to be taken in generalizing the algorithm to deal
with such different types of images in a way that will still provide
useful results.
Object Detection and Classification
===================================
Description:
This archival capability would allow for the detection and classification
of objects on an image, based on a set of pre-defined criteria (possibly
selected by the user from one of several alternatives, depending upon
the nature of the image).
Science Case:
Although several object detection routines exist, their behavior often
needs to be fine-tuned by the user. Incorporating this capability into the
archive/pipeline would allow standardized behavior according to some set
of parameters (perhaps several different sets of parameters, optimized for
different classes of images - sparse vs. crowded-field, for example). The
advantage of standardized behavior is that it allows well-defined
completeness and object detection thresholds to be specified.
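A minimal Python sketch of such standardized, class-dependent detection
follows; the parameter values are placeholders, not calibrated
recommendations, and a production routine would use a far more careful
sky and noise model.

    import numpy as np
    from scipy import ndimage

    # Illustrative per-class detection parameters (placeholders).
    PARAMS = {
        "sparse":  {"nsigma": 3.0, "min_pixels": 5},
        "crowded": {"nsigma": 5.0, "min_pixels": 3},
    }

    def detect_sources(image, image_class="sparse"):
        """Flag pixels above a sigma threshold over the median sky
        level, group connected pixels into sources, and return the
        (y, x) centroids of sources exceeding a minimum size."""
        p = PARAMS[image_class]
        sky, sigma = np.median(image), np.std(image)
        mask = image > sky + p["nsigma"] * sigma
        labels, nsrc = ndimage.label(mask)
        index = np.arange(1, nsrc + 1)
        centroids = ndimage.center_of_mass(image, labels, index)
        sizes = ndimage.sum(mask, labels, index)
        return [c for c, s in zip(centroids, sizes)
                if s >= p["min_pixels"]]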
Unique STScI Capability:
The ability to standardize the behavior of the object detection routine,
particularly optimizing it for several different classes of images, is
something that is unique to STScI.
Drawbacks:
If the object detection parameters are set up such that this technique
will yield useful results for a large fraction of images, then it is also
likely to extract less information than would be the case if it were
optimized for a specific dataset. Furthermore, the same dataset can
sometimes be used to create different kinds of catalogs: one catalog
may be of bright point sources, while another may be of faint diffuse
galaxies in the same image. The same set of parameters is unlikely to
work in both cases, so observers would likely need to fine-tune the
parameters themselves and re-run the task manually.
Automated Catalog Generation
============================
Description:
This capability would build upon the object-detection technique, by
creating catalogs for individual images or datasets that are linked
together in some way (for example by their filter selection, or their
exposure time, or their location on the sky), and potentially from a
number of different observing programs.
Science Case:
The aim of this capability would be to allow users to specify parameters
such as filter selection, exposure depth, possibly limiting magnitude or
range of magnitudes, and thereby create a custom-made catalog containing
objects from all images (not necessarily from the same program) that satisfy
these criteria. Thus for example one could create magnitude-limited samples
of all objects in all images that have some specified minimum exposure depth,
from a large number of different programs, thereby creating a well-defined
sample that may cover a much larger area than any of the individual programs.
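A sketch of how such a query-driven catalog merge might look in Python
is given below; the image index, the per-image catalogs, and the column
names are hypothetical stand-ins for queries against the archive database.

    from astropy.table import vstack

    def build_sample(image_index, catalogs, filt, min_exptime, mag_limit):
        """Merge per-image object catalogs for all images matching
        the filter and exposure-depth criteria into one
        magnitude-limited sample.  'image_index' is a list of
        metadata dictionaries and 'catalogs' maps dataset names to
        astropy Tables (both hypothetical here)."""
        selected = [row["dataset"] for row in image_index
                    if row["filter"] == filt
                    and row["exptime"] >= min_exptime]
        merged = vstack([catalogs[d] for d in selected])
        return merged[merged["mag"] <= mag_limit]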
Unique STScI Capability:
The ability to store and associate image parameters with the catalogs
generated from each image is something that can only be done internally
in the STScI database.
Drawbacks:
In order for the catalogs to be useful, the behavior of the object
detection routines would need to be reasonably well quantified, and
furthermore would need to be automatically applicable to a relatively
large fraction of datasets. It is not clear how practical this will
be to carry out.
Determining Photometric Redshifts from Multicolor Imaging Data
==============================================================
Description:
This capability would estimate photometric redshifts for cataloged objects
using multicolor imaging data.
Science Case:
The use of multicolor photometry for estimating redshifts of
galaxies has become an increasingly common tool for extragalactic
observers. This was spurred in a large part by the availability
of the WFPC2 Hubble Deep Field images, which provided high quality
multicolor optical photometry for thousands of galaxies in a field
where extensive spectroscopy was also available to calibrate
the photometric redshift methods. Even after an orbit or two,
WFPC2 images detect galaxies faint enough that spectroscopy becomes
impractically difficult, motivating the desire for color-based
redshift estimates. A variety of methods have been employed,
and some stand-alone software packages have already been created
to compute photometric redshifts from multicolor photometry
catalogs.
If object catalogs for HST images were to be generated as a data
product, then one might imagine feeding these to a photometric
redshift estimation code and providing the results as another data
product, potentially searchable via the archive.
In practice, however, reliable, general purpose photometric
redshift estimates require images through at least three filters,
preferably at least four. Even then, especially with optical
photometry alone (e.g., from WFPC2 or ACS), the range of redshift
over which photometric redshifts can reliably be estimated is
limited (e.g., galaxies at 1 < z < 2 generally require both optical
and infrared photometry for quality photo-z estimates).
Only a small subset of HST imaging data would be suitable for
photometric redshift estimation, and it would be difficult to
implement the sort of "quality control" that would result in
easy-to-use and reliable results for the general user.
In principle, even with measurements in only two or three bands,
a photometric redshift code could be used to provide a likelihood
function L(z) which could restrict the range of possible redshifts
for a galaxy without necessarily specifying one "preferred"
redshift estimate. However, such a product would be more complex
to use, interpret, or search than a straightforward
catalog with single data values for each object, and there
would be risk that naive users might use these redshift estimates
uncritically without considering the very substantial uncertainties
involved, especially for non-HDF-like data.
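For concreteness, a minimal Python sketch of such a likelihood
computation is given below; the grid of precomputed template fluxes is an
assumption of this illustration, and real photometric-redshift codes add
priors and much more careful error handling.

    import numpy as np

    def photoz_likelihood(fluxes, errors, template_fluxes, zgrid):
        """Chi-squared template fitting on a redshift grid.
        template_fluxes[i, j, k] holds the (hypothetical,
        precomputed) model flux of template i at redshift zgrid[j]
        in band k.  Returns a normalized likelihood L(z), minimized
        over templates and flux scaling at each redshift."""
        chi2 = np.full(len(zgrid), np.inf)
        for i in range(template_fluxes.shape[0]):
            for j in range(len(zgrid)):
                model = template_fluxes[i, j]
                # analytic best-fit amplitude for this template/z
                a = (np.sum(fluxes * model / errors**2)
                     / np.sum(model**2 / errors**2))
                c = np.sum(((fluxes - a * model) / errors) ** 2)
                chi2[j] = min(chi2[j], c)
        like = np.exp(-0.5 * (chi2 - chi2.min()))
        return like / like.sum()    # normalized L(z) on zgrid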
Unique STScI capability:
If object catalogs from HST images were being automatically
generated as an STScI data product, STScI would be in a convenient
position to then automatically feed multicolor data meeting some
particular criteria to a photometric redshift estimator, and
to standardize the performance and output product format.
Drawbacks:
Relatively few data sets would be suitable for providing
good photometric redshift estimates. Therefore, only a small
fraction of the archived data would benefit from this effort,
or, perhaps worse, inadequate and misleading photometric
redshift estimates could be calculated and distributed for
a larger body of unsuitable data sets.
Combining Spectral Data
=======================
Description:
This enhancement would permit the user to specify data sets from multiple
observations of a target and request that they be optimally combined into
a single, summed spectrum. Many STIS CCD spectral observations are obtained
in a dither pattern to optimize CR rejection, to avoid hot pixels,
and to completely sample the spatial domain. These dithered observations
could be combined in an OTFR process.
Science Case:
Many HST archival spectral observations consist of multiple data sets.
In some cases these are merely repeated, unrelated observations, and in
others they result from a deliberate observing strategy. The combinations
produced via OTFR could merge multiple observations in a single wavelength
region to increase the signal-to-noise ratio, or combine several
observations at different grating settings to obtain broader wavelength
coverage in the product. Automated combination of dithered STIS CCD
spectral observations would produce a better total product for the
archive user.
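A minimal Python sketch of the core combination step follows; the simple
linear interpolation here stands in for proper resampling with full error
propagation.

    import numpy as np

    def coadd_spectra(spectra, wave_grid):
        """Inverse-variance weighted coaddition of spectra onto a
        common wavelength grid.  Each input is a (wavelength, flux,
        error) triple of arrays."""
        num = np.zeros_like(wave_grid, dtype=float)
        den = np.zeros_like(wave_grid, dtype=float)
        for wave, flux, err in spectra:
            f = np.interp(wave_grid, wave, flux)
            e = np.interp(wave_grid, wave, err)
            w = 1.0 / e**2
            num += w * f
            den += w
        return num / den, 1.0 / np.sqrt(den)  # combined flux, error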
Unique STScI capability:
The instrument groups at STScI have the specific knowledge of the instruments
and the associated pointing data (WCS information and jitter files) that
would be needed to perform this task routinely.
Drawbacks:
Blindly combining spectra obtained at different epochs could produce
errant results for time-variable objects. Judicious scientific input
on the part of a careful observer is generally required to assure the
validity of any result. An automated system could circumvent the careful
scrutiny usually applied when users combine data sets on their own.
In the case of merging dithered long-slit spectral observations, we note
that combining spectral images has the same (if not more) problems with
alignment and registration that affect combining images. One often
must tweak parameters manually in an iterative process to achieve an
optimum result.
Processing Data Sets Based on Time History
==========================================
Description:
Use the time history of an instrument's observing program and calibration
state as an additional factor in reprocessing data.
Science Case:
Some aspects of data processing are time-dependent, in the
sense that the optimal data processing may depend on characteristics
of the instrument which change over time, or which relate in
some way to previous or subsequent science exposures in a series.
The HST data processing system already carries out some simple
time-dependent procedures when pipelining HST data. E.g., "best"
reference files are often selected based on the date on which
an observation is taken. This may include "super" or "delta"
dark reference files, for example.
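In essence, that date-based selection amounts to something like the
following Python sketch, where the table of reference files is a
hypothetical stand-in for the real calibration database:

    def best_reference(obs_date, ref_table):
        """Return the file whose USEAFTER date is the latest one not
        later than the observation date.  'ref_table' is a
        hypothetical list of (useafter_date, filename) pairs."""
        eligible = [(d, f) for d, f in ref_table if d <= obs_date]
        return max(eligible)[1] if eligible else None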
Other, more sophisticated examples could be identified.
For example, NICMOS data are subject to persistence, in which
detector pixels which collect a large number of counts in one
exposure continue to "glow" with a count rate that decays
fairly gradually with time, producing afterimages in subsequent
images. This occurs both due to astronomical sources (e.g., bright
stars which leave afterimages in subsequent exposures) and to
radiation events, especially after SAA passages when the
entire array is heavily bombarded, leaving a spatially
mottled pattern which gradually fades throughout the subsequent
orbit.
In some cases, it may be possible to track and even
correct persistent afterimages. For "astronomical"
persistence, bright sources could be identified in one
exposure, and those pixels could be flagged (at least)
or corrected (at best, if a suitable persistence model
could be defined) in subsequent science images.
The SIRTF data pipeline will attempt to do this.
For post-SAA persistence, in Cycle 11 STScI will begin
taking automatically-scheduled "post-SAA darks" in
every SAA-impacted orbit. There is hope that software
can be developed which will scale and subtract these
"darks" from subsequent images to reduce or remove the
persistence signal, although this has not yet been
generally demonstrated on-orbit. If this is successful,
it might be implemented in a pipeline. It is quite
likely that there are other examples involving other
instruments, where time-dependent processing could
improve data quality for many users.
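If such software proves feasible, the core operation might resemble the
Python sketch below; the exponential decay model and its time constant
are assumptions for illustration, not measured NICMOS values.

    import numpy as np

    def remove_saa_persistence(image, saa_dark, dt, tau=600.0):
        """Scale a post-SAA dark by an assumed exponential persistence
        decay and subtract it from a science image taken 'dt' seconds
        after the dark.  The time constant 'tau' is a placeholder."""
        return image - np.exp(-dt / tau) * saa_dark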
Unique STScI capabilities:
In general, time- or history-dependent processing would
require some means to search for, link together, and multiply
process exposures with a given instrument that were taken over
some time frame, regardless of whether they are part of the
same HST proposal or not. STScI is in the best position to do
this, using direct interfaces between the data archive and
the OPUS system.
Enhance Quality Notations for Archived Data Sets
================================================
Description:
Data that are obviously unsuitable for use in any scientific
investigation or that may require special processing should
be flagged as bad, that is, called to the user's attention,
early in the observation request process.
Examples of unsuitable data include, but are not limited to,
the following.
+ NIC data taken within