Adaptive moments

Adaptive moments are the second moments of the object intensity, measured using
a particular scheme designed to have near-optimal signal-to-noise ratio.
Moments are measured using a radial weight function iteratively adapted to the
shape (ellipticity) and size of the object. This elliptical weight function
has a signal-to-noise advantage over axially symmetric weight functions. In
principle there is an optimal (in terms of signal-to-noise) radial shape for
the weight function, which is related to the light profile of the object
itself. In practice a Gaussian with size matched to that of the object is
used, and is nearly optimal. Details can be found in Bernstein & Jarvis (2002).

The outputs included in the SDSS data release are the following:

The sum of the second moments in the CCD row and column directions:

  mrr_cc = <col^2> + <row^2>

and its error, mrr_cc_err.
The second moments are defined in the following way:

  <col^2> = sum[I(col,row) w(col,row) col^2] / sum[I w]

where I is the intensity of the object and w is the weight function.

The object radius, called size, which is just the square root of
mrr_cc

A fourth-order moment:

  mcr4 = <r^4> / sigma^4

where r^2 = col^2 + row^2, and sigma is the size of the Gaussian weight. No error is quoted on this quantity.

These quantities are also measured for the PSF, reconstructed at the position
of the object. The names are the same with an appended _psf. No errors are
quoted for PSF quantities. These PSF moments can be used to correct the
object shapes for smearing due to seeing and PSF anisotropy. See Bernstein &
Jarvis (2002) and Hirata & Seljak (2003) for details.
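As an illustration of the scheme (a minimal sketch, not the frames implementation; see Bernstein & Jarvis 2002 for the real algorithm), the adaptive moments can be computed by repeatedly re-matching an elliptical Gaussian weight to the measured, weight-deconvolved moments:

```python
import numpy as np

def adaptive_moments(img, n_iter=10):
    """Gaussian-weighted second moments with the weight iteratively
    adapted to the object's shape and size. Illustrative sketch only."""
    ny, nx = img.shape
    col, row = np.meshgrid(np.arange(nx, dtype=float),
                           np.arange(ny, dtype=float))
    tot = img.sum()
    c0 = (img * col).sum() / tot       # unweighted centroid to start
    r0 = (img * row).sum() / tot
    W = 4.0 * np.eye(2)                # round starting weight (covariance)
    for _ in range(n_iter):
        dc, dr = col - c0, row - r0
        Winv = np.linalg.inv(W)
        q = Winv[0, 0]*dc*dc + 2*Winv[0, 1]*dc*dr + Winv[1, 1]*dr*dr
        w = np.exp(-0.5 * q)           # elliptical Gaussian weight
        norm = (img * w).sum()
        c0 = (img * w * col).sum() / norm
        r0 = (img * w * row).sum() / norm
        M = np.array([[(img*w*dc*dc).sum(), (img*w*dc*dr).sum()],
                      [(img*w*dc*dr).sum(), (img*w*dr*dr).sum()]]) / norm
        # For Gaussians, M^-1 = S^-1 + W^-1; solve for the object
        # covariance S and use it as the next weight.
        W = np.linalg.inv(np.linalg.inv(M) - Winv)
    return W  # [[<col^2>, <col*row>], [<col*row>, <row^2>]]
```

For a Gaussian object the iteration converges to a weight matched to the object itself, so the returned matrix approximates the object's intrinsic second moments.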

The asinh magnitude

Magnitudes within the SDSS are expressed as inverse hyperbolic sine
(or "asinh") magnitudes, described in detail by Lupton, Gunn, & Szalay (1999). They are sometimes
referred to informally as luptitudes. The transformation
from linear flux measurements to asinh magnitudes is designed to be
virtually identical to the standard astronomical magnitude at high
signal-to-noise ratio, but to behave reasonably at low signal-to-noise
ratio and even at negative values of flux, where the logarithm in the
Pogson magnitude
fails. This allows us to measure a flux even in the absence of a
formal detection; we quote no upper limits in our photometry.
The asinh magnitudes are characterized by a softening parameter
b, the typical 1-sigma noise of the sky in a PSF aperture in
1" seeing. The relation between detected flux f and asinh
magnitude m is:

  m = -(2.5/ln 10) * [asinh((f/f0)/(2b)) + ln(b)]

Here, f0 is given by the classical zero
point of the magnitude scale, i.e., f0 is the flux
of an object with conventional magnitude of zero. The
quantity b is measured relative to f0,
and thus is dimensionless; it is given in the table of asinh softening
parameters (Table 21 in the EDR paper), along with the asinh
magnitude associated with a zero flux object. The table also lists
the flux corresponding to 10b, above which the
asinh magnitude and the traditional logarithmic magnitude differ by
less than 1% in flux.
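A minimal implementation of the transformation follows. The softening parameters below are the commonly quoted per-band values; verify them against Table 21 of the EDR paper before relying on them:

```python
import math

# Softening parameters b per band (dimensionless, measured relative to
# f0); commonly quoted values, to be checked against Table 21 of the
# EDR paper.
B = {'u': 1.4e-10, 'g': 0.9e-10, 'r': 1.2e-10, 'i': 1.8e-10, 'z': 7.4e-10}

def asinh_mag(f_over_f0, band):
    """asinh magnitude m for a flux ratio f/f0, where f0 is the flux of
    an object with conventional magnitude zero. Well-defined even for
    zero or negative flux."""
    b = B[band]
    return -2.5 / math.log(10) * (math.asinh(f_over_f0 / (2 * b)) + math.log(b))
```

At high signal-to-noise this agrees with the Pogson magnitude -2.5 log10(f/f0) to high accuracy, while a zero-flux object remains finite (about m = 24.8 in r) instead of diverging.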

The r photometric CCDs serve as the astrometric reference CCDs for the SDSS.
That is, the positions for SDSS objects are based on the r centroids and
calibrations. The r CCDs are calibrated by matching up bright stars detected
by SDSS with existing astrometric reference catalogs. One of two reduction
strategies is employed, depending on the coverage of the astrometric catalogs:

Whenever possible, stars detected on the r CCDs are matched
directly with stars in the
United States Naval Observatory CCD
Astrograph Catalog
(UCAC, Zacharias et al. 2000), an (eventually)
all-sky astrometric catalog with a precision of 70 mas at its catalog
limit of R = 16, and systematic errors of less than 30 mas.
There are approximately 2 - 3 magnitudes of overlap between UCAC and
unsaturated stars on the r CCDs. The astrometric CCDs
are not used. For DR1, stripes 9-12, 82, and 86 used UCAC.

If a scan is not covered by the current version of UCAC, then it is
reduced against Tycho-2
(Hog et al. 2000), an all-sky astrometric catalog
with a median precision of 70 mas at its catalog limit of
VT = 11.5, and
systematic errors of less than 1 mas. All Tycho-2 stars are saturated
on the r CCDs; however there are about 3.5 magnitudes of overlap between
bright unsaturated stars on the astrometric CCDs and the faint end of
Tycho-2 ( 8 < r < 11.5), and about 3 magnitudes of overlap between bright
unsaturated stars on the r CCDs and faint stars on the astrometric CCDs
(14 < r < 17). The overlap stars in common to the astrometric and r CCDs
are used to map detections of Tycho-2 stars on the astrometric CCDs onto
the r CCDs. For DR1, stripes 34-37, 42-44, and 76 used Tycho-2.

The r CCDs are therefore calibrated directly against the primary astrometric
reference catalog. Frames uses the astrometric calibrations to match up
detections of the same object observed in the other four filters.
The accuracy of the relative astrometry between filters can thus significantly
impact Frames, in particular the deblending of overlapping objects, photometry
based on the same aperture in different filters, and detection of moving
objects. To minimize the errors in the relative astrometry between filters,
the u, g, i, and z CCDs are calibrated against the r CCDs.

Each drift scan is processed separately. All six camera columns are processed
in a single reduction. In brief, stars detected on the r CCDs if
calibrating against UCAC, or stars detected on the astrometric CCDs transformed
to r coordinates if calibrating against Tycho-2, are matched to catalog
stars. Transformations from r pixel coordinates to catalog mean place (CMP)
celestial coordinates are derived using
a running-means least-squares fit to a focal plane model,
using all six r CCDs together to solve for both the telescope tracking and
the r CCDs' focal plane offsets, rotations, and scales,
combined with smoothing spline fits to the intermediate residuals.
These transformations, comprising the calibrations for the r CCDs, are then
applied to the stars detected on the r CCDs, converting them to CMP
coordinates and creating a catalog of secondary astrometric standards. Stars
detected on the u, g, i, and z CCDs are then matched to this
secondary catalog, and a similar fitting procedure (each CCD is fitted
separately) is used to derive transformations
from the pixel coordinates for the other photometric CCDs to CMP celestial
coordinates, comprising the calibrations for the u, g, i, and z CCDs.
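The actual focal-plane model is far more elaborate, but its core step, a least-squares fit from pixel coordinates to catalog mean place, can be sketched with a simple affine model (a toy stand-in, not the pipeline code):

```python
import numpy as np

def fit_affine(pix, sky):
    """Least-squares affine map (offsets, rotation, scale, shear) from
    pixel (col, row) to celestial coordinates.
    pix, sky: (N, 2) arrays of matched star positions."""
    A = np.hstack([np.ones((len(pix), 1)), pix])   # design matrix [1, col, row]
    coef, *_ = np.linalg.lstsq(A, sky, rcond=None)
    return coef                                     # shape (3, 2)

def apply_affine(coef, pix):
    """Apply a fitted transformation to new pixel positions."""
    return np.hstack([np.ones((len(pix), 1)), pix]) @ coef
```

In the real reductions this per-CCD fit is combined with terms for telescope tracking and with smoothing-spline fits to the residuals, as described above.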

Note: At the edges of pixels, the quantities objc_rowc and objc_colc take integer values.

Image Classification

This page provides detailed descriptions of various morphological
outputs of the photometry pipelines. We also provide discussion of
some methodology; for details of the Photo pipeline processing please
visit the Photo pipeline
page. Other photometric outputs, specifically the various
magnitudes, are described on the photometry
page.

The frames pipeline
also provides several characterizations of the shape and morphology of an
object.

Star/Galaxy Classification
The frames pipeline
provides a simple star/galaxy separator in its
type
parameters (provided separately for each band) and its
objc_type parameters
(one value per object); these are set to:

Class          Name       Code
Unknown        UNK        0
Cosmic Ray     CR         1
Defect         DEFECT     2
Galaxy         GALAXY     3
Ghost          GHOST      4
Known object   KNOWNOBJ   5
Star           STAR       6
Star trail     TRAIL      7
Sky            SKY        8

In particular, Lupton et al. (2001a) show that the following simple cut
works at the 95% confidence level for our data to r=21
and even somewhat fainter:

psfMag - ((deV_L > exp_L) ? deVMag : expMag) > 0.145

If satisfied, type
is set to GALAXY
for that band; otherwise, type
is set to STAR
. The global type objc_type
is set according to the same criterion, applied to the
summed fluxes from all bands in which the object is detected.
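In code, the per-band cut reads as follows (parameter names follow the catalog; this is just the inequality above, not the pipeline itself):

```python
def star_galaxy_type(psfMag, deV_L, exp_L, deVMag, expMag):
    """Per-band type from the frames star/galaxy cut: compare the PSF
    magnitude with the better-fitting model magnitude."""
    model_mag = deVMag if deV_L > exp_L else expMag
    return 'GALAXY' if psfMag - model_mag > 0.145 else 'STAR'
```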

Experimentation has shown that simple variants on this scheme, such
as defining galaxies as those objects classified as such in any two of
the three high signal-to-noise ratio bands (namely, g, r,
and i), work better in some
circumstances. This scheme occasionally fails to distinguish pairs of
stars with separation small enough (<2") that the deblender does
not split them; it also occasionally classifies Seyfert galaxies with
particularly bright nuclei as stars.

Further information may be used to refine the star-galaxy separation,
depending on the scientific application. For example,
Scranton et al. (2001) advocate applying a Bayesian prior to the above
difference between the PSF and exponential magnitudes, depending on
seeing and using prior knowledge about the counts of galaxies and
stars with magnitude.

Radial Profiles
The frames pipeline extracts an azimuthally-averaged radial
surface brightness profile. In the catalogs, it is given as the
average surface brightness in a series of annuli. This quantity is in
units of
"maggies" per square arcsec, where a maggie is a linear
measure of flux; one maggie has an AB magnitude of 0 (thus a surface
brightness of 20 mag/square arcsec corresponds to 10^-8 maggies
per square arcsec). The number of annuli for which there is a
measurable signal is listed as nprof, the mean surface
brightness is listed as profMean, and the error is listed as
profErr. This error includes both photon noise, and the
small-scale "bumpiness" in the counts as a function of azimuthal
angle.
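Converting a profMean value to a conventional surface brightness is then just the AB magnitude relation, shown here for clarity:

```python
import math

def maggies_to_mu(prof_mean):
    """Surface brightness in AB mag per square arcsec from a profMean
    value in maggies per square arcsec (one maggie = AB magnitude 0)."""
    return -2.5 * math.log10(prof_mean)
```

For example, maggies_to_mu(1e-8) returns 20.0, matching the example above.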

When converting the profMean values to a local surface
brightness, it is not the best approach to assign the mean
surface brightness to some radius within the annulus and then linearly
interpolate between radial bins. Do not use smoothing
splines, as they will not go through the points in the cumulative
profile and thus (obviously) will not conserve flux. What frames
does, e.g., in determining the Petrosian ratio, is to fit a taut spline to the
cumulative profile and then differentiate that spline fit,
after transforming both the radii and cumulative profiles with asinh
functions. We recommend doing the same here.
The annuli used are:

Aperture   Radius (pixels)   Radius (arcsec)   Area (pixels)
    1            0.56              0.23               1
    2            1.69              0.68               9
    3            2.58              1.03              21
    4            4.41              1.76              61
    5            7.51              3.00             177
    6           11.58              4.63             421
    7           18.58              7.43            1085
    8           28.55             11.42            2561
    9           45.50             18.20            6505
   10           70.15             28.20           15619
   11          110.50             44.21           38381
   12          172.50             69.00           93475
   13          269.50            107.81          228207
   14          420.50            168.20          555525
   15          657.50            263.00         1358149

Surface Brightness & Concentration Index
The frames pipeline also reports the radii containing 50% and 90% of
the Petrosian
flux for each band, petroR50 and petroR90 respectively.
The usual characterization of surface-brightness in the target
selection pipeline of the SDSS is the mean surface brightness within
petroR50.

It turns out that the ratio of petroR50 to petroR90, the
so-called "inverse concentration index", is correlated with
morphology (Shimasaku et al. 2001, Strateva et al. 2001). Galaxies with a de
Vaucouleurs profile have an inverse concentration index of around 0.3;
exponential galaxies have an inverse concentration index of around
0.43. Thus, this parameter can be used as a simple morphological
classifier.
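A sketch of such a classifier follows; the 0.365 midpoint cut is an illustrative choice splitting the two populations quoted above, not an SDSS-recommended value:

```python
def inverse_concentration(petroR50, petroR90):
    """Inverse concentration index, correlated with morphology."""
    return petroR50 / petroR90

def rough_morphology(petroR50, petroR90, cut=0.365):
    """de Vaucouleurs-like profiles cluster near 0.3, exponential
    (disk-like) profiles near 0.43; split at the midpoint (assumed)."""
    ci = inverse_concentration(petroR50, petroR90)
    return 'deVaucouleurs-like' if ci < cut else 'exponential-like'
```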

An important caveat when using these quantities is that they
are not corrected for seeing. This causes the surface
brightness to be underestimated, and the inverse concentration index
to be overestimated, for objects of size comparable to the PSF. The
amplitudes of these effects, however, are not yet well characterized.

Model Fit Likelihoods and Parameters
In addition to the model and PSF magnitudes,
the likelihoods deV_L, exp_L, and star_L are also
calculated by frames. These are the probabilities of achieving the
measured chi-squared for the deVaucouleurs, exponential, and PSF fits,
respectively. For instance, star_L is the probability that an object would have at least the measured value of chi-squared if it is really well represented by a PSF.
If one wishes to make use of a trinary scheme to
classify objects, calculation of the fractional likelihoods is recommended:

f(deV_L)=deV_L/[deV_L+exp_L+star_L]

and similarly for f(exp_L) and f(star_L).
A fractional likelihood greater than 0.5 for any of these three profiles
is generally a good threshold for object classification. This works
well in the range 18<r<21.5; at the bright end, the
likelihoods have a tendency to underflow to zero, which makes them less
useful. In particular, star_L is often zero for bright stars.
For future data releases we
will incorporate improvements to the model fits to give more
meaningful results at the bright end.
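The recommended fractional-likelihood classification can be sketched as follows; the UNKNOWN fallback for underflowed likelihoods is our addition, motivated by the bright-star caveat above:

```python
def fractional_likelihoods(deV_L, exp_L, star_L):
    """f(x_L) = x_L / (deV_L + exp_L + star_L) for each model."""
    total = deV_L + exp_L + star_L
    if total == 0.0:  # likelihoods can underflow to zero for bright objects
        return None
    return {'deV': deV_L / total, 'exp': exp_L / total, 'star': star_L / total}

def trinary_class(deV_L, exp_L, star_L, threshold=0.5):
    """Classify as the profile whose fractional likelihood exceeds the
    threshold; otherwise UNKNOWN."""
    f = fractional_likelihoods(deV_L, exp_L, star_L)
    if f is None:
        return 'UNKNOWN'
    best = max(f, key=f.get)
    return best if f[best] > threshold else 'UNKNOWN'
```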

Ellipticities
The model fits yield an estimate of the axis ratio and position angle
of each object, but it is useful to have model-independent measures of
ellipticity. In the data released here, frames provides two further
measures of ellipticity, one based on second moments, the other based
on the ellipticity of a particular isophote. The model fits do
correctly account for the effect of the seeing, while the methods
presented here do not.

The Stokes parameters are defined, for a uniform ellipse with semi-major
axis a, semi-minor axis b, and position angle φ, as

  Q = [(1 - (b/a)^2)/(1 + (b/a)^2)] cos(2φ)
  U = [(1 - (b/a)^2)/(1 + (b/a)^2)] sin(2φ)

They are stored as Q and U in
PhotoObj
and are referred to as "Stokes parameters." They can be used to
reconstruct the axis ratio and position angle, measured relative to row
and column of the CCDs. This is equivalent to the normal definition of
position angle (East of North), for the scans on the Equator.
The performance of the Stokes parameters is not ideal at
low S/N.
For future data releases, frames will also output variants
of the adaptive shape measures used in the weak lensing analysis of
Fischer et al. (2000), which are closer to optimal measures of shape for
small objects.
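Assuming the standard uniform-ellipse relation Q = e cos(2φ), U = e sin(2φ) with e = (1 - (b/a)^2)/(1 + (b/a)^2), the axis ratio and position angle can be recovered as follows (a sketch; check the PhotoObj documentation for the exact sign and angle convention):

```python
import math

def stokes_to_shape(Q, U):
    """Recover axis ratio b/a and position angle phi (degrees, measured
    relative to the CCD rows and columns) from the Stokes parameters,
    assuming the uniform-ellipse relation stated above."""
    e = math.hypot(Q, U)                       # "ellipticity" amplitude
    phi = 0.5 * math.degrees(math.atan2(U, Q)) # half the phase angle
    ba = math.sqrt((1.0 - e) / (1.0 + e))      # invert e = (1-q^2)/(1+q^2)
    return ba, phi
```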

Isophotal Quantities
A second measure of ellipticity is given by measuring the ellipticity
of the 25 magnitudes per square arcsecond isophote (in all bands). In
detail, frames
measures the radius of a particular isophote as a function of angle
and Fourier expands this function. It then extracts from the
coefficients the centroid (isoRowC,isoColC), major and minor axis (isoA,isoB), position angle (isoPhi), and average
radius of
the isophote in question (Profile). Placeholders exist in the database for the errors
on each of these
quantities, but they are not currently calculated. It also reports the
derivative of each of these
quantities with respect to isophote level, necessary to recompute
these quantities if the photometric calibration changes.

Deblending Overlapping Objects

One of the jobs of the frames pipeline is to decide if an initial single detection is in fact a blend of multiple overlapping objects, and, if so, to separate, or deblend them. The deblending process is performed self-consistently across the bands (thus, all children have measurements in all bands). After deblending, the pipeline again measures the properties of these individual children.

Bright objects are measured at least twice: once with a global sky and no deblending run (this detection is flagged BRIGHT) and a second time with a local sky. They may also be measured more times if they are BLENDED and a CHILD.

Once objects are detected, they are deblended by identifying individual peaks within each object, merging the list of peaks across bands, and adaptively determining the profile of images associated with each peak, which sum to form the original image in each band. The originally detected object is
referred to as the "parent" object and has the flag BLENDED set if multiple peaks are detected; the
final set of subimages of which the parent consists are referred to as the "children" and have the
flag CHILD set. Note that all quantities in the photometric catalogs (currently in the tsObj files) are measured for both parent and
child. For each child object, the quantity parent gives the object id (object) of the parent (for parents themselves or isolated objects,
this is set to the object id of the BRIGHT counterpart if that exists; otherwise it is set to -1); for each parent,
nchild gives the number of children an object has. Children are assigned the id numbers immediately
after the id of the parent. Thus, if an object with id 23 is set as BLENDED and has nchild equal to 2,
objects 24 and 25 will be set as CHILD and have parent equal to 23.

The list of peaks in the parent is trimmed to combine peaks (from different bands) that are too
close to each other (if this happens, the flag PEAKS_TOO_CLOSE is set in the parent). If there are
more than 25 peaks, only the most significant are kept, and the flag DEBLEND_TOO_MANY_PEAKS is
set in the parent.

In a number of situations, the deblender decides not to process a BLENDED object; in this case
the object is flagged as NODEBLEND. Most objects with EDGE set are not deblended. The exceptions
are when the object is large enough (larger than roughly an arcminute) that it will most likely not be
completely included in the adjacent scan line either; in this case, DEBLENDED_AT_EDGE is set, and
the deblender gives it its best shot. When an object is larger than half a frame, the deblender also
gives up, and the object is flagged as TOO_LARGE. Other intricacies of the deblending results are
recorded in flags described on the Object Flags section of the Flags page.

On average, about 15% - 20% of all detected objects are blended, and many
of these are superpositions of galaxies that the deblender successfully treats
by separating the images of the nearby objects. Thus, it is almost always the
childless (nChild=0, or !BLENDED ||
(BLENDED && NODEBLEND)) objects that are of most interest for science
applications. Occasionally, very large galaxies may be treated somewhat
improperly, but this is quite rare.
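The childless-object selection can be written with simple bit tests. The bit positions below are illustrative placeholders; take the authoritative values from the Flags page:

```python
# Illustrative bit positions only; consult the Flags page for the
# authoritative values.
BRIGHT    = 1 << 1
BLENDED   = 1 << 3
NODEBLEND = 1 << 6

def is_childless(flags):
    """Implements !BRIGHT && (!BLENDED || (BLENDED && NODEBLEND))."""
    if flags & BRIGHT:
        return False
    return (not (flags & BLENDED)) or bool(flags & NODEBLEND)
```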

The behavior of the deblender of overlapping images has been further improved
since the DR1; these changes are most important for bright galaxies of large
angular extent (> 1 arcmin). In the EDR, and to a lesser extent in the DR1,
bright galaxies were occasionally "shredded" by the deblender, i.e.,
interpreted as two or more objects and taken apart. With improvements in the
code that finds the center of large galaxies in the presence of superposed
stars, and the deblending of stars superposed on galaxies, this shredding now
rarely happens. Indeed, inspections of several hundred NGC galaxies shows that
the deblend is correct in 95% of the cases; most of the exceptions are
irregular galaxies of various sorts.

Reddening and Extinction Corrections

Reddening corrections in magnitudes at the position of each object,
extinction, are computed following Schlegel, Finkbeiner & Davis (1998). These
corrections are not applied to the magnitudes ugriz in the
databases. If you want corrected magnitudes, you should use dered_[ugriz]; these are the extinction-corrected model magnitudes. All other magnitudes must have the correction applied by hand or as part of your SQL query.
Conversions from E(B-V) to total extinction Alambda, assuming a
z=0 elliptical galaxy spectral energy distribution, are
tabulated in Table 22 of the EDR Paper.
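Applying the correction by hand is a simple subtraction. The A/E(B-V) coefficients below are the values commonly quoted for the z=0 elliptical SED assumption; verify them against Table 22 of the EDR paper:

```python
# A_lambda / E(B-V) per band, assuming a z=0 elliptical SED (commonly
# quoted values; check against Table 22 of the EDR paper).
A_OVER_EBV = {'u': 5.155, 'g': 3.793, 'r': 2.751, 'i': 2.086, 'z': 1.479}

def dered(mag, band, ebv):
    """Extinction-corrected magnitude (the analogue of dered_<band>)."""
    return mag - A_OVER_EBV[band] * ebv
```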

Image processing flags

For objects in the calibrated object lists, the
photometric pipeline sets a number of flags that indicate the status
of each object, warn of possible problems with the image itself, and
warn of possible problems in the measurement of various quantities
associated with the object. For yet more details, refer to Robert Lupton's
flags document.

The status flags, called status in the
PhotoObjAll table, carry information needed to discount
duplicate detections of the same object in the catalog.

The object flags, called flags in the
PhotoObjAll table, carry information
about the success of measuring the object's location, flux, or
morphology.

The "status" of an object

The catalogs contain multiple detections of objects from
overlapping CCD frames. For most applications, remove duplicate
detections of the same objects by considering only those which have
the "primary" flag set in the status entry of the
PhotoObjAll table and its Views.

A description of status is provided on the details page. The details of
determining primary status and of the remaining flags stored in
status are found on the algorithms page describing the
resolution of overlaps (resolve).

Object "flags"

The photometric pipeline's flags describe how certain measurements
were performed for each object, and which measurements are considered
unreliable or have failed altogether. You must interpret the
flags correctly to obtain meaningful results.

For each object, there are 59 flags stored as bit fields in a single 64-bit
table column called flags in the PhotoObjAll table (and its Views). There are two versions of the flag variable for each
object:

Individual flags for each filter u, g, r, i, z. These are called flags_u, etc.

A single combination of the per-filter flags appropriate for the whole
object, called flags.

Note: This differs from the tsObj files in the DAS, where the individual filter flags are stored as vectors in two separate 32-bit columns called flags and flags2, and the overall flags are stored in a scalar called objc_flags.

Here we describe which flags should be checked for which measurements,
including whether you need to look at the flag in each filter, or at
the general flags.

Recommendations

Clean sample of point sources

In a given band, first select objects with PRIMARY status and
apply the SDSS star-galaxy separation. Then, define the following
meta-flags:

If you are very picky, you will probably want to exclude the
NODEBLEND objects. Note that selecting
PRIMARY objects implies !BRIGHT && (!BLENDED ||
NODEBLEND || nchild == 0)

These are used in the SDSS quasar target selection code which is
quite sensitive to outliers in the stellar locus.
If you want to select very rare outliers in color space, especially
single-band detections, add cuts to MAYBE_CR and
MAYBE_EGHOST to the above list.

Clean sample of galaxies

As for point sources, but don't cut on EDGE (large galaxies
often run into the edge). Also, you may not need to worry about the
INTERP problems. The BRIGHTEST_GALAXY_CHILD may be
useful if you are looking at bright galaxies; it needs further
testing.

If you want to select (or reject against) moving objects
(asteroids), cut on the DEBLENDED_AS_MOVING flag, and then cut
on the motion itself. See the
SDSS Moving Objects Catalog for more details. An interesting
experiment is to remove the restriction on the
DEBLENDED_AS_MOVING flag to find objects with very small proper
motion (i.e., those beyond Saturn).

Descriptions of all flags

Flags that affect the object's status

These flags must be considered to reject duplicate catalog
entries of the same object. By using only objects with
PRIMARY status (see above), you automatically
account for the most common cases: those objects which are BRIGHT, or
which have been deblended (decomposed) into one or more child objects
which are listed individually.

In the tables, Flag names link to detailed descriptions. The "In Obj Flags?"
column indicates that this flag will be set in the general (per object) "flags" column if this flag is set in any of the
filters. "Bit" is the number of the bit.

For example, one flag indicates that the object was extended and its
centroid was determined on a 2x2-binned frame; such objects should be
avoided in astrometric work.

The fiber magnitude

The flux contained within the aperture of a spectroscopic fiber
(3" in diameter) is calculated in each band and stored in
fiberMag.

Notes:
- For children of deblended galaxies, some of
the pixels within a 1.5" radius may belong to other children; we now measure the flux of the parent at the position of the child; this properly reflects the amount of light which the spectrograph will see. This was not true in the EDR.
- Images are now convolved to 2" seeing before fiberMags are measured. This also makes the fiber magnitudes closer to what is seen by the spectrograph. This was not true in the EDR.

The model magnitude

Important Note for EDR and DR1 data ONLY: Comparing the model
(i.e., exponential and de Vaucouleurs fits) and Petrosian magnitudes of bright
galaxies in EDR and DR1 data shows a systematic offset of about 0.2 magnitudes
(in the sense that the model magnitudes are brighter). This turns out to be
due to a bug in the way the PSF was convolved with the models (this bug
affected the model magnitudes even when they were fit only to the central 4.4"
radius of each object). This caused problems for very small objects (i.e.,
close to being unresolved). The code forces model and PSF magnitudes of
unresolved objects to be the same in the mean by application of an aperture
correction, which then gets applied to all objects. The net result is that
the model magnitudes are fine for unresolved objects, but systematically
offset for galaxies brighter than at least 20th mag. Therefore, model
magnitudes should NOT be used in EDR and DR1 data. This problem has
been corrected as of DR2.

Just as the PSF magnitudes are optimal measures of the fluxes of
stars, the optimal measure of the flux of a galaxy would use a matched galaxy
model. With this in mind, the code fits two models to the
two-dimensional image of each object in each band:

1. a pure de Vaucouleurs profile:

  I(r) = I0 exp{-7.67 [(r/re)^(1/4)]}

(truncated beyond 7re to smoothly go to zero at 8re, and with some softening within r = re/50);

2. a pure exponential profile:

  I(r) = I0 exp(-1.68 r/re)

(truncated beyond 3re to smoothly go to zero at 4re).

Each model has an arbitrary axis ratio and position angle. Although for large
objects it is possible and even desirable to fit more complicated
models (e.g., bulge plus disk), the computational expense to compute
them is not justified for the majority of the detected
objects. The models are convolved
with a double-Gaussian fit to the PSF, which is provided by psp.
Residuals between the
double-Gaussian and the full KL PSF model are added on for just the
central PSF component of the image.

These fitting procedures yield the quantities

r_deV and
r_exp,
the effective radii of the models;

ab_deV and
ab_exp,
the axis ratio of the best fit models;

phi_deV
and phi_exp,
the position angles of the ellipticity (in degrees East of North).

deV_L and
exp_L,
the likelihoods associated with each model from the chi-squared fit;

deVMag and
expMag,
the total magnitudes associated with each fit.

Note that these quantities correctly model the effects of the PSF.
Errors for each of the last two quantities (which
are based only on photon statistics) are also reported. We apply
aperture corrections to make these model magnitudes equal the PSF
magnitudes in the case of an unresolved object.

In order to measure unbiased colors of galaxies, we measure
their flux through equivalent apertures in all bands.
We choose the model (exponential or
deVaucouleurs) of higher likelihood in the r filter, and apply that model
(i.e., allowing only the amplitude to vary) in the other bands
after convolving with the appropriate PSF in each band. The
resulting magnitudes are termed
modelMag.
The resulting estimate of galaxy color will be unbiased in the absence
of color gradients. Systematic differences from Petrosian colors are
in fact often seen due to color gradients, in which case the concept
of a global galaxy color is somewhat ambiguous. For faint galaxies,
the model colors have appreciably higher signal-to-noise ratio than do
the Petrosian colors.

Due to the way in which model fits are carried out, there is some weak
discretization of model parameters, especially
r_exp and
r_deV. This is yet to be fixed. Two other issues (negative axis ratios, and bad model mags for bright objects) have been fixed since the EDR.

Caveat: At bright magnitudes (r <~ 18), model magnitudes
may not be a robust means to select objects by flux.
For example, model magnitudes in target
and best
imaging may often differ significantly because
a different type of profile (deVaucouleurs or exponential) was
deemed the better fit in target vs. best.
Instead, to select samples by flux, one should typically use
Petrosian magnitudes
for galaxies and psf magnitudes
for stars and distant quasars. However, model colors
are in general robust and may be used to select galaxy samples by color.
Please also refer to the SDSS
target selection algorithms for examples.

The Petrosian magnitude

Stored as petroMag. For galaxy photometry, measuring flux is more difficult than for
stars, because galaxies do not all have the same radial surface
brightness profile, and have no sharp edges. In order to avoid
biases, we wish to measure a constant fraction of the total light,
independent of the position and distance of the object. To satisfy these
requirements, the SDSS has adopted a modified form of the
Petrosian (1976) system, measuring galaxy fluxes within a circular
aperture whose radius is defined by the shape of the azimuthally
averaged light profile.

We define the "Petrosian ratio" RP at a radius
r from
the center of an object to be the ratio of the local surface
brightness in an annulus at r to the mean surface brightness within
r, as described by Blanton et al. 2001a and Yasuda et al. 2001:

  RP(r) = [ integral from 0.8r to 1.25r of 2πr' I(r') dr' / (π (1.25^2 - 0.8^2) r^2) ]
        / [ integral from 0 to r of 2πr' I(r') dr' / (π r^2) ]

where I(r) is the azimuthally averaged surface brightness profile.

The Petrosian radius rP is defined as the radius
at which
RP(rP) equals some specified value
RP,lim, set to 0.2 in our case. The
Petrosian flux in any band is then defined as the flux within a
certain number NP (equal to 2.0 in our case) of Petrosian radii:

  FP = integral from 0 to NP*rP of 2πr' I(r') dr'
In the SDSS five-band photometry, the aperture in all bands is set by
the profile of the galaxy in the r band alone. This procedure
ensures that the color measured by comparing the Petrosian flux
FP in different bands is measured through a
consistent aperture.

The aperture 2rP is large enough to contain nearly all of
the flux for typical galaxy profiles, but small enough that the sky noise in
FP is small. Thus, even substantial errors in
rP cause only
small errors in the Petrosian flux (typical statistical errors near
the spectroscopic flux limit of r ~17.7 are < 5%),
although these errors are correlated.

The Petrosian radius in each band is the parameter petroRad, and
the Petrosian magnitude in each band (calculated, remember, using only
petroRad for the r band) is the parameter petroMag.
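As a numerical illustration of the definitions above (assuming the 0.8r-1.25r annulus; this is not the frames implementation, which uses the taut-spline machinery described earlier), the Petrosian radius of a given profile can be found by direct integration:

```python
import numpy as np

def petrosian_radius(profile, r_max=50.0, rp_lim=0.2, n=4000):
    """Radius at which the Petrosian ratio (surface brightness in the
    0.8r-1.25r annulus over the mean surface brightness within r)
    falls to rp_lim. profile: callable giving the azimuthally
    averaged I(r)."""
    r = np.linspace(1e-3, 1.25 * r_max, n)
    I = profile(r)
    # cumulative flux F(<r) = integral of 2 pi r' I(r') dr' (trapezoid rule)
    integrand = 2 * np.pi * r * I
    F = np.concatenate([[0.0],
                        np.cumsum(0.5 * (integrand[1:] + integrand[:-1])
                                  * np.diff(r))])
    cum = lambda x: np.interp(x, r, F)
    for ri in r[(r > 0.1) & (r <= r_max)]:
        annulus = ((cum(1.25 * ri) - cum(0.8 * ri))
                   / (np.pi * (1.25**2 - 0.8**2) * ri**2))
        mean = cum(ri) / (np.pi * ri**2)
        if annulus / mean < rp_lim:
            return ri
    return None  # no Petrosian radius found (cf. the NOPETRO flag)
```

For an exponential profile this returns a radius of a few scale lengths, consistent with the statement that 2rP contains nearly all of the flux.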

In practice, there are a number of complications associated with this
definition, because noise, substructure, and the finite size of
objects can cause objects to have no Petrosian radius, or more than
one. Those with more than one are flagged as MANYPETRO; the
largest one is used.
Those with none have NOPETRO set. Most commonly, these objects
are faint (r > 20.5 or so); the
Petrosian ratio becomes unmeasurable before dropping to the limiting
value of 0.2;
these have PETROFAINT set and have
their "Petrosian radii" set to the default value of the larger
of 3" or the outermost measured point in the radial profile.
Finally, a galaxy with a bright stellar nucleus, such as a Seyfert
galaxy, can have a Petrosian radius set by the nucleus alone; in this
case, the Petrosian flux misses most of the extended light of the
object. This happens quite rarely, but one dramatic example in the
EDR data is the Seyfert galaxy NGC 7603 = Arp 092, at RA(2000) =
23:18:56.6, Dec(2000) = +00:14:38.

How well does the Petrosian magnitude perform as a reliable and
complete measure of galaxy flux? Theoretically, the Petrosian
magnitudes defined here should recover essentially all of the flux of
an exponential galaxy profile and about 80% of the flux for a de
Vaucouleurs profile. As shown by Blanton et al. (2001a), this fraction is
fairly constant with axis ratio, while as galaxies become smaller (due
to worse seeing or greater distance) the fraction of light recovered
becomes closer to that fraction measured for a typical PSF, about 95%
in the case of the SDSS. This implies that the fraction of flux
measured for exponential profiles decreases while the fraction of flux
measured for deVaucouleurs profiles increases as a function of
distance. However, for galaxies in the spectroscopic sample
(r<17.7), these effects are small;
the Petrosian radius measured by frames is extraordinarily constant
in physical size as a function of redshift.
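As a concrete illustration of the limiting ratio of 0.2 discussed above, here is a minimal Python sketch of how a Petrosian radius can be located. The function name and argument names are ours; the real frames pipeline works from its own measured radial profiles, and this sketch simply interpolates a supplied profile.

```python
def petrosian_radius(radii, mean_sb_within, local_sb, limit=0.2):
    """Return the radius where the Petrosian ratio first drops to `limit`.

    radii          : increasing aperture radii (arcsec)
    mean_sb_within : mean surface brightness inside each radius
    local_sb       : local surface brightness in an annulus at each radius
    The Petrosian ratio is local_sb / mean_sb_within; SDSS uses limit=0.2.
    Returns None when the ratio never reaches the limit (the NOPETRO-like
    case for faint objects described above).
    """
    ratio = [l / m for l, m in zip(local_sb, mean_sb_within)]
    for i, f in enumerate(ratio):
        if f <= limit:
            if i == 0:
                return radii[0]
            # linear interpolation between the bracketing radii
            r0, r1 = radii[i - 1], radii[i]
            f0 = ratio[i - 1]
            return r0 + (limit - f0) * (r1 - r0) / (f - f0)
    return None
```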

The PSF magnitude

Stored as psfMag. For isolated stars, which are well-described by the point spread function
(PSF), the optimal
measure of the total flux is determined by fitting a PSF model to the
object. In practice, we do this by sinc-shifting the image of a star
so that it is exactly centered on a pixel, and then fitting a Gaussian
model of the PSF to it. This fit is carried out on the local PSF KL
model at each position as well; the difference
between the two is then a local aperture correction, which gives a
corrected PSF magnitude. Finally, we use bright stars to determine a
further aperture correction to a radius of 7.4" as a function of
seeing, and apply this to each frame based on its seeing. This involved
procedure is necessary to take into account the full variation of the
PSF across the field,
including the low signal-to-noise ratio wings. Empirically, this
reduces the seeing-dependence of the photometry to below 0.02 mag for
seeing as poor as 2". The resulting magnitude is stored in the
quantity psfMag. The flag PSF_FLUX_INTERP warns that the PSF photometry might be suspect.
The flag BAD_COUNTS_ERROR warns that, because of interpolated
pixels, the error may be under-estimated.

Match and MatchHead Tables

Jim Gray, Alex Szalay, Robert Lupton, Jeff Munn

May 2003, revised January, May, June, July, December 2004

The SDSS data can be used for temporal studies of objects that are re-observed
at different times. The SDSS survey observes about 10% of the Northern survey
area 2 or more times, and observes the Southern stripe more than a dozen
times.

The match table is intended
to make temporal queries easy by providing a precomputed list of all objects that
were observed multiple times. More formally,

Match = { (ObjID1, ObjID2) | ObjID1 and ObjID2 are both from different
runs (i.e., different observations).

And they are within 1 arcsecond of one another

And are both good (star or galaxy or unknown)

And are both fully deblended (no children)

And they are primary or secondary (not family or outside) }
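The criteria above can be expressed as a predicate. The following Python sketch is ours (the real Match table is built in SQL from the Neighbors table, as described later); the field names and type/mode codes follow the text.

```python
import math

ARCSEC = 1.0 / 3600.0   # one arcsecond, in degrees

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    p1, p2 = math.radians(dec1), math.radians(dec2)
    dra = math.radians(ra2 - ra1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dra / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def is_match(obj1, obj2):
    """Apply the Match-table criteria listed above to two detections.

    Each obj is a dict with run, ra, dec, type, nchild, mode;
    type 3/5/6 = galaxy/unknown/star, mode 1/2 = primary/secondary.
    """
    return (obj1["run"] != obj2["run"]                       # different runs
            and angular_sep_deg(obj1["ra"], obj1["dec"],
                                obj2["ra"], obj2["dec"]) < ARCSEC
            and obj1["type"] in (3, 5, 6) and obj2["type"] in (3, 5, 6)
            and obj1["nchild"] == 0 and obj2["nchild"] == 0  # fully deblended
            and obj1["mode"] in (1, 2) and obj2["mode"] in (1, 2))
```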

But, as always, there are complications.

Green, Yellow, Red: What if ObjID2 in Run2 is missing?
It could be missing because it was not seen, or because it was masked. There
are also "edge cases" near the edge of Run1 or Run2, where the object may
simply fall off the frame. We color-code these edge cases as "yellow" objects
and the masked objects as "red" objects. And of course, we flag the objects
that are missing from well inside the overlap as "green".

Surrogate: When an object is missing in Run2, what do we put in
the match table?
We could fabricate ObjID2, or we could find the closest ObjID2 in Run2 and
just flag it and record the distance to it (which will be more than 1
arcsecond).

Computing the Match table

The Match table is computed using the Neighbors table and has a very
similar schema. (The Neighbors table only stores objects with mode 1 or 2,
i.e. primary/secondary, and type 3, 5, or 6, i.e. galaxy, unknown, star.)

One arcsecond is a large error for SDSS positions; the vast majority of
matches (95%) lie within 0.5 arcsecond. But a particular cluster may not form
a complete graph (all members connected to all others). To make the graph
fully transitive, we repeatedly execute the query to add the "curved" arcs in
Figure 1. Notice that the figure shows two objects observed in four runs, and
that the two objects are observed only once in the middle two runs. The whole
collection is closed to make a "bundle" that will have a matchHead object (the
smallest objID of the bundle).

Computing the MatchHead table

Now each cluster of objects in the Match table is fully connected. We can
name the clusters in the Match table by the minimum (non zero) objID in the
cluster and can compute the MatchHead table that describes the global
properties of the cluster: its name, its average RA and DEC and the variance
in RA, DEC.
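The cluster summary can be sketched as follows. This is our own illustration (field names are ours; the real MatchHead table also carries the miss count, which is handled separately):

```python
from statistics import mean, pvariance

def matchhead(cluster):
    """Summarize one fully connected cluster of detections.

    `cluster` is a list of (objID, ra, dec) tuples; the cluster is named
    by its minimum non-zero objID, as described above.
    """
    ras = [ra for _, ra, _ in cluster]
    decs = [dec for _, _, dec in cluster]
    return {
        "matchHead": min(objid for objid, _, _ in cluster if objid != 0),
        "avgRa": mean(ras), "avgDec": mean(decs),
        "varRa": pvariance(ras), "varDec": pvariance(decs),
    }
```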

Figure 2: The green area of A
must match B; the yellow area may match B. Red areas of B (masks) give a
good reason for the A object to be missing a match. In these
missing cases, we pick the "nearest" object in B to be a surrogate
for A in the Match table.

Figure 3: A green surrogate for
the right 4 objects and a red (masked) surrogate for the
leftmost one.

The number missing from the cluster is computed in the next section.

Matching the Missing Objects

There may be an object in camcol A that should have matching objects in an
overlapping camcol B (see Figure 2). In particular, any object in the green
part of A should have a matching object in B. Objects in
A that are near the edge of B (10 pixels ~ 4 arcseconds, the yellow part of
B) may have matching objects in B. In some cases the B area is masked
(red), which explains why there is no match.

If a "green" A object does not match a B object, then either the object is
moving or variable, or it is masked. We can check the masks to see whether
(A.ra, A.dec) is masked in B; if not, we assume that A is just "missing."

Similarly, if a "yellow" A object does not match a B object, then either the
object is moving, variable, or masked, or edge effects caused the object
to be missing. In these edge cases we check whether (A.ra, A.dec) is masked
in B; if not, we call the object missing-edge.

So, missing objects come in 3 varieties (in addition to the ordinary "hit"
case):

Object        Color   Flag
Hit           -       0
Missing       Green   1
Missing Edge  Yellow  2
Masked        Red     3

In each of these cases we create a match object as the closest object in B to
A, and Match.flag is set to Green, Yellow, or Red. These "fake" objects do not
contribute to the cluster average, variance, or centroid.

We add this object to A's cluster (along with all the edges), and we increment
the cluster miss count by the number of records we add to the cluster.

Figure 4: Graph showing distance
vs. frequency of misses of various colors. This is data from a small
sample of the BestDR1 database.

The logic for computing missing objects is as follows.

For each RunA in the Regions table:
    For each RunB overlapping RunA (other than RunA):
        Let R_ABYellow be the region RunA ∩ (RunB + ε)
        Find pairs (x, y) with x and y in R_ABYellow, run(x) = RunA,
            run(y) = RunB, x not already in Match,
            and y the closest object to x in RunB
        If the position of x lies inside RunB - ε, mark the pair green
        If the position of x is masked in RunB, mark the pair red
        Add the (x, y) pairs to the Match table, and:
            if x is not in Match, add x to MatchHead with a miss count of 1
            if x is in Match, propagate x.matchHead to this Match entry
                and increment the matchHead miss count

The actual code is a little more complex (about 700 lines of SQL). In the
personal SkyServerDR1 there are about 20,000 matches and 10,000 object misses,
so it seems that the misses will make an interesting study.

Figure 5: Statistics from the full
DR1 dataset showing the distribution of miss distances. Red objects are
not being computed. On BestDR1, the statistics are:

Match    12,431,518
Green     5,446,930
Yellow      912,571
Red              ??

The result is that a bundle can have dangling pointers to these surrogate
objects. Figure 3 shows the diagram of Figure 1 where a fifth overlapping run
has been added. The leftmost object is masked in this new run, so we find a
surrogate "red" object for it. The other objects also have no match in this
run, but they are not masked and are closest to the green (right) object in
the figure.

It takes 4 minutes to compute on the personal SkyServer DR1. It will take a
bit longer on the thousand-times-larger DR2, but ...

As per Robert's request, surrogate match objects are found rather than
invented. Sometimes we have to look far away for them (500 arcseconds in some
cases).

Misses are painted Yellow (near the edge), Red (masked), and Green (well
inside the overlap). Most misses are Green.

color  count
Y       3,516
R          58
G      15,264

The graphs of distances are shown in Figure 4.

SDSS ObjID Encoding

The bit encoding for the long (64-bit) IDs that are used as unique keys in the
SDSS catalog tables is described here.

PhotoObjID

The encoding of the photometric object long ID (objID in the photo
tables) is described in the table below. This scheme applies to both fieldID
and objID (the object bits are 0 for fieldID).

Bits   Length (bits)  Mask                Assignment  Description
0      1              0x8000000000000000  empty       unassigned
1-4    4              0x7800000000000000  skyVersion  resolved sky version (0=TARGET, 1=BEST, 2-15=RUNS)
5-15   11             0x07FF000000000000  rerun       number of pipeline rerun
16-31  16             0x0000FFFF00000000  run         run number
32-34  3              0x00000000E0000000  camcol      camera column (1-6)
35     1              0x0000000010000000  firstField  is this the first field in segment?
36-47  12             0x000000000FFF0000  field       field number within run
48-63  16             0x000000000000FFFF  object      object number within field
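The bit layout above can be applied mechanically. Here is a Python sketch (function and field names are ours) that packs and unpacks an objID using the shifts implied by the masks in the table:

```python
PHOTO_FIELDS = [            # (name, shift, width), from the bit table above
    ("skyVersion", 59, 4),
    ("rerun",      48, 11),
    ("run",        32, 16),
    ("camcol",     29, 3),
    ("firstField", 28, 1),
    ("field",      16, 12),
    ("object",      0, 16),
]

def decode_objid(objid):
    """Unpack a 64-bit photometric objID into its named fields."""
    return {name: (objid >> shift) & ((1 << width) - 1)
            for name, shift, width in PHOTO_FIELDS}

def encode_objid(skyVersion, rerun, run, camcol, firstField, field, obj):
    """Pack the fields back into a 64-bit objID."""
    vals = dict(skyVersion=skyVersion, rerun=rerun, run=run, camcol=camcol,
                firstField=firstField, field=field, object=obj)
    return sum(vals[name] << shift for name, shift, width in PHOTO_FIELDS)
```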

SpecObjID

The encoding of the long ID for spectroscopic objects is described below.
This applies to plateID, specObjID, specLineID, specLineIndexID, elRedshiftID
and xcRedshiftID.

Bits   Length (bits)  Mask                Assignment           Description
0-15   16             0xFFFF000000000000  plate                number of spectroscopic plate
16-31  16             0x0000FFFF00000000  MJD                  MJD (date) plate was observed
32-41  10             0x00000000FFC00000  fiberID              number of spectroscopic fiber on plate (1-640)
42-47  6              0x00000000003F0000  type                 type of targeted object
48-63  16             0x000000000000FFFF  line/redshift/index  0 for SpecObj, else number of spectroscopic line (SpecLine), index (SpecLineIndex), or redshift (ELRedshift or XCRedshift)
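A corresponding decoding sketch for the spectroscopic IDs (again, names are ours), using the shifts implied by the masks above:

```python
SPEC_FIELDS = [             # (name, shift, width), from the bit table above
    ("plate",   48, 16),
    ("mjd",     32, 16),
    ("fiberID", 22, 10),
    ("type",    16, 6),
    ("line",     0, 16),    # 0 for SpecObj itself
]

def decode_specobjid(specobjid):
    """Unpack a 64-bit spectroscopic ID into its named fields."""
    return {name: (specobjid >> shift) & ((1 << width) - 1)
            for name, shift, width in SPEC_FIELDS}
```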

Photometric Flux Calibration

The objective of the photometric calibration process is to tie the
SDSS imaging data to an AB magnitude system, and specifically to the
"natural system" of the 2.5m telescope defined by the
photon-weighted effective wavelengths of each combination of SDSS
filter, CCD response, telescope transmission, and atmospheric
transmission at a reference airmass of 1.3 as measured at APO.

The calibration process ultimately involves combining data from three
telescopes: the USNO 40-in on which our primary
standards were first measured, the
SDSS Photometric Telescope (or PT) , and the SDSS 2.5m telescope.
At the beginning of the survey it was expected that there would be a
single u'g'r'i'z' system. However, in the course of processing the
SDSS data, the unpleasant discovery was made that the filters in the
2.5m telescope have significantly different effective wavelengths from
the filters in the PT and at the USNO. These differences have been
traced to the fact that the short-pass interference films on the
2.5-meter camera live in the same vacuum as the detectors, and the
resulting dehydration of the films decreases their effective
refractive index. This results in blueward shifts of the red edges of
the filters by about 2.5 percent of the cutoff
wavelength, and consequent shifts of the effective
wavelengths of order half that. The USNO filters are in ambient air,
and the hydration of the films exhibits small temperature shifts; the
PT filters are kept in stable very dry air and are in a condition
about halfway between ambient and the very stable vacuum state. The
rather subtle differences between these systems are describable by
simple linear transformations with small color terms for stars of
not-too-extreme color, but of course cannot be so transformed for very
cool objects or objects with complex spectra. Since standardization is
done with stars, this is not a fundamental problem, once the
transformations are well understood.

It is these subtle issues that gave rise to our somewhat awkward
nomenclature for the different magnitude systems:

Previous reductions of the data, including that used in the EDR,
were based on inconsistent photometric equations; this is why we
referred to the 2.5m photometry with asterisks: u*g*r*i*z*. With
DR1, the photometric equations are properly self-consistent, and we
can now drop the asterisks and refer to the 2.5m photometry as
u g r i z.

Overview of the Photometric Calibration in SDSS

The photometric calibration of the SDSS imaging data is a multi-step
process, due to the fact that the images from the 2.5m telescope
saturate at approximately r = 14, fainter than typical
spectrophotometric standards, combined with the fact that observing
efficiency would be greatly impacted if the 2.5m needed to interrupt
its routine scanning in order to observe separate calibration fields.

The first step involved setting up a primary standard
star network of 158 stars distributed around the Northern sky.
These stars were selected from a variety of sources and span a range
in color, airmass, and right ascension. They were observed repeatedly
over a period of two years using the US Naval Observatory 40-in
telescope located in Flagstaff, Arizona. These observations are tied
to an absolute flux system by the single F0 subdwarf star BD+17_4708,
whose absolute fluxes in SDSS filters are taken from
Fukugita et al. (1996). As noted above, the photometric system
defined by these stars is called the u'g'r'i'z' system. You
can look at the table containing
the calibrated magnitudes for these standard stars.

Most of these primary standards have brightnesses in the range r = 8 -
13, and would saturate the 2.5-meter telescope's imaging camera in
normal operations. Therefore, a set of 1520 41.5 × 41.5
arcmin^2 transfer fields, called secondary patches,
have been positioned throughout the survey area. These secondary
patches are observed with the PT; their size is set by the field of
view of the PT camera. These secondary patches are grouped into sets
of four. Each set spans the full set of 12 scan lines of a survey
stripe along the width of the stripe, and the sets are spaced along
the length of a stripe at roughly 15 degree intervals. The patches
are observed by the PT in parallel with observations of the primary
standards and processed using the Monitor Telescope Pipeline (mtpipe).
The patches are first calibrated to the USNO 40-in
u'g'r'i'z' system and then transformed to the 2.5m
ugriz system; both initial calibration to the
u'g'r'i'z' system and the transformation to the ugriz
system occur within mtpipe. The ugriz-calibrated patches
are then used to calibrate the 2.5-meter's imaging data via the Final
Calibrations Pipeline (nfcalib).

Monitor Telescope Pipeline

The PT has two main functions: it measures the atmospheric extinction
on each clear night based on observations of primary standards at a
variety of airmasses, and it calibrates secondary patches in order to
determine the photometric zeropoint of the 2.5m imaging scans. The
extinction must be measured on each night the 2.5m is scanning, but
the corresponding secondary patches can be observed on any photometric
night, and need not be coincident with the image scans that they will
calibrate.

The Monitor Telescope Pipeline (mtpipe), so called for historical
reasons, processes the PT data. It performs three basic functions:

it bias subtracts and flatfields the images, and performs
aperture photometry;

it identifies primary standards in the primary standard
star fields and computes a transformation from the
aperture photometry to the primary standard star u'g'r'i'z' system;

it applies the photometric solution to the stars in the
secondary patch fields, yielding u'g'r'i'z'-calibrated
patch star magnitudes, and then transforms these u'g'r'i'z'
magnitudes into the SDSS 2.5m ugriz system.

The Final Calibration Pipeline

The final calibration pipeline (nfcalib) works much like mtpipe,
computing the transformation between psf photometry (or other
photometry) as observed by the 2.5m telescope and the final SDSS
photometric system. The pipeline matches stars between a camera
column of 2.5m data and an overlapping secondary patch. Each camera
column of 2.5m data is calibrated individually. There are of order
100 stars in each patch in the appropriate color and magnitude range
in the overlap.

The transformation equations are a simplified form of those used by mtpipe.
Since mtpipe delivers patch stars already calibrated to the
2.5m ugriz system, the nfcalib transformation equations have the following
form:
m_filter_inst(2.5m) = m_filter(patch) + a_filter + k_filter * X,
where, for a given filter, m_filter_inst(2.5m) is the
instrumental magnitude of the star in the 2.5m data [-2.5 log10(counts/exptime)],
m_filter(patch) is the magnitude of the same star in
the PT secondary patch, a_filter is the photometric
zeropoint, k_filter is the first-order extinction
coefficient, and X is the airmass of the 2.5m observation. The
extinction coefficient is taken from PT observations on the same
night, linearly interpolated in time when multiple extinction
determinations are available. (Generally, however, mtpipe calculates
only a single k_filter per filter per night, so
linear interpolation is usually unnecessary.) A single zeropoint
a_filter is computed for each filter from stars
on all patches that overlap a given CCD in a given run. Observations
are weighted by their estimated errors, and sigma-clipping is used to
reject outliers. At one time it was thought that a time dependent
zero point might be needed to account for the fact that the 2.5m
camera and corrector lenses rotate relative to the telescope mirrors
and optical structure; however, it now appears that any variations in
throughput are small compared to inherent fluctuations in the
calibration of the patches themselves. The statistical error in the
zeropoint is usually constrained to be less than 1.35 percent
in u and z and 0.9 percent in gri.
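The zeropoint solve for a single matched star can be sketched numerically as follows. This is our own minimal illustration; the real nfcalib combines many stars per patch with error weighting and sigma clipping, as described above.

```python
import math

EXPTIME = 53.907456     # sec, SDSS effective exposure time per frame

def instrumental_mag(counts, exptime=EXPTIME):
    """m_inst = -2.5 log10(counts/exptime), as defined above."""
    return -2.5 * math.log10(counts / exptime)

def zeropoint(m_inst, m_patch, k, airmass):
    """Solve m_inst = m_patch + a + k*X for the zeropoint a,
    given a star's PT-patch magnitude and the night's extinction."""
    return m_inst - m_patch - k * airmass
```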

Assessment of Photometric Calibration

With Data Release 1 (DR1), we now routinely meet our requirements of
photometric uniformity of 2% in r, g-r, and r-i and of 3% in u-g and
i-z (rms).

This is a substantial improvement over the photometric uniformity
achieved in the Early Data Release (EDR), where the corresponding
values were approximately 5% in r, g-r, and r-i and 5% in u-g and i-z.

The improvements between the photometric calibration of the EDR and
the DR1 can be traced primarily to the use of more robust and
consistent photometric
equations by mtpipe and nfcalib and to improvements to the PSF-fitting algorithm and flatfield methodology in the Photometric Pipeline (photo).

Note that this photometric uniformity is measured based upon
relatively bright stars which are no redder than M0; hence, these
measures do not include effects of the
u band red leak (see caveats below) or the
model magnitude bug.

How to go from Counts in the fpC file to Calibrated ugriz magnitudes?

Asinh and Pogson magnitudes

All calibrated magnitudes in the photometric catalogs are
given not as conventional Pogson astronomical
magnitudes, but as asinh
magnitudes. We show how to obtain both kinds of magnitudes from
observed count rates and vice versa. See further down for conversion of SDSS magnitudes to physical fluxes.
For both kinds of magnitudes, there are two ways to obtain the
zeropoint information for the conversion.

A little slower, but gives the final calibration and works
for all data releases

Here you first need the following information from the tsField
files:

aa = zeropoint
kk = extinction coefficient
airmass

To get a calibrated magnitude, you first need to determine the
extinction-corrected ratio of the observed count rate to the
zero-point count rate:

Convert the observed
number of counts to a count rate using the exposure time exptime
= 53.907456 sec,

correct counts for atmospheric extinction using the
extinction coefficient kk and the
airmass, and

divide by the zero-point count rate, which is given by
f0 = 10^(-0.4*aa), both for asinh and conventional
magnitudes.

In a single step,

f/f0 = counts/exptime * 10^(0.4*(aa + kk*airmass))

Then, calculate either the conventional ("Pogson") or the SDSS
asinh magnitude from f/f0:

Pogson

mag = -2.5 * log10(f/f0)

asinh

mag = -(2.5/ln 10) * [asinh((f/f0)/(2b)) + ln(b)],
where b is the softening parameter
for the photometric band in question and is given in the
table of b
coefficients below.
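Putting the steps together, here is a small Python sketch of both conversions. Function names are ours; the exposure time and the softening parameters b come from the text and the table of b coefficients.

```python
import math

EXPTIME = 53.907456            # sec, SDSS effective exposure time
B_SOFT = {"u": 1.4e-10, "g": 0.9e-10, "r": 1.2e-10,
          "i": 1.8e-10, "z": 7.4e-10}   # softening parameters b

def flux_ratio(counts, aa, kk, airmass, exptime=EXPTIME):
    """f/f0 from raw counts and the tsField zeropoint aa, extinction kk."""
    return counts / exptime * 10 ** (0.4 * (aa + kk * airmass))

def pogson_mag(f_over_f0):
    """Conventional (Pogson) magnitude."""
    return -2.5 * math.log10(f_over_f0)

def asinh_mag(f_over_f0, band):
    """SDSS asinh magnitude; agrees with Pogson at high S/N."""
    b = B_SOFT[band]
    return -(2.5 / math.log(10)) * (math.asinh(f_over_f0 / (2 * b))
                                    + math.log(b))
```

Note that asinh_mag(0.0, band) reproduces the zero-flux magnitudes in the table (e.g. 24.63 in u), and for bright objects the two magnitudes agree to much better than 1%.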

asinh Softening Parameters (b coefficients)

Band  b             Zero-Flux Magnitude m(f/f0 = 0)  m(f/f0 = 10b)
u     1.4 × 10^-10  24.63                            22.12
g     0.9 × 10^-10  25.11                            22.60
r     1.2 × 10^-10  24.80                            22.29
i     1.8 × 10^-10  24.36                            21.85
z     7.4 × 10^-10  22.83                            20.32

Note: These values of the softening
parameter b are set to be approximately 1-sigma of the sky
noise; thus, only low signal-to-noise ratio measurements are affected
by the difference between asinh and Pogson magnitudes. The final
column gives the asinh magnitude associated with an object for which
f/f0 = 10b; the difference between
Pogson and asinh magnitudes is less than 1% for objects brighter than
this.

The calibrated asinh magnitudes are given in the tsObj
files. To obtain counts from an asinh magnitude, you first need to
work out f/f0 by inverting the asinh
relation above. You can then determine the number of counts from
f/f0 using the zero-point, extinction
coefficient, airmass, and exposure time.
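The inversion can be sketched as follows (function names ours; the b values are from the table above):

```python
import math

B_SOFT = {"u": 1.4e-10, "g": 0.9e-10, "r": 1.2e-10,
          "i": 1.8e-10, "z": 7.4e-10}   # softening parameters b

def flux_ratio_from_asinh(mag, band):
    """Invert mag = -(2.5/ln 10)*[asinh((f/f0)/(2b)) + ln b] for f/f0."""
    b = B_SOFT[band]
    return 2 * b * math.sinh(-mag * math.log(10) / 2.5 - math.log(b))

def counts_from_flux_ratio(f_over_f0, aa, kk, airmass, exptime=53.907456):
    """Undo the zeropoint, extinction, and exposure-time scaling."""
    return f_over_f0 * exptime * 10 ** (-0.4 * (aa + kk * airmass))
```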

The equations above are exact for DR1. Strictly speaking, for
EDR photometry, the corrected counts should include a color term
cc*(color-color0)*(X-X0) (cf.
equation 15 in section 4.5 in the EDR paper), but it turns out
that generally, cc*(color-color0)*(X-X0) < 0.01 mag and the
color term can be neglected. Hence the calibration looks
identical for EDR and DR1.

Faster magnitudes via "flux20"

The "flux20" keyword in the header of the corrected frames
(fpC files) approximately gives the net number of
counts for a 20th mag object. So instead of using the zeropoint
and airmass correction term from the tsField file,
you can determine the corrected zero-point flux as

f/f0 = (counts/exptime) / (10^8 * flux20/exptime) = 10^-8 * counts/flux20,
since a 20th-magnitude object has f/f0 = 10^-8.

Then proceed with the calculation of a magnitude from
f/f0 as above.
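This shortcut is a one-liner (function name ours):

```python
def flux_ratio_from_flux20(counts, flux20):
    """Approximate f/f0 from the fpC header keyword flux20.

    A 20th-mag object has f/f0 = 1e-8, and flux20 is roughly its counts
    in the same exposure, so the exposure time cancels out.
    """
    return 1e-8 * counts / flux20
```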

The relation is only approximate because the final calibration
information (provided by nfcalib) is not available at the
time the corrected frames are generated. We expect the error
here (compared to the final calibrated magnitude) to be of order
0.1 mag or so, as estimated from a couple of test cases we have
tried out.

Note that the counts measured by photo for each object are given
in the fpObjc files as, e.g., "psfcounts", "petrocounts", etc.

On a related note, in DR1 one can also use relations
similar to the above to estimate the sky level in magnitudes per
sq. arcsec (1 pixel = 0.396 arcsec). Either use the header keyword
"sky" in the fpC files, or use the raw background counts in the fpC
files, remembering to first subtract "softbias" (= 1000). Note that the
sky level is also given in the tsField files. This note only
applies to DR1 and later data releases. Note also that the calibrated
sky brightnesses reported in the tsField values have been
corrected for atmospheric extinction.

Computing errors on counts (converting counts to photo-electrons)

The fpC (corrected frames) and fpObjc (object tables with counts for each
object instead of magnitudes) files report counts (or "data numbers",
DN). However, it is the number of photo-electrons which is really counted by
the CCD detectors and which therefore obeys Poisson statistics. The number of
photo-electrons is related to the number of counts through the gain (which is
really an inverse gain):

photo-electrons = counts * gain

The gain is reported in the headers of the tsField and fpAtlas files (and
hence also in the field table in the CAS). The total noise contributed by dark
current and read noise (in units of DN^2) is also reported in the tsField
files in the header keyword dark_variance (correspondingly, as darkVariance
in the field table in the CAS, and as dark_var in the fpAtlas header).

Thus, the error in DN is given by the following expression:

error(counts) = sqrt([counts+sky]/gain + Npix*dark_variance),

where counts is the number of object counts, sky is the number of sky counts
summed over the same area as the object counts, Npix is the area covered by
the object in pixels, and gain and dark_variance are the numbers from the
corresponding tsField files.
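The expression translates directly (function name ours):

```python
import math

def counts_error(counts, sky, npix, gain, dark_variance):
    """1-sigma error in DN, from the expression above.

    counts: object counts (DN); sky: sky counts over the object's area;
    npix: object area in pixels; gain, dark_variance: from tsField.
    """
    return math.sqrt((counts + sky) / gain + npix * dark_variance)
```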

Conversion from SDSS ugriz magnitudes
to AB ugriz magnitudes

The SDSS photometry is intended to be on the AB system (Oke
& Gunn 1983), by which a magnitude 0 object should have the same
counts as a source of Fnu = 3631
Jy. However, this is known not to be exactly true, such that the photometric
zeropoints are slightly off the AB standard. We continue to work to pin down
these shifts. Our present estimate, based on comparison to the STIS standards
of Bohlin,
Dickinson, & Calzetti (2001) and confirmed by SDSS photometry and
spectroscopy of fainter hot white dwarfs, is that the u band
zeropoint is in error by 0.04 mag, uAB =
uSDSS - 0.04 mag, and that g, r, and
i are close to AB. These statements are certainly not precise to
better than 0.01 mag; in addition, they depend critically on the system
response of the SDSS 2.5-meter, which was measured by Doi et al. (2004, in
preparation). The z band zeropoint is not as certain at this time,
but there is mild evidence that it may be shifted by about 0.02 mag in the
sense zAB = zSDSS + 0.02 mag. The
large shift in the u band was expected because the adopted
magnitude of the SDSS standard BD+17 in Fukugita
et al. (1996) was computed at zero airmass, thereby making the assumed
u response bluer than that of the USNO system response.

We
intend to give a fuller report on the SDSS zeropoints, with uncertainties, in
the near future. Note that our relative photometry is quite a bit
better than these numbers would imply; repeat observations show that our
calibrations are better than 2%.
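The present best-estimate offsets quoted above can be collected into a small helper (our own sketch; as the text stresses, these offsets are uncertain at the ~0.01 mag level and may be revised):

```python
# SDSS -> AB zeropoint offsets from the text: u is off by -0.04 mag,
# z tentatively by +0.02 mag; g, r, i are taken as already close to AB.
AB_OFFSET = {"u": -0.04, "g": 0.0, "r": 0.0, "i": 0.0, "z": +0.02}

def sdss_to_ab(mag_sdss, band):
    """Apply the estimated SDSS-to-AB zeropoint shift for one band."""
    return mag_sdss + AB_OFFSET[band]
```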

Conversion from SDSS ugriz
magnitudes to physical fluxes

As explained in the preceding section, the SDSS system is nearly an
AB system. Assuming you know the correction from
SDSS zeropoints to AB zeropoints (see above), you can turn the AB
magnitudes into a flux density using the AB zeropoint flux
density. The AB system is defined such that every filter has a
zero-point flux density of 3631 Jy (1 Jy = 1 Jansky = 10^-26
W Hz^-1 m^-2 = 10^-23 erg s^-1
Hz^-1 cm^-2).

To obtain a flux density from SDSS data, you need to work out
f/f0 (e.g. from the asinh magnitudes in
the tsObj files by using the inverse of the
relations given above). This number is
then also the object's flux density, expressed as a fraction of the
AB zeropoint flux density. Therefore, the conversion to flux
density is

S [Jy] = 3631 Jy * (f/f0)
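Since the AB zeropoint flux density is 3631 Jy, the conversion from an AB magnitude to a flux density in Jansky can be sketched as (function name ours):

```python
F0_JY = 3631.0    # AB zero-point flux density (Jy)

def ab_mag_to_jansky(m_ab):
    """Flux density in Jy for an AB magnitude: S = 3631 Jy * 10**(-0.4*m)."""
    return F0_JY * 10 ** (-0.4 * m_ab)
```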

Transformation Equations Between SDSS magnitudes and UBVRcIc

Improved photometric calibration ("Übercal")

Ubercal is an algorithm to photometrically calibrate wide-field
optical imaging surveys that simultaneously solves for the
calibration parameters and relative stellar fluxes using overlapping
observations. The algorithm decouples the problem of relative
calibrations from that of absolute calibrations; the
absolute calibration is reduced to determining a few numbers for the
entire survey. We pay special attention to the spatial structure of
the calibration errors, allowing one to isolate particular error modes
in downstream analyses. Applying this to the Sloan Digital Sky Survey
imaging data, we achieve ~1% relative calibration errors across 8500
sq.deg. in griz; the errors are ~2% for the u band. These errors are
dominated by unmodelled atmospheric variations at Apache Point
Observatory. For a detailed description of ubercal, please see the Ubercal
paper (Padmanabhan et al. 2007, ApJ submitted [astro-ph/0703454]).

This improved calibration is available only through the
Ubercal table.

The u filter has a natural red leak around 7100 Å
which is supposed to be blocked by an interference coating. However,
under the vacuum in the camera, the wavelength cutoff of the
interference coating has shifted redward (see the discussion in the
EDR paper), allowing some of this red leak through. The extent of
this contamination is different for each camera column. It is not
completely clear if the effect is deterministic; there is some
evidence that it is variable from one run to another with very similar
conditions in a given camera column. Roughly speaking, however, this
is a 0.02 magnitude effect in the u magnitudes for mid-K
stars (and galaxies of similar color), increasing to 0.06 magnitude
for M0 stars (r-i ~ 0.5), 0.2 magnitude at r-i ~
1.2, and 0.3 magnitude at r-i = 1.5. There is a large
dispersion in the red leak for the redder stars, caused by three
effects:

The differences in the detailed red
leak response from column to column, beating with the complex red
spectra of these objects.

The almost certain time variability of the red leak.

The red-leak images on the u chips are out of focus and are
not centered at the same place as the u image because of
lateral color in the optics and differential refraction - this means
that the fraction of the red-leak flux recovered by the PSF fitting
depends on the amount of centroid displacement.

To make matters even more complicated, this is a detector
effect. This means that it is not the real i and
z which drive the excess, but the instrumental colors
(i.e., including the effects of atmospheric extinction), so the leak
is worse at high airmass, when the true ultraviolet flux is heavily
absorbed but the infrared flux is relatively unaffected. Given these
complications, we cannot recommend a specific correction to the
u-band magnitudes of red stars, and warn the user of these
data about over-interpreting results on colors involving the
u band for stars later than K.

Photometric Redshifts

There are no photometric redshifts available for data releases 2 through 4
(DR2-DR4). Starting with DR5, there are two versions of photometric redshift
in the SDSS databases, in the Photoz and
Photoz2 tables respectively. The
algorithms for generating these are described below.

Photoz Table

This set of photometric redshifts has been obtained with the template-fitting
method. Please also see
this link for more detailed information about the method.

The template fitting approach simply compares the expected colors of a
galaxy (derived from template spectral energy distributions) with those
observed for an individual galaxy. The standard scenario for template
fitting is to take a small number of spectral templates T (e.g., E, Sbc,
Scd, and Irr galaxies) and choose the best fit by optimizing the
likelihood of the fit as a function of redshift, type, and luminosity p(z,
T, L). Variations on this approach have been developed in the last few
decades, including ones that use a continuous distribution of spectral
templates, enabling the error function in redshift and type to be well
defined.
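The core of the template-fitting scenario above can be sketched as a brute-force chi-square minimization. This is our own minimal illustration: `templates` maps a type name to a hypothetical `color_model(z)` function returning predicted colors at redshift z; a real code synthesizes those predictions from redshifted SEDs and the filter curves, and marginalizes rather than simply minimizing.

```python
def best_template_fit(obs_colors, obs_errors, templates, z_grid):
    """Minimal chi-square template photo-z sketch.

    Returns (best_z, best_type, best_chi2) over the template set and
    redshift grid, assuming independent Gaussian color errors.
    """
    best = (None, None, float("inf"))
    for name, color_model in templates.items():
        for z in z_grid:
            model = color_model(z)
            chi2 = sum(((o - m) / e) ** 2
                       for o, m, e in zip(obs_colors, model, obs_errors))
            if chi2 < best[2]:
                best = (z, name, chi2)
    return best
```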

Since a representative set of photometrically calibrated spectra covering the
full wavelength range of the filters is not easy to obtain, we have used
the empirical templates of Coleman, Wu, and Weedman, extended with spectral
synthesis models. These templates were adjusted to fit the calibrations
(see Budavari et al. 2000, AJ, 120, 1588).

The table contains the estimated redshift, the best-matching template's
spectral class, K-corrections, and absolute magnitudes. There are also some
parameters of the chi-square fitting. Caveats: the quality of the photometric
redshift estimation is poor for faint objects (or, to be precise, objects
with large photometric errors). The "quality", "zErr", and "tErr" values are
only estimates; they are not always reliable. For this estimation we have
used galaxy templates for all objects, so except for a few misidentified
galaxies which were categorized as stars in the photo pipeline, the values
for non-galaxies shouldn't be used.

Name          Type       Units   Description

objID         bigint 8           Unique ID pointing to PhotoObj table

Estimated parameters:

z             real 4             Photometric redshift
zErr          real 4             Marginalized error of the photometric redshift
t             real 4             Photometric SED type between 0 and 1
tErr          real 4             Marginalized error of the photometric type
dmod          real 4     mag     Distance modulus for an Omega_M = 0.3, Omega_lambda = 0.7 cosmology
rest_ug       real 4     mag     Rest-frame u-g color
rest_gr       real 4     mag     Rest-frame g-r color
rest_ri       real 4     mag     Rest-frame r-i color
rest_iz       real 4     mag     Rest-frame i-z color
kcorr_u       real 4     mag     k-correction
kcorr_g       real 4     mag     k-correction
kcorr_r       real 4     mag     k-correction
kcorr_i       real 4     mag     k-correction
kcorr_z       real 4     mag     k-correction
absMag_u      real 4     mag     Rest-frame u0 absolute magnitude
absMag_g      real 4     mag     Rest-frame g0 absolute magnitude
absMag_r      real 4     mag     Rest-frame r0 absolute magnitude
absMag_i      real 4     mag     Rest-frame i0 absolute magnitude
absMag_z      real 4     mag     Rest-frame z0 absolute magnitude

Parameters of the chi-square fit:

class         int 4              Number describing the object type (galaxy = 1)
pId           int 4              Unique ID for photoz version
rank          int 4              Rank of the photoz determination; default is 0
version       varchar 6          Version of photoz code
chiSq         real 4             The chi^2 value for the fit
c_tt          real 4             tt-element of covariance matrix
c_tz          real 4             tz-element of covariance matrix
c_zz          real 4             zz-element of covariance matrix
fitRadius     int 4      pixels  Radius of area used for covariance fit
fitThreshold  real 4             Probability threshold for fitting, peak normalized to 1
quality       int 4              Integer describing the quality (best: 5, lowest: 0)
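As a concrete check on the dmod column, the distance modulus for the stated flat Omega_M = 0.3, Omega_lambda = 0.7 cosmology can be computed by numerical integration. The Hubble constant below is an assumption (the table fixes only the density parameters); absolute magnitudes then follow as the dereddened magnitude minus dmod minus the k-correction.

```python
import numpy as np

C_KMS = 299792.458       # speed of light, km/s
H0 = 70.0                # km/s/Mpc -- assumed; not fixed by the table
OMEGA_M, OMEGA_L = 0.3, 0.7

def distance_modulus(z, n=2048):
    """dmod = 5 log10(d_L / 10 pc) for a flat LambdaCDM cosmology."""
    zs = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(OMEGA_M * (1.0 + zs) ** 3 + OMEGA_L)
    # comoving distance by trapezoidal integration, in Mpc
    dc = (C_KMS / H0) * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))
    dl = (1.0 + z) * dc                       # luminosity distance, flat universe
    return 5.0 * np.log10(dl * 1.0e6 / 10.0)  # 1 Mpc = 1e6 pc
```

For H0 = 70 this gives dmod of about 38.3 mag at z = 0.1.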

Photoz2 Table

The photometric redshifts from the U. Chicago/Fermilab/NYU group
(H. Oyaizu, M. Lima, C. Cunha, H. Lin, J. Frieman, and E. Sheldon)
are calculated using a Neural Network method that is
similar in implementation to that of
Collister and Lahav (2004, PASP, 116, 345).
The photo-z training and validation sets consist of over
551,000 unique spectroscopic redshifts
matched to nearly 640,000 SDSS photometric measurements.
These spectroscopic redshifts come from
the SDSS as well as the deeper galaxy surveys 2SLAQ, CFRS,
CNOC2, TKRS, and DEEP+DEEP2.

We provide photo-z estimates for a sample of over 77.4 million
DR6 primary objects, classified as galaxies by the
SDSS PHOTO pipeline (TYPE = 3),
with dereddened model magnitude r < 22,
and which do not have any of the flags BRIGHT,
SATURATED, or SATUR_CENTER set.
Note that this is a significant
change in the input galaxy sample selection compared to the DR5
version of Photoz2.

Our data model is:

Name          Type    Description

objid         bigint  Unique ID pointing to PhotoObjAll table
photozcc2     real    CC2 photo-z
photozerrcc2  real    CC2 photo-z error
photozd1      real    D1 photo-z
photozerrd1   real    D1 photo-z error
flag          int     0 for objects with r <= 20; 2 for objects with r > 20

Both the "CC2" and "D1" photo-z's are neural-network-based
estimators. "D1" uses the galaxy magnitudes in the
photo-z fit, while "CC2" uses only galaxy colors (i.e., only
magnitude differences). Both methods also employ a concentration
index (the ratio of PetroR50 to PetroR90).
The "D1" estimator provides smaller photo-z errors than the "CC2" estimator,
and is recommended for bright galaxies (r < 20) to minimize
the overall photo-z scatter and bias.
For faint galaxies (r > 20), however, we recommend "CC2",
as it provides more accurate redshift distributions.
If a single photo-z method is desired for simplicity,
we also recommend "CC2" as the better overall photo-z estimator.

The photo-z errors (1σ, or 68% confidence)
are computed using an empirical "Nearest Neighbor Error" (NNE) method.
NNE is a training set based method that associates similar errors to
objects with similar magnitudes, and is found
to accurately predict the photo-z error when the training set
is representative.

The photo-z "flag" value is set to 2 for fainter objects with r > 20,
whose photo-z's have larger uncertainties and biases.
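The recommendation above amounts to a simple selector on the flag column. This sketch assumes each row is a dict keyed by the column names in the data model table; it is not part of the released catalog code.

```python
def recommended_photoz(row):
    """Follow the stated recommendation: "D1" for bright galaxies
    (flag == 0, i.e. r <= 20) and "CC2" for faint ones (flag == 2,
    i.e. r > 20).  Returns (photo-z, NNE error, method)."""
    if row["flag"] == 0:
        return row["photozd1"], row["photozerrd1"], "D1"
    return row["photozcc2"], row["photozerrcc2"], "CC2"
```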

QSO Catalog

Building the QsoCatalogAll and QsoConcordanceAll tables

Abstract: We constructed a catalog of all quasar candidates and
gathered their "vital signs" from the many different SDSS data sources into
one Quasar Concordance table.

1. The Target, Best, and Spec SDSS Datasets

The SDSS Target Database is used to select the targets that will
be observed with the SDSS spectrographs. Once made, these targeting decisions
are never changed but the targeting algorithm has improved over time. The
SDSS pipeline software is always improving so the underlying pixels are
re-analyzed with each data release. To have a consistent catalog, all the
mosaiced pixels, both from early and recent observations are reprocessed with
the new software in subsequent data releases. The output of each of these
uniform processing steps is called a Best Database. So at any instant
there is the historical cumulative Target database and the current
Best database. As of early 2006 we have the Early Data Release (EDR)
databases and then five "real" data releases DR1, DR2, DR3, DR4, and DR5.

The target selection is done by the various branches (galaxy, quasar,
serendipity) of the TARGET selection algorithm. These targets are organized
for spectroscopic follow-up by the TILING (Blanton et al. 2003) [0] algorithm
as part of a tiling run that works within a tiling geometry.
The tiling run places a 2.5 deg. circle over a tiling geometry and then
assigns spectroscopic targets to be observed. The circle corresponds to a
plate that can be mounted on the SDSS telescope to observe 640 targets
at a time. The plates are "drilled" and "plugged" with optical fibers and then
"observed". These spectroscopic observations are fed through a pipeline that
builds the Spec dataset. Because Spec is relatively small (2% the size
of Best), it is included in the Best database. Unfortunately, only the
"main" SDSS target photometry is exported to the Target database (the
target photometry for Southern and Special plates is not
exported - at best we have the later Best photometry for these objects in the
database).

The SDSS catalogs are cross-matched with the FIRST, ROSAT, Stetson,
USNO, and USNO-B catalogs and some vital signs from some of those catalogs are
included in the Quasar Concordance.

2. Overview: Finding Everything That MIGHT be a Quasar

We look in the Target..PhotoObjAll, Best..SpecObjAll, and Best..PhotoObjAll
tables to find any object that might be a quasar (a QSO). We build a
QsoCatalogAll table that has a row for every combination of nearby
TargPhoto-Spec-BestPhoto objects from these lists that are within 1.5
arcseconds of one another. If no matching object can be found from the QSO
candidate list we find a surrogate object -- the nearest primary
object from the corresponding catalog (Spec, BestPhoto, TargPhoto) if one can
be found (again using the 1.5" radius.) If an object is still unmatched, we
look for a secondary object, or put a zero for that ObjectID (in general, we
use zero rather than the SQL null value to represent missing
data).

2.1. Overview: QSO Tables

The tables and views created by the quasar concordance algorithm
on the Best, Target and Spectro datasets are part of the Best database.
The following sections explain how they are computed.

QSO Table/View descriptions

Name               Type   Description

QsoCatalog         View   A view of QsoCatalogAll limited to only the best QSO from each bunch
QsoConcordance     View   A view of QsoConcordanceAll limited to only the best QSO from each bunch
QsoCatalogAll      Table  The superset of all QSO candidates identified by the algorithm described below
QsoConcordanceAll  Table  The wide table that combines the Best, Spec, and Target fields for each QSO candidate
QsoBunch           Table  The QSO neighbors organized into neighborhood bunches, with a head QSO associated with each bunch
QsoBest            Table  The fields from the Best PhotoObjAll table associated with each QSO candidate
QsoSpec            Table  The fields from the Best SpecObjAll table associated with each QSO candidate
QsoTarget          Table  The fields from the Target PhotoObjAll table associated with each QSO candidate

2.2. Overview: Quasar Bunches

Figure 1: A bunch of 2 targets, 2 bests and one spec object that
are within 1.5" of another bunch member. This bunch produces 4
(target,best,spec) triples in the concordance. The first target is
the bunch head.

The algorithm uses spatial proximity (aka "is it nearby?") to cross-correlate
objects in the Target, Best, and Spec databases. The definition of nearby
is fairly loose: the SDSS photometric pixels are 0.4 arcseconds and the
positioning is accurate to 0.1 arcseconds, but the spectroscopic survey uses
fibers with a radius of 1.5 arcseconds. Therefore, the QSO
concordance uses the 1.5" fiber radius to define "nearby" for all three
datasets.
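The 1.5" test is an angular-separation cut. A minimal sketch (not the database's zones implementation, which avoids per-pair trigonometry) follows; the haversine form is used because it is numerically stable at sub-arcsecond separations.

```python
import math

FIBER_RADIUS_ARCSEC = 1.5   # the matching radius used for all three datasets

def angsep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two (RA, Dec) positions given in
    degrees, returned in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    sd = math.sin(0.5 * (dec2 - dec1))
    sr = math.sin(0.5 * (ra2 - ra1))
    h = sd * sd + math.cos(dec1) * math.cos(dec2) * sr * sr
    return math.degrees(2.0 * math.asin(math.sqrt(h))) * 3600.0

def is_nearby(p, q):
    """The concordance's definition of "nearby": within the fiber radius."""
    return angsep_arcsec(*p, *q) <= FIBER_RADIUS_ARCSEC
```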

In a perfect world, one SpecObj matches one BestObj and one TargetObj, and
they are all marked as QSOs. Some objects have no match in the other catalogs
-- so we have zeros in those slots of that object's row. But, sometimes 2
SpecObj match 3 TargetObj and 4 BestObj, and all 9 objects are marked as
QSOs. In this case we get 2x3x4 rows. We group together all the objects that
are related in this way as a bunch. Each bunch has a head
object ID: the first member of the bunch to be recognized as a possible QSO.
The precedence is TargetObjID first, if there is no target in the bunch then
the first SpecObjID (highest S/N primary first), else the first
BestObjID. This ordering reflects the first time the object was considered for
follow-up spectroscopy. This order avoids a selection bias in the dataset
(e.g., Malmquist bias if we were to order on decreasing S/N).
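The "2x3x4 rows" behavior is a Cartesian product over the bunch's three object classes, with a zero standing in for an empty class (before surrogate matching). A minimal sketch, assuming plain ID lists per class:

```python
from itertools import product

ZERO_ID = 0   # the concordance stores 0, not NULL, for a missing object

def bunch_triples(target_ids, spec_ids, best_ids):
    """Every (target, spec, best) combination for one bunch; an empty
    class contributes a single ZERO_ID slot.  3 targets, 2 specs, and
    4 bests therefore yield 2 x 3 x 4 = 24 rows, as in the text."""
    return list(product(target_ids or [ZERO_ID],
                        spec_ids or [ZERO_ID],
                        best_ids or [ZERO_ID]))
```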

2.3 The QSO Catalog and Concordance

Figure 2: The Qso schema.

The premise is that any Target-Spec-Best triple may be interesting, so all
such triples are recorded in the QsoCatalogAll table. The vital signs (e.g., position,
flags, flux, ...) of each object are copied from the corresponding database to
small tables, along with some derived measurements specific to QSOs (these are
the QsoTarget, QsoSpec, and QsoBest tables). All these tables are unified by
QsoConcordanceAll, which "glues" the vital signs together. Most people
just want to see the best triple of each bunch - primary only and best S/N -
so the QsoConcordance view shows just the "primary" triple of each bunch.

3. Overview: A Walkthrough of the Algorithm.

Phase 1: Gather the Quasars and Quasar Candidates:
As a first step, gather the Target, Spec, and Best quasar candidate or
confirmed objects into a Zones table [1] containing their object identifiers
and positions. These are copied from the Best and Target PhotoObjAll tables
and the Best SpecObjAll table. These copies are filtered by flags indicating
that the objects are QSOs or are targeted as QSOs. For the photo objects
(target and best), this means they are primary or secondary and flagged
(primTarget) as: TARGET_QSO_HIZ OR TARGET_QSO_CAP OR TARGET_QSO_SKIRT OR
TARGET_QSO_FIRST_CAP OR TARGET_QSO_FIRST_SKIRT ( = 0x0000001F). For the
spectroscopic objects, they must have one or more of the following
properties:

recognized as a QSO or of unknown type: specClass in {UNKNOWN, QSO, HIZ_QSO}; or

high redshift (z > 0.6), since high-redshift objects are likely QSOs; or

targeted as a QSO: (primTarget & 0x1F) ≠ 0.
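The three-way test can be sketched directly. The individual bit values below are inferred from the stated OR value 0x1F; treat the specific assignments as illustrative.

```python
# TARGET quasar selection bits; their OR is the 0x1F mask quoted above.
TARGET_QSO_HIZ         = 0x01
TARGET_QSO_CAP         = 0x02
TARGET_QSO_SKIRT       = 0x04
TARGET_QSO_FIRST_CAP   = 0x08
TARGET_QSO_FIRST_SKIRT = 0x10
QSO_TARGET_MASK = (TARGET_QSO_HIZ | TARGET_QSO_CAP | TARGET_QSO_SKIRT
                   | TARGET_QSO_FIRST_CAP | TARGET_QSO_FIRST_SKIRT)

def spec_is_qso_candidate(spec_class, z, prim_target):
    """The three-way test for spectroscopic objects on "main" plates:
    QSO-like class, high redshift, or targeted as a QSO."""
    return (spec_class in {"UNKNOWN", "QSO", "HIZ_QSO"}
            or z > 0.6
            or (prim_target & QSO_TARGET_MASK) != 0)
```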

That logic is fine for most spectroscopic objects, but there are "special
plates" whose authors overloaded the primary target flags (yes, this made it
much harder to understand the data and cost many hours of discussion trying
to disambiguate it). One can recognize the standard cases with the
predicate plate.programType = 0, meaning that the plate was processed as a
"Main" chunk (programType = 0), not as a "special" (programType = 2) or
"Southern" (programType = 1) plate. The three-case logic above works fine for
"main" targets. The targets for special plates have SpecObj.primTarget
& 0x80000000 ≠ 0. Once you know it is a "special" plate, you have to
ask whether it is a "special target", and if so, whether it belongs to the
"Fstar72" group. If not, you can use the standard test ((primTarget & 0x1F)
≠ 0) - those nice people did not "overload" the primTarget flags. But
the folks who did "Fstar72" overloaded the flags, so identifying their QSO
targets requires more complex logic.

Phase 2: Find the Neighbors.
Once the zone table is assembled containing all the candidates, a zones
algorithm [1] is used to build a neighbors table among all these
objects. Two objects are QSO neighbors if they are within 1.5
arcseconds of one another. The relationship is made transitive so that
friends of friends are all part of the same neighborhood.
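The transitive "friends of friends" closure is what a union-find (disjoint-set) structure computes. A minimal sketch, assuming objects are indexed 0..n-1 and the neighbor pairs come from the 1.5" match:

```python
def bunch_partition(n_objects, neighbor_pairs):
    """Union-find partition of objects into bunches: directly linked
    neighbors are merged, and transitivity (friends of friends) falls
    out of the repeated merging.  Returns a bunch label per object."""
    parent = list(range(n_objects))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in neighbor_pairs:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n_objects)]
```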

Phase 3: Build the Bunches.
The Neighbors relationship partitions the objects into bunches. We
pick a distinguished member from each bunch to represent that bunch - called
the bunch head. The selection favors Target then Spec, then Photo
Objects and within that category it favors primary, then secondary, then
outside objects if there is a tie within one group (e.g. multiple target
objects in a bunch.) If there are multiple selections within these groups, the
tie is broken by taking the minimum object ID for PhotoObj (again, to avoid
any selection bias) and the highest S/N for specObjs. Given these bunch
heads, we record a summary record for each bunch in the QsoBunch table:

QsoBunch table

Name             Type     Description

HeadID           bigint   Unique identifier of the head object of this bunch of objects (all nearby one another)
HeadType         char(6)  TARGET, SPEC, or BEST, depending on what type of object the head is
RA               float    RA of bunch head object
Dec              float    Dec of bunch head object
TargetObjs       int      Count of the Target objects in the bunch
SpecObjs         int      Count of the Spectroscopic objects in the bunch
BestObjs         int      Count of the Best objects in the bunch
TargetPrimaries  int      Count of Primary Target objects in the bunch
SpecPrimaries    int      Count of the SciencePrimary Spectroscopic objects in the bunch
BestPrimaries    int      Count of Primary Best objects in the bunch

The difference between TargetObjs and TargetPrimaries (etc.) is that
TargetObjs counts multiple entries of the same object in the database
(e.g., both as a primary and as a secondary), whereas TargetPrimaries helps us
identify objects that are either very close together or were deblended
into two objects separated by less than 1.5" (or are in a circle of 1.5"
radius). Because the object primary flags are not handy at this point of the
computation, the bunch statistics are actually computed in Phase 9.

Phase 4: Build the Catalog.
Now we grow the QsoCatalogAll table which, for each bunch, has triples
drawn from each class of the bunch (a target, a spec, and a best object). For
example, the bunch of Figure 1 would produce 4 triples. If there is no
object in one of the classes, we fill in with a non-QSO surrogate object - the
primary object from that database (Targ, Photo, Spec) closest to the bunch
head, or if there is no primary then a secondary (the test insists on the 1.5
arcsecond radius.) If no such object can be found we fill in that slot with a
zero object. The resulting table looks like this:

QsoCatalogAll table

Name               Type    Description

HeadID             bigint  Unique identifier of this bunch of objects (all nearby one another)
TripleID           bigint  Unique identifier of this (spec, best, target) triple
QsoPrimary         bit     Flag: 1 means this is the best triple of the bunch
TargetObjID        bigint  Unique ID in the Target DB, or 0 if there is no matching object
SpecObjID          bigint  Unique ID of the spectroscopic object, or 0 if there is no such object
BestObjID          bigint  Unique ID in the Best DB, or 0 if there is no such object
TargetQsoTargeted  bit     Flag: 1 means the PhotoObj was flagged as a QSO in the target flags
SpecQsoConfirmed   bit     Flag: 1 means this SpecObj.SpecClass is QSO or HiZ_QSO
SpecQsoUnknown     bit     Flag: 1 means this SpecObj.SpecClass is unknown
SpecQsoLargeZ      bit     Flag: 1 means this SpecObj has z > 0.6
SpecQsoTargeted    bit     Flag: 1 means this SpecObj was picked as a QSO target
BestQsoTargeted    bit     Flag: 1 means the PhotoObj was flagged as a QSO in the target flags
dist_Target_Best   float   Distance in arcmin between Target and Best
dist_Target_Spec   float   Distance in arcmin between Target and Spec
dist_Best_Spec     float   Distance in arcmin between Best and Spec
psfmag_i_diff      float   target.psfmag_i - best.psfmag_i
psfmag_g_i_diff    float   (target.psfmag_g - target.psfmag_i) - (best.psfmag_g - best.psfmag_i)

The last 5 "quality fields" are computed in Phase 9.

Phase 5: Find Surrogates for missing objects.
Some Catalog entries have no matching Target, Best, or
Spec objects. In these cases we look in the database for a
surrogate object (one that was not a QSO candidate) near the
bunch head object - as usual the search radius is 1.5 arcseconds, and we favor
primary over secondary objects and favor low signal-to-noise ratio
SpecObjs.

Phase 6: Get the Vital Signs.
We now go to the source databases and get the "vital signs" of these photo and
spectro objects (both quasar candidates and surrogates), building the
QsoSpec, QsoTarget, and QsoBest tables to hold these values and, for the
photo objects, some additional values from ROSAT and FIRST if there is a
match. We then define QsoConcordanceAll as a view on these base
tables, with roughly 100 fields.

Phase 7: Define the QsoConcordanceAll and QsoConcordance Views:
Now we are ready to "glue" together the QsoCatalog with the vital signs to
make a "fat table" with all the attributes.

Phase 9: Mark the primary triple of each bunch, compute some derived
magnitude values, and clean up:
With the QsoConcordanceAll view and all the vital signs in place, we
compute some derived values: picking the best triple of each bunch, computing
the distances among members of each triple, and computing some derived PSF
magnitudes.

In the end, the DR5 database has 265,697 bunches, 329,871 triples in the
concordance and 114,883 confirmed quasars. Most bunches have one catalog
entry, but about 10% have multiple matches (generally a primary and a
secondary best or target object where both are flagged as QSO candidates, or
multiple observations of a spectroscopic object). The catalog itself has
some interesting cases. In DR5 there are 82,142 cases where the Target,
Spec, and Best all agree that it is a quasar. Since SDSS spectroscopy lags
the imaging, it is not surprising that there are 81,011 objects where both the
Target and Best indicate a likely QSO, but there is no spectrogram for the
object (the Spec Zero case).

With the QsoCatalogAll and QsoConcordanceAll in place we define two views:
QsoCatalog (the best of the bunch) and QsoConcordance (the wide version) by
picking the best targetObj, spec, and bestObj of each bunch.

Spectroscopic Redshift and Type Determination

The spectro1d pipeline analyzes the combined, merged
spectra output by spectro2d and determines object
classifications (galaxy, quasar, star, or unknown) and redshifts; it
also provides various line measurements and warning flags. The code
attempts to measure an emission and absorption redshift independently
for every targeted (nonsky) object. That is,
to avoid biases, the absorption and emission codes operate
independently, and they both operate independently of any target
selection information.

The spectro1d pipeline
performs a sequence of tasks for each object spectrum on a plate: The
spectrum and error array are read in, along with the pixel
mask. Pixels with mask bits set to FULLREJECT,
NOSKY, NODATA, or BRIGHTSKY are
given no weight in the spectro1d routines. The continuum
is then fitted with a fifth-order polynomial, with iterative rejection
of outliers (e.g., strong lines). The fit continuum is subtracted from
the spectrum. The continuum-subtracted spectra are used for
cross-correlating with the stellar templates.
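The continuum step can be sketched as below. The rejection threshold, iteration count, and demo data are arbitrary choices for illustration, not the pipeline's actual settings.

```python
import numpy as np

def fit_continuum(wave, flux, order=5, nsigma=3.0, max_iter=5):
    """Fifth-order polynomial continuum fit with iterative rejection
    of outlying pixels (e.g., strong emission lines)."""
    x = (wave - wave.mean()) / (wave.max() - wave.min())  # condition the fit
    keep = np.ones(wave.size, dtype=bool)
    for _ in range(max_iter):
        coeffs = np.polyfit(x[keep], flux[keep], order)
        resid = flux - np.polyval(coeffs, x)
        new_keep = np.abs(resid) < nsigma * np.std(resid[keep])
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return np.polyval(coeffs, x)

# demo on synthetic data: linear continuum + noise + one strong "line"
rng = np.random.default_rng(0)
wave = np.linspace(4000.0, 9000.0, 200)
true_cont = 2.0e-4 * wave + 1.0
flux = true_cont + rng.normal(0.0, 0.01, wave.size)
flux[100] += 5.0                       # emission line the fit should reject
continuum = fit_continuum(wave, flux)
```

Subtracting the returned continuum from flux leaves the line spectrum used by the emission-line and cross-correlation routines.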

Emission-Line Redshifts

Emission lines (peaks in the one-dimensional spectrum) are found by
carrying out a wavelet transform of the continuum-subtracted spectrum
f_c(λ):

w(a, b) = ∫ f_c(λ) g*(λ; a, b) dλ,

where g(x; a, b) is the wavelet (with complex conjugate g*) with
translation and scale parameters a and b. We apply the à trous wavelet (Starck,
Siebenmorgen, & Gredel 1997). For fixed wavelet scale b,
the wavelet transform is computed at each pixel center a; the
scale b is then increased in geometric steps and the process
repeated. Once the full wavelet transform is computed, the code finds
peaks above a threshold and eliminates multiple detections (at
different b) of a given line by searching nearby pixels. The
output of this routine is a set of positions of candidate emission
lines.

This list of lines with nonzero weights is matched
against a list of common galaxy and quasar emission lines, many of which
were measured
from the composite quasar spectrum of Vanden Berk et al. (2001; because
of velocity shifts of different lines in quasars, the wavelengths
listed do not necessarily match their rest-frame values). Each
significant peak found by the wavelet routine is assigned a trial line
identification from the common list (e.g., MgII) and an associated trial redshift. The
peak is fitted with a Gaussian, and the line center, width, and height
above the continuum are stored in HDU 2 of the spSpec*.fits files
as parameters wave,
sigma, and height, respectively. If the code
detects close neighboring lines, it fits them with multiple
Gaussians. Depending on the trial line identification, the line width
it tries to fit is physically constrained. The code then searches for
the other expected common emission lines at the appropriate
wavelengths for that trial redshift and computes a confidence level
(CL) by summing over the weights of the found lines and dividing by
the summed weights of the expected lines. The CL is penalized if the
different line centers do not quite match. Once all of the trial line
identifications and redshifts have been explored, an emission-line
redshift is chosen as the one with the highest CL and stored as
z in the EmissionRedshift table and
the spSpec*.fits emission
line HDU. The exact expression for the emission-line CL has been
tweaked to match our empirical success rate in assigning correct
emission-line redshifts, based on manual inspection of a large number
of spectra from the EDR.
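The CL computation can be caricatured as follows. The real expression was tuned empirically (as noted above), so the linear center-mismatch penalty here is purely illustrative.

```python
def emission_line_cl(found_offsets, expected_weights, tol_angstrom=5.0):
    """CL for one trial redshift: summed weights of the found lines over
    summed weights of the expected lines, with each found line penalized
    by how far its fitted center sits from the predicted wavelength.
    `found_offsets` maps line name -> |center offset| in Angstroms;
    `expected_weights` maps line name -> weight."""
    total = sum(expected_weights.values())
    score = 0.0
    for name, weight in expected_weights.items():
        if name in found_offsets:
            penalty = max(0.0, 1.0 - found_offsets[name] / tol_angstrom)
            score += weight * penalty
    return score / total
```

The trial redshift with the highest such CL wins, exactly as in the selection step described above.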

The SpecLine table also gives the errors,
continuum, equivalent width, chi-squared, spectral index, and
significance of each line. We caution that the emission-line
measurement for Hα should only be used if chi-squared is less than
2.5. In the SpecLine table,
the "found" lines in HDU1 denote only those lines used to measure the
emission-line redshift, while "measured" lines in HDU2 are all lines
in the emission-line list measured at the redshifted positions
appropriate to the final redshift assigned to the object.

A separate routine searches for high-redshift (z > 2.3)
quasars by identifying spectra that contain a Lyα
forest signature: a broad emission line with more
fluctuation on the blue side than on the red side of the line. The
routine outputs the wavelength of the Lyα
emission line; while this allows a determination of
the redshift, it is not a high-precision estimate, because the Lyα
line is intrinsically broad and affected by Lyα
absorption. The spectro1d pipeline
stores this as an additional emission-line redshift. This redshift
information is stored in the EmissionRedshift table.

If the highest CL emission-line redshift uses lines only expected
for quasars (e.g., Lyα, CIV, CIII]), then the object is
provisionally classified as a quasar. These provisional
classifications will hold up if the final redshift assigned to
the object (see below) agrees with its emission redshift.

Cross-Correlation Redshifts

The spectra are cross-correlated with stellar, emission-line
galaxy, and quasar template spectra to determine a cross-correlation
redshift and error. The cross-correlation templates are obtained from
SDSS commissioning spectra of high signal-to-noise ratio and comprise
roughly one for each stellar spectral type from B to almost L, a
nonmagnetic and a magnetic white dwarf, an emission-line galaxy, a
composite LRG spectrum, and a composite quasar spectrum (from Vanden
Berk et al. 2001). The composites are based on co-additions of ∼
2000 spectra each. The template redshifts are
determined by cross-correlation with a large number of stellar spectra
from SDSS observations of the M67 star cluster, whose radial velocity
is precisely known.

When an object spectrum is cross-correlated with the stellar templates, its found emission lines are masked out, i.e., the redshift is derived from the absorption features. The cross-correlation routine follows the technique of Tonry & Davis (1979): the continuum-subtracted spectrum is Fourier-transformed and convolved with the transform of each template. For each template, the three highest cross-correlation function (CCF) peaks are found, fitted with parabolas, and output with their associated confidence limits. The corresponding redshift errors are given by the widths of the CCF peaks. The cross-correlation CLs are empirically calibrated as a function of peak level based on manual inspection of a large number of spectra from the EDR. The final cross-correlation redshift is then chosen as the one with the highest CL from among all of the templates.
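On a log-wavelength grid a redshift becomes a uniform shift, which is what makes Fourier-domain cross-correlation work. This toy version takes the CCF peak at integer lag only (no parabola fit to the peak, no CL calibration), so it illustrates the core of the Tonry & Davis step rather than reproducing it.

```python
import numpy as np

def xcorr_redshift(loglam, spectrum, template):
    """Cross-correlate a continuum-subtracted spectrum with a template
    sampled on the same uniform log10-wavelength grid, and convert the
    best integer pixel lag into a redshift via (1+z) = 10^(lag * dloglam)."""
    n = loglam.size
    s = spectrum - spectrum.mean()
    t = template - template.mean()
    ccf = np.correlate(s, t, mode="full")     # lags -(n-1) .. n-1
    lag = int(np.argmax(ccf)) - (n - 1)
    dloglam = loglam[1] - loglam[0]
    return 10.0 ** (lag * dloglam) - 1.0
```

The width of the CCF peak (not computed here) supplies the redshift error, and the peak height feeds the empirically calibrated CL.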

If there are discrepant high-CL cross-correlation peaks, i.e., if
the highest peak has CL < 0.99 and the next highest peak
corresponds to a CL that is greater than 70% of the highest peak, then
the code extends the cross-correlation analysis for the corresponding
templates to lower wavenumber and includes the continuum in the
analysis, i.e., it chooses the redshift based on which template
provides a better match to the continuum shape of the object. These
flagged spectra are then manually inspected (see below). The
cross-correlation redshift is stored as z in the CrossCorrelationRedshift table.

Final Redshifts and Spectrum Classification

The spectro1d pipeline assigns a final redshift
to each object spectrum by choosing the emission or cross-correlation
redshift with the highest CL and stores this as z in the
SpecObj table. A
redshift status bit mask zStatus
and a redshift warning bit mask zWarning
are stored. The CL is stored in zConf. Objects with redshifts
determined manually (see below)
have CL set to 0.95 (MANUAL_HIC set in
zStatus), or 0.4 or 0.65 (MANUAL_LOC set in
zStatus). Rarely, objects have the entire red or blue
half of the spectrum missing; such objects have their CLs reduced by a
factor of 2, so they are automatically flagged as having low
confidence, and the mask bit Z_WARNING_NO_BLUE or
Z_WARNING_NO_RED is set in zWarning as
appropriate.

All objects are classified in specClass as
either a quasar, high-redshift quasar, galaxy, star, late-type star,
or unknown. If the object has been identified as a quasar by the
emission-line routine, and if the emission-line redshift is chosen as
the final redshift, then the object retains its quasar
classification. Also, if the quasar cross-correlation template
provides the final redshift for the object, then the object is
classified as a quasar. If the object has a final redshift z >
2.3 (so that Lyα is or should be present in the
spectrum), and if at least two out of three redshift estimators agree
on this (the three estimators being the emission-line, Lyα,
and cross-correlation redshifts), then it is
classified as a high-z quasar. If the object has a redshift
cz < 450 km/s, then it is classified as a
star. If the final redshift is obtained from one of the late-type
stellar cross-correlation templates, it is classified as a late-type
star. If the object has a cross-correlation CL < 0.25, it is
classified as unknown.

There exist among the spectra a small number of composite objects. Most common are bright stars on top of galaxies, but there are also galaxy-galaxy pairs at distinct redshifts, and at least one galaxy-quasar pair, and one galaxy-star pair. Most of these have the zWarning flag set, indicating that more than one redshift was found.

The zWarning bit mask mentioned above records problems
that the spectro1d pipeline found with each spectrum. It provides
compact information about the spectra for end users, and it is also
used to trigger manual inspection of a subset of spectra on every
plate. Users should particularly heed warnings about parts of the
spectrum missing, low signal-to-noise ratio in the spectrum,
significant discrepancies between the various measures of the
redshift, and especially low confidence in the redshift
determination. In addition, redshifts for objects with zStatus = FAILED
should not be used.

Spectral Classification Using
Eigenspectra

In addition to spectral classification based on measured lines,
galaxies are classified by a Principal Component Analysis (PCA), using
cross-correlation with eigentemplates constructed from SDSS
spectroscopic data. The 5 eigencoefficients and a classification
number are stored in eCoeff and eClass,
respectively, in the SpecObj table and the spSpec
files. eClass, a single-parameter classifier based on the
expansion coefficients (eCoeff1-5), ranges from about
-0.35 to 0.5 for early- to late-type galaxies.

A number of
changes to eClass have occurred since the EDR. The
galaxy spectral classification eigentemplates for DR1 are created from
a much larger sample of spectra than were used in the Stoughton et al. EDR
paper, and now number
approximately 200,000. The eigenspectra used in DR1 are an early
version of those created by Yip et al. (in prep). The sign of the
second eigenspectrum has been reversed with respect to that of EDR;
therefore we recommend using the expression atan(-eCoeff2/eCoeff1) rather than
eClass as the single-parameter classifier.
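The recommended replacement classifier is a one-liner; it is spelled out here only to make the sign convention explicit.

```python
import math

def single_param_class(ecoeff1, ecoeff2):
    """atan(-eCoeff2/eCoeff1): the recommended single-parameter galaxy
    classifier, given the DR1 sign flip of the second eigenspectrum."""
    return math.atan(-ecoeff2 / ecoeff1)
```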

Manual Inspection of Spectra

A small percentage of spectra on every plate are inspected
manually, and if necessary, the redshift, classification,
zStatus, and CL are corrected. We inspect those spectra
that have zWarning or zStatus indicating
that there were multiple high-confidence cross-correlation redshifts,
that the redshift was high (z > 3.2 for a quasar or z >
0.5 for a galaxy), that the confidence was low, that signal-to-noise
ratio was low in r, or that the spectrum was not measured. All
objects with zStatus = EMLINE_HIC or
EMLINE_LOC, i.e., for which the redshift was determined
only by emission lines, are also examined. If, however, the object has
a final CL > 0.98 and zStatus of either
XCORR_EMLINE or EMLINE_XCORR, then despite
the above, it is not manually checked. All objects with either
specClass = SPEC_UNKNOWN or
zStatus = FAILED are manually
inspected.

Roughly 8% of the spectra in the EDR were thus inspected, of which about one-eighth, or 1% overall, had the classification, redshift, zStatus, or CL manually corrected. Such objects are flagged with zStatus changed to MANUAL_HIC or MANUAL_LOC, depending on whether we had high or low confidence in the classification and redshift from the manual inspection. Tests on the validation plates, described in the next section, indicate that this selection of spectra for manual inspection successfully finds over 95% of the spectra for which the automated pipeline assigns an incorrect redshift.

Resolving Multiple Detections and Defining Samples

In addition to reading this section, we recommend that users
familiarize themselves with the status flags, which indicate what happened
to each object during the Resolve procedure.

SDSS scans overlap, leading to duplicate detections of objects in
the overlap regions. A variety of unique (i.e., containing no
duplicate detections of any objects) well-defined (i.e., areas with
explicit boundaries) samples may be derived from the SDSS database.
This section describes how to define those samples. The resolve figure is a useful visual aid for the
discussion presented below.

Consider a single drift scan along a stripe, called a run.
The camera has six columns of CCDs, which scan six swaths across the
sky. A given camera column is referred to throughout with the
abbreviation camCol. The unit for data processing is the
data from a single camCol for a single run. The same data may be
processed more than once; repeat processing of the same run/camCol is
assigned a unique rerun number. Thus, the fundamental unit of
data processing is identified by run/rerun/camCol.

While the data from a single run/rerun/camCol is a scan
line of data 2048 columns wide by a variable number of rows
(approximately 133000 rows per hour of scanning), for purposes of data
processing the data is split up into frames 2048 columns wide by 1361
rows long, resulting in approximately 100 frames per scan line per
hour of scanning. Additionally, the first 128 rows of the next
frame are added to the previous frame, leading to frames 2048 columns
wide by 1489 rows long, where the first and last 128 rows overlap the
previous and next frame, respectively. Each frame is processed
separately. This leads to duplicate detections for objects in the
overlap regions between frames. For each frame, we split the overlap
regions in half, and consider only those objects whose centroids lie
between rows 64 and 1361+64 as the unique detection of that object for
that run/rerun/camCol. These objects have the OK_RUN bit set in the
"status" bit mask. Thus, if you want a unique sample of all objects
detected in a given run/rerun/camCol, restrict yourself to all objects
in that run/rerun/camCol with the OK_RUN bit set. The boundaries of
this sample are poorly defined, as the area of sky covered depends on
the telescope tracking. Objects must satisfy other criteria as well
to be labeled OK_RUN; an object must not be flagged BRIGHT (as there
is a duplicate "regular" detection of the same object); and must not
be a deblended parent (as the children are already included); thus it
must not be flagged BLENDED unless the NODEBLEND flag is set. Such
objects have their GOOD bit set.
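
The selection rules above can be expressed as a small predicate. This is an illustrative sketch only; the bit values below are placeholders, not the actual SDSS "status" and object-flag definitions:

```python
# Hypothetical bit positions -- the real SDSS masks define their own values.
BRIGHT    = 1 << 1
BLENDED   = 1 << 3
NODEBLEND = 1 << 6

FRAME_ROWS = 1361    # unique rows per frame
OVERLAP    = 128     # rows shared with each adjacent frame

def is_unique_detection(row_centroid):
    """True when the centroid lies between rows 64 and 1361+64,
    i.e. inside the half-overlap-trimmed region of the frame."""
    return OVERLAP // 2 <= row_centroid < FRAME_ROWS + OVERLAP // 2

def is_good(flags, row_centroid):
    """OK_RUN-style uniqueness test plus the extra criteria for GOOD."""
    if not is_unique_detection(row_centroid):
        return False                      # duplicate from the frame overlap
    if flags & BRIGHT:
        return False                      # a "regular" duplicate detection exists
    if (flags & BLENDED) and not (flags & NODEBLEND):
        return False                      # deblended parent; its children are used
    return True
```

The same logic applies per run/rerun/camCol; restricting to objects passing this predicate yields the unique sample described above.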

For each stripe, 12 non-overlapping but contiguous scan lines are
defined parallel to the stripe great circle (that is, they are bounded by two
lines of constant great circle latitude). Each scan line is 0.20977 arcdegrees
wide (in great circle latitude). Each run/camCol scans along one
of these scan lines, completely covering the extent of the scan line in
latitude, and overlapping the adjacent scan lines by approximately 1 arcmin.
Six of these scan lines are covered when the "north" strip of the stripe is
scanned, and the remaining six are covered by the "south" strip.
The fundamental unit for defining an area of the sky considered as observed
at sufficient quality is the segment. A segment consists of all
OK_RUN objects for a given run/rerun/camCol contained within a rectangle
defined by two lines of constant great circle longitude (the east and west
boundaries) and two lines of constant great circle latitude (the north and
south boundaries, being the same two lines of constant great circle latitude
which define the scan line). Such objects have their OK_SCANLINE bit set in
the status bit mask. A segment consists of a contiguous set of fields, but
only portions of the first and/or last field may be contained within the
segment, and indeed a given field could be divided between two adjacent
segments.
If an object is in the first field in a segment, then its FIRST_FIELD bit is
set, along with the OK_SCANLINE bit; if it is not in the first field in the
segment, then the OK_SCANLINE bit is set but the FIRST_FIELD bit is not set.
This extra complication is necessary for fields which are split between two
segments; those OK_SCANLINE objects without the FIRST_FIELD bit set would
belong to the first segment (the segment for which this field is the last
field in the segment), and those OK_SCANLINE objects with the FIRST_FIELD
bit set would belong to the second segment (the segment for which this field
is the first field in the segment).

A chunk consists of a non-overlapping contiguous set of segments
which span a range in great circle longitude over all 12 scan lines for a
single stripe. Thus, the set of OK_SCANLINE (with appropriate attention to
the FIRST_FIELD bit) objects in all segments for a given chunk comprises
a unique sample of objects in an area bounded by two lines of constant great
circle longitude (the east and west boundaries of the chunk) and two lines of
constant great circle latitude (+- 1.25865 degrees, the north and south
boundaries of the chunk).

Segments and chunks are defined in great circle coordinates along their given
stripe, and contain unique detections only when limited to other segments and
chunks along the same stripe. Each stripe is defined by a great circle, which
is a line of constant latitude in survey coordinates (in survey coordinates,
lines of constant latitude are great circles while lines of constant longitude
are small circles, the reverse of the usual convention for latitude and longitude).
Since chunks are 2.51729 arcdegrees wide, but stripes are separated by 2.5
degrees (in survey latitude), chunks on adjacent stripes can overlap (and
towards the poles of the survey coordinate system chunks from more than two
stripes can overlap in the same area of sky). A unique sample of objects
spanning multiple stripes may then be defined by applying additional cuts in
survey coordinates. For a given chunk, all objects that lie within +- 1.25
degrees in survey latitude of its stripe's great circle have the OK_STRIPE
bit set in the "status" bit mask. All OK_STRIPE objects comprise a unique
sample of objects across all chunks, and thus across the entire survey area.
The southern stripes (stripes 76, 82, and 86) do not have adjacent stripes,
and thus no cut in survey latitude is required; for the southern stripes
only, all OK_SCANLINE objects are also marked as OK_STRIPE, with no additional
survey latitude cuts.

Finally, the official survey area is defined by two lines of constant survey
longitude for each stripe, with the lines being different for each stripe.
All OK_STRIPE objects falling within the specified survey longitude boundaries
for their stripe have the PRIMARY bit set in the "status" bit mask. Those
objects comprise the unique SDSS sample of objects in that portion of the
survey which has been finished to date. Those
OK_RUN objects in a segment which fail either the great circle latitude cut for
their segment, or the survey latitude or longitude cut for their stripe, have
their SECONDARY bit set. They do not belong to the primary sample, and
represent either duplicate detections of PRIMARY objects in the survey area,
or detections outside the area of the survey which has been finished to date.

Objects that lie close to the bisector between frames, scan lines, or chunks
present some difficulty. Errors in the centroids or astrometric calibrations
can place such objects on either side of the bisector. A resolution is
performed at all bisectors, and if two objects lie within 2 arcsec of each
other, then one object is declared OK_RUN/OK_SCANLINE/OK_STRIPE (depending on
the test), and the other is not.

Transformations between SDSS magnitudes and UBVRcIc

There have been several efforts in calculating transformation equations
between ugriz (or u'g'r'i'z') and UBVRcIc.
Here, we focus on six of the most current efforts:

There are currently no transformation equations explicitly for
galaxies, but Jester et al.'s (2005) and Lupton's (2005) transformation equations for
stars should also provide reasonable results for normal galaxies
(i.e., galaxies without strong emission lines).

Caveat: Note that these transformation equations are for the SDSS
ugriz (u'g'r'i'z') magnitudes as measured, not for SDSS ugriz
(u'g'r'i'z') corrected for AB offsets. If you need AB ugriz
magnitudes, please remember to convert from SDSS ugriz to AB ugriz
using the AB offsets described at this URL.

Creating Sectors

August 2003, revised March 2004, December 2004, November 2005

The Problem

The SDSS spectroscopic survey will consist of about 2000 circular Tiles, each
about 1.5 deg. in radius, which contain the objects for a given spectroscopic
observation. There are more opportunities to target (get the spectrum of) an
object if it is covered by multiple tiles. If three tiles cover an area, the
objects in that area have three times the opportunity to be targeted. At the
same time, objects are not targeted uniformly over a plate. The targeting is
driven by a program that uses the SDSS photometric observations to schedule
the spectroscopic observations. These photometric observations are 2.5
deg. wide stripes across the sky. The stripes overlap by about 15%, so the sky
is partitioned into disjoint staves, and the tiling is actually done in terms
of these staves (see Figure 1). Staves are often misnamed as stripes in the
database and in other SDSS documentation.

Figure 1. Observations consist of overlapping stripes
partitioned into disjoint staves. TilingRuns work on a set of staves, and each TilingGeometry region is contained within a stave.

Spectroscopic targeting is done by a tiling run that works with a collection
of staves - actually not whole staves but segments of them called chunks. The
tiling run generates tiles that define which objects are going to be observed
(actually, which holes to drill in an SDSS spectroscopic plate.) The tiling
run also generates a list of TilingGeometry rectangular regions that describe
the sections of the staves that were used to make the tiles. Some
TilingGeometry rectangles are positive, others are negative (masks or holes.)
Subsequent tiling runs may use the same staves (chunks) and so tiling runs are
not necessarily disjoint. So, TilingGeometries form rather complex
intersections that we call SkyBoxes.

The goal is to compute contiguous sectors covered by some number of plates and
at least one positive TilingGeometry. We also want to know how many plates
cover the sector.

This is a surprisingly difficult task because there are subtle interactions.
We will develop the algorithm to compute sectors in steps. First we will
ignore the TilingGeometry and just compute the wedges (Boolean combinations of
tiles). Then we will build TilingBoxes, positive quadrilateral partitions of
each tiling region that cover the regions. SkyBoxes are the synthesis of the
TilingBoxes from several tiling runs into a partitioning of the survey
footprint into disjoint positive quadrilaterals. Now, to
compute sectors, we simply intersect all wedges with all Skyboxes. The
residue is the tile coverage of the survey. A tile contributes to a sector if
the tile contributes to the wedge and the tile was created by one of the tile
runs that contain the SkyBox (you will probably understand that last sentence
better after you read to the end of this paper.)

Wedges

Figure 2. A wedge and
sector covered by one plate. There are adjoining wedges
covered by 2, 3, 4 plates. The lower left corner is an area that is
not part of any wedge or sector. SkyBoxes break wedges into
sectors and may mask parts of a wedge.

A wedge is the intersection of one or more tiles or the intersection of some
tiles with the complements of some others. Each wedge has a depth: the number
of positive tiles covering the wedge (see figures 2, 3). The two intersecting
tiles in figure 2, A and B, have (A-B) and (B-A) wedges of depth 1, and the
intersection (AB) is a depth 2 wedge.

Figure 3. Tile A has a blue boundary and
tile B a red boundary; both are regions of depth
1. Their intersection is yellow, a region of depth 2. The crescents
shaded in blue and green are the two wedges of
depth 1, and the yellow area is a wedge of
depth 2. Nodes are purple dots.

A sector is a wedge modified by intersections with overlapping TilingGeometry
regions. If the TilingGeometry regions are complex (multiple convexes) or if
they are holes (isMask=1), then the result of the intersection may also be
complex (a region of multiple wedges). By going to a SkyBox model we keep
things simple. Since SkyBoxes partition the sky into areas of known tile-run
depth, SkyBox boundaries do not add any depth to the sectors; they just
truncate them to fit in the box boundary and perhaps mask a tile if that tile
is in a TilingGeometry hole or if the tile that contributes to that wedge is
not part of the TilingGeometry (one of the tiling runs) that make up that
SkyBox (Figure 4 shows a simple example of these concepts).

Figure 4. This shows how the tiles and
TilingGeometry rectangles intersect to form
sectors. The figure shows a layout with wedges
of various depths: depth 1 is gray, depth 2 is light blue, depth 3 is
yellow and depth 4 is magenta. The wedges are clipped by the
TilingGeometry boundary to form sectors.

To get started, spCreateWedges() computes the wedge regions, placing them in
the Sectors table, and for each wedge W and each tile T that adds to or
subtracts from W, records the T->W relation in the Sectors2Tiles table (both positive
and negative parents). So, in Figure 3, the green wedge (the leftmost wedge)
would have tile A as a positive parent and tile B as a negative parent.

Boxes

A particular tiling run works on a set of (contiguous) staves, and indeed only
a section of each stave called a chunk. These areas are defined by disjoint
TilingRegions. To complicate matters, some TilingRegions have rectangular
holes in them that represent bad seeing (bright stars, cosmic rays, or other
flaws). So a tiling run looks something like Figure 5, and each
TilingGeometry is a spherical rectangle with spherical-rectangular holes (see
Figure 5).

Figure 5. Staves (convex sides not illustrated)
are processed in chunks. A TilingGeometry is a
chunk/stave subset with holes
(masks). TileBoxes cover a
TilingGeometry with disjoint spherical rectangles. There
are many such coverings; two are shown for TG1. The one at left has 23
TileBoxes while the one at right has 7
TileBoxes.

To simplify matters, we want to avoid the holes and work only with simple
convex regions. So we decompose each TileGeometry into a disjoint set of
TileBoxes. As Figure 5 shows, there are many different TileBox
decompositions. We want a TileBox decomposition with very few
TileBoxes. Fewer is better - but the answer will be the same in the end since
we will merge adjacent sectors if they have the same depth.

It is not immediately obvious how to construct the TileBoxes. Figure 6 gives
some idea.

First, the whole operation of subtracting out the masks happens inside the
larger TilingGeometry, called the Universe, U. We are going to construct
nibbles, the terms of a disjunctive normal form of the blue area, each with at
least one negated hole edge to make sure we exclude the hole. These nibbles
are disjoint, cover the TileGeometry, and exclude the mask (white) area.

As described in "There Goes the Neighborhood: Relational Algebra for Spatial
Data Search", we represent spherical polygons as a set of half-space
constraints of the form h = (hx, hy, hz, c). A point p = (px, py, pz) is
inside the half-space if hx*px + hy*py + hz*pz > c. A convex region C = {hi}
is the set of points inside each of the hi.
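
The half-space membership test is a single dot product. A minimal sketch, with tuples standing in for the unit vectors and constraints defined above:

```python
def inside_halfspace(p, h):
    """h = (hx, hy, hz, c); p = (px, py, pz), a unit vector on the sphere.
    The point is inside the half-space when hx*px + hy*py + hz*pz > c."""
    hx, hy, hz, c = h
    px, py, pz = p
    return hx * px + hy * py + hz * pz > c

def inside_convex(p, constraints):
    """A convex region C = {hi} is the set of points inside every hi."""
    return all(inside_halfspace(p, h) for h in constraints)
```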

Given that representation we can compute the set N of nibbles covering region
R = U-C as follows:

Compute R = N = U - C, where U and C are convex regions (C is the "hole" in U).
The idea is:

R = {ui} - {ci}
  = U&~c1 | U&~c2 | ... | U&~cm
  = U&~c1 | U&c1&~c2 | ... | U&c1&c2&...&cm-1&~cm

The terms in the last line are called nibbles. They are disjoint (for i < j,
the j-th term contains ci while the i-th term contains ~ci), and together they
cover R while excluding C (each term contains some ~ci, and every point of C
satisfies every ci).

Algorithm:

R = {}                    -- the disjoint nibbles will be added to R
NewU = spRegionCopy U     -- make a copy of U so we do not destroy it
for each c in C           -- for each constraint of C that is an arc of the hull
    Nibble = NewU & {~c}  -- intersect ~c with the current universe
    if Nibble not empty   -- if ~c intersects the universe,
        add Nibble to R   -- add this nibble to the answer set
    NewU = NewU & {c}     -- ~c is now covered, so reduce the universe
When each positive TilingGeometry is "nibbled" by its masks, the resulting
nibbles are the TileBoxes we need.
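
As a sketch of the nibble decomposition, here is a 2-D toy version in which plain half-plane predicates stand in for the spherical half-space constraints (the emptiness test on each nibble is omitted, and the geometry is purely illustrative):

```python
def nibbles(u_constraints, hole_constraints):
    """Return R = U - C as a list of disjoint terms; each term is a pair
    (positive constraints kept so far, the one negated constraint)."""
    terms, new_u = [], list(u_constraints)
    for c in hole_constraints:
        terms.append((list(new_u), c))  # NewU & ~c
        new_u.append(c)                 # ~c is covered; reduce the universe
    return terms

def in_term(p, term):
    """Point membership in one nibble term."""
    pos, neg = term
    return all(h(p) for h in pos) and not neg(p)

# Example: U is the unit square, the hole C is the square [0.25, 0.75]^2.
U = [lambda p: p[0] >= 0, lambda p: p[0] <= 1,
     lambda p: p[1] >= 0, lambda p: p[1] <= 1]
C = [lambda p: p[0] >= 0.25, lambda p: p[0] <= 0.75,
     lambda p: p[1] >= 0.25, lambda p: p[1] <= 0.75]
```

Every point of U - C falls in exactly one nibble, and no point of C falls in any, mirroring the disjointness and coverage argument above.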

The procedure spCreateTileBoxes creates, for each TilingGeometry, a set of
TilingBox regions that cover it. That procedure also records in Region2Boxes a
mapping of TilingGeometry-> TileBox so that we can tell which TilingGeometry
region covers a box.

SkyBoxes are the unification of all TileBoxes into a partitioning of the
entire sky. Logically, SkyBoxes are the Boolean combination of all the
TileBoxes - somewhat analogous to the relationship between wedges and tiles.
A SkyBox may be covered by multiple TilingGeometries (and have corresponding
tiling runs); Region2Boxes records this mapping of TilingGeometry -> TileBox.
Figure 7 illustrates how SkyBoxes are computed and how the TilingGeometry
relationship is maintained.

Figure 7. SkyBoxes are the intersection of
TileBoxes. A pair can produce up to 7
SkyBoxes. The green areas are covered by the union of
the tiling runs of the two TileBoxes and
the other SkyBoxes are covered by the Tiling Runs of
their one parent box.

spCreateSkyBoxes builds all the SkyBoxes and records the dependencies.
spCreateSkyBoxes uses the logic of spRegionQuadrangleFourOtherBoxes to create
the SkyBoxes from the intersections of TileBoxes.

From Wedges and SkyBoxes to Sectorlets to Sectors

We really want the sectors, but it is easier to first compute wedges and
SkyBoxes and then build the sectors from them. Recall that:

Wedge: a Boolean combination of tiles.

Skybox: a convex region of the survey covered by certain TilingRuns.
So, the sectors are just

Wedge ∩ SkyBox.

This may be a fine partition, but two adjacent sectors computed in this way
might have the same list of covering TileGeometries and Tiles, in which case
they should be unified into one sector. So, the regions of this first
Wedge-SkyBox partition are called sectorlets. Sectorlets with the same
covering tiles need to be unified into sectors. This unification gives us a
unique answer (remember that Figure 5 showed many different TileBox
partitions; this final step eliminates any "fake" boundaries introduced by
that step).

Sectorlets are computed as follows: given a wedge W and a SkyBox SB, the area
is just W ∩ SB. If that area is non-empty then we need to compute the list of
covering TilingGeometries and tiles. The TilingGeometries come from SB. The
tiles are a bit more complex: let T be the set of tiles covering W, and
discard from T any tile not created by a tiling run covering SB. In
mathematical notation, the covering tiles are T' = { t in T : run(t) covers SB }.
But, a particular tile or set of tiles can create many sectorlets. We want
the sector to be all the adjacent sectorlets with the same list of parent
tiles (note that sectorlets have positive (covering) and negative (excluded)
parents that make up the sector).

The routine spSectorCreateSectors unifies all the sectorlets with the same
list of parent tiles into one region. This region may not be connected (masks
or tiling geometry may break it into pieces, which are then glued back
together - see the example of 5 sectorlets creating one sector in Figure 8).
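
The unification step is essentially a group-by on the parent-tile signature. A minimal sketch (area IDs and tuple layout are hypothetical; the real routine merges region geometry, not just IDs):

```python
from collections import defaultdict

def unify_sectorlets(sectorlets):
    """sectorlets: iterable of (area_id, positive_tiles, negative_tiles).
    Sectorlets sharing the same positive and negative parent tiles are
    grouped into one sector, which may therefore be disconnected."""
    sectors = defaultdict(list)
    for area, pos, neg in sectorlets:
        key = (frozenset(pos), frozenset(neg))
        sectors[key].append(area)
    return dict(sectors)
```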

All these routines are driven by the parent spSectorCreate routine.

Measuring and recreating the sky value

How Sky Values are Measured

It is quite clear what astronomers mean by 'sky': the mean
value of all pixels in an image which are not explicitly identified as
part of any detected object. It is this quantity which, when
multiplied by the effective number of pixels in an object, tells us
how much of the measured flux is not in fact associated with the
object of interest. Unfortunately, means are not very robust, and the
identification of pixels not explicitly identified as part
of any detected object is fraught with difficulties.

There are two main strategies employed to avoid these difficulties:
the use of clipped means, and the use of rank statistics such as the
median.

Photo performs two levels of sky subtraction; when first processing
each frame it estimates a global sky level, and then, while searching
for and measuring faint objects, it re-estimates the sky level locally
(but not individually for every object).

The initial sky estimate is taken from the median value of every pixel
in the image (more precisely, every fourth pixel in the image),
clipped at 2.32634 sigma. This estimate of sky is corrected for the
bias introduced by using a median, and a clipped one at that. The
statistical
error in this value is then estimated from the values of sky determined
separately from the four quadrants of the image.
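
The clipping scheme above can be sketched in a few lines of pure Python. This is illustrative only: the sampling of every fourth pixel, the median debiasing correction, and the quadrant-based error estimate are all omitted:

```python
import statistics

def clipped_median_sky(pixels, n_sigma=2.32634, max_iter=10):
    """Iteratively clip at n_sigma about the median, then return the
    median of the surviving pixels.  (The pipeline additionally corrects
    this estimate for the bias of a clipped median; that step is omitted.)"""
    data = list(pixels)
    for _ in range(max_iter):
        med = statistics.median(data)
        sig = statistics.pstdev(data)
        if sig == 0:
            break
        kept = [p for p in data if abs(p - med) <= n_sigma * sig]
        if len(kept) == len(data):
            break                      # converged: nothing more to clip
        data = kept
    return statistics.median(data)
```

For a frame dominated by background pixels with a few bright object pixels, the clipping removes the objects and the median settles on the background level.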

Using this initial sky estimation, Photo proceeds to find all the
bright objects (typically those with more than 60 sigma detections).
Among these are any saturated stars present on the frame, and Photo
is designed to remove the scattering wings from at least the brighter
of these --- this should include the scattering due to the
atmosphere, and also that due to scattering within the CCD membrane,
which is especially a problem in the i band. In fact, we have chosen
not to aggressively subtract the wings of stars, partly because
of the difficulty of handling the wings of stars that do not fall on
the frame, and partly due to our lack of a robust understanding of the
outer parts of the PSF . With the parameters
employed, only the very cores of the stars (out to 20 pixels) are ever
subtracted, and this has a negligible influence on the data. Information
about star-subtraction is recorded in the fpBIN files, in
HDU 4.

Once the BRIGHT detections have been processed, Photo
proceeds with a more local sky estimate. This is carried out by
finding the same clipped median, but now in 256x256 pixel
boxes, centered every 128 pixels. These values are again debiased.

This estimate of the sky is then subtracted from the data, using
linear interpolation between these values spaced 128 pixels apart; the
interpolation is done using a variant of the well-known Bresenham
algorithm usually employed to draw lines on
pixellated displays.
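
A 1-D sketch of that interpolation, with plain floating-point arithmetic standing in for the pipeline's integer Bresenham-style scheme (and the 2-D row/column interpolation reduced to a single slice):

```python
def interp_sky(sky_samples, spacing=128):
    """Linearly interpolate sky values sampled every `spacing` pixels
    onto a per-pixel grid along one dimension."""
    out = []
    for i in range(len(sky_samples) - 1):
        a, b = sky_samples[i], sky_samples[i + 1]
        for k in range(spacing):
            out.append(a + (b - a) * k / spacing)   # linear ramp a -> b
    out.append(sky_samples[-1])
    return out
```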

This sky image, sampled every 128 pixels in each direction, is written out to
the fpBIN file in HDU 2; the estimated uncertainties in the sky
(as estimated from the interquartile range and converted to a standard
deviation taking due account of clipping) are stored in HDU 3. The
value of sky in each band and its error, as interpolated to the center
of the object, are written to the fpObjc files along with all
other measured quantities.

After all objects have been detected and removed,
Photo has the option of re-determining the sky using the same
256x256 pixel boxes; in practice this has not proved to
significantly affect the photometry.

Emission and absorption line fitting

Spectro1D fits spectral features at three separate stages during
the pipeline. The first two fits are fits to emission lines
only. They are done in the process of determining an emission line
redshift and these are referred to as foundLines. The
final fitting of the complete line list, i.e. both emission and
absorption lines, occurs after the object's classification has been
made and a redshift has been measured. These fits are known as
measuredLines. In all cases a single Gaussian is fitted
to a given feature, therefore the quality of the fit is only good
where this model holds up.

The first line fit is done when attempting to measure the object's
emission line redshift. Wavelet filters are used to locate emission
lines in the spectrum. The goal of these filters is to find strong
emission features, which will be used as the basis for a more careful
search. The lines identified by the wavelet filter are stored in the
specLine table as foundLines, i.e., with the
parameter category set to 1. They are stored without any
identifications, i.e., they have restWave = 0.

Every one of these features is then tentatively matched to each of
a list of candidate emission lines as given in the line table below, and a system of lines is
searched for at the position indicated by the tentative matching. The
best system of emission lines (if any) found in this process is used
to calculate the object's emission-line redshift. The lines from this
system and their parameters are stored in the specLine
table as foundLines, i.e., with the parameter
category set to 1. These lines are identified by their
restWave as given in the line table below.

The final line fitting is done for all features (both
emission and absorption) in the line list
below, and occurs after the object has been classified and a redshift
has been determined. This allows for a better continuum estimation and
thus better line fits. This latter fit is stored in the
specLine table with the parameter category
set to 2.

For almost all purposes we recommend the use of the
measuredLines (category=2) since these
result from the most careful continuum measurement and precise line
fits.

Details of continuum fitting and line measurements

Parameter Notes

All of the line parameters are measured in the observed frame, and no
correction has been made for the instrumental resolution.

Continuum Fitting

The continuum is fit using a median/mean filter. A sliding window is
created of length 300 pixels for galaxies and stars or 1000 pixels for
quasars. Pixels closer than 8 pixels (560 km/s) for galaxies and stars
or 30 pixels (2100 km/s) for QSOs to any reference line are masked and
not used in the continuum measurement. The remaining pixels in the
filter are ordered and the values between the 40th and 60th percentile
are averaged to give the continuum. The category=1 lines are fit with
a cruder continuum which is given by a fifth order polynomial fit
which iteratively rejects outlying points.
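
The median/mean filter above can be sketched as a sliding trimmed mean. A pure-Python illustration (window handling at the spectrum edges is my own simplification; the pipeline's exact edge behavior is not specified here):

```python
def continuum(flux, window=300, lo=0.40, hi=0.60, mask=None):
    """Sliding-window continuum estimate: at each pixel, sort the unmasked
    values in the window and average those between the `lo` and `hi`
    percentiles (40th-60th, as described above)."""
    n = len(flux)
    if mask is None:
        mask = [False] * n          # mask marks pixels near reference lines
    half = window // 2
    cont = []
    for i in range(n):
        vals = sorted(flux[j]
                      for j in range(max(0, i - half), min(n, i + half + 1))
                      if not mask[j])
        if not vals:
            cont.append(0.0)        # fully masked window
            continue
        a = int(lo * len(vals))
        b = max(a + 1, int(hi * len(vals)))
        cont.append(sum(vals[a:b]) / (b - a))
    return cont
```

Because only the middle 40th-60th percentile range is averaged, an isolated emission spike does not pull the continuum estimate upward.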

Reference Line List

The list of lines which are fit is given as an HTML line table below. Note
that a single line in the table often actually represents multiple features.
Since the line fits are allowed to drift somewhat in wavelength, the exact
precision of the line wavelengths is not important. The wavelength precision
does become important for the emission-line redshift determination. To improve
the accuracy of the emission-line redshift determination for QSOs, the
wavelengths for many of the lines listed here are not the laboratory
values, but the average values calculated from a sample of SDSS QSOs
taken from
Vanden Berk et al. (2001, AJ, 122).

Line Fitting

Every line in the reference list is fit as a single Gaussian on top of
the continuum subtracted spectrum. Lines that are deemed close enough
are fitted simultaneously as a blend. The basic line fitting is
performed by the SLATEC common mathematical library routine SNLS1E
which is based on the Levenberg-Marquardt method. Parameters are
constrained to fall within certain values by multiplying the returned
chi-squared values by a steep function. Any lines with parameters
falling close to these constraints should be treated with caution.
The constraints are: sigma > 0.5 Angstrom, sigma < 100 Angstrom, and
the center wavelength is allowed to drift by no more than 450 km/sec
for stars and galaxies or 1500 km/sec for QSOs, except for the CIV
line which is allowed to be shifted by as much as 3000 km/sec.
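
As a rough illustration of recovering single-Gaussian line parameters, here is a crude moment-based estimate on a continuum-subtracted line. This is not the pipeline's method (which is a full Levenberg-Marquardt least-squares fit via SLATEC SNLS1E with penalty-constrained parameters); it only shows what the amplitude, center, and sigma of such a fit represent:

```python
import math

def fit_gaussian_moments(wave, flux):
    """Estimate (amplitude, center, sigma) of a single Gaussian from
    flux-weighted moments; assumes a uniform wavelength grid and
    continuum-subtracted, non-negative flux."""
    total = sum(flux)
    center = sum(w * f for w, f in zip(wave, flux)) / total
    var = sum(f * (w - center) ** 2 for w, f in zip(wave, flux)) / total
    sigma = math.sqrt(var)
    # Total flux ~= amplitude * sigma * sqrt(2*pi); invert for amplitude.
    amplitude = total * (wave[1] - wave[0]) / (sigma * math.sqrt(2 * math.pi))
    return amplitude, center, sigma
```

A real fit would additionally enforce the constraints quoted above (0.5 Å < sigma < 100 Å, bounded center drift).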

Testing the results

There are a number of ways that the line fitting can fail. If the
continuum is bad the line fits will be compromised. The median/mean
filtering routine will always fail for white dwarfs and some A stars, as well
as late-type stars. In addition, it has trouble for galaxies with
trouble when the lines are not really Gaussian. The
Levenberg-Marquardt routine can fall into local minima, which can
happen, for example, when there is self-absorption in a QSO line or when a
line has both a narrow and a broad component. One should always check the
chi-squared values to evaluate the quality of the fit.

Spectro-Photo Matchup

Each BEST and each TARGET
photo object points to a spectroscopic object if there is one nearby the photo
object (ra,dec).

Each SPECTRO object points to a BEST photo object if there is one
nearby the spectro (ra,dec) and a TARGET object id if there is a
nearby one.

We chose 1 arcsecond as the "nearby radius" since that approximates
the fiber radius.

This is complicated by the fact that

there may be multiple photo objects at the same (ra,dec)
(primary, secondary
objects).

the same hole may be observed several times to give several
spectroscopic objects.

To resolve these ambiguities, we defined two views:

PhotoObj is the subset of
PhotoObjAll that contains all the primary and
secondary objects.

SpecObj is the subset of
SpecObjAll that are the "science primary"
spectroscopic objects.

There is at most one "primary" object at any spot in the sky.

So, the logic is as follows:

SpecObjAll objects point to the
closest BEST PhotoObj object if there is one within 1 arcsecond.

If not, it points to the closest BEST PhotoObjAll object
if there is one within 1 arcsecond.

If not, the SpecObj has no corresponding BEST PhotoObj.
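
The matching logic above (nearest object within 1", preferring primaries) can be sketched as follows. The tuple layout and the mode encoding (1=primary, 2=secondary, 3=family) are illustrative, not the database schema:

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle angular separation in arcsec; inputs in degrees."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return math.hypot(dra, dec1 - dec2) * 3600.0

def match_best(spec_radec, photo_objs, radius=1.0):
    """Closest photo object within `radius` arcsec, trying the modes
    in order (primary, then secondary, then family), as described above.
    photo_objs is a list of (objID, ra, dec, mode) tuples."""
    ra0, dec0 = spec_radec
    for mode in (1, 2, 3):
        cands = [(angular_sep_arcsec(ra0, dec0, ra, dec), oid)
                 for oid, ra, dec, m in photo_objs if m == mode]
        cands = [(sep, oid) for sep, oid in cands if sep <= radius]
        if cands:
            return min(cands)[1]
    return None
```

Note that a closer secondary loses to a more distant primary, matching the stated preference for primary objects.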

TARGET issues

TARGET.PhotoObjAll.specObjID = 0, always. TARGET is not supposed to
depend on BEST, and spectro stuff only lives in BEST. You can find what
you want using SpecObjAll.targetObjID = TARGET.PhotoObjAll.objID.

TargetInfo.targetObjID is set while loading the data for a
chunk into TARGET. The only difference between a
targetID and targetObjID is the possible flip of one bit. This bit
distinguishes between identical PhotoObjAll objects that are in fields that
straddle 2 chunks. Only one of the pair will actually be within the chunk
boundaries, so we want to make sure we match to that one. Note that the one
of the pair that is actually part of a chunk might not be primary.

So, setting SpecObjAll.targetObjID does not use a positional match - it's
all done through ID numbers. This match should always exist, so
SpecObjAll.targetObjID always points to something in TARGET.PhotoObjAll.
However, it is not guaranteed that SpecObjAll.targetObjID will match
something in TARGET.PhotoObj because in the past we have targeted
non-primaries (stripe 10, for example). To try to make this slightly less
confusing, we require anything in SpecObj to have been targeted from
something in TARGET.PhotoObj (i.e., primary spectra must have been primary
targets).

SpecObjAll objects with targetObjID = 0 are usually fibers that were not
mapped, so we didn't have any way to match them to the imaging (for either
TARGET or BEST since we don't have an ID or position).

BEST issues

spSpectroPhotoMatch handles all matching between SpecObjAll and BEST.Photo*,
but doesn't do anything with SpecObjAll.targetID or TARGET.Photo*.

SpecObjAll.bestObjID is set as described above. To be slightly more
detailed about the case where there is no BEST.PhotoObj within 1", we go
through the modes (primary,secondary,family) in order looking for the nearest
BEST.PhotoObjAll within 1".

BEST.PhotoObjAll.specObjID only points to things in SpecObj (ie
SpecObjAll.sciencePrimary=1) because the mapping to non-sciencePrimary
SpecObjAlls is not unique. You can still do BEST.PhotoObjAll.objID =
SpecObjAll.bestObjID to get all the matches.

SUMMARY

The matching of spectra to the BEST skyversion is done as a nearest object
search within 1" with a preference for primary objects. There is no better
practical way of doing this - deblending differences cause huge numbers of
special cases that we probably could not even enumerate.

Ambiguities are not flagged. There are no ambiguities if you start from
PhotoObj and go to SpecObj. It might be possible for more than one
SpecObj to point to the same PhotoObj, but there are no examples of this
unless it is a pathological case. It is possible for a SpecObj to point to
something in PhotoObjAll that is not in PhotoObj, but if you are joining with
PhotoObj you won't see these. If you start joining PhotoObjAll and SpecObjAll
you need to be quite careful because the mapping is (necessarily)
complicated.

Spectrophotometry

Because the SDSS spectra are obtained through 3-arcsecond fibers during
non-photometric observing conditions, special techniques must be employed to
spectrophotometrically calibrate the data. There have been three substantial
improvements to the algorithms which photometrically calibrate the spectra:

improved matching of observed standard stars to models;

tying the spectrophotometry directly to the observed fiber magnitudes
from the photometric pipeline; and

Analysis of spectroscopic standard stars

On each spectroscopic
plate, 16 objects are targeted as spectroscopic standards. These objects are
color-selected to be F8 subdwarfs, similar in spectral type to the SDSS
primary standard BD+17 4708.

The color selection of the SDSS standard stars. Red points represent
stars selected as spectroscopic standards. (Most are flux standards; the very
blue stars in the right-hand plot are "hot standards" used for telluric
absorption correction.)

The flux calibration of the spectra is handled
by the Spectro2d pipeline. It is performed separately for each of the 2
spectrographs, hence each half-plate has its own
calibration. In the EDR and DR1 Spectro2d calibration pipelines, fluxing was
achieved by assuming that the mean spectrum of the stars on each
half-plate was equivalent to a synthetic composite F8 subdwarf spectrum from
Pickles
(1998). In the reductions included in DR2, the spectrum of each standard
star is spectrally typed by comparing with a grid of theoretical spectra
generated from Kurucz model atmospheres (Kurucz
1992) using the spectral synthesis code SPECTRUM (Gray
& Corbally 1994; Gray,
Graham, & Hoyt 2001). The flux calibration vector is derived from the
average ratio of each star (after correcting for Galactic reddening) and its
best-fit model. Since the red and blue halves of the spectra are imaged onto
separate CCDs, separate red and blue flux calibration vectors are
produced. These will resemble the throughput curves under photometric
conditions. Finally, the red and blue halves of each spectrum on each exposure
are multiplied by the appropriate flux calibration vector. The spectra are
then combined with bad pixel rejection and rebinned to a constant
dispersion.
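The core of this fluxing step can be sketched as follows; the boxcar smoothing and the exact normalization are illustrative assumptions, not the Spectro2d implementation:

```python
import numpy as np

def flux_calibration_vector(observed, models, smooth_width=25):
    """Derive a flux-calibration vector from the standard stars.

    observed, models: arrays of shape (n_stars, n_pixels) holding the
    (dereddened) observed standard-star spectra and their best-fit
    model spectra on a common wavelength grid.  Returns the smoothed
    mean ratio model/observed, i.e. a factor converting counts to
    calibrated flux.
    """
    ratios = models / observed            # per-star calibration estimate
    mean_ratio = np.mean(ratios, axis=0)  # average over the standards
    kernel = np.ones(smooth_width) / smooth_width
    return np.convolve(mean_ratio, kernel, mode="same")  # boxcar smoothing

def calibrate(spectrum, calib_vector):
    """Multiply an uncalibrated spectrum by the calibration vector."""
    return spectrum * calib_vector
```

In the pipeline this is done separately for the red and blue CCDs of each spectrograph, yielding the separate calibration vectors described above.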

Throughput curves for the red and blue channels on the two SDSS
spectrographs.

Note about galactic extinction correction

The
EDR and DR1 data were nominally corrected for Galactic extinction. The
spectrophotometry in DR2 is vastly improved compared to DR1, but the final
calibrated DR2 spectra are not corrected for foreground Galactic
reddening (a relatively small effect; the median E(B-V) over the
survey is 0.034). This may be changed in future data releases. Users of
spectra should note, though, that the fractional improvement in
spectrophotometry is much greater than the extinction correction itself.
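Users who wish to remove the foreground reddening themselves can apply a standard correction of the form sketched below; the extinction-curve values k(lambda) = A(lambda)/E(B-V) must come from your preferred extinction law (they are an input here, not something this sketch provides):

```python
import numpy as np

def deredden(flux, k_lambda, ebv=0.034):
    """Remove foreground Galactic extinction from a spectrum.

    k_lambda: A(lambda)/E(B-V) sampled on the spectrum's wavelength
    grid, from your chosen extinction curve.  ebv defaults to the
    survey's median E(B-V) of 0.034 mag purely for illustration;
    use the value appropriate to the object's line of sight.
    """
    a_lambda = k_lambda * ebv               # extinction in magnitudes
    return flux * 10.0 ** (0.4 * a_lambda)  # undo the dimming
```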

Improved Comparison to Fiber Magnitudes

The second update in the
pipeline is relatively minor: We now compute the absolute calibration by tying
the r-band fluxes of the standard star spectra to the fiber
magnitudes output by the latest version of the photometric pipeline. The
latest version now corrects fiber magnitudes to a constant seeing of 2", and
includes the contribution of flux from overlapping objects in the fiber
aperture; these changes greatly improve the overall data consistency.
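The tying step amounts to rescaling each calibrated spectrum so that its synthetic r-band magnitude matches the photometric fiber magnitude. A minimal sketch, using a crude passband-weighted mean rather than proper photon-counting synthetic photometry:

```python
import numpy as np

def synthetic_mag(flux, passband):
    """Crude synthetic magnitude: -2.5 log10 of the passband-weighted
    mean flux.  (Real synthetic photometry integrates in photon units
    against the full filter response; this is only an illustration.)"""
    mean_flux = np.sum(flux * passband) / np.sum(passband)
    return -2.5 * np.log10(mean_flux)

def absolute_scale(flux, passband, fiber_mag):
    """Factor that rescales a spectrum so that its synthetic magnitude
    matches the photometric fiber magnitude."""
    return 10.0 ** (-0.4 * (fiber_mag - synthetic_mag(flux, passband)))
```

After multiplying by this factor, the spectrum's synthetic magnitude agrees with the fiber magnitude by construction.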

Smears

The third update to the spectroscopic pipeline is that we
no longer use the "smear" observations in our calibration. As the EDR paper
describes, "smear" observations are low signal-to-noise ratio (S/N)
spectroscopic exposures made through an effective 5.5" by 9" aperture, aligned
with the parallactic angle. Smears were designed to account for object light
excluded from the 3" fiber due to seeing, atmospheric refraction and object
extent. However, extensive experiments comparing photometry and
spectrophotometry calibrated with and without smear observations have shown
that the smear correction provides improvements only for point sources (stars
and quasars) with very high S/N. For extended sources (galaxies) the spectrum
obtained in the 3" fiber aperture is calibrated to have the total flux and
spectral shape of the light in the smear aperture. This is undesirable if,
for example, the fiber samples the bulge of a galaxy but the smear aperture
includes much of its disk. For extended sources, the effect of the smears was
to give a systematic offset between spectroscopic and fiber magnitudes of up
to a magnitude; with the DR2 reductions, this trend is gone. Finally, smear
exposures were not carried out for one reason or another for roughly 1/3 of
the plates in DR2. For this reason, we do not apply the smear
correction to the data in DR2.

To the extent that all point sources are
centered in the fibers in the same way as are the standards, our flux
calibration scheme corrects the spectra for losses due to atmospheric
refraction without the use of smears. Extended sources are likely to be
slightly over-corrected for atmospheric refraction. However, most galaxies are
quite centrally concentrated and more closely resemble point sources than
uniform extended sources. In the mean, this overcorrection makes the
g-r color of the galaxy spectra too red by ~1%.

The Spectro Parameter Pipeline sppLines Table

Spectra for over 250,000 Galactic stars of all common
spectral types are available with DR6.
These spectra were processed with a pipeline called the 'Spectro
Parameter Pipeline' (spp) that computes line indices for
a wide range of common features at the radial velocity of the
star in question.
These outputs are stored in the CAS in a table called
sppLines,
indexed on the 'specObjID' key for queries joining
to other tables such as specobjall and photoobjall.
The fields available in the sppLines table are:

The Spectro Parameter Pipeline sppParams Table

Spectra for over 250,000 Galactic stars of all common
spectral types are available with DR6.
These spectra were processed with a pipeline called the 'Spectro
Parameter Pipeline' (spp) that computes standard
stellar atmospheric parameters such as [Fe/H], log g and Teff
for each star by a variety of methods.
These outputs are stored in the CAS in a table called
sppParams,
indexed on the 'specObjID' key for queries joining
to other tables such as specobjall and photoobjall.
The fields available in the sppParams table are:

Target Selection

Detailed descriptions of the selection algorithms for the different
categories of SDSS targets are provided in the series of papers
noted below under Target Selection References.
Here we provide short summaries of the various target selection
algorithms.

In the SDSS imaging data output tsObj files, the
result of target selection for each object is recorded in the 32-bit
primTarget flag, as defined in
Table 27 of Stoughton et al. (2002). For details, see the Target Selection References

Note the following subtleties:

An object can be targeted simultaneously by more than one
algorithm.

The photometric catalogs contain a target selection flag for
every single object, but not all objects which are
flagged as a spectroscopic target will actually be observed with the
spectrograph. The assignment of spectrograph fibers to targets
from the photometry catalogs is called tiling.

Perhaps most importantly, the target selection flags used in
order to create the spectroscopic plates were (necessarily) based
on an earlier processing of the data. Thus, objects that were
targets in the original rerun may not be targets now, and vice
versa. For the Main Galaxy Sample, this amounts to changes in the r
band flux limit; for Quasars it means wholesale changes in the
algorithms; for Luminous Red Galaxies, it means that the effective
color selection differs from place to place on the sky.

Main Galaxy Sample

Galaxy targets are selected starting from objects which are detected
in the r band (i.e. those objects which are more than 5σ
above sky after smoothing with a PSF filter).
The photometry is corrected for Galactic extinction using the
reddening maps of
Schlegel, Finkbeiner, and Davis (1998).
Galaxies are separated from stars using the following cut on the difference between
the r-band
PSF
and
model
magnitudes:

rPSF - rmodel >= 0.3

Note that this cut is more conservative for galaxies than the
star-galaxy separation cut used by Photo.
Potential targets are then rejected if they have been flagged by Photo
as SATURATED, BRIGHT, or BLENDED.
The
Petrosian
magnitude limit rP = 17.77 is then applied, which
results in a main galaxy sample surface density of about 90 per deg2.

A number of surface brightness cuts are then applied, based on
mu50, the mean surface brightness within the Petrosian half-light
radius petroR50. The most significant cut is mu50
<= 23.0 mag arcsec-2 in r, which already
includes 99% of the galaxies brighter than the Petrosian magnitude
limit. At surface brightnesses in the range 23.0 <=
mu50 <= 24.5 mag arcsec-2, several other
criteria are applied in order to reject most spurious targets, as
shown in the flowchart. Please see the
detailed discussion of these surface brightness cuts, including
consideration of selection effects, in
Section 4.4 of Strauss et al. (2002). Finally, in order to reject
very bright objects which will cause contamination of the spectra of
adjacent fibers and/or saturation of the spectroscopic CCDs, objects
are rejected if they have (1) fiber magnitudes
brighter than 15.0 in g or r, or 14.5 in i;
or (2) Petrosian magnitude rP < 15.0 and Petrosian
half-light radius petroR50 < 2 arcsec.

Main galaxy targets satisfying all of the above criteria have the
GALAXY bit set in their primTarget flag. Among those, the ones with
mu50 >= 23.0 mag arcsec-2 have the
GALAXY_BIG bit set. Galaxy targets that fail all the surface brightness
selection limits but have r band fiber magnitudes brighter than 19 are
accepted anyway (since they are likely to yield a good spectrum) and
have the GALAXY_BRIGHT_CORE bit set.
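The cuts above can be collected into a single illustrative predicate (omitting the finer tests applied between 23.0 and 24.5 mag arcsec-2); the flag bit values here are placeholders, not Photo's actual bitmask:

```python
# Illustrative flag bits only -- not Photo's actual bitmask values.
SATURATED, BRIGHT, BLENDED = 0x1, 0x2, 0x4

def main_galaxy_target(r_psf, r_model, r_petro, mu50,
                       fiber_g, fiber_r, fiber_i, petro_r50, flags):
    """Sketch of the Main Galaxy Sample cuts described above.
    Magnitudes are assumed extinction-corrected."""
    if flags & (SATURATED | BRIGHT | BLENDED):
        return False                      # rejected by Photo flags
    if r_psf - r_model < 0.3:             # star-galaxy separation
        return False
    if r_petro > 17.77:                   # Petrosian magnitude limit
        return False
    if mu50 > 23.0:                       # main surface-brightness cut
        return False
    # reject objects bright enough to contaminate adjacent fibers
    if fiber_g < 15.0 or fiber_r < 15.0 or fiber_i < 14.5:
        return False
    if r_petro < 15.0 and petro_r50 < 2.0:
        return False
    return True
```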

Luminous Red Galaxies (LRG)

SDSS luminous red galaxies (LRGs) are selected on the basis of color and magnitude
to yield a sample of luminous intrinsically red galaxies that extends fainter and
farther than the SDSS main galaxy sample.
Please see
Eisenstein et al. (2001) for detailed discussions of sample selection, efficiency,
use, and caveats.

LRGs are selected using a variant of the photometric redshift technique
and are meant to comprise a uniform, approximately volume-limited sample of
objects with the reddest colors in the rest frame. The sample is
selected via cuts in the (g-r, r-i, r)
color-color-magnitude cube. Note that all colors are measured using
model magnitudes, and all quantities are
corrected for Galactic extinction following
Schlegel, Finkbeiner, and Davis (1998).
Objects must be detected by Photo as BINNED1, BINNED2, or BINNED4
in both r and i, but not necessarily in g,
and objects flagged by Photo as BRIGHT or SATURATED in g, r,
or i are excluded.

The galaxy model colors are rotated first to a basis that is aligned with the galaxy
locus in the (g-r, r-i) plane
according to:

c⊥ = (r-i) - (g-r)/4 - 0.18
c|| = 0.7(g-r) + 1.2[(r-i) - 0.18]

Because the 4000 Angstrom break moves from the g band to the r band
at a redshift z ~ 0.4, two separate sets of selection criteria are needed
to target LRGs below and above that redshift:

Cut I for z <~ 0.4

rP < 13.1 + c|| / 0.3

rP < 19.2

|c⊥| < 0.2

mu50 < 24.2 mag arcsec-2

rPSF - rmodel > 0.3

Cut II for z >~ 0.4

rP < 19.5

c⊥ > 0.45 - (g-r)/6

g-r > 1.30 + 0.25(r-i)

mu50 < 24.2 mag arcsec-2

rPSF - rmodel > 0.5

Cut I selection results in an approximately volume-limited LRG sample to
z=0.38, with additional galaxies to z ~ 0.45.
Cut II selection adds yet more luminous red galaxies to z ~ 0.55.
The two cuts together result in about 12 LRG targets per deg2
that are not already in the main galaxy sample (about 10 in Cut I, 2 in Cut II).

In primTarget, GALAXY_RED is set if the LRG passes either Cut I or Cut II.
GALAXY_RED_II is set if the object passes Cut II but not Cut I.
However, neither of these flags is set if the LRG is brighter than the
main galaxy sample flux limit but failed to enter the main sample
(e.g., because of the main sample surface brightness cuts).
Thus LRG target selection never overrules main sample target selection
on bright objects.
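A sketch of the two cuts, using the rotated colors as published in Eisenstein et al. (2001); magnitudes are assumed to be extinction-corrected model magnitudes, and the detection-flag tests are omitted:

```python
def lrg_cuts(g, r, i, r_petro, mu50, r_psf, r_model):
    """Return (cut1, cut2) for the LRG selection sketched above."""
    c_perp = (r - i) - (g - r) / 4.0 - 0.18          # across-locus color
    c_par = 0.7 * (g - r) + 1.2 * ((r - i) - 0.18)   # along-locus color
    cut1 = (r_petro < 13.1 + c_par / 0.3 and r_petro < 19.2
            and abs(c_perp) < 0.2 and mu50 < 24.2
            and r_psf - r_model > 0.3)
    cut2 = (r_petro < 19.5 and c_perp > 0.45 - (g - r) / 6.0
            and g - r > 1.30 + 0.25 * (r - i) and mu50 < 24.2
            and r_psf - r_model > 0.5)
    return cut1, cut2
```

GALAXY_RED would then correspond to (cut1 or cut2), and GALAXY_RED_II to (cut2 and not cut1), subject to the bright-object caveat above.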

Quasars

The final adopted SDSS quasar target selection algorithm is
described in
Richards et al. (2002). However, it should be noted that the
implementation of this algorithm came after the last
date of DR1 spectroscopy. Thus this paper does not technically
describe the DR1 quasar sample and the DR1 quasar sample is
not intended to be used for statistical purposes (but see
below). Interested parties are instead encouraged to use the catalog
of DR1 quasars that is being prepared by Schneider et al. (2003, in
prep.), which will include an indication of which quasars were also
selected by the Richards et al. (2002) algorithm. At some later time,
we will also perform an analysis of those objects selected by the new
algorithm but for which we do not currently have spectroscopy and will
produce a new sample that is suitable for statistical analysis.

Though the DR1 quasars were not technically selected with the
Richards et al. (2002) algorithm, the algorithms used since the EDR
are quite similar to this algorithm and this paper suffices to
describe the general considerations that were made in selecting
quasars. Thus it is worth describing the algorithm in more detail.

The quasar target selection algorithms are summarized in this
schematic flowchart. Because the
quasar selection cuts are fairly numerous and detailed, the reader is
strongly recommended to refer to
Richards et al. (2002) (link to AJ paper; subscription required)
for the full discussion of the sample selection criteria,
completeness, target efficiency, and caveats.

The quasar target selection algorithm primarily identifies quasars as outliers
from the stellar locus, modeled following
Newberg & Yanny (1997)
as elongated tubes in the
(u-g, g-r, r-i)
(denoted ugri) and
(g-r, r-i, i-z)
(denoted griz)
color cubes.
In addition, targets are also selected by matches to the FIRST
catalog of radio sources
(Becker, White, & Helfand 1995).
All magnitudes and colors are measured using
PSF magnitudes, and all quantities are
corrected for Galactic extinction following
Schlegel, Finkbeiner, and Davis (1998).

Objects flagged by Photo as having either "fatal" errors (primarily those
flagged BRIGHT, SATURATED, EDGE, or BLENDED)
or "nonfatal" errors
(primarily related to deblending or interpolation problems) are rejected
from the color selection, but only objects with fatal errors are rejected
from the FIRST radio selection. See
Section 3.2 of Richards et al. (2002) for the full details.
Objects are also rejected (from the color selection, but not the radio selection)
if they lie in any of 3 color-defined exclusion regions which are dominated
by white dwarfs, A stars, and M star+white dwarf pairs; see
Section 3.5.1 of Richards et al. (2002) for the specific exclusion region color boundaries.
Such objects are flagged as QSO_REJECT. Quasar targets are further restricted to
objects with iPSF > 15.0 in order to exclude bright objects
which will cause contamination of the spectra from adjacent fibers.

Objects which pass the above tests are then selected to be quasar targets
if they lie more than 4σ from either the ugri or griz
stellar locus. The detailed specification of the stellar loci and of the
outlier rejection algorithm are provided in Appendices
A and
B of Richards et al. (2002). These color-selected quasar targets are divided
into main (or low-redshift) and high-redshift samples, as follows:

Main Quasar Sample (QSO_CAP, QSO_SKIRT)

These are outliers from the ugri stellar locus and are selected in
the magnitude range 15.0 < iPSF < 19.1.
Both point sources and extended objects are included, except that
extended objects must have colors that are far from the colors
of the main galaxy distribution and that are
consistent with the colors of AGNs; these additional color cuts for
extended objects are specified in
Section 3.4.4 of Richards et al. (2002).

Even if an object is not a ugri stellar locus outlier, it may be selected as
a main quasar sample target if it lies in either of these 2 "inclusion" regions:
(1) "mid-z", used to select 2.5 < z < 3 quasars whose colors cross
the stellar locus in SDSS color space; and (2) "UVX", used to duplicate
selection of z <= 2.2 UV-excess quasars in previous surveys.
These inclusion boxes are specified in
Section 3.5.2 of Richards et al. (2002).

Note that the QSO_CAP and QSO_SKIRT distinction is kept for historical
reasons (as some data that are already public use this notation)
and results from an original intent to use separate
selection criteria in regions of low ("cap") and high ("skirt")
stellar density. It turns out that the selection efficiency is
indistinguishable in the cap and skirt regions, so that the
target selection used is in fact identical in the 2 regions
(similarly for QSO_FIRST_CAP and QSO_FIRST_SKIRT, below).

High-Redshift Quasar Sample (QSO_HIZ)

These are outliers from the griz stellar locus and are selected in
the magnitude range 15.0 < iPSF < 20.2.
Only point sources are selected, as these quasars will lie at
redshifts above z~3.5 and are expected to be classified
as stellar at SDSS resolution. Also, to avoid contamination from
faint low-redshift quasars which are also griz stellar locus
outliers, blue objects are rejected according to eq. (1) in
Section 3.4.5 of Richards et al. (2002).

Moreover, several additional color cuts are used in order to
recover more high-redshift quasars than would be possible using only
griz stellar locus outliers.
So an object will be selected as a high-redshift quasar target if it
lies in any of these 3 "inclusion" regions:
(1) "gri high-z", for z >= 3.6 quasars;
(2) "riz high-z", for z >= 4.5 quasars; and
(3) "ugr red outlier", for z >= 3.0 quasars.
The specifics are given in eqs. (6-8) in
Section 3.5.2 of Richards et al. (2002).

FIRST Sources (QSO_FIRST_CAP, QSO_FIRST_SKIRT)

Irrespective of the various color selection criteria above,
SDSS stellar objects are selected as quasar targets if they have
15.0 < iPSF < 19.1 and are matched to within 2 arcsec of
a counterpart in the FIRST radio catalog.

Finally, those targets which otherwise meet the color selection
or radio selection criteria described above, but fail the cuts
on iPSF, will be flagged as QSO_MAG_OUTLIER
(also called QSO_FAINT).
Such objects may be of interest for follow-up studies,
but are not otherwise targeted for spectroscopy under routine
operations (unless another "good" quasar target flag is set).
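The essence of the stellar-locus outlier test above can be illustrated with a simplified model in which the locus is a chain of sample points with an isotropic width; the real algorithm uses elongated cylinders with direction-dependent widths (Richards et al. 2002, Appendix B):

```python
import numpy as np

def is_locus_outlier(colors, locus_points, locus_widths, nsigma=4.0):
    """Simplified outlier test: a point in color space is an outlier if
    it lies more than nsigma widths from every sample point along the
    stellar-locus ridge line."""
    colors = np.asarray(colors)
    for center, width in zip(locus_points, locus_widths):
        if np.linalg.norm(colors - np.asarray(center)) < nsigma * width:
            return False  # inside the locus tube near this sample point
    return True
```

In this sketch the same routine would be run once in the ugri color cube and once in griz, with the appropriate locus model for each.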

Other Science Targets

A variety of other science targets are also selected; see also
Section 4.8.4 of Stoughton et al. (2002). With the exception of
brown dwarfs, these samples are not complete, but are
assigned to excess fibers left over after the main samples of
galaxies, LRGs, and quasars have been tiled.

Stars

A variety of stars are also targeted using color selection criteria,
as follows:

blue horizontal-branch stars (STAR_BHB)

both dwarf and giant carbon stars (STAR_CARBON)

brown dwarfs (STAR_BROWN_DWARF) - this is the only tiled sample of stars

Tiling of spectroscopy plates

Tiling is the process by which the spectroscopic plates are designed
and placed relative to each other. This procedure involves optimizing
both the placement of fibers on individual
plates, as well as the placement of plates (or tiles) relative to each
other.

Introduction

Because of large-scale structure in the galaxy distribution (galaxies
form the bulk of the SDSS targets), a naive covering of the sky with
equally-spaced tiles does not yield uniform sampling. Thus, we present
a heuristic for perturbing the centers of the tiles from the
equally-spaced distribution to provide more uniform
completeness. For the SDSS sample, we can attain a sampling rate of
>92% for all targets, and >99% for the set of targets which do
not collide with each other, with an efficiency >90% (defined as
the fraction of available fibers assigned to targets).

The Spectroscopic Survey

The spectroscopic survey is performed using two multi-object fiber
spectrographs on the same telescope. Each spectroscopic
fiber plug plate, referred to as a "tile," has a circular field-of-view
with a radius of 1.49 degrees, and can accommodate 640 fibers, 48 of
which are reserved for observations of blank sky and
spectrophotometric standards. Because of the finite size of the fiber
plugs, the minimum separation of fiber centers is 55". If, for
example, two objects are within 55" of each other, both of them can
be observed only if they lie in the overlap between two adjacent
tiles. The goal of the SDSS is to observe 99% of the maximal set of
targets which has no such collisions (about 90% of all targets).

What is Tiling?

Around 2,000 tiles will be necessary to provide fibers for all the
targets in the survey. Since each tile which must be observed
contributes to the cost of the survey (due both to the cost of
production of the plate and to the cost of observing time), we desire
to minimize the number of tiles necessary to observe all the desired
targets. In order to maximize efficiency (defined as the fraction of
available fibers assigned to tiled targets) when placing these tiles
and assigning targets to each tile, we need to address two
problems. First, we must be able to determine, given a set of tile
centers, how to optimally assign targets to each tile --- that is, how
to maximize the number of targets which have fibers assigned to them.
Second, we must determine the most efficient placement of the tile
centers, which is non-trivial because the distribution of targets on
the sky is non-uniform, due to the well-known clustering of galaxies
on the sky. We find the exact solution to the first problem and use a
heuristic method developed by Lupton et al. (1998) to find an
approximate solution to the second problem (which is NP-complete).
The code which implements this solution is designed to run on a patch of sky
consisting of a set of rectangles in a spherical coordinate system,
known in SDSS parlance as a tiling region.

NOTE: the term "chunk" or "tiling chunk" is sometimes
used to denote a tiling region. To avoid confusion with the
correct use of the term chunk, we use "tiling region" here.

Fiber Placement

First, we discuss the allocation of fibers given a set of tile
centers, ignoring fiber collisions for the moment.
Figure 1 shows at the left a very simple example of a
distribution of targets and the positions of two tiles we want to use
to observe these targets. Given that for each tile there is a finite
number of available fibers, how do we decide which targets get
allocated to which tile? This problem is equivalent to a network flow
problem, which computer scientists have been kind enough to solve for us
already.

Figure 1: Simplified Tiling and Network Flow View

The basic idea is shown in the right half of Figure 1,
which shows the appropriate network for the situation in the left
half. Using this figure as reference, we here define some terms
which are standard in combinatorial literature and which will be
useful here:

node: The nodes are the solid dots in the figure; they
serve either as sources or sinks of objects for the flow, or simply
as junctions for the flow. For example, in this context each target
and each tile corresponds to a node.

arc: The arcs are the lines connecting the nodes. They
show the paths along which objects can flow from node to node. In
Figure 1, it is understood that the flow along the arc
proceeds to the right. For example, the arcs traveling from target
nodes to tile nodes express which tiles each target may be assigned
to.

capacity: The minimum and maximum capacity of each arc is
the minimum and maximum number of objects that can flow along it. For
example, because each tile can accommodate only 592 target fibers, the
capacity of each arc traveling from a tile node to the sink node
is 592.

cost: The cost per object along each arc is exacted for
allowing objects to flow down a particular arc; the total cost is the
summed cost of all the arcs. In this paper, the network is designed
such that the minimum total cost solution is the desired solution.

Imagine a flow of 7 objects entering the network at
the source node at the left. We want the entire flow to leave
the network at the sink node at the right for the lowest possible
cost. The objects travel along the arcs, from node to node. Each
arc has a maximum capacity of objects which it can transport, as
labeled. (One can also specify a minimum number, which will be
useful later). Each arc also has an associated cost, which is exacted
per object which is allowed to flow across that arc. Arcs link the
source node to a set of nodes corresponding to the set of
targets. Each target node is linked by an arc to the node of each tile
it is covered by. Each tile node is linked to the sink node by an arc
whose capacity is equal to the number of fibers available on that
tile. None of these arcs has any associated cost. Finally, an
"overflow" arc links the source node directly to the sink node, for
targets which cannot be assigned to tiles. The overflow arc has
effectively infinite capacity; however, a cost is assigned to objects
flowing on the overflow arc, guaranteeing that the algorithm fails to
assign targets to tiles only when it absolutely has to. This network
thus expresses all the possible fiber allocations as well as the
constraints on the numbers of fibers in each tile. Finding the
minimum cost solution then maximizes the number of targets
which are actually assigned to tiles.
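The network just described can be solved with any min-cost-flow routine. The self-contained sketch below uses a simple successive-shortest-path solver and a toy configuration (two tiles of three fibers each, seven targets, one of which is covered by no tile and so must be absorbed by the overflow arc):

```python
from collections import defaultdict

class MinCostFlow:
    """Tiny successive-shortest-path min-cost-flow solver."""
    def __init__(self):
        self.adj = defaultdict(list)   # node -> indices into self.edges
        self.edges = []                # each edge is [to, capacity, cost]

    def add_edge(self, u, v, cap, cost):
        # forward edge and paired residual edge (indices ei and ei ^ 1)
        self.adj[u].append(len(self.edges)); self.edges.append([v, cap, cost])
        self.adj[v].append(len(self.edges)); self.edges.append([u, 0, -cost])

    def run(self, s, t, maxflow):
        flow = cost = 0
        while flow < maxflow:
            # label-correcting shortest path by cost (residual costs can be < 0)
            dist, parent = {s: 0}, {}
            changed = True
            while changed:
                changed = False
                for u in list(dist):
                    for ei in self.adj[u]:
                        v, cap, c = self.edges[ei]
                        if cap > 0 and dist[u] + c < dist.get(v, float("inf")):
                            dist[v] = dist[u] + c
                            parent[v] = (u, ei)
                            changed = True
            if t not in dist:
                break
            push, v = maxflow - flow, t
            while v != s:                       # find bottleneck capacity
                u, ei = parent[v]; push = min(push, self.edges[ei][1]); v = u
            v = t
            while v != s:                       # apply the augmentation
                u, ei = parent[v]
                self.edges[ei][1] -= push; self.edges[ei ^ 1][1] += push
                v = u
            flow += push; cost += push * dist[t]
        return flow, cost

# Toy network: tile A covers targets 0-3, tile B covers 2-5, 3 fibers each;
# target 6 is covered by no tile and can only be "lost" via the overflow arc.
net = MinCostFlow()
targets = range(7)
coverage = {"A": [0, 1, 2, 3], "B": [2, 3, 4, 5]}
for tgt in targets:
    net.add_edge("source", ("target", tgt), 1, 0)
for tile, covered in coverage.items():
    for tgt in covered:
        net.add_edge(("target", tgt), ("tile", tile), 1, 0)
    net.add_edge(("tile", tile), "sink", 3, 0)   # 3 fibers per tile here
net.add_edge("source", "sink", len(targets), 1)  # overflow arc, cost 1/unit
flow, cost = net.run("source", "sink", len(targets))
assigned = len(targets) - cost   # each unassigned target incurs cost 1
```

Because the overflow arc is the only arc with nonzero cost, minimizing total cost maximizes the number of targets assigned to tiles, exactly as described above.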

Dealing with Fiber Collisions

As described above, there is a limit of 55" to how close two fibers
can be on the same tile. If there were no overlaps between tiles,
these collisions would make it impossible to observe ~10% of
the SDSS targets. Because the tiles are circular, some fraction of
the sky will be covered with overlaps of tiles, allowing some of these
targets to be recovered. In the presence of these collisions, the
best assignment of targets to the tiles must account for the presence
of collisions, and strive to resolve as many as possible of these
collisions which are in overlaps of tiles. We approach this problem
in two steps, for reasons described below. First, we apply the network
flow algorithm of the above section to the
set of "decollided"
targets --- the largest possible subset of the targets which do not
collide with each other. Second, we use the remaining fibers and a
second network flow solution to optimally resolve collisions in
overlap regions.

Figure 2: Fiber Collisions

The "decollided" set of targets is the maximal subset of targets which
are all greater than 55" from each other. To clarify what we mean by
this maximal set, consider Figure 2. Each circle represents a target;
the circle diameter is 55", meaning that overlapping circles are
targets which collide. The set of solid circles is the "decollided"
set. Thus, in the triple collision at the top, it is best to keep the
outside two rather than the middle one.

This determination is complicated slightly by the fact that some
targets are assigned higher priority than others. For example, as
explained in the Targeting section, QSOs are
given higher priority than
galaxies by the SDSS target selection algorithms. What we mean here by
"priority" is that a higher priority target is guaranteed never to
be eliminated from the sample due to a collision with a lower priority
object. Thus, our true criterion for determining whether one set of
assignments of fibers to targets in a group is more favorable than
another is that a greater number of the highest priority objects are
assigned fibers.

Once we have identified our set of decollided objects, we use the
network flow solution to find the best possible assignment of fibers
to that set of objects.
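Ignoring the exact group-by-group optimization, the priority-ordered decollision can be sketched greedily (positions here are flat-sky coordinates in arcseconds; the production code solves each collision group exactly rather than greedily):

```python
from math import hypot

COLLISION_RADIUS = 55.0  # minimum fiber separation, arcsec

def decollide(targets):
    """Greedy sketch of building a 'decollided' set: process targets in
    priority order (highest first) and keep each one unless it falls
    within 55 arcsec of an already-kept target."""
    kept = []
    for t in sorted(targets, key=lambda t: -t["priority"]):
        if all(hypot(t["x"] - k["x"], t["y"] - k["y"]) >= COLLISION_RADIUS
               for k in kept):
            kept.append(t)
    return kept
```

On the triple collision of Figure 2 this keeps the outside two targets, and a higher-priority target is never dropped in favor of a lower-priority one, matching the criterion described above.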

After allocating fibers to the set of decollided
targets, there will
usually be unallocated fibers, which we want to use to resolve
fiber collisions in the overlaps. We can again express the problem of
how best to perform the collision resolution as a network, although
the problem is a bit more complicated in this case. In the case of
binaries and triples, we design a network flow problem such that the
network flow solution chooses the tile assignments optimally. In the
case of higher multiplicity groups, our simple method for binaries and
triples does not work and we instead resolve the fiber collisions in a
random fashion; however, fewer than 1% of targets are in such groups,
and the difference between the optimal choice of assignments and the
random choices made for these groups is only a small fraction of that.

We refer the reader to the tiling
algorithm paper for more details, including how the fiber
collision network flow is designed and caveats about what
aspects of the method may need to be changed under different
circumstances.

Tile Placement

Once one understands how to assign fibers given a set of tile centers,
one can address the problem of how best to place those tile centers.
Our method first distributes tiles
uniformly across the sky and then uses a cost-minimization scheme to
perturb the tiles to a more efficient solution.

In most cases, we set initial conditions by simply laying down a
rectangle of tiles. To set the centers of the tiles along the long
direction of the rectangle, we count the number of targets along the
stripe covered by that tile. The first tile is put at the mean of the
positions of target 0 and target N_t, where N_t
is the number of fibers per tile (592 for the SDSS). The second tile
is put at the mean between target N_t and 2N_t, and so on.
The counting of targets along adjacent stripes is offset by about half a
tile diameter in order to provide more complete covering.
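This initial placement along one stripe can be sketched as:

```python
def initial_tile_centers(stripe_positions, fibers_per_tile=592):
    """Place initial tiles along one stripe: the k-th tile sits at the
    mean of the positions of targets k*N_t and (k+1)*N_t, where N_t is
    the number of fibers per tile (592 for the SDSS)."""
    pos = sorted(stripe_positions)
    n = fibers_per_tile
    centers, k = [], 0
    while k * n < len(pos):
        lo = pos[k * n]
        hi = pos[min((k + 1) * n, len(pos) - 1)]  # clamp at the last target
        centers.append(0.5 * (lo + hi))
        k += 1
    return centers
```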

The method of perturbing this uniform distribution is iterative.
First, one allocates targets to the tiles, but instead of limiting a
target to the tiles within a tile radius, one allows a target to be
assigned to further tiles, but with a certain cost which increases
with distance (remember that the network flow accommodates the
assignment of costs to arcs). One uses exactly the same fiber
allocation procedure as above.
What this does is to give each tile some information about the
distribution of targets outside of it. Then, once one has assigned a
set of targets to each tile, one changes each tile position to that
which minimizes the cost of its set of targets. Then, with the new
positions,
one reruns the fiber
allocation, perturbs the tiles again, and so on. This method is guaranteed
to converge to a minimum (though
not necessarily a global minimum), because the total cost must
decrease at each step.

In practice, we also need to determine the appropriate number of tiles
to use. Thus, using a standard binary search, we repeatedly run the
cost-minimization to find the minimum number of tiles necessary to
satisfy the SDSS requirements, namely that we assign fibers to 99%
of the decollided targets.
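Since coverage can only improve as tiles are added, this search can be sketched as a standard binary search over a coverage oracle (the oracle here stands in for a full cost-minimization run):

```python
def minimum_tiles(coverage_fraction, lo=1, hi=4096, goal=0.99):
    """Binary-search the smallest tile count meeting the requirement.

    coverage_fraction(n) should run the cost-minimization with n tiles
    and return the fraction of decollided targets receiving fibers;
    it is assumed monotonically non-decreasing in n."""
    while lo < hi:
        mid = (lo + hi) // 2
        if coverage_fraction(mid) >= goal:
            hi = mid          # mid tiles suffice; try fewer
        else:
            lo = mid + 1      # need more tiles
    return lo
```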

In order to test how well this algorithm works, we have applied it
both to simulated and real data. These results are discussed in the
Tiling
paper.

Technical Details

There are a few technical details which may be useful to mention in
the context of SDSS data. Most importantly, we will describe which
targets within the SDSS are "tiled" in the manner described here,
and how such targets are prioritized. Second, we will discuss the
method used by SDSS to deal with the fact that the imaging and
spectroscopy are performed within the same five-year time
period. Third, we will describe the tiling outputs which the SDSS
tracks as the survey progresses. Throughout, we refer to the code
which implements the algorithm described above as tiling.

Only some of the spectroscopic target types identified by the target
selection algorithms in the SDSS are "tiled." These types (and their
designations in the primary and secondary target bitmasks) are
described in the Targeting pages). They consist
of most types of QSOs, main sample galaxies,
LRGs, hot standard stars, and brown dwarfs. These are the types of
targets for which tiling is run and for which we are attempting to
create a well-defined sample. Once the code has guaranteed fibers to
all possible "tiled targets," remaining fibers are assigned to other
target types by a separate code.

All of these target types are treated equivalently, except that they
are assigned different "priorities," designated by an integer. As
described above, the tiling code uses these priorities to help resolve
fiber collisions. The sense is that a higher-priority object will never
lose a fiber in favor of a lower-priority object. The priorities are
assigned in a somewhat complicated way for reasons immaterial to
tiling, but the essence is the following: the highest priority objects
are brown dwarfs and hot standards, next come QSOs, and the lowest
priority objects are galaxies and LRGs. QSOs have higher priority than
galaxies because galaxies have a higher density and stronger angular
clustering. Thus, allowing galaxies to bump QSOs would allow
variations in galaxy density to imprint themselves into variations in
the density of QSOs assigned to fibers, which we would like to avoid.
For similar reasons, brown dwarfs and hot standard stars (which have
extremely low densities on the sky) are given highest priority.
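The priority rule can be made concrete with a small sketch. The numeric priority values and the resolve_collision helper below are hypothetical; only the ordering (brown dwarfs and hot standards above QSOs, QSOs above galaxies and LRGs) comes from the text.

```python
# Sketch of the priority rule for fiber collisions: when two tiled targets
# collide on the same tile, the higher-priority target keeps its fiber.
# The integer priority values here are illustrative, not the SDSS values.

PRIORITY = {"hot_standard": 4, "brown_dwarf": 4, "qso": 3,
            "galaxy": 2, "lrg": 2}

def resolve_collision(a, b):
    """Return the target that keeps the fiber (ties broken arbitrarily)."""
    return a if PRIORITY[a["type"]] >= PRIORITY[b["type"]] else b

qso = {"id": 1, "type": "qso"}
gal = {"id": 2, "type": "galaxy"}
# the QSO keeps its fiber regardless of argument order
assert resolve_collision(gal, qso)["id"] == 1
assert resolve_collision(qso, gal)["id"] == 1
```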

Each tile, as stated above, is 1.49 degrees in radius, and has
the capacity to handle 592 tiled targets. No two such targets may be
closer than 55" on the same tile.
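These three per-tile constraints can be expressed as a simple validity check. This is a hypothetical sketch: it uses a flat-sky small-angle separation and a brute-force pair scan, whereas real code would use proper spherical geometry and a spatial index.

```python
import math

# Sketch of the per-tile constraints quoted above: assigned targets must lie
# within the 1.49 deg tile radius, at most 592 may be assigned, and no two
# may be closer than 55 arcsec. Coordinates are (x, y) in degrees on a
# flat-sky approximation, for brevity only.

TILE_RADIUS_DEG = 1.49
MAX_FIBERS = 592
COLLISION_ARCSEC = 55.0

def sep_arcsec(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1]) * 3600.0

def tile_ok(center, targets):
    if len(targets) > MAX_FIBERS:
        return False
    if any(sep_arcsec(t, center) > TILE_RADIUS_DEG * 3600.0 for t in targets):
        return False
    return all(sep_arcsec(targets[i], targets[j]) >= COLLISION_ARCSEC
               for i in range(len(targets)) for j in range(i + 1, len(targets)))

assert tile_ok((0.0, 0.0), [(0.0, 0.0), (0.1, 0.0)])       # 360" apart: fine
assert not tile_ok((0.0, 0.0), [(0.0, 0.0), (0.01, 0.0)])  # 36" apart: collide
```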

The operation of the SDSS makes it impossible to tile the entire
10,000 square degrees simultaneously, because we want to be able to
take spectroscopy during non-pristine nights, based on the imaging
which has been performed up to that point. In practice, periodically a
"tiling region" of data is processed, calibrated, has targets
selected, and is passed to the tiling code. During the first year of
the SDSS, about one tiling region per month has been created; as more
and more imaging is taken and more tiles are created, we hope to
decrease the frequency with which we need to make tiling regions, and
to increase their size.

A tiling region is defined as a set of rectangles on the sky (defined in
survey
coordinates). All of these rectangles cover only sky which has
been imaged and processed. However, targets near the edges of a
tiling region could be missed if that area were not covered by tiles.
Thus, tiling is actually run on a somewhat larger area than a single
tiling region, so that the areas near the edges of adjacent tiling
regions are also included; in general, therefore, the areas covered by
successive tiling runs overlap.

The first tiling region which is "supported" by the SDSS is denoted
Tiling Region
4. The first tiling region for which the version of tiling described here was
run is Tiling Region 7. Tiling regions earlier than Tiling Region 7 used a different (less
efficient) method of handling fiber collisions. The earlier version
also had a bug which artificially created gaps in the distribution of
the fibers. The locations of the known gaps are given in the EDR paper for Tiling Region 4 as the overlaps
between plates 270 and 271, plates 312 and 313, and plates 315 and 363
(also known as tiles 118 and 117, tiles 76 and 75, and tiles 73 and
74).

Tiling Window

In order to interpret the spectroscopic sample, one needs to use
the information about how targets were selected, how the tiles were
placed, and how fibers were assigned to targets. We refer to the
geometry defined by this information as the "tiling window" and
describe how to use it in detail elsewhere. As we note below, for
users of the data release it is also important to understand the
photometric imaging window which has been released (including, if
desired, masks for image defects and bright stars) and which plates
have been released.

Velocity dispersion measurements

The observed velocity dispersion sigma is the result of
the superposition of many individual stellar spectra, each of which
has been Doppler shifted because of the star's motion within the
galaxy. Therefore, it can be determined by analyzing the integrated
spectrum of the whole galaxy - the galaxy integrated spectrum will
be similar to the spectrum of the stars which dominate the light of
the galaxy, but with broader absorption lines due to the motions of the
stars. The velocity dispersion is a fundamental parameter because it
is an observable which quantifies the depth of the potential well of
a galaxy.

Selection criteria

Estimating velocity dispersions for galaxies which have integrated
spectra which are dominated by multiple components showing different
stellar populations and different kinematics (e.g. bulge and disk
components) is complex. Therefore, the SDSS estimates the velocity
dispersion only for spheroidal systems whose spectra are dominated by
the light of red giant stars. With this in mind, we have selected
galaxies which satisfy the following criteria:

Because the aperture of an SDSS spectroscopic fiber (3 arcsec)
samples only the inner parts of nearby galaxies, and because the spectrum
of the bulge of a nearby late-type galaxy can resemble that of an
early-type galaxy, our selection includes spectra of bulges of nearby
late-type galaxies. Note that weak emission lines, such as
Halpha and/or O II, could still be present in the selected spectra.

Method

A number of objective and accurate methods for making velocity
dispersion measurements have been developed (Sargent et al. 1977;
Tonry & Davis 1979; Franx, Illingworth & Heckman 1989; Bender 1990;
Rix & White 1992). These methods are all based on a comparison
between the spectrum of the galaxy whose velocity dispersion is to be
determined, and a fiducial spectral template. This can either be the
spectrum of an appropriate star, with spectral lines unresolved at the
spectral resolution being used, or a combination of different stellar
types, or a high S/N spectrum of a galaxy with known velocity
dispersion.

Since different methods can give significantly different results,
thereby introducing systematic biases especially for low S/N spectra,
we decided to use two different techniques for measuring the velocity
dispersion. Both methods find the minimum of

chi^2 = sum { [G - B * S]^2 }

where G is the galaxy spectrum, S the stellar spectrum, and B the
Gaussian broadening function (* denotes a convolution).
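A minimal sketch of this minimization, under simplifying assumptions: the spectra below are synthetic, sigma is a trial Gaussian width in pixels rather than km/s, and the fit is a brute-force grid search with unit variance.

```python
import numpy as np

# Sketch of the chi^2 minimization above: broaden the template S with a
# Gaussian B of trial width sigma and pick the sigma minimizing chi^2
# against the galaxy spectrum G. All spectra are synthetic.

def gaussian_kernel(sigma, half=25):
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def chi2(G, S, sigma):
    model = np.convolve(S, gaussian_kernel(sigma), mode="same")  # B * S
    return np.sum((G - model) ** 2)

pix = np.arange(200)
S = 1.0 - 0.5 * np.exp(-0.5 * ((pix - 100) / 2.0) ** 2)  # template, one line
G = np.convolve(S, gaussian_kernel(3.0), mode="same")    # "galaxy", sigma = 3

grid = np.arange(0.5, 6.1, 0.5)
best = grid[np.argmin([chi2(G, S, s) for s in grid])]
assert abs(best - 3.0) < 1e-9   # the input broadening is recovered
```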

The "Fourier-fitting" method
(Sargent et al. 1977; Tonry & Davis 1979; Franx, Illingworth & Heckman
1989; van der Marel & Franx 1993).
Because a galaxy's spectrum is that of a mix of stars convolved with
the distribution of velocities within the galaxy, Fourier space is a
natural choice for estimating velocity dispersions. This first method
minimizes

chi^2 = sum_k { [G~(k) - B~(k,sigma) S~(k)]^2 / Var_k^2 },

where G~, B~ and S~ are the Fourier transforms of G, B and S,
respectively, and Var_k^2 = sigma_{G~}^2 + sigma_{S~}^2 B~(k,sigma).
(Note that in Fourier space, the convolution becomes a multiplication.)
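The convolution-to-multiplication step can be verified numerically. The sketch below uses synthetic arrays and circular (FFT) convolution; it simply checks that the pixel-space convolution B * S matches the inverse transform of the product B~(k) S~(k).

```python
import numpy as np

# Numerical check of the convolution theorem used above: circular
# convolution of the template S with the broadening function B in pixel
# space equals the inverse FFT of the product of their transforms.

n = 64
x = np.arange(n)
S = 1.0 - 0.8 * np.exp(-0.5 * ((x - 32) / 1.5) ** 2)   # template, one line
B = np.exp(-0.5 * (np.minimum(x, n - x) / 3.0) ** 2)   # periodic Gaussian
B /= B.sum()

# direct circular convolution: G(i) = sum_j S(i-j) B(j)
G_direct = np.array([sum(S[(i - j) % n] * B[j] for j in range(n))
                     for i in range(n)])
# Fourier-space product: G~(k) = B~(k) S~(k)
G_fft = np.real(np.fft.ifft(np.fft.fft(S) * np.fft.fft(B)))

assert np.allclose(G_direct, G_fft)
```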

The "Direct-fitting" method (Burbidge, Burbidge & Fish 1961; Rix
& White 1992). Although Fourier space seems to be the natural
choice to estimate the velocity dispersions, there are several
advantages to treating the problem entirely in pixel space. In
particular, the effects of noise are much more easily incorporated in
the pixel-space based "Direct-fitting" method which minimizes

chi^2 = sum_n { [G(n) - B(n,sigma) S(n)]^2 / Var_n^2 }.

Because the S/N of the SDSS spectra are relatively low, we assume that the
observed absorption line profiles in early-type galaxies are
Gaussian.

It is well known that the two methods have their own particular
biases, so we carried out numerical simulations to calibrate these
biases. In our simulations, we chose a template stellar spectrum
measured at high S/N, broadened it using a Gaussian with rms
sigma_input, added Gaussian noise, and compared the input
velocity dispersion with the measured output value. The first
broadening allows us to test how well the methods work as a function
of velocity dispersion, and the addition of noise allows us to test
how well the methods work as a function of S/N. Our simulations show
that the systematic errors on the velocity dispersion measurements
appear to be smaller than ~ 3%, but estimates of low velocity
dispersions (sigma < 100 km s^-1) are more biased (~ 5%).

Measurements

The SDSS uses 32 K and G giant stars in M67 as stellar templates.
The SDSS velocity dispersion estimates are obtained by fitting the
restframe wavelength range 4000-7000 Å, and then averaging the estimates
provided by the "Fourier-fitting" and "Direct-fitting" methods.
The error on the final value of the velocity dispersion is determined by
adding in quadrature the errors on the two estimates
(i.e., the Fourier-fitting and Direct-fitting).
The typical error is between delta(log sigma) ~ 0.02 dex
and 0.06 dex, depending on the signal-to-noise ratio of the spectra.
The scatter computed from repeated observations is ~ 0.04 dex,
consistent with the amplitude of the errors on the measurements.
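The averaging and quadrature combination described above amounts to the following (the numerical values are purely illustrative):

```python
import math

# The reported sigma is the mean of the Fourier-fitting and Direct-fitting
# estimates; the quoted error adds the two individual errors in quadrature.

def combine(sig_fourier, err_fourier, sig_direct, err_direct):
    sigma = 0.5 * (sig_fourier + sig_direct)
    err = math.sqrt(err_fourier ** 2 + err_direct ** 2)
    return sigma, err

sigma, err = combine(210.0, 8.0, 204.0, 6.0)  # illustrative km/s values
assert sigma == 207.0
assert err == 10.0   # sqrt(8^2 + 6^2)
```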

Estimates of sigma are limited by the instrumental dispersion
and resolution. The instrumental dispersion of the SDSS
spectrograph is 69 km s^-1 per pixel, and the resolution is
~ 90 km s^-1. In addition, the instrumental dispersion may
vary from pixel to pixel, and this can affect measurements of sigma.
These variations are estimated for each fiber by using arc lamp
spectra (up to 16 lines in the range 3800-6170 Å and 39 lines
between 5780-9230 Å). A simple linear fit provides a good description of
these variations. This is true for almost all fibers, and allows us to
remove the bias such variations may introduce when estimating galaxy
velocity dispersions.
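The linear fit referred to above is just a first-degree polynomial in pixel position. The arc-line positions and per-line instrumental dispersions below are synthetic illustrations, not SDSS values.

```python
import numpy as np

# Sketch of the linear fit described above: model the pixel-to-pixel
# variation of the instrumental dispersion, as measured from arc-lamp
# lines along a fiber, as a straight line in pixel position.

pix = np.array([200.0, 900.0, 1600.0, 2300.0, 3000.0])  # arc-line positions
inst = np.array([66.0, 67.5, 69.0, 70.4, 72.1])         # km/s, per line

slope, intercept = np.polyfit(pix, inst, 1)
model = slope * pix + intercept
# a straight line describes the variation well for these (synthetic) data
assert np.max(np.abs(inst - model)) < 0.5
```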

Caveats

The velocity dispersion measurements distributed with SDSS spectra use
template spectra convolved to a maximum sigma of 420
km/s. Therefore, velocity dispersions sigma > 420 km/s are not
reliable and must not be used.

We recommend that users not use SDSS velocity dispersion
measurements for:

spectra with S/N < 10

velocity dispersion estimates smaller than about 70 km s^-1,
given the typical S/N and the instrumental resolution of the
SDSS spectra

Also note that the velocity dispersion measurements output by the
SDSS spectro-1D pipeline are not corrected to a standard relative
circular aperture.
(The SDSS spectra measure the light within a fixed aperture of radius
1.5 arcsec. Therefore, the estimated velocity dispersions of more
distant galaxies are affected by the motions of stars at larger
physical radii than for similar galaxies which are nearby. If the
velocity dispersions of early-type galaxies decrease with radius,
then the estimated velocity dispersions (using a fixed aperture) of
more distant galaxies will be systematically smaller than those of
similar galaxies nearby.)