8. An RGS Data Processing and Analysis Primer

While a variety of analysis packages can be used for the following steps, the
SAS was designed for the basic reduction and analysis of XMM-Newton data and
will therefore be used here for demonstration purposes.

At this point, it is assumed that you have downloaded the data from the HEASARC
archive onto a Hera server, that standard or anonymous Hera is running (see §4.2),
that you have prepared the data for processing with odfingest (see §6), and that
the working directory PROC has been made. Throughout this chapter, we will use the
Mkn 421 dataset with ObsID 0153950701, available through links at the HEASARC archive.

8.1 Rerunning the Pipeline

It is very likely that you will want to filter your data to some extent; in
this case, you will need to reprocess it in order to determine the appropriate
filters, regardless of the age of the observation. To do this, verify that the
working directory PROC is highlighted in the GUI. In the new Command Window you
made at the end of §6, run the task(s):
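A minimal sketch of such a call, written in Python so that the command line is assembled and printed before anything is run. The task name rgsproc and the setting spectrumbinning=lambda come from this chapter (§8.2 and §8.3.6); the values given for orders, bkgcorrect and withmlambdacolumn are assumptions chosen for illustration. The printed line is what would be typed in the Command Window.

```python
import shlex
import shutil
import subprocess

# Assumed parameter values, for illustration only; spectrumbinning=lambda
# is the one setting this chapter states was used.
args = [
    "rgsproc",
    "orders=1 2",             # assumption: process first and second order
    "bkgcorrect=no",          # assumption: leave the background unsubtracted
    "withmlambdacolumn=yes",  # assumption: add a wavelength column
    "spectrumbinning=lambda",
]

# Equivalent shell command line, with quoting handled by shlex.
command_line = shlex.join(args)
print(command_line)

# Run the task only where the SAS is initialised and rgsproc is on the PATH.
if shutil.which("rgsproc"):
    subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the space in the orders value.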

This takes several minutes, and outputs 12 files per RGS, plus 3 general use
FITS files. At this point, renaming files to something easy to type is a good idea.
This is easily done by right-clicking on the event files. We will assume that the
newly pipelined event files are named rgs1.fits and rgs2.fits.

8.2 Potentially useful tips for using the pipeline

The pipeline task, rgsproc, is very flexible and can address potential
pitfalls for RGS users. In §8.1, we used the default parameter
settings, and if this is sufficient for your data (and it should be for most), feel
free to skip to §8.3. In the following sections, we will look at
the cases of a nearby bright optical source, a nearby bright X-ray source, and a
user-defined source.

8.2.1 A Nearby Bright Optical Source

With certain pointing angles, zeroth-order optical light may be reflected off the
telescope optics and cast onto the RGS CCD detectors. If this falls on an
extraction region, the current energy calibration will require a wavelength-dependent
zero-offset. Stray light can be detected on RGS DIAGNOSTIC images taken before,
during and after the observation. This test, and the offset correction, are not
performed on the data before delivery. To check for stray light and apply the
appropriate offsets, enter

8.2.2 A Nearby Bright X-ray Source

In the example above, it is assumed that the field around the source contains
sky only. Provided a bright background source is well-separated from the target
in the cross-dispersion direction, a mask can be created that excludes it from
the background region. Here the source has been identified in the EPIC images
and its coordinates have been taken from the EPIC source list which is included
among the pipeline products. The bright neighboring object is found to be the
third source listed in the sources file. The first source is the target:

withepicset - calculate extraction regions for the sources contained
in an EPIC source list
epicset - name of the EPIC source list, such as generated by
emldetect or eboxdetect procedures
exclsrcsexpr - expression to identify which source(s) should be excluded
from the background extraction region

8.2.3 User-defined Source Coordinates

If the true coordinates of an object are not included in the EPIC source list or
the science proposal, the user can define the coordinates of a new source by
entering:

withsrc - make the source be user-defined
srclabel - source name
srcstyle - coordinate system in which the source position is defined
srcra - the source's right ascension in decimal degrees
srcdec - the source's declination in decimal degrees
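Because srcra and srcdec are given in decimal degrees, catalogue positions quoted in sexagesimal form must be converted first. The sketch below shows the arithmetic; the helper names are hypothetical and the Mkn 421 position is an approximate J2000 value used only as example input.

```python
def hms_to_degrees(hours, minutes, seconds):
    """Right ascension in hours, minutes, seconds -> decimal degrees."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(sign, degrees, arcmin, arcsec):
    """Declination as sign (+1 or -1), degrees, arcmin, arcsec -> decimal degrees."""
    return sign * (degrees + arcmin / 60.0 + arcsec / 3600.0)


# Approximate J2000 position of Mkn 421 (example input, not taken from the ODF).
ra_deg = hms_to_degrees(11, 4, 27.3)
dec_deg = dms_to_degrees(+1, 38, 12, 32.0)

# Values in the form expected by the srcra and srcdec parameters.
print(f"srcra={ra_deg:.4f} srcdec={dec_deg:+.4f}")
```

The sign is passed separately so that declinations between 0 and -1 degree keep their sign.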

8.3 Examine and Analyze the Data

Since the event files are current, we can proceed with some simple analysis
demonstrations, which will allow us to generate filters. Note that
the Command Window always sees (and places) files in the directory that it was
invoked in.

8.3.1 Create and Display an Image

Two commonly-made plots are those showing PI vs. BETA_CORR (also known
as ``banana plots'') and XDSP_CORR vs. BETA_CORR.

imagebinning - form of binning, force entire image into
a given size or bin by a specified number of pixels

ximagesize - output image pixels in X

yimagesize - output image pixels in Y

Plots comparing BETA_CORR to XDSP_CORR may be made
in a similar way. The output files can be viewed by using a standard FITS display.
The example plots, as seen with fv, are shown in Figure 8.1.

Figure 8.1:
Plots of XDSP_CORR vs. BETA_CORR (left) and PI vs. BETA_CORR (right).
The gap is due to the missing CCD7. Similarly, CCD4 is missing in RGS2.

8.3.2 Create and Display a Light Curve

The background is assessed through examination of the light curve. We will extract
a region, CCD9, that is most susceptible to proton events and generally records the
fewest source events due to its location close to the optical axis. Also, to avoid
confusing solar flares with source variability, a region filter that removes the
source from the final event list should be used. The region filters are kept in the
source file product P*SRCLI_*.FIT.

More experienced users should be aware that with SAS 13, the *SRCLI* file's
column information changed. rgsproc now outputs an M_LAMBDA column
instead of BETA_CORR, and M_LAMBDA should be used to generate the light curve.
(The *SRCLI* file that came with the PPS products still contains a BETA_CORR
column if you prefer to use that instead.)

table - input event table
withrateset - make a light curve
rateset - name of output light curve file
maketimecolumn - control to create a time column
timebinsize - time binning (seconds)
makeratecolumn - control to create a count rate column, otherwise a count column will be created

expression - filtering criteria
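To make the roles of timebinsize and the rate column concrete, the sketch below bins a list of event arrival times into a count-rate light curve in plain Python. It is an illustration of the idea only, not a description of how the SAS task works internally; the function name and the event times are made up, and no CCD or region filtering is applied.

```python
def light_curve(event_times, time_bin_size):
    """Bin event arrival times (s) into a list of (bin_start, rate) pairs.

    The rate is counts per second in each bin, the quantity a RATE column holds.
    """
    if not event_times:
        return []
    start = min(event_times)
    n_bins = int((max(event_times) - start) // time_bin_size) + 1
    counts = [0] * n_bins
    for t in event_times:
        counts[int((t - start) // time_bin_size)] += 1
    return [(start + i * time_bin_size, c / time_bin_size)
            for i, c in enumerate(counts)]


# Toy event times in seconds (made-up values), binned at 100 s.
times = [0.0, 10.0, 50.0, 120.0, 250.0, 260.0, 270.0, 299.0]
for bin_start, rate in light_curve(times, 100.0):
    print(bin_start, rate)
```

A larger bin size smooths the curve and reduces the scatter per bin, at the cost of time resolution for locating the start and end of a flare.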

The output file r1_ltcrv.fits can be viewed with fv.
The light curve is shown in Figure 8.2.

Figure 8.2:
Background event rate from the RGS1 CCD9 chip. The flares are solar
events. The time units are elapsed mission time.

8.3.3 Generating the Good Time Interval (GTI) File

Examination of the lightcurve shows that there is a noisy section at
the end of the observation, after 1.36975e8 seconds, where the count
rate is well above the normal background count rate of 0.05 count/second.
There are two procedures that make the GTI file (gtibuild and tabgtigen)
that, when applied to the event file in another run of rgsproc,
will excise these sections.

The first method, using gtibuild, requires a text file as input.
This file can be made on your local machine and uploaded to your Hera
account by right-clicking and dragging the file from your local directory
to the remote directory. In the first two columns, give the start and end
times (in seconds) of the interval you are interested in, and in the third column,
indicate with either a + or - sign whether that interval should be kept or removed.
In the example case, then, we would write in our ASCII file (named r1_gti.txt):

1.36958e8 1.36975e8 +

and proceed to the task gtibuild:

gtibuild file=r1_gti.txt table=r1_gti.fits

where

file - input text file
table - output gti table
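The three-column text file can also be written by a short script rather than by hand. The sketch below assumes whitespace-separated columns as in the example above and uses the same interval; the exponent is written as e+08, which is the same number as e8.

```python
from pathlib import Path

# (start, end, flag) rows; "+" keeps the interval, "-" removes it.
intervals = [(1.36958e8, 1.36975e8, "+")]

# One line per interval: start and end times in seconds, then the flag.
lines = [f"{start:.5e} {end:.5e} {flag}" for start, end, flag in intervals]
Path("r1_gti.txt").write_text("\n".join(lines) + "\n")

print(Path("r1_gti.txt").read_text(), end="")
```

Additional rows can be appended to the intervals list if more than one stretch of the observation is to be kept or removed.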

Alternatively, we can make the GTI file with tabgtigen and filter for RATE
(though we could just as easily filter on TIME) by entering

tabgtigen table=r1_ltcrv.fits gtiset=r1_gti.fits expression='RATE<0.2'

where

table - the lightcurve file
gtiset - output gti table
expression - the filtering criteria. Since the nominal
count rate is about 0.05 count/sec, we have set the upper
limit to 0.2 count/sec.
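The effect of a rate cut can be pictured as keeping every light-curve bin below the threshold and merging adjacent kept bins into intervals. The sketch below illustrates that idea in plain Python with a made-up light curve; the function name is hypothetical and this is not a description of the internals of tabgtigen.

```python
def good_time_intervals(bin_starts, rates, bin_size, max_rate):
    """Merge consecutive bins with rate < max_rate into (start, stop) intervals."""
    intervals = []
    for start, rate in zip(bin_starts, rates):
        if rate >= max_rate:
            continue                                   # bin rejected by the cut
        stop = start + bin_size
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], stop)   # extend the current interval
        else:
            intervals.append((start, stop))            # open a new interval
    return intervals


# Toy light curve (made-up numbers): quiet, quiet, flare, quiet, noisy end.
starts = [0.0, 100.0, 200.0, 300.0, 400.0]
rates = [0.05, 0.04, 0.90, 0.06, 0.35]
gti = good_time_intervals(starts, rates, 100.0, 0.2)
print(gti)
```

With a threshold of 0.2 count/sec, the flare bin and the noisy final bin are dropped and two good intervals remain.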

8.3.4 Applying the GTI

Now that we have a GTI file, we can apply it to the event file by running rgsproc
again. rgsproc is a complex task, running several steps, with five different entry
and exit points. It is not necessary to rerun all the steps in the procedure, only the
ones involving filtering.

orders - spectral orders to be processed
auxgtitables - gti file in FITS format
bkgcorrect - subtract background from source spectra?
withmlambdacolumn - include a wavelength column in the event file product
entrystage - stage at which to begin processing
finalstage - stage at which to end processing
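As a hedged illustration, a call using the parameters listed above could be assembled as in the sketch below. The parameter names are those in the list; the values, including the stage labels 3:filter and 5:fluxing, are assumptions made for this example and should be checked against the rgsproc documentation for your SAS version.

```python
import shlex
import shutil
import subprocess

# Assumed values, for illustration only; r1_gti.fits is the GTI file made in §8.3.3.
args = [
    "rgsproc",
    "orders=1 2",                # assumption: first and second order
    "auxgtitables=r1_gti.fits",
    "bkgcorrect=no",             # assumption
    "withmlambdacolumn=yes",     # assumption
    "entrystage=3:filter",       # assumption: start at the filtering stage
    "finalstage=5:fluxing",      # assumption: run through to the last stage
]

# Equivalent shell command line, with quoting handled by shlex.
command_line = shlex.join(args)
print(command_line)

# Run the task only where the SAS is initialised and rgsproc is on the PATH.
if shutil.which("rgsproc"):
    subprocess.run(args, check=True)
```

Starting at a later entry stage reuses the products of the earlier stages, which is why the rerun is much faster than the full pipeline.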

We will refer to the output event file as r1_filt.fits.

8.3.5 Creating the Response Matrices (RMFs)

Response matrices (RMFs) are not provided as part of the pipeline product package,
so you must create your own before analyzing data. The task rgsproc generates
a response matrix automatically, but as noted in §8.2.3, the source
coordinates are under the observer's control. The source coordinates have a profound
influence on the accuracy of the wavelength scale as recorded in the RMF that is produced
automatically by rgsproc, and each RGS instrument and each order will have its
own RMF.

Making the RMF is easily done with the task rgsrmfgen. Please note that,
unlike with EPIC data, it is not necessary to make ancillary response files (ARFs).

At this point, the spectra can be analyzed. If you wish, skip the discussion
on combining spectra (§8.3.6) and go straight to fitting the
spectrum (§8.4).

8.3.6 Combining Spectra

Spectra from the same order in RGS1 and RGS2 can be safely combined to
create a spectrum with higher signal-to-noise if they were reprocessed
using rgsproc with spectrumbinning=lambda, as we did in
§8.1 (this also happens to be the default).
The task rgscombine also merges response files and background
spectra. When merging response files, be sure that they have the same
number of bins. For this example, we assume that RMFs were made for
order 1 in both RGS1 and RGS2.

The spectra are ready for analysis, so we can prepare the spectrum for fitting.

8.4 Approaches to Spectral Fitting and the Cash Statistic

For data sets of high signal-to-noise and low background, where counting statistics
are within the Gaussian regime, the data products above are suitable for analysis
using the default fitting scheme in XSPEC, χ²-minimization. However, for
low count rates, in the Poisson regime, χ²-minimization is no longer suitable.
With low count rates in individual channels, the error per channel can dominate over
the count rate. Since channels are weighted by the inverse-square of the errors during
model fitting, channels with the lowest count rates are given overly-large
weights in the Poisson regime. Spectral continua are consequently often fit
incorrectly, with the model lying underneath the true continuum level.
This will be a common problem with most RGS sources. Even if count rates are large,
much of the flux from these sources can be contained within emission lines, rather
than the continuum. Consequently, even obtaining correct equivalent widths for such
sources is non-trivial.
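The bias described above can be reproduced with a toy calculation. The sketch below fits a constant continuum level to a made-up low-count spectrum in two ways: with χ² weighted by the data errors (σ² equal to the counts in each channel) and with the Cash statistic written as C = 2 Σ [m - d + d ln(d/m)]. Background is ignored, and the function names and the grid search are illustrative choices rather than anything XSPEC does internally.

```python
import math


def cash(model, data):
    """Cash statistic, C = 2 * sum(m - d + d*ln(d/m)); the log term is 0 when d = 0."""
    total = 0.0
    for m, d in zip(model, data):
        total += m - d
        if d > 0:
            total += d * math.log(d / m)
    return 2.0 * total


def chi2_data_errors(model, data):
    """Chi-squared with sigma^2 = d per channel (channels with d = 0 get sigma = 1)."""
    return sum((d - m) ** 2 / max(d, 1) for m, d in zip(model, data))


def best_constant(statistic, data):
    """Grid search for the constant continuum level that minimises the statistic."""
    levels = [0.01 * k for k in range(1, 1001)]  # 0.01 ... 10.00 counts/channel
    return min(levels, key=lambda c: statistic([c] * len(data), data))


# Made-up low-count spectrum; its mean is 2 counts per channel.
counts = [1, 4, 2, 1, 3, 1, 2, 2]
print("Cash best level:", best_constant(cash, counts))
print("chi2 best level:", best_constant(chi2_data_errors, counts))
```

For a constant model the Cash minimum falls at the mean of the data, while the data-weighted χ² minimum falls at the harmonic mean, which is never larger; the χ² continuum therefore sits below the Cash one.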

The traditional way to increase the signal-to-noise of a data set is to rebin or
group the channels, since, if channels are grouped in sufficiently large numbers,
the combined signal-to-noise of the groups will jump into the Gaussian regime.
However, this results in the loss of information. For example, sharp features
like an absorption edge or emission line can be completely washed out. Further,
in the Poisson regime, the background spectrum cannot simply be subtracted, as is
commonly done in the Gaussian regime, since this could result in negative counts.
Therefore, rebinning should be reserved for fast, preliminary analysis of spectra
without sharp features, or for making plots for publication. When working on the final
analysis for a low-count data set, the (unbinned) background and source spectra
should be fitted simultaneously using the Cash statistic. (If fitting with XSPEC,
be sure you are running v11.1.0 or later. This is because RGS spectrum files
have prompted a slight modification to the OGIP standard, since the RGS spatial
extraction mask has a spatial-width which is a varying function of wavelength. Thus,
it has become necessary to characterize the BACKSCL and AREASCL parameters
as vectors (i.e., one number for each wavelength channel), rather than scalar keywords
as they are for data from the EPIC cameras and past missions. These quantities map
the size of the source extraction region to the size of the background extraction
region and are essential for accurate fits. Only Xspec v11.1.0 and later versions
are capable of reading these vectors, so be certain that you have an up-to-date
installation at your site.)

Finally, a caveat of using the Cash statistic in Xspec is that the scheme requires
a ``total'' and ``background'' spectrum to be loaded into Xspec. This is in order
to calculate parameter errors correctly. Consequently, be sure not to use the
``net'' spectra that were created as part of product packages by SAS v5.2 or
earlier. To change schemes in Xspec before fitting the data, type:

XSPEC> statistic cstat

For our sample spectrum, we will rebin and fit it with χ² statistics.

8.4.1 Spectral Rebinning

There are two ways to rebin a spectrum: the FTOOL grppha, or the RGS pipeline.
grppha can group channels using an algorithm which bins up consecutive channels
until a count rate threshold is reached. This method conserves the resolution in emission
lines above the threshold while improving statistics in the continuum.
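The grouping idea can be sketched in a few lines of Python. This is an illustration of threshold grouping in general, using a minimum number of counts per group, and not a reimplementation of grppha; the function name and the spectrum are made up, and the treatment of a trailing under-filled group is a choice made for this example.

```python
def group_min_counts(counts, min_counts):
    """Bin consecutive channels until each group holds at least min_counts.

    Returns (first_channel, last_channel, total_counts) tuples; a trailing
    group that never reaches the threshold is returned as it stands.
    """
    groups = []
    first, total = 0, 0
    for channel, c in enumerate(counts):
        total += c
        if total >= min_counts:
            groups.append((first, channel, total))
            first, total = channel + 1, 0
    if first < len(counts):
        groups.append((first, len(counts) - 1, total))
    return groups


# Made-up spectrum: faint continuum with a bright line in channels 4 and 5.
spectrum = [2, 3, 1, 4, 25, 30, 2, 1, 3, 2]
for group in group_min_counts(spectrum, 10):
    print(group)
```

The two bright line channels each form their own group, so the line keeps its full resolution, while the faint continuum channels on either side are merged.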

The disadvantage of using grppha is that, although channel errors are propagated
through the binning process correctly, the errors column in the original spectrum
product is not strictly accurate. The problem arises because there is no good way
to treat the errors within channels containing no counts. To allow statistical fitting,
these channels are arbitrarily given an error value of unity, which is subsequently
propagated through the binning. Consequently, the errors are overestimated in the
resulting spectra.

The other approach, which involves calling the RGS pipeline after it is complete,
bins the data during spectral extraction. The following rebins the pipeline
spectrum by a factor 3.

orders - dispersion orders to extract
rebin - wavelength rebinning factor
rmfbins - number of bins in the response file; this should be
greater than 3000
entrystage - entry stage to the pipeline
finalstage - exit stage for the pipeline
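What an integer rebinning factor does to a spectrum can be shown with a toy example. The sketch below sums every three adjacent channels of a made-up spectrum; the function name is hypothetical and the pipeline's handling of channel edges and responses is not modeled.

```python
def rebin(counts, factor):
    """Sum every `factor` adjacent channels; a short final group is kept as a partial bin."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


# Made-up spectrum: a bright, narrow line (channels 4 and 5) on a faint continuum.
spectrum = [2, 3, 1, 4, 25, 30, 2, 1, 3, 2]
print(rebin(spectrum, 3))
```

Because the bin edges are fixed by the factor alone, the line here is merged with a neighboring continuum channel, which is the loss of resolution that uniform binning entails.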

One disadvantage of this approach is that you can only choose integer binning
of the original channel size. To change the sampling of the events, the pipeline
must be run from the second stage (``angles'') or earlier:

nbetabins - number of bins in the dispersion direction; the default is 3400

The disadvantage of using rgsproc, as opposed to grppha, is that the
binning is linear across the dispersion direction. Velocity resolution is lost in the
lines, so the accuracy of redshift determinations will be degraded, transition
edges will be smoothed, and neighboring lines will become blended.

8.5 Fitting a Spectral Model

We can fit the spectrum using Xspec. This is easily done by entering

xspec

Enter the data, background, and response file at the prompts, and edit the fitting
parameters as needed.

Figure 8.3:
1st order RGS1 spectrum of Mkn 421. The fit is an
absorbed power law model. The gap between 10-15Å is
due to the absence of CCD7.

8.6 Analysis of Extended Sources

8.6.1 Region masks

The optics of the RGS allow spectroscopy of reasonably extended sources, up
to a few arc minutes. The width of the spatial extraction mask is defined by
the fraction of total events one wishes to extract. With the default pipeline
parameter values, over 90% of events are extracted, assuming a point-like source.

Altering and optimizing the mask width for a spatially-extended source may
take some trial and error, and, depending on the temperature distribution of
the source, may depend on which lines one is currently interested in. While
Mkn 421 is not an extended source, the following example increases the
width of the extraction mask and ensures that the size of the background
mask is reduced so that the two do not overlap.

Observing extended sources effectively broadens the PSF of the spectrum in the
dispersion direction. Therefore, it is prudent to also increase the width of the
PI masks using the pdistincl parameter in order to prevent event losses.

8.6.2 Making RMFs for extended sources

RGS response matrices as made in §8.3.5 are appropriate for use with
point sources only. If we are interested in analyzing an extended source, the
RMF must take into account the spatial degradation of the resolution. The
most straight-forward way to do this is to modify the response matrix prior
to spectral fitting. For sources extended up to about 1 arcminute, this can be
done with the FTOOL rgsrmfsmooth. It requires three files: the
point source RMF (as made in §8.3.5), an image of the source (from
an EPIC camera, see §7.2.1, or different mission), and a text file.
The better the resolution of the image, the more accurate the modified RMF will
be, so if a Chandra image is available for a source, it should be used instead
of an EPIC image. The text file must list the name of the image, the boresight,
and the aperture size in the following format:

For an example case, we will name our text file xsource.mod. We will assume
that an RMF for the first order grating was made as in §8.3.5 and an
MOS1 image was made as in §7.2.1; xsource.mod contains these lines: