Data Processing

KamLAND Data Processing

Data Collection

The signals from the photomultiplier tubes (PMTs) in the KamLAND
detector are read out by the electronics, which processes and formats the data and is
in turn read out by a data-acquisition (DAQ) system. The KamLAND
electronics stores the waveform data coming from the PMTs, which is similar
to what one sees on an oscilloscope. The DAQ system performs basic data
validation and writes the data to disk (in a format called
SF-file). The data files are later copied onto tape and shipped to the
KamLAND data processing facilities in Japan and in the US. We write
approximately 120 GB per day, 365 days a year. Because the amount of data is
enormous and we are looking for only a few rare events, considerable
computing resources are needed to mine the data and find the interesting
events.

The US data processing facility is located at NERSC in Oakland, CA, a large
scientific computing facility funded by the Department of Energy. It
has a large Linux computing cluster (called PDSF) and a High Performance Storage
System (HPSS). The data
is first copied from the tapes shipped from Japan into
HPSS. Copying the data directly from the experimental site in
Japan to HPSS is not feasible because the network bandwidth is too small for
the large data volume.
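
To put the data volume in perspective, the short calculation below turns the 120 GB per day quoted above into a yearly total and into the sustained transfer rate that a direct network copy would need. It is only a back-of-the-envelope sketch: it assumes decimal gigabytes, and it says nothing about the capacity of the actual link, which is not given here.

```python
# Figure from the text: about 120 GB written per day, every day of the year.
GB = 10**9  # assumption: decimal gigabytes

daily_bytes = 120 * GB
yearly_bytes = daily_bytes * 365
seconds_per_day = 24 * 60 * 60

# Sustained rate needed to move one day of data within one day.
sustained_mbit_per_s = daily_bytes * 8 / seconds_per_day / 1e6

print(f"Yearly volume:  {yearly_bytes / 1e12:.1f} TB")
print(f"Sustained rate: {sustained_mbit_per_s:.1f} Mbit/s")
```

The rate is an average that would have to be held around the clock, with no allowance for outages or for catching up afterwards.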

Data Processing

Once the data is in HPSS we can start processing it on the PDSF
cluster. The processing software, called AKat, is written in C++ and
based on the ROOT framework. First, the data is
converted from the raw signals coming from the electronics (which look
like oscilloscope traces) into more useful quantities, time (T) and
charge (Q), which tell us the timing and the size of the pulses
detected by the PMTs. This data is stored in an intermediate analysis
file that we call a TQ file. A second pass goes over the TQ files and
does the actual reconstruction of the physical events; this data is
stored in so-called RECON files. The main reason for splitting the
analysis into two steps is to separate the
time-consuming generation of TQ information from the
reconstruction, which often changes rapidly during the analysis.
Reconstructing events from TQ files is a fairly quick process.
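
The idea behind the first pass can be illustrated with a minimal sketch that reduces one waveform to a single (time, charge) pair. This is not the AKat algorithm. The function name, the pedestal estimate from the first few samples, the fixed threshold, the positive-going pulse and the sample spacing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TQ:
    time_ns: float  # time of the first threshold crossing
    charge: float   # pedestal-subtracted sum from the crossing onward (ADC counts)


def waveform_to_tq(samples: List[float], sample_ns: float,
                   n_pedestal: int = 4, threshold: float = 5.0) -> Optional[TQ]:
    """Reduce one waveform to a (time, charge) pair, or None if no pulse is found."""
    # Estimate the baseline from the first few samples (assumed pulse-free).
    pedestal = sum(samples[:n_pedestal]) / n_pedestal
    corrected = [s - pedestal for s in samples]
    for i, value in enumerate(corrected):
        if value > threshold:
            return TQ(time_ns=i * sample_ns, charge=sum(corrected[i:]))
    return None


# Hypothetical waveform: flat baseline of 10 counts with one pulse.
tq = waveform_to_tq([10, 10, 10, 10, 10, 30, 50, 20, 10, 10], sample_ns=1.5)
print(tq)
```

Each waveform of many samples collapses to two numbers per pulse, which is the kind of reduction that makes the TQ files smaller than the raw data.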

Essentially two types of information are stored in
the RECON files. Vertex objects contain the position of the event in
the balloon volume and the energy seen by the detector. Track objects
contain information about muons passing through the detector (about 1
in 100 events in KamLAND is a muon; the muons come from cosmic rays
penetrating the mountain). The muons need to be stored as
additional information because, during their passage through the
detector, they can produce particles (spallation products) that mimic the
signal KamLAND is looking for. Events in the detector are therefore vetoed
for a certain time after a muon passes through it.
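
A time veto of this kind can be sketched as follows. The function name and the veto length used in the example are hypothetical; the text above says only that events are vetoed "for a certain time", and the real analysis may apply further conditions.

```python
from bisect import bisect_right
from typing import List


def apply_muon_veto(event_times: List[float], muon_times: List[float],
                    veto_s: float) -> List[float]:
    """Keep the events that do not fall within veto_s seconds after any muon."""
    muons = sorted(muon_times)
    kept = []
    for t in event_times:
        i = bisect_right(muons, t)  # number of muons at or before time t
        if i == 0 or t - muons[i - 1] > veto_s:
            kept.append(t)
    return kept


# Hypothetical times in seconds and a hypothetical 0.3 s veto window.
events = [0.5, 1.001, 1.5, 4.0, 5.2]
muons = [1.0, 5.0]
surviving = apply_muon_veto(events, muons, veto_s=0.3)
print(surviving)
```

Only the most recent muon before each event needs to be checked, which is why a sorted list and a binary search are enough.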

The anti-neutrino signal in KamLAND is a delayed coincidence
(see Physics Impact) between two
different detector events: first a photon from the positron in the
reaction is detected, and about 200 µs later a photon from the
capture of the neutron is registered. This means that one more step
has to occur in the analysis: the prompt and delayed events have to be
correlated.

The correlation information is extracted from the RECON files and
stored as event 'multiplets' in so-called Coincidence files. These are
the files from which we extract the anti-neutrino candidate list, by
performing appropriate cuts (see Data Analysis).
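
The time-correlation step can be sketched as a search for pairs of events that follow each other within a fixed window. The function name and the window length are assumptions made for illustration. The sketch looks at timing only and does not include the further cuts that the candidate selection applies.

```python
from typing import List, Tuple


def find_coincidences(times_us: List[float],
                      window_us: float) -> List[Tuple[int, int]]:
    """Return index pairs (prompt, delayed) separated by at most window_us."""
    # Visit events in time order while remembering their original indices.
    order = sorted(range(len(times_us)), key=times_us.__getitem__)
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if times_us[j] - times_us[i] > window_us:
                break  # all later events are even further away in time
            pairs.append((i, j))
    return pairs


# Hypothetical event times in microseconds and a hypothetical 1000 µs window.
times = [0.0, 180.0, 5000.0, 9000.0, 9250.0, 20000.0]
pairs = find_coincidences(times, window_us=1000.0)
print(pairs)
```

Events without a partner inside the window drop out at this stage, which is part of why the Coincidence files are so much smaller than the RECON files.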

File Sizes

The following table shows how the file sizes shrink as the
information is further extracted. As an example, the total file sizes
for run1449 (a 24-hour run on October 2, 2002) are listed:

File Type      File Size

SF             120 GB
TQ             39 GB
RECON          1.2 GB
Coincidence    40 MB

So there is a reduction by a factor of 3000 in file size when
going from the data that is collected by the detector to the
coincidence format!
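
The factor of 3000, and the contribution of each step to it, can be checked from the run1449 sizes listed above. The snippet assumes decimal units (1 GB = 1000 MB).

```python
MB = 10**6
GB = 10**9

# File sizes for run1449, in processing order.
sizes = [("SF", 120 * GB), ("TQ", 39 * GB),
         ("RECON", 1200 * MB), ("Coincidence", 40 * MB)]

# Reduction factor for each consecutive pair of formats.
step_factors = {
    f"{a} -> {b}": size_a / size_b
    for (a, size_a), (b, size_b) in zip(sizes, sizes[1:])
}
overall = sizes[0][1] / sizes[-1][1]

for step, factor in step_factors.items():
    print(f"{step}: x{factor:.1f}")
print(f"SF -> Coincidence: x{overall:.0f}")
```

Most of the reduction comes from the last two steps (roughly a factor of 30 each), while the conversion from SF to TQ shrinks the data by only about a factor of 3.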