Simulation and Reconstruction

For analysis applications (like BetaMiniApp), the input is a collection in the Event Store. But these collections have already undergone several stages of reconstruction. In this section you will learn about the executables and packages that produce the collections in the Event Store. At the end, you will learn how to run the reconstruction programs for yourself.

Data begins as signals in the BaBar detector, and must pass
through several stages before it is translated into a collection in
the BaBar event store. The signals are digitized and reconstructed
into tracks and clusters. Particle ID algorithms identify each track or
cluster as a candidate for a given type of particle. Once the final-state
particles are identified, they are used to reconstruct other particles in the
decay chain.

Simulated data consists of generated particles, each with its own
identity and four-vector, which propagate through and leave signals in
a GEANT4-simulated BaBar detector. These signals have the same format as
the signals left by real data in a real detector, so from that point
on reconstruction is the same as for real data.

Skims are subsets of the full data set, containing only events with
particular tag bits needed for a given analysis.

The following sections describe in more detail how real, simulated,
and skimmed data collections are produced.

Reconstruction of real data

The journey of real data begins in the detector. A high-energy
e+e- collision results in a shower of particles, which spread out,
interact with and pass through the various layers of the detector.
Thus the first format of data is raw detector signals. Next, the raw
detector signals are digitized. Finally, events that pass the Trigger
(a very loose filter) are put in XTC files to await reconstruction
(actually, "prompt reconstruction").

The program that performs the reconstruction of real data is called Elf
(executable = ElfUserXtcApp). Reconstruction takes place in three steps.
First, the hits are reconstructed into the basic objects corresponding
to individual particles: tracks in the SVT and DCH, and clusters in the
EMC and IFR. Second, particle identification (PID) algorithms are used to
assign probable identities to the particles. This produces the AllEvents
data set. Finally, tagging creates a database of tag bits, simple boolean
or boolean-like flags which are useful for quick data skims. The result
is the AllEventsSkim data set.

Production and reconstruction of simulated data

The aim of simulation production is to create simulated (Monte Carlo)
collections that mimic real data collections as closely as possible.
Therefore, it is not enough just to generate a given decay -- one must
also simulate the propagation of the particles through
the detector, and the detector response to those events.
Several stages are required to produce these simulated data:

1. Generation of the underlying physics event;

2. Particle transport and calculation of the idealized energy
deposits in the detector;

3. Overlaying of backgrounds and digitization of the energy deposits;

4. Reconstruction of the event.

The last step, reconstruction, is performed by an executable called
BearApp from the Bear package. Bear is the simulation equivalent of the Elf
package for real data reconstruction. Like Elf, it takes collections
of digits and runs the full reconstruction chain, invoking the
reconstruction modules within the SVT, DCH, DRC, EMC and IFR
sub-systems. And as in Elf, the outputs from Bear are collections
designed to be used in physics analyses. The only important
difference is that Bear is for Monte Carlo collections, and Elf is for
real data collections.

The other steps of course exist on the simulation side only, since
for real data there is no need to simulate events and the detector
response.

The first two steps -- generation and propagation of
particles -- are performed by a program called Bogus. (The actual
package name is BgsApp, which is also the name of the executable.)
BOGUS, the BaBar Object-oriented Geant4-based
Unified Simulation, is BaBar's detector simulation layer over
GEANT4 running in the BaBar Framework. The output of Bogus is in a data structure
called GHits, which lists the (idealized) energy deposited by the particles
passing through the detector, and the location of each energy deposit.

Stage 3 takes these idealized GHits and applies digitization, that
is, it transforms them into signals which look like the real
data collected by the detector electronics. Backgrounds are also
overlaid at this stage; the final output looks as similar as
possible to real data collected by the detector. The package for
this stage is called SimApp (executable = SimAppApp).

In the past, the simulation was always performed by the
3-step Bgs-SimApp-Bear procedure. However, the majority of users do
not need the intermediate collections
produced by Bogus and SimApp -- all they want are the Bear output
collections. Therefore, a new program called Moose (Monolithic
Object-Oriented Simulation Executable) was created. Moose does
all three stages in one step.

Now most users use Moose instead of the 3-step procedure.
The only reasons a person might want to run the 3-step procedure
instead would be (a) to study the intermediate collections (output
collections of Bogus or SimApp), (b) to run software from before release 12,
or (c) to test Moose against the 3-step method.

Skim production

The Event Store also contains skims, subsets of the
AllEventsSkim collection that contain a given set of tag bits. Skims are
produced by running the SkimMiniApp executable (from the SkimMini package)
over the AllEventsSkim collection. SkimMiniApp can be run over
the output collections from Elf or Bear, provided they contain the
required tag bits.

The names of the skims can be found in FilterTools/defineMiniSkims.tcl.
The main code for the skims is often in other packages, but you can find
it by looking at the corresponding FilterTools/XXXPath.tcl file,
where XXX is the name of the skim.

The main task of the reconstruction software is to take the
hits (digis) in the different subdetectors, and reconstruct
them into the basic particle objects: tracks in the SVT and DCH,
and clusters in the EMC and IFR. Then particle identification (PID)
algorithms are used to assign probable identities to the particles.

As you saw above, BaBar reconstruction is performed by two packages:
Elf for real data, and Bear (or Moose, which includes Bear) for
simulated data. The output of Elf and Bear (or Moose) are
collections, ready to be analyzed with Beta applications.

Like all BaBar code, the reconstruction software is
built from modules. The reconstruction modules for each subdetector can be
found in the appropriate XxxReco and XxxPid packages, where Xxx
is the sub-system name: Svt, Dch, Drc, Emc, or Ifr.

As usual in C++ code, the important reconstruction objects are
each defined as a C++ class. The reconstruction objects
for each subdetector can be found in the XxxData packages.

Most of the XxxReco, XxxPid, and XxxData packages come with
very good README files that describe and list the reconstruction
modules or objects.

For example, a look at the EmcData and EmcReco packages (and their
READMEs!) reveals the reconstruction modules and objects used for
the EMC.

The following examples will show you how to run the main executables
for the packages used to produce collections: Moose, Elf, and SkimMini.
They are for the most part identical to the examples in the CM2 intro doc.

Note that in practice, user collections (ones you make yourself) are
for testing purposes only. For a real analysis, you are
expected to use only official BaBar collections from the Event Store.

Elf, Moose, and SkimMini are set up to write output collections.
To make a space for these collections, enter the command:

> KanUserAdmin createuser

This will create a directory /work/users/<username> where
you can put your collections. The following examples are for
a user with username elephant. To make them work for you,
you will of course need to put in your own username instead
of elephant, and your own initial instead of e.

So far in the Workbook, you have always used analysis-31 as your
test release. analysis-xx releases are the recommended releases
for running Beta applications like BetaMiniApp. However, for
simulation production, reconstruction and skimming, you should
instead use a recommended production release.

For simplicity, the following examples all use the same production release
18.7.0d. Note, however, that production releases become obsolete rather
quickly. This tutorial was last updated in October 2006, but within a few
months you may need to use a later release in order for the tutorial
to work. (I will try to keep these examples up to date, but if you find that
the production release in this example is obsolete, please let me know so I can fix it.)
To begin, set up your test release:
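For example, assuming the usual SoftRelTools commands (the test
release directory name here is just a placeholder):

> newrel -t 18.7.0d myprodrel
> cd myprodrel
> srtpath

answering the srtpath prompts, if any, with the release and your BFARCH.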

Moose

This is the application used to produce MC events, including generation,
simulation, digitization, and reconstruction.

Input

The user specifies a decay file for an exclusive or inclusive decay process.

Output

The output is the same as the output of the reconstruction program. In
addition, the tag bits used to define the skims for physics and detector
studies are also stored in the output collection. The output collection
is stored as ROOT files.

Requirements

A decay file for your favorite decay mode. These can be found
in workdir/PARENT/ProdDecayFiles.

A tcl file to correctly configure the I/O (see the example below).

Example

In workdir, create a tcl "snippet" with your favorite options for
each job you want to run. Here is an example mymoose.tcl:
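(A sketch, assuming the standard Framework tcl syntax; the run number,
the background collection name, and the final sourceFoundFile path are
placeholders or guesses to adapt.)

# mymoose.tcl -- job-specific settings for one Moose job
set RUNNUM 1001                ;# arbitrary run number; also the random seed
set CONDALIAS Aug2001          ;# data-taking period to simulate
set NEVENT 100                 ;# number of events to generate
set UDECAY PARENT/ProdDecayFiles/B0B0bar_generic.dec  ;# decay file

# Background collection for mixing; must be consistent with CONDALIAS.
# Find real names with: BbkSqlShell select * from bbrmdc.prod_bkg
set SimAppBkndInputCollection <your-background-collection>
set SimAppBkndFirstEvent 1     ;# first background event to use

set MooseHBookFile mymoose.root   ;# .root -> ROOT file, .hbook -> PAW file
set MooseOutputCollection /work/users/elephant/MooseCol

# Let KanEventOutput create any needed subdirectories automatically
module talk KanEventOutput
  allowDirectoryCreation set true
exit

# Assumed final line, by analogy with SkimMini/SkimMiniProduction.tcl
# (the exact path may differ in your release):
sourceFoundFile Moose/MooseProduction.tcl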

As you can see, the tcl snippet is quite simple: it just sets a few
configuration parameters specific to the particular job you would like to run:

RUNNUM - Set the (arbitrary) run number. Also used as the seed for
the random number generator

CONDALIAS - set the data taking period for which you want to generate MC.

NEVENT - number of events to generate.

UDECAY - The input to Moose is a dec file from the release's
ProdDecayFiles directory. There are different dec files for different
decay modes. Here you have chosen B0B0bar_generic.dec, so you will
be generating generic B0B0bar events.

SimAppBkndInputCollection - set the collection to use for background
mixing. Must be consistent with the CONDALIAS setting (note that
Aug2001 = 200108). For a list of background collections use the command
"BbkSqlShell select * from bbrmdc.prod_bkg".

SimAppBkndFirstEvent - set the first event from the background collection to use

MooseHBookFile - name of the monitoring plots file. The extension (.root
or .hbook) tells Moose what kind of file to produce (ROOT or PAW).

MooseOutputCollection - name of the output collection. You should put
your collections in the space created with KanUserAdmin, as shown above. That is, the name of the output collection should
begin with /work/users/elephant/.

The KanEventOutput "module talk" that sets "allowDirectoryCreation" to
true allows any necessary subdirectories in your /work/users/elephant
area to be created automatically.

Note that Moose will not overwrite collections that already exist,
so if you want to rerun Moose you must either set a new collection
name or delete the old collection first.
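You are now ready to run, passing the snippet to the executable just
as you will for SkimMiniApp below:

> MooseApp mymoose.tcl

Once the job is done, you can check on the new collection with
"KanUserAdmin list /work/users/elephant".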

To learn more about KanUserAdmin and its options, use the help option:

KanUserAdmin help

Normally, KanUserAdmin is all you need to manage your collections.
However, it does not tell you very much about them. If you want to
know more about your user collections, or any other collections,
the tool you need is KanCollUtil.

Running KanCollUtil on your new collection, you can see that it is made up of two ROOT files:
"MooseCol.01.root" is the micro part of the mini collection,
and "MooseCol.02E.root" is the rest.
Larger collections are made of more than two files.

Elf

This is the application used to reconstruct the events recorded in
IR-2 with the online system.

Input

The output of the online system is stored in xtc files, which are
used as input to run Elf.

Output

The output of the reconstruction consists of charged tracks and neutral
particles that are then used to perform physics analysis. The output of
Elf is by default a ROOT file. You can still (for test purposes) store the
output as a collection in Objectivity, but this requires some modification
of the default tcl parameters.

Requirements

The xtc file for a run. These are usually in /nfs/farm/babar/tcfiles.

A tcl file to correctly configure the I/O (see the example below).

A login to a SLAC computer such as a yakut or noric machine.

Example

Because it takes up so much space, raw data is kept on tape and must
be "staged in" before being used. So to begin, from a noric or yakut machine, you need to stage the xtc
file for the run you are interested in:

workdir> tcstage 0020029-001

The system might respond,

File /nfs/babar/tcfiles/babar-0020029-001.xtc is already on disk

in which case you are ready to go. Or it could take a while, during
which you will see progress messages from the system until the file is
staged.
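Once the file is staged, you configure and run the reconstruction in the
same style as the other executables here, passing a tcl snippet to the
executable (myelf.tcl is a hypothetical name; its contents follow the
same pattern as the Moose and SkimMini snippets):

> ElfUserXtcApp myelf.tcl

SkimMini

This is the application used to produce skim collections from the
output of Elf or Bear.

Example

In workdir, create a tcl snippet with your favorite options for the
skim job, for example myskim.tcl. What follows is a sketch, assuming
the same Framework tcl conventions as the Moose example; the input
collection (here, the Moose output collection created above) and the
final sourceFoundFile line are assumptions to adapt:

# myskim.tcl -- job-specific settings for one SkimMini job
set SkimNEvent 1000            ;# optional: number of input events to read
set SkimsToRun all             ;# run the selection code for all skims
set SkimsToWrite all           ;# write out all skims (also the default)
set SkimOutputDir /work/users/elephant/SkimDir
set SkimMC true                ;# this is an MC skim
set SkimInputCollection /work/users/elephant/MooseCol

# Assumed final line; the options are documented in this file:
sourceFoundFile SkimMini/SkimMiniProduction.tcl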

Where the "set SkimInput Collection" collection should be all given
on one line - it is only split here for formatting purposes.
As you can see, the tcl snippet is quite simple: it just sets a few
configuration parameters specific to the particular job you would like to run:

SkimNEvent - number of input events to read. If this isn't set, all events
from the input collection are read.

SkimsToRun - Skims whose selection code should be run. Possible
values are {all, none, or a list of specific skims}. You can determine
the names of the possible skims from FilterTools/defineMiniSkims.tcl.

SkimsToWrite - Skims to write. Possible values are {all, none, or
a list of specific skims}. If this isn't set, the default
is "all".

SkimOutputDir - output directory (in collection space) where the output
skim collections are to be written. Here you put your
collections in the /work/users/elephant/SkimDir
directory you created above.

SkimMC - set this for MC skims

SkimInputCollection - input collection you want to skim

The full list of available options (and most up-to-date documentation of
them) can be found in SkimMini/SkimMiniProduction.tcl.

You are now ready to run:

$ SkimMiniApp myskim.tcl

Once the job is done, check your SkimDir directory:

KanUserAdmin list /work/users/elephant/SkimDir

Because you set "SkimsToRun all" and "SkimsToWrite all", the system should
respond with a long list of skims.

In the Quicktour, and in most
BaBar tutorials, the user is instructed to check out packages, add
the required extra tags, and then compile and link the code (the "gmake"
commands). But in these examples, you did not have to compile and link
anything. Why not?

The answer is that BaBar already has "production" versions of
all BaBar applications. If you do not build your own executable,
then the production executable is automatically called instead.
So when you ran MooseApp, you were running the production version
from $BFDIST/releases/18.7.0d/bin/$BFARCH/MooseApp, not your own
personal MooseApp.
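You can check which executable you are actually picking up with the
standard which command:

> which MooseApp

If you have not built your own, this should point into the production
release area, $BFDIST/releases/18.7.0d/bin/$BFARCH/.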

Normally, the production executable is not good enough. If you
want to change any C++ code in your test release -- even just adding extra
tags -- then you will have to compile and link a new executable
that will take these changes into account. But in this example,
you did not need to make any C++ changes, so it was okay to use
the production executable.

A useful tip on how to delete user collections

To delete the directory in collection space, the command is:

KanUserAdmin rmdir /work/users/elephant/SkimDir

However, if you try that now, the system will respond

ERROR: directory is not empty.

Before you can delete the directory, you have to delete all of the
collections in the directory. This can be very tedious if you have
many collections. One way to get around this is to do:

KanUserAdmin list /work/users/elephant/SkimDir > rmfiles.job

This produces a file called rmfiles.job with a list of collection
names.
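You can then turn each name into a deletion command and execute the
result. Here is a sketch, assuming each line of rmfiles.job holds one
bare collection name and that KanUserAdmin has a collection-deletion
subcommand ("rm" below is a guess; check "KanUserAdmin help" for the
real name):

> sed 's/^/KanUserAdmin rm /' rmfiles.job > rmfiles.csh
> source rmfiles.csh

Once all the collections are gone, the rmdir command above will succeed.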