The ErrorLog objects that modules set up and that users see require that an
ELadministrator exist.
This encapsulates all the framework control over
how logging is done; in particular, the ELadministrator holds the
information as to what destinations should be used, and the way to get
context information when needed.

The framework, outside the context of any ordinary module, should get
an instance by invoking the ELadministrator::instance() method. It will
then use methods of this to establish how to get context information,
and to attach various ELdestination sinks.

Next the framework provides the ELcontextSupplier, which the logger
will use when messages are logged to find out context such as run and event
numbers.
This is done by
creating a class derived from ELcontextSupplier,
and overriding the pure
virtual methods such as context().
An instance of this class is passed to
logger.setContextSupplier().

Providing a context supplier is optional; it provides a way for the logger
to get at the run and event numbers when an error is logged. If no supplier
is provided, no run/event information will be associated with messages.
Further details are given below in
ELcontextSupplier: How the run and event numbers are indicated.

The frameworker may also declare the name of the overall process (or job,
or node in a farm situation). When an error is logged, this name will go
into the message. This could be used to distinguish among many cooperating
nodes which share the same output for destinations.

logger->setProcess( "PCfarm81" );

Having instantiated logger and told it how to get context information,
the framework must attach one or more destinations -- sinks for the messages
and statistics to be sent to. An error logger with no sinks is probably
useless.

The form taking a file name is mostly a convenience, since
one could have opened an ofstream for that file,
and supplied that ofstream to the first constructor.

ELstatistics is a table of message frequencies and sample
contexts, which contains methods for generating run summaries.
Although when a summary is requested one specifies a stream or
destination, the constructor of ELstatistics can take an ostream,
to which unreported statistics would be sent when a job terminates.

ELstatistics stats (cerr);

ELcollected
is constructed taking an instance of an ELsender class
defined by the user to implement the mechanism by which data would
be transmitted to a collecting process.
Collective error logging is explained elsewhere.

The usual framework will use one or more
ELoutput sinks, an ELstatistics, and perhaps an ELsaveBuffer.
To attach a destination, the frameworker must instantiate it, and
attach the destination to the ELadministrator. Attaching the destination
returns an ELdestControl object, which is a handle for controlling
the behavior of that destination. If no destination is attached to the
ELadministrator then one is attached automatically to "cout" when the first
error is logged.

Thus, instantiating a destination and attaching it to the logger leaves
the framework with an ELdestControl to use to affect that destination.

(Although in these notes we use the CDF term "Module" for the base
class that a package of physicists' codes is associated with, in D0 the same
concept is called a Package.)

The class for each module should allow its methods to log errors by
containing a variable object of the type ErrorLog.
This should be done
by giving the base class Module, which all modules classes derive from,
an instance of ErrorLog. The name of this object should be known to the
various physicists coding parts of the module. We assume that this variable
will be named
"errlog."
This should be placed in the protected section, not private, so
that methods of the derived classes can make use of errlog.

The ErrorLog instance will contain a small amount of module-specific
information, but most work is actually done by the ELadministrator
set up by the
framework. For instance, the ELadministrator will hold the list of
destinations for messages sent to the log. Since there is inherently
only one instance of ELadministrator, the many instances of ErrorLog
will send to destinations in a coherent and unified way.

After instantiating errlog, the Module class should provide a module name,
to use when methods of this class issue an error message to the log.
Thus in the Module base class, the lines setting up the ErrorLog will
look something like:

If this scheme is used,
the classes derived from Module, which contain the
physicists' code, need do nothing further to be able to issue messages to
errlog.

A simpler scheme would be to have a glogbal ErrorLog named errlog, and
then everybody could log through that same errlog. In that cse, however,
all the messages would be assigned the same module name (unless users
were to explicitly call errlog.setModule() before logging a message).

The run and event are printed in the text of every error message. However,
the framework does not have to call some setEvent() method
every time a new event
is started.
In order to avoid overhead for every event (when errors might
be logged in only a tiny fraction of those events), this information is not
pre-supplied by the framework.

Instead, when an error is being logged, the logger asks for the run and
event names at that point. It asks by invoking the context() function
of the ELcontextSupplier object set up for the logger.

The frameworker should create a class derived from ELcontextSupplier
and
pass an instance of that class to the constructor of the ELadministrator,
as shown above ("How the frameworker sets up for logging").

As seen from this, the class derived from ELcontextSupplier must
define the three virtual methods returning ELstring, though some may be
identical to others.
Although destinations are free to call for and use whichever context string
they wish, the intent of the three forms is as follows:

context()

is the form used by the typical output-to-a-log-file
destination, for example ELoutput. Here the string
ought not to be too long, to avoid clutter in the log;
but not limitation or truncation is imposed per se.
A typical string would be
run= 1234 event= 12345.

summaryContext()

is the form used by ELstatistics to get a string
suitable for insertion into a table. This is
length-critical, and will be truncated at 16 characters
in the statistics summary. A typical string would be
1234/12345.

fullContext()

is a form intended for use where extra info may be
useful and length is not a big consideration. ELoutput
can enable fullContext() instead of context() on a
per-destination basis. fullContext() would typically
just return context(), but another example might be
run= 1234 event= 12345 reco version 3.2.

The editErrorObj() routine will be called when the message is started.
It provides a hook to modify the module or subroutine information, the
id, or any part of the message other than the actual text items (which
would not have been established yet). This method is optional; the
editErrorObj() method of the base ELcontextSupplierI class is an inline
method doing nothing.

The context supplier is spcified via the setContextSupplier() method of
ELadministrator. If this method is not invoked, no context information
will be associated with error messsages.

Some nomenclature: Thresholds and limits both apply to particular
destinations.

When we say limit, we shall always mean some count
of messages of a particular description, beyond which some action will
cease to happen. A limit can be specified by message id, or for multiple
id's in two ways: A general wildcard, or all messages of some severity
level.
When we say threshold we shall always mean some ELseverityLevel,
at or above which some action would happen.

One can set a limit or a threshold for each individual destination.
(using its ELdestControl).
To do this, you use methods of the associated ELdestControd, which we
will call "dest" for these examples. The effects are as follows:

dest.setThreshold (severity);

Supress logging or acting on
messages below this severity,
for this destination.

For this destination, don't
log past n instances of any
given exception id matching the
specified type.

In case one has established two or more applicable limits, the limit used
is the most specific applicable case: Specified ID before specific severity
level, and both before wildcard "*".

As implied by the above chart, each ELdestination owns a threshold level
(if no threshold is explicitly set, the threshold is ELzeroSeverity).
Each ELdestination also owns a general limits table (indexed by severity
level) and a limits table (indexed by message id). Each limits table
entry contains as data the limit and the count.

ELstatistics is a particular destination. That means
that filtering by severity is set up via setThreshold. In the case of an
ELstatistics destination, however, setLimit has no effect: Once an error
message id gets into that table, there is absolutely no cost to incrementing
its count, so a limit would be useless.

Although a limit conceptually may or may not have been set, and if no limit
whatsoever applies, no messages will be throttled out, the threshold
severity level always has a value. By default, in each individual
destination, this threshold starts as ELzeroSeverity.

For convenience, the ELadministrator also has setThresholds() and setLimits()
methods. The meaning of those is to invoke the corresponding method for
every
destination currently attached to the logger. (That might change
some limits previously established.)

Finally, two features slightly soften the concept of throttling via a limit.
The first is that a count is past its limit, action does not cease
completely: If the applicable limit for a given error would be L,
then if there is an excess E = count-L, instances with
E/L = 2**N for any
non-negative integer N, will be logged.
Thus if the limit for a specific message is 5, the loggger would contain
numbers 1, 2, 3, 4, 5, 10, 15, 25, 45, 85, 165, and so forth.

The second concept applies when a large bunch of errors of some type
is followed by a long time span with no such errors. At that point the count
toward the limit may automatically be reset, so that a new cluster of these
errors can again be logged. This is controled by

dest.setTimespan ( id, seconds );

where the id works the same way and uses the same wildcards as for limits.

You can set up a destination (ELouput,
ELcollected, ELerrorList
or ELstatistics) to react only to messages coming from one module:

dest.filterModule ( "Calorimetry" );

The destination controlled by dest will now only react to
error messages logged from a module whose name has been set to
setModule("Calorimetry"). (Note that the comparison is
case-sensitive.)

Other filtering behaviors, that is, limits and thresholds, will still apply to
this destination, acting on those error messages which survive the module test.

Alternatively, you can set up a destination to exclude
all messages coming from one module:

dest2.excludeModule ( "MuonIdent" );

The destination controlled by dest2 will now ignore all
error messages logged from a module whose name has been set to
setModule("MuonIdent") (again the comparison is
case-sensitive.)

These routines are a special case of a more geneneral capability:

The model is that there is a `respondToModule' list and an
`ignoreModule' list.
At any given time, either the `respondToModule' list or the
`ignoreModule'`ignoreModule' list (or neither) may be active,
but not both.

Then there are two ways to call each of two methods to influence these:

dest.ignoreModule ("*");

Clear `respondToModule' list, and filter out all messages, except
those coming from any module that is later added to the
`respondToModule' list.

dest.respondToModule ("name");

Add this module to the `respondToModule' list, and remove it (if
present) from the `ignoreModule' list.

dest.respondToModule ("*");

Clear `ignoreModule' list, and respond to all messages, except
those coming from any module that is later added to the
`ignoreModule' list.

dest.ignoreModule ("name");

Add this module to the `ignoreModule' list, and remove it (if
present) from the `respondToModule' list.

The filter and exclude routines
are expressed in terms of ignoring and responding to modules:

The frameworker can short-ciruit the formatting,
timestamping, and "shopping" the message to each destination that occurs
when a low-severity error message is sent:

errlog.setDiscardThreshold (sev);

This threshold is sent for the particular instance of ErrorLog.
It might normally be set to the lowest threshold of any destination;
but in special circumstances one could specify that any message sent to this
Errorlog with severity less than some higher threshold is to be
discarded.

The meaning is that any messages sent through the usual mechanism of

errlog ( severity, id ) << items << endmsg;

which have severity less than sev, will be discarded with the absolute
minimum possible work.
The message will not be shopped for each destinations to react to,
and each operator<< will simply return, doing no work.
The net effect is a speedup of null-message processing by a factor of about 30.

The discardThreshold works like other thresholds in that if a message
severity is greater or equal to the threshold, it can be reacted to.
The discardThreshold takes precedence over individual destinations'
thresholds.

An example of usage, and demonstration of the differnce in speed,
is in the file
testDiscard.cc.

The idea is that this output can easily be disabled at an overall
level, set on a per-ErrorLog basis by the framework.
The fundamental notion is that of a debug "verbosity" level, which is simply
an integer.
If a message is given a level which is higher than the cuttoff chosen by the
framework for that logger, then that message will be completely (and
efficiently) ignored.

errlog.setDebugVerbosity(3);

The default, if no debug verbosity level is set for an ErrorLog,
is a level of 0:
All debug messages assigned any positive verbosity level will be ignored by
default.

The cuttoff on debugging verbosity level is controllable for each individual
ErrorLog. This allows the framework to accomodate
cases where verbosity needs to be high for one module but
not for others.

By default, every message issued through this debug message mechanism is
assigned a severity of ELinfo and a message id of "DEBUG".
The framework can alter this for a given ErrorLog, for example:

errlog.setDebugMessages ( ELwarning, "differentID" );

The mechanism of debug verbosity levels acts very much like that of
discard thresholds for the ErrorLog, in that:

If the debug verbosity is too low, the message is discarded
regardless of thresholds set for various destinations, and never
even makes it to statistics.

If the debug verbosity "cut" is passed, then individual output
destinations may still reject a given message based on its severity
not being high enough. In particular, by default destinations which
don't react to ELinfo severity will not react to

One distinction is that since the level can be any integer, the granularity
availble is not restricted by the limited number of severity levels.

The higher severity levels are used by physicists to indicate that they
consider this error so severe as to warrant terminating the job. However,
ultimate control of this rests in the hands of the framework. The control
is associated with the logger (the ELadministrator) rather than any specific
destination.

logger->setAbortThreshold (ELfatal);

By default, the abort threshold is ELabort.

If the physicist issues a log message with severity as at or above the
abort threshold,
upon completion of the log message
and dispatch of that to
the various destinations, the logger will terminate the job by invoking
abort(). This will cause a core file (potentially useful for debugging)
to be produced. No termination summary is given.

A plausible philosophy is to set the abort threshold at
ELhighestSeverity, so
that NO user message aborts the job. In that case, we advise periodically
checking for the presence of severe errors (see
"Checking on errors").

The abort() routine causes a core dump, which normally will be useful
to trace down the cause of the error. However, in some cases the source
of errors may be known, and production runs may be better off not consuming
the disk space taken up by core files. The framework has the option of
causing errors of some severity levels to invoke exit (severityLevel)
rather than abort. If exit() is invoked, a termination summary is given.
Assuming the framework had established an atexit()
handler, this would be called as the job exits.
The exit() call will not cause a core file to be produced.

logger->setExitThreshold (ELfatal);

By default, the exit threshold is ELhighestSeverity, that is,
unless an exit threshold is specificallyestablished, the automatic exit()
behavior will not occur.

The framework can provide flexibility, so that some errors will still cause
an abort() while others will cause an exit(). The rules used to decide whether
to abort or exit, when encountering an error message of severity s,
are as follows:

If s is of lower severity than both the abort and the exit thresholds,
then of course neither form of termination occurs.

Among the thresholds which are less than or equal to s,
the termination action chosen is the one with the higher threshold.

It is suggested that the thresholds not be set to be identical; if they are,
then the action chosen for sufficiently severe messages will be abort() and
the choice of exit() will never be taken.

Since severity levels above ELnext indicate advice that the flow of
processing ought to be modified or ceased, it is advisable (in all but
the most time-critical applications) that the framework check for the
presence of such errors, between invoking modules or between events.
To make this inexpensive, a routine checkSeverity() is provided.

Assuming the frameworker has attached an instance of the provided
ELstatistics destination to the logger, statistics summaries may be
obtained. ELstatistics has no visible reaction to being sent an error
message, but places information into a table, which can be formatted and
output. For the purposes of the illustrations below, we will assume that
the logger has attached an ELstatistics destination, with an ELdestControl
called "stats."

The summary of error statistics can be obtained as a string. It can also
be sent to a destination device (as a series of summary lines); ELoutput
places these lines into its associated stream to output the summary.

There is an additonal means of having the summary sent to an ostream: If
the constructor of ELstatistics is supplied an ostream argument, then when
the job completes the destructor for that ELstatistics will check to see
if any information has changed since the last summary request. If so, one
final summary will automatically be sent to that ostream. (This is safe
assuming the user does not explicitly delete the ostream before the
statistics destination goes away.)

To suppress the termination summary altogether:

stats.noTerminationSummary();

The format of the summary information is that of three parts:

The first part lists the errors which have occurred,
with their frequencies.

The second part re-lists just the identification,
and supplies the contexts
(usually run/event of the first two and last examples of
occurences of each error.

The method to clear the counts in the statistics being kept in
the destination whose ELdestControl is "stats" is

stats.clearSummary();

This clears both the individual statistics (kept by message ID) and the
counts for the various severity levels. It would normally be used after
having invoked stats.summary() in some form to send the information
somewhere.

clearSummary() does not zero the aggregate counts for individual error IDs
or for severity levels. It also does not affect any limits set, or wipe
the knowledge of the message IDs from the statistics tables. (Thus there
would remain a bunch of errors with counts of zero; when statistics are
output these do not appear.)

There is a more sweeping method:

stats.wipe();

This clears everything -- counts and aggregate counts for individual ID's
and for severity levels. The statistics is wiped clean, which may be
relevant if memory management issues have imposed a limit on the number of
message ID entries in the statistics table.

The same routine also wipes out any information in the limits table. This
includes values which have been supplied by setLimit() (or setTimespan()),
and counts of the individual instances of each message ID. For the
ELstatistics destination this is moot (since limits do not apply) but for
ELoutput or ELsaveBuffer this may be useful when a maximum number of
message ID entries in the limits table has been imposed.

This clears everything -- counts and aggregate counts for severity levels
and for individual ID's, as well as any limits established. (This includes
the limits for "*" all messages.) The table is wiped clean, which may be
relevant if memory management issues imposes a limit on the number of
message ID entries in the statistics table.

Finally, there is a way to zero the counts going toward the limits all
IDs in a destination. This does not affact ELstatistics (which does not
use limits) and will not compactify the limits table.
It also does not affect aggregate counts.