Overview

A common problem encountered in different fields of applied science is
to find an expression for one physical quantity in terms of several
others, which are directly measurable.

An example in high energy physics is the evaluation of the momentum of
a charged particle from the observation of its trajectory in a magnetic
field. The problem is to relate the momentum of the particle to the
observations, which may consist of positional measurements at
intervals along the particle trajectory.

The exact functional relationship between the measured quantities
(e.g., the space-points) and the dependent quantity (e.g., the
momentum) is in general not known, but one possible way of solving the
problem is to find an expression which reliably approximates the
dependence of the momentum on the observations.

This explicit function of the observations can be obtained by a
least squares fitting procedure applied to a representative
sample of the data, for which the dependent quantity (e.g., momentum)
and the independent observations are known. The function can then be
used to compute the quantity of interest for new observations of the
independent variables.

This class TMultiDimFit implements such a procedure in
ROOT. It is largely based on the CERNLIB MUDIFI package
[2]. Though the basic concepts are still sound, and
therefore kept, a few implementation details have changed, and this
class can take advantage of MINUIT [4] to improve the errors
of the fitting, thanks to the class TMinuit.

In [5] and [6] H. Wind demonstrates the utility
of this procedure in the context of tracking, magnetic field
parameterisation, and so on. The outline of the method used in this
class is based on Wind's discussion, and I refer to these two
excellent texts for more information.

The Method

Let $D$ be the dependent quantity of interest, which depends smoothly
on the observable quantities $x_1, \ldots, x_N$, which we'll denote by
$\mathbf{x}$. Given a training sample of $M$ tuples of the form
(TMultiDimFit::AddRow)

$$\left(\mathbf{x}_j, D_j, E_j\right),$$

where $\mathbf{x}_j = (x_{1,j}, \ldots, x_{N,j})$ are the $N$ independent
variables, $D_j$ is the known, dependent quantity at $\mathbf{x}_j$,
and $E_j$ is the square error in $D_j$, the class
TMultiDimFit
will
try to find the parameterization

$$D_p(\mathbf{x}) = \sum_{l=1}^{L} c_l \prod_{i=1}^{N} p_{li}(x_i)
   = \sum_{l=1}^{L} c_l F_l(\mathbf{x}) \qquad (1)$$

such that

$$S \equiv \sum_{j=1}^{M} \left(D_j - D_p(\mathbf{x}_j)\right)^2 \qquad (2)$$

is minimal. Here $p_{li}(x_i)$ are monomials, or Chebyshev or Legendre
polynomials, labelled $l = 1, \ldots, L$, in each variable $x_i$,
$i = 1, \ldots, N$.

So what TMultiDimFit does is to determine the number of
terms $L$, the $L$ terms (or functions) $F_l$, and the
coefficients $c_l$, so that $S$ is minimal
(TMultiDimFit::FindParameterization).
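
To make the workflow concrete, here is a minimal sketch. The toy
two-variable sample and the particular polynomial used to generate it
are illustrative assumptions, not part of the class:

    #include <cstdio>
    #include "TMultiDimFit.h"
    #include "TRandom.h"

    // Minimal sketch of the basic workflow on a toy, two-variable problem.
    void makeParam()
    {
       const Int_t nVars = 2;
       TMultiDimFit *fit = new TMultiDimFit(nVars, TMultiDimFit::kMonomials, "v");

       Double_t x[nVars];
       for (Int_t j = 0; j < 500; ++j) {
          x[0] = gRandom->Uniform(-1, 1);
          x[1] = gRandom->Uniform(-1, 1);
          Double_t d = 1 + 2*x[0] + 0.5*x[0]*x[1] + x[1]*x[1];
          fit->AddRow(x, d);               // no error given: E_j = D_j assumed
       }

       fit->FindParameterization();        // determine L, the F_l and the c_l

       Double_t xNew[nVars] = { 0.1, -0.3 };
       printf("D_p(0.1,-0.3) = %g\n", fit->Eval(xNew));
    }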

Of course it's more than a little unlikely that $S$ will ever become
exactly zero as a result of the procedure outlined below. Therefore, the
user is asked to provide a minimum relative error $\epsilon$
(TMultiDimFit::SetMinRelativeError), and $S$
will be considered minimized when

$$R = \frac{S}{\sum_{j=1}^{M} D_j^2} < \epsilon .$$

Optionally, the user may impose a functional expression by specifying
the powers of each variable in $L$ specified functions $F_1, \ldots, F_L$
(TMultiDimFit::SetPowers). In that case, only the
coefficients $c_l$ are calculated by the class.

Limiting the Number of Terms

As always when dealing with fits, there's a real chance of
over fitting. As is well known, it's always possible to fit a
polynomial of degree $N-1$ in $x$ to $N$ points $(x, y)$ with
$\chi^2 = 0$, but
the polynomial is not likely to fit new data at all
[1]. Therefore, the user is asked to provide an upper
limit, $L_{max}$, to the number of terms in $D_p$
(TMultiDimFit::SetMaxTerms).

However, since there's an infinite number of $F_l$ to choose from, the
user is asked to give the maximum power, $X_{max,i}$, of each variable
$x_i$ to be considered in the minimization of $S$
(TMultiDimFit::SetMaxPowers).

One way of obtaining values for the maximum power in variable $i$ is
to perform a regular fit to the dependent quantity $D$, using a
polynomial only in $x_i$. The maximum power $X_{max,i}$ is then the
power that does not significantly improve the one-dimensional
least-squares fit over $x_i$ to $D$ [5].
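
A sketch of this approach, using one-dimensional polynomial fits with
ROOT's TGraph and TF1; the arrays, the range, and the maximum order
tried are illustrative assumptions:

    #include <cstdio>
    #include "TGraph.h"
    #include "TF1.h"
    #include "TString.h"

    // Sketch: given n values of one variable x_i and of the dependent
    // quantity D, fit polynomials of increasing order and inspect how the
    // chi-square improves; the lowest order beyond which it no longer
    // improves significantly is a candidate for X_max,i.
    void scanMaxPower(Int_t n, Double_t *xi, Double_t *d)
    {
       TGraph g(n, xi, d);
       for (Int_t order = 1; order <= 6; ++order) {
          TF1 f("f", Form("pol%d", order), -1, 1);
          g.Fit(&f, "Q");                           // quiet least-squares fit
          printf("order %d: chi2 = %g\n", order, f.GetChisquare());
       }
    }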

There is still a huge number of possible choices for $F_l$; in fact
there are $\prod_{i=1}^{N} \left(X_{max,i} + 1\right)$ possible
choices. Obviously we need to limit this. To this end, the user is
asked to set a power control limit, $Q$
(TMultiDimFit::SetPowerLimit), and a function $F_l$
is only accepted if

$$Q_l = \sum_{i=1}^{N} \frac{X_{li}}{X_{max,i}} < Q,$$

where $X_{li}$ is the leading power of variable $x_i$ in function
$F_l$ (TMultiDimFit::MakeCandidates). So the number of
functions increases with $Q$ (1, 2 is fine, 5 is way out).
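
A small configuration sketch; the limits below are placeholder values,
not recommendations:

    #include "TMultiDimFit.h"

    void configureLimits()
    {
       const Int_t nVars = 3;
       TMultiDimFit *fit = new TMultiDimFit(nVars, TMultiDimFit::kMonomials);

       Int_t maxPowers[nVars] = { 4, 4, 6 };  // X_max,i for each variable
       fit->SetMaxPowers(maxPowers);
       fit->SetMaxTerms(20);                  // L_max: upper limit on the number of terms
       fit->SetPowerLimit(1);                 // Q: the power control limit
       fit->SetMinRelativeError(0.01);        // epsilon from the previous section
    }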

To further reduce the number of functions in the final expression,
only those functions that significantly reduce $S$ are chosen. What
`significant' means is chosen by the user, and will be
discussed below (see Function Selection Based on Residual).

The functions $F_l$ are generally not orthogonal, which means one would
have to evaluate all possible $F_l$'s over all data points before
finding the most significant one [1]. We can, however, do
better than that. By applying the modified Gram-Schmidt
orthogonalisation algorithm [5] [3] to the
functions $F_l$, we can evaluate the contribution to the reduction of
$S$ from each function in turn, and we may delay the actual inversion
of the curvature matrix
(TMultiDimFit::MakeGramSchmidt).
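
As an illustration of the algorithm itself (a standalone sketch, not
the internals of TMultiDimFit::MakeGramSchmidt), the modified
Gram-Schmidt step on the columns of an M x L matrix looks like this:

    #include <vector>

    // Modified Gram-Schmidt: each column f_l, stored in col[l], is replaced
    // by w_l, its component orthogonal to the already processed columns.
    // Each projection is subtracted from the updated column before the next
    // one is computed, which distinguishes the modified from the classical
    // algorithm.
    void modifiedGramSchmidt(std::vector<std::vector<double> > &col)
    {
       for (size_t l = 0; l < col.size(); ++l) {
          for (size_t k = 0; k < l; ++k) {
             double num = 0, den = 0;
             for (size_t j = 0; j < col[l].size(); ++j) {
                num += col[l][j] * col[k][j];     // f_l . w_k (current value)
                den += col[k][j] * col[k][j];     // w_k . w_k
             }
             if (den == 0) continue;
             for (size_t j = 0; j < col[l].size(); ++j)
                col[l][j] -= (num / den) * col[k][j];
          }
       }
    }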

So we are led to consider an $M \times L$ matrix $\mathsf{F}$, an
element of which is given by

$$f_{jl} = F_l(x_{1j}, \ldots, x_{Nj}) = F_l(\mathbf{x}_j),
   \quad\text{with } j = 1, \ldots, M,\ l = 1, \ldots, L, \qquad (3)$$

where $j$ labels the $M$ rows in the training sample and $l$ labels the $L$
functions of $N$ variables, and $L \leq M$. That is, $f_{jl}$ is
the term (or function) numbered $l$ evaluated at the data point
$j$. We have to normalise
$\mathbf{x}_j$ to $[-1, 1]$ for this to
succeed [5]
(TMultiDimFit::MakeNormalized). We then define a
matrix $\mathsf{W}$
of which the columns $\mathbf{w}_l$
are given by

$$\mathbf{w}_1 = \mathbf{f}_1, \qquad
  \mathbf{w}_l = \mathbf{f}_l - \sum_{k=1}^{l-1}
  \frac{\mathbf{f}_l \cdot \mathbf{w}_k}{\mathbf{w}_k^2}\,\mathbf{w}_k \qquad (4)$$

So for each new function $F_l$ included in the model, we get a
reduction of the sum of squares of residuals of
$a_l^2\,\mathbf{w}_l^2$,
where $\mathbf{w}_l$
is given by (4) and the coefficient
$a_l = \mathbf{D}\cdot\mathbf{w}_l / \mathbf{w}_l^2$, with
$\mathbf{D} = (D_1, \ldots, D_M)$ the vector of dependent values, by
(9). Thus, using the Gram-Schmidt orthogonalisation, we
can decide if we want to include this function in the final model,
before the matrix inversion.

Function Selection Based on Residual

Supposing that $j$ steps of the procedure have been performed, the
problem now is to consider the $(j+1)$-th
function.

The sum of squares of residuals can be written as

$$S_j = \mathbf{D}^2 - \sum_{k=1}^{j}
   \frac{\left(\mathbf{D}\cdot\mathbf{w}_k\right)^2}{\mathbf{w}_k^2} \qquad (12)$$

where the relation (9) has been taken into account. The
contribution of the $(j+1)$-th
function to the reduction of $S$ is
given by

$$\Delta S_{j+1} =
   \frac{\left(\mathbf{D}\cdot\mathbf{w}_{j+1}\right)^2}{\mathbf{w}_{j+1}^2} \qquad (13)$$

Two tests are now applied to decide whether this $(j+1)$-th
function is to be included in the final expression, or not.

Test 1

Denoting by $H_j$ the subspace spanned by
$\mathbf{w}_1, \ldots, \mathbf{w}_j$, the function
$\mathbf{w}_{j+1}$ is
by construction (see (4)) the projection of the function
$\mathbf{f}_{j+1}$ onto the direction perpendicular to $H_j$. Now, if the
length of $\mathbf{w}_{j+1}$
(given by $\mathbf{w}_{j+1}\cdot\mathbf{f}_{j+1}$)
is very small compared to the length of $\mathbf{f}_{j+1}$,
this new
function cannot contribute much to the reduction of the sum of
squares of residuals. The test then consists in calculating the angle
$\theta$ between the two vectors
$\mathbf{w}_{j+1}$ and
$\mathbf{f}_{j+1}$
(see also figure 1) and requiring that it be
greater than a threshold value which the user must set
(TMultiDimFit::SetMinAngle).

Figure 1:
(a) the angle $\theta$ between $\mathbf{w}_{j+1}$ and $\mathbf{f}_{j+1}$;
(b) the angle $\phi$ between $\mathbf{w}_{j+1}$ and $\mathbf{D}$.

Test 2

Let $\mathbf{D} = (D_1, \ldots, D_M)$
be the data vector to be fitted. As illustrated in
figure 1, the $(j+1)$-th
function $\mathbf{w}_{j+1}$
will contribute significantly to the reduction of $S$, if the angle
$\phi$ between
$\mathbf{w}_{j+1}$ and
$\mathbf{D}$ is smaller than
an upper limit $\phi_{max}$, defined by the user
(TMultiDimFit::SetMaxAngle).

However, the method automatically readjusts the value of this angle
while the fit is in progress, in order to make the selection criterion
less and less stringent. The result is that the
functions contributing most to the reduction of $S$ are chosen first
(TMultiDimFit::TestFunction).
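
A sketch of setting the two angle criteria; the 10 degree values are
illustrative only:

    #include "TMultiDimFit.h"

    void configureAngles(TMultiDimFit *fit)
    {
       // Test 1: the angle between w_{j+1} and f_{j+1} must be greater
       // than this threshold (degrees).
       fit->SetMinAngle(10);

       // Test 2: the angle between w_{j+1} and D must be smaller than this
       // limit (degrees). Leaving it at the default of 0 switches to the
       // alternative residual-based test described next.
       fit->SetMaxAngle(10);
    }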

In case $\phi_{max}$ isn't defined, an alternative method of
performing this second test is used: the $(j+1)$-th
function $\mathbf{f}_{j+1}$
is accepted if (refer also to equation (13))

$$\Delta S_{j+1} > \frac{S_j}{L_{max} - j} \qquad (14)$$

where $S_j$ is the sum of squares of residuals after the $j$
functions previously accepted, and $L_{max}$ is the total number
of functions allowed in the final expression of the fit (defined by the
user).

From this we see that restricting $L_{max}$, the number of
terms in the final model, makes the fit more difficult to perform,
since the above selection criterion becomes more limiting.

The more coefficients we evaluate, the more the sum of squares of
residuals $S$ will be reduced. We can evaluate $S$ before inverting
$\mathsf{B}$ as shown below.

Having found a parameterization, that is, the $F_l$'s and $L$, that
minimizes $S$, we still need to determine the coefficients
$c_l$. However, it's a feature of how we choose the significant
functions that the evaluation of the $c_l$'s becomes trivial
[5]. To derive
$\mathbf{c}$, we first note that
equation (4) can be written as

$$\mathsf{F} = \mathsf{W}\mathsf{B} \qquad (15)$$

where

$$b_{ij} = \begin{cases}
   \dfrac{\mathbf{f}_j \cdot \mathbf{w}_i}{\mathbf{w}_i^2} & \text{if } i < j \\
   1 & \text{if } i = j \\
   0 & \text{if } i > j
 \end{cases} \qquad (16)$$

Consequently,
$\mathsf{B}$ is an upper triangular matrix, which can be
readily inverted. So we now evaluate

$$\mathsf{F}\mathsf{B}^{-1} = \mathsf{W},$$

and the coefficients of the original model follow as
$\mathbf{c} = \mathsf{B}^{-1}\mathbf{a}$
(TMultiDimFit::MakeCoefficients).
Considerations

It's important to realize that the training sample should be
representative of the problem at hand, in particular along the borders
of the region of interest. This is because the algorithm presented
here is an interpolation, rather than an extrapolation
[5].

Also, the independent variables $x_1, \ldots, x_N$ need to be linearly
independent, since the procedure will perform poorly if they are not
[5]. One can find a linear transformation from one's
original variables to a set of linearly independent variables,
using a Principal Components Analysis (see TPrincipal), and
then use the transformed variables as input to this class [5]
[6].

H. Wind also outlines a method for parameterising a multidimensional
dependence over a multidimensional set of variables. An example
of the method from [5] is as follows (please refer to
[5] for a full discussion; a code sketch follows the steps below):

Define $\mathbf{P} = (P_1, \ldots, P_5)$ to be the 5 dependent
quantities that define a track.

Compute, for $M$ different values of $\mathbf{P}$, the tracks
through the magnetic field, and determine the corresponding
observations $\mathbf{x} = (x_1, \ldots, x_N)$.

Use the simulated observations to determine, with a simple
approximation, the values of $\mathbf{P}$. We call these values
$\mathbf{P}^0$.

Determine from $\mathbf{x}$ a set of at least five relevant
coordinates $\mathbf{x}'$, using constraints, or
alternatively:

Perform a Principal Component Analysis (using
TPrincipal), and use the result
to get a linear transformation
$\mathbf{x} \rightarrow \mathbf{x}'$, so that
$\mathbf{x}'$ are constrained and linearly independent.

Perform a Principal Component Analysis on
$Q_i = P_i / P_i^0$, $i = 1, \ldots, 5$, to get quantities
$\mathbf{Q}'$ that are linearly independent among themselves (but not
independent of $\mathbf{x}$).

For each component $Q'_i$,
make a multidimensional fit,
using $\mathbf{x}'$
as the variables, thus determining a set of
coefficients
$\mathbf{c}_i$.

To process data, using this parameterisation, do

Test whether the observation $\mathbf{x}$
is within the domain of
the parameterization, using the result from the Principal Component
Analysis.

Determine $\mathbf{P}^0$
as before.

Determine $\mathbf{x}'$
as before.

Use the result of the fit to determine
$\mathbf{Q}'$.

Transform back to $\mathbf{P}$
from $\mathbf{Q}'$, using
the result from the Principal Component Analysis.
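
The following is a rough code sketch of the training part of this
outline; the dimensions, the arrays holding the simulated tracks, and
the choice of fitting on all transformed observations are illustrative
assumptions, not a prescription from [5]:

    #include <vector>
    #include "TPrincipal.h"
    #include "TMultiDimFit.h"

    // obs[j]    : the N simulated observations of track j
    // qPrime[j] : the 5 (already decorrelated) quantities Q' of track j
    void sketchTrackParam(Int_t nObs, Int_t nTracks,
                          Double_t **obs, Double_t **qPrime)
    {
       // Linear transformation x -> x' from a principal component analysis
       TPrincipal principal(nObs, "ND");
       for (Int_t j = 0; j < nTracks; ++j)
          principal.AddRow(obs[j]);
       principal.MakePrincipals();

       // One multidimensional fit per component Q'_i, using x' as variables
       std::vector<Double_t> xPrime(nObs);
       for (Int_t i = 0; i < 5; ++i) {
          TMultiDimFit fit(nObs, TMultiDimFit::kMonomials);
          for (Int_t j = 0; j < nTracks; ++j) {
             principal.X2P(obs[j], &xPrime[0]);   // x -> x'
             fit.AddRow(&xPrime[0], qPrime[j][i]);
          }
          fit.FindParameterization();
          // the coefficients c_i can now be retrieved, or exported via MakeCode
       }
    }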

Testing the parameterization

The class also provides functionality for testing the parameterization
found from the training sample
(TMultiDimFit::Fit). This is done by passing
the class a test sample of tuples of the form
$(\mathbf{x}_{t,j}, D_{t,j}, E_{t,j})$, where
$\mathbf{x}_{t,j}$ are the independent
variables, $D_{t,j}$ the known, dependent quantity, and $E_{t,j}$ is
the square error in $D_{t,j}$
(TMultiDimFit::AddTestRow).

The parameterization is then evaluated at every
$\mathbf{x}_{t,j}$ in the
test sample, and

$$S_t \equiv \sum_{j} \left(D_{t,j} - D_p(\mathbf{x}_{t,j})\right)^2$$

is evaluated. The relative error over the test sample,

$$R_t = \frac{S_t}{\sum_{j} D_{t,j}^2},$$

should not be too low or too high compared to $R$ from the training
sample. Also, the multiple correlation coefficient from both samples should
be fairly close, otherwise one of the samples is not representative of
the problem. A large difference in the reduced $\chi^2$ over the two
samples indicates an over fit, and the maximum number of terms in the
parameterisation should be reduced.

It's possible to use Minuit
[4] to further improve the fit, using the test sample.
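
A sketch of how the test sample and the MINUIT refinement tie
together; the toy dependence matches the earlier sketch and is purely
illustrative:

    #include <cstdio>
    #include "TMultiDimFit.h"
    #include "TRandom.h"

    void testParam(TMultiDimFit *fit)
    {
       Double_t x[2];
       for (Int_t j = 0; j < 200; ++j) {
          x[0] = gRandom->Uniform(-1, 1);
          x[1] = gRandom->Uniform(-1, 1);
          Double_t d = 1 + 2*x[0] + 0.5*x[0]*x[1] + x[1]*x[1];
          fit->AddTestRow(x, d);       // kept separate from the training sample
       }

       fit->Fit("M");                  // "M" asks MINUIT to improve the coefficients
       printf("chi2 on the test sample: %g\n", fit->GetChi2());
    }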

Add a row consisting of fNVariables independent variables, the
known, dependent quantity, and optionally, the square error in
the dependent quantity, to the training sample to be used for the
parameterization.
The mean of the variables and quantity is calculated on the fly,
as outlined in TPrincipal::AddRow.
This sample should be representative of the problem at hand.
Please note that if no error is given, Poisson statistics are
assumed and the square error is set to the value of the dependent
quantity. See also the
class description.

Add a row consisting of fNVariables independent variables, the
known, dependent quantity, and optionally, the square error in
the dependent quantity, to the test sample to be used for the
test of the parameterization.
This sample needn't be representative of the problem at hand.
Please note that if no error is given, Poisson statistics are
assumed and the square error is set to the value of the dependent
quantity. See also the
class description.

Calculate the Chi square over the test sample. The optional
argument coeff is a vector of coefficients to use in the
evaluation of the parameterisation. If coeff == 0, the found
coefficients are used.
Used by MINUIT for the fit (see TMultiDimFit::Fit).

PRIVATE METHOD:
Invert the model matrix B, and compute the final coefficients. For a
more thorough discussion of what this means, please refer to the
class description.
First we invert the lower triangular matrix fOrthCurvatureMatrix
and store the inverted matrix in the upper triangle.

Make histograms of the result of the analysis. This message
should be sent after having read all data points, but before
finding the parameterization.
Options:
A All the below
X Original independent variables
D Original dependent variables
N Normalised independent variables
S Shifted dependent variables
R1 Residuals versus normalised independent variables
R2 Residuals versus dependent variable
R3 Residuals computed on training sample
R4 Residuals computed on test sample
For a description of these quantities, refer to
class description
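
For example (assuming fit is the instance that has just been fed all
its rows):

    #include "TMultiDimFit.h"

    // Book the analysis histograms after all rows have been added but
    // before FindParameterization is called.
    void bookHistograms(TMultiDimFit *fit)
    {
       fit->MakeHistograms("A");   // "A": all of the option codes listed above
    }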

Selection method. User can override this method for specialized
selection of acceptable functions in the fit. Default is to select
all. This message is sent during the build-up of the function
candidates table once for each set of powers in the
variables. Notice that the argument array contains the powers
PLUS ONE. For example, to de-select the function
f = x1^2 * x2^4 * x3^5,
this method should return kFALSE if given the argument
{ 3, 5, 6 }
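
A sketch of such a specialisation; the class name and the cut on the
total power are illustrative, not part of the library:

    #include "TMultiDimFit.h"

    // Reject every candidate function whose total power exceeds 6.
    class MyMDFit : public TMultiDimFit {
    public:
       MyMDFit(Int_t nVar) : TMultiDimFit(nVar, TMultiDimFit::kMonomials) {}
       Bool_t Select(const Int_t *powers)
       {
          Int_t sum = 0;
          for (Int_t i = 0; i < GetNVariables(); ++i)
             sum += powers[i] - 1;          // the array holds the powers PLUS ONE
          return (sum <= 6) ? kTRUE : kFALSE;
       }
    };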

Set the max angle (in degrees) between the initial data vector to
be fitted and the new candidate function to be included in the
fit. By default it is 0, which automatically chooses another
selection criterion. See also the
class description.

Define a user function. The input array must be of the form
(p11, ..., p1N, ..., pL1, ..., pLN)
where N is the dimension of the data sample, L is the number of
terms (given by the argument terms), the first index labels the term,
and the second the variable. More information is given in the
class description.
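
A hypothetical layout for N = 2 variables and L = 3 terms; the power
values are placeholders, and the exact power convention expected by
SetPowers should be checked against the class description:

    #include "TMultiDimFit.h"

    void setUserFunctions()
    {
       TMultiDimFit *fit = new TMultiDimFit(2, TMultiDimFit::kMonomials);
       Int_t powers[] = { 1, 0,    // term 1: powers of x1, x2
                          2, 0,    // term 2
                          0, 3 };  // term 3
       fit->SetPowers(powers, 3);
       // Only the coefficients c_l are now fitted for these fixed terms.
    }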
