About variable scaling

Variables with different scales

To understand the concept of scale, consider a hypothetical optimization problem from particle physics.

Suppose that we process experimental data in order to determine the mass and speed of a particle passing through a detector.
To do this, we minimize some merit function of two parameters.
One variable is the mass of the particle, measured in kilograms; the other is its speed, measured in meters per second.
We know neither the mass nor the speed before solving the problem, but we can guess that the mass will be somewhere around 10^-27 kilograms,
and the speed somewhere around 10^7 meters per second.
What happens if we try to solve such a problem?
We run into two kinds of trouble.

First, the optimization algorithm treats all variables in the same way.
A 10% error in the speed (about 10^6 units of measurement) will be considered more important than a 100% error in the mass (about 10^-27 units of measurement).
As a result, it will be hard to specify a meaningful stopping criterion:
any fixed value will be too strict for one variable or too loose for the other.

Second, such a function will be too hard to optimize.
The same step of 1.0 will be far too large for the mass axis (the particle's mass is a minor fraction of a nanogram, let alone a kilogram),
but far too small for the speed axis (the particle in question moves many times faster than one meter per second).
It will be almost impossible to make more than one step on such a problem.

Both problems can be solved by telling the optimizer that the scale of the first variable is about 10^-27, and the scale of the second one is about 10^7.
ALGLIB optimizers have special functions for this, with names ending in ...setscale().
But how does setting the scale solve these problems?

First, the scale of each variable is used when checking the stopping conditions.
For example, when we talk about a "step as short as 0.001", we mean 0.001 times the scale of the variable,
i.e. the first component of the step (mass) must be shorter than 10^-30 and the second one (speed) shorter than 10^4 for the algorithm to stop.
Similarly, the function gradient is multiplied componentwise by the scale vector before gradient-based stopping criteria are tested.
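The scaled stopping tests described above can be sketched as follows. This is a minimal illustration of the idea, not ALGLIB's actual implementation; the function names and thresholds are hypothetical:

```python
# Scaled stopping tests: each component of the step and gradient is
# judged relative to that variable's scale, not in absolute units.

def step_small_enough(step, scale, eps_x):
    # Stop if every step component is short relative to its scale:
    # |step_i| <= eps_x * s_i for all i.
    return all(abs(d) <= eps_x * s for d, s in zip(step, scale))

def gradient_small_enough(grad, scale, eps_g):
    # Multiply the gradient componentwise by the scale vector before
    # testing its norm against the gradient-based threshold.
    scaled = [g * s for g, s in zip(grad, scale)]
    return sum(v * v for v in scaled) ** 0.5 <= eps_g

scale = [1e-27, 1e7]   # mass in kg, speed in m/s
step = [5e-31, 3e3]    # proposed step from the optimizer
print(step_small_enough(step, scale, 0.001))  # → True (both components small vs. their scales)
```

With these scales, a step of 3000 m/s along the speed axis counts as "small" while an absolute threshold of 0.001 would never be reached on the mass axis at all.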

Second, the scales of the variables can hint to the optimizer that steps along the mass axis must be many times shorter than steps along the speed axis.
This is called preconditioning: we can build a scale-based preconditioner
which allows us to solve even badly scaled problems by appropriately scaling and twisting the search direction before making a step.
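To see why this helps, here is a toy sketch of diagonal, scale-based preconditioning: gradient descent where each gradient component is multiplied by s_i^2 before stepping. The objective function below is our own illustrative example, not anything from ALGLIB:

```python
# Scale-based (diagonal) preconditioning on a badly scaled quadratic
# f(x) = sum_i (x_i / s_i - 1)^2, whose minimum is at x_i = s_i.
# Plain gradient descent would need wildly different step lengths per
# axis; multiplying each gradient component by s_i^2 evens them out.

def grad(x, scale):
    return [2.0 * (xi / si - 1.0) / si for xi, si in zip(x, scale)]

scale = [1e-27, 1e7]   # mass scale, speed scale
x = [0.0, 0.0]
g = grad(x, scale)
# Preconditioned step: x_i -= lr * s_i^2 * g_i, with lr = 0.5.
x = [xi - 0.5 * si * si * gi for xi, si, gi in zip(x, scale, g)]
print(x)  # both components land on their minimizers despite the 10^34 spread in scales
```

For this quadratic the preconditioned method reaches the minimum in a single step, whereas an unpreconditioned step of any fixed length would overshoot one axis or barely move the other.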

Uncentered variables

Above we considered an example where the variables were wildly different in magnitude.
Now we consider a situation where it is hard to determine the exact value of a variable's scale.

Suppose that a metal trader tries to determine the optimal selling price for several steel sheets.
The purchase price was 1000 EUR, but because of a decline in prices the retail markup cannot be large and may even be negative.
To determine the optimal price, he seeks the extremum of a penalty function which accounts for the expected selling time, taxes, etc.
One of the variables is the selling price itself, which can be between 990 EUR and 1010 EUR.
The question is: what is the scale of this variable?

At first sight we might think that the scale of the variable is equal to its magnitude, about 1000 EUR.
But in fact we have an interesting situation here: the scale of the variable is many times smaller than its magnitude.
Our trader cannot raise the price above 1010 EUR, because it would be impossible to sell the goods at such a high price.
Neither can he sell below 990 EUR, because the losses would be too large.
Obviously, the scale of the variable is about 10 EUR, which is 100 times smaller than its typical magnitude.
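A simple rule of thumb suggested by this example (our illustration, not a prescription from ALGLIB) is to take the scale from the width of the feasible range rather than from the magnitude of the variable:

```python
# Scale from the feasible range, not from the magnitude: the price can
# only move within [990, 1010] EUR, so a "meaningful change" in it is
# on the order of the half-width of that interval.
lo, hi = 990.0, 1010.0
magnitude = (lo + hi) / 2.0   # ~1000 EUR, the typical value of the variable
scale = (hi - lo) / 2.0       # 10 EUR, the scale we should report to the optimizer
print(magnitude, scale)       # → 1000.0 10.0
```

Passing 10 rather than 1000 as the scale makes a 0.001-scaled stopping threshold mean "price settled to within a cent", which matches the trader's actual precision needs.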