A novel system for robust data analysis yielding maximum
information

History

Mathematical gnostics is an alternative (non-statistical) approach based on a new
paradigm of quantitative uncertainty. It originated at the borders of several scientific
fields, inspired by ideas and methods of mathematics and geometry, measurement theory,
thermodynamics, mechanics and statistics. Its development was connected with the
professional activity of its author, Pavel Kovanic, as reflected in his bibliography.
Educated in high-voltage electrical engineering and retrained in nuclear technology,
he was attracted by problems of nuclear reactor control and statistical data treatment
in nuclear research. The politically motivated end of his career in nuclear
engineering, however, brought the opportunity to work among specialists in statistics
and cybernetics at the Institute of Information Theory and Automation of the
Czechoslovak Academy of Sciences, Prague.

Experience with multidimensional statistical models resulted in the
Minimum-Penalty Estimate, which made it possible to optimize the compromise between
unbiased and biased minimum-variance estimation. However, it also led to the
understanding that information is lost unnecessarily when all data, „bad“ and
„good“ alike, are given the same weight determined by the „collective“ variance
of the data sample.

Robust statistics could improve the estimate by giving each data item an individual
weight determined by its estimated error. However, this idea produced a multitude of
„influence“ functions, each valid under specific assumptions about the data
model and not working elsewhere. Robustness with respect to outliers improved, but
a new kind of non-robustness arose: dependence on subjective assumptions about the
nature of the data.
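
The individual weighting described above can be illustrated with a minimal Python sketch. It uses Huber's weight function, which is only one of the many influence functions of robust statistics and is not part of mathematical gnostics. The function name `huber_location`, the tuning constant `k = 1.345`, the MAD-based scale and the sample values are all assumptions chosen for illustration.

```python
import statistics

def huber_location(data, k=1.345, tol=1e-9, max_iter=100):
    """Location estimate by iteratively reweighted averaging with Huber weights.

    k is a subjective tuning constant (assumed here): residuals larger than
    k scale units are down-weighted, the rest keep full weight.
    """
    mu = statistics.median(data)
    # Robust scale from the median absolute deviation (MAD); the factor
    # 1.4826 makes it comparable to a standard deviation for normal data.
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad or 1.0  # fall back to 1.0 if the MAD is zero
    weights = [1.0] * len(data)
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, weights

# Hypothetical sample: five consistent readings and one outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
mean = statistics.fmean(data)        # equal weights for all data
mu, weights = huber_location(data)   # individual weights
print(f"mean = {mean:.3f}, Huber estimate = {mu:.3f}")
print("weights:", [round(w, 3) for w in weights])
```

The outlier receives a small weight and the estimate stays near the bulk of the data, but the result depends on the chosen `k` and on the scale estimate, which is the kind of subjective assumption the paragraph above refers to.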

There was also a fundamental problem: non-linear data weighting is equivalent
to introducing a Riemannian metric in place of the Euclidean one, which lies at the
foundation of statistics. But according to B. Riemann, determining the metric of a
real curved space should not be a task for mathematicians: „Metrics are given
objectively by laws of Nature“. This idea was confirmed, e.g., by the Minkowskian
metric of the special theory of relativity, determined by the finite speed of light,
as well as by the metric of the cosmos, determined by gravitational fields in
Einstein’s theory of gravitation.

Another problem of the statistical approach was the reliance of many proofs of
statistical statements on the Central Limit Theorem, the validity of which is limited
to „large“ randomly selected data samples from a distribution for which the
mean and standard deviation exist. But many applications do not support such a data
model. It was obvious that a Law of Nature applicable more universally, even to
individual uncertain data and to small data samples, had to be found to justify the
measurement and composition of uncertain data.
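
The limitation of the Central Limit Theorem mentioned above can be made concrete with a small simulation, given here as a sketch rather than as part of the gnostic methodology. It compares the spread of sample means for normal data with that for Cauchy data, a distribution with no finite mean or standard deviation. The helper names, the seed, the sample sizes and the number of replications are assumptions chosen for illustration.

```python
import math
import random
import statistics

def iqr(values):
    """Interquartile range, a spread measure that exists even for heavy tails."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Inverse-CDF sampling of the standard Cauchy distribution.
    return math.tan(math.pi * (rng.random() - 0.5))

def spread_of_sample_means(draw, n, replications, rng):
    """IQR of the sample mean over repeated samples of size n."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(replications)]
    return iqr(means)

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (10, 1000):
    results[n] = (
        spread_of_sample_means(draw_normal, n, 200, rng),
        spread_of_sample_means(draw_cauchy, n, 200, rng),
    )
    print(f"n={n}: normal IQR={results[n][0]:.3f}, Cauchy IQR={results[n][1]:.3f}")
```

For the normal data the spread of the sample mean shrinks as the sample grows, while for the Cauchy data it does not, so averaging more such data does not reduce the uncertainty of the mean.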

This motivation led to the gnostic theory of individual uncertain data and small
samples. The metric of the space of uncertain data has been shown to result from
structural features of properly quantified real uncertain data. Extremals of this
space describe the nature of data uncertainty and make it possible to determine the
optimum estimation path that minimizes the data uncertainty. The entropy increase and
information loss caused by the uncertainty can be derived by using the classical
(non-statistical, Clausius’s) entropy for an individual uncertain data item. The
probability distribution of such a data item is the final result of proving the
equation of the mutual conversion of entropy and information (recalling the idea of
„Maxwell’s demon“). The fundamental characteristics of an uncertain data
item (the irrelevance, i.e. the „data error“, and its integral, the data weight)
are shown to be isomorphic with the momentum and energy of a free relativistic
particle. This mapping is Lorentz-invariant, i.e. valid for all amounts of uncertainty
(and for all corresponding velocities of the particle). The Lorentz-invariant
uncertainty characteristics (the „quantifying“ ones), irrelevance and data
weight, have their estimating counterparts. Estimating characteristics differ
from quantifying ones in their natural robustness: estimation is robust with respect
to outliers, while quantification is robust with respect to inliers (increasing the
weights of peripheral data).

The mapping between uncertainty and mechanics implies the validity of the additive
composition of irrelevances, and of the same law for data weights, in the
quantification process. This means that the composition law for uncertain data is
justified by the Energy-Momentum Conservation Law of relativistic mechanics.
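
For orientation, the textbook relations of relativistic mechanics to which this argument appeals can be written out as follows. The symbols (energy E, momentum p, rest mass m, speed of light c, rapidity φ) are those of standard special relativity and do not appear in the text above. Identifying data weight with energy and irrelevance with momentum is the claim made in the text, and the corresponding gnostic formulas are not given in this section.

```latex
\begin{align}
  E &= m c^{2}\cosh\varphi, & p\,c &= m c^{2}\sinh\varphi, \\
  E^{2} - (p c)^{2} &= (m c^{2})^{2}
      && \text{(the same in all inertial frames)}, \\
  E_{\mathrm{total}} &= \sum_{i} E_{i}, & p_{\mathrm{total}} &= \sum_{i} p_{i}
      && \text{(additive and conserved)}.
\end{align}
```

The last line is the additive composition that, according to the text, carries over to data weights and irrelevances.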

It is well known from the history of science that the promotion of a new paradigm
has always been a difficult process. No wonder that a paradigm attaching entropy,
information and probability to a single data item, justifying their non-linear
measurement by non-Euclidean geometries and supporting its statements by
thermodynamics and relativistic mechanics was not met favourably by a scientific
environment full of statisticians. Continuing this type of research in such an
environment was possible only thanks to the support of a few colleagues whose minds
were open enough to cross the boundary of the statistical paradigm. Further
support came from industry, where the new methodology, in the form of gnostic
software, was reaping successes.

The development of gnostic functions proceeded in parallel with the progress of
computing technology from small programmable calculators to modern PCs, because
the large computers running in batch mode did not provide the sufficiently fast
feedback required by the intricate functions.

Two modern mathematical and statistical computing environments deserve to be
mentioned in this connection:

S-PLUS™ (www.insightful.com), whose S language was used over the long term to
develop a broad range of gnostic functions. Many tests and applications were thus
enabled.

The environment of the R-project (www.r-project.org). R is a language and environment for
statistical computing and graphics. It is a GNU project which is similar to the S
language and environment. R is available as Free Software under the terms of the
Free Software Foundation’s GNU General Public License in source code
form.

Starting in 2000, Health Risk Assessment and the monitoring of
environmental pollution became the prevailing focus of applications of
mathematical gnostics. Cooperation with the Institute of Public Health, Ostrava
(http://www.zuova.cz/) turned out to be fruitful for both the development and the
application of mathematical gnostics within the framework of three research projects
of the European Union: