Content awareness

Descriptive Statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are thought to represent.

References: Descriptive Statistics

Probability theory

Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena.
The central objects of probability theory are random variables, stochastic processes,
and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion.

historical references

Calculation, measures, units

Calculation is a basic step in mathematics.
Many systems of measures have been used; the decimal approach has made them common.
Numbers like 12 (divisible by 2, 3, 4, 6) and 60 (divisible by 2, 3, 4, 5, 6) divide more evenly than 10 (2, 5), yet base-10 calculation has become the commonly accepted standard.
The main exception is the technical computer approach, which is binary based; its notations are mostly hexadecimal.

0_(number) The Greeks and Romans did not use a decimal system with a placeholder zero. The decimal system has a different origin.

Trigonometry
has not been influenced much by the decimal approach. Radians (based on π) have had more influence.
Angles are still measured in degrees (360 to a full circle) or in multiples of 2π radians. A French approach, the gradian, divides the circle into 400.
The measurement of the Earth has become very accurate with GPS.

Time and the calendar did not change to a metric system, although it has been tried:
French_Republican_Calendar
We still use hours of 60 minutes and minutes of 60 seconds. This base-60 division works conveniently with locations (GIS) and positions/time on Earth.

Greek fundamentals

Pythagoras

The foundations of the Western world come from the ancient Greeks. The most famous is Pythagoras of Samos.
Pythagoras ho Samios, "Pythagoras the Samian" (b. about 570 – d. about 495 BC), was an Ionian Greek philosopher, mathematician, and founder of the religious movement called Pythagoreanism.
Most of the information about Pythagoras was written down centuries after he lived, so very little reliable information is known about him.

Mean

In statistics, mean has two related meanings:
the arithmetic mean (as distinguished from the geometric mean or harmonic mean);
the expected value of a random variable, which is also called the population mean.
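As a quick sketch, all three classical means can be computed with Python's standard statistics module (the data here is made up for illustration):

```python
import statistics

data = [2, 4, 8]

arithmetic = statistics.mean(data)           # (2 + 4 + 8) / 3
geometric = statistics.geometric_mean(data)  # (2 * 4 * 8) ** (1/3)
harmonic = statistics.harmonic_mean(data)    # 3 / (1/2 + 1/4 + 1/8)

print(arithmetic, geometric, harmonic)
```

For positive data the three always satisfy harmonic ≤ geometric ≤ arithmetic.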

Median

In statistics and probability theory, the median is the numerical value separating the higher half of a sample,
a population, or a probability distribution, from the lower half.
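A minimal illustration with Python's standard library (the example values are made up):

```python
import statistics

sample = [7, 1, 3, 9, 5]          # odd count: the middle of the sorted values
print(statistics.median(sample))  # sorted -> [1, 3, 5, 7, 9], so the median is 5

even = [1, 3, 5, 7]               # even count: mean of the two middle values
print(statistics.median(even))    # (3 + 5) / 2 = 4.0
```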

Normal

In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function,
known as the Gaussian function or, informally, the bell curve.
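The bell-shaped density can be written out directly. A minimal sketch of the Gaussian density f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The peak of the standard bell curve is at x = 0, and the curve is symmetric:
print(normal_pdf(0.0))  # 1 / sqrt(2 * pi), about 0.3989
```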

Poisson

In probability theory and statistics, the Poisson distribution (pronounced [pwasɔ̃]) is a discrete probability distribution that expresses the probability of a given number of events occurring
in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
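The probability mass function follows directly from that definition: P(k) = λᵏ e^(−λ) / k!, where λ is the known average rate. A short sketch (the rate of 3 events per interval is an arbitrary example):

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the average rate is lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

# With an average of 3 events per interval, the chance of exactly 2 events:
print(poisson_pmf(2, 3.0))  # 4.5 * e**-3, about 0.224
```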

Uniform

In probability theory and statistics, the discrete uniform distribution is a probability distribution whereby a
finite number of equally spaced values are equally likely to be observed; every one of n values has equal probability 1/n.
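For example, a fair six-sided die is a discrete uniform distribution with n = 6. A minimal sketch using exact fractions:

```python
from fractions import Fraction

n = 6  # a fair die: each of the n outcomes has probability 1/n
pmf = {outcome: Fraction(1, n) for outcome in range(1, n + 1)}

print(pmf[3])             # 1/6
print(sum(pmf.values()))  # the probabilities sum to 1
```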

Skewness

In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.

Kurtosis

In probability theory and statistics, kurtosis is any measure of the "peakedness" of the probability distribution of a real-valued random variable.[1] In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population.
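One common quantification of both shape descriptors uses central moments: skewness as m₃/m₂^(3/2) and kurtosis as m₄/m₂² (the normal distribution scores 3 on this kurtosis scale). A pure-Python sketch with made-up data:

```python
def shape_moments(data):
    """Moment-based skewness (m3 / m2**1.5) and kurtosis (m4 / m2**2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

skewness, kurtosis = shape_moments([1, 2, 3, 4, 5])
print(skewness, kurtosis)  # symmetric data gives skewness 0; kurtosis is 1.7 here
```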

Chi-squared

In probability theory and statistics, the chi-squared distribution (also chi-square or χ²-distribution) with
k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
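That definition can be checked empirically: sum the squares of k standard normal draws, and the average of many such sums should approach k, the expected value of a χ² variable. A small sketch:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def chi_squared_sample(k):
    """One draw from a chi-squared distribution with k degrees of freedom:
    the sum of squares of k independent standard normal variables."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

k, n = 3, 20_000
sample_mean = sum(chi_squared_sample(k) for _ in range(n)) / n
print(sample_mean)  # close to k = 3, the mean of the chi-squared distribution
```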

F-test

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis.
It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled.
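The simplest instance is the equality-of-variances test, where the statistic is just the ratio of two sample variances; under the null hypothesis of equal variances it follows an F-distribution. A sketch with made-up samples:

```python
import statistics

a = [3.1, 4.2, 5.0, 3.8, 4.4]  # widely spread sample
b = [3.9, 4.0, 4.1, 4.0, 3.9]  # tightly clustered sample

# F statistic for comparing the two sample variances:
f_stat = statistics.variance(a) / statistics.variance(b)
print(f_stat)  # far above 1, suggesting the variances differ
```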

Collinearity, hidden variables

In geometry, collinearity is a property of a set of points, specifically, the property of lying on a single line. A set of points with this property is said to be collinear (often misspelled as, but should not be confused with, co-linear or colinear).

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data.
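Multicollinearity can be spotted by checking pairwise correlations between predictors. A pure-Python sketch of the Pearson correlation coefficient, applied to two perfectly collinear made-up predictors:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1 * v + 0.5 for v in x1]  # x2 is an exact linear function of x1

print(pearson_r(x1, x2))  # essentially 1.0: the predictors are collinear
```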

ANOVA, correlation, clustering

Mahalanobis distance

In statistics, Mahalanobis distance
is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables by which different patterns can be identified and analyzed.
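A sketch for the two-dimensional case, with the 2×2 covariance matrix inverted by hand; with an identity covariance the Mahalanobis distance reduces to the ordinary Euclidean distance (the numbers are made up):

```python
import math

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a distribution with the
    given mean vector and 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse covariance
    dx = [point[0] - mean[0], point[1] - mean[1]]
    # squared distance: dx^T * inv(cov) * dx
    d2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(d2)

identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))  # 5.0, the Euclidean distance
```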

Operational

Decisions Operations

Operational research

Operations research, or operational research in British usage, is a discipline that deals with the application of advanced analytical methods to help make better decisions.[1] It is often considered to be a sub-field of mathematics. The terms management science and decision science are sometimes used as more modern-sounding synonyms.

MPS, LP (linear programming), AHP

OODA loop

The OODA loop (for observe, orient, decide, and act) is a concept originally applied to the combat operations process,
often at the strategic level in military operations. It is now also often applied to understand commercial operations and learning processes.

Monte Carlo, Las Vegas

A randomized algorithm is an algorithm which employs a degree of randomness as part of its logic.
The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits.
Formally, the algorithm's performance will be a random variable determined by the random bits; thus either the running time, or the output (or both) are random variables.
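The distinction usually drawn is that Monte Carlo algorithms may return an approximate answer in bounded time, while Las Vegas algorithms always return a correct answer but take a random amount of time. The classic Monte Carlo illustration is estimating π by random sampling:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Sample points uniformly in the unit square; the fraction landing inside
# the quarter circle of radius 1 approaches pi/4 as the sample grows.
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
estimate = 4.0 * inside / n

print(estimate)  # roughly 3.14; accuracy improves with more samples
```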

CHAID

CHAID is a type of decision tree technique, based upon adjusted significance testing (Bonferroni testing).

Analytics, analysis (inferential)

CRISP-DM

The CRISP-DM methodology is described in terms of a hierarchical process model, consisting of sets of tasks described at four
levels of abstraction (from general to specific): phase, generic task, specialized task, and process instance

At the top level, the data mining process is organized into a number of phases; each phase consists of several second-level generic tasks. This second level is called generic because it is intended to be general enough to cover all possible data mining situations.

SEMMA

SEMMA is an acronym that stands for Sample, Explore, Modify, Model and Assess. It is a list of sequential steps developed by SAS Institute Inc., one of the largest producers of business intelligence software, to guide the implementation of data mining applications.[1] Although SEMMA is often considered a general data mining methodology,
SAS claims that it is "rather a logical organisation of the functional tool set of" one of their products, SAS Enterprise Miner, "for carrying out the core tasks of data mining".

SQL scoring

Statistical_classification In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
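A minimal sketch of the idea, using a nearest-neighbour rule: the new observation receives the category of the closest observation in the training set (the points and labels are made up):

```python
def classify_1nn(training, new_point):
    """Assign new_point the label of its nearest training observation.
    training is a list of ((x, y), label) pairs."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    _, label = min(training, key=lambda item: dist2(item[0], new_point))
    return label

training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
            ((5.0, 5.0), "B"), ((5.3, 4.7), "B")]

print(classify_1nn(training, (1.1, 0.9)))  # "A"
print(classify_1nn(training, (4.9, 5.2)))  # "B"
```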

PMML

The Predictive Model Markup Language (PMML) is an XML-based markup language developed by the Data Mining Group (DMG) to provide a way for applications to
define models related to predictive analytics and data mining and to share those models between PMML-compliant applications.
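As an illustration only (the field names, model, and coefficients below are invented; the element names follow the PMML specification), a minimal PMML document for a simple linear regression model might look roughly like this:

```xml
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Illustrative linear regression model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" modelName="demo">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
```

Any PMML-compliant scoring engine should be able to load such a document and score new rows with it.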

The Data Mining Group (DMG) is an independent, vendor-led consortium that develops data mining standards, such as the Predictive Model Markup Language. Disappointingly, the dates mentioned there are old (2010).

Predicting the future - PMML

PMML is a standard to help deploy (score) data mining models. Part 1 offered a general overview of predictive analytics. Part 2 focused on predictive modeling techniques, the mathematical algorithms that make up the core of predictive analytics. Part 3 put those techniques to use and described the making of a predictive solution.

Treatment of missing data

In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values.
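A minimal sketch of the simplest strategy, mean imputation, where each missing value is replaced by the mean of the observed values (missing entries are represented as None here):

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(impute_mean([1.0, None, 3.0, None, 5.0]))  # [1.0, 3.0, 3.0, 3.0, 5.0]
```

Mean imputation keeps every case but shrinks the variance of the variable, which is one of the pitfalls more sophisticated methods try to avoid.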

Ensemble Models

In statistics and machine learning, ensemble methods
use multiple models to obtain better predictive performance than could be obtained from any of the constituent models.[1][2][3] Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble refers only to a concrete finite set of alternative models, but typically allows for much more flexible structure to exist between those alternatives.
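The simplest ensemble combiner is a majority vote over the constituent models' predictions, sketched here with invented model outputs:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine predictions from several models by taking the most common label."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers label the same observation:
model_outputs = ["spam", "spam", "ham"]
print(majority_vote(model_outputs))  # "spam"
```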

Game

Game theory

Classic theory

Gamification

Gamification is the use of game thinking and game mechanics in a non-game context in order to engage users and solve problems. Gamification is used in applications and processes to improve user engagement, return on investment, data quality, timeliness, and learning.

Six_Sigma

Control charts, also known as Shewhart charts (after Walter A. Shewhart) or process-behavior charts, are tools used in statistical process control to determine if a manufacturing or business process is in a state of statistical control.
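A common Shewhart rule places the control limits at the process mean plus or minus three standard deviations; points outside those limits signal that the process may be out of control. A sketch with made-up measurements:

```python
import statistics

measurements = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 10.0]

center = statistics.mean(measurements)
sigma = statistics.stdev(measurements)
ucl = center + 3 * sigma  # upper control limit
lcl = center - 3 * sigma  # lower control limit

out_of_control = [x for x in measurements if not lcl <= x <= ucl]
print(out_of_control)  # empty list: the process is in statistical control
```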