VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions

An aggregate function that returns the
variance of a set of
numbers. This is a mathematical property that signifies how far the values spread apart from the mean. The
return value can be zero (if the input is a single value, or a set of identical values), or a positive number
otherwise.

This function is typically used in mathematical formulas related to probability distributions.

The VARIANCE_SAMP() and VARIANCE_POP() functions compute the sample
variance and population variance, respectively, of the input values. (VARIANCE() is an alias
for VARIANCE_SAMP().) Both functions evaluate all input rows matched by the query. The
difference is that STDDEV_SAMP() is scaled by 1/(N-1) while
STDDEV_POP() is scaled by 1/N.

The functions VAR_SAMP() and VAR_POP() are the same as
VARIANCE_SAMP() and VARIANCE_POP(), respectively. These aliases are
available in Impala 2.0 and later.

If no input rows match the query, the result of any of these functions is NULL. If a single
input row matches the query, the result of any of these functions is "0.0".

Examples:

This example demonstrates how VARIANCE() and VARIANCE_SAMP() return the
same result, while VARIANCE_POP() uses a slightly different calculation to reflect that the
input data is considered part of a larger "population".

This function cannot be used in an analytic context. That is, the OVER() clause is not allowed at all with this function.

Related information:

The STDDEV(), STDDEV_POP(), and STDDEV_SAMP() functions
compute the standard deviation (square root of the variance) based on the results of
VARIANCE(), VARIANCE_POP(), and VARIANCE_SAMP()
respectively. See STDDEV, STDDEV_SAMP, STDDEV_POP Functions for details about the standard deviation property.