This chapter describes the statistical functions provided by the PLT Scheme Science Collection. The basic statistical functions include functions to compute the mean, variance, and standard deviation. More advanced functions allow you to calculate absolute deviation, skewness, and kurtosis, as well as the median and arbitrary percentiles. The algorithms use recurrence relations to compute average quantities in a stable way, without large intermediate values that might overflow.
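
To illustrate the idea of a recurrence relation for averages, here is a minimal sketch of a running mean. It is not taken from the collection's source; the name running-mean is hypothetical, and the update rule shown is the textbook one, assumed here for illustration.

```scheme
#lang scheme

;; Running-mean sketch (hypothetical helper, not the library's code).
;; Each step applies  mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k,
;; so the full sum of the data is never formed.
(define (running-mean data)            ; data: non-empty vector of reals
  (for/fold ((m 0.0))
            ((x (in-vector data))
             (k (in-naturals 1)))      ; k counts elements seen so far
    (+ m (/ (- x m) k))))

(running-mean (vector 1.0 2.0 3.0 4.0))   ; => 2.5
```

Because the accumulator always holds a value within the range of the data, it cannot overflow the way a plain running sum can.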

The functions described in this chapter are defined in the statistics.ss file in the science collection and are made available using the following form:

This function returns the absolute deviation of data relative to the given value of the mean, mu. If mu is not provided, it is calculated by a call to (mean data). This function is also useful if you want to calculate the absolute deviation relative to any value other than the mean, such as zero or the median.
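
As a rough sketch of the quantity described above, the following computes the mean of |x - mu| over a vector. The helper name abs-dev is hypothetical and the code assumes the usual definition of absolute deviation; it is not the collection's implementation.

```scheme
#lang scheme

;; Mean of |x - mu| over data (hypothetical helper).
(define (abs-dev data mu)
  (/ (for/fold ((s 0.0)) ((x (in-vector data)))
       (+ s (abs (- x mu))))
     (vector-length data)))

(abs-dev (vector 1.0 2.0 3.0 4.0) 2.5)   ; about the mean => 1.0
(abs-dev (vector 1.0 2.0 3.0 4.0) 0.0)   ; about zero     => 2.5
```

The second call shows the case mentioned in the text, where the reference value is something other than the mean.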

The skewness measures the symmetry of the tails of a distribution. This function returns the skewness of data using the given values of the mean, mu, and the standard deviation, sigma. If mu and sigma are not provided, they are calculated by calls to (mean data) and (standard-deviation data mu).
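
A minimal sketch of skewness as the mean of the cubed standardized deviations, assuming that common definition. The helper name skew is hypothetical and this is not the collection's source.

```scheme
#lang scheme

;; Mean of ((x - mu) / sigma)^3 over data (hypothetical helper).
(define (skew data mu sigma)
  (/ (for/fold ((s 0.0)) ((x (in-vector data)))
       (let ((z (/ (- x mu) sigma)))
         (+ s (* z z z))))
     (vector-length data)))

(skew (vector 1.0 2.0 3.0) 2.0 1.0)   ; symmetric data => 0.0
;; sigma is set to 1.0 below only to keep the arithmetic simple
(skew (vector 1.0 1.0 4.0) 2.0 1.0)   ; long right tail => 2.0
```

Symmetric data gives zero, and a longer right-hand tail gives a positive value.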

The kurtosis measures how sharply peaked a distribution is relative to its width. This function returns the kurtosis of data using the given values of the mean, mu, and the standard deviation, sigma. If mu and sigma are not provided, they are calculated by calls to (mean data) and (standard-deviation data mu).
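
A short sketch of kurtosis as the mean fourth power of the standardized deviations. Subtracting 3, so that a Gaussian distribution scores zero, is an assumed convention that the text above does not state; the helper name kurt is hypothetical.

```scheme
#lang scheme

;; Mean of ((x - mu) / sigma)^4, minus 3 (hypothetical helper).
;; The "- 3" normalization is an assumption, not confirmed by the text.
(define (kurt data mu sigma)
  (- (/ (for/fold ((s 0.0)) ((x (in-vector data)))
          (+ s (expt (/ (- x mu) sigma) 4)))
        (vector-length data))
     3.0))

(kurt (vector 1.0 2.0 3.0) 2.0 1.0)   ; negative: flatter than a Gaussian
```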

This function returns the covariance of data1 and data2, which must both be the same length, using the given values of the means, mu1 and mu2. If the values of mu1 and mu2 are not given, they are calculated using calls to (mean data1) and (mean data2), respectively.
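
The following sketch shows one common form of the sample covariance. The n - 1 divisor is an assumption, since the text above does not say which normalization is used, and the helper name cov is hypothetical.

```scheme
#lang scheme

;; Sum of (x - mu1)(y - mu2) divided by n - 1 (hypothetical helper).
;; The n - 1 divisor is assumed; data1 and data2 must have equal length.
(define (cov data1 data2 mu1 mu2)
  (/ (for/fold ((s 0.0))
               ((x (in-vector data1))
                (y (in-vector data2)))
       (+ s (* (- x mu1) (- y mu2))))
     (- (vector-length data1) 1)))

(cov (vector 1.0 2.0 3.0) (vector 2.0 4.0 6.0) 2.0 4.0)   ; => 2.0
```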

This function returns the weighted standard deviation of data using weights w with a fixed weighted population mean, wmu. The result is the square root of the value returned by weighted-variance-with-fixed-mean.
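
As an illustration, a weighted standard deviation about a fixed mean can be sketched as the square root of the weight-normalized sum of squared deviations. That formula is assumed here, the helper name wsd-fixed-mean is hypothetical, and the code is not the collection's implementation.

```scheme
#lang scheme

;; sqrt( sum(w * (x - wmu)^2) / sum(w) )  (hypothetical helper).
(define (wsd-fixed-mean w data wmu)
  (let-values (((num den)
                (for/fold ((num 0.0) (den 0.0))
                          ((wi (in-vector w))
                           (x  (in-vector data)))
                  (values (+ num (* wi (- x wmu) (- x wmu)))
                          (+ den wi)))))
    (sqrt (/ num den))))

;; Weighted sum of squares is 2.0, total weight is 4.0: result is (sqrt 0.5)
(wsd-fixed-mean (vector 1.0 1.0 2.0) (vector 1.0 3.0 2.0) 2.0)
```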

This function returns the weighted absolute deviation of data using weights w relative to the given value of the weighted mean, wmu. If wmu is not provided, it is calculated by a call to (weighted-mean w data). This function is also useful if you want to calculate the weighted absolute deviation relative to any value other than the weighted mean, such as zero or the weighted median.

The skewness measures the symmetry of the tails of a distribution. This function returns the weighted skewness of data using weights w and the given values of the weighted mean, wmu, and the weighted standard deviation, wsigma. If wmu and wsigma are not provided, they are calculated by calls to (weighted-mean w data) and (weighted-standard-deviation w data wmu).

The kurtosis measures how sharply peaked a distribution is relative to its width. This function returns the weighted kurtosis of data using weights w and the given values of the weighted mean, wmu, and the weighted standard deviation, wsigma. If wmu and wsigma are not provided, they are calculated by calls to (weighted-mean w data) and (weighted-standard-deviation w data wmu).
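
Weighted skewness and weighted kurtosis can both be viewed as weight-normalized averages of a power of the standardized deviation. The sketch below assumes that form, with power 3 for skewness and power 4 (less 3, an assumed normalization) for kurtosis. The helper name weighted-moment is hypothetical.

```scheme
#lang scheme

;; sum(w * ((x - wmu) / wsigma)^p) / sum(w)  (hypothetical helper).
(define (weighted-moment w data wmu wsigma p)
  (let-values (((num den)
                (for/fold ((num 0.0) (den 0.0))
                          ((wi (in-vector w))
                           (x  (in-vector data)))
                  (values (+ num (* wi (expt (/ (- x wmu) wsigma) p)))
                          (+ den wi)))))
    (/ num den)))

(define w    (vector 1.0 1.0 2.0))
(define data (vector 1.0 3.0 2.0))

(weighted-moment w data 2.0 1.0 3)           ; skewness-style value => 0.0
(- (weighted-moment w data 2.0 1.0 4) 3.0)   ; kurtosis-style value => -2.5
```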

The median and percentile functions described in this section operate on sorted data. The contracts for these functions enforce this. Also, for convenience we use quantiles measured on a scale of 0 to 1 instead of percentiles (which use a scale of 0 to 100).

median-from-sorted-data

Function:

(median-from-sorted-data sorted-data)

Contract:

(-> (and/c non-empty-vector-of-reals? sorted?)
    real?)

This function returns the median value of sorted-data. When the dataset has an odd number of elements, the median is the value of element (n - 1)/2. When the dataset has an even number of elements, the median is the mean of the two nearest middle values, elements (n - 1)/2 and n/2. Here n is the number of elements, indices are zero-based, and the divisions are integer divisions.
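
The indexing rule above can be sketched as follows. The helper name median-sketch is hypothetical; the code assumes zero-based indices and integer division, and it is not the collection's implementation.

```scheme
#lang scheme

;; Median of an already sorted, non-empty vector (hypothetical helper).
;; For odd n, lo and hi are the same index, so the middle element is returned.
(define (median-sketch sorted)
  (let* ((n  (vector-length sorted))
         (lo (quotient (- n 1) 2))
         (hi (quotient n 2)))
    (/ (+ (vector-ref sorted lo) (vector-ref sorted hi)) 2.0)))

(median-sketch (vector 1.0 2.0 4.0))       ; odd n  => 2.0
(median-sketch (vector 1.0 2.0 4.0 8.0))   ; even n => 3.0
```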

quantile-from-sorted-data

Function:

(quantile-from-sorted-data sorted-data f)

Contract:

(-> (and/c non-empty-vector-of-reals? sorted?)
    (real-in 0.0 1.0) real?)

This function returns a quantile value of sorted-data. The quantile is determined by the value f, a fraction between 0 and 1. For example, to compute the value of the 75th percentile, f should have the value 0.75.
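
The text does not say how values that fall between two elements are handled. One common approach, assumed here purely for illustration, is linear interpolation between the two neighboring elements at position f(n - 1). The helper name quantile-sketch is hypothetical.

```scheme
#lang scheme

;; Quantile by linear interpolation (hypothetical helper; the interpolation
;; rule is an assumption). sorted must be a sorted, non-empty vector and
;; f a real number between 0 and 1.
(define (quantile-sketch sorted f)
  (let* ((n     (vector-length sorted))
         (pos   (* f (- n 1)))                 ; fractional index
         (i     (inexact->exact (floor pos)))  ; lower neighbor
         (delta (- pos i)))                    ; distance past it
    (if (= i (- n 1))
        (vector-ref sorted i)                  ; f = 1: last element
        (+ (* (- 1 delta) (vector-ref sorted i))
           (* delta (vector-ref sorted (+ i 1)))))))

(quantile-sketch (vector 1.0 2.0 4.0 8.0) 0.75)   ; => 5.0
(quantile-sketch (vector 1.0 2.0 4.0 8.0) 0.5)    ; => 3.0, the median
```

Under this rule, f = 0.5 reproduces the median described earlier in this section.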

This example generates two vectors of data from a unit Gaussian distribution and a vector of cosine squared weighting data. All of the vectors are of length 1,000. These data are used to test all of the statistics functions.