Statistics Theory and Applications

This page details some of the theory behind various common statistical methods and models. The material presented here should be considered work-in-progress and is constantly evolving. Most of the work included will not be original. Instead, I aim to reference and briefly describe important literature in various topics of statistical methodology. Web links to software implementations will be provided whenever possible. I am currently working on the linear regression section and the various regularisation approaches for linear regression in the large p (number of covariates) and small n (sample size) setting.

Linear Regression

Linear regression is perhaps the most commonly used statistical model in practice. The standard multiple linear regression model for explaining a vector of observed data $y \in \mathbb{R}^n$ can be written as

$y = X_\gamma \beta_\gamma + \varepsilon$

where

$\beta_\gamma$ is the vector of linear regression coefficients,

$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ are i.i.d. variates distributed as per $\varepsilon \sim N_n(0, \sigma^2 I_n)$,

$I_k$ denotes the $(k \times k)$ identity matrix,

$\gamma$ is an index vector determining which regressors comprise the design matrix $X_\gamma$.

The design matrix comprising all regressors is denoted as

$X = (x_1, \ldots, x_p)$

where $x_j \in \mathbb{R}^n$ and $p$ is the maximum number of candidate regressors. The set $\gamma \subseteq \{1, \ldots, p\}$ then indexes any possible design matrix $X_\gamma$ that can be derived from the full matrix $X$. The aim in linear regression is to estimate the unknown parameters $\beta_\gamma$ and $\sigma^2$ as well as determine the optimal regressor subset $\gamma$.
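As a concrete illustration of this indexing scheme, the following NumPy sketch builds a full design matrix and extracts the submatrix for one candidate subset (the data and the particular subset are hypothetical, chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 6  # sample size and number of candidate regressors

# Full design matrix X with p candidate regressors as columns
X = rng.normal(size=(n, p))

# An index vector gamma selecting which regressors enter the model
gamma = [0, 2, 5]        # hypothetical subset of the p candidates
X_gamma = X[:, gamma]    # design matrix for this subset

print(X_gamma.shape)     # (50, 3)
```

Each choice of `gamma` yields a different candidate model; subset selection amounts to searching over such index sets.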

A popular method for estimating the regression parameters is Fisher's maximum likelihood approach. The idea is to set the regression coefficients to the values that maximise the likelihood of the observed data. In the case of linear regression, the maximum likelihood estimates exist in closed form provided that (1) the number of regressors does not exceed the sample size, and (2) the regressors are not highly correlated (so that $X_\gamma^T X_\gamma$ is invertible). The maximum likelihood estimator can be written as

$\hat{\beta}_\gamma = (X_\gamma^T X_\gamma)^{-1} X_\gamma^T y$

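The closed-form estimator above is straightforward to compute directly. A minimal NumPy sketch, on simulated data with known coefficients (all values here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.5])
sigma = 0.1
y = X @ beta_true + sigma * rng.normal(size=n)

# Closed-form ML (= least squares) estimate: (X^T X)^{-1} X^T y,
# computed by solving the normal equations rather than forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# ML estimate of the noise variance (note: divides by n, not n - k)
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / n
```

With $n = 100$ observations and little noise, `beta_hat` recovers `beta_true` closely; in practice `np.linalg.lstsq` is the more numerically robust route than the normal equations.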
The maximum likelihood estimate of the regression parameters has some nice statistical properties. It is an unbiased estimator of the true regression coefficients, and it is strongly consistent provided that

$\lambda_{\min}(X^T X) \to \infty \quad \text{as } n \to \infty$

where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue.
I recommend the paper by Lai et al. for consistency proofs and further results (see References below). However, we cannot use maximum likelihood if:

the regressors are highly correlated, or

the number of regressors is greater than the number of samples.
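The second failure mode is easy to see numerically: when $p > n$, the $p \times p$ matrix $X^T X$ has rank at most $n$ and so cannot be inverted. A small NumPy demonstration (dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 20            # more regressors than samples
X = rng.normal(size=(n, p))

XtX = X.T @ X            # p x p, but rank at most n < p, hence singular
rank = np.linalg.matrix_rank(XtX)
print(rank)              # at most 10, so (X^T X)^{-1} does not exist
```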

Since maximum likelihood does not zero out regressors, we cannot use maximum likelihood alone to select the optimal regressor subset.

Recently, there has been a large amount of interest in regularisation approaches to linear regression. The idea here is to again maximise the log-likelihood, but subject to inequality constraints on the regression coefficients. The nature of the inequality constraints determines the properties of the resulting estimates and the type of regularisation. Some commonly used regularisation methods are discussed below:
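Before turning to the individual methods, the general mechanism can be sketched with the simplest case, an L2 (ridge-type) penalty, which also has a closed form: the penalty term makes $X^T X + \lambda I$ invertible even when $p > n$. This sketch is illustrative only (the penalty value and data are assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 20            # p > n: plain ML is unavailable here
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]   # only a few non-zero coefficients
y = X @ beta_true + 0.01 * rng.normal(size=n)

# Ridge (L2-penalised) estimate: adding lam * I regularises the
# normal equations, so a unique solution exists despite p > n
lam = 0.1
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Different constraint geometries (e.g. an L1 ball instead of an L2 ball) change the character of the solution, which is what distinguishes the methods discussed next.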