STAT7005 Multivariate Methods

Chapter 2

2 Multivariate Normal and Related Distributions

In this course, all methods are based on the assumption that the underlying multivariate distribution is multivariate normal. In this chapter, we shall summarize some properties of this distribution, show how the maximum likelihood estimators of unknown parameters are derived and record the sampling distributions of the estimators for later reference. In this course, students are not required to be on top of the derivations or the mathematical details, but just to be aware of the results as a background for the forthcoming methods. We also consider assessment of the assumption of multivariate normality and the possible remedial transformations when the assumption is violated.

2.1 Multivariate Normal Distribution

Definition. A random vector x is said to have a multivariate normal distribution (multinormal distribution) if every linear combination of its components has a univariate normal distribution.
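One consequence of this definition can be written out explicitly. The notation is an assumption, since the symbols are not introduced in the text above: $\mu$ is the mean vector of x, $\Sigma$ its covariance matrix, p its dimension, and a any fixed vector of coefficients.

```latex
\[
x \sim N_p(\mu, \Sigma)
\quad\Longrightarrow\quad
a^{\top} x \sim N\!\left(a^{\top}\mu,\; a^{\top}\Sigma\, a\right)
\quad \text{for every fixed } a \in \mathbb{R}^{p}.
\]
```

Taking a to be the j-th unit vector shows that each component of x is itself univariate normal, with mean $\mu_j$ and variance $\Sigma_{jj}$.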

Assessing Normality Assumption

Before any statistical modeling, it is crucial to verify whether the data at hand satisfy the underlying distributional assumptions. For most multivariate analyses, it is important that the data indeed follow the multivariate normal distribution, at least approximately if not exactly. Here are some commonly used methods.

1. Check each variable for univariate normality (necessary for multinormality but not sufficient) [Use either SAS procedure proc univariate or SAS/INSIGHT. To invoke the latter, we need to select buttons in the following sequence: Solution → Analysis → Interactive Data Analysis]

   - Q-Q plot (quantile against quantile plot) for normal distribution
   - sample quantiles are plotted against the theoretical quantiles of a standard normal distribution
   - a straight line indicates univariate normality
   - non-linear transformation on the variable may help to achieve the normality.

HKU STA7005 (2016-17, Semester 1)

A straight line indicates multivariate normality.

[Figure: four panels. (a) Skewed Bivariate Normal Distribution; (b) Bivariate Normal Distribution; (c) Chi-square QQ Plot of (a); (d) Chi-square QQ Plot of (b). In panels (c) and (d) the axes are Chi-square Quantile and Ordered Mahalanobis Distance.]

In this course, a SAS macro code to produce this Q-Q plot will be provided.

3. Check each Principal Component (PC) for univariate normality (necessary condition; and if the sample size n is large enough, a sufficient condition)

The PCs are readily available and their univariate normality easily checked by SAS/INSIGHT; otherwise the procedure proc princomp is required before we can use proc univariate to check normality.
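The two kinds of Q-Q plot used in this section can be sketched outside SAS as follows. This is a minimal illustration, not the course macro, and it rests on several assumptions that are not in the text: the plotting positions (i - 0.5)/n, the use of squared Mahalanobis distances (the quantity that is approximately chi-square with p degrees of freedom under multinormality), and the restriction to p = 2, where the chi-square quantile has the closed form -2 log(1 - p) so that only the Python standard library is needed. All function names and data are hypothetical.

```python
import math
import random
from statistics import NormalDist

def plotting_positions(n):
    """Probabilities (i - 0.5) / n, one common convention (assumed here)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def normal_qq(sample):
    """Standard normal quantiles paired with the ordered sample values."""
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf(p) for p in plotting_positions(len(sample))]
    return theoretical, sorted(sample)

def chisq_qq_bivariate(pairs):
    """Chi-square(2) quantiles paired with ordered squared Mahalanobis distances."""
    n = len(pairs)
    m1 = sum(x for x, _ in pairs) / n
    m2 = sum(y for _, y in pairs) / n
    # Sample covariance matrix entries (divisor n - 1).
    s11 = sum((x - m1) ** 2 for x, _ in pairs) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in pairs) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in pairs) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Quadratic form (x - m)' S^{-1} (x - m), using the explicit 2x2 inverse.
    d2 = sorted(
        (s22 * (x - m1) ** 2 - 2.0 * s12 * (x - m1) * (y - m2) + s11 * (y - m2) ** 2) / det
        for x, y in pairs
    )
    # With 2 degrees of freedom the chi-square quantile is -2 * log(1 - p).
    quantiles = [-2.0 * math.log(1.0 - p) for p in plotting_positions(n)]
    return quantiles, d2

def straightness(u, v):
    """Pearson correlation of the plotted points; values near 1 suggest a straight line."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Illustrative data only, not taken from the course.
symmetric = [-2.1, -1.3, -0.8, -0.4, -0.1, 0.1, 0.4, 0.8, 1.3, 2.1]
skewed = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 1.5, 2.8, 6.0, 15.0]
r_symmetric = straightness(*normal_qq(symmetric))
r_skewed = straightness(*normal_qq(skewed))

rng = random.Random(0)
pairs = []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    pairs.append((z1, 0.6 * z1 + 0.8 * z2))  # simulated correlated bivariate normal
quantiles, d2 = chisq_qq_bivariate(pairs)
r_bivariate = straightness(quantiles, d2)

print(round(r_symmetric, 3), round(r_skewed, 3), round(r_bivariate, 3))
```

The correlation of the plotted points is used here only as a rough numerical stand-in for judging straightness by eye. The same `normal_qq` helper could be applied to principal component scores for the third check, which is not shown.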

2.5 Transformations to Near Normality

To achieve multinormality of the data, a univariate transformation is applied to each variable individually. Afterwards, the multinormality of the transformed variables is checked again. The following transformations are commonly used in practice:


1. Use the most common transformation: log x or log(x + 1).

2. Choose a transformation based on theory; some examples are given below:

count data: $\sqrt{x}$

percentages or proportions: $\sin^{-1}\sqrt{x/100}$ or $\sin^{-1}\sqrt{x}$

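3. Use univariate Box-Cox transformation

The transformed x is denoted as $x^{(\lambda)}$ where

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0, \\[1ex]
\log x & \text{for } \lambda = 0,
\end{cases}
$$

and $\lambda$ is an unknown parameter. Typically, $\lambda$ can be chosen by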

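(a) a priori information about $\lambda$:

Power, $\lambda$     Transformation
 3                   $x^3$
 2                   $x^2$
 1                   $x$
 0.5                 $\sqrt{x}$
 0                   $\log x$
-0.5                 $1/\sqrt{x}$
-1                   $1/x$
-2                   $1/x^2$
-3                   $1/x^3$

(b) minimum sample variance of $x^{(\lambda)}$, i.e.,

$$
\hat{\lambda} = \arg\min_{\lambda} \sum_{i=1}^{n} \left( x_i^{(\lambda)} - \bar{x}^{(\lambda)} \right)^2 .
$$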

[Figure: two normal Q-Q plots against Normal Quantile. (a) Log-normal Distribution (Before Box-Cox Transformation); (b) Log-transformation ($\lambda = 0$) (After Box-Cox Transformation), with axis label log(x).]

4. Use multivariate Box-Cox transformation

Each variable is transformed by univariate Box-Cox transformation with different parameters.


The parameters $\lambda$'s are estimated jointly by maximum likelihood estimation.
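For reference, one standard form of the criterion maximized in the joint estimation is sketched below. The notation is an assumption and is not defined in the text above: $x_{ij}$ is the i-th observation on the j-th variable (i = 1, ..., n; j = 1, ..., p), $\lambda_j$ is the Box-Cox parameter of variable j, and $x_i^{(\lambda)}$ is the i-th observation vector after each component has been transformed with its own $\lambda_j$.

```latex
\[
\ell(\lambda_1,\dots,\lambda_p)
= -\frac{n}{2}\,\log\bigl|S(\lambda)\bigr|
+ \sum_{j=1}^{p} (\lambda_j - 1) \sum_{i=1}^{n} \log x_{ij},
\qquad
S(\lambda) = \frac{1}{n}\sum_{i=1}^{n}
\bigl(x_i^{(\lambda)} - \bar{x}^{(\lambda)}\bigr)
\bigl(x_i^{(\lambda)} - \bar{x}^{(\lambda)}\bigr)^{\top}.
\]
```

This is the multinormal log-likelihood with the mean vector and covariance matrix replaced by their maximum likelihood estimates, up to an additive constant. The second term comes from the Jacobian of the transformation, and for p = 1 the expression reduces to the univariate profile log-likelihood of a single $\lambda$.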

Note: The transformations above require $x \ge 0$ and some of them require $x > 0$. For more general transformation methods, see Sakia (1992, The Statistician, 169-178).
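The univariate Box-Cox family and the choice of its power can be sketched as follows. This is a minimal illustration under stated assumptions: the power is selected by maximizing the normal profile log-likelihood, which is the maximum likelihood approach named for the multivariate case and is not identical to the minimum-variance criterion in (b), because it includes a Jacobian term. The search is restricted to the table of common powers, the data are simulated, and the function names are hypothetical. The sketch also enforces x > 0, in line with the note above.

```python
import math
import random

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x with power lam."""
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def profile_loglik(data, lam):
    """Normal profile log-likelihood of lam, up to an additive constant."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    y_bar = sum(y) / n
    var_ml = sum((v - y_bar) ** 2 for v in y) / n  # ML variance (divisor n)
    jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(var_ml) + jacobian

# Candidate powers: the commonly used values from 3 down to -3.
ladder = [3, 2, 1, 0.5, 0, -0.5, -1, -2, -3]

rng = random.Random(1)
# Simulated log-normal data (illustrative only, not course data).
data = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(400)]

best_lam = max(ladder, key=lambda lam: profile_loglik(data, lam))
print("selected power:", best_lam)
```

Restricting the search to a short list of interpretable powers keeps the chosen transformation easy to explain; a finer grid or a numerical optimizer could be substituted if an unrestricted estimate is wanted.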