3 Description

Consider $p$ variables observed on $n_g$ populations or groups. Let $\bar{x}_j$ be the sample mean and $S_j$ the within-group variance-covariance matrix for the $j$th group, and let $x_k$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, $D_{kj}$:

$$D_{kj}^2 = (x_k - \bar{x}_j)^T S_j^{-1} (x_k - \bar{x}_j).$$

If the pooled estimate of the variance-covariance matrix, $S$, is used rather than the within-group variance-covariance matrices, then the distance is:

$$D_{kj}^2 = (x_k - \bar{x}_j)^T S^{-1} (x_k - \bar{x}_j).$$

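Instead of using the variance-covariance matrices $S$ and $S_j$, G03DBF uses the upper triangular matrices $R$ and $R_j$ supplied by G03DAF such that $S = R^T R$ and $S_j = R_j^T R_j$. Since $S^{-1} = R^{-1} R^{-T}$, $D_{kj}^2$ can then be calculated as $z^T z$, where $z$ solves the triangular system $R_j^T z = x_k - \bar{x}_j$ or $R^T z = x_k - \bar{x}_j$ as appropriate.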
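The triangular-solve calculation can be sketched in a few lines. This is a minimal illustration, not G03DBF's implementation: the function names, the nested-list representation of $R$, and the numbers are all hypothetical. Because $S = R^T R$, the sketch solves the lower triangular system $R^T z = x_k - \bar{x}_j$ by forward substitution and returns $z^T z$.

```python
def forward_solve_rt(r, d):
    """Solve R^T z = d by forward substitution; r is upper triangular (list of rows)."""
    p = len(d)
    z = [0.0] * p
    for i in range(p):
        # Entry (i, k) of R^T equals entry (k, i) of R.
        partial = sum(r[k][i] * z[k] for k in range(i))
        z[i] = (d[i] - partial) / r[i][i]
    return z


def mahalanobis_sq(r, x, mean):
    """Squared Mahalanobis distance of x from mean, given R with S = R^T R."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = forward_solve_rt(r, d)
    return sum(v * v for v in z)


# Hypothetical values chosen so the result is easy to check by hand:
# S = R^T R = [[4, 2], [2, 10]].
r = [[2.0, 1.0],
     [0.0, 3.0]]
x = [3.0, 5.0]
mean = [1.0, 1.0]
d2 = mahalanobis_sq(r, x, mean)
print(d2)
```

Working from the triangular factor avoids forming or inverting $S$ explicitly; the division by `r[i][i]` is also why zero diagonal elements of $R$ or $R_j$ cannot be accepted.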

A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the ith and jth groups is:

$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S_j^{-1} (\bar{x}_i - \bar{x}_j)$$

or

$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j).$$

Note: $D_{jj}^2 = 0$, and when the pooled variance-covariance matrix is used $D_{ij}^2 = D_{ji}^2$, so in this case only the lower triangular values of $D_{ij}^2$, $i > j$, are computed.
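A short worked step, using only the definitions above, shows why the symmetry holds for the pooled matrix:

```latex
\begin{aligned}
D_{ij}^2 &= (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j) \\
         &= \bigl(-(\bar{x}_j - \bar{x}_i)\bigr)^T S^{-1} \bigl(-(\bar{x}_j - \bar{x}_i)\bigr) \\
         &= (\bar{x}_j - \bar{x}_i)^T S^{-1} (\bar{x}_j - \bar{x}_i) = D_{ji}^2 .
\end{aligned}
```

With the within-group matrices the same argument does not go through: $D_{ij}^2$ is formed with $S_j^{-1}$ whereas $D_{ji}^2$ is formed with $S_i^{-1}$, so the two values generally differ unless $S_i = S_j$.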

On entry: the $j$th row of GMN contains the means of the $p$ selected variables for the $j$th group, for $j = 1, 2, \ldots, n_g$. These are returned by G03DAF.

6: LDGMN – INTEGER    Input

On entry: the first dimension of the array GMN as declared in the (sub)program from which G03DBF is called.

Constraint: LDGMN≥NG.

7: GC((NG+1)×NVAR×(NVAR+1)/2) – REAL (KIND=nag_wp) array    Input

On entry: the first $p(p+1)/2$ elements of GC should contain the upper triangular matrix $R$ and the next $n_g$ blocks of $p(p+1)/2$ elements should contain the upper triangular matrices $R_j$. All matrices must be stored packed by column. These matrices are returned by G03DAF. If EQUAL='E', only the first $p(p+1)/2$ elements are referenced; if EQUAL='U', only the elements $p(p+1)/2+1$ to $(n_g+1)p(p+1)/2$ are referenced.

Constraints:

if EQUAL='E', the diagonal elements of $R \neq 0.0$;

if EQUAL='U', the diagonal elements of the $R_j \neq 0.0$, for $j = 1, 2, \ldots,$ NG.
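The packed layout of GC can be illustrated with a small unpacking helper. This is a sketch under stated assumptions: `unpack_upper` is a hypothetical name, indices are 0-based (unlike the Fortran array), block 0 is taken to be the pooled $R$ and block $j$ the matrix $R_j$, and the sample values are made up. It assumes the usual column-packed convention, in which the upper triangle is stored column by column ($r_{11}, r_{12}, r_{22}, r_{13}, \ldots$).

```python
def unpack_upper(gc, p, block):
    """Return block number `block` of gc as a full p-by-p upper triangular matrix.

    Assumed layout: each block holds p*(p+1)//2 values, upper triangle packed
    by column. Block 0 is the pooled R; block j (j >= 1) is R_j.
    """
    size = p * (p + 1) // 2
    start = block * size
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):          # column index (0-based)
        for i in range(j + 1):  # row index, on or above the diagonal
            r[i][j] = gc[start + i + j * (j + 1) // 2]
    return r


def has_zero_diagonal(r):
    """True if any diagonal element is exactly zero (the IFAIL=2 condition)."""
    return any(r[i][i] == 0.0 for i in range(len(r)))


# Hypothetical GC for p = 2 variables and n_g = 2 groups: (2 + 1) * 3 = 9 values.
gc = [1.0, 2.0, 3.0,   # pooled R
      4.0, 5.0, 6.0,   # R_1
      7.0, 8.0, 9.0]   # R_2
r_pooled = unpack_upper(gc, 2, 0)
r_2 = unpack_upper(gc, 2, 2)
```

Here `r_pooled` is `[[1.0, 2.0], [0.0, 3.0]]` and `r_2` is `[[7.0, 8.0], [0.0, 9.0]]`.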

8: NOBS – INTEGER    Input

On entry: if MODE='S', the number of sample points in X for which distances are to be calculated.

On entry: if MODE='S', ISX($l$) indicates whether the $l$th variable in X is to be included in the distance calculations. If ISX($l$)>0 the $l$th variable is included, for $l = 1, 2, \ldots,$ M; otherwise the $l$th variable is not referenced.
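The selection rule can be sketched as follows. The helper name `select_variables` and the sample values are hypothetical, and the count check simply mirrors the documented requirement that the number of variables indicated by ISX equals NVAR; it is not a description of how G03DBF is implemented.

```python
def select_variables(row, isx, nvar):
    """Keep the entries of `row` whose ISX flag is positive.

    Raises ValueError if the number of selected variables differs from nvar,
    mirroring the condition reported as IFAIL=2.
    """
    selected = [value for value, flag in zip(row, isx) if flag > 0]
    if len(selected) != nvar:
        raise ValueError("number of variables indicated by ISX is not equal to NVAR")
    return selected


# Hypothetical observation with M = 4 variables, of which 2 are selected.
row = [3.1, 0.5, 2.2, 9.9]
isx = [1, 0, 1, 0]
picked = select_variables(row, isx, nvar=2)
```

Here `picked` is `[3.1, 2.2]`; the unselected entries of the row are never read in the distance calculation.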

If MODE='S', D($k$,$j$) contains the squared distance of the $k$th sample point from the $j$th group mean, $D_{kj}^2$, for $k = 1, 2, \ldots,$ NOBS and $j = 1, 2, \ldots, n_g$.

If MODE='M' and EQUAL='U', D($i$,$j$) contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^2$, for $i = 1, 2, \ldots, n_g$ and $j = 1, 2, \ldots, i-1, i+1, \ldots, n_g$. The elements D($i$,$i$) are not referenced, for $i = 1, 2, \ldots, n_g$.

If MODE='M' and EQUAL='E', D($i$,$j$) contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^2$, for $i = 1, 2, \ldots, n_g$ and $j = 1, 2, \ldots, i-1$. Since $D_{ij} = D_{ji}$, the elements D($i$,$j$) are not referenced, for $i = 1, 2, \ldots, n_g$ and $j = i+1, \ldots, n_g$.

14: LDD – INTEGER    Input

On entry: the first dimension of the array D as declared in the (sub)program from which G03DBF is called.

Constraints:

if MODE='S', LDD≥NOBS;

if MODE='M', LDD≥NG.

15: WK(2×NVAR) – REAL (KIND=nag_wp) array    Workspace

16: IFAIL – INTEGER    Input/Output

On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.

For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.

On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).

Errors or warnings detected by the routine:

IFAIL=1

On entry, NVAR<1,
or NG<2,
or LDGMN<NG,
or MODE='S' and NOBS<1,
or MODE='S' and M<NVAR,
or MODE='S' and LDX<NOBS,
or MODE='S' and LDD<NOBS,
or MODE='M' and LDD<NG,
or EQUAL≠'E' or 'U',
or MODE≠'M' or 'S'.

IFAIL=2

On entry, MODE='S' and the number of variables indicated by ISX is not equal to NVAR,
or EQUAL='E' and a diagonal element of $R$ is zero,
or EQUAL='U' and a diagonal element of $R_j$ for some $j$ is zero.

7 Accuracy

The accuracy will depend upon the accuracy of the input R or Rj matrices.

8 Further Comments

9 Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the group means and R matrices are computed by G03DAF. A further six observations of unknown type are input, and the distances from the group means of the 21 patients of known type are computed under the assumption that the within-group variance-covariance matrices are not equal. These results are printed and indicate that the first four are close to one of the groups while observations 5 and 6 are some distance from any group.