The effective number of observations, that is, the number of observations with nonzero weight (see WT for more detail), must be greater than the number of fixed effects in the model (as returned in NFF).

On entry: a matrix of data, with DAT(i,j) holding the ith observation on the jth variable. The two design matrices, X and Z, are constructed from DAT and the information given in FIXED (for X) and RNDM (for Z).

Constraint:
if LEVELS(j) ≠ 1, 1 ≤ DAT(i,j) ≤ LEVELS(j).

5: LDDAT – INTEGER Input

On entry: the first dimension of the array DAT as declared in the (sub)program from which G02JCF is called.

On entry: LEVELS(i) contains the number of levels associated with the ith variable held in DAT.

If the ith variable is continuous or binary (i.e., only takes the values zero or one), then LEVELS(i) must be set to 1. Otherwise the ith variable is assumed to take an integer value between 1 and LEVELS(i) (i.e., the ith variable is discrete with LEVELS(i) levels).

On exit: if LICOMM = 2, ICOMM(1) holds the minimum required value for LICOMM and ICOMM(2) holds the minimum required value for LRCOMM; otherwise ICOMM is a communication array as required by the analysis routines G02JDF and G02JEF.

20: LICOMM – INTEGER Input

On entry: the dimension of the array ICOMM as declared in the (sub)program from which G02JCF is called.

On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter, you should refer to Section 3.3 in the Essential Introduction for details.

For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used, it is essential to test the value of IFAIL on exit.

On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).

Errors or warnings detected by the routine:

IFAIL=1

On entry, WEIGHT≠'W' or 'U'.

IFAIL=2

On entry, N<1.

IFAIL=3

On entry, NCOL<0.

IFAIL=4

On entry, at least one data point for variable j takes a value less than 1 or greater than LEVELS(j).
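The four entry checks listed above can be mirrored in a short validation routine, which can help catch bad input before the library is called. This is a minimal Python sketch, not part of the NAG Library: the function name `check_entry` is hypothetical, DAT is taken as a list of rows, NCOL is assumed to be the number of columns of DAT, and the order in which the checks are made is an assumption.

```python
def check_entry(weight, n, ncol, dat, levels):
    """Hypothetical mirror of the G02JCF entry checks; returns an IFAIL-style code.

    dat is a list of n rows; levels[j] holds LEVELS(j+1) (0-based indexing here).
    """
    if weight not in ("W", "U"):
        return 1  # WEIGHT is neither 'W' nor 'U'
    if n < 1:
        return 2  # no observations
    if ncol < 0:
        return 3  # negative number of columns
    for j in range(ncol):
        if levels[j] == 1:
            continue  # continuous or binary variable: no range check
        for i in range(n):
            # a discrete variable must lie in 1..LEVELS(j)
            if not 1 <= dat[i][j] <= levels[j]:
                return 4
    return 0


# V1, V2, V3 with 2, 4 and 3 levels (the example used later in this section)
print(check_entry("U", 2, 3, [[2, 3, 1], [1, 4, 3]], [2, 4, 3]))  # 0
print(check_entry("U", 2, 3, [[2, 5, 1], [1, 4, 3]], [2, 4, 3]))  # 4
```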

In summary, G02JCF converts all non-binary categorical variables (i.e., where $LF_j > 1$) to dummy variables. If a fixed intercept is included in the model, then the first level of every such variable is dropped. If a fixed intercept is not included in the model, then the first level of every such variable except the first one is dropped. The variables are added into the model in the order in which they are specified in FIXED.
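The dummy-coding rule above determines how many columns X receives, which in turn matters for the requirement that the effective number of observations exceed the number of fixed effects. The following Python sketch counts the columns under our reading of that rule: each variable in FIXED is assumed to contribute a single main effect, "the first one" is taken to mean the first non-binary categorical variable in FIXED order, and both function names are hypothetical. It is not guaranteed to reproduce the value G02JCF returns in NFF.

```python
def count_fixed_columns(levels, intercept):
    """Count the columns of X implied by the dummy-coding rule (a sketch).

    levels: LEVELS value of each fixed variable, in FIXED order.
    intercept: True if a fixed intercept is included.
    """
    ncols = 1 if intercept else 0
    seen_categorical = False
    for lf in levels:
        if lf == 1:
            ncols += 1  # continuous or binary: one column
        else:
            # the first level is dropped, unless there is no intercept
            # and this is the first non-binary categorical variable
            keep_all = not intercept and not seen_categorical
            ncols += lf if keep_all else lf - 1
            seen_categorical = True
    return ncols


def enough_observations(weights, nfixed):
    """Effective observations (nonzero weight) must exceed the fixed-effect count."""
    return sum(1 for w in weights if w != 0) > nfixed


print(count_fixed_columns([1, 3, 4], intercept=True))   # 1 + 1 + 2 + 3 = 7
print(count_fixed_columns([1, 3, 4], intercept=False))  # 1 + 3 + 3 = 7
```

Both parameterisations give the same count, as one would expect: dropping the intercept frees exactly one level of the first categorical variable.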

8.2 Construction of random effects design matrix, Z

Let

$NR_b$ denote the number of random variables in the bth random statement, that is, $NR_b = \mathrm{RNDM}(1,b)$;

$R_{jb}$ denote the jth random variable from the bth random statement, that is, the vector of values held in the kth column of DAT when $\mathrm{RNDM}(2+j,b) = k$;

$R_{ijb}$ denote the ith element of $R_{jb}$;

$LR_{jb}$ denote the number of levels for $R_{jb}$, that is, $LR_{jb} = \mathrm{LEVELS}(\mathrm{RNDM}(2+j,b))$;

$D_v(R_{jb})$ denote an indicator function that returns a vector of values whose ith element is 1 if $R_{ijb} = v$ and 0 otherwise;

$NS_b$ denote the number of subject variables in the bth random statement, that is, $NS_b = \mathrm{RNDM}(3+NR_b,b)$;

$S_{jb}$ denote the jth subject variable from the bth random statement, that is, the vector of values held in the kth column of DAT when $\mathrm{RNDM}(3+NR_b+j,b) = k$;

$S_{ijb}$ denote the ith element of $S_{jb}$;

$LS_{jb}$ denote the number of levels for $S_{jb}$, that is, $LS_{jb} = \mathrm{LEVELS}(\mathrm{RNDM}(3+NR_b+j,b))$;

$I_b(s_1, s_2, \dots, s_{NS_b})$ denote an indicator function that returns a vector of values whose ith element is 1 if $S_{ijb} = s_j$ for all $j = 1, 2, \dots, NS_b$ and 0 otherwise.
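Taken together, the definitions above imply a fixed layout for each column of RNDM: $NR_b$, the intercept flag, $NR_b$ column numbers, $NS_b$, then $NS_b$ column numbers. The Python sketch below unpacks one column under that reading; the function name is hypothetical, and DAT column numbers are kept 1-based.

```python
def parse_rndm_column(col, levels):
    """Unpack one RNDM column (a list) into the quantities defined above.

    DAT column numbers stay 1-based; levels[c - 1] is LEVELS(c).
    """
    nr = col[0]                              # NR_b = RNDM(1,b)
    intercept = col[1] == 1                  # RNDM(2,b)
    random_cols = col[2:2 + nr]              # RNDM(2+j,b), j = 1..NR_b
    ns = col[2 + nr]                         # NS_b = RNDM(3+NR_b,b)
    subject_cols = col[3 + nr:3 + nr + ns]   # RNDM(3+NR_b+j,b), j = 1..NS_b
    lr = [levels[c - 1] for c in random_cols]    # LR_jb
    ls = [levels[c - 1] for c in subject_cols]   # LS_jb
    return {"NR": nr, "intercept": intercept, "random_cols": random_cols,
            "NS": ns, "subject_cols": subject_cols, "LR": lr, "LS": ls}


# second column of the RNDM example later in this section: V2 nested within V3
print(parse_rndm_column([1, 0, 2, 1, 3], [2, 4, 3]))
```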

The design matrix for the random effects, Z, is constructed as follows:

set $k = 1$;

loop over each random statement, so for each $b = 1, 2, \dots, \mathrm{NRNDM}$,

loop over each level of the last subject variable, so for each $s_{NS_b} = 1, 2, \dots, LS_{NS_b b}$,

⋮

loop over each level of the second subject variable, so for each $s_2 = 1, 2, \dots, LS_{2b}$,

loop over each level of the first subject variable, so for each $s_1 = 1, 2, \dots, LS_{1b}$,

if a random intercept is included, that is, $\mathrm{RNDM}(2,b) = 1$,

set the kth column of Z to $I_b(s_1, s_2, \dots, s_{NS_b})$;

set $k = k+1$;

loop over each random variable in the bth random statement, so for each $j = 1, 2, \dots, NR_b$,

if $LR_{jb} = 1$,

set the kth column of Z to $R_{jb} \times I_b(s_1, s_2, \dots, s_{NS_b})$, where $\times$ indicates element-wise multiplication of the two vectors $R_{jb}$ and $I_b(\dots)$;

set $k = k+1$;

else

set the $LR_{jb}$ columns $k$ to $k + LR_{jb} - 1$ of Z to $D_v(R_{jb}) \times I_b(s_1, s_2, \dots, s_{NS_b})$, for $v = 1, 2, \dots, LR_{jb}$. As before, $\times$ indicates element-wise multiplication of the two vectors $D_v(\dots)$ and $I_b(\dots)$;

set $k = k + LR_{jb}$.
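The loop above translates fairly directly into code. The following Python sketch builds Z as a list of rows from DAT (also a list of rows), LEVELS and the columns of RNDM, using plain lists rather than any NAG routine; the function name and data layout are assumptions for illustration. `itertools.product` runs over all combinations of subject levels, and the ranges are reversed so that $s_1$ varies fastest, matching the nesting of the loops.

```python
import itertools


def build_z(dat, levels, rndm_cols):
    """Construct Z (list of rows) following the loop above (a sketch).

    dat: list of observations (rows); DAT column numbers in RNDM are 1-based.
    levels: levels[c - 1] is LEVELS(c).
    rndm_cols: list of RNDM columns, each a list of integers.
    """
    n = len(dat)
    zcols = []  # Z is assembled column by column, then transposed
    for col in rndm_cols:
        nr, intercept = col[0], col[1] == 1
        random_cols = col[2:2 + nr]
        ns = col[2 + nr]
        subject_cols = col[3 + nr:3 + nr + ns]
        ls = [levels[c - 1] for c in subject_cols]
        # product() varies its last argument fastest; reversing the ranges
        # makes s_1 the innermost loop and s_{NS_b} the outermost
        for combo in itertools.product(*[range(1, nl + 1) for nl in reversed(ls)]):
            s = combo[::-1]  # s[j] is the level of subject variable j + 1
            # I_b(s_1, ..., s_NS_b); all 1.0 when there are no subject variables
            ind = [1.0 if all(dat[i][subject_cols[j] - 1] == s[j] for j in range(ns))
                   else 0.0 for i in range(n)]
            if intercept:
                zcols.append(ind)
            for c in random_cols:
                r = [dat[i][c - 1] for i in range(n)]  # R_jb
                if levels[c - 1] == 1:
                    zcols.append([r[i] * ind[i] for i in range(n)])
                else:
                    for v in range(1, levels[c - 1] + 1):  # D_v(R_jb) x I_b
                        zcols.append([(1.0 if r[i] == v else 0.0) * ind[i]
                                      for i in range(n)])
    return [[zc[i] for zc in zcols] for i in range(n)]


# V1, V2, V3 with 2, 4 and 3 levels; V1, plus V2 nested within V3
z = build_z([[2, 3, 1], [1, 4, 3]], [2, 4, 3], [[1, 0, 1, 0, 0], [1, 0, 2, 1, 3]])
print(len(z[0]))  # 14
```

Each row of this Z has exactly one 1 in the V1 block and one 1 in the V2-within-V3 block, since every observation falls in exactly one level of each.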

In summary, each column of RNDM defines a block of consecutive columns in Z. G02JCF converts all non-binary categorical variables (i.e., where $LR_{jb} > 1$ or $LS_{jb} > 1$) to dummy variables. All random variables defined within a column of RNDM are nested within all subject variables defined in the same column of RNDM. In addition, the subject variables are nested within each other, starting with the first (i.e., each of the $R_{jb}$, $j = 1, 2, \dots, NR_b$, is nested within $S_{1b}$, which in turn is nested within $S_{2b}$, which in turn is nested within $S_{3b}$, etc.).

If the last subject variable in each column of RNDM is the same (i.e., $S_{NS_b b}$ is the same variable for every $b = 1, 2, \dots, \mathrm{NRNDM}$), then all random effects in the model are nested within this variable. In such instances the last subject variable ($S_{NS_1 1}$) is called the overall subject variable. Because all of the random effects in the model are nested within the overall subject variable, $Z^TZ$ is block diagonal in structure. This fact can be used to improve the efficiency of the underlying computation and to reduce the amount of internal storage required. The number of levels in the overall subject variable is returned in NLSV, that is, $\mathrm{NLSV} = LS_{NS_1 1}$.

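If the last k subject variables in each column of RNDM are the same, for $k > 1$, then the overall subject variable is defined as the interaction of these k variables, and NLSV is the product of their numbers of levels, that is, $\mathrm{NLSV} = \prod_{j=NS_1-k+1}^{NS_1} LS_{j1}$.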
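Finding the overall subject variable amounts to finding the longest run of subject variables shared at the end of every column of RNDM. Below is a Python sketch, assuming that "the same" means "held in the same column of DAT"; the function name is hypothetical, and returning `None` when no subject variable is shared is our own convention, not necessarily what G02JCF does.

```python
import math


def overall_subject(rndm_cols, levels):
    """Return (shared trailing subject columns, NLSV) for a list of RNDM columns."""
    if not rndm_cols:
        return [], None
    subjects = []
    for col in rndm_cols:
        nr = col[0]
        ns = col[2 + nr]
        subjects.append(col[3 + nr:3 + nr + ns])
    # k = length of the longest common suffix of all subject lists
    k = 0
    while all(len(s) > k for s in subjects) and \
            len({s[len(s) - 1 - k] for s in subjects}) == 1:
        k += 1
    if k == 0:
        return [], None  # no overall subject variable (our convention)
    shared = subjects[0][len(subjects[0]) - k:]
    # the interaction of the shared variables has the product of their levels
    return shared, math.prod(levels[c - 1] for c in shared)


# V1 within V3 and V2 within V3: V3 (3 levels) is the overall subject variable
print(overall_subject([[1, 0, 1, 1, 3], [1, 0, 2, 1, 3]], [2, 4, 3]))  # ([3], 3)
```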

To illustrate some additional points about the RNDM parameter, assume that we have a dataset with three discrete variables, V1, V2 and V3, with 2, 4 and 3 levels respectively, and that V1 is in the first column of DAT, V2 in the second and V3 in the third. Also assume that we wish to fit a model containing, as random effects, V1 along with V2 nested within V3. To do this, the RNDM matrix requires two columns:

$$\mathrm{RNDM} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 2 \\ 0 & 1 \\ 0 & 3 \end{pmatrix}$$

The first column, (1, 0, 1, 0, 0), indicates one random variable (RNDM(1,1) = 1), no intercept (RNDM(2,1) = 0), that the random variable is in the first column of DAT (RNDM(3,1) = 1), and that there are no subject variables, as no nesting is required for V1 (RNDM(4,1) = 0). The last element in this column is ignored.

The second column, (1, 0, 2, 1, 3), indicates one random variable (RNDM(1,2) = 1), no intercept (RNDM(2,2) = 0), that the random variable is in the second column of DAT (RNDM(3,2) = 2), that there is one subject variable (RNDM(4,2) = 1), and that the subject variable is in the third column of DAT (RNDM(5,2) = 3).

The corresponding Z matrix would have 14 columns, 2 coming from V1 and 12 (4×3) from V2 nested within V3. The symmetric matrix $Z^TZ$ has the form

where 0 indicates a structural zero, i.e., an element that always takes the value 0, irrespective of the data, and - indicates a value that is not a structural zero. The first two rows and columns of $Z^TZ$ correspond to V1. The block-diagonal matrix formed by the 12 rows and columns in the bottom right corresponds to V2 nested within V3, with the 4×4 blocks corresponding to the levels of V2. There are three blocks because the subject variable (V3) has three levels.
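The column count of 14 = 2 + 12 follows from the construction loop: each column of RNDM contributes (intercept flag plus one column per level of each random variable) columns for every combination of subject levels. A short Python sketch of that arithmetic, with a hypothetical function name and assuming the intercept flag is stored as 0 or 1:

```python
import math


def z_block_sizes(rndm_cols, levels):
    """Number of consecutive Z columns generated by each RNDM column (a sketch)."""
    sizes = []
    for col in rndm_cols:
        nr, intercept = col[0], col[1]
        random_cols = col[2:2 + nr]
        ns = col[2 + nr]
        subject_cols = col[3 + nr:3 + nr + ns]
        # a variable with LEVELS = 1 gives one column and one with LEVELS > 1
        # gives one column per level, so each contributes LEVELS columns
        per_combination = intercept + sum(levels[c - 1] for c in random_cols)
        n_combinations = math.prod(levels[c - 1] for c in subject_cols)
        sizes.append(n_combinations * per_combination)
    return sizes


sizes = z_block_sizes([[1, 0, 1, 0, 0], [1, 0, 2, 1, 3]], [2, 4, 3])
print(sizes, sum(sizes))  # [2, 12] 14
```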

The model fitting routines, G02JDF and G02JEF, use the sweep algorithm to calculate the log likelihood function for a given set of variance components. This algorithm moves down the diagonal elements (called pivots) of a matrix that is similar in structure to $Z^TZ$, updating each element of that matrix. When using the kth diagonal element of a matrix $A$, an element $a_{ij}$, $i \ne k$, $j \ne k$, is adjusted by an amount equal to $a_{ik}a_{kj}/a_{kk}$. This process is referred to as sweeping on the kth pivot. As there are no structural zeros in the first row or column of the above $Z^TZ$, sweeping on the first pivot of $Z^TZ$ would alter every element of the matrix and therefore destroy the structural zeros, i.e., we could no longer guarantee that they would be zero.
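The fill-in effect can be seen on a small matrix. The Python sketch below implements one common symmetric form of the sweep operator. The text above specifies only the update of the off-pivot elements, so the treatment of the pivot row, column and element is an assumption and need not match what G02JDF and G02JEF do internally.

```python
def sweep(a, k):
    """Sweep the square matrix a (list of lists) on pivot k; returns a new matrix."""
    n = len(a)
    d = a[k][k]
    b = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                # the adjustment described in the text: a_ik * a_kj / a_kk
                b[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    for i in range(n):
        if i != k:
            # pivot row and column (one common symmetric convention)
            b[i][k] = a[i][k] / d
            b[k][i] = a[k][i] / d
    b[k][k] = -1.0 / d
    return b


# Dense first row and column; the (2,3) and (3,2) elements are zero.
a = [[4.0, 1.0, 1.0],
     [1.0, 2.0, 0.0],
     [1.0, 0.0, 3.0]]
print(sweep(a, 0)[1][2])  # -0.25: the zero has been filled in
```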

Swapping the two columns of RNDM gives a $Z^TZ$ matrix that is identical to the previous one, except that the first two rows and columns have become the last two rows and columns. Sweeping a matrix $A = (a_{ij})$ of this form on the first pivot will only affect those elements $a_{ij}$ where $a_{i1} \ne 0$ and $a_{1j} \ne 0$, that is, only the elements whose row and column indices both lie in {1, 2, 3, 4, 13, 14}: the top left-hand 4×4 block together with the corresponding entries of the 13th and 14th rows and columns. The block-diagonal nature of the first 12 rows and columns therefore greatly reduces the amount of work the algorithm needs to perform.
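The saving can be quantified by counting which elements a sweep on pivot k may change, namely those $a_{ij}$ with $a_{ik} \ne 0$ and $a_{kj} \ne 0$. The Python sketch below builds the non-zero pattern of the 14×14 example under two simplifying assumptions: each 4×4 block for V2 within V3 is treated as dense, and the V1 rows and columns are treated as having no structural zeros. The helper names are hypothetical.

```python
def example_pattern(v1_first):
    """Boolean non-zero pattern of the 14x14 Z^T Z example (simplified)."""
    n = 14
    pat = [[False] * n for _ in range(n)]
    v1 = [0, 1] if v1_first else [12, 13]   # rows/columns belonging to V1
    offset = 2 if v1_first else 0           # where the V2-within-V3 blocks start
    for blk in range(3):                    # one 4x4 block per level of V3
        idx = [offset + 4 * blk + t for t in range(4)]
        for i in idx:
            for j in idx:
                pat[i][j] = True
    for i in v1:                            # V1 rows/columns: no structural zeros
        for j in range(n):
            pat[i][j] = pat[j][i] = True
    return pat


def touched_by_sweep(pat, k):
    """Number of elements a_ij that a sweep on pivot k may change."""
    n = len(pat)
    return sum(1 for i in range(n) for j in range(n) if pat[i][k] and pat[k][j])


print(touched_by_sweep(example_pattern(v1_first=True), 0))   # 196: every element
print(touched_by_sweep(example_pattern(v1_first=False), 0))  # 36 = 6 x 6
```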

G02JCF constructs $Z^TZ$ as specified by the RNDM matrix and does not attempt to reorder it to improve performance. Therefore, for best performance, some thought is required about which ordering to use. In general it is more efficient to structure RNDM so that the first column relates to the deepest level of nesting, the second to the next level, and so on.
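A simple heuristic in the spirit of this advice is to order the columns of RNDM by decreasing number of subject variables, using $NS_b$ as a rough proxy for the depth of nesting. This Python sketch shows only one possible rule, not something G02JCF does; the function name is hypothetical.

```python
def order_by_nesting(rndm_cols):
    """Sort RNDM columns so that those with the most subject variables come first."""
    # NS_b is stored at position 2 + NR_b (0-based) of each column;
    # sorted() is stable, so ties keep their original order
    return sorted(rndm_cols, key=lambda col: col[2 + col[0]], reverse=True)


print(order_by_nesting([[1, 0, 1, 0, 0], [1, 0, 2, 1, 3]]))
# [[1, 0, 2, 1, 3], [1, 0, 1, 0, 0]]
```

For the example above, this places V2 nested within V3 first and V1 last, which is the ordering whose $Z^TZ$ keeps its block-diagonal structure when the first pivot is swept.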