Limit on the number of variates and/or factors in the terms to be fitted; default 3

NTIMES = scalar

Number of permutations to make; default 999

BLOCKSTRUCTURE = formula

Model formula defining any blocking to consider during the randomization; default none

EXCLUDE = factors

Factors in the block formula whose levels are not to be randomized

SEED = scalar

Seed for the random number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically

Parameter

TERMS = formula

List of explanatory variates and factors, or model formula, defining the model to fit

Description

In regression analyses, random permutation tests provide an alternative to using the F probabilities, printed for variance ratios in summary or accumulated analysis of variance tables, when the assumptions of the analysis are not satisfied. These assumptions can be assessed by studying the residual plots produced by RCHECK. In particular, the use of the F distribution to calculate the probabilities is based on the assumption that the residuals from each stratum have Normal distributions with equal variances, and so the histogram of residuals produced by RCHECK should look reasonably close to the Normal, bell-shaped curve. Experience shows the analysis is robust to small departures from Normality. RPERMTEST can be useful if the histogram looks very non-Normal. You can also use RPERMTEST to generate probabilities for deviances or deviance ratios in generalized linear models, instead of using the customary chi-square or F distributions (which are justified by asymptotic theory).

Before using RPERMTEST, you need to give a MODEL statement to define the y-variate and so on, as usual for a regression or generalized linear model. The terms to fit in the regression model are specified by the TERMS parameter of RPERMTEST. As in the FIT directive, this can supply a list of variates for a simple or multiple linear regression, or a model formula with variates and/or factors for more complicated models. As usual, the CONSTANT option indicates whether or not to fit the constant, and the FACTORIAL option sets a limit on the number of variates and/or factors in each of the terms generated from a TERMS formula.

The NTIMES option defines how many random permutations to perform; by default there are 999 (as well as the “null” permutation where the data keep their original order). The SEED option allows you to specify the seed for the random-number generator that is used to construct them. The default, SEED=0, continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, initializes the seed automatically. If NTIMES exceeds the maximum possible number of permutations for the data, an “exact” test is performed in which every permutation is used once. This is feasible only for small datasets. There are n! (n factorial) permutations of n units: 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, and so on.
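The choice between a random and an “exact” test can be illustrated with a short Python sketch. This is not Genstat code and not the procedure's own implementation; the function name permutation_plan is hypothetical, and the sketch assumes an unblocked dataset, where the number of possible permutations is n!.

```python
import math

def permutation_plan(n_units, ntimes=999):
    """Decide between an exact and a random permutation test.

    Assumes no block structure, so the number of possible
    permutations of n_units is n_units! (an illustrative simplification).
    """
    total = math.factorial(n_units)
    if ntimes > total:
        # Fewer possible permutations than requested: use each one once.
        return "exact", total
    # Otherwise draw ntimes random permutations.
    return "random", ntimes

print(permutation_plan(6))  # 6! = 720 is below the default 999
print(permutation_plan(7))  # 7! = 5040 is above the default 999
```

With the default of 999 permutations, the sketch selects an exact test for six units or fewer and a random test from seven units upwards.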

If the regression is being used to analyse a designed experiment, you may need to use the BLOCKSTRUCTURE option to specify a block model to define how to do the randomization. The EXCLUDE option can then restrict the randomization so that one or more of the factors in the block model is not randomized. See the RANDOMIZE directive for further details.
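As a rough illustration of a restricted randomization, the Python sketch below shuffles values only among units that share a block label, so that no value moves between blocks. It is a simplified assumption about one common case, not a description of how RANDOMIZE handles general block formulae or the EXCLUDE option; the function name permute_within_blocks is hypothetical.

```python
import random
from collections import defaultdict

def permute_within_blocks(values, blocks, rng):
    """Shuffle values among units that share the same block label."""
    positions = defaultdict(list)
    for index, label in enumerate(blocks):
        positions[label].append(index)

    result = list(values)
    for indices in positions.values():
        # Collect this block's values, shuffle them, and put them back
        # into the same set of positions.
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, v in zip(indices, shuffled):
            result[i] = v
    return result

rng = random.Random(42)  # arbitrary seed, for reproducibility only
blocks = [1, 1, 1, 2, 2, 2]
values = [10, 11, 12, 20, 21, 22]
print(permute_within_blocks(values, blocks, rng))
```

Each block keeps its own set of values; only their order within the block changes.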

The probabilities are determined from the distribution of the statistics of interest, over the permuted datasets. In an ordinary regression, the statistics are the variance ratios from the summary-of-analysis or accumulated-analysis-of-variance tables. In generalized linear models they will be deviances when the dispersion is fixed, or deviance ratios when it is estimated (as defined by the DISPERSION option of the MODEL directive).
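The idea can be sketched in Python for a simple linear regression. This is an illustration under stated assumptions, not the procedure's actual code: the function names are hypothetical, the response is permuted without any block structure, and the probability is taken as the proportion of datasets (the permuted ones plus the original, “null” ordering) whose variance ratio is at least as large as the observed one, which is a common convention rather than something stated in this documentation.

```python
import random

def variance_ratio(x, y):
    """Variance ratio (F statistic) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    ss_regression = sxy ** 2 / sxx          # 1 degree of freedom
    ss_residual = syy - ss_regression       # n - 2 degrees of freedom
    return ss_regression / (ss_residual / (n - 2))

def permutation_probability(x, y, ntimes=999, seed=1):
    """Estimate the probability of the observed variance ratio by permutation."""
    rng = random.Random(seed)
    observed = variance_ratio(x, y)
    count = 1  # the "null" permutation: the data in their original order
    y_perm = list(y)
    for _ in range(ntimes):
        rng.shuffle(y_perm)
        if variance_ratio(x, y_perm) >= observed:
            count += 1
    return count / (ntimes + 1)

x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
print(permutation_probability(x, y))
```

Because the original ordering is counted, the estimated probability can never be smaller than 1/(NTIMES+1), which is 0.001 for the default of 999 permutations.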

Output is controlled by the PRINT option, with settings:

probability

to print the probability for the whole regression model;

summary

to print the summary-of-analysis table with the usual probability for the regression model replaced by the probability from the permutation test;

accumulated

to print the accumulated analysis of variance or deviance table with the usual probabilities replaced by those from the permutation test;

critical

to accompany the summary or accumulated tables by a table giving estimated critical values for each of the statistics.

Method

RPERMTEST uses RANDOMIZE to perform the permutations, taking account of any block structure of the data. The model is fitted for each dataset using either FIT or FITINDIVIDUALLY. (FITINDIVIDUALLY is needed if the accumulated table is required for a generalized linear model.) The ACCUMULATED and SUMMARY options of RKEEP are used to save the information from each analysis, and the QUANTILES function is used to calculate the critical values.
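The critical values mentioned above are upper quantiles of the statistics collected over the permuted datasets. The Python sketch below shows the general idea; it is not Genstat code, the function names are hypothetical, and the linear interpolation rule used here is an assumption that may differ from the one used by the QUANTILES function.

```python
def empirical_quantile(values, proportion):
    """Quantile of values by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = proportion * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def critical_values(statistics, levels=(0.05, 0.01)):
    """Estimated critical value of the statistic at each significance level."""
    return {level: empirical_quantile(statistics, 1 - level) for level in levels}

# Stand-in values for the statistics from the permuted datasets.
permuted_statistics = [float(i) for i in range(101)]
print(critical_values(permuted_statistics))
```

An observed statistic larger than the estimated critical value for a given level would be judged significant at that level under the permutation distribution.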