Spatial correlation in designed experiments

Last Wednesday I had a meeting with the folks of the New Zealand Drylands Forest Initiative in Blenheim. In addition to sitting in a conference room and having nice sandwiches we went to visit one of our progeny trials at Cravens. Plantation forestry trials are usually laid out following a rectangular lattice defined by rows and columns. The trial follows an incomplete block design with 148 blocks and is testing 60 Eucalyptus bosistoana families. A quick look at survival shows an interesting trend: the bottom of the trial was much more affected by frost than the top.

setwd('~/Dropbox/quantumforest')
library(ggplot2) # for qplot
library(asreml)
load('cravens.Rdata')
qplot(col, row, fill = Surv, geom = 'tile', data = cravens)

The trial was also assessed for tree height (ht) at 2 years of age. A plot of height shows some spatial trends as well, with better growth in the top-left corner; this is the trait that I will analyze here.

Tree height can be expressed as a function of an overall constant, random block effects, random family effects and a normally distributed residual (this is our base model, m1). I will then take into account the position of the trees (expressed as rows and columns within the trial) to fit spatially correlated residuals using separable row and column processes (m2), and finally add a spatially independent residual to the mix (m3).
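For readers who prefer notation, the three models can be sketched as follows. The symbols are my own shorthand rather than anything produced by asreml: $\mathbf{Z}_b$ and $\mathbf{Z}_f$ are the incidence matrices for blocks and families, $\rho_r$ and $\rho_c$ are the row and column autocorrelations, and the trees are assumed to be sorted by row and then by column within row.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{Z}_b \mathbf{b} + \mathbf{Z}_f \mathbf{f} + \mathbf{e},
\qquad \mathbf{b} \sim N(\mathbf{0}, \sigma_b^2 \mathbf{I}),
\quad \mathbf{f} \sim N(\mathbf{0}, \sigma_f^2 \mathbf{I}) \\
\text{m1:}\quad \mathbf{e} &\sim N(\mathbf{0}, \sigma_e^2 \mathbf{I}) \\
\text{m2:}\quad \mathbf{e} &\sim N\left(\mathbf{0},\; \sigma_s^2\, \boldsymbol{\Sigma}_r(\rho_r) \otimes \boldsymbol{\Sigma}_c(\rho_c)\right) \\
\text{m3:}\quad \mathbf{e} &= \boldsymbol{\xi} + \boldsymbol{\eta},
\quad \boldsymbol{\xi} \sim N\left(\mathbf{0},\; \sigma_s^2\, \boldsymbol{\Sigma}_r(\rho_r) \otimes \boldsymbol{\Sigma}_c(\rho_c)\right),
\quad \boldsymbol{\eta} \sim N(\mathbf{0}, \sigma_\eta^2 \mathbf{I}) \\
\left[\boldsymbol{\Sigma}(\rho)\right]_{ij} &= \rho^{|i-j|}
\end{aligned}
```

In m3 the spatially independent part $\boldsymbol{\eta}$ is what the units term fits, while $\boldsymbol{\xi}$ is the spatially correlated part.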

# Base model (non-spatial)
m1 = asreml(ht ~ 1, random = ~ Block + Family, data = cravens)
summary(m1)$loglik
[1] -8834.071

summary(m1)$varcomp
                       gamma component std.error   z.ratio constraint
Block!Block.var   0.52606739 1227.4058 168.84681  7.269345   Positive
Family!Family.var 0.06257139  145.9898  42.20675  3.458921   Positive
R!variance        1.00000000 2333.1722  78.32733 29.787459   Positive
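A quick note on reading the variance component table: judging by the numbers themselves, the gamma column is each component divided by the residual variance, and z.ratio is each component divided by its standard error. The short base-R check below uses values copied from summary(m1)$varcomp; the object names are only illustrative.

```r
# Values copied from summary(m1)$varcomp above
component <- c(Block = 1227.4058, Family = 145.9898, Residual = 2333.1722)
std_error <- c(Block = 168.84681, Family = 42.20675, Residual = 78.32733)

# gamma: each component relative to the residual variance
gamma <- component / component[["Residual"]]

# z.ratio: each component relative to its standard error
z_ratio <- component / std_error

print(round(gamma, 4))
print(round(z_ratio, 3))
```

The ratios reproduce the gamma and z.ratio columns of the table up to rounding.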

m1 represents a traditional family model with the overall constant as the only fixed effect and a diagonal residual matrix (identity times the residual variance). In m2 I am modeling the R matrix as the Kronecker product of two autoregressive correlation matrices (one for rows and one for columns) times a spatial residual variance.
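To make the separable structure more concrete, here is a small base-R sketch that builds the correlation part of such an R matrix for a toy 3 x 4 grid. The helper names (ar1_cor, idx) and the correlation values are made up for illustration; asreml builds this structure internally and never requires you to form the matrix yourself.

```r
# AR(1) correlation matrix of dimension n: entry (i, j) is rho^|i - j|
ar1_cor <- function(n, rho) {
  rho^abs(outer(seq_len(n), seq_len(n), "-"))
}

# Toy grid (the real trial is much larger); correlations are illustrative only
n_row <- 3
n_col <- 4
rho_row <- 0.5
rho_col <- 0.8

# Separable correlation for trees sorted by row, then by column within row
R_cor <- kronecker(ar1_cor(n_row, rho_row), ar1_cor(n_col, rho_col))

# Position of the tree at (row, col) in the sorted data
idx <- function(row, col) (row - 1) * n_col + col

R_cor[idx(1, 1), idx(1, 2)]  # same row, adjacent columns: rho_col
R_cor[idx(1, 1), idx(2, 1)]  # same column, adjacent rows: rho_row
R_cor[idx(1, 1), idx(2, 3)]  # one row and two columns apart: rho_row * rho_col^2
```

The correlation between any two trees is simply the product of a row term and a column term, which is why the data need to be sorted by row and column before fitting.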

# Spatial model (without non-spatial residual)
m2 = asreml(ht ~ 1, random = ~ Block + Family,
            rcov = ~ ar1(row):ar1(col),
            data = cravens[order(cravens$row, cravens$col), ])
summary(m2)$loglik
[1] -8782.112

summary(m2)$varcomp
                       gamma    component    std.error   z.ratio    constraint
Block!Block.var   0.42232303 1025.9588216 157.00799442  6.534437      Positive
Family!Family.var 0.05848639  142.0822874  40.20346956  3.534080      Positive
R!variance        1.00000000 2429.3224308  88.83064209 27.347798      Positive
R!row.cor         0.09915320    0.0991532   0.02981808  3.325271 Unconstrained
R!col.cor         0.28044024    0.2804402   0.02605972 10.761445 Unconstrained

Adding two parameters to the model improves the log-likelihood from -8834.071 to -8782.112, a difference of 51.959, but with what appears to be a low correlation in both directions. In m3 I am adding a spatially independent residual (using the keyword units), improving the log-likelihood from -8782.112 to -8660.411: not too shabby.
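The log-likelihood differences quoted above can be turned into rough likelihood ratio tests, which is legitimate here because all three models share the same fixed effects (only the overall constant). The sketch below is base R with the log-likelihoods typed in by hand. Treating the m2 versus m3 comparison as a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom is my assumption for testing a single variance component on the boundary, not something reported by asreml.

```r
# REML log-likelihoods copied from the model summaries in this post
loglik <- c(m1 = -8834.071, m2 = -8782.112, m3 = -8660.411)

# m1 vs m2: two extra parameters (row and column autocorrelations)
lrt_12 <- 2 * (loglik[["m2"]] - loglik[["m1"]])
p_12 <- pchisq(lrt_12, df = 2, lower.tail = FALSE)

# m2 vs m3: one extra variance component (units), tested on the boundary,
# so the p-value is halved (assumed 50:50 mixture of chi-square 0 and 1 df)
lrt_23 <- 2 * (loglik[["m3"]] - loglik[["m2"]])
p_23 <- 0.5 * pchisq(lrt_23, df = 1, lower.tail = FALSE)

print(c(lrt_12 = lrt_12, lrt_23 = lrt_23))
print(c(p_12 = p_12, p_23 = p_23))
```

Both test statistics are far out in the tail of their reference distributions, which agrees with the informal reading of the log-likelihoods.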

m3 = asreml(ht ~ 1, random = ~ Block + Family + units,
            rcov = ~ ar1(row):ar1(col),
            data = cravens[order(cravens$row, cravens$col), ])
summary(m3)$loglik
[1] -8660.411

summary(m3)$varcomp
                         gamma    component    std.error    z.ratio    constraint
Block!Block.var   3.069864e-07 4.155431e-04 6.676718e-05   6.223763      Boundary
Family!Family.var 1.205382e-01 1.631630e+02 4.327885e+01   3.770040      Positive
units!units.var   1.354166e+00 1.833027e+03 7.085134e+01  25.871456      Positive
R!variance        1.000000e+00 1.353621e+03 2.174923e+02   6.223763      Positive
R!row.cor         7.814065e-01 7.814065e-01 4.355976e-02  17.938724 Unconstrained
R!col.cor         9.529984e-01 9.529984e-01 9.100529e-03 104.719017 Unconstrained

Allowing for the independent residual (in addition to the spatial one) has permitted higher autocorrelations (particularly in columns), while the Block variance has gone very close to zero. Most of the Block variation is now being captured through the residual matrix (in the row and column autoregressive correlations), but we are going to keep Block in the model, as it represents a restriction on the randomization process. In addition to log-likelihood (or AIC) we can get graphical descriptions of the fit by plotting the empirical semi-variogram as in plot(variogram(m3)):