Clearly, the variogram cloud gives too much information. If there is a relation between separation and semivariance, it is hard to see. The usual way to visualize this is by grouping the point-pairs into lags or bins according to some separation range, and computing some representative semivariance for the entire lag.
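
The grouping of point-pairs into lags can be sketched as follows. This is a minimal illustration in Python, not the software used for these slides; the function name `empirical_variogram`, its parameters, and the toy transect values are all hypothetical. The representative semivariance chosen here is the simple mean of the half squared differences in each lag.

```python
import math
from itertools import combinations

def empirical_variogram(coords, values, lag_width, n_lags):
    """Bin all point-pairs by separation and average their semivariances.

    Returns one (mean separation, mean semivariance, pair count) tuple
    per non-empty lag. All names here are illustrative assumptions.
    """
    sum_sep = [0.0] * n_lags     # summed separations per lag
    sum_gamma = [0.0] * n_lags   # summed semivariances per lag
    counts = [0] * n_lags        # number of point-pairs per lag
    for i, j in combinations(range(len(values)), 2):
        h = math.dist(coords[i], coords[j])
        k = int(h // lag_width)  # index of the lag this pair falls into
        if k >= n_lags:
            continue             # pair is beyond the maximum separation considered
        sum_sep[k] += h
        # One point of the variogram cloud: half the squared difference
        sum_gamma[k] += 0.5 * (values[i] - values[j]) ** 2
        counts[k] += 1
    return [(sum_sep[k] / counts[k], sum_gamma[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Toy transect: four points one unit apart (made-up values, illustration only)
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
for sep, gamma, n_pairs in empirical_variogram(coords, values, lag_width=1.0, n_lags=4):
    print(f"separation {sep:.1f}: semivariance {gamma:.3f} from {n_pairs} pairs")
```

Reporting the pair count alongside each lag is a deliberate choice: lags with few pairs give unreliable semivariances and should be read with caution.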

Residuals*: Lack of fit of individual observations; here from about -31.6 to +20.5 percent clay;

Coefficients: Multipliers of each term in the polynomial; here for every meter East, the clay content increases by 0.00065%, i.e. 0.65% per km

Adjusted R-squared: Proportion of variance explained by the model, here about 50%.

Recall: The “adjusted” $R^2$ decreases the apparent $R^2$, computed from the Analysis of Variance (ANOVA) table, to account for the number of predictive factors:

$$R^2_{\mathrm{adj}} = 1 - \frac{n - 1}{n - p}\,(1 - R^2)$$

where $n$ is the number of observations and $p$ is the number of coefficients (including the intercept).
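
As a small worked example, the adjustment can be computed directly, taking the adjusted value as 1 - (1 - R²)(n - 1)/(n - p), with n observations and p coefficients including the intercept. The function name and the numbers below are hypothetical and are not taken from the clay dataset.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p coefficients (intercept included)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p)

# Hypothetical values: 150 observations, first-order surface (intercept + 2 slopes)
apparent = 0.51
adjusted = adjusted_r_squared(apparent, n=150, p=3)
print(f"apparent R^2 = {apparent:.3f}, adjusted R^2 = {adjusted:.3f}")
```

With many observations and few coefficients the correction is small; it grows as p approaches n.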

* The error of an observed value is the deviation of the observed value from the (unobservable) true function value, while the residual of an observed value is the difference between the observed value and the estimated function value (c. Wikipedia).

Some trends require a higher order than a simple plane. They are fit in the same way as the first-order surface, but include higher powers of the coordinates as predictors.

In the present example it is clear from the surface/post-plot and the regression diagnostics that a first-order surface was not satisfactory. So we compute with the square of the coordinates and their cross-product included in the model, and get the following summary:

Residuals: Lack of fit of individual observations; here from -29.5 to +20.8 percent clay; about the same as for the first-order surface

Coefficients: Multipliers of each term in the polynomial, but now there are five: the two coordinates, their squares, and their cross-product, plus of course the intercept.

Statistical significance of coefficients: The listed Pr(>|t|) gives the probability of obtaining a t-statistic at least this extreme if the coefficient were in fact 0 (the null hypothesis), i.e. if the term contributed nothing to the model. In this example there is no evidence that the cross-product of the coordinates, i.e. I(UTM_E * UTM_N), is needed, so the model can be re-fit without it.

Adjusted R-squared: Proportion of variance explained by the model, here about 52%, a slight improvement over the first-order surface

We evaluate the fit with the same diagnostics as in the first-order model.
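
To make the structure of the second-order model concrete, the following sketch fits a full second-order surface by ordinary least squares using only the Python standard library. It is an illustration under stated assumptions, not the R fit summarized above: the data are synthetic and noise-free, the coordinates are assumed to be rescaled to the unit square (raw UTM coordinates would make the normal equations badly conditioned), and the helper names `design_row`, `solve` and `fit_surface` are hypothetical.

```python
import random

def design_row(e, n):
    """Terms of a full second-order surface: 1, E, N, E^2, N^2, E*N."""
    return [1.0, e, n, e * e, n * n, e * n]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back-substitution
        known = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - known) / aug[r][r]
    return x

def fit_surface(es, ns, zs):
    """Least-squares coefficients from the normal equations (X'X) beta = X'z."""
    rows = [design_row(e, n) for e, n in zip(es, ns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(k)]
    return solve(xtx, xtz)

# Synthetic surface with a zero cross-product coefficient (assumed values)
rng = random.Random(42)
true_beta = [30.0, 5.0, -3.0, 2.0, 1.0, 0.0]
es = [rng.random() for _ in range(60)]
ns = [rng.random() for _ in range(60)]
zs = [sum(b * t for b, t in zip(true_beta, design_row(e, n))) for e, n in zip(es, ns)]

beta_hat = fit_surface(es, ns, zs)
for name, b in zip(["intercept", "E", "N", "E^2", "N^2", "E*N"], beta_hat):
    print(f"{name:>9}: {b: .4f}")
```

Dropping the cross-product term amounts to removing the last entry of `design_row` and re-fitting.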

Now we turn to local spatial dependence. The idea here is that there is a local process that causes nearby points to be “similar”. We will see how to quantify this and use it in modeling and prediction.

But first we need some theoretical background: a brief explanation of the theory underlying optimal geostatistical estimation by kriging.

The presentation is based on R. Webster and M. Oliver (2001), Geostatistics for environmental scientists, Chichester etc.: John Wiley & Sons.

Problem: We have no way to estimate the expected values of the random process at each location, $\mu(\mathbf{x})$, since we only have one realization (what we actually measure), rather than the whole set of realizations that could have been produced by the random process.

Solution: assume that the expected values at all locations in the field are the same:

$$E[Z(\mathbf{x})] = \mu \quad \text{for all locations } \mathbf{x}$$

This is called first-order stationarity of the random process; note that $\mu$ is now not a function of position.

Then we can estimate the (common) expected value from the sample and its presumed spatial structure

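Replace mean values with mean differences, which are the same over the whole random field, at least within some “small” separation $\mathbf{h}$. Then the expected value is 0:

$$E[Z(\mathbf{x}) - Z(\mathbf{x} + \mathbf{h})] = 0$$

Replace covariance of values with variances of differences:

$$\mathrm{Var}[Z(\mathbf{x}) - Z(\mathbf{x} + \mathbf{h})] = E[\{Z(\mathbf{x}) - Z(\mathbf{x} + \mathbf{h})\}^2] = 2\gamma(\mathbf{h})$$

where $\gamma(\mathbf{h})$ is the semivariance at separation $\mathbf{h}$.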

The equations involve only the differences in values at a given separation, not the values themselves, so finite variance need only be assumed for the differences, a less stringent condition.
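
A simple random walk illustrates why the condition on differences is less stringent. The variance of its values grows without bound along the walk, so it has no single finite variance, yet the variance of its differences depends only on the separation. The sketch below is an assumption-laden illustration: it uses a one-dimensional walk with independent standard normal steps and, unlike real field data, generates many realizations so that variances across realizations can be computed. All names are hypothetical.

```python
import random
from statistics import pvariance

rng = random.Random(1)
n_realizations, n_steps, lag = 2000, 200, 10

# Each realization is a random walk: the cumulative sum of independent N(0, 1) steps
walks = []
for _ in range(n_realizations):
    z, walk = 0.0, []
    for _ in range(n_steps):
        z += rng.gauss(0.0, 1.0)
        walk.append(z)
    walks.append(walk)

def var_of_values(pos):
    """Variance across realizations of the value at one position."""
    return pvariance([w[pos] for w in walks])

def var_of_differences(pos, h):
    """Variance across realizations of the difference over separation h."""
    return pvariance([w[pos + h] - w[pos] for w in walks])

for pos in (19, 99, 179):
    print(f"position {pos:3d}: var(values) = {var_of_values(pos):7.1f}, "
          f"var(differences at lag {lag}) = {var_of_differences(pos, lag):5.1f}")
```

The variance of the values should increase steadily with position, while the variance of the differences should stay near the lag length wherever it is measured.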

Can't use non-spatial formulas for sample size, because spatial samples are correlated, and each sample is used multiple times in the variogram estimate

No way to estimate the actual error in the variogram fit, since we have only one realization of the random field

Stochastic simulation from an assumed random field with a known variogram suggests:

< 50 points: not at all reliable

100 to 150 points: more or less acceptable

> 250 points: almost certainly reliable

More points are needed to estimate an anisotropic variogram.

This is very worrying for many environmental datasets (soil cores, vegetation plots, ...) especially from short-term fieldwork, where sample sizes of 40 - 60 are typical. Should variograms even be attempted on such small samples?
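
The kind of simulation experiment referred to above can be sketched in a much simplified form. The code below does not reproduce the cited study or its thresholds. It assumes a one-dimensional, regularly spaced transect generated by a first-order autoregressive process with unit variance, for which the true semivariance at lag h is 1 - phi^h, and it only shows how the spread of the estimated semivariance across simulated realizations shrinks as the number of points grows. The parameter values and function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def simulate_transect(n, phi, rng):
    """One realization of a stationary AR(1) process with unit variance."""
    z = [rng.gauss(0.0, 1.0)]
    innovation_sd = (1.0 - phi ** 2) ** 0.5
    for _ in range(n - 1):
        z.append(phi * z[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return z

def semivariance(z, h):
    """Mean of half squared differences over all pairs separated by h steps."""
    squared = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
    return 0.5 * sum(squared) / len(squared)

rng = random.Random(7)
phi, lag, n_sims = 0.8, 5, 400          # assumed process and experiment settings
true_gamma = 1.0 - phi ** lag           # known semivariance of the assumed process

spread = {}
for n in (50, 150, 250):
    estimates = [semivariance(simulate_transect(n, phi, rng), lag)
                 for _ in range(n_sims)]
    spread[n] = pstdev(estimates)
    print(f"n = {n:3d}: mean estimate {mean(estimates):.3f} "
          f"(true {true_gamma:.3f}), spread {spread[n]:.3f}")
```

Because each simulated transect comes from the same known process, the spread of the estimates is a direct measure of how far a single sample of that size can mislead.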