The goals of this lesson are to introduce how spatial dependence is detected and modeled for uni- and multivariate response variables. Although in this lesson we focus on spatial dependence, the core concepts of how to detect and model a lack of independence in samples may be applied to dependence due to time, relatedness, position on a genome, and many other study-specific driver variables.
There are two general reasons why spatial dependence may exist in a response variable.
autocorrelation - the variable is spatially non-random due to its own internal dynamics. In ecology this is sometimes referred to as a false gradient. An example of this is when offspring have limited dispersal from their natal site and therefore finding one organism increases the likelihood of finding another.
induced spatial dependence - the variable is not inherently spatially structured but the driving factors that it responds to are. In ecology this is sometimes referred to as a true gradient. An example of this phenomenon would be an organism whose physiology is driven by temperature: if temperature is non-randomly spatially structured then the organism should also display non-random patterns of occurrence.
There is no statistical way to distinguish false from true gradients.
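To build intuition for why the two cases are statistically indistinguishable, here is a minimal simulation sketch (all names and parameter values are hypothetical, not part of the mite analysis below): one variable tracks a spatially structured driver (a true gradient), the other is given intrinsic autocorrelation directly (a false gradient), yet both produce a positive distance-decay signal.

set.seed(1)
n <- 100
xy <- cbind(x = runif(n), y = runif(n))
# true gradient: the response tracks a spatially structured driver (e.g., temperature)
temp <- xy[ , 'x'] + rnorm(n, sd = 0.1)
y_induced <- 2 * temp + rnorm(n)
# false gradient: intrinsic autocorrelation - error covariance decays
# exponentially with distance, no external driver involved
d <- as.matrix(dist(xy))
y_inherent <- as.numeric(t(chol(exp(-d / 0.2))) %*% rnorm(n))
# both variables show a positive distance-decay signal
cor(dist(xy), dist(y_induced))
cor(dist(xy), dist(y_inherent))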
We are going to use data on the occurrence of Oribatid mites collected from 70 soil cores across a field to examine how to detect non-random spatial signals for both univariate and multi-variate response variables. Specifically, we will examine a vector of species richness (i.e., number of species in a site), and a matrix of species composition (i.e., a community abundance matrix):
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
library(nlme)
# Oribatid mite data. 70 soil cores collected by Daniel Borcard in 1989.
# See Borcard et al. (1992, 1994) for details.
data(mite)
data(mite.env)
data(mite.xy)
?mite
plot(mite.xy)
# species richness (sr): the number of species present in each sample
sr <- rowSums(mite > 0)
hist(sr)
plot(mite.xy, cex = sr/max(sr))
# color-code the samples by richness using the histogram breaks
col_brks <- hist(sr, plot=F)$breaks
col_indices <- as.numeric(cut(sr, col_brks))
cols <- rev(terrain.colors(length(col_brks)))
plot(mite.xy, cex=2, pch=19, col=cols[col_indices])
Visually it appears that low richness sites (i.e., brown circles) are more likely to be near other low richness sites - this indicates a pattern of spatial dependence. We can carry out some very simple analyses to examine if this relationship is stronger than we would expect under a null model of randomly shuffled spatial positions. Our test statistic in this context is the Pearson correlation coefficient between the pairwise differences in the response variable (i.e., richness) and the pairwise spatial distances.
# calculate pairwise Euclidean distances for richness and for the spatial coordinates
sr_dist <- dist(sr)
xy_dist <- dist(mite.xy)
For interpretation purposes a rule of thumb is not to interpret distances greater than 1/2 the maximum distance in the dataset. This is to avoid examining spatial patterns that are underlaid by only a few samples. At small to intermediate distances there are typically many more pairs of samples, whereas at the extreme ends of a sampling grid there are only a few pairs of samples (i.e., those that lie at opposite corners of the grid from one another).
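We can check this directly by counting how many pairs of samples fall at each pairwise distance; a quick sketch using the distances computed above:

# the number of pairs drops off sharply beyond ~1/2 the maximum distance
hist(as.numeric(xy_dist), xlab = 'Pairwise spatial distance', main = '')
abline(v = max(xy_dist) / 2, col = 'red', lwd = 2, lty = 2)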
max_dist <- max(xy_dist) / 2
# plot result
plot(xy_dist, sr_dist)
abline(lm(sr_dist ~ xy_dist), lwd=3, col='red')
lines(lowess(xy_dist, sr_dist), lwd=3, col='pink')
abline(v = max_dist, col='red', lwd=3, lty=2)
# compute correlation
obs_cor <- cor(xy_dist, sr_dist)
obs_cor
## [1] 0.366253
Based on the graphic above does it appear that values of species richness are more similar or more different the further apart they are in space?
The above graphic demonstrates that the greater the distance between two samples
the more different those samples are in their species richness. In other words,
values of species richness are displaying positive autocorrelation or a spatially
aggregated pattern.
What would a random spatial signal look like on this graph?
A flat line (i.e., a correlation close to zero).
We can better examine if our intuition is correct by randomizing the location of each sample, re-making our graphic, and re-estimating the correlation coefficient.
# randomize the rows of the spatial coordinates matrix
?sample
null_xy <- mite.xy[sample(nrow(mite.xy)), ]
null_dist <- dist(null_xy)
plot(null_dist, sr_dist)
abline(lm(sr_dist ~ null_dist), lwd=3, col='red')
lines(lowess(null_dist, sr_dist), lwd=3, col='pink')
abline(v = max_dist, col='red', lwd=3, lty=2)
# compute null correlation
null_cor <- cor(null_dist, sr_dist)
null_cor
## [1] 0.0655278
That does look pretty flat for this randomized spatial location. We can develop a permutation test if we carry out this same procedure many times to develop a null distribution of correlation values. Then we can ask if our observed value of r = 0.37 is larger than we would expect simply due to random chance.
Here we will demonstrate how to build this permutation test on your own and how to carry it out using the vegan function mantel.
# carry out a permutation test for significance:
nperm <- 999
null_cor <- NULL
# now we'll want to generate 999 random null estimates of r
for (i in 1:nperm) {
# shuffle the rows of the spatial coordinates
null_xy <- mite.xy[sample(nrow(mite.xy)), ]
# correlation between the shuffled spatial coordinates and sr_dist
null_cor[i] <- cor(dist(null_xy), sr_dist)
}
# to get our 1000th possible outcome we'll treat the observed correlation
# value as one of the 'possibilities' in the null distribution:
null_cor <- c(null_cor, obs_cor)
# compute the p-value, which is simply the proportion of times we observed a
# value equal to or greater than the observed correlation. A value equal to
# or as large as the observed value only occurs once out of the
# 1000 possibilities considered, so p = 0.001
sum(null_cor >= obs_cor) / (nperm + 1)
## [1] 0.001
# carry out the same analysis using the function mantel()
?mantel
sr_mantel <- mantel(xy_dist, sr_dist)
sr_mantel
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = xy_dist, ydis = sr_dist)
##
## Mantel statistic r: 0.3663
## Significance: 0.001
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0575 0.0694 0.0831 0.1060
## Permutation: free
## Number of permutations: 999
We can see that the estimated p-value is 0.001 (from both our hand-made algorithm and the vegan::mantel function). This indicates that there is a statistically significant pattern of positive spatial autocorrelation, which fits the intuitive interpretation we developed graphically.
Now let’s examine the distribution of null correlation values to better understand how the p-value was computed.
# compare the two approaches graphically using stacked boxplots
boxplot(list(null_cor, sr_mantel$perm), horizontal = F, boxwex = 0.5,
names = c('hand-made algo', 'vegan::mantel'), ylab='Correlation',
ylim = range(null_cor))
abline(h=obs_cor, col='red')
abline(h=0, lty = 2, col = 'grey')
As expected the null distribution is much closer to zero (grey dashed line) than the observed value at 0.37 (red solid line). Additionally, it is reassuring to see that our hand-made algorithm and the vegan::mantel approach are generating essentially identical null distributions.
What would a significant negative correlation indicate?
That the spatial pattern is uniform (rather than random or aggregated). This
might occur for example when individuals repel one another due to competition.
This preliminary analysis suggests that there is a significant relationship between the spatial distance that two points are separated and the difference in species richness of the points.
Let’s take a similar approach but using a multivariate response variable, in this case a site-by-species community matrix. It is difficult to visualize changes in a multivariate response variable from a bird’s-eye view because you would need to layer a different map for each column in the matrix. That still might not really provide you with a reduction of information that is useful for making an inference. Here we will first reduce the information in the response matrix using non-metric multidimensional scaling (NMDS), an ordination procedure. For simplicity of visualization we’ll just map the 1st axis onto the spatial locations of the samples.
mite_mds <- metaMDS(mite)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1491318
## Run 1 stress 0.1658143
## Run 2 stress 0.1534989
## Run 3 stress 0.1535265
## Run 4 stress 0.169644
## Run 5 stress 0.1629577
## Run 6 stress 0.1694029
## Run 7 stress 0.1673939
## Run 8 stress 0.1598001
## Run 9 stress 0.1624032
## Run 10 stress 0.1606456
## Run 11 stress 0.1524338
## Run 12 stress 0.1652367
## Run 13 stress 0.1673888
## Run 14 stress 0.1566935
## Run 15 stress 0.1587531
## Run 16 stress 0.1541819
## Run 17 stress 0.1549726
## Run 18 stress 0.158146
## Run 19 stress 0.1609032
## Run 20 stress 0.1661189
## *** Best solution was not repeated -- monoMDS stopping criteria:
## 1: no. of iterations >= maxit
## 19: stress ratio > sratmax
mds_axs1 <- mite_mds$points[ , 1]
col_brks <- hist(mds_axs1, plot=F)$breaks
col_indices <- as.numeric(cut(mds_axs1, col_brks))
cols <- rev(terrain.colors(length(col_brks)))
plot(mite.xy, cex=2, pch=19, col=cols[col_indices])
As with richness there is a pretty clear spatial pattern from south to north. In this case though it appears that the signal is even stronger than in the species richness example as clear bands appear.
Let’s now undertake a more quantitative approach and estimate the degree of autocorrelation.
## compute bray curtis distance for the community matrix
## note when working with species-richness we just used Euclidean distance
comm_dist <- vegdist(mite)
plot(xy_dist, comm_dist)
abline(lm(comm_dist ~ xy_dist), lwd=3, col='red')
lines(lowess(xy_dist, comm_dist), lwd=3, col='pink')
lines(lowess(xy_dist, comm_dist, f=0.1), lwd=3, col='blue')
abline(v = max_dist, col='red', lwd=3, lty=2)
comm_mantel <- mantel(xy_dist, comm_dist)
comm_mantel
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = xy_dist, ydis = comm_dist)
##
## Mantel statistic r: 0.4589
## Significance: 0.001
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0491 0.0634 0.0814 0.1016
## Permutation: free
## Number of permutations: 999
Species composition also appears to display positive spatial autocorrelation, and it is a little stronger than for species richness.
The previous plots included both the linear regression line, which is what the Mantel analysis is based upon, and the lowess smoother. The smoother can help to identify if the relationship is non-linear and how the strength of the relationship varies with spatial distance.
Note that the estimated correlation coefficients that we have been using as our test statistic use all of the distance classes, including distances that are greater than 1/2 the max distance, which we noted above is generally frowned upon. Notice in the above plot for comm_dist that the slope of the lowess lines is steeper than that of the linear regression line - this indicates a signal of spatial dependence, or a non-linear change in the degree of spatial autocorrelation as a function of distance.
To develop a more nuanced understanding of the pattern of spatial autocorrelation we can bin the pairwise distances into distance classes and then compute the Mantel correlation within each class to build a correlogram.
sr_corlog <- mantel.correlog(sr_dist, xy_dist)
comm_corlog <- mantel.correlog(comm_dist, xy_dist)
sr_corlog
##
## Mantel Correlogram Analysis
##
## Call:
##
## mantel.correlog(D.eco = sr_dist, D.geo = xy_dist)
##
## class.index n.dist Mantel.cor Pr(Mantel) Pr(corrected)
## D.cl.1 0.514182 358.000000 0.102654 0.001 0.001 ***
## D.cl.2 1.242546 650.000000 0.135647 0.001 0.002 **
## D.cl.3 1.970910 796.000000 0.139241 0.001 0.003 **
## D.cl.4 2.699274 696.000000 0.077577 0.001 0.004 **
## D.cl.5 3.427638 500.000000 0.032271 0.065 0.065 .
## D.cl.6 4.156002 468.000000 -0.043035 0.021 0.042 *
## D.cl.7 4.884366 364.000000 -0.055028 0.003 0.009 **
## D.cl.8 5.612730 326.000000 NA NA NA
## D.cl.9 6.341094 260.000000 NA NA NA
## D.cl.10 7.069458 184.000000 NA NA NA
## D.cl.11 7.797822 130.000000 NA NA NA
## D.cl.12 8.526186 66.000000 NA NA NA
## D.cl.13 9.254550 32.000000 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
comm_corlog
##
## Mantel Correlogram Analysis
##
## Call:
##
## mantel.correlog(D.eco = comm_dist, D.geo = xy_dist)
##
## class.index n.dist Mantel.cor Pr(Mantel) Pr(corrected)
## D.cl.1 0.514182 358.000000 0.216312 0.001 0.001 ***
## D.cl.2 1.242546 650.000000 0.240594 0.001 0.002 **
## D.cl.3 1.970910 796.000000 0.175418 0.001 0.003 **
## D.cl.4 2.699274 696.000000 0.057967 0.002 0.004 **
## D.cl.5 3.427638 500.000000 -0.045814 0.017 0.017 *
## D.cl.6 4.156002 468.000000 -0.139885 0.001 0.006 **
## D.cl.7 4.884366 364.000000 -0.164148 0.001 0.007 **
## D.cl.8 5.612730 326.000000 NA NA NA
## D.cl.9 6.341094 260.000000 NA NA NA
## D.cl.10 7.069458 184.000000 NA NA NA
## D.cl.11 7.797822 130.000000 NA NA NA
## D.cl.12 8.526186 66.000000 NA NA NA
## D.cl.13 9.254550 32.000000 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(1,2))
plot(sr_corlog)
mtext(side=3, 'Species Richness')
abline(v = max_dist, col='red', lwd=3, lty=2)
plot(comm_corlog)
mtext(side=3, 'Community Composition')
abline(v = max_dist, col='red', lwd=3, lty=2)
Above we can see that samples close to one another are more similar while samples that are further apart are more different than expected due to chance. Note that for some of the intermediate distance classes species-richness does not differ from random expectation.
A last point to make is that in this analysis, rather than computing one p-value, we have computed a p-value for each distance class. Therefore, this kind of analysis suffers from the problem of multiple comparisons, in which the raw p-values do not adhere to the expected type I error rate. To address this, a correction is applied to each successive test, by default the Holm correction. See the documentation for the vegan::mantel.correlog function for more information.
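As a sketch of what that progressive correction is doing (using the mantel.res component and the 'Pr(Mantel)' column name visible in the printed output above), the corrected value for the i-th class is the Holm-adjusted p-value computed over the first i tests:

p_raw <- sr_corlog$mantel.res[ , 'Pr(Mantel)']
p_tested <- p_raw[!is.na(p_raw)]
# progressive correction: adjust over the first i tests, keep the i-th value
sapply(seq_along(p_tested), function(i) p.adjust(p_tested[1:i], method = 'holm')[i])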
Spatial (and temporal) dependence is a potential problem for inferential statistics because of the assumption of independence of errors. However, if sufficient data are available it is often possible to model the spatial component of error and thus “correct” for the lack of independence in a model’s error.
Crawley (2014) provides a straightforward description of these methods and a few examples. Pinheiro and Bates (2000) provide a more detailed discussion with more examples, and they provide a useful table and figure that are helpful when deciding which error model to choose:
This is Table 5.2 from Pinheiro and Bates (2000) in which s is the spatial lag and rho is the correlation parameter. This is a subset of the models presented in Cressie (1993).
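For example, the exponential model from that table, which we fit below via corExp, assumes that the correlation between the errors of two samples decays with the spatial lag separating them:

\[\text{corr}(\varepsilon_i, \varepsilon_j) = \exp(-s_{ij} / \rho)\]

where \(s_{ij}\) is the distance between samples \(i\) and \(j\) and \(\rho\) is the range parameter.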
Graphically these models of spatial correlation can be visualized like this (Figure 5.9 of Pinheiro and Bates 2000):
When we carried out ordinary regression we used the formula:
\[y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) \qquad \text{(equ. 1)}\] where y is the vector we wish to explain variance in (our response variable), X is a matrix of explanatory variables we think influence y, \(\beta\) is the vector of intercept and slope coefficients, and \(\varepsilon\) is the error or residuals of the model.
As noted above we can consider two sources of non-random spatial structure in y. For induced spatial dependence or true gradients we can consider which explanatory variables we include in X and if they themselves are spatially structured. In practice, considering induced spatial dependence requires no changes to our modeling structure or philosophy assuming that our samples are truly independent of each other (i.e., sample similarity is only due to having similar conditions in X and not because they were simply located near each other).
If however samples are not truly independent and autocorrelation is present in the response variable, then we are violating a core assumption of OLS regression: that the residual errors are independently and identically distributed (i.i.d.). As equ. 1 states, the error should be Normally distributed with variance \(\sigma^2\) regardless of which samples we are considering. The key problem with ignoring correlations in the error between samples is that it effectively inflates our sample size.
If we are overestimating our sample size will that bias p-values to be smaller or larger?
They will be biased to be smaller because larger samples require a smaller signal to reject the null hypothesis. Thus when we inflate our sample size artificially we are inflating our Type I error rate and are more likely to reject the null hypothesis when there is actually no signal.
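To put a rough number on this inflation: for samples with AR(1)-type correlation \(\rho\) between neighbors, a standard result (stated here without derivation) is that the effective sample size for estimating the mean is approximately

\[n_{\text{eff}} \approx n \, \frac{1 - \rho}{1 + \rho}\]

so with \(n = 70\) and \(\rho = 0.5\), only about 23 samples’ worth of independent information remains.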
Well what are we going to do about this inflated sample size? The most conservative but often least appealing approach would be to censor your data down to a set of samples that are truly independent. Alternatively, you can average across non-independent replicates (aka pseudo-replicates). This is often either not feasible or simply unappealing because of the loss of precision and/or power that accompanies a loss of samples. An alternative solution we will examine here is modeling the pattern of autocorrelation in the error of the model using generalized least squares (GLS).
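In the notation of equ. 1, GLS relaxes the i.i.d. assumption by allowing the errors to be correlated:

\[y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 \Lambda) \qquad \text{(equ. 2)}\]

where \(\Lambda\) is a correlation matrix whose off-diagonal entries are supplied by a spatial model such as the exponential one above; OLS is simply the special case \(\Lambda = I\).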
Note: GLS models are distinct from generalized linear models (GLMs) because GLS models are primarily concerned with modeling the covariance between samples while still adhering to the standard Normal distribution assumption for measurement error. In contrast, GLMs are typically harnessed when the Normal distribution is not appropriate and we wish to use a different distribution for the error term, such as the Poisson or Binomial distributions. It is possible to model the correlation structure of the error term of a GLM but we will not consider that case here.
Let’s start with a simple OLS model for species richness of mites. We’ll examine if substrate density (SubsDens) is linearly correlated with richness and if the error of that regression model shows spatial structure, thus violating a core assumption of the model.
sr_dat <- data.frame(sr, mite.env, mite.xy)
# with no correlation structure specified, gls() is equivalent to an OLS fit
sr_lm <- gls(sr ~ SubsDens, data=sr_dat)
summary(sr_lm)
## Generalized least squares fit by REML
## Model: sr ~ SubsDens
## Data: sr_dat
## AIC BIC logLik
## 422.1378 428.7963 -208.0689
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 15.587432 1.9332242 8.062920 0.000
## SubsDens -0.012045 0.0471129 -0.255652 0.799
##
## Correlation:
## (Intr)
## SubsDens -0.957
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.11867624 -0.67441942 0.07126437 0.65748711 2.10544812
##
## Residual standard error: 4.674175
## Degrees of freedom: 70 total; 68 residual
We can see from the output of that model that substrate density does not seem to be correlated with species richness. Let’s examine the residuals of that model using a special kind of plot called a variogram, which plots the semivariance of the residuals against spatial lag (i.e., the spatial distance between samples). Although that might sound intimidating, we can build our own kind of variogram plot using the dist function we used above in the lesson to compute Euclidean distances between entries in a vector or matrix.
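Specifically, the semivariance at lag \(h\) is half the expected squared difference between residuals separated by distance \(h\):

\[\gamma(h) = \tfrac{1}{2}\,\mathrm{E}\!\left[(\varepsilon_i - \varepsilon_j)^2\right], \qquad s_{ij} = h\]

which is exactly what the dist(res)^2 * 0.5 line below computes for every pair of samples.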
vario_sr <- Variogram(sr_lm, form= ~ x + y, resType = 'response')
# note: by default the Variogram function uses standardized (Pearson)
# residuals; here we request the raw residuals with resType = 'response'
plot(vario_sr)
# compare that to a plot we make on our own
res <- residuals(sr_lm)
res_var <- dist(res)^2 * 0.5
plot(dist(sr_dat[, c('x', 'y')]), res_var)
lines(lowess(dist(sr_dat[, c('x', 'y')]), res_var), col='red', lwd=2)
abline(v = max_dist, col='red', lwd=3, lty=2)
# they seem to tell a similar story - the error of the model is showing
# positive autocorrelation.
The two plots may look a little different only because the Variogram plot has averaged across many pairs of points (the n.pairs column of that function’s output) to get semivariance estimates for different distance classes. One important point to note is that if you take the weighted average of all the semivariance estimates then that equals the variance of the residuals; here’s a quick demonstration of that:
sum(vario_sr$variog * vario_sr$n.pairs) / sum(vario_sr$n.pairs)
## [1] 21.54601
var(res)
## [1] 21.53127
Ok so far we’ve learned that substrate density does not seem to be influencing richness, spatially or otherwise, and that there still remains autocorrelation in richness. Let’s examine the most common model of spatial (and temporal) autocorrelation: the exponential model, also known as the autoregressive 1 (AR1) model.
sr_exp <- update(sr_lm, corr=corExp(form=~x + y))
summary(sr_exp)
## Generalized least squares fit by REML
## Model: sr ~ SubsDens
## Data: sr_dat
## AIC BIC logLik
## 396.3691 405.2471 -194.1846
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~x + y
## Parameter estimate(s):
## range
## 0.652223
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 16.687888 1.8686249 8.930571 0.0000
## SubsDens -0.028673 0.0384893 -0.744963 0.4589
##
## Correlation:
## (Intr)
## SubsDens -0.753
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.11165790 -0.76670852 -0.03920844 0.59883364 1.95922655
##
## Residual standard error: 4.763405
## Degrees of freedom: 70 total; 68 residual
# examine fit of error model to the raw model residuals
# note: this function defaults to Pearson standardized residuals,
# i.e., resType = 'p' or resType = 'pearson'
plot(Variogram(sr_exp, maxDist = max_dist))
# that doesn't look so good: clearly the model does not fit the error very
# well, and it appears that there is a nugget (i.e., a non-zero y-intercept).
# Let's examine the normalized residuals, in which the residuals are
# divided by the estimate of the variance-covariance matrix. If the model
# fits well these residuals should be independent and normally distributed.
plot(Variogram(sr_exp, resType='normalized', maxDist = max_dist))
# we see a little bit of a trend in the residuals but it's not too bad,
# which is a bit surprising given the plot of the raw residuals.
# let's look at the same model but with a nugget
sr_exp_nug <- update(sr_exp, corr=corExp(c(0.5, 0.1), form=~x + y, nugget=T))
# Notice above I had to put in starting values for the rate and nugget of the
# spatial model. I found that if I left those off the model did not converge.
# This may be due to the fact that the estimated value of the nugget is very
# close to zero.
summary(sr_exp_nug)
## Generalized least squares fit by REML
## Model: sr ~ SubsDens
## Data: sr_dat
## AIC BIC logLik
## 380.1734 391.2709 -185.0867
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~x + y
## Parameter estimate(s):
## range nugget
## 9.659676e+05 3.369981e-06
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 18.881353 1540.7569 0.0122546 0.9903
## SubsDens -0.065564 0.0339 -1.9316070 0.0576
##
## Correlation:
## (Intr)
## SubsDens -0.001
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.0065915288 -0.0028609986 -0.0007088509 0.0011049155 0.0054856495
##
## Residual standard error: 1540.763
## Degrees of freedom: 70 total; 68 residual
plot(Variogram(sr_exp_nug, maxDist = max_dist))
plot(Variogram(sr_exp_nug, resType='n', maxDist = max_dist))
# those look like they provide a better fit to the data
# let's examine the rational quadratic error model
sr_rat_nug<-update(sr_lm, corr=corRatio(form=~x + y, nugget=T))
# examine fit of error model to model residuals
plot(Variogram(sr_rat_nug, maxDist = max_dist))
plot(Variogram(sr_rat_nug, resType='n', maxDist = max_dist))
# this model seems to fit about as well as the exponential with the nugget
# let's compare the models
anova(sr_lm, sr_exp, sr_exp_nug, sr_rat_nug, test=F)
## Model df AIC BIC logLik
## sr_lm 1 3 422.1378 428.7963 -208.0689
## sr_exp 2 4 396.3691 405.2471 -194.1846
## sr_exp_nug 3 5 380.1734 391.2709 -185.0867
## sr_rat_nug 4 5 380.3153 391.4128 -185.1576
# so it appears that the exponential and rational quadratic models with
# nuggets fit equally well, much better than the model without a spatial
# error term, and better than the model with the nugget set to zero.
summary(sr_exp_nug)
## Generalized least squares fit by REML
## Model: sr ~ SubsDens
## Data: sr_dat
## AIC BIC logLik
## 380.1734 391.2709 -185.0867
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~x + y
## Parameter estimate(s):
## range nugget
## 9.659676e+05 3.369981e-06
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 18.881353 1540.7569 0.0122546 0.9903
## SubsDens -0.065564 0.0339 -1.9316070 0.0576
##
## Correlation:
## (Intr)
## SubsDens -0.001
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.0065915288 -0.0028609986 -0.0007088509 0.0011049155 0.0054856495
##
## Residual standard error: 1540.763
## Degrees of freedom: 70 total; 68 residual
summary(sr_rat_nug)
## Generalized least squares fit by REML
## Model: sr ~ SubsDens
## Data: sr_dat
## AIC BIC logLik
## 380.3153 391.4128 -185.1576
##
## Correlation Structure: Rational quadratic spatial correlation
## Formula: ~x + y
## Parameter estimate(s):
## range nugget
## 4.171307 0.267800
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 18.288674 3.861352 4.736340 0.0000
## SubsDens -0.064731 0.033315 -1.943026 0.0562
##
## Correlation:
## (Intr)
## SubsDens -0.306
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.65514692 -0.66062258 -0.09103254 0.39015364 1.55258416
##
## Residual standard error: 5.80652
## Degrees of freedom: 70 total; 68 residual
Note that in the above model summaries both the regression coefficients and the associated t-statistics were shifted towards greater importance once the spatial dependence was modeled. In general, it is thought that the beta coefficients should be fairly robust to spatial dependence but that tests of significance will be highly sensitive.
Let’s now examine our spatial map of richness and this time focus on the residuals of our model. Do they still look spatially structured?
col_brks <- hist(residuals(sr_exp_nug), plot=F)$breaks
col_indices <- as.numeric(cut(residuals(sr_exp_nug), col_brks))
cols <- rev(terrain.colors(length(col_brks)))
plot(mite.xy, cex=2, pch=19, col=cols[col_indices])
It does appear that the residuals still look spatially structured, at least at large distances; it is difficult to tell at short distances.
Exercise: Include the variable WatrCont along with SubsDens in the linear model of richness. Go back through the spatial modeling routines. Does the map of the residuals still seem to show the strong pattern of non-stationarity?
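A possible starting point for the exercise (one approach among several; the nugget, model comparison, and residual map steps follow the same pattern as above):

# refit the OLS and exponential-error models with both predictors
sr_lm2 <- gls(sr ~ SubsDens + WatrCont, data = sr_dat)
sr_exp2 <- update(sr_lm2, corr = corExp(form = ~ x + y))
anova(sr_lm2, sr_exp2)
plot(Variogram(sr_exp2, resType = 'n', maxDist = max_dist))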
Now let’s examine spatial dependence in a multivariate response, such as species composition, in a modeling framework.
# RDA of community composition constrained by SubsDens and WatrCont
mite_rda <- rda(mite, mite.env[ , 1:2])
plot(mite_rda, display=c('sp', 'bp'))
anova(mite_rda)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = mite, Y = mite.env[, 1:2])
## Df Variance F Pr(>F)
## Model 2 1977.8 9.3048 0.007 **
## Residual 67 7120.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# multi-scale ordination (MSO): examine how the ordination's variance
# decomposes across spatial distance classes, first for the unconstrained
# ordination and then for the constrained one
mite_mso_raw <- mso(rda(mite), mite.xy, permutations = 1000)
mite_mso <- mso(mite_rda, mite.xy, permutations = 1000)
mite_mso
## Call: mso(object.cca = mite_rda, object.xy = mite.xy, permutations =
## 1000)
##
## Inertia Proportion Rank
## Total 9098.5913 1.0000
## Constrained 1977.8327 0.2174 2
## Unconstrained 7120.7586 0.7826 35
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2
## 1919.3 58.6
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 6252 289 153 112 92 57 41 30
## (Showing 8 of 35 unconstrained eigenvalues)
##
## mso variogram:
##
## H Dist n All Sum CA CCA se CA.signif
## 0 0 0.3555 63 69.95 80.83 69.23 11.60 54.81 0.47053
## 1 1 1.0659 393 105.29 108.30 91.67 16.63 27.79 0.65834
## 2 2 2.0089 534 65.74 74.67 56.93 17.74 16.08 0.01199
## 3 3 2.9786 417 127.66 127.36 100.78 26.58 27.10 0.91808
## 4 4 3.9817 322 132.20 133.52 100.77 32.75 31.32 0.94106
## 5 5 5.0204 245 141.72 148.28 105.64 42.64 39.07 0.90509
## 10 10 6.8069 441 242.71 223.33 177.06 46.27 39.86 0.11988
##
## Permutation: free
## Number of permutations: 1000
par(mfrow=c(1,1))
msoplot(mite_mso_raw)
msoplot(mite_mso)
## Error variance of regression model underestimated by 11.3 percent