Home Page - http://dmcglinn.github.io/quant_methods/ GitHub Repo - https://github.com/dmcglinn/quant_methods
https://raw.githubusercontent.com/dmcglinn/quant_methods/gh-pages/lessons/multivariate_models.Rmd
The goal of this lesson is to introduce multivariate ordination analyses.
Multivariate statistical analyses are generalizations of the univariate approaches we examined in the last lesson. In the univariate case we considered a model of the form
\[y_i = b_0 + b_1 x_{i1} + \cdots + b_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad \text{(equ. 1)}\]
We are interested in using multivariate methods when we are no longer just interested in explaining variance in the vector y; instead, we would like to understand the dominant patterns and sources of variation in a matrix of response variables Y that are all of the same type (i.e., same units).
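In matrix form, a common way to write the multivariate analog of equation 1 (the direct or constrained case discussed below) is
\[\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}\]
where \(\mathbf{Y}\) is the \(n \times p\) matrix of response variables, \(\mathbf{X}\) is the matrix of explanatory variables (including a column of ones for the intercept), \(\mathbf{B}\) is the matrix of regression coefficients, and \(\mathbf{E}\) is the matrix of residuals.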
Oftentimes we may simply be interested in whether we can represent our response matrix Y using a relatively small number of dimensions (without any reference to an explanatory matrix). For example, if Y has n samples (i.e., rows) and p descriptors (i.e., columns), then we can fully represent all the variation in Y using p hypothetical axes; however, what would a p-dimensional graph look like? Any dimensionality greater than 3 is difficult for the human mind to comprehend. Furthermore, is that degree of complexity even needed to see the major trends in Y? Picture trying to interpret a pairs plot with \(p(p-1)/2\) panels (one for each unique pair of columns in Y).
The good news is that, in general, the information in a matrix can be reduced down to a few major axes of variation. Ordination is the term that describes this process of reducing the information needed to represent a matrix. The term derives from the idea of ordinating, or putting things in order. With ordination approaches we are attempting to take a high-dimensional data matrix and explain its patterns with a small number of axes. Although this is a very general concept across scientific disciplines, the term ordination is generally only used in ecology.
There are generally considered to be two types of ordination: indirect (unconstrained) ordination, which uses only the response matrix Y, and direct (constrained) ordination, which also uses a matrix of explanatory variables X.
The figure below (Legendre and Legendre 1998, Fig. 11.1) visually illustrates this conceptual shift from modeling a vector y (panel b) to modeling a matrix Y. Indirect ordination does not use an explanatory matrix X (panel a), while direct ordination methods do (panel c).
Today we will examine both indirect and direct methods of ordination. In general, ordination is frequently used for exploring patterns in datasets graphically; however, it can also be used to carry out hypothesis testing.
Despite their sometimes terrifying names, ordination methods have a close kinship with multiple regression (as illustrated in the figure above). One key difference is that the response variable is multivariate rather than univariate; however, for many approaches the underlying algebra of regression and ordination is very similar.
Overview of Ordination methods (adapted from http://ordination.okstate.edu/terminol.htm)
Acronym | Name | Type of analysis | Algorithm | R function |
---|---|---|---|---|
PCA | Principal Components Analysis | indirect | Eigenanalysis, SVD | stats::princomp, vegan::rda |
CA | Correspondence Analysis | indirect | RA, eigenanalysis, SVD | vegan::cca |
DCA | Detrended Correspondence Analysis | indirect | RA with detrending and rescaling | vegan::decorana |
NMDS | Non-metric Multidimensional Scaling | indirect | Distance-based ordination, non-eigenbased | vegan::metaMDS |
RDA | Redundancy Analysis | direct | Eigenanalysis, SVD | vegan::rda |
CCA | Canonical Correspondence Analysis | direct | RA with regressions, eigenanalysis | vegan::cca |
DCCA | Detrended Canonical Correspondence Analysis | direct | RA with regressions and detrending | NA |
This presentation by Abel Valdivia reviews the types of ordination and gives examples of their graphical output.
Additionally this key created by Mike Palmer provides a decision tree that can help to guide your choice of methods.
The first very common challenge when working with multivariate analyses is to construct the multivariate matrix we wish to analyze. Essentially, a community matrix is a cross-tab structure in which each descriptor element (e.g., species identity) is a column id and each sample element (e.g., site identity) is a row id. A cross-tab structure is a very inefficient way to store your raw data, and in many cases we must aggregate the raw data, which can be accomplished using a for loop or other simple functions. Using the Great Smoky Mountains tree dataset that was used in HW 3, I will demonstrate how to do this with a computationally inefficient but easy-to-understand for loop. As a reminder, the metadata for the tree data is located here.
Quick note: In this lesson we will use the R package `dummies` to create dummy variables. This package is now defunct and no longer on the main package repository, CRAN. To install it you will have to use the package `remotes` and the function `install_github`; specifically, once `remotes` is installed, use `remotes::install_github('cran/dummies')`.
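A minimal sketch of that one-time installation (only needed if `dummies` is not already installed):
# install the archived dummies package from its CRAN mirror on GitHub
install.packages('remotes')
remotes::install_github('cran/dummies')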
# load relevant packages and code for today's lesson
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
library(dummies) # this package is no longer on cran see note above
## dummies-1.5.6 provided by Decision Patterns
source('./scripts/utility_functions.R')
# load data
tree <- read.csv('./data/treedata_subset.csv')
# create a community site-by-species matrix by averaging species cover values
# we can do this with a for loop but it takes a while to run
uni_sp <- unique(tree$spcode)
uni_site <- unique(tree$plotID)
comm <- matrix(0, ncol=length(uni_sp), nrow=length(uni_site))
colnames(comm) <- uni_sp
rownames(comm) <- uni_site
for(i in seq_along(uni_sp)) {
for(j in seq_along(uni_site)) {
comm[j , i] <- mean(tree$cover[tree$spcode == uni_sp[i] &
tree$plotID == uni_site[j]])
}
}
comm[1:5, 1:5]
Alternatively, we could use the function `tapply`, which is much more efficient.
# alternatively we can use a tapply function
comm <- tapply(tree$cover,
INDEX = list(tree$plotID, tree$spcode),
mean)
# examine the community matrix
comm[1:5, 1:5]
## ABIEFRA ACERNEG ACERNIG ACERRUB AILAALT
## ATBN-01-0303 NA NA NA 6 NA
## ATBN-01-0304 NA NA NA 7 NA
## ATBN-01-0305 NA NA NA 5 NA
## ATBN-01-0306 NA NA NA 7 NA
## ATBN-01-0307 NA NA NA 5 NA
# replace the NAs with zeros
comm <- ifelse(is.na(comm), 0, comm)
comm[1:5, 1:5]
## ABIEFRA ACERNEG ACERNIG ACERRUB AILAALT
## ATBN-01-0303 0 0 0 6 0
## ATBN-01-0304 0 0 0 7 0
## ATBN-01-0305 0 0 0 5 0
## ATBN-01-0306 0 0 0 7 0
## ATBN-01-0307 0 0 0 5 0
Now that we've created our matrix, it is usually a good idea to make sure everything looks OK. The row and column sums (i.e., marginal sums) provide one reasonable, if very coarse, metric of how the data are structured.
# visually explore the cover variable between species and sites
uni_sp <- unique(tree$spcode)
sp_sum <- apply(comm, 2, sum)
site_sum <- apply(comm, 1, sum)
par(mfrow=c(2,2))
hist(sp_sum)
col <- colorRampPalette(c('red', 'orange', 'blue'))
sp_cols <- col(length(uni_sp))
plot(sp_sum[order(sp_sum, decreasing=T)], type='o', col='red', lwd=2,
xlab='Sp Rank', ylab='Sum Cover')
hist(site_sum)
plot(site_sum[order(site_sum, decreasing=T)], type='o', col='red', lwd=2,
xlab='Site Rank', ylab='Sum Cover')
par(mfrow=c(1,1))
Above we can see that most species have small total covers (i.e., most species are rare), which is a fairly universal ecological law, so this seems correct. Also, we can see that most sites have a total cover of approximately 40 and that this distribution isn't as highly skewed as the species marginal totals.
NOTE: These plots are not terribly helpful at understanding the variability in the community matrix. They are just useful for looking at gross aggregate properties of the matrix; therefore, they can be useful for spotting an error in the matrix. Let’s get our environmental variables ready to use and start to conduct some actual analyses.
In the tree dataset, each site has one set of environmental measurements. These are replicated across the rows of the `tree` data object:
## plotID spcode species cover elev tci streamdist
## 1 ATBN-01-0403 ABIEFRA Abies fraseri 1 1660 5.70146 490.9
## 775 ATBN-01-0403 BETUALL Betula alleghaniensis 1 1660 5.70146 490.9
## 4547 ATBN-01-0403 PICERUB Picea rubens 1 1660 5.70146 490.9
## disturb beers
## 1 CORPLOG 0.2244286
## 775 CORPLOG 0.2244286
## 4547 CORPLOG 0.2244286
So we just need to pull out the environmental data for a single one of those rows. In the end we need the explanatory matrix to have the same number of rows, in the same order, as the community matrix. We could write a for loop to do this, but we'll take the easier approach of using the function `aggregate`.
cols_to_keep <- c('elev', 'tci', 'streamdist', 'disturb', 'beers')
env <- aggregate(tree[ , cols_to_keep], by = list(tree$plotID), function(x) x[1])
# aggregate does have a side effect of adding a column to the output which
# is the unique plotID in this case. We just need to move that so it is a
# row name instead.
row.names(env) <- env[ , 1]
env <- env[ , -1]
head(env)
## elev tci streamdist disturb beers
## ATBN-01-0303 896.1 4.705636 197.00 CORPLOG 1.9906803
## ATBN-01-0304 947.3 4.447437 125.30 CORPLOG 0.8167341
## ATBN-01-0305 1027.0 6.149170 174.60 CORPLOG 0.5860782
## ATBN-01-0306 450.2 4.133772 202.50 LT-SEL 0.8601108
## ATBN-01-0307 477.0 5.587310 134.20 LT-SEL 0.1009244
## ATBN-01-0308 461.8 4.415634 31.62 LT-SEL 1.4324567
# before we can use this explanatory matrix we need to check
# that its rows are in the same order as our response matrix
all.equal(rownames(comm), rownames(env))
## [1] TRUE
# now that we've completed that check we can rename the rows to something
# more manageable
rownames(comm) <- 1:nrow(comm)
rownames(env) <- 1:nrow(env)
Principal Components Analysis (PCA) is a useful tool for either examining the correlation structure among the columns of a matrix or reducing the dataset to a few major axes of variation.
Note that Reciprocal Averaging (RA), listed in the table above, is another name for Correspondence Analysis (CA) rather than PCA.
A simplified description of the PCA algorithm (from Zelený 2018) is illustrated in the figures below and sketched in code after them.
Figure 1: PCA ordination of five samples and two species. (Fig. 9.2 from Legendre & Legendre 1998.)
Figure 2: 3D schema of PCA ordination algorithm
Check out the nice interactive visualization at Explained Visually
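As a rough by-hand sketch of those steps (center the columns, eigen-decompose the species covariance matrix, then project the samples onto the eigenvectors), using base R on the community matrix built above:
# 1. center each species (column) on its mean
comm_cen <- scale(comm, center = TRUE, scale = FALSE)
# 2. compute the species covariance matrix
comm_cov <- cov(comm_cen)
# 3. eigen-decompose it: eigenvalues give the variance captured by each axis,
#    eigenvectors give the species loadings
pca_eig <- eigen(comm_cov)
# 4. project the centered samples onto the eigenvectors to get site scores
site_scores <- comm_cen %*% pca_eig$vectors
# the leading eigenvalues should match those reported by vegan::rda below
head(pca_eig$values)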
We’ll start by using PCA to simply examine the correlations in tree species covers that are contained in the community matrix.
tree_pca <- rda(comm)
tree_pca
## Call: rda(X = comm)
##
## Inertia Rank
## Total 114.3
## Unconstrained 114.3 52
## Inertia is variance
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 23.245 14.729 9.554 8.586 6.875 4.925 4.379 4.207
## (Showing 8 of 52 unconstrained eigenvalues)
Like the object returned by the function `lm`, the output of `rda` is just a named list. We can get a sense of the structure of this named list using the function `str`.
str(tree_pca)
## List of 10
## $ colsum : Named num [1:52] 1.061 0.184 0.185 2.751 0.332 ...
## ..- attr(*, "names")= chr [1:52] "ABIEFRA" "ACERNEG" "ACERNIG" "ACERRUB" ...
## $ tot.chi : num 114
## $ Ybar : num [1:734, 1:52] -0.00692 -0.00692 -0.00692 -0.00692 -0.00692 ...
## ..- attr(*, "scaled:center")= Named num [1:52] 0.18733 0.00954 0.00681 4.05995 0.01226 ...
## .. ..- attr(*, "names")= chr [1:52] "ABIEFRA" "ACERNEG" "ACERNIG" "ACERRUB" ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:734] "1" "2" "3" "4" ...
## .. ..$ : chr [1:52] "ABIEFRA" "ACERNEG" "ACERNIG" "ACERRUB" ...
## ..- attr(*, "METHOD")= chr "PCA"
## $ method : chr "rda"
## $ call : language rda(X = comm)
## $ pCCA : NULL
## $ CCA : NULL
## $ CA :List of 6
## ..$ eig : Named num [1:52] 23.24 14.73 9.55 8.59 6.88 ...
## .. ..- attr(*, "names")= chr [1:52] "PC1" "PC2" "PC3" "PC4" ...
## ..$ poseig : NULL
## ..$ u : num [1:734, 1:52] -0.0148 -0.0263 0.0329 -0.0505 -0.048 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:734] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:52] "PC1" "PC2" "PC3" "PC4" ...
## ..$ v : num [1:52, 1:52] 0.04414 0.000326 -0.000369 -0.407768 -0.005665 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:52] "ABIEFRA" "ACERNEG" "ACERNIG" "ACERRUB" ...
## .. .. ..$ : chr [1:52] "PC1" "PC2" "PC3" "PC4" ...
## ..$ rank : int 52
## ..$ tot.chi: num 114
## $ inertia : chr "variance"
## $ regularization: chr "this is a vegan::rda result object"
## - attr(*, "class")= chr [1:2] "rda" "cca"
This is useful because, for example, if we wanted to pull out the eigenvalues of the analysis we can see that they are stored under `$CA$eig`. We can access each of the objects stored in a named list just as we would reference columns in a `data.frame`, using the `$` operator.
tree_pca$tot.chi
## [1] 114.3032
tree_pca$CA$eig
## PC1 PC2 PC3 PC4 PC5 PC6
## 23.244770597 14.729455729 9.553715342 8.586350801 6.875460601 4.924553087
## PC7 PC8 PC9 PC10 PC11 PC12
## 4.379409841 4.206722212 3.648091703 3.432661578 2.784056682 2.579211642
## PC13 PC14 PC15 PC16 PC17 PC18
## 2.231308613 2.172385623 2.005743843 1.939263045 1.756857179 1.595933867
## PC19 PC20 PC21 PC22 PC23 PC24
## 1.524147315 1.392534007 1.265278989 1.144346819 1.022333994 0.987984586
## PC25 PC26 PC27 PC28 PC29 PC30
## 0.894203997 0.890931916 0.758238314 0.580826522 0.513812015 0.475536085
## PC31 PC32 PC33 PC34 PC35 PC36
## 0.338373458 0.295940221 0.221512264 0.213810566 0.181667497 0.155612934
## PC37 PC38 PC39 PC40 PC41 PC42
## 0.124854861 0.110463860 0.092544030 0.084629816 0.078266544 0.077549045
## PC43 PC44 PC45 PC46 PC47 PC48
## 0.055542679 0.046692964 0.031058911 0.025908859 0.024458021 0.022436260
## PC49 PC50 PC51 PC52
## 0.013732027 0.007551564 0.003227927 0.001201937
# the eigenvalues sum up to equal the total inertia
sum(tree_pca$CA$eig)
## [1] 114.3032
# inertia in a PCA is equal to the total variance of the matrix because PCA
# uses the usual Euclidean distance as its metric of how different samples are:
# 1 / (n-1) * sum(x - xbar)^2 where n is sample size and xbar is the mean x value
# you can verify that this is equal to the sum of the variances (i.e., diagonal
# elements) in the species covariance matrix
sum(diag(cov(comm)))
## [1] 114.3032
# or if you want to do it by hand with matrix algebra
# first center the community matrix by subtracting the column means
cen_comm <- apply(comm, 2, function(x) x - mean(x))
# then compute the covariance matrix (note the n - 1 denominator, matching cov)
cov_comm <- (nrow(comm) - 1)^-1 * t(cen_comm) %*% cen_comm
sum(diag(cov_comm))
## [1] 114.3032
# the ratio of the eigenvalue to the total variance is the amount of
# variance explained by each PCA axis
round(tree_pca$CA$eig / tree_pca$tot.chi, 2)
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16
## 0.20 0.13 0.08 0.08 0.06 0.04 0.04 0.04 0.03 0.03 0.02 0.02 0.02 0.02 0.02 0.02
## PC17 PC18 PC19 PC20 PC21 PC22 PC23 PC24 PC25 PC26 PC27 PC28 PC29 PC30 PC31 PC32
## 0.02 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.00 0.00 0.00 0.00
## PC33 PC34 PC35 PC36 PC37 PC38 PC39 PC40 PC41 PC42 PC43 PC44 PC45 PC46 PC47 PC48
## 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## PC49 PC50 PC51 PC52
## 0.00 0.00 0.00 0.00
We can see from above that the PCA axis 1 captures approximately 20% of the total variance in the community matrix. Let’s graph the data to better get a sense of the correlation structure.
plot(tree_pca)
biplot(tree_pca)
cleanplot.pca(tree_pca)
## Warning in arrows(0, 0, spe.sc1[, ax1], spe.sc1[, ax2], length = ahead, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, spe.sc2[, ax1], spe.sc2[, ax2], length = ahead, :
## zero-length arrow is of indeterminate angle and so skipped
# p120-121 of Numerical Ecology in R:
# Scaling 1 = distance biplot: the eigenvectors are scaled to unit length. (1)
# Distances among objects in the biplot are approximations of their
# Euclidean distances in multidimensional space. (2) The angles among
# descriptor vectors are meaningless.
# Scaling 2 = correlation biplot: each eigenvector is scaled to the square root of
# its eigenvalue. (1) Distances among objects in the biplot are not approximations
# of their Euclidean distances in multidimensional space. (2) The angles
# between descriptors in the biplot reflect their correlations.
# scaling 2 is the default for vegan ordination graphing functions
ordiplot(tree_pca, display = 'sp')
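# to see the effect of the two scalings directly, the same PCA can be plotted
# both ways (a sketch; biplot() dispatches to vegan's biplot method for rda objects)
par(mfrow = c(1, 2))
biplot(tree_pca, scaling = 1)  # distance biplot
biplot(tree_pca, scaling = 2)  # correlation biplot
par(mfrow = c(1, 1))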
# each of these different indirect ordination approaches
# has different strengths and weaknesses
# Correspondence analysis (CA) examines differences on weighted
# averages of the columns (i.e., species in this case)
tree_ca <- cca(comm)
# Detrended correspondence analysis (DCA) is identical except it attempts
# to account for a well known artifact of CA known as the arch-effect
# by detrending subsequent axes from previous axes.
tree_dca <- decorana(comm)
# Non-metric multidimensional scaling (NMDS) is unique in that you may
# specify one of a number of different distance metrics to use. By
# default the Bray-Curtis distance is used by metaMDS.
tree_mds <- metaMDS(comm)
NMDS maximizes the rank-order correlation between the distance measures and the distances in ordination space. Points are moved to minimize "stress", which is a measure of the mismatch between the two kinds of distance.
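A quick way to assess an NMDS fit is to inspect its stress value and Shepard diagram; a sketch using the `tree_mds` object created above:
# stress of the final NMDS solution (lower values indicate a better fit)
tree_mds$stress
# Shepard diagram: observed dissimilarities vs. distances in ordination space
stressplot(tree_mds)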
First let's carry out an RDA, which expects a linear response of each species to the environmental variables. RDA is the most direct analog of OLS regression for a multivariate response variable.
rda_tree <- rda(comm, env)
# the above breaks b/c we have a categorical factor in env
# vegan requires that we write out each term if we are not going to
# convert the factor to a dummy matrix
rda_tree <- rda(comm ~ env$elev + env$tci +
env$streamdist + env$disturb + env$beers)
# alternatively we could use a shorthand approach
rda_tree <- rda(comm ~ . , data=env)
rda_tree
## Call: rda(formula = comm ~ elev + tci + streamdist + disturb + beers,
## data = env)
##
## Inertia Proportion Rank
## Total 114.3032 1.0000
## Constrained 19.1096 0.1672 7
## Unconstrained 95.1936 0.8328 52
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7
## 12.454 3.922 0.889 0.751 0.693 0.285 0.116
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 14.210 12.741 7.891 6.811 5.608 4.455 4.122 3.415
## (Showing 8 of 52 unconstrained eigenvalues)
RsquareAdj(rda_tree)
## $r.squared
## [1] 0.1671835
##
## $adj.r.squared
## [1] 0.1591536
The output above provides us with some useful information. Inertia is another name for variation or, in this case, variance. "Total" refers to the total variance, "Constrained" refers to the amount of variance explained by the explanatory variables, and "Unconstrained" refers to the residual variance; Constrained + Unconstrained = Total. An \(R^2\) statistic can be derived simply as Constrained / Total, and the function `RsquareAdj` computes both \(R^2\) and adjusted \(R^2\). "Rank" indicates the number of axes in each portion of the ordination (for the constrained portion this corresponds to the number of explanatory variables once factors are expanded into dummy variables). The eigenvalues are displayed for both the constrained and unconstrained axes; in this context they indicate how much variance each axis explains.
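As a quick check, the \(R^2\) reported above can be recovered directly from the inertia components stored in the result object (a sketch; in vegan's result list the constrained inertia is stored under `$CCA$tot.chi`):
# constrained inertia divided by total inertia equals the unadjusted R^2
rda_tree$CCA$tot.chi / rda_tree$tot.chi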
We can plot our model result to get a sense of which variables correlate with which species along which axes.
plot(rda_tree, type='n', scaling=1)
orditorp(rda_tree, display='sp', cex=0.5, scaling=1, col='blue')
text(rda_tree, display='cn', col='red')
We interpret the plot above as we have interpreted the previous ordination plots, with one important difference: the environmental variables are now displayed, and their placement indicates their loading on the two displayed RDA axes. `elev` loads heavily on RDA1, indicating that this variable explains a large portion of the variance associated with axis 1. The location of the species relative to the environmental variables indicates how strongly a species is associated with a particular environmental variable. So, for example, ABIEFRA (Abies fraseri) increases in cover as elevation increases.
Let’s carry out a Canonical Correspondence Analysis (CCA) as well. CCA is appropriate for modeling unimodal or hump-shaped responses to explanatory variables (rather than linear as with RDA).
cca_tree <- cca(comm ~ ., data=env)
RsquareAdj(cca_tree, 100)
## $r.squared
## [1] 0.09932996
##
## $adj.r.squared
## [1] 0.09060108
anova(cca_tree, permutations = 999)
## Permutation test for cca under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: cca(formula = comm ~ elev + tci + streamdist + disturb + beers, data = env)
## Df ChiSquare F Pr(>F)
## Model 7 0.5546 11.438 0.001 ***
## Residual 726 5.0285
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(cca_tree, by='margin', permutations = 999)
## Permutation test for cca under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## Model: cca(formula = comm ~ elev + tci + streamdist + disturb + beers, data = env)
## Df ChiSquare F Pr(>F)
## elev 1 0.2329 33.6268 0.001 ***
## tci 1 0.0335 4.8363 0.001 ***
## streamdist 1 0.0525 7.5778 0.001 ***
## disturb 3 0.0920 4.4260 0.001 ***
## beers 1 0.0341 4.9295 0.001 ***
## Residual 726 5.0285
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(cca_tree, type='n', scaling=1)
orditorp(cca_tree, display='sp', cex=0.5, scaling=1, col='blue')
text(cca_tree, display='bp', col='red')
The CCA model doesn't explain as much variation and its plot looks slightly different, but the general take-home message has not changed much.
Now let’s carry out hypothesis testing.
anova(rda_tree, permutations=10)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 10
##
## Model: rda(formula = comm ~ elev + tci + streamdist + disturb + beers, data = env)
## Df Variance F Pr(>F)
## Model 7 19.110 20.82 0.09091 .
## Residual 726 95.194
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(rda_tree, by='margin', permutations=10)
## Permutation test for rda under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 10
##
## Model: rda(formula = comm ~ elev + tci + streamdist + disturb + beers, data = env)
## Df Variance F Pr(>F)
## elev 1 8.019 61.154 0.09091 .
## tci 1 1.597 12.176 0.09091 .
## streamdist 1 2.300 17.544 0.09091 .
## disturb 3 2.421 6.155 0.09091 .
## beers 1 1.363 10.396 0.09091 .
## Residual 726 95.194
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In a real analysis you would specify a much larger number of permutations (at least 1000). The first test examines overall model fit relative to a randomized or permuted matrix of data. The second test examines the partial effects of the individual variables included in the model.
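For example, repeating the tests above with 999 permutations (as was done for the CCA model earlier) gives p-values with a resolution of 1/1000:
anova(rda_tree, permutations = 999)
anova(rda_tree, by = 'margin', permutations = 999)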
Model comparison can be carried out with direct ordination methods using nested models of different degrees of complexity, as we did with univariate `lm`-type models.
rda_tree_simple <- update(rda_tree, . ~ . - beers)
anova(rda_tree_simple, rda_tree)
## Permutation tests for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model 1: comm ~ elev + tci + streamdist + disturb
## Model 2: comm ~ elev + tci + streamdist + disturb + beers
## ResDf ResChiSquare Df ChiSquare F Pr(>F)
## 1 727 96.557
## 2 726 95.194 1 1.3631 10.396 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The above test suggests that the more complex model including the variable `beers` is strongly supported. Note that this specific example is identical to the output for the `beers` parameter when using `anova(rda_tree, by = 'margin')`.
Model comparison can also be undertaken using adjusted \(R^2\). This is often easier to discuss and communicate than a myriad of significance tests.
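For example, a quick comparison of the two nested RDA models fit above:
# adjusted R^2 without and with the beers variable
RsquareAdj(rda_tree_simple)$adj.r.squared
RsquareAdj(rda_tree)$adj.r.squared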
Lastly, let's carry out variance partitioning. We can use this approach to examine how much of the explained variance is due to different groups of variables. In other words, this approach is really only useful if you are interested in comparing the relative importance of one set of variables to another set of variables.
## variance partitioning
moisture <- env[ , c('elev', 'tci', 'beers', 'streamdist')]
# because the variable disturb is a factor we need to convert it into
# a dummy matrix using the function dummies::dummy
disturb <- dummy(env$disturb)
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts =
## FALSE): non-list contrasts argument ignored
# examine the explanatory power of each class of variables
varpart(comm, moisture, disturb)
## Warning: collinearity detected in X2: mm = 4, m = 3
## Warning: collinearity detected in cbind(X1,X2): mm = 8, m = 7
##
## Partition of variance in RDA
##
## Call: varpart(Y = comm, X = moisture, disturb)
##
## Explanatory tables:
## X1: moisture
## X2: disturb
##
## No. of explanatory tables: 2
## Total variation (SS): 83784
## Variance: 114.3
## No. of observations: 734
##
## Partition table:
## Df R.squared Adj.R.squared Testable
## [a+c] = X1 4 0.14600 0.14132 TRUE
## [b+c] = X2 3 0.04722 0.04331 TRUE
## [a+b+c] = X1+X2 7 0.16718 0.15915 TRUE
## Individual fractions
## [a] = X1|X2 4 0.11584 TRUE
## [b] = X2|X1 3 0.01784 TRUE
## [c] 0 0.02547 FALSE
## [d] = Residuals 0.84085 FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
showvarparts(2)
The output indicates that the moisture group of variables has the largest individual fraction of explained variance (approximately 12%), whereas the disturbance group of variables explains only approximately 2%. We can also see that there are no really large fractions of shared variance, which indicates that the effects of the two sets of variables are largely independent of one another.
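As the output notes, the significance of individual fractions can be tested with partial ordination; a minimal sketch for fraction [a] (the moisture variables after controlling for disturbance):
# partial RDA: moisture variables as constraints, disturbance dummies as conditions
moisture_partial <- rda(comm, moisture, disturb)
anova(moisture_partial, permutations = 999)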