In complex systems such as a forest, it is rare that the variable you are interested in is influenced by only a single other variable. In fact, the effect of the driver variable you are testing may be obscured by variation due to a different variable.
For example, you might be interested in the effect of nutrient addition on plant growth, but you know that plant growth is also strongly driven by salinity. If we wish to apply our inference about nutrients across a salinity gradient, then we should replicate our nutrient addition at different salinities. Since we are not necessarily interested in the salinity effect itself, we treat it as a nuisance variable or a “block” effect.
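To make the design concrete, a blocked layout might look something like the sketch below (the salinity levels and the expand.grid layout are hypothetical and just for illustration; they are not part of the simulated data used in the rest of this lesson).
# hypothetical layout: each nutrient treatment replicated within every salinity block
design <- expand.grid(salinity_block = c('low', 'mid', 'high'),
                      nutrient = c('control', 'added'))
design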
Let’s first examine the incorrect approach to this analysis, using a linear model that ignores the block effect, and then see how the model interpretation changes once the block effect is correctly accounted for. We will see that controlling for the block effect is equivalent to running a paired t-test (which is itself a one-sample t-test on the within-block differences).
set.seed(1)
First, set up the block variable
nblock <- 20
block <- rep(1:nblock, each = 2)
block
## [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13
## [26] 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20
Treatment will just have two levels: 0 = control, 1 = treatment
trt <- rep(0:1, nblock)
trt
## [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
## [39] 0 1
Set the error (i.e., the noise level) of the model
noise <- 0.1
err <- rnorm(length(trt), mean = 0, sd = noise)
Now we can generate the response variable y using a linear model for which we know the coefficients. Note that the last term adds in the Gaussian noise.
y <- 0.33 * block + 0.25 * trt + err
par(mfrow=c(1,2))
boxplot(y ~ trt)
boxplot(y ~ block)
Note that the figures above show that most of the variation in y is due to block effects rather than treatment effects. These strong block effects have the potential to obscure the treatment effect if they are ignored.
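Another way to see the paired structure is to connect the two observations within each block (a quick base-graphics sketch; the exact plotting calls are just one option).
# reset the layout, then draw a line from the control to the treatment value in each block
par(mfrow = c(1, 1))
plot(trt, y, xlab = 'treatment (0 = control, 1 = treatment)', ylab = 'y')
segments(x0 = 0, y0 = y[trt == 0], x1 = 1, y1 = y[trt == 1])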
# recode block and treatment as factors
block <- as.factor(block)
trt <- as.factor(trt)
Let’s fit the models. We will find that if block is ignored then treatment appears to have no effect: ignoring important biology gives the wrong answer (i.e., a non-significant treatment effect). Note that the trt coefficient estimate is still approximately correct (i.e., it matches what we set in our model), which is impressive.
summary(lm(y ~ trt))
##
## Call:
## lm(formula = y ~ trt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2217 -1.5257 0.0664 1.6019 3.2210
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.4890 0.4361 8.000 1.14e-09 ***
## trt1 0.2203 0.6168 0.357 0.723
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.951 on 38 degrees of freedom
## Multiple R-squared: 0.003347, Adjusted R-squared: -0.02288
## F-statistic: 0.1276 on 1 and 38 DF, p-value: 0.7229
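One way to see why the treatment looks non-significant here is to compare the residual standard error with and without the block term (an optional check using sigma(), which extracts the residual standard error from a fitted lm).
# without block the residual error is inflated by block-to-block variation (~1.95);
# with block it drops close to the noise sd we simulated (~0.08 vs the true 0.1)
sigma(lm(y ~ trt))
sigma(lm(y ~ block + trt))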
If you include block along with treatment in the model, you now get the correct inference (i.e., the trt p-value is < 0.05)
summary(lm(y ~ block + trt))
##
## Call:
## lm(formula = y ~ block + trt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13637 -0.04292 0.00000 0.04292 0.13637
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.32268 0.05741 5.620 2.03e-05 ***
## block2 0.39012 0.07924 4.923 9.43e-05 ***
## block3 0.65759 0.07924 8.299 9.68e-08 ***
## block4 1.07343 0.07924 13.547 3.26e-11 ***
## block5 1.35566 0.07924 17.109 5.33e-13 ***
## block6 1.76722 0.07924 22.303 4.35e-15 ***
## block7 1.86034 0.07924 23.478 1.70e-15 ***
## block8 2.38614 0.07924 30.114 < 2e-16 ***
## block9 2.70852 0.07924 34.182 < 2e-16 ***
## block10 3.06290 0.07924 38.655 < 2e-16 ***
## block11 3.40720 0.07924 43.000 < 2e-16 ***
## block12 3.55640 0.07924 44.883 < 2e-16 ***
## block13 4.01033 0.07924 50.612 < 2e-16 ***
## block14 4.23081 0.07924 53.394 < 2e-16 ***
## block15 4.63913 0.07924 58.547 < 2e-16 ***
## block16 5.03494 0.07924 63.543 < 2e-16 ***
## block17 5.31883 0.07924 67.126 < 2e-16 ***
## block18 5.54254 0.07924 69.949 < 2e-16 ***
## block19 5.93946 0.07924 74.958 < 2e-16 ***
## block20 6.38530 0.07924 80.585 < 2e-16 ***
## trt1 0.22035 0.02506 8.794 4.00e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07924 on 19 degrees of freedom
## Multiple R-squared: 0.9992, Adjusted R-squared: 0.9983
## F-statistic: 1154 on 20 and 19 DF, p-value: < 2.2e-16
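As an optional check that the model recovers the simulation, the confidence interval for the trt effect should contain the true value of 0.25 that we used to generate y.
# 95% CI for the trt1 coefficient (compare with the 0.25 used in the simulation)
confint(lm(y ~ block + trt), 'trt1')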
Now let’s consider that the two samples within each block form a pair. In other words, we are most interested in the comparison of the samples within blocks, not between blocks.
To carry out a paired test we need to compute the difference in y between the two treatment levels within each block. This is easiest if we reshape the data to wide format
dat <- data.frame(y, block, trt)
dat_pair <- tidyr::pivot_wider(dat, names_from = trt,
names_prefix = 'trt_',
values_from = y)
head(dat_pair)
## # A tibble: 6 × 3
## block trt_0 trt_1
## <fct> <dbl> <dbl>
## 1 1 0.267 0.598
## 2 2 0.576 1.07
## 3 3 1.02 1.16
## 4 4 1.37 1.64
## 5 5 1.71 1.87
## 6 6 2.13 2.27
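As a quick sanity check, the wide data should have exactly one row per block.
nrow(dat_pair) # 20 blocks, one row each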
Now we can compute the difference directly for testing
dat_pair$diff <- dat_pair$trt_1 - dat_pair$trt_0
We can use the t.test function with a formula of the form response ~ 1. Here the 1 indicates a one-sample test of the paired differences; usually the right-hand side of the formula would be a grouping variable, but there is no grouping variable when testing a single vector of differences.
t.test(dat_pair$diff ~ 1)
##
## One Sample t-test
##
## data: dat_pair$diff
## t = 8.7939, df = 19, p-value = 3.998e-08
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.1679045 0.2727942
## sample estimates:
## mean of x
## 0.2203494
Rather than computing the difference between trt_1 and trt_0 manually, we can use the Pair function
t.test(Pair(trt_1, trt_0) ~ 1, data = dat_pair)
##
## Paired t-test
##
## data: Pair(trt_1, trt_0)
## t = 8.7939, df = 19, p-value = 3.998e-08
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.1679045 0.2727942
## sample estimates:
## mean difference
## 0.2203494
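An equivalent way to run the same paired test (not used elsewhere in this lesson) is to pass the two columns to t.test directly with paired = TRUE.
# same paired test written with two vectors instead of the Pair() formula
t.test(dat_pair$trt_1, dat_pair$trt_0, paired = TRUE)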
OK, let’s check whether the paired t-test gave us a different result from the linear model that accounts for block
summary(lm(y ~ block + trt))
##
## Call:
## lm(formula = y ~ block + trt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13637 -0.04292 0.00000 0.04292 0.13637
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.32268 0.05741 5.620 2.03e-05 ***
## block2 0.39012 0.07924 4.923 9.43e-05 ***
## block3 0.65759 0.07924 8.299 9.68e-08 ***
## block4 1.07343 0.07924 13.547 3.26e-11 ***
## block5 1.35566 0.07924 17.109 5.33e-13 ***
## block6 1.76722 0.07924 22.303 4.35e-15 ***
## block7 1.86034 0.07924 23.478 1.70e-15 ***
## block8 2.38614 0.07924 30.114 < 2e-16 ***
## block9 2.70852 0.07924 34.182 < 2e-16 ***
## block10 3.06290 0.07924 38.655 < 2e-16 ***
## block11 3.40720 0.07924 43.000 < 2e-16 ***
## block12 3.55640 0.07924 44.883 < 2e-16 ***
## block13 4.01033 0.07924 50.612 < 2e-16 ***
## block14 4.23081 0.07924 53.394 < 2e-16 ***
## block15 4.63913 0.07924 58.547 < 2e-16 ***
## block16 5.03494 0.07924 63.543 < 2e-16 ***
## block17 5.31883 0.07924 67.126 < 2e-16 ***
## block18 5.54254 0.07924 69.949 < 2e-16 ***
## block19 5.93946 0.07924 74.958 < 2e-16 ***
## block20 6.38530 0.07924 80.585 < 2e-16 ***
## trt1 0.22035 0.02506 8.794 4.00e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07924 on 19 degrees of freedom
## Multiple R-squared: 0.9992, Adjusted R-squared: 0.9983
## F-statistic: 1154 on 20 and 19 DF, p-value: < 2.2e-16
The coefficient estimate and t value for trt are identical between the paired t-test and the linear model that includes block as a factor. In this special case, if we want to summarize the test results using an ANOVA, we can use the default anova function, which carries out sequential (Type I) tests (i.e., the order of the variables in the model matters!). Here we want to first factor out the block effects and then test the trt effect
anova(lm(y ~ block + trt))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## block 19 144.451 7.6027 1210.906 < 2.2e-16 ***
## trt 1 0.486 0.4855 77.333 3.998e-08 ***
## Residuals 19 0.119 0.0063
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can verify that this is identical to the more usual Type III ANOVA test, which is what we are typically interested in when using observational (rather than experimental) datasets.
car::Anova(lm(y ~ block + trt), type = 3)
## Anova Table (Type III tests)
##
## Response: y
## Sum Sq Df F value Pr(>F)
## (Intercept) 0.198 1 31.589 2.028e-05 ***
## block 144.451 19 1210.906 < 2.2e-16 ***
## trt 0.486 1 77.333 3.998e-08 ***
## Residuals 0.119 19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
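As a final consistency check, the F statistic for trt in the ANOVA tables is simply the square of the paired t statistic.
# F for trt (77.333) equals the paired t statistic squared (8.7939^2)
anova(lm(y ~ block + trt))['trt', 'F value']
t.test(dat_pair$diff ~ 1)$statistic^2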