When working with multiple regression models we often spend a lot of time examining tabular summaries and not enough time examining graphical fits of the models to the data. Partial residual plots help us visualize the fit of each independent variable in a multiple regression model after controlling for the other variables. Mathematically, partial residuals are defined as:
\(\text{Residuals} + \hat{\beta}_i X_i \text{ versus } X_i,\)
where
\(\text{Residuals}\) = the residuals from the full model,
\(\hat{\beta}_i\) = the regression coefficient of the i-th independent variable in the full model,
\(X_i\) = the i-th independent variable.
We will use the function termplot to examine partial residual plots in R for lm models, but the approach also works for glm models.
Before we go further, we should mention two caveats:
This approach is not well suited to models that contain interaction effects, because examining the partial effect of a single term that is also involved in an interaction does not make sense.
This approach can also be misleading if there is strong multicollinearity, in which the independent variables are highly correlated with each other. In this case, the variance indicated by the partial residual plot can be much less than the actual variance (see the sketch just below).
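To make the multicollinearity caveat concrete, here is a minimal sketch using hypothetical variables (x_a, x_b, and y_c are ours, not part of the example that follows). The two predictors are nearly collinear, so their coefficient standard errors are large, yet the partial residual plots spread the points over the full range of each predictor, which can make the slopes look better determined than they really are:
set.seed(2)
x_a <- rnorm(100)
x_b <- x_a + rnorm(100, 0, 0.1)  # x_b is nearly collinear with x_a
y_c <- x_a + x_b + rnorm(100, 0, 0.3)
mod_c <- lm(y_c ~ x_a + x_b)
summary(mod_c)$coefficients      # note the large standard errors on x_a and x_b
par(mfrow = c(1, 2))
termplot(mod_c, partial.resid = TRUE, se = TRUE)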
We will use a simple simulated example with 3 independent variables.
set.seed(1)
x1 <- rnorm(100) # continuous variable 1
x2 <- rnorm(100) # continuous variable 2
x3 <- as.factor(rep(c('cnt', 'trt'), 50)) # categorical variable
y <- .5 * x1^2 + .75 * x2 + ifelse(x3 == 'cnt', -0.5, 0.5) + rnorm(100, 0, 0.1) # response: quadratic in x1, linear in x2, offset by x3
We can visually examine the relationship of each simulated variable to y prior to modeling:
par(mfrow=c(1,3))
plot(y ~ x1)
plot(y ~ x2)
plot(y ~ x3)
Even though in this simulated case we know the ‘true’ model because we designed it, visually examining the relationship between y and each variable individually suggests that x1 has no relationship, x2 has a positive relationship, and for x3 the trt samples have higher values of y.
Let’s carry out multiple regression modeling to see if we can uncover more information:
mod <- lm(y ~ x1 + x2 + x3)
summary(mod)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7435 -0.3487 -0.1985 0.1948 2.4465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.12350 0.08271 -1.493 0.139
## x1 0.08076 0.06492 1.244 0.217
## x2 0.66073 0.06060 10.904 < 2e-16 ***
## x3trt 1.03916 0.11606 8.954 2.63e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5775 on 96 degrees of freedom
## Multiple R-squared: 0.678, Adjusted R-squared: 0.6679
## F-statistic: 67.37 on 3 and 96 DF, p-value: < 2.2e-16
OK, we fit the model, and the tabular output seems to agree with our initial graphical exploration prior to model fitting. Let’s now examine the partial residual plots for this example:
par(mfrow = c(1, 3))
termplot(mod, partial.resid = TRUE, se = TRUE, smooth = panel.smooth,
         pch = 19, cex = 0.75,
         col.res = 'black', col.term = 'dodgerblue', lwd.term = 2,
         col.se = 'dodgerblue', lwd.se = 2, lty.se = 3,
         col.smth = 'red', lty.smth = 2)
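To connect these panels back to the definition at the top of the lesson, we can compute the partial residuals for x1 by hand (a quick sketch using the mod object fit above; note that termplot centers each term, so its y-axis may differ from this version by a constant shift):
# partial residuals for x1: full-model residuals plus x1's fitted contribution
pres_x1 <- residuals(mod) + coef(mod)['x1'] * x1
plot(x1, pres_x1, pch = 19, cex = 0.75)
# the best-fit line through the partial residuals has slope coef(mod)['x1'],
# because the full-model residuals are orthogonal to x1
abline(lm(pres_x1 ~ x1), col = 'dodgerblue', lwd = 2)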
First let’s get oriented on these graphics. The y-axis is the partial residual of the response variable y with respect to each independent variable. There are several sets of lines on the graphs. The light blue lines are the linear model fits with their associated standard errors (there is more uncertainty at the ends of the best-fit line). The graphs also include a localized smoother or loess curve (in red), which follows the data as closely as possible.
In the first panel we are examining the relationship y ~ x1 | x2 + x3, where the | stands for given, so this formula asks: what is the strength of the relationship of x1 to y, given (i.e., after controlling for) x2 and x3? In other words:
lm(residuals(lm(y ~ x2 + x3)) ~ x1)
##
## Call:
## lm(formula = residuals(lm(y ~ x2 + x3)) ~ x1)
##
## Coefficients:
## (Intercept) x1
## -0.008711 0.079998
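Note that the slope here (about 0.0800) closely matches the x1 coefficient in the full model (0.0808 in the summary(mod) output above). They are not exactly equal because x1 is slightly correlated with x2 and x3 in the sample; they would match exactly if x1 were also residualized against the other predictors, as in an added-variable plot. A quick check:
# compare the partial-regression slope to the full-model coefficient
coef(lm(residuals(lm(y ~ x2 + x3)) ~ x1))['x1']  # ~0.0800
coef(mod)['x1']                                  # ~0.0808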
Note that the leftmost panel above indicates something really useful to us. It shows clearly that the regression model (light blue line) has the wrong functional form. It isn’t that x1 has no relationship to y; it is just that the relationship is not linear. The loess smoother (red line) shows this clearly.
The other two panels show the partial effects of x2 and x3 on y. These panels look good and indicate that our linear model assumptions for the relationships between y and x2 and between y and x3 seem pretty reasonable.
Let’s re-specify our model and re-examine the partial residual plots:
mod2 <- lm(y ~ I(x1^2) + x2 + x3)
summary(mod2)
##
## Call:
## lm(formula = y ~ I(x1^2) + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.304170 -0.047998 0.003126 0.072060 0.254556
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.50274 0.01654 -30.40 <2e-16 ***
## I(x1^2) 0.49776 0.00928 53.64 <2e-16 ***
## x2 0.74416 0.01109 67.12 <2e-16 ***
## x3trt 1.01460 0.02092 48.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1046 on 96 degrees of freedom
## Multiple R-squared: 0.9894, Adjusted R-squared: 0.9891
## F-statistic: 2997 on 3 and 96 DF, p-value: < 2.2e-16
par(mfrow = c(1, 3))
termplot(mod2, partial.resid = TRUE, se = TRUE, smooth = panel.smooth,
         pch = 19, cex = 0.75,
         col.res = 'black', col.term = 'dodgerblue', lwd.term = 2,
         col.se = 'dodgerblue', lwd.se = 2, lty.se = 3,
         col.smth = 'red', lty.smth = 2)
Now that the model is properly specified, the model fits look better and have lower uncertainty associated with them.
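As a numerical complement to the graphical comparison, we can also check the two specifications against each other; given the jump in R-squared from 0.678 to 0.989, AIC should strongly favor the quadratic model (a quick check using the mod and mod2 objects fit above):
# lower AIC indicates a better fit after penalizing model complexity
AIC(mod, mod2)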