Home Page - http://dmcglinn.github.io/quant_methods/ GitHub Repo - https://github.com/dmcglinn/quant_methods
This mini-lesson introduces the concept of standardized regression coefficients in R. A standardized regression coefficient is simply the \(\beta\) estimate from a regression on standardized variables. A standardized variable is a variable that has been rescaled to have a mean of 0 and a standard deviation of 1.
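As a minimal sketch of what standardizing means in practice, the block below rescales a made-up vector by hand and compares the result with scale(). The vector x and its mean and standard deviation are hypothetical values chosen only for illustration.

```r
# Hypothetical example vector; any numeric vector would do
set.seed(1)
x <- rnorm(50, mean = 10, sd = 4)

# Standardize by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.numeric(scale(x))

# Both approaches should agree, with mean ~0 and standard deviation 1
all.equal(z_manual, z_scale)
c(mean = mean(z_manual), sd = sd(z_manual))
```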
One reason for standardizing variables is that the \(\beta\) estimates become unitless and directly comparable across predictors, so they can be read much like partial correlation coefficients. In other words, once the variables are standardized, you can compare how strongly each one is associated with the response variable using their regression coefficients. Below is a demo of this.
## We will use this function to plot the data and correlations
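panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor = 3, ...)
{
    # save the current plot coordinates and restore them by name on exit
    usr <- par("usr"); on.exit(par(usr = usr))
    par(usr = c(0, 1, 0, 1))
    # absolute Pearson correlation, formatted to the requested digits
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    # if no text size is supplied, scale the text to fit the panel width
    if (missing(cex.cor))
        cex.cor <- 0.8 / strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor)
}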
Simulate some data for running models. To provide a clear demonstration, we need explanatory variables that are independent normal variates.
set.seed(10)
n = 90
x1 = rnorm(n)
x2 = rnorm(n)
x3 = rnorm(n)
#create noise b/c there is always error in real life
epsilon = rnorm(n, 0, 3)
#generate response: additive model plus noise, intercept=0
y = 2*x1 + x2 + 3*x3 + epsilon
#organize predictors in data frame
sim_data = data.frame(y, x1, x2, x3)
Before standardizing variables, it is worthwhile to highlight the relationship between correlation and regression statistics. Specifically, the t-statistic from a test of a simple correlation coefficient is exactly what is reported for the \(\beta_1\) coefficient in a simple (one-predictor) regression model.
cor.test(sim_data$y, sim_data$x1)$statistic
## t
## 3.28821
summary(lm(y ~ x1, data=sim_data))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5675109 0.5015436 1.131529 0.260906812
## x1 1.7411233 0.5295049 3.288210 0.001450304
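The link between the two statistics can be checked with the standard identities t = r * sqrt(n - 2) / sqrt(1 - r^2) and slope = r * sd(y) / sd(x). The sketch below uses its own small simulated data set (the names and parameters are illustrative assumptions, not the lesson's sim_data).

```r
# Illustrative data: one predictor plus noise (values are assumptions)
set.seed(10)
n <- 90
x1 <- rnorm(n)
y <- 2 * x1 + rnorm(n, 0, 3)

r <- cor(y, x1)

# t-statistic computed directly from the correlation coefficient
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# the same statistic as reported by cor.test() and by lm()
t_from_cor <- unname(cor.test(y, x1)$statistic)
t_from_lm <- summary(lm(y ~ x1))$coef["x1", "t value"]

all.equal(t_from_r, t_from_cor)
all.equal(t_from_r, t_from_lm)

# the unstandardized slope is the correlation rescaled by sd(y) / sd(x1)
slope <- unname(coef(lm(y ~ x1))["x1"])
all.equal(slope, r * sd(y) / sd(x1))
```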
The \(\beta\) coefficient reported by the regression is not equal to the correlation coefficient, though, because the \(\beta\) is expressed in units of \(y\) per unit of \(x_1\) (i.e., it has not been standardized). Now let’s use the function scale() to standardize the independent and dependent variables.
sim_data_std = data.frame(scale(sim_data))
mod = lm(y ~ x1 + x2 + x3, data=sim_data)
mod_std = lm(y ~ x1 + x2 + x3, data=sim_data_std)
round(summary(mod)$coef, 3)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.089 0.320 3.400 0.001
## x1 2.071 0.336 6.161 0.000
## x2 1.089 0.327 3.327 0.001
## x3 3.580 0.323 11.076 0.000
round(summary(mod_std)$coef, 3)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.000 0.063 0.000 1.000
## x1 0.393 0.064 6.161 0.000
## x2 0.212 0.064 3.327 0.001
## x3 0.707 0.064 11.076 0.000
cor(sim_data$y, sim_data$x1)
## [1] 0.3307912
cor(sim_data$y, sim_data$x2)
## [1] 0.2037518
cor(sim_data$y, sim_data$x3)
## [1] 0.6772098
Notice that above the t-statistics, and consequently the p-values, don’t change between mod and mod_std (with the exception of the intercept term, which is always 0 in a regression of standardized variables). This is because the t-statistic is a pivotal statistic, meaning that its value doesn’t depend on the scale on which the variables are measured.
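One way to see why only the scale changes is that each standardized coefficient is the raw coefficient multiplied by sd(x) / sd(y). The following sketch checks this, and the equality of the t-statistics, on a self-contained simulated data set; the object names dat, preds, and b_rescaled are introduced here for illustration only.

```r
# Self-contained simulated data with three predictors (illustrative values)
set.seed(10)
n <- 90
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + dat$x2 + 3 * dat$x3 + rnorm(n, 0, 3)

# fit the same model on raw and on standardized variables
mod <- lm(y ~ x1 + x2 + x3, data = dat)
mod_std <- lm(y ~ x1 + x2 + x3, data = data.frame(scale(dat)))

preds <- c("x1", "x2", "x3")

# rescale the raw slopes by sd(x) / sd(y) to obtain standardized slopes
b_rescaled <- coef(mod)[preds] * sapply(dat[preds], sd) / sd(dat$y)
all.equal(b_rescaled, coef(mod_std)[preds])

# the slope t-statistics are unchanged by standardization
t_raw <- summary(mod)$coef[preds, "t value"]
t_std <- summary(mod_std)$coef[preds, "t value"]
all.equal(t_raw, t_std)
```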
Additionally, notice that the individual correlation coefficients are very similar to the \(\beta\) estimates in mod_std. Why are these not exactly the same? Here’s a hint: what would happen if there were strong multicollinearity between the explanatory variables?
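To explore the hint, here is a hedged sketch with two deliberately correlated predictors. It relies on the standard result that standardized coefficients equal the inverse of the predictor correlation matrix times the vector of predictor-response correlations, so they match the simple correlations only when the predictors are uncorrelated in the sample. The data and the 0.8 dependence between x1 and x2 are assumptions made for illustration.

```r
# Illustrative data: x2 is constructed to be strongly correlated with x1
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
y <- 2 * x1 + x2 + rnorm(n)

# standardized regression coefficients from lm() on scaled variables
dat_std <- data.frame(scale(data.frame(y, x1, x2)))
b_std <- coef(lm(y ~ x1 + x2, data = dat_std))[c("x1", "x2")]

# the same coefficients from correlations alone: solve R_xx %*% b = r_xy
R_xx <- cor(cbind(x1, x2))
r_xy <- cor(cbind(x1, x2), y)
b_from_cor <- drop(solve(R_xx, r_xy))
all.equal(unname(b_std), unname(b_from_cor))

# with collinear predictors the two rows differ noticeably
rbind(standardized_beta = b_std, simple_correlation = drop(r_xy))
```

In the lesson's simulation the predictors are independent draws, so their sample correlations are small but not exactly zero, which would account for the small gaps between the coefficients and the correlations.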
Let’s plot the variables against one another and also display their individual Pearson correlation coefficients to get a visual perspective on the problem.
pairs(sim_data, lower.panel = panel.cor, upper.panel = panel.smooth)
## Warning in par(usr): argument 1 does not name a graphical parameter
This warning is repeated once per lower panel (six times for these four variables) whenever panel.cor restores the plot coordinates with on.exit(par(usr)); restoring them by name with on.exit(par(usr = usr)) avoids it.