dotwhisker (version 0.8.3)

by_2sd: Rescale regression results by multiplying by 2 standard deviations

Description

by_2sd rescales regression results to facilitate making dot-and-whisker plots using dwplot.

Usage

by_2sd(df, dataset)

Value

A tidy data frame

Arguments

df

A data frame including the variables term (names of independent variables), estimate (corresponding coefficient estimates), std.error (corresponding standard errors), and optionally model (when multiple models are desired on a single plot), such as those generated by tidy.

dataset

The data analyzed in the models whose results are recorded in df, or (preferably) the model matrix used by the models in df; the information required for complex models can more easily be generated from the model matrix than from the original data set. In many cases the model matrix can be extracted from the original model via model.matrix.
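
To illustrate why the model matrix can be easier to work with than the original data, here is a minimal base-R sketch. The model m2 is a hypothetical example, not one from the package documentation; it includes a factor and an interaction, neither of which exists as a column in mtcars, yet both appear as columns of the model matrix under the same names as the coefficients.

```r
# Hypothetical model with a factor and an interaction term
m2 <- lm(mpg ~ wt + factor(cyl) + wt:hp, data = mtcars)

# The model matrix has one column per estimated coefficient
mm <- model.matrix(m2)
colnames(mm)

# Terms such as "factor(cyl)6" and "wt:hp" are not columns of the raw data
setdiff(colnames(mm), names(mtcars))
```

Because the column names line up with the term names, the standard deviation needed for each term can be read directly from the matching column.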

Details

by_2sd takes regression results saved as a tidy data frame and, for each predictor that is not binary, multiplies the estimate and standard error by twice the standard deviation of that variable in the dataset analyzed. Standardizing in this way yields coefficients that are directly comparable to each other and to those for untransformed binary predictors (Gelman 2008) and so facilitates plotting using dwplot. Note that the current version of by_2sd does not subtract the mean (in contrast to Gelman's (2008) formula). However, all estimates and standard errors of the independent variables are the same as if the mean were subtracted. The only difference from Gelman (2008) is in the intercept, which is shifted, for each variable in the model, by the coefficient times the mean of that variable.
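
The rescaling described above can be sketched by hand in base R. This is an illustration of the arithmetic under stated assumptions, not the package's implementation: the binary check (exactly two distinct values) and the names coefs, two_sd, and rescaled are hypothetical choices made for this example.

```r
# Fit the model used in the Examples section
m1 <- lm(mpg ~ wt + cyl + disp, data = mtcars)

# Collect estimates and standard errors in a tidy-like data frame
coefs <- as.data.frame(summary(m1)$coefficients[, 1:2])
names(coefs) <- c("estimate", "std.error")
coefs$term <- rownames(coefs)

# Scaling factor: 2 * sd for non-binary predictors, 1 for binary ones
# (assumption: "binary" means exactly two distinct values)
predictors <- setdiff(coefs$term, "(Intercept)")
is_binary <- sapply(mtcars[predictors], function(x) length(unique(x)) == 2)
two_sd <- ifelse(is_binary, 1, 2 * sapply(mtcars[predictors], sd))

# Multiply estimates and standard errors; the intercept is left untouched
idx <- match(predictors, coefs$term)
rescaled <- coefs
rescaled$estimate[idx] <- coefs$estimate[idx] * two_sd
rescaled$std.error[idx] <- coefs$std.error[idx] * two_sd

# Refitting with inputs divided by 2 sd (no centering) gives the same slopes
m_scaled <- lm(mpg ~ I(wt / (2 * sd(wt))) + I(cyl / (2 * sd(cyl))) +
                 I(disp / (2 * sd(disp))), data = mtcars)
all.equal(unname(coef(m_scaled)[-1]), unname(rescaled$estimate[idx]))
```

Operating on the coefficients avoids the refit in the last step, which is included here only as a check.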

An alternative available in some circumstances is to pass a model object to arm::standardize before passing the results to tidy and then on to dwplot. The advantages of by_2sd are that (1) it takes a tidy data frame as its input and so is not restricted to only those model objects that standardize accepts and (2) it is much more efficient because it operates on the parameters rather than refitting the original model with scaled data.

References

Gelman, Andrew. 2008. "Scaling Regression Inputs by Dividing by Two Standard Deviations." Statistics in Medicine, 27:2865-2873.

Examples

library(dotwhisker) # provides by_2sd
library(broom)      # provides tidy
library(dplyr)      # provides %>%

data(mtcars)
m1 <- lm(mpg ~ wt + cyl + disp, data = mtcars)
m1_df <- tidy(m1) %>% by_2sd(mtcars) # create data frame of rescaled regression results