jtools (version 0.5.0)

gscale: Scale and/or center regression inputs, including from survey designs, by dividing by 2 SD

Description

By default, gscale() standardizes variables by mean-centering them and dividing them by 2 standard deviations. It has options for handling binary variables separately. gscale() is a fork of rescale from the arm package; the key difference is that gscale() can perform the same operations on variables in svydesign objects. It is also more user-friendly, since it is more flexible in the kinds of input it accepts.
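The default transformation can be sketched in a few lines of base R. This is a minimal illustration that assumes only the default behavior (mean-centering, then dividing by n.sd = 2 standard deviations); two_sd_scale() is a hypothetical helper, not a jtools function.

```r
# two_sd_scale() is a hypothetical helper sketching gscale()'s default
# behavior; it is not part of jtools.
two_sd_scale <- function(x, n.sd = 2) {
  # Subtract the mean, then divide by n.sd standard deviations
  (x - mean(x, na.rm = TRUE)) / (n.sd * sd(x, na.rm = TRUE))
}

set.seed(1)
x <- rnorm(100, mean = 2, sd = 1)
z <- two_sd_scale(x)
mean(z)  # essentially 0
sd(z)    # 0.5, because the SD of x is divided by twice itself
```

Dividing by two SDs rather than one leaves a continuous input with an SD of 0.5, which puts it on roughly the same scale as an untransformed 0/1 input; that comparability is the rationale given in Gelman (2008).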

Usage

gscale(x = NULL, binary.inputs = "center", data = NULL, n.sd = 2,
  center.only = FALSE, scale.only = FALSE, weights = NULL)

Arguments

x

A vector to be rescaled, or a vector of variable names. If x is not provided but a data frame or svydesign object is supplied to data, all columns are processed and returned.

binary.inputs

Options for binary variables. Default is "center". "0/1" keeps the original scale; "-0.5/0.5" rescales 0 as -0.5 and 1 as 0.5; "center" subtracts the mean; and "full" subtracts the mean and divides by 2 SD.
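The four binary.inputs options can be sketched for a single vector as follows. This assumes the binary variable is coded 0/1 and that "full" uses the same n.sd divisor as continuous inputs; scale_binary() is a hypothetical helper written for illustration, not the jtools source.

```r
# scale_binary() is hypothetical: it mimics the binary.inputs options
# for a 0/1-coded vector and is not part of jtools.
scale_binary <- function(x, binary.inputs = c("center", "0/1", "-0.5/0.5", "full"),
                         n.sd = 2) {
  binary.inputs <- match.arg(binary.inputs)  # default is "center"
  switch(binary.inputs,
    "0/1"      = x,                          # keep the original scale
    "-0.5/0.5" = x - 0.5,                    # 0 -> -0.5, 1 -> 0.5 (assumes 0/1 coding)
    "center"   = x - mean(x, na.rm = TRUE),  # subtract the mean only
    "full"     = (x - mean(x, na.rm = TRUE)) /
                   (n.sd * sd(x, na.rm = TRUE))  # treat as continuous
  )
}

b <- c(0, 1, 1, 0, 1)
scale_binary(b, "-0.5/0.5")  # -0.5  0.5  0.5 -0.5  0.5
scale_binary(b, "center")    # -0.6  0.4  0.4 -0.6  0.4
```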

data

A data frame or survey design. Only needed if you would like to rescale multiple variables at once. If x = NULL, all columns will be rescaled. Otherwise, x should be a vector of variable names. If x is a numeric vector, this argument is ignored.
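When x is a vector of variable names, only those columns are rescaled. The sketch below illustrates that idea with a hypothetical helper, rescale_cols(), on the built-in mtcars data; it deliberately ignores binary.inputs and weights, so it is not a reimplementation of gscale().

```r
# rescale_cols() is a hypothetical helper, not the jtools implementation:
# it rescales only the named columns and leaves all other columns untouched.
# Binary handling (binary.inputs) is deliberately omitted.
rescale_cols <- function(data, vars, n.sd = 2) {
  for (v in vars) {
    col <- data[[v]]
    data[[v]] <- (col - mean(col, na.rm = TRUE)) / (n.sd * sd(col, na.rm = TRUE))
  }
  data
}

out <- rescale_cols(mtcars, c("hp", "wt"))
sd(out$hp)                      # 0.5
identical(out$mpg, mtcars$mpg)  # TRUE: columns not named are unchanged
```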

n.sd

How many standard deviations should the variables be divided by? Default is 2, as in arm's rescale. Choosing 1 gives the more typical standardization (z-scores).
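A quick base-R check of how n.sd changes the result, assuming n.sd simply multiplies the standard deviation in the denominator as described above. With n.sd = 1 the values match base R's scale(); with n.sd = 2 every value is halved.

```r
# Compare n.sd = 1 and n.sd = 2 by hand (no jtools needed)
set.seed(2)
v <- rnorm(50, mean = 10, sd = 3)
one_sd <- (v - mean(v)) / (1 * sd(v))   # n.sd = 1: ordinary z-scores
two_sd <- (v - mean(v)) / (2 * sd(v))   # n.sd = 2: the default
all.equal(one_sd, as.numeric(scale(v)))  # TRUE: matches base R's scale()
all.equal(two_sd, one_sd / 2)            # TRUE: n.sd = 2 halves every value
```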

center.only

A logical value indicating whether you would like to mean-center the values, but not scale them.

scale.only

A logical value indicating whether you would like to scale the values, but not mean-center them.
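The difference between center.only and scale.only can be illustrated by hand. This sketch assumes that scale.only divides the uncentered values by n.sd standard deviations; the variable names are illustrative only.

```r
set.seed(3)
w <- rnorm(20, mean = 5, sd = 2)
n.sd <- 2

# Like center.only = TRUE: mean becomes 0, SD is unchanged
centered <- w - mean(w)
# Like scale.only = TRUE (assumed behavior): SD becomes 1 / n.sd,
# but the mean is not moved to 0
scaled <- w / (n.sd * sd(w))
```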

weights

A vector of weights equal in length to x. If iterating over a data frame, the weights will need to be equal in length to all the columns to avoid errors. You may need to remove missing values before using the weights.
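A weighted version of the transformation can be sketched with base R's weighted.mean(). The weighted SD formula below (normalizing by sum(w)) is an assumption made for illustration; jtools may use a different denominator. weighted_gscale() is a hypothetical helper that computes its statistics on complete cases, in the spirit of the advice to remove missing values first.

```r
# weighted_gscale() is hypothetical, not the jtools source.
# Assumed weighted SD: sqrt(sum(w * (x - m)^2) / sum(w)).
weighted_gscale <- function(x, w, n.sd = 2) {
  stopifnot(length(x) == length(w))   # weights must match x in length
  ok <- !is.na(x) & !is.na(w)         # use complete cases for the statistics
  m <- weighted.mean(x[ok], w[ok])
  s <- sqrt(sum(w[ok] * (x[ok] - m)^2) / sum(w[ok]))
  (x - m) / (n.sd * s)                # NA inputs stay NA in the output
}

set.seed(4)
xx <- rnorm(100, 2, 1)
ww <- runif(100, 0, 1)
zz <- weighted_gscale(xx, ww)
weighted.mean(zz, ww)  # essentially 0
```

Under this definition the weighted SD of the result is exactly 1 / n.sd, mirroring the unweighted case.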

Details

This function is adapted from the rescale function of the arm package. It is named gscale() after the popularizer of this scaling method, Andrew Gelman. By default, it works just like rescale. But it contains many additional options and can also accept multiple types of input without breaking a sweat.

Only numeric variables are altered when in a data.frame or survey design. Character variables, factors, etc. are skipped.

For survey data, if you provide a survey.design object, the mean-centering and scaling are performed with the svymean and svyvar functions, respectively. Support for survey designs was among the primary motivations for creating this function. gscale() will not center or scale the weight variables defined in the survey design unless you specifically request them in the x = argument.
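The column-selection rules above (alter only numeric columns, leave weight variables alone unless requested) can be sketched on an ordinary data frame. This does not model a survey design or reproduce the svymean/svyvar calls; rescale_numeric() and its skip argument are hypothetical, used here only to show which columns are left untouched.

```r
# rescale_numeric() is a hypothetical helper, not the jtools source. It alters
# only numeric columns and leaves any column named in `skip` (for example a
# weights column) unchanged.
rescale_numeric <- function(data, n.sd = 2, skip = character(0)) {
  for (v in names(data)) {
    col <- data[[v]]
    if (is.numeric(col) && !(v %in% skip)) {
      data[[v]] <- (col - mean(col, na.rm = TRUE)) / (n.sd * sd(col, na.rm = TRUE))
    }
  }
  data
}

df <- data.frame(y   = c(1, 2, 3, 4),
                 grp = factor(c("a", "b", "a", "b")),
                 lab = c("w", "x", "y", "z"),
                 wt  = c(1, 2, 1, 2))
out2 <- rescale_numeric(df, skip = "wt")
str(out2)  # y is rescaled; the factor, the character column, and wt are not
```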

References

Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27, 2865–2873. http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf

See Also

j_summ is a replacement for the summary function for regression models. On request, it will center and/or standardize variables before printing its output.

Other standardization, scaling, and centering tools: center_lm, scale_lm

Examples

library(jtools)

x <- rnorm(100, 2, 1)
x2 <- rbinom(100, 1, .5)

# Basic use
gscale(x)
# Normal standardization
gscale(x, n.sd = 1)
# Scale only
gscale(x, scale.only = TRUE)
# Center only
gscale(x, center.only = TRUE)
# Binary inputs
gscale(x2, binary.inputs = "0/1")
gscale(x2, binary.inputs = "full") # treats it like a continuous var
gscale(x2, binary.inputs = "-0.5/0.5") # keep scale, center at zero
gscale(x2, binary.inputs = "center") # mean center it

# Data frame as input
gscale(data = mtcars, binary.inputs = "-0.5/0.5") # loops through each numeric column
# Specified vars in data frame
gscale(c("hp", "wt", "vs"), data = mtcars, binary.inputs = "center")

wts <- runif(100, 0, 1)
mtcars$weights <- wts[1:32]

# Weighted inputs
gscale(x, weights = wts)
# If the weights are a column of the data frame, give the column's name
gscale(data = mtcars, weights = weights) # will skip over mtcars$weights
# With a weights column, you can still select variables
gscale(x = c("hp", "wt", "vs"), data = mtcars, weights = weights)

# Survey designs
library(survey)
data(api)
## Create survey design object
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
                    fpc = ~fpc)
dstrat$variables$binary <- rbinom(200, 1, 0.5) # Creating test binary variable

gscale(data = dstrat, binary.inputs = "-0.5/0.5")
gscale(c("api00","meals","binary"), data = dstrat, binary.inputs = "-0.5/0.5")



