
stdize standardizes variables by centring and scaling.
stdizeFit modifies a model call or an existing model to use standardized variables.
# S3 method for default
stdize(x, center = TRUE, scale = TRUE, ...)

# S3 method for logical
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
       center = TRUE, scale = FALSE, ...)
## also for two-level factors

# S3 method for data.frame
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
       center = TRUE, scale = TRUE, omit.cols = NULL, source = NULL,
       prefix = TRUE, append = FALSE, ...)

# S3 method for formula
stdize(x, data = NULL, response = FALSE,
       binary = c("center", "scale", "binary", "half", "omit"),
       center = TRUE, scale = TRUE, omit.cols = NULL, prefix = TRUE,
       append = FALSE, ...)

stdizeFit(object, newdata, which = c("formula", "subset", "offset", "weights",
          "fixed", "random", "model"), evaluate = TRUE, quote = NA)
stdize returns a vector or object of the same dimensions as x, in which the values are centred and/or scaled. The transformation is carried out column-wise in data.frames and matrices. The returned value is compatible with that of scale in that the numeric centring and scaling values used are stored in the attributes "scaled:center" and "scaled:scale" (these can be NA if no centring or scaling has been done).
stdizeFit returns a modified, fitted model object that uses the transformed variables from newdata, or, if evaluate is FALSE, an unevaluated call in which the variable names are replaced to point to the transformed variables.
x: a numeric or logical vector, factor, numeric matrix, data.frame or a formula.
center, scale: either a logical value or a logical or numeric vector of length equal to the number of columns of x (see ‘Details’). scale can also be a function to use for scaling.
binary: specifies how binary variables (logical or two-level factors) are scaled. The default, "center", subtracts the mean, treating the levels as 0 and 1; use "scale" to both centre and scale by the SD, "binary" to centre to 0 / 1, "half" to centre to -0.5 / 0.5, and "omit" to leave binary variables unmodified.
This argument takes precedence over center and scale, unless it is set to NA (in which case binary variables are treated like numeric variables).
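The options above can be illustrated with a small base-R sketch. The helpers is_binary and code_binary are hypothetical, not MuMIn functions, and "binary" is assumed here to mean plain 0/1 coding; for a two-level factor the first level is assumed to map to 0.

```r
# Hypothetical helpers sketching the documented 'binary' options;
# this is not MuMIn's code. "binary" is assumed to mean plain 0/1 coding.
is_binary <- function(x) is.logical(x) || (is.factor(x) && nlevels(x) == 2L)

code_binary <- function(x, binary = c("center", "scale", "binary", "half", "omit")) {
  stopifnot(is_binary(x))
  binary <- match.arg(binary)
  if (binary == "omit") return(x)            # leave the variable unmodified
  # Logical: FALSE/TRUE -> 0/1; two-level factor: first level -> 0, second -> 1
  v <- if (is.factor(x)) as.numeric(x) - 1 else as.numeric(x)
  switch(binary,
    center = v - mean(v),                    # subtract the mean of the 0/1 values
    scale  = (v - mean(v)) / sd(v),          # centre, then divide by the SD
    binary = v,                              # 0 / 1
    half   = v - 0.5)                        # -0.5 / 0.5
}

x <- c(TRUE, FALSE, TRUE, TRUE)
code_binary(x, "center")  # 0.25 -0.75 0.25 0.25
code_binary(x, "half")    # 0.5 -0.5 0.5 0.5
```

Note that is_binary deliberately rejects a numeric vector with two unique values, matching the rule that only logical variables and two-level factors receive binary treatment.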
source: a reference data.frame, being the result of a previous stdize call, from which the scale and center values are taken. Column names are matched. This can be used to scale new data using the statistics of another data set.
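The point of source is that new observations are transformed with the reference statistics, not with their own mean and SD. A minimal base-R sketch of that arithmetic, with illustrative vectors and base::scale standing in for a previous stdize result:

```r
# Reference data and its standardized version (base::scale stands in for
# stdize; both store "scaled:center" and "scaled:scale").
ref <- c(2, 4, 6, 8)
z_ref <- scale(ref)

# New observations are transformed with the *reference* mean and SD.
new <- c(5, 10)
z_new <- (new - attr(z_ref, "scaled:center")) / attr(z_ref, "scaled:scale")
```

Here 5 equals the reference mean, so it maps to 0, regardless of what the mean of new itself is.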
omit.cols: column names or numeric indices of columns that should be left unaltered.
prefix: either a logical value specifying whether the names of transformed columns should be prefixed, or a two-element character vector giving the prefixes. The prefixes default to “z.” for scaled and “c.” for centred variables.
append: logical; if TRUE, modified columns are appended to the original data frame.
response: logical, stating whether the response should be standardized. By default, only variables on the right-hand side of the formula are standardized.
data: an object coercible to data.frame, containing the variables in formula. Passed to, and used by, model.frame.
newdata: a data.frame returned by stdize, to be used by the modified model.
...: for the formula method, additional arguments passed to model.frame. For other methods, it is silently ignored.
object: a fitted model object or an expression being a call to the modelling function.
which: a character string naming the arguments that should be modified. This should include all arguments that are evaluated in the data environment. It can also be TRUE, to modify the expression as a whole. The data argument is additionally replaced with the one passed to stdizeFit.
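The renaming that stdizeFit performs on a model call can be pictured with base R's substitute. The sketch below only illustrates the general idea of rewriting symbols in an unevaluated call; it is not stdizeFit's implementation, and z.CO2 is built by hand here rather than by stdize.

```r
# Rewrite an unevaluated model call so that the variable name points to a
# transformed column and 'data' points to the transformed data frame.
cl <- quote(lm(uptake ~ conc + I(conc^2), data = CO2))
new_cl <- do.call(substitute,
                  list(cl, list(conc = as.name("z.conc"), CO2 = as.name("z.CO2"))))
print(new_cl)
# lm(uptake ~ z.conc + I(z.conc^2), data = z.CO2)

# The rewritten call can be evaluated once the transformed data exist
# (here: conc centred and scaled by one SD, by hand).
z.CO2 <- data.frame(uptake = CO2$uptake, z.conc = as.numeric(scale(CO2$conc)))
fit <- eval(new_cl)
```

Only symbols are replaced; argument tags such as data = are left alone, which is why the data argument must be swapped by name as a separate symbol.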
evaluate: if TRUE, the modified call is evaluated and the fitted model object is returned.
quote: if TRUE, avoids evaluating object. Equivalent to stdizeFit(quote(expr), ...). Defaults to NA, in which case object is quoted if it is a call to a non-primitive function.
Author: Kamil Bartoń
stdize resembles scale, but uses special rules for factors, similarly to standardize in package arm.
stdize differs from standardize in that it is applied to the data rather than to the fitted model object. The scaled data should afterwards be passed to the modelling function instead of the original data.
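Fitting on the scaled data changes the units of the coefficients: a slope is then the expected change in the response per one-SD change in the predictor. A minimal base-R sketch (an illustration, not MuMIn code) using the built-in CO2 data, with conc centred and scaled by hand:

```r
# Fit the same linear model on raw and on standardized predictor values.
x <- CO2$conc
y <- CO2$uptake
z <- (x - mean(x)) / sd(x)       # centre and scale by one SD

b_raw <- coef(lm(y ~ x))[["x"]]
b_z   <- coef(lm(y ~ z))[["z"]]

# The slope on the standardized scale equals the raw slope times sd(x).
all.equal(b_z, b_raw * sd(x))
```

With the twice-SD scaling used by standardize, the standardized slope would instead be the raw slope times 2 * sd(x).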
Unlike standardize, it applies the special ‘binary’ scaling only to two-level factors and logical variables, rather than to any variable with two unique values.
Variables with only one unique value are left unchanged.
By default, stdize scales by dividing by the standard deviation rather than by twice the SD, as standardize does. Division by the SD is also used on uncentred values, unlike scale, which divides uncentred values by the root-mean-square.
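The three divisors mentioned above differ noticeably even on a tiny vector. A base-R sketch comparing them on uncentred values; the stdize and standardize divisors are computed directly from the description here rather than by calling those packages:

```r
x <- c(1, 2, 3, 4)

# base::scale with center = FALSE divides by the root-mean-square,
# sqrt(sum(x^2) / (n - 1)), not by the standard deviation.
rms <- attr(scale(x, center = FALSE), "scaled:scale")

# Divisor used by stdize on uncentred values (per the text): sd(x).
s1 <- sd(x)
# Divisor used by arm::standardize (per the text): twice the SD.
s2 <- 2 * sd(x)

c(rms = rms, sd = s1, twosd = s2)
# approximately 3.162, 1.291 and 2.582
```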
If center or scale are logical scalars or vectors of length equal to the number of columns of x, centring is done by subtracting the mean (if the center value corresponding to the column is TRUE), and scaling is done by dividing the (centred) value by the standard deviation (if the corresponding scale value is TRUE).
If center or scale are numeric vectors with length equal to the number of columns of x (or numeric scalars for vector methods), then these values are used instead. Any NAs in the numeric vector result in no centring or scaling of the corresponding column.
Note that scale = 0 is equivalent to no scaling (i.e. scale = 1).
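The per-column rules in the two paragraphs above can be collected into one base-R function. stdize_sketch is a hypothetical helper written only to illustrate the documented behaviour (logical versus numeric center/scale, NA meaning "skip", scale = 0 acting as 1, constant columns left unchanged); it is not MuMIn's implementation.

```r
# Hypothetical helper: a sketch of the documented per-column rules.
# 'center' and 'scale' are recycled to one value per column; logical TRUE
# means "use the column mean / SD", FALSE or NA means "leave alone", and
# numeric values are used as given (NA = no centring/scaling).
stdize_sketch <- function(m, center = TRUE, scale = TRUE) {
  m <- as.matrix(m)
  k <- ncol(m)
  center <- rep_len(center, k)
  scale <- rep_len(scale, k)
  ctr <- scl <- rep(NA_real_, k)
  for (j in seq_len(k)) {
    x <- m[, j]
    if (length(unique(x[!is.na(x)])) < 2L) next     # constant column: unchanged
    ctr[j] <- if (is.logical(center)) {
      if (isTRUE(center[j])) mean(x, na.rm = TRUE) else NA_real_
    } else center[j]
    if (!is.na(ctr[j])) x <- x - ctr[j]
    scl[j] <- if (is.logical(scale)) {
      if (isTRUE(scale[j])) sd(x, na.rm = TRUE) else NA_real_
    } else scale[j]
    if (!is.na(scl[j]) && scl[j] == 0) scl[j] <- 1  # scale = 0 acts like scale = 1
    if (!is.na(scl[j])) x <- x / scl[j]
    m[, j] <- x
  }
  attr(m, "scaled:center") <- ctr
  attr(m, "scaled:scale") <- scl
  m
}

m <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), k = 5)
r1 <- stdize_sketch(m, center = c(TRUE, FALSE, TRUE), scale = TRUE)
r2 <- stdize_sketch(m, center = c(2, NA, 0), scale = c(0.5, 0, NA))
```

In r1, column b is divided by its SD without being centred; in r2, column a becomes (a - 2) / 0.5 and column b is left as-is because its center is NA and its scale is 0.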
Binary variables (logical vectors or factors with two levels) are converted to numeric variables and transformed according to the argument binary, unless center or scale are explicitly given.
Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine 27, 2865-2873.
# compare "stdize" and "scale"
nmat <- matrix(runif(15, 0, 10), ncol = 3)
stdize(nmat)
scale(nmat)

# root-mean-square, as used by "scale" on uncentred columns
rootmeansq <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / max(1, length(v) - 1L))
}
scale(nmat, center = FALSE)
stdize(nmat, center = FALSE, scale = rootmeansq)

if (require(lme4)) {
  # define scale function as twice the SD to reproduce "arm::standardize"
  twosd <- function(v) 2 * sd(v, na.rm = TRUE)

  # standardize data (scaled variables are prefixed with "z.")
  z.CO2 <- stdize(uptake ~ conc + Plant, data = CO2, omit.cols = "Plant", scale = twosd)
  summary(z.CO2)

  fmz <- stdizeFit(lmer(uptake ~ conc + I(conc^2) + (1 | Plant)), newdata = z.CO2)
  # produces:
  # lmer(uptake ~ z.conc + I(z.conc^2) + (1 | Plant), data = z.CO2)

  ## standardize using scale and center from "z.CO2", keeping the original data:
  z.CO2a <- stdize(CO2, source = z.CO2, append = TRUE)
  # Here, the "subset" expression uses an untransformed variable, so we modify
  # only the "formula" argument, keeping "subset" as-is. For that reason, the
  # untransformed variables are also needed in "newdata".
  stdizeFit(lmer(uptake ~ conc + I(conc^2) + (1 | Plant),
                 subset = conc > 100),
            newdata = z.CO2a, which = "formula", evaluate = FALSE)

  # create new data as a sequence along "conc"
  newdata <- data.frame(conc = seq(min(CO2$conc), max(CO2$conc), length.out = 10))
  # scale new data using scale and center of the original scaled data:
  z.newdata <- stdize(newdata, source = z.CO2)

  if (require(graphics)) {
    # plot predictions against "conc" on the original scale:
    plot(newdata$conc, predict(fmz, z.newdata, re.form = NA))
  }

  # compare with "arm::standardize"
  if (FALSE) {
    library(arm)
    fms <- standardize(lmer(uptake ~ conc + I(conc^2) + (1 | Plant), data = CO2))
    plot(newdata$conc, predict(fms, z.newdata, re.form = NA))
  }
}