demean() computes group- and de-meaned versions of a variable that can be used in regression analysis to model the between- and within-subject effect.
demean(
  x,
  select,
  group,
  suffix_demean = "_within",
  suffix_groupmean = "_between"
)
x: A data frame.
select: Character vector with names of variables to select that should be group- and de-meaned.
group: Name of the variable that indicates the group- or cluster-ID.
suffix_demean, suffix_groupmean: String value, will be appended to the names of the group-meaned and de-meaned variables of x. By default, de-meaned variables will be suffixed with "_within" and group-meaned variables with "_between".
A data frame with the group-/de-meaned variables, which get the suffix "_between" (for the group-meaned variable) and "_within" (for the de-meaned variable) by default.
demean() is intended to create group- and de-meaned variables
for panel regression models (fixed effects models), or for complex
random-effect-within-between models (see Bell et al. 2018),
where group-effects (random effects) and fixed effects correlate (see
Bafumi and Gelman 2006). This violation of one of the
Gauss-Markov assumptions can happen, for instance, when analysing panel
data. To control for correlating predictors and group effects, it is
recommended to include the group-meaned and de-meaned versions of
time-varying covariates in the model. This makes it possible to fit
complex multilevel models for panel data, including time-varying predictors,
time-invariant predictors and random effects. This approach is superior to
classic fixed-effects models, which lack information on the variation in the
group-effects or between-subject effects.
The group-meaned variable is simply the mean of an independent variable
within each group (or id-level or cluster) represented by group.
It represents the cluster-mean of an independent variable. The de-meaned
variable is then the original variable centered on its group mean, i.e. the
difference between each observation and the group-meaned variable. De-meaning
is sometimes also called person-mean centering or centering within clusters.
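The two transformations described above can be reproduced by hand in base R. The following is only a minimal sketch of the arithmetic, not the implementation used by demean(); the data and the column names (ID, x, x_between, x_within) are made up for illustration.

```r
# Toy data: 'x' is a time-varying predictor, 'ID' the cluster identifier
# (hypothetical names, chosen for this illustration only).
dat <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  x  = c(2, 4, 6, 10, 20, 30)
)

# Group-meaned ("between") version: the mean of x within each ID
dat$x_between <- ave(dat$x, dat$ID, FUN = mean)

# De-meaned ("within") version: deviation of each observation from its group mean
dat$x_within <- dat$x - dat$x_between

dat
```

By construction, the de-meaned variable sums to zero within every group, and the raw variable is the sum of its group-meaned and de-meaned versions.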
For continuous time-varying predictors, the recommendation is to include both their de-meaned and group-meaned versions as fixed effects, but not the raw (untransformed) time-varying predictors themselves. The de-meaned predictor should also be included as random effect (random slope). In regression models, the coefficient of the de-meaned predictors indicates the within-subject effect, while the coefficient of the group-meaned predictor indicates the between-subject effect.
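As a rough illustration of this recommendation, a within-between model for a single continuous time-varying predictor can be written as follows. The notation is an assumption made for this sketch (units i, occasions t, group mean \bar{x}_i) and is not taken verbatim from the references.

```latex
y_{it} = \beta_0
       + \beta_{W}\,(x_{it} - \bar{x}_i)   % within-subject effect (de-meaned predictor)
       + \beta_{B}\,\bar{x}_i              % between-subject effect (group-meaned predictor)
       + u_{0i}                            % random intercept
       + u_{1i}\,(x_{it} - \bar{x}_i)      % random slope of the de-meaned predictor
       + \varepsilon_{it}
```

In lme4 formula syntax, and assuming hypothetical variable names y, x_within, x_between and ID, this would correspond roughly to y ~ x_within + x_between + (1 + x_within | ID).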
For binary time-varying predictors, the recommendation is to include
the raw (untransformed) binary predictor as fixed effect only and the
de-meaned variable as random effect (random slope)
(Hoffman 2015, chapter 8-2.I). demean() will thus coerce
categorical time-varying predictors to numeric to compute the de- and
group-meaned versions for these variables.
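To see what group- and de-meaning mean for a binary predictor, the computation can be sketched in base R. This assumes a factor coded "0"/"1" and hypothetical column names; how demean() performs the coercion internally is not shown here and may differ.

```r
# Toy data: 'b' is a binary time-varying predictor stored as a factor
# (hypothetical names and values).
dat <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  b  = factor(c(0, 1, 1, 1, 0, 0, 1, 0))
)

# Convert the labels "0"/"1" to numeric 0/1; going via character avoids
# picking up the internal factor codes 1/2.
dat$b_num <- as.numeric(as.character(dat$b))

# Group mean of a 0/1 variable = proportion of 1s within each ID
dat$b_between <- ave(dat$b_num, dat$ID, FUN = mean)

# De-meaned version: deviation from that proportion
dat$b_within <- dat$b_num - dat$b_between

dat
```

For a 0/1 variable, the group-meaned version is the share of observations with value 1 in each group, which is why the de-meaned version takes non-integer values.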
There are multiple ways to deal with interaction terms of within- and
between-effects. A classical approach is to simply use the product
term of the de-meaned variables (i.e. introducing the de-meaned variables
as interaction term in the model formula, e.g. y ~ x_within * time_within).
This approach, however, might be subject to bias (see Giesselmann & Schmidt-Catran 2018).
Another option is to first calculate the product term and then apply the
de-meaning to it. This approach produces an estimator “that reflects
unit-level differences of interacted variables whose moderators vary
within units”, which is desirable if no within interaction of
two time-dependent variables is required.
A third option, when the interaction should result in a genuine within
estimator, is to "double de-mean" the interaction terms
(Giesselmann & Schmidt-Catran 2018). However, this is currently
not supported by demean(). If this is required, the wbm()
function from the panelr package should be used.
To de-mean interaction terms for within-between models, simply specify
the term as interaction for the select-argument, e.g. select = "a*b"
(see 'Examples').
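The difference between "product of de-meaned variables" and "de-meaned product" can be made concrete with a small base R sketch. The data and all column names are hypothetical, and the names demean() gives to its output columns may differ; the sketch only illustrates the order of operations.

```r
# Toy data with two time-varying variables a and b (hypothetical values)
dat <- data.frame(
  ID = c(1, 1, 2, 2),
  a  = c(1, 3, 2, 4),
  b  = c(2, 2, 5, 1)
)

# Option 2: compute the product first, then group- and de-mean it
dat$a_b         <- dat$a * dat$b
dat$a_b_between <- ave(dat$a_b, dat$ID, FUN = mean)
dat$a_b_within  <- dat$a_b - dat$a_b_between

# Option 1 (classical approach): de-mean each variable, then multiply
a_within <- dat$a - ave(dat$a, dat$ID, FUN = mean)
b_within <- dat$b - ave(dat$b, dat$ID, FUN = mean)
dat$prod_of_within <- a_within * b_within

# The two columns generally differ
dat[, c("ID", "a_b_within", "prod_of_within")]
```

The de-meaned product sums to zero within each group, whereas the product of the separately de-meaned variables in general does not.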
A description of how to translate the formulas described in Bell et al. 2018
into R using lmer() from lme4 or glmmTMB() from glmmTMB can be found here:
for lmer() and for glmmTMB().
Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. Philadelphia, PA: Annual meeting of the American Political Science Association.
Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
Giesselmann M, Schmidt-Catran A. 2018. Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Hoffman L. 2015. Longitudinal analysis: modeling within-person fluctuation and change. New York: Routledge.
data(iris)
iris$ID <- sample(1:4, nrow(iris), replace = TRUE) # fake-ID
iris$binary <- as.factor(rbinom(150, 1, .35)) # binary variable

# group- and de-mean two continuous variables
x <- demean(iris, select = c("Sepal.Length", "Petal.Length"), group = "ID")
head(x)

# categorical variables ("binary", "Species") are coerced to numeric
x <- demean(iris, select = c("Sepal.Length", "binary", "Species"), group = "ID")
head(x)

# demean interaction term x*y
dat <- data.frame(
  a = c(1, 2, 3, 4, 1, 2, 3, 4),
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)
demean(dat, select = c("a", "x*y"), group = "ID")