demean() computes group- and de-meaned versions of a variable that can be
used in regression analysis to model the between- and within-subject effect.
demean(
  x,
  select,
  group,
  suffix_demean = "_within",
  suffix_groupmean = "_between"
)
x: A data frame.
select: Character vector with names of variables to select that should be group- and de-meaned.
group: Name of the variable that indicates the group- or cluster-ID.
suffix_demean, suffix_groupmean: String value, will be appended to the names of the
group-meaned and de-meaned variables of x. By default, de-meaned
variables will be suffixed with "_within" and group-meaned variables
with "_between".
A data frame with the group-/de-meaned variables, which get the suffix
"_between" (for the group-meaned variable) and "_within" (for
the de-meaned variable) by default.
Mixed models include different levels of sources of variability, i.e. error terms at each level. When macro-indicators (or level-2 predictors, or higher-level units, or more generally: group-level predictors that are constant within groups, such as "education" within participants, or GDP within countries) are included as fixed effects (i.e. treated as covariates at level-1), the variance that this covariate leaves unaccounted for is absorbed into the error terms of level-1 and level-2. Hence, the error terms will be correlated with the covariate, which violates one of the assumptions of mixed models (iid, independent and identically distributed error terms). This bias is also called heterogeneity bias (Bell et al. 2015). To resolve this problem, level-2 predictors used as (level-1) covariates should be "group-meaned".
demean() is intended to create group- and de-meaned variables
for panel regression models (fixed effects models), or for complex
random-effect-within-between models (see Bell et al. 2015, 2018),
where group-effects (random effects) and fixed effects correlate (see
Bafumi and Gelman 2006). This can happen, for instance, when
analyzing panel data, and it can lead to heterogeneity bias. To
control for correlating predictors and group effects, it is recommended
to include the group-meaned and de-meaned versions of time-varying covariates
(and the group-meaned version of time-invariant covariates that are on
a higher level, e.g. level-2 predictors) in the model. This makes it possible to
fit complex multilevel models for panel data, including time-varying
predictors, time-invariant predictors and random effects.
A mixed models approach including time-varying and time-constant fixed effects as well as random effects is superior to classic fixed-effects models, which lack information on variation in the group-effects or between-subject effects. Furthermore, fixed effects regression cannot include random slopes, which means that fixed effects regressions neglect “cross-cluster differences in the effects of lower-level controls (which) reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power” (Heisig et al. 2017).
The group-meaned variable is simply the mean of an independent variable
within each group (or id-level or cluster) represented by group.
It represents the cluster-mean of an independent variable. The de-meaned
variable is then the original variable centered on its group mean, i.e. each
observation minus the mean of its group. De-meaning
is sometimes also called person-mean centering or centering within clusters.
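The two transformations described above can be sketched in base R. This is only an illustration of the arithmetic, not the internal implementation of demean(); the data values and the column names id, x, x_between and x_within are assumptions made for the example.

```r
# Toy panel data: three observations for each of two subjects (hypothetical values)
dat <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  x  = c(1, 2, 3, 4, 6, 8)
)

# Group-meaned ("between") variable: the mean of x within each id
dat$x_between <- ave(dat$x, dat$id, FUN = mean)

# De-meaned ("within") variable: deviation of each observation from its group mean
dat$x_within <- dat$x - dat$x_between

dat
```

By construction, the de-meaned values sum to zero within each group, and adding the two new columns together recovers the original variable.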
For continuous time-varying predictors, the recommendation is to include both their de-meaned and group-meaned versions as fixed effects, but not the raw (untransformed) time-varying predictors themselves. The de-meaned predictor should also be included as random effect (random slope). In regression models, the coefficient of the de-meaned predictors indicates the within-subject effect, while the coefficient of the group-meaned predictor indicates the between-subject effect.
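As a rough illustration of the within- and between-subject coefficients, the following base R sketch simulates a panel in which the subject effect correlates with the predictor. It is a simplification and an assumption of this example: it uses lm() instead of a mixed model, so the random intercept and the recommended random slope are left out, and all data and variable names are hypothetical.

```r
set.seed(123)

# Simulated panel: 30 subjects with 5 observations each (hypothetical data)
n_id  <- 30
n_obs <- 5
id <- rep(seq_len(n_id), each = n_obs)

# Subject-level effect that is correlated with the predictor
u <- rnorm(n_id)
x <- rnorm(n_id * n_obs) + 2 * u[id]

# The true within-subject effect of x is 0.5
y <- 0.5 * x + 3 * u[id] + rnorm(n_id * n_obs)

# Group-meaned and de-meaned versions of the predictor
x_between <- ave(x, id, FUN = mean)
x_within  <- x - x_between

# Within-between specification (without random effects, for simplicity)
m_wb <- lm(y ~ x_within + x_between)

# Classic fixed-effects model with one dummy per subject
m_fe <- lm(y ~ x + factor(id))

# Pooled model on the raw predictor, which mixes both effects
m_pooled <- lm(y ~ x)

coef(m_wb)[c("x_within", "x_between")]
coef(m_fe)["x"]
coef(m_pooled)["x"]
```

In this setup the coefficient of x_within matches the fixed-effects estimate, while the model additionally reports a between-subject coefficient. The pooled coefficient of the raw predictor blends the two, which is the bias the within-between specification is meant to avoid.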
For binary time-varying predictors, the recommendation is to include
the raw (untransformed) binary predictor as fixed effect only and the
de-meaned variable as random effect (random slope)
(Hoffman 2015, chapter 8-2.I). demean() will thus coerce
categorical time-varying predictors to numeric to compute the de- and
group-meaned versions for these variables.
Factors with more than two levels are demeaned in two ways: first, these are also converted to numeric and de-meaned; second, dummy variables are created (binary, with 0/1 coding for each level) and these binary dummy-variables are de-meaned in the same way (as described above). Packages like panelr internally convert factors to dummies before demeaning, so this behaviour can be mimicked here.
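The two treatments of multi-level factors can be sketched in base R as follows. This only mirrors the description above and is not the code used by demean(); the data, the helper objects and the "_within" column names built here are assumptions for illustration and may differ from the names demean() returns.

```r
# Toy data: a three-level factor observed within two groups (hypothetical values)
dat <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  f  = factor(c("a", "b", "c", "a", "a", "b"))
)

# First way: numeric coding of the factor, de-meaned as a single variable
f_num <- as.numeric(dat$f)
f_within <- f_num - ave(f_num, dat$id, FUN = mean)

# Second way: one 0/1 dummy column per level (no intercept, so no level is dropped)
dummies <- model.matrix(~ 0 + f, data = dat)

# De-mean each dummy column within id
dummies_within <- apply(dummies, 2, function(col) col - ave(col, dat$id, FUN = mean))
colnames(dummies_within) <- paste0(colnames(dummies), "_within")

cbind(dat, f_within, dummies_within)
```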
There are multiple ways to deal with interaction terms of within- and
between-effects. A classical approach is to simply use the product
term of the de-meaned variables (i.e. introducing the de-meaned variables
as interaction term in the model formula, e.g. y ~ x_within * time_within).
This approach, however, might be subject to bias (see Giesselmann & Schmidt-Catran 2018).
Another option is to first calculate the product term and then apply the
de-meaning to it. This approach produces an estimator “that reflects
unit-level differences of interacted variables whose moderators vary
within units”, which is desirable if no within interaction of
two time-dependent variables is required.
A third option, when the interaction should result in a genuine within
estimator, is to "double de-mean" the interaction terms
(Giesselmann & Schmidt-Catran 2018); however, this is currently
not supported by demean(). If this is required, the wbm()
function from the panelr package should be used.
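The difference between the first two options can be made concrete with a small base R sketch. The data, the helper center_within() and the resulting column names are hypothetical and only serve the illustration; double de-meaning is not shown.

```r
# Toy panel with two time-varying variables (hypothetical values)
dat <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  x  = c(4, 3, 5, 1, 2, 3),
  z  = c(1, 2, 1, 4, 3, 2)
)

# Hypothetical helper: subtract the group mean from each observation
center_within <- function(v, g) v - ave(v, g, FUN = mean)

# Option 1: product of the de-meaned variables
dat$int_prod_of_within <- center_within(dat$x, dat$id) * center_within(dat$z, dat$id)

# Option 2: de-mean the product of the raw variables
dat$int_within_of_prod <- center_within(dat$x * dat$z, dat$id)

dat
```

The two columns generally differ. Only the second one is guaranteed to sum to zero within each group, because it is de-meaned after the product has been formed.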
To de-mean interaction terms for within-between models, simply specify
the term as interaction for the select-argument, e.g.
select = "a*b" (see 'Examples').
A description of how to translate the
formulas described in Bell et al. 2018 into R using lmer()
from lme4 or glmmTMB() from glmmTMB can be found here:
for lmer() and for glmmTMB().
Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. In. Philadelphia, PA: Annual meeting of the American Political Science Association.
Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
Bell A, Jones K. 2015. Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.
Giesselmann M, Schmidt-Catran A. 2018. Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Heisig JP, Schaeffer M, Giesecke J. 2017. The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls. American Sociological Review 82 (4): 796–827.
Hoffman L. 2015. Longitudinal analysis: modeling within-person fluctuation and change. New York: Routledge.
# NOT RUN {
set.seed(123) # make the random ID and binary variable reproducible
data(iris)
iris$ID <- sample(1:4, nrow(iris), replace = TRUE) # fake-ID
iris$binary <- as.factor(rbinom(150, 1, .35)) # binary variable

# group- and de-mean two continuous variables
x <- demean(iris, select = c("Sepal.Length", "Petal.Length"), group = "ID")
head(x)

# continuous, binary and multi-level factor variables
x <- demean(iris, select = c("Sepal.Length", "binary", "Species"), group = "ID")
head(x)

# demean interaction term x*y
dat <- data.frame(
  a = c(1, 2, 3, 4, 1, 2, 3, 4),
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)
demean(dat, select = c("a", "x*y"), group = "ID")
# }