demean: Compute group-meaned and de-meaned variables

Description

demean() computes group-meaned and de-meaned versions of variables, which can be used in regression analysis to model between- and within-subject effects.

Usage

demean(
  x,
  select,
  group,
  suffix_demean = "_within",
  suffix_groupmean = "_between"
)

Arguments

x

A data frame.

select

Character vector with the names of the variables that should be group-meaned and de-meaned.

group

Name of the variable that indicates the group- or cluster-ID.

suffix_demean, suffix_groupmean

String values that are appended to the names of the de-meaned and group-meaned variables of x, respectively. By default, de-meaned variables are suffixed with "_within" and group-meaned variables with "_between".

Value

A data frame with the group-/de-meaned variables, which get the suffix "_between" (for the group-meaned variable) and "_within" (for the de-meaned variable) by default.
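To make the naming convention concrete, here is a minimal base-R sketch of what a call such as demean(x, select, group) returns. The function name demean_sketch() is hypothetical and this is not the package's implementation; it assumes the group mean is a plain arithmetic mean (ignoring NA) and places each group-meaned column before its de-meaned column, which may differ from the real column order.

```r
# Hypothetical base-R sketch of demean()'s output; not the package's code.
demean_sketch <- function(x, select, group,
                          suffix_demean = "_within",
                          suffix_groupmean = "_between") {
  id <- x[[group]]
  out <- lapply(select, function(v) {
    val <- x[[v]]
    # group mean of v, repeated for every row of that group
    gm <- ave(val, id, FUN = function(z) mean(z, na.rm = TRUE))
    res <- data.frame(gm, val - gm)
    names(res) <- paste0(v, c(suffix_groupmean, suffix_demean))
    res
  })
  do.call(cbind, out)
}

dat <- data.frame(score = c(2, 4, 6, 1, 3), id = c(1, 1, 1, 2, 2))
res <- demean_sketch(dat, select = "score", group = "id")
res
```

Group 1 has mean 4 and group 2 has mean 2, so score_between is 4, 4, 4, 2, 2 and score_within is -2, 0, 2, -1, 1.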

Details

Panel data and correlating fixed and group effects

demean() is intended to create group-meaned and de-meaned variables for panel regression models (fixed effects models) or for complex random-effect-within-between models (see Bell et al. 2018), in which group effects (random effects) and fixed effects correlate (see Bafumi and Gelman 2006). This violation of one of the Gauss-Markov assumptions can happen, for instance, when analysing panel data. To control for correlating predictors and group effects, it is recommended to include both the group-meaned and the de-meaned version of time-varying covariates in the model. This makes it possible to fit complex multilevel models for panel data that include time-varying predictors, time-invariant predictors and random effects. This approach is superior to classic fixed-effects models, which lack information about variation in the group effects or between-subject effects.
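The following base-R simulation sketches the problem described above on hypothetical data: the group effect u is deliberately built to correlate with each group's mean of x. Under these assumptions, a pooled regression on the raw predictor mixes the within effect with the group effect, while the coefficient of the de-meaned predictor stays close to the simulated within effect of 0.5.

```r
set.seed(1)
n_id <- 200                                   # number of groups (e.g. persons)
n_t  <- 5                                     # observations per group
id   <- rep(seq_len(n_id), each = n_t)

x_bar <- rnorm(n_id)                          # true group-level mean of x
u     <- 1.5 * x_bar + rnorm(n_id, sd = 0.5)  # group effect, correlated with x_bar
x     <- x_bar[id] + rnorm(n_id * n_t)        # time-varying predictor
y     <- 0.5 * x + u[id] + rnorm(n_id * n_t, sd = 0.5)  # simulated within effect: 0.5

# Pooled OLS on the raw predictor: the slope absorbs part of the group effect
pooled <- coef(lm(y ~ x))[["x"]]

# Group-meaned and de-meaned versions of x
x_between <- ave(x, id)
x_within  <- x - x_between
split_fit <- coef(lm(y ~ x_within + x_between))

c(pooled = pooled, within = split_fit[["x_within"]])
```

Because x_within sums to zero within every group, it is uncorrelated with anything that is constant within groups, including u; that is why its coefficient is unaffected by the correlated group effect.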

Terminology

The group-meaned variable is simply the mean of an independent variable within each group (or id-level or cluster) represented by group. It represents the cluster mean of an independent variable. The de-meaned variable is then the original variable centered on its group mean, i.e. each raw value minus the group-meaned value of its group. De-meaning is sometimes also called person-mean centering or centering within clusters.
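In symbols, writing x_it for the value of an independent variable for observation t in group i, and T_i for the number of observations in group i (notation introduced here for illustration):

```latex
\bar{x}_{i} = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it}
\quad \text{(group-meaned)}, \qquad
\tilde{x}_{it} = x_{it} - \bar{x}_{i}
\quad \text{(de-meaned)}, \qquad
x_{it} = \bar{x}_{i} + \tilde{x}_{it}
```

By construction the de-meaned values sum to zero within each group, so the two versions together carry all the information in the raw variable.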

De-meaning with continuous predictors

For continuous time-varying predictors, the recommendation is to include both their de-meaned and group-meaned versions as fixed effects, but not the raw (untransformed) time-varying predictors themselves. The de-meaned predictor should also be included as a random effect (random slope). In regression models, the coefficient of the de-meaned predictor indicates the within-subject effect, while the coefficient of the group-meaned predictor indicates the between-subject effect.
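A minimal base-R sketch of this interpretation, using simulated data (the effect sizes of 0.5 within and 2 between are arbitrary assumptions): when the group-meaned and de-meaned predictors enter a model together, each coefficient recovers its own effect. lm() is used only to keep the example dependency-free and has no random slope; the commented lme4 formula shows where the de-meaned predictor would enter as a random slope, assuming lme4 is installed.

```r
set.seed(2)
n_id <- 100
n_t  <- 6
id   <- rep(seq_len(n_id), each = n_t)

# continuous time-varying predictor with large between-group differences
x <- rep(rnorm(n_id, sd = 2), each = n_t) + rnorm(n_id * n_t)

x_between <- ave(x, id)      # group-meaned predictor
x_within  <- x - x_between   # de-meaned predictor

# assumed true effects: within = 0.5, between = 2
y <- 1 + 0.5 * x_within + 2 * x_between + rnorm(n_id * n_t, sd = 0.3)

fit <- lm(y ~ x_within + x_between)
round(coef(fit), 2)

# With lme4, the de-meaned predictor would also get a random slope:
# lme4::lmer(y ~ x_within + x_between + (1 + x_within | id))
```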

De-meaning with binary predictors

For binary time-varying predictors, the recommendation is to include the raw (untransformed) binary predictor as a fixed effect only and the de-meaned variable as a random effect (random slope) (Hoffman 2015, chapter 8-2.I). demean() will thus coerce categorical time-varying predictors to numeric to compute the de-meaned and group-meaned versions of these variables.
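The sketch below shows, on hypothetical data, what de-meaning a binary factor amounts to. It assumes the factor levels are "0" and "1" and converts them through as.character(); whether demean() performs exactly this conversion is an assumption. Note that as.numeric() applied directly to a factor returns the internal level codes (1 and 2), not the labels.

```r
treated <- factor(c(0, 1, 1, 0, 0, 1))   # binary time-varying predictor
id      <- c(1, 1, 1, 2, 2, 2)           # group / person ID

# Convert labels "0"/"1" to numbers; as.numeric(treated) would give 1/2 codes
treated_num <- as.numeric(as.character(treated))

treated_between <- ave(treated_num, id)           # share of 1s in each group
treated_within  <- treated_num - treated_between  # deviation from that share

data.frame(id, treated_num, treated_between, treated_within)

# Following the recommendation above (requires lme4; y is hypothetical):
# lme4::lmer(y ~ treated_num + (1 + treated_within | id))
```

Here the group-meaned values are the share of treated observations per group (2/3 and 1/3), and the de-meaned values sum to zero within each group.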

De-meaning interaction terms

There are multiple ways to deal with interaction terms of within- and between-effects. A classical approach is to simply use the product term of the de-meaned variables (i.e. introducing the de-meaned variables as interaction term in the model formula, e.g. y ~ x_within * time_within). This approach, however, might be subject to bias (see Giesselmann & Schmidt-Catran 2018).
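The first option can be sketched in base R on a small hypothetical data set: de-mean each variable within id, then multiply. The model.matrix() call confirms that a formula term such as x_within * time_within builds exactly this product as its interaction column.

```r
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  x    = c(2, 4, 6, 1, 2, 6),
  time = c(1, 2, 3, 1, 2, 3)
)
dat$x_within    <- dat$x - ave(dat$x, dat$id)
dat$time_within <- dat$time - ave(dat$time, dat$id)

# Option 1: product of the de-meaned variables
dat$x_time_within <- dat$x_within * dat$time_within

# The interaction column that y ~ x_within * time_within would use
mm <- model.matrix(~ x_within * time_within, data = dat)
cbind(by_hand = dat$x_time_within, formula = mm[, "x_within:time_within"])
```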

Another option is to first calculate the product term and then apply the de-meaning to it. This approach produces an estimator “that reflects unit-level differences of interacted variables whose moderators vary within units”, which is desirable if no within interaction of two time-dependent variables is required. A third option, when the interaction should result in a genuine within estimator, is to "double de-mean" the interaction terms (Giesselmann & Schmidt-Catran 2018); however, this is currently not supported by demean(). If this is required, the wmb() function from the panelr package should be used. To de-mean interaction terms for within-between models, simply specify the term as an interaction in the select argument, e.g. select = "a*b" (see 'Examples').
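Continuing with hypothetical data, the sketch below contrasts the second option (de-mean the product) with double de-meaning. The double de-meaned term is written as understood here from Giesselmann & Schmidt-Catran (2018): de-mean each variable, multiply, then de-mean the product again. Treat it as an illustration, not as a replacement for wmb() from panelr; the helper demean_vec() is hypothetical.

```r
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  x    = c(2, 4, 6, 1, 2, 6),
  time = c(1, 2, 3, 1, 2, 3)
)

# Hypothetical helper: subtract the group mean of v within groups g
demean_vec <- function(v, g) v - ave(v, g)

# Option 2: compute the product first, then de-mean it
xt_within <- demean_vec(dat$x * dat$time, dat$id)

# Option 3 (double de-meaning, as understood here): de-mean each variable,
# multiply, then de-mean the product once more
prod_within <- demean_vec(dat$x, dat$id) * demean_vec(dat$time, dat$id)
xt_double   <- demean_vec(prod_within, dat$id)

data.frame(id = dat$id, xt_within, xt_double)
```

Both terms sum to zero within each group, but they differ: the second option still contains the within-group variation that comes from the interplay of one variable's group mean with the other variable's fluctuation, which the double de-meaned term removes.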

Analysing panel data with mixed models using lme4

A description of how to translate the formulas described in Bell et al. 2018 into R using lmer() from lme4 or glmmTMB() from glmmTMB is available separately for lmer() and for glmmTMB().

References

Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. In: Annual Meeting of the American Political Science Association. Philadelphia, PA.
Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
Giesselmann M, Schmidt-Catran A. 2018. Interactions in Fixed Effects Regression Models. Discussion Papers of DIW Berlin No. 1748. Berlin: DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Hoffman L. 2015. Longitudinal Analysis: Modeling Within-Person Fluctuation and Change. New York: Routledge.

Examples


# assumes the package that provides demean() is attached
set.seed(123) # make the fake IDs and the binary variable reproducible
data(iris)
iris$ID <- sample(1:4, nrow(iris), replace = TRUE) # fake cluster ID
iris$binary <- as.factor(rbinom(nrow(iris), 1, .35)) # binary factor variable

# group- and de-mean two continuous variables
x <- demean(iris, select = c("Sepal.Length", "Petal.Length"), group = "ID")
head(x)

# include factor variables (a binary factor and Species)
x <- demean(iris, select = c("Sepal.Length", "binary", "Species"), group = "ID")
head(x)

# demean interaction term x*y
dat <- data.frame(
  a = c(1, 2, 3, 4, 1, 2, 3, 4),
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)
demean(dat, select = c("a", "x*y"), group = "ID")
