icc: Intraclass Correlation Coefficient (ICC)

Description

This function calculates the intraclass correlation coefficient (ICC), sometimes also called variance partition coefficient (VPC) or repeatability, for mixed effects models. The ICC can be calculated for all models supported by insight::get_variance(). For models fitted with the brms package, icc() might fail due to the large variety of models and families supported by that package. In such cases, variance_decomposition(), which is based on the posterior predictive distribution, is an alternative to the ICC (see 'Details').

Usage

icc(
  model,
  by_group = FALSE,
  tolerance = 1e-05,
  ci = NULL,
  iterations = 100,
  ci_method = NULL,
  verbose = TRUE,
  ...
)
variance_decomposition(model, re_formula = NULL, robust = TRUE, ci = 0.95, ...)

Value

For icc(), a list with two values, the adjusted ICC and the unadjusted ICC. For variance_decomposition(), a list with two values, the decomposed ICC and the credible intervals for this ICC.

Arguments

model: A (Bayesian) mixed effects model.
by_group: Logical, if TRUE, icc() returns the variance components for each random-effects level (if there are multiple levels). See 'Details'.
tolerance: Tolerance for singularity check of random effects, to decide whether to compute random effect variances or not. Indicates up to which value the convergence result is accepted. The larger tolerance is, the stricter the test will be. See performance::check_singularity().
ci: Confidence or credible interval level. For icc() and r2(), confidence intervals are based on bootstrapped samples of the ICC or R2 value, respectively. See iterations.
iterations: Number of bootstrap-replicates when computing confidence intervals for the ICC or R2.
ci_method: Character string, indicating the bootstrap method. Should be NULL (default), in which case lme4::bootMer() is used for bootstrapped confidence intervals. However, if bootstrapped intervals cannot be calculated this way, try ci_method = "boot", which falls back to boot::boot(). This may successfully return bootstrapped confidence intervals, but bootstrapped samples may not be appropriate for the multilevel structure of the model. There is also an option ci_method = "analytical", which tries to calculate analytical confidence intervals assuming a chi-squared distribution. However, these intervals are rather inaccurate and often too narrow. It is recommended to calculate bootstrapped confidence intervals for mixed models.
verbose: Toggle warnings and messages.
...: Arguments passed down to brms::posterior_predict().
re_formula: Formula containing group-level effects to be considered in the prediction. If NULL (default), include all group-level effects. Else, for instance for nested models, name a specific group-level effect to calculate the variance decomposition for this group-level. See 'Details' and ?brms::posterior_predict.
robust: Logical, if TRUE, the median instead of mean is used to calculate the central tendency of the variances.

Details

Interpretation

The ICC can be interpreted as "the proportion of the variance explained by the grouping structure in the population". The grouping structure entails that measurements are organized into groups (e.g., test scores in a school can be grouped by classroom if there are multiple classrooms and each classroom was administered the same test) and the ICC indexes how strongly measurements in the same group resemble each other. This index goes from 0, if the grouping conveys no information, to 1, if all observations in a group are identical (Gelman and Hill, 2007, p. 258). In other words, the ICC - sometimes conceptualized as the measurement repeatability - "can also be interpreted as the expected correlation between two randomly drawn units that are in the same group" (Hox 2010: 15), although this definition might not apply to mixed models with more complex random effects structures. The ICC can help determine whether a mixed model is even necessary: an ICC of zero (or very close to zero) means the observations within clusters are no more similar than observations from different clusters, and modeling the grouping variable as a random factor might not be necessary.

Difference with R2

The coefficient of determination R2 (which can be computed with r2()) quantifies the proportion of variance explained by a statistical model, but its definition in mixed models is complex (hence, different methods to compute a proxy exist). The ICC is related to R2 because both are ratios of variance components. More precisely, R2 is the proportion of the explained variance (of the full model), while the ICC is the proportion of explained variance that can be attributed to the random effects. In simple cases, the ICC corresponds to the difference between the conditional R2 and the marginal R2 (see r2_nakagawa()).
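The relation to R2 can be checked with plain arithmetic. The sketch below assumes the variance-based definitions from Nakagawa et al. (2017), where the marginal R2 has only the fixed effects variance in the numerator and the conditional R2 has the fixed plus random effects variance. The three variance values are made up for illustration and do not come from a fitted model. Under these assumptions, the difference matches the unadjusted ICC described under 'Adjusted and unadjusted ICC'.

```r
# Hypothetical variance components (illustrative values, not from a fitted model)
var_fixed    <- 2.0  # variance explained by the fixed effects
var_random   <- 1.5  # random effect variance
var_residual <- 0.5  # residual variance

var_total <- var_fixed + var_random + var_residual

# Variance-based R2 definitions assumed for this sketch
r2_marginal    <- var_fixed / var_total
r2_conditional <- (var_fixed + var_random) / var_total

# Unadjusted ICC: the fixed effects variance is part of the denominator
icc_unadjusted <- var_random / var_total

# The difference between conditional and marginal R2 equals the unadjusted ICC
r2_difference <- r2_conditional - r2_marginal
isTRUE(all.equal(r2_difference, icc_unadjusted))
```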

Calculation

The ICC is calculated by dividing the random effect variance, σ²_i, by the total variance, i.e. the sum of the random effect variance and the residual variance, σ²_ε.
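As a minimal illustration of this ratio, the helper below computes the ICC from two variance components. The function name icc_from_variances() and the input values are hypothetical and only serve to show how the index moves between 0 and 1; icc() itself obtains the variances from the fitted model.

```r
# Hypothetical helper: ICC from a random effect variance and a residual variance
icc_from_variances <- function(var_random, var_residual) {
  var_random / (var_random + var_residual)
}

# No between-group variance: the grouping conveys no information
icc_none <- icc_from_variances(var_random = 0, var_residual = 2)

# Most of the variance lies between groups
icc_high <- icc_from_variances(var_random = 3, var_residual = 1)

# No residual variance: observations within a group are identical
icc_full <- icc_from_variances(var_random = 3, var_residual = 0)

c(none = icc_none, high = icc_high, full = icc_full)
```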

Adjusted and unadjusted ICC

icc() calculates an adjusted and an unadjusted ICC, which both take all sources of uncertainty (i.e. of all random effects) into account. While the adjusted ICC only relates to the random effects, the unadjusted ICC also takes the fixed effects variance into account: the fixed effects variance is added to the denominator of the formula used to calculate the ICC (see Nakagawa et al. 2017). Typically, the adjusted ICC is of interest when the analysis focuses on the random effects. icc() also returns a meaningful ICC for more complex random effects structures, like models with random slopes or nested designs (more than two levels), and is applicable to models with distributions other than Gaussian. For more details on the computation of the variances, see ?insight::get_variance.
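Written out with the symbols from 'Calculation', the two quantities differ only in the denominator. The symbol \sigma^2_f for the fixed effects variance is notation introduced here for illustration, and with several random effects \sigma^2_i is assumed to stand for their combined variance.

```latex
\mathrm{ICC}_{\text{adjusted}} = \frac{\sigma^2_i}{\sigma^2_i + \sigma^2_\varepsilon}
\qquad
\mathrm{ICC}_{\text{unadjusted}} = \frac{\sigma^2_i}{\sigma^2_f + \sigma^2_i + \sigma^2_\varepsilon}
```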

ICC for unconditional and conditional models

Usually, the ICC is calculated for the null model ("unconditional model"). However, according to Raudenbush and Bryk (2002) or Rabe-Hesketh and Skrondal (2012) it is also feasible to compute the ICC for full models with covariates ("conditional models") and compare how much, e.g., a level-2 variable explains the portion of variation in the grouping structure (random intercept).

ICC for specific group-levels

The proportion of variance for specific levels related to the overall model can be computed by setting by_group = TRUE. The reported ICC is the variance for each (random effect) group compared to the total variance of the model. For mixed models with a simple random intercept, this is identical to the classical (adjusted) ICC.
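The group-wise ratio can be sketched with plain arithmetic. The variance values and the grouping factor names (school, classroom) below are hypothetical, and the sketch assumes that the total variance is the sum of all random effect variances plus the residual variance.

```r
# Hypothetical variance components for a model with two grouping factors
var_random   <- c(school = 2.0, classroom = 1.0)
var_residual <- 5.0

# Assumption: total variance = all random effect variances + residual variance
var_total <- sum(var_random) + var_residual

# Variance of each grouping factor relative to the total variance
icc_by_group <- var_random / var_total
icc_by_group

# Under this assumption, the group-wise values add up to the overall adjusted ICC
sum(icc_by_group)
```

With only one grouping factor, the vector has a single element and the result reduces to the classical ratio of random effect variance to total variance.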

Variance decomposition for brms-models

If model is of class brmsfit, icc() might fail due to the large variety of models and families supported by the brms package. In such cases, variance_decomposition() is an alternative ICC measure. The function calculates a variance decomposition based on the posterior predictive distribution. First, draws are taken from the posterior predictive distribution not conditioned on group-level terms (posterior_predict(..., re_formula = NA)) and from the distribution conditioned on all random effects (the default, unless specified otherwise in re_formula). Second, the variance of each of these draws is calculated. The "ICC" is then the ratio between these two variances. This is the recommended way to analyse random effect variances for non-Gaussian models. It is then possible to compare variances across models, also by specifying different group-level terms via the re_formula argument.

Sometimes, when the variance of the posterior predictive distribution is very large, the variance ratio in the output makes no sense, e.g. because it is negative. In such cases, it might help to use robust = TRUE.
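The two steps can be sketched in base R without fitting a brms model. The matrices below are simulated stand-ins for what posterior_predict() would return (one row per posterior draw, one column per observation); all names and values are hypothetical. Reporting one minus the variance ratio as the share attributable to group-level terms is an assumption of this sketch, not something stated above.

```r
set.seed(123)
n_draws  <- 200  # posterior draws (rows)
n_groups <- 6
group    <- rep(seq_len(n_groups), each = 10)  # group index per observation
n_obs    <- length(group)

# Stand-in for draws NOT conditioned on group-level terms: residual noise only
ppd_fixed <- matrix(rnorm(n_draws * n_obs, sd = 1), nrow = n_draws)

# Stand-in for draws conditioned on all group-level terms:
# the same noise plus a group-specific shift in every draw
group_shift <- matrix(rnorm(n_draws * n_groups, sd = 1.5), nrow = n_draws)
ppd_full <- ppd_fixed + group_shift[, group]

# Step 1 and 2: variance of each draw (row-wise) for both sets of predictions
var_fixed_only <- apply(ppd_fixed, 1, var)
var_full       <- apply(ppd_full, 1, var)

# Ratio of the two variances per draw, summarised by median (robust) or mean
var_ratio    <- var_fixed_only / var_full
ratio_robust <- median(var_ratio)
ratio_mean   <- mean(var_ratio)

# Assumption of this sketch: share attributed to group-level terms is 1 - ratio
c(robust = 1 - ratio_robust, mean = 1 - ratio_mean)
```

The median is less sensitive than the mean to single draws with extreme variances, which is the motivation for the robust argument.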

References

Hox, J. J. (2010). Multilevel analysis: techniques and applications (2nd ed). New York: Routledge.
Nakagawa, S., Johnson, P. C. D., and Schielzeth, H. (2017). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of The Royal Society Interface, 14(134), 20170213.
Rabe-Hesketh, S., and Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed). College Station, Tex: Stata Press Publication.
Raudenbush, S. W., and Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed). Thousand Oaks: Sage Publications.

Examples

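# icc() comes from the performance package; lmer() comes from lme4
if (require("lme4") && require("performance")) {
  # Random intercept model: sepal length grouped by species
  model <- lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
  icc(model)
}

# ICC for specific group-levels
if (require("lme4") && require("performance")) {
  data(sleepstudy)
  set.seed(12345)
  # Assign each observation to one of 5 artificial groups
  sleepstudy$grp <- sample(1:5, size = nrow(sleepstudy), replace = TRUE)
  sleepstudy$subgrp <- NA
  # Within each group, assign observations to one of 30 artificial subgroups
  for (i in 1:5) {
    filter_group <- sleepstudy$grp == i
    sleepstudy$subgrp[filter_group] <-
      sample(1:30, size = sum(filter_group), replace = TRUE)
  }
  # Nested random intercepts (subgrp within grp) plus a random intercept for Subject
  model <- lmer(
    Reaction ~ Days + (1 | grp / subgrp) + (1 | Subject),
    data = sleepstudy
  )
  # Variance of each random effect group relative to the total variance
  icc(model, by_group = TRUE)
}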
