spread_coef: Spread model coefficients of list-variables into columns

Description

This function extracts coefficients (and, optionally, standard errors and p-values) from fitted model objects that are stored in a list-variable of a (nested) data frame, and spreads the coefficients into new columns.

Usage

spread_coef(data, model.column, model.term, se, p.val, append = TRUE,
  ...)

Arguments

data

A (nested) data frame with a list-variable that contains fitted model objects (see 'Details').

model.column

Name or index of the list-variable that contains the fitted model objects.

model.term

Optional, name of a model term. If specified, only this model term (including p-value) will be extracted from each model and added as a new column.

se

Logical, if TRUE, standard errors for estimates will also be extracted.

p.val

Logical, if TRUE, p-values for estimates will also be extracted.

append

Logical, if TRUE (default), this function returns data with new columns for the model coefficients; else, a new data frame with the model coefficients only is returned.

...

Other arguments passed down to the tidy-function.

Value

A data frame with one column for each coefficient of the models that are stored in the list-variable of data; or, if model.term is given, a data frame with that term's estimate. If se = TRUE or p.val = TRUE, the returned data frame also contains columns for the coefficients' standard errors or p-values, respectively. If append = TRUE, the new columns are appended to data, i.e. data is also returned.
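To illustrate what the additional standard error and p-value columns hold, here is a minimal base R sketch for a single model term. It is not the implementation of spread_coef(); the built-in mtcars data, the formula, and the column names wt.se and wt.p are assumptions made for illustration only and are not necessarily the names used by spread_coef().

```r
# Illustrative sketch only; assumes the built-in `mtcars` data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary() returns a coefficient matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coef.matrix <- summary(fit)$coefficients

# Pick estimate, standard error and p-value for the term `wt` and
# store them as one row with three columns.
# The column names `wt.se` and `wt.p` are hypothetical.
wt.row <- data.frame(
  wt = coef.matrix["wt", "Estimate"],
  wt.se = coef.matrix["wt", "Std. Error"],
  wt.p = coef.matrix["wt", "Pr(>|t|)"]
)

wt.row
```

With several models, one such row per model would be stacked to give the kind of table described above.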

Details

This function requires a (nested) data frame (e.g. created by the nest-function of the tidyr-package), where several fitted models are saved in a list-variable (see 'Examples'). Models stored in such a list-variable are typically fitted with an identical formula, so all models have the same dependent and independent variables and differ only in the subset of data they were fitted to. The function extracts all coefficients from each model and saves each estimate in a new column. The result is a data frame in which each row is a model, with each of the model's coefficients in its own column.
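The reshaping described above can be approximated in base R. The following is a minimal sketch under stated assumptions, not the implementation of spread_coef(): the built-in mtcars data, the grouping by cyl, and the formula mpg ~ wt + hp are chosen for illustration only.

```r
# Illustrative sketch only; not how spread_coef() is implemented.
# Assumption: built-in `mtcars` data, split into subsets by `cyl`.
groups <- split(mtcars, mtcars$cyl)

# Fit the same formula to each subset of the data.
models <- lapply(groups, function(d) lm(mpg ~ wt + hp, data = d))

# Extract the coefficients of each model. Because all models share
# one formula, every coefficient vector has the same names.
coefs <- lapply(models, coef)

# Bind the vectors row-wise: one row per model, one column per
# coefficient, plus a column identifying the group.
coef.table <- as.data.frame(do.call(rbind, coefs))
coef.table$cyl <- names(models)

coef.table
```

Because every model uses the same formula, the coefficient vectors line up and can be bound into a rectangular table; models with differing terms would not fit this layout without extra handling.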

Examples

# NOT RUN {
library(sjmisc) # provides spread_coef() and the efc sample data
library(dplyr)
library(tidyr)
library(purrr)
data(efc)

# create nested data frame, grouped by dependency (e42dep)
# and fit linear model for each group. These models are
# stored in the list variable "models".
model.data <- efc %>%
  filter(!is.na(e42dep)) %>%
  group_by(e42dep) %>%
  nest() %>%
  mutate(
    models = map(data, ~lm(neg_c_7 ~ c12hour + c172code, data = .x))
  )

# spread coefficients, so we can easily access and compare the
# coefficients over all models. The arguments `se` and `p.val`
# default to `FALSE` when `model.term` is not specified
spread_coef(model.data, models)
spread_coef(model.data, models, se = TRUE)

# select only a specific model term; in this case, `se` and `p.val`
# default to `TRUE`
spread_coef(model.data, models, c12hour)

# spread_coef can be used directly within a pipe-chain
efc %>%
  filter(!is.na(e42dep)) %>%
  group_by(e42dep) %>%
  nest() %>%
  mutate(
    models = map(data, ~lm(neg_c_7 ~ c12hour + c172code, data = .x))
  ) %>%
  spread_coef(models)

# spread_coef() makes it easy to generate bootstrapped
# confidence intervals, using the 'bootstrap()' and 'boot_ci()'
# functions from the 'sjstats' package, which creates nested
# data frames of bootstrap replicates
library(sjstats)
efc %>%
  # generate bootstrap replicates
  bootstrap(100) %>%
  # apply lm to all bootstrapped data sets
  mutate(
    models = map(strap, ~lm(neg_c_7 ~ e42dep + c161sex + c172code, data = .x))
  ) %>%
  # spread model coefficients for all 100 models
  spread_coef(models, se = FALSE, p.val = FALSE) %>%
  # compute the CI for all bootstrapped model coefficients
  boot_ci(e42dep, c161sex, c172code)

# }
