feglm: Efficiently fit GLMs with high-dimensional \(k\)-way fixed effects

Description

feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a “weighted demeaning” approach that combines the work of Gaure (2013) and Stammann et al. (2016). For technical details see Stammann (2018). The routine is well suited for large data sets that would otherwise be infeasible to handle due to memory limitations.

Remark: The term fixed effect is used in the econometrician's sense of having intercepts for each level in each category.
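
The “weighted demeaning” idea can be illustrated in the simplest case, a weighted linear regression with a single fixed-effects category. The sketch below is not the package's implementation. It uses only base R, and the data and the helper wdemean() are hypothetical. It shows that subtracting weighted group means from the response and the regressor gives the same slope as estimating one intercept per level.

```r
# Hypothetical toy data: one regressor, one fixed-effects category 'g'
set.seed(1)
n <- 200
g <- factor(sample(1:10, n, replace = TRUE))
x <- rnorm(n)
w <- runif(n, 0.5, 2)                      # observation weights
y <- 1.5 * x + as.numeric(g) / 5 + rnorm(n)

# Subtract the weighted mean of 'v' within each level of 'g'
wdemean <- function(v, g, w) {
  v - ave(v * w, g, FUN = sum) / ave(w, g, FUN = sum)
}

y_t <- wdemean(y, g, w)
x_t <- wdemean(x, g, w)

# Slope with explicit dummies versus slope after weighted demeaning
b_dummies  <- coef(lm(y ~ x + g, weights = w))[["x"]]
b_demeaned <- coef(lm(y_t ~ x_t - 1, weights = w))[["x_t"]]
all.equal(b_dummies, b_demeaned)
```

The demeaned regression never builds the dummy matrix, which is what keeps memory use manageable when a category has many levels.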

Usage

feglm(
  formula = NULL,
  data = NULL,
  family = binomial(),
  weights = NULL,
  beta.start = NULL,
  eta.start = NULL,
  control = NULL
)

Value

The function feglm returns a named list of class "feglm".

Arguments

formula: an object of class "formula": a symbolic description of the model to be fitted. formula must be of type y ~ x | k, where the second part of the formula refers to factors to be concentrated out. It is also possible to pass additional variables to feglm (e.g. to cluster standard errors). This can be done by specifying the third part of the formula: y ~ x | k | add.
data: an object of class "data.frame" containing the variables in the model.
family: a description of the error distribution and link function to be used in the model. Similar to glm.fit, this has to be the result of a call to a family function. Default is binomial(). See family for details of family functions.
weights: an optional string with the name of the 'prior weights' variable in data.
beta.start: an optional vector of starting values for the structural parameters in the linear predictor. Default is \(\boldsymbol{\beta} = \mathbf{0}\).
eta.start: an optional vector of starting values for the linear predictor.
control: a named list of parameters for controlling the fitting process. See feglmControl for details.
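
As a rough sketch of how these arguments fit together, the call below uses the three-part formula and a control list. The data come from simGLM() as in the package example. The control entries iter.max and trace, and the type and cluster arguments of summary(), are assumptions recalled from the package documentation; check ?feglmControl and ?summary.feglm before relying on them.

```r
library(alpaca)

# Artificial logit panel, as in the package example
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Third formula part: pass 'i' along so it is available for clustering
mod <- feglm(
  y ~ x1 + x2 + x3 | i + t | i,
  data    = data,
  family  = binomial(),
  control = list(iter.max = 50L, trace = FALSE)  # assumed entry names
)

# Assumed interface for clustered standard errors
summary(mod, type = "clustered", cluster = ~ i)
```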

Details

If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
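
One simple way to look for this kind of dependence is to check whether a regressor varies at all within the levels of a fixed effects category. The sketch below uses base R only. The data frame and the helper absorbed_by() are hypothetical and not part of the package.

```r
# Hypothetical toy panel: 'x_bad' is constant within each level of 'i'
set.seed(42)
df <- data.frame(
  i = factor(rep(1:4, each = 5)),
  t = factor(rep(1:5, times = 4))
)
df$x_ok  <- rnorm(nrow(df))
df$x_bad <- as.numeric(df$i) * 2

# TRUE if 'x' has (numerically) no variation within any level of 'g',
# i.e. 'x' is a linear combination of the dummies for 'g'
absorbed_by <- function(x, g, tol = 1e-10) {
  all(abs(x - ave(x, g)) < tol)
}

res <- sapply(df[c("x_ok", "x_bad")], absorbed_by, g = df$i)
res
```

This only detects a single regressor that is absorbed by a single category. Dependence that involves several regressors or several categories at once needs a more general rank check.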

References

Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.

Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).

Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.

Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.

Examples

# Generate an artificial data set for logit models
library(alpaca)
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Fit 'feglm()'
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)
summary(mod)