margins-package: Marginal Effects Estimation

Description

This package is an R port of Stata's margins command. Its main function, margins(), is an S3 generic that builds a “margins” object from a model object. Methods are currently implemented for “lm”, “glm”, and “loess” class objects, and support is expanding. See Details, below.

The package also provides two low-level functions: marginal_effects, which estimates those quantities and returns a data frame of unit-specific effects, and dydx, which provides variable-specific derivatives from models. Some of the underlying architecture for the package is provided by the low-level function prediction, which provides a consistent data frame interface to predict for a large number of model types.

Usage

margins(model, ...)
# S3 method for default
margins(model, data = find_data(model, parent.frame()),
  at = NULL, type = c("response", "link", "terms"),
  vcov = stats::vcov(model), vce = c("delta", "simulation", "bootstrap",
  "none"), iterations = 50L, unit_ses = FALSE, eps = 1e-07, ...)
# S3 method for lm
margins(model, data = find_data(model, parent.frame()),
  at = NULL, type = c("response", "link", "terms"),
  vcov = stats::vcov(model), vce = c("delta", "simulation", "bootstrap",
  "none"), iterations = 50L, unit_ses = FALSE, eps = 1e-07, ...)
# S3 method for glm
margins(model, data = find_data(model, parent.frame()),
  at = NULL, type = c("response", "link", "terms"),
  vcov = stats::vcov(model), vce = c("delta", "simulation", "bootstrap",
  "none"), iterations = 50L, unit_ses = FALSE, eps = 1e-07, ...)
# S3 method for loess
margins(model, data, at = NULL, eps = 1e-07, ...)

Arguments

model

A model object. See Details for supported model classes.

…

Arguments passed through various internal functions to dydx methods.

data

A data frame containing the data at which to evaluate the marginal effects, as in predict. This is optional, but may be required when the underlying modelling function sets model = FALSE.

at

A list of one or more named vectors, specifically values at which to calculate the marginal effects. These are used to modify the value of data (see build_datalist for details on use).

type

A character string indicating the type of marginal effects to estimate. Mostly relevant for non-linear models, where the reasonable options are “response” (the default) or “link” (i.e., on the scale of the linear predictor in a GLM).

vcov

A matrix containing the variance-covariance matrix for estimated model coefficients, or a function to perform the estimation with model as its only argument.

vce

A character string indicating the type of estimation procedure to use for estimating variances. The default (“delta”) uses the delta method. Alternatives are “bootstrap”, which uses bootstrap estimation, or “simulation”, which averages across simulations drawn from the joint sampling distribution of model coefficients. The latter two are extremely time intensive.

iterations

If vce = "bootstrap", the number of bootstrap iterations. If vce = "simulation", the number of simulated effects to draw. Ignored otherwise.

unit_ses

If vce = "delta", a logical specifying whether to calculate and return unit-specific marginal effect variances. This calculation is time consuming and the information is often not needed, so this is set to FALSE by default.

eps

A numeric value specifying the “step” to use when calculating numerical derivatives.
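
To make the roles of eps and type more concrete, here is a minimal sketch of a central finite-difference derivative of model predictions, written in base R only. The helper num_dydx and its step rule are assumptions made for illustration; the package's dydx methods may choose the step differently.

```r
# Hypothetical helper (not part of the package): central finite-difference
# derivative of predictions with respect to one numeric variable.
num_dydx <- function(model, data, variable, type = "response", eps = 1e-7) {
  # assumed step rule: scale sqrt(eps) by the magnitude of each value
  h <- pmax(abs(data[[variable]]), 1) * sqrt(eps)
  lo <- data
  hi <- data
  lo[[variable]] <- data[[variable]] - h
  hi[[variable]] <- data[[variable]] + h
  p_lo <- predict(model, newdata = lo, type = type)
  p_hi <- predict(model, newdata = hi, type = type)
  unname((p_hi - p_lo) / (hi[[variable]] - lo[[variable]]))
}

fit <- glm(am ~ wt, data = mtcars, family = binomial)

# on the link scale the effect of wt is constant (the coefficient itself)
d_link <- num_dydx(fit, mtcars, "wt", type = "link")

# on the response scale it varies by observation: beta * p * (1 - p)
d_resp <- num_dydx(fit, mtcars, "wt", type = "response")
p <- unname(fitted(fit))
analytic_resp <- unname(coef(fit)["wt"]) * p * (1 - p)

head(cbind(numeric = d_resp, analytic = analytic_resp))
```

For a logit model the numerical and analytic columns should agree closely, which is a quick way to check that a chosen step is neither too large nor too small.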

Value

A data frame of class “margins” containing the contents of data, fitted values for model, the standard errors of the fitted values, and any estimated marginal effects. If at = NULL (the default), the data frame has nrow(data) rows. Otherwise, the number of rows is a multiple of nrow(data), with one copy of the data for each combination of the values specified in at. Columns containing marginal effects are distinguished by their name (prefixed by dydx_). These columns can be extracted from a “margins” object using, for example, marginal_effects(margins(model)). Columns prefixed by Var_ contain the variances of the average marginal effects, whereas (optional) columns prefixed by SE_ contain observation-specific standard errors. A special list column, .at, records the combination of at values reflected in each row. The summary.margins() method provides for pretty printing of the results.
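
The row-count rule can be illustrated without the package. The sketch below assumes that every combination of the at values produces one modified copy of the data; it uses only base R and is not the package's build_datalist implementation.

```r
# Illustration only: how values in `at` multiply the rows of the result.
at <- list(hp = c(95, 150), cyl = c(4, 6))
grid <- expand.grid(at)   # one row per combination of hp and cyl

# one modified copy of the data per combination, then row-bind the copies
copies <- lapply(seq_len(nrow(grid)), function(i) {
  d <- mtcars
  for (v in names(grid)) d[[v]] <- grid[[v]][i]
  d
})
stacked <- do.call(rbind, copies)

nrow(grid)                                   # number of combinations
nrow(stacked) == nrow(mtcars) * nrow(grid)   # rows are a multiple of nrow(data)
```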

Details

Methods for this generic return a “margins” object, which is a data frame consisting of the original data, predicted values and their standard errors, and estimated marginal effects from model, with attributes describing various features of the marginal effects estimates.

Some modelling functions set model = FALSE by default. For margins to work best, this should be set to TRUE. Otherwise the data argument to margins is probably required.
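
As a small illustration of what the model setting changes, the sketch below fits the same linear model with and without the stored model frame, using base R only. lm() keeps the model frame by default, so model = FALSE is set explicitly here to mimic a modelling function that drops it.

```r
fit_with    <- lm(mpg ~ wt, data = mtcars)                 # model frame stored
fit_without <- lm(mpg ~ wt, data = mtcars, model = FALSE)  # model frame dropped

is.null(fit_with$model)     # FALSE: the data can be recovered from the fit
is.null(fit_without$model)  # TRUE: nothing stored inside the fitted object

# In the second case, supplying the data explicitly is the safer choice, e.g.
# margins(fit_without, data = mtcars)   # assumes the margins package is attached
```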

See dydx for details on estimation of marginal effects.

Methods are currently implemented for the following object classes:

“lm”, see lm
“glm”, see glm, glm.nb
“loess”, see loess

The margins method for objects of class “lm” or “glm” simply constructs a list of data frames (using build_datalist), calculates marginal effects for each data frame (via marginal_effects and, in turn, prediction), and row-binds the results together. Alternatively, you can use marginal_effects to retrieve a data frame of marginal effects without constructing a “margins” object. That can be efficient for plotting, etc., given the time-consuming nature of variance estimation.
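
A short sketch of that lighter-weight route follows. It assumes the margins package is installed (the call is skipped otherwise) and that marginal_effects can be called with only a model, in which case the data are taken from the fitted object.

```r
if (requireNamespace("margins", quietly = TRUE)) {
  fit <- lm(mpg ~ cyl * hp + wt, data = mtcars)

  # unit-specific marginal effects only; no variance estimation is performed
  me <- margins::marginal_effects(fit)

  names(me)      # effect columns carry the dydx_ prefix
  colMeans(me)   # averaging the columns gives average marginal effects
}
```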

The choice of vce may be important. The default variance-covariance estimation procedure (vce = "delta") uses the delta method to estimate marginal effect variances. This is the fastest method. When vce = "simulation", coefficient estimates are repeatedly drawn from the asymptotic (multivariate normal) distribution of the model coefficients and each draw is used to estimate marginal effects, with the variance based upon the dispersion of those simulated effects. The number of iterations used is given by iterations. For vce = "bootstrap", the bootstrap is used to repeatedly resample data and the variance of marginal effects is estimated from the variance of the bootstrap distribution. This method is markedly slower than the other two procedures. Again, iterations regulates the number of bootstrap resamples to draw.
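
The difference between the delta method and simulation can be sketched in base R for a single average marginal effect. This is an illustration under simplifying assumptions (a linear model with one interaction, a hand-written function ame_hp, and a forward-difference gradient), not the package's internal code; the bootstrap is not shown.

```r
set.seed(1)
fit  <- lm(mpg ~ hp * wt, data = mtcars)
beta <- coef(fit)
V    <- vcov(fit)

# Average marginal effect of hp as a function of the coefficients:
# d mpg / d hp = b_hp + b_hp:wt * wt, averaged over the observations.
ame_hp <- function(b) unname(b["hp"] + b["hp:wt"] * mean(mtcars$wt))

# Delta method: numerical gradient of the AME with respect to each
# coefficient, then the quadratic form g' V g.
step <- 1e-6
grad <- vapply(seq_along(beta), function(j) {
  b_up <- beta
  b_up[j] <- b_up[j] + step
  (ame_hp(b_up) - ame_hp(beta)) / step
}, numeric(1))
var_delta <- drop(t(grad) %*% V %*% grad)

# Simulation: draw coefficients from N(beta, V) via the Cholesky factor
# and recompute the AME for every draw.
n_draws <- 5000
z <- matrix(rnorm(n_draws * length(beta)), nrow = length(beta))
draws <- beta + t(chol(V)) %*% z   # each column is one coefficient draw
rownames(draws) <- names(beta)
var_sim <- var(apply(draws, 2, ame_hp))

c(delta = var_delta, simulation = var_sim)
```

Because this effect is linear in the coefficients, the two variances should agree up to simulation noise; with nonlinear models the delta method is a first-order approximation and the two can differ more.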

References

Greene, W.H. 2012. Econometric Analysis, 7th Ed. Boston: Pearson.

Stata manual: margins. Retrieved 2014-12-15 from http://www.stata.com/manuals13/rmargins.pdf.

Examples

# basic example using linear model
library("datasets")
library("margins")
x <- lm(mpg ~ cyl * hp + wt, data = head(mtcars))
margins(x)

# obtain unit-specific standard errors (slow; not run by default)
margins(x, unit_ses = TRUE)

# use of 'at' argument
## modifying original data values
margins(x, at = list(hp = 150))
## AMEs at various data values
margins(x, at = list(hp = c(95, 150), cyl = c(4, 6)))

# use of 'data' argument to obtain AMEs for a subset of data
margins(x, data = mtcars[mtcars[["cyl"]] == 4, ])
margins(x, data = mtcars[mtcars[["cyl"]] == 6, ])

# return discrete differences for continuous terms
## passes 'change' through '...' to dydx()
margins(x, change = "sd")

# summary() method
summary(margins(x, at = list(hp = c(95, 150))))
## control row order of summary() output
summary(margins(x, at = list(hp = c(95, 150))), by_factor = FALSE)

# alternative 'vce' estimation (slow; not run by default)
## bootstrap
margins(x, vce = "bootstrap", iterations = 100L)
## simulation (à la Clarify/Zelig)
margins(x, vce = "simulation", iterations = 100L)

# specifying a custom `vcov` argument
if (require("sandwich")) {
  x2 <- lm(Sepal.Length ~ Sepal.Width, data = head(iris))
  summary(margins(x2))
  ## heteroskedasticity-consistent covariance matrix
  summary(margins(x2, vcov = vcovHC(x2)))
}

# generalized linear model
x <- glm(am ~ hp, data = head(mtcars), family = binomial)
margins(x, type = "response")
margins(x, type = "link")