cluster.boot: Bootstrapped multi-way standard error clustering

Description

Returns a bootstrapped multi-way cluster-robust variance-covariance matrix.

Usage

cluster.boot(model, cluster, parallel = FALSE, use_white = NULL, force_posdef = FALSE, R = 300, boot_type = "xy", wild_type = "rademacher", debug = FALSE)

Arguments

model

The estimated model, usually an lm or glm class object

cluster

A vector, matrix, or data.frame of cluster variables, where each column is a separate variable. If the vector 1:nrow(data) is used, the function effectively produces a regular heteroskedasticity-robust matrix.

parallel

Scalar or list. If a list, it is used as a list of connected processing cores/clusters. The scalar values TRUE and "snow" (which are equivalent) ask boot to handle parallelization, as does "multicore". See the parallel and boot packages.

use_white

Logical or NULL. See description below.

force_posdef

Logical. Force the eigenvalues of the variance-covariance matrix to be positive.

R

Integer. The number of bootstrap replicates; passed directly to boot.

boot_type

"xy", "residual", or "wild". See details.

wild_type

"rademacher", "mammen", or "norm". See details.

debug

Logical. Print internal values useful for debugging to the console.

Value

A $K \times K$ variance-covariance matrix of type matrix.

Details

This function implements cluster bootstrapping (also known as the block bootstrap) for variance-covariance matrices, following Cameron, Gelbach, & Miller (CGM) (2008). Usage is generally similar to the cluster.vcov function in this package, but this function does not support degrees of freedom corrections or leverage adjustments.

In the terminology that CGM (2008) use, this function implements pairs, residual, or wild cluster bootstrap-se.

A pairs (or xy) cluster bootstrap can be obtained by setting boot_type = "xy", which resamples the entire regression data set (both X and y). Setting boot_type = "residual" gives a residual cluster bootstrap, which resamples only the residuals (here, whole blocks/clusters of residuals are resampled rather than individual observations' residuals). Setting boot_type = "wild" gives a wild cluster bootstrap, which does not resample anything; instead it rebuilds the dependent variable by multiplying each residual by a randomly drawn value and adding the result to the fitted value. The default method is the pairs/xy bootstrap.
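To make the difference between the pairs and wild schemes concrete, here is a minimal base-R sketch on simulated data. It is not the package's implementation: the data set, the variable names, and the choice to draw one multiplier per cluster in the wild scheme are all assumptions made for illustration.

```r
set.seed(42)

# Simulated data (hypothetical): 30 clusters of 10 observations each
G <- 30
n_g <- 10
d <- data.frame(id = rep(seq_len(G), each = n_g))
d$x <- rnorm(G * n_g)
d$y <- 1 + 2 * d$x + rep(rnorm(G), each = n_g) + rnorm(G * n_g)
fit <- lm(y ~ x, data = d)

R <- 200
rows_by_cluster <- split(seq_len(nrow(d)), d$id)

# Pairs (xy) scheme: draw whole clusters with replacement, then refit
pairs_coefs <- t(replicate(R, {
  drawn <- sample(names(rows_by_cluster), replace = TRUE)
  idx <- unlist(rows_by_cluster[drawn], use.names = FALSE)
  coef(lm(y ~ x, data = d[idx, ]))
}))

# Wild scheme: keep X fixed and rebuild y from fitted values plus
# sign-flipped residuals (assumption: one Rademacher draw per cluster)
e <- residuals(fit)
yhat <- fitted(fit)
wild_coefs <- t(replicate(R, {
  w <- sample(c(-1, 1), G, replace = TRUE)
  d_star <- d
  d_star$y <- yhat + e * w[d$id]
  coef(lm(y ~ x, data = d_star))
}))

# The bootstrap variance-covariance matrix is the covariance of the
# replicated coefficient vectors
vcov_pairs <- cov(pairs_coefs)
vcov_wild <- cov(wild_coefs)
```

Both results are 2 x 2 matrices (intercept and slope) and can be compared with vcov(fit) to see how much clustering inflates the classical estimate.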

There are three built-in distributions from which to draw multipliers for wild bootstraps: the Rademacher (wild_type = "rademacher"), which draws -1 or 1, each with probability 0.5; Mammen's suggested distribution (wild_type = "mammen", see Mammen, 1993); and the standard normal/Gaussian distribution (wild_type = "norm"). The default is the Rademacher distribution, following CGM (2008). Alternatively, you can supply your own distribution by setting wild_type to a function that takes no arguments and returns a single real value.
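As a rough illustration, the three multiplier distributions could be written as zero-argument functions of the kind wild_type accepts. The function names below are hypothetical, and the Mammen values are the standard two-point distribution rather than code taken from the package.

```r
# Rademacher: -1 or 1, each with probability 0.5
rademacher <- function() sample(c(-1, 1), 1)

# Mammen two-point distribution: two values chosen so that the
# multiplier has mean 0 and variance 1
mammen_values <- c(-(sqrt(5) - 1) / 2, (sqrt(5) + 1) / 2)
mammen_probs <- c((sqrt(5) + 1) / (2 * sqrt(5)), (sqrt(5) - 1) / (2 * sqrt(5)))
mammen <- function() sample(mammen_values, 1, prob = mammen_probs)

# Standard normal
std_normal <- function() rnorm(1)

# Analytic moments of the Mammen distribution
mammen_mean <- sum(mammen_values * mammen_probs)
mammen_var <- sum(mammen_values^2 * mammen_probs) - mammen_mean^2
```

Any function of this shape (no arguments, one real value returned) can be passed as wild_type; multipliers with mean 0 and variance 1 are the usual choice.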

Multi-way clustering is handled as described by Petersen (2009) and generalized according to Cameron, Gelbach, & Miller (2011). This means that cluster.boot estimates a set of variance-covariance matrices for the variables separately and then sums them (subtracting some matrices and adding others). The method described by CGM (2011) estimates a set of variance-covariance matrices for the residuals (sometimes referred to as the meat of the sandwich estimator) and sums them appropriately. Whether you sum the meat matrices and then compute the model's variance-covariance matrix or you compute a series of model matrices and sum those is mathematically irrelevant, but may lead to (very) minor numerical differences.
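For two clustering dimensions, the sum described above adds the two one-way matrices and subtracts the matrix for their intersection. The following base-R sketch shows that combination on simulated data; boot_vcov is a hypothetical helper that runs a simple pairs cluster bootstrap, not a function from this package.

```r
set.seed(1)

# Hypothetical balanced panel: 20 firms observed over 10 years
d <- expand.grid(firm = 1:20, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(20)[d$firm] + rnorm(10)[d$year] + rnorm(nrow(d))

# One-way pairs cluster bootstrap vcov (illustrative helper only)
boot_vcov <- function(data, cluster, R = 200) {
  rows <- split(seq_len(nrow(data)), cluster)
  coefs <- replicate(R, {
    drawn <- sample(length(rows), replace = TRUE)
    idx <- unlist(rows[drawn], use.names = FALSE)
    coef(lm(y ~ x, data = data[idx, ]))
  })
  cov(t(coefs))
}

v_firm <- boot_vcov(d, d$firm)
v_year <- boot_vcov(d, d$year)
v_inter <- boot_vcov(d, interaction(d$firm, d$year, drop = TRUE))

# Two-way combination: add the one-way matrices, subtract the intersection
v_twoway <- v_firm + v_year - v_inter
```

Because a matrix is subtracted, the combined result is not guaranteed to be positive semi-definite, which is the situation the force_posdef argument addresses.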

Instead of passing in a vector, matrix, data.frame, etc, to specify the cluster variables, you can use a formula to specify which variables from the original data frame to use as cluster variables, e.g., ~ firmid + year.

Ma (2014) suggests using the White (1980) variance-covariance matrix as the final, subtracted matrix when the union of the clustering dimensions U results in a single observation per group in U. For example, if clustering by firm and year and there is only one observation per firm-year, the White (1980) HC0 variance-covariance matrix is subtracted from the sum of the firm and year vcov matrices. This situation is detected automatically (if use_white = NULL), but you can force the behavior one way or the other by setting use_white = TRUE or FALSE.
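One plausible way to check for the single-observation case is to cross the cluster variables and count observations per combined group. The helper below is a hypothetical sketch of such a check; the source does not say how the package performs its automatic detection.

```r
# TRUE when every combination of the cluster variables contains exactly
# one observation (illustrative helper, not package code)
single_obs_per_group <- function(cluster) {
  cluster <- as.data.frame(cluster)
  groups <- interaction(cluster, drop = TRUE)
  all(table(groups) == 1)
}

# Balanced panel with one row per firm-year
panel <- expand.grid(firmid = 1:5, year = 2001:2004)
panel_check <- single_obs_per_group(panel)

# Duplicate one firm-year so that a group holds two observations
panel_dup <- rbind(panel, panel[1, ])
dup_check <- single_obs_per_group(panel_dup)
```

In this sketch panel_check is TRUE and dup_check is FALSE, which corresponds to the cases where use_white would and would not apply.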

Unlike the cluster.vcov function, this function does not depend upon the estfun function from the sandwich package, although it does make use of the vcovHC function for computing White (1980) variance-covariance matrices.

Parallelization (if used) is handled by the boot package. Be sure to set options(boot.ncpus = N) where N is the number of CPU cores you want the boot function to use.

References

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90(3), 414-427. doi:10.1162/rest.90.3.414

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust inference with multiway clustering. Journal of Business & Economic Statistics, 29(2). doi:10.1198/jbes.2010.07136

Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 255-285. doi:10.1214/aos/1176349025

Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435-480. doi:10.1093/rfs/hhn053

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica: Journal of the Econometric Society, 817-838. doi:10.2307/1912934

Examples

## Not run: 
# library(lmtest)
# data(petersen)
# m1 <- lm(y ~ x, data = petersen)
# 
# # Cluster by firm
# boot_firm <- cluster.boot(m1, petersen$firmid)
# coeftest(m1, boot_firm)
# 
# # Cluster by firm using a formula
# boot_firm <- cluster.boot(m1, ~ firmid)
# coeftest(m1, boot_firm)
# 
# # Cluster by year
# boot_year <- cluster.boot(m1, petersen$year)
# coeftest(m1, boot_year)
# 
# # Double cluster by firm and year
# boot_both <- cluster.boot(m1, cbind(petersen$firmid, petersen$year))
# coeftest(m1, boot_both)
# 
# # Cluster by firm with wild bootstrap and custom wild distribution
# boot_firm2 <- cluster.boot(m1, petersen$firmid, boot_type = "wild",
#                            wild_type = function() sample(c(-1, 1), 1))
# coeftest(m1, boot_firm2)
# 
# # Go multicore using the parallel package
# require(parallel)
# cl <- makeCluster(4)
# options(boot.ncpus = 4)
# boot_both <- cluster.boot(m1, cbind(petersen$firmid, petersen$year), parallel = cl)
# stopCluster(cl)
# coeftest(m1, boot_both)
# 
# # Go multicore using the parallel package, let boot handle the parallelization
# require(parallel)
# options(boot.ncpus = 8)
# boot_both <- cluster.boot(m1, cbind(petersen$firmid, petersen$year), parallel = TRUE)
# coeftest(m1, boot_both)
# ## End(Not run)