This function implements cluster bootstrapping (also known as the block bootstrap)
for variance-covariance matrices, following Cameron, Gelbach, & Miller (CGM) (2008).
Usage is generally similar to the cluster.vcov function in this package, but this
function does not support degrees of freedom corrections or leverage adjustments.
In the terminology that CGM (2008) use, this function implements
pairs, residual, or wild cluster bootstrap-se.
A pairs (or xy) cluster bootstrap can be obtained by setting boot_type = "xy",
which resamples the entire regression data set (both X and y).
Setting boot_type = "residual" will obtain a residual cluster
bootstrap, which resamples only the residuals (in this case, we resample the blocks/clusters
rather than the individual observations' residuals). To get a wild cluster bootstrap, set
boot_type = "wild", which does not resample anything, but instead re-forms the
dependent variable by multiplying the residual by a randomly drawn value and adding the
result to the fitted value. The default method is the pairs/xy bootstrap.
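
As an illustration of the pairs/xy idea described above, the following base R sketch resamples whole clusters with replacement, refits the model on each resample, and takes the covariance of the resulting coefficients. The toy data, the helper one_replicate, and the replicate count are assumptions made for this example; it is not the package's internal implementation.

```r
set.seed(1)

# Toy panel: 20 clusters of 5 observations each (illustrative data only)
df <- data.frame(firmid = rep(1:20, each = 5), x = rnorm(100))
df$y <- 1 + 2 * df$x + rep(rnorm(20), each = 5) + rnorm(100)

# Row indices grouped by cluster, so whole blocks can be resampled
rows_by_cluster <- split(seq_len(nrow(df)), df$firmid)

# One replicate: draw clusters with replacement, stack their rows, refit
one_replicate <- function() {
  drawn <- sample(names(rows_by_cluster), replace = TRUE)
  idx <- unlist(rows_by_cluster[drawn], use.names = FALSE)
  coef(lm(y ~ x, data = df[idx, ]))
}

R <- 200  # number of bootstrap replicates (hypothetical choice)
boot_coefs <- t(replicate(R, one_replicate()))

# Bootstrap estimate of the coefficient variance-covariance matrix
vcov_pairs <- cov(boot_coefs)
```

Because entire clusters are drawn, within-cluster dependence is preserved in every resample, which is the point of resampling blocks rather than individual rows.
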
There are three built-in distributions to draw multipliers from for wild bootstraps:
the Rademacher (wild_type = "rademacher"), which draws from {-1, 1},
each with P = 0.5; Mammen's suggested distribution (wild_type = "mammen"; see
Mammen, 1993); and the standard normal/Gaussian distribution (wild_type = "norm").
The default is the Rademacher distribution, following CGM (2008). Alternatively, you can
supply your own multiplier distribution by setting wild_type
to a function that takes no arguments and returns a single real value.
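
The sketch below shows what such zero-argument multiplier functions might look like in base R, and how a wild bootstrap re-forms the dependent variable from fitted values and residuals. The function names, the uniform alternative, and the choice of one multiplier per cluster (as in CGM, 2008) are assumptions for illustration, not a copy of the package's internals.

```r
set.seed(2)

# Rademacher: -1 or 1, each with probability 0.5
rademacher <- function() sample(c(-1, 1), 1)

# Mammen (1993) two-point distribution: mean 0, variance 1
mammen_vals <- c(-(sqrt(5) - 1) / 2, (sqrt(5) + 1) / 2)
mammen_prob <- c((sqrt(5) + 1) / (2 * sqrt(5)), (sqrt(5) - 1) / (2 * sqrt(5)))
mammen <- function() sample(mammen_vals, 1, prob = mammen_prob)

# Hypothetical user-supplied alternative: uniform on [-sqrt(3), sqrt(3)],
# which also has mean 0 and variance 1
my_multiplier <- function() runif(1, -sqrt(3), sqrt(3))

# Toy data and an initial fit (illustrative only)
df <- data.frame(firmid = rep(1:10, each = 4), x = rnorm(40))
df$y <- 1 + 2 * df$x + rep(rnorm(10), each = 4) + rnorm(40)
fit <- lm(y ~ x, data = df)

# Draw one multiplier per cluster and expand it to the observation level
cluster_ids <- unique(df$firmid)
w <- replicate(length(cluster_ids), rademacher())
w_obs <- w[match(df$firmid, cluster_ids)]

# Re-formed dependent variable: fitted value plus multiplied residual
y_star <- fitted(fit) + resid(fit) * w_obs
```

A function such as my_multiplier has the shape the text describes for wild_type: it takes no arguments and returns a single real value.
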
Multi-way clustering is handled as described by Petersen (2009) and generalized
according to Cameron, Gelbach, & Miller (2011). This means that cluster.boot
estimates a set of variance-covariance matrices for the variables separately
and then sums them (subtracting some matrices and adding others).
The method described by CGM (2011) estimates a set of variance-covariance matrices
for the residuals (sometimes referred to as the meat of the sandwich estimator)
and sums them appropriately. Whether you sum the meat matrices and then compute
the model's variance-covariance matrix or you compute a series of model matrices
and sum those is mathematically irrelevant, but may lead to (very) minor numerical
differences.
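
To make the equivalence concrete, here is a small base R sketch for two-way clustering using analytic (not bootstrapped) sandwich estimators without any small-sample correction. It combines the firm, year, and firm-year pieces both ways and compares the results. The toy data and the helper names meat and sandwich_vcov are assumptions for this example.

```r
set.seed(3)

# Toy panel with one observation per firm-year (illustrative data only)
df <- expand.grid(firmid = 1:15, year = 1:8)
df$x <- rnorm(nrow(df))
df$y <- 1 + 0.5 * df$x + rnorm(15)[df$firmid] + rnorm(8)[df$year] + rnorm(nrow(df))

fit <- lm(y ~ x, data = df)
X <- model.matrix(fit)
u <- resid(fit)
bread <- solve(crossprod(X))

# Meat for grouping g: sum over groups of (X_g' u_g)(X_g' u_g)'
meat <- function(g) crossprod(rowsum(X * u, g))
sandwich_vcov <- function(m) bread %*% m %*% bread

g_firm <- df$firmid
g_year <- df$year
g_both <- interaction(df$firmid, df$year, drop = TRUE)

# Route 1: sum the meat matrices, then form the sandwich once
V_sum_meats <- sandwich_vcov(meat(g_firm) + meat(g_year) - meat(g_both))

# Route 2: form one vcov matrix per grouping, then sum those
V_sum_vcovs <- sandwich_vcov(meat(g_firm)) +
  sandwich_vcov(meat(g_year)) -
  sandwich_vcov(meat(g_both))

# Equal up to floating-point error
all.equal(V_sum_meats, V_sum_vcovs)
```

The two routes agree because the sandwich is linear in the meat; any discrepancy comes only from the order of floating-point operations.
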
Instead of passing in a vector, matrix, data.frame, etc., to specify the cluster variables,
you can use a formula to specify which variables from the
original data frame to use as cluster variables, e.g., ~ firmid + year.
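
As a rough illustration of how a one-sided formula can name cluster variables, the base R sketch below pulls the named columns out of a data frame. The toy data are an assumption for this example, and this shows only the general mechanism, not necessarily how the function resolves the formula internally.

```r
# Toy data frame (illustrative only)
df <- data.frame(
  firmid = c(1, 1, 2, 2),
  year   = c(2001, 2002, 2001, 2002),
  x      = c(0.3, -1.2, 0.8, 0.1),
  y      = c(1.1, 0.4, 2.0, 1.5)
)

cluster_formula <- ~ firmid + year

# Names of the variables the formula refers to
cluster_vars <- all.vars(cluster_formula)

# The corresponding columns, extracted from the original data frame
cluster_df <- model.frame(cluster_formula, data = df)
```

The formula form keeps the cluster specification tied to column names rather than to separately constructed vectors.
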
Ma (2014) suggests using the White (1980)
variance-covariance matrix as the final, subtracted matrix when the union
of the clustering dimensions U results in a single observation per group in U;
e.g., if clustering by firm and year and there is only one observation
per firm-year, we subtract the White (1980) HC0 variance-covariance matrix
from the sum of the firm and year vcov matrices. This is detected
automatically (if use_white = NULL), but you can force this one way
or the other by setting use_white = TRUE or FALSE.
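
One way to picture the automatic check is to test whether any combination of the cluster variables occurs more than once. The helper single_obs_per_group below is a hypothetical base R sketch of that idea, not the package's own detection code.

```r
# Hypothetical helper: TRUE when every combination of the cluster
# variables identifies exactly one observation
single_obs_per_group <- function(data, vars) {
  !any(duplicated(data[, vars, drop = FALSE]))
}

# Panel with exactly one observation per firm-year
df_unique <- expand.grid(firmid = 1:3, year = 2001:2003)
unique_case <- single_obs_per_group(df_unique, c("firmid", "year"))

# Same panel with one firm-year repeated
df_repeat <- rbind(df_unique, df_unique[1, ])
repeat_case <- single_obs_per_group(df_repeat, c("firmid", "year"))
```

In the first case the firm-year "clusters" are single observations, which is the situation in which Ma (2014) suggests subtracting the White (1980) matrix.
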
Unlike the cluster.vcov function, this function does not depend upon the
estfun function from the sandwich package, although it does make use of the vcovHC
function for computing White (1980) variance-covariance matrices.
Parallelization (if used) is handled by the boot package. Be sure to set
options(boot.ncpus = N), where N is the number of CPU cores you want
the boot function to use.
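
A minimal sketch of setting this option, assuming you want to leave one core free; the variable name n_cores and that policy are choices made for this example.

```r
# Detect available cores; detectCores() can return NA on some platforms
n_detected <- parallel::detectCores()
if (is.na(n_detected)) n_detected <- 1L

# Leave one core free, but always use at least one
n_cores <- max(1L, n_detected - 1L)

# Tell the boot package how many cores to use
options(boot.ncpus = n_cores)

# Confirm the setting
getOption("boot.ncpus")
```

The option only takes effect when parallel execution is actually requested in the bootstrap call; see the function's arguments for how to enable it.
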