estimate.m: Estimating a Subsample Size m

Description

Estimates the subsample size m using the selected method. Additional parameters can be passed to the underlying methods using params. Parameters can also be passed to the statistic using '...'.

Usage

estimate.m(
  data,
  statistic,
  tau = NULL,
  R = 1000,
  replace = FALSE,
  min.m = 3,
  method = "bickel",
  params = NULL,
  ...
)

Value

Subsampling size m chosen by the selected method.

Arguments

data: The data to be bootstrapped.
statistic: The estimator of the parameter.
tau: The convergence rate.
R: The number of bootstrap replicates. Must be a positive integer.
replace: Whether the sampling should be done with replacement.
min.m: Minimum subsample size to be tried. Should be the minimum size for which the statistic makes sense.
method: The method to be used, one of c("goetze","bickel","politis", "sherman").
params: Additional parameters to be passed to the internal functions, see details for more information.
...: Additional parameters to be passed to the statistic.
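
The calling convention for statistic can be sketched as follows. The sketch assumes, as in the example on this page, that the statistic receives the full data and a vector of indices, and that anything passed through '...' arrives as additional named arguments. The function estimate.quantile and its prob argument are hypothetical and serve only as an illustration.

```r
# Hypothetical statistic: the sample quantile at level `prob`.
# Assumed signature: statistic(data, indices, ...).
estimate.quantile <- function(data, indices, prob = 0.5) {
  unname(quantile(data[indices], probs = prob))
}

set.seed(1)
data <- runif(1000)

# One subsample of size m, drawn without replacement (replace = FALSE).
m <- 50
indices <- sample(length(data), size = m, replace = FALSE)

# Extra arguments such as `prob` would be forwarded through '...'.
q90 <- estimate.quantile(data, indices, prob = 0.9)
print(q90)
```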

Details

The methods take different parameters. This wrapper therefore has a params argument, which can be used to pass method-specific arguments to the underlying methods. The specific parameters are described below. Most of the provided methods need tau. If it is not provided, it will be estimated using estimate.tau. Note that the method 'sherman' uses an alternative approach that does not rely on the scaling factor, so tau will not be computed when 'sherman' is selected. Any non-NULL value of tau is ignored when the method 'sherman' is selected.
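
To illustrate what tau expresses, the following sketch checks by simulation that the maximum of n uniform draws converges at rate n: the scaled error tau(n) * (1 - max) has expectation n/(n+1), which stays close to 1 for every n. It assumes that tau is a function of the sample size, as in the example on this page. The code is plain base R and does not call estimate.m or estimate.tau; the helper scaled.error is hypothetical.

```r
set.seed(42)
tau <- function(n) n  # convergence rate of the sample maximum of U(0, 1)

# Scaled estimation error tau(n) * (theta - theta.hat) with theta = 1.
scaled.error <- function(n, reps = 2000) {
  replicate(reps, tau(n) * (1 - max(runif(n))))
}

# With the correct rate, the mean scaled error is n / (n + 1) for every n.
sizes <- c(50, 200, 800)
mean.errors <- sapply(sizes, function(n) mean(scaled.error(n)))
print(round(mean.errors, 2))
```

With a wrong rate, for example sqrt(n), the scaled errors would shrink towards zero as n grows instead of stabilising.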

Possible methods are:

bickel:: This method works similarly to 'goetze'. The difference is that the subsample sizes to be compared are consecutive subsample sizes generated by q^j*n for j = seq(2,n) and a chosen q value between zero and one. The parameter q can be selected using params. The default value is q=0.75, as suggested in the corresponding paper.
politis:: This method is also known as the 'minimum volatility method'. It is based on the idea that there should be some range of subsampling sizes within which the choice of m has little effect on the estimated confidence points. The algorithm starts by smoothing the endpoints of the intervals and then calculates their standard deviation. The h.ci parameter selects the number of neighbors used for smoothing. The h.sigma parameter is the number of neighbors used in the standard deviation calculation. Both parameters can be set using params. Note that h.* neighbors from each side are used. To use five elements for smoothing, h.ci should therefore be set to 2.
sherman:: This method is based on a 'double-bootstrap' approach. It estimates the coverage error for different subsampling sizes and chooses the subsampling size with the lowest one. As estimating the coverage error is computationally intensive, it is not practical to try all m values. Therefore, the beta parameter can be used to control which m values are tried. The values are calculated as ms = n^beta. The default is a sequence of 15 values between 0.3 and 0.9. This parameter can be set using params.
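
The candidate grids described above can be sketched in base R. This only illustrates the formulas q^j*n and ms = n^beta and the neighbor count for h.ci; it is not the package's internal code. The rounding, the removal of duplicates, the cut-off at min.m, the upper bound of j and the variable names are assumptions.

```r
n <- 1000
min.m <- 3

# bickel: consecutive candidate sizes q^j * n with the default q = 0.75.
# j starts at 2 as in the text; the upper bound of 30 is arbitrary here,
# because sizes below min.m are dropped anyway.
q <- 0.75
j <- 2:30
ms.bickel <- unique(ceiling(q^j * n))
ms.bickel <- ms.bickel[ms.bickel >= min.m]

# sherman: sizes n^beta for 15 beta values between 0.3 and 0.9.
beta <- seq(0.3, 0.9, length.out = 15)
ms.sherman <- unique(round(n^beta))

# politis: h.ci = 2 uses two neighbors on each side, five values in total.
h.ci <- 2
window.size <- 2 * h.ci + 1

print(ms.bickel)
print(ms.sherman)
print(window.size)
```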

References

Götze F. and Rackauskas A. (2001) Adaptive choice of bootstrap sample sizes. Lecture Notes-Monograph Series, 36(State of the Art in Probability and Statistics):286-309.

Bickel P.J. and Sakov A. (2008) On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statistica Sinica, 18(3):967-985.

Politis D.N. et al. (1999) Subsampling, Springer, New York.

Sherman M. and Carlstein E. (2004) Confidence intervals based on estimators with unknown rates of convergence. Computational Statistics & Data Analysis, 46(1):123-136.

Examples


data <- runif(1000)
estimate.max <- function(data, indices) {return(max(data[indices]))}
tau <- \(x){x} # convergence rate
choosen.m <- estimate.m(data, estimate.max, tau, R = 1000, method = "bickel")
print(choosen.m)
