bootstrap: One- and two-sample bootstrap sampling and permutation tests.

Description

Basic resampling. Supply the data and statistic to resample.

Usage

bootstrap(data, statistic, R = 10000,
          args.stat = NULL, seed = NULL, sampler = samp.bootstrap,
          label = NULL, statisticNames = NULL, block.size = 100,
          trace = FALSE)
bootstrap2(data, statistic, treatment, data2 = NULL, R = 10000,
          ratio = FALSE,
          args.stat = NULL, seed = NULL, sampler = samp.bootstrap,
          label = NULL, statisticNames = NULL, block.size = 100,
          trace = FALSE)
permutationTest(data, statistic, R = 9999,
          alternative = "two.sided", resampleColumns = NULL,
          args.stat = NULL, seed = NULL, sampler = samp.permute,
          label = NULL, statisticNames = NULL, block.size = 100,
          trace = FALSE, tolerance = .Machine$double.eps ^ 0.5)
permutationTest2(data, statistic, treatment, data2 = NULL, R = 9999,
          alternative = "two.sided", ratio = FALSE, paired = FALSE,
          args.stat = NULL, seed = NULL, sampler = samp.permute,
          label = NULL, statisticNames = NULL, block.size = 100,
          trace = FALSE, tolerance = .Machine$double.eps ^ 0.5)

Arguments

data

vector, matrix, or data frame.

statistic

a function, or an expression (e.g. mean(myData, trim = .2)).

R

number of replicates (bootstrap samples or permutation resamples).

treatment

a vector with two unique values. For two-sample applications, supply either treatment or data2.

data2

an object like data; the second sample.

alternative

one of "two.sided", "greater", or "less". If statistic returns a vector, this may be a vector of the same length.

ratio

logical, if FALSE then statistics for two samples are combined using statistic1 - statistic2 (the statistics from the two samples). If TRUE, it uses statistic1 / statistic2.
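
As a rough illustration of the two ways of combining the per-sample statistics, here is a base-R sketch. It does not use the package's own code; the data vectors x1 and x2 and the choice to resample each sample independently with replacement are assumptions made for the example.

```r
# Base-R sketch (not the package's implementation): combine statistics
# from two samples as a difference or as a ratio.
set.seed(1)
x1 <- c(5.1, 4.8, 6.0, 5.5, 5.2)   # hypothetical first sample
x2 <- c(4.0, 4.4, 3.9, 4.6)        # hypothetical second sample
R <- 1000

# Resample each sample separately, with replacement, and compute the mean.
stat1 <- replicate(R, mean(sample(x1, replace = TRUE)))
stat2 <- replicate(R, mean(sample(x2, replace = TRUE)))

diff_reps  <- stat1 - stat2   # analogous to ratio = FALSE
ratio_reps <- stat1 / stat2   # analogous to ratio = TRUE

observed_diff  <- mean(x1) - mean(x2)
observed_ratio <- mean(x1) / mean(x2)
```

The ratio form is only meaningful when statistic2 stays away from zero, as it does for these positive-valued samples.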

resampleColumns

integer, or character (a subset of the column names of data); if supplied then only these columns of the data are permuted. For example, for a permutation test of the correlation of x and y, only one of the variables should be permuted.
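
The idea of permuting only one column can be sketched in base R as follows. This is an illustration under assumed, simulated data (the data frame d and its columns x and y are hypothetical), not a description of how permutationTest is implemented.

```r
# Base-R sketch: permutation distribution of a correlation, permuting
# only y while leaving x in its original order.
set.seed(42)
n <- 30
d <- data.frame(x = rnorm(n))
d$y <- 0.5 * d$x + rnorm(n)

observed <- cor(d$x, d$y)
R <- 999

# Permuting both columns with the same permutation would leave the
# correlation unchanged, so only y is shuffled here.
perm_cor <- replicate(R, cor(d$x, sample(d$y)))
```

Shuffling y alone breaks the pairing between x and y, which is what the null hypothesis of no association requires.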

args.stat

a list of additional arguments to pass to statistic, if it is a function.

paired

logical, if TRUE then observations in data and data2 are paired, and permutations are done within each pair. Not yet implemented.

seed

old value of .Random.seed, or argument to set.seed.

sampler

a function for resampling, see help(samp.bootstrap).

label

used for labeling plots (in a future version).

statisticNames

a character vector the same length as the vector returned by statistic.

block.size

integer. The R replicates are done this many at a time.

trace

logical, if TRUE an indication of progress is printed.

tolerance

when computing P-values, differences smaller than tolerance (absolute or relative) between the observed value and the replicates are considered equal.
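
A small base-R sketch shows why a tolerance matters when comparing replicates with the observed value. The (count + 1) / (R + 1) form of the P-value is a common convention and is an assumption here, as is the use of an absolute tolerance only; the package's exact computation may differ.

```r
# Base-R sketch: floating-point round-off can make a replicate that is
# "equal" to the observed value compare as strictly greater.
observed   <- 0.3
replicates <- c(0.1 + 0.2, 0.25, 0.4, 0.5)  # 0.1 + 0.2 is not exactly 0.3
tol <- .Machine$double.eps ^ 0.5

# Exact comparison misses the first replicate.
exact_count <- sum(replicates <= observed)

# Tolerant comparison treats it as equal to the observed value.
tol_count <- sum(replicates <= observed + tol)

# One-sided ("less") P-value under the assumed convention.
p_less <- (tol_count + 1) / (length(replicates) + 1)
```

Without the tolerance, ties caused by round-off would be dropped from the count and the P-value would be too small.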

Value

a list with class "bootstrap", "bootstrap2", "permutationTest", or "permutationTest2", that inherits from "resample", with components:

observed

the value of the statistic for the original data.

replicates

a matrix with R rows and p columns.

n

number of observations in the original data, or vector of length 2 in two-sample problems.

p

length(observed).

R

number of replications.

seed

the value of the seed at the start of sampling.

call

the matched call.

statistics

a data frame with p rows, with columns "observed", "mean" (the mean of the replicates), and other columns appropriate to resampling; e.g. the bootstrap objects have columns "SE" and "Bias", while the permutation test objects have "Alternative" and "PValue".
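
To make the bootstrap columns concrete, the following base-R sketch builds a one-row summary from a vector of replicates. It assumes the usual definitions (SE as the standard deviation of the replicates, Bias as the mean of the replicates minus the observed value); the data vector x is hypothetical, and the package's own output may contain further columns.

```r
# Base-R sketch: summary statistics in the spirit of the "statistics"
# component, for a single statistic (p = 1).
set.seed(7)
x <- c(2.3, 4.1, 3.8, 5.0, 2.9, 4.4)
R <- 2000

replicates <- replicate(R, mean(sample(x, replace = TRUE)))
observed   <- mean(x)

stats <- data.frame(
  observed = observed,
  mean     = mean(replicates),
  SE       = sd(replicates),               # assumed definition
  Bias     = mean(replicates) - observed,  # assumed definition
  row.names = "mean"
)
```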

The two-sample versions have an additional component:

resultsBoth

a list with two components, the results from resampling each of the two samples. These are bootstrap objects; in the permutationTest2 case they are the result of sampling without replacement.

There are functions for printing and plotting these objects, in particular print, hist, qqnorm, plot (currently the same as hist), quantile.

Details

There is considerable flexibility in how you specify the data and statistic.

For the statistic, you may supply a function, or an expression. For example, if data = x, you may specify any of

statistic = mean
statistic = mean(x)
statistic = mean(data)

If data is a data frame, the expression may refer to columns in the data frame, e.g.

statistic = mean(x)
statistic = mean(myData$x)
statistic = mean(myData[, "x"])

If data is not just the name of an object, e.g. data = subset(myData, age > 17), or if data2 is supplied, then use the name 'data', e.g.

statistic = colMeans(data)
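
One way to picture how an expression can refer to columns of a data frame is to evaluate it with the data frame as the environment. The base-R sketch below is only an analogy, not the package's implementation; myData, stat_expr and idx are names chosen for the example.

```r
# Base-R sketch: evaluate an unevaluated expression against a data frame,
# first on the original rows and then on one bootstrap resample of rows.
myData <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
stat_expr <- quote(mean(x))   # refers to column x by name

observed <- eval(stat_expr, envir = myData)

set.seed(3)
idx <- sample(nrow(myData), replace = TRUE)   # resampled row indices
one_rep <- eval(stat_expr, envir = myData[idx, , drop = FALSE])
```

Because the expression is re-evaluated against each resampled data frame, the column name x picks up the resampled values rather than the original ones.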

Examples


# NOT RUN {
# See full set of examples in resample-package, including different
# ways to call the functions depending on the structure of the data.
data(Verizon)
CLEC <- with(Verizon, Time[Group == "CLEC"])
bootC <- bootstrap(CLEC, mean)
bootC
hist(bootC)
qqnorm(bootC)
# }
