
kernelPSI (version 1.1.1)

sampleH: samples within the acceptance region defined by the kernel selection event

Description

To approximate the distribution of the test statistic, we iteratively sample replicates of the response, each of which yields a replicate of the test statistic. The response replicates are sampled within the acceptance region of the selection event, so that the resulting distribution of the test statistic is a valid post-selection distribution. To perform this constrained sampling, we use a hit-and-run sampler based on the hypersphere directions algorithm (see References).
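A single hit-and-run move with a hypersphere direction can be sketched as below. This is a minimal illustration, not the package's code. It assumes that the outcome follows N(mu, sigma^2 I) before selection and that the acceptance region is {y : t(y) %*% A[[k]] %*% y >= 0 for every k}; hit_and_run_step and in_region are hypothetical helpers. The direction theta is drawn uniformly on the unit hypersphere, and the step length lambda is drawn from the Gaussian restricted to the line y + lambda * theta, rejecting up to n_iter draws until the new point satisfies every constraint.

```r
# Hypothetical helper (not part of kernelPSI): TRUE if y satisfies every
# constraint t(y) %*% Ak %*% y >= 0, the assumed form of the selection event.
in_region <- function(y, A) {
  all(vapply(A, function(Ak) drop(crossprod(y, Ak %*% y)) >= 0, logical(1)))
}

# Hypothetical helper: one hit-and-run move with a hypersphere direction,
# targeting N(mu, sigma^2 I) restricted to the region.
hit_and_run_step <- function(y, A, mu = 0, sigma = 1, n_iter = 1e5) {
  theta <- rnorm(length(y))
  theta <- theta / sqrt(sum(theta^2))   # uniform direction on the unit hypersphere
  # On the line y + lambda * theta, the Gaussian density of lambda is
  # N(-sum(theta * (y - mu)), sigma^2).
  center <- -sum(theta * (y - mu))
  for (i in seq_len(n_iter)) {          # rejection sampling of lambda
    lambda <- rnorm(1, mean = center, sd = sigma)
    candidate <- y + lambda * theta
    if (in_region(candidate, A)) return(candidate)
  }
  y  # all n_iter draws rejected: stay at the current point
}

# Toy usage: the region |y1| >= |y2|, started from a point inside it
set.seed(1)
A <- list(diag(c(1, -1)))
y <- c(2, 0.5)
for (i in 1:200) y <- hit_and_run_step(y, A)
in_region(y, A)   # TRUE
```

Since each move returns either an accepted candidate or the current point, a chain started inside the region never leaves it; staying put after n_iter rejections is one possible design choice for the fallback.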

Usage

sampleH(
  A,
  initial,
  n_replicates,
  mu = 0,
  sigma = 1,
  n_iter = 1e+05,
  burn_in = 1000
)

Arguments

A

list of matrices modeling the quadratic constraints of the selection event

initial

initial sample (starting point of the sampler). This sample must belong to the acceptance region given by A. In practice, this parameter is set to the observed outcome of the original dataset.

n_replicates

total number of replicates to be generated

mu

mean of the outcome

sigma

standard deviation of the outcome

n_iter

maximum number of rejected draws of the step length \(\lambda\) along the sampled direction in a single iteration

burn_in

number of burn-in iterations
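Because initial must lie in the acceptance region, it can be worth checking it before calling sampleH. The sketch below assumes, as a convention to verify against the package source, that each matrix Ak in A encodes the constraint t(y) %*% Ak %*% y >= 0; constraint_values and is_feasible are hypothetical helpers, not part of kernelPSI.

```r
# Hypothetical helper (not part of kernelPSI): evaluates the quadratic form
# t(y) %*% Ak %*% y for every constraint matrix Ak in the list A.
constraint_values <- function(y, A) {
  vapply(A, function(Ak) drop(crossprod(y, Ak %*% y)), numeric(1))
}

# Assumed convention: y is in the acceptance region iff every value is >= 0.
is_feasible <- function(y, A, tol = 0) {
  all(constraint_values(y, A) >= -tol)
}

# Toy example: two constraints on a 2-dimensional response
A <- list(diag(c(1, -1)),  # y1^2 - y2^2 >= 0
          diag(c(1, 0)))   # y1^2 >= 0 (always satisfied)
y0 <- c(2, 1)
constraint_values(y0, A)   # 3 4
is_feasible(y0, A)         # TRUE
```

Inspecting the individual values, rather than only the overall TRUE/FALSE, shows which constraint a candidate starting point violates.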

Value

a matrix with n_replicates columns where each column contains a sample within the acceptance region

Details

Because the sampler is iterative, successive samples are correlated, so large values of n_replicates and burn_in are needed to approximate the distribution of the test statistic accurately.
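The roles of burn_in and n_replicates in a generic Markov chain sampler can be sketched as follows. This is an illustration, not the kernelPSI implementation: run_chain is a hypothetical name, and step stands for any transition that keeps the chain in the acceptance region (such as one hit-and-run move). The returned matrix has one replicate per column, matching the shape described under Value.

```r
# Hypothetical driver (not part of kernelPSI): discards burn_in states, then
# stores the next n_replicates states as the columns of a matrix.
run_chain <- function(step, initial, n_replicates, burn_in) {
  y <- initial
  for (i in seq_len(burn_in)) y <- step(y)     # burn-in draws are discarded
  out <- matrix(NA_real_, nrow = length(initial), ncol = n_replicates)
  for (j in seq_len(n_replicates)) {
    y <- step(y)
    out[, j] <- y                              # one replicate per column
  }
  out
}

# Toy usage with a stand-in transition (an AR(1) update, not a hit-and-run move)
set.seed(2)
toy_step <- function(y) 0.5 * y + rnorm(length(y))
out <- run_chain(toy_step, initial = c(10, 10, 10), n_replicates = 50, burn_in = 100)
dim(out)   # 3 50
```

Here the chain starts far from the bulk of its stationary distribution; the burn-in discards the early states that still reflect that starting point, while the stored consecutive columns remain correlated.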

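For high-dimensional responses, and depending on the initialization, the sampler may not scale to tens of thousands of replicates, because the intermediate rejection sampling of \(\lambda\) within each iteration can require many draws (up to n_iter) before a point in the acceptance region is found.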

References

Berbee, H. C. P., Boender, C. G. E., Rinnooy Kan, A. H. G., Scheffer, C. L., Smith, R. L., & Telgen, J. (1987). Hit-and-run algorithms for the identification of nonredundant linear inequalities. Mathematical Programming, 37(2), 184–207.

Belisle, C. J. P., Romeijn, H. E., & Smith, R. L. (1993). Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2), 255–266.

Examples

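# NOT RUN {
library(kernelPSI)
set.seed(42)
n <- 30  # number of samples
p <- 20  # number of features per kernel
# five random linear kernels, each an n x n Gram matrix
K <- replicate(5, matrix(rnorm(n * p), nrow = n, ncol = p), simplify = FALSE)
K <- lapply(K, function(X) X %*% t(X) / ncol(X))
# response and its linear kernel
Y <- rnorm(n)
L <- Y %*% t(Y)
# forward selection of 2 kernels by HSIC, then the quadratic constraints of that selection
selection <- FOHSIC(K, L, 2)
constraintQ <- forwardQ(K, select = selection)
# replicates of Y within the acceptance region, started from the observed Y
samples <- sampleH(A = constraintQ, initial = Y,
                   n_replicates = 50, burn_in = 20)
# }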
