mht (version 3.1.2)

mht: Multiple testing procedure for non-ordered variable selection

Description

Performs multiple hypotheses testing in a linear model.

Usage

mht(data,Y,var_nonselect,alpha,sigma,maxordre,ordre,m,show,IT,maxq)

Arguments

data
Input matrix of dimension n * p; each of the n rows is an observation vector of p variables. The intercept should be included in the first column as (1,...,1). If not, it is added.
Y
Response variable of length n.
var_nonselect
Number of variables that do not undergo feature selection. They must be in the first columns of data. Default is 1, meaning that selection is not performed on the intercept.
alpha
A user-supplied sequence of type I error levels. Default is (0.1,0.05).
sigma
Value of the variance if it is known; 0 otherwise. Default is 0.
maxordre
Number of variables to be ordered. Default is min(n/2-1,p/2-1).
ordre
Algorithm used to order the variables, one of ordre=c("bolasso","pval","pval_hd","FR"). "bolasso" uses the dyadic algorithm with the Bolasso technique (see dyadiqueordre), "pval" uses the p-values obtained with a regression on the full set of variables (only when p
m
Number of bootstrap iterations of the Lasso. Only used if the algorithm is set to "bolasso". Default is m=100.
show
Vector of logical values, show=(showordre,showtest,showresult). Default is (1,0,1). If showordre==TRUE, show the ordered variables at each step of the algorithm. If showtest==TRUE, show the number of regularization parameters tested, to track the progress of the dyadic algorithm; only used if the algorithm is set to "bolasso". If showresult==TRUE, show the value of the statistics and the estimated quantile at each step of the procedure.
IT
Number of simulations for the calculation of the quantile. Default is 1000.
maxq
Maximum number of multiple hypotheses tests to perform. Default is log(min(n,p)-1,2).
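
The documented defaults for maxordre and maxq depend only on n and p, so they can be checked by hand before calling mht. The following is a minimal sketch in base R, not code from the package: the dimensions n and p and the names maxordre_default and maxq_default are hypothetical, and whether mht rounds these values or counts the intercept column in p is not stated on this page.

```r
# Hypothetical dimensions, chosen only for illustration
n <- 100
p <- 20

# Design matrix with the intercept (1,...,1) in the first column,
# as described for the data argument
x <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))

# Documented defaults, evaluated by hand (any rounding done by mht is unknown)
maxordre_default <- min(n / 2 - 1, p / 2 - 1)  # min(49, 9) = 9
maxq_default <- log(min(n, p) - 1, 2)          # log base 2 of 19, about 4.25

maxordre_default
maxq_default
```

Evaluating the defaults this way makes it easier to judge whether maxordre or maxq should be lowered explicitly for a large data set.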

Value

An object of class mht, for which refit, predict and plot methods are available.
data
A list containing:
  • Y - the input response vector
  • means.X - Vector of means of the input data matrix.
  • sigma.X - Vector of variances of the input data matrix.
coefficients
Matrix of the estimated coefficients. Each row concerns a specific user level alpha.
residuals
Matrix of the residuals. Each row concerns a specific user level alpha.
relevant_var
Set of the relevant variables. Each row concerns a specific user level alpha.
fitted.values
Matrix of the fitted values. Each column concerns a specific user level alpha.
ordre
Order obtained on the maxordre variables.
ordrebeta
The full order on all the variables.
kchap
Vector containing the size of the estimated set of relevant variables, for each value of alpha.
quantile
The estimated quantiles used in the second step of the procedure.
call
The call that produced this object.

Details

mht is a two-step procedure that performs variable selection in a high-dimensional linear model. The first step orders the variables, taking into account the vector of observations Y. The second step finds a cut-off between the relevant variables (high rank) and the irrelevant ones (low rank) through multiple hypotheses testing. The choice of maxordre matters: the more variables there are to order, the harder it is for the algorithm to decide whether one noisy variable is more important than another. It is advised to limit maxordre to p/2 or n/2 if they are large. The parameter maxq can be useful for large values of n; it is advised to limit it to 5-6 in order to reduce the computation time spent on the quantile.
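
To make the first step concrete, here is a minimal base R sketch of a p-value ordering in the spirit of ordre="pval": regress Y on the full set of variables and rank the variables by p-value. This is an illustration under stated assumptions, not the package's implementation; the simulated data and the names fit, pvals and ordering are hypothetical, and the second step (the multiple testing cut-off) is not reproduced.

```r
set.seed(1)

# Simulated data: 100 observations, 20 variables, only the first 5 relevant
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))
y <- as.vector(x %*% beta + rnorm(n))

# Regression on the full set of variables (lm adds the intercept itself)
fit <- lm(y ~ x)

# Two-sided p-values of the t-tests, dropping the intercept row
pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

# Variables ranked from most to least significant
ordering <- order(pvals)
head(ordering, 5)
```

With a signal this strong, the five relevant variables occupy the first five ranks; the ranking among the remaining noise variables is essentially arbitrary, which is why limiting maxordre is recommended.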

References

Multiple hypotheses testing for variable selection; F. Rohart 2011

See Also

predict.mht, refit.mht, plot.mht

Examples

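## Not run: 
library(mht)

# Simulate 100 observations of 20 variables; only the first 5 are relevant
x <- matrix(rnorm(100 * 20), 100, 20)
beta <- c(rep(2, 5), rep(0, 15))
y <- x %*% beta + rnorm(100)

# Order 15 variables, then test at levels 0.1 and 0.05
mod <- mht(x, y, alpha = c(0.1, 0.05), maxordre = 15)
mod
## End(Not run)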