Gathers several procedures to determine which explanatory variables have an effect
on a dependent variable.
Works whether there are more explanatory variables than observations or not.
Creates an object of class edrSelec.
edrSelec(Y, X, H, K, method, pZero=NULL, NZero=NULL, zeta=NULL,
rho=NULL, baseEst=NULL, btspSamp=NULL, lassoParam=NULL)
A numeric vector representing the dependent variable (a response vector).
A matrix of the quantitative explanatory variables, with one variable per column.
When method="SR-SIR" or method="RSIR", the chosen number of slices. When method="CSS", a vector with various numbers of slices.
The chosen dimension K.
A character string specifying the selection method. It must be one of "CSS", "RSIR" or "SR-SIR".
When method="CSS", the number of variables to pick when creating a submodel.
When method="CSS", the number of submodels to create.
When method="CSS", the proportion of 'best' submodels selected from the NZero submodels.
When method="CSS" and zeta is not provided, the threshold above which a submodel is considered 'best'. It must be a real number strictly between 0 and 1.
An initial estimate of the EDR space on which each method relies.
When method="RSIR", the bootstrap sample size for estimating the asymptotic distribution of the estimated EDR directions.
When method="SR-SIR", a vector of lasso parameters from which the optimal one is chosen, using the RIC criterion.
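The two ways of picking the 'best' submodels for the "CSS" method (a proportion zeta, or a threshold rho) can be illustrated with a minimal base R sketch. The squared correlations below are made-up values, and both the rounding of zeta times the number of submodels and the strict comparison with rho are assumptions of this sketch, not documented behaviour of edrSelec.

```r
# Hypothetical squared correlations for 8 submodels (illustrative values only).
sq_cor <- c(0.91, 0.35, 0.78, 0.12, 0.66, 0.95, 0.40, 0.83)

# zeta: keep a fixed proportion of the submodels, ranked by squared correlation.
# Rounding up with ceiling() is an assumption of this sketch.
zeta <- 0.25
n_best <- ceiling(zeta * length(sq_cor))
best_by_zeta <- order(sq_cor, decreasing = TRUE)[seq_len(n_best)]

# rho: keep every submodel whose squared correlation exceeds the threshold.
rho <- 0.75
best_by_rho <- which(sq_cor > rho)

print(best_by_zeta)
print(best_by_rho)
```

With zeta the number of retained submodels is fixed in advance, whereas with rho it depends on how many submodels reach the threshold.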
edrSelec returns an object of class edrSelec, with some of the following attributes, depending on the value of method:
A numeric vector filled with a score for each explanatory variable. Variables that have a high score should be kept. For the "CSS" method, the score is the presence of the variable in the 'best' submodels. For "RSIR", it is one minus the p-value of the test. For the "SR-SIR" procedure, it is a boolean that indicates if the variable should be kept when using the optimal lasso parameter.
The chosen dimension.
The chosen number(s) of slices.
The sample size.
The variable selection method used.
The matrix of the quantitative explanatory variables, with one variable per column.
The numeric vector of the dependent variable (a response vector).
An NZero x pZero matrix that contains the variables of each created submodel, for the "CSS" method.
A matrix with pZero columns made of the variables of each 'best' submodel, for the "CSS" method.
A vector containing the squared correlation between indices for each submodel, for the "CSS" method.
A vector made of values of the Akaike information criterion for every lasso parameter considered by the "SR-SIR" procedure.
A vector made of values of the Bayesian information criterion for every lasso parameter considered by the "SR-SIR" procedure.
A vector made of values of the residual information criterion for every lasso parameter considered by the "SR-SIR" procedure.
A list which gives, for each lasso parameter studied with the "SR-SIR" procedure, a matrix spanning the estimated EDR space.
The "CSS" method builds NZero submodels using only pZero explanatory variables.
It estimates the indices for each of them.
The squared correlation between these indices and those found with the whole set of explanatory variables is computed.
Only the submodels with the highest squared correlation are kept.
The method then counts how many times each explanatory variable appears in these 'best' submodels.
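The steps of the "CSS" method described above can be sketched in base R on a toy problem. This is only an illustration under stated assumptions: sir_index is a hypothetical helper implementing a basic SIR estimate, the data have more observations than variables so that plain SIR can serve as the initial estimate, and a single number of slices is used. It is not the code run by edrSelec.

```r
set.seed(1)

# Toy data (hypothetical): 4 active variables out of 8, one index (K = 1).
n <- 200; p <- 8
X <- matrix(rnorm(n * p), ncol = p)
beta <- c(1, 1, 1, 1, rep(0, p - 4))
Y <- as.vector((X %*% beta)^3 + rnorm(n))

# Hypothetical helper: basic SIR with H slices, returns the first index.
sir_index <- function(X, Y, H = 5) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  slices <- cut(rank(Y, ties.method = "first"), breaks = H, labels = FALSE)
  means <- apply(Xc, 2, function(col) tapply(col, slices, mean))  # H x p
  props <- as.vector(table(slices)) / nrow(X)
  M <- t(means) %*% (means * props)        # covariance of the slice means
  Sigma <- crossprod(Xc) / nrow(X)
  Rinv <- solve(chol(Sigma))               # whitening transform
  A <- t(Rinv) %*% M %*% Rinv
  b <- Rinv %*% eigen(A, symmetric = TRUE)$vectors[, 1]  # leading direction
  as.vector(Xc %*% b)
}

H <- 5; pZero <- 3; NZero <- 200; zeta <- 0.1

# Index estimated with the whole set of explanatory variables.
base_index <- sir_index(X, Y, H)

# Draw NZero submodels of pZero variables each (one submodel per row).
submodels <- t(replicate(NZero, sort(sample(p, pZero))))

# Squared correlation between each submodel's index and the base index.
sq_cor <- apply(submodels, 1, function(vars) {
  cor(sir_index(X[, vars, drop = FALSE], Y, H), base_index)^2
})

# Keep the proportion zeta of submodels with the highest squared correlation.
n_best <- ceiling(zeta * NZero)
best <- submodels[order(sq_cor, decreasing = TRUE)[seq_len(n_best)], , drop = FALSE]

# Count how often each variable appears among the 'best' submodels.
counts <- tabulate(as.vector(best), nbins = p)
names(counts) <- paste0("X", seq_len(p))
print(counts)
```

Variables that truly drive the response should appear more often in the retained submodels, which is what the counts are meant to reveal.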
The "RSIR" procedure uses an asymptotic test on each element of the estimated EDR directions.
It was translated from Matlab code written by Peng Zeng.
The "SR-SIR" procedure relies on a lasso penalty. The underlying parameter is chosen using the residual information criterion (RIC).
It was written using R code by Lexin Li.
Coudret, R., Liquet, B. and Saracco, J. Comparison of sliced inverse regression approaches for underdetermined cases. Journal de la Société Française de Statistique, in press.
Li, L. and Yin, X. (2008). Sliced inverse regression with regularizations. Biometrics, 64(1):124-131.
Zhong, W., Zeng, P., Ma, P., Liu, J. S., and Zhu, Y. (2005). RSIR: regularized sliced inverse regression for motif discovery. Bioinformatics, 21(22):4169-4175.
# NOT RUN {
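library(edrGraphicalTools)  # provides edrSelec
library(mvtnorm)            # provides rmvnorm

n <- 100       # sample size
p <- 110       # number of explanatory variables (p > n)
K <- 1         # dimension of the EDR space
H <- 5:12      # numbers of slices used by "CSS"
NZero <- 1000  # number of submodels
pZero <- 10    # number of variables per submodel
zeta <- 0.1    # proportion of 'best' submodels kept

# Only the first four variables have an effect on Y
beta <- c(1, 1, 1, 1, rep(0, p - 4))
U <- matrix(runif(p^2, -0.05, 0.05), ncol = p)
X <- rmvnorm(n, sigma = diag(p) + U %*% t(U))
eps <- rnorm(n, sd = 10)
Y <- (X %*% beta)^3 + eps

result <- edrSelec(Y, X, H, K, "CSS", NZero = NZero, pZero = pZero, zeta = zeta)
summary(result)
plot(result)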
# }