Estimation in the regression model \(Y = X \beta + \sigma N(0,1)\).
Variable selection by choosing the best predictor among predictors produced by different methods, such as the lasso, elastic net, adaptive lasso, pls and randomForest.
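As a rough illustration of the model above, the following base R sketch simulates a response from \(Y = X\beta + \sigma N(0,1)\). The dimensions, the independent Gaussian design, the value of sigma, the coefficient vector and the seed are all assumptions chosen for illustration; none of them is taken from the package.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
n <- 50; p <- 20                         # hypothetical sample size and number of covariates
beta <- c(rep(2.5, 5), rep(0, p - 5))    # hypothetical sparse coefficient vector
sigma <- 1                               # hypothetical noise level
X <- matrix(rnorm(n * p), nrow = n, ncol = p)   # independent N(0,1) covariates (assumption)
Y <- as.vector(X %*% beta + sigma * rnorm(n))   # response following the model
true_support <- which(beta != 0)         # indices of the nonzero coefficients
```

Here true_support plays the role of the support of \(\beta\) that a selection procedure tries to recover.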
VARselect(Y, X, dmax = NULL, normalize = TRUE, method = c("lasso",
"ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive"),
pen.crit = NULL, lasso.dmax = NULL, ridge.dmax = NULL, pls.dmax = NULL,
en.dmax = NULL, ALridge.dmax = NULL, ALpls.dmax = NULL, rF.dmax = NULL,
exhaustive.maxdim = 5e+05, exhaustive.dmax = NULL, en.lambda = c(0.01,
0.1, 0.5, 1, 2, 5), ridge.lambda = c(0.01, 0.1, 0.5,
1, 2, 5), rF.lmtry = 2, pls.ncomp = 5, ALridge.lambda = c(0.01,
0.1, 0.5, 1, 2, 5), ALpls.ncomp = 5, max.steps = NULL,
K = 1.1, verbose = TRUE, long.output = FALSE)
A list with at least length(method) components.
For each procedure in method, a list with components:
support : vector of integers. Estimated support of the parameters \(\beta\) for the considered procedure.
crit : scalar equal to the LINselect criterion calculated on the estimated support.
fitted : vector of length n. Fitted values of the response calculated when the support of \(\beta\) equals support.
coef : vector whose first component is the estimated intercept. The other components are the estimated nonzero coefficients when the support of \(\beta\) equals support.
If length(method) > 1, the additional component summary is a list with three components:
support : vector of integers. Estimated support of the parameters \(\beta\) corresponding to the minimum of the criterion among all procedures.
crit : scalar. Minimum value of the criterion among all procedures.
method : vector of characters. Names of the procedures for which the minimum is reached.
If pen.crit = NULL, the component pen.crit gives the values of the penalty calculated by the function penalty.
If long.output is TRUE, the component named chatty is a list with length(method) components.
For each procedure in method, a list with components:
support : list where support[[l]] is a vector of integers containing an estimator of the support of the parameters \(\beta\).
crit : vector where crit[l] contains the value of the LINselect criterion calculated on support[[l]].
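To make the structure of the returned list more concrete, here is a minimal base R sketch that builds a mock result by hand and derives a summary from it in the way the text describes (minimum of the criterion across procedures). The object res, the name summary_like and every number in it are hypothetical; they are not output of VARselect.

```r
# Hypothetical per-procedure results, shaped like the components described above
res <- list(
  lasso = list(support = c(1L, 2L, 5L), crit = 12.4),
  en    = list(support = c(1L, 2L),     crit = 11.9),
  rF    = list(support = c(2L, 5L, 7L), crit = 13.0)
)

# Collect the criterion of each procedure into a named numeric vector
crits <- vapply(res, function(m) m$crit, numeric(1))

# Every procedure attaining the minimum (there may be ties)
best <- names(crits)[crits == min(crits)]

# A list mimicking the documented summary component
summary_like <- list(
  support = res[[best[1]]]$support,
  crit    = min(crits),
  method  = best
)
```

With a real result, the same fields would be read directly from the summary component instead of being recomputed.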
Y : vector with n components : response variable.
X : matrix with n rows and p columns : covariates.
dmax : integer : maximum number of variables in the lasso estimator. dmax \(\le\) D, where D = min(3*p/4, n-5) if p \(\ge\) n and D = min(p, n-5) if p < n. Default : dmax = D.
normalize : logical : if TRUE the columns of X are scaled.
method : vector of characters whose components are a subset of "lasso", "ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive".
pen.crit : vector with dmax+1 components : for d = 0, ..., dmax, pen.crit[d+1] gives the value of the penalty for the dimension d. Default : pen.crit = NULL. In that case, the penalty will be calculated by the function penalty.
lasso.dmax : integer at most dmax, default = dmax.
ridge.dmax : integer at most dmax, default = dmax.
pls.dmax : integer at most dmax, default = dmax.
en.dmax : integer at most dmax, default = dmax.
ALridge.dmax : integer at most dmax, default = dmax.
ALpls.dmax : integer at most dmax, default = dmax.
rF.dmax : integer at most dmax, default = dmax.
exhaustive.maxdim : integer : maximum number of subsets of covariates considered in the exhaustive method. See details.
exhaustive.dmax : integer at most dmax, default = dmax.
en.lambda : vector : tuning parameter of the ridge (quadratic) penalty in the elastic net. It is the input parameter lambda of function enet.
ridge.lambda : vector : tuning parameter of the ridge. It is the input parameter lambda of function lm.ridge.
rF.lmtry : vector : tuning parameter mtry of function randomForest, mtry = p/rF.lmtry.
pls.ncomp : integer : tuning parameter of the pls. It is the input parameter ncomp of the function plsr. See details.
ALridge.lambda : similar to ridge.lambda in the adaptive lasso procedure.
ALpls.ncomp : similar to pls.ncomp in the adaptive lasso procedure. See details.
max.steps : integer. Maximum number of steps in the lasso procedure. Corresponds to the input max.steps of the function enet. Default : max.steps = 2*min(p,n).
K : scalar : value of the parameter \(K\) in the LINselect criterion.
verbose : logical : if TRUE a trace of the current process is displayed in real time.
long.output : logical : if FALSE only the component summary will be returned. See Value.
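The default bound D on dmax given in the argument descriptions can be written as a small helper. This is only a sketch of the stated rule: the function name default_dmax is hypothetical, and the use of floor() is an assumption, since the text does not say how 3*p/4 is rounded.

```r
# Default upper bound D on dmax, following the rule stated in the argument list.
# floor() is an assumption: the text does not say how 3*p/4 is rounded.
default_dmax <- function(n, p) {
  if (p >= n) min(floor(3 * p / 4), n - 5) else min(p, n - 5)
}

d_square <- default_dmax(n = 100, p = 100)  # p >= n: min(75, 95)  = 75
d_low    <- default_dmax(n = 100, p = 10)   # p <  n: min(10, 95)  = 10
d_high   <- default_dmax(n = 50,  p = 200)  # p >= n: min(150, 45) = 45
```

In the high-dimensional case the bound is driven by n - 5 rather than by p, which is why d_high is 45 and not 150.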
Yannick Baraud, Christophe Giraud, Sylvie Huet
When method is pls or ALpls, the LINselect procedure is carried out considering the number of components in the pls method as the tuning parameter. This tuning parameter varies from 1 to pls.ncomp.
When method is exhaustive, the maximum number of variables d is calculated as follows. Let q be the largest integer such that choose(p,q) < exhaustive.maxdim. Then d = min(q, exhaustive.dmax, dmax).
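The rule for the exhaustive method can be sketched in base R as follows. The helper names largest_q and exhaustive_d are hypothetical, and scanning q upward until choose(p, q + 1) first reaches the limit is an assumption about how "largest q" is meant (choose(p, q) decreases again for q > p/2); this is not the package's internal code.

```r
# Largest q such that choose(p, k) < maxdim for every k = 1, ..., q
# (the upward scan is an assumption about how "largest q" is meant).
largest_q <- function(p, maxdim) {
  q <- 0
  while (q < p && choose(p, q + 1) < maxdim) q <- q + 1
  q
}

# Maximum number of variables for the exhaustive method, as stated in the text
exhaustive_d <- function(p, maxdim, exhaustive.dmax, dmax) {
  min(largest_q(p, maxdim), exhaustive.dmax, dmax)
}

# With p = 100 and the default limit 5e5: choose(100, 3) = 161700 < 5e5,
# but choose(100, 4) = 3921225 is too large, so q = 3.
q <- largest_q(p = 100, maxdim = 5e5)
d <- exhaustive_d(p = 100, maxdim = 5e5, exhaustive.dmax = 2, dmax = 75)
```

Here d equals 2 because exhaustive.dmax is the smallest of the three bounds.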
See Baraud et al., 2010, http://hal.archives-ouvertes.fr/hal-00502156/fr/
Giraud et al., 2013, https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1356098553
library("LINselect")
# simulate data with
# beta = c(rep(2.5,5), rep(1.5,5), rep(0.5,5), rep(0,p-15))
ex <- simulData(p=100, n=100, r=0.8, rSN=5)
# the calls below are wrapped in if (FALSE) and are therefore not run by default
if (FALSE) {
  ex1.VARselect <- VARselect(ex$Y, ex$X, exhaustive.dmax=2)
  data(diabetes)
  # diabetes$y and diabetes$x2 replace attach(diabetes) / detach(diabetes)
  ex.diab <- VARselect(diabetes$y, diabetes$x2, exhaustive.dmax=5)
}