The function estimates the coefficients of a Lasso regression with a data-driven penalty under homoscedasticity or heteroscedasticity, with non-Gaussian noise and X-dependent or X-independent design. The method used for the data-driven penalty can be chosen. The returned object is of the S3 class rlasso.
rlasso(x, ...)
# S3 method for formula
rlasso(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
# S3 method for character
rlasso(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
# S3 method for default
rlasso(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
rlasso returns an object of class rlasso. An object of class "rlasso" is a list containing at least the following components:
coefficients parameter estimates
beta parameter estimates (named vector of coefficients without intercept)
intercept value of the intercept
index index of selected variables (logical vector)
lambda data-driven penalty term for each variable, product of lambda0 (the penalization parameter) and the loadings
lambda0 penalty term
loadings loading for each regressor
residuals residuals, response minus fitted values
sigma root of the variance of the residuals
iter number of iterations
call function call
options options
model model matrix (if model = TRUE in function call)
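The relation between lambda, lambda0 and the loadings listed above can be illustrated with a small base-R sketch. The formula used here for lambda0 is an assumption: it follows the X-independent penalty level of the form discussed in Belloni et al. (2012), and the package's internal computation may differ. The unit loadings are hypothetical placeholders, not estimated values.

```r
# Sketch only: an assumed X-independent penalty level, not taken from the package source
n <- 100                 # sample size
p <- 100                 # number of regressors
c_const <- 1.1           # constant c (default in the post-Lasso case)
gamma <- 0.1 / log(n)    # constant gamma as in the usage signature

# Assumed form of the penalization parameter lambda0
lambda0 <- 2 * c_const * sqrt(n) * qnorm(1 - gamma / (2 * p))

# Hypothetical loadings (one per regressor); the package estimates these from the data
loadings <- rep(1, p)

# lambda is the product of lambda0 and the loadings, one entry per regressor
lambda <- lambda0 * loadings
print(lambda0)
```

With estimated loadings, lambda would differ across regressors while lambda0 stays a single number.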
x regressors (vector, matrix or object that can be coerced to matrix)
... further arguments (only for consistent definition of methods)
formula an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted in the form y~x
data an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which rlasso is called.
post logical. If TRUE, post-Lasso estimation is conducted.
intercept logical. If TRUE, an intercept is included, which is not penalized.
model logical. If TRUE (default), the model matrix is returned.
penalty list with options for the calculation of the penalty.
  c and gamma constants for the penalty, with defaults c = 1.1 and gamma = 0.1/log(n) (as in the usage signature)
  homoscedastic logical; whether homoscedastic errors are assumed (default FALSE). The option "none" is described below.
  X.dependent.lambda logical; TRUE if the penalization parameter depends on the design matrix x, FALSE if it is independent of the design matrix (default).
  numSim number of simulations for the X-dependent methods (default 5000)
  lambda.start initial penalization value, compulsory for method "none"
control list with control values.
  numIter number of iterations for the algorithm for the estimation of the variance and the data-driven penalty, i.e. the loadings
  tol tolerance for improvement of the estimated variances
  threshold is applied to the final estimated lasso coefficients. Absolute values below the threshold are set to zero.
y dependent variable (vector, matrix or object that can be coerced to matrix)
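The thresholding rule described for the control list (absolute values below the threshold are set to zero) can be sketched in base R as follows. The coefficient vector and the threshold value are hypothetical and only illustrate the rule; they are not output of rlasso.

```r
# Hypothetical coefficient estimates standing in for final lasso coefficients
beta_hat <- c(V1 = 4.98, V2 = -0.0004, V3 = 5.02, V4 = 0.00009)

# Hypothetical threshold, as it might be passed via control = list(..., threshold = 1e-3)
threshold <- 1e-3

# Set every coefficient whose absolute value is below the threshold to zero
beta_thresholded <- beta_hat
beta_thresholded[abs(beta_hat) < threshold] <- 0

print(beta_thresholded)
```

In a call to rlasso the same value would be supplied through the control argument, for example control = list(numIter = 15, tol = 10^-5, threshold = 1e-3).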
The function estimates the coefficients of a Lasso regression with a data-driven penalty under homoscedasticity or heteroscedasticity and non-Gaussian noise. The option homoscedastic is a logical with default FALSE.
Moreover, for the calculation of the penalty parameter it can be chosen whether the penalization parameter depends on the design matrix (X.dependent.lambda=TRUE) or is independent of it (default, X.dependent.lambda=FALSE).
The default value of the constant c is 1.1 in the post-Lasso case and 0.5 in the Lasso case.
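As an illustration of these penalty options, the following sketch fits the same simulated data with three settings. It assumes the hdm package (which provides rlasso) is installed; the data, the object names and the reduced numSim value are choices made for this example only.

```r
library(hdm)  # assumes the hdm package, which provides rlasso, is installed

set.seed(1)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), ncol = p)
y <- as.vector(X[, 1:3] %*% c(5, 5, 5) + rnorm(n))

# Defaults: heteroscedastic errors, penalty independent of the design matrix
fit_default <- rlasso(X, y)

# Homoscedastic errors, penalty independent of the design matrix
fit_homo <- rlasso(X, y,
  penalty = list(homoscedastic = TRUE, X.dependent.lambda = FALSE,
                 lambda.start = NULL, c = 1.1, gamma = 0.1 / log(n)))

# Heteroscedastic errors, penalty depending on the design matrix;
# numSim is lowered from its default of 5000 only to keep the example fast
fit_xdep <- rlasso(X, y,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = TRUE,
                 lambda.start = NULL, c = 1.1, gamma = 0.1 / log(n),
                 numSim = 1000))

# Number of selected variables under each setting
print(c(default = sum(fit_default$index),
        homoscedastic = sum(fit_homo$index),
        x_dependent = sum(fit_xdep$index)))
```

The three fits can then be compared through their index, lambda0 and coefficients components.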
A special option is to set homoscedastic to "none" and to supply a value for lambda.start. This value is then used as the penalty parameter, with independent design and heteroscedastic errors used to weight the regressors.
For details of the implementation of the algorithm for estimating the data-driven penalty, in particular the regressor-independent loadings, we refer to Appendix A in Belloni et al. (2012). When the option "none" is chosen for homoscedastic (together with lambda.start), lambda is set to lambda.start and the regressor-independent loadings under heteroscedasticity are used. The options "X-dependent" and "X-independent" under homoscedasticity are described in Belloni et al. (2013).
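A call using the "none" option might look like the sketch below. It assumes the hdm package is installed. The value 50 is arbitrary, and supplying lambda.start as a vector with one entry per regressor is an assumption of this example rather than something stated in this text.

```r
library(hdm)  # assumes the hdm package, which provides rlasso, is installed

set.seed(1)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), ncol = p)
y <- as.vector(X[, 1:3] %*% c(5, 5, 5) + rnorm(n))

# User-supplied penalty: homoscedastic = "none" together with lambda.start.
# lambda.start is given as a length-p vector here (assumption of this sketch);
# the value 50 is arbitrary and chosen only for illustration.
fit_none <- rlasso(X, y,
  penalty = list(homoscedastic = "none", X.dependent.lambda = FALSE,
                 lambda.start = rep(50, p), c = 1.1, gamma = 0.1 / log(n)))

# Number of variables selected with the user-supplied penalty
print(sum(fit_none$index))
```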
The option post=TRUE conducts post-lasso estimation, i.e. a refit of the model with the selected variables.
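The idea of a refit on the selected variables can be sketched by hand: fit the Lasso without the post step, read off the selected variables from the index component, and refit with ordinary least squares. This is an illustration under the assumption that the hdm package is installed; it is not the package's internal implementation, and the object names are hypothetical.

```r
library(hdm)  # assumes the hdm package, which provides rlasso, is installed

set.seed(1)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("V", 1:p)
y <- as.vector(X[, 1:3] %*% c(5, 5, 5) + rnorm(n))

# Lasso fit without the post-Lasso refit
fit_lasso <- rlasso(X, y, post = FALSE)

# Positions of the selected variables (index is a logical vector)
selected <- which(fit_lasso$index)

# Manual refit: ordinary least squares on the selected columns only
X_sel <- X[, selected, drop = FALSE]
refit <- lm(y ~ X_sel)

# Fit with the built-in post-Lasso step, for comparison
fit_post <- rlasso(X, y, post = TRUE)

print(coef(refit))
print(fit_post$coefficients)
```

Because the default constant c differs between the Lasso case (0.5) and the post-Lasso case (1.1), the two fits may select different variables, so the manual refit need not reproduce fit_post exactly.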
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369-2429.
A. Belloni, V. Chernozhukov and C. Hansen (2013). Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge, 245-295.
library(hdm) # provides rlasso and its predict method
set.seed(1)
n = 100 # sample size
p = 100 # number of variables
s = 3 # number of variables with non-zero coefficients
X = Xnames = matrix(rnorm(n*p), ncol=p)
colnames(Xnames) <- paste("V", 1:p, sep="")
beta = c(rep(5,s), rep(0,p-s))
Y = X%*%beta + rnorm(n)
reg.lasso <- rlasso(Y~Xnames) # fit via the formula interface
Xnew = matrix(rnorm(n*p), ncol=p) # new X
colnames(Xnew) <- paste("V", 1:p, sep="")
Ynew = Xnew%*%beta + rnorm(n) # new Y
yhat = predict(reg.lasso, newdata = Xnew) # predictions for the new observations