hdglm is used to fit high dimensional generalized
linear models when the model matrix is rank deficient. The
default usage is similar to the glm function in stats; for
instance, running the code 'summary(hdglm(y ~ x, family='binomial'))'
will produce a regression table. Many additional options are available,
as described below. For technical and theoretical details of the
underlying methods, see the Details section below as well.
hdglm(formula, data, subset, family = c("gaussian", "binomial", "poisson"),
      bootstrap = 10, siglevel = 0.05, alpha = 0.5, M = NULL, N = NULL,
      model = TRUE, x = FALSE, y = FALSE, scale = TRUE,
      pval.method = c("median", "fdr", "holm", "QA"), ...,
      FUNCVFIT = NULL, FUNLM = NULL, bayes = FALSE, bayesIters = NULL,
      bayesTune = NULL, refit = FALSE)

formula: an object of class "formula" (or one that
can be coerced to that class): a symbolic description of the
model to be fitted. The details of model specification are given
under Details.

data: an optional data frame, list or environment (or object coercible by
as.data.frame to a data frame) containing
the variables in the model. If not found in data, the
variables are taken from environment(formula),
typically the environment from which hdglm is called.

model, x, y: logicals. If TRUE the corresponding
components of the fit (the model frame, the model matrix, the
response, the QR decomposition) are returned.
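As a sketch of a call that changes several of these defaults (the specific values below are arbitrary illustrations drawn from the usage line above, not recommendations):

x <- matrix(rnorm(10 * 100), ncol = 10)   # 100 observations, 10 predictors
y <- rbinom(100, 1, prob = 0.5)           # binary response
out <- hdglm(y ~ x, family = "binomial", bootstrap = 50,
             pval.method = "fdr", siglevel = 0.01)
summary(out)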
hdglm generally returns an object of class "hdlm",
unless refit is set to TRUE. In the latter case the output depends
on the choice of the function FUNLM. The function summary is used to obtain and print a summary of the
results. The generic accessor functions coefficients,
effects, fitted.values and residuals extract
various useful features of the value returned by hdglm.
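Continuing the sketch above, the accessors can be applied directly to the fitted object:

coef(out)       # estimated coefficients
fitted(out)     # fitted values
residuals(out)  # residuals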
Models for hdglm are specified symbolically. A typical model has
the form response ~ terms where response is the (numeric)
response vector and terms is a series of terms which specifies a
linear predictor for response. A terms specification of the form
first + second indicates all the terms in first together
with all the terms in second with duplicates removed. A
specification of the form first:second indicates the set of
terms obtained by taking the interactions of all terms in first
with all terms in second. The specification first*second
indicates the cross of first and second. This is
the same as first + second + first:second. If the formula includes an offset, this is evaluated and
subtracted from the response.
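The expansion of these specifications can be inspected with base R's terms function (a small sketch; first and second are placeholder variable names):

attr(terms(y ~ first + second), "term.labels")   # "first" "second"
attr(terms(y ~ first:second), "term.labels")     # "first:second"
attr(terms(y ~ first * second), "term.labels")   # "first" "second" "first:second"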
See model.matrix for some further details. The terms in
the formula will be re-ordered so that main effects come first,
followed by the interactions, all second-order, all third-order and so
on: to avoid this pass a terms object as the formula (see
aov and demo(glm.vr) for an example).
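The reordering can be seen directly (a small sketch with placeholder variables a and b):

attr(terms(y ~ a:b + b), "term.labels")   # main effect comes first: "b" "a:b"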
A formula has an implied intercept term. To remove this use either
y ~ x - 1 or y ~ 0 + x. See formula for
more details of allowed formulae. Note that the intercept term will
not be penalized along with the other terms. If you want a penalized
intercept, add it directly to the matrix x.
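The effect of removing the intercept can be checked on the design matrix itself, using base R's model.matrix (a small sketch):

x <- matrix(rnorm(20), ncol = 2)
colnames(model.matrix(~ x))      # includes "(Intercept)"
colnames(model.matrix(~ 0 + x))  # intercept column removed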
Bickel, P.J., Y. Ritov, and A.B. Tsybakov (2009) "Simultaneous analysis of Lasso and Dantzig selector". The Annals of Statistics 37.4, pp. 1705--1732.
Buhlmann, P. and S. Van De Geer (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag New York Inc.
Chambers, J. M. (1992) "Linear models". Chapter 4 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2003) "Least Angle Regression" (with discussion). The Annals of Statistics; see also http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf.
Fan, J., Y. Feng, and Y. Wu (2009) "Network exploration via the adaptive LASSO and SCAD penalties". The Annals of Applied Statistics 3.2, pp. 521--541.
Hans, C. (2009) "Brief Technical Report to Accompany the R Package blasso: Bayesian Lasso Regression". URL http://www.stat.osu.edu/~hans/software/blasso/.
Hastie, T., R. Tibshirani, and J. Friedman (2002) The Elements of Statistical Learning. Springer, New York.
Wasserman, L. and K. Roeder (2009) "High Dimensional Variable Selection". The Annals of Statistics 37, pp. 2178--2201.
set.seed(42)
# simulate 100 observations on 10 predictors
x <- matrix(rnorm(10 * 100), ncol = 10)
# logistic mean depends only on the first two predictors
mu <- exp(x[, 1] + x[, 2] * 0.5) / (1 + exp(x[, 1] + x[, 2] * 0.5))
y <- rbinom(100, 1, prob = mu)
# fit a high dimensional logistic model and print a regression table
out <- hdglm(y ~ x, family = 'binomial')
summary(out)