Description

hdglm is used to fit high dimensional generalized linear models when the
model matrix is rank deficient. The default usage is similar to the glm
function in the stats package; for instance, running
summary(hdglm(y ~ x, family = 'binomial')) will produce a regression
table. A myriad of options are also available, as described below. For
technical and theoretical details of the underlying methods, see the
Details section below as well.
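As a quick sketch of this glm-style interface (the simulated data and the
choice of 50 predictors here are illustrative only):

library(hdlm)

set.seed(1)
x <- matrix(rnorm(100 * 50), ncol = 50)   # n = 100 observations, p = 50 predictors
y <- rbinom(100, 1, prob = plogis(x[, 1] - x[, 2]))

## The call mirrors stats::glm, except that family is a character string.
summary(hdglm(y ~ x, family = 'binomial'))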
Usage

hdglm(formula, data, subset,
      family = c("gaussian", "binomial", "poisson"),
      bootstrap = 10, siglevel = 0.05, alpha = 0.5,
      M = NULL, N = NULL, model = TRUE, x = FALSE, y = FALSE,
      scale = TRUE, pval.method = c("median", "fdr", "holm", "QA"),
      ..., FUNCVFIT = NULL, FUNLM = NULL,
      bayes = FALSE, bayesIters = NULL, bayesTune = NULL,
      refit = FALSE)
"formula"
(or one that
can be coerced to that class): a symbolic description of the
model to be fitted. The details of model specification are given
under Details.as.data.frame
to a data frame) containing
the variables in the model. If not found in data
, the
variables are taken from environment(formula)
,
typically the environment from which lm
is called.TRUE
the corresponding
components of the fit (the model frame, the model matrix, the
response, the QR decomposition) are returned.
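As a sketch of how these fit together (the data frame here is made up,
and the component names fit$x and fit$y follow the lm convention, which
is an assumption):

library(hdlm)

set.seed(1)
x1 <- rnorm(60)
x2 <- rnorm(60)
df <- data.frame(y = rbinom(60, 1, prob = plogis(2 * x1)), x1 = x1, x2 = x2)

## Variables are looked up in df; x = TRUE and y = TRUE request that the
## model matrix and response be returned with the fit.
fit <- hdglm(y ~ x1 + x2, data = df, family = 'binomial', x = TRUE, y = TRUE)
head(fit$x)   # model matrix (component name assumed per the lm convention)
head(fit$y)   # response vector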
Value

hdglm generally returns an object of class "hdlm", unless refit is set
to something other than FALSE. In the latter case the output depends on
the choice of the function FUNLM.

The function summary is used to obtain and print a summary of the
results. The generic accessor functions coefficients, effects,
fitted.values and residuals extract various useful features of the value
returned by hdglm.
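For example, with x and y simulated as in the Examples section below,
the accessors can be used as follows (a sketch, assuming the default
refit = FALSE so that an "hdlm" object is returned):

fit <- hdglm(y ~ x, family = 'binomial')
coefficients(fit)         # coefficient estimates
head(fitted.values(fit))  # fitted values
head(residuals(fit))      # residuals
summary(fit)              # printed regression table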
Details

Models for hdglm are specified symbolically. A typical model has the
form response ~ terms where response is the (numeric) response vector
and terms is a series of terms which specifies a linear predictor for
response. A terms specification of the form first + second indicates all
the terms in first together with all the terms in second with duplicates
removed. A specification of the form first:second indicates the set of
terms obtained by taking the interactions of all terms in first with all
terms in second. The specification first*second indicates the cross of
first and second. This is the same as first + second + first:second.

If the formula includes an offset, this is evaluated and subtracted from
the response. See model.matrix for some further details.

The terms in the formula will be re-ordered so that main effects come
first, followed by the interactions: all second-order, all third-order
and so on. To avoid this, pass a terms object as the formula (see aov
and demo(glm.vr) for an example).
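The expansion of these operators can be inspected with model.matrix from
the stats package; the two factors below are made up for illustration:

d <- data.frame(first  = factor(rep(c("a", "b"), 4)),
                second = factor(rep(c("u", "v"), each = 4)))

colnames(model.matrix(~ first + second, d))  # main effects, duplicates removed
colnames(model.matrix(~ first:second, d))    # interaction terms only
colnames(model.matrix(~ first * second, d))  # first + second + first:second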
A formula has an implied intercept term. To remove this use either
y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed
formulae. Note that the intercept term will not be penalized along with
the other terms. If you want a penalized intercept, add it directly to
the model matrix x.
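A sketch of both options (the explicit column of ones for a penalized
intercept is illustrative; the column name is arbitrary):

library(hdlm)

set.seed(2)
x <- matrix(rnorm(50 * 10), ncol = 10)
y <- rbinom(50, 1, prob = plogis(x[, 1]))

## Remove the implied, unpenalized intercept:
fit1 <- hdglm(y ~ x - 1, family = 'binomial')

## Penalize the intercept instead: add a ones column to the model matrix
## yourself, so it is shrunk along with the other coefficients.
x_aug <- cbind(intercept = 1, x)
fit2 <- hdglm(y ~ x_aug - 1, family = 'binomial')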
References

Bickel, P.J., Y. Ritov, and A.B. Tsybakov (2009) "Simultaneous analysis of Lasso and Dantzig selector". The Annals of Statistics 37.4, pp. 1705--1732.
Buhlmann, P. and S. Van De Geer (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag New York Inc.
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Efron, Hastie, Johnstone and Tibshirani (2003) "Least Angle Regression" (with discussion) Annals of Statistics; see also http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf.
Fan, J., Y. Feng, and Y. Wu (2009) "Network exploration via the adaptive LASSO and SCAD penalties". Annals of Applied Statistics 3.2, pp. 521--541.
Hans, C. (2009). Brief Technical Report to Accompany the R Package blasso Bayesian Lasso Regression. URL http://www.stat.osu.edu/~hans/software/blasso/.
Hastie, Tibshirani and Friedman (2002) Elements of Statistical Learning, Springer, NY.
Wasserman, L., and Roeder, K. (2009), "High Dimensional Variable Selection," The Annals of Statistics, 37, 2178--2201.
Examples

library(hdlm)

set.seed(42)
## Simulate 100 observations on 10 predictors; only the first two enter
## the true logistic model.
x <- matrix(rnorm(10 * 100), ncol = 10)
mu <- exp(x[, 1] + x[, 2] * 0.5) / (1 + exp(x[, 1] + x[, 2] * 0.5))
y <- rbinom(100, 1, prob = mu)
out <- hdglm(y ~ x, family = 'binomial')
summary(out)