clogitL1: Conditional logistic regression with elastic net penalties

Description

Fit a sequence of conditional logistic regression models with lasso or elastic net penalties.

Usage

clogitL1 (x, y, strata, numLambda=100, 
	minLambdaRatio=0.000001, switch=0, alpha = 1)

Arguments

x

matrix with one row per observation. Each row contains the p regressor values for that observation.

y

vector of binary responses with 1 for cases and 0 for controls.

strata

vector with stratum membership of each observation.

numLambda

number of different values of the regularisation parameter \(\lambda\) at which to compute parameter estimates. The first fit is made at a value just below the smallest value of \(\lambda\) at which all parameter estimates are 0; the last fit is made at this value multiplied by minLambdaRatio.

minLambdaRatio

ratio of the smallest to the largest value of the regularisation parameter \(\lambda\) at which we find parameter estimates.

switch

index (between 0 and numLambda) in the sequence of \(\lambda\) values at which we transition from linear to logarithmic jumps.

alpha

parameter controlling the trade-off between lasso and ridge penalties. At value 1, we have a pure lasso penalty; at 0, pure ridge. Intermediate values provide a mixture of the two.
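As a rough illustration of how numLambda and minLambdaRatio shape the regularisation path, the sketch below builds a purely logarithmic grid in base R. The helper name make_lambda_grid and the starting value lambda_max are hypothetical; the package derives its own starting value from the data, and any linear spacing governed by switch is not reproduced here.

```r
# Hypothetical helper, not the package's internal code:
# a log-spaced grid running from lambda_max down to
# lambda_max * minLambdaRatio.
make_lambda_grid <- function(lambda_max, numLambda = 100,
                             minLambdaRatio = 1e-6) {
  # equally spaced on the log scale, largest value first
  exp(seq(log(lambda_max),
          log(lambda_max * minLambdaRatio),
          length.out = numLambda))
}

# lambda_max = 2 is an arbitrary value chosen for illustration
grid <- make_lambda_grid(lambda_max = 2, numLambda = 5,
                         minLambdaRatio = 1e-4)
print(grid)
```

With such a grid, the ratio of the last to the first value equals minLambdaRatio, which is the role the argument description assigns to it.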

Value

An object of type clogitL1 with the following fields:

beta

(numLambda + 1)-by-p matrix of estimated coefficients. The first row contains all 0s.

lambda

vector of length numLambda + 1 containing the value of the regularisation parameter at which we obtained the fits.

nz_beta

vector of length numLambda + 1 containing the number of nonzero parameter estimates for the fit at the corresponding regularisation parameter.

ss_beta

vector of length numLambda + 1 containing the number of predictors considered by the sequential strong rule at that iteration.

dev_perc

vector of length numLambda + 1 containing the percentage of null deviance explained by the model in the corresponding row of beta.

y_c

reordered vector of responses. Grouped by stratum with cases coming first.

X_c

reordered matrix of predictors, with rows in the same order as y_c.

strata_c

reordered stratum vector, in the same order as y_c.

nVec

vector with one entry per unique stratum in strata, containing the number of observations encountered in each stratum.

mVec

vector containing the number of cases in each stratum.

alpha

penalty trade off parameter.
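The reordering described for y_c, X_c and strata_c, and the per-stratum counts in nVec and mVec, can be mimicked in base R. This is only a sketch on made-up toy data; it is not the package's internal code, and the tie-breaking within a stratum is an assumption.

```r
# Toy data: two strata, observations deliberately out of order
y      <- c(0, 1, 0, 1, 0, 0)
strata <- c(2, 1, 1, 2, 2, 1)
X      <- matrix(seq_len(12), ncol = 2)

# Sort by stratum, then put cases (y = 1) before controls (y = 0)
ord      <- order(strata, -y)
y_c      <- y[ord]
X_c      <- X[ord, , drop = FALSE]
strata_c <- strata[ord]

# Per-stratum counts analogous to nVec (observations) and mVec (cases)
nVec <- as.vector(table(strata_c))
mVec <- as.vector(tapply(y_c, strata_c, sum))

print(cbind(strata_c, y_c))
```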

Details

The sequence of models implied by numLambda and minLambdaRatio is fit by coordinate descent with warm starts and sequential strong rules. If alpha=1, we fit using a lasso penalty. Otherwise we fit with an elastic net penalty. Note that a pure ridge penalty is never obtained, because the function sets a floor for alpha at 0.000001. This improves the stability of the algorithm. A similar lower bound is set for minLambdaRatio. The sequence may be truncated at fewer than numLambda models if a fitted model is found to explain a very large proportion of the training set deviance.
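To make the role of alpha and its floor concrete, the sketch below evaluates an elastic net penalty in base R. It assumes the glmnet-style form lambda * (alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2)), which this page does not spell out; the helper name enet_penalty and the argument alpha_floor are hypothetical.

```r
# Hypothetical helper, assuming a glmnet-style elastic net penalty.
# Not taken from the package source.
enet_penalty <- function(beta, lambda, alpha = 1, alpha_floor = 1e-6) {
  # mimic the documented floor: alpha never drops to exactly 0
  alpha <- max(alpha, alpha_floor)
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

beta <- c(0.5, -1, 0)
lasso_pen <- enet_penalty(beta, lambda = 0.1, alpha = 1)    # pure lasso
mixed_pen <- enet_penalty(beta, lambda = 0.1, alpha = 0.5)  # equal mixture
ridge_pen <- enet_penalty(beta, lambda = 0.1, alpha = 0)    # floored, near ridge
print(c(lasso = lasso_pen, mixed = mixed_pen, ridge = ridge_pen))
```

Under this assumed form, requesting alpha = 0 still leaves a tiny lasso component, which matches the statement that a pure ridge penalty is never obtained.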

References

http://www.jstatsoft.org/v58/i12/

Examples


# NOT RUN {
library(clogitL1)
set.seed(145)
# data parameters
K = 10 # number of strata
n = 5 # number of observations per stratum
m = 2 # cases per stratum
p = 20 # number of predictors

# generate data
y = rep(c(rep(1, m), rep(0, n-m)), K)
X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
strata = sort(rep(1:K, n))

par(mfrow = c(1,2))
# fit the conditional logistic model
clObj = clogitL1(x = X, y = y, strata = strata)
plot(clObj, logX=TRUE)

# cross validation
clcvObj = cv.clogitL1(clObj)
plot(clcvObj)
# }
