
VDA (version 1.3)

cv.vda.le: Choose the optimal pair of lambdas, $\lambda_1$ and $\lambda_2$

Description

Use k-fold cross-validation to choose the optimal values of the tuning parameters $\lambda_1$ and $\lambda_2$ to be used in Multicategory Vertex Discriminant Analysis (vda.le).

Usage

cv.vda.le(x, y, kfold, lam.vec.1, lam.vec.2)

Arguments

x
n x p matrix or data frame containing the cases for each feature. The rows correspond to cases and the columns to features. An intercept column should not be included.
y
n x 1 vector representing the outcome variable. Each element denotes the class, one of the k classes, to which that case belongs.
kfold
The number of folds to use in k-fold cross-validation for each pair of $\lambda_1$ and $\lambda_2$ values.
lam.vec.1
A vector containing the set of $\lambda_1$ values over which cross-validation of VDA will be conducted. To use only Euclidean penalization, set lam.vec.1 = 0.
lam.vec.2
A vector containing the set of $\lambda_2$ values over which cross-validation of VDA will be conducted. vda.le is relatively insensitive to $\lambda_2$ values, so a vector with only a few values is recommended. The default value is 0.01. To use only Lasso penalization, set lam.vec.2 = 0.
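
To make the expected input shapes concrete, here is a minimal sketch with hypothetical toy data (x.toy and y.toy are illustrative names only, not part of the package):

### Hypothetical toy inputs: 30 cases, 4 features (no intercept column), k = 3 classes
set.seed(1)
x.toy <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4)
y.toy <- sample(1:3, size = 30, replace = TRUE)
### cv.toy <- cv.vda.le(x.toy, y.toy, kfold = 3, lam.vec.1 = exp(1:3)/1000, lam.vec.2 = 0.01)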

Value

kfold
The number of folds used in k-fold cross-validation
lam.vec.1
The user-supplied vector of $\lambda_1$ values
lam.vec.2
The user-supplied vector of $\lambda_2$ values
error.cv
A matrix of average testing errors. The rows correspond to $\lambda_1$ values and the columns to $\lambda_2$ values.
lam.opt
The pair of $\lambda_1$ and $\lambda_2$ values that returns the lowest testing error across k-fold cross-validation.

Details

For each pair $(\lambda_1, \lambda_2)$, k-fold cross-validation will be conducted and the corresponding average testing error over the k folds will be recorded. $\lambda_1$ is the parameter for the Lasso penalization, while $\lambda_2$ is the parameter for the Euclidean (group) penalization. To use only Lasso penalization, set lam.vec.2 = 0. To use only Euclidean penalization, set lam.vec.1 = 0. The optimal pair is the one that gives the smallest average testing error over the cross-validation.

To view a plot of the cross-validation errors across $\lambda$ values, see plot.cv.vda.le.
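
As a sketch of how the optimal pair can be recovered by hand from the returned components (assuming cv is the list returned by cv.vda.le; see Value):

### Locate the smallest average testing error in error.cv;
### rows index lam.vec.1 and columns index lam.vec.2, so the result should agree with lam.opt
idx <- which(cv$error.cv == min(cv$error.cv), arr.ind = TRUE)[1, ]
c(cv$lam.vec.1[idx[1]], cv$lam.vec.2[idx[2]])
cv$lam.opt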

References

Wu, T.T. and Lange, K. (2010) Multicategory Vertex Discriminant Analysis for High-Dimensional Data. Annals of Applied Statistics, Volume 4, No 4, 1698-1721.

Lange, K. and Wu, T.T. (2008) An MM Algorithm for Multicategory Vertex Discriminant Analysis. Journal of Computational and Graphical Statistics, Volume 17, No 3, 527-544.

See Also

vda.le.

plot.cv.vda.le.

Examples

### load zoo data
### column 1 is name, columns 2:17 are features, column 18 is class
data(zoo)

### feature matrix 
x <- zoo[,2:17]

### class vector
y <- zoo[,18]

### lambda vectors for cross-validation
lam1 <- exp(1:5)/10000
lam2 <- (1:5)/100

### Search for the best pair, using both Lasso and Euclidean penalizations
cv <- cv.vda.le(x, y, kfold = 3, lam.vec.1 = lam1, lam.vec.2 = lam2)
plot(cv)
outLE <- vda.le(x, y, cv$lam.opt[1], cv$lam.opt[2])

### To search using ONLY Lasso penalization, set lam.vec.2 = 0 (remove comments to run)
#cvlasso <- cv.vda.le(x, y, kfold = 3, lam.vec.1 = exp(1:10)/1000, lam.vec.2 = 0)
#plot(cvlasso)
#cvlasso$lam.opt

### To search using ONLY Euclidean penalization, set lam.vec.1 = 0 (remove comments to run)
#cveuclidean <- cv.vda.le(x, y, kfold = 3, lam.vec.1 = 0, lam.vec.2 = exp(1:10)/1000)
#plot(cveuclidean)
#cveuclidean$lam.opt

### Predict five cases based on vda.le (Lasso and Euclidean penalties)
fivecases <- matrix(0,5,16)
fivecases[1,] <- c(1,0,0,1,0,0,0,1,1,1,0,0,4,0,1,0)
fivecases[2,] <- c(1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1)
fivecases[3,] <- c(0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0)
fivecases[4,] <- c(0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0)
fivecases[5,] <- c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0)
predict(outLE, fivecases)
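
### (Hedged addition, not part of the original example) Check the fitted model against the
### training labels, assuming predict.vda.le accepts any numeric case matrix as above
table(truth = y, predicted = predict(outLE, as.matrix(x)))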
