splsda: Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)

Description

Function to perform sparse Partial Least Squares Discriminant Analysis (sPLS-DA) to classify samples. The sPLS-DA approach embeds variable selection in the classification procedure.

Usage

splsda(X, Y, ncomp = 2, keepX = rep(ncol(X), ncomp),
       max.iter = 500, tol = 1e-06, ...)

Arguments

X

numeric matrix of predictors. NAs are allowed.

Y

a factor or a class vector for the discrete outcome.

ncomp

the number of components to include in the model (see Details).

keepX

numeric vector of length ncomp, giving the number of variables to keep in the $X$-loadings on each component. By default all variables are kept in the model.

max.iter

integer, the maximum number of iterations.

tol

a positive real, the tolerance used in the iterative algorithm.

...

arguments to pass to nearZeroVar.
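
To make the effect of keepX more concrete, the following base R sketch shows one common way a loading vector can be made sparse so that only a fixed number of entries stay non-zero: soft-thresholding at the next-largest absolute value. The helper name keep_top and the example vector are hypothetical and not part of the package; whether splsda applies exactly this rule internally is an assumption here.

```r
# Hypothetical helper (not a package function): keep the k entries of a
# loading vector with the largest absolute values and shrink the rest to 0.
keep_top <- function(a, k) {
  p <- length(a)
  if (k >= p) return(a)  # nothing to drop, mirrors the default keepX = ncol(X)
  # Threshold: the (k + 1)-th largest absolute loading
  delta <- sort(abs(a), decreasing = TRUE)[k + 1]
  # Soft-thresholding: shrink towards zero, entries at or below delta become 0
  sign(a) * pmax(abs(a) - delta, 0)
}

a <- c(0.9, -0.1, 0.4, -0.7, 0.05)  # made-up loading vector
a.sparse <- keep_top(a, 2)
sum(a.sparse != 0)                   # number of variables kept
```

With ties in the absolute values, fewer than k entries may survive, which is one reason this is only an illustration.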

Value

splsda returns an object of class "splsda", a list that contains the following components:

X

the centered and standardized original predictor matrix.

Y

the centered and standardized indicator response vector or matrix.

ind.mat

the indicator matrix.

ncomp

the number of components included in the model.

keepX

number of $X$ variables kept in the model on each component.

mat.c

matrix of coefficients to be used internally by predict.

variates

list containing the variates.

loadings

list containing the estimated loadings for the X and Y variates.

names

list containing the names to be used for individuals and variables.

nzv

list containing the zero- or near-zero predictors information.

Details

The splsda function fits sPLS models with $1, \ldots,$ ncomp components to the factor or class vector Y. The appropriate indicator matrix is created internally from Y.
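
The indicator matrix can be pictured as a dummy coding of Y with one column per class. The base R sketch below builds such a matrix for a made-up class vector; it is meant only as an illustration, and the package's internal construction may differ in details such as naming or scaling.

```r
# Made-up class vector for illustration (not from the package data sets)
Y <- factor(c("A", "B", "A", "C"))

# One column per class level: 1 if the sample belongs to that class, else 0
ind.mat <- sapply(levels(Y), function(lv) as.numeric(Y == lv))

ind.mat
rowSums(ind.mat)  # each sample belongs to exactly one class
```

Each row contains a single 1, so a two-class outcome yields a two-column matrix and a K-class outcome yields K columns.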

References

Lê Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.

Lê Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.

Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.

Pérez-Enciso, M. and Tenenhaus, M. (2003). Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human Genetics 112, 581-592.

Nguyen, D. V. and Rocke, D. M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18, 39-50.

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.

Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (editor), Multivariate Analysis. Academic Press, N.Y., 391-420.

Examples

library(mixOmics)

## First example
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment

res <- splsda(X, Y, ncomp = 2, keepX = c(25, 25))
palette(c("red", "blue"))
col <- as.numeric(as.factor(Y))
plotIndiv(res, ind.names = TRUE, col = col)
legend(-0.35, -0.19, c("After", "Before"), pch = c(16, 16), 
       col = c("red", "blue"), cex = 1, pt.cex = c(1.2, 1.2), 
       title = "Treatment")
palette("default")

## Second example
data(liver.toxicity)
X <- as.matrix(liver.toxicity$gene)
Y <- liver.toxicity$treatment[, 4]

splsda.liver <- splsda(X, Y, ncomp = 2, keepX = c(20, 20))
col <- as.numeric(as.factor(Y))
plotIndiv(splsda.liver, col = col, ind.names = Y)
