spls: Sparse Partial Least Squares (sPLS)

Description

Function to perform sparse Partial Least Squares (sPLS). The sPLS approach performs data integration and variable selection simultaneously on two data sets in a one-step strategy.

Usage

spls(X, Y, ncomp = 2, mode = c("regression", "canonical"),
     max.iter = 500, tol = 1e-06, keepX = rep(ncol(X), ncomp), 
     keepY = rep(ncol(Y), ncomp), ...)

Arguments

X

numeric matrix of predictors. NAs are allowed.

Y

numeric vector or matrix of responses (for multi-response models). NAs are allowed.

ncomp

the number of components to include in the model (see Details). Default is 2 (see Usage).

mode

character string. What type of algorithm to use, (partially) matching one of "regression" or "canonical". See Details.

max.iter

integer, the maximum number of iterations.

tol

a positive real, the tolerance used in the iterative algorithm.

keepX

numeric vector of length ncomp, the number of variables to keep in $X$-loadings. By default all variables are kept in the model.

keepY

numeric vector of length ncomp, the number of variables to keep in $Y$-loadings. By default all variables are kept in the model.

...

arguments to pass to nearZeroVar.
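To make the role of keepX and keepY more concrete: in the sPLS papers listed under References, sparsity comes from soft-thresholding the loading vectors. The base R sketch below illustrates that idea on a toy vector. It is an assumption-laden illustration, not the package's own code, and the helper name soft_threshold_keep is hypothetical.

```r
# Hypothetical helper (not part of mixOmics): keep the `keep` largest
# absolute loadings, shrink them by soft-thresholding, and zero the rest.
soft_threshold_keep <- function(u, keep) {
  p <- length(u)
  if (keep >= p) return(u)  # nothing to remove
  # Threshold: the largest absolute value among the entries to be dropped
  delta <- sort(abs(u))[p - keep]
  sign(u) * pmax(abs(u) - delta, 0)
}

# Toy loading vector with five variables; keep two of them
u <- c(0.9, -0.1, 0.4, 0.05, -0.7)
u_sparse <- soft_threshold_keep(u, keep = 2)

u_sparse             # sparse loading vector
sum(u_sparse != 0)   # number of variables retained
```

In this sketch the retained entries are also shrunk towards zero, which is why a sparse loading vector is usually rescaled afterwards.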

Value

spls returns an object of class "spls", a list that contains the following components:

X

the centered and standardized original predictor matrix.

Y

the centered and standardized original response vector or matrix.

ncomp

the number of components included in the model.

mode

the algorithm used to fit the model.

keepX

number of $X$ variables kept in the model on each component.

keepY

number of $Y$ variables kept in the model on each component.

mat.c

matrix of coefficients to be used internally by predict.

variates

list containing the variates.

loadings

list containing the estimated loadings for the $X$ and $Y$ variates.

names

list containing the names to be used for individuals and variables.

nzv

list containing the zero- or near-zero predictors information.
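
The components listed above can be inspected directly on a fitted object. The sketch below assumes the mixOmics package is installed and that the loadings and variates lists hold elements named X and Y. That naming is an assumption and may differ between package versions.

```r
library(mixOmics)

data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

fit <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50),
            keepY = c(10, 10, 10))

# Names of the components stored in the fitted object
names(fit)

# Non-zero X-loadings per component (assumed to be bounded by keepX)
nonzero_x <- colSums(fit$loadings$X != 0)
nonzero_x

# Dimensions of the X-variates: samples by components
dim(fit$variates$X)
```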

Details

The spls function fits sPLS models with $1, \ldots,$ ncomp components. Multi-response models are fully supported. The X and Y data sets can contain missing values. The type of algorithm to use is specified with the mode argument. Two sPLS algorithms are available: sPLS regression ("regression") and sPLS canonical analysis ("canonical") (see References). Missing values can be estimated beforehand by reconstituting the data matrix with the nipals function. Otherwise, missing values are handled by casewise deletion inside the spls function, without having to delete the rows with missing data.
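
As a brief illustration of the mode argument, the sketch below fits the same data with both algorithms and reads back the mode component of each result. It assumes the mixOmics package and its liver.toxicity data set are available. The object names fit_reg and fit_can are illustrative only.

```r
library(mixOmics)

data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

# sPLS regression: Y is modelled from X
fit_reg <- spls(X, Y, ncomp = 2, mode = "regression",
                keepX = c(50, 50), keepY = c(10, 10))

# sPLS canonical analysis: X and Y are treated symmetrically
fit_can <- spls(X, Y, ncomp = 2, mode = "canonical",
                keepX = c(50, 50), keepY = c(10, 10))

# The algorithm used is stored in the mode component of each fit
fit_reg$mode
fit_can$mode
```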

References

Lê Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.

Lê Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.

Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technic.

Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (editor), Multivariate Analysis. Academic Press, N.Y., 391-420.

Examples

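library(mixOmics)

# Gene expression (predictors) and clinical measurements (responses)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

# Fit a 3-component sPLS model, keeping 50 X variables and
# 10 Y variables on each component
toxicity.spls <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50),
                      keepY = c(10, 10, 10))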
