multilevel: Multilevel analysis for repeated measurements (cross-over design)

Description

The analysis of repeated measurements is performed by combining a multilevel approach with multivariate methods: sPLS-DA (Discriminant Analysis) or sPLS (Integrative analysis). Both approaches enable variable selection.

Usage

multilevel(X, 
           Y = NULL,
           design,  
           ncomp = 2,
           keepX = NULL,
           keepY = NULL,  
           method = c("spls", "splsda"),  
           mode = c("regression", "canonical"), 
           max.iter = 500, 
           tol = 1e-06,
           near.zero.var = TRUE)

Arguments

X

numeric matrix of predictors. NAs are allowed.

Y

if method = "spls", numeric vector or matrix of continuous responses (for multi-response models). NAs are allowed.

design

a numeric matrix or data frame. The first column indicates the repeated measures on each individual, i.e. the individuals ID. If method = 'splsda', the 2nd and 3rd columns are factors. If method = 'spls' then you can choose

ncomp

the number of components to include in the model (see Details).

keepX

numeric vector of length ncomp, the number of variables to keep in $X$-loadings. By default all variables are kept in the model.

keepY

if method = "spls", numeric vector of length ncomp, the number of variables to keep in $Y$-loadings. By default all variables are kept in the model.

method

character string. Which multivariate method and type of analysis to choose, matching "spls" (unsupervised integrative analysis) or "splsda" (discriminant analysis). See Details.

mode

character string. What type of algorithm to use, matching "regression" or "canonical". See details in ?pls.

max.iter

integer, the maximum number of iterations.

tol

a non-negative real number, the tolerance used in the iterative algorithm.

near.zero.var

boolean, see the internal nearZeroVar function (should be set to TRUE in particular for data with many zero values). Setting this argument to FALSE (when appropriate) will speed up the computations.
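To give a feel for what a near-zero-variance filter looks for, here is a minimal base R sketch. It is not the internal nearZeroVar function: the helper name near_zero_var_sketch and the two thresholds (freq.cut, unique.cut) are assumptions made for illustration only. A column is flagged when it is constant, or when one value dominates and there are few distinct values.

```r
# Hypothetical helper, for illustration only (not the package's internal function).
# freq.cut and unique.cut are assumed thresholds.
near_zero_var_sketch <- function(X, freq.cut = 95 / 5, unique.cut = 10) {
  apply(X, 2, function(v) {
    counts <- sort(table(v), decreasing = TRUE)
    if (length(counts) == 1) return(TRUE)            # constant column
    freq.ratio <- counts[[1]] / counts[[2]]          # most common vs. second most common value
    pct.unique <- 100 * length(counts) / length(v)   # percentage of distinct values
    freq.ratio > freq.cut && pct.unique <= unique.cut
  })
}

# Made-up data: one constant column, one almost-constant column, one varied column
X.toy <- cbind(constant = rep(0, 40),
               sparse   = c(rep(0, 39), 1),
               varied   = seq_len(40))

flags <- near_zero_var_sketch(X.toy)
flags
```

Columns flagged in this way carry almost no information and are natural candidates for removal before fitting the model.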

Value

multilevel returns an object of class "mlspls" (sPLS analysis) or "mlsplsda" (sPLS-DA analysis): a list that contains the following components:

X

the centered and standardized original predictor matrix.

Y

the centered and standardized original (or indicator) response vector or matrix.

Xw

the within-subject $X$-deviation matrix.

Yw

the within-subject $Y$-deviation matrix if method = "spls".

design

the design matrix.

ind.mat

the indicator matrix associated to $Y$ if method = "splsda".

ncomp

the number of components included in the model.

keepX

number of $X$ variables kept in the model on each component.

keepY

number of $Y$ variables kept in the model on each component if method = "spls".

variates

list containing the $X$- and $Y$-variates.

loadings

list containing the estimated loadings for the $X$ and $Y$ variates.

names

list containing the names to be used for individuals and variables.

mode

the algorithm used to fit the model if method = "spls".

nzv

list containing the zero- or near-zero predictors information.
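As an illustration of what an indicator matrix such as ind.mat represents, the base R sketch below builds a 0/1 dummy matrix from a factor. The class labels and the object name ind.mat.toy are made up, and the exact encoding used inside the package may differ.

```r
# Made-up class labels for six samples
stimulation <- factor(c("A", "B", "C", "A", "C", "B"))

# One column per class; entry is 1 when the sample belongs to that class
ind.mat.toy <- sapply(levels(stimulation),
                      function(l) as.numeric(stimulation == l))

ind.mat.toy
```

Each row contains a single 1, so the matrix encodes class membership in a form that a PLS-type regression can use as a response.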

Details

The multilevel function first decomposes the variance in the data set $X$ (and $Y$), then applies either sPLS-DA (method = "splsda") or sPLS (method = "spls") to the within-subject deviation.
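For the one-factor case, the within-subject deviation can be sketched in base R as each measurement minus its subject's mean. This is a minimal illustration on made-up data, assuming a simple one-level split; the object names are hypothetical and the package's own decomposition (in particular for two factors) may differ in detail.

```r
# Made-up data: 3 subjects, 2 repeated measurements each, 2 variables
X.toy <- matrix(c(1, 3, 10, 14, 5, 5,
                  2, 4,  6,  8, 1, 3),
                ncol = 2, dimnames = list(NULL, c("v1", "v2")))
subject <- factor(c(1, 1, 2, 2, 3, 3))

# Per-subject mean of each variable, repeated for every row of that subject
subject.means <- apply(X.toy, 2, function(v) ave(v, subject))

# Within-subject deviation: what remains after removing each subject's own level
Xw.toy <- X.toy - subject.means

# Sanity check: the deviations of each subject sum to zero
rowsum(Xw.toy, subject)
```

Removing the subject means discards the between-subject differences, so the subsequent model only sees variation that occurs within individuals.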

One- or two-factor analyses are available for method = "splsda".

An sPLS or sPLS-DA model is fitted with 1, ..., ncomp components. For sPLS-DA, the factor is taken from design[, 2] (or from design[, 2:3] for a two-factor analysis).

Multilevel sPLS-DA enables the selection of discriminant variables between the factors in design.

Multilevel sPLS enables the integration of two data sets measured on the same individuals. This approach differs from multilevel sPLS-DA in that the aim is to select subsets of variables from both data sets that are highly positively or negatively correlated across samples. The approach is unsupervised, i.e. no prior knowledge about the sample groups is included.
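The toy example below hints at why correlations are sought on the within-subject deviations rather than on the raw data. It is a base R illustration on made-up vectors, not the package's computation: two variables share the same within-subject signal, but their subject baselines move in opposite directions.

```r
subject <- factor(rep(1:4, each = 3))
within  <- rep(c(-1, 0, 1), times = 4)           # shared within-subject signal

x <- rep(c(0, 10, 20, 30), each = 3) + within    # subject baselines rise for x
y <- rep(c(30, 20, 10, 0), each = 3) + within    # and fall for y

# Within-subject deviations: subtract each subject's mean
xw <- x - ave(x, subject)
yw <- y - ave(y, subject)

cor(x, y)     # dominated by the opposing subject baselines
cor(xw, yw)   # reflects only the shared within-subject signal
```

On the raw data the between-subject differences mask the association, while the within-subject deviations expose it.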

References

On multilevel analysis:

Liquet, B., Le Cao, K.-A., Hocini, H. and Thiebaut, R. (2012). A novel approach for biomarker selection and the integration of repeated measures experiments from two platforms. BMC Bioinformatics 13:325.

Westerhuis, J. A., van Velzen, E. J., Hoefsloot, H. C., and Smilde, A. K. (2010). Multivariate paired data analysis: multilevel PLSDA versus OPLSDA. Metabolomics, 6(1), 119-128.

On sPLS-DA:

Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.

On sPLS:

Le Cao, K.-A., Martin, P.G.P., Robert-Granie, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.

Le Cao, K.-A., Rossouw, D., Robert-Granie, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.

Examples

## First example: one-factor analysis with sPLS-DA, selecting a subset of variables
# as in the paper Liquet et al.
#--------------------------------------------------------------
data(vac18)
X <- vac18$genes
Y <- vac18$stimulation
# sample indicates the repeated measurements
design <- data.frame(sample = vac18$sample, 
                     stimul = vac18$stimulation)

# multilevel sPLS-DA model
res.1level <- multilevel(X, ncomp = 3, design = design,
                         method = "splsda", keepX = c(30, 137, 123))

# set up colors for plotIndiv
col.stim <- c("darkblue", "purple", "green4","red3")
plotIndiv(res.1level, ind.names = Y, col.per.group = col.stim)


## Second example: two-factor analysis with sPLS-DA, selecting a subset of variables
# as in the paper Liquet et al.
#--------------------------------------------------------------
data(vac18.simulated) # simulated data

X <- vac18.simulated$genes
design <- data.frame(sample = vac18.simulated$sample,
                     stimu = vac18.simulated$stimulation,
                     time = vac18.simulated$time)

res.2level <- multilevel(X, ncomp = 2, design = design,
                         keepX = c(200, 200), method = 'splsda')

plotIndiv(res.2level, group = design$stimu, ind.names = vac18.simulated$time,
    add.legend = TRUE, style = 'lattice')       

## Third example: one-factor analysis with sPLS, selecting a subset of variables
#--------------------------------------------------------------
data(liver.toxicity)
# note: we made up those data, pretending they are repeated measurements
repeat.indiv <- c(1, 2, 1, 2, 1, 2, 1, 2, 3, 3, 4, 3, 4, 3, 4, 4, 5, 6, 5, 5,
                 6, 5, 6, 7, 7, 8, 6, 7, 8, 7, 8, 8, 9, 10, 9, 10, 11, 9, 9,
                 10, 11, 12, 12, 10, 11, 12, 11, 12, 13, 14, 13, 14, 13, 14,
                 13, 14, 15, 16, 15, 16, 15, 16, 15, 16)
summary(as.factor(repeat.indiv)) # 16 rats, 4 measurements each

# this is a spls (unsupervised analysis) so no need to mention any factor in design
# we only perform a one level variation split
design <- data.frame(sample = repeat.indiv) 
res.spls.1level <- multilevel(X = liver.toxicity$gene,
                              Y = liver.toxicity$clinic,
                              design = design,
                              ncomp = 3,
                              keepX = c(50, 50, 50), keepY = c(5, 5, 5),
                              method = 'spls', mode = 'canonical')

# samples are grouped by dose in the plots below (via the 'group' argument);
# design only holds the 'sample' column here, so no factor is read from it

plotIndiv(res.spls.1level, rep.space = 'X-variate', ind.names = FALSE,
          group = liver.toxicity$treatment$Dose.Group,
          pch = 20, main = 'Gene expression subspace',
          add.legend = TRUE)

plotIndiv(res.spls.1level, rep.space = 'Y-variate', ind.names = FALSE,
          group = liver.toxicity$treatment$Dose.Group,
          pch = 20, main = 'Clinical measurements subspace',
          add.legend = TRUE)

plotIndiv(res.spls.1level, rep.space = 'XY-variate', ind.names = FALSE,
          group = liver.toxicity$treatment$Dose.Group,
          pch = 20, main = 'Both Gene expression and Clinical subspaces',
          add.legend = TRUE)
