tlsce: Total Least Squares Composition Estimator

Description

Estimates a matrix X for which

$$(A+\epsilon_A)X = B+\epsilon_B$$

while minimizing

$$\sum{\epsilon_A^2 + \epsilon_B^2}$$

subject to the columns of X summing to 1 (see Xratios) and X being positive:

$$\sum_i{X_{i,j}}=1 \quad \forall j$$

$$X>0$$

The elements of $\epsilon_A$ are zero wherever the corresponding elements of A are zero. A typically contains biomarker concentrations for several taxonomic groups, and B contains field measurements of the same biomarkers. X is then an estimate of the taxonomic composition of the field samples.
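The shape of the problem can be illustrated with a small base-R sketch. The matrices below are made-up toy values, not data from the package; they only show how A (biomarkers by taxonomic groups), X (taxonomic groups by samples) and B (biomarkers by samples) fit together in the noise-free case.

```r
# Toy ratio matrix (hypothetical values): 3 biomarkers in rows, 2 taxa in columns.
A <- matrix(c(1.0, 0.0,
              0.5, 0.8,
              0.0, 0.3),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("bm1", "bm2", "bm3"), c("taxonA", "taxonB")))

# Hypothetical composition: taxa in rows, field samples in columns.
X <- matrix(c(0.7, 0.2,
              0.3, 0.8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("taxonA", "taxonB"), c("s1", "s2")))

# Noise-free field data implied by the model A %*% X = B.
B <- A %*% X

dim(B)      # biomarkers x samples
colSums(X)  # each sample's composition sums to 1
```

With real data, both A and B carry error, which is why the estimator allows the non-zero elements of A to move as well.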

Usage

tlsce(A, B, Wa=NULL, Wb=NULL, minA=NULL, maxA=NULL,
       A_init=A, Xratios=TRUE, ...)

Arguments

A

A matrix or data frame. If A contains biomarker data for taxonomic groups, the biomarkers have to be organized per row, and the taxonomic groups per column.

B

A matrix or data frame. If B contains biomarker field data, the biomarkers have to be organized per row, and the samples per column.

Wa

Weighting of A, a matrix with the same dimensions as A. If Wa=NULL, Wa defaults to 1. This parameter can be used to give more importance to individual elements of A, or to A as a whole, compared to B. Weights are implemented as proportional to $1/s$ (as opposed to $1/s^2$), with s the standard deviation of the error term.

Wb

Weighting of B, a matrix with the same dimensions as B. If Wb=NULL, Wb defaults to 1. This parameter can be used to give more importance to individual elements of B, or to B as a whole, compared to A. Weights are implemented as proportional to $1/s$ (as opposed to $1/s^2$), with s the standard deviation of the error term.

minA

minimum values for A

maxA

maximum values for A

A_init

A matrix with the same structure as A. A general, non-linear optimization routine (default nlminb) is used to minimize the sum of squared residuals of A versus the fitted matrix A_fit (see Value). This routine requires a set of starting values, by default the non-zero elements of A. This generally provides a good fit, but when in doubt about the convergence of the algorithm, one can provide different starting values for the optimization routine in A_init.

Xratios

TRUE or FALSE: are the colSums of the matrix X equal to 1? This is, for example, the case for a compositional matrix, and applies only if A and B are both expressed relative to the unit of biomass.

If Xratios=TRUE, A has pigment concentrations per biomass unit, B has pigment concentrations per biomass unit per sample, and X contains ratios of biomass unit per sample.

If Xratios=FALSE, A has pigment concentrations per biomass unit, B has pigment concentrations per sample, and X has biomass units per sample.

...

Arguments to be passed to lsei() or to modFit()
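As an illustration of the Wa and Wb arguments described above, the following sketch builds weight matrices from standard deviations using the $1/s$ convention. The standard deviations sA and sB are hypothetical values chosen for the example, and the dimensions (3 biomarkers, 2 taxonomic groups, 2 samples) are assumptions.

```r
# Hypothetical standard deviations of the error on each element of A
# (3 biomarkers x 2 taxonomic groups).
sA <- matrix(c(0.10, 0.10,
               0.05, 0.20,
               0.10, 0.05),
             nrow = 3, byrow = TRUE)

# Hypothetical standard deviation of the field measurements in B
# (3 biomarkers x 2 samples), here the same for every element.
sB <- matrix(0.02, nrow = 3, ncol = 2)

# Weights proportional to 1/s (not 1/s^2), as stated for Wa and Wb.
Wa <- 1 / sA
Wb <- 1 / sB

# Scaling all weights of A by a constant gives A as a whole more
# importance relative to B.
Wa_strong <- 2 * Wa

# These matrices could then be supplied as tlsce(A, B, Wa = Wa, Wb = Wb).
```

Because the weights multiply the residuals before squaring, a weight of $1/s$ corresponds to the usual $1/s^2$ weighting of the squared error.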

Value

A list with the following elements:

X

Array with dimension c(ncol(A),ncol(B), iter) containing the species composition of each sample

A_fit

Array with same dimension as A, containing the best-fit values of the input biomarker data per taxonomic group

B_fit

Array with same dimension as B, containing the fitted biomarker field data corresponding to A_fit

solutionNorms

A vector of 3 values: the value of the minimised quadratic function at the solution, in this case $\sum{((A_{fit}-A) \cdot Wa)^2} + \sum{((B_{fit}-B) \cdot Wb)^2}$, and the shares of this value attributed to A and to B.

convergence

An integer code. '0' indicates successful convergence.
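The quantities reported in solutionNorms can be reproduced by hand from the other elements of the returned list. The sketch below uses made-up fitted matrices rather than real tlsce output, and it interprets the "shares" as the additive contributions of A and B to the total; that interpretation is an assumption, not something the package documentation confirms.

```r
# Hypothetical input and fitted matrices (2 biomarkers, 2 taxa, 1 sample).
A     <- matrix(c(1.0, 0.0,
                  0.5, 0.8), nrow = 2, byrow = TRUE)
A_fit <- matrix(c(0.9, 0.0,
                  0.6, 0.8), nrow = 2, byrow = TRUE)
B     <- matrix(c(0.70, 0.59), ncol = 1)
B_fit <- matrix(c(0.68, 0.60), ncol = 1)

# Unit weights, mirroring the documented default for Wa and Wb.
Wa <- matrix(1, nrow = 2, ncol = 2)
Wb <- matrix(1, nrow = 2, ncol = 1)

# Weighted sums of squared residuals for A and for B.
ssA <- sum(((A_fit - A) * Wa)^2)
ssB <- sum(((B_fit - B) * Wb)^2)

# Total and the two contributions (assumed layout of the 3 values).
norms <- c(total = ssA + ssB, A = ssA, B = ssB)
norms
```

Comparing the two contributions shows whether the misfit is dominated by adjustments to the ratio matrix or by residuals in the field data.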

Details

Instead of a linear least squares regression, in which the elements of A would be fixed, the function tlsce includes the non-zero elements of A in the least squares regression. This is similar to other total least squares regression methods (also called orthogonal regression), with the main difference that only the non-zero elements of A carry an error term.
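The idea can be sketched in base R on a toy problem. This is not the package's implementation: here the constraints on X are handled with a softmax reparameterisation (a substitution made for this sketch) and everything is optimised jointly with nlminb, with unit weights. All data values and the helper names unpack and objective are hypothetical.

```r
# Toy data (hypothetical): 3 biomarkers x 2 taxa, 2 samples.
A <- matrix(c(1.0, 0.0,
              0.5, 0.8,
              0.0, 0.3), nrow = 3, byrow = TRUE)
X_true <- matrix(c(0.7, 0.2,
                   0.3, 0.8), nrow = 2, byrow = TRUE)
A_true <- A
A_true[2, 2] <- 0.9          # the "real" ratio differs from the tabulated one
B <- A_true %*% X_true

nz     <- which(A != 0)      # only these elements of A may change
n_taxa <- ncol(A)
n_samp <- ncol(B)
n_free <- (n_taxa - 1) * n_samp

# Map a parameter vector to (A_fit, X).
unpack <- function(par) {
  A_fit <- A
  A_fit[nz] <- par[seq_along(nz)]
  # First row of Z fixed at 0 so the softmax is uniquely parameterised.
  Z <- rbind(0, matrix(par[-seq_along(nz)], n_taxa - 1, n_samp))
  X <- exp(Z)
  X <- sweep(X, 2, colSums(X), "/")   # columns positive, summing to 1
  list(A_fit = A_fit, X = X)
}

# Sum of squared residuals in A and in B (unit weights).
objective <- function(par) {
  p <- unpack(par)
  B_fit <- p$A_fit %*% p$X
  sum((p$A_fit - A)^2) + sum((B_fit - B)^2)
}

# Start from the tabulated non-zero ratios and an even composition.
start <- c(A[nz], rep(0, n_free))
fit <- nlminb(start, objective,
              lower = c(rep(0, length(nz)), rep(-Inf, n_free)))
est <- unpack(fit$par)

est$A_fit   # zero elements of A stay exactly zero
est$X       # estimated composition per sample
```

In an ordinary least squares fit, A_fit would be held equal to A and all misfit would end up in the residuals of B; here part of it is absorbed by adjusting the non-zero ratios.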

References

Van den Meersche, K., K. Soetaert and J.J. Middelburg (2008) A Bayesian compositional estimator for microbial taxonomy based on biomarkers. Limnology and Oceanography: Methods 6, 190-199.

Examples


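library(BCE)

# transpose the example data so that biomarkers are organized per row,
# as tlsce expects for both A and B
A <- t(bceInput$Rat)
B <- t(bceInput$Dat)
tlsce(A, B)

## weighting Wa inversely proportional to A
tlsce(A, B, Wa = 1/A)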
