bce1: Bayesian Composition Estimator

Description

This function estimates taxonomic compositions of algal communities based on biomarker field data. More precisely, it estimates the probability distributions of a sample composition based on an input ratio matrix, A that contains prior estimates of biomarker ratios in different taxa, and an input data matrix, B, containing biomarker ratios measured in field samples.

Probability distributions are estimated based on an adaptive metropolis MCMC method, function modMCMC from package FME.

Usage

bce1(A, B, Wa=NULL, Wb=NULL,
       jmpType="default", jmpA=.1,jmpX=.1, jmpCovar=NULL,
       initX=NULL, initA=NULL, priorA="normal", minA=NULL, maxA=NULL, 
       var0=NULL, wvar0=1e-6, Xratios=TRUE, verbose=TRUE, ...)

Arguments

input (group) ratio matrix; can be a matrix or a dataframe

input (field) data matrix; can be a matrix or a dataframe

elementwise weight matrix for A, with the same dimensions as A. $$[Wa*(A-A_0)]^2$$ is minimized

elementwise weight matrix for B, with the same dimensions as B. $$[Wb*(A\%*\%X-B)]^2$$ is minimized

jmpType

one of "default", "estimate" or "covar"; if default, jmpA and jmpX are the jump lengths. if jmpA or jmpX is a number, then this is the jump length for all elements of A resp. X. If "estimate", the initial jump length is proportional to an estimated covariance matrix for the tlsce fit for A and the lsei fit of X (or Q if Xratios). jmpA and jmpX are then used as rescaling factors for the jump covariance matrix. If "covar", a jump covariance matrix with the correct dimensions, obtained from a previous run, is given as parameter jmpCovar. Covariances can be calculated from the result.

jmpA

jump length of A: a number or a matrix with dim(A); see details jmpType

jmpX

jump lenth of X: a number or a matrix with dim(X); see details jmpType

jmpCovar

only if jmpType="covar", the covariance matrix to initiate the jumps - see details jmpType

initX

composition matrix used to start the markov chain: default the tlsce solution of Ax=B

initA

ratio matrix used to start the markov chain: default the input ratio matrix A

priorA

"normal" (gaussian - default) or "uniform".

minA

minimum values for A

maxA

maximum values for A

var0

initial model variance; if 'NULL', then the model variance of tlsce(A,B,...) is used

wvar0

relative weight of the initial model variance (see modMCMC). Ideally this would be 0 (initial model variance is not taken into account); because wvar0=0 is a special case in modMCMC() (fixed model variance), the default value is set to a small number (wvar0=1e-6)

Xratios

does the composition matrix contain ratios (TRUE) or estimated biomass concentrations (TRUE) per sample? In the latter case, B must contain the pigment concentrations as measured in the samples (not rescaled)

verbose

when TRUE will give more verbose output

...

arguments to pass on to modMCMC()

Value

An object of class bce and _modMCMC_ (returned by the function modMCMC). This object has methods for the generic functions 'summary', 'plot', 'pairs'- see ?modMCMC. It is distinguished from other modMCMC objects by 3 extra attributes that allow to extract matrices A and X from the mcmc result: "dim_A" (dimensions of A), "A_not_null" (which elements of A are not zero and thus included in the mcmc) and Xratios (whether X was rescaled, yes or no).

Details

The function bce1 searches probability distributions for all elements of a taxonomical composition matrix X and a ratio matrix A for which:

$$A\%*\%X \simeq B$$

It does this by returning niter samples for A and X, organized in three-dimensional arrays. The input data matrix B and ratio matrix A should be in the following formats, with the relative concentrations per biomarker organized in columns:

data matrix B:

	sample1	sample2	sample3	sample4
marker1	0.14	0.005	0.35	0.033
marker2	0.15	0.004	0.36	0.034
marker3	0.13	0.004	0.31	0.030
marker4	0.13	0.005	0.33	0.031
marker5	0.14	0.008	0.33	0.036

and ratio matrix A:

	species1	species2	species3	species4
marker1	0.27	0.13	0.35	0.076
marker2	0.084	0	0.5	0.24
marker3	0.195	0.3	0	0.1
marker4	0.06	0	0	0
marker5	0	0	0	0

References

Van den Meersche, K., K. Soetaert and J.J. Middelburg (2008) A Bayesian compositional estimator for microbial taxonomy based on biomarkers, Limnology and Oceanography Methods 6, 190-199

Examples

Run this code

# NOT RUN {
##====================================

# example using bceInput data
# !!! should be weighted to correspond better to example of BCE!!!
A <- t(bceInput$Rat)
B <- t(bceInput$Dat)


result <- bce1(A,B,niter=1000)

## the number of accepted runs is zero;
## try different starting values

result <- bce1(A,B,niter=1000,initX=matrix(1/ncol(A),ncol(A),ncol(B)))

## number of accepted runs is still low;
## smaller jumps

result <- bce1(A,B,niter=1000,initX=matrix(1/ncol(A),ncol(A),ncol(B)),jmpA=.01,jmpX=.01)
Sum <-summary(result)

## did the algorithm converge?
plot(result$SS,type="l")
## no


## more runs, using the output of previous run as input.
result <- bce1(A,B,niter=1e4,jmpA=.01,jmpX=.01,updatecov=1e3,
               initX=Sum$lastX,initA=Sum$lastA,
               jmpCovar=Sum$covar*2.4^2/ncol(result$pars),
               )
Sum <-summary(result)

## we inspect the output:
plot(result$SS,type="l")
plot(result,ask=TRUE)
## looks already pretty good; to get a better result, repeat one more
## time with a  longer run. Uncomment the following paragraph and run.
## go get some coffee, this might take a while (~30s).

## result <- bce1(A,B,niter=1e5,jmpA=.01,jmpX=.01,updatecov=1e3,
##                outputlength=1e3,burninlength=.35e5,
##                initX=Sum$lastX,initA=Sum$lastA,
##                jmpCovar=Sum$covar*2.4^2/ncol(result$pars),
##                )
## Sum <-summary(result)
## plot(result$SS,type="l")
## plot(result,ask=TRUE)

# show results as mean with ranges
print(Sum$meanX)

# plot estimated means and ranges (lbX=lower, ubX=upper bound)
xlim <- range(c(Sum$lbX,Sum$ubX))

# first the mean
dotchart(x=t(Sum$meanX),xlim=xlim,                                                          
         main="Taxonomic composition",
         sub="using bce",pch=16)

# then ranges
nr <- nrow(Sum$meanX)
nc <- ncol(Sum$meanX)

for (i in 1:nr) 
  {ip <-(nr-i)*(nc+2)+1
   cc <- ip : (ip+nc-1)
   segments(t(Sum$lbX[i,]),cc,t(Sum$ubX[i,]),cc)
  }

# }

Run the code above in your browser using DataLab