validate_ftree: Predictions of assembly performances using a species clustering tree

Description

Takes a hierarchical tree of species clustering, an occurrence matrix and the corresponding vector of performances, and returns the predictions, statistics and other information.

Usage

validate_ftree(tree.I, fobs, mOccur,
              xpr = stats::setNames(rep(1, length(fobs)),
                                    rep("a", length(fobs))),
              opt.method = "divisive", opt.mean = "amean",
              opt.model = "byelt",
              opt.jack   = FALSE,
              jack       = as.integer(c(3, 4)),
              opt.nbMax  = dim(mOccur)[2])

Arguments

tree.I

an integer square-matrix. The matrix represents a hierarchical tree of species clustering.

fobs

a numeric vector. The vector fobs contains the quantitative performances of assemblages.

mOccur

an occurrence matrix (occurrence of elements). Its first dimension equals length(fobs). Its second dimension equals the number of elements.
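The dimension constraints on fobs and mOccur can be checked before calling the function. The following is a minimal sketch on hypothetical toy data (the assemblage names, component names and values are made up for illustration); it only uses base R.

```r
# Hypothetical toy data: 4 assemblages built from 3 components (A, B, C).
mOccur <- matrix(c(1, 1, 0,
                   1, 0, 1,
                   0, 1, 1,
                   1, 1, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("ass", 1:4), c("A", "B", "C")))

# One observed performance per assemblage (one per row of mOccur).
fobs <- c(2.1, 3.4, 1.8, 4.0)

# First dimension of mOccur must match length(fobs).
dims_ok <- nrow(mOccur) == length(fobs)

# Second dimension is the number of elements; it is also the default
# value of opt.nbMax, written dim(mOccur)[2] in the usage above.
nbElt <- dim(mOccur)[2]
```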

xpr

a numeric vector of length length(fobs). The vector xpr contains the weight of each experiment, and its names (names(xpr)) label the different experiments. The weight of each experiment is used when computing the Residual Sum of Squares in the function rss_clustering. If all experiments have the same weight, the formula used is rss. If experiments have different weights, the formula used is wrss (barycenter of the RSS of each experiment). All assemblages that belong to a given experiment should then have the same weight. Each experiment is identified by its name (names(xpr)) and the RSS of each experiment is weighted by the values of xpr. The vector xpr is generated by the function stats::setNames.
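As an illustration of how xpr can be built and how experiment weights might enter a weighted RSS, here is a minimal sketch. The data are hypothetical, and the weighted mean of per-experiment RSS shown here is only an assumed reading of "barycenter of RSS"; the exact formula used by rss_clustering is not given in this page.

```r
# Hypothetical observed and predicted performances for 4 assemblages.
fobs <- c(2.0, 3.0, 5.0, 4.0)
fprd <- c(2.5, 2.5, 4.0, 5.0)

# Default form of xpr: a single experiment "a", weight 1 everywhere.
xpr_default <- stats::setNames(rep(1, length(fobs)), rep("a", length(fobs)))

# Two experiments "a" and "b" with different weights; all assemblages
# of a given experiment share the same weight.
xpr <- stats::setNames(c(1, 1, 3, 3), c("a", "a", "b", "b"))

# Residual sum of squares computed separately for each experiment.
rss_by_xpr <- tapply((fobs - fprd)^2, names(xpr), sum)

# One weight per experiment (assumption: the weight is constant within it).
w <- tapply(xpr, names(xpr), unique)

# Assumed weighted RSS: weighted mean of the per-experiment RSS values.
wrss <- sum(w * rss_by_xpr) / sum(w)
```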

opt.method

a string that specifies the method to use, one of "sort", "divisive", "agglomerative" or "cluster". The first three methods generate hierarchical trees. Each tree is complete, running from a unique trunk to as many leaves as components. The last method generates, at each level of the tree, a clustering of components into a given, predefined number of clusters. Because it is repeated from the trunk to the leaves with an increasing number of clusters, the method generates a non-hierarchical tree.

If opt.method = "sort", the components are sorted by their effect on assemblage performances.

If opt.method = "divisive", the components are clustered according to a hierarchical process by using a divisive method, from the trivial cluster where all components are together, towards the clustering where each component is a cluster.

If opt.method = "agglomerative", the components are clustered according to a hierarchical process by using an agglomerative method, from the trivial clustering where each component is a cluster, towards the cluster where all components are together. The method that gives the best result is opt.method = "divisive".

If opt.method = "cluster", the components are clustered according to a non-hierarchical process by using the method of McNaughton-Smith et al., 1964. In this case, one must specify the desired number of clusters.

Recall that, if affectElt is specified, the option opt.method does not need to be set. affectElt determines a level of component clustering, and a tree is built: (i) by using opt.method = "divisive" from the defined level in the tree towards as many leaves as components; (ii) by using opt.method = "agglomerative" from the defined level in the tree towards the trunk of the tree.

opt.mean

a character string equal to "amean" or "gmean". The arithmetic formula is used if opt.mean = "amean". The geometric formula is used if opt.mean = "gmean".
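The difference between the two options can be illustrated with the standard definitions of the arithmetic and geometric means. This is a minimal sketch: the helper names arith_mean and geom_mean are hypothetical and are not claimed to be the package's internal functions, and the geometric mean as written assumes strictly positive values.

```r
# Hypothetical helpers, for illustration only.
arith_mean <- function(x) mean(x)

# Geometric mean via logs; assumes all values are strictly positive.
geom_mean <- function(x) exp(mean(log(x)))

# Hypothetical performances of three assemblages.
x <- c(1, 4, 16)

m_arith <- arith_mean(x)  # (1 + 4 + 16) / 3 = 7
m_geom  <- geom_mean(x)   # cube root of 1 * 4 * 16 = 64
```

With skewed positive data such as these, the geometric mean is smaller than the arithmetic mean, so the two options can lead to noticeably different predictions.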

opt.model

a character string equal to "bymot" or "byelt". A simple mean by assembly motif is used if opt.model = "bymot". A linear model with assembly motif is used if opt.model = "byelt".

opt.jack

a logical that selects the cross-validation method.

If opt.jack = FALSE, a "leave-one-out" method is used: predicted performances are computed as the mean of performances of assemblages that share the same assembly motif, experiment by experiment, excluding only the assemblage to predict.
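The leave-one-out rule can be sketched in a few lines of base R. This is an illustrative sketch only, assuming a single experiment, the arithmetic mean and a simple mean by motif; the function loo_by_motif and the motif labels are hypothetical and do not reproduce the package's internal code.

```r
# Hypothetical helper: leave-one-out prediction within each assembly motif.
loo_by_motif <- function(fobs, motif) {
  prd <- rep(NA_real_, length(fobs))
  for (i in seq_along(fobs)) {
    same <- motif == motif[i]  # assemblages sharing the motif of i
    same[i] <- FALSE           # exclude the assemblage to predict
    if (any(same)) prd[i] <- mean(fobs[same])
  }
  prd  # NA remains where a motif contains a single assemblage
}

# Hypothetical data: 5 assemblages, 2 motifs.
fobs  <- c(2, 4, 6, 10, 20)
motif <- c("m1", "m1", "m1", "m2", "m2")

fprd <- loo_by_motif(fobs, motif)
```

For the first assemblage, the prediction is the mean of the two other "m1" assemblages, (4 + 6) / 2 = 5.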

If opt.jack = TRUE, a jackknife method is used: the set of assemblages belonging to the same assembly motif is divided into jack[2] subsets of jack[1] assemblages. Predicted performances are computed, experiment by experiment, by excluding jack[1] assemblages, including the assemblage to predict. If the total number of assemblages belonging to the assembly motif is lower than jack[1]*jack[2], predictions are computed by the leave-one-out method.

jack

an integer vector of length 2. The vector specifies the parameters for the jackknife method. The first integer jack[1] specifies the size of each subset, the second integer jack[2] specifies the number of subsets.
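The role of the two jackknife parameters can be sketched as follows. This is an illustrative sketch for a single motif and a single experiment with exactly jack[1]*jack[2] assemblages; the partition into consecutive blocks is an assumption, as this page does not say how the package assigns assemblages to subsets.

```r
jack <- as.integer(c(3, 4))  # subset size, number of subsets

# Hypothetical performances of 12 assemblages sharing one assembly motif.
fobs <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

# The jackknife applies only if the motif holds enough assemblages;
# otherwise the leave-one-out method is used instead.
enough <- length(fobs) >= jack[1] * jack[2]

# Assumed partition: jack[2] consecutive blocks of jack[1] assemblages.
subset_id <- rep(seq_len(jack[2]), each = jack[1])

# Predict each assemblage from all assemblages outside its own subset,
# so that jack[1] assemblages (including the target) are excluded.
fprd <- vapply(seq_along(fobs), function(i) {
  mean(fobs[subset_id != subset_id[i]])
}, numeric(1))
```

Assemblages in the same subset share the same excluded set, so they receive the same prediction in this sketch.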

opt.nbMax

an integer that indicates the maximum number of tree levels to cluster. By default, opt.nbMax = nbElt, so that components are clustered all along the tree, from the trunk to the leaves, which allows the optimum number of component functional groups to be determined. However, in ftest_* and fboot_* functions, there is no point in clustering the components beyond the optimum number of functional groups. In these functions, opt.nbMax = optimum number of functional groups, by default.

Value

Return a list containing predictions of assembly performances and statistics computed by using a species clustering tree.

Recall of inputs:

nbElt, nbAss: the number of components and the number of assemblages,
opt.method: the method used to cluster components,
opt.mean: the option for mean values computing,
opt.model: the option for prediction modelling,
opt.jack: the option for method of cross-validation,
jack: the parameters for jackknife,
fobs: the vector of observed performances of assemblages,
mOccur: the matrix of component occurrence,
xpr: the vector of labels of different experiments.

Primary and secondary trees of element clustering:

tree.I: the primary tree of component clustering,
tree.II: the validated secondary tree of component clustering,
nbOpt: the optimum number of clusters.

Matrices of calibration and prediction using tree.I and associated statistics:

mCal: the matrix of modelled values,
mPrd: the matrix of values predicted by cross-validation,
mMotifs: the matrix of labels of assembly motifs,
mStats: the matrix of associated statistics.

Matrices of calibration and prediction using tree.II and associated statistics:

tCal: the matrix of values modelled using the valid part of tree,
tPrd: the matrix of values predicted using the valid part of tree,
tStats: statistics of valid tree model goodness-of-fit,
tNbcl: the number of clusters used for computing each performance.

Details

None.