Takes a hierarchical tree of species clustering, a matrix of occurrence and the corresponding vector of performances, and returns the predictions, statistics and other information.
validate_ftree(tree.I, fobs, mOccur,
               xpr = stats::setNames(rep(1, length(fobs)),
                                     rep("a", length(fobs))),
               opt.method = "divisive", opt.mean = "amean",
               opt.model = "byelt",
               opt.jack = FALSE,
               jack = as.integer(c(3, 4)),
               opt.nbMax = dim(mOccur)[2])
an integer square-matrix. The matrix represents a hierarchical tree of species clustering.
a numeric vector. The vector fobs contains the quantitative performances of assemblages.
a matrix of occurrence (occurrence of elements). Its first dimension equals length(fobs). Its second dimension equals the number of elements.
a numeric vector of length(fobs). The vector xpr contains the weight of each experiment and, in names(xpr), the labels of the different experiments.
The weight of each experiment is used in the computation of the Residual Sum of Squares in the function rss_clustering. The formula used is rss if all experiments have the same weight, and wrss (barycenter of the RSS of each experiment) if experiments have different weights.
All assemblages that belong to a given experiment should then have the same weight. Each experiment is identified by its name (names(xpr)), and the RSS of each experiment is weighted by the values of xpr.
The vector xpr is generated by the function stats::setNames.
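As a minimal sketch of how such a vector can be built with stats::setNames, the example below constructs the default xpr (one experiment labelled "a", weight 1) and a two-experiment variant. The performance values, the labels "a" and "b", and the weights 1 and 2 are made up for illustration. The final check only verifies the constraint stated above (one weight per experiment) and is not part of the package.

```r
# Hypothetical performances of six assemblages (illustrative values only).
fobs <- c(2.1, 3.4, 1.8, 4.0, 2.7, 3.9)

# Default used in the signature: a single experiment "a" with weight 1.
xpr_default <- stats::setNames(rep(1, length(fobs)),
                               rep("a", length(fobs)))

# Assumed example: two experiments "a" and "b" with different weights.
xpr_two <- stats::setNames(c(1, 1, 1, 2, 2, 2),
                           c("a", "a", "a", "b", "b", "b"))

# Check that all assemblages of a given experiment share the same weight.
same_weight <- tapply(xpr_two, names(xpr_two),
                      function(w) length(unique(w)) == 1)
all(same_weight)
```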
a string that specifies the method to use: opt.method = c("sort", "divisive", "agglomerative", "cluster").
The first three methods generate hierarchical trees. Each tree is complete, running from a unique trunk to as many leaves as components.
The last method generates, at each level of the tree, a clustering of components into a given, predefined number of clusters. Because it is repeated from the trunk to the leaves with an increasing number of clusters, the method generates a non-hierarchical tree.
If opt.method = "sort", the components are sorted by their effect on assemblage performances.
If opt.method = "divisive", the components are clustered according to a hierarchical process by using a divisive method, from the trivial cluster where all components are together, towards the clustering where each component is a cluster.
If opt.method = "agglomerative", the components are clustered according to a hierarchical process by using an agglomerative method, from the trivial clustering where each component is a cluster, towards the cluster where all components are together.
The method that gives the best result is opt.method = "divisive".
If opt.method = "cluster", the components are clustered according to a non-hierarchical process by using the method of McNaughton-Smith et al., 1964. In this case, one must specify the desired number of clusters.
Recall that, if affectElt is specified, the option opt.method does not need to be set. affectElt determines a level of component clustering, and a tree is built:
(i) by using opt.method = "divisive" from the defined level of the tree towards as many leaves as components;
(ii) by using opt.method = "agglomerative" from the defined level of the tree towards the trunk of the tree.
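To make the notion of a complete hierarchical tree concrete, the sketch below builds one with base R only (stats::hclust, which is agglomerative) and stores it as one row of cluster labels per level, from the trunk (one cluster) to the leaves (one cluster per component). This is an analogy, not the clustering algorithm of this package: the trait values and component names are made up, and the row-per-level integer matrix is only an assumed encoding of a tree, not a statement about the exact layout of tree.I.

```r
# Toy data: five hypothetical components described by one numeric trait.
trait <- c(A = 1.0, B = 1.2, C = 5.0, D = 5.3, E = 9.0)
n <- length(trait)

# Base-R agglomerative clustering (not the algorithm of this package).
hc <- stats::hclust(stats::dist(trait), method = "average")

# cutree() with k = 1:n returns one column per number of clusters;
# transpose so that each row is a level of the tree.
levels_mat <- t(stats::cutree(hc, k = 1:n))

levels_mat[1, ]   # trunk: all components in one cluster
levels_mat[n, ]   # leaves: each component is its own cluster
```

Because each level is nested in the previous one, the rows form a hierarchical tree in the sense used above; a method that re-clusters independently at each level would not guarantee this nesting.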
a character string equal to "amean" or "gmean". The arithmetic formula is used if opt.mean = "amean". The geometric formula is used if opt.mean = "gmean".
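The difference between the two options can be illustrated with the textbook definitions of the arithmetic and geometric means. This is a sketch under the assumption that "amean" and "gmean" correspond to these standard formulas; the variable names and values are illustrative, and the geometric mean as written requires strictly positive performances.

```r
# Illustrative, strictly positive performances.
x <- c(1, 4, 16)

# Arithmetic mean: sum of the values divided by their number.
amean <- mean(x)

# Geometric mean: exponential of the mean of the logarithms.
gmean <- exp(mean(log(x)))

c(amean = amean, gmean = gmean)
```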
a character string equal to "bymot" or "byelt". The simple mean by assembly motif is used if opt.model = "bymot". The linear model with assembly motif is used if opt.model = "byelt".
a logical that switches the cross-validation method.
If opt.jack = FALSE, a "leave-one-out" method is used: predicted performances are computed as the mean of the performances of assemblages that share the same assembly motif, experiment by experiment, excluding only the assemblage to predict.
If opt.jack = TRUE, a jackknife method is used: the set of assemblages belonging to the same assembly motif is divided into jack[2] subsets of jack[1] assemblages. Predicted performances are computed, experiment by experiment, by excluding jack[1] assemblages, including the assemblage to predict. If the total number of assemblages belonging to the assembly motif is lower than jack[1]*jack[2], predictions are computed by the leave-one-out method.
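The leave-one-out rule described above can be sketched in a few lines of base R. The helper loo_predict, the motif labels and the performance values are hypothetical. The sketch assumes a single experiment and an arithmetic mean; with several experiments, the grouping would also have to include the experiment labels in names(xpr).

```r
# Hypothetical data: five assemblages, two assembly motifs, one experiment.
fobs  <- c(2, 4, 6, 10, 14)
motif <- c("a", "a", "a", "b", "b")

# Hypothetical helper: predict each assemblage as the mean of the other
# assemblages sharing its motif (NA when the motif has a single member).
loo_predict <- function(fobs, motif) {
  n   <- ave(fobs, motif, FUN = length)  # motif size, per assemblage
  tot <- ave(fobs, motif, FUN = sum)     # motif total, per assemblage
  ifelse(n > 1, (tot - fobs) / (n - 1), NA_real_)
}

pred <- loo_predict(fobs, motif)
pred
```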
an integer vector of length 2. The vector specifies the parameters of the jackknife method. The first integer jack[1] specifies the size of each subset, the second integer jack[2] specifies the number of subsets.
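The sketch below illustrates how the two integers interact. The helper cv_method encodes only the threshold stated for opt.jack (fewer than jack[1]*jack[2] assemblages in a motif means leave-one-out). The fold assignment that follows is an assumption made for illustration (contiguous subsets, arithmetic mean, a motif of exactly jack[1]*jack[2] assemblages) and is not taken from the package source.

```r
jack <- as.integer(c(3, 4))   # subset size, number of subsets

# Hypothetical helper: which cross-validation applies to a motif
# containing n assemblages?
cv_method <- function(n, jack) {
  if (n < jack[1] * jack[2]) "leave-one-out" else "jackknife"
}

cv_method(10, jack)
cv_method(12, jack)

# Assumed fold layout for a motif of exactly 12 assemblages:
# 4 contiguous subsets of 3 assemblages each.
fold <- rep(seq_len(jack[2]), each = jack[1])
fobs <- as.numeric(1:12)      # illustrative performances

# Predict each assemblage from all assemblages outside its own subset.
pred <- vapply(seq_along(fobs),
               function(i) mean(fobs[fold != fold[i]]),
               numeric(1))
```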
an integer that indicates the maximum number of tree levels to cluster. By default, opt.nbMax = nbElt, so that components are clustered all along the tree, from the trunk to the leaves, which makes it possible to determine the optimum number of component functional groups. However, in ftest_* and fboot_* functions, there is no point in clustering the components beyond the optimum number of functional groups. In these functions, opt.nbMax defaults to the optimum number of functional groups.
Returns a list containing predictions of assembly performances and statistics computed by using a species clustering tree.
Recall of inputs:
nbElt, nbAss: the numbers of components and of assemblages,
opt.method: the method used to cluster components,
opt.mean: the option for computing mean values,
opt.model: the option for prediction modelling,
opt.jack: the option for the method of cross-validation,
jack: the parameters for the jackknife,
fobs: the vector of observed performances of assemblages,
mOccur: the matrix of component occurrence,
xpr: the vector of labels of different experiments.
Primary and secondary trees of element clustering:
tree.I: the primary tree of component clustering,
tree.II: the validated secondary tree of component clustering,
nbOpt: the optimum number of clusters.
Matrices of calibration and prediction using tree.I and associated statistics:
mCal: the matrix of modelled values,
mPrd: the matrix of values predicted by cross-validation,
mMotifs: the matrix of labels of assembly motifs,
mStats: the matrix of associated statistics.
Matrices of calibration and prediction using tree.II and associated statistics:
tCal: the matrix of values modelled using the valid part of the tree,
tPrd: the matrix of values predicted using the valid part of the tree,
tStats: the goodness-of-fit statistics of the valid tree model,
tNbcl: the number of clusters used for computing each performance.
None.