fclust: Build a functional clustering for one or more performances

Description

Fits a primary tree of component clustering to observed assemblage performances, prunes the primary tree according to its predictive ability and parsimony, and finally retains a validated secondary tree together with the corresponding predictions, statistics and other information.

Usage

fclust(dat, nbElt,
       weight     = rep(1, dim(dat)[2] - nbElt - 1),
       opt.na     = FALSE,
       opt.repeat = FALSE,
       opt.method = "divisive",
       affectElt  = rep(1, nbElt),
       opt.mean   = "amean",
       opt.model  = "byelt",
       opt.jack   = FALSE,   jack = c(3,4) )

Arguments

dat

a data.frame or matrix that brings together: a vector of assemblage identities, a matrix of occurrence of components within the system, and one or more vectors of observed performances. Consequently, the dimensions of the data.frame or matrix are: dim(dat)[1] = the number of observed assemblages, and dim(dat)[2] = 1 + number of system components + number of observed performances. The first line (colnames) contains: the assemblage identity, the list of components identified by their names, and the list of performances identified by their names. Each following line (one line per assemblage) contains: the name of the assemblage (read as character), a sequence of 0 (absence) and 1 (presence) for each component within the assemblage (this is the matrix of occurrence of components within the system), and a sequence of numeric values, one for each observed performance (this is the set of observed performances).
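The layout described above can be illustrated with a small hand-made table. This is only a sketch: the assemblage names, component names (sp1 to sp3), performance names (yield, cover) and all values are hypothetical, and the way the table is split below is for illustration only, not a description of the package internals.

```r
# Hypothetical toy data: 4 assemblages, 3 components, 2 performances
nbElt <- 3
dat <- data.frame(
  assemblage = c("A1", "A2", "A3", "A4"),   # column 1: assemblage identity
  sp1   = c(1, 0, 1, 1),                    # columns 2..(nbElt + 1): occurrence (0/1)
  sp2   = c(0, 1, 1, 1),
  sp3   = c(0, 0, 0, 1),
  yield = c(2.1, 1.4, 3.0, 3.6),            # remaining columns: observed performances
  cover = c(0.40, 0.35, 0.62, 0.80),
  stringsAsFactors = FALSE
)

# Number of performances implied by the layout
nbXpr <- dim(dat)[2] - nbElt - 1

# Split the table into its three parts (illustration only)
ident  <- dat[, 1]
occur  <- as.matrix(dat[, 1 + seq_len(nbElt)])
perf   <- as.matrix(dat[, (nbElt + 2):dim(dat)[2]])
```

With such a table, dim(dat)[2] equals 1 + nbElt + nbXpr, which is the relation stated in the description of dat.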

nbElt

an integer that specifies the number of components belonging to the interactive system. nbElt is used to determine the dimensions of the matrix of occurrence.

weight

a numeric vector that specifies the weight of each performance. By default, all performances are equally weighted. If weight is specified, its length must equal the number of observed performances.

opt.na

a logical. The records for each assemblage can have NA in the matrix of occurrence or in the observed assemblage performances. If opt.na = FALSE (by default), an error is returned. If opt.na = TRUE, the records with NA are ignored.

opt.repeat

a logical. In any case, the function looks for different assemblages with identical elemental composition, and messages report these identical assemblages. If opt.repeat = FALSE (by default), their performances are averaged. If opt.repeat = TRUE, nothing is done, and the data are processed as they are.
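The effect of opt.na and opt.repeat can be pictured with base R on a toy table. This is a minimal sketch under stated assumptions: the data are hypothetical, and the code only mimics the documented behaviour (dropping records with NA, averaging performances over identical compositions); it is not the code used by fclust.

```r
# Hypothetical toy data: 5 assemblages, 3 components, 1 performance
nbElt <- 3
dat <- data.frame(
  assemblage = c("A1", "A2", "A3", "A4", "A5"),
  sp1   = c(1, 0, 1, 1, 1),
  sp2   = c(0, 1, 1, 1, NA),   # A5 has a missing occurrence
  sp3   = c(0, 0, 0, 0, 1),
  yield = c(2.0, 1.4, 3.0, 3.4, 2.2),
  stringsAsFactors = FALSE
)

# Records with at least one NA: the ones that opt.na = TRUE would ignore
hasNA <- !complete.cases(dat)
clean <- dat[!hasNA, ]

# One key per assemblage, built from its row of the occurrence matrix
key <- apply(clean[, 1 + seq_len(nbElt)], 1, paste, collapse = "")

# Assemblages with identical elemental composition (here A3 and A4)
repeated <- clean$assemblage[key %in% key[duplicated(key)]]

# Average the performance over identical compositions (as opt.repeat = FALSE does)
avgYield <- tapply(clean$yield, key, mean)
```

Here A3 and A4 share the composition "110", so they contribute a single averaged performance, while A5 is discarded because of its NA.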

opt.method

a string that specifies the method to use. opt.method = c("divisive", "agglomerative", "apriori"). The three methods generate hierarchical trees. Each tree is complete, running from a unique trunk to as many leaves as components.

If opt.method = "divisive", the components are clustered by using a divisive method, from the trivial cluster where all components are together, towards the clustering where each component is a cluster. This method gives the best result for several reasons, explained in detail in the accompanying vignettes (see "The options of fclust").

If opt.method = "agglomerative", the components are clustered by using an agglomerative method, from the trivial clustering where each component is a cluster, towards the cluster where all components are brought together. If not all possible assemblages are observed (which is generally the case in practice), the first clustering of a few components may have no effect on the convergence criterion, inducing a non-optimal result.

If opt.method = "apriori", the user knows and provides an a priori partitioning of the system components under study. The partition is arbitrary, with any number of clusters of components, but it must be specified (see the following option affectElt). The tree is then built: (i) by using opt.method = "divisive" from the defined component clustering towards as many leaves as components; (ii) by using opt.method = "agglomerative" from the component clustering towards the trunk of the tree.

affectElt

a vector of characters or integers, as long as the number of components nbElt, that indicates the labels of different functional clusters to which each component belongs. Each functional cluster is labelled as a character or an integer, and each component must be identified by its name in names(affectElt). The number of functional clusters defined in affectElt determines an a priori level of component clustering (level <- length(unique(affectElt))).

If affectElt = NULL (by default), the option opt.method must be specified. If affectElt is specified, the option opt.method switches to "apriori".
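As an illustration of the expected shape of affectElt, the following sketch builds a named vector of cluster labels. The component names and cluster labels are hypothetical; the commented call at the end only shows how such a vector might be passed, using the arguments listed in Usage.

```r
# Hypothetical component names and a priori functional clusters
elt       <- c("sp1", "sp2", "sp3", "sp4", "sp5")
affectElt <- c("grass", "grass", "legume", "forb", "legume")
names(affectElt) <- elt          # each component is identified by its name

nbElt <- length(elt)

# Basic checks on the shape described above
stopifnot(length(affectElt) == nbElt)
stopifnot(!is.null(names(affectElt)))

# A priori level of component clustering
level <- length(unique(affectElt))

# Possible call (not run here; `dat` must follow the layout described for dat):
# res <- fclust(dat, nbElt, opt.method = "apriori", affectElt = affectElt)
```

In this toy case three distinct labels are used, so the a priori level of clustering is 3.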

opt.mean

a character, equal to "amean" or "gmean". If opt.mean = "amean", means are computed using an arithmetic formula; if opt.mean = "gmean", means are computed using a geometric formula.
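The difference between the two formulas can be shown with two small helper functions. The names arith_mean and geom_mean are hypothetical and are written here only for illustration; the geometric mean as sketched assumes strictly positive values.

```r
# Hypothetical helpers illustrating the two kinds of mean
arith_mean <- function(x) sum(x) / length(x)

# Geometric mean computed on the log scale; assumes all x > 0
geom_mean <- function(x) exp(mean(log(x)))

x <- c(1, 4, 16)
a <- arith_mean(x)   # (1 + 4 + 16) / 3
g <- geom_mean(x)    # (1 * 4 * 16)^(1/3)
```

For positive values that are not all equal, the geometric mean is smaller than the arithmetic mean, so the choice matters most when performances are strongly skewed.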

opt.model

a character, equal to "bymot" or "byelt". If opt.model = "bymot", the modelled performances are the means of performances of assemblages that share the same assembly motif, including all assemblages that belong to that assembly motif.

If opt.model = "byelt", the modelled performances are the average of the mean performances of assemblages that share the same assembly motif and that contain the same components as the assemblage to predict. This procedure corresponds to a linear model within each assembly motif, based on the component occurrence in each assemblage. If no assemblage contains any component belonging to the assemblage to predict, the performance is the mean performance of all assemblages, as in opt.model = "bymot".
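The two models can be sketched on a single assembly motif. This is one possible reading of the description above, not the package implementation: the data are hypothetical, all four assemblages are assumed to share the same assembly motif, and the names occ, perf, pred_bymot and pred_byelt are introduced only for this example.

```r
# Hypothetical toy data: 4 assemblages assumed to share one assembly motif
occ <- rbind(A1 = c(sp1 = 1, sp2 = 1, sp3 = 0),
             A2 = c(sp1 = 1, sp2 = 0, sp3 = 1),
             A3 = c(sp1 = 0, sp2 = 1, sp3 = 1),
             A4 = c(sp1 = 1, sp2 = 1, sp3 = 1))
perf <- c(A1 = 2, A2 = 4, A3 = 6, A4 = 8)

# "bymot": every assemblage of the motif is modelled by the motif mean
pred_bymot <- rep(mean(perf), length(perf))
names(pred_bymot) <- names(perf)

# "byelt", step 1: for each component, the mean performance of the
# assemblages of the motif that contain this component
eltMean <- colSums(occ * perf) / colSums(occ)

# "byelt", step 2: for each assemblage, the average of the component
# means over the components it contains
pred_byelt <- drop(occ %*% eltMean) / rowSums(occ)
```

In this sketch "bymot" returns the same value for all assemblages of the motif, whereas "byelt" differentiates them according to their components.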

opt.jack

a logical that selects the cross-validation method.

If opt.jack = FALSE (by default), a Leave-One-Out method is used: predicted performances are computed as the mean of performances of assemblages that share the same assembly motif, experiment by experiment, excluding only the assemblage to predict.

If opt.jack = TRUE, a jackknife method is used: the set of assemblages belonging to the same assembly motif is divided into jack[2] subsets of jack[1] assemblages. Predicted performances of each subset of jack[1] assemblages are computed, experiment by experiment, by using the other (jack[2] - 1) subsets of assemblages. If the total number of assemblages belonging to the assembly motif is lower than jack[1]*jack[2], predictions are computed by the Leave-One-Out method.
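Both cross-validation schemes can be sketched for one assembly motif and one performance. The values are hypothetical, and the random assignment of assemblages to subsets is an assumption of this sketch: the description above does not say how fclust forms the subsets.

```r
# Hypothetical performances of 12 assemblages sharing one assembly motif
perf <- c(2, 4, 6, 8, 10, 12, 3, 5, 7, 9, 11, 13)
n    <- length(perf)

# Leave-One-Out: mean of all other assemblages of the motif
pred_loo <- (sum(perf) - perf) / (n - 1)

# Jackknife with jack[2] subsets of jack[1] assemblages
jack <- c(3, 4)
if (n >= jack[1] * jack[2]) {
  set.seed(1)                                   # reproducible toy split
  # Assumption: subsets are drawn at random, with balanced sizes
  fold <- sample(rep(seq_len(jack[2]), length.out = n))
  # Each assemblage is predicted from the subsets it does not belong to
  pred_jack <- vapply(seq_len(n),
                      function(i) mean(perf[fold != fold[i]]),
                      numeric(1))
} else {
  # Too few assemblages in the motif: fall back to Leave-One-Out
  pred_jack <- pred_loo
}
```

With 12 assemblages and jack = c(3, 4), each subset holds exactly 3 assemblages, and each prediction is the mean of the 9 assemblages in the other three subsets.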

jack

an integer vector of length 2 that specifies the parameters of the jackknife method. The first integer jack[1] specifies the size of each subset, and the second integer jack[2] specifies the number of subsets.

Value

Returns a list containing the primary tree of component clustering, predictions of assembly performances and statistics computed by using the primary and secondary trees of component clustering.

Recall of inputs:

nbElt, nbAss, nbXpr: the number of components that belong to the interactive system, the number of assemblages and the number of performances observed, respectively.
opt.method, opt.mean, opt.model, opt.jack, jack, opt.na, opt.repeat, affectElt: the options used for computing the resulting clustering trees, respectively.
fobs, mOccur, xpr: the vector or matrix of observed performances of assemblages, the binary matrix of occurrence of components, and the vector of weight of different performances, respectively.

Primary and secondary, fitted and validated trees, of component clustering and associated statistics:

tree.I, tree.II, nbOpt: the primary tree of component clustering, the validated secondary tree of component clustering, and the optimum number of functional clusters, respectively. A tree is a list made of a square matrix of dimensions nbLev * nbElt (with nbLev = nbElt) and a vector of coefficients of determination (of length nbLev).
mCal, mPrd, tCal, tPrd: the numeric matrices of modelled values and of values predicted by cross-validation, using the primary tree (mCal and mPrd) or the secondary tree (tCal and tPrd), respectively. All matrices have the same dimensions nbLev * nbAss. rownames contains the number of component clusters, that is from 1 to nbElt clusters. colnames contains the names of assemblages.
mMotifs, tNbcl: the matrix of assignment of assemblages to different assembly motifs, coded as integers, and the matrix of the last tree levels used for predicting assemblage performances. All matrices have the same dimensions nbLev * nbAss. rownames contains the number of component clusters, that is from 1 to nbElt clusters. colnames contains the names of assemblages.
mStats, tStats: the matrices of associated statistics. rownames contains the number of component clusters, that is from 1 to nbElt clusters. colnames = c("missing", "R2cal", "R2prd", "AIC", "AICc").

Details

See the vignette "The options of fclust".

References

Jaillard, B., Richon, C., Deleporte, P., Loreau, M. and Violle, C. (2018) An a posteriori species clustering for quantifying the effects of species interactions on ecosystem functioning. Methods in Ecology and Evolution, 9:704-715. https://doi.org/10.1111/2041-210X.12920.

Jaillard, B., Deleporte, P., Loreau, M. and Violle, C. (2018) A combinatorial analysis using observational data identifies species that govern ecosystem functioning. PLoS ONE 13(8): e0201135. https://doi.org/10.1371/journal.pone.0201135.

Examples

# NOT RUN {
# Enable the comments
oldOption <- getOption("verbose")
if (!oldOption) options(verbose = TRUE)

nbElt <- 16   # number of components
# index = Identity, Occurrence of components, a Performance
index <- c(1, 1 + 1:nbElt, 1 + nbElt + 1)
dat.2004 <- CedarCreek.2004.2006.dat[ , index]
res <- fclust(dat.2004, nbElt)
names(res)
res$tree.II

options(verbose = oldOption)


# }
