fclust_plot: Plot various graphs of a functional clustering for one or several performances

Description

The function plots a range of graphs that illustrate the results and how they were obtained: hierarchical trees of component clustering, composition and mean performance of assembly motifs, mean performance of assemblages containing a given component, observed, simulated and predicted performances of assemblages labelled by assembly motif, and performances of given assemblages.

Usage

fclust_plot(fres, nbcl = 0, main = "",
            opt.tree  = NULL, opt.perf = NULL, opt.ass = NULL,
            opt.motif = NULL, opt.comp = NULL, opt.all = NULL )

Arguments

fres

an object generated by the function fclust.

nbcl

an integer giving the number of component clusters to take into account. It can be less than or equal to the optimum number fres$nbOpt of component clusters.

main

a string used as the first, reference part of the title of each graph.

opt.tree

a list that can include opt.tree = list("cal", "prd", cols, "zoom", window, "all"). This option list manages the plot of the primary and secondary trees of component clustering, simplified or not, focussed on the main component clusters or not, coloured by the user or not. The items can be given in any order.

"cal" plots the primary tree of component clustering, from the trunk to the leaves. At the trunk level, when all components are grouped into one large, trivial cluster, the coefficient of determination R2 is low. At the leaf level, when each component is isolated in a singleton, the coefficient of determination is always equal to 1. The primary tree is therefore necessarily over-fitted near the leaf level. The optimum number fres$nbOpt of component clusters is determined by the minimum AICc.
The blue dashed line indicates the level (the optimum number fres$nbOpt of component clusters) at which the tree is optimally cut. The red solid line indicates the value of tree efficiency E at the nbcl level. The component clusters are named with lowercase letters, from left to right, as "a", "b", "c", ...; the name and content of each component cluster are written on the following page.
"prd" plots the validated, secondary tree of component clustering, from the trunk to the validated leaves. The secondary tree is the primary tree cut at the level of the optimal number nbOpt of component clusters. nbOpt is determined by the first lowest value of AIC along the primary tree.
The red solid line indicates the value of tree efficiency E. R2 and E are stored in fres$tStats. The component clusters are named with lowercase letters, from left to right, as "a", "b", "c", ...; the name and content of each component cluster are written on the following page.
cols is a vector of colours, characters or integers, of the same length as the number of components. This option specifies the colour of each component. Components labelled with the same value have the same colour. If cols is not specified, components that belong to the same a posteriori cluster have the same colour. This option is useful when an a priori clustering is known, to locate the a priori clusters of components within the a posteriori clustering.
"zoom": if "cal" or "prd" is checked, this option plots only the first, significant component clusters. The cluster on the far right (the cluster named by the last letter) is most often a large cluster that includes many components whose effects on assemblage performance are not significant. When the number of components is large, the tree is dense and the component names become hard to read. The option is useful to focus on the left, more significant, part of the primary or secondary tree. If "zoom" is checked, window must be specified. If not, the function stops with an error message. Note that the large cluster that includes many components is always represented by at least one component.
window is an integer that specifies the number of components to plot. window must be specified when "zoom" is checked. If window is higher than the number of components, it is ignored. If window is lower than the number of significant components, it is adjusted so that the large cluster that includes many components is represented by at least one component.
"all" plots all possible graphs. This option is equivalent to opt.tree = list("cal", "prd", "zoom", window = 20). If the number of components is lower than 20, the option is equivalent to opt.tree = list("cal", "prd").
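As an illustration of the tree options above, the following minimal sketch builds an opt.tree list that zooms on the first components of the primary tree. The window value of 10 is an arbitrary choice for illustration, and the plotting call is only run if the functClust package (assumed to provide fclust_plot and the CedarCreek.2004.res object used in the Examples section) is installed.

```r
# Option list: primary tree ("cal"), zoomed on the first 10 components.
# window = 10 is an arbitrary illustrative value.
opt_tree <- list("cal", "zoom", window = 10)

# Only plot if the package is available (assumption: functClust is the
# package that provides fclust_plot and CedarCreek.2004.res).
if (requireNamespace("functClust", quietly = TRUE)) {
  library(functClust)
  res <- CedarCreek.2004.res
  fclust_plot(res, main = "BioDiv2 2004", opt.tree = opt_tree)
}
```

Because "zoom" requires window, keeping both in the same list avoids the error described above.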

opt.perf

a list that can include opt.perf = list("stats_I", "stats_II", "cal", "prd", "missing", "pub", "calprd", "seq", "ass", "aov", pvalue, "all"). This option list manages the plot of observed, modelled and predicted performances of assemblages, and of the associated statistics. It also allows plotting the performances of some given, identified assemblages. The items can be given in any order.

"stats_I", "stats_II": plot the statistics associated with the fit of the primary tree that best accounts for observed performances ("stats_I"), and of the secondary tree that best predicts observed performances of assemblages ("stats_II"). Four graphs are plotted: 1. the coefficient of determination R2 and efficiency E of the component clustering models (on the y-axis) versus the number of component clusters (on the x-axis); 2. the ratio of assemblage performances that cannot be predicted by cross-validation ("predicting ratio"); 3. and 4. the Akaike Information Criterion, either corrected for small datasets (AICc) or uncorrected (AIC). The green solid line indicates the first minimum of AIC, which corresponds to the optimum number nbOpt of component clusters to consider.
"cal", "prd": plot modelled performances versus observed performances ("cal"), or modelled and cross-validation-predicted performances versus observed performances ("prd"), for a number of component clusters increasing from 1 to the number of component clusters at which efficiency E is maximum. Different symbols correspond to different assembly motifs. The prediction error induced by cross-validation is indicated by a short vertical line.
The blue dashed lines are mean performances. The red solid line is the 1:1 bisector line. The number of component clusters is indicated at the top left of the graph. The predicting ratio and the coefficient of determination R2 of the clustering are indicated at the bottom right of the graph. If "prd" is checked, efficiency E and the E/R2 ratio are added. If "aov" is checked, significantly different groups (at a p-value < pvalue) are indicated by different letters on the right of the graph.
"missing": the option "prd" plots modelled and cross-validation-predicted performances versus observed performances, using different symbols for different assembly motifs. The option "missing" plots the same data, but uses different symbols according to the clustering model used for predicting the performances of assemblages. This option makes it possible to identify assemblages whose performance cannot be predicted using the clustering model of the current level. The assemblages are plotted and named using the symbol corresponding to the level of the clustering model used.
The blue dashed lines are mean performances. The red solid line is the 1:1 bisector line. The number of component clusters is indicated at the top left of the graph. The predicting ratio and the coefficient of determination of the clustering are indicated at the bottom right of the graph. If "aov" is checked, significantly different groups (at a p-value < pvalue) are indicated by different letters on the right of the graph.
"pub": the option "prd" plots modelled and cross-validation-predicted performances versus observed performances, using different symbols for different assembly motifs. The option "pub" plots the same data, but uses only one symbol. This option is useful for publication.
The blue dashed lines are mean performances. The red solid line is the 1:1 bisector line. The number of component clusters is indicated at the top left of the graph. The predicting ratio and the coefficient of determination of the clustering are indicated at the bottom right of the graph. If "aov" is checked, significantly different groups (at a p-value < pvalue) are indicated by different letters on the right of the graph.
"calprd": plot performances predicted by cross-validation versus performances predicted by the clustering model ("modelled performances"). This option is useful to identify which assembly motifs become difficult to predict by cross-validation.
The blue dashed lines are mean performances. The red solid line is the 1:1 bisector line. The number of component clusters is indicated at the top left of the graph. The predicting ratio and the coefficient of determination of the clustering are indicated at the bottom right of the graph. If "aov" is checked, significantly different groups (at a p-value < pvalue) are indicated by different letters on the right of the graph. The letters are located at mean(Fprd[motif == label]).
"seq": plot performances of assembly motifs, for 1 to nbMax component clusters. Recall that the number m of assembly motifs increases with the number nbcl of component clusters (m = 2^nbcl - 1). When the optimal number of component clusters is large, this option is useful to choose a number of component clusters lower than the optimal number. Assembly motifs are named as the combinations of component clusters (see "opt.tree").
"ass": plot the name of each assemblage close to its performance. This option can be used with the options "cal", "prd", "pub" and "calprd". It should be used only if the number of assemblages is small. If the number of assemblages is large, the option "opt.ass" described below is more convenient.
"aov": run a variance analysis of assemblage performances by assembly motif, and plot the result on the right of the graphs. Different letters correspond to groups that are significantly different at a p-value < pvalue. If "aov" is checked, pvalue should be specified; otherwise pvalue = 0.001.
pvalue: a probability used as the threshold in the variance analysis. pvalue must be higher than 0 and lower than 1. pvalue should be specified when "aov" is checked. Significantly different groups (at a p-value < pvalue) are then indicated by different letters on the right of boxplots.
"all": plot all possible graphs. This option is equivalent to opt.perf = list("cal", "prd", "pub", "calprd", "aov", pvalue = 0.001).
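The combination of "prd", "aov" and pvalue described above can be sketched as follows. The threshold of 0.01 is an arbitrary illustrative value, and the plotting call assumes that the functClust package and its CedarCreek.2004.res object (as used in the Examples section) are available.

```r
# Option list: predicted vs observed performances, with a variance
# analysis by assembly motif at an illustrative threshold of 0.01.
opt_perf <- list("prd", "aov", pvalue = 0.01)

# pvalue is documented as lying strictly between 0 and 1.
stopifnot(opt_perf$pvalue > 0, opt_perf$pvalue < 1)

# Only plot if the package is available (assumption: functClust provides
# fclust_plot and CedarCreek.2004.res).
if (requireNamespace("functClust", quietly = TRUE)) {
  library(functClust)
  res <- CedarCreek.2004.res
  fclust_plot(res, main = "BioDiv2 2004", opt.perf = opt_perf)
}
```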

opt.ass

a list that can include opt.ass = list(sample, who). This option plots modelled and cross-validation-predicted performances versus observed performances, for a small sample of randomly drawn assemblages (sample), or for given, identified assemblages chosen by the user (who). The items can be given in any order.

sample: an integer. This integer specifies the number of assemblages to draw at random from the assemblage set; they are then plotted in the same way as with the option opt.perf = list("prd"). All chosen assemblages are plotted on the same graph.
who: a list of assemblage names. The list contains the names of the assemblages to plot. Each assemblage is plotted on its own graph. This option is useful when assemblage performances are observed over several experiments.
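A minimal sketch of the sample option follows. It assumes that sample is passed as a named element of the list, which the description above suggests but does not state explicitly; the value 5 and the seed are arbitrary. The who option is not shown because valid assemblage names depend on the dataset.

```r
# Option list: plot a random sample of 5 assemblages.
# Assumption: `sample` is given as a named list element.
opt_ass <- list(sample = 5)

if (requireNamespace("functClust", quietly = TRUE)) {
  library(functClust)
  res <- CedarCreek.2004.res
  set.seed(1)  # arbitrary seed, in case the draw uses R's random generator
  fclust_plot(res, main = "BioDiv2 2004", opt.ass = opt_ass)
}
```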

opt.motif

a list that can include opt.motif = list("obs", "cal", "prd", cols, "hor", "ver", "seq", pvalue, "all"). This option list manages the plot of mean performances of assembly motifs as boxplots, observed, modelled or predicted by cross-validation, horizontally or vertically, sorted by increasing or decreasing mean values, from 1 to nbOpt clusters of components. The items can be given in any order.

"obs", "cal", "prd": plot the observed, modelled or cross-validation-predicted mean performances of assembly motifs as boxplots. Assembly motifs are named as the combinations of component clusters (see "opt.tree"). The coloured squares are the mean performances of assembly motifs. The size (number of observed assemblages) of each assembly motif is indicated on the left of the boxplots. The red dashed line is the mean performance of assembly motifs. If "aov" is checked, significantly different groups (at a p-value < pvalue) are indicated by different letters on the right of the boxplots.
"hor": plot boxplots as horizontal boxes: the x-axis corresponds to assemblage performances, and the y-axis corresponds to assembly motifs. If "hor" is not checked, boxplots are plotted as vertical boxes: the x-axis corresponds to assembly motifs, and the y-axis corresponds to assemblage performances. The option "ver" can also be used; it is the opposite of "hor".
"seq": plot mean performances of assembly motifs, for 2 to nbOpt component clusters. Recall that the number m of assembly motifs increases with the number nbcl of component clusters (m = 2^nbcl - 1). When the optimal number of component clusters is large, this option is useful to choose a number of component clusters lower than the optimal number. Assembly motifs are named as the combinations of component clusters (see "opt.tree").
pvalue = value: a probability used as the threshold in the variance analysis. pvalue must be higher than 0 and lower than 1. pvalue should be specified when "aov" is checked. Significantly different groups (at a p-value < pvalue) are then indicated by different letters on the right of the boxplots.
"all": plot all possible graphs. This option is equivalent to opt.motif = list("obs", "cal", "prd", "seq", "aov", pvalue = 0.001).
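The relation m = 2^nbcl - 1 mentioned for "seq" counts every non-empty combination of component clusters. The base R sketch below shows how quickly m grows and enumerates the combinations for three clusters. The helper n_motifs and the concatenated labels such as "ab" are illustrative only; the labels actually drawn by fclust_plot may be formatted differently.

```r
# Number of possible assembly motifs for nbcl component clusters
# (hypothetical helper, not part of the package).
n_motifs <- function(nbcl) 2^nbcl - 1

# Growth of the number of motifs with the number of clusters.
data.frame(nbcl = 1:6, motifs = n_motifs(1:6))

# Enumerate the non-empty combinations of three clusters "a", "b", "c".
clusters <- letters[1:3]
motifs <- unlist(lapply(seq_along(clusters), function(k) {
  combn(clusters, k, FUN = paste, collapse = "")
}))
motifs
length(motifs) == n_motifs(length(clusters))
```

With six clusters there are already 63 possible motifs, which is why a number of clusters lower than the optimum can be easier to read.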

opt.comp

a list that can include opt.comp = list("tree", "perf", "hor", "ver", cols, pvalue, "zoom", window, "all"). This option list manages the plot, as boxplots, of observed mean performances of assemblages that contain a given component, horizontally or vertically, with components sorted by increasing or decreasing mean values, or sorted in the same order as the clustering tree. The items can be given in any order.

"tree", "perf": plot the observed mean performances of assemblages that contain a given component as boxplots. Each set of assemblages that contains a given component is named after that component. The coloured squares are the mean performances of assemblage sets. The size (number of observed assemblages) of each assemblage set is indicated on the left of the boxplots. The red dashed line is the mean performance of assemblage sets. If "aov" is checked, significantly different groups (at a p-value < pvalue) are indicated by different letters on the right of the boxplots.
If "tree" is checked, mean performances of assemblages that contain a given component are sorted in the same order as the clustering tree. If "perf" is checked, mean performances of assemblages that contain a given component are sorted by increasing mean performance.
"hor": plot boxplots as horizontal boxes: the x-axis corresponds to assemblage performances, and the y-axis corresponds to assemblage sets. If "hor" is not checked, boxplots are plotted as vertical boxes: the x-axis corresponds to assemblage sets, and the y-axis corresponds to assemblage performances. The option "ver" can also be used; it is the opposite of "hor".
cols: a vector of integers, of the same length as the number of components. This option specifies the colour of each component. Components labelled with the same integer have the same colour. If cols is not specified, components that belong to the same a posteriori cluster have the same colour. This option is useful when an a priori clustering is known, to locate the a priori clusters of components within the a posteriori clustering.
pvalue = value: a probability used as the threshold in the variance analysis. pvalue must be higher than 0 and lower than 1. pvalue should be specified when "aov" is checked. Significantly different groups (at a p-value < pvalue) are then indicated by different letters on the right of the boxplots.
"all": plot all possible graphs. This option is equivalent to opt.comp = list("tree", "aov", pvalue = 0.001, "zoom", window = 20).
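As a sketch of the component options above, the list below requests horizontal boxplots sorted in tree order, with a variance analysis. The items are taken from the option descriptions; the plotting call assumes that the functClust package and its CedarCreek.2004.res object (as used in the Examples section) are available.

```r
# Option list: boxplots per component, sorted like the clustering tree,
# drawn horizontally, with a variance analysis at the default threshold.
opt_comp <- list("tree", "hor", "aov", pvalue = 0.001)

if (requireNamespace("functClust", quietly = TRUE)) {
  library(functClust)
  res <- CedarCreek.2004.res
  fclust_plot(res, main = "BioDiv2 2004", opt.comp = opt_comp)
}
```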

opt.all

This option is equivalent to opt.tree = "all", opt.comp = "all", opt.motif = "all", opt.perf = "all". It is convenient for getting an overview of the different options of the function fclust_plot.

Value

Nothing. The function is a procedure, called for its side effect of producing plots.

Details

If all the options are NULL, that is opt.tree = NULL, opt.perf = NULL, opt.ass = NULL, opt.motif = NULL, opt.comp = NULL, opt.all = NULL, the function plots the main results, which are: the secondary tree (opt.tree = "prd"), assembly motifs as horizontal boxplots (opt.motif = list("obs", "hor")), and modelled and cross-validation-predicted mean performances versus observed performances (opt.perf = "prd").
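The default behaviour described above can be written out explicitly. The sketch below collects the three stated default options in a list and passes them with do.call; under the description above, this should produce the same graphs as calling fclust_plot(res) with no options, although that equivalence has not been checked here. The name default_opts is illustrative.

```r
# Explicit form of the documented defaults (illustrative variable name).
default_opts <- list(
  opt.tree  = "prd",
  opt.motif = list("obs", "hor"),
  opt.perf  = "prd"
)

if (requireNamespace("functClust", quietly = TRUE)) {
  library(functClust)
  res <- CedarCreek.2004.res
  # Expected to match fclust_plot(res, main = "BioDiv2 2004").
  do.call(fclust_plot, c(list(res, main = "BioDiv2 2004"), default_opts))
}
```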

References

Jaillard, B., Richon, C., Deleporte, P., Loreau, M. and Violle, C. (2018) An a posteriori species clustering for quantifying the effects of species interactions on ecosystem functioning. Methods in Ecology and Evolution, 9:704-715. https://doi.org/10.1111/2041-210X.12920.

Jaillard, B., Deleporte, P., Loreau, M. and Violle, C. (2018) A combinatorial analysis using observational data identifies species that govern ecosystem functioning. PLoS ONE 13(8): e0201135. https://doi.org/10.1371/journal.pone.0201135.

Examples


# NOT RUN {
res <- CedarCreek.2004.res

# plot the hierarchical tree of functionally redundant components
fclust_plot(res, main = "BioDiv2 2004", opt.tree = "prd")

# plot AIC and AICc versus the number of clusters of components
layout(matrix(c(1,2,3,4), nrow = 2, ncol = 2, byrow = TRUE))
fclust_plot(res, main = "BioDiv2 2004", opt.perf = "stats_II")
layout(1)

# plot the performances modelled and predicted versus observed performances
fclust_plot(res, main = "BioDiv2 2004", opt.perf = "prd")

# plot the performances sorted by assembly motifs
layout(matrix(c(1,2), nrow = 1, ncol = 2, byrow = TRUE))
fclust_plot(res, main = "BioDiv2 2004",
            opt.motif = c("obs", "prd", "hor"))
layout(1)


# }
