bioenv: Best Subset of Environmental Variables with Maximum (Rank) Correlation with Community Dissimilarities

Description

Function finds the best subset of environmental variables, so that the Euclidean distances of scaled environmental variables have the maximum (rank) correlation with community dissimilarities.

Usage

## Default S3 method:
bioenv(comm, env, method = "spearman", index = "bray", upto = ncol(env), trace = FALSE, partial = NULL, metric = c("euclidean", "mahalanobis", "manhattan", "gower"), parallel = getOption("mc.cores"), ...)
## S3 method for class 'formula'
bioenv(formula, data, ...)
bioenvdist(x, which = "best")

Arguments

comm

Community data frame or a dissimilarity object or a square matrix that can be interpreted as dissimilarities.

env

Data frame of continuous environmental variables.

method

The correlation method used in cor.

index

The dissimilarity index used for community data (comm) in vegdist. This is ignored if comm are dissimilarities.

upto

Maximum number of parameters in studied subsets.

formula, data

Model formula and data.

trace

Trace the calculations.

partial

Dissimilarities partialled out when inspecting variables in env.

metric

Metric used for the environmental distances. See Details.

parallel

Number of parallel processes or a predefined socket cluster. Setting parallel = 1 uses ordinary, non-parallel processing. Parallel processing is done with the parallel package.

x

A bioenv result object.

which

The number of the model for which the environmental distances are evaluated, or the "best" model.

...

Other arguments passed to cor.

Value

The function returns an object of class bioenv with a summary method.

Details

The function calculates a community dissimilarity matrix using vegdist. Then it selects all possible subsets of environmental variables, scales the variables, and calculates Euclidean distances for this subset using dist. The function finds the correlation between community dissimilarities and environmental distances, and for each size of subsets, saves the best result. There are $2^p-1$ subsets of $p$ variables, and an exhaustive search may take a very, very, very long time (parameter upto offers a partial relief).
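The search described above can be sketched in base R. This is a minimal illustration on simulated data, not vegan's own implementation: the helper best_subsets and the objects env and comm_d are hypothetical names, and the community dissimilarities are replaced by stand-in distances so that the example needs no packages.

```r
# Minimal sketch of an exhaustive BIO-ENV style search (not vegan's own code).
# best_subsets, env and comm_d are hypothetical names used for illustration.
best_subsets <- function(comm_d, env, upto = ncol(env), method = "spearman") {
  p <- ncol(env)
  out <- vector("list", upto)
  for (size in seq_len(upto)) {
    sets <- combn(p, size)                      # all subsets of this size
    r <- apply(sets, 2, function(idx) {
      # scale the chosen variables, then take Euclidean distances
      env_d <- dist(scale(env[, idx, drop = FALSE]))
      cor(as.vector(comm_d), as.vector(env_d), method = method)
    })
    best <- which.max(r)                        # keep the best subset per size
    out[[size]] <- list(vars = colnames(env)[sets[, best]], cor = r[best])
  }
  out
}

set.seed(1)
env <- as.data.frame(matrix(rnorm(20 * 4), 20, 4,
                            dimnames = list(NULL, c("a", "b", "c", "d"))))
# Stand-in "community" distances that depend on variables a and b only
comm_d <- dist(scale(env[, c("a", "b")]))

res <- best_subsets(comm_d, env)
n_subsets <- 2^ncol(env) - 1                    # 15 subsets for p = 4
res[[2]]$vars
```

The number of subsets doubles with every added variable, which is why restricting the subset size with upto shortens the search.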

The argument metric defines distances in the given set of environmental variables. With metric = "euclidean", the variables are scaled to unit variance and Euclidean distances are calculated. With metric = "mahalanobis", the Mahalanobis distances are calculated: in addition to scaling to unit variance, the matrix of the current set of environmental variables is also made orthogonal (uncorrelated). With metric = "manhattan", the variables are scaled to unit range and Manhattan distances are calculated, so that the distances are sums of differences of environmental variables. With metric = "gower", the Gower distances are calculated using function daisy (package cluster). This also allows using factor variables, but with continuous variables the results are equal to metric = "manhattan".
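The first three distance constructions can be illustrated in base R as follows. This is a sketch on hypothetical simulated data x, and vegan's internal code may differ in detail; the Mahalanobis case is done here by whitening the data with a Cholesky factor, which is one standard way to obtain uncorrelated unit-variance variables.

```r
# Sketch of the distance constructions described above (base R only;
# vegan's internal code may differ in detail). x is hypothetical data.
set.seed(2)
x <- matrix(rnorm(15 * 3), 15, 3)
x[, 2] <- x[, 2] + 0.8 * x[, 1]                 # make two variables correlated

# "euclidean": scale to unit variance, then Euclidean distances
d_euc <- dist(scale(x))

# "mahalanobis": additionally decorrelate the variables (whitening)
x_white <- scale(x, scale = FALSE) %*% solve(chol(cov(x)))
d_mah <- dist(x_white)

# "manhattan": scale each variable to unit range, then sum absolute differences
ranges <- apply(x, 2, function(v) diff(range(v)))
d_man <- dist(sweep(x, 2, ranges, "/"), method = "manhattan")

# The whitened Euclidean distance agrees with stats::mahalanobis, which
# returns squared distances
all.equal(as.matrix(d_mah)[1, 2]^2,
          unname(mahalanobis(x[1, ], x[2, ], cov(x))))
```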

The function can be called with a model formula where the LHS is the data matrix and RHS lists the environmental variables. The formula interface is practical in selecting or transforming environmental variables.

With argument partial you can perform “partial” analysis. The partializing item must be a dissimilarity object of class dist. The partial item can be used with any correlation method, but it is strictly correct only for Pearson.
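The caveat about Pearson correlation can be illustrated with the standard first-order partial correlation formula. The sketch below assumes that formula (the text above does not spell it out) and uses a hypothetical helper partial_cor on plain simulated vectors; in a BIO-ENV setting x, y and z would be the vectors of community, environmental and partialled-out dissimilarities.

```r
# First-order partial correlation from three pairwise correlations.
# partial_cor is a hypothetical helper, not part of vegan's interface.
partial_cor <- function(rxy, rxz, ryz) {
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

set.seed(3)
z <- rnorm(50)
x <- z + rnorm(50)
y <- z + rnorm(50)
r_partial <- partial_cor(cor(x, y), cor(x, z), cor(y, z))

# With Pearson correlation this equals the correlation of the residuals
# left after regressing x and y on z
r_resid <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
all.equal(r_partial, r_resid)
```

The residual identity is exact for Pearson correlation; applying the same formula to rank correlations is an approximation.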

Function bioenvdist recalculates the environmental distances used within the function. The default is to calculate distances for the best model, but the number of any model can be given.

Clarke & Ainsworth (1993) suggested this method to be used for selecting the best subset of environmental variables in interpreting results of nonmetric multidimensional scaling (NMDS). They recommended a parallel display of NMDS of community dissimilarities and NMDS of Euclidean distances from the best subset of scaled environmental variables. They warned against the use of Procrustes analysis, but to me this looks like a good way of comparing these two ordinations.
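The two ordinations mentioned above could be compared along the following lines. This is a sketch that assumes the vegan package is installed; it uses vegan's metaMDS, vegdist and procrustes functions, and the object names are illustrative only.

```r
# Sketch only: requires the vegan package; object names are illustrative.
if (requireNamespace("vegan", quietly = TRUE)) {
  library(vegan)
  data(varespec)
  data(varechem)
  sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem)

  env_d  <- bioenvdist(sol, which = "best")     # distances for the best subset
  comm_d <- vegdist(wisconsin(varespec))        # Bray-Curtis, the default index

  ord_comm <- metaMDS(comm_d, trace = FALSE)    # NMDS of community dissimilarities
  ord_env  <- metaMDS(env_d, trace = FALSE)     # NMDS of environmental distances

  pro <- procrustes(ord_comm, ord_env)          # rotate env ordination to community
  print(pro)
}
```

NMDS uses random starts, so the fitted configurations and the Procrustes statistic can vary between runs.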

Clarke & Ainsworth wrote a computer program BIO-ENV giving the name to the current function. Presumably BIO-ENV was later incorporated in Clarke's PRIMER software (available for Windows). In addition, Clarke & Ainsworth suggested a novel method of rank correlation which is not available in the current function.

References

Clarke, K. R. & Ainsworth, M. 1993. A method of linking multivariate community structure to environmental variables. Marine Ecology Progress Series, 92, 205-219.

Examples

# The method is very slow for large number of possible subsets.
# Therefore only 6 variables in this example.
library(vegan)
data(varespec)
data(varechem)
sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem)
sol
summary(sol)