predict.indicators: Predicts site group from indicators

Description

Function predict.indicators takes an object of class indicators and determines the probability of the indicated site group given a community data set. If no new data set is provided, the function can calculate the probabilities corresponding to the original sites used to build the indicators object.

Usage

# S3 method for indicators
predict(object, newdata = NULL, cv = FALSE, ...)

Value

If confidence intervals are available in x, function predict.indicators returns a matrix where communities are in rows and there are three columns, correspoinding to the probability of the indicated site group along with the confidence interval. If confidence intervals are not available in x, or if cv = TRUE, then predict.indicators returns a single vector with the probability of the indicated site group for each community.

Arguments

object: An object of class 'indicators'.
newdata: A community data table (with sites in rows and species in columns) for which predictions are needed. This table can contain either presence-absence or abundance data, but only presence-absence information is used for the prediction. If NULL, then the original data set used to derive the indicators object is used as data.
cv: A boolean flag to indicate that probabilities should be calculated using leave-one-out cross validation (i.e recalculating positive predictive value of indicators after excluding the target site).
...: In function predict, additional arguments not used (included for compatibility with predict).

Author

Miquel De Cáceres Ainsa, EMF-CREAF

Details

Function indicators explores the indicator value of the simultaneous occurrence of sets of species (i.e. species combinations). The method is described in De Cáceres et al. (2012) and is a generalization of the Indicator Value method of Dufrêne & Legendre (1997). The current function predict.indicators is used to predict the indicated site group from the composition of a new set of observations. For communities where one or more of the indicator species combinations are found, the function returns the probability associated to the indicator that has the highest positive predictive value (if confidence intervals are available, the maximum value is calculated across the lower bounds of the confidence interval). For communities where none of the indicator species combinations is found, the function returns zeroes. If newdata = NULL, the function can be used to evaluate the predictive power of a set of indicators in a cross-validated fashion. For each site in the data set, recalculates the predictive value of indicators after excluding the information of the site, and then evaluates the probability of the site group.

References

De Cáceres, M., Legendre, P., Wiser, S.K. and Brotons, L. 2012. Using species combinations in indicator analyses. Methods in Ecology and Evolution 3(6): 973-982.

Dufrêne, M. and P. Legendre. 1997. Species assemblages and indicator species: The need for a flexible asymetrical approach. Ecological Monographs 67:345-366.

Examples

Run this code

library(stats)

data(wetland) ## Loads species data

## Creates three clusters using kmeans
wetkm <- kmeans(wetland, centers=3) 


## Run indicator analysis with species combinations for the first group
sc <- indicators(X=wetland, cluster=wetkm$cluster, group=1, verbose=TRUE, At=0.5, Bt=0.2)

## Use the indicators to make predictions of the probability of group #1
## Normally an independent data set should be used, because 'wetland' was used to derive
## indicators. The same would be obtained calling 'predict(sc)' without further arguments.
p <- predict(sc, wetland)

## Calculate cross-validated probabilities (recalculates 'A' statistics once for each site 
## after excluding it, and then calls predict.indicators for that site)
pcv <- predict(sc, cv = TRUE)

## Show original membership to group 1 along with (resubstitution) predicted probabilities  
## and cross-validated probabilities. Cross-validated probabilities can be lower for sites
## originally belonging to the target site group and higher for other sites.
data.frame(Group1 = as.numeric(wetkm$cluster==1), Prob = p, Prob_CV = pcv)

Run the code above in your browser using DataLab