pertables: Function to incorporate the effect of taxonomic uncertainty on multivariate analyses of ecological data.

Description

This function implements a permutational method to incorporate taxonomic uncertainty into the multivariate analyses typically used for ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites.

Usage

pertables(data, index = NULL, nsim = 100)
pertables.p2(data, index = NULL, nsim = 100, ncl=2, iseed = NULL)

Value

The function returns a list of class pertables with the following components:

taxunc: Summary of the number of species fully identified (0), identified to genus (1), identified to family (2), or fully undetermined (3).
pertables: A list with all the simulated data matrices.
raw: The raw data matrix, without the unidentified species.

Arguments

data: Community data matrix. The three first columns are factors referring to the family, genus and species specific names. The remaining columns are numeric vectors indicating species abundances at each site.
index: List of additional parameters to determine the level at which species have been identified. Default values include 'Indet', 'indet', 'sp', 'sp1' to 'sp100', 'sp 1' to 'sp 100', '', and ' '.
nsim: Number of simulations of species' identities, i.e., number of data tables to simulate.
ncl: Number of clusters for parallel simulation.
iseed: An integer to be supplied to clusterSetRNGStream, or NULL not to set reproducible seeds.
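
To illustrate the expected layout of data, the following sketch builds a small, entirely hypothetical community table (the taxa, site names and abundances are invented for illustration) and counts, for each row, how many of the three taxonomic columns carry one of the default undetermined labels. Reading that count as the 0 to 3 level reported in taxunc is an assumption made here for illustration; it is not taken from the package source.

```r
# Hypothetical toy input: three taxonomic columns followed by one
# numeric abundance column per site.
toy <- data.frame(
  Family  = factor(c("Myrtaceae", "Myrtaceae", "Myrtaceae", "Lauraceae", "Indet")),
  Genus   = factor(c("Eugenia", "Eugenia", "Indet", "Ocotea", "Indet")),
  Species = factor(c("nesiotica", "sp1", "sp2", "sp", "sp3")),
  siteA   = c(2, 3, 0, 1, 0),
  siteB   = c(5, 0, 1, 0, 2),
  siteC   = c(0, 0, 4, 2, 0)
)

# Default undetermined labels as listed for the index argument.
undet <- c("Indet", "indet", "sp", paste0("sp", 1:100),
           paste("sp", 1:100), "", " ")

# Assumption: a taxon's level is the number of taxonomic columns
# (family, genus, species) whose label is undetermined.
is_undet <- sapply(toy[, 1:3], function(col) as.character(col) %in% undet)
level <- rowSums(is_undet)

table(level)
```

Labels that are not in the default list, such as the "sp.1" style used in the Amazonia data, would need to be supplied through index.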

Author

Luis Cayuela and Marcelino de la Cruz

Details

The procedure is implemented in two sequential steps:

Step 1. Morphospecies identified only to genus are randomly re-assigned, with equal probability, within the group of species and morphospecies that share the same genus, provided they are not found in the same sites. In the re-assignment, the morphospecies considered can also retain its own identity. For instance, assume we have three floristic inventories. In site A we have Eugenia sp1 and E. nesiotica. In site B we have Eugenia nesiotica, E. principium and E. salamensis. In site C we have Eugenia sp2 and E. salamensis. Eugenia sp1 can thus be re-identified with equal probability as Eugenia sp2, E. principium or E. salamensis, or it can keep its own identity (Eugenia sp1). In the latter case, we assume that E. sp1 is a completely different species, although we do not know its true identity. By contrast, we cannot re-identify E. sp1 as E. nesiotica because they were found in the same site, so we are quite certain that E. sp1 is different from E. nesiotica. The same procedure is applied to species identified only to family and to fully unidentified species. Note that when collating inventories from different researchers, all unidentified species must be renamed, because two researchers can use the same label, e.g. Eugenia sp1, even though this name does not necessarily refer to the same species. To verify the biological identity of Eugenia sp1, one would need to cross-check the vouchers bearing the same name.
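
The Eugenia example can be sketched in a few lines of R. This is a minimal illustration of the co-occurrence rule only, not the package's implementation: the abundance values and the helper candidate_identities are hypothetical, and the merging of abundances after a re-assignment is left out.

```r
# Toy abundance matrix for the genus Eugenia: rows are taxa, columns are
# sites A, B and C. The abundance values are invented for illustration.
abund <- matrix(
  c(3, 0, 0,   # Eugenia sp1
    2, 5, 0,   # Eugenia nesiotica
    0, 1, 0,   # Eugenia principium
    0, 4, 2,   # Eugenia salamensis
    0, 0, 6),  # Eugenia sp2
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("sp1", "nesiotica", "principium", "salamensis", "sp2"),
    c("A", "B", "C")
  )
)

# Hypothetical helper: identities a morphospecies may receive, i.e. every
# congener that never shares a site with it, plus its own label.
candidate_identities <- function(abund, target) {
  present <- abund > 0
  target_sites <- present[target, ]
  # A taxon co-occurs with the target if both are present in some site.
  co_occurs <- apply(present, 1, function(p) any(p & target_sites))
  co_occurs[target] <- FALSE  # the target may keep its own identity
  rownames(abund)[!co_occurs]
}

cands <- candidate_identities(abund, "sp1")
print(cands)

# Draw one identity with equal probability among the candidates.
set.seed(1)
new_id <- sample(cands, 1)
```

Here E. nesiotica is excluded for sp1 because both occur in site A, which matches the reasoning in the example above.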

Step 2. Step 1 is iterated nsim times. As a result, nsim matrices are obtained, all of which contain the same number of sites but a variable number of species, depending on the resulting re-assignment of morphospecies. The process can be time-consuming if community data matrices are large.

Function pertables.p2 implements a parallelized version which considerably reduces computation time.
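
The following sketch shows the general pattern of repeating a randomization nsim times, first sequentially and then on a cluster seeded with clusterSetRNGStream from the parallel package. It is not the code of pertables.p2: the function one_pass is a hypothetical stand-in that draws a single identity from a fixed candidate set, whereas the real function re-assigns every undetermined taxon in the table.

```r
library(parallel)

# Hypothetical stand-in for one pass of Step 1.
candidates <- c("sp1", "sp2", "principium", "salamensis")
one_pass <- function(i, candidates) sample(candidates, 1)

nsim <- 8

# Sequential version: one draw per simulation.
set.seed(42)
seq_ids <- vapply(seq_len(nsim), one_pass, character(1),
                  candidates = candidates)

# Parallel version: ncl workers, with reproducible random-number streams
# when iseed is an integer (NULL leaves the streams unseeded).
run_parallel <- function(nsim, ncl, iseed, candidates) {
  cl <- makeCluster(ncl)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed)
  unlist(parLapply(cl, seq_len(nsim), one_pass, candidates = candidates))
}

par_ids_1 <- run_parallel(nsim, ncl = 2, iseed = 2011, candidates = candidates)
par_ids_2 <- run_parallel(nsim, ncl = 2, iseed = 2011, candidates = candidates)

# Compare two runs made with the same seed and the same number of workers.
print(identical(par_ids_1, par_ids_2))
```

Reproducibility of this kind depends on using the same seed and the same number of workers, since the simulations are split across workers in a fixed way.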

References

Cayuela, L., De la Cruz, M. and Ruokolainen, K. (2011). A method to incorporate the effect of taxonomic uncertainty on multivariate analyses of ecological data. Ecography, 34: 94-102. http://dx.doi.org/10.1111/j.1600-0587.2009.05899.x.

Examples

data(Amazonia)
data(soils)

# Define a new index that includes the terms used in the Amazonia dataset to define
# undetermined taxa at different taxonomic levels

index.Amazon <- c(paste0("sp.", 1:20), "Indet.", "indet.")

# Generate a pertables object (i.e. a list of biological data tables simulated from taxonomic
# uncertainty)

if (FALSE) {
  # Compare the performance of pertables and pertables.p2
  nsim <- 100
  ncl <- 2
  gc()
  t0 <- Sys.time()
  Amazonia100 <- pertables(Amazonia, index = index.Amazon, nsim = nsim)
  Sys.time() - t0
  gc()
  t0 <- Sys.time()
  Amazonia100.p2 <- pertables.p2(Amazonia, index = index.Amazon, nsim = nsim, ncl = ncl)
  Sys.time() - t0
}
# Example for Rcheck

Amazonia4.p2<- pertables.p2(Amazonia, index=index.Amazon, nsim=4, ncl=2)
