compare.datasets: Compare datasets for matching variables (species)

Description

Compare two datasets and summarise species occurrance and abundance of species recorded in dataset one across dataset two. Useful for examining the conformity between sediment core and training set species data.

Usage

compare.datasets(y1, y2, n.cut=c(5, 10, 20, 50), 
      max.cut=c(2, 5, 10, 20, 50))

Value

Function compare.datasets returns a list with two names elements:

vars: data frame listing for each variable in the first dataset: N.occur = number of occurences in dataset 1, N2, Hill's N2 for species in dataset 1, Max = maximum value in dataset 1, N.2 = number of occurences in dataset 2, N2.2 = Hill's N2 for species in dataset 2, Max.2 = maximum value in dataset 2, N.005, number of occurences where the species is greater than 5 etc.
objs: data frame listing for each observation in the first dataset: N.taxa = number of species greater than zero abundance, N2, Hill's N2 for samples, Max = maximum value, total = sample total, M.002 = number of taxa with a maximum abundance greater than 2 2 etc., N2.005 = number of taxa in dataset 1 with more than 5 occurences in 2 dataset 2 etc., Sum.N2.005 = sample total including only those taxa with at least 5 occurrences in dataset 2 etc., M2.005 = number of taxa in dataset 1 with maximum abundance greater than 2 in dataset 2 etc., and Sum. M2.005 = sample total including only those taxa with a maximum abundance greater than 2 in dataset 2 etc.

Arguments

y1, y2: two data frames or matrices, usually of biological species abundance data, to compare.
n.cut: vector of abundances to be used for species occurrence calculations (see details).
max.cut: vector of occurences to be used for species maximum abundance calculations (see details).

Author

Steve Juggins

Details

Function compare.datasets compares two datasets. It summarise the species profile (number of occurences etc.) and sample profile (number of species in each sample etc.) of dataset 1. For those species recorded in dataset 1 it also provides summaries of their occurence and abundance in dataset 2. It is useful diagnostic for checking the conformity between core and training set data, specifically for identifying core taxa absent from the training set, and core samples with portions of their assemblage missing from the training set.

plot.compare.datasets provides a simple visualisation of the comparisons. It produces a matrix of plots, one for each sample in dataset 1, showing the abundance of each taxon in dataset 1 (x-axis) against the N2 value of that taxon in dataset 2 (y-axis, with symbols scaled according to abundance in dataset 2. The plots shouls aid identification of samples with high abundance of taxa that are rare (low N2) or have low abundance in the training set. Taxa thar are absent from the training set are indicated with a red "+".

Examples

Run this code

# compare diatom data from core from Round Loch of Glenhead
# with SWAP surface sample dataset
data(RLGH)
data(SWAP)
result <- compare.datasets(RLGH$spec, SWAP$spec)
result

Run the code above in your browser using DataLab