rioja (version 1.0-7)

WA: Weighted averaging (WA) regression and calibration

Description

Functions for reconstructing (predicting) environmental values from biological assemblages using weighted averaging (WA) regression and calibration.

Usage

WA(y, x, mono=FALSE, tolDW = FALSE, use.N2=TRUE, tol.cut=.01, 
      check.data=TRUE, lean=FALSE)

WA.fit(y, x, mono=FALSE, tolDW=FALSE, use.N2=TRUE, tol.cut=.01, lean=FALSE)

# S3 method for WA
predict(object, newdata=NULL, sse=FALSE, nboot=100, match.data=TRUE, 
      verbose=TRUE, ...)

# S3 method for WA
crossval(object, cv.method="loo", verbose=TRUE, ngroups=10, nboot=100, 
      h.cutoff=0, h.dist=NULL, ...)

# S3 method for WA
performance(object, ...)

# S3 method for WA
rand.t.test(object, n.perm=999, ...)

# S3 method for WA
print(x, ...)

# S3 method for WA
summary(object, full=FALSE, ...)

# S3 method for WA
plot(x, resid=FALSE, xval=FALSE, tolDW=FALSE, deshrink="inverse", 
      xlab="", ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE, 
      add.smooth=FALSE, ...)

# S3 method for WA
residuals(object, cv=FALSE, ...)

# S3 method for WA
coef(object, ...)

# S3 method for WA
fitted(object, ...)

Value

Function WA returns an object of class WA with the following named elements:

coefficients

species coefficients ("optima" and, optionally, "tolerances").

deshrink.coefficients

deshrinking coefficients.

tolDW

logical indicating whether the model includes tolerance-downweighted results.

fitted.values

fitted values for the training set.

call

original function call.

x

environmental variable used in the model.

If function predict is called with newdata=NULL it returns the fitted values of the original model, otherwise it returns a list with the following named elements:

fit

predicted values for newdata.

If sample specific errors were requested the list will also include:

fit.boot

mean of the bootstrap estimates of newdata.

v1

standard error of the bootstrap estimates for each new sample.

v2

root mean squared error for the training set samples, across all bootstrap samples.

SEP

standard error of prediction, calculated as the square root of v1^2 + v2^2.
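
The way the two error components combine into SEP can be illustrated with a few lines of base R. The numbers below are hypothetical and are not output from rioja; they only show the arithmetic stated above.

```r
# Hypothetical error components for three new samples (not real rioja output)
v1 <- c(0.10, 0.15, 0.08)  # standard error of the bootstrap estimates, per sample
v2 <- 0.30                 # training-set RMSE across all bootstrap samples

# Standard error of prediction: square root of v1^2 + v2^2
SEP <- sqrt(v1^2 + v2^2)
```

Because v2 is shared by every sample, differences in SEP between samples come entirely from v1.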

Function crossval also returns an object of class WA and adds the following named elements:

predicted

predicted values of each training set sample under cross-validation.

residuals.cv

prediction residuals.

Function performance returns a matrix of performance statistics for the WA model. See performance for a description of the summary statistics.

Arguments

y

a data frame or matrix of biological abundance data.

x, object

a vector of environmental values to be modelled or an object of class WA.

newdata

new biological data to be predicted.

mono

logical to perform monotonic curvilinear deshrinking.

tolDW

logical to include regressions and predictions using tolerance downweighting.

use.N2

logical to adjust tolerance by species N2 values.

tol.cut

tolerances less than tol.cut are replaced by the mean tolerance.

check.data

logical to perform simple checks on the input data.

lean

logical to exclude some output from the resulting models (used when cross-validating to speed calculations).

full

logical to show head and tail of output in summaries.

match.data

logical indicating whether the function will match the two species datasets by their column names. Only set this to FALSE if you are sure the column names match exactly.

resid

logical to plot residuals instead of fitted values.

xval

logical to plot cross-validation estimates.

xlab, ylab, xlim, ylim

additional graphical arguments to plot.WA.

deshrink

deshrinking type to show in plot.

add.ref

add 1:1 line on plot.

add.smooth

add loess smooth to plot.

cv.method

cross-validation method, either "loo", "lgo", "bootstrap" or "h-block".

verbose

logical to show feedback during cross-validation.

nboot

number of bootstrap samples.

ngroups

number of groups in leave-group-out cross-validation.

h.cutoff

cutoff for h-block cross-validation. Only training samples greater than h.cutoff from each test sample will be used.

h.dist

distance matrix for use in h-block cross-validation. Usually a matrix of geographical distances between samples.

sse

logical indicating that sample specific errors should be calculated.

n.perm

number of permutations for randomisation t-test.

cv

logical to indicate model or cross-validation residuals.

...

additional arguments.

Author

Steve Juggins

Details

Function WA performs weighted averaging (WA) regression and calibration. Weighted averaging has a long history in ecology and forms the basis of many biotic indices. It was popularised in palaeolimnology by ter Braak and van Dam (1989) and Birks et al. (1990), following ter Braak and Barendregt (1986) and ter Braak and Looman (1986), who demonstrated its theoretical properties in providing a robust and simple alternative to species response modelling using Gaussian logistic regression. Function WA predicts environmental values from sub-fossil biological assemblages, given a training dataset of modern species and environmental data. It calculates estimates using inverse and classical deshrinking and, optionally, with taxa downweighted by their tolerances. Prediction errors and model complexity (simple or tolerance-downweighted WA) can be estimated by cross-validation using crossval, which implements leave-one-out, leave-group-out, and bootstrapping. With leave-group-out one may also supply a vector of group memberships for more carefully designed cross-validation experiments.
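
The regression and calibration steps described above can be sketched in base R. This is a minimal illustration on made-up data, not rioja's implementation: the names wa_optima, wa_initial, deshrink and wa_fitted and the toy matrix are hypothetical, and only simple WA with inverse deshrinking is shown (no tolerance downweighting or classical deshrinking).

```r
# Toy training set (hypothetical): 4 samples (rows) x 3 taxa (columns)
y <- matrix(c(10, 0,  0,
               5, 5,  0,
               0, 5,  5,
               0, 0, 10),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), c("taxA", "taxB", "taxC")))
x <- c(5, 6, 7, 8)  # observed environmental values, one per sample

# WA regression: a taxon's optimum is the abundance-weighted mean of x
# (y * x multiplies row i of y by x[i] through column-wise recycling)
wa_optima <- colSums(y * x) / colSums(y)

# WA calibration: a sample's initial estimate is the abundance-weighted
# mean of the optima of the taxa it contains
wa_initial <- as.vector(y %*% wa_optima) / rowSums(y)

# Taking averages twice shrinks the range of the estimates, so inverse
# deshrinking regresses the observed x on the initial estimates
deshrink <- coef(lm(x ~ wa_initial))
wa_fitted <- deshrink[1] + deshrink[2] * wa_initial
```

In this toy example the initial estimates span a narrower range than x, and the deshrinking slope is greater than 1, which stretches them back out.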

Function predict predicts values of the environmental variable for newdata or returns the fitted (predicted) values from the original modern dataset if newdata is NULL. Variables are matched between training and newdata by column name (if match.data is TRUE). Use compare.datasets to assess conformity of two species datasets and identify possible no-analogue samples.
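
Matching by column name can be pictured with the base R sketch below. It is not rioja's internal code: the data frames train and new, the object aligned, and the choice to fill taxa that are absent from the new data with zeros are all assumptions made for illustration.

```r
# Hypothetical training and new datasets with partly different taxa
train <- data.frame(taxA = c(10, 5), taxB = c(0, 5), taxC = c(2, 1))
new   <- data.frame(taxB = c(3, 4), taxD = c(7, 1), taxA = c(1, 0))

# Taxa present in both datasets, in training-set column order
shared <- intersect(colnames(train), colnames(new))

# Rebuild the new data on the training-set columns; taxa missing from the
# new data are set to zero (an assumption), and taxa missing from the
# training set (here taxD) are dropped
aligned <- matrix(0, nrow = nrow(new), ncol = ncol(train),
                  dimnames = list(rownames(new), colnames(train)))
aligned[, shared] <- as.matrix(new[, shared])
```

Here taxD has no coefficient in the training set and so cannot contribute to a prediction, which is one reason to inspect the overlap between datasets before predicting.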

Function rand.t.test performs a randomisation t-test to test the significance of the difference in cross-validation RMSE between tolerance-downweighted and simple WA, after van der Voet (1994).

WA has methods fitted and residuals that return the fitted values (estimates) and residuals for the training set, performance, which returns summary performance statistics (see below), coef, which returns the species coefficients (optima and tolerances), and print and summary to summarise the output. WA also has a plot method that produces scatter plots of predicted vs observed measurements for the training set.

References

Birks, H.J.B., Line, J.M., Juggins, S., Stevenson, A.C., & ter Braak, C.J.F. (1990) Diatoms and pH reconstruction. Philosophical Transactions of the Royal Society of London, B, 327, 263-278.

ter Braak, C.J.F. & Barendregt, L.G. (1986) Weighted averaging of species indicator values: its efficiency in environmental calibration. Mathematical Biosciences, 78, 57-72.

ter Braak, C.J.F. & Looman, C.W.N. (1986) Weighted averaging, logistic regression and the Gaussian response model. Vegetatio, 65, 3-11.

ter Braak, C.J.F. & van Dam, H. (1989) Inferring pH from diatoms: a comparison of old and new calibration methods. Hydrobiologia, 178, 209-223.

van der Voet, H. (1994) Comparing the predictive accuracy of models using a simple randomization test. Chemometrics and Intelligent Laboratory Systems, 25, 313-323.

See Also

WAPLS, MAT, and compare.datasets for diagnostics.

Examples

# pH reconstruction of core K05 from the Round Loch of Glenhead,
# Galloway, SW Scotland. This lake has become acidified over the 
# last c. 150 years

data(SWAP)
data(RLGH)
spec <- SWAP$spec
pH <- SWAP$pH
core <- RLGH$spec
age <- RLGH$depths$Age

fit <- WA(spec, pH, tolDW=TRUE)
# plot predicted vs. observed
plot(fit)
plot(fit, resid=TRUE)

# RLGH reconstruction
pred <- predict(fit, core)

# plot the reconstruction
plot(age, pred$fit[, 1], type="b")

# cross-validation model using bootstrapping
if (FALSE) {
fit.xv <- crossval(fit, cv.method="boot", nboot=1000)
par(mfrow=c(1,2))
plot(fit)
plot(fit, resid=TRUE)
plot(fit.xv, xval=TRUE)
plot(fit.xv, xval=TRUE, resid=TRUE)

# RLGH reconstruction with sample specific errors
pred <- predict(fit, core, sse=TRUE, nboot=1000)
}
