tran: Common data transformations and standardizations

Description

Provides common data transformations and standardizations useful for palaeoecological data. The function acts as a wrapper to function decostand in package vegan for several of the available options.

The formula method provides a convenient way to select or exclude subsets of variables before applying the chosen transformation.
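As a rough illustration of how a one-sided formula can select variables, the base R sketch below expands ~ . - B against a data frame and lists the terms that remain. The data frame d is hypothetical, and the sketch relies only on terms(); it is not necessarily how tran resolves the formula internally.

```r
# Hypothetical data frame with three variables
d <- data.frame(A = c(0.1, 0.5, 0.9),
                B = c(1, 2, 3),
                C = c(10, 20, 30))

# Expand the '.' against the columns of d, then drop B
tt <- terms(~ . - B, data = d)
keep <- attr(tt, "term.labels")
keep

# The subset of d that the formula describes
selected <- d[, keep, drop = FALSE]
names(selected)
```

Only the variables left on the right-hand side after expansion would then be passed on for transformation.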

Usage

# S3 method for default
tran(x, method, a = 1, b = 0, p = 2, base = exp(1),
     na.rm = FALSE, na.value = 0, ...)
# S3 method for formula
tran(formula, data = NULL, subset = NULL,
     na.action = na.pass, ...)

Value

Returns the suitably transformed or standardized x. If x is a data frame, the returned value is likewise a data frame. The returned object also has an attribute "tran" giving the name of the applied transformation or standardization method.
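A minimal sketch of inspecting the return value follows. It assumes that the analogue package, which provides tran, is installed; the small data frame x is made up for illustration.

```r
# Assumes the analogue package is available; does nothing otherwise
if (requireNamespace("analogue", quietly = TRUE)) {
  # Hypothetical input data frame
  x <- data.frame(A = c(1, 4, 9), B = c(16, 25, 36))
  res <- analogue::tran(x, method = "sqrt")

  # Check whether a data frame in gives a data frame out
  print(is.data.frame(res))
  # Look up the name of the transformation recorded on the result
  print(attr(res, "tran"))
}
```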

Arguments

x: A matrix-like object.
method: Transformation or standardization method to apply. See Details for available options.
a: Constant to multiply x by. Used with method = "log" only. Can be a vector, in which case it supplies the value by which each column of x is multiplied.
b: Constant to add to x before taking logs. Used with method = "log" only. Can be a vector, in which case it supplies the value added to each column of x.
p: The power to use in the power transformation.
base: The base with respect to which logarithms are computed. See log for further details. The default is to compute natural logarithms.
na.rm: Should missing values be removed before some computations?
na.value: The value with which to replace missing values (NA).
...: Arguments passed to decostand, or other tran methods.
formula: A model formula describing the variables to be transformed. The formula should have only a right hand side, e.g. ~ foo + bar.
data, subset, na.action: See model.frame for details on these arguments. data will generally be the object or environment within which the variables in the formula are searched for.
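To make the vector forms of a and b concrete, the base R sketch below applies a different multiplier and offset to each column before taking logs. It illustrates the column-wise behaviour described for a and b using sweep(); the matrix x and the values chosen for a, b and the base are invented, and this is not the package's own code.

```r
# Hypothetical data: two variables in columns
x <- matrix(c(1, 2, 3,
              10, 20, 30),
            ncol = 2,
            dimnames = list(NULL, c("sp1", "sp2")))

a <- c(1, 0.1)   # one multiplier per column
b <- c(1, 0)     # one offset per column

# Multiply each column by its own a, then add its own b
ax_b <- sweep(sweep(x, 2, a, "*"), 2, b, "+")

# Take logs to a chosen base, here base 10
x_star <- log(ax_b, base = 10)
x_star
```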

Author

Gavin L. Simpson. Much of the functionality of tran is provided by decostand, written by Jari Oksanen.

Details

The function offers the following transformation and standardization methods for community data:

sqrt: take the square roots of the observed values.
cubert: take the cube root of the observed values.
rootroot: take the fourth root of the observed values. This is also known as the root root transformation (Field et al 1982).
log: take the logarithms of the observed values. The transformation applied can be modified by constants a and b and the base of the logarithms. The transformation applied is \(x^* = \log_{\mathrm{base}}(ax + b)\).
log1p: computes \(\log(1 + x)\) accurately also for \(|x| \ll 1\) via log1p. Note that the arguments a and b have no effect in this method.
expm1: computes \(\exp(x) - 1\) accurately for \(|x| \ll 1\) via expm1.
reciprocal: returns the multiplicative inverse or reciprocal, \(1/x\), of the observed values.
freq: divide by column (variable, species) maximum and multiply by the number of non-zero items, so that the average of non-zero entries is 1 (Oksanen 1983).
center: centre all variables to zero mean.
range: standardize values into range 0 ... 1. If all values are constant, they will be transformed to 0.
percent: convert observed count values to percentages.
proportion: convert observed count values to proportions.
standardize: scale x to zero mean and unit variance.
pa: scale x to presence/absence scale (0/1).
missing: replace missing values with na.value.
chi.square: divide by row sums and square root of column sums, and adjust for square root of matrix total (Legendre & Gallagher 2001). When used with the Euclidean distance, the distances should be similar to the Chi-square distance used in correspondence analysis. However, the results from cmdscale would still differ, since CA is a weighted ordination method.
hellinger: square root of observed values that have first been divided by row (site) sums (Legendre & Gallagher 2001).
wisconsin: applies the Wisconsin double standardization, where columns (species, variables) are first standardized by maxima and then sites (rows) by site totals.
pcent2prop: convert percentages to proportions.
prop2pcent: convert proportions to percentages.
logRatio: applies a log transformation (see log above) to the data, then centres the data by rows (by subtracting the mean for row i from the observations in row i). Applying PCA to data transformed in this way results in Aitchison's Log Ratio Analysis (LRA), a means of dealing with closed compositional data such as are common in palaeoecology (Aitchison, 1983).
power: applies a power transformation.
rowCentre, rowCenter: Centres x by rows through the subtraction of the corresponding row mean from the observations in the row.
colCentre, colCenter: Centres x by columns through the subtraction of the corresponding column mean from the observations in the column.
none: no transformation is applied.
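Several of the row-based methods listed above can be written out directly in base R. The sketch below follows the verbal descriptions of hellinger, chi.square and logRatio; the count matrix x is hypothetical, the log step uses a = 1 and b = 1 so that zero counts stay finite, and none of this is taken from the package's source.

```r
# Hypothetical site-by-species count matrix (rows = sites)
x <- matrix(c(10, 0, 5,
               2, 8, 0,
               4, 4, 12),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), c("spA", "spB", "spC")))

# hellinger: square root of values divided by row (site) sums
hell <- sqrt(x / rowSums(x))

# chi.square: divide by row sums and by square root of column sums,
# then scale by the square root of the matrix total
chi <- sqrt(sum(x)) * x / outer(rowSums(x), sqrt(colSums(x)))

# logRatio: log transform (here log(1 * x + 1)),
# then subtract each row's mean from that row
lx <- log(x + 1)
lr <- lx - rowMeans(lx)

rowSums(hell^2)  # squared Hellinger values sum to 1 within each row
rowSums(lr)      # row-centred values sum to numerically zero
```

Dividing a matrix by rowSums(x) recycles the row totals down each column, which is what makes the row-wise scaling work without an explicit loop.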

References

Aitchison, J. (1983) Principal components analysis of compositional data. Biometrika 70(1), 57-65.

Field, J.G., Clarke, K.R., & Warwick, R.M. (1982) A practical strategy for analysing multispecies distribution patterns. Marine Ecology Progress Series 8, 37-52.

Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129, 271-280.

Oksanen, J. (1983) Ordination of boreal heath-like vegetation with principal component analysis, correspondence analysis and multidimensional scaling. Vegetatio 52, 181-189.

Examples

## tran and the swapdiat data set are provided by the analogue package
library(analogue)
data(swapdiat)
## convert percentages to proportions
sptrans <- tran(swapdiat, "pcent2prop")

## apply Hellinger transformation
spHell <- tran(swapdiat, "hellinger")

## Dummy data to illustrate formula method
d <- data.frame(A = runif(10), B = runif(10), C = runif(10))
## simulate some missing values in column A
d[sample(10,3), 1] <- NA
## apply tran using formula
tran(~ . - B, data = d, na.action = na.pass,
     method = "missing", na.value = 0)