Provides common data transformations and standardizations useful for palaeoecological data. The function acts as a wrapper to function decostand in package vegan for several of the available options.
The formula method provides a convenient way of selecting or excluding subsets of variables before applying the chosen transformation.
# S3 method for default
tran(x, method, a = 1, b = 0, p = 2, base = exp(1),
     na.rm = FALSE, na.value = 0, ...)

# S3 method for formula
tran(formula, data = NULL, subset = NULL,
     na.action = na.pass, ...)
Returns the suitably transformed or standardized x. If x is a data frame, the returned value is likewise a data frame. The returned object also has an attribute "tran" giving the name of the applied transformation or standardization "method".
x: A matrix-like object.
method: Transformation or standardization method to apply. See Details for available options.
a: Constant to multiply x by. method = "log" only. Can be a vector, in which case it gives the values by which to multiply each column of x.
b: Constant to add to x before taking logs. method = "log" only. Can be a vector, in which case it gives the values to add to each column of x.
p: The power to use in the power transformation.
base: The base with respect to which logarithms are computed. See log for further details. The default is to compute natural logarithms.
na.rm: Should missing values be removed before some computations?
na.value: The value with which to replace missing values (NA).
...: Arguments passed to decostand, or other tran methods.
formula: A model formula describing the variables to be transformed. The formula should have only a right hand side, e.g. ~ foo + bar.
data, subset, na.action: See model.frame for details on these arguments. data will generally be the object or environment within which the variables in the formula are searched for.
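The per-column behaviour of a and b can be illustrated with a minimal base R sketch. It assumes that vector-valued constants are applied column by column, as the argument descriptions state. It is not the package's own code, and the matrix x and the constants are made-up values.

```r
## Hypothetical data: 2 samples (rows) by 3 variables (columns)
x <- matrix(c(1, 10, 100,
              2, 20, 200), nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("sp1", "sp2", "sp3")))

a <- c(1, 2, 3)   # one multiplier per column (assumed per-column use)
b <- c(1, 1, 1)   # one offset per column
base <- 10        # base of the logarithm

## multiply each column by its a, add its b, then take logs
ax   <- sweep(x, 2, a, "*")
axb  <- sweep(ax, 2, b, "+")
xlog <- log(axb, base = base)
xlog
```

With scalar a and b the two sweep() calls reduce to a * x + b, which is the form the log method is documented to use.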
Gavin L. Simpson. Much of the functionality of tran is provided by decostand, written by Jari Oksanen.
The function offers the following transformation and standardization methods for community data:
sqrt: take the square roots of the observed values.
cubert: take the cube root of the observed values.
rootroot: take the fourth root of the observed values. This is also known as the root root transformation (Field et al. 1982).
log: take the logarithms of the observed values. The transformation applied can be modified by constants a and b and the base of the logarithms. The transformation applied is \(x^* = \log_{\mathrm{base}}(ax + b)\).
log1p: computes \(\log(1 + x)\) accurately also for \(|x| \ll 1\) via log1p. Note the arguments a and b have no effect in this method.
expm1: computes \(\exp(x) - 1\) accurately for \(|x| \ll 1\) via expm1.
reciprocal: returns the multiplicative inverse or reciprocal, \(1/x\), of the observed values.
freq: divide by column (variable, species) maximum and multiply by the number of non-zero items, so that the average of non-zero entries is 1 (Oksanen 1983).
center: centre all variables to zero mean.
range: standardize values into range 0 ... 1. If all values are constant, they will be transformed to 0.
percent: convert observed count values to percentages.
proportion: convert observed count values to proportions.
standardize: scale x to zero mean and unit variance.
pa: scale x to presence/absence scale (0/1).
missing: replace missing values with na.value.
chi.square: divide by row sums and square root of column sums, and adjust for square root of matrix total (Legendre & Gallagher 2001). When used with the Euclidean distance, the distances should be similar to the Chi-square distance used in correspondence analysis. However, the results from cmdscale would still differ, since CA is a weighted ordination method.
hellinger: square root of observed values that have first been divided by row (site) sums (Legendre & Gallagher 2001).
wisconsin: applies the Wisconsin double standardization, where columns (species, variables) are first standardized by maxima and then sites (rows) by site totals.
pcent2prop: convert percentages to proportions.
prop2pcent: convert proportions to percentages.
logRatio: applies a log transformation (see log above) to the data, then centres the data by rows (by subtraction of the mean for row i from the observations in row i). Applying PCA to data transformed in this way results in Aitchison's Log Ratio Analysis (LRA), a means of dealing with closed compositional data such as are common in palaeoecology (Aitchison 1983).
power: applies a power transformation.
rowCentre, rowCenter: centres x by rows through the subtraction of the corresponding row mean from the observations in the row.
colCentre, colCenter: centres x by columns through the subtraction of the corresponding column mean from the observations in the column.
none: no transformation is applied.
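Several of the methods listed above can be written out by hand in base R, which may help when checking results. The following is a sketch under stated assumptions rather than the code used by tran or decostand. The matrix x is a made-up example, and the logRatio step adds 1 before taking logs (equivalent to a = 1, b = 1) only so that zero counts do not produce -Inf.

```r
## Hypothetical abundance matrix: 3 sites (rows) by 3 species (columns)
x <- matrix(c(10, 0, 5,
               2, 8, 0,
               1, 1, 8), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

## hellinger: square root of values divided by row (site) sums
hell <- sqrt(x / rowSums(x))

## chi.square: divide by row sums and by the square root of column sums,
## then scale by the square root of the matrix total
chi <- sqrt(sum(x)) * x / outer(rowSums(x), sqrt(colSums(x)))

## wisconsin: columns by their maxima, then rows by their totals
wis <- sweep(x, 2, apply(x, 2, max), "/")
wis <- wis / rowSums(wis)

## logRatio: log transform (assumed offset b = 1), then centre by rows
lx <- log(x + 1)
lr <- lx - rowMeans(lx)

## sanity checks on the hand-computed results
rowSums(hell^2)   # squared Hellinger values sum to 1 within each site
rowSums(wis)      # Wisconsin-standardized rows sum to 1
rowSums(lr)       # row-centred log values sum to (numerically) zero
```

Dividing a matrix by rowSums(x) relies on R recycling the vector down the columns, so each element is divided by the total of its own row.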
Aitchison, J. (1983) Principal components analysis of compositional data. Biometrika 70(1), 57-65.
Field, J.G., Clarke, K.R., & Warwick, R.M. (1982) A practical strategy for analysing multispecies distribution patterns. Marine Ecology Progress Series 8, 37-52.
Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129, 271-280.
Oksanen, J. (1983) Ordination of boreal heath-like vegetation with principal component analysis, correspondence analysis and multidimensional scaling. Vegetatio 52, 181-189.
decostand
data(swapdiat)
## convert percentages to proportions
sptrans <- tran(swapdiat, "pcent2prop")
## apply Hellinger transformation
spHell <- tran(swapdiat, "hellinger")
## Dummy data to illustrate formula method
d <- data.frame(A = runif(10), B = runif(10), C = runif(10))
## simulate some missings
d[sample(10,3), 1] <- NA
## apply tran using formula
tran(~ . - B, data = d, na.action = na.pass,
     method = "missing", na.value = 0)
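The variable selection made by a one-sided formula such as ~ . - B can be previewed without calling tran. The sketch below uses only base R (terms, reformulate and model.frame) and is an illustration of the idea, not a description of how the formula method is implemented. The set.seed call and the object names labels, mf and n_missing are additions for this example.

```r
## Dummy data as in the example above; the seed is an addition for reproducibility
set.seed(1)
d <- data.frame(A = runif(10), B = runif(10), C = runif(10))
d[sample(10, 3), 1] <- NA

## expand the dot against d and drop B; the term labels are the retained variables
labels <- attr(terms(~ . - B, data = d), "term.labels")
labels

## build a model frame from the retained variables, keeping rows with NA
mf <- model.frame(reformulate(labels), data = d, na.action = na.pass)
n_missing <- sum(is.na(mf$A))

## mimic method = "missing" with na.value = 0 by hand
mf[is.na(mf)] <- 0
head(mf)
```

Using na.action = na.pass matters here: the default na.omit would drop the rows containing missing values before they could be replaced.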