A class providing the means to analyse compositions in the philosophical framework of the Aitchison Simplex.
acomp(X,parts=1:NCOL(oneOrDataset(X)),total=1,warn.na=FALSE,
detectionlimit=NULL,BDL=NULL,MAR=NULL,MNAR=NULL,SZ=NULL)
a vector of class "acomp" representing one closed composition, or a matrix of class "acomp" representing multiple closed compositions, each in one row.
X: composition or dataset of compositions
parts: vector containing the indices or names of the columns to be used
total: the total amount to be used, typically 1 or 100
warn.na: should the user be warned if X contains NA, NaN or 0 values, which code different types of missing values?
detectionlimit: a number, vector or matrix of positive numbers giving the detection limit of all values, all columns or each value, respectively
BDL: the code for 'Below Detection Limit' in X
SZ: the code for 'Structural Zero' in X
MAR: the code for 'Missing At Random' in X
MNAR: the code for 'Missing Not At Random' in X
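As a rough illustration of what selecting parts and fixing a total amount to, the following base-R sketch picks some columns of a matrix and rescales each row to a given total. The helper `close_to_total` and the data are hypothetical and only meant to convey the idea; this is not the package's implementation and it ignores all missing-value handling.

```r
# Hypothetical helper, not part of the compositions package:
# keep the requested columns and rescale every row to a fixed total.
close_to_total <- function(X, parts = seq_len(ncol(X)), total = 1) {
  X <- as.matrix(X)[, parts, drop = FALSE]
  # rowSums(X) is recycled down the columns, so each row is divided by its own sum
  X / rowSums(X) * total
}

# Made-up amounts of three parts plus one column we want to leave out
amounts <- cbind(A = c(2, 10), B = c(3, 30), C = c(5, 60), other = c(7, 1))

comp <- close_to_total(amounts, parts = c("A", "B", "C"), total = 100)
print(comp)
print(rowSums(comp))
```

Every row of `comp` sums to the requested total, and the column `other` plays no role in the result.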
The policy for treating zeros, missing values and values below the detection limit is explained in depth in compositions.missing.
K.Gerald v.d. Boogaart http://www.stat.boogaart.de, Raimon Tolosana-Delgado
Many multivariate datasets essentially describe amounts of D different
parts in a whole. This has important implications that justify
regarding them as a scale of their own, called a
composition. This scale was analysed in depth by Aitchison
(1986), and the functions around the class "acomp" follow his
approach.
Compositions have some important properties: Amounts are always
positive. The amount of every part is limited to the whole. The
absolute amount of the whole is noninformative, since it is typically an
artifact of the measurement procedure. Thus only relative changes
are relevant. If the relative amount of one part
increases, the amounts of other parts must decrease, introducing
spurious anticorrelation (Chayes 1960) when analysed directly. Often
parts (e.g. H2O, Si) are missing in the dataset, leaving the total
amount unreported and calling for analysis procedures that avoid
spurious effects when applied to such subcompositions. Furthermore,
the result of an analysis should be independent of the units (ppm, g/l, vol.%, mass.%, molar
fraction) of the dataset.
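A small base-R sketch may make two of these properties concrete. The amounts and the helper `closure` are made up for illustration. The proportion of a part changes when another part goes unreported, whereas the log-ratio of two parts stays the same, also under a rescaling of all amounts by a common factor.

```r
# Made-up amounts of four parts in one sample (arbitrary units)
x <- c(Si = 40, Al = 10, Fe = 5, H2O = 45)

# Hypothetical helper: rescale a vector so that it sums to 1
closure <- function(v) v / sum(v)

full <- closure(x)                       # all four parts reported
sub  <- closure(x[c("Si", "Al", "Fe")])  # H2O not reported

# The proportion of Si depends on which parts were reported ...
print(c(full = full[["Si"]], sub = sub[["Si"]]))

# ... but the log-ratio of two parts does not,
logratio_full <- log(full[["Si"]] / full[["Al"]])
logratio_sub  <- log(sub[["Si"]] / sub[["Al"]])

# and it is also unchanged when all amounts are multiplied by a common factor
logratio_scaled <- log((1000 * x[["Si"]]) / (1000 * x[["Al"]]))

print(c(logratio_full, logratio_sub, logratio_scaled))
```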
From these properties Aitchison showed that the
analysis should be based on ratios or log-ratios only. He introduced
several transformations (e.g. clr, alr),
operations (e.g. perturbe, power.acomp),
and a distance (dist) which are compatible with these
properties. Later it was found that the set of compositions equipped with
perturbation as addition, power-transform as scalar multiplication
and dist as distance forms a (D-1)-dimensional
Euclidean vector space (Billheimer, Guttorp and Fagan, 2001), which
can be mapped isometrically to a usual real vector space by ilr
(Pawlowsky-Glahn and Egozcue, 2001).
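The vector-space structure can be checked numerically. The sketch below re-implements the centered log-ratio transform, perturbation, the power transform and the Aitchison distance in base R under their usual textbook definitions. All function names ending in `_sketch` are hypothetical stand-ins and not the package's own clr, perturbe, power.acomp or dist.

```r
closure <- function(v) v / sum(v)

# Centered log-ratio transform: log of each part minus the mean log
clr_sketch <- function(x) log(x) - mean(log(x))

# Perturbation (the "addition") and power transform (the "scalar multiplication")
perturb_sketch <- function(x, y) closure(x * y)
power_sketch   <- function(x, a) closure(x^a)

# Aitchison distance: Euclidean distance between clr coordinates
adist_sketch <- function(x, y) sqrt(sum((clr_sketch(x) - clr_sketch(y))^2))

x <- closure(c(1, 2, 7))
y <- closure(c(3, 3, 4))
p <- closure(c(5, 1, 1))

# Perturbation becomes ordinary addition in clr coordinates
add_ok <- all.equal(clr_sketch(perturb_sketch(x, y)), clr_sketch(x) + clr_sketch(y))

# The power transform becomes ordinary scalar multiplication
mul_ok <- all.equal(clr_sketch(power_sketch(x, 2.5)), 2.5 * clr_sketch(x))

# The distance does not change when both compositions are perturbed by p
dist_ok <- all.equal(adist_sketch(x, y),
                     adist_sketch(perturb_sketch(x, p), perturb_sketch(y, p)))

print(c(add_ok, mul_ok, dist_ok))
```

The clr coordinates used here always sum to zero, so they span only D-1 dimensions; ilr provides coordinates with respect to an orthonormal basis of that subspace.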
The general approach in analysing acomp objects is thus to perform
classical multivariate analysis on clr/alr/ilr-transformed coordinates
and to backtransform or display the results in such a way that they
can be interpreted in terms of the original compositional parts.
A side effect of the procedure is to force the compositions to sum up to a
total, which is done by the closure operation clo.
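The transform, analyse, back-transform pattern can be sketched for the simplest statistic, the mean. The base-R code below uses a made-up dataset and hypothetical helpers (`closure`, `clr_rows`). It averages the clr coordinates and maps the result back to a composition, which coincides with the closed geometric mean of the parts. It is an illustration of the approach, not the package's mean.acomp.

```r
closure <- function(v) v / sum(v)

# Row-wise centered log-ratio transform of a matrix of compositions
clr_rows <- function(X) {
  L <- log(X)
  L - rowMeans(L)  # rowMeans(L) is recycled so each row is centered by its own mean
}

# Made-up dataset: three compositions of three parts, one per row
X <- rbind(c(0.2, 0.3, 0.5),
           c(0.1, 0.6, 0.3),
           c(0.4, 0.4, 0.2))

# 1. transform, 2. classical statistic on the coordinates, 3. back-transform
Z          <- clr_rows(X)
centre_clr <- colMeans(Z)
centre     <- closure(exp(centre_clr))

# The same centre obtained directly as the closed geometric mean
geo_centre <- closure(exp(colMeans(log(X))))

print(centre)
print(geo_centre)
print(colMeans(X))  # the arithmetic mean of the raw proportions, for comparison
```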
Aitchison, J. (1986) The Statistical Analysis of Compositional
Data. Monographs on Statistics and Applied Probability. Chapman &
Hall Ltd., London (UK). 416p.
Aitchison, J., C. Barceló-Vidal, J.J. Egozcue, V. Pawlowsky-Glahn
(2002) A concise guide to the algebraic geometric structure of the
simplex, the sample space for compositional data analysis, Terra
Nostra, Schriften der Alfred Wegener-Stiftung, 03/2003
Billheimer, D., P. Guttorp and W.F. Fagan (2001) Statistical interpretation of species composition,
Journal of the American Statistical Association, 96 (456), 1205-1214
Chayes, F. (1960) On correlation between variables of constant sum. Journal of Geophysical Research 65(12), 4185-4193.
Pawlowsky-Glahn, V. and J.J. Egozcue (2001) Geometric approach to
statistical analysis on the simplex. SERRA 15(5), 384-398
Pawlowsky-Glahn, V. (2003) Statistical modelling on coordinates. In:
Thió-Henestrosa, S. and Martín-Fernández, J.A. (Eds.)
Proceedings of the 1st International Workshop on Compositional Data Analysis,
Universitat de Girona, ISBN 84-8458-111-X, http://ima.udg.es/Activitats/CoDaWork03/
Mateu-Figueras, G. and Barceló-Vidal, C. (Eds.)
Proceedings of the 2nd International Workshop on Compositional Data Analysis,
Universitat de Girona, ISBN 84-8458-222-1, http://ima.udg.es/Activitats/CoDaWork05/
van den Boogaart, K.G. and R. Tolosana-Delgado (2008) "compositions": a unified R package to analyze Compositional Data, Computers & Geosciences, 34 (4), pages 320-338, doi:10.1016/j.cageo.2006.11.017.
clr, rcomp, aplus, princomp.acomp, plot.acomp, boxplot.acomp, barplot.acomp, mean.acomp, var.acomp, variation.acomp, cov.acomp, msd
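library(compositions)       # provides acomp() and the SimulatedAmounts data
data(SimulatedAmounts)      # loads, among others, sa.lognormals
plot(acomp(sa.lognormals))  # plot the dataset as closed compositions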