Computes the basis information for plot functions supporting the interpretation of multivariate outliers in case of compositional data.
mvoutlier.CoDa(x, quan = 0.75, alpha = 0.025,
col.quantile = c(0, 0.05, 0.1, 0.5, 0.9, 0.95, 1),
symb.pch = c(3, 3, 16, 1, 1), symb.cex = c(1.5, 1, 0.5, 1, 1.5),
adaptive = TRUE)
data set (matrix or data frame) containing the raw untransformed compositional data
quantity of data used for robust estimation; between 0.5 and 1
maximum threshold for adaptive outlier detection
quantiles of an average concentration defining the colors
plotting character for symbols
plotting size for symbols
if TRUE then the adaptive method for the outlier threshold is used
the ilr transformed data matrix
TRUE/FALSE vector; TRUE refers to outlier
object from PCA
vector with the colors
vector with the grey scales
vector with plotting symbols
vector with sizes of plot symbols
In a first step, the raw compositional data set in transformed by the isometric logratio (ilr) transformation to the usual Euclidean space. Then adaptive outlier detection is perfomed: Starting from a quantile 1-alpha of the chisquare distribution, one looks for the supremum of the differences between the chisquare distribution and the empirical distribution of the squared Mahalanobis distances. The latter are derived from the MCD estimator using the proportion quan of the data. The supremum is the outlier cutoff, and certain colors and symbols for the outliers are computed: The colors should reflect the magnitude of the median element concentration of the observations, which is done by computing for each observation along the single ilr variables the distances to the medians. The mediab of all distances determines the color (or grey scale): a high value, resulting in a red (or dark) symbol, means that most univariate parts have higher values than the average, and a low value (blue or light symbol) refers to an observation with mainly low values. The symbols are according to the cut-points from the quantiles 0.25, 0.5, 0.75, and the outlier cutoff of the squared Mahalanobis distances.
P. Filzmoser, K. Hron, and C. Reimann. Interpretation of multivariate outliers for compositional data. Submitted to Computers and Geosciences.
# NOT RUN {
data(humus)
d <- humus[,c("As","Cd","Co","Cu","Mg","Pb","Zn")]
res <- mvoutlier.CoDa(d)
str(res)
# }
Run the code above in your browser using DataLab