Transforms multivariate data X using the wrapping function with b = 1.5 and c = 4. By default, it starts by calling checkDataSet to clean the data and estLocScale to estimate the location and scale of the variables in the cleaned data, yielding the vectors \((\hat{\mu}_1,\ldots,\hat{\mu}_d)\) and \((\hat{\sigma}_1,\ldots,\hat{\sigma}_d)\), where \(d\) is the number of variables. Alternatively, the user can specify such vectors in the arguments locX and scaleX. In either case, the data cell \(x_{ij}\) containing variable \(j\) of case \(i\) is transformed to $$y_{ij} = \hat{\mu}_j - b_j + \hat{\sigma}_j \, \psi((x_{ij} - \hat{\mu}_j)/\hat{\sigma}_j)/a_j$$ in which \(\psi\) is the wrapping function and \(a_j\) and \(b_j\) are such that for any fixed \(j\) the average of \(y_{ij}\) equals \(\hat{\mu}_j\) and the standard deviation of \(y_{ij}\) equals \(\hat{\sigma}_j\).
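The transformation above can be sketched for a single column in base R. This is an illustration under stated assumptions, not the package's implementation: the piecewise form of \(\psi\) and the constants q1 and q2 are not given on this page and are assumed here (they make \(\psi\) continuous at \(|z| = b\)); psiWrap and wrapColumn are hypothetical helper names; \(a_j\) and \(b_j\) are computed empirically from the column; and median and mad stand in for the estimates that estLocScale would return.

```r
# Assumed form of the wrapping function psi with b = 1.5 and c = 4:
# identity in the center, a smooth tanh descent between b and cutoff,
# and zero beyond cutoff. q1 and q2 are assumed constants.
psiWrap <- function(z, b = 1.5, cutoff = 4, q1 = 1.540793, q2 = 0.8622731) {
  az <- abs(z)
  ifelse(az <= b, z,
         ifelse(az <= cutoff, q1 * tanh(q2 * (cutoff - az)) * sign(z), 0))
}

# Hypothetical helper: wrap one column x given its location mu and scale sigma.
wrapColumn <- function(x, mu, sigma) {
  p <- psiWrap((x - mu) / sigma)
  a <- sd(p)                      # a_j: makes sd(y) equal sigma
  bShift <- sigma * mean(p) / a   # b_j: makes mean(y) equal mu
  mu - bShift + sigma * p / a
}

set.seed(1)
x <- c(rnorm(95), rep(10, 5))     # 95 regular values and 5 far outliers
mu <- median(x)                   # stand-in location estimate
sigma <- mad(x)                   # stand-in scale estimate
y <- wrapColumn(x, mu, sigma)

c(mean = mean(y), sd = sd(y))     # compare with mu and sigma
range(y)                          # the outliers no longer dominate the range
```

Because \(\psi\) is zero beyond the cutoff, the five outlying cells are all mapped to the same value close to the location estimate, which is what limits their influence on any statistic computed from the wrapped column.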
wrap(X, locX = NULL, scaleX = NULL, precScale = 1e-12,
imputeNA = TRUE, checkPars = list())
A list with components:
Xw
The wrapped data.
colInWrap
The column numbers of the variables which were wrapped. Variables which were filtered out by checkDataSet (for example because of a (near) zero scale) will not appear in this output.
loc
The location estimates for all variables used for wrapping.
scale
The scale estimates for all variables used for wrapping.
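A short sketch of how the returned list might be inspected. It assumes the cellWise package is installed and follows the calling pattern of this page's own example (location and scale passed explicitly); the simulated data are arbitrary, and no particular output is claimed.

```r
# Runs only if cellWise is available; otherwise the block is skipped.
if (requireNamespace("cellWise", quietly = TRUE)) {
  set.seed(12345)
  n <- 100; d <- 10
  X <- matrix(rnorm(n * d), nrow = n, ncol = d)

  locScale <- cellWise::estLocScale(X)
  res <- cellWise::wrap(X, locScale$loc, locScale$scale)

  names(res)        # component names of the returned list
  dim(res$Xw)       # dimensions of the wrapped data
  res$colInWrap     # indices of the columns that were wrapped

  # Original columns that do not appear in the wrapped output, if any
  setdiff(seq_len(ncol(X)), res$colInWrap)
}
```

Since colInWrap refers to column positions in the input, it is the component to use when results computed on Xw need to be mapped back to the original variables.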
X
The input data. It must be an \(n\) by \(d\) matrix or a data frame.
locX
The location estimates of the columns of the input data X. Must be a vector of length \(d\).
scaleX
The scale estimates of the columns of the input data X. Must be a vector of length \(d\).
precScale
The precision scale used throughout the algorithm. Defaults to \(1e-12\).
imputeNA
Whether or not to impute the NAs with the location estimate of the corresponding variable. Defaults to TRUE.
checkPars
Optional list of parameters used in the call to checkDataSet. The options are:
coreOnly
If TRUE, skip the execution of checkDataSet. Defaults to FALSE.
numDiscrete
A column that takes on numDiscrete or fewer values will be considered discrete and not retained in the cleaned data. Defaults to \(5\).
precScale
Only consider columns whose scale is larger than precScale. Here scale is measured by the median absolute deviation. Defaults to \(1e-12\).
silent
If TRUE, the function's progress messages are not printed. Defaults to FALSE.
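The numDiscrete and precScale rules, together with the imputation controlled by imputeNA, can be illustrated in base R. This is a rough sketch of the documented behavior, not the code of checkDataSet or wrap: keepColumn is a hypothetical helper, R's mad (with its default consistency constant) is assumed as the scale measure, and the median is used as a stand-in location estimate.

```r
# Hypothetical helper, not part of cellWise: TRUE if a column would be
# retained under the numDiscrete and precScale rules described above.
keepColumn <- function(x, numDiscrete = 5, precScale = 1e-12) {
  nDistinct <- length(unique(x[!is.na(x)]))
  isDiscrete <- nDistinct <= numDiscrete         # "numDiscrete or fewer values"
  hasScale <- mad(x, na.rm = TRUE) > precScale   # scale must exceed precScale
  !isDiscrete && hasScale
}

set.seed(1)
X <- cbind(cont   = rnorm(50),          # continuous column
           binary = rep(c(0, 1), 25),   # only two distinct values
           const  = rep(3, 50))         # zero scale
X[2, "cont"] <- NA                      # one missing cell

keep <- apply(X, 2, keepColumn)
keep
Xclean <- X[, keep, drop = FALSE]

# Impute each NA with the location estimate of its column
# (median used here as a stand-in for the actual estimate).
loc <- apply(Xclean, 2, median, na.rm = TRUE)
for (j in seq_len(ncol(Xclean))) {
  Xclean[is.na(Xclean[, j]), j] <- loc[j]
}
anyNA(Xclean)
```

In an actual call these settings are passed through the checkPars list, for instance checkPars = list(numDiscrete = 3, silent = TRUE).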
Raymaekers, J. and Rousseeuw, P.J.
Raymaekers, J., Rousseeuw, P.J. (2021). Fast robust correlation for high dimensional data. Technometrics, 63(2), 184-198.
estLocScale
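library(MASS)      # provides mvrnorm
library(cellWise)  # provides estLocScale and wrap
set.seed(12345)
n <- 100; d <- 10
# Simulate n observations from a d-variate standard normal distribution
X <- mvrnorm(n, rep(0, d), diag(d))
# Estimate the location and scale of each column
locScale <- estLocScale(X)
# Wrap the data using these estimates and keep the wrapped matrix
Xw <- wrap(X, locScale$loc, locScale$scale)$Xw
# For more examples, we refer to the vignette:
if (FALSE) {
vignette("wrap_examples")
}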