Constructs a covariance matrix and associated location using a variety
of (possibly robust) estimators. The returned object is suitable for use
by plot.d.ellipse
.
cov.dellipse(x, y = NULL, cov.method = c("spearman", "kendall", "pearson",
"MCD", "OGK", "GK", "gk", "rgk", "mcd", "mve"),
scalefn = NULL, locfn = NULL, cov.control = list())
An R numeric
object. Can be a vector (in which case y
must be specified
and of the same length) or a two-column numeric matrix.
A numeric vector of the same length as x
. It is an error to provide y
in addition to a two-column matrix for x
.
A character value specifying the covariance method used.
A function that computes univariate scale and (optionally) location estimates from a
numeric vector.
If provided, scalefn()
should return a single numeric value containing a scale
(standard deviation) estimate. For many covariance methods this can be a simple
scale estimator. For cov.method "GK", scalefn must accept
an additional argument mu.too
. When mu.too is true, scalefn()
should
return a numeric vector of length 2 containing location and scale estimates. See
scaleTau2
, s_Qn
, s_mad
, or s_IQR
for examples to be used as scalefn
argument.
A function that computes univariate location estimates from a numeric vector.
If used, locfn()
should return a single numeric value containing a location
(mean) estimate.
A named list of arguments passed to the covariance calculation used. Note that this can
override scalefn
and locfn
; see below for details.
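As an illustration of the scalefn contract described above, the following is a minimal sketch of a scale function that optionally returns location as well. The name mad_mu and the data are hypothetical and not part of any package; only base R functions (mad, median) are used.

```r
# Hypothetical helper (not part of any package): a MAD-based scale function
# that also returns a location estimate when mu.too = TRUE.
mad_mu <- function(x, mu.too = FALSE, ...) {
  s <- stats::mad(x)                          # robust scale (scaled MAD)
  if (mu.too) c(stats::median(x), s) else s   # c(location, scale) or scale only
}

x <- c(2.1, 2.4, 2.2, 2.6, 9.0)   # made-up data with one outlier
mad_mu(x)                         # a single scale value
mad_mu(x, mu.too = TRUE)          # length-2 vector: location, then scale
```

A function of this shape could plausibly be supplied as scalefn for the methods that look for a mu.too argument, but check the behaviour against the method details before relying on it.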
An object of class cov.dellipse
, which is a list with (at least) components
Character string describing method; identical to cov.method
2x2 covariance matrix
2x2 correlation matrix
vector (length 2) specifying centre of ellipse
vector, length 2, specifying scale estimates for each variable
number of points (rows) used in the covariance estimate
This list is intended to be consistent with that returned by cov.wt
.
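For comparison, the sketch below shows the structure returned by the base R function cov.wt, which this return value is intended to resemble. The data are made up for illustration; the component names shown are those of cov.wt itself.

```r
# Made-up two-column data, purely for illustration
xy <- cbind(x = c(1.2, 2.3, 2.9, 4.1, 5.0, 5.8, 7.4),
            y = c(0.9, 2.8, 2.5, 4.4, 4.6, 6.5, 6.9))

ref <- stats::cov.wt(xy, cor = TRUE)  # classical covariance, plus correlation
names(ref)     # includes "cov", "center", "n.obs" and "cor"
ref$cov        # 2x2 covariance matrix
ref$center     # length-2 location vector
```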
cov.dellipse
is a wrapper for a range of covariance estimation methods found in
various packages. Its operation and defaults depend on the particular covariance
estimator specified by cov.method
. Details for each are as follows.
spearman
, kendall
By default, the median and mad are used as location and scale respectively,
and the covariance is calculated from the product of scale estimates and the
Spearman rank correlation or Kendall's tau respectively.
If either scalefn
or locfn
is supplied, scalefn
is used for scale estimation and
locfn
for location. For both spearman
and kendall
, scalefn
is
only used as a scale estimator and need not take a mu.too
argument.
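The default construction described for spearman can be sketched in base R as follows. This follows the description above and is not necessarily how the package implements it internally; the data are made up for illustration.

```r
# Made-up data, purely for illustration
x <- c(1.2, 2.3, 2.9, 4.1, 5.0, 5.8, 7.4)
y <- c(0.9, 2.8, 2.5, 4.4, 4.6, 6.5, 6.9)

s   <- c(mad(x), mad(y))                # default robust scale estimates
loc <- c(median(x), median(y))          # default robust location estimates
rho <- cor(x, y, method = "spearman")   # rank correlation

# Covariance = correlation matrix scaled by the outer product of the scales
R <- matrix(c(1, rho, rho, 1), nrow = 2)
V <- R * outer(s, s)
V
```

Replacing method = "spearman" with method = "kendall" gives the corresponding Kendall's tau version of the same construction.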
pearson
By default, the mean and sd are used as location and scale respectively,
and the covariance is calculated from the product of scale estimates and the
Pearson correlation.
If either scalefn
or locfn
is supplied, scalefn
is used for scale
estimation and locfn
for location, making it possible (if not very sensible) to
use a combination of robust scale or location functions with the Pearson correlation coefficient.
For this case, scalefn
is only used as a scale estimator and need
not take a mu.too
argument.
MCD
, mcd
Both compute the Minimum Covariance Determinant (MCD) estimator, a robust multivariate
location and scale estimate with a high breakdown point, via the 'Fast MCD' or 'Deterministic MCD'
("DetMcd") algorithm. "MCD"
uses the implementation covMcd
in the robustbase package;
"mcd"
uses cov.mcd
in the MASS package.
Neither requires nor uses scalefn
or locfn
.
Note that these MCD implementations differ appreciably for small samples (at least up to n=60). MCD
includes consistency and finite-sample corrections, whereas mcd
apparently does not apply a finite
sample correction. As a result, the mcd
scales can be considerably smaller for modest
data set sizes.
OGK
Computes the orthogonalized pairwise covariance matrix estimate described by Maronna and Zamar (2002),
as implemented by covOGK
in the robustbase package.
By default, scale and location use scaleTau2
from robustbase. Alternatives
can be specified either by providing both scalefn
and locfn
or by including
an argument sigmamu
in cov.control
, which is passed to covOGK
. See
covOGK
for a description of sigmamu
.
If sigmamu
is not present in cov.control
and both scalefn
and locfn
are provided, scale and location are constructed from scalefn
and locfn
. If only one
of these is provided, a warning is issued and scaleTau2
is used.
GK
Computes a simple pairwise covariance estimate suggested by Gnanadesikan and Kettenring (1972),
as implemented by covGK
in the robustbase package.
By default, scale and location use scaleTau2
from robustbase. Alternatives
can be specified either by providing scalefn
and locfn
or by including
an argument scalefn
in cov.control
, which is passed to covGK
. See
covGK
for a description of scalefn
.
If scalefn
is not present in cov.control
, scale and location are constructed from scalefn
and locfn
. If locfn
is omitted, scalefn
is used if it takes an argument mu.too
and the median is used otherwise.
gk
As GK
, except that the variables are scaled to unit (robust) sd (using scalefn
) before
calculating the covariance (which is then rescaled). This can prevent large scale differences from
masking outliers in a variable with small scale.
rgk
Implements Gnanadesikan and Kettenring's second covariance estimate
based on scaled variables \((Z_1, Z_2)\) and a robust correlation \(\rho^*\)
calculated as
$$\rho^*=(\hat{\sigma}_{+}^{*2} - \hat{\sigma}_{-}^{*2})/(\hat{\sigma}_{+}^{*2} + \hat{\sigma}_{-}^{*2})$$
where \(\hat{\sigma}_{+}^{*2}\) and \(\hat{\sigma}_{-}^{*2}\) are robust variances of
\((Z_1+Z_2)\) and \((Z_1-Z_2)\) respectively, calculated using scalefn
.
The advantage over "gk"
and "GK"
is that the correlation
coefficient is guaranteed to be in \([-1,1]\), making for a positive definite covariance matrix. Scaling also
helps prevent large scale differences from masking outliers in a variable with small scale.
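A minimal base R sketch of this robust correlation is given below, with the difference of the two robust variances in the numerator and their sum in the denominator, as in Gnanadesikan and Kettenring's estimator. The function name robust_cor and the data are hypothetical, and the package's own implementation may differ in detail.

```r
# Hypothetical helper (not from any package): robust correlation from the
# robust variances of the sum and difference of the scaled variables.
robust_cor <- function(x, y, scalefn = stats::mad) {
  z1 <- x / scalefn(x)        # scale each variable to unit (robust) sd
  z2 <- y / scalefn(y)
  vp <- scalefn(z1 + z2)^2    # robust variance of Z1 + Z2
  vm <- scalefn(z1 - z2)^2    # robust variance of Z1 - Z2
  (vp - vm) / (vp + vm)       # lies in [-1, 1] because vp, vm >= 0
}

# Made-up data, purely for illustration
x <- c(1.2, 2.3, 2.9, 4.1, 5.0, 5.8, 7.4)
y <- c(0.9, 2.8, 2.5, 4.4, 4.6, 6.5, 6.9)

robust_cor(x, y)                 # MAD-based robust correlation
robust_cor(x, y, scalefn = sd)   # with sd, this reduces to the Pearson correlation
```

Using sd as the scale function is a convenient check: the variances of the sum and difference are then 2 + 2r and 2 - 2r, so the ratio returns the ordinary Pearson coefficient r.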
mve
Uses cov.mve
in the MASS package, which is based on the location and covariance matrix for
a minimum volume ellipsoid. The method neither requires nor uses scalefn
or locfn
.
Maronna, R.A. and Zamar, R.H. (2002) Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44(4), 307-317.
Gnanadesikan, R. and Kettenring, J.R. (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81-124.
# NOT RUN {
data(potassium)
cov.dellipse(potassium) #Defaults to Spearman rank correlation
#With different method
cov.dellipse(potassium, cov.method="OGK")
#Same as above but specifying control parameters
library(robustbase) #For scaleTau2
cov.dellipse(potassium, cov.method="OGK", cov.control=list(sigmamu=scaleTau2))
#With individually specified (mad) scale
cov.dellipse(potassium, cov.method="GK", scalefn=mad)
#Defaults to median for location because mad()
#does not accept a mu.too argument
cov.dellipse(potassium, cov.method="GK", scalefn=scaleTau2)
#Defaults to specified scalefn for location because scaleTau2
#accepts mu.too=TRUE
# }