cov.dellipse: Constructs a covariance and location object for use in plotting data ellipses.

Description

Constructs a covariance matrix and associated location using a variety of (possibly robust) estimators. The returned object is suitable for use by plot.d.ellipse.

Usage

cov.dellipse(x, y = NULL, cov.method = c("spearman", "kendall", "pearson", 
                           "MCD", "OGK", "GK", "gk", "rgk", "mcd", "mve"), 
                           scalefn = NULL, locfn = NULL, cov.control = list())

Value

An object of class cov.dellipse, which is a list with (at least) components

method: Character string describing method; identical to cov.method
cov: 2x2 covariance matrix
cor: 2x2 correlation matrix
center: vector (length 2) specifying centre of ellipse
scale: vector, length 2, specifying scale estimates for each variable
n.obs: number of points (rows) used in the covariance estimate

This list is intended to be consistent with that returned by cov.wt.
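
As a rough illustration of the cov.wt layout this list is modelled on, the sketch below calls cov.wt from base R (package stats) on a small made-up matrix. The data and the object name ref are assumptions for illustration only; cov.dellipse additionally returns method and scale.

```r
# Made-up two-column data, purely for illustration
m <- cbind(a = c(1.0, 2.1, 2.9, 4.2, 5.1),
           b = c(0.8, 2.3, 3.1, 3.9, 5.4))

# cov.wt() from the stats package; cor = TRUE adds the correlation matrix
ref <- cov.wt(m, cor = TRUE)

names(ref)     # includes "cov", "center", "n.obs" and "cor"
dim(ref$cov)   # 2 x 2 covariance matrix
ref$center     # column means, one per variable
```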

Arguments

x: An R numeric object. Can be a vector (in which case y must be specified and of the same length) or a two-column numeric matrix.
y: A numeric vector of the same length as x. It is an error to provide y in addition to a two-column matrix for x.
cov.method: A character value specifying the covariance method used.
scalefn: A function that computes univariate scale and (optionally) location estimates from a numeric vector. If provided, scalefn() should return a single numeric value containing a scale (standard deviation) estimate. For many covariance methods this can be a simple scale estimator. For cov.method "GK", scalefn must accept an additional argument mu.too. When mu.too is TRUE, scalefn() should return a numeric vector of length 2 containing location and scale estimates. See scaleTau2, s_Qn, s_mad, or s_IQR for examples of functions that can be used as the scalefn argument.
locfn: A function that computes univariate location estimates from a numeric vector. If used, locfn() should return a single numeric value containing a location (mean) estimate.
cov.control: A named list of arguments passed to the covariance calculation used. Note that this can override scalefn and locfn; see below for details.
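
The mu.too convention described for scalefn can be sketched with a small user-defined function. The name mad_mu and the data are hypothetical and not part of any package; the function only illustrates the expected return shapes, using median and mad from base R.

```r
# Hypothetical helper: MAD-based scale that can also return a location.
# mu.too = FALSE -> single scale estimate
# mu.too = TRUE  -> c(location, scale)
mad_mu <- function(x, mu.too = FALSE, ...) {
  s <- stats::mad(x)   # robust scale (MAD, scaled for consistency with sd)
  if (mu.too) c(stats::median(x), s) else s
}

x <- c(1.2, 0.9, 1.1, 1.0, 5.0)   # made-up values with one outlier

mad_mu(x)                  # scale only
mad_mu(x, mu.too = TRUE)   # location, then scale
```

A function of this shape could then be supplied as scalefn for methods that need mu.too, while a plain scale estimator such as mad is enough for the others.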

Author

Stephen L R Ellison

Details

cov.dellipse is a wrapper for a range of covariance estimation methods found in various packages. Its operation and defaults depend on the particular covariance estimator specified by cov.method. Details for each are as follows.

spearman, kendall: By default, the median and mad are used as location and scale respectively, and the covariance is calculated from the product of scale estimates and the Spearman rank correlation or Kendall's tau respectively. If either scalefn or locfn is supplied, scalefn is used for scale estimation and locfn for location. For both spearman and kendall, scalefn is only used as a scale estimator and need not take a mu.too argument.
pearson: By default, the mean and sd are used as location and scale respectively, and the covariance is calculated from the product of scale estimates and the Pearson correlation. If either scalefn or locfn is supplied, scalefn is used for scale estimation and locfn for location, making it possible (if not very sensible) to use a combination of robust scale or location functions with the Pearson correlation coefficient. For this case, scalefn is only used as a scale estimator and need not take a mu.too argument.
MCD, mcd: Both compute the Minimum Covariance Determinant (MCD) estimator, a robust multivariate location and scale estimate with a high breakdown point, via the 'Fast MCD' or 'Deterministic MCD' ("DetMcd") algorithm. "MCD" uses the implementation covMcd in the robustbase package; "mcd" uses cov.mcd in the MASS package. Neither requires nor uses scalefn or locfn. Note that these MCD implementations differ appreciably for small samples (at least up to n=60). MCD includes consistency and finite sample corrections, whereas mcd apparently does not apply a finite sample correction. As a result, the mcd scales can be considerably smaller for modest data set sizes.
OGK: Computes the orthogonalized pairwise covariance matrix estimate described by Maronna and Zamar (2002), as implemented by covOGK in the robustbase package. By default, scale and location use scaleTau2 from robustbase. Alternatives can be specified either by providing both scalefn and locfn or by including an argument sigmamu in cov.control, which is passed to covOGK. See covOGK for a description of sigmamu. If sigmamu is not present in cov.control and both scalefn and locfn are provided, scale and location are constructed from scalefn and locfn. If only one of these is provided, a warning is issued and scaleTau2 is used.
GK: Computes a simple pairwise covariance estimate suggested by Gnanadesikan and Kettenring (1972), as implemented by covGK in the robustbase package. By default, scale and location use scaleTau2 from robustbase. Alternatives can be specified either by providing scalefn and locfn or by including an argument scalefn in cov.control, which is passed to covGK. See covGK for a description of scalefn. If scalefn is not present in cov.control, scale and location are constructed from scalefn and locfn. If locfn is omitted, scalefn is used if it takes an argument mu.too and the median is used otherwise.
gk: As GK, except that the variables are scaled to unit (robust) sd (using scalefn) before calculating the covariance (which is then rescaled). This can prevent large scale differences from masking outliers in a variable with small scale.
rgk: Implements Gnanadesikan and Kettenring's second covariance estimate based on scaled variables $(Z_1, Z_2)$ and a robust correlation $\rho^*$ calculated as $$\rho^*=(\hat{\sigma}_{+}^{*2} - \hat{\sigma}_{-}^{*2})/(\hat{\sigma}_{+}^{*2} + \hat{\sigma}_{-}^{*2})$$ where $\hat{\sigma}_{+}^{*2}$ and $\hat{\sigma}_{-}^{*2}$ are robust variances of $(Z_1+Z_2)$ and $(Z_1-Z_2)$ respectively, calculated using scalefn. The advantage over "gk" and "GK" is that the correlation coefficient is guaranteed to be in $[-1,1]$, making for a positive definite covariance matrix. Scaling also helps prevent large scale differences from masking outliers in a variable with small scale.
mve: Uses cov.mve in the MASS package, which is based on the location and covariance matrix for a minimum volume ellipsoid. The method neither requires nor uses scalefn or locfn.
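
Two of the constructions described above can be sketched in base R. This is an illustrative sketch under stated assumptions, not the code used inside cov.dellipse: the data are simulated, mad stands in for scalefn, and all object names are hypothetical. The first part combines mad scales with a Spearman correlation, as in the spearman default; the second computes an rgk-style correlation from the robust variances of the sum and difference of the scaled variables, with their sum in the denominator.

```r
set.seed(1)                          # simulated data, for illustration only
x1 <- rnorm(50)
x2 <- 0.6 * x1 + rnorm(50, sd = 0.8)

# Robust scales; mad() is an assumed stand-in for scalefn
s <- c(mad(x1), mad(x2))

# spearman-style covariance: product of scales times rank correlation
r <- cor(x1, x2, method = "spearman")
cov_spearman <- outer(s, s) * matrix(c(1, r, r, 1), 2, 2)

# rgk-style correlation from scaled variables Z1, Z2
z1 <- x1 / s[1]
z2 <- x2 / s[2]
v_plus  <- mad(z1 + z2)^2            # robust variance of Z1 + Z2
v_minus <- mad(z1 - z2)^2            # robust variance of Z1 - Z2
rho <- (v_plus - v_minus) / (v_plus + v_minus)

# Rescale back to the original units
cov_rgk <- outer(s, s) * matrix(c(1, rho, rho, 1), 2, 2)

rho          # lies in [-1, 1] because both variances are non-negative
cov_rgk
```

Because both robust variances are non-negative, the ratio cannot leave $[-1,1]$, which is the property the rgk description relies on.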

References

Maronna, R.A. and Zamar, R.H. (2002) Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44(4), 307-317.

Gnanadesikan, R. and Kettenring, J.R. (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81-124.

Examples

library(metRology) #Provides cov.dellipse and the potassium data
data(potassium)
cov.dellipse(potassium) #Defaults to Spearman rank correlation

#With different method
cov.dellipse(potassium, cov.method="OGK") 

#Same as above but specifying control parameters
library(robustbase) #For scaleTau2
cov.dellipse(potassium, cov.method="OGK", cov.control=list(sigmamu=scaleTau2)) 
	
#With individually specified (mad) scale
cov.dellipse(potassium, cov.method="GK", scalefn=mad) 
	#Defaults to median for location because mad()
	#does not accept a mu.too argument

cov.dellipse(potassium, cov.method="GK", scalefn=scaleTau2) 
	#Defaults to specified scalefn for location because scaleTau2 
	#accepts mu.too=TRUE
