coverage: Coverage and self-coverage plots.

Description

These functions compute coverages and self-coverages, and produce corresponding plots, for any principal curve object. The former may be used as goodness-of-fit measures, and the latter for bandwidth selection.

Usage

coverage.raw(X, vec, tau, weights=1, plot.type="p", print=FALSE,
      label=NULL,...)
coverage(X, vec, taumin=0.02, taumax, gridsize=25, weights=1,
      plot.type="o", print=FALSE,...)
lpc.coverage(object, taumin=0.02, taumax, gridsize=25, quick=TRUE,
      plot.type="o", print=FALSE, ...)
lpc.self.coverage(X,  taumin=0.02, taumax=0.5,   gridsize=25, x0=1,
     way = "two", scaled=1,  weights=1, pen=2, depth=1,
     control=lpc.control(boundary=0, cross=FALSE),   quick=TRUE,
     plot.type="o", print=FALSE, ... )
 
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
       thr=0.001, scaled=1, cluster=FALSE, plot.type="o", 
       print=FALSE, ...)
       
select.self.coverage(self,  smin, plot.type="o", plot.segments=NULL)

Value

A list of items, and a plot (unless plot.type=0).

The functions lpc.self.coverage and ms.self.coverage produce an object of class self. The component $select recommends suitable bandwidths for use in lpc, in order of strength of evidence. These correspond to points of strong negative curvature (implemented via second differences) of the self-coverage curve.

Arguments

X: An $N \times d$ data matrix.
object: An object of type lpc, lpc.spline or ms.
vec: A matrix with $d$ columns. The rows contain the points which make up the fitted object.
tau: tube size.
taumin: Minimal tube size.
taumax: Maximal tube size.
weights: An optional vector of weights. If weights are specified, then the coverage is the weighted mean of the indicator functions for falling within the tube. The function lpc.coverage does not have a weights argument, as it extracts the weights from the $weights component of the fitted object.
label: Experimental option; don't use.
gridsize: The number of different tube sizes to consider.
quick: If TRUE, an approximate coverage curve is provided by computing distances between data points and the curve through the closest local centers of mass; with FALSE, the distances of the points when projected orthogonally onto the spline representation of the local principal curve are used instead. The latter takes considerably more computing time. The resulting coverage curves are generally very similar, but the quick version may occasionally produce small spurious peaks.
thr: Adjacent mean shift clusters are merged if their relative distance falls below this threshold.
cluster: If TRUE, distances are always measured to the cluster to which an observation is assigned, rather than to the nearest cluster.
self: An object of class self, or a matrix with two columns providing a self-coverage curve.
smin: Minimum coverage for bandwidth selection. Default: 1/3 for clustering, 2/3 for principal curves.
plot.type: If set to 0, no plotted output is given. Otherwise, an appropriate plot is provided, using the plotting type as specified.
plot.segments: A list with default list(lty=c(1,2,3), lwd=c(2,1,1),lcol=c(3,3,3)) which specifies how (and how many) bandwidth candidates, in order of decreasing negative second derivative of self-coverage, are to be highlighted.
print: If TRUE, coverage values are printed on the screen as soon as computed. This is quite helpful especially if gridsize is large.
x0, way, scaled, pen, depth, control: Auxiliary parameters as outlined in lpc, lpc.control, and ms.
...: Optional graphical parameters passed to the corresponding plotting functions.

Author

J. Einbeck

Details

The function coverage.raw computes the coverage, i.e. the proportion of data points lying inside a circle or band with radius $\tau$, for a fixed value tau. The whole coverage curve $C(\tau)$ is constructed through function coverage.
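
As a rough illustration of this definition, the following base-R sketch computes a coverage value by hand. It is not the package's implementation: the helper name coverage_sketch and the toy data are assumptions made for this example, distances are measured to the nearest row of vec, and a point exactly on the tube boundary is counted as covered.

```r
# Hypothetical helper (not part of the package): weighted proportion of rows
# of X lying within distance tau of the nearest row of vec.
coverage_sketch <- function(X, vec, tau, weights = 1) {
  X   <- as.matrix(X)
  vec <- as.matrix(vec)
  w   <- rep(weights, length.out = nrow(X))
  # Euclidean distance from each data point to its nearest fitted point
  dmin <- apply(X, 1, function(x) {
    min(sqrt(rowSums((vec - matrix(x, nrow(vec), ncol(vec), byrow = TRUE))^2)))
  })
  # Weighted mean of the indicators "point falls inside the tube"
  sum(w * (dmin <= tau)) / sum(w)
}

# Toy data (assumed): four points, fitted object made of two points
X   <- rbind(c(0, 0), c(1, 0.1), c(0.5, 1), c(1, 2))
vec <- rbind(c(0, 0), c(1, 0))
cov_small <- coverage_sketch(X, vec, tau = 0.2)  # two of four points covered
cov_large <- coverage_sketch(X, vec, tau = 1.5)  # three of four points covered
```

Evaluating such a helper over a grid of tau values between taumin and taumax would trace out a curve analogous to $C(\tau)$.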

Functions coverage.raw and coverage can be used for any object fitted by an unsupervised learning technique (for instance, HS principal curves, or even clustering algorithms), while the functions prefixed with lpc. and ms. can only be used for the corresponding objects. The functions lpc.coverage and ms.coverage are wrappers around coverage which operate directly on a fitted object, rather than a data matrix.

Function select.self.coverage extracts suitable bandwidths from the self-coverage curve, and produces a plot. The function is called from within lpc.self.coverage or ms.self.coverage but can also be called directly by the user (for instance, if the graphical output is to be reproduced, or if the minimum coverage smin is to be modified). The component $select contains the selected candidate bandwidths, in the order of strength of evidence provided by the self-coverage criterion (the best bandwidth comes first, etc.). A plot is produced as a by-product, which symbolizes the best bandwidth by a thick solid line, the second-best by a dashed line, and the third-best by a dotted line. It is recommended to run the self-coverage functions with fixed starting points, as in the examples below, and to scale by the range only.
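
The idea of ranking candidates by negative curvature can be sketched in base R as follows. This is only an illustration of second differences on a made-up self-coverage curve, not the code used by select.self.coverage; the values of h and S, and the way smin is applied here (as a lower bound on the self-coverage at the candidate), are assumptions.

```r
# Hypothetical self-coverage curve: bandwidths h and self-coverage values S
h <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
S <- c(0.10, 0.30, 0.70, 0.75, 0.78, 0.80)
smin <- 2/3

# Second differences, defined at the interior grid points h[2], ..., h[5]
d2    <- diff(S, differences = 2)
h_int <- h[2:(length(h) - 1)]
S_int <- S[2:(length(S) - 1)]

# Keep candidates with sufficient coverage, then rank them so that the
# most negative second difference (strongest evidence) comes first
ok   <- S_int >= smin
cand <- h_int[ok][order(d2[ok])]
```

In this toy curve the sharpest bend occurs at h = 0.3, so that value is ranked first among the candidates.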

See Einbeck (2011) for details. Note that the original publication by Einbeck, Tutz, and Evers (2005) uses `quick' coverage curves.

References

Einbeck, J., Tutz, G., & Evers, L. (2005). Local principal curves. Statistics and Computing 15, 301-313.

Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.

Examples

library(LPCM)  # provides ms, lpc, the coverage functions and the gvessel data
data(faithful)
mfit <- ms(faithful)
coverage(mfit$data, mfit$cluster.center, gridsize=16)

# \donttest{
f.self <- ms.self.coverage(faithful,gridsize= 50, taumin=0.1, taumax=0.5, plot.type="o")   
h <- select.self.coverage(f.self)$select
mfit2 <- ms(faithful,h=h[2]) # using `second-best' suggested bandwidth 
# }

# \donttest{
data(gvessel)
g.self <-lpc.self.coverage(gvessel[,c(2,4,5)], x0=c(35, 1870, 6.3), print=FALSE, plot.type=0)
h <- select.self.coverage(g.self)$select
g.lfit <- lpc(gvessel[,c(2,4,5)], h=h[1],  x0=c(35, 1870, 6.3))
lpc.coverage(g.lfit, gridsize=10, print=FALSE)
# }
