coverage: Coverage and self-coverage plots.

Description

These functions compute coverage curves (for any fitted principal object) and self-coverage curves (for local principal curves only; these may be used for bandwidth selection).

Usage

coverage.raw(X, vec, tau, weights=1, plot.type="p", print=FALSE,
      label=NULL, ...)
coverage(X, vec, taumin=0.02, taumax, gridsize=25, weights=1,
      plot.type="o", print=FALSE, ...)
lpc.coverage(object, taumin=0.02, taumax, gridsize=25, quick=TRUE,
      plot.type="o", print=FALSE, ...)
lpc.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25, x0, mult=1,
      way="two", scaled=TRUE, weights=1, pen=2, depth=1,
      control=lpc.control(boundary=0, cross=FALSE), quick=TRUE,
      plot.type="o", print=FALSE, ...)
select.self.coverage(self, smin, plot.type="o", plot.segments=NULL)

Arguments

X

An $N \times d$ data matrix.

object

An object of class lpc or lpc.spline, i.e. a fitted local principal curve.

vec

A matrix with $d$ columns. The rows contain the points which make up the fitted object.

tau

Tube size $\tau$.

taumin

Minimal tube size.

taumax

Maximal tube size.

weights

An optional vector of weights. If weights are specified, then the coverage is the weighted mean of the indicator functions for falling within the tube. The function lpc.coverage does not have a weights argument, a

label

Experimental option; don't use.

gridsize

The number of different tube sizes to consider.

quick

If TRUE, an approximate coverage curve is computed from the distances between the data points and the curve through the closest local centers of mass. With FALSE, the distances between the points and their orthogonal projections onto the spline are used instead.
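To illustrate the difference between the two settings, here is a minimal base-R sketch (not LPCM code). It assumes the fitted curve is given as a matrix whose rows are consecutive curve points. It uses the distance to the nearest of these points as a stand-in for the quick approximation. For the projection-based distance, it replaces the spline with the piecewise-linear curve through the same points. The helpers seg.dist, quick.dist and proj.dist are hypothetical names.

```r
## Hypothetical helpers (not LPCM code). curve: matrix whose rows are
## consecutive points on the fitted curve (at least two rows).

## Distance from point x to the segment between points a and b.
seg.dist <- function(x, a, b) {
  ab <- b - a
  len2 <- sum(ab * ab)
  if (len2 == 0) return(sqrt(sum((x - a)^2)))    # degenerate segment
  s <- min(max(sum((x - a) * ab) / len2, 0), 1)  # clamp projection onto [a, b]
  sqrt(sum((x - (a + s * ab))^2))
}

## Stand-in for the quick approximation: distance to the nearest curve point.
quick.dist <- function(x, curve) {
  sqrt(min(colSums((t(curve) - x)^2)))
}

## Stand-in for orthogonal projection: distance to the piecewise-linear
## curve through the same points (the spline itself is NOT reproduced).
proj.dist <- function(x, curve) {
  min(vapply(seq_len(nrow(curve) - 1),
             function(j) seg.dist(x, curve[j, ], curve[j + 1, ]),
             numeric(1)))
}

## Toy usage: a point halfway along a straight two-point curve
curve <- rbind(c(0, 0), c(2, 0))
x <- c(1, 0.5)
quick.dist(x, curve)   # sqrt(1.25), about 1.118
proj.dist(x, curve)    # 0.5
```

In this sketch, the nearest-point distance is never smaller than the projection distance because the curve points lie on the piecewise-linear curve. For a given $\tau$, the quick variant therefore cannot report a higher coverage here.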

self

An object of class self, or a matrix with two columns providing a self-coverage curve.

smin

Minimum coverage for bandwidth selection. Default: 1/3 for clustering, 2/3 for principal curves.

plot.type

If set to 0, no plot is produced. Otherwise, a plot is produced using the specified plotting type (as in the type argument of plot).

plot.segments

A list with default

list(lty=c(1,2,3), lwd=c(2,1,1), lcol=c(3,3,3))

which specifies how many bandwidth candidates are highlighted, and how (line type, line width and line color). Candidates are taken in order of decreasing magnitude of the negative second derivative of the self-coverage curve.

print

If TRUE, coverage values are printed on the screen as soon as they are computed. This is useful when gridsize is large.

x0, way, scaled, pen, mult, depth, control

LPC parameters as outlined in lpc and lpc.control.

...

Optional graphical parameters passed to the corresponding plotting functions.

Value

A list of items and, unless plot.type=0, a plot.
For function lpc.self.coverage, the item $select recommends suitable bandwidths for use in lpc. These correspond to points of strong negative curvature of the self-coverage curve, where curvature is measured through second differences.
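For illustration, assume an equally spaced grid $\tau_1 < \dots < \tau_m$ with spacing $\Delta\tau$, and write $S(\tau)$ for the self-coverage curve (notation introduced here). The second difference then approximates the second derivative:

$$S''(\tau_j) \approx \frac{S(\tau_{j+1}) - 2\,S(\tau_j) + S(\tau_{j-1})}{(\Delta\tau)^2}, \qquad j = 2, \dots, m-1.$$

Strongly negative values mark grid points where the slope of the self-coverage curve drops most sharply.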

Details

The function coverage.raw computes the coverage for a fixed value of $\tau$, i.e. the proportion of data points lying inside a circle or band of radius $\tau$ around the fitted object. The function coverage evaluates this quantity for gridsize values of $\tau$ between taumin and taumax, which yields the coverage curve $C(\tau)$. Functions coverage.raw and coverage can be used for any object fitted by an unsupervised learning technique (for instance, HS principal curves, or even clustering algorithms). The functions prefixed with lpc. can only be used for local principal curves. The function lpc.coverage is a wrapper around coverage that takes a fitted lpc object directly, rather than a data matrix.
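A minimal base-R sketch of the coverage computation follows. It is not the LPCM implementation and rests on three assumptions. (i) The distance $\delta_i$ of data point $x_i$ to the fitted object is its Euclidean distance to the nearest row of vec. (ii) Points at distance exactly $\tau$ count as covered. (iii) The grid of tube sizes is equally spaced. Under these assumptions, with weights $w_i$ (all equal to 1 by default),

$$C(\tau) = \frac{\sum_{i=1}^{N} w_i \, I(\delta_i \le \tau)}{\sum_{i=1}^{N} w_i}.$$

The names nearest.dist, coverage.sketch and coverage.curve.sketch are hypothetical.

```r
## Hypothetical helper (not part of LPCM): Euclidean distance from each
## row of X to its nearest row of vec.
nearest.dist <- function(X, vec) {
  X <- as.matrix(X)
  vec <- as.matrix(vec)
  apply(X, 1, function(x) sqrt(min(colSums((t(vec) - x)^2))))
}

## Coverage for a fixed tube size tau: (weighted) share of points whose
## distance to the fitted object is at most tau.
coverage.sketch <- function(X, vec, tau, weights = 1) {
  d <- nearest.dist(X, vec)
  w <- rep_len(weights, length(d))   # weights = 1 means equal weights
  sum(w * (d <= tau)) / sum(w)
}

## Coverage curve C(tau) on an equally spaced grid of gridsize tube sizes.
coverage.curve.sketch <- function(X, vec, taumin = 0.02, taumax,
                                  gridsize = 25, weights = 1) {
  taus <- seq(taumin, taumax, length.out = gridsize)
  d <- nearest.dist(X, vec)
  w <- rep_len(weights, length(d))
  cover <- vapply(taus, function(tau) sum(w * (d <= tau)) / sum(w),
                  numeric(1))
  cbind(tau = taus, coverage = cover)
}

## Toy usage: four 2-d points and a fitted object of three points on the x-axis
X   <- cbind(c(0, 1, 2, 3), c(0, 0.5, 0, 3))
vec <- cbind(c(0, 1, 2), c(0, 0, 0))
coverage.sketch(X, vec, tau = 0.5)   # 0.75 (3 of 4 points covered)
coverage.curve.sketch(X, vec, taumin = 0.5, taumax = 4, gridsize = 8)
```

Computing the distances once and then thresholding them for every $\tau$ keeps the curve cheap, because only the comparison with $\tau$ changes along the grid.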

Function select.self.coverage is called by lpc.self.coverage. It extracts suitable bandwidths from the self-coverage curve and produces a plot. It can also be called directly by the user, for instance to reproduce the graphical output or to modify the minimum coverage smin. The component $select contains the selected candidate bandwidths, ordered by the strength of evidence provided by the self-coverage criterion (the best bandwidth comes first). In the plot, the best bandwidth is marked by a thick solid line, the second-best by a dashed line, and the third-best by a dotted line. See Einbeck (2011) for details.
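The ranking idea can be illustrated with a simplified base-R sketch. It assumes the self-coverage curve is available as bandwidths tau and self-coverage values S on an equally spaced grid, matching the two-column matrix form accepted by the self argument. The function select.sketch is hypothetical. Its default smin = 2/3 mirrors the stated default for principal curves. Keeping only grid points with S above smin and a negative second difference is an assumption made here, not a description of how select.self.coverage works internally.

```r
## Hypothetical sketch (not the select.self.coverage implementation):
## rank grid bandwidths by most negative second difference of S,
## keeping only those with S >= smin and negative curvature.
select.sketch <- function(tau, S, smin = 2/3, ncand = 3) {
  n <- length(S)
  stopifnot(n >= 3, length(tau) == n)
  idx <- 2:(n - 1)                             # interior grid points
  d2 <- S[idx + 1] - 2 * S[idx] + S[idx - 1]   # second differences
  keep <- S[idx] >= smin & d2 < 0              # coverage and concavity
  idx <- idx[keep]
  d2 <- d2[keep]
  head(tau[idx[order(d2)]], ncand)             # most negative first
}

## Toy self-coverage curve on an equally spaced bandwidth grid
tau <- (1:9) / 10
S   <- c(2, 8, 13, 14, 15, 15, 16, 16, 16) / 16
select.sketch(tau, S)   # 0.3 0.5 0.7
```

Ties in the second differences keep their grid order, because order() resolves ties by original position.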

Note that the original publication by Einbeck, Tutz, and Evers (2005) uses quick coverage curves (quick=TRUE).

References

Einbeck, J., Tutz, G., & Evers, L. (2005). Local principal curves. Statistics and Computing 15, 301-313.

Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research, to appear.

Examples

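library(LPCM)

## Self-coverage for columns 2, 4 and 5 of gvessel (no plot)
data(gvessel)
gvessel.self <- lpc.self.coverage(gvessel[, c(2, 4, 5)], x0 = c(35, 1870, 6.3),
                                  print = FALSE, plot.type = 0)
## Candidate bandwidths, best first (also draws the self-coverage plot)
h <- select.self.coverage(gvessel.self)$select
## Refit the local principal curve with the best bandwidth, then its coverage curve
gvessel.lpc <- lpc(gvessel[, c(2, 4, 5)], h = h[1], x0 = c(35, 1870, 6.3))
lpc.coverage(gvessel.lpc, gridsize = 10, print = FALSE)

## Coverage of the cluster centers found by mean-shift clustering
data(calspeedflow)
fitms <- ms(calspeedflow[, 3:4], h = 0.1)
coverage(fitms$data, fitms$cluster.center)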