These functions compute coverages and self-coverages, and produce corresponding plots, for any principal curve object. The former may be used as goodness-of-fit measures, and the latter for bandwidth selection.
coverage.raw(X, vec, tau, weights=1, plot.type="p", print=FALSE,
label=NULL,...)
coverage(X, vec, taumin=0.02, taumax, gridsize=25, weights=1,
plot.type="o", print=FALSE,...)
lpc.coverage(object, taumin=0.02, taumax, gridsize=25, quick=TRUE,
plot.type="o", print=FALSE, ...)
lpc.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25, x0=1,
way = "two", scaled=1, weights=1, pen=2, depth=1,
control=lpc.control(boundary=0, cross=FALSE), quick=TRUE,
plot.type="o", print=FALSE, ... )
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
thr=0.001, scaled=1, cluster=FALSE, plot.type="o",
print=FALSE, ...)
select.self.coverage(self, smin, plot.type="o", plot.segments=NULL)
A list of items, and a plot (unless plot.type=0).
The functions lpc.self.coverage and ms.self.coverage produce an object of class self. The component $select recommends suitable bandwidths for use in lpc, in order of strength of evidence. These correspond to points of strong negative curvature (implemented via second differences) of the self-coverage curve.
An \(N \times d\) data matrix.
An object of type lpc, lpc.spline or ms.
A matrix with \(d\) columns. The rows contain the points which make up the fitted object.
Tube size.
Minimal tube size.
Maximal tube size.
An optional vector of weights. If weights are specified, then the coverage is the weighted mean of the indicator functions for falling within the tube. The function lpc.coverage does not have a weights argument, as it extracts the weights from the $weights component of the fitted object.
Experimental option; don't use.
The number of different tube sizes to consider.
If TRUE, an approximate coverage curve is provided by computing distances between data points and the curve through the closest local centers of mass; with FALSE, the distances are those of the points when projected orthogonally onto the spline representation of the local principal curve. The latter takes considerably more computing time. The resulting coverage curves are generally very similar, but the quick version may occasionally produce small spurious peaks.
Adjacent mean shift clusters are merged if their relative distance falls below this threshold.
If TRUE, distances are always measured to the cluster to which an observation is assigned, rather than to the nearest cluster.
An object of class self, or a matrix with two columns providing a self-coverage curve.
Minimum coverage for bandwidth selection. Default: 1/3 for clustering, 2/3 for principal curves.
If set to 0, no plotted output is given. Otherwise, an appropriate plot is provided, using the plotting type as specified.
A list with default list(lty=c(1,2,3), lwd=c(2,1,1), lcol=c(3,3,3)) which specifies how (and how many) bandwidth candidates, in order of decreasing negative second derivative of self-coverage, are to be highlighted.
If TRUE, coverage values are printed on the screen as soon as they are computed. This is especially helpful if gridsize is large.
Auxiliary parameters as outlined in lpc, lpc.control, and ms.
Optional graphical parameters passed to the corresponding plotting functions.
J. Einbeck
The function coverage.raw computes the coverage, i.e. the proportion of data points lying inside a circle or band with radius \(\tau\), for a fixed value tau. The whole coverage curve \(C(\tau)\) is constructed through function coverage.
Functions coverage.raw and coverage can be used for any object fitted by an unsupervised learning technique (for instance, HS principal curves, or even clustering algorithms), while the functions prefixed with lpc. and ms. can only be used for the corresponding objects. The functions lpc.coverage and ms.coverage are wrappers around coverage which operate directly on a fitted object, rather than a data matrix.
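To make the definition concrete, the following is a minimal base-R sketch of the coverage idea, not the package's implementation. The helper name coverage_sketch is hypothetical, and so is the choice to measure each data point's distance to the nearest row of vec; the weights are handled as a weighted mean of the in-tube indicators, as described for the weights argument. A k-means fit stands in for "any unsupervised learning technique".

```r
# Hypothetical helper (not part of LPCM): coverage of a set of fitted points.
# X: N x d data matrix; vec: matrix with d columns holding the fitted points;
# tau: tube radius; weights: scalar or vector of length N.
coverage_sketch <- function(X, vec, tau, weights = 1) {
  X <- as.matrix(X)
  vec <- as.matrix(vec)
  w <- rep(weights, length.out = nrow(X))
  # Assumption: distance of each data point to its nearest fitted point
  min_dist <- apply(X, 1, function(x) {
    sqrt(min(rowSums(sweep(vec, 2, x)^2)))
  })
  # Weighted mean of the indicators "point lies inside the tube"
  sum(w * (min_dist <= tau)) / sum(w)
}

data(faithful)
X <- scale(faithful)              # assumption: standardise both columns
set.seed(1)
km <- kmeans(X, centers = 2)      # stand-in for any fitted object
taus <- seq(0.1, 2, length.out = 10)
# Coverage curve: one coverage value per tube size on the grid
cov_curve <- sapply(taus, function(tau) coverage_sketch(X, km$centers, tau))
```

Calling plot(taus, cov_curve, type="o") then draws the sketched curve. By construction it is non-decreasing in tau and bounded by 0 and 1.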
Function select.self.coverage extracts suitable bandwidths from the self-coverage curve, and produces a plot. The function is called from within lpc.self.coverage or ms.self.coverage but can also be called directly by the user (for instance, if the graphical output is to be reproduced, or if the minimum coverage smin is to be modified). The component $select contains the selected candidate bandwidths, in the order of strength of evidence provided by the self-coverage criterion (the best bandwidth comes first, etc.). A plot is produced as a by-product, which symbolizes the best bandwidth by a thick solid line, the second-best by a dashed line, and the third-best by a dotted line. It is recommended to run the self-coverage functions with fixed starting points, as in the examples below, and to scale by the range only.
See Einbeck (2011) for details. Note that the original publication by Einbeck, Tutz, and Evers (2005) uses `quick' coverage curves.
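The selection rule can be illustrated with a short base-R sketch under stated assumptions. The helper select_sketch, its tol argument, and the way smin is applied (candidates below the minimum coverage are simply discarded) are hypothetical and are not taken from the package source; the toy curve uses made-up numbers. The sketch only shows the idea of ranking grid points by how negative the second difference of the self-coverage curve is.

```r
# Hypothetical helper (not the LPCM implementation): rank candidate
# bandwidths by the size of the negative second difference of the curve.
# curve: two-column matrix (tube size, self-coverage).
select_sketch <- function(curve, smin = 2/3, n_candidates = 3, tol = 1e-8) {
  tau <- curve[, 1]
  cover <- curve[, 2]
  # Second differences approximate the curvature at the interior grid points
  d2 <- diff(cover, differences = 2)
  interior <- 2:(length(tau) - 1)
  # Assumption: keep points that reach smin and bend clearly downwards
  ok <- cover[interior] >= smin & d2 < -tol
  ranked <- interior[ok][order(d2[ok])]   # most negative curvature first
  tau[head(ranked, n_candidates)]
}

# Toy self-coverage curve with a pronounced kink at tau = 0.2
tau <- seq(0.05, 0.5, by = 0.05)
cover <- c(0.20, 0.45, 0.70, 0.95, 0.96, 0.97, 0.975, 0.98, 0.985, 0.99)
sel <- select_sketch(cbind(tau, cover))
```

On this toy curve the first element of sel is the kink at 0.2, which mirrors how the first entry of $select is read as the best-supported bandwidth.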
Einbeck, J., Tutz, G., & Evers, L. (2005). Local principal curves. Statistics and Computing 15, 301-313.
Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.
lpc, ms
data(faithful)
mfit <- ms(faithful)
coverage(mfit$data, mfit$cluster.center, gridsize=16)
# \donttest{
f.self <- ms.self.coverage(faithful, gridsize=50, taumin=0.1, taumax=0.5, plot.type="o")
h <- select.self.coverage(f.self)$select
mfit2 <- ms(faithful,h=h[2]) # using `second-best' suggested bandwidth
# }
# \donttest{
data(gvessel)
g.self <- lpc.self.coverage(gvessel[,c(2,4,5)], x0=c(35, 1870, 6.3), print=FALSE, plot.type=0)
h <- select.self.coverage(g.self)$select
g.lfit <- lpc(gvessel[,c(2,4,5)], h=h[1], x0=c(35, 1870, 6.3))
lpc.coverage(g.lfit, gridsize=10, print=FALSE)
# }