Rc: Goodness-of-fit for principal objects

Description

These functions compute the goodness-of-fit criterion $R_C$ proposed in Einbeck, Tutz, and Evers (2005), and extended beyond the scope of principal curves in Einbeck (2011).

Usage

Rc(data, closest.coords, type="curve")
lpc.Rc(object)

Arguments

data

A data matrix.

object

An object of class lpc or lpc.spline.

closest.coords

A matrix of coordinates of the projected data.

type

For principal curves, leave at the default "curve". For principal points, set to "points".

Details

Rc computes the value $R_C$, a quantity which estimates the goodness-of-fit of a fitted principal curve. This quantity can be interpreted similarly to the coefficient of determination in regression analysis: values close to 1 indicate a good fit, while values close to 0 indicate a poor fit (no better than linear PCA).

In principle, Rc can be used to assess the goodness-of-fit of any principal curve algorithm, provided that the coordinates of the projected data (closest.coords) are available. For instance, for HS principal curves fitted via princurve, this information is contained in component $s; for LPCs, it is given in component $closest.coords of the spline representation. Rc can also be used to assess the goodness-of-fit of algorithms which find "principal points" (such as iterative mean shift or k-means); set type="points" in this case (see also the help file for ms).

lpc.Rc is a wrapper around Rc which takes an object of class lpc or lpc.spline. It computes any information missing from the given object, so the less information the object already contains, the longer the computation takes.

If the data were scaled, then the scaled data and the scaled results must be passed to Rc. The function lpc.Rc looks up the option scaled in the fitted object and takes care of this automatically. Important: if the data were scaled, do NOT unscale the results by hand in order to feed the unscaled version into Rc; this will give a wrong result.
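A criterion built from Euclidean residual lengths is not invariant to rescaling individual columns, which is one way to see why unscaling by hand changes the result. The base R sketch below illustrates this on simulated data. The helper rc_sketch is hypothetical and not LPCM code: it assumes the general form "one minus the ratio of the mean residual length around the fit to the mean residual length around the first principal component line". The scaling factors and the stand-in fitted points are likewise made up for illustration.

```r
# Hypothetical helper (not part of LPCM): assumed general form of the criterion
rc_sketch <- function(data, closest.coords) {
  data <- as.matrix(data)
  # Residual lengths around the supplied fitted points
  res_fit <- sqrt(rowSums((data - as.matrix(closest.coords))^2))
  # Residual lengths around the first principal component line
  centred <- sweep(data, 2, colMeans(data))
  v <- prcomp(data)$rotation[, 1]
  res_pca <- sqrt(rowSums((centred - outer(drop(centred %*% v), v))^2))
  1 - mean(res_fit) / mean(res_pca)
}

set.seed(2)
u <- seq(-1, 1, length.out = 200)
fit_scaled <- cbind(u, u^2)   # stand-in for fitted points on the scaled data
X_scaled <- fit_scaled + matrix(rnorm(400, sd = 0.05), ncol = 2)

# "Unscale" data and fitted points by hand with made-up column factors
s <- c(1, 10)
X_raw <- sweep(X_scaled, 2, s, "*")
fit_raw <- sweep(fit_scaled, 2, s, "*")

rc_scaled <- rc_sketch(X_scaled, fit_scaled)
rc_raw <- rc_sketch(X_raw, fit_raw)

# The two values differ although the fit is the same up to column scaling
c(scaled = rc_scaled, unscaled = rc_raw)
```

In this sketch both the residual lengths and the principal component reference line change with the column scaling, so the two values are not comparable.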

In terms of methodology, these functions compute $R_C$ directly through the mean reduction of absolute residual length, rather than through the area above the coverage curve.
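As an illustration of this computation, the following base R sketch evaluates one minus the ratio of mean residual lengths, using the first principal component line as the reference fit. The helper rc_sketch and the toy data are hypothetical and not part of LPCM. The choice of reference line is an assumption based on the interpretation given above (a value of 0 corresponding to linear PCA), and the package's internal implementation may differ in detail.

```r
# Hypothetical helper (not part of LPCM): R_C as one minus the ratio of
# mean residual lengths (supplied fit vs. first principal component line)
rc_sketch <- function(data, closest.coords) {
  data <- as.matrix(data)
  closest.coords <- as.matrix(closest.coords)

  # Euclidean distance from each observation to its fitted point
  res_curve <- sqrt(rowSums((data - closest.coords)^2))

  # Reference fit: orthogonal projection onto the first principal component line
  centred <- sweep(data, 2, colMeans(data))
  v <- prcomp(data)$rotation[, 1]
  proj <- outer(drop(centred %*% v), v)
  res_pca <- sqrt(rowSums((centred - proj)^2))

  1 - mean(res_curve) / mean(res_pca)
}

# Toy data: noisy parabola; the noise-free points serve as a stand-in
# for the projections that a principal curve fit would return
set.seed(1)
u <- seq(-1, 1, length.out = 200)
curve_pts <- cbind(u, u^2)
X <- curve_pts + matrix(rnorm(400, sd = 0.02), ncol = 2)

rc_curve <- rc_sketch(X, curve_pts)

# Using the linear PCA projections as the "fit" gives a value of 0
# (up to floating-point error)
pc <- prcomp(X)
line_pts <- sweep(outer(pc$x[, 1], pc$rotation[, 1]), 2, pc$center, "+")
rc_line <- rc_sketch(X, line_pts)
```

Here rc_curve is close to 1 because the parabola follows the data far more closely than a straight line does, while rc_line reproduces the reference case.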

These functions currently do not account for observation weights, i.e. $R_C$ is computed through the unweighted mean reduction in absolute residual length (even if weights have been used for the curve fitting).

References

Einbeck, Tutz, and Evers (2005). Local principal curves. Statistics and Computing 15, 301-313.

Einbeck (2011). Bandwidth selection for nonparametric unsupervised learning techniques -- a unified approach via self-coverage. Journal of Pattern Recognition Research, to appear.

Examples

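# Load the calspeedflow example data
data(calspeedflow)
# Fit a local principal curve to columns 3 and 4
lpc1 <- lpc(calspeedflow[,3:4])
# Compute R_C for the fitted curve
lpc.Rc(lpc1)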