
analogue (version 0.17-7)

prcurve: Fits a principal curve to m-dimensional data

Description

A principal curve is a non-parametric generalisation of the first principal component: a smooth curve that passes through the middle of a cloud of data points, for a certain definition of 'middle'.
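To make the idea concrete, here is a minimal base-R sketch of the classic Hastie and Stuetzle iteration: start from first principal component scores, smooth each variable against the current positions along the curve with smooth.spline() at a fixed df, project every point onto the resulting polyline, and repeat until the total squared distance stops changing. It is an illustration under simplifying assumptions (PCA start only, no stretching of the curve ends, an absolute-change convergence test), not analogue's implementation; the helpers project_to_polyline() and fit_prcurve_sketch() are hypothetical.

```r
## Hypothetical helper: project each row of X onto the polyline whose
## ordered vertices are the rows of 'curve'. Returns the projections s,
## their arc-length positions lambda and the sum of squared distances.
project_to_polyline <- function(X, curve) {
  seg_len <- sqrt(rowSums(diff(curve)^2))
  cum_len <- c(0, cumsum(seg_len))
  n <- nrow(X)
  s <- matrix(NA_real_, n, ncol(X))
  lambda <- numeric(n)
  d2 <- rep(Inf, n)
  for (j in seq_len(nrow(curve) - 1)) {
    a <- curve[j, ]
    v <- curve[j + 1, ] - a
    vv <- sum(v^2)
    if (vv == 0) next                     # skip zero-length segments
    ## position (0 to 1) of each point's projection, clamped to the segment
    u <- pmin(pmax(((X - rep(a, each = n)) %*% v) / vv, 0), 1)
    p <- rep(a, each = n) + u %*% t(v)    # projected points on segment j
    dj <- rowSums((X - p)^2)
    better <- dj < d2                     # keep the closest segment so far
    d2[better] <- dj[better]
    s[better, ] <- p[better, ]
    lambda[better] <- cum_len[j] + u[better] * seg_len[j]
  }
  list(s = s, lambda = lambda, dist = sum(d2))
}

## Hypothetical driver: PCA start, the same df for every smooth, and the
## absolute-change convergence test described under 'thresh'.
fit_prcurve_sketch <- function(X, df = 5, maxit = 10, thresh = 0.001) {
  X <- as.matrix(X)
  lambda <- prcomp(X)$x[, 1]              # initial positions: PC1 scores
  dist_old <- Inf
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    ord <- order(lambda)
    ## smooth each variable against lambda, evaluated at sorted lambda
    curve <- sapply(seq_len(ncol(X)), function(k)
      predict(smooth.spline(lambda, X[, k], df = df), lambda[ord])$y)
    proj <- project_to_polyline(X, curve)
    lambda <- proj$lambda
    converged <- abs(dist_old - proj$dist) <= thresh
    dist_old <- proj$dist
    if (converged) break
  }
  list(s = proj$s, tag = order(lambda), lambda = lambda,
       dist = proj$dist, converged = converged, iter = iter)
}

## Usage on invented noisy points scattered around a sine curve
set.seed(1)
tt <- sort(runif(100, 0, 3))
X <- cbind(tt, sin(tt)) + matrix(rnorm(200, sd = 0.05), ncol = 2)
fit <- fit_prcurve_sketch(X, df = 5)
```

Because the curve ends are not extrapolated here, points lying beyond either end are simply projected onto the nearest end vertex.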

Usage

prcurve(X, method = c("ca", "pca", "random", "user"), start = NULL,
        smoother = smoothSpline, complexity, vary = FALSE,
        maxComp, finalCV = FALSE, axis = 1, rank = FALSE,
        stretch = 2, maxit = 10, trace = FALSE, thresh = 0.001,
        plotit = FALSE, ...)

initCurve(X, method = c("ca", "pca", "random", "user"), rank = FALSE, axis = 1, start)

Value

An object of class "prcurve" with the following components:

s

a matrix corresponding to X, giving the projections of the observations in X onto the curve.

tag

an index such that s[tag, ] is smooth, i.e. the rows of s ordered along the curve.

lambda

for each point, its arc-length from the beginning of the curve.

dist

the sum-of-squared distances from the points to their projections.
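As a small worked illustration of lambda and dist, the snippet below uses four invented points and their orthogonal projections onto the straight line y = x, which stands in for a fitted curve. dist is the sum of squared point-to-projection distances; lambda is approximated by accumulating the straight-line (chord) lengths between consecutive projections taken in tag order. On a genuinely curved fit this chord approximation can underestimate the true arc length when consecutive projections fall on different segments.

```r
## Four invented points
X <- cbind(c(0, 1, 2, 3), c(1, 0, 3, 2))
## Their orthogonal projections onto the line y = x
s <- cbind(c(0.5, 0.5, 2.5, 2.5), c(0.5, 0.5, 2.5, 2.5))

tag <- order(s[, 1])                     # ordering along this monotone curve
seg <- sqrt(rowSums(diff(s[tag, ])^2))   # chord lengths between neighbours
lambda <- numeric(nrow(s))
lambda[tag] <- c(0, cumsum(seg))         # approximate arc length from the start

dist <- sum((X - s)^2)                   # each point is 0.5 away (squared)
dist                                     # 2
```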

converged

logical; did the algorithm converge?

iter

numeric; the number of iterations performed.

totalDist

numeric; total sum-of-squared distances.

complexity

numeric vector; the complexity of the smoother fitted to each variable in X.

call

the matched call.

ordination

an object of class "rda", the result of a call to rda (package vegan). This is a principal components analysis of the input data X.

data

a copy of the data used to fit the principal curve.

Arguments

X

a matrix-like object containing the variables to which the principal curve is to be fitted.

method

character; method to use when initialising the principal curve. "ca" fits a correspondence analysis to X and uses the scores on the axis given by argument axis as the initial curve. "pca" does the same but fits a principal components analysis to X. "random" produces a random ordering as the initial curve. "user" uses the values supplied in start as the initial curve.

start

numeric vector specifying the initial curve when method = "user". Must be of length nrow(X).
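The four initialisation methods can be mimicked in base R as below. This is a hypothetical init_curve_sketch(), not analogue's initCurve(): the "pca" branch uses prcomp() rather than rda(), and the "ca" branch computes correspondence analysis row scores directly from a singular value decomposition, so the scaling and sign of the scores may differ from those produced by the package. The CA branch assumes X holds non-negative values with no all-zero rows or columns.

```r
## Hypothetical sketch of the initialisation methods, not analogue's initCurve()
init_curve_sketch <- function(X, method = c("ca", "pca", "random", "user"),
                              axis = 1, start = NULL) {
  method <- match.arg(method)
  X <- as.matrix(X)
  switch(method,
    pca = prcomp(X)$x[, axis],              # PCA scores on the chosen axis
    ca = {
      ## correspondence analysis via SVD of standardised residuals
      P <- X / sum(X)
      E <- outer(rowSums(P), colSums(P))    # expected proportions
      sv <- svd((P - E) / sqrt(E))
      ## row scores in principal coordinates on the chosen axis
      sv$u[, axis] * sv$d[axis] / sqrt(rowSums(P))
    },
    random = sample(nrow(X)),               # a random ordering of the rows
    user = {
      if (is.null(start) || length(start) != nrow(X))
        stop("'start' must be a numeric vector of length nrow(X)")
      start
    })
}

set.seed(2)
Y <- matrix(rpois(60, 5) + 1, nrow = 10)    # invented count-like data
init_curve_sketch(Y, method = "ca")
```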

smoother

function; the choice of smoother used to fit the principal curve. Currently, the only options are smoothSpline, which is a wrapper to smooth.spline, and smoothGAM, which is a wrapper to gam (package mgcv).

complexity

numeric; the complexity of the fitted smooth functions.

The function passed as argument smoother should arrange for this argument to be passed on to the relevant aspect of the underlying smoother. In the case of smoothSpline, complexity is the df argument of smooth.spline.

vary

logical; should the complexity of the smoother fitted to each variable in X be allowed to vary (i.e. allowing a more or less smooth function for a particular variable)? If FALSE, the median complexity over all m variables is used as the fixed complexity for all m smooths.

maxComp

numeric; the upper limit on the allowed complexity.
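The effect of vary = FALSE and maxComp can be illustrated with smooth.spline() directly: choose a complexity (equivalent degrees of freedom) for each variable separately, cap it at maxComp, then refit every variable with the median of those values. Selecting the per-variable complexity by generalised cross-validation, and treating maxComp as a simple upper clip, are assumptions made for this sketch; the simulated positions and responses are invented.

```r
set.seed(3)
lambda <- sort(runif(50))                  # simulated positions along a curve
Y <- cbind(a = sin(2 * pi * lambda), b = lambda, c = lambda^2) +
  matrix(rnorm(150, sd = 0.1), ncol = 3)
maxComp <- 10                              # hypothetical upper limit

## Step 1: per-variable complexity, here chosen by GCV (an assumption)
dfs <- apply(Y, 2, function(y) smooth.spline(lambda, y)$df)
dfs <- pmin(dfs, maxComp)                  # assumed: a simple cap at maxComp

## Step 2 (vary = FALSE): one fixed complexity, the median, for all smooths
fixed <- median(dfs)
fits <- lapply(seq_len(ncol(Y)), function(k)
  smooth.spline(lambda, Y[, k], df = fixed))
sapply(fits, `[[`, "df")                   # each close to 'fixed'
```

With vary = TRUE the per-variable values in dfs would be kept instead of being replaced by their median.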

finalCV

logical; should a final fit of the smooth function be performed using cross-validation?

axis

numeric; the ordination axis to use as the initial curve.

rank

logical; should rank position on the gradient be used? Not yet implemented.

stretch

numeric; a factor by which the ends of the curve can be extrapolated when points are projected onto it. The default, 2, extends each end by twice the length of the corresponding end segment.
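One plausible way to implement the stretch described above is to push the first and last vertices of the ordered curve outwards along their end segments by stretch times those segments' lengths, so points lying beyond the original ends can still project onto the curve. Whether analogue extends the curve in exactly this way is an assumption; stretch_curve() is a hypothetical helper.

```r
## Hypothetical helper: extend both ends of an ordered curve (rows = vertices)
stretch_curve <- function(curve, stretch = 2) {
  k <- nrow(curve)
  ## compute both new end vertices before modifying 'curve', so a
  ## two-vertex curve is handled correctly
  first <- curve[1, ] + stretch * (curve[1, ] - curve[2, ])
  last  <- curve[k, ] + stretch * (curve[k, ] - curve[k - 1, ])
  curve[1, ] <- first
  curve[k, ] <- last
  curve
}

curve <- cbind(0:3, 0)        # straight curve from (0, 0) to (3, 0)
stretch_curve(curve)[, 1]     # -2 1 2 5
```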

maxit

numeric; the maximum number of iterations.

trace

logical; should progress on the iterations be printed to the console?

thresh

numeric; convergence threshold on shortest distances to the curve. The algorithm is considered to have converged when the latest iteration produces a total residual distance to the curve that is within thresh of the value obtained during the previous iteration.

plotit

logical; should the fitting process be plotted? If TRUE, then the fitted principal curve and observations in X are plotted in principal component space.
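The idea behind plotit, showing the observations and the curve in principal component space, can be reproduced with prcomp(): fit a PCA to the data, then pass the ordered curve coordinates through predict() so both share the same axes. The curve here is a stand-in obtained by smoothing each variable against the first-axis scores, not a fitted prcurve object, and the simulated data are invented.

```r
set.seed(4)
u <- sort(runif(80, 0, 3))
X <- cbind(x1 = u, x2 = sin(u), x3 = cos(u)) +
  matrix(rnorm(240, sd = 0.05), ncol = 3)
pc <- prcomp(X)

## Stand-in curve: each variable smoothed against the PC1 scores
lam <- pc$x[, 1]
s <- sapply(seq_len(ncol(X)), function(k)
  predict(smooth.spline(lam, X[, k], df = 5), lam)$y)
colnames(s) <- colnames(X)
tag <- order(lam)

## Put the data and the ordered curve on the same PCA axes
curve_pc <- predict(pc, newdata = s[tag, ])[, 1:2]
plot(pc$x[, 1:2], xlab = "PC1", ylab = "PC2")
lines(curve_pc, col = "red", lwd = 2)
```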

...

additional arguments, passed on only to the function supplied as smoother.

Author

Gavin L. Simpson

See Also

smoothGAM and smoothSpline for the wrappers fitting smooth functions to each variable.

Examples

## Load Abernethy Forest data set
data(abernethy)

## Remove the Depth and Age variables
abernethy2 <- abernethy[, -(37:38)]

## Fit the principal curve using the median complexity over
## all species
aber.pc <- prcurve(abernethy2, method = "ca", trace = TRUE,
                   vary = FALSE, penalty = 1.4)

## Extract fitted values
fit <- fitted(aber.pc) ## locations on curve
abun <- fitted(aber.pc, type = "smooths") ## fitted response

## Fit the principal curve using varying complexity of smoothers
## for each species
aber.pc2 <- prcurve(abernethy2, method = "ca", trace = TRUE,
                    vary = TRUE, penalty = 1.4)

## Predict new locations
take <- abernethy2[1:10, ]
pred <- predict(aber.pc2, take)

if (FALSE) {
## Fit principal curve using a GAM - currently slow ~10secs
aber.pc3 <- prcurve(abernethy2 / 100, method = "ca", trace = TRUE,
                    vary = TRUE, smoother = smoothGAM, bs = "cr", family = mgcv::betar())
}
