classifdist: Classification of unclustered points

Description

Methods for assigning unclustered points to the clusters of an existing clustering, for use within the functions nselectboot and prediction.strength.

Usage

classifdist(cdist,clustering,
                      method="averagedist",
                      centroids=NULL,nnk=1)
classifnp(data,clustering,
                      method="centroid",cdist=NULL,
                      centroids=NULL,nnk=1)

Arguments

cdist

dissimilarity matrix or dist-object. Required by classifdist. Optional for classifnp, where it is used only if method="averagedist"; if it is not provided, dist is applied to data.

data

something that can be coerced into an n*p data matrix.

clustering

integer vector. Gives the cluster number (between 1 and k for k clusters) for clustered points and should be -1 for points to be classified.

method

one of "averagedist", "centroid", "qda", "knn"; "qda" is available in classifnp only. See Details.

centroids

for classifnp, a k times p matrix of cluster centroids. For classifdist, a vector giving the observation numbers of the centroid objects, as provided by pam. Only used if method="centroid"; in that case it is mandatory for classifdist but optional for classifnp, where cluster mean vectors are computed if centroids=NULL.

nnk

number of nearest neighbours if method="knn".

Value

An integer vector giving cluster numbers for all observations; those for the observations already clustered in the input are the same as in the input.
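
The property stated above (already clustered observations keep their input labels, only the -1 entries are filled in) can be illustrated with a minimal base-R sketch of a nearest-centroid rule using cluster means. The helper centroid_classify and the toy data are hypothetical and written for illustration only; this is not the code used by classifnp.

```r
# Hypothetical helper, not part of fpc: nearest-centroid rule with cluster means.
centroid_classify <- function(data, clustering) {
  data <- as.matrix(data)
  k <- max(clustering)
  # Cluster mean vectors, computed from the already clustered points only.
  centroids <- do.call(rbind, lapply(seq_len(k), function(j)
    colMeans(data[clustering == j, , drop = FALSE])))
  out <- clustering
  for (i in which(clustering == -1)) {
    # Squared Euclidean distance from point i to every centroid.
    d2 <- rowSums(sweep(centroids, 2, data[i, ])^2)
    out[i] <- which.min(d2)
  }
  out
}

# Toy data (made up for this sketch): two well separated groups,
# with the last two points marked as unclustered.
data <- cbind(c(0, 1, 0, 10, 11, 10, 0.5, 10.5),
              c(0, 0, 1, 10, 10, 11, 0.5, 10.5))
clustering <- c(1, 1, 1, 2, 2, 2, -1, -1)
res <- centroid_classify(data, clustering)
print(res)
```

Entries 1 to 6 of res are identical to the input, and only entries 7 and 8 receive new cluster numbers.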

Details

classifdist is for data given as a dissimilarity matrix; classifnp is for data given as an n times p data matrix. The following methods are supported:

"centroid": assigns observations to the cluster with the closest cluster centroid, as specified in argument centroids (this is associated with k-means and pam/clara clustering).
"averagedist": assigns an observation to the cluster to which it has the minimum average dissimilarity over all points in the cluster (this is associated with average linkage clustering).
"qda": only in classifnp. Classifies by quadratic discriminant analysis (this is associated with Gaussian clusters with flexible covariance matrices), calling qda with default settings. If qda gives an error (usually because a class was too small), lda is used.
"knn": classifies by nnk nearest neighbours (for nnk=1, this is associated with single linkage clustering). Calls knn in classifnp.

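The "averagedist" rule described above can be sketched in base R as follows. The helper avgdist_classify and the toy data are hypothetical, and the sketch assumes that every cluster 1..k has at least one clustered member; it is meant to show the rule, not to reproduce the code of classifdist.

```r
# Hypothetical helper, not part of fpc: the "averagedist" rule in base R.
avgdist_classify <- function(cdist, clustering) {
  d <- as.matrix(cdist)
  k <- max(clustering)
  out <- clustering
  for (i in which(clustering == -1)) {
    # Mean dissimilarity from point i to the clustered members of each cluster.
    avg <- sapply(seq_len(k), function(j) mean(d[i, clustering == j]))
    out[i] <- which.min(avg)
  }
  out
}

# Toy data (made up for this sketch): two groups on the real line,
# with the last two points marked as unclustered.
x <- c(0, 0.2, 0.4, 10, 10.2, 10.4, 0.1, 10.1)
clustering <- c(1, 1, 1, 2, 2, 2, -1, -1)
res <- avgdist_classify(dist(x), clustering)
print(res)
```

Because only the rows of the points marked -1 are used, the rule needs nothing beyond the dissimilarities, which is why it is available in classifdist as well as in classifnp.
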
Examples

# NOT RUN {
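library(fpc)

set.seed(20000)
# Three groups on the first coordinate: 50, 40 and 10 points.
x1 <- rnorm(50)
y <- rnorm(100)
x2 <- rnorm(40,mean=20)
x3 <- rnorm(10,mean=25,sd=100)
x <- cbind(c(x1,x2,x3),y)
truec <- c(rep(1,50),rep(2,40),rep(3,10))
# Mark five points as unclustered (-1); these are the points to be classified.
topredict <- c(1,2,51,52,91)
clumin <- truec
clumin[topredict] <- -1

# Classification from the n*p data matrix.
classifnp(x,clumin, method="averagedist")
classifnp(x,clumin, method="qda")
# Classification from dissimilarities; observations 3, 53 and 93 serve as centroids.
classifdist(dist(x),clumin, centroids=c(3,53,93),method="centroid")
classifdist(dist(x),clumin,method="knn")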

# }
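
For comparison with the method="knn" call on dissimilarities, here is a minimal base-R sketch of a nearest-neighbour vote computed from a dist object. The helper knn_dist_classify and the toy data are hypothetical; resolving ties in favour of the smallest cluster number is a choice made for this sketch and is not claimed to match classifdist.

```r
# Hypothetical helper, not part of fpc: nnk-nearest-neighbour vote on dissimilarities.
knn_dist_classify <- function(cdist, clustering, nnk = 1) {
  d <- as.matrix(cdist)
  labelled <- which(clustering != -1)
  out <- clustering
  for (i in which(clustering == -1)) {
    # The nnk clustered points closest to point i.
    nn <- labelled[order(d[i, labelled])[seq_len(nnk)]]
    # Majority vote; which.max takes the first maximum, i.e. the
    # smallest cluster number in case of a tie (a choice made here).
    votes <- table(clustering[nn])
    out[i] <- as.integer(names(votes)[which.max(votes)])
  }
  out
}

# Toy data (made up for this sketch).
x <- c(0, 0.2, 0.4, 10, 10.2, 10.4, 0.1, 10.1)
clustering <- c(1, 1, 1, 2, 2, 2, -1, -1)
res1 <- knn_dist_classify(dist(x), clustering, nnk = 1)
res3 <- knn_dist_classify(dist(x), clustering, nnk = 3)
print(res1)
print(res3)
```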