A weighted kernel version of the famous k-means algorithm.
# S4 method for formula
kkmeans(x, data = NULL, na.action = na.omit, ...)
# S4 method for matrix
kkmeans(x, centers, kernel = "rbfdot", kpar = "automatic",
        alg = "kkmeans", p = 1, na.action = na.omit, ...)
# S4 method for kernelMatrix
kkmeans(x, centers, ...)
# S4 method for list
kkmeans(x, centers, kernel = "stringdot",
        kpar = list(length = 4, lambda = 0.5),
        alg = "kkmeans", p = 1, na.action = na.omit, ...)
x: the matrix of data to be clustered, a symbolic description of the model to be fit, a kernel matrix of class kernelMatrix, or a list of character vectors.
data: an optional data frame containing the variables in the model. By default the variables are taken from the environment from which kkmeans is called.
centers: either the number of clusters or a matrix of initial cluster centers. If a number is given, a random initial partitioning is used.
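The text does not say how the random initial partitioning is drawn. As a minimal sketch, assuming a balanced random assignment, the hypothetical helper random_partition below gives every point a label in 1..k so that no cluster starts empty; kernlab's own initialisation may differ in detail.

```r
# Hypothetical helper: random initial partition of n points into k clusters.
# Assumption: kernlab's own initialisation may differ in detail.
random_partition <- function(n, k) {
  stopifnot(n >= k)
  # Recycle the labels 1..k over n slots, then shuffle them, so that
  # no cluster starts out empty.
  sample(rep_len(seq_len(k), n))
}

set.seed(42)
init <- random_partition(150, 3)
table(init)
```

Recycling the labels before shuffling is a design choice for the sketch: a plain sample(k, n, replace = TRUE) could leave a cluster empty on small data sets.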
kernel: the kernel function used in training and predicting. This parameter can be set to any function of class kernel that computes an inner product in feature space between two vector arguments (see kernels). kernlab provides the most popular kernel functions, which can be used by setting the kernel parameter to the following strings:
rbfdot: Radial Basis kernel "Gaussian"
polydot: Polynomial kernel
vanilladot: Linear kernel
tanhdot: Hyperbolic tangent kernel
laplacedot: Laplacian kernel
besseldot: Bessel kernel
anovadot: ANOVA RBF kernel
splinedot: Spline kernel
stringdot: String kernel
Setting the kernel parameter to "matrix" treats x as a kernel matrix, calling the kernelMatrix interface.
The kernel parameter can also be set to a user-defined function of class kernel by passing the function name as an argument.
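As a minimal sketch of a user-defined kernel, the function below computes a Gaussian inner product between two vectors and is tagged with class "kernel", following the pattern kernlab's documentation uses for custom kernels. The name my_rbf and the width 0.5 are illustrative assumptions.

```r
# User-defined Gaussian kernel: a plain R function of two vectors.
# Assumption: the name my_rbf and the width 0.5 are illustrative only.
my_rbf <- function(x, y) exp(-0.5 * sum((x - y)^2))
class(my_rbf) <- "kernel"   # marks the function as a kernel object

a <- c(1, 2, 3)
b <- c(1, 2, 4)
my_rbf(a, a)   # 1: identical vectors have squared distance 0
my_rbf(a, b)   # exp(-0.5): the squared distance is 1
```

With kernlab attached, such an object could then be supplied as kernel = my_rbf in a kkmeans call.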
kpar: a character string or a list of hyper-parameters (kernel parameters). The default character string "automatic" uses a heuristic to determine a suitable value for the width parameter of the RBF kernel.
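The heuristic behind "automatic" is not spelled out here. As an assumption, the sketch below follows the idea of kernlab's sigest() function: take quantiles of the pairwise squared distances and use their inverses as candidate values of sigma. The helper estimate_sigma and its default quantiles are hypothetical.

```r
# Hypothetical helper estimate_sigma(): an RBF width from pairwise distances.
# Assumption: modelled on the idea behind kernlab's sigest(); the quantiles and
# averaging actually used by kpar = "automatic" may differ.
estimate_sigma <- function(x, probs = c(0.1, 0.9)) {
  d2 <- as.vector(dist(x))^2            # all pairwise squared Euclidean distances
  d2 <- d2[d2 > 0]                       # drop pairs of duplicated points
  q <- quantile(d2, probs, names = FALSE)
  mean(1 / q)                            # average of the inverse quantiles
}

x <- as.matrix(iris[, -5])
sigma <- estimate_sigma(x)
sigma
```

Because sigma multiplies a squared distance, any value in the range spanned by the inverse quantiles keeps typical kernel values away from both 0 and 1.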
A list can also be used, containing the parameters to be used with the kernel function. Valid parameters for existing kernels are:
sigma: inverse kernel width for the Radial Basis kernel function "rbfdot" and the Laplacian kernel "laplacedot".
degree, scale, offset: for the Polynomial kernel "polydot".
scale, offset: for the Hyperbolic tangent kernel function "tanhdot".
sigma, order, degree: for the Bessel kernel "besseldot".
sigma, degree: for the ANOVA kernel "anovadot".
length, lambda, normalized: for the "stringdot" kernel, where length is the length of the strings considered, lambda the decay factor and normalized a logical parameter determining if the kernel evaluations should be normalized.
Hyper-parameters for user-defined kernels can be passed through the kpar parameter as well.
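To show where these hyper-parameters enter, the sketch below writes four of the kernels as plain R functions, using the formulas from kernlab's kernel documentation. The helper names rbf_k, laplace_k, poly_k and tanh_k are hypothetical; in kkmeans the same values would be given through kpar instead.

```r
# Illustrative plain-R versions of four kernels, written from the formulas in
# kernlab's kernel documentation. The helper names are hypothetical; in
# kkmeans the same values are passed through kpar, e.g.
# kpar = list(degree = 2, scale = 1, offset = 1) for "polydot".
rbf_k     <- function(x, y, sigma) exp(-sigma * sum((x - y)^2))
laplace_k <- function(x, y, sigma) exp(-sigma * sqrt(sum((x - y)^2)))
poly_k    <- function(x, y, degree, scale, offset) (scale * sum(x * y) + offset)^degree
tanh_k    <- function(x, y, scale, offset) tanh(scale * sum(x * y) + offset)

x <- c(1, 0)
y <- c(0, 1)
rbf_k(x, y, sigma = 0.5)                          # exp(-1): squared distance is 2
laplace_k(x, y, sigma = 0.5)                      # exp(-0.5 * sqrt(2))
poly_k(x, y, degree = 2, scale = 1, offset = 1)   # (0 + 1)^2 = 1
tanh_k(x, y, scale = 1, offset = 0)               # tanh(0) = 0
```

The two sigma-based kernels differ only in whether the distance is squared, which is why the same parameter name serves both "rbfdot" and "laplacedot".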
alg: the algorithm to use. Options currently include "kkmeans" and "kerninghan".
p: a parameter used to keep the affinity matrix positive semidefinite.
na.action: the action to perform on NA values.
...: additional parameters.
An S4 object of class specc, which extends the class vector, containing integers indicating the cluster to which each point is allocated. The following slots contain useful information:
centers: a matrix of cluster centers.
size: the number of points in each cluster.
withinss: the within-cluster sum of squares for each cluster.
kernelf: the kernel function used.
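The size and withinss values can be expressed using only a kernel matrix and the cluster labels, since the squared distance of a point to its cluster mean in feature space expands into kernel entries. A minimal sketch, assuming this is what the slots describe (not necessarily how kernlab computes them), with the hypothetical helper kernel_cluster_stats:

```r
# Hypothetical helper: cluster sizes and within-cluster sums of squares in
# feature space, computed only from a kernel matrix K and labels cl.
# Assumption: this matches what the size and withinss slots describe,
# not necessarily how kernlab computes them.
kernel_cluster_stats <- function(K, cl) {
  k <- max(cl)
  size <- tabulate(cl, nbins = k)
  withinss <- vapply(seq_len(k), function(j) {
    idx <- which(cl == j)
    if (length(idx) == 0) return(0)     # an empty cluster contributes nothing
    # sum_i ||phi(x_i) - m_j||^2 = sum_i K_ii - (1 / n_j) * sum_{i,l} K_il
    sum(diag(K)[idx]) - sum(K[idx, idx]) / length(idx)
  }, numeric(1))
  list(size = size, withinss = withinss)
}

# With the linear kernel K = X X^T the result equals ordinary k-means withinss.
X <- matrix(c( 0,  0,
               2,  0,
              10, 10,
              10, 12), ncol = 2, byrow = TRUE)
cl <- c(1, 1, 2, 2)
stats <- kernel_cluster_stats(X %*% t(X), cl)
stats$size       # 2 2
stats$withinss   # 2 2: each point lies at squared distance 1 from its mean
```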
Kernel k-means uses the 'kernel trick' (i.e. implicitly projecting all data into a non-linear feature space with the use of a kernel) in order to deal with one of the major drawbacks of k-means, namely that it cannot capture clusters that are not linearly separable in input space.
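The kernel trick works because the distance from a point to a cluster mean in feature space only needs kernel values: ||phi(x_i) - m_j||^2 = K_ii - (2/n_j) sum_l K_il + (1/n_j^2) sum_{l,m} K_lm. The sketch below runs plain Lloyd-style iterations on that distance. It is an assumption-labelled illustration with a hypothetical function name, not kernlab's implementation, which additionally uses triangle-inequality pruning and weights.

```r
# Minimal kernel k-means sketch (plain Lloyd-style iterations, unit weights).
# Assumption: illustrates the kernel-trick distance only; kernlab's kkmeans
# additionally uses triangle-inequality pruning and weights.
kernel_kmeans_sketch <- function(K, cl, max_iter = 100) {
  k <- max(cl)
  n <- nrow(K)
  for (iter in seq_len(max_iter)) {
    D <- matrix(Inf, n, k)               # squared feature-space distances
    for (j in seq_len(k)) {
      idx <- which(cl == j)
      if (length(idx) == 0) next         # empty cluster: distance stays Inf
      # ||phi(x_i) - m_j||^2 = K_ii - 2/n_j sum_l K_il + 1/n_j^2 sum_{l,m} K_lm
      D[, j] <- diag(K) -
        2 * rowSums(K[, idx, drop = FALSE]) / length(idx) +
        sum(K[idx, idx]) / length(idx)^2
    }
    new_cl <- apply(D, 1, which.min)     # reassign each point to nearest centre
    if (all(new_cl == cl)) break         # no change: converged
    cl <- new_cl
  }
  cl
}

# Two well-separated groups with a deliberately mixed starting partition.
X <- matrix(c( 0,  0,   0,  1,   1,  0,
              10, 10,  10, 11,  11, 10), ncol = 2, byrow = TRUE)
K <- exp(-0.1 * as.matrix(dist(X))^2)    # Gaussian (rbfdot-style) kernel, sigma = 0.1
kernel_kmeans_sketch(K, cl = c(1, 2, 1, 2, 1, 2))
```

Because only K enters the loop, swapping in a different kernel matrix changes the notion of distance without touching the clustering code.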
The algorithm is implemented using the triangle inequality to avoid unnecessary and computationally expensive distance calculations. This leads to a significant speedup, particularly on large data sets with a high number of clusters.
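A standard bound of this kind is Elkan's triangle-inequality lemma: if the distance between centres a and b is at least twice the distance from x to a, then b cannot be closer to x than a, so d(x, b) need not be computed. The sketch below only checks that inequality on random points; how kernlab books-keeps its bounds is not described here, and in kernel k-means the distances would be feature-space distances obtained from the kernel matrix. The helper can_skip is hypothetical.

```r
# Triangle-inequality pruning test (Elkan's lemma): if d(a, b) >= 2 * d(x, a),
# then centre b cannot be closer to x than centre a, so d(x, b) can be skipped.
# Assumption: only the inequality is shown, not kernlab's bookkeeping.
can_skip <- function(d_x_a, d_a_b) d_a_b >= 2 * d_x_a

# Empirical check on random triples (point x, centre a, centre b).
set.seed(1)
violations <- 0
for (trial in 1:1000) {
  p <- matrix(rnorm(6), nrow = 3)        # rows: x, a, b in the plane
  d <- as.matrix(dist(p))
  if (can_skip(d[1, 2], d[2, 3]) && d[1, 3] < d[1, 2]) {
    violations <- violations + 1         # would mean a pruned centre was closer
  }
}
violations
```

The bound follows from d(x, b) >= d(a, b) - d(x, a) >= 2 d(x, a) - d(x, a) = d(x, a), so it holds for any metric, including distances in a kernel feature space.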
With a particular choice of weights, this algorithm becomes equivalent to the Kernighan-Lin and normalized cut graph partitioning algorithms.
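In the weighted formulation each point carries a weight and the cluster mean becomes a weighted mean. The sketch below computes the resulting feature-space distance from kernel entries and checks it against an explicit weighted mean using a linear kernel. According to the technical report listed in the references, weighting graph nodes by their degree, together with a kernel derived from the graph's affinity matrix, is what links this objective to normalized cut. The helper name weighted_kernel_dist is hypothetical.

```r
# Weighted kernel k-means distance: squared feature-space distance from each
# point to the weighted mean of the cluster `idx`, with point weights `w`.
# Hypothetical helper name; formula as in the weighted kernel k-means objective.
weighted_kernel_dist <- function(K, w, idx) {
  wi <- w[idx]
  sw <- sum(wi)
  diag(K) -
    2 * as.vector(K[, idx, drop = FALSE] %*% wi) / sw +
    as.numeric(crossprod(wi, K[idx, idx, drop = FALSE] %*% wi)) / sw^2
}

# Check against an explicit weighted mean using the linear kernel K = X X^T.
X <- matrix(c(0, 0,
              2, 0,
              0, 2,
              5, 5), ncol = 2, byrow = TRUE)
w <- c(1, 3, 1, 1)
idx <- 1:3
m <- colSums(X[idx, ] * w[idx]) / sum(w[idx])    # weighted cluster mean
direct <- rowSums(sweep(X, 2, m)^2)               # squared distances to m
via_kernel <- weighted_kernel_dist(X %*% t(X), w, idx)
all.equal(via_kernel, direct)
```

With all weights equal to 1 the formula reduces to the unweighted kernel k-means distance.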
The function also supports input in the form of a kernel matrix or a list of character vectors for text clustering.
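A precomputed kernel matrix can be handed to kkmeans directly through its kernelMatrix method. The sketch below builds one with kernlab's kernelMatrix() and rbfdot(); the value sigma = 0.1 is an arbitrary illustrative choice, and the whole snippet is guarded so it does nothing when kernlab is not installed.

```r
# Precompute a kernel matrix with kernlab and cluster it directly.
# Guarded so that the snippet does nothing when kernlab is not installed.
if (requireNamespace("kernlab", quietly = TRUE)) {
  library(kernlab)
  x <- as.matrix(iris[, -5])
  K <- kernelMatrix(rbfdot(sigma = 0.1), x)   # object of class "kernelMatrix"
  set.seed(1)
  sc <- kkmeans(K, centers = 3)                # dispatches on the kernelMatrix method
  print(size(sc))
}
```

Precomputing K pays off when the same kernel is reused across several runs, for example with different numbers of clusters or different random starts.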
The data can be passed to the kkmeans function in a matrix or a data.frame. In addition, kkmeans also supports input in the form of a kernel matrix of class kernelMatrix, or a list of character vectors, in which case a string kernel has to be used.
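For list input, a string kernel turns raw text into inner products. As a minimal sketch, the hypothetical helpers below implement a normalised p-spectrum kernel in base R, which compares strings by counts of shared substrings of length p. This only imitates the idea; kernlab's stringdot() provides string kernels natively, and the choice p = 2 is an assumption (the list method above defaults to length = 4).

```r
# Hypothetical helpers: a normalised p-spectrum string kernel in base R.
# It compares strings by counts of shared substrings of length p; kernlab's
# stringdot() provides string kernels natively and this only imitates the idea.
spectrum_counts <- function(s, p) {
  n <- nchar(s)
  stopifnot(n >= p)                          # assumes strings of length >= p
  table(substring(s, 1:(n - p + 1), p:n))    # counts of every length-p substring
}
spectrum_kernel <- function(s, t, p = 2) {
  cs <- spectrum_counts(s, p)
  ct <- spectrum_counts(t, p)
  shared <- intersect(names(cs), names(ct))
  sum(as.numeric(cs[shared]) * as.numeric(ct[shared]))
}
normalised_kernel <- function(s, t, p = 2) {
  spectrum_kernel(s, t, p) /
    sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
}

docs <- list("kernel methods", "kernel trick", "graph partitioning")
K <- outer(seq_along(docs), seq_along(docs),
           Vectorize(function(i, j) normalised_kernel(docs[[i]], docs[[j]])))
round(K, 3)
```

Normalising by the self-similarities puts every diagonal entry at 1, which keeps long and short documents on a comparable scale before clustering.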
Inderjit Dhillon, Yuqiang Guan, Brian Kulis. A Unified View of Kernel k-means, Spectral Clustering and Graph Partitioning. UTCS Technical Report. http://people.bu.edu/bkulis/pubs/spectral_techreport.pdf
# NOT RUN {
## Cluster the iris data set.
data(iris)
sc <- kkmeans(as.matrix(iris[,-5]), centers=3)
sc
centers(sc)
size(sc)
withinss(sc)
# }