kha: Kernel Principal Components Analysis

Description

The Kernel Hebbian Algorithm is a nonlinear, iterative algorithm for principal component analysis.

Usage

# S4 method for formula
kha(x, data = NULL, na.action, ...)
# S4 method for matrix
kha(x, kernel = "rbfdot", kpar = list(sigma = 0.1), features = 5, 
         eta = 0.005, th = 1e-4, maxiter = 10000, verbose = FALSE,
        na.action = na.omit, ...)

Value

An S4 object containing the principal component vectors along with the corresponding normalization values.

pcv: a matrix containing the principal component vectors (column-wise)
eig: the normalization values
xmatrix: the original data matrix

All slots of the object can be accessed by accessor functions.
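
As a brief illustration of the accessor functions mentioned above, the following sketch fits a small model on the iris measurements and reads the slots back. It assumes the accessors carry the same names as the slots (pcv, eig, xmatrix); the parameter values are arbitrary and chosen only to keep the run short.

```r
library(kernlab)

data(iris)
x <- as.matrix(iris[, -5])

# Fit two kernel principal components with a short iteration budget
kpc <- kha(x, kernel = "rbfdot", kpar = list(sigma = 0.2),
           features = 2, eta = 0.001, maxiter = 65)

# Read the slots through accessor functions (names assumed to match the slots)
vecs <- pcv(kpc)      # principal component vectors, one column per component
norms <- eig(kpc)     # normalization values
orig <- xmatrix(kpc)  # the data matrix the model was fitted on

dim(vecs)
```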

Arguments

x

The data matrix, indexed by row, or a formula describing the model. Note that an intercept is always included, whether or not it is given in the formula.

data

An optional data frame containing the variables in the model (when using a formula).

kernel

The kernel function used in training and predicting. This parameter can be set to any function of class kernel that computes the inner product in feature space between two vector arguments (see kernels). kernlab provides the most popular kernel functions, which can be selected by setting the kernel parameter to one of the following strings:

rbfdot Radial Basis kernel function "Gaussian"
polydot Polynomial kernel function
vanilladot Linear kernel function
tanhdot Hyperbolic tangent kernel function
laplacedot Laplacian kernel function
besseldot Bessel kernel function
anovadot ANOVA RBF kernel function
splinedot Spline kernel

The kernel parameter can also be set to a user-defined function of class kernel by passing the function name as an argument.
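
A minimal sketch of what such a user-defined kernel might look like is shown below. The function name my_gauss and the fixed width 0.2 are hypothetical and serve only as illustration. The sketch defines a function of two vectors and tags it with class "kernel"; per the description above, an object like this could then be supplied as the kernel argument.

```r
# Hypothetical user-defined kernel: a Gaussian-type kernel with a fixed
# inverse width of 0.2 (both the name and the value are illustrative)
my_gauss <- function(x, y) {
  exp(-0.2 * sum((x - y)^2))
}

# Mark the function as a kernel so it is recognised as class "kernel"
class(my_gauss) <- "kernel"

# The kernel can still be evaluated directly on two vectors
k_same <- my_gauss(c(1, 2), c(1, 2))  # identical points
k_diff <- my_gauss(c(1, 2), c(3, 4))  # distinct points
```

Because the kernel is an ordinary R function, it is easy to check on a few vectors that it is symmetric and returns sensible values before using it in a model.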

kpar

The list of hyper-parameters (kernel parameters) to be used with the kernel function. Valid parameters for the existing kernels are:

sigma: inverse kernel width for the Radial Basis kernel function "rbfdot" and the Laplacian kernel "laplacedot"
degree, scale, offset: for the Polynomial kernel "polydot"
scale, offset: for the Hyperbolic tangent kernel function "tanhdot"
sigma, order, degree: for the Bessel kernel "besseldot"
sigma, degree: for the ANOVA kernel "anovadot"

Hyper-parameters for user-defined kernels can be passed through the kpar parameter as well.
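
The pairing of a kernel name with its kpar list can be sketched as follows, here with the Laplacian kernel and its sigma parameter. The values of sigma, eta and maxiter are arbitrary choices for a quick run, not recommendations.

```r
library(kernlab)

data(iris)
x <- as.matrix(iris[, -5])

# Laplacian kernel: kpar carries the single parameter sigma
kpc_lap <- kha(x, kernel = "laplacedot", kpar = list(sigma = 0.2),
               features = 3, eta = 0.001, maxiter = 65)

# One column of principal component vectors per requested feature
ncol(pcv(kpc_lap))
```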

features

Number of features (principal components) to return (default: 5).

eta

The Hebbian learning rate (default: 0.005).

th

The smallest value of the convergence step (default: 0.0001).

maxiter

The maximum number of iterations (default: 10000).

verbose

Print convergence information every 100 iterations (default: FALSE).

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.)

...

additional parameters

Author

Alexandros Karatzoglou
alexandros.karatzoglou@ci.tuwien.ac.at

Details

The original form of KPCA can only be used on small data sets, since it requires estimating the eigenvectors of the full kernel matrix. The Kernel Hebbian Algorithm instead estimates the kernel principal components iteratively, with only linear-order memory complexity. See the reference for more details.
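
To make the iterative idea more concrete, here is a rough plain-R sketch of a Hebbian-style update on dual coefficients, in the spirit of the referenced report. It is not kernlab's implementation: kha_sketch is a hypothetical helper, it runs for a fixed number of sweeps instead of checking a convergence threshold, and for brevity it precomputes and centres the full kernel matrix, so it does not show the memory saving described above.

```r
# Hypothetical helper (not part of kernlab): Hebbian-style updates on a
# precomputed n x n kernel matrix K, returning dual coefficients A
kha_sketch <- function(K, features = 2, eta = 0.005, maxiter = 50) {
  n <- nrow(K)
  # Centre the kernel matrix in feature space: Kc = H K H
  H <- diag(n) - matrix(1 / n, n, n)
  Kc <- H %*% K %*% H
  # Dual coefficients: one row per component, one column per data point
  A <- matrix(runif(features * n, -1, 1), features, n)
  for (iter in seq_len(maxiter)) {
    for (i in seq_len(n)) {
      # Outputs of the current components for data point i
      y <- as.vector(A %*% Kc[, i])
      # Lower-triangular part (including diagonal) of y y^T
      yy <- tcrossprod(y)
      yy[upper.tri(yy)] <- 0
      # Update: A <- A + eta * (y b_i^T - LT[y y^T] A), b_i = i-th unit vector
      delta <- -yy %*% A
      delta[, i] <- delta[, i] + y
      A <- A + eta * delta
    }
  }
  A
}

set.seed(1)
X <- matrix(rnorm(40), 20, 2)
K <- exp(-0.5 * as.matrix(dist(X))^2)  # Gaussian kernel, inverse width 0.5
A <- kha_sketch(K)
dim(A)
```

Each inner step touches only one column of the kernel matrix, which is why an implementation can evaluate kernel columns on demand rather than storing the whole matrix.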

References

Kwang In Kim, M.O. Franz and B. Schölkopf
Kernel Hebbian Algorithm for Iterative Kernel Principal Component Analysis
Max-Planck-Institut für biologische Kybernetik, Tübingen (109)
https://is.mpg.de/fileadmin/user_upload/files/publications/pdf2302.pdf

Examples

# example using the iris data
library(kernlab)
data(iris)

# hold out 70 randomly chosen rows; fit on the remaining 80
test <- sample(1:150,70)

kpc <- kha(~.,data=iris[-test,-5],kernel="rbfdot",
           kpar=list(sigma=0.2),features=2, eta=0.001, maxiter=65)

#print the principal component vectors
pcv(kpc)

#plot the data projection on the components
plot(predict(kpc,iris[,-5]),col=as.integer(iris[,5]),
     xlab="1st Principal Component",ylab="2nd Principal Component")