
BKPC (version 1.0.1)

bkpc: Bayesian Kernel Projection Classifier

Description

Function bkpc is used to train a Bayesian kernel projection classifier (BKPC). This is a nonlinear multicategory classifier that classifies the projections of the data onto the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters, so probability distributions of the predictions can be obtained for new observations.

Usage

# S3 method for default
bkpc(x, y, theta = NULL, n.kpc = NULL, thin = 100, n.iter = 1e+05, std = 10, 
g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1, initSigmasq = NULL, initBeta = NULL,
initTau = NULL, intercept = TRUE, rotate = TRUE, ...)

# S3 method for kern
bkpc(x, y, n.kpc = NULL, thin = 100, n.iter = 1e+05, std = 10, g1 = 0.001,
g2 = 0.001, g3 = 1, g4 = 1, initSigmasq = NULL, initBeta = NULL,
initTau = NULL, intercept = TRUE, rotate = TRUE, ...)

# S3 method for kernelMatrix
bkpc(x, y, n.kpc = NULL, thin = 100, n.iter = 1e+05, std = 10, g1 = 0.001,
g2 = 0.001, g3 = 1, g4 = 1, initSigmasq = NULL, initBeta = NULL,
initTau = NULL, intercept = TRUE, rotate = TRUE, ...)

Arguments

x

either: a data matrix, a kernel matrix of class "kernelMatrix" or a kernel matrix of class "kern".

y

a response vector with one label for each row of x. Should be a factor.

theta

the inverse kernel bandwidth parameter.

n.kpc

number of kernel principal components to use.

n.iter

number of iterations for the MCMC algorithm.

thin

thinning interval.

std

standard deviation parameter for the random walk proposal.

g1

\(\gamma_1\) hyper-parameter of the prior inverse gamma distribution for the \(\sigma^2\) parameter in the BKPC model.

g2

\(\gamma_2\) hyper-parameter of the prior inverse gamma distribution for the \(\sigma^2\) parameter in the BKPC model.

g3

\(\gamma_3\) hyper-parameter of the prior gamma distribution for the \(\tau\) parameter in the BKPC model.

g4

\(\gamma_4\) hyper-parameter of the prior gamma distribution for the \(\tau\) parameter in the BKPC model.

initSigmasq

optional specification of initial value for the \(\sigma^2\) parameter in the BKPC model.

initBeta

optional specification of initial values for the \(\beta\) parameters in the BKPC model.

initTau

optional specification of initial values for the \(\tau\) parameters in the BKPC model.

intercept

if intercept = TRUE (the default), an intercept is included in the model.

rotate

if rotate = TRUE (the default), the BKPC model is fitted; otherwise the BKMC model is fitted.

...

Currently not used.

Value

An object of class "bkpc" including:

beta

realizations of the \(\beta\) parameters from the joint posterior distribution in the BKPC model.

tau

realizations of the \(\tau\) parameters from the joint posterior distribution in the BKPC model.

z

realizations of the latent variables \(z\) from the joint posterior distribution in the BKPC model.

sigmasq

realizations of the \(\sigma^2\) parameter from the joint posterior distribution in the BKPC model.

n.class

number of independent classes of the response variable, i.e., the number of classes minus 1.

n.kpc

number of kernel principal components used.

n.iter

number of iterations of the MCMC algorithm.

thin

thinning interval.

intercept

if TRUE, an intercept was included in the model.

rotate

if TRUE, the sparse BKPC model was fitted; otherwise the BKMC model.

kPCA

if rotate = TRUE, an object of class "kPCA"; otherwise NULL.

x

the supplied data matrix or kernel matrix.

theta

if a data matrix was supplied (as opposed to a kernel), the inverse kernel bandwidth parameter used to obtain the Gaussian kernel; otherwise NULL.

Details

Initial values for a BKPC model can be supplied; otherwise they are generated using the runif function.

The data can be passed to the bkpc function as a matrix; the Gaussian kernel computed by the gaussKern function is then used for training and prediction. The bandwidth parameter theta can be supplied to the gaussKern function; otherwise a default value is used.
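
A minimal sketch of this interface, using the iris data; theta = 0.01 and the short MCMC run are arbitrary illustrative values, not tuned choices:

data(iris)
X <- as.matrix(iris[, -5])
# supply the inverse bandwidth explicitly; with theta = NULL a default is used
fit <- bkpc(X, y = iris[, 5], theta = 0.01, n.kpc = 2, n.iter = 1000, thin = 10)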

In addition, bkpc supports input in the form of a kernel matrix of class "kern" or "kernelMatrix". The latter allows for a range of kernel functions, as well as user-specified ones.
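
For example, a kernel from the kernlab package can be supplied directly; the polynomial kernel below is just one illustrative choice:

library(kernlab)
kfunc <- polydot(degree = 2)
# K has class "kernelMatrix", so the corresponding bkpc method is dispatched
K <- kernelMatrix(kfunc, as.matrix(iris[, -5]))
fit <- bkpc(K, y = iris[, 5], n.kpc = 2, n.iter = 1000, thin = 10)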

If rotate = TRUE (the default), the BKPC is trained. This algorithm classifies the projections of the data onto the principal axes of the feature space. Otherwise the Bayesian kernel multicategory classifier (BKMC) is trained, where the data are mapped to the feature space via the kernel matrix but not projected (rotated) onto the principal axes. The hierarchical prior structure of the two models is the same, but the BKMC model is not sparse.
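
The switch between the two models is the single rotate argument; a brief sketch, reusing K from the snippet above (again with a short illustrative run):

# BKPC: classify projections onto the first n.kpc kernel principal axes (sparse)
fitBKPC <- bkpc(K, y = iris[, 5], n.kpc = 2, n.iter = 1000, thin = 10, rotate = TRUE)
# BKMC: work in the unprojected feature space; n.kpc is not needed
fitBKMC <- bkpc(K, y = iris[, 5], n.iter = 1000, thin = 10, rotate = FALSE)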

References

Domijan, K. and Wilson, S. P. (2011). Bayesian kernel projections for classification of high dimensional data. Statistics and Computing, 21(2), 203-216.

See Also

kPCA, gaussKern, predict.bkpc, plot.bkpc, summary.bkpc, kernelMatrix (in package kernlab)

Examples

# NOT RUN {
set.seed(-88106935)

data(microarray)

# consider only the four tumour classes (NOTE: "NORM" is not a tumour class)
y <- microarray[, 2309]
train <- as.matrix(microarray[y != "NORM", -2309])
wtr <- factor(microarray[y != "NORM", 2309], levels = c("BL", "EWS", "NB", "RMS"))

n.kpc <- 6
n.class <- length(levels(wtr)) - 1

K <- gaussKern(train)$K

# supply starting values for the parameters
# use Gaussian kernel as input

result <- bkpc(K, y = wtr, n.iter = 1000, thin = 10, n.kpc = n.kpc,
initSigmasq = 0.001, initBeta = matrix(10, n.kpc * n.class, 1),
initTau = matrix(10, n.kpc * n.class, 1), intercept = FALSE, rotate = TRUE)

# predict

# no new data supplied, so predictions are for the training set
out <- predict(result, n.burnin = 10)

table(out$class, as.numeric(wtr))

# plot the data projection on the kernel principal components

pairs(result$kPCA$KPCs[, 1:n.kpc], col = as.numeric(wtr),
main = paste("symbol = predicted class", "\n", "color = true class"),
pch = out$class, upper.panel = NULL)
par(xpd = TRUE)
legend("topright", levels(wtr), pch = unique(out$class),
text.col = as.numeric(unique(wtr)), bty = "n")




# Another example: Iris data

data(iris)
testset <- sample(1:150, 50)

train <- as.matrix(iris[-testset, -5])
test <- as.matrix(iris[testset, -5])

wtr <- iris[-testset, 5]
wte <- iris[testset, 5]

# use default starting values for the parameters in the model

result <- bkpc(train, y = wtr, n.iter = 1000, thin = 10, n.kpc = 2,
intercept = FALSE, rotate = TRUE)

# predict
out <- predict(result, test, n.burnin = 10) 

# classification rate
sum(out$class == as.numeric(wte))/dim(test)[1]

table(out$class, as.numeric(wte))

# }
# NOT RUN {
# Another example: synthetic data from MASS library

library(MASS)

train <- as.matrix(synth.tr[, -3])
test <- as.matrix(synth.te[, -3])

wtr <- as.factor(synth.tr[, 3])
wte <- as.factor(synth.te[, 3])


#  make training set kernel using kernelMatrix from kernlab library

library(kernlab)

kfunc <- laplacedot(sigma = 1)
Ktrain <- kernelMatrix(kfunc, train)

#  make testing set kernel using kernelMatrix {kernlab}

Ktest <- kernelMatrix(kfunc, test, train)

result <- bkpc(Ktrain, y = wtr, n.iter = 1000, thin = 10, n.kpc = 3,
intercept = FALSE, rotate = TRUE)

# predict

out <- predict(result, Ktest, n.burnin = 10) 

# classification rate

sum(out$class == as.numeric(wte))/dim(test)[1]
table(out$class, as.numeric(wte))


# embed data from the testing set on the new space:

KPCtest <- predict(result$kPCA, Ktest)

# new data is linearly separable in the new feature space where classification takes place
library(rgl)
plot3d(KPCtest[, 1:3], col = as.numeric(wte))


# another model: do not project the data to the principal axes of the feature space.
# NOTE: Slow
# use Gaussian kernel with the default bandwidth parameter

Ktrain <- gaussKern(train)$K

Ktest <- gaussKern(train, test, theta = gaussKern(train)$theta)$K

resultBKMC <- bkpc(Ktrain, y = wtr, n.iter = 1000, thin = 10,
intercept = FALSE, rotate = FALSE)

# predict
outBKMC <- predict(resultBKMC, Ktest, n.burnin = 10)

# to compare with previous model
table(outBKMC$class, as.numeric(wte))


# another example: wine data from gclus library

library(gclus)
data(wine)

testset <- sample(1:178, 90)
train <- as.matrix(wine[-testset, -1])
test <- as.matrix(wine[testset, -1])

wtr <- as.factor(wine[-testset, 1])
wte <- as.factor(wine[testset, 1])

#  make training set kernel using kernelMatrix from kernlab library

kfunc <- anovadot(sigma = 1, degree = 1)
Ktrain <- kernelMatrix(kfunc, train)

#  make testing set kernel using kernelMatrix {kernlab}
Ktest <- kernelMatrix(kfunc, test, train)

result <- bkpc(Ktrain, y = wtr, n.iter = 1000, thin = 10, n.kpc = 3,
intercept = FALSE, rotate = TRUE)

out <- predict(result, Ktest, n.burnin = 10) 

# classification rate in the test set
sum(out$class == as.numeric(wte))/dim(test)[1]


# embed data from the testing set on the new space:
KPCtest <- predict(result$kPCA, Ktest)

# new data is linearly separable in the new feature space where classification takes place


pairs(KPCtest[, 1:3], col = as.numeric(wte),
main = paste("symbol = predicted class", "\n", "color = true class"),
pch = out$class, upper.panel = NULL)

par(xpd = TRUE)

legend("topright", levels(wte), pch = unique(out$class),
text.col = as.numeric(unique(wte)), bty = "n")


# }
