
ks (version 1.7.0)

kda: Kernel discriminant analysis for multivariate data

Description

Kernel discriminant analysis for 1- to 6-dimensional data.

Usage

Hkda(x, x.group, Hstart, bw="plugin", nstage=2, pilot="samse",
     pre="sphere", binned=FALSE, bgridsize)
Hkda.diag(x, x.group, bw="plugin", nstage=2, pilot="samse", 
     pre="sphere", binned=FALSE, bgridsize)
hkda(x, x.group, bw="plugin", nstage=2, binned=TRUE, bgridsize)

kda(x, x.group, Hs, hs, y, prior.prob=NULL)

compare(x.group, est.group, by.group=FALSE)
compare.kda.cv(x, x.group, bw="plugin", prior.prob=NULL, Hstart,
     by.group=FALSE, trace=FALSE, binned=FALSE, bgridsize,
     recompute=FALSE, ...)
compare.kda.diag.cv(x, x.group, bw="plugin", prior.prob=NULL,
     by.group=FALSE, trace=FALSE, binned=FALSE, bgridsize,
     recompute=FALSE, ...)

Arguments

x
matrix of training data values
x.group
vector of group labels for training data
y
matrix of test data
Hs
(stacked) matrix of bandwidth matrices
hs
vector of scalar bandwidths
prior.prob
vector of prior probabilities
bw
bandwidth: "plugin" = plug-in, "lscv" = LSCV, "scv" = SCV
nstage
number of stages in the plug-in bandwidth selector (1 or 2)
pilot
"amse" = AMSE pilot bandwidths, "samse" = single SAMSE pilot bandwidth
pre
"scale" = pre-scaling, "sphere" = pre-sphering
Hstart
(stacked) matrix of initial bandwidth matrices, used in numerical optimisation
binned
flag for binned kernel estimation. Default is FALSE.
bgridsize
vector of binning grid sizes
est.group
vector of estimated group labels
by.group
flag to give results also within each group
trace
flag for printing messages in command line to trace the execution
recompute
flag for recomputing the bandwidth matrix after excluding the i-th data item
...
other optional parameters for bandwidth selection, see Hpi, Hlscv, Hscv

Value

The result from Hkda and Hkda.diag is a stacked matrix of bandwidth matrices, one for each training data group. The result from hkda is a vector of bandwidths, one for each training data group.

The result from kda is a vector of group labels estimated via the kernel discriminant rule. If the test data y are given, then these are classified; otherwise the training data x are classified.

The compare functions compare the true group labels x.group with the estimated ones. They return a list with fields

cross
cross-classification table, with the rows indicating the true group and the columns the estimated group
error
misclassification rate (MR)

If the test data are independent of the training data, compare computes MR = (number of points wrongly classified)/(total number of points). If there are no independent test data, e.g. when the training data set itself is being classified, then a cross-validated estimate of MR is more appropriate. These are implemented as compare.kda.cv (full bandwidth selectors) and compare.kda.diag.cv (diagonal bandwidth selectors). These functions are only available for d > 1.

If by.group=FALSE then only the total MR is given. If it is set to TRUE, then the MR for each group is also given (estimated number in group divided by true number).
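As a minimal sketch (not part of the package's own examples), the fields of the list returned by compare can be inspected directly; the scalar bandwidths below are illustrative values, not selector output.

library(ks)
## classify the training data itself (no y supplied) with illustrative bandwidths
x <- c(rnorm.mixt(n=50, mus=1, sigmas=1, props=1),
       rnorm.mixt(n=50, mus=-1, sigmas=1, props=1))
x.gr <- rep(c(1,2), each=50)
kda.gr <- kda(x, x.gr, hs=c(0.3, 0.3))
cmp <- compare(x.gr, kda.gr)
cmp$cross   ## cross-classification table: rows = true group, columns = estimated group
cmp$error   ## total misclassification rate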

Details

The valid values for bw are "plugin", "lscv" and "scv" for Hkda. These in turn call Hpi, Hlscv and Hscv respectively. For plug-in selectors, all of nstage, pilot and pre need to be set. For SCV selectors, currently nstage=1 always, but pilot and pre need to be set. For LSCV selectors, none of these are required. Hkda.diag makes analogous calls to the diagonal selectors.
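As an illustrative sketch (reusing the iris measurements from the example further down), the selector choices described above might be invoked as follows; the option values are those documented on this page.

library(ks)
## illustrative sketch: full and diagonal bandwidth selectors, one matrix per group
ir <- iris[,c(1,2)]
ir.gr <- iris[,5]
H.pi   <- Hkda(ir, ir.gr, bw="plugin", nstage=2, pilot="samse", pre="sphere")
H.scv  <- Hkda(ir, ir.gr, bw="scv", pilot="samse", pre="sphere")
H.lscv <- Hkda(ir, ir.gr, bw="lscv")
H.diag <- Hkda.diag(ir, ir.gr, bw="plugin", pilot="samse", pre="sphere")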

For details on the pre-transformations in pre, see pre.sphere and pre.scale.

If prior probabilities are known, set prior.prob to these. Otherwise the default prior.prob=NULL uses the sample proportions as estimates of the prior probabilities. If trace=TRUE, a message is printed in the command line indicating which data item is currently being processed: cross-validated estimates may take a long time to execute.
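A minimal sketch (not from the package examples) of supplying explicit prior probabilities; the uniform priors below are purely illustrative, and omitting prior.prob reverts to the sample proportions.

library(ks)
## illustrative sketch: explicit priors in the kernel discriminant rule
ir <- iris[,c(1,2)]
ir.gr <- iris[,5]
H.pi <- Hkda(ir, ir.gr, bw="plugin", pilot="samse", pre="sphere")
kda.gr <- kda(ir, ir.gr, Hs=H.pi, prior.prob=c(1/3, 1/3, 1/3))
compare(ir.gr, kda.gr)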

References

Simonoff, J. S. (1996) Smoothing Methods in Statistics. Springer-Verlag, New York.

Examples

### See examples in ? plot.kda.kde

### univariate example -- independent test data
x <- c(rnorm.mixt(n=100, mus=1, sigmas=1, props=1),
       rnorm.mixt(n=100, mus=-1, sigmas=1, props=1))
x.gr <- rep(c(1,2), times=c(100,100))
y <- c(rnorm.mixt(n=100, mus=1, sigmas=1, props=1),
       rnorm.mixt(n=100, mus=-1, sigmas=1, props=1))
kda.gr <- kda(x, x.gr, hs=sqrt(c(0.09, 0.09)), y=y)
compare(x.gr, kda.gr)
compare(x.gr, kda.gr, by.group=TRUE) 

### bivariate example - restricted iris dataset, dependent test data
library(MASS)
data(iris)
ir <- iris[,c(1,2)]
ir.gr <- iris[,5]
compare.kda.cv(ir, ir.gr, bw="plugin", pilot="samse")
