
MuViCP (version 1.3.2)

bel.builder: Building Belief Functions

Description

These are a set of functions that can be used to build belief functions (hence the name *.builder). Each of these returns a function that can be used to classify points in two dimensions.

The algorithm used can be judged from the first three letters of the function's name. Thus, the kde_bel function uses the kernel density estimate (kde), the knn_bel function uses the kernel density estimate together with information on the nearest neighbours (knn), and the jit_bel function jitters each point within its neighbourhood (jit). Finally, the cor_bel function uses the kde but includes a factor for self-correction. These generated functions (the return values) are meant to be passed to the ensemble function to build an ensemble.

Usage

kde_bel.builder(labs, test, train, options = list(coef = 0.90))

knn_bel.builder(labs, test, train, options = list(k = 3, p = FALSE,
  dist.type = c('euclidean', 'absolute', 'mahal'),
  out = c('var', 'cv'), coef = 0.90))

jit_bel.builder(labs, test, train, options = list(k = 3, p = FALSE, s = 5,
  dist.type = c('euclidean', 'absolute', 'mahal'),
  out = c('var', 'cv'), coef = 0.90))

Arguments

labs
The possible labels for the points. Can be strings. Must be of the same length as train.
test
The indices of the test data in P.
train
The indices of the training data in P.
options
A list of arguments that determine the behaviour of the constructed belief function.
k
The number of nearest neighbours to consider, specified directly as an integer
p
The number of nearest neighbours to consider, specified as a fraction of the training set (used as p * length(train); see Details)
s
For the jitter belief function: the number of times each point should be jittered in its neighbourhood. Usually, 2 or 3 works.
dist.type
The type of distance to use when computing nearest neighbours. Can be one of "euclidean", "absolute", or "mahal"
out
Should beliefs be built from the variance (var) or the coefficient of variation (cv)? Also see the Details section below.
coef
The classifier only assigns the class labels that actually occur; that is, ignorance is, by default, not accounted for. This argument specifies the amount of belief that should be allocated to ignorance; the beliefs assigned to the other classes are adjusted correspondingly. Note that for the 'corrected' classifier, the actual belief assigned to ignorance may be higher than this for some projections. See Details.

Value

A classifier function, meant to be passed to the ensemble function. Alternatively, 2-D projected data may be passed directly to the returned classifier, in which case a matrix of dimensions (Number of Classes) x (length(test)) is returned. Each column sums to 1 and represents the partial assignment of that point to each of the classes. The rows are named after the class names, and the columns after the test points. Ignorance is represented by the special symbol 'Inf' and is the last class (row) in the matrix.
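For instance, the returned matrix can be consumed as follows (a minimal sketch; kdebel and F are as defined in the Examples below):

beliefs <- kdebel(F)
colSums(beliefs)            # each column sums to 1
rownames(beliefs)           # class names; ignorance ('Inf') is last
## hard labels: the class with maximum belief in each column
predicted <- rownames(beliefs)[apply(beliefs, MARGIN = 2, FUN = which.max)]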

Details

Each of these functions uses a different algorithm for classification.

The kde_bel.builder returns a classifier that simply evaluates the kernel density estimate of each class at each test point, and assigns the point to the class with the maximum density there.
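In spirit, that classifier does something like the following (a conceptual sketch only, not the package's implementation; the name kde_classify, the bandwidth h, and the Gaussian kernel are illustrative assumptions):

## classify 2-D test points by per-class Gaussian kernel density
kde_classify <- function(train_pts, train_labs, test_pts, h = 1) {
  ## density of one class's training points, evaluated at a point x
  dens_at <- function(pts, x) {
    mean(exp(-rowSums(sweep(pts, 2, x)^2) / (2 * h^2)))
  }
  ## for each test point, pick the class with the highest density
  sapply(seq_len(nrow(test_pts)), function(i) {
    d <- sapply(split.data.frame(train_pts, train_labs),
                dens_at, x = test_pts[i, ])
    names(which.max(d))
  })
}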

The knn_bel.builder returns a classifier that first locates the k (or p * length(train)) nearest neighbours of each point in the test set. It then evaluates the kernel density estimate of each class in the training set at each of these nearest neighbours, and at each test point. With out = 'var', the variance of this set of density values, centred at the density value at the test point itself, is taken as a measure of the point's membership in that class. With out = 'cv', the coefficient of variation is used instead, with the density value at the point itself serving as the mean. Generally, the var classifier has higher accuracy.
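The var and cv measures can be sketched as below (an illustrative reading of the description above, not the package's exact code; belief_measure, dens and d0 are assumed names):

## dens: a class's density values at a test point's nearest neighbours
## d0:   that class's density at the test point itself
belief_measure <- function(dens, d0, out = c('var', 'cv')) {
  out <- match.arg(out)
  v <- mean((dens - d0)^2)              # variance, centred at d0 rather than mean(dens)
  if (out == 'var') v else sqrt(v) / d0 # cv: spread relative to d0
}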

The jit_bel.builder works much like the knn_bel.builder classifier, but uses the nearest-neighbour information only to determine a "neighbourhood" for each point. The test points are then jittered within this neighbourhood, and the kernel densities are evaluated on these fake points. The var and cv options work as they do in the knn_bel.builder classifier.
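The jittering step might be pictured as follows (illustrative only; the name jitter_in_nbhd, the Gaussian noise, and the radius argument are assumptions, with radius standing for some neighbourhood scale such as the distance to the k-th neighbour):

## produce s fake copies of a test point x, perturbed within its neighbourhood
jitter_in_nbhd <- function(x, radius, s = 2) {
  matrix(rep(x, each = s), nrow = s) +
    matrix(rnorm(s * length(x), sd = radius), nrow = s)
}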

Examples

##Setting Up
data(cancer)
table(cancer$V2)
colnames(cancer)[1:2] <- c('id', 'type')
cancer.d <- as.matrix(cancer[,3:32])
labs <- cancer$type
test_size <- floor(0.15*nrow(cancer.d))
train <- sample(1:nrow(cancer.d), size = nrow(cancer.d) - test_size)
test <- setdiff(1:nrow(cancer.d), train)
truelabs <- labs[test]

projectron <- function(A) cancer.d %*% A

seed <- .Random.seed               # save the RNG state so the projection is reproducible
F <- projectron(basis_random(30))  # a random 2-D projection of the 30 variables

##Simple Density Classification
kdebel <- kde_bel.builder(labs = labs[train], test = test, train = train)
x1 <- kdebel(F)
predicted1 <- apply(x1, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted1)

##Density Classification Using Nearest Neighbor Information
knnbel <- knn_bel.builder(labs = labs[train], test = test, train = train,
                          options = list(k = 3, p = FALSE, dist.type = 'euclidean',
                                         out = 'var', coef = 0.90))
x2 <- knnbel(F)
predicted2 <- apply(x2, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted2)

##Same as above but now using the Coefficient of Variation for Classification
knnbel2 <- knn_bel.builder(labs = labs[train], test = test, train = train,
                           options = list(k = 3, p = FALSE, dist.type = 'euclidean',
                                          out = 'cv', coef = 0.90))
x3 <- knnbel2(F)
predicted3 <- apply(x3, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted3)

##Density Classification Using Jitter & NN Information
jitbel <- jit_bel.builder(labs = labs[train], test = test, train = train,
                          options = list(k = 3, s = 2, p = FALSE,
                                         dist.type = 'euclidean', out = 'var',
                                         coef = 0.90))
x4 <- jitbel(F)
predicted4 <- apply(x4, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted4)

##Same as above but now using the Coefficient of Variation for Classification
jitbel2 <- jit_bel.builder(labs = labs[train], test = test, train = train,
                           options = list(k = 3, s = 2, p = FALSE,
                                          dist.type = 'euclidean', out = 'cv',
                                          coef = 0.90))
x5 <- jitbel2(F)
predicted5 <- apply(x5, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted5)
