fastNaiveBayes (version 1.1.2)

fastNaiveBayes.bernoulli: Fast Naive Bayes Classifier with a Bernoulli event model

Description

Extremely fast implementation of a Naive Bayes classifier. This variant uses the Bernoulli event model for all columns.

Usage

fastNaiveBayes.bernoulli(x, y, laplace = 0, sparse = FALSE, ...)

# S3 method for default
fastNaiveBayes.bernoulli(x, y, laplace = 0, sparse = FALSE, ...)

Arguments

x

a numeric matrix of 1's and 0's indicating the presence or absence of features. A sparse dgCMatrix is also accepted.

y

a factor of classes

laplace

A number used for Laplace smoothing. Default is 0

sparse

Use a sparse matrix? If TRUE, a sparse matrix will be constructed from x, which can give up to a 40% speed-up. It is also possible to pass a sparse dgCMatrix directly as x, which will set this parameter to TRUE.

...

Not used.

Value

A fitted object of class "fastNaiveBayes.bernoulli". Its components include:

probability_table

Posterior probabilities

priors

calculated prior probabilities for each class

names

names of features used to train this fastNaiveBayes

Details

A Naive Bayes classifier that assumes independence between the feature variables. The Bernoulli event model should be used when each feature is 0 or 1, indicating the absence or presence of that feature in each document. NAs are treated as 0.
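To make the Bernoulli event model and the role of laplace concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper names bernoulli_nb_fit and bernoulli_nb_predict are hypothetical, and the smoothing formula (add laplace to the count, add 2 * laplace to the class size) is the textbook convention, assumed here rather than taken from the package source.

```r
# Hypothetical helpers for illustration only; not part of fastNaiveBayes.
bernoulli_nb_fit <- function(x, y, laplace = 0) {
  x[is.na(x)] <- 0                         # NAs are treated as 0 (feature absent)
  y <- droplevels(as.factor(y))
  n_class <- as.vector(table(y))           # number of observations per class
  present <- rowsum(x, group = y)          # per class: how often each feature is 1
  # Smoothed estimate of P(feature = 1 | class); rows are classes
  prob <- (present + laplace) / (n_class + 2 * laplace)
  list(prob = prob, priors = n_class / length(y), levels = levels(y))
}

bernoulli_nb_predict <- function(model, newdata) {
  newdata[is.na(newdata)] <- 0
  # log P(x | class) = sum_j [ x_j * log(p_cj) + (1 - x_j) * log(1 - p_cj) ]
  log_lik <- newdata %*% t(log(model$prob)) +
    (1 - newdata) %*% t(log(1 - model$prob))
  # Add the log prior of each class (one column per class)
  log_post <- sweep(log_lik, 2, log(model$priors), "+")
  factor(model$levels[max.col(log_post, ties.method = "first")],
         levels = model$levels)
}

# Toy data: 4 documents, 3 binary features, one missing value
x <- matrix(c(1, 0, 1,
              1, 1, 0,
              0, 1, 0,
              0, NA, 1),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))
y <- factor(c("spam", "spam", "ham", "ham"))

# laplace = 1 keeps every probability strictly between 0 and 1,
# so the logarithms above stay finite
model <- bernoulli_nb_fit(x, y, laplace = 1)
pred <- bernoulli_nb_predict(model, x)
mean(pred != y)                            # training error rate
```

With laplace = 0, a feature that never occurs in a class gets probability 0, and a single such feature rules that class out entirely; a positive laplace value avoids this.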

Setting sparse = TRUE converts the numeric matrix x to a sparse dgCMatrix. This can be considerably faster when only a few entries of x are nonzero.

It is also possible to supply a sparse dgCMatrix directly, which can be much faster when a fastNaiveBayes model is trained several times on the same matrix or on subsets of it, because the conversion is done only once. See the examples for more details. Bear in mind that, depending on the data, converting to a sparse matrix can actually be slower.
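The following sketch illustrates the "convert once, train many times" idea. It uses Matrix::Matrix() from the Matrix package to build the dgCMatrix; the toy data, object names, and row subset are assumptions made for illustration. The model fits are guarded so the snippet still runs when fastNaiveBayes is not installed or does not export the function as documented for version 1.1.2.

```r
library(Matrix)  # provides the dgCMatrix class

set.seed(1)
# Hypothetical toy data: 200 observations, 50 binary features, about 5% ones
x_dense <- matrix(rbinom(200 * 50, size = 1, prob = 0.05), nrow = 200,
                  dimnames = list(NULL, paste0("f", 1:50)))
y <- factor(sample(c("a", "b"), 200, replace = TRUE))

# Convert once, up front; for a non-square numeric matrix this yields a dgCMatrix
x_sparse <- Matrix(x_dense, sparse = TRUE)
class(x_sparse)

# Row subsets of a dgCMatrix stay sparse, so no further conversion is needed
idx <- 1:100
x_half <- x_sparse[idx, ]

# Fit on the full matrix and on a subset, reusing the same sparse object
has_fnb <- requireNamespace("fastNaiveBayes", quietly = TRUE) &&
  exists("fastNaiveBayes.bernoulli", envir = asNamespace("fastNaiveBayes"))
if (has_fnb) {
  mod_all  <- fastNaiveBayes::fastNaiveBayes.bernoulli(x_sparse, y, laplace = 1)
  mod_half <- fastNaiveBayes::fastNaiveBayes.bernoulli(x_half, y[idx], laplace = 1)
}
```

Whether this beats passing the dense matrix depends on how sparse the data are, so it is worth timing both variants on your own data.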

See Also

predict.fastNaiveBayes.bernoulli for the predict function for the fastNaiveBayes.bernoulli class, fastNaiveBayes.mixed for the general fastNaiveBayes model, fastNaiveBayes.gaussian for a Gaussian distribution only model, and finally, fastNaiveBayes.multinomial for a multinomial only distribution model.

Examples

# NOT RUN {
rm(list = ls())
library(fastNaiveBayes)
cars <- mtcars
y <- as.factor(ifelse(cars$mpg > 25, "High", "Low"))
x <- cars[, 2:ncol(cars)]

dist <- fastNaiveBayes::fastNaiveBayes.detect_distribution(x, nrows = nrow(x))

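# Bernoulli only: keep the columns detected as Bernoulli or multinomial
vars <- c(dist$bernoulli, dist$multinomial)
newx <- x[, vars]
# Convert each column to a factor so model.matrix() creates 0/1 indicator columns
for (i in seq_len(ncol(newx))) {
  newx[[i]] <- as.factor(newx[[i]])
}
# Build the indicator matrix; "- 1" drops the intercept column
new_mat <- model.matrix(y ~ . - 1, cbind(y, newx))

# Fit with Laplace smoothing, predict on the training data,
# and compute the training error rate
mod <- fastNaiveBayes.bernoulli(new_mat, y, laplace = 1)
pred <- predict(mod, newdata = new_mat)
mean(pred != y)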
# }