fastNaiveBayes.mixed: Fast Naive Bayes Classifier for mixed Distributions

Description

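A fast implementation of a Naive Bayes classifier in which different columns of the input can follow different distributions (Bernoulli, multinomial, or Gaussian).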

Usage

fastNaiveBayes.mixed(x, y, laplace = 0, sparse = FALSE,
  distribution = NULL, ...)
# S3 method for default
fastNaiveBayes.mixed(x, y, laplace = 0,
  sparse = FALSE, distribution = NULL, ...)

Arguments

x

A numeric matrix, or a dgCMatrix.

y

A factor of classes.

laplace

A number used for Laplace smoothing. Default is 0.

sparse

Use a sparse matrix? If TRUE, a sparse matrix is constructed from x, which can speed up fitting when most entries are 0. A sparse dgCMatrix can also be passed directly as x, in which case this parameter is set to TRUE.

distribution

A list of distribution names, each paired with the names of the columns for which that distribution is used; see Examples.

...

Not used.
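
As an illustration of the shape such a distribution list might take, here is a minimal base R sketch for the mtcars columns used in the Examples. The element names (bernoulli, multinomial, gaussian) and the column assignment are assumptions made for this sketch; this page does not spell them out, so compare them with the output of fastNaiveBayes.detect_distribution before relying on them.

```r
x <- mtcars[, -1]  # all mtcars columns except mpg

# Hypothetical specification: element names and column groups are assumed,
# not confirmed by this page
dist_spec <- list(
  bernoulli   = c("vs", "am"),                      # 0/1 indicators
  multinomial = c("cyl", "gear", "carb"),           # small integer counts
  gaussian    = c("disp", "hp", "drat", "wt", "qsec")  # continuous measures
)

# Sanity check: every column is assigned to exactly one distribution
assigned <- unlist(dist_spec, use.names = FALSE)
stopifnot(!anyDuplicated(assigned), setequal(assigned, colnames(x)))

# Sanity check: the columns marked as Bernoulli contain only 0 and 1
stopifnot(all(as.matrix(x[, dist_spec$bernoulli]) %in% c(0, 1)))
```

Checking the specification before fitting catches misspelled or forgotten column names early.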

Value

A fitted object of class "fastNaiveBayes". It has four components:

models: The fitted models, one for each distribution specified
priors: calculated prior probabilities for each class
names: names of features used to train this fastNaiveBayes
distribution: the distribution assumed for probability calculations and predictions

Details

A Naive Bayes classifier assumes that the feature variables are independent of each other given the class. Currently, a Bernoulli, multinomial, or Gaussian distribution can be used. The Bernoulli distribution should be used when the features are 0 or 1, indicating the absence or presence of the feature in each document. The multinomial distribution should be used when the features are counts of how often the feature occurs in each document. NAs are treated as 0. Finally, the Gaussian distribution should be used with continuous numerical variables. By setting the distribution parameter, a mix of different distributions can be used for different columns of the input matrix.
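
The independence assumption can be written out as follows. This is the standard textbook form of a mixed Naive Bayes posterior, not necessarily the exact computation the package performs. The symbols are introduced here for illustration: B, M, and G are the sets of columns assigned to the Bernoulli, multinomial, and Gaussian distributions, and p, theta, mu, and sigma are the per-class parameters estimated from the training data.

```latex
P(c \mid x) \;\propto\; P(c)
  \prod_{j \in B} p_{jc}^{\,x_j} (1 - p_{jc})^{1 - x_j}
  \prod_{j \in M} \theta_{jc}^{\,x_j}
  \prod_{j \in G} \mathcal{N}\!\left(x_j \mid \mu_{jc}, \sigma_{jc}^{2}\right)
```

The multinomial coefficient is omitted because it does not depend on the class c. The predicted class is the one that maximizes this expression.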

By setting sparse = TRUE, the numeric matrix x is converted to a sparse dgCMatrix. This can be considerably faster when few entries are non-zero.

It is also possible to supply a sparse dgCMatrix directly, which can be a lot faster when a fastNaiveBayes model is trained multiple times on the same matrix or on subsets of it. See Examples for more details. Bear in mind that, depending on the data, converting to a sparse matrix can actually be slower.
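
A minimal sketch of converting once and reusing the result, assuming the Matrix package is installed. The toy matrix and the object names are illustrative only. The final fitting call is left as a comment because it depends on the installed version of fastNaiveBayes.

```r
library(Matrix)

# Toy 4 x 3 count matrix in which most entries are 0 (illustrative data only)
m <- matrix(c(0, 0, 3,
              0, 1, 0,
              0, 0, 0,
              2, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# Share of non-zero entries; sparse storage pays off when this is small
mean(m != 0)

# Convert once to compressed sparse column format
x_sparse <- Matrix(m, sparse = TRUE)
inherits(x_sparse, "dgCMatrix")

# Row subsets keep the sparse class, so the conversion is not repeated
x_sub <- x_sparse[1:2, , drop = FALSE]
inherits(x_sub, "dgCMatrix")

# The sparse matrix would then be passed as x, for example:
# mod <- fastNaiveBayes.mixed(x_sparse, y, laplace = 1)
```

Whether this beats the dense path depends on how many entries are zero, so it is worth timing both on real data.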

Examples

# NOT RUN {
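rm(list = ls())
library(fastNaiveBayes)

# Binary target: "High" when mpg exceeds 25, otherwise "Low"
cars <- mtcars
y <- as.factor(ifelse(cars$mpg > 25, "High", "Low"))

# Features: every column except mpg
x <- cars[, 2:ncol(cars)]

# Mixed event models
# Inspect which distribution is detected for each column
dist <- fastNaiveBayes::fastNaiveBayes.detect_distribution(x, nrows = nrow(x))
print(dist)

# Fit with Laplace smoothing and predict on the training data
mod <- fastNaiveBayes.mixed(x, y, laplace = 1)
pred <- predict(mod, newdata = x)

# Misclassification rate on the training data
mean(pred != y)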
# }