A Naive Bayes classifier that assumes independence between the feature variables. Currently, either a Bernoulli,
multinomial, or Gaussian distribution can be used. The Bernoulli distribution should be used when the features are 0 or 1 to
indicate the presence or absence of the feature in each document. The multinomial distribution should be used when the
features are counts of how often the feature occurs in each document. NAs are simply treated as 0. Finally, the Gaussian distribution
should be used with continuous numeric variables. By setting the distribution parameter, a mix of different distributions can be used
for different columns in the input matrix.

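To make the per-column choice concrete, here is a minimal sketch in Python (not the package's R interface) of how a naive Bayes classifier might score one observation when columns use different distributions. Under the independence assumption, a class's score is its log prior plus a sum of per-column log-likelihoods. All function names, the `dist` and `model` dictionary layouts, and the parameter values are hypothetical and made up for illustration, not estimated from data or taken from fastNaiveBayes.

```python
import math

def bernoulli_loglik(x, p):
    # x is 1 (feature present) or 0 (absent); p = P(present | class)
    return math.log(p) if x == 1 else math.log(1.0 - p)

def multinomial_loglik(counts, probs):
    # counts: occurrences of each term; probs: per-term probabilities for
    # the class (summing to 1). The multinomial coefficient is dropped
    # because it is identical for every class.
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c)

def gaussian_loglik(x, mean, sd):
    # log density of Normal(mean, sd) evaluated at x
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def class_log_score(row, params, dist):
    # Hypothetical layout: dist maps each distribution name to column indices.
    score = math.log(params["prior"])
    for j in dist["bernoulli"]:
        score += bernoulli_loglik(row[j], params["bernoulli"][j])
    counts = [row[j] for j in dist["multinomial"]]
    score += multinomial_loglik(counts, params["multinomial"])
    for j in dist["gaussian"]:
        mean, sd = params["gaussian"][j]
        score += gaussian_loglik(row[j], mean, sd)
    return score

def predict(row, model, dist):
    # Pick the class with the highest log posterior (up to a constant).
    return max(model, key=lambda cls: class_log_score(row, model[cls], dist))

# Column 0 is 0/1, columns 1-2 are counts, column 3 is continuous.
dist = {"bernoulli": [0], "multinomial": [1, 2], "gaussian": [3]}
model = {  # toy, hand-picked parameters
    "spam": {"prior": 0.4, "bernoulli": {0: 0.8},
             "multinomial": [0.7, 0.3], "gaussian": {3: (5.0, 1.0)}},
    "ham": {"prior": 0.6, "bernoulli": {0: 0.1},
            "multinomial": [0.2, 0.8], "gaussian": {3: (2.0, 1.0)}},
}

print(predict([1, 3, 0, 4.5], model, dist))
print(predict([0, 0, 4, 1.5], model, dist))
```

The count columns are scored jointly as one multinomial group rather than column by column, which matches the usual multinomial naive Bayes formulation.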
By setting sparse = TRUE, the numeric matrix x will be converted to a sparse dgCMatrix. This can be considerably faster
when most entries of x are 0.
It is also possible to supply a sparse dgCMatrix directly, which can be much faster when a fastNaiveBayes model
is trained multiple times on the same matrix or a subset of it, because the conversion then only has to be done once.
See the examples for more details. Bear in mind that converting to a sparse matrix can actually be slower, depending on the data.
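Why sparsity can pay off is easiest to see with the multinomial score, where each entry contributes count × log(probability), so zero entries contribute nothing. The sketch below, in Python rather than R, builds compressed sparse column arrays laid out like the `x`, `i`, and `p` slots of a Matrix::dgCMatrix (0-based row indices and column pointers) and then accumulates scores by visiting only the stored nonzeros. The helper names `to_csc`, `multinomial_scores_sparse`, and `multinomial_scores_dense` are hypothetical and do not describe how fastNaiveBayes is implemented internally.

```python
import math

def to_csc(dense):
    """Convert a dense row-major matrix (list of lists) to compressed sparse
    column arrays: values x, row indices i, column pointers p."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    x, i, p = [], [], [0]
    for col in range(n_cols):
        for row in range(n_rows):
            value = dense[row][col]
            if value != 0:
                x.append(value)
                i.append(row)
        p.append(len(x))  # end of this column's run in x / i
    return x, i, p, n_rows

def multinomial_scores_sparse(csc, log_probs):
    """Sum count * log(prob) per row, touching only stored nonzeros."""
    x, i, p, n_rows = csc
    scores = [0.0] * n_rows
    for col in range(len(p) - 1):
        for k in range(p[col], p[col + 1]):
            scores[i[k]] += x[k] * log_probs[col]
    return scores

def multinomial_scores_dense(dense, log_probs):
    """Same quantity, but visiting every entry, including the zeros."""
    return [sum(v * lp for v, lp in zip(row, log_probs)) for row in dense]

dense = [[0, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 3]]
log_probs = [math.log(q) for q in [0.1, 0.2, 0.3, 0.4]]

csc = to_csc(dense)
print(csc[:3])
print(multinomial_scores_sparse(csc, log_probs))
print(multinomial_scores_dense(dense, log_probs))
```

Note that `to_csc` itself makes a full pass over the dense matrix, so converting only helps when the data are mostly zeros or when the converted matrix is reused across several fits.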