The Exact Multinomial Test is a goodness-of-fit test for discrete multivariate data.
It tests whether a given observation is likely to have occurred under the assumption of an ab-initio model.
In the experimental setup belonging to the test, n items fall into k categories with certain probabilities
(sample size n with k categories).
The observation, described by the vector observed, indicates how many items have been observed in each category.
The model, determined by the vector prob, assigns to each category the hypothetical probability that an item falls into it.
Now, if the observation is unlikely to have occurred under the assumption of the model, it is advisable to
regard the model as not valid. The p-value estimates how likely an outcome at least as unusual as the observation is, given the model.
In particular, low p-values suggest that the model is not valid.
The default approach used by multinomial.test obtains the p-value by calculating the exact probabilities of all possible outcomes given n and k,
using the multinomial probability distribution function dmultinom provided by R.
Then, by default, the p-value is obtained by summing the probabilities of all outcomes that are no more likely
than the observed outcome, i.e. by summing all \(p(i)\) with \(p(i) \le p(\text{observed})\)
(distance measure based on probabilities).
Alternatively, the p-value can be obtained by summing the probabilities of all outcomes whose Pearson chi-square statistic is no smaller than
that of the actual observation (distance measure based on chi-square).
The latter is triggered by setting useChisq = TRUE.
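The exact calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by multinomial.test: the helper all_outcomes, the example data, and the small tolerance tol used to treat numerically equal values as ties are all hypothetical choices made for this sketch.

```r
# Hypothetical helper: enumerate every way to distribute n items over
# k categories, returning one outcome per row.
all_outcomes <- function(n, k) {
  if (k == 1) return(matrix(n, nrow = 1))
  rows <- lapply(0:n, function(first) {
    cbind(first, all_outcomes(n - first, k - 1))
  })
  unname(do.call(rbind, rows))
}

# Illustrative data (assumed for this sketch)
observed <- c(3, 1, 1)
prob     <- c(0.5, 0.3, 0.2)
n <- sum(observed)
k <- length(observed)
tol <- 1e-9  # assumed tolerance for floating-point ties

outcomes <- all_outcomes(n, k)

# Exact probability of every outcome and of the observation
p_all <- apply(outcomes, 1, dmultinom, prob = prob)
p_obs <- dmultinom(observed, prob = prob)

# Distance measure based on probabilities:
# sum the probabilities of all outcomes no more likely than the observation.
p_value_prob <- sum(p_all[p_all <= p_obs * (1 + tol)])

# Distance measure based on Pearson's chi-square:
# sum the probabilities of all outcomes with a chi-square no smaller
# than that of the observation.
chisq <- function(x) sum((x - n * prob)^2 / (n * prob))
chi_all <- apply(outcomes, 1, chisq)
chi_obs <- chisq(observed)
p_value_chisq <- sum(p_all[chi_all >= chi_obs - tol])
```

The two p-values generally differ, because the two distance measures rank the outcomes differently.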
For a sample of size n in an experiment with k categories, the number of distinct
possible outcomes is the binomial coefficient choose(n+k-1,k-1). This number grows rapidly with increasing n and
k. If the parameters grow too large, numerical calculation might fail because of time or
memory limitations.
In this case, the Monte Carlo approach provided by multinomial.test is suggested.
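To get a feeling for how quickly the number of outcomes grows, the binomial coefficient can be evaluated directly with choose from base R. The helper name n_outcomes and the parameter values are chosen only for illustration.

```r
# Hypothetical helper: number of distinct outcomes for sample size n
# and k categories.
n_outcomes <- function(n, k) choose(n + k - 1, k - 1)

small  <- n_outcomes(5, 3)    # 21
medium <- n_outcomes(10, 5)   # 1001
large  <- n_outcomes(50, 10)  # more than ten billion outcomes

print(format(c(small, medium, large), big.mark = ",", scientific = FALSE))
```

Each outcome requires one evaluation of dmultinom in the exact approach, so a case like the last one is a natural candidate for the Monte Carlo approach.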
The Monte Carlo approach, activated by setting MonteCarlo = TRUE,
simulates the drawing of ntrial samples of size n from the hypothetical distribution specified by the vector prob.
The default value for ntrial is 100000,
but it might be increased for large n and k.
The advantage of the Monte Carlo approach is that memory requirements and running time are essentially determined by ntrial
but not by n or k.
By default, the p-value is then obtained by summing the relative frequencies of occurrence of unusual outcomes, i.e. of
outcomes occurring less frequently than the observed one (or equally frequent as the observed one).
Alternatively, as above, Pearson's chi-square can be used as a distance measure by setting useChisq = TRUE.
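The Monte Carlo idea can be sketched with rmultinom from base R. This is a minimal illustration under stated assumptions and not the implementation inside multinomial.test: the example data, the seed, the string keys used to count identical outcomes, and the tolerance in the chi-square comparison are all choices made for this sketch.

```r
set.seed(1)  # only to make the sketch reproducible

# Illustrative data (assumed for this sketch)
observed <- c(3, 1, 1)
prob     <- c(0.5, 0.3, 0.2)
n        <- sum(observed)
ntrial   <- 100000

# Draw ntrial samples of size n from the hypothetical distribution;
# rmultinom returns a k x ntrial matrix with one sample per column.
samples <- rmultinom(ntrial, size = n, prob = prob)

# Relative frequency of each distinct simulated outcome
keys <- apply(samples, 2, paste, collapse = "-")
freq <- table(keys) / ntrial

# Relative frequency of the observed outcome (0 if it never occurred)
obs_key  <- paste(observed, collapse = "-")
freq_obs <- if (obs_key %in% names(freq)) freq[[obs_key]] else 0

# Default measure: sum the relative frequencies of all outcomes that
# occurred no more frequently than the observed one.
p_value_mc <- sum(freq[freq <= freq_obs])

# Chi-square measure: share of simulated samples whose Pearson
# chi-square is no smaller than that of the observation.
expected <- n * prob
chi_sim  <- colSums((samples - expected)^2 / expected)
chi_obs  <- sum((observed - expected)^2 / expected)
p_value_chisq_mc <- mean(chi_sim >= chi_obs - 1e-9)
```

Both estimates are subject to simulation noise, which shrinks as ntrial grows.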
The parameter atOnce is of a more technical nature, with a default value of \(1000000\). This value should be decreased
for computers with low memory to avoid overflow, and can be increased on more powerful computers to speed up calculations.
The parameter is only effective for Monte Carlo calculations.
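How atOnce is used internally is not stated here. One plausible reading, shown below purely as an assumption, is that the simulated samples are generated and evaluated in batches of at most atOnce samples, so that only one batch has to be held in memory at a time. The function mc_chisq_batched is hypothetical and uses the chi-square measure for brevity.

```r
# Hypothetical sketch: Monte Carlo p-value (chi-square measure) computed
# in batches of at most atOnce samples. This illustrates an assumed
# memory/speed trade-off, not the actual code of multinomial.test.
mc_chisq_batched <- function(observed, prob,
                             ntrial = 100000, atOnce = 1000000) {
  n        <- sum(observed)
  expected <- n * prob
  chi_obs  <- sum((observed - expected)^2 / expected)

  hits <- 0  # simulated samples at least as extreme as the observation
  done <- 0  # samples simulated so far
  while (done < ntrial) {
    batch   <- min(atOnce, ntrial - done)
    samples <- rmultinom(batch, size = n, prob = prob)
    chi_sim <- colSums((samples - expected)^2 / expected)
    hits    <- hits + sum(chi_sim >= chi_obs - 1e-9)
    done    <- done + batch
  }
  hits / ntrial
}

set.seed(1)
# Small batches: lower peak memory, more loop iterations
p_batched <- mc_chisq_batched(c(3, 1, 1), c(0.5, 0.3, 0.2),
                              ntrial = 20000, atOnce = 5000)
```

Under this reading, a smaller atOnce bounds the size of the largest matrix held in memory, while a larger value reduces the number of loop iterations.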