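The basic representation is in terms of a two-way table with cells x and k-x in the first row (row total k), cells a-x and b-k+x in the second row (row total N-k), and column totals a, b, and N, where b = N-a,
and the associated hypergeometric probability \(P(x)=\binom{a}{x}\binom{b}{k-x}\big/\binom{N}{k}\).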
The table is constrained so that rows and columns add to the margins. In all cases x is a non-negative integer, but meaningful probability distributions also occur when the other parameters are real. Johnson, Kotz and Kemp (1992) give a general discussion.
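As an illustration of how the probability above extends to real parameters, the following is a minimal Python sketch that evaluates it with generalized binomial coefficients in exact rational arithmetic. The names `gbinom` and `ghyper_pmf` are hypothetical helpers introduced here, not functions of the package, and the sketch assumes k is a non-negative integer, so it does not cover types in which k is negative or non-integer.

```python
from fractions import Fraction

def gbinom(r, n):
    """Generalized binomial coefficient C(r, n) for real r and integer n."""
    if n < 0:
        return Fraction(0)
    out = Fraction(1)
    for i in range(n):
        # falling factorial r(r-1)...(r-n+1) divided by n!
        out = out * (Fraction(r) - i) / (i + 1)
    return out

def ghyper_pmf(x, a, k, N):
    """P(x) = C(a, x) C(b, k-x) / C(N, k) with b = N - a."""
    b = N - a
    return gbinom(a, x) * gbinom(b, k - x) / gbinom(N, k)

# Classic case: a = 5, k = 3, N = 10
classic = [ghyper_pmf(x, 5, 3, 10) for x in range(4)]
# Negative parameters: a = -4, k = 3, N = -6
negative = [ghyper_pmf(x, -4, 3, -6) for x in range(4)]
print(classic, sum(classic))
print(negative, sum(negative))
```

Both lists sum to one, which shows that negative a and N still give a proper distribution over x = 0, ..., k.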
Kemp and Kemp (1956) classify the probability distributions that can occur when real values are allowed into eight types. The classic hypergeometric with integer values forms a ninth type. Five of the eight types correspond to known distributions used in various contexts. The remaining three appear to have no practical applications, but they have been implemented for completeness.
The Kemp and Kemp types are defined in terms of the ranges of the a, k, and N parameters and are given in ghyper.types. The function tghyper() will give details for specific values of a, k, and N.
These distributions apply to many important problems, which has led to a variety of names:
The Kemp and Kemp types IIA and IIIA are known as:
The advantages of the conditional argument are considerable. Consider a few examples:
Future event: Consider two events that have occurred u and v times, respectively. The distribution of x, the number of occurrences of the first event in a sample of k new trials, is calculated with a = -u-1 and N = -u-v-2.
Example: Suppose Toronto has won 3 games and Atlanta 1 in the World Series. What is the probability that Toronto will win the series by taking 2 or more of the remaining 3 games?
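The World Series question can be worked through directly from the probability formula. The sketch below is plain Python rather than a call to the package; `gbinom` and `ghyper_pmf` are hypothetical helper names, and it takes u = 3, v = 1, and k = 3, so a = -4 and N = -6.

```python
from fractions import Fraction

def gbinom(r, n):
    """Generalized binomial coefficient C(r, n) for real r and integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out = out * (Fraction(r) - i) / (i + 1)
    return out

def ghyper_pmf(x, a, k, N):
    """P(x) = C(a, x) C(N - a, k - x) / C(N, k)."""
    return gbinom(a, x) * gbinom(N - a, k - x) / gbinom(N, k)

u, v, k = 3, 1, 3            # Toronto wins, Atlanta wins, games remaining
a, N = -u - 1, -u - v - 2    # a = -4, N = -6

# Probability that Toronto takes 2 or 3 of the remaining 3 games
p_series = sum(ghyper_pmf(x, a, k, N) for x in (2, 3))
print(p_series, float(p_series))  # 5/7, about 0.714
```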
Exceedance: Consider two samples of sizes m and k. The distribution of x, the number of elements of the size-k sample that exceed the r-th largest element of the size-m sample, is calculated with a = -r and N = -m-1.
Example: Suppose that only once in the last century has the high-water mark at the St. Joe bridge exceeded 12 feet, what is the probability that it will not do so in the next ten years?
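A rough numerical reading of the high-water example follows. It assumes the "last century" is a sample of m = 100 annual maxima and the next ten years give k = 10. The text does not say which r is intended: the 12-foot level lies between the largest and second-largest of the past values, so the sketch evaluates P(x = 0) for both r = 1 and r = 2 as bracketing assumptions. The helper names are hypothetical.

```python
from fractions import Fraction

def gbinom(r, n):
    """Generalized binomial coefficient C(r, n) for real r and integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out = out * (Fraction(r) - i) / (i + 1)
    return out

def ghyper_pmf(x, a, k, N):
    """P(x) = C(a, x) C(N - a, k - x) / C(N, k)."""
    return gbinom(a, x) * gbinom(N - a, k - x) / gbinom(N, k)

m, k = 100, 10  # assumed: 100 past annual maxima, 10 future years

# P(no future value exceeds the r-th largest past value), a = -r, N = -m-1
p_none = {r: ghyper_pmf(0, -r, k, -m - 1) for r in (1, 2)}
for r, p in p_none.items():
    print(r, p, float(p))  # r=1: 10/11, r=2: 90/109
```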
Waiting time: Consider an urn with T balls, m of which are white, and suppose that drawing without replacement continues until w white balls are obtained. The distribution of x, the number of balls in excess of w that must be drawn, is calculated with a = -w, N = -m-1, and k = T-m.
Example: Suppose a lot of 100 contains 5 defectives. What is the mean number of items that must be inspected before a defective item is found?
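For the inspection example, the mean can be obtained by summing x P(x) over the support. This sketch assumes T = 100, m = 5, and w = 1, so a = -1, N = -6, and k = 95, and it reports the mean of x, the number of non-defective items drawn before the first defective. The helper names are hypothetical and the code does not use the package.

```python
from fractions import Fraction

def gbinom(r, n):
    """Generalized binomial coefficient C(r, n) for real r and integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out = out * (Fraction(r) - i) / (i + 1)
    return out

def ghyper_pmf(x, a, k, N):
    """P(x) = C(a, x) C(N - a, k - x) / C(N, k)."""
    return gbinom(a, x) * gbinom(N - a, k - x) / gbinom(N, k)

T, m, w = 100, 5, 1            # lot size, defectives, defectives sought
a, N, k = -w, -m - 1, T - m    # a = -1, N = -6, k = 95

probs = [ghyper_pmf(x, a, k, N) for x in range(k + 1)]
mean_x = sum(x * p for x, p in enumerate(probs))
print(mean_x, float(mean_x))   # 95/6, about 15.83
```

The result agrees with k a / N, the usual hypergeometric mean evaluated at these parameter values. Counting the defective item itself adds one to the total number inspected.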
Mixture: Suppose x has a binomial distribution with parameter p and number of trials k. Suppose that p is not fixed but itself follows a beta distribution with parameters A and B. Then the distribution of x is calculated with a = -A and N = -A-B.
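The mixture correspondence can be checked numerically. The sketch below compares the generalized hypergeometric probability at a = -A, N = -A-B with the usual closed form of the beta-binomial, C(k, x) B(x+A, k-x+B) / B(A, B). The parameter values A = 2.5, B = 1.5, and k = 4 are arbitrary illustrative choices, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb, gamma

def gbinom(r, n):
    """Generalized binomial coefficient C(r, n) for real r and integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out = out * (Fraction(r) - i) / (i + 1)
    return out

def ghyper_pmf(x, a, k, N):
    """P(x) = C(a, x) C(N - a, k - x) / C(N, k)."""
    return gbinom(a, x) * gbinom(N - a, k - x) / gbinom(N, k)

def beta_fn(p, q):
    """Beta function via the gamma function."""
    return gamma(p) * gamma(q) / gamma(p + q)

def beta_binomial_pmf(x, k, A, B):
    """Binomial(k, p) mixed over p ~ Beta(A, B)."""
    return comb(k, x) * beta_fn(x + A, k - x + B) / beta_fn(A, B)

A, B, k = 2.5, 1.5, 4  # illustrative values only
rows = [(x, float(ghyper_pmf(x, -A, k, -A - B)), beta_binomial_pmf(x, k, A, B))
        for x in range(k + 1)]
for x, g, bb in rows:
    print(x, g, bb)    # the two columns agree up to rounding
```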
Names for Kemp and Kemp type IV are:
Beta-negative-binomial
Beta-Pascal
Generalized Waring
One application is accidents:
Suppose accidents follow a Poisson distribution with mean L, and suppose L varies with individuals according to accident proneness, m. In particular, suppose L follows a gamma distribution with parameter r and scale factor m, and that the scale factor m itself follows a beta distribution with parameters A and B. Then the distribution of accidents, x, is beta-negative-binomial with a = -B, k = -r, and N = A-1. See Xekalaki (1983) for a discussion of this model and of accident models for proneness, contagion, and spells.