EnvStats (version 2.3.1)

ehyper: Estimate Parameter of a Hypergeometric Distribution

Description

Estimate \(m\), the number of white balls in the urn, or \(m+n\), the total number of balls in the urn, for a hypergeometric distribution.

Usage

ehyper(x, m = NULL, total = NULL, k, method = "mle")

Arguments

x

non-negative integer indicating the number of white balls out of a sample of size k drawn without replacement from the urn. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.

m

non-negative integer indicating the number of white balls in the urn. You must supply m or total, but not both. Missing values (NAs) are not allowed.

total

positive integer indicating the total number of balls in the urn (i.e., m+n). You must supply m or total, but not both. Missing values (NAs) are not allowed.

k

positive integer indicating the number of balls drawn without replacement from the urn. Missing values (NAs) are not allowed.

method

character string specifying the method of estimation. Possible values are "mle" (maximum likelihood; the default) and "mvue" (minimum variance unbiased). The mvue method is only available when you are estimating \(m\) (i.e., when you supply the argument total). See the DETAILS section for more information on these estimation methods.

Value

a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.

Details

Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.

Let \(x\) be an observation from a hypergeometric distribution with parameters m=\(M\), n=\(N\), and k=\(K\). In R nomenclature, \(x\) represents the number of white balls drawn out of a sample of \(K\) balls drawn without replacement from an urn containing \(M\) white balls and \(N\) black balls. The total number of balls in the urn is thus \(M+N\). Denote the total number of balls by \(T = M+N\).

Estimation

Estimating M, Given T and K Are Known

When \(T\) and \(K\) are known, the maximum likelihood estimator (mle) of \(M\) is given by (Forbes et al., 2011): $$\hat{M}_{mle} = floor[(T + 1) x / K] \;\;\;\; (1)$$ where \(floor()\) represents the floor function. That is, \(floor(y)\) is the largest integer less than or equal to \(y\).

If the quantity \((T + 1) x / K\) is itself an integer, then the mle of \(M\) is also given by (Johnson et al., 1992, p.263): $$\hat{M}_{mle} = [(T + 1) x / K] - 1 \;\;\;\; (2)$$ which is what the function ehyper uses for this case.
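As a rough check of Equation (1), the closed form can be compared with a brute-force search over the likelihood using only base R. This is a sketch, not the code ehyper itself runs; the helper names mle_m_closed and mle_m_grid are hypothetical and are not part of EnvStats.

```r
# Hypothetical helper (not part of EnvStats): closed-form mle of M, Equation (1).
mle_m_closed <- function(x, total, k) {
  floor((total + 1) * x / k)
}

# Hypothetical helper: evaluate the hypergeometric likelihood at every
# admissible M (0, 1, ..., total) and return the value that maximizes it.
mle_m_grid <- function(x, total, k) {
  m_candidates <- 0:total
  lik <- dhyper(x, m = m_candidates, n = total - m_candidates, k = k)
  m_candidates[which.max(lik)]
}

# Values chosen so that (total + 1) * x / k = 8.2 is not an integer,
# which means the likelihood has a single maximizer.
x <- 1
total <- 40
k <- 5

mle_m_closed(x, total, k)
mle_m_grid(x, total, k)
```

With x = 1, total = 40, and k = 5, both calls return 8, which agrees with the estimate of m reported in the Examples section.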

The minimum variance unbiased estimator (mvue) of \(M\) is given by (Forbes et al., 2011): $$\hat{M}_{mvue} = (T x / K) \;\;\;\; (3)$$
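The unbiasedness of Equation (3) follows from \(E(x) = K M / T\), so that \(E(T x / K) = M\). The base R sketch below checks this numerically by summing over the support of \(x\); the helper names mvue_m and expected_mvue are hypothetical and are not part of EnvStats.

```r
# Hypothetical helper (not part of EnvStats): mvue of M, Equation (3).
mvue_m <- function(x, total, k) {
  total * x / k
}

# Hypothetical helper: expected value of the estimator when the urn holds
# m white balls and total - m black balls, and k balls are drawn.
expected_mvue <- function(m, total, k) {
  x_support <- 0:k
  probs <- dhyper(x_support, m = m, n = total - m, k = k)
  sum(mvue_m(x_support, total, k) * probs)
}

# Should reproduce the true M (10 here) up to floating-point rounding.
expected_mvue(m = 10, total = 40, k = 5)

# Unlike the mle, the mvue is not forced to be an integer.
mvue_m(x = 1, total = 42, k = 5)
```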

Estimating T, Given M and K Are Known

When \(M\) and \(K\) are known, the maximum likelihood estimator (mle) of \(T\) is given by (Forbes et al., 2011): $$\hat{T}_{mle} = floor(K M / x) \;\;\;\; (4)$$
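Equation (4) can be checked the same way, by scanning candidate values of \(T\) and keeping the one with the largest likelihood. This is an illustrative base R sketch under stated assumptions: the helper names mle_total_closed and mle_total_grid and the search limit max_total are hypothetical and are not part of EnvStats, and \(x\) is assumed to be positive.

```r
# Hypothetical helper (not part of EnvStats): closed-form mle of T, Equation (4).
# Assumes x > 0.
mle_total_closed <- function(x, m, k) {
  floor(k * m / x)
}

# Hypothetical helper: scan candidate totals up to an arbitrary limit and
# return the one with the largest hypergeometric likelihood.
mle_total_grid <- function(x, m, k, max_total = 1000) {
  # The urn must hold at least k balls, and at least k - x black balls
  # in addition to the m white balls.
  total_candidates <- max(k, m + (k - x)):max_total
  lik <- dhyper(x, m = m, n = total_candidates - m, k = k)
  total_candidates[which.max(lik)]
}

# Values chosen so that k * m / x = 16.67 is not an integer,
# which means the likelihood has a single maximizer.
mle_total_closed(x = 3, m = 10, k = 5)
mle_total_grid(x = 3, m = 10, k = 5)
```

Both calls return 16 for these inputs.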

References

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 6.

See Also

Hypergeometric.

Examples

# NOT RUN {
  # Generate an observation from a hypergeometric distribution with 
  # parameters m=10, n=30, and k=5, then estimate the parameter m. 
  # Note: the call to set.seed simply allows you to reproduce this example. 
  # Also, the only parameter actually estimated is m; once m is estimated, 
  # n is computed by subtracting the estimated value of m (8 in this example) 
  # from the given value of m+n (40 in this example).  The parameters
  # n and k are shown in the output in order to provide information on 
  # all of the parameters associated with the hypergeometric distribution.

  set.seed(250) 
  dat <- rhyper(nn = 1, m = 10, n = 30, k = 5) 
  dat 
  #[1] 1   

  ehyper(dat, total = 40, k = 5) 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m =  8
  #                                 n = 32
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm'
  #
  #Data:                            dat
  #
  #Sample Size:                     1

  #----------

  # Use the same data as in the previous example, but estimate m+n instead. 
  # Note: The only parameter estimated is m+n. Once this is estimated, 
  # n is computed by subtracting the given value of m (10 in this case) 
  # from the estimated value of m+n (50 in this example).

  ehyper(dat, m = 10, k = 5)

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m = 10
  #                                 n = 40
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm+n'
  #
  #Data:                            dat
  #
  #Sample Size:                     1


  #----------

  # Clean up
  #---------
  rm(dat)
# }
