Distribution.df: Data Frame Summarizing Available Probability Distributions and Estimation Methods

Description

Data frame summarizing information about available probability distributions in R and the EnvStats package, and which distributions have associated functions for estimating distribution parameters.

Usage

Distribution.df

Format

A data frame with 35 rows corresponding to 35 different available probability distributions, and 25 columns containing information associated with these probability distributions.

Name

a character vector containing the name of the probability distribution (see the column labeled Name in the table below).

Type

a character vector indicating the type of distribution (see the column labeled Type in the table below). Possible values are "Finite Discrete", "Discrete", "Continuous", and "Mixed".

Support.Min

a character vector indicating the minimum value the random variable can assume (see the column labeled Range in Table 1a below). This is a character vector rather than a numeric vector because some distributions have a lower bound that depends on the value of a distribution parameter. For example, the minimum value for a Uniform distribution is given by the value of the parameter min.

Support.Max

a character vector indicating the maximum value the random variable can assume (see the column labeled Range in Table 1a below). This is a character vector rather than a numeric vector because some distributions have an upper bound that depends on the value of a distribution parameter. For example, the maximum value for a Uniform distribution is given by the value of the parameter max.

Estimation.Method(s)

a character vector indicating the names of the methods available to estimate the distribution parameter(s) (see the column labeled Estimation Method(s) in Table 1b below). Possible values include "mle" (maximum likelihood), "mme" (method of moments), "mmue" (method of moments based on the unbiased estimate of variance), "mvue" (minimum variance unbiased), "qmle" (quasi-mle), etc., or some combination of these. When a single estimator qualifies as more than one kind, a slash (/) joins all the methods that estimator covers. For example, for the Binomial distribution, the sample proportion is the maximum likelihood, method of moments, and minimum variance unbiased estimator, so this method is denoted as "mle/mme/mvue". See the help files for the specific function listed under Estimating Distribution Parameters for an explanation of each of these estimation methods.

Quantile.Estimation.Method(s)

a character vector indicating the names of the methods available to estimate the distribution quantiles. For many distributions, these are the same as Estimation.Method(s). See the help files for the specific function listed under Estimating Distribution Quantiles for an explanation of each of these estimation methods.

Prediction.Interval.Method(s)

a character vector indicating the names of the methods available to create prediction intervals. See the help files for the specific function listed under Prediction Intervals for an explanation of each of these estimation methods.

Singly.Censored.Estimation.Method(s)

a character vector indicating the names of the methods available to estimate the distribution parameter(s) for Type I singly-censored data. See the help files for the specific function listed under Estimating Distribution Parameters in the help file for Censored Data for an explanation of each of these estimation methods.

Multiply.Censored.Estimation.Method(s)

a character vector indicating the names of the methods available to estimate the distribution parameter(s) for Type I multiply-censored data. See the help files for the specific function listed under Estimating Distribution Parameters in the help file for Censored Data for an explanation of each of these estimation methods.

Number.parameters

a numeric vector indicating the number of parameters associated with the distribution (see the column labeled Parameters in the table below).

Parameter.1

the columns labeled Parameter.1, Parameter.2, ..., Parameter.5 are character vectors containing the names of the distribution parameters (see the column labeled Parameters in the table below). If a distribution has $n$ parameters and $n < 5$, then the columns labeled Parameter.n+1, ..., Parameter.5 are empty. For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3, Parameter.4, and Parameter.5 are empty.

Parameter.2

see Parameter.1

Parameter.3

see Parameter.1

Parameter.4

see Parameter.1

Parameter.5

see Parameter.1

Parameter.1.Min

the columns labeled Parameter.1.Min, Parameter.2.Min, ..., Parameter.5.Min are character vectors containing the minimum values that can be assumed by the distribution parameters (see the column labeled Parameter Range(s) in the table below).

These are character vectors rather than numeric vectors for two reasons. First, some parameters must be strictly greater than 0 (e.g., the parameter sd for the Normal distribution), in which case the lower bound is given as .Machine$double.eps, whose value may vary from machine to machine. Second, some parameters have a lower bound that depends on the value of another parameter. For example, the parameter max for a Uniform distribution is bounded below by the value of the parameter min.

If a distribution has $n$ parameters and $n < 5$, then the columns labeled Parameter.n+1.Min, ..., Parameter.5.Min have the missing value code (NA). For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3.Min, Parameter.4.Min, and Parameter.5.Min contain NAs.

Parameter.2.Min

see Parameter.1.Min

Parameter.3.Min

see Parameter.1.Min

Parameter.4.Min

see Parameter.1.Min

Parameter.5.Min

see Parameter.1.Min

Parameter.1.Max

the columns labeled Parameter.1.Max, Parameter.2.Max, ..., Parameter.5.Max are character vectors containing the maximum values that can be assumed by the distribution parameters (see the column labeled Parameter Range(s) in the table below).

These are character vectors rather than numeric vectors because some parameters have an upper bound that depends on the value of another parameter. For example, the parameter min for a Uniform distribution is bounded above by the value of the parameter max.

If a distribution has $n$ parameters and $n < 5$, then the columns labeled Parameter.n+1.Max, ..., Parameter.5.Max have the missing value code (NA). For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3.Max, Parameter.4.Max, and Parameter.5.Max contain NAs.

Parameter.2.Max

see Parameter.1.Max

Parameter.3.Max

see Parameter.1.Max

Parameter.4.Max

see Parameter.1.Max

Parameter.5.Max

see Parameter.1.Max
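
As a quick orientation, the structure described above can be inspected along the following lines. This is a minimal sketch that assumes the EnvStats package is installed and that the row names of the data frame are the distribution abbreviations (for example "norm"), which is an assumption not stated in this section. The variable name dist_df is introduced here only for illustration.

```r
# Sketch: inspect Distribution.df, assuming the EnvStats package is installed.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  dist_df <- EnvStats::Distribution.df

  # Dimensions of the data frame (the Format section describes the expected shape).
  print(dim(dist_df))

  # All column names, including the Parameter.k, Parameter.k.Min,
  # and Parameter.k.Max groups.
  print(names(dist_df))

  # Name, type, and support for every distribution; intersect() guards
  # against a column name differing from the documentation.
  cols <- intersect(c("Name", "Type", "Support.Min", "Support.Max"),
                    names(dist_df))
  print(dist_df[, cols, drop = FALSE])

  # Parameter names for the Normal distribution.
  # ASSUMPTION: rows are labeled by abbreviation, e.g. "norm".
  par_cols <- intersect(paste0("Parameter.", 1:5), names(dist_df))
  if ("norm" %in% rownames(dist_df)) {
    print(dist_df["norm", par_cols, drop = FALSE])
  }
} else {
  message("EnvStats is not installed; skipping the example.")
}
```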

Details

The tables below summarize the probability distributions available in R and EnvStats. For each distribution, there are four associated functions for computing density values, cumulative probabilities, quantiles, and random numbers. The names of these functions take the form dabb, pabb, qabb, and rabb, where abb is the abbreviated name of the distribution (see Table 1a below). These functions are described in the help file with the name of the distribution (see the first column of the table). For example, the help file for Beta describes the behavior of dbeta, pbeta, qbeta, and rbeta.
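
The naming convention can be illustrated with base R alone. The sketch below builds the four function names from an abbreviation and calls each one for the Normal distribution; the variable names (abb, fun_names, funs, and so on) are introduced here for illustration only.

```r
# The d/p/q/r naming convention, illustrated with the Normal ("norm") distribution.
abb <- "norm"

# Build the four function names from the abbreviation and look them up.
fun_names <- paste0(c("d", "p", "q", "r"), abb)
funs <- lapply(fun_names, match.fun)
names(funs) <- fun_names

dens  <- funs$dnorm(0)      # density of the standard Normal at 0
prob  <- funs$pnorm(1.96)   # cumulative probability P(X <= 1.96)
quant <- funs$qnorm(prob)   # the quantile function inverts pnorm

set.seed(1)                 # make the random draws reproducible
draws <- funs$rnorm(5)      # five random numbers

print(c(density = dens, probability = prob, quantile = quant))
print(draws)
```

The same pattern applies to any abbreviation in the table, provided the package defining the corresponding functions is attached.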

For most distributions, there is also an associated function for estimating the distribution parameters, and the form of the names of these functions is eabb, where abb is the abbreviated name of the distribution (see table below). All of these functions are listed in the help file Estimating Distribution Parameters. For example, the function ebeta estimates the shape parameters of a Beta distribution based on a random sample of observations from this distribution.
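
As a hedged illustration of the eabb convention, the following sketch simulates Beta data and passes it to ebeta. It assumes EnvStats is installed; the sample size, seed, and true shape values are arbitrary choices made for this example.

```r
# Sketch: estimate Beta shape parameters with ebeta, assuming EnvStats is installed.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(47)

  # Simulated sample; the true shapes (2 and 4) are arbitrary illustration values.
  x <- rbeta(200, shape1 = 2, shape2 = 4)

  # "mle" is one of the methods listed for the Beta distribution in Table 1b.
  fit <- EnvStats::ebeta(x, method = "mle")

  # The estimated shape1 and shape2 are stored in the parameters component.
  print(fit$parameters)
} else {
  message("EnvStats is not installed; skipping the example.")
}
```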

For some distributions, there are functions to estimate distribution parameters based on Type I censored data. The form of the names of these functions is eabbSinglyCensored for singly censored data and eabbMultiplyCensored for multiply censored data. All of these functions are listed under the heading Estimating Distribution Parameters in the help file Censored Data.

Table 1a. Available Distributions: Name, Abbreviation, Type, and Range

Name	Abbreviation	Type	Range
Beta	`beta`	Continuous	$[0, 1]$

Binomial	`binom`	Finite	$[0, size]$
		Discrete	(integer)

Cauchy	`cauchy`	Continuous	$(-\infty, \infty)$

Chi	`chi`	Continuous	$[0, \infty)$

Chi-square	`chisq`	Continuous	$[0, \infty)$

Exponential	`exp`	Continuous	$[0, \infty)$

Extreme	`evd`	Continuous	$(-\infty, \infty)$
Value

F	`f`	Continuous	$[0, \infty)$

Gamma	`gamma`	Continuous	$[0, \infty)$

Gamma	`gammaAlt`	Continuous	$[0, \infty)$
(Alternative)

Generalized	`gevd`	Continuous	$(-\infty, \infty)$
Extreme			for $shape = 0$
Value
			$(-\infty, location + \frac{scale}{shape}]$
			for $shape > 0$

			$[location + \frac{scale}{shape}, \infty)$
			for $shape < 0$

Geometric	`geom`	Discrete	$[0, \infty)$
			(integer)

Hypergeometric	`hyper`	Finite	$[0, min(k,m)]$
		Discrete	(integer)

Logistic	`logis`	Continuous	$(-\infty, \infty)$

Lognormal	`lnorm`	Continuous	$[0, \infty)$

Lognormal	`lnormAlt`	Continuous	$[0, \infty)$
(Alternative)

Lognormal	`lnormMix`	Continuous	$[0, \infty)$
Mixture

Lognormal	`lnormMixAlt`	Continuous	$[0, \infty)$
Mixture
(Alternative)

Three-	`lnorm3`	Continuous	$[threshold, \infty)$
Parameter
Lognormal

Truncated	`lnormTrunc`	Continuous	$[min, max]$
Lognormal

Truncated	`lnormTruncAlt`	Continuous	$[min, max]$
Lognormal
(Alternative)

Negative	`nbinom`	Discrete	$[0, \infty)$
Binomial			(integer)

Normal	`norm`	Continuous	$(-\infty, \infty)$

Normal	`normMix`	Continuous	$(-\infty, \infty)$
Mixture

Truncated	`normTrunc`	Continuous	$[min, max]$
Normal

Pareto	`pareto`	Continuous	$[location, \infty)$

Poisson	`pois`	Discrete	$[0, \infty)$
			(integer)

Student's t	`t`	Continuous	$(-\infty, \infty)$

Triangular	`tri`	Continuous	$[min, max]$

Uniform	`unif`	Continuous	$[min, max]$

Weibull	`weibull`	Continuous	$[0, \infty)$

Wilcoxon	`wilcox`	Finite	$[0, m n]$
Rank Sum		Discrete	(integer)

Zero-Modified	`zmlnorm`	Mixed	$[0, \infty)$
Lognormal
(Delta)

Zero-Modified	`zmlnormAlt`	Mixed	$[0, \infty)$
Lognormal
(Delta)
(Alternative)

Zero-Modified	`zmnorm`	Mixed	$(-\infty, \infty)$
Normal
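
Because several ranges in Table 1a depend on parameter values, a bound stored as text has to be evaluated against concrete parameters before it can be used numerically. The sketch below shows one way this might be done in base R. The helper resolve_bound is hypothetical (it is not part of EnvStats), and the bound strings are written by hand to mirror the table; the exact strings stored in Distribution.df may differ.

```r
# Hypothetical helper (not part of EnvStats): evaluate a bound given as text,
# looking up parameter names in the supplied list of parameter values.
resolve_bound <- function(bound, params) {
  eval(parse(text = bound), envir = params, enclos = baseenv())
}

# Uniform distribution: the support is [min, max].
unif_params <- list(min = 2, max = 5)
lower <- resolve_bound("min", unif_params)
upper <- resolve_bound("max", unif_params)

# A bound that does not depend on any parameter evaluates to itself.
fixed_lower <- resolve_bound("-Inf", list())

# Generalized Extreme Value with shape > 0: upper bound location + scale/shape.
gevd_params <- list(location = 0, scale = 1, shape = 0.5)
gevd_upper <- resolve_bound("location + scale/shape", gevd_params)

print(c(lower = lower, upper = upper,
        fixed_lower = fixed_lower, gevd_upper = gevd_upper))
```

Evaluating text with parse and eval is convenient for a trusted, fixed table such as this one, but it should not be applied to untrusted input.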

Table 1b. Available Distributions: Name, Parameters, Parameter Default Values, Parameter Ranges, Estimation Method(s)

		Default	Parameter	Estimation
Name	Parameter(s)	Value(s)	Range(s)	Method(s)
Beta	`shape1`		$(0, \infty)$	mle, mme, mmue
	`shape2`		$(0, \infty)$
	`ncp`	`0`	$(0, \infty)$

Binomial	`size`		$[0, \infty)$	mle/mme/mvue
	`prob`		$[0, 1]$

Cauchy	`location`	`0`	$(-\infty, \infty)$
	`scale`	`1`	$(0, \infty)$

Chi	`df`		$(0, \infty)$

Chi-square	`df`		$(0, \infty)$
	`ncp`	`0`	$(-\infty, \infty)$

Exponential	`rate`	`1`	$(0, \infty)$	mle/mme

Extreme	`location`	`0`	$(-\infty, \infty)$	mle, mme, mmue, pwme
Value	`scale`	`1`	$(0, \infty)$

F	`df1`		$(0, \infty)$
	`df2`		$(0, \infty)$
	`ncp`	`0`	$(0, \infty)$

Gamma	`shape`		$(0, \infty)$	mle, bcmle, mme, mmue
	`scale`	`1`	$(0, \infty)$

Gamma	`mean`		$(0, \infty)$	mle, bcmle, mme, mmue
(Alternative)	`cv`	`1`	$(0, \infty)$

Generalized	`location`	`0`	$(-\infty, \infty)$	mle, pwme, tsoe
Extreme	`scale`	`1`	$(0, \infty)$
Value	`shape`	`0`	$(-\infty, \infty)$

Geometric	`prob`		$(0, 1)$	mle/mme, mvue

Hypergeometric	`m`		$[0, \infty)$	mle, mvue
	`n`		$[0, \infty)$
	`k`		$[1, m+n]$

Logistic	`location`	`0`	$(-\infty, \infty)$	mle, mme, mmue
	`scale`	`1`	$(0, \infty)$

Lognormal	`meanlog`	`0`	$(-\infty, \infty)$	mle/mme, mvue
	`sdlog`	`1`	$(0, \infty)$

Lognormal	`mean`	`exp(1/2)`	$(0, \infty)$	mle, mme, mmue,
(Alternative)	`cv`	`sqrt(exp(1)-1)`	$(0, \infty)$	mvue, qmle

Lognormal	`meanlog1`	`0`	$(-\infty, \infty)$
Mixture	`sdlog1`	`1`	$(0, \infty)$
	`meanlog2`	`0`	$(-\infty, \infty)$
	`sdlog2`	`1`	$(0, \infty)$
	`p.mix`	`0.5`	$[0, 1]$

Lognormal	`mean1`	`exp(1/2)`	$(0, \infty)$
Mixture	`cv1`	`sqrt(exp(1)-1)`	$(0, \infty)$
(Alternative)	`mean2`	`exp(1/2)`	$(0, \infty)$
	`cv2`	`sqrt(exp(1)-1)`	$(0, \infty)$
	`p.mix`	`0.5`	$[0, 1]$

Three-	`meanlog`	`0`	$(-\infty, \infty)$	lmle, mme,
Parameter	`sdlog`	`1`	$(0, \infty)$	mmue, mmme,
Lognormal	`threshold`	`0`	$(-\infty, \infty)$	royston.skew,
				zero.skew

Truncated	`meanlog`	`0`	$(-\infty, \infty)$
Lognormal	`sdlog`	`1`	$(0, \infty)$
	`min`	`0`	$[0, max)$
	`max`	`Inf`	$(min, \infty)$

Truncated	`mean`	`exp(1/2)`	$(0, \infty)$
Lognormal	`cv`	`sqrt(exp(1)-1)`	$(0, \infty)$
(Alternative)	`min`	`0`	$[0, max)$
	`max`	`Inf`	$(min, \infty)$

Negative	`size`		$[1, \infty)$	mle/mme, mvue
Binomial	`prob`		$(0, 1]$
	`mu`		$(0, \infty)$

Normal	`mean`	`0`	$(-\infty, \infty)$	mle/mme, mvue
	`sd`	`1`	$(0, \infty)$

Normal	`mean1`	`0`	$(-\infty, \infty)$
Mixture	`sd1`	`1`	$(0, \infty)$
	`mean2`	`0`	$(-\infty, \infty)$
	`sd2`	`1`	$(0, \infty)$
	`p.mix`	`0.5`	$[0, 1]$

Truncated	`mean`	`0`	$(-\infty, \infty)$
Normal	`sd`	`1`	$(0, \infty)$
	`min`	`-Inf`	$(-\infty, max)$
	`max`	`Inf`	$(min, \infty)$

Pareto	`location`		$(0, \infty)$	lse, mle
	`shape`	`1`	$(0, \infty)$

Poisson	`lambda`		$(0, \infty)$	mle/mme/mvue

Student's t	`df`		$(0, \infty)$
	`ncp`	`0`	$(-\infty, \infty)$

Triangular	`min`	`0`	$(-\infty, max)$
	`max`	`1`	$(min, \infty)$
	`mode`	`0.5`	$(min, max)$

Uniform	`min`	`0`	$(-\infty, max)$	mle, mme, mmue
	`max`	`1`	$(min, \infty)$

Weibull	`shape`		$(0, \infty)$	mle, mme, mmue
	`scale`	`1`	$(0, \infty)$

Wilcoxon	`m`		$[1, \infty)$
Rank Sum	`n`		$[1, \infty)$

Zero-Modified	`meanlog`	`0`	$(-\infty, \infty)$	mvue
Lognormal	`sdlog`	`1`	$(0, \infty)$
(Delta)	`p.zero`	`0.5`	$[0, 1]$

Zero-Modified	`mean`	`exp(1/2)`	$(0, \infty)$	mvue
Lognormal	`cv`	`sqrt(exp(1)-1)`	$(0, \infty)$
(Delta)	`p.zero`	`0.5`	$[0, 1]$
(Alternative)

Zero-Modified	`mean`	`0`	$(-\infty, \infty)$	mvue
Normal	`sd`	`1`	$(0, \infty)$
	`p.zero`	`0.5`	$[0, 1]$
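
The slash and comma notation in the Estimation Method(s) column can be taken apart programmatically. The sketch below uses base R and three entries copied by hand from Table 1b; it assumes that commas separate distinct estimators and slashes join the methods a single estimator covers, and that the strings stored in Distribution.df follow the same layout, which may not hold exactly.

```r
# Example entries copied from the Estimation Method(s) column of Table 1b.
methods_chr <- c(Binomial  = "mle/mme/mvue",
                 Geometric = "mle/mme, mvue",
                 Pareto    = "lse, mle")

# Split on commas to get one element per distinct estimator.
estimators <- lapply(methods_chr, function(s) {
  trimws(strsplit(s, ",", fixed = TRUE)[[1]])
})

# Split each estimator on slashes to list every method it covers.
methods_by_estimator <- lapply(estimators, function(e) {
  strsplit(e, "/", fixed = TRUE)
})

# Number of distinct estimators available for each distribution.
n_estimators <- lengths(estimators)

print(n_estimators)
print(methods_by_estimator$Geometric)
```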

References

Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York. https://link.springer.com/book/10.1007/978-1-4614-8456-1.