FitDistr: Maximum-likelihood Fitting of Univariate Distributions

Description

Maximum-likelihood fitting of univariate distributions, allowing parameters to be held fixed if desired.

Usage

FitDistr(x, densfun, start, ...)

Value

The function `FitDistr` returns an object of class `fitdistr`, which is a list containing:

estimate: a named vector of parameter estimates.
sd: a named vector of the estimated standard errors for the parameters.
vcov: the estimated variance-covariance matrix of the parameter estimates.
loglik: the log-likelihood of the fitted model.
n: the number of observations, i.e. `length(x)`.

Arguments

x: A numeric vector of length at least one containing only finite values.
densfun: Either a character string or a function returning a density evaluated at its first argument. Distributions `"beta"`, `"cauchy"`, `"chi-squared"`, `"exponential"`, `"gamma"`, `"geometric"`, `"log-normal"`, `"lognormal"`, `"logistic"`, `"negative binomial"`, `"normal"`, `"Poisson"`, `"t"` and `"weibull"` are recognised, case being ignored.
start: A named list giving the parameters to be optimized with initial values. It can be omitted for some of the named distributions and must be omitted for others (see Details).
...: Additional arguments, passed either to `densfun` or to `optim`. In particular, bounds can be specified via `lower`, `upper`, or both. Any arguments of `densfun` (or of the density function corresponding to a character-string specification) supplied here are held fixed during fitting.
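Holding a parameter fixed means passing it through unchanged while the optimizer varies only the parameters named in `start`. The following is a minimal sketch of that mechanism using base R's `optim`, which forwards extra named arguments to the objective function. The objective `nll_t` is a hypothetical helper written for illustration; it is not part of `FitDistr`, and whether `FitDistr` forwards fixed arguments in exactly this way is an assumption.

```r
set.seed(123)
x2 <- rt(250, df = 9)

# Hypothetical negative log-likelihood of the location-scale t density.
# Only p = c(m, s) is optimized; df arrives through optim's `...` and stays fixed.
nll_t <- function(p, x, df) {
  -sum(dt((x - p[1]) / p[2], df = df, log = TRUE) - log(p[2]))
}

# A lower bound is given, so L-BFGS-B is used; the scale is kept strictly positive.
fit <- optim(c(m = 0, s = 1), nll_t, x = x2, df = 9,
             method = "L-BFGS-B", lower = c(-Inf, 1e-8))
fit$par
```

Because `df` never enters the parameter vector, the optimizer cannot move it, which mirrors the fixed-`df` fit in the Examples.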

Details

For the Normal, log-Normal, geometric, exponential and Poisson distributions the closed-form MLEs (and exact standard errors) are used, and `start` should not be supplied.
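For these distributions no optimization is needed. The sketch below computes the textbook closed-form MLEs and their usual standard errors for the normal and exponential cases using base R only. Whether `FitDistr` uses exactly these expressions internally is an assumption; note in particular that the MLE of the normal standard deviation uses divisor `n`, not `n - 1`.

```r
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
n <- length(x)

# Normal: closed-form MLEs. The MLE of sd divides by n, unlike sd(), which divides by n - 1.
mean_hat <- mean(x)
sd_hat <- sqrt(mean((x - mean_hat)^2))
se_normal <- c(mean = sd_hat / sqrt(n), sd = sd_hat / sqrt(2 * n))

# Exponential: rate_hat = 1 / mean(y), with standard error rate_hat / sqrt(n).
y <- rexp(50, rate = 0.5)
rate_hat <- 1 / mean(y)
se_rate <- rate_hat / sqrt(length(y))
```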

For all other distributions, the log-likelihood is maximized directly using `optim`. The estimated standard errors are the square roots of the diagonal of the inverse observed information matrix, which is calculated by a numerical approximation. For one-dimensional problems the Nelder-Mead method is used and for multi-dimensional problems the BFGS method, unless arguments named `lower` or `upper` are supplied (when `L-BFGS-B` is used) or `method` is supplied explicitly.
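The optimization route can be sketched with base R alone: minimize the negative log-likelihood with `optim(..., hessian = TRUE)`, invert the numerical Hessian to get the variance-covariance matrix, and take square roots of its diagonal as standard errors. This is an illustration under the assumption that `FitDistr` follows this general pattern; it is not its actual source.

```r
set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)

# Negative log-likelihood: optim() minimizes, so the sign is flipped.
nll <- function(p) -sum(dgamma(x, shape = p[1], rate = p[2], log = TRUE))

# Bounds are supplied, so L-BFGS-B is the matching method.
fit <- optim(c(shape = 1, rate = 0.1), nll, method = "L-BFGS-B",
             lower = c(0.001, 0.001), hessian = TRUE)

estimate <- fit$par
vcov_hat <- solve(fit$hessian)  # inverse of the numerical observed information
sd_hat <- sqrt(diag(vcov_hat))  # standard errors of the estimates
loglik <- -fit$value            # maximized log-likelihood
```

Because the Hessian of the negative log-likelihood is the observed information, no extra sign change is needed before inverting it.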

For the `"t"` named distribution the density is taken to be the location-scale family with location `m` and scale `s`.
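In this family the density at `x` is the standard t density evaluated at `(x - m)/s`, divided by `s`; the `mydt` helper in the Examples computes exactly this. A quick numerical sanity check in base R that the rescaled density still integrates to one (the name `dt_ls` is hypothetical):

```r
# Location-scale t density: standard t density of the standardized value, divided by s.
dt_ls <- function(x, m, s, df) dt((x - m) / s, df) / s

# Extra arguments after `upper` are passed on to dt_ls by integrate().
total <- integrate(dt_ls, lower = -Inf, upper = Inf, m = 2, s = 3, df = 9)$value
```

The division by `s` is the Jacobian of the change of variable; without it the density would integrate to `s` rather than one.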

For the following named distributions, reasonable starting values will be computed if `start` is omitted or only partially specified: `"cauchy"`, `"gamma"`, `"logistic"`, `"negative binomial"` (parametrized by `mu` and `size`), `"t"` and `"weibull"`. Note that these starting values may not be good enough if the fit is poor: in particular they are not resistant to outliers unless the fitted distribution is long-tailed.
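To see why such starting values are not outlier-resistant, consider method-of-moments starts for the gamma, obtained by solving mean = shape/rate and variance = shape/rate^2. This is a common choice, but that `FitDistr` uses exactly these formulas is an assumption; `gamma_start` is a hypothetical helper.

```r
set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)

# Method-of-moments starting values: shape = mean^2 / var, rate = mean / var.
gamma_start <- function(x) {
  m <- mean(x)
  v <- var(x)
  list(shape = m^2 / v, rate = m / v)
}

clean <- gamma_start(x)
# A single gross outlier inflates the sample variance and drags the shape start far down.
contaminated <- gamma_start(c(x, 1000))
```

Sample means and variances react strongly to extreme values, so one contaminated observation can push the optimizer's starting point far from the bulk of the data.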

There are `print`, `coef`, `vcov` and `logLik` methods for class `"fitdistr"`.
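Such methods are ordinary S3 methods dispatched on the class attribute of the returned list. The sketch below shows the pattern for a hypothetical class `toy_fit` holding the components listed under Value; the class name, constructor and numbers are made up for illustration and are not results or the package's own code. Giving the `logLik` result `df` and `nobs` attributes is what lets generic tools such as `AIC()` work on the fitted object.

```r
# Hypothetical constructor mirroring the components listed under Value.
new_toy_fit <- function(estimate, vcov, loglik, n) {
  structure(list(estimate = estimate, sd = sqrt(diag(vcov)), vcov = vcov,
                 loglik = loglik, n = n),
            class = "toy_fit")
}

coef.toy_fit <- function(object, ...) object$estimate
vcov.toy_fit <- function(object, ...) object$vcov
logLik.toy_fit <- function(object, ...) {
  # df = number of estimated parameters; nobs lets BIC() use the sample size.
  structure(object$loglik, df = length(object$estimate), nobs = object$n,
            class = "logLik")
}
print.toy_fit <- function(x, ...) {
  print(rbind(estimate = x$estimate, sd = x$sd))
  invisible(x)
}

# Illustrative made-up values only.
fit <- new_toy_fit(estimate = c(shape = 5, rate = 0.1),
                   vcov = diag(c(0.49, 1e-4)), loglik = -420, n = 100)
```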

Examples


set.seed(123)
x = rgamma(100, shape = 5, rate = 0.1)
FitDistr(x, "gamma")

# Now do this directly with more control.
FitDistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.001)

set.seed(123)
x2 = rt(250, df = 9)
FitDistr(x2, "t", df = 9)

# Allow df to vary: not a very good idea!
FitDistr(x2, "t")

# Now do fixed-df fit directly with more control.
mydt = function(x, m, s, df) dt((x-m)/s, df)/s
FitDistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))

set.seed(123)
x3 = rweibull(100, shape = 4, scale = 100)
FitDistr(x3, "weibull")