sn (version 2.1.1)

fitdistr.grouped: Maximum-likelihood fitting of a univariate distribution from grouped data

Description

Maximum-likelihood fitting of a univariate distribution when the data are represented by a set of frequencies pertaining to a given set of contiguous intervals.

Usage

fitdistr.grouped(breaks, counts, family, weights, trace = FALSE, wpar = NULL)

Value

An object of class fitdistr.grouped, whose components are described in fitdistr.grouped-class.

Arguments

breaks

A numeric vector of strictly increasing values which identify a set of contiguous intervals on the real line. See ‘Details’ for additional information.

counts

A vector of non-negative integers representing the number of observations falling in the intervals specified by breaks; it is therefore required that length(counts) + 1 == length(breaks).

family

A character string specifying the parametric family of distributions to be used for fitting. Admissible names are: "normal", "logistic", "t", "Cauchy", "SN", "ST", "SC", "gamma", "Weibull"; the names "gaussian" and "Gaussian" are also allowed, and are converted to "normal".

weights

An alias for counts, allowed for analogy with the selm function.

trace

A logical value which indicates whether intermediate evaluations of the optimization process are printed (default: FALSE).

wpar

An optional vector with initial values of the ‘working parameters’ for starting the maximization of the log-likelihood function; see ‘Details’ for their description.

Author

Adelchi Azzalini

Details

The original motivation for this function was fitting a univariate SN, ST or SC distribution from grouped data; its scope was later extended to include some other continuous distributions. The name of the function reflects the broad similarity of its purpose to that of fitdistr, but there are substantial differences in how the two functions actually work.

The parameter set of a given family is the same as the one appearing in the corresponding d<basename> function, with the exception of the "t" distribution, for which a location and a scale parameter are included in addition to df.
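As an illustration of what a "t" family with location and scale parameters means, the following minimal sketch builds such a density from the base R function dt, assuming the usual location-scale construction. The function name dt_ls is hypothetical and is not part of the sn package; the sketch is not taken from the package source.

```r
# Hypothetical helper (not part of 'sn'): density of a Student's t variable
# shifted by 'location' and rescaled by 'scale', with 'df' degrees of freedom.
dt_ls <- function(x, location = 0, scale = 1, df) {
  stopifnot(scale > 0, df > 0)
  dt((x - location) / scale, df = df) / scale
}

# With location = 0 and scale = 1 this reduces to the standard dt().
d0 <- dt_ls(0.3, df = 5)

# The rescaled density still integrates to 1 over the real line.
total <- integrate(dt_ls, -Inf, Inf, location = 1, scale = 2, df = 5)$value
```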

The range of breaks does not need to span the whole support of the chosen family of distributions, that is, (0, Inf) for the "Weibull" and "gamma" families and (-Inf, Inf) for the other families. In fact, for the purpose of post-fitting plotting, an infinite range(breaks) is a complication that requires ad hoc handling, so it is sensible to avoid it. However, at the maximum-likelihood fitting stage, the full support of the probability distribution is considered, with the following implications. If max(breaks)=xR, say, and xR<Inf, then an additional interval (xR, Inf) is introduced, with a count of 0 assigned to it. A similar action is taken at the lower end: if min(breaks)=xL is larger than the infimum of the support of the distribution, an extra interval with a count of 0 is introduced as (0, xL) or (-Inf, xL), depending on the support of the family.
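The padding described above can be sketched in base R as follows. The helper pad_to_support and the frequencies are made up for illustration; this is an assumption about the effect of the padding, not the code used inside fitdistr.grouped.

```r
# Hypothetical helper (not part of 'sn'): extend breaks/counts so that the
# intervals cover the whole support, adding zero-count intervals at the ends.
pad_to_support <- function(breaks, counts, support = c(-Inf, Inf)) {
  stopifnot(length(counts) + 1 == length(breaks))
  if (min(breaks) > support[1]) {   # lower end: add (support[1], xL) with count 0
    breaks <- c(support[1], breaks)
    counts <- c(0, counts)
  }
  if (max(breaks) < support[2]) {   # upper end: add (xR, support[2]) with count 0
    breaks <- c(breaks, support[2])
    counts <- c(counts, 0)
  }
  list(breaks = breaks, counts = counts)
}

breaks <- seq(1, 3, by = 0.25)
counts <- c(2, 5, 9, 7, 4, 2, 1, 0)   # made-up frequencies, for illustration only

padded_real <- pad_to_support(breaks, counts)                       # real-line support
padded_pos  <- pad_to_support(breaks, counts, support = c(0, Inf))  # positive support
```

Because the added intervals carry a count of 0, the total number of observations is unchanged; only the set of cell probabilities entering the likelihood is completed.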

Maximum likelihood fitting is obtained by maximizing the corresponding multinomial log-likelihood using the optim function with method="Nelder-Mead". For numerical convenience, the search is performed using ‘working parameters’ in place of the original ones, with reverse conversion at the end. The working parameters coincide with the original distribution parameters when these have unbounded range, while they are log-transformed in the case of intrinsically positive parameters. This transformation applies to the parameters of the positive-valued distributions ("gamma" and "Weibull"), to all scale parameters, and to df of the "t" distribution.
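To make the procedure concrete, here is a minimal base R sketch of the same idea for the "normal" family only. It is not the implementation in sn: the function negloglik, the starting values and the frequencies are assumptions introduced for illustration.

```r
# Made-up grouped data, padded so that the intervals cover (-Inf, Inf)
fin_breaks <- seq(1, 3, by = 0.25)
fin_counts <- c(2, 5, 9, 7, 4, 2, 1, 0)
breaks <- c(-Inf, fin_breaks, Inf)
counts <- c(0, fin_counts, 0)

# Negative multinomial log-likelihood, up to an additive constant.
# Working parameters: wpar[1] = mean (unbounded), wpar[2] = log(sd).
negloglik <- function(wpar, breaks, counts) {
  p <- diff(pnorm(breaks, mean = wpar[1], sd = exp(wpar[2])))  # cell probabilities
  p <- pmax(p, .Machine$double.xmin)                           # guard against log(0)
  -sum(counts * log(p))
}

# Crude starting values from the interval midpoints
mids <- head(fin_breaks, -1) + diff(fin_breaks) / 2
m0 <- weighted.mean(mids, fin_counts)
s0 <- sqrt(weighted.mean((mids - m0)^2, fin_counts))

fit <- optim(c(m0, log(s0)), negloglik, breaks = breaks, counts = counts,
             method = "Nelder-Mead")

# Reverse conversion from working parameters to the original ones
est <- c(mean = fit$par[1], sd = exp(fit$par[2]))
```

Optimizing over log(sd) rather than sd lets an unconstrained method such as Nelder-Mead move freely without ever proposing a non-positive scale.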

See Also

For methods pertaining to this class of objects, see fitdistr.grouped-class and plot.fitdistr.grouped; see also dsn, dst, dsc, Distributions, dmultinom; see also selm for ungrouped data fitting and an example elaborated on below.

Examples

library(sn)
data(barolo)
# select the observations with reseller "A" and volume 75
A75 <- with(barolo, reseller == "A" & volume == 75)
logPrice <- log(barolo$price[A75], 10)  # used in documentation of 'selm'; see its fitting
# group the data into contiguous intervals and count the observations in each
breaks <- seq(1, 3, by = 0.25)
f <- cut(logPrice, breaks = breaks)
counts <- tabulate(f, length(levels(f)))
logPrice.grouped <- fitdistr.grouped(breaks, counts, family = "ST")
summary(logPrice.grouped)  # compare this fit with the ungrouped data fitting