Maximum-likelihood fitting of a univariate distribution when the data are represented by a set of frequencies pertaining to a given set of contiguous intervals.
fitdistr.grouped(breaks, counts, family, weights, trace = FALSE, wpar = NULL)
An object of class fitdistr.grouped, whose components are described in fitdistr.grouped-class.
breaks: A numeric vector of strictly increasing values which identify a set of contiguous intervals on the real line. See ‘Details’ for additional information.
counts: A vector of non-negative integers representing the number of observations falling in the intervals specified by breaks; it is then required that length(counts) + 1 = length(breaks).
family: A character string specifying the parametric family of distributions to be used for fitting. Admissible names are: "normal", "logistic", "t", "Cauchy", "SN", "ST", "SC", "gamma", "Weibull"; the names "gaussian" and "Gaussian" are also allowed, and are converted to "normal".
weights: An alias for counts, allowed for analogy with the selm function.
trace: A logical value which indicates whether intermediate evaluations of the optimization process are printed (default: FALSE).
wpar: An optional vector with initial values of the ‘working parameters’ for starting the maximization of the log-likelihood function; see ‘Details’ for their description.
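The conditions on breaks, counts and family stated above can be checked before fitting. The following is a minimal sketch; check_grouped_input() is a hypothetical helper written for illustration, not a function of the package, and the actual function may perform its checks differently.

```r
# Hypothetical helper (not part of the package): checks the stated
# conditions on 'breaks' and 'counts' and normalizes the 'family' name.
check_grouped_input <- function(breaks, counts, family) {
  # breaks must be numeric and strictly increasing
  stopifnot(is.numeric(breaks), all(diff(breaks) > 0))
  # counts must be non-negative integers
  stopifnot(is.numeric(counts), all(counts >= 0), all(counts == round(counts)))
  if (length(counts) + 1 != length(breaks))
    stop("length(counts) + 1 must equal length(breaks)")
  # "gaussian" and "Gaussian" are converted to "normal"
  if (family %in% c("gaussian", "Gaussian")) family <- "normal"
  admissible <- c("normal", "logistic", "t", "Cauchy", "SN", "ST", "SC",
                  "gamma", "Weibull")
  if (!family %in% admissible) stop("unknown family: ", family)
  invisible(family)
}
```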
Adelchi Azzalini
The original motivation of this function was fitting a univariate SN, ST or SC distribution from grouped data; its scope was later extended to include some other continuous distributions. The adopted name of the function reflects the broad similarity of its purpose with that of fitdistr (in package MASS), but there are substantial differences in the actual working of the two functions.
The parameter set of a given family is the same as that appearing in the corresponding d<basename> function, with the exception of the "t" distribution, for which a location and a scale parameter are included besides df.
The range of breaks does not need to span the whole support of the chosen family of distributions, that is, (0, Inf) for the "Weibull" and "gamma" families and (-Inf, Inf) for the other families. Indeed, for the purpose of post-fitting plotting, an infinite range(breaks) represents a complication requiring ad hoc handling, so it is sensible to avoid it.
However, at the maximum-likelihood fitting stage, the full support of the probability distribution is considered, with the following implications. If max(breaks) = xR, say, and xR < Inf, then an additional interval (xR, Inf) is introduced, with counts = 0 assigned to it. A similar action is taken at the lower end: if min(breaks) = xL is larger than the infimum of the support of the distribution, an extra interval with zero count is introduced as (0, xL) or (-Inf, xL), depending on the support of the family.
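The extension of the intervals to the full support can be sketched as below. This is a minimal illustration under the rules just described; extend_to_support() and its support argument are hypothetical names, not part of the package.

```r
# Hypothetical helper: adds zero-count tail intervals so that the
# intervals cover the whole support of the family.
# 'support' is c(0, Inf) for "gamma"/"Weibull", c(-Inf, Inf) otherwise.
extend_to_support <- function(breaks, counts, support = c(-Inf, Inf)) {
  if (min(breaks) > support[1]) {      # lower end: add (inf support, xL)
    breaks <- c(support[1], breaks)
    counts <- c(0, counts)
  }
  if (max(breaks) < support[2]) {      # upper end: add (xR, Inf)
    breaks <- c(breaks, support[2])
    counts <- c(counts, 0)
  }
  list(breaks = breaks, counts = counts)
}
ext <- extend_to_support(seq(1, 3, by = 0.25), rep(2, 8), support = c(0, Inf))
ext$breaks
ext$counts
```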
Maximum-likelihood fitting is obtained by maximizing the corresponding multinomial log-likelihood using the optim function with method = "Nelder-Mead". For numerical convenience, the search is performed using ‘working parameters’ in place of the original ones, with reverse conversion at the end. The working parameters coincide with the original distribution parameters when these have unbounded range, while they are log-transformed in the case of intrinsically positive parameters. This transformation applies to the parameters of the positive-valued distributions ("gamma" and "Weibull"), to all scale parameters, and to df of the "t" distribution.
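As a minimal sketch of this scheme, restricted to family "normal" and written with base R only, the multinomial log-likelihood is, up to an additive constant, the sum over intervals of n_j * log(F(b_j) - F(b_(j-1))), where n_j are the counts, b_j the breaks and F the distribution function. The working parameters here are (mean, log(sd)). The function fit_grouped_normal() and its crude midpoint-based starting values are hypothetical illustrations, not the package's implementation.

```r
# Hypothetical illustration (family "normal" only), not the package code:
# maximize sum(n * log(p)) with Nelder-Mead on working parameters
# (mean, log(sd)), then convert back to (mean, sd).
fit_grouped_normal <- function(breaks, counts, wpar = NULL) {
  # zero-count tails so that the cell probabilities sum to 1
  b <- c(-Inf, breaks, Inf)
  n <- c(0, counts, 0)
  loglik <- function(w) {
    p <- diff(pnorm(b, mean = w[1], sd = exp(w[2])))  # cell probabilities
    keep <- n > 0                 # zero-count cells contribute nothing
    sum(n[keep] * log(p[keep]))
  }
  if (is.null(wpar)) {
    # crude starting values computed from the interval midpoints
    mid <- (head(breaks, -1) + tail(breaks, -1)) / 2
    m <- sum(counts * mid) / sum(counts)
    s <- sqrt(sum(counts * (mid - m)^2) / sum(counts))
    wpar <- c(m, log(s))
  }
  opt <- optim(wpar, loglik, method = "Nelder-Mead",
               control = list(fnscale = -1))   # fnscale = -1: maximize
  # reverse conversion from working to original parameters
  c(mean = opt$par[1], sd = exp(opt$par[2]))
}
```

The log transform lets the optimizer move over the whole real line while the scale stays positive, which is the purpose of the working parameters described above.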
For methods pertaining to this class of objects, see fitdistr.grouped-class and plot.fitdistr.grouped; see also dsn, dst, dsc, Distributions and dmultinom; see also selm for fitting ungrouped data, and for the example elaborated on below.
library(sn)
data(barolo)
A75 <- with(barolo, reseller == "A" & volume == 75)
logPrice <- log(barolo$price[A75], 10) # used in documentation of 'selm'; see its fitting
breaks <- seq(1, 3, by = 0.25)
f <- cut(logPrice, breaks = breaks)
counts <- tabulate(f, nbins = length(levels(f)))
logPrice.grouped <- fitdistr.grouped(breaks, counts, family = "ST")
summary(logPrice.grouped) # compare this fit with the ungrouped data fitting