Maximum likelihood estimation of the Tweedie index parameter \(p\).
tweedie.profile(formula, p.vec=NULL, xi.vec=NULL, link.power=0,
data, weights, offset, fit.glm=FALSE,
do.smooth=TRUE, do.plot=FALSE, do.ci=do.smooth,
eps=1/6,
control=list( epsilon=1e-09, maxit=glm.control()$maxit, trace=glm.control()$trace ),
do.points=do.plot, method="inversion", conf.level=0.95,
phi.method=ifelse(method == "saddlepoint", "saddlepoint", "mle"),
verbose=FALSE, add0=FALSE)
The main purpose of the function is to estimate the value of the Tweedie index parameter, \(p\), which is returned as the component p.max of the output list.
Optionally (if do.plot=TRUE), a plot is produced that shows the profile log-likelihood computed at each value in p.vec (smoothed if do.smooth=TRUE).
This function can be temperamental (for theoretical reasons involved in numerically computing the density), so the plot marks the requested values of \(p\) on the horizontal axis (using rug). There may be fewer points on the plot than requested, since the likelihood at some of the requested values of \(p\) may have returned NaN, Inf or NA.
A list containing the components:
y and x (such that plot(x,y) (partially) recreates the profile likelihood plot);
ht (the height of the nominal confidence interval);
L (the estimate of the (log-) likelihood at each given value of p);
p (the values of p used);
phi (the computed values of phi at the values in p);
p.max (the maximum likelihood estimate of p);
L.max (the estimate of the (log-) likelihood at p.max);
phi.max (the estimate of phi at p.max);
ci (the lower and upper limits of the confidence interval for p);
method (the method used for estimation: series, inversion, interpolation or saddlepoint);
phi.method (the method used for estimation of phi: saddlepoint or mle).
If fit.glm is TRUE, the list also contains a component glm.obj, a glm object for the fitted Tweedie generalized linear model.
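The ht, p.max, L.max and ci components fit together through the usual likelihood-ratio construction: values of p whose smoothed profile log-likelihood lies within qchisq(conf.level, 1)/2 of L.max are retained. The base-R sketch below applies this rule to a synthetic, exactly quadratic profile, after dropping non-finite likelihood values, and uses splinefun() interpolation as a stand-in smoother. It illustrates the idea only; the package's own smoother, root finding and exact definition of ht are assumptions here, not taken from its source.

```r
# Synthetic profile log-likelihood on a grid of p (illustration only): an
# exact quadratic with its maximum at p = 1.7. One value is set to NA to
# mimic a density evaluation that failed.
p.vec <- seq(1.1, 2.3, by = 0.1)
L <- -10 * (p.vec - 1.7)^2
L[3] <- NA

ok <- is.finite(L)                    # drop NaN, Inf and NA values
prof <- splinefun(p.vec[ok], L[ok])   # interpolating cubic spline

# Maximize the smoothed profile over the range of the grid
opt <- optimize(prof, interval = range(p.vec[ok]), maximum = TRUE, tol = 1e-8)
p.max <- opt$maximum
L.max <- opt$objective

# Likelihood-ratio cut-off for a nominal 100 * conf.level % interval
conf.level <- 0.95
ht <- L.max - qchisq(conf.level, df = 1) / 2

# Points where the smoothed profile crosses the cut-off, one on each side
crossing <- function(p) prof(p) - ht
ci <- c(uniroot(crossing, c(min(p.vec[ok]), p.max))$root,
        uniroot(crossing, c(p.max, max(p.vec[ok])))$root)
c(p.max = p.max, lower = ci[1], upper = ci[2])
```

With an exactly quadratic profile the interval is symmetric about 1.7, with half-width sqrt(qchisq(0.95, 1)/20), roughly 0.44.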
formula: a formula expression as for other regression models and generalized linear models, of the form response ~ predictors. For details, see the documentation for lm, glm and formula.
p.vec: a vector of p values for consideration. The values must all be larger than one (if the response variable has exact zeros, the values must all be between one and two). If NULL (the default), p.vec is set to seq(1.2, 1.8, by=0.1) if the response contains any zeros, or seq(1.5, 5, by=0.5) if the response contains no zeros. See the Details section below for further details.
xi.vec: the same as p.vec; some authors use the \(p\) notation for the index parameter, and some use \(\xi\); this function detects which is used and then uses that notation throughout.
link.power: the power link function to use. These link functions \(g(\cdot)\) are of the form \(g(\mu)=\mu^{\rm link.power}\), where \(\mu\) is the mean, and the special case link.power=0 (the default) refers to the logarithm link function. See also the documentation for tweedie.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.
weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector.
offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length either one or equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if both are specified their sum is used. See model.offset.
fit.glm: logical flag. If TRUE, the Tweedie generalized linear model is fitted using the value of \(p\) found by the profiling function. If FALSE (the default), no model is fitted.
do.smooth: logical flag. If TRUE (the default), a spline is fitted to the data to smooth the profile likelihood plot. If FALSE, no smoothing is used (and the function is quicker). Note that p.vec must contain at least five points for smoothing to be allowed.
do.plot: logical flag. If TRUE, a plot of the profile likelihood is produced. If FALSE (the default), no plot is produced.
do.ci: logical flag. If TRUE, the nominal 100*conf.level% confidence interval for \(p\) is computed. If FALSE, no confidence interval is computed. By default, do.ci is the same value as do.smooth, since a confidence interval will only be accurate if smoothing has been performed. Indeed, if do.smooth=FALSE, confidence intervals are never computed and do.ci is forced to FALSE if it is given as TRUE.
eps: the offset in computing the variance function. The default is eps=1/6 (as suggested by Nelder and Pregibon, 1987). Note that eps is ignored unless method="saddlepoint", as it makes no sense otherwise.
control: a list of parameters for controlling the fitting process; see glm.control and glm. The default is to use the maximum number of iterations maxit and the trace setting as given in glm.control, but to set epsilon to 1e-09 to ensure a smoother plot.
do.points: logical flag; if TRUE, the points at which the (log-) likelihood is computed for the given values of p are shown on the plot. Defaults to the same value as do.plot.
method: the method for computing the (log-) likelihood. One of "series", "inversion" (the default), "interpolation" or "saddlepoint". If problems arise when using this function, changing the method sometimes fixes them. Note that method="saddlepoint" is only an approximate method for computing the (log-) likelihood. Using method="interpolation" may produce a jump in the profile likelihood as it changes computational regimes.
conf.level: the confidence level for the computation of the nominal confidence interval. The default is conf.level=0.95.
phi.method: the method for estimating phi, one of "saddlepoint" or "mle". By default, a maximum likelihood estimate is used unless method="saddlepoint", in which case the saddlepoint approximation method is used. Note that using phi.method="saddlepoint" is equivalent to using the mean deviance estimator of phi.
verbose: the amount of feedback requested: 0 or FALSE means minimal feedback (the default), 1 or TRUE means some feedback, and 2 means all feedback is shown. Since the function can be slow and sometimes problematic, feedback can be useful; but it can also be unnecessary when one knows all is well.
add0: if TRUE, the value p=0 is used in forming the profile log-likelihood (corresponding to the normal distribution); the default value is add0=FALSE.
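To make the link.power convention concrete, the sketch below evaluates \(g(\mu)=\mu^q\) and its inverse for a link power q, treating q = 0 as the log link. q = 0 is the natural limiting case because \((\mu^q-1)/q \to \log\mu\) as \(q \to 0\). power_link is a hypothetical helper written for illustration only; in practice the link is chosen through the link.power argument.

```r
# Hypothetical helper illustrating the power link family selected by
# link.power: eta = g(mu) = mu^q, with q = 0 meaning the log link.
power_link <- function(q) {
  if (q == 0) {
    list(linkfun = function(mu) log(mu),
         linkinv = function(eta) exp(eta))
  } else {
    list(linkfun = function(mu) mu^q,
         linkinv = function(eta) eta^(1 / q))
  }
}

mu <- c(0.5, 1, 4)
log_link <- power_link(0)     # the default, link.power = 0
recip_link <- power_link(-1)  # eta = 1 / mu

log_link$linkfun(mu)                        # identical to log(mu)
recip_link$linkinv(recip_link$linkfun(mu))  # recovers mu

# Why q = 0 is the log link: (mu^q - 1) / q approaches log(mu) as q -> 0
q <- 1e-6
(mu^q - 1) / q - log(mu)                    # all close to zero
```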
Peter Dunn (pdunn2@usc.edu.au)
For each value in p.vec, the function computes an estimate of phi and then computes the value of the log-likelihood for these parameters. The plot of the log-likelihood against p.vec allows the maximum likelihood value of p to be found. Once the value of p is found, the distribution within the class of Tweedie distributions is identified.
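The procedure in the paragraph above can be sketched in base R for an intercept-only model. To stay self-contained, the sketch swaps in the saddlepoint approximation (with the eps offset on the variance function) for the density and the mean deviance for phi, which is roughly the method="saddlepoint" route; the default method="inversion" evaluates the density differently, so tweedie.profile will not give identical numbers. It uses mu = mean(y), the fitted mean of any intercept-only Tweedie GLM, and all helper names are hypothetical.

```r
# Tweedie unit deviance for p > 1; the p = 2 (gamma) case needs its own formula.
tweedie_unit_dev <- function(y, mu, p) {
  if (isTRUE(all.equal(p, 2))) {
    return(2 * (-log(y / mu) + (y - mu) / mu))
  }
  2 * (y^(2 - p) / ((1 - p) * (2 - p)) -
         y * mu^(1 - p) / (1 - p) +
         mu^(2 - p) / (2 - p))
}

# Saddlepoint log-density; V(y) = y^p is offset by eps, in the spirit of
# Nelder and Pregibon (1987), so that it stays finite at y = 0.
saddle_logdens <- function(y, mu, phi, p, eps = 1/6) {
  -0.5 * log(2 * pi * phi * (y + eps)^p) -
    tweedie_unit_dev(y, mu, p) / (2 * phi)
}

# Simulated compound Poisson-gamma data: a Poisson number of gamma(shape = 1)
# summands, which is Tweedie with p = (1 + 2) / (1 + 1) = 1.5.
set.seed(42)
y <- vapply(rpois(300, lambda = 1.5),
            function(k) sum(rgamma(k, shape = 1, scale = 1)),
            numeric(1))

# Default grid documented for p.vec = NULL
p.vec <- if (any(y == 0)) seq(1.2, 1.8, by = 0.1) else seq(1.5, 5, by = 0.5)
mu <- mean(y)   # fitted mean of an intercept-only model

phi <- L <- numeric(length(p.vec))
for (i in seq_along(p.vec)) {
  dev <- tweedie_unit_dev(y, mu, p.vec[i])
  phi[i] <- mean(dev)                                   # phi estimate at this p
  L[i] <- sum(saddle_logdens(y, mu, phi[i], p.vec[i]))  # profile log-likelihood
}

p.max <- p.vec[which.max(L)]
print(data.frame(p = p.vec, phi = phi, L = L))
p.max
```

Plotting L against p.vec gives an unsmoothed version of the profile plot; here p.max is simply the grid value with the largest log-likelihood, without the spline smoothing that do.smooth=TRUE adds.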
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi:10.1007/s11222-007-9039-6
Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing, 15(4), 267–280. doi:10.1007/s11222-005-4070-y
Dunn, P. K. and Smyth, G. K. (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July.
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Nelder, J. A. and Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika, 74(2), 221–232. doi:10.1093/biomet/74.2.221
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579–604. Calcutta: Indian Statistical Institute.
dtweedie, dtweedie.saddle, tweedie
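library(tweedie) # Provides tweedie.profile
library(statmod) # Needed to use tweedie.profile (provides the tweedie family)
# Generate some fictitious data (seed set for reproducibility)
set.seed(1)
test.data <- rgamma(n=200, scale=1, shape=1)
# The gamma is a Tweedie distribution with power=2;
# let's see if p=2 is suggested by tweedie.profile.
# Profiling can be slow, so this may take a while:
out <- tweedie.profile( test.data ~ 1,
   p.vec=seq(1.5, 2.5, by=0.2) )
out$p.max # estimated index parameter
out$ci    # nominal 95% confidence interval for p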
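The phi.method description notes that phi.method="saddlepoint" is equivalent to using the mean deviance estimator of phi. The reason: under the saddlepoint approximation the log-likelihood in phi is \(\ell(\phi) = -\tfrac12\sum_i \log(2\pi\phi V_i) - D/(2\phi)\), where \(D\) is the total deviance, and setting \(\ell'(\phi) = -n/(2\phi) + D/(2\phi^2)\) to zero gives \(\phi = D/n\). The base-R sketch below checks this numerically with optimize(); the data, the fixed mu and the helper names are illustrative assumptions, and the package may handle weights or divisors differently.

```r
# Tweedie unit deviance for 1 < p < 2
tweedie_unit_dev <- function(y, mu, p) {
  2 * (y^(2 - p) / ((1 - p) * (2 - p)) -
         y * mu^(1 - p) / (1 - p) +
         mu^(2 - p) / (2 - p))
}

# Saddlepoint log-likelihood viewed as a function of phi, with p and mu fixed;
# the variance term (y + eps)^p does not involve phi.
saddle_loglik_phi <- function(phi, y, mu, p, eps = 1/6) {
  sum(-0.5 * log(2 * pi * phi * (y + eps)^p) -
        tweedie_unit_dev(y, mu, p) / (2 * phi))
}

set.seed(1)
y  <- c(0, 0, rgamma(20, shape = 2, scale = 1))  # illustrative data with zeros
mu <- mean(y)
p  <- 1.6

phi_mean_dev <- mean(tweedie_unit_dev(y, mu, p))  # mean deviance, D / n
phi_numeric  <- optimize(saddle_loglik_phi, interval = c(1e-3, 50),
                         y = y, mu = mu, p = p,
                         maximum = TRUE, tol = 1e-10)$maximum
c(mean_deviance = phi_mean_dev, numerical_max = phi_numeric)
```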