paretoff: Pareto and Truncated Pareto Distribution Family Functions

Description

Estimates one of the parameters of the Pareto(I) distribution by maximum likelihood estimation. Also includes the upper truncated Pareto(I) distribution.

Usage

paretoff(scale = NULL, lshape = "loglink")
truncpareto(lower, upper, lshape = "loglink", ishape = NULL, imethod = 1)

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Arguments

lshape: Parameter link function applied to the parameter $k$. See Links for more choices. A log link is the default because $k$ is positive.
scale: Numeric. The parameter $\alpha$ below. If the user inputs a number then it is assumed known with this value. The default means it is estimated by maximum likelihood estimation, which means min(y) is used, where y is the response vector.
lower, upper: Numeric. Lower and upper limits for the truncated Pareto distribution. Each must be positive and of length 1. They are called $\alpha$ and $U$ below.
ishape: Numeric. Optional initial value for the shape parameter. A NULL means a value is obtained internally. If failure to converge occurs try specifying a value, e.g., 1 or 2.
imethod: See CommonVGAMffArguments for information. If failure to converge occurs then try specifying a value for ishape.

Author

T. W. Yee

Warning

The usual or unbounded Pareto distribution has two parameters (called $\alpha$ and $k$ here) but the family function paretoff estimates only $k$ using iteratively reweighted least squares. The MLE of the $\alpha$ parameter lies on the boundary and is min(y) where y is the response. Consequently, using the default argument values, the standard errors are incorrect when one does a summary on the fitted object. If the user inputs a value for alpha then it is assumed known with this value and then summary on the fitted object should be correct. Numerical problems may occur for small $k$, e.g., $k < 1$.

Details

A random variable $Y$ has a Pareto distribution if $$P[Y>y] = C / y^{k}$$ for some positive $k$ and $C$. This model is important in many applications due to the power law probability tail, especially for large values of $y$.

The Pareto distribution, which is used a lot in economics, has a probability density function that can be written $$f(y;\alpha,k) = k \alpha^k / y^{k+1}$$ for $0 < \alpha < y$ and $0<k$. The $\alpha$ is called the scale parameter, and it is either assumed known or else min(y) is used. The parameter $k$ is called the shape parameter. The mean of $Y$ is $\alpha k/(k-1)$ provided $k > 1$. Its variance is $\alpha^2 k /((k-1)^2 (k-2))$ provided $k > 2$.

The upper truncated Pareto distribution has a probability density function that can be written $$f(y) = k \alpha^k / [y^{k+1} (1-(\alpha/U)^k)]$$ for $0 < \alpha < y < U < \infty$ and $k>0$. Possibly, better names for $k$ are the index and tail parameters. Here, $\alpha$ and $U$ are known. The mean of $Y$ is $k \alpha^k (U^{1-k}-\alpha^{1-k}) / [(1-k)(1-(\alpha/U)^k)]$.

References

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, Hoboken, NJ, USA: John Wiley and Sons, Fourth edition.

Aban, I. B., Meerschaert, M. M. and Panorska, A. K. (2006). Parameter estimation for the truncated Pareto distribution, Journal of the American Statistical Association, 101(473), 270--277.

Examples

Run this code

alpha <- 2; kay <- exp(3)
pdata <- data.frame(y = rpareto(n = 1000, scale = alpha, shape = kay))
fit <- vglm(y ~ 1, paretoff, data = pdata, trace = TRUE)
fit@extra  # The estimate of alpha is here
head(fitted(fit))
with(pdata, mean(y))
coef(fit, matrix = TRUE)
summary(fit)  # Standard errors are incorrect!!

# Here, alpha is assumed known
fit2 <- vglm(y ~ 1, paretoff(scale = alpha), data = pdata, trace = TRUE)
fit2@extra  # alpha stored here
head(fitted(fit2))
coef(fit2, matrix = TRUE)
summary(fit2)  # Standard errors are okay

# Upper truncated Pareto distribution
lower <- 2; upper <- 8; kay <- exp(2)
pdata3 <- data.frame(y = rtruncpareto(n = 100, lower = lower,
                                      upper = upper, shape = kay))
fit3 <- vglm(y ~ 1, truncpareto(lower, upper), data = pdata3, trace = TRUE)
coef(fit3, matrix = TRUE)
c(fit3@misc$lower, fit3@misc$upper)

Run the code above in your browser using DataLab