pareto1: Pareto and Truncated Pareto Distribution Family Functions

Description

Estimates the shape parameter of the Pareto(I) distribution by maximum likelihood estimation. Also includes the upper truncated Pareto(I) distribution.

Usage

pareto1(lshape = "loge", location = NULL)
tpareto1(lower, upper, lshape = "loge", ishape = NULL, imethod = 1)

Arguments

lshape

Parameter link function applied to the parameter $k$. See Links for more choices. A log link is the default because $k$ is positive.

lower, upper

Numeric. Lower and upper limits for the truncated Pareto distribution. Each must be positive and of length 1. They are called $\alpha$ and $U$ below.

ishape

Numeric. Optional initial value for the shape parameter. A NULL means a value is obtained internally. If failure to converge occurs try specifying a value, e.g., 1 or 2.

location

Numeric. The parameter $\alpha$ below. If the user supplies a number then $\alpha$ is assumed known with this value. The default (NULL) means it is estimated by maximum likelihood estimation, which gives min(y), where y is the response vector.

imethod

An integer with value 1 or 2 which specifies the initialization method. If failure to converge occurs try the other value, or else specify a value for ishape.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Warning

The usual or unbounded Pareto distribution has two parameters (called $\alpha$ and $k$ here) but the family function pareto1 estimates only $k$ using iteratively reweighted least squares. The MLE of the $\alpha$ parameter lies on the boundary and is min(y) where y is the response. Consequently, with the default argument values, the standard errors reported by summary on the fitted object are incorrect. If the user supplies a value for the location argument then $\alpha$ is assumed known with this value, and summary on the fitted object should then be correct. Numerical problems may occur for small $k$, e.g., $k < 1$.
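The boundary estimate described in this warning can be illustrated in base R. The following is only a sketch and not how pareto1 is implemented: rpareto_inv is a hypothetical helper written for this illustration, and the closed-form expression for the shape estimate is the standard Pareto(I) result, used here in place of iteratively reweighted least squares.

```r
# Base-R sketch; rpareto_inv is a hypothetical helper, not a VGAM function.
# Inverse-CDF sampling: if V ~ Uniform(0, 1) then alpha / V^(1/k) is Pareto(I).
rpareto_inv <- function(n, alpha, k) alpha / runif(n)^(1 / k)

set.seed(1)
alpha <- 2
k <- 3
y <- rpareto_inv(5000, alpha, k)

# The MLE of alpha sits on the boundary of the support: the sample minimum.
alpha_hat <- min(y)

# Given alpha, the MLE of k has the closed form n / sum(log(y / alpha)).
k_hat <- length(y) / sum(log(y / alpha_hat))

c(alpha_hat = alpha_hat, k_hat = k_hat)
```

Because alpha_hat is a sample minimum rather than the solution of a score equation, the usual asymptotic standard errors do not account for its estimation, which is the issue the warning raises.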

Details

A random variable $Y$ has a Pareto distribution if $$P[Y>y] = C / y^{k}$$ for some positive $k$ and $C$. This model is important in many applications due to the power law probability tail, especially for large values of $y$.

The Pareto distribution, which is widely used in economics, has a probability density function that can be written $$f(y) = k \alpha^k / y^{k+1}$$ for $0 < \alpha < y$ and $k>0$. The $\alpha$ is known as the location parameter, and $k$ is known as the shape parameter. The mean of $Y$ is $\alpha k/(k-1)$ provided $k > 1$. Its variance is $\alpha^2 k /((k-1)^2 (k-2))$ provided $k > 2$.
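As a sanity check on the density, mean and variance formulas, they can be compared with numerical integration in base R. This is a minimal sketch; dpareto1_sketch is a hypothetical helper defined only for this illustration, and the parameter values are arbitrary.

```r
# Hypothetical helper: Pareto(I) density written directly from the formula.
dpareto1_sketch <- function(y, alpha, k) {
  ifelse(y >= alpha, k * alpha^k / y^(k + 1), 0)
}

alpha <- 2
k <- 4  # k > 2, so both the mean and the variance exist

# Total probability, first moment and second moment by numerical integration.
total <- integrate(function(y) dpareto1_sketch(y, alpha, k), alpha, Inf)$value
m1 <- integrate(function(y) y * dpareto1_sketch(y, alpha, k), alpha, Inf)$value
m2 <- integrate(function(y) y^2 * dpareto1_sketch(y, alpha, k), alpha, Inf)$value

# Closed forms quoted in the text.
mean_formula <- alpha * k / (k - 1)
var_formula <- alpha^2 * k / ((k - 1)^2 * (k - 2))

c(total = total,
  mean_numeric = m1, mean_formula = mean_formula,
  var_numeric = m2 - m1^2, var_formula = var_formula)
```

The numerical and closed-form values should agree up to the tolerance of integrate.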

The upper truncated Pareto distribution has a probability density function that can be written $$f(y) = k \alpha^k / [y^{k+1} (1-(\alpha/U)^k)]$$ for $0 < \alpha < y < U < \infty$ and $k>0$. Possibly, better names for $k$ are the index and tail parameters. Here, $\alpha$ and $U$ are known. The mean of $Y$ is $k \alpha^k (U^{1-k}-\alpha^{1-k}) / [(1-k)(1-(\alpha/U)^k)]$ provided $k \neq 1$.
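The truncated density and its mean can be checked the same way with base R integration over the finite interval from $\alpha$ to $U$. Again this is only a sketch: dtpareto1_sketch is a hypothetical helper, and the values of alpha, U and k are arbitrary choices with k different from 1.

```r
# Hypothetical helper: upper truncated Pareto(I) density from the formula.
dtpareto1_sketch <- function(y, alpha, U, k) {
  inside <- y > alpha & y < U
  ifelse(inside, k * alpha^k / (y^(k + 1) * (1 - (alpha / U)^k)), 0)
}

alpha <- 2
U <- 8
k <- 1.5  # the mean formula divides by (1 - k), so avoid k = 1

# The density should integrate to 1 over (alpha, U).
total <- integrate(function(y) dtpareto1_sketch(y, alpha, U, k), alpha, U)$value

# Mean by numerical integration and from the closed form in the text.
mean_numeric <- integrate(function(y) y * dtpareto1_sketch(y, alpha, U, k),
                          alpha, U)$value
mean_formula <- k * alpha^k * (U^(1 - k) - alpha^(1 - k)) /
  ((1 - k) * (1 - (alpha / U)^k))

c(total = total, mean_numeric = mean_numeric, mean_formula = mean_formula)
```

The mean necessarily lies between alpha and U, which gives a quick plausibility check on any values plugged into the formula.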

References

Evans, M., Hastings, N. and Peacock, B. (2000) Statistical Distributions, New York: Wiley-Interscience, Third edition.

Aban, I. B., Meerschaert, M. M. and Panorska, A. K. (2006) Parameter estimation for the truncated Pareto distribution, Journal of the American Statistical Association, 101(473), 270-277.

Examples

library(VGAM)  # provides vglm, pareto1, tpareto1, rpareto and rtpareto
alpha <- 2; kay <- exp(3)
pdat <- data.frame(y = rpareto(n = 1000, location = alpha, shape = kay))
fit <- vglm(y ~ 1, pareto1, pdat, trace = TRUE)
fit@extra # The estimate of alpha is here
head(fitted(fit))
with(pdat, mean(y))
coef(fit, matrix = TRUE)
summary(fit) # Standard errors are incorrect!!

# Here, alpha is assumed known
fit2 <- vglm(y ~ 1, pareto1(location = alpha), pdat, trace = TRUE)
fit2@extra # alpha stored here
head(fitted(fit2))
coef(fit2, matrix = TRUE)
summary(fit2) # Standard errors are okay

# Upper truncated Pareto distribution
lower <- 2; upper <- 8; kay <- exp(2)
pdat3 <- data.frame(y = rtpareto(n = 100, lower = lower,
                                 upper = upper, shape = kay))
fit3 <- vglm(y ~ 1, tpareto1(lower, upper), pdat3, trace = TRUE)
coef(fit3, matrix = TRUE)
c(fit3@misc$lower, fit3@misc$upper)
