loglaplace: Log-Laplace and Logit-Laplace Distribution Family Functions

Description

Maximum likelihood estimation of the 1-parameter log-Laplace and the 1-parameter logit-Laplace distributions. These may be used for quantile regression for counts and proportions respectively.

Usage

loglaplace1(tau = NULL, llocation = "loglink",
    ilocation = NULL, kappa = sqrt(tau/(1 - tau)), Scale.arg = 1,
    ishrinkage = 0.95, parallel.locat = FALSE, digt = 4,
    idf.mu = 3, rep0 = 0.5, minquantile = 0, maxquantile = Inf,
    imethod = 1, zero = NULL)
logitlaplace1(tau = NULL, llocation = "logitlink",
    ilocation = NULL, kappa = sqrt(tau/(1 - tau)),
    Scale.arg = 1, ishrinkage = 0.95, parallel.locat = FALSE,
    digt = 4, idf.mu = 3, rep01 = 0.5, imethod = 1, zero = NULL)

Arguments

tau, kappa

See alaplace1.

llocation

Character. Parameter link functions for location parameter \(\xi\). See Links for more choices. However, this argument should be left unchanged with count data because it restricts the quantiles to be positive. With proportions data llocation can be assigned a link such as logitlink, probitlink, clogloglink, etc.

ilocation

Optional initial values. If given, it must be numeric and values are recycled to the appropriate length. The default is to choose the value internally.

parallel.locat

Logical. Should the quantiles be parallel on the transformed scale (argument llocation)? Assigning this argument to TRUE circumvents the seriously embarrassing quantile crossing problem.

imethod

Initialization method. Either the value 1, 2, or ….

idf.mu, ishrinkage, Scale.arg, digt, zero

See alaplace1.

rep0, rep01

Numeric, positive. Replacement values for 0s and 1s respectively. For count data, values of the response whose value is 0 are replaced by rep0; it avoids computing log(0). For proportions data values of the response whose value is 0 or 1 are replaced by min(rangey01[1]/2, rep01/w[y< = 0]) and max((1 + rangey01[2])/2, 1-rep01/w[y >= 1]) respectively; e.g., it avoids computing logitlink(0) or logitlink(1). Here, rangey01 is the 2-vector range(y[(y > 0) & (y < 1)]) of the response.

minquantile, maxquantile

Numeric. The minimum and maximum values possible in the quantiles. These argument are effectively ignored by default since loglink keeps all quantiles positive. However, if llocation = logofflink(offset = 1) then it is possible that the fitted quantiles have value 0 because minquantile = 0.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

In the extra slot of the fitted object are some list components which are useful. For example, the sample proportion of values which are less than the fitted quantile curves, which is sum(wprior[y <= location]) / sum(wprior) internally. Here, wprior are the prior weights (called ssize below), y is the response and location is a fitted quantile curve. This definition comes about naturally from the transformed ALD data.

Warning

The VGAM family function logitlaplace1 will not handle a vector of just 0s and 1s as the response; it will only work satisfactorily if the number of trials is large.

See alaplace1 for other warnings. Care is needed with tau values which are too small, e.g., for count data the sample proportion of zeros must be less than all values in tau. Similarly, this also holds with logitlaplace1, which also requires all tau values to be less than the sample proportion of ones.

Details

These VGAM family functions implement translations of the asymmetric Laplace distribution (ALD). The resulting variants may be suitable for quantile regression for count data or sample proportions. For example, a log link applied to count data is assumed to follow an ALD. Another example is a logit link applied to proportions data so as to follow an ALD. A positive random variable \(Y\) is said to have a log-Laplace distribution if \(Y = e^W\) where \(W\) has an ALD. There are many variants of ALDs and the one used here is described in alaplace1.

References

Kotz, S., Kozubowski, T. J. and Podgorski, K. (2001). The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance, Boston: Birkhauser.

Kozubowski, T. J. and Podgorski, K. (2003). Log-Laplace distributions. International Mathematical Journal, 3, 467--495.

Yee, T. W. (2020). Quantile regression for counts and proportions. In preparation.

Examples

Run this code

# NOT RUN {
# Example 1: quantile regression of counts with regression splines
set.seed(123); my.k <- exp(0)
adata <- data.frame(x2 = sort(runif(n <- 500)))
mymu <- function(x) exp( 1 + 3*sin(2*x) / (x+0.5)^2)
adata <- transform(adata, y = rnbinom(n, mu = mymu(x2), size = my.k))
mytau <- c(0.1, 0.25, 0.5, 0.75, 0.9); mydof = 3
# halfstepping is usual:
fitp <- vglm(y ~ sm.bs(x2, df = mydof), data = adata, trace = TRUE,
            loglaplace1(tau = mytau, parallel.locat = TRUE))

# }
# NOT RUN {
par(las = 1)  # Plot on a log1p() scale
mylwd <- 1.5


plot(jitter(log1p(y), factor = 1.5) ~ x2, adata, col = "red", pch = "o",
     main = "Example 1; darkgreen=truth, blue=estimated", cex = 0.75)
with(adata, matlines(x2, log1p(fitted(fitp)), col = "blue",
                      lty = 1, lwd = mylwd))
finexgrid <- seq(0, 1, len = 201)
for (ii in 1:length(mytau))
  lines(finexgrid, col = "darkgreen", lwd = mylwd,
        log1p(qnbinom(p = mytau[ii], mu = mymu(finexgrid), si = my.k)))
# }
# NOT RUN {
fitp@extra  # Contains useful information


# Example 2: sample proportions
set.seed(123); nnn <- 1000; ssize <- 100  # ssize = 1 will not work!
adata <- data.frame(x2 = sort(runif(nnn)))
mymu <- function(x) logitlink( 1.0 + 4*x, inv = TRUE)
adata <- transform(adata, ssize = ssize,
                    y2 = rbinom(nnn, size = ssize, prob = mymu(x2)) / ssize)

mytau <- c(0.25, 0.50, 0.75)
fit1 <- vglm(y2 ~ sm.bs(x2, df = 3), data = adata,
             weights = ssize, trace = TRUE,
             logitlaplace1(tau = mytau, lloc = "clogloglink", paral = TRUE))

# }
# NOT RUN {
# Check the solution.  Note: this may be like comparing apples with oranges.
plotvgam(fit1, se = TRUE, scol = "red", lcol = "blue",
         main = "Truth = 'darkgreen'")
# Centered approximately !
linkFunctionChar <- as.character(fit1@misc$link)
adata <- transform(adata, trueFunction=
                   theta2eta(theta = mymu(x2), link=linkFunctionChar))
with(adata, lines(x2, trueFunction - mean(trueFunction), col = "darkgreen"))


# Plot the data + fitted quantiles (on the original scale)
myylim <- with(adata, range(y2))
plot(y2 ~ x2, adata, col = "blue", ylim = myylim, las = 1,
     pch = ".", cex = 2.5)
with(adata, matplot(x2, fitted(fit1), add = TRUE, lwd = 3, type = "l"))
truecol <- rep(1:3, len = fit1@misc$M)  # Add the 'truth'
smallxgrid <- seq(0, 1, len = 501)
for (ii in 1:length(mytau))
  lines(smallxgrid, col = truecol[ii], lwd = 2,
        qbinom(p = mytau[ii], prob = mymu(smallxgrid), size = ssize) / ssize)


# Plot on the eta (== logitlink()/probit()/...) scale
with(adata, matplot(x2, predict(fit1), add = FALSE, lwd = 3, type = "l"))
# Add the 'truth'
for (ii in 1:length(mytau)) {
  true.quant <- qbinom(p = mytau[ii], pr = mymu(smallxgrid), si = ssize) / ssize
  lines(smallxgrid, theta2eta(theta = true.quant, link = linkFunctionChar),
        col = truecol[ii], lwd = 2)
}
# }

Run the code above in your browser using DataLab