extlogF1: Extended log-F Distribution Family Function

Description

Maximum likelihood estimation of the 1-parameter extended log-F distribution.

Usage

extlogF1(tau = c(0.25, 0.5, 0.75), parallel = TRUE ~ 0,
          seppar = 0, tol0 = -0.001,
          llocation = "identitylink", ilocation = NULL,
          lambda.arg = NULL, scale.arg = 1, ishrinkage = 0.95,
          digt = 4, idf.mu = 3, imethod = 1)

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Arguments

tau: Numeric, the desired quantiles. It must be a strictly increasing sequence with each value in \((0, 1)\). The default values are the three quartiles, matching lms.bcn.
parallel: Similar to alaplace1, applying to the location parameters. The default, parallel = TRUE ~ 0, is totally FALSE, i.e., FALSE for every variable including the intercept. Quantile-crossing can occur when values of tau are too close, given the data. How the quantiles are modelled with respect to the covariates also has a big effect; e.g., if they are too flexible or too inflexible then the problem is likely to occur. For example, using bs with df = 10 is likely to create problems. Use is.crossing to see if there is a problem; one can try to fix up the quantile-crossing problem after fitting the model by calling fix.crossing.
seppar, tol0: Numeric, both of unit length, the separation and shift parameters. seppar should be nonnegative; note that the default tol0 is slightly negative (-0.001). If seppar is positive then any crossing quantile is penalized by the difference cubed multiplied by seppar, and this penalty is subtracted from the log-likelihood. The shift parameter ensures that the result is strictly noncrossing when seppar is large enough; otherwise, if tol0 = 0 and seppar is large, the crossing quantiles remain crossed: the offending amount becomes small but never exactly 0. Informally, tol0 pushes the adjustment far enough that is.crossing should return FALSE.
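The crossing check and the cubic penalty described above can be sketched in base R. This is a minimal illustration, not VGAM's internal code: the helper names adjacent_gaps, any_crossing, crossing_penalty and penalized_loglik are hypothetical, fitted quantiles are assumed to be stored as a matrix with one column per tau in increasing order, and tol0 is left out because the precise way it enters is not described here.

```r
# Hypothetical helpers (not VGAM's internal code) for quantile crossing.
# 'qmat' holds fitted quantiles: one row per observation, one column per
# tau value, with columns in increasing order of tau.

# Gaps between adjacent quantile columns; a negative gap means a lower
# quantile lies above a higher one for that observation.
adjacent_gaps <- function(qmat) {
  qmat <- as.matrix(qmat)
  qmat[, -1, drop = FALSE] - qmat[, -ncol(qmat), drop = FALSE]
}

# TRUE if any observation has crossing quantiles.
any_crossing <- function(qmat) {
  ncol(as.matrix(qmat)) > 1 && any(adjacent_gaps(qmat) < 0)
}

# Penalty: seppar times the cubed crossing amount, summed over all crossings.
crossing_penalty <- function(qmat, seppar) {
  stopifnot(is.numeric(seppar), length(seppar) == 1, seppar >= 0)
  if (ncol(as.matrix(qmat)) < 2) return(0)
  seppar * sum(pmax(-adjacent_gaps(qmat), 0)^3)
}

# The penalized log-likelihood subtracts the penalty.
penalized_loglik <- function(loglik, qmat, seppar) {
  loglik - crossing_penalty(qmat, seppar)
}

ok  <- cbind(q25 = c(0, 1, 2), q75 = c(1, 2.0, 3))
bad <- cbind(q25 = c(0, 1, 2), q75 = c(1, 0.8, 3))
any_crossing(ok)                    # FALSE
any_crossing(bad)                   # TRUE (second row crosses)
crossing_penalty(bad, seppar = 10)  # 10 * 0.2^3, about 0.08
```

Because the penalty grows with the cube of the crossing amount, small crossings cost very little, which is consistent with the remark above that a large seppar alone shrinks the offending amount without making it exactly 0.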

llocation, ilocation

See Links for more choices and CommonVGAMffArguments for more information. Choosing loglink should usually be good for counts, and choosing logitlink should be reasonable for proportions. However, avoid choosing tau values close to the boundary; for example, if \(p_0\) is the proportion of 0s then choose \(p_0 \ll \tau\). For proportions, grouped data is much better than ungrouped data: the bigger the groups, the finer the granularity, so that the empirical proportion can approximate tau more closely.
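The advice to keep tau well away from the boundary can be checked before fitting. The sketch below compares each tau with the empirical proportion of zeros \(p_0\). The helper name check_tau_vs_zeros and the factor-of-2 margin used as a crude stand-in for \(p_0 \ll \tau\) are arbitrary assumptions, not rules taken from VGAM.

```r
# Hypothetical pre-fitting check (not part of VGAM): flag tau values that
# are not comfortably above the proportion of zeros in a count response.
check_tau_vs_zeros <- function(y, tau, margin = 2) {
  p0 <- mean(y == 0)          # empirical proportion of zeros
  ok <- tau > margin * p0     # crude stand-in for "p0 << tau" (assumed margin)
  data.frame(tau = tau, p0 = p0, ok = ok)
}

y <- c(0, 0, 1, 2, 3, 0, 5, 1, 2, 4)  # p0 = 0.3
check_tau_vs_zeros(y, tau = c(0.25, 0.5, 0.75))
# tau = 0.25 and 0.5 are flagged (ok = FALSE); tau = 0.75 passes
```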

lambda.arg

Positive tuning parameter which controls the sharpness of the cusp. The limit as it approaches 0 is probably very similar to dalap. The default is to choose the value internally. If scale.arg increases, then lambda.arg probably needs to increase accordingly. If lambda.arg is too large then the empirical quantiles may not be very close to tau. If lambda.arg is too close to 0 then convergence behaviour will be poor, local solutions may be found, and numerical problems in general may arise. Monitoring convergence is recommended when varying lambda.arg.
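How a cusp-sharpness parameter behaves can be seen from the smoothed check loss of Fasiolo et al. (2020), on which the ELF is based: with z = (y - location) / scale, the loss (tau - 1) z + lambda log(1 + exp(z / lambda)) tends to the pinball (check) loss of the asymmetric Laplace distribution as lambda approaches 0. The sketch assumes that lambda.arg plays the role of this lambda, which is not confirmed here; the function names softplus, pinball and elf_loss are hypothetical.

```r
# Numerically stable log(1 + exp(x)).
softplus <- function(x) pmax(x, 0) + log1p(exp(-abs(x)))

# Pinball (check) loss of the asymmetric Laplace distribution.
pinball <- function(z, tau) z * (tau - (z < 0))

# Smoothed check loss underlying the ELF (assumed parameterization).
elf_loss <- function(z, tau, lambda) (tau - 1) * z + lambda * softplus(z / lambda)

z <- seq(-2, 2, by = 0.5)
for (lambda in c(1, 0.1, 0.01)) {
  gap <- max(abs(elf_loss(z, 0.25, lambda) - pinball(z, 0.25)))
  cat("lambda =", lambda, " max gap =", signif(gap, 3), "\n")
}
```

The difference between the two losses is lambda * log(1 + exp(-|z| / lambda)), so the largest gap, lambda * log(2), occurs at the cusp z = 0 and shrinks linearly as lambda decreases.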

scale.arg

Positive scale parameter, sometimes simply called the scale. The transformation used is (y - location) / scale. This function should be okay for response variables having a moderate range (0 to 100, say), but if the range is very different from this then experimenting with this argument is a good idea.
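To show where the scale enters, here is a sketch of the ELF density under the parameterization of Fasiolo et al. (2020), with z = (y - location) / scale: f(y) = exp{(1 - tau) z} (1 + exp(z / lambda))^(-lambda) / (lambda * scale * B(lambda (1 - tau), lambda tau)), where B is the beta function. Whether extlogF1 uses exactly this parameterization for lambda.arg and scale.arg is an assumption, and the function name delf_sketch is hypothetical.

```r
# Hypothetical sketch of the extended log-F density (not VGAM's code).
# z = (y - location) / scale; lambda controls the sharpness of the cusp.
delf_sketch <- function(y, location = 0, scale = 1, tau = 0.5, lambda = 0.5,
                        log_density = FALSE) {
  z <- (y - location) / scale
  # Stable log(1 + exp(z / lambda)).
  sp <- pmax(z / lambda, 0) + log1p(exp(-abs(z / lambda)))
  logd <- (1 - tau) * z - lambda * sp -
    log(lambda * scale) - lbeta(lambda * (1 - tau), lambda * tau)
  if (log_density) logd else exp(logd)
}

# The density should integrate to 1 for any location and scale.
integrate(delf_sketch, -Inf, Inf, location = 2, scale = 1.5,
          tau = 0.25, lambda = 0.5)
```

The right tail decays like exp(-tau z) and the left tail like exp((1 - tau) z), so rescaling y only stretches these tails; this is one way to see why a response range far from the default scale of 1 may call for a different scale.arg.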

ishrinkage, idf.mu, digt

Similar to alaplace1.

imethod

Initialization method. Either the value 1, 2, or .... See CommonVGAMffArguments for more information.

Author

Thomas W. Yee

Details

This is an experimental family function for quantile regression. Fasiolo et al. (2020) propose an extended log-F distribution (ELF); however, this family function only estimates the location parameter. The distribution has a scale parameter which can be supplied (default value is unity). One location parameter is estimated for each tau value, and these are the estimated quantiles. For quantile regression it is not necessary to estimate the scale parameter, since the log-likelihood function is triangle shaped.
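That each estimated location parameter is an estimated quantile can be checked in an intercept-only setting: for small lambda, minimizing the summed smoothed check loss over the location gives nearly the empirical tau-quantile. The sketch below uses base R's optimize() on simulated data rather than vglm(), fixes the scale at 1, and relies on the assumed Fasiolo et al. (2020) loss form; it illustrates the idea and is not how extlogF1 fits.

```r
set.seed(1)
y <- rnorm(2000)
tau <- 0.25
lambda <- 0.05

softplus <- function(x) pmax(x, 0) + log1p(exp(-abs(x)))

# Sum of smoothed check losses with scale fixed at 1 (assumed form).
objective <- function(mu) {
  z <- y - mu
  sum((tau - 1) * z + lambda * softplus(z / lambda))
}

# The objective is convex in mu, so a one-dimensional search suffices.
mu_hat <- optimize(objective, interval = range(y))$minimum
c(elf = mu_hat, empirical = unname(quantile(y, tau)))
```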

The ELF is used as an approximation of the asymmetric Laplace distribution (ALD). The latter cannot be estimated properly using Fisher scoring/IRLS, but the ELF holds promise because it has continuous derivatives and therefore fewer problems with the regularity conditions. Because the ELF is fitted to data to obtain an empirical result, the convergence behaviour may not be gentle and smooth. Hence there is a function-specific control function called extlogF1.control which has something like stepsize = 0.5 and maxits = 100. It has been found that slowing down the rate of convergence produces greater stability during the estimation process. Regardless, convergence should always be monitored carefully.
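The effect of slowing convergence can be illustrated with a damped Newton iteration for an intercept-only location problem: each update is multiplied by a step size below 1, and the loop is capped at a maximum number of iterations. The arguments stepsize and maxits mirror the control settings mentioned above, but damped_newton, grad and hess are hypothetical names, the loss form (scale fixed at 1) is assumed, and this loop is a generic sketch rather than the IRLS algorithm used inside vglm().

```r
set.seed(1)
y <- rnorm(500)
tau <- 0.75
lambda <- 0.5

# First and second derivatives in mu of the summed smoothed check loss
# (tau - 1) * z + lambda * log(1 + exp(z / lambda)), with z = y - mu.
grad <- function(mu) -sum((tau - 1) + plogis((y - mu) / lambda))
hess <- function(mu) {
  s <- plogis((y - mu) / lambda)
  sum(s * (1 - s)) / lambda
}

# Newton updates shrunk by 'stepsize'; stop when the step is tiny or
# 'maxits' iterations have been used.
damped_newton <- function(mu, stepsize = 0.5, maxits = 100, tol = 1e-8) {
  for (it in seq_len(maxits)) {
    step <- stepsize * grad(mu) / hess(mu)
    mu <- mu - step
    if (abs(step) < tol) break
  }
  list(mu = mu, iterations = it)
}

fit <- damped_newton(median(y))
fit
```

With stepsize = 0.5 the error near the solution roughly halves at each iteration, so convergence is slower than a full Newton step but each move is more cautious.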

This function accepts a vector response but not a matrix response.

References

Fasiolo, M., Wood, S. N., Zaffran, M., Nedellec, R. and Goude, Y. (2020). Fast calibrated additive quantile regression. J. Amer. Statist. Assoc., in press.

Yee, T. W. (2020). On quantile regression based on the 1-parameter extended log-F distribution. In preparation.

Examples

library(VGAM)     # vglm(), extlogF1(), is.crossing()
library(splines)  # bs()
set.seed(123)     # for reproducibility
nn <- 1000; mytau <- c(0.25, 0.75)
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata, y1 = 1 + x2 + rnorm(nn, sd = exp(-1)),
                   y2 = cos(x2) / (1 + abs(x2)) + rnorm(nn, sd = exp(-1)))
fit1 <- vglm(y1 ~ x2, extlogF1(tau = mytau), data = edata)  # trace = TRUE
fit2 <- vglm(y2 ~ bs(x2, 6), extlogF1(tau = mytau), data = edata)
coef(fit1, matrix = TRUE)
fit2@extra$percentile  # Empirical percentiles here
summary(fit2)
c(is.crossing(fit1), is.crossing(fit2))  # TRUE would indicate crossing quantiles
head(fitted(fit1))
if (FALSE) {  # Plot the data with the fitted quantile curves
  plot(y2 ~ x2, edata, col = "blue")
  matlines(with(edata, x2), fitted(fit2), col = "orange", lty = 1, lwd = 2)
}