Maximum likelihood estimation of the 1-parameter extended log-F distribution.
extlogF1(tau = c(0.25, 0.5, 0.75), parallel = TRUE ~ 0,
seppar = 0, tol0 = -0.001,
llocation = "identitylink", ilocation = NULL,
lambda.arg = NULL, scale.arg = 1, ishrinkage = 0.95,
digt = 4, idf.mu = 3, imethod = 1)
An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.
Numeric, the desired quantiles. A strictly increasing sequence; each value must be in \((0, 1)\). The default values are the three quartiles, matching lms.bcn.
Similar to alaplace1, applying to the location parameters.
One can try to fix up the quantile-crossing problem after fitting the model by calling fix.crossing; use is.crossing to see whether there is a problem.
The default for parallel is totally FALSE, i.e., FALSE for every variable including the intercept.

Quantile-crossing can occur when values of tau are too close together, given the data.
How the quantiles are modelled with respect to the covariates also has a big effect: if they are too flexible or too inflexible then the problem is likely to occur.
For example, using bs with df = 10 is likely to create problems.
Setting parallel = TRUE results in a totally parallel model in which all the quantiles are parallel, and this assumption can be too strong for some data sets.
In contrast, fix.crossing repairs only the quantiles that cross.
So the values of tau should be chosen carefully when fitting the original model; a sketch of this workflow follows.
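Below is a minimal sketch of that workflow. It is not taken from this help file: the data frame edata and its variables are invented here for illustration (they mirror the Examples section at the end), and the spline and tau choices are arbitrary.

library("VGAM"); library("splines")
set.seed(1)
edata <- data.frame(x2 = sort(rnorm(500)))
edata <- transform(edata, y2 = cos(x2) / (1 + abs(x2)) + rnorm(500, sd = 0.4))
# Fit three quantiles with a moderately flexible spline
fit <- vglm(y2 ~ bs(x2, df = 6), extlogF1(tau = c(0.25, 0.5, 0.75)),
            data = edata)
is.crossing(fit)            # TRUE if any fitted quantiles cross
if (is.crossing(fit))
  fit <- fix.crossing(fit)  # repair only the offending quantiles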
Numeric, both of unit length and nonnegative: the separation and shift parameters, respectively.
If seppar is positive then any crossing quantile is penalized by the difference cubed multiplied by seppar, and the penalty is subtracted from the log-likelihood.
The shift parameter ensures that the result is strictly noncrossing when seppar is large enough; otherwise, if tol0 = 0 and seppar is large, the crossing quantiles remain crossed even though the offending amount becomes small but never exactly 0.
Informally, tol0 pushes the adjustment far enough so that is.crossing should return FALSE.

If tol0 is positive then it is the shift in absolute terms.
However, tol0 may be assigned a negative value, in which case it is interpreted multiplicatively, relative to the midspread of the response: tol0 <- abs(tol0) * midspread.
Regardless, fit@extra$tol0 is the amount in absolute terms.
If avoiding the quantile-crossing problem is a concern, try increasing seppar to decrease the amount of crossing; it is probably best to choose the smallest value of seppar for which is.crossing returns FALSE.
Increasing tol0, relatively or absolutely, means the fitted quantiles are allowed to move further apart.
However, tau must be considered when choosing tol0.
A sketch of these conventions follows.
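The following informal sketch (continuing the invented edata above; the particular seppar values tried are arbitrary) illustrates the tol0 convention and one way of choosing seppar:

tol0 <- -0.001                                      # negative: relative scale
midspread <- diff(quantile(edata$y2, c(0.25, 0.75)))
unname(abs(tol0) * midspread)   # the absolute shift; cf. fit@extra$tol0
# One strategy: the smallest seppar for which is.crossing() returns FALSE
for (sp in c(0, 0.5, 5, 50)) {
  sfit <- vglm(y2 ~ bs(x2, df = 6),
               extlogF1(tau = c(0.25, 0.5, 0.75), seppar = sp),
               data = edata)
  if (!is.crossing(sfit)) break
}
sp  # the chosen value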
See Links for more choices and CommonVGAMffArguments for more information.
Choosing loglink should usually be good for counts, and choosing logitlink should be reasonable for proportions.
However, avoid choosing tau values close to the boundary; for example, if \(p_0\) is the proportion of 0s then choose \(p_0 \ll \tau\).
For proportions, grouped data is much better than ungrouped data, and the larger the groups the finer the granularity, so that the empirical proportion can approximate tau more closely.
A short sketch for counts follows.
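For instance, a hypothetical count response (ycount below is simulated purely for illustration, with tau values kept well above the proportion of 0s):

edata <- transform(edata, ycount = rpois(nrow(edata), exp(2 + 0.5 * x2)))
cfit <- vglm(ycount ~ x2,
             extlogF1(tau = c(0.5, 0.75), llocation = "loglink"),
             data = edata)
head(fitted(cfit))  # fitted quantiles are kept positive by loglink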
Positive tuning parameter which controls the sharpness of the cusp.
The limit as it approaches 0 is probably very similar to dalap.
The default is to choose the value internally.
If scale.arg increases then lambda.arg probably needs to increase accordingly.
If lambda.arg is too large then the empirical quantiles may not be very close to tau.
If lambda.arg is too close to 0 then the convergence behaviour will be poor, local solutions may be found, and numerical problems may arise in general.
Monitoring convergence is recommended when varying lambda.arg; see the sketch below.
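A sketch of comparing two values (0.1 and 1 are arbitrary choices, not recommendations) while watching the iterations with trace = TRUE, continuing the invented edata above:

lfit1 <- vglm(y2 ~ x2, extlogF1(tau = 0.5, lambda.arg = 0.1),
              data = edata, trace = TRUE)
lfit2 <- vglm(y2 ~ x2, extlogF1(tau = 0.5, lambda.arg = 1),
              data = edata, trace = TRUE)
lfit1@extra$percentile  # how close the empirical percentile is to 100 * tau
lfit2@extra$percentile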
Positive scale parameter, sometimes called scale.
The transformation used is (y - location) / scale.
This function should be okay for response variables having a moderate range (0--100, say), but if the range is very different from this then experimenting with this argument is a good idea, as sketched below.
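For example, if the response were on a scale roughly 1000 times larger than the moderate range above, one thing to try (the factor of 1000 is illustrative only, not a rule) is:

bdata <- transform(edata, ybig = 1000 * y2)
bfit <- vglm(ybig ~ x2, extlogF1(tau = 0.5, scale.arg = 1000), data = bdata)
head(fitted(bfit))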
Similar to alaplace1.
Initialization method. Either the value 1, 2, or ....
See CommonVGAMffArguments for more information.
Thomas W. Yee
This is an experimental family function for quantile regression.
Fasiolo et al. (2020) propose an extended log-F distribution (ELF); however, this family function estimates only the location parameter.
The distribution has a scale parameter which can be specified (the default value is unity).
One location parameter is estimated for each tau value, and these are the estimated quantiles.
For quantile regression it is not necessary to estimate the scale parameter since the log-likelihood function is triangle shaped.

The ELF is used as an approximation of the asymmetric Laplace distribution (ALD).
The latter cannot be estimated properly using Fisher scoring/IRLS, but the ELF holds promise because it has continuous derivatives and therefore fewer problems with the regularity conditions.
Because the ELF is fitted to data to obtain an empirical result, the convergence behaviour may not be gentle and smooth.
Hence there is a function-specific control function called extlogF1.control, which has something like stepsize = 0.5 and maxits = 100.
It has been found that slowing down the rate of convergence produces greater stability during the estimation process.
Regardless, convergence should always be monitored carefully; a sketch follows.
This function accepts a vector response but not a matrix response.
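As a sketch, convergence can be watched and, if necessary, slowed further by passing control arguments through vglm in the usual way; the stepsize value below is illustrative only, and edata is the invented data frame from the earlier sketches.

sfit2 <- vglm(y2 ~ bs(x2, df = 6), extlogF1(tau = c(0.25, 0.75)),
              data = edata, trace = TRUE, stepsize = 0.25)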
Fasiolo, M., Wood, S. N., Zaffran, M., Nedellec, R. and Goude, Y. (2020). Fast calibrated additive quantile regression. J. Amer. Statist. Assoc., in press.
Yee, T. W. (2020). On quantile regression based on the 1-parameter extended log-F distribution. In preparation.
dextlogF, is.crossing, fix.crossing, eCDF, vglm.control, logF, alaplace1, dalap, lms.bcn.
if (FALSE) {
nn <- 1000; mytau <- c(0.25, 0.75)
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata, y1 = 1 + x2 + rnorm(nn, sd = exp(-1)),
y2 = cos(x2) / (1 + abs(x2)) + rnorm(nn, sd = exp(-1)))
fit1 <- vglm(y1 ~ x2, extlogF1(tau = mytau), data = edata) # trace = TRUE
fit2 <- vglm(y2 ~ bs(x2, 6), extlogF1(tau = mytau), data = edata)
coef(fit1, matrix = TRUE)
fit2@extra$percentile # Empirical percentiles here
summary(fit2)
c(is.crossing(fit1), is.crossing(fit2))
head(fitted(fit1))
plot(y2 ~ x2, edata, col = "blue")
matlines(with(edata, x2), fitted(fit2), col = "orange", lty = 1, lwd = 2)
}