Fits isotonic distributional regression (IDR) to a training dataset.
idr(y, X, groups = setNames(rep(1, ncol(X)), colnames(X)), orders =
c("comp" = 1), stoch = "sd", pars = osqpSettings(verbose = FALSE, eps_abs =
1e-5, eps_rel = 1e-5, max_iter = 10000L), progress = TRUE)
numeric vector (the response variable).
data frame of numeric or ordered factor variables (the regression covariates).
named vector of length ncol(X)
denoting groups of
variables that are to be ordered with the same order (see 'Details'). Only
relevant if X
contains more than one variable. The same names as in
X
should be used.
named vector giving for each group in groups
the order
that will be applied to this group. Only relevant if X
contains more
than one variable. The names of orders
give the order, the entries
give the group labels. Available options: "comp"
for componentwise
order, "sd"
for stochastic dominance, "icx"
for increasing
convex order (see 'Details). Default is "comp"
for all variables.
The "sd"
and "icx"
orders can only be used with numeric
variables, but not with ordered factors.
stochastic order constraint used for estimation. Default is
"sd"
for first order stochastic dominance. Use "hazard"
for
hazard rate order (experimental).
parameters for quadratic programming optimization (only relevant
if X
has more than one column), set using
osqpSettings
.
display progressbar (TRUE
, FALSE
or 1
,
0
)?
An object of class "idrfit"
containing the following
components:
X
data frame of all distinct covariate combinations used for the fit.
y
list of all observed responses in the training data for
given covariate combinations in X
.
cdf
matrix containing the estimated CDFs, one CDF per row,
evaluated at thresholds
(see next point). The CDF in the i
th
row corredponds to the estimated conditional distribution of the response
given the covariates values in X[i,]
.
thresholds
the thresholds at which the CDFs in cdf
are
evaluated. The entries in cdf[,j]
are the conditional CDFs evaluated
at thresholds[j]
.
groups
, orders
the groups and orders used for estimation.
diagnostic
list giving a bound on the precision of the CDF
estimation (the maximal downwards-step in the CDF that has been detected)
and the fraction of CDF estimations that were stopped at the iteration
limit max_iter
. Decrease the parameters eps_abs
and/or
eps_rel
or increase max_iter
in pars
to improve the
precision. See osqpSettings
for more optimization
parameters.
indices
the row indices of the covariates in X
in the
original training dataset (used for in-sample predictions with
predict.idrfit
).
constraints
(in multivariate IDR, NULL
otherwise)
matrices giving the order constraints for optimization. Used in
predict.idrfit
.
This function computes the isotonic distributional regression (IDR)
of a response y on on one or more covariates X. IDR estimates
the cumulative distribution function (CDF) of y conditional on
X by monotone regression, assuming that y is more likely to
take higher values, as X increases. Formally, IDR assumes that the
conditional CDF \(F_{y | X = x}(z)\) at each fixed threshold z
decreases, as x increases, or equivalently, that the exceedance
probabilities for any threshold z
\(P(y > z | X = x)\) increase
with x.
The conditional CDFs are estimated at each threshold in unique(y)
.
This is the set where the CDFs may have jumps. If X
contains more
than one variable, the CDFs are estimated by calling
solve_osqp
from the package osqp
length(unique(y))
times. This might take a while if the training
dataset is large.
Use the argument groups
to group exchangeable covariates.
Exchangeable covariates are indistinguishable except from the order in
which they are labelled (e.g. ensemble weather forecasts, repeated
measurements under the same measurement conditions).
The following orders are available to perform the monotone regression in IDR:
Componentwise order ("comp"
): A covariate
vector x1
is greater than x2
if x1[i] >= x2[i]
holds
for all components i
. This is the standard order used in
multivariate monotone regression and should not be used for
exchangeable variables (e.g. perturbed ensemble forecasts).
Stochastic dominance ("sd"
): x1
is greater than x2
in
the stochastic order, if the (empirical) distribution of the elements of
x1
is greater than the distribution of the elements of x2
(in
first order) stochastic dominance. The "sd"
order is invariant under
permutations of the grouped variables and therefore suitable for
exchangeable covariables.
Increasing convex order ("icx"
):
The "icx"
order can be used for groups of exchangeable variables. It
should be used if the variables have increasing variability, when their
mean increases (e.g. precipitation forecasts or other variables with
right-skewed distributions). More precisely, "icx"
uses the
increasing convex stochastic order on the empirical distributions of the
grouped variables.
Henzi, A., Moesching, A., & Duembgen, L. (2020). Accelerating the pool-adjacent-violators algorithm for isotonic distributional regression. arXiv preprint arXiv:2006.05527.
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., & Boyd, S. (2020). OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 1-36.
Bartolomeo Stellato, Goran Banjac, Paul Goulart and Stephen Boyd (2019). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.6.0.3. https://CRAN.R-project.org/package=osqp
The S3 method predict.idrfit
for predictions based on
an IDR fit.
# NOT RUN {
data("rain")
## Fit IDR to data of 185 days using componentwise order on HRES and CTR and
## increasing convex order on perturbed ensemble forecasts (P1, P2, ..., P50)
varNames <- c("HRES", "CTR", paste0("P", 1:50))
X <- rain[1:185, varNames]
y <- rain[1:185, "obs"]
## HRES and CTR are group '1', with componentwise order "comp", perturbed
## forecasts P1, ..., P50 are group '2', with "icx" order
groups <- setNames(c(1, 1, rep(2, 50)), varNames)
orders <- c("comp" = 1, "icx" = 2)
fit <- idr(y = y, X = X, orders = orders, groups = groups)
fit
# }
Run the code above in your browser using DataLab