idr: Fit IDR to training data

Description

Fits isotonic distributional regression (IDR) to a training dataset.

Usage

idr(y, X, groups = setNames(rep(1, ncol(X)), colnames(X)), orders =
  c("comp" = 1), stoch = "sd", pars = osqpSettings(verbose = FALSE, eps_abs =
  1e-5, eps_rel = 1e-5, max_iter = 10000L), progress = TRUE)

Arguments

numeric vector (the response variable).

data frame of numeric or ordered factor variables (the regression covariates).

groups

named vector of length ncol(X) denoting groups of variables that are to be ordered with the same order (see 'Details'). Only relevant if X contains more than one variable. The same names as in X should be used.

orders

named vector giving for each group in groups the order that will be applied to this group. Only relevant if X contains more than one variable. The names of orders give the order, the entries give the group labels. Available options: "comp" for componentwise order, "sd" for stochastic dominance, "icx" for increasing convex order (see 'Details). Default is "comp" for all variables. The "sd" and "icx" orders can only be used with numeric variables, but not with ordered factors.

stoch

stochastic order constraint used for estimation. Default is "sd" for first order stochastic dominance. Use "hazard" for hazard rate order (experimental).

pars

parameters for quadratic programming optimization (only relevant if X has more than one column), set using osqpSettings.

progress

display progressbar (TRUE, FALSE or 1, 0)?

Value

An object of class "idrfit" containing the following components:

X

data frame of all distinct covariate combinations used for the fit.

y

list of all observed responses in the training data for given covariate combinations in X.

cdf

matrix containing the estimated CDFs, one CDF per row, evaluated at thresholds (see next point). The CDF in the ith row corredponds to the estimated conditional distribution of the response given the covariates values in X[i,].

thresholds

the thresholds at which the CDFs in cdf are evaluated. The entries in cdf[,j] are the conditional CDFs evaluated at thresholds[j].

groups, orders

the groups and orders used for estimation.

diagnostic

list giving a bound on the precision of the CDF estimation (the maximal downwards-step in the CDF that has been detected) and the fraction of CDF estimations that were stopped at the iteration limit max_iter. Decrease the parameters eps_abs and/or eps_rel or increase max_iter in pars to improve the precision. See osqpSettings for more optimization parameters.

indices

the row indices of the covariates in X in the original training dataset (used for in-sample predictions with predict.idrfit).

constraints

(in multivariate IDR, NULL otherwise) matrices giving the order constraints for optimization. Used in predict.idrfit.

Details

This function computes the isotonic distributional regression (IDR) of a response y on on one or more covariates X. IDR estimates the cumulative distribution function (CDF) of y conditional on X by monotone regression, assuming that y is more likely to take higher values, as X increases. Formally, IDR assumes that the conditional CDF \(F_{y | X = x}(z)\) at each fixed threshold z decreases, as x increases, or equivalently, that the exceedance probabilities for any threshold z \(P(y > z | X = x)\) increase with x.

The conditional CDFs are estimated at each threshold in unique(y). This is the set where the CDFs may have jumps. If X contains more than one variable, the CDFs are estimated by calling solve_osqp from the package osqp length(unique(y)) times. This might take a while if the training dataset is large.

Use the argument groups to group exchangeable covariates. Exchangeable covariates are indistinguishable except from the order in which they are labelled (e.g. ensemble weather forecasts, repeated measurements under the same measurement conditions).

The following orders are available to perform the monotone regression in IDR:

Componentwise order ("comp"): A covariate vector x1 is greater than x2 if x1[i] >= x2[i] holds for all components i. This is the standard order used in multivariate monotone regression and should not be used for exchangeable variables (e.g. perturbed ensemble forecasts).
Stochastic dominance ("sd"): x1 is greater than x2 in the stochastic order, if the (empirical) distribution of the elements of x1 is greater than the distribution of the elements of x2 (in first order) stochastic dominance. The "sd" order is invariant under permutations of the grouped variables and therefore suitable for exchangeable covariables.
Increasing convex order ("icx"): The "icx" order can be used for groups of exchangeable variables. It should be used if the variables have increasing variability, when their mean increases (e.g. precipitation forecasts or other variables with right-skewed distributions). More precisely, "icx" uses the increasing convex stochastic order on the empirical distributions of the grouped variables.

References

Henzi, A., Moesching, A., & Duembgen, L. (2020). Accelerating the pool-adjacent-violators algorithm for isotonic distributional regression. arXiv preprint arXiv:2006.05527.

Stellato, B., Banjac, G., Goulart, P., Bemporad, A., & Boyd, S. (2020). OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 1-36.

Bartolomeo Stellato, Goran Banjac, Paul Goulart and Stephen Boyd (2019). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.6.0.3. https://CRAN.R-project.org/package=osqp

Examples

Run this code

# NOT RUN {
data("rain")

## Fit IDR to data of 185 days using componentwise order on HRES and CTR and
## increasing convex order on perturbed ensemble forecasts (P1, P2, ..., P50)

varNames <- c("HRES", "CTR", paste0("P", 1:50))
X <- rain[1:185, varNames]
y <- rain[1:185, "obs"]

## HRES and CTR are group '1', with componentwise order "comp", perturbed
## forecasts P1, ..., P50 are group '2', with "icx" order

groups <- setNames(c(1, 1, rep(2, 50)), varNames)
orders <- c("comp" = 1, "icx" = 2)

fit <- idr(y = y, X = X, orders = orders, groups = groups)
fit
# }

Run the code above in your browser using DataLab