quad_spline_kn: AIC and BIC criteria for choosing the optimal number of inter-knot segments in quadratic spline fits

Description

Computes the optimal number $k_n$ of inter-knot segments in the quadratic spline fits proposed by Daouia, Noh and Park (2016).

Usage

quad_spline_kn(xtab, ytab, method, krange = 1:20, type = "AIC", 
 control = list("tm_limit" = 700))

Value

Returns an integer: the value in krange that minimizes the selected criterion.

Arguments

xtab: a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab: a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
method: a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
krange: a vector of integers specifying the range in which the optimal number of inter-knot segments is to be selected.
type: a character equal to "AIC" or "BIC".
control: a list of parameters to the GLPK solver. See *Details* of help(Rglpk_solve_LP).

Author

Hohsuk Noh.

Details

For the unconstrained quadratic spline smoother $\tilde\varphi_n$ (option method="u"; see quad_spline_est), the fit is based on the knot mesh $\{t_j = x_{[j n/k_n]}: j=1,\ldots,k_n-1\}$. Since the number $k_n$ determines the complexity of the spline approximation, its choice may be viewed as model selection via the minimization of the following Akaike (option type="AIC") or Bayesian (option type="BIC") information criterion:

$$ A\tilde{I}C(k) = \log \left( \sum_{i=1}^{n} (\tilde \varphi_n(x_i) - y_i) \right) + (k+2)/n,$$

$$ B\tilde{I}C(k) = \log \left( \sum_{i=1}^{n} (\tilde \varphi_n(x_i) - y_i) \right) + \log n \cdot (k+2)/(2n).$$

For the monotone quadratic spline smoother $\hat\varphi_n$ (option method="m"; see quad_spline_est), the authors first suggest taking the set of knots $\{ t_j = \mathcal{X}_{[j \mathcal{N}/k_n]},~j=1,\ldots,k_n-1 \}$ among the FDH points $(\mathcal{X}_{\ell},\mathcal{Y}_{\ell})$, $\ell=1,\ldots,\mathcal{N}$ (function quad_spline_est). They then propose to choose $k_n$ by minimizing the following AIC (option type="AIC") or BIC (option type="BIC") information criterion:

$$ A\hat{I}C(k) = \log \left( \sum_{i=1}^{n} (\hat \varphi_n(x_i) - y_i) \right) + (k+2)/n,$$

$$ B\hat{I}C(k) = \log \left( \sum_{i=1}^{n} (\hat \varphi_n(x_i) - y_i) \right) + \log n \cdot (k+2)/(2n).$$

A small number of knots is typically needed, as elucidated by the asymptotic theory.

For the monotone and concave spline estimator $\hat\varphi^{\star}_n$ (option method="mc"), apply the same scheme as above, replacing the FDH points $(\mathcal{X}_{\ell},\mathcal{Y}_{\ell})$ with the DEA points $(\mathcal{X}^*_{\ell},\mathcal{Y}^*_{\ell})$ (see dea_est).
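
The selection rule described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: spline_ic and toy_envelope are hypothetical helper names, and toy_envelope is only a crude step-function stand-in for the quadratic spline fit, used so that the example runs without the GLPK solver. The criterion is coded as written in the formulas above, assuming the fitted curve lies on or above the data so that the sum of residuals is positive.

```r
# Criterion from Details: log(sum of residuals) + penalty in k.
# Assumption: 'fitted' lies on or above 'y', so sum(fitted - y) > 0.
spline_ic <- function(fitted, y, k, type = c("AIC", "BIC")) {
  type <- match.arg(type)
  n <- length(y)
  penalty <- if (type == "AIC") (k + 2) / n else log(n) * (k + 2) / (2 * n)
  log(sum(fitted - y)) + penalty
}

# Hypothetical stand-in for the spline fit (NOT quad_spline_est):
# split the x-ordered sample into k groups and take max(y) in each group.
toy_envelope <- function(x, y, k) {
  grp <- ceiling(rank(x, ties.method = "first") * k / length(x))
  ave(y, grp, FUN = max)
}

# Simulated data: outputs lie below the frontier sqrt(x).
set.seed(1)
x <- sort(runif(100))
y <- sqrt(x) * runif(100)

# Evaluate the BIC-type criterion over a candidate range and keep the minimizer.
krange <- 1:10
bic <- sapply(krange, function(k) {
  spline_ic(toy_envelope(x, y, k), y, k, type = "BIC")
})
k_opt <- krange[which.min(bic)]
k_opt
```

The two criteria differ only in the penalty: the BIC penalty exceeds the AIC penalty whenever $\log n / 2 > 1$, that is, for $n \ge 8$, so for such sample sizes BIC charges more for each additional inter-knot segment than AIC does.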

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, in Second International Symposium on Information Theory, eds. B. N. Petrov and F. Csaki, Budapest: Akademiai Kiado, 267--281.

Daouia, A., Noh, H. and Park, B.U. (2016). Data envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3--30. doi:10.1111/rssb.12098.

Schwarz, G. (1978). Estimating the dimension of a model, The Annals of Statistics, 6, 461--464.

Examples

library(npbr)
data("green")
if (FALSE) {
# BIC criterion for choosing the optimal number of
# inter-knot segments in:
# a. Unconstrained quadratic spline fits
(kn.bic.green.u <- quad_spline_kn(log(green$COST),
 log(green$OUTPUT), method = "u", type = "BIC"))
# b. Monotone quadratic spline smoother
(kn.bic.green.m <- quad_spline_kn(log(green$COST),
 log(green$OUTPUT), method = "m", type = "BIC"))
# c. Monotone and concave quadratic spline smoother
(kn.bic.green.mc <- quad_spline_kn(log(green$COST),
 log(green$OUTPUT), method = "mc", type = "BIC"))
}