The package npbr (Daouia et al., 2017) is the first free specialized software for data edge and frontier analysis in the statistical literature. It provides a variety of functions for the best-known and most innovative approaches to nonparametric boundary estimation. The selected methods cover empirical, smoothed, unrestricted and constrained fits under both separate and multiple shape constraints. They also cover data envelopment techniques and approaches that are robust to outliers. The routines included in npbr are user-friendly and afford a large degree of flexibility in the estimation specifications. They provide smoothing parameter selection for the modern local linear and polynomial spline methods and for some promising extreme-value techniques. They also allow for Monte Carlo comparisons among the implemented estimation procedures. The package is intended for statisticians and applied researchers interested in employing nonparametric boundary regression models. Its use is illustrated with a number of empirical applications and simulated examples.
Abdelaati Daouia <Abdelaati.Daouia@tse-fr.eu>, Thibault Laurent <thibault.laurent@univ-tlse1.fr>, Hohsuk Noh <word5810@gmail.com>
Maintainer: Thibault Laurent <thibault.laurent@univ-tlse1.fr>
Suppose that we have \(n\) pairs of observations \((x_i,y_i),~i=1,\ldots,n\), from a bivariate distribution with a density \(f(x,y)\) in \(\mathbb{R}^2\). The support \(\Psi\) of \(f\) is assumed to be of the form $$ \Psi = \{ (x,y) \mid y \leq \varphi(x) \} \supseteq \{ (x,y) \mid f(x,y) > 0 \}, $$ $$ \{ (x,y) \mid y > \varphi(x) \} \subseteq \{ (x,y) \mid f(x,y) = 0 \}, $$
where the graph of \(\varphi\) corresponds to the locus of the curve above which the density \(f\) is zero. We consider the estimation of the frontier function \(\varphi\) based on the sample \(\{ (x_i,y_i),~i=1,\ldots,n\}\) in the general setting where the density \(f\) may have sudden jumps at the frontier, decay to zero or rise up to infinity as it approaches its support boundary.
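To make the setting concrete, the following is a minimal sketch, written in Python with the standard library only and not using the package's own R functions. It simulates points lying below a known frontier and evaluates the free disposal hull (FDH) step estimate, i.e. the largest observed \(y_i\) among the points with \(x_i \leq x\). The frontier \(\varphi(x) = \sqrt{x}\), the sampling scheme and all function names are assumptions made for illustration. FDH additionally presumes a nondecreasing frontier. Here the density has a jump at the boundary, one of the cases described above.

```python
import random

def true_frontier(x):
    # Hypothetical monotone frontier, chosen only for illustration.
    return x ** 0.5

def simulate_sample(n, frontier, seed=0):
    """Draw n points (x, y) with x uniform on [0, 1] and y below frontier(x)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.random()
        # Scale the frontier by a uniform factor in [0, 1) so that y < frontier(x).
        y = frontier(x) * rng.random()
        sample.append((x, y))
    return sample

def fdh_estimate(sample, x0):
    """FDH estimate at x0: the largest y_i among points with x_i <= x0."""
    ys = [y for x, y in sample if x <= x0]
    # No observation to the left of x0: the estimate is undefined.
    return max(ys) if ys else float("nan")

sample = simulate_sample(200, true_frontier, seed=1)
grid = [0.25, 0.5, 0.75, 1.0]
estimates = [fdh_estimate(sample, x0) for x0 in grid]
for x0, est in zip(grid, estimates):
    print(x0, est, true_frontier(x0))
```

By construction the estimate is a nondecreasing step function of \(x\) that never exceeds the true frontier in this noise-free design, which is why it is biased downward in finite samples.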
The overall objective of the present package is to provide a large variety of functions for the best-known approaches to nonparametric boundary regression. These include the broad class of methods used in the Monte Carlo comparisons of both Daouia et al. (2016) and Noh (2014), as well as other promising nonparametric devices, namely the extreme-value techniques of Gijbels and Peng (2000), Daouia et al. (2010) and Daouia et al. (2012). The various functions in the npbr package are summarized in the table below. We are not aware of any other existing set of statistical routines better adapted to data envelope fitting and robust frontier estimation. Only the classical nonsmooth FDH and DEA methods can be found in available packages dedicated to the economic literature on measuring the production performance of enterprises, such as Benchmarking by Bogetoft and Otto (2011) and FEAR by Wilson (2008). Other contributions to the econometric literature on frontier analysis by Parmeter and Racine (2013) can be found at https://socialsciences.mcmaster.ca/racinej/Gallery/Home.html. The package npbr is the first free specialized software for the statistical literature on nonparametric frontier analysis. The routines included in npbr are user-friendly and highly flexible in terms of estimation specifications. They allow the user to locate the boundary from data using both empirical and smooth fits, as well as unconstrained and constrained estimates under single and multiple shape constraints. They also integrate smoothing parameter selection for the innovative methods based on local linear techniques, polynomial splines, extreme values and kernel smoothing, though the proposed selection procedures can be computationally demanding.
In addition, the package is aimed at researchers and practitioners interested in employing nonparametric boundary regression methods. On the one hand, such methods are appealing because they rely on very few assumptions: they offer modeling flexibility and function approximation power, and they can detect the boundary structure of data without recourse to any a priori parametric restrictions on the shape of the frontier and/or the distribution of noise. On the other hand, the package offers R users and statisticians in this active area of research simple functions to compute the empirical mean integrated squared error, the empirical integrated squared bias and the empirical integrated variance of the implemented frontier estimators. This allows the interested researcher to reproduce the Monte Carlo estimates obtained in the original articles and, perhaps most importantly, to compare the quality of any new proposal with the competing existing methods.
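The three Monte Carlo criteria named above can be sketched as follows. This is an illustrative Python version using only the standard library, not the package's own R routine. The function names, the frontier \(\varphi(x)=\sqrt{x}\), the evaluation grid and the FDH-type stand-in estimator are all assumptions. Integrals are approximated by averages over the grid, and variances use the \(1/B\) normalization over \(B\) replications, under which the empirical mean integrated squared error equals the integrated squared bias plus the integrated variance.

```python
import random

def true_frontier(x):
    # Hypothetical frontier, chosen only for illustration.
    return x ** 0.5

def simulate_sample(n, rng):
    # x uniform on [0, 1]; y uniform on [0, frontier(x)), so y stays below the frontier.
    sample = []
    for _ in range(n):
        x = rng.random()
        sample.append((x, true_frontier(x) * rng.random()))
    return sample

def step_estimate(sample, x0):
    # Stand-in estimator (FDH-type): largest y_i among points with x_i <= x0.
    ys = [y for x, y in sample if x <= x0]
    return max(ys) if ys else float("nan")

def integrated_criteria(estimates, truth):
    """estimates[b][l]: fit from replication b at grid point l; truth[l]: true value there."""
    n_rep, n_grid = len(estimates), len(truth)
    # Pointwise Monte Carlo mean of the fits.
    mean_fit = [sum(fit[l] for fit in estimates) / n_rep for l in range(n_grid)]
    ibias2 = sum((mean_fit[l] - truth[l]) ** 2 for l in range(n_grid)) / n_grid
    ivar = sum(
        sum((fit[l] - mean_fit[l]) ** 2 for fit in estimates) / n_rep
        for l in range(n_grid)
    ) / n_grid
    mise = sum(
        sum((fit[l] - truth[l]) ** 2 for fit in estimates) / n_rep
        for l in range(n_grid)
    ) / n_grid
    return mise, ibias2, ivar

rng = random.Random(2017)
grid = [0.2 + 0.1 * k for k in range(9)]  # evaluation points from 0.2 to 1.0
truth = [true_frontier(x0) for x0 in grid]
estimates = [
    [step_estimate(sample, x0) for x0 in grid]
    for sample in (simulate_sample(100, rng) for _ in range(50))
]
mise, ibias2, ivar = integrated_criteria(estimates, truth)
print(mise, ibias2, ivar)
```

To compare a new proposal with an existing method in this sketch, one would replace `step_estimate` by the new estimator, keep the same seed and grid, and compare the resulting triples.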
Function | Description | Reference |
dea_est | DEA, FDH and linearized FDH | Farrell (1957), Deprins et al. (1984), Hall and Park (2002), Jeong and Simar (2006) |
loc_est | Local linear fitting | Hall et al. (1998), Hall and Park (2004) |
loc_est_bw | Bandwidth choice for local linear fitting | Hall and Park (2004) |
poly_est | Polynomial estimation | Hall et al. (1998) |
poly_degree | Optimal polynomial degree selection | Daouia et al. (2015) |
dfs_momt | Moment type estimation | Daouia et al. (2010), Dekkers et al. (1989) |
dfs_pick | Pickands type estimation | Daouia et al. (2010), Dekkers and de Haan (1989) |
rho_momt_pick | Conditional tail index estimation | Daouia et al. (2010), Dekkers et al. (1989), Dekkers and de Haan (1989) |
kopt_momt_pick | Threshold selection for moment/Pickands frontiers | Daouia et al. (2010) |
dfs_pwm | Nonparametric frontier regularization | Daouia et al. (2012) |
loc_max | Local constant estimation | Gijbels and Peng (2000) |
pick_est | Local extreme-value estimation | Gijbels and Peng (2000) |
quad_spline_est | Quadratic spline fitting | Daouia et al. (2015) |
quad_spline_kn | Knot selection for quadratic spline fitting | Daouia et al. (2015) |
cub_spline_est | Cubic spline fitting | Daouia et al. (2015) |
cub_spline_kn | Knot selection for cubic spline fitting | Daouia et al. (2015) |
kern_smooth | Nonparametric kernel boundary regression | Parmeter and Racine (2013), Noh (2014) |
kern_smooth_bw | Bandwidth choice for kernel boundary regression | Parmeter and Racine (2013), Noh (2014) |
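As an illustration of the local constant idea listed in the table, the sketch below computes a window maximum: the largest \(y_i\) among observations whose \(x_i\) lies within a bandwidth \(h\) of the evaluation point. It is written in Python for illustration only. The function name, the toy data and the bandwidth values are assumptions, and the package's own `loc_max` may differ in its options and in how the bandwidth is handled.

```python
def local_max_estimate(xs, ys, x0, h):
    """Largest y_i among observations with |x_i - x0| <= h (assumed window-maximum form)."""
    window = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    # An empty window means the estimate is undefined at x0 for this bandwidth.
    return max(window) if window else float("nan")

# Toy data, invented for illustration.
xs = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
ys = [0.3, 0.8, 0.5, 0.9, 0.6, 1.1]

narrow = local_max_estimate(xs, ys, 0.45, 0.1)  # window holds x = 0.4 and x = 0.5
wide = local_max_estimate(xs, ys, 0.45, 0.3)    # window holds x = 0.2, 0.4, 0.5, 0.7
print(narrow, wide)
```

Unlike the FDH step estimate, this form does not impose monotonicity on the fitted boundary. The bandwidth controls the usual trade-off: a wider window is less variable but can oversmooth a curved frontier.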
Daouia, A., Florens, J.-P. and Simar, L. (2010). Frontier estimation and extreme value theory. Bernoulli, 16, 1039-1063.
Daouia, A., Florens, J.-P. and Simar, L. (2012). Regularization of Nonparametric Frontier Estimators. Journal of Econometrics, 168, 285-299.
Daouia, A., Laurent, T. and Noh, H. (2017). npbr: A Package for Nonparametric Boundary Regression in R. Journal of Statistical Software, 79(9), 1-43. doi:10.18637/jss.v079.i09.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Dekkers, A.L.M. and de Haan, L. (1989). On the estimation of the extreme-value index and large quantile estimation. Annals of Statistics, 17, 1795-1832.
Dekkers, A.L.M., Einmahl, J.H.J. and de Haan, L. (1989). A moment estimator for the index of an extreme-value distribution. Annals of Statistics, 17, 1833-1855.
Deprins, D., Simar, L. and Tulkens, H. (1984). Measuring labor efficiency in post offices, in: M. Marchand, P. Pestieau and H. Tulkens (Eds), The Performance of Public Enterprises: Concepts and Measurements. North-Holland, Amsterdam, 243-267.
Farrell, M.J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A, 120, 253-281.
Gijbels, I. and Peng, L. (2000). Estimation of a support curve via order statistics. Extremes, 3, 251-277.
Hall, P., Park, B.U. and Stern, S.E. (1998). On polynomial estimators of frontiers and boundaries. Journal of Multivariate Analysis, 66, 71-98.
Hall, P. and Park, B.U. (2004). Bandwidth choice for local polynomial estimation of smooth boundaries. Journal of Multivariate Analysis, 91, 240-261.
Jeong, S.-O. and Simar, L. (2006). Linearly interpolated FDH efficiency score for nonconvex frontiers. Journal of Multivariate Analysis, 97, 2141-2161.
Noh, H. (2014). Frontier Estimation using Kernel Smoothing with Data Transformation. Journal of the Korean Statistical Society, 43, 503-512.
Parmeter, C. and Racine, J.S. (2013). Smooth Constrained Frontier Analysis. In Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White, Jr., Springer Verlag, (X. Chen and N.R. Swanson Eds), 463-488.