npquantile computes smooth quantiles from a univariate unconditional kernel cumulative distribution estimate given data and, optionally, a bandwidth specification, i.e. a dbandwidth object, using the bandwidth selection method of Li, Li and Racine (2017).
npquantile(x = NULL,
           tau = c(0.01,0.05,0.25,0.50,0.75,0.95,0.99),
           num.eval = 10000,
           bws = NULL,
           f = 1,
           ...)
x: a univariate vector of type numeric containing sample realizations (training data) used to estimate the cumulative distribution (must be the same training data used to compute the bandwidth object bws passed in).
tau: an optional vector containing the probabilities for quantile(s) to be estimated (must contain numbers in \([0,1]\)). Defaults to c(0.01,0.05,0.25,0.50,0.75,0.95,0.99).
num.eval: an optional integer specifying the length of the grid on which the quasi-inverse is computed. Defaults to 10000.
f: an optional argument fed to extendrange. Defaults to 1. See ?extendrange for details.
...: additional arguments supplied to specify the bandwidth type, kernel types, bandwidth selection methods, and so on. See ?npudistbw for details.
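As a hedged illustration of the bws argument, the sketch below computes a bandwidth object once with npudistbw and reuses it, on the same x, in npquantile. It assumes the np package is installed and that npudistbw is called through its one-sided formula interface (~x); the guard simply skips the example when np is unavailable.

```r
set.seed(42)
x <- rchisq(100, df = 10)
tau <- c(0.25, 0.50, 0.75)

q.bws <- NULL
if (requireNamespace("np", quietly = TRUE)) {
  suppressPackageStartupMessages(library(np))
  ## Bandwidth object computed from x (cross-validated by default)
  bw <- npudistbw(~x)
  ## Reuse it with the SAME training data x
  q.bws <- npquantile(x, tau = tau, bws = bw)
}
q.bws
```

Reusing a stored bandwidth object avoids repeating the bandwidth search when several sets of quantiles are needed for the same sample.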
npquantile returns a vector of quantiles corresponding to tau.
Cross-validated bandwidth selection is used by default (npudistbw). For large datasets this can be computationally demanding. In such cases one might instead consider a rule-of-thumb bandwidth (bwmethod="normal-reference") or, alternatively, use kd-trees (options(np.tree=TRUE) along with a bounded kernel (ckertype="epanechnikov")), either of which will reduce the computational burden appreciably.
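The two alternatives mentioned above might be invoked as in the following sketch. It assumes the np package is installed and that bwmethod and ckertype are forwarded through ... to npudistbw, as the argument description suggests; the sample size and tau values are arbitrary choices for illustration.

```r
set.seed(42)
x <- rchisq(500, df = 10)
tau <- c(0.25, 0.50, 0.75)

q.rot <- q.tree <- NULL
if (requireNamespace("np", quietly = TRUE)) {
  suppressPackageStartupMessages(library(np))
  ## Rule-of-thumb bandwidth: no numerical search is required
  q.rot <- npquantile(x, tau = tau, bwmethod = "normal-reference")
  ## kd-trees combined with a bounded kernel; cross-validation is retained
  old <- options(np.tree = TRUE)
  q.tree <- npquantile(x, tau = tau, ckertype = "epanechnikov")
  options(old)  # restore the previous setting
}
q.rot
q.tree
```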
Typical usage is
x <- rchisq(100,df=10)
npquantile(x)
The quantile function \(q_\tau\) is defined to be the left-continuous inverse of the distribution function \(F(x)\), i.e. \(q_\tau = \inf\{x: F(x) \ge \tau\}\).
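To make the definition concrete, the following base R sketch applies it to the empirical distribution function rather than to a kernel estimate. When \(F\) is replaced by ecdf(x), the infimum is attained at an order statistic and should coincide with quantile(..., type = 1). The sample size of 101 is an arbitrary choice that keeps \(n\tau\) away from integers.

```r
set.seed(42)
x <- rchisq(101, df = 10)
tau <- c(0.10, 0.25, 0.50, 0.75, 0.90)

Fn <- ecdf(x)   # empirical distribution function
xs <- sort(x)   # candidate values for the infimum

## q_tau = inf{x : F(x) >= tau}, evaluated over the order statistics
q.inf <- sapply(tau, function(p) min(xs[Fn(xs) >= p]))

## Type 1 sample quantile: inverse of the empirical distribution function
q.type1 <- quantile(x, tau, type = 1, names = FALSE)

cbind(tau, q.inf, q.type1)
```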
A traditional estimator of \(q_\tau\) is the \(\tau\)th sample quantile. However, these estimates suffer from a lack of efficiency arising from the variability of individual order statistics; see Sheather and Marron (1990) and Hyndman and Fan (1996) for methods that interpolate/smooth the order statistics. Each of the methods discussed in the latter can be invoked through quantile via type=j, j=1,...,9.
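The nine sample quantile definitions can be compared side by side with a short base R loop. This is only an illustration; the data and tau values are arbitrary.

```r
set.seed(42)
x <- rchisq(100, df = 10)
tau <- c(0.05, 0.50, 0.95)

## One row per quantile type, one column per probability
q.types <- t(sapply(1:9, function(j) quantile(x, tau, type = j, names = FALSE)))
dimnames(q.types) <- list(paste0("type", 1:9), paste0("tau=", tau))
q.types
```

Row type7 reproduces the default behaviour of quantile.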
The function npquantile implements a method for estimating smooth quantiles based on the quasi-inverse of an npudist object, where \(F(x)\) is replaced with its kernel estimator and bandwidth selection is that appropriate for such objects; see Definition 2.3.6, page 21, of Nelsen (2006) for a definition of the quasi-inverse of \(F(x)\).
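In standard kernel-smoothing notation the idea can be written as follows. The symbols \(n\), \(X_i\), \(h\), \(K\) and \(G\) are introduced here for illustration only and are assumptions about notation, not taken from the package documentation.

```latex
\hat F_h(x) = \frac{1}{n} \sum_{i=1}^{n} G\!\left(\frac{x - X_i}{h}\right),
\qquad
G(z) = \int_{-\infty}^{z} K(u)\, du,
\qquad
\hat q_\tau = \inf\{x : \hat F_h(x) \ge \tau\}.
```

Here \(K\) is a kernel function, \(h\) a bandwidth, and \(\hat q_\tau\) the smooth quantile obtained by replacing \(F\) with \(\hat F_h\) in the definition of \(q_\tau\).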
For construction of the quasi-inverse we create a grid of evaluation points based on the function extendrange along with the sample quantiles themselves computed from invocation of quantile. The coarseness of the grid defined by extendrange (which has been passed the option f=1) is controlled by num.eval.
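The grid construction and inversion can be sketched in base R as below. This is not the package's implementation: it assumes a Gaussian kernel and a hypothetical rule-of-thumb bandwidth h in place of the cross-validated one, and it uses a smaller num.eval to keep the example quick.

```r
set.seed(1)
x <- rchisq(100, df = 10)
tau <- c(0.05, 0.25, 0.50, 0.75, 0.95)
num.eval <- 2000

## ASSUMPTION: illustrative rule-of-thumb bandwidth, not the cross-validated one
h <- 1.587 * sd(x) * length(x)^(-1/3)

## Grid: equally spaced points over the extended range plus the sample quantiles
r <- extendrange(x, f = 1)
grid <- sort(unique(c(seq(r[1], r[2], length.out = num.eval),
                      quantile(x, tau, names = FALSE))))

## Kernel-smoothed CDF on the grid (Gaussian kernel, so G is pnorm)
F.hat <- vapply(grid, function(g) mean(pnorm((g - x) / h)), numeric(1))

## Quasi-inverse: first grid point at which the estimated CDF reaches tau
smooth.q <- vapply(tau, function(p) grid[which(F.hat >= p)[1]], numeric(1))

cbind(tau, smooth.q, sample.q = quantile(x, tau, names = FALSE))
```

A finer grid (larger num.eval) reduces the discretisation error of the inversion at the cost of more evaluations of the estimated distribution.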
Note that for any value of \(\tau\) less/greater than the smallest/largest value of \(F(x)\) computed for the evaluation data (i.e. that outlined in the paragraph above), the quantile returned for such values is that associated with the smallest/largest value of \(F(x)\), respectively.
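The boundary behaviour described above can be illustrated with a toy grid. The values of grid and F.grid and the helper quasi.inverse are hypothetical and serve only to show the clamping; they are not part of the package.

```r
## Hypothetical evaluation grid and CDF values on that grid
grid   <- c(1, 2, 3, 4)
F.grid <- c(0.10, 0.40, 0.70, 0.90)

## Hypothetical helper: first grid point with F >= p, clamped at the upper end
quasi.inverse <- function(p) {
  idx <- which(F.grid >= p)
  if (length(idx) == 0) grid[length(grid)] else grid[idx[1]]
}

## 0.05 lies below min(F.grid) and 0.95 lies above max(F.grid)
q <- sapply(c(0.05, 0.50, 0.95), quasi.inverse)
q
```

The first and last results are the grid points associated with the smallest and largest values of F.grid, respectively.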
Cheng, M.-Y. and Sun, S. (2006), “Bandwidth selection for kernel quantile estimation,” Journal of the Chinese Statistical Association, 44, 271-295.
Hyndman, R.J. and Fan, Y. (1996), “Sample quantiles in statistical packages,” American Statistician, 50, 361-365.
Li, Q. and J.S. Racine (2017), “Smooth Unconditional Quantile Estimation,” Manuscript.
Li, C., H. Li and J.S. Racine (2017), “Cross-Validated Mixed Datatype Bandwidth Selection for Nonparametric Cumulative Distribution/Survivor Functions,” Econometric Reviews, 36, 970-987.
Nelsen, R.B. (2006), An Introduction to Copulas, Second Edition, Springer-Verlag.
Sheather, S. and J.S. Marron (1990), “Kernel quantile estimators,” Journal of the American Statistical Association, 85, 410-416.
Yang, S.-S. (1985), “A Smooth Nonparametric Estimator of a Quantile Function,” Journal of the American Statistical Association, 80, 1004-1011.
quantile for various types of sample quantiles; ecdf for empirical distributions of which quantile is an inverse; boxplot.stats and fivenum for computing other versions of quartiles; qlogspline for logspline density quantiles; qkde for alternative kernel quantiles, etc.
# NOT RUN {
## Simulate data from a chi-square distribution
df <- 50
x <- rchisq(100,df=df)
## Vector of quantiles desired
tau <- c(0.01,0.05,0.25,0.50,0.75,0.95,0.99)
## Compute kernel smoothed sample quantiles
npquantile(x,tau)
## Compute sample quantiles using the default method in R (Type 7)
quantile(x,tau)
## True quantiles based on known distribution
qchisq(tau,df=df)
# }