qqbounds: Computation of confidence intervals for qqplot

Description

Computes confidence intervals for QQ plots. The intervals can be simultaneous (to check whether the data set as a whole is compatible with the assumed distribution) or pointwise (to check whether each single data point is compatible).

Usage

qqbounds(x,D,alpha,n,withConf.pw, withConf.sim,
         exact.sCI=(n

Value

A list with two components: crit, a matrix with the lower and upper confidence bounds, and err, a logical vector of length 2.

Component crit is a matrix with length(x) rows and four columns c("sim.left","sim.right","pw.left","pw.right"). Entries are set to NA if the corresponding component of x is not in support(D), if the computation method returned an error, or if the corresponding part was not requested (that is, if withConf.pw or withConf.sim is FALSE).

Component err has two elements: pw indicates whether the computation of the pointwise CIs returned without error (FALSE if withConf.pw is FALSE), and sim indicates whether the computation of the simultaneous CIs returned without error (FALSE if withConf.sim is FALSE).

Arguments

x: data to be checked for compatibility with distribution D.
D: object of class "UnivariateDistribution", the assumed data distribution.
alpha: confidence level (for example, 0.95).
n: sample size
withConf.pw: logical; shall pointwise confidence lines be computed?
withConf.sim: logical; shall simultaneous confidence lines be computed?
exact.pCI: logical; shall pointwise CIs be determined with the exact binomial distribution?
exact.sCI: logical; shall simultaneous CIs be determined with the exact Kolmogorov distribution?
nosym.pCI: logical; shall we use (shortest) asymmetric CIs?
debug: logical; if TRUE, additional output is produced to help debug the confidence bounds.

Author

Peter Ruckdeschel peter.ruckdeschel@uni-oldenburg.de

Details

Both simultaneous and pointwise confidence intervals come in a finite-sample and an asymptotic version. The finite-sample versions become quite slow for large data sets x, so in these cases the asymptotic version is preferable.
For simultaneous intervals, the finite-sample version is based on the C function "pkolmogorov2x" from package stats, while the asymptotic one uses the R function pkstwo, also from package stats; both are taken from the code of ks.test.

Both the finite-sample and the asymptotic version use the fact that the distribution of the supremal distance between the empirical distribution function \(\hat F_n\) and the corresponding theoretical one \(F\) (assuming data from \(F\)) does not depend on \(F\) for continuous \(F\) and leads to the Kolmogorov distribution (compare, e.g., Durbin (1973)). If \(F\) has jumps, the corresponding Kolmogorov distribution is used to produce conservative intervals.
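The asymptotic simultaneous construction can be illustrated in base R. The following is a minimal sketch under stated assumptions, not the code used by qqbounds: the helper names pkolm_asymp, kolm_crit and sim_band are hypothetical, the limiting Kolmogorov distribution function is evaluated by a truncated series, and the band is given on the probability scale for values p = F(x).

```r
# Asymptotic Kolmogorov cdf (hypothetical helper, not the package's code):
# K(t) = 1 - 2 * sum_{k >= 1} (-1)^(k-1) * exp(-2 * k^2 * t^2),
# with the series truncated after kmax terms.
pkolm_asymp <- function(t, kmax = 100) {
  if (t <= 0) return(0)
  k <- seq_len(kmax)
  1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
}

# Critical value c with K(c) = level, found by a root search.
kolm_crit <- function(level) {
  uniroot(function(t) pkolm_asymp(t) - level,
          lower = 0.2, upper = 5, tol = 1e-10)$root
}

# Simultaneous band around p = F(x): p -/+ c / sqrt(n), clipped to [0, 1].
sim_band <- function(p, n, level = 0.95) {
  eps <- kolm_crit(level) / sqrt(n)
  cbind(lower = pmax(p - eps, 0), upper = pmin(p + eps, 1))
}

# Example: band at F(x) = 0.1 and 0.5 for n = 100 observations.
sim_band(c(0.1, 0.5), n = 100)
```

Because the critical value does not depend on \(F\), it only has to be computed once per sample size and level; the same half-width is then applied at every data point.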
For pointwise intervals, the finite-sample version is based on the corresponding binomial distributions (compare, e.g., Fisz (1963)), while the asymptotic one uses a CLT approximation to this binomial distribution. This approximation is only valid for distributions with strictly positive density at the evaluation quantiles.
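The pointwise idea can be sketched in base R as follows. This is an illustration under assumptions rather than the package's implementation: pw_band is a hypothetical helper, it works on the probability scale (bounds for \(\hat F_n(x)\) given p = F(x)), and it uses the fact that \(n \hat F_n(x)\) is binomial with size n and success probability F(x).

```r
# Pointwise band for the empirical cdf at points with p = F(x)
# (hypothetical helper; 'level' is the confidence level, e.g. 0.95).
pw_band <- function(p, n, level = 0.95, exact = TRUE) {
  a <- (1 - level) / 2
  if (exact) {
    # Finite-sample version: equal-tailed binomial quantiles, rescaled by n.
    lo <- qbinom(a, size = n, prob = p) / n
    hi <- qbinom(1 - a, size = n, prob = p) / n
  } else {
    # Asymptotic version: normal (CLT) approximation to the binomial.
    se <- sqrt(p * (1 - p) / n)
    z <- qnorm(1 - a)
    lo <- pmax(p - z * se, 0)
    hi <- pmin(p + z * se, 1)
  }
  cbind(lower = lo, upper = hi)
}

# Example: n = 30 observations, evaluated at the median (p = 0.5).
pw_band(0.5, n = 30, exact = TRUE)
pw_band(0.5, n = 30, exact = FALSE)
```

The exact version returns bounds on the grid 0, 1/n, ..., 1, so its coverage is at least the nominal level; the normal version is smoother but only approximate for small n.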

In the finite-sample version, the binomial distributions will in general not be symmetric, so setting nosym.pCI to TRUE produces shortest asymmetric confidence intervals (albeit at considerable computational cost).

The symmetric intervals returned by default will be conservative (which also applies to distributions with jumps in this case).
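To make the notion of a shortest asymmetric interval concrete, here is a small base R sketch. It is an assumption-laden illustration, not the algorithm used when nosym.pCI is TRUE: the hypothetical helper shortest_binom_interval searches all integer intervals [l, u] for a binomial count and returns the shortest one whose probability reaches the requested level.

```r
# Shortest integer interval [l, u] with P(l <= X <= u) >= level
# for X ~ Binomial(n, p) (hypothetical helper for illustration).
shortest_binom_interval <- function(n, p, level = 0.95) {
  cdf <- pbinom(0:n, size = n, prob = p)   # cdf[k + 1] = P(X <= k)
  best <- c(0, n)                          # the full range always qualifies
  for (l in 0:n) {
    below <- if (l == 0) 0 else cdf[l]     # P(X <= l - 1)
    u_idx <- which(cdf - below >= level)   # upper ends reaching the level
    if (length(u_idx) == 0) break          # no interval starting at l or later works
    u <- u_idx[1] - 1                      # smallest such upper end
    if (u - l < best[2] - best[1]) best <- c(l, u)
  }
  best
}

# Symmetric case: agrees with the equal-tailed interval.
shortest_binom_interval(30, 0.5)
# Skewed case: the shortest interval need not be equal-tailed.
shortest_binom_interval(30, 0.1)
```

The search scans every lower end point and all candidate upper end points, which hints at why an exhaustive shortest-interval search is more expensive than reading off two quantiles.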

For distributions with jumps or with density (nearly) equal to 0 at the corresponding quantile, the approximation of (D-E(D))/sd(D) by the standard normal is used at these points. This approximation is only available if package distrEx is installed; otherwise the corresponding columns are filled with NA.

References

Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. SIAM.

Fisz, M. (1963). Probability Theory and Mathematical Statistics. 3rd ed. Wiley, New York.

Examples


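library(distr)  # provides Norm(), Chisq(), the qqplot() method and qqbounds()
## QQ plot comparing two distributions
qqplot(Norm(15,sqrt(30)), Chisq(df=15))
## qqplot() uses qqbounds() for its confidence bounds, e.g.: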
old.digits <- getOption("digits")
on.exit(options(digits = old.digits))
options(digits = 6)
set.seed(20230508)
## IGNORE_RDIFF_BEGIN
qqbounds(x = rnorm(30), Norm(), alpha = 0.95, n = 30,
        withConf.pw = TRUE, withConf.sim  = TRUE,
        exact.sCI = TRUE, exact.pCI = TRUE,
        nosym.pCI = FALSE)
## other calls:
qqbounds(x = rchisq(30,df=4), Chisq(df=4), alpha = 0.95, n = 30,
        withConf.pw = TRUE, withConf.sim  = TRUE,
        exact.sCI = FALSE, exact.pCI = FALSE,
        nosym.pCI = FALSE)
qqbounds(x = rchisq(30,df=4), Chisq(df=4), alpha = 0.95, n = 30,
        withConf.pw = TRUE, withConf.sim  = TRUE,
        exact.sCI = TRUE, exact.pCI= TRUE,
        nosym.pCI = TRUE)
## IGNORE_RDIFF_END
options(digits = old.digits)
