Learn R Programming

EnvStats (version 2.1.0)

cdfCompare: Plot Two Cumulative Distribution Functions

Description

For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.

Usage

cdfCompare(x, y = NULL, discrete = FALSE, 
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"), 
    plot.pos.con = NULL, distribution = "norm", param.list = NULL, 
    estimate.params = is.null(param.list), est.arg.list = NULL, x.col = "blue", 
    y.or.fitted.col = "black", x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"), 
    x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ..., 
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)

Arguments

x
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
y
a numeric vector (not necessarily of the same length as x). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. The default value is
discrete
logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).
prob.method
character string indicating what method to use to compute the plotting positions (empirical probabilities). Possible values are plot.pos (plotting positions, the default if discrete=FALSE) and emp.probs
plot.pos.con
numeric scalar between 0 and 1 containing the value of the plotting position constant. When y is supplied, the default value is plot.pos.con=0.375. When y is not supplied, for the normal, lognormal, three-p
distribution
when y is not supplied, a character string denoting the distribution abbreviation. The default value is distribution="norm". See the help file for Distribution.df
param.list
when y is not supplied, a list with values for the parameters of the distribution. The default value is param.list=list(mean=0, sd=1). See the help file for Distribution.df<
estimate.params
when y is not supplied, a logical scalar indicating whether to compute the cdf for x based on estimating the distribution parameters (estimate.params=TRUE) or using the known distribution parameters speci
est.arg.list
when y is not supplied and estimate.params=TRUE, a list whose components are optional arguments associated with the function used to estimate the parameters of the assumed distribution (see the help file
x.col
a numeric scalar or character string determining the color of the empirical cdf (based on x) line or points. The default value is x.col="blue". See the entry for col in the help file for
y.or.fitted.col
a numeric scalar or character string determining the color of the empirical cdf (based on y) or the theoretical cdf line or points. The default value is y.or.fitted.col="black". See the entry for col in
x.lwd
a numeric scalar determining the width of the empirical cdf (based on x) line. The default value is x.lwd=3*par("cex"). See the entry for lwd in the help file for par
y.or.fitted.lwd
a numeric scalar determining the width of the empirical cdf (based on y) or theoretical cdf line. The default value is y.or.fitted.lwd=3*par("cex"). See the entry for lwd in the help file for
x.lty
a numeric scalar determining the line type of the empirical cdf (based on x) line. The default value is x.lty=1. See the entry for lty in the help file for par
y.or.fitted.lty
a numeric scalar determining the line type of the empirical cdf (based on y) or theoretical cdf line. The default value is y.or.fitted.lty=2. See the entry for lty in the help file for
digits
when y is not supplied, a scalar indicating how many significant digits to print for the distribution parameters. The default value is digits=.Options$digits.
type, main, xlab, ylab, xlim, ylim, ...
additional graphical parameters (see lines and par). In particular, the argument type specifies the kind of line type. By default, the funct

Value

  • When y is supplied, cdfCompare invisibly returns a list with components x.ecdf.list and y.ecdf.list. Each of these components is itself a list, with the components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted. When y is not supplied, cdfCompare invisibly returns a list with components x.ecdf.list and fitted.cdf.list. The component x.ecdf.list is itself a list with the components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the x values. The component fitted.cdf.list is itself a list with the components Quantiles and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the fitted cdf.

Details

When both x and y are supplied, the function cdfCompare creates the empirical cdf plot of x and y on the same plot by calling the function ecdfPlot. When y is not supplied, the function cdfCompare creates the emprical cdf plot of x (by calling ecdfPlot) and the theoretical cdf plot (by calling cdfPlot and using the argument distribution) on the same plot.

References

Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16. Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp. D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.

See Also

cdfPlot, ecdfPlot, qqPlot.

Examples

Run this code
# Generate 20 observations from a normal (Gaussian) distribution 
  # with mean=10 and sd=2 and compare the empirical cdf with a 
  # theoretical normal cdf that is based on estimating the parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  x <- rnorm(20, mean = 10, sd = 2) 
  dev.new()
  cdfCompare(x)

  #----------

  # Generate 30 observations from an exponential distribution with parameter 
  # rate=0.1 (see the R help file for Exponential) and compare the empirical 
  # cdf with the empirical cdf of the normal observations generated in the 
  # previous example:

  set.seed(432)
  y <- rexp(30, rate = 0.1) 
  dev.new()
  cdfCompare(x, y)

  #==========

  # Generate 20 observations from a Poisson distribution with parameter lambda=10 
  # (see the R help file for Poisson) and compare the empirical cdf with a 
  # theoretical Poisson cdf based on estimating the distribution parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  x <- rpois(20, lambda = 10) 
  dev.new()
  cdfCompare(x, dist = "pois")

  #==========

  # Clean up
  #---------
  rm(x, y)
  graphics.off()

Run the code above in your browser using DataLab