Produce an empirical cumulative distribution function plot.
ecdfPlot(x, discrete = FALSE,
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"),
    plot.pos.con = 0.375, plot.it = TRUE, add = FALSE, ecdf.col = "black",
    ecdf.lwd = 3 * par("cex"), ecdf.lty = 1, curve.fill = FALSE,
    curve.fill.col = "cyan", ..., type = ifelse(discrete, "s", "l"),
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL)
x: numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
discrete: logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).
prob.method: character string indicating what method to use to compute the plotting positions (empirical probabilities). Possible values are "plot.pos" (plotting positions, the default if discrete=FALSE) and "emp.probs" (empirical probabilities, the default if discrete=TRUE). See the DETAILS section for more explanation.
plot.pos.con: numeric scalar between 0 and 1 containing the value of the plotting position constant. The default value is plot.pos.con=0.375. See the DETAILS section for more information. This argument is ignored if prob.method="emp.probs".
plot.it: logical scalar indicating whether to produce a plot or add to the current plot (see add) on the current graphics device. The default value is plot.it=TRUE.
add: logical scalar indicating whether to add the empirical cdf to the current plot (add=TRUE) or generate a new plot (add=FALSE; the default). This argument is ignored if plot.it=FALSE.
ecdf.col: a numeric scalar or character string determining the color of the empirical cdf line or points. The default value is ecdf.col="black". See the entry for col in the help file for par for more information.
ecdf.lwd: a numeric scalar determining the width of the empirical cdf line. The default value is ecdf.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.
ecdf.lty: a numeric scalar determining the line type of the empirical cdf line. The default value is ecdf.lty=1. See the entry for lty in the help file for par for more information.
curve.fill: a logical scalar indicating whether to fill in the area below the empirical cdf curve with the color specified by curve.fill.col. The default value is curve.fill=FALSE.
curve.fill.col: a numeric scalar or character string indicating what color to use to fill in the area below the empirical cdf curve. The default value is curve.fill.col="cyan". This argument is ignored if curve.fill=FALSE.
...: additional graphical parameters (see lines and par). In particular, the argument type specifies how the empirical cdf is drawn. By default, the function ecdfPlot plots a step function (type="s") when discrete=TRUE, and plots a straight line between points (type="l") when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).
ecdfPlot invisibly returns a list with the following components:
numeric vector of the ordered observations.
numeric vector of the associated plotting positions.
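Because the list is returned invisibly, it only shows up if you assign it. The following is a minimal sketch of doing so, assuming the EnvStats package (which provides ecdfPlot) is installed; the names x and res are illustrative only, and the call is skipped if the package is not available.

```r
# Assumption: ecdfPlot() is provided by the EnvStats package.
set.seed(250)
x <- rnorm(20)

res <- NULL
if (requireNamespace("EnvStats", quietly = TRUE)) {
  # plot.it = FALSE requests no plot; the list is returned invisibly,
  # so assign it to a variable to keep it.
  res <- EnvStats::ecdfPlot(x, plot.it = FALSE)
  str(res)  # inspect the ordered observations and their plotting positions
}
```

Using str() avoids relying on particular component names, which are not shown in this section.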
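The cumulative distribution function (cdf) of a random variable $X$ is the function $F$ such that $F(x) = \Pr(X \le x)$ for all values of $x$; that is, $F(x)$ is the proportion of the population that is less than or equal to $x$.
When we have a sample of data from some population, we usually do not know what percentiles our observations correspond to because we do not know the form of the cumulative distribution function $F$, so $F$ has to be estimated from the sample.
(Note: Some authors (e.g., Chambers et al., 1983, pp.11-16; Cleveland, 1993, pp.17-20) reverse the axes on a quantile plot, i.e., the observed order statistics from the random sample are on the $y$-axis and the estimated cumulative probabilities are on the $x$-axis.)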
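The empirical cumulative distribution function (ecdf) is an estimate of the cdf based on a random sample of $n$ observations from the distribution. Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ denote the ordered observations. The cdf is usually estimated at these points by either the empirical probabilities estimator, $\hat{F}[x_{(i)}] = \#[x_j \le x_{(i)}] / n$ (the proportion of observations less than or equal to $x_{(i)}$), or the plotting-position estimator, $\hat{F}[x_{(i)}] = (i - a) / (n - 2a + 1)$, where $0 \le a \le 1$ is the plotting position constant.
For any value $x_0$ with $x_{(1)} < x_0 < x_{(n)}$, the ecdf is usually defined either as a step function that holds the estimate at the largest order statistic not exceeding $x_0$, or by linear interpolation between the estimates at the two neighboring order statistics. ecdfPlot uses the step function version when discrete=TRUE, and the linear interpolation version when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).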
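To make the step function and linear interpolation versions concrete, here is a base-R sketch (an illustration, not the code ecdfPlot itself runs) that evaluates an ecdf at a point lying between two ordered observations. It uses ppoints() with a = 0.375 as a stand-in for the plotting positions; the names xs, p, x0, step_val, and lin_val are illustrative.

```r
set.seed(250)
x  <- rnorm(20)
xs <- sort(x)                          # ordered observations
p  <- ppoints(length(xs), a = 0.375)   # (i - a) / (n + 1 - 2a)

# A point halfway between the 10th and 11th ordered observations
x0 <- mean(xs[10:11])

# Step function version: hold the value at the largest ordered
# observation that does not exceed x0 (f = 0 selects the left value)
step_val <- approx(xs, p, xout = x0, method = "constant", f = 0)$y

# Linear interpolation version: interpolate between the two neighbors
lin_val <- approx(xs, p, xout = x0, method = "linear")$y

c(step = step_val, linear = lin_val)
```

Because x0 sits at the midpoint, the step value equals p[10] while the interpolated value is the average of p[10] and p[11].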
The empirical probabilities estimator is intuitively appealing. This is the estimator used when prob.method="emp.probs". The disadvantage of this estimator is that it implies the largest observed value is the maximum possible value of the distribution (i.e., the 100th percentile). This may be satisfactory if the underlying distribution is known to be discrete, but it is usually not satisfactory if the underlying distribution is known to be continuous.
The plotting-position estimator with various values of the plotting position constant $a$ is often used to produce a probability plot (see qqPlot) rather than an empirical cdf plot. It is used to compute the estimated expected values or medians of the order statistics for a probability plot. This is the estimator used when prob.method="plot.pos". The argument plot.pos.con refers to the constant $a$ (see the help file for qqPlot for more information).
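The practical difference between the two estimators is easy to see numerically. The base-R sketch below is an illustration, not the code ecdfPlot runs: it assumes the plotting positions take the common form $(i - a)/(n - 2a + 1)$, which is also what ppoints() returns, and the names emp_probs and plot_pos are illustrative.

```r
set.seed(250)
x  <- rnorm(20)
n  <- length(x)
xs <- sort(x)

# Empirical probabilities: proportion of observations <= each ordered value
emp_probs <- rank(xs, ties.method = "max") / n

# Plotting positions with constant a = 0.375 (assumed form, see lead-in)
a <- 0.375
plot_pos <- (seq_len(n) - a) / (n - 2 * a + 1)

max(emp_probs)  # 1: the largest observation is treated as the 100th percentile
max(plot_pos)   # strictly less than 1
```

With the plotting-position estimator the largest observation is assigned a probability below 1, which leaves room for larger values from a continuous distribution.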
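Because the observations are a random sample, the empirical cdf changes from sample to sample, and this variability can be substantial for small sample sizes.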
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
# NOT RUN {
# Generate 20 observations from a normal distribution with
# mean=0 and sd=1 and create an ecdf plot.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rnorm(20)
dev.new()
ecdfPlot(x)
#----------
# Repeat the above example, but fill in the area under the
# empirical cdf curve.
dev.new()
ecdfPlot(x, curve.fill = TRUE)
#----------
# Repeat the above example, but plot only the points.
dev.new()
ecdfPlot(x, type = "p")
#----------
# Repeat the above example, but force a step function.
dev.new()
ecdfPlot(x, type = "s")
#----------
# Clean up
rm(x)
#-------------------------------------------------------------------------------------
# The guidance document USEPA (1994b, pp. 6.22--6.25)
# contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB)
# concentrations (in parts per billion) from soil samples
# at a Reference area and a Cleanup area. These data are stored
# in the data frame EPA.94b.tccb.df.
#
# Create an empirical CDF plot for the reference area data.
dev.new()
with(EPA.94b.tccb.df,
  ecdfPlot(TcCB[Area == "Reference"], xlab = "TcCB (ppb)"))
#==========
# Clean up
#---------
graphics.off()
# }