plot.roc: Plot ROC curves and associated diagnostics

Description

Produces up to four plots (selectable by "which") from the results of a call to roc, including the ROC curve itself.

Usage

# S3 method for roc
plot(x,
     which = c(1:3,5),
     group = "Combined",
     prior = NULL,
     show.stats = TRUE,
     abline.col = "grey",
     abline.lty = "dashed",
     inGroup.col = "red",
     outGroup.col = "blue",
     lty = "solid",
     caption = c("ROC curve", "Dissimilarity profiles",
                 "TPF - FPF vs Dissimilarity",
                 "Likelihood ratios"),
     legend = "topright",
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     ...)

Value

One or more plots, drawn on the current device.

Arguments

x: an object of class "roc".
which: numeric vector; which aspects of "roc" object to plot if a subset of the plots is required, specify a subset of the numbers 1:5.
group: character vector of length 1 giving the name of the group to plot.
prior: numeric vector of length 2 (e.g. c(0.5, 0.5)) specifiying the prior probabilities of analogue and no-analogue. Used to generate posterior probability of analogue using Bayes factors in plot 5 (which = 5).
show.stats: logical; should concise summary statistics of the ROC analysis be displayed on the plot?
abline.col: character string or numeric value; the colour used to draw vertical lines at the optimal dissimilarity on the plots.
abline.lty: Line type for indicator of optimal ROC dissimilarity threshold. See par for the allowed line types.
inGroup.col: character string or numeric value; the colour used to draw the density curve for the dissimilarities between sites in the same group.
outGroup.col: character string or numeric value; the colour used to draw the density curve for the dissimilarities between sites not in the same group.
lty: vector of at most length 2 (elements past the second in longer vectors are ignored) line types. The first element of lty will be used where a single line is drawn on a plot. Where two lines are drawn (for analogue and non-analogue cases), the first element pertains to the analogue group and the second element to the non-analogue group. See par for the allowed line types.
caption: vector of character strings, containing the captions to appear above each plot.
legend: character; position of legends drawn on plots. See Details section in legend for keywords that can be specified.
ask: logical; if TRUE, the user is asked before each plot, see par(ask=.).
...: graphical arguments passed to other graphics functions.

Author

Gavin L. Simpson. Code borrows heavily from plot.lm.

Details

This plotting function is modelled closely on plot.lm and many of the conventions and defaults for that function are replicated here.

First, some definitions:

TPF: True Positive Fraction, also known as sensitivity.
TNF: True Negative Fraction, also known as specificity.
FPF: False Positive Fraction; the complement of TNF, calculated as 1 - TNF. This is often referred to a 1 - specificity. A false positive is also known as a type I error.
FNF: False Negative Fraction; the complement of TPF, calculated as 1 - TPF. A false negative is also known as a type II error.
AUC: The Area Under the ROC Curve.

The "ROC curve" plot (which = 1,) draws the ROC curve itself as a plot of the False Positive Fraction against the True Positive Fraction. A diagonal 1:1 line represents no ability for the dissimilarity coefficient to differentiate between groups. The AUC statistic may also be displayed (see argument "show.stats" above).

The "Dissimilarity profile" plot (which = 2), draws the density functions of the dissimilarity values (d) for the correctly assigned samples and the incorrectly assigned samples. A dissimilarity coefficient that is able to well distinguish the sample groupings will have density functions for the correctly and incorrectly assigned samples that have little overlap. Conversely, a poorly discriminating dissimilarity coefficient will have density profiles for the two assignments that overlap considerably. The point where the two curves cross is the optimal dissimilarity or critical value, d'. This represents the point where the difference between TPF and FPF is maximal. The value of d at the point where the difference between TPF and FPF is maximal will not neccesarily be the same as the value of d' where the density profiles cross. This is because the ROC curve has been estimated at discrete points d, which may not include excatly the optimal d', but which should be close to this value if the ROC curve is not sampled on too coarse an interval.

The "TPF - FPF vs Dissimilarity" plot, draws the difference between the ROC curve and the 1:1 line. The point where the ROC curve is farthest from the 1:1 line is the point at which the ROC curve has maximal slope. This is the optimal value for d, as discussed above.

The "Likelihood ratios" plot, draws two definitions of the slope of the ROC curve as the likelihood functions LR(+), and LR(-). LR(+), is the likelihood ratio of a positive test result, that the value of d assigns the sample to the group it belongs to. LR(-) is the likelihood ratio of a negative test result, that the value of d assigns the sample to the wrong group.

LR(+) is defined as \(LR(+) = TPF / FPF\) (or sensitivity / (1 - specificity)), and LR(-) is defined as \(LR(-) = FPF / TNF\) (or (1 - sensitivity) / specificity), in Henderson (1993).

The “probability of analogue” plot, draws the posterior probability of analogue given a dissimilarity. This is the LR(+) likelihood ratio values multiplied by the prior odds of analogue, for given values of the dissimilarity, and is then converted to a probability.

References

Brown, C.D., and Davis, H.T. (2006) Receiver operating characteristics curves and related decision measures: A tutorial. Chemometrics and Intelligent Laboratory Systems 80, 24--38.

Gavin, D.G., Oswald, W.W., Wahl, E.R. and Williams, J.W. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60, 356--367.

Henderson, A.R. (1993) Assessing test accuracy and its clinical consequences: a primer for receiver operating characteristic curve analysis. Annals of Clinical Biochemistry 30, 834--846.