rowpAUCs-methods: Rowwise ROC and pAUC computation

Description

Methods for fast rowwise computation of ROC curves and (partial) area under the curve (pAUC) using the simple classification rule x > theta, where theta is a value in the range of x

Usage

rowpAUCs(x, fac, p=0.1, flip=TRUE, caseNames=c("1", "2"))

Arguments

ExpressionSet or numeric matrix. The matrix must not contain NA values.

fac

A factor or numeric or character that can be coerced to a factor. If x is an ExpressionSet, this may also be a character vector of length 1 with the name of a covariate variable in x. fac must have exactly 2 levels. For better control over the classification, use integer values in 0 and 1, where 1 indicates the "Disease" class in the sense of the Pepe et al paper (see below).

Numeric vector of length 1. Limit in (0,1) to integrate pAUC to.

flip

Logical. If TRUE, both classification rules

x
  > theta

and x < theta are tested and the (partial) area under the curve of the better one of the two is returned. This is appropriate for the cases in which the classification is not necessarily linked to higher expression values, but instead it is symmetric and one would assume both over- and under-expressed genes for both classes. You can set flip to FALSE if you only want to screen for genes which discriminate Disease from Control with the x > theta rule.

caseNames

The class names that are used when plotting the data. If fac is the name of the covariate variable in the ExpressionSet the function will use its levels as caseNames.

Value

An object of class rowROC with the calculated specificities and sensitivities for each row and the corresponding pAUCs and AUCs values. See rowROC for details.

Details

Rowwise calculation of Receiver Operating Characteristic (ROC) curves and the corresponding partial area under the curve (pAUC) for a given data matrix or ExpressionSet. The function is implemented in C and thus reasonably fast and memory efficient. Cutpoints (theta are calculated before the first, in between and after the last data value. By default, both classification rules x > theta and x < theta are tested and the (partial) area under the curve of the better one of the two is returned. This is only valid for symmetric cases, where the classification is independent of the magnitude of x (e.g., both over- and under-expression of different genes in the same class). For unsymmetric cases in which you expect x to be consistently higher/lower in of of the two classes (e.g. presence or absence of a single biomarker) set flip=FALSE or use the functionality provided in the ROC package. For better control over the classification (i.e., the choice of "Disease" and "Control" class in the sense of the Pepe et al paper), argument fac can be an integer in [0,1] where 1 indicates "Disease" and 0 indicates "Control".

References

Pepe MS, Longton G, Anderson GL, Schummer M.: Selecting differentially expressed genes from microarray experiments. Biometrics. 2003 Mar;59(1):133-42.

Examples

Run this code

library(Biobase)
data(sample.ExpressionSet)

r1 = rowttests(sample.ExpressionSet, "sex")
r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)

plot(area(r2, total=TRUE), r1$statistic, pch=16)
sel <- which(area(r2, total=TRUE) > 0.7)
plot(r2[sel])

## this compares performance and output of rowpAUCs to function pAUC in
## package ROC 
if(require(ROC)){
  ## performance
  myRule = function(x)
    pAUC(rocdemo.sca(truth = as.integer(sample.ExpressionSet$sex)-1 ,
         data = x, rule = dxrule.sca), t0 = 0.1)
  nGenes = 200
  cat("computation time for ", nGenes, "genes:
")
  cat("function pAUC: ")
  print(system.time(r3 <- esApply(sample.ExpressionSet[1:nGenes, ], 1, myRule)))
  cat("function rowpAUCs: ")
  print(system.time(r2 <- rowpAUCs(sample.ExpressionSet[1:nGenes, ],
  "sex", p=1)))

  ## compare output
  myRule2 = function(x)
   pAUC(rocdemo.sca(truth = as.integer(sample.ExpressionSet$sex)-1 ,
                    data = x, rule = dxrule.sca), t0 = 1)
  r4 <-  esApply(sample.ExpressionSet[1:nGenes, ], 1, myRule2)
  plot(r4,area(r2), xlab="function pAUC", ylab="function rowpAUCs",
  main="pAUCs")

  plot(r4, area(rowpAUCs(sample.ExpressionSet[1:nGenes, ],
  "sex", p=1, flip=FALSE)), xlab="function pAUC", ylab="function rowpAUCs",
  main="pAUCs")

  r4[r4<0.5] <- 1-r4[r4<0.5]
  plot(r4, area(r2), xlab="function pAUC", ylab="function rowpAUCs",
  main="pAUCs")
 }

Run the code above in your browser using DataLab