
ClassDiscovery (version 3.4.0)

mahalanobisQC: Using Mahalanobis Distance and PCA for Quality Control

Description

Compute the Mahalanobis distance of each sample from the center of an N-dimensional principal component space.

Usage

mahalanobisQC(spca, N)

Arguments

spca

object of class SamplePCA representing the results of a principal components analysis.

N

integer scalar specifying the number of components to use when assessing QC.

Value

Returns a data frame with two columns and one row per sample, that is, per column of the original data set on which the PCA was performed. The first column is the chi-squared statistic, with N degrees of freedom. The second column is the associated p-value.

Details

Under the null hypothesis that all samples arise from the same multivariate normal distribution, the squared Mahalanobis distance of a sample from the center of a D-dimensional principal component space should approximately follow a chi-squared distribution with D degrees of freedom. Here D is the number of components N passed to the function. This result lets us compute a p-value for the Mahalanobis distance of each sample; samples with small p-values lie unusually far from the center. The method can be used for quality control or outlier identification.
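
The calculation described above can be sketched in base R. This is an illustration only, not the ClassDiscovery implementation: it uses simulated data and prcomp rather than a SamplePCA object, and the sizes n and p as well as the output column name statistic are assumptions made for the example.

```r
# Sketch only: base R on simulated data, not the code behind mahalanobisQC.
set.seed(1)
n <- 200  # number of samples (assumed for illustration)
p <- 10   # number of features (assumed for illustration)
N <- 2    # number of principal components to use

# Rows are samples here, which is the orientation prcomp expects.
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

pca <- prcomp(x, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:N, drop = FALSE]

# Principal component scores are uncorrelated, so the squared Mahalanobis
# distance from the center is the sum of the squared scores, each divided
# by the variance of its component.
d2 <- rowSums(sweep(scores^2, 2, pca$sdev[1:N]^2, "/"))

# Upper-tail chi-squared probability with N degrees of freedom.
p.value <- pchisq(d2, df = N, lower.tail = FALSE)

qc <- data.frame(statistic = d2, p.value = p.value)
head(qc[order(qc$p.value), ])
```

Because the data are simulated from a single normal distribution, only about 1% of the samples should fall below a p-value cutoff of 0.01; a much larger fraction in real data would suggest outliers or quality problems.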

References

Coombes KR, et al. Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. Clin Chem 2003; 49:1615-23.

Examples

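library(ClassDiscovery)  # provides SamplePCA and mahalanobisQC
library(oompaData)       # provides the lung data set
data(lungData)
# PCA on the samples, after dropping rows with missing values
spca <- SamplePCA(na.omit(lung.dataset))
# Mahalanobis QC using the first two principal components
mc <- mahalanobisQC(spca, 2)
# Samples flagged as potential outliers at the 0.01 level
mc[mc$p.value < 0.01,]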
