methylKit (version 0.99.2)

PCASamples: Principal Components Analysis of Methylation data

Description

Performs a principal component analysis (PCA) with the prcomp function, using the percent methylation matrix as input.

Usage

PCASamples(.Object, screeplot=FALSE, adj.lim=c(0.0004,0.1), scale=TRUE,
           center=TRUE, comp=c(1,2), transpose=TRUE, sd.filter=TRUE,
           sd.threshold=0.5, filterByQuantile=TRUE, obj.return=FALSE, chunk.size)

# S4 method for methylBase
PCASamples(.Object, screeplot, adj.lim, scale, center, comp, transpose,
           sd.filter, sd.threshold, filterByQuantile, obj.return)

# S4 method for methylBaseDB
PCASamples(.Object, screeplot = FALSE, adj.lim = c(4e-04, 0.1), scale = TRUE,
           center = TRUE, comp = c(1, 2), transpose = TRUE, sd.filter = TRUE,
           sd.threshold = 0.5, filterByQuantile = TRUE, obj.return = FALSE,
           chunk.size = 1e+06)

Arguments

.Object

a methylBase or methylBaseDB object

screeplot

a logical value indicating whether to plot the variances against the number of the principal component. (default: FALSE)

adj.lim

a vector indicating the proportional adjustment of xlim (adj.lim[1]) and ylim (adj.lim[2]). This is primarily used for adjusting the visibility of sample labels on the PCA plot. (default: c(0.0004,0.1))

scale

logical indicating if prcomp should scale the data to have unit variance or not (default: TRUE)

center

logical indicating if prcomp should center the data or not (default: TRUE)

comp

vector of integers with 2 elements specifying which components are to be plotted. (default: c(1,2))

transpose

if TRUE (default), the percent methylation matrix is transposed, which is equivalent to doing PCA with regions/bases as the variables. The resulting plot shows the location of the samples in the new coordinate system. If FALSE, the variables of the matrix are the samples, and the resulting plot shows how each sample (variable) contributes to the principal components. Samples that are highly correlated should have similar contributions to the principal components.

sd.filter

If TRUE, bases/regions with low variation are discarded prior to PCA (default: TRUE)

sd.threshold

A numeric value. If filterByQuantile is TRUE, the value should be between 0 and 1, and features whose standard deviation is less than the quantile denoted by sd.threshold are removed. If filterByQuantile is FALSE, features whose standard deviation is less than the value of sd.threshold are removed. (default: 0.5)

filterByQuantile

A logical determining whether sd.threshold is interpreted as a quantile of all standard deviation values from bases/regions (the default), or as an absolute value.

obj.return

logical indicating whether the result of the prcomp function should be returned (default: FALSE)

chunk.size

Number of rows to be taken as a chunk when processing methylBaseDB objects (default: 1e6)
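
The interaction of sd.filter, sd.threshold and filterByQuantile can be illustrated with a small base R sketch. This is not methylKit's internal code: the toy matrix, the helper name filter_by_sd and the handling of rows that sit exactly on the cutoff are assumptions made for illustration, following the wording of the argument descriptions above.

```r
# Toy percent-methylation matrix (made-up values):
# rows are bases/regions, columns are samples.
set.seed(1)
meth <- matrix(runif(40, min = 0, max = 100), nrow = 10,
               dimnames = list(paste0("site", 1:10), paste0("sample", 1:4)))

# Standard deviation of each base/region across samples.
row_sd <- apply(meth, 1, sd)

# Hypothetical helper (not part of methylKit): drop rows whose standard
# deviation is below the cutoff. With by_quantile = TRUE the threshold is
# read as a quantile of all row SDs, otherwise as an absolute SD value.
filter_by_sd <- function(mat, sds, threshold, by_quantile = TRUE) {
  cutoff <- if (by_quantile) quantile(sds, probs = threshold) else threshold
  mat[sds >= cutoff, , drop = FALSE]
}

# Analogue of sd.threshold = 0.5, filterByQuantile = TRUE:
# keep the more variable half of the rows.
kept_quantile <- filter_by_sd(meth, row_sd, threshold = 0.5, by_quantile = TRUE)

# Analogue of sd.threshold = 20, filterByQuantile = FALSE:
# keep rows with an SD of at least 20 percentage points.
kept_absolute <- filter_by_sd(meth, row_sd, threshold = 20, by_quantile = FALSE)
```

With the quantile interpretation the number of retained rows depends only on the rank of each row's standard deviation, whereas the absolute interpretation depends on the scale of the data.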

Value

PCASamples returns the summary of the principal component analysis computed by prcomp.
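
As a rough guide to what a prcomp result contains, the following base R sketch runs prcomp directly on a made-up matrix, once transposed and once not, mirroring the two settings of the transpose argument. The matrix and variable names are illustrative assumptions; the sketch does not call PCASamples and says nothing about how methylKit draws its plots.

```r
# Toy percent-methylation matrix (made-up values):
# 20 bases/regions in rows, 4 samples in columns.
set.seed(2)
meth <- matrix(runif(80, min = 0, max = 100), nrow = 20,
               dimnames = list(paste0("site", 1:20), paste0("sample", 1:4)))

# Analogue of transpose = TRUE: samples are observations (rows) and
# bases/regions are variables. The scores in $x place each sample in the
# new coordinate system.
pca_samples <- prcomp(t(meth), scale. = TRUE, center = TRUE)
sample_scores <- pca_samples$x

# Analogue of transpose = FALSE: samples are the variables. The loadings
# in $rotation show how much each sample contributes to each component.
pca_sites <- prcomp(meth, scale. = TRUE, center = TRUE)
sample_loadings <- pca_sites$rotation

# summary() reports the standard deviation and the proportion of variance
# explained by each principal component.
importance <- summary(pca_samples)$importance
```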

Details

The chunk.size parameter is only used with methylBaseDB objects, which are read chunk by chunk so that large objects stored as flat file databases can be processed. By default, chunk.size is set to 1M rows, which should work for most systems. If you encounter memory problems, or have a large amount of memory available, feel free to adjust chunk.size.
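
The idea of chunk-wise processing can be sketched in base R as follows. This is only an illustration of the general pattern, not methylKit's implementation: an in-memory matrix stands in for the flat file database, and the helper chunked_row_sd is hypothetical.

```r
# Toy matrix standing in for data that would normally be read from disk
# (made-up values): 10 bases/regions in rows, 4 samples in columns.
set.seed(3)
meth <- matrix(runif(40, min = 0, max = 100), nrow = 10)

# Hypothetical helper (not part of methylKit): compute per-row standard
# deviations while holding only chunk_size rows at a time.
chunked_row_sd <- function(mat, chunk_size) {
  starts <- seq(1, nrow(mat), by = chunk_size)
  unlist(lapply(starts, function(s) {
    e <- min(s + chunk_size - 1, nrow(mat))   # last row of this chunk
    apply(mat[s:e, , drop = FALSE], 1, sd)
  }))
}

# A smaller chunk size lowers peak memory use at the cost of more passes;
# the result is the same as processing everything at once.
sd_chunked <- chunked_row_sd(meth, chunk_size = 3)
sd_full <- apply(meth, 1, sd)
```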

Examples

# NOT RUN {
data(methylKit) 

# Run PCA after discarding rows with low variation: rows whose standard
# deviation is below the 50th percentile of the standard deviation
# distribution are removed
PCASamples(methylBase.obj,screeplot=FALSE, adj.lim=c(0.0004,0.1),
           scale=TRUE,center=TRUE,comp=c(1,2),transpose=TRUE,sd.filter=TRUE,
           sd.threshold=0.5,filterByQuantile=TRUE,obj.return=FALSE)

# }
