XMRF (version 1.0)

processSeq: Process Sequencing Data for Poisson-based MRFs

Description

Process and normalize RNA-Sequencing count data into a distribution appropriate for Poisson MRFs.

Usage

processSeq(X, quanNorm = 0.75, nLowCount = 20, percentLowCount = 0.95, NumGenes = 500, 
PercentGenes = 0.1)

Arguments

X

n x p data matrix of counts (n samples in rows, p genes in columns).

quanNorm

an optional parameter controlling the quantile used for sample normalization; defaults to 0.75.

nLowCount

minimum read count used to decide whether to filter a gene; defaults to 20.

percentLowCount

filter out a gene if at least this proportion of samples have counts below nLowCount; defaults to 0.95.

NumGenes

number of genes to retain in the final data set; defaults to 500.

PercentGenes

proportion of genes to retain; defaults to 0.1.

Value

A processed data matrix with n rows and either NumGenes columns or the proportion of columns given by PercentGenes.

Details

To transform next-generation sequencing count data into a distribution suitable for Poisson MRFs (with overdispersion removed), this function takes the following steps:

1. Quantile-normalize the samples.
2. Filter out genes whose counts are low in nearly all samples.
3. Filter genes by maximal variance (if specified).
4. Transform the data to be closer to the Poisson distribution. A log or power transform is considered and selected based on the Kolmogorov-Smirnov goodness-of-fit test.
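
The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the XMRF source: the helper name sketch_process is hypothetical, normalization is assumed to mean scaling each sample by its quanNorm quantile, only power transforms are tried (no log transform), and the grid of candidate powers is arbitrary.

```r
# Hypothetical helper: a rough sketch of the four steps, NOT the XMRF code.
sketch_process <- function(X, quanNorm = 0.75, nLowCount = 20,
                           percentLowCount = 0.95, NumGenes = 500) {
  # 1. Scale each sample (row) so that its quanNorm quantile matches the
  #    average of those quantiles across samples (assumed scheme).
  q <- apply(X, 1, quantile, probs = quanNorm)
  X <- X / (q / mean(q))

  # 2. Drop genes (columns) for which at least percentLowCount of the
  #    samples fall below nLowCount.
  low <- colMeans(X < nLowCount) >= percentLowCount
  X <- X[, !low, drop = FALSE]

  # 3. Keep the NumGenes columns with the largest variance.
  k <- min(NumGenes, ncol(X))
  keep <- order(apply(X, 2, var), decreasing = TRUE)[seq_len(k)]
  X <- X[, keep, drop = FALSE]

  # 4. Choose the power transform whose result looks most Poisson-like,
  #    judged by the Kolmogorov-Smirnov statistic (arbitrary grid of powers).
  alphas <- seq(0.1, 1, by = 0.1)
  ks <- sapply(alphas, function(a) {
    y <- floor(as.vector(X)^a)
    # ks.test warns about ties on discrete data; the statistic is still returned.
    suppressWarnings(ks.test(y, "ppois", lambda = mean(y))$statistic)
  })
  floor(X^alphas[which.min(ks)])
}

set.seed(1)
# Simulated overdispersed counts: 30 samples x 200 genes.
X <- matrix(rnbinom(30 * 200, mu = 100, size = 2), nrow = 30)
out <- sketch_process(X, NumGenes = 50)
dim(out)
```

The result keeps one row per sample and at most NumGenes columns of non-negative integer values. For real data, use processSeq itself, since its normalization and transform selection may differ from this sketch.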

Examples

library(XMRF)
data('brcadat')
brca = t(processSeq(t(brcadat), PercentGenes=1))