
methylKit (version 0.99.2)

filterByCoverage: Filter methylRaw, methylRawDB, methylRawList and methylRawListDB objects based on read coverage

Description

This function filters methylRaw, methylRawDB, methylRawList and methylRawListDB objects. You can filter on a lower read cutoff, an upper read cutoff, or both. An upper read cutoff is useful for eliminating PCR effects. A lower read cutoff is useful for obtaining more reliable statistical tests.

Usage

filterByCoverage(methylObj, lo.count = NULL, lo.perc = NULL,
  hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE,
  ...)

# S4 method for methylRaw
filterByCoverage(methylObj, lo.count = NULL, lo.perc = NULL,
  hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE,
  ...)

# S4 method for methylRawList
filterByCoverage(methylObj, lo.count = NULL, lo.perc = NULL,
  hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE,
  ...)

# S4 method for methylRawDB
filterByCoverage(methylObj, lo.count = NULL, lo.perc = NULL,
  hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = TRUE,
  ...)

# S4 method for methylRawListDB
filterByCoverage(methylObj, lo.count = NULL, lo.perc = NULL,
  hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = TRUE,
  ...)

Arguments

methylObj

a methylRaw, methylRawDB, methylRawList or methylRawListDB object

lo.count

An integer for read counts. Bases/regions having lower coverage than this count are discarded

lo.perc

A double [0-100] for percentile of read counts. Bases/regions having lower coverage than this percentile are discarded

hi.count

An integer for read counts. Bases/regions having higher coverage than this count are discarded

hi.perc

A double [0-100] for percentile of read counts. Bases/regions having higher coverage than this percentile are discarded

chunk.size

Number of rows to be taken as a chunk for processing the methylRawDB or methylRawListDB objects, default: 1e6

save.db

A logical that decides whether the resulting object should be saved as a flat file database or not, default: depends on the class of the input object, as explained in the Details section

...

optional arguments used when save.db is TRUE:

suffix: A character string to append to the name of the output flat file database, only used if save.db is TRUE. Default actions: append “_filtered” to the current filename if the database already exists, or generate a new file with filename “sampleID_filtered”.

dbdir: The directory where flat file database(s) should be stored. Defaults to getwd() (the working directory) for newly stored databases and to the original directory for already existing databases.

dbtype: The type of the flat file database. Currently the only option is "tabix" (only used for newly stored databases).
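The way the count and percentile cutoffs described above act on a coverage column can be illustrated with a minimal base R sketch. This is not methylKit's implementation: the helper filter_coverage_sketch and the toy data frame are hypothetical, and the sketch assumes that a percentile, when supplied, is converted to a count cutoff that replaces the matching count argument.

```r
# Hypothetical helper, NOT part of methylKit: applies the four cutoffs
# to a plain data frame that has a numeric `coverage` column.
filter_coverage_sketch <- function(df, lo.count = NULL, lo.perc = NULL,
                                   hi.count = NULL, hi.perc = NULL) {
  # Assumption: a percentile is translated into a count cutoff and
  # overrides the corresponding count argument.
  if (!is.null(lo.perc)) {
    lo.count <- quantile(df$coverage, lo.perc / 100, names = FALSE)
  }
  if (!is.null(hi.perc)) {
    hi.count <- quantile(df$coverage, hi.perc / 100, names = FALSE)
  }
  keep <- rep(TRUE, nrow(df))
  # Rows with coverage below the lower cutoff are discarded.
  if (!is.null(lo.count)) keep <- keep & df$coverage >= lo.count
  # Rows with coverage above the upper cutoff are discarded.
  if (!is.null(hi.count)) keep <- keep & df$coverage <= hi.count
  df[keep, , drop = FALSE]
}

# Toy data: six positions, one of them with extreme coverage.
toy <- data.frame(pos = 1:6, coverage = c(3, 10, 12, 15, 20, 900))

# Count-based filtering: keep positions with 10 to 500 reads.
kept_count <- filter_coverage_sketch(toy, lo.count = 10, hi.count = 500)

# Percentile-based filtering: drop positions above the 90th percentile.
kept_perc <- filter_coverage_sketch(toy, hi.perc = 90)
```

In the sketch, a position whose coverage equals a cutoff is kept, which matches the wording "lower than" and "higher than" in the argument descriptions.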

Value

methylRaw, methylRawDB, methylRawList or methylRawListDB object depending on input object

Details

The parameter chunk.size is only used when working with methylRawDB or methylRawListDB objects, as they are read in chunk by chunk to enable processing of large objects that are stored as flat file databases. By default chunk.size is set to 1M rows, which should work for most systems. If you encounter memory problems, decrease chunk.size; if you have a large amount of memory available, you can increase it.
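The idea of chunk-by-chunk processing can be sketched in base R on an in-memory data frame. This is only an illustration under stated assumptions: methylKit reads its chunks from a flat file database, whereas the hypothetical helper filter_in_chunks below slices a data frame by row ranges, and it uses fixed count cutoffs so that each chunk can be filtered independently.

```r
# Hypothetical helper, NOT part of methylKit: filters a data frame in
# row chunks of at most `chunk.size` rows, using fixed count cutoffs.
filter_in_chunks <- function(df, lo.count, hi.count, chunk.size = 1e6) {
  if (nrow(df) == 0) return(df)
  # First row index of every chunk.
  starts <- seq(1, nrow(df), by = chunk.size)
  pieces <- lapply(starts, function(s) {
    # Only this slice needs to be held and filtered at one time.
    chunk <- df[s:min(s + chunk.size - 1, nrow(df)), , drop = FALSE]
    chunk[chunk$coverage >= lo.count & chunk$coverage <= hi.count, ,
          drop = FALSE]
  })
  # Combine the filtered chunks in their original order.
  do.call(rbind, pieces)
}

toy <- data.frame(pos = 1:7, coverage = c(3, 10, 12, 600, 15, 20, 900))

# Filtering in chunks of 3 rows keeps the same rows as filtering at once.
chunked <- filter_in_chunks(toy, lo.count = 10, hi.count = 500,
                            chunk.size = 3)
whole <- toy[toy$coverage >= 10 & toy$coverage <= 500, , drop = FALSE]
```

A smaller chunk size lowers peak memory use at the cost of more iterations, which is the trade-off the chunk.size parameter exposes.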

The parameter save.db is TRUE by default for methylDB objects such as methylRawDB and methylRawListDB, and FALSE by default for methylRaw and methylRawList. If you wish to save the result of an in-memory calculation as a flat file database, or if the size of the database allows the calculation to be done in memory, you can change the value of this parameter.

Examples

# NOT RUN {
data(methylKit)

# filter out bases with coverage above 500 reads
filtered1=filterByCoverage(methylRawList.obj,lo.count=NULL,lo.perc=NULL,
hi.count=500,hi.perc=NULL)

# filter out bases with read coverage above the 99.9th percentile of the
# coverage distribution
filtered2=filterByCoverage(methylRawList.obj,lo.count=NULL,lo.perc=NULL,
hi.count=NULL,hi.perc=99.9)

# filter out bases with coverage above 500 reads and save to database
# "test1_max500.txt.bgz"
# in directory "methylDB", filtered3 now becomes a methylRawDB object
filtered3=filterByCoverage(methylRawList.obj[[1]], lo.count=NULL,lo.perc=NULL, 
                           hi.count=500, hi.perc=NULL, save.db=TRUE, 
                           suffix="max500", dbdir="methylDB")
                           
# tidy up
rm(filtered3)
unlink("methylDB",recursive=TRUE)

# }
