dmrFind: Identify DMR candidates using a regression-based approach and correcting for batch effects.

Description

Identify DMR candidates using a regression-based approach and correcting for batch effects.

Usage

dmrFind(p=NULL, logitp=NULL, svs=NULL, mod, mod0, coeff, pns, chr, pos, only.cleanp=FALSE, only.dmrs=FALSE, rob=TRUE, use.limma=FALSE, smoo="weighted.loess", k=3, SPAN=300, DELTA=36, use="sbeta", Q=0.99, min.probes=3, min.value=0.075, keepXY=TRUE, sortBy="area.raw", verbose=TRUE, vfilter=NULL, nsubsets=50, ...)

Arguments

p

matrix of methylation percentage estimates. Either this or logitp must be provided. Can be an ff_matrix object.

logitp

matrix of logit-transformed methylation percentage estimates. Either this or p must be provided. Can be an ff_matrix object.
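The two inputs are related by the logit transform. Below is a minimal sketch of building a logitp matrix from p with base R's qlogis(). It assumes p holds proportions on the 0 to 1 scale and that the standard logit log(p/(1-p)) is meant. The clamping constant eps is a hypothetical choice that keeps 0 and 1 from mapping to infinite values; it is not a dmrFind argument.

```r
# Sketch: logit-transform a matrix of methylation proportions (assumes p in [0, 1])
to_logit <- function(p, eps = 1e-4) {
  # eps is a hypothetical guard so that 0 and 1 do not become -Inf / Inf
  p <- pmin(pmax(p, eps), 1 - eps)
  qlogis(p)  # qlogis(x) equals log(x / (1 - x))
}

# Toy 3 probes x 2 samples matrix (filled column by column)
p <- matrix(c(0.10, 0.50, 0.90,
              0.00, 0.25, 1.00), nrow = 3)
logitp <- to_logit(p)
back <- plogis(logitp)  # back-transform to the proportion scale
```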

svs

surrogate variables whose effects will be corrected for. This should be svaobj$sv, where svaobj is the object returned by sva(). If svs is NULL (the default), sva() is run to estimate them. Setting svs=0 results in sva not being used at all.

mod

The mod argument provided to sva() which yielded svs. This should be a design matrix with all the adjustment covariates and (in the rightmost column(s)) your covariate of interest.

mod0

The mod0 argument provided to sva() which yielded svs. This should be a design matrix with just the adjustment covariates. Thus it should be the same as mod, excluding the rightmost column(s) for the covariate of interest.
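The layout described for mod, mod0 and coeff can be sketched with base R's model.matrix(). The phenotype table, its column names and the "groupB" column are hypothetical. The sva() call is left as a comment because it requires the Bioconductor sva package.

```r
# Hypothetical phenotype table: 'age' is an adjustment covariate,
# 'group' is the covariate of interest
pheno <- data.frame(
  age   = c(30, 45, 52, 61, 38, 47),
  group = factor(c("A", "A", "A", "B", "B", "B"))
)

# mod: adjustment covariates first, covariate of interest in the rightmost column
mod  <- model.matrix(~ age + group, data = pheno)
# mod0: the same design without the covariate of interest
mod0 <- model.matrix(~ age, data = pheno)

# coeff may be a column name or a numeric index into mod
coeff <- which(colnames(mod) == "groupB")

# With the sva package and a probes x samples matrix 'logitp', the
# surrogate variables would be obtained along the lines of:
# svs <- sva::sva(logitp, mod, mod0)$sv
```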

coeff

a character or numeric index for the column of mod that identifies the covariate column of interest.

pns

vector of region names for the probes corresponding to rows of p or logitp.

chr

vector of chromosomal identifiers for the probes corresponding to rows of p or logitp.

pos

vector of chromosomal coordinates for the probes corresponding to rows of p or logitp.
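pns, chr and pos must run parallel to the rows of p or logitp. Below is a small consistency check one might run before calling dmrFind. It is a hypothetical helper, not part of charm. The rule that each region lies on a single chromosome is an added assumption, and the region names are made up.

```r
# Hypothetical sanity check (not part of dmrFind): annotation vectors must
# have one entry per row of the methylation matrix, in the same order
check_annotation <- function(x, pns, chr, pos) {
  m <- nrow(x)
  stopifnot(length(pns) == m, length(chr) == m, length(pos) == m)
  # Assumption: every region name should map to exactly one chromosome
  chr_per_region <- tapply(chr, pns, function(v) length(unique(v)))
  all(chr_per_region == 1)
}

pns <- c("r1", "r1", "r1", "r2", "r2")
chr <- c("chr1", "chr1", "chr1", "chr2", "chr2")
pos <- c(1000L, 1036L, 1072L, 5000L, 5036L)
logitp <- matrix(0, nrow = 5, ncol = 4)  # toy 5 probes x 4 samples
check_annotation(logitp, pns, chr, pos)
```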

only.cleanp

if TRUE, only return the matrix of methylation percentage estimates after removing the batch effects (columns of svs).

only.dmrs

if TRUE, do not return the matrix of methylation percentage estimates with the batch effects (columns of svs) removed (called cleanp).

rob

One of the outputs of dmrFind is cleanp, the input p matrix after removal of the batch effects identified by SVA. By default (rob=TRUE), these are the only effects removed from p. If rob=FALSE, the effects of the other adjustment variables in mod and mod0 (all variables besides the covariate of interest) are removed as well. This changes the methylation levels shown by plotDMRs, plotRegions, and any other function that uses the cleanp output of dmrFind. It does not affect the selection of DMRs, with one exception: the final filter, which discards DMRs whose average difference between the two groups (or, for a continuous covariate, whose average correlation between methylation and the covariate) is less than min.value, computes those averages or correlations from cleanp.
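The idea behind cleanp can be illustrated with a generic regression-based removal. Fit each probe on the full design, then subtract the fitted contribution of the surrogate variables, plus that of the other adjustment covariates when rob=FALSE. This is a sketch of the general technique under stated assumptions, not charm's internal code. It assumes column 1 of mod is the intercept and coeff is a numeric index.

```r
# Sketch of regression-based effect removal (assumption: not charm's code).
# y: probes x samples; mod: design with the intercept in column 1 and the
# covariate of interest at numeric index 'coeff'; svs: samples x k matrix
remove_effects <- function(y, mod, svs, coeff, rob = TRUE) {
  X <- cbind(mod, svs)
  B <- qr.coef(qr(X), t(y))           # ncol(X) x probes coefficient matrix
  drop <- (ncol(mod) + 1):ncol(X)     # SV columns are always removed
  if (!rob) {
    # also remove adjustment covariates other than the intercept and coeff
    drop <- c(setdiff(seq_len(ncol(mod)), c(1, coeff)), drop)
  }
  y - t(X[, drop, drop = FALSE] %*% B[drop, , drop = FALSE])
}

# Toy data built exactly from known effects (no noise)
set.seed(1)
pheno <- data.frame(age = c(30, 45, 52, 61, 38, 47),
                    group = factor(c("A", "A", "A", "B", "B", "B")))
mod   <- model.matrix(~ age + group, data = pheno)
svs   <- matrix(c(0.3, -0.1, 0.5, -0.4, 0.2, -0.5), ncol = 1)
beta  <- matrix(rnorm(3 * 5), nrow = 3)  # covariate effects for 5 probes
gamma <- matrix(rnorm(5), nrow = 1)      # SV effect for 5 probes
y <- t(mod %*% beta + svs %*% gamma)     # 5 probes x 6 samples

clean  <- remove_effects(y, mod, svs, coeff = 3)               # like rob=TRUE
clean2 <- remove_effects(y, mod, svs, coeff = 3, rob = FALSE)  # like rob=FALSE
```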

use.limma

If TRUE, use the linear modeling approach of lmFit in the limma package, which borrows strength across probes.

smoo

which method to use for smoothing: "weighted.loess", "loess", or "runmed".

k

the k argument to runmed(), used if smoo="runmed".
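Below is a sketch of what smoo="runmed" with window k amounts to, using base R's runmed(). The assumption that smoothing is applied separately within each region (pns), so that it never crosses region boundaries, is not stated above. The toy values are made up.

```r
# Sketch: running-median smoothing of per-probe estimates within each region.
# Assumption: smoothing does not cross region (pns) boundaries.
smooth_runmed <- function(beta, pns, k = 3) {
  # runmed() needs an odd k no larger than the number of values
  ave(beta, pns, FUN = function(b) if (length(b) >= k) runmed(b, k) else b)
}

beta <- c(1, 10, 2, 3, 4,  5, 0, 5, 1)
pns  <- rep(c("r1", "r2"), times = c(5, 4))
sbeta <- smooth_runmed(beta, pns, k = 3)
```

The isolated spike at position 2 is replaced by the median of its window, which is the usual reason to prefer a running median over a mean on noisy probe-level estimates.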

SPAN

see DELTA. Only used if smoo="loess".

DELTA

the span parameter used in loess smoothing is SPAN/(DELTA * number of probes in the region). Only used if smoo="loess".
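Because loess's span is the fraction of points used in each local fit, this rule fixes the window at about SPAN/DELTA probes (300/36, roughly 8.3, with the defaults), whatever the region size. Below is a sketch with stats::loess. The helper name, the toy region and the use of loess defaults other than span are assumptions.

```r
# Sketch of the span rule: span = SPAN / (DELTA * n), n = probes in the region
loess_span <- function(n_probes, SPAN = 300, DELTA = 36) {
  SPAN / (DELTA * n_probes)
}

# Toy region: 50 probes 36 bp apart with a noisy step in the middle
set.seed(2)
pos  <- seq(1000, by = 36, length.out = 50)
beta <- c(rep(0, 20), rep(1, 10), rep(0, 20)) + rnorm(50, sd = 0.2)

s <- loess_span(length(pos))        # 300 / (36 * 50)
fit <- loess(beta ~ pos, span = s)  # other loess settings left at defaults
sbeta <- fitted(fit)
# s * length(pos) equals SPAN / DELTA, the approximate probes per local fit
```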

use

If "sbeta", identify DMRs by segmenting the smoothed effect estimates. If "swald", identify DMRs by segmenting the smoothed Wald statistics.

Q

Identify DMRs as runs of consecutive probes whose smoothed effect estimates (if use="sbeta") or smoothed Wald statistics (if use="swald") exceed the quantile given by Q.

min.probes

The minimum allowable number of probes in a DMR candidate.
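Below is a sketch of how Q and min.probes interact on one chromosome. Threshold the smoothed statistic at its Q quantile, collect runs of consecutive probes above the cutoff with rle(), and drop runs shorter than min.probes. Two choices are assumptions rather than documented behavior: applying the cutoff to absolute values, and ignoring region boundaries. The output columns mirror indexStart, indexEnd, nprobes, value and area from the Value section.

```r
# Hypothetical sketch (not charm's code): runs of consecutive probes whose
# statistic exceeds the Q quantile, keeping runs of at least min.probes.
# Assumption: the cutoff is applied to absolute values.
find_runs <- function(stat, Q = 0.99, min.probes = 3) {
  cutoff <- quantile(abs(stat), Q, names = FALSE)
  r <- rle(abs(stat) > cutoff)          # alternating runs of TRUE / FALSE
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- r$values & r$lengths >= min.probes
  s <- starts[keep]
  e <- ends[keep]
  value <- vapply(seq_along(s), function(i) mean(stat[s[i]:e[i]]), numeric(1))
  data.frame(indexStart = s, indexEnd = e, nprobes = e - s + 1,
             value = value, area = (e - s + 1) * value)
}

# A 4-probe block passes min.probes = 3; the single high probe at 45 does not
sbeta <- c(rep(0, 20), rep(5, 4), rep(0, 20), 6, rep(0, 10))
dmrs <- find_runs(sbeta, Q = 0.9, min.probes = 3)
```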

min.value

The minimum allowable average difference in methylation percentage between the two groups if the covariate is categorical, or the minimum average correlation between methylation and the covariate if the covariate is continuous.

keepXY

if FALSE, exclude DMRs in "chrX" and "chrY".

sortBy

column of the DMR table to sort by.

verbose

print progress messages if TRUE.

vfilter

vfilter argument to the sva function: the number of most variable probes to use when building SVs. It must be between 100 and m, where m is the total number of probes. vfilter=NULL by default, which means vfilter=m. Setting it to something smaller, like 100000, will typically yield satisfactory SVs and is advisable when the data are very large and memory is limited (as when the p and/or logitp arguments given to dmrFind are ff_matrix objects).
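Below is a sketch of the probe selection vfilter implies: keep the vfilter rows with the largest variance. Ranking by row variance is an assumption about how sva picks "most variable" probes, and the helper name is hypothetical. The lower bound of 100 follows the text above.

```r
# Sketch: indices of the 'vfilter' most variable rows
# (assumption: variability is measured by row variance)
top_variable <- function(x, vfilter) {
  stopifnot(vfilter >= 100, vfilter <= nrow(x))
  v <- apply(x, 1, var)
  order(v, decreasing = TRUE)[seq_len(vfilter)]
}

# Toy logit-scale data: 500 probes x 6 samples, row i has sd increasing with i
set.seed(3)
logitp <- matrix(rnorm(500 * 6, sd = rep(seq(0.1, 2, length.out = 500), 6)),
                 nrow = 500)
idx <- top_variable(logitp, vfilter = 100)
```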

nsubsets

used if p or logitp is an ff_matrix object. Rather than doing computations on the whole logitp matrix, dmrFind breaks its m rows into chunks of about m/nsubsets rows each. The default is 50; if that still uses too much memory, set it higher. Results do not depend on the value chosen.
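Results do not depend on nsubsets when each row is processed independently. The sketch below shows that with row-wise chunking: an ordinary in-memory matrix stands in for an ff_matrix, and rowMeans stands in for the per-probe work. Both substitutions, and the helper name, are assumptions for illustration.

```r
# Sketch: split m row indices into 'nsubsets' contiguous chunks, process one
# chunk at a time, and concatenate the per-row results in the original order
chunked_apply <- function(x, nsubsets, f) {
  m <- nrow(x)
  chunks <- split(seq_len(m), cut(seq_len(m), breaks = nsubsets, labels = FALSE))
  unlist(lapply(chunks, function(rows) f(x[rows, , drop = FALSE])),
         use.names = FALSE)
}

set.seed(4)
x <- matrix(rnorm(1000 * 8), nrow = 1000)  # stand-in for an ff_matrix
r10 <- chunked_apply(x, 10, rowMeans)
r50 <- chunked_apply(x, 50, rowMeans)
```

Only one chunk's rows are materialized at a time, which is why a larger nsubsets lowers peak memory without changing the answer.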

...

Additional arguments passed to sva()

Value

dmrs

A data frame with all DMR candidates, with columns:

chr: chromosome of DMR
start: start of DMR (bp)
end: end of DMR (bp)
value: average value of the smoothed effect estimate within the DMR if use="sbeta" (the default), or the average value of the smoothed Wald statistic within the DMR if use="swald"
area: nprobes x value
pns: name of the region on the array in which the DMR candidate was identified.
indexStart: index of the first probe in the DMR. This indexes chr, pos, pns, and cleanp
indexEnd: index of the last probe in the DMR. This indexes chr, pos, pns, and cleanp
nprobes: number of probes in the DMR, i.e., indexEnd-indexStart+1
avg: average (across probes) percentage methylation difference within the DMR if covariate is categorical, or average (across probes) correlation between cleanp and covariate if covariate is continuous.
max: maximum (across probes) percentage methylation difference within the DMR if covariate is categorical, or maximum (across probes) correlation between cleanp and covariate if covariate is continuous.
area.raw: nprobes x avg
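The avg, max and area.raw columns, and the min.value filter and sortBy ordering that use them, can be sketched from cleanp as follows. Several details are assumptions, not documented behavior: the level-2 minus level-1 sign convention for a two-level factor, max taken over signed (not absolute) per-probe values, and filtering and sorting on absolute values.

```r
# Sketch of per-DMR summaries (assumptions noted in the comments)
dmr_summary <- function(cleanp, covariate, indexStart, indexEnd) {
  rows <- cleanp[indexStart:indexEnd, , drop = FALSE]
  if (is.factor(covariate)) {
    lv <- levels(covariate)  # assumption: two levels, level 2 minus level 1
    d <- rowMeans(rows[, covariate == lv[2], drop = FALSE]) -
         rowMeans(rows[, covariate == lv[1], drop = FALSE])
  } else {
    d <- apply(rows, 1, cor, y = covariate)  # per-probe correlation
  }
  # assumption: 'max' is the largest signed per-probe value
  c(avg = mean(d), max = max(d), area.raw = length(d) * mean(d))
}

cleanp <- rbind(c(0.20, 0.20, 0.40, 0.40),
                c(0.10, 0.10, 0.20, 0.20),
                c(0.30, 0.30, 0.30, 0.30))
group <- factor(c("A", "A", "B", "B"))
s <- dmr_summary(cleanp, group, 1, 2)

# Hypothetical post-processing mirroring min.value and sortBy="area.raw"
dmrs <- data.frame(indexStart = c(1, 3), indexEnd = c(2, 3))
stats <- t(mapply(function(a, b) dmr_summary(cleanp, group, a, b),
                  dmrs$indexStart, dmrs$indexEnd))
dmrs <- cbind(dmrs, stats)
dmrs <- dmrs[abs(dmrs$avg) >= 0.075, ]            # min.value filter
dmrs <- dmrs[order(-abs(dmrs[["area.raw"]])), ]   # sort by area.raw
```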

pval

a vector of p-values for the t-test at each probe (in the same order as the rows of cleanp)
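A per-probe regression t-test shows where beta, a Wald-type statistic and pval come from. The sketch below fits each probe by ordinary least squares with no borrowing of strength across probes, so it is closer in spirit to use.limma=FALSE. Whether dmrFind computes its statistics exactly this way is an assumption. The design is assumed to be of full rank.

```r
# Sketch: OLS t-test of design column 'coeff' for every probe (row of y).
# Assumes X has full column rank (so qr() applies no pivoting).
probe_ttest <- function(y, X, coeff) {
  qx <- qr(X)
  B <- qr.coef(qx, t(y))                 # ncol(X) x probes coefficients
  res <- t(y) - X %*% B                  # samples x probes residuals
  df <- nrow(X) - qx$rank
  sigma2 <- colSums(res^2) / df          # residual variance per probe
  XtXinv <- chol2inv(qr.R(qx))           # (X'X)^-1
  se <- sqrt(sigma2 * XtXinv[coeff, coeff])
  beta <- B[coeff, ]
  stat <- beta / se                      # t (Wald) statistic
  list(beta = beta, stat = stat, pval = 2 * pt(-abs(stat), df))
}

set.seed(5)
X <- cbind(1, c(30, 45, 52, 61, 38, 47), c(0, 0, 0, 1, 1, 1))
y <- matrix(rnorm(3 * 6), nrow = 3)      # 3 probes x 6 samples
out <- probe_ttest(y, X, coeff = 3)
```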

pns

a vector of probe region names corresponding to the rows of cleanp

chr

a vector of chromosomes corresponding to the rows of cleanp

pos

a vector of positions corresponding to the rows of cleanp

args

A list containing all the arguments provided to dmrFind. If svs was not provided, svs here will be the surrogate variables obtained from sva.

cleanp

The matrix of percentage methylation estimates, after subtracting batch effects. If rob=FALSE, the effects of the other adjustment covariates are removed also.

beta

the effect estimate at each probe.

sbeta

the smoothed effect estimate at each probe.

Details

Identify DMR candidates using a regression-based approach and correcting for batch effects.

Examples


# See qval
