peakreg: Call and merge enriched genomic windows/bins.

Description

Calls and merges enriched bins using the posterior probabilities calculated by the iSeq1 or iSeq2 functions, at a given posterior probability or false discovery rate (FDR) cutoff.

Usage

peakreg(chrpos,count,pp,cutoff,method=c("ppcut","fdrcut"),maxgap=300)

Arguments

chrpos

An n by 3 matrix or data frame. The rows correspond to genomic bins. The first column contains chromosome IDs; the second and third columns contain the start and end positions of the bins, respectively.

count

An n by 2 matrix containing the number of sequence tags in the bins specified by chrpos. The first column contains the tag counts for chain 1 (usually the forward chain), and the second column contains the tag counts for chain 2 (usually the reverse chain). See the documentation of the function 'mergetag' for the definition of chains 1 and 2. The function uses the information in 'count' to find the center of each enriched region, where the true binding site is usually located.

pp

A vector containing the posterior probabilities of the bins being in the enriched state, as returned by the functions iSeq1 or iSeq2.

cutoff

The cutoff value (a scalar) used to call enriched bins. If posterior probability is used as the criterion (method="ppcut"), a bin is called enriched if its pp is greater than the cutoff. If FDR is used as the criterion (method="fdrcut"), a bin is called enriched if the bin-based FDR is less than the cutoff. The FDR is calculated using a direct posterior probability approach (Newton et al., 2004).
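The two criteria can be illustrated on toy data. The following is a minimal base R sketch, not the package's internal code: it assumes the direct posterior probability approach estimates the FDR of a ranked list as the running mean of 1 - pp over the bins in that list. The vector `pp` and all other variable names here are made up for illustration.

```r
# Toy posterior probabilities of enrichment for six bins (made-up values).
pp <- c(0.99, 0.20, 0.95, 0.60, 0.90, 0.05)

# method = "ppcut": a bin is enriched if its pp exceeds the cutoff (0.5 here).
enriched_pp <- pp > 0.5

# method = "fdrcut" (sketch, an assumption about the computation):
# rank bins from most to least likely enriched, then estimate the FDR of the
# top-k list as the mean of (1 - pp) over those k bins.
ord <- order(pp, decreasing = TRUE)
fdr_sorted <- cumsum(1 - pp[ord]) / seq_along(ord)

# Map the estimated FDR back to the original bin order.
fdr <- numeric(length(pp))
fdr[ord] <- fdr_sorted

# A bin is enriched if its bin-based FDR is below the cutoff (0.05 here).
enriched_fdr <- fdr < 0.05
```

On this toy input the FDR criterion is the stricter of the two: four bins pass pp > 0.5, but only the two most confident bins keep the estimated FDR under 0.05.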

method

The criterion used to call enriched bins: 'ppcut' (posterior probability cutoff) or 'fdrcut' (FDR cutoff).

maxgap

The criterion used to merge enriched bins. If the genomic distance between adjacent enriched bins is less than maxgap, the bins are merged into the same enriched region. The default is 300.
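The merging rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the code peakreg runs: the helper `merge_bins` is hypothetical, the enriched bins are assumed to be non-empty and sorted by chromosome and position, and the distance between adjacent bins is assumed to be the start of the next bin minus the end of the previous one.

```r
# Hypothetical helper: merge sorted enriched bins whose gap is below maxgap.
merge_bins <- function(chr, start, end, maxgap = 300) {
  n <- length(start)
  stopifnot(n >= 1, length(chr) == n, length(end) == n)

  # Gap between each bin and the one before it (assumed definition).
  gap <- start[-1] - end[-n]

  # A new region starts at the first bin, at a chromosome change,
  # or when the gap is not less than maxgap.
  new_region <- c(TRUE, chr[-1] != chr[-n] | gap >= maxgap)

  first <- which(new_region)        # first bin of each region
  last <- c(first[-1] - 1, n)       # last bin of each region

  data.frame(chr = chr[first], gstart = start[first], gend = end[last],
             rstart = first, rend = last, stringsAsFactors = FALSE)
}

# Five toy enriched bins on two chromosomes.
chr   <- c("chr1", "chr1", "chr1", "chr1", "chr2")
start <- c(100, 200, 700, 800, 850)
end   <- c(199, 299, 799, 899, 949)

regions <- merge_bins(chr, start, end, maxgap = 300)
```

Here the first two chr1 bins merge, the gap of 401 bases starts a second chr1 region, and the chr2 bin forms its own region, giving three regions in total.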

Value

A data frame with rows corresponding to enriched regions and columns corresponding to the following:

chr: Chromosome IDs.
gstart: The start genomic position of the enriched region.
gend: The end genomic position of the enriched region.
rstart: The row number for gstart in chrpos.
rend: The row number for gend in chrpos.
peakpos: The inferred center (peak) of the enriched region.
meanpp: The mean posterior probability of the merged regions/bins.
ct1: Total tag count for the region from gstart to gend for the chain corresponding to count[,1]; ct1 = sum(count[rstart:rend,1]).
ct2: Total tag count for the region from gstart to gend for the chain corresponding to count[,2]; ct2 = sum(count[rstart:rend,2]).
ct12: ct12 = ct1 + ct2.
sym: A parameter used to measure whether the forward and reverse tag counts are symmetrical (or balanced) in enriched regions. The values range from 0.5 (perfect symmetry) to 0 (complete asymmetry).
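The count columns can be reproduced by hand for a single region, as in the sketch below. The ct1, ct2 and ct12 lines follow the formulas given above. The formula for sym is not stated in this page, so the ratio min(ct1, ct2) / ct12 used here is only an assumption chosen because it spans the documented range of 0 to 0.5; the toy matrix and the row indices are made up.

```r
# Toy tag counts for four bins: column 1 = chain 1, column 2 = chain 2.
count <- cbind(c(5, 12, 9, 1), c(4, 10, 5, 0))

# Suppose one enriched region spans rows 2 to 3 of chrpos (made-up indices).
rstart <- 2
rend <- 3

# Per-chain totals and their sum, as defined in the Value section.
ct1 <- sum(count[rstart:rend, 1])
ct2 <- sum(count[rstart:rend, 2])
ct12 <- ct1 + ct2

# ASSUMPTION: one plausible symmetry ratio with range [0, 0.5];
# the package may compute sym differently.
sym <- min(ct1, ct2) / ct12
```

Under this assumed ratio, a region whose tags all fall on one chain would get sym = 0, and a region with equal counts on both chains would get sym = 0.5.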

References

Mo, Q. (2012). A fully Bayesian hidden Ising model for ChIP-seq data analysis. Biostatistics 13(1), 113-128.

Newton, M., Noueiry, A., Sarkar, D., Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5, 155-176.

Examples

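# Load the example data and pool the ChIP and mock (control) tags.
data(nrsf)
chip = rbind(nrsf$chipFC1592,nrsf$chipFC1862,nrsf$chipFC2002)
mock = rbind(nrsf$mockFC1592,nrsf$mockFC1862,nrsf$mockFC2002)

# Merge tags into bins, then keep only the bins on chr22.
tagct = mergetag(chip=chip,control=mock,maxlen=80,minlen=10,ntagcut=20)
tagct22 = tagct[tagct[,1]=="chr22",]

# Fit the model; res1$pp holds the posterior probabilities of the bins.
res1 = iSeq1(Y=tagct22[,1:4],gap=200,burnin=200,sampling=500,ctcut=0.95,a0=1,b0=1,
 a1=5,b1=1, k0=3,mink=0,maxk=10,normsd=0.1,verbose=FALSE)

# Call enriched regions with a posterior probability cutoff of 0.5.
reg1 = peakreg(tagct22[,1:3],tagct22[,5:6]-tagct22[,7:8],res1$pp,0.5,
        method="ppcut",maxgap=200)

# Call enriched regions with an FDR cutoff of 0.05.
reg2 = peakreg(tagct22[,1:3],tagct22[,5:6]-tagct22[,7:8],res1$pp,0.05,
         method="fdrcut",maxgap=200)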
