enrichedChrRegions(hits1, hits2, chrLength, windowSize=10^4-1, fdr=0.05, nSims=10, mc.cores=1)
GRanges
or RangedData
object.GRanges
(if input is GRanges
) or
RangedData
(if input is RangedData
) containing regions with
smoothed hit count above the specified FDR level.
signature(hits1 = "GRanges", hits2 = "missing")
,
signature(hits1 = "RangedData", hits2 = "missing")
hits1
.
signature(hits1 = "GRanges", hits2 = "GRanges")
,
signature(hits1 = "RangedData", hits2 = "RangedData")
hits1
vs hits2
.
windowSize
.
Notice that only the mid-point of each hit in hits1
(and
hits2
if specified) is used. That is,
hits are not treated as intervals but as being located at a single base
pair.If hits2
is missing, regions with large smoothed number of hits
are selected.
To assess statistical significance, we generate hits (also 1
base pair long) randomly distributed along the genome and
compute the smoothed number of hits.
The number of simulated hits
is set equal to nrow(hits1)
.
The process is repeated
nSims
times, resulting in several independent simulations.
To estimate the FDR, several thresholds to define enriched chromosomal
regions are considered. For each threshold, we count the
number of regions above the threshold in the observed data and in the
simulations. For each threshold t, the FDR is estimated as
the average number of regions with score >=t in the simulations
over the number of regions with score >=t in the observed data.
If hits2
is not missing, the difference in smoothed proportion of
hits (i.e. the number of hits in the window divided by the overall number of hits)
between the two groups is used as a test statistic.
To assess statistical significance, we generate randomly scramble hits
between sample 1 and sample 2 (maintaining the original number of hits
in each sample), and we re-compute the test statistic.
The FDR for a given threshold t is estimated as the
number of bases in the simulated data with test statistic>t divided by
number of bases in observed data with test statistic>t.
The lowest t with estimated FDR below fdr
is used to define
enriched chromosomal regions.
set.seed(1)
st <- round(rnorm(100,500,100))
st[st>10000] <- 10000
strand <- rep(c('+','-'),each=50)
hits1 <- GRanges('chr1', IRanges(st,st+38),strand=strand)
chrLength <- c(chr1=10000)
enrichedChrRegions(hits1,chrLength=chrLength, windowSize=99, nSims=1)
Run the code above in your browser using DataLab