Chicago (version 1.0.3)

peakEnrichment4Features: Enrichment for Features

Description

This function counts the other-ends in a chicagoData object that take part in significant interactions and overlap a set of genomic features. To show how these counts compare with what would be expected if interaction significance had no effect on overlap, it draws random samples of interactions from the non-significant pool and counts their overlaps with the same features, reporting the mean and confidence intervals across samples. Results are returned as a table and plotted in a barplot. The difference between the counts for the significant set and for the random samples can be used as a measure of enrichment for each genomic feature.

Each sample has the same size as the set of significant interactions called, and follows the same distribution of distances between bait and other-end. This is achieved by binning the distances and drawing, from each bin, as many interactions as the significant set contains in that bin.
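The distance-matched sampling described above can be sketched in base R on toy data. This is only an illustration of the idea, not Chicago's internal code: the data frame `toy`, the `overlapsFeature` flag, and the helper `draw_sample()` are hypothetical names introduced here, and the overlap with features is simulated rather than computed from BED files.

```r
set.seed(1)  # reproducible toy data

# Toy interaction table; every name below is illustrative, not Chicago's.
n <- 2000
toy <- data.frame(
  dist  = runif(n, 0, 1e6),    # bait to other-end distance
  score = rexp(n, rate = 0.3)  # interaction score
)
# Simulated flag: does the other-end overlap a feature? (enriched if score > 5)
toy$overlapsFeature <- runif(n) < ifelse(toy$score > 5, 0.4, 0.1)

sig    <- toy[toy$score > 5, ]
nonsig <- toy[toy$score <= 5, ]

# Bin the distance range and count significant interactions per bin
no_bins <- 10
breaks  <- seq(0, 1e6, length.out = no_bins + 1)
sig_bin    <- as.integer(cut(sig$dist, breaks, include.lowest = TRUE))
nonsig_bin <- as.integer(cut(nonsig$dist, breaks, include.lowest = TRUE))
n_per_bin  <- tabulate(sig_bin, nbins = no_bins)

# Draw one sample from the non-significant pool, bin by bin, taking as many
# interactions per bin as the significant set has (capped by the pool size),
# and return its number of feature overlaps.
draw_sample <- function() {
  idx <- unlist(lapply(seq_len(no_bins), function(b) {
    pool <- which(nonsig_bin == b)
    k <- min(n_per_bin[b], length(pool))
    pool[sample.int(length(pool), k)]
  }))
  sum(nonsig$overlapsFeature[idx])
}

sample_number   <- 100
sample_overlaps <- replicate(sample_number, draw_sample())

observed <- sum(sig$overlapsFeature)
c(observed = observed, mean = mean(sample_overlaps), sd = sd(sample_overlaps))
```

Because the toy overlap flag was simulated to be more frequent among high-scoring interactions, the observed count should sit well above the sample mean. On real data, the gap between the two is what the function reports as enrichment.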

Usage

peakEnrichment4Features(x1, folder=NULL, list_frag, no_bins=NULL, sample_number,
    position_otherEnd=NULL, colname_dist=NULL, score=5, colname_score="score",
    min_dist=0, max_dist=NULL, sep="\t", filterB2B=TRUE, b2bcol="isBait2bait",
    unique=TRUE, plot_name=NULL, trans=FALSE, plotPeakDensity=FALSE)

Arguments

x1
a chicagoData object or a data table (data.table) containing other-end IDs.
folder
the name of the folder where the files containing the features of interest are stored.
list_frag
a list where each element is the name of a file containing a feature of interest (e.g. H3K4me1, CTCF, DHS). These files must be in BED format, with no header. Each element of the list must be named.
no_bins
the number of bins into which the range of colname_dist is divided (after colname_dist has been trimmed according to min_dist and max_dist). This determines how many interactions are sampled at each distance from bait. For more details see Note below.
sample_number
the number of samples to be used in the permutation test. Large numbers of samples (around 100) are recommended. Nevertheless, smaller numbers (around 10) reduce processing time and have been shown to give sensible results when compared to large numbers.
position_otherEnd
the name of the file containing the coordinates of the restriction fragments and the corresponding IDs. The coordinates should be "chromosome", "start" and "end", and the ID should be numeric. position_otherEnd only needs to be specified if x1 is not a chicagoData object.
colname_dist
the name of the column that contains the distances between bait and other-end. colname_dist only needs to be specified if x1 is not a chicagoData object.
score
the score threshold above which interactions are called significant.
colname_score
the name of the column that contains the scores used to determine the significance of each interaction.
min_dist
the minimum distance from bait required in the query. If this parameter is set to NULL and trans is set to TRUE, cis interactions are disregarded from the analysis. This parameter is also useful when the user only wants to look at cis distal interactions (very far from bait).
max_dist
the maximum distance from bait required in the query. This parameter is particularly useful when the user only wants to look at cis proximal interactions (interactions surrounding the bait).
sep
the field separator character. Values on each line of the file containing the coordinates of the restriction fragments (given by position_otherEnd) are separated by this character.
filterB2B
a logical value indicating whether bait-to-bait interactions should be removed from the analysis.
b2bcol
the name of the column identifying bait-to-bait interactions in x1.
unique
a logical value indicating whether to remove duplicated other-ends from significant interactions and samples.
plot_name
the name of the file in which to save the resulting plot. This parameter is only required if the user wants to save the plot. Otherwise, the plot is displayed on the screen but not saved.
trans
a logical value indicating whether the enrichment is to be computed for trans interactions. If this parameter is set to TRUE and min_dist is set to NULL, cis interactions are disregarded from the analysis.
plotPeakDensity
a logical value indicating whether to plot the density of interactions with distance. Setting this parameter to TRUE only applies to cis interactions.

Value

a data frame with one row per feature, containing columns for the number of overlaps in the significant interactions, the average number of overlaps in the samples, and the corresponding standard deviations.
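As a rough illustration of how such a table might be summarised, the sketch below derives a fold enrichment and a z-score per feature. The data frame `res`, its column names, and its numbers are all made up for this example; the actual column names returned by peakEnrichment4Features may differ, and the z-score is one possible summary chosen here, not something the function is documented to compute.

```r
# Hypothetical result table: column names and values are assumptions
# for illustration only, not the function's actual output.
res <- data.frame(
  feature     = c("featureA", "featureB"),
  sigOverlaps = c(120, 45),  # overlaps among significant interactions
  sampleMean  = c(60, 44),   # mean overlaps across random samples
  sampleSD    = c(8, 6)      # standard deviation across random samples
)

# Ratio of observed overlaps to the sample average
res$foldEnrichment <- res$sigOverlaps / res$sampleMean

# Distance of the observed count from the sample mean, in sample SDs
res$zScore <- (res$sigOverlaps - res$sampleMean) / res$sampleSD

res
```

In this toy table, featureA has twice as many overlaps as the samples on average, while featureB is close to the sample mean and so shows little evidence of enrichment.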

Examples

data(cdUnitTest)

##modifications to cdUnitTest, ensuring it uses correct design directory
designDir <- file.path(system.file("extdata", package="Chicago"), "unitTestDesign")
cdUnitTest <- modifySettings(cd=cdUnitTest, designDir=designDir)

##get the unit test ChIP tracks
dataPath <- system.file("extdata", package="Chicago")
ChIPdir <- file.path(dataPath, "unitTestChIP")
dir(ChIPdir)

##get a list of the unit test ChIP tracks
featuresFile <- file.path(ChIPdir, "featuresmESC.txt")
featuresTable <- read.delim(featuresFile, header=FALSE, as.is=TRUE)
featuresList <- as.list(featuresTable$V2)
names(featuresList) <- featuresTable$V1

##test for overlap
peakEnrichment4Features(cdUnitTest, folder=ChIPdir, list_frag = featuresList, no_bins = 500, sample_number = 100)
