This function takes a peak matrix and a GTF annotation file and collapses the peak matrix into a gene activity matrix. It makes the simplifying assumption that all counts in the gene body, plus a specified number of bases upstream and/or downstream, should be attributed to that gene.
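The collapsing idea described above can be illustrated with a minimal base-R sketch. It is not this function's internal code. The helper `collapse_peaks_to_genes`, the column names `chr`, `start` and `end`, and the toy data are all hypothetical. The sketch assumes the gene windows have already been extended, and it counts a peak toward every gene window it overlaps.

```r
# Hypothetical helper (not part of any package): sum peak counts into
# gene-level counts. A peak contributes to every gene window it overlaps.
#   peak_counts: numeric matrix, rows = peaks, columns = cells
#   peaks: data.frame with chr, start, end (same row order as peak_counts)
#   genes: data.frame with gene, chr, start, end (windows already extended)
collapse_peaks_to_genes <- function(peak_counts, peaks, genes) {
  stopifnot(nrow(peak_counts) == nrow(peaks))
  out <- matrix(0, nrow = nrow(genes), ncol = ncol(peak_counts),
                dimnames = list(genes$gene, colnames(peak_counts)))
  for (g in seq_len(nrow(genes))) {
    # Overlap: same chromosome and intersecting closed intervals
    hit <- peaks$chr == genes$chr[g] &
      peaks$start <= genes$end[g] &
      peaks$end >= genes$start[g]
    if (any(hit)) {
      out[g, ] <- colSums(peak_counts[hit, , drop = FALSE])
    }
  }
  out
}

# Toy data: three peaks, two cells, two gene windows
peak_counts <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
                      dimnames = list(c("p1", "p2", "p3"), c("cellA", "cellB")))
peaks <- data.frame(chr = c("1", "1", "2"), start = c(100, 5000, 200),
                    end = c(600, 5500, 700), stringsAsFactors = FALSE)
genes <- data.frame(gene = c("GeneA", "GeneB"), chr = c("1", "2"),
                    start = c(1, 500), end = c(1000, 3000),
                    stringsAsFactors = FALSE)
gene_counts <- collapse_peaks_to_genes(peak_counts, peaks, genes)
# GeneA collects peak p1 only (1, 4); GeneB collects peak p3 (3, 6)
```

The gene-by-peak loop is quadratic and only suits toy inputs; for genome-scale data, an interval-overlap routine such as `GenomicRanges::findOverlaps` is the usual choice.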
CreateGeneActivityMatrix(
  peak.matrix,
  annotation.file,
  seq.levels = c(1:22, "X", "Y"),
  include.body = TRUE,
  upstream = 2000,
  downstream = 0,
  verbose = TRUE
)
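The default `seq.levels = c(1:22, "X", "Y")` is coerced to the character strings "1" through "22", "X" and "Y". These match Ensembl-style chromosome names but not UCSC-style names such as "chr1". The base-R sketch below shows the filtering idea only, not the function's internal code. The `genes` table and the `ucsc.levels` variable are hypothetical.

```r
# The default levels become character: "1", ..., "22", "X", "Y"
seq.levels <- c(1:22, "X", "Y")

# Hypothetical gene table mixing naming styles
genes <- data.frame(gene = c("GeneA", "GeneB", "GeneC"),
                    chr = c("1", "chr1", "MT"),
                    stringsAsFactors = FALSE)

# Keep only genes on the requested sequence levels; "chr1" and "MT" are dropped
kept <- genes[genes$chr %in% seq.levels, ]

# For a GTF with UCSC-style names, the levels would need a "chr" prefix instead
ucsc.levels <- paste0("chr", seq.levels)
```

If the chromosome names in the GTF do not match `seq.levels`, the filter silently removes the affected genes, so checking the naming style of the annotation file first is worthwhile.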
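peak.matrix: Matrix of peak counts
annotation.file: Path to the GTF annotation file
seq.levels: Sequence levels to keep (usually chromosome names)
include.body: Whether to include the gene body
upstream: Number of bases upstream of the gene to consider
downstream: Number of bases downstream of the gene to consider
verbose: Whether to print progress messages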
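The interaction of `include.body`, `upstream` and `downstream` can be sketched in base R. This page does not confirm two assumptions behind the sketch. First, upstream and downstream are measured relative to each gene's strand. Second, `include.body = FALSE` yields a promoter-style window around the transcription start site. The helper `gene_window` is hypothetical.

```r
# Hypothetical helper (not part of any package): compute the window
# attributed to each gene, assuming strand-aware extension.
#   start, end: gene coordinates (1-based, start <= end)
#   strand: "+" or "-" (anything other than "-" is treated as "+")
gene_window <- function(start, end, strand, include.body = TRUE,
                        upstream = 2000, downstream = 0) {
  plus <- strand != "-"
  if (include.body) {
    # Gene body extended upstream/downstream relative to the gene's strand
    win_start <- ifelse(plus, start - upstream, start - downstream)
    win_end <- ifelse(plus, end + downstream, end + upstream)
  } else {
    # Promoter-style window around the transcription start site (TSS)
    tss <- ifelse(plus, start, end)
    win_start <- ifelse(plus, tss - upstream, tss - downstream)
    win_end <- ifelse(plus, tss + downstream, tss + upstream)
  }
  # Clamp at position 1 so windows never run off the chromosome start
  data.frame(start = pmax(win_start, 1), end = win_end)
}

w <- gene_window(start = c(10000, 20000), end = c(15000, 25000),
                 strand = c("+", "-"))
p <- gene_window(start = c(10000, 20000), end = c(15000, 25000),
                 strand = c("+", "-"), include.body = FALSE)
```

With the defaults (`upstream = 2000`, `downstream = 0`), the forward-strand gene at 10,000–15,000 maps to 8,000–15,000. The reverse-strand gene at 20,000–25,000 maps to 20,000–27,000, because its upstream side lies at higher coordinates.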