matchGenes: Find and annotate closest genes to genomic regions

Description

Find and annotate closest genes to genomic regions

Usage

matchGenes(x, subject, type = c("any", "fiveprime"), promoterDist = 2500, skipExons = FALSE, verbose = TRUE)

Arguments

x

An IRanges or GenomicRanges object, or a data.frame with columns for start, end, and, optionally, chr or seqnames.

subject

A GenomicRanges object containing transcripts or genes that have been annotated by the function annotateTranscripts.

promoterDist

Anything within this distance of the transcription start site (TSS) will be considered a promoter.

type

Whether the distance is computed to any part of the transcript ("any") or only to its five prime end ("fiveprime").

skipExons

Logical value. If 'TRUE', the annotation of exons is skipped, which makes the code slightly faster.

verbose

logical value. If 'TRUE', it writes out some messages indicating progress. If 'FALSE' nothing should be printed.
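
To make the difference between the two settings of type concrete, here is a minimal base-R sketch. The helper dist_to_transcript is hypothetical and is not part of the package; it measures a simplified gap between closed intervals on a single chromosome, and the promoterDist check at the end reflects only an assumed reading of that argument (an upstream region within that distance of the TSS).

```r
# Hypothetical helper, NOT part of the package: simplified distance from one
# region to one transcript, illustrating type = "any" versus "fiveprime".
dist_to_transcript <- function(r_start, r_end, tx_start, tx_end,
                               strand = "+", type = c("any", "fiveprime")) {
  type <- match.arg(type)
  if (type == "fiveprime") {
    # Collapse the transcript to its 5' end: start on "+", end on "-".
    five <- if (strand == "+") tx_start else tx_end
    tx_start <- five
    tx_end <- five
  }
  # Gap between two closed intervals; 0 when they overlap.
  max(0, tx_start - r_end, r_start - tx_end)
}

# A region lying inside a "+" strand transcript that spans 1000-5000.
d_any  <- dist_to_transcript(3000, 3200, 1000, 5000, "+", "any")
d_five <- dist_to_transcript(3000, 3200, 1000, 5000, "+", "fiveprime")

# An upstream region, compared against the default promoterDist of 2500.
# Assumption: "promoter" means upstream and within promoterDist of the TSS.
promoterDist <- 2500
d_upstream <- dist_to_transcript(200, 400, 1000, 5000, "+", "fiveprime")
within_promoter_dist <- d_upstream <= promoterDist
```

With type = "any" the overlapping region is at distance zero, while with type = "fiveprime" the same region is measured from the 5' end of the transcript.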

Value

A data frame with one row per region in x and the following columns:

name: Symbol of nearest gene
annotation: RefSeq ID
description: a factor with levels c("upstream", "promoter", "overlaps 5'", "inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons", "overlaps 3'", "close to 3'", "downstream", "covers")
region: a factor with levels c("upstream", "promoter", "overlaps 5'", "inside", "overlaps 3'", "close to 3'", "downstream", "covers")
distance: distance before 5' end of gene
subregion: a factor with levels c("inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons")
insideDistance: distance past 5' end of gene
exonnumber: which exon
nexons: number of exons
UTR: a factor with levels c("inside transcription region", "5' UTR", "overlaps 5' UTR", "3'UTR", "overlaps 3'UTR", "covers transcription region")
strand: "+" or "-"
geneL: the gene length
codingL: the coding length
Entrez: Entrez ID of closest gene
subjectHits: Index in subject of hit
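
The columns listed above lend themselves to ordinary data-frame operations. The sketch below uses a toy data frame as a stand-in for a real result: the gene names and all values are invented, and only a few of the columns are included. It assumes the result behaves like a plain data.frame with region stored as a factor.

```r
# Toy stand-in for a matchGenes() result; every value here is invented.
region_levels <- c("upstream", "promoter", "overlaps 5'", "inside",
                   "overlaps 3'", "close to 3'", "downstream", "covers")
tab <- data.frame(
  name = c("GENE_A", "GENE_B", "GENE_C", "GENE_A"),
  region = factor(c("promoter", "inside", "upstream", "inside"),
                  levels = region_levels),
  distance = c(300, 0, 12000, 0),
  subjectHits = c(1L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Count query regions per region class (unused levels appear with count 0).
region_counts <- table(tab$region)

# Keep only the regions annotated as promoters.
promoters <- tab[tab$region == "promoter", ]

# Number of query regions assigned to each nearest gene.
hits_per_gene <- table(tab$name)
```

Because subjectHits indexes into subject, the same column could be used to pull further annotation for each hit from the subject object.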

Details

This function runs nearest and then annotates the relationship between the region and the transcript/gene that is closest. Many details are provided on this relationship, as described in the Value section.
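
The two steps (find the closest transcript, then describe the relationship) can be sketched in base R. This is only an illustration under simplifying assumptions: one chromosome, "+" strand transcripts, and a hypothetical nearest_index helper standing in for the nearest call; the coarse labels are not the function's actual annotation logic.

```r
# Hypothetical base-R stand-in for a nearest-feature search on one chromosome.
nearest_index <- function(q_start, q_end, s_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    # Gap from query i to every subject; 0 when the intervals overlap.
    gap <- pmax(0, s_start - q_end[i], q_start[i] - s_end)
    which.min(gap)  # first subject with the smallest gap
  }, integer(1))
}

# Invented query regions and "+" strand transcripts.
q_start <- c(100, 2000, 9000)
q_end   <- c(200, 2100, 9500)
s_start <- c(1000, 6000)
s_end   <- c(3000, 8000)
s_name  <- c("GENE_A", "GENE_B")

# Step 1: index of the closest transcript for each region.
hits <- nearest_index(q_start, q_end, s_start, s_end)

# Step 2: a coarse relationship label (simplified, "+" strand only).
relation <- ifelse(q_end < s_start[hits], "upstream",
            ifelse(q_start > s_end[hits], "downstream", "overlaps"))

sketch <- data.frame(name = s_name[hits], region = relation,
                     subjectHits = hits, stringsAsFactors = FALSE)
```

The real function distinguishes many more cases (promoter, exon and intron overlaps, UTRs), which is why the subject must first be annotated by annotateTranscripts.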

Examples

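## Not run: 
library(bumphunter)
library("TxDb.Hsapiens.UCSC.hg19.knownGene")

# Read the first 100 model-based CpG islands for hg19 as a GRanges object
islands <- makeGRangesFromDataFrame(
    read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg19.txt")[1:100, ])

# Annotate transcripts, then match each island to its closest gene
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
tab <- matchGenes(islands, genes)
## End(Not run)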