Learn R Programming

bumphunter (version 1.12.0)

matchGenes: Find and annotate closest genes to genomic regions

Description

Find and annotate closest genes to genomic regions

Usage

matchGenes(x, subject, type = c("any", "fiveprime"), promoterDist = 2500, skipExons = FALSE, verbose = TRUE)

Arguments

x
An IRanges or GenomicRanges object, or a data.frame with columns for start, end, and, optionally, chr or seqnames.
subject
An GenomicRanges object containing transcripts or genes that have been annotated by the function annotateTranscripts.
promoterDist
Anything within this distance to the transcription start site (TSE) will be considered a promoter.
type
Should the distance be computed to any part of the transcript or the five prime end.
skipExons
Should the annotation of exons be skipped. Skipping this part makes the code slightly faster.
verbose
logical value. If 'TRUE', it writes out some messages indicating progress. If 'FALSE' nothing should be printed.

Value

A data frame with one row for each query and with columns c("name", "annotation", "description", "region", "distance", "subregion", "insideDistance", "exonnumber", "nexons", "UTR", "strand", "geneL", "codingL","Entrez","subhectHits"). The first column is the _gene_ nearest the query, by virtue of it owning the transcript determined (or chosen by nearest) to be nearest the query. Note that the nearest gene to a given query, in column 3, may not be unique and we arbitrarily chose one as done by default by nearest.The "distance" column is the distance from the query to the 5' end of the nearest transcript, so may be different from the distance computed by nearest to that transcript, as a range.
name
Symbol of nearest gene
annotation
RefSeq ID
description
a factor with levels c("upstream", "promoter", "overlaps 5'", "inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons", "overlaps 3'", "close to 3'", "downstream", "covers")
region
a factor with levels c("upstream", "promoter", "overlaps 5'", "inside", "overlaps 3'", "close to 3'", "downstream", "covers")
distance
distance before 5' end of gene
subregion
a factor with levels c("inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons")
insideDistance
distance past 5' end of gene
exonnumber
which exon
nexons
number of exons
UTR
a factor with levels c("inside transcription region", "5' UTR", "overlaps 5' UTR", "3'UTR", "overlaps 3'UTR", "covers transcription region")
strand
"+" or "-"
geneL
the gene length
codingL
the coding length
Entrez
Entrez ID of closest gene
subjectHits
Index in subject of hit

Details

This function runs nearest and then annotates the the relationship between the region and the transcript/gene that is closest. Many details are provided on this relationship as described in the next section.

See Also

annotateNearest, annotateTranscripts

Examples

Run this code
## Not run: 
#     islands=makeGRangesFromDataFrame(read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg19.txt")[1:100,])
#     library("TxDb.Hsapiens.UCSC.hg19.knownGene")
#     genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
#     tab<- matchGenes(islands,genes)
# ## End(Not run)

Run the code above in your browser using DataLab