
refGenome (version 1.7.7)

overlapJuncs: overlapJuncs function

Description

Overlaps query gap-sites (from BAM alignment data) with annotated splice junctions (from a reference genome annotation).

Usage

overlapJuncs(qry, junc)

Arguments

qry

data.frame. Table with query ranges. qry should have columns 'id', 'seqid', 'lstart', 'lend', 'rstart', 'rend'.

junc

refJunctions. Object which contains a table of splice junctions in the reference genome.
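Because overlapJuncs() expects a plain data.frame with six named columns, a quick column check before the call can catch typos early. The helper below is a minimal sketch; `check_qry` is a hypothetical name and is not part of refGenome. It only tests for the column names listed under qry and does not validate coordinate values.

```r
# Hypothetical helper (not part of refGenome): check that a query table
# has the columns that overlapJuncs() documents for its qry argument.
check_qry <- function(qry) {
  required <- c("id", "seqid", "lstart", "lend", "rstart", "rend")
  if (!is.data.frame(qry))
    stop("qry must be a data.frame")
  missing <- setdiff(required, names(qry))
  if (length(missing) > 0)
    stop("qry is missing column(s): ", paste(missing, collapse = ", "))
  invisible(TRUE)
}

# One gap-site: left region 11800-12000, right region 12200-12250
qry <- data.frame(id = 1L, seqid = "1",
                  lstart = 11800L, lend = 12000L,
                  rstart = 12200L, rend = 12250L)
check_qry(qry)
```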

Value

The function returns a data.frame with the following columns:

qid

Integer. Query id value from the qry table.

refid

Integer. Reference id of the best hit in the junctions object.

ldiff

Integer. Difference between the lend values in qry and in the junc table for the best-hit (refid) record.

rdiff

Integer. Difference between the rstart values in qry and in the junc table for the best-hit (refid) record.

nref

Integer. Number of junc records which possibly overlap with the query item. nref=0 when no overlap has been found for the query.

sod

Integer. Sum of distances (= abs(ldiff) + abs(rdiff)). sod=0 when the query exactly hits an annotated site; sod=NA when no overlap has been found for the query.

first_refid

Integer. Id of the first overlapping record in the junc table.

last_refid

Integer. Id of the last overlapping record in the junc table.

nadv

Integer. Number of advancing iterations during search for

strand

Strand value derived from the annotation.

gene_id

Gene id from the refJunctions or genpos table.

transcript_id

Transcript id from the refJunctions table.

gene_name

Gene name from the refJunctions or genpos table.

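The nref and sod columns together tell whether a query hit an annotated site exactly, approximately, or not at all. The sketch below labels result rows using only those documented meanings; `classify_hits` and the labels "exact", "inexact" and "no overlap" are hypothetical, and the small data.frame is a mock standing in for real overlapJuncs() output.

```r
# Hypothetical helper (not part of refGenome): label each result row
# using the documented meaning of nref (0 = no overlap) and sod (0 = exact).
classify_hits <- function(res) {
  ifelse(res$nref == 0 | is.na(res$sod), "no overlap",
         ifelse(res$sod == 0, "exact", "inexact"))
}

# Mock result rows; sod is derived as abs(ldiff) + abs(rdiff)
res <- data.frame(qid   = 1:3,
                  nref  = c(0L, 2L, 1L),
                  ldiff = c(NA, 0L, -3L),
                  rdiff = c(NA, 0L, 4L))
res$sod <- abs(res$ldiff) + abs(res$rdiff)
classify_hits(res)
# "no overlap" "exact" "inexact"
```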

Details

The function finds optimal overlapping hits for alignment gap-sites among annotated splice sites. A gap-site is the combination of two genomic regions (= exons) which enclose an intermediary region (= intron). The function identifies junction records which overlap with the given gap-site (= hits) and selects the junction with the optimal fit. The goodness of fit is measured by the distance between the inner gap boundaries (= the splice sites) of the query and of the junction record. The junction with the minimal sum of upstream and downstream distances is selected. The selection of the best hit depends on the record order: the function uses a version of the junction table which is sorted by lstart and rend.
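The selection rule described above can be illustrated for a single gap-site with plain data.frames. This is a minimal sketch, not refGenome's implementation: `best_junction_hit` is a hypothetical name, the overlap test (same seqid and overlapping [lstart, rend] spans) and the sign convention (query minus junction) are assumptions, and ties in sod go to the first candidate after sorting by lstart and rend.

```r
# Minimal sketch (not refGenome's implementation): pick the best-fitting
# junction for one gap-site by minimising abs(ldiff) + abs(rdiff).
# Assumed candidate rule: same seqid and overlapping [lstart, rend] spans.
best_junction_hit <- function(q, junc) {
  # Sort by lstart, then rend; which.min() returns the first minimum
  junc <- junc[order(junc$lstart, junc$rend), ]
  cand <- junc[as.character(junc$seqid) == as.character(q$seqid) &
               junc$lstart <= q$rend & junc$rend >= q$lstart, ]
  if (nrow(cand) == 0)
    return(data.frame(qid = q$id, refid = NA_integer_, ldiff = NA_integer_,
                      rdiff = NA_integer_, nref = 0L, sod = NA_integer_))
  ldiff <- q$lend - cand$lend      # distance at the upstream splice site
  rdiff <- q$rstart - cand$rstart  # distance at the downstream splice site
  sod <- abs(ldiff) + abs(rdiff)
  best <- which.min(sod)
  data.frame(qid = q$id, refid = cand$id[best], ldiff = ldiff[best],
             rdiff = rdiff[best], nref = nrow(cand), sod = sod[best])
}

junc <- data.frame(id = 1:3, seqid = "1",
                   lstart = c(11700L, 11850L, 30000L),
                   lend   = c(12010L, 12000L, 30100L),
                   rstart = c(12180L, 12200L, 30200L),
                   rend   = c(12300L, 12260L, 30300L))
q <- data.frame(id = 2L, seqid = "1", lstart = 11800L, lend = 12000L,
                rstart = 12200L, rend = 12250L)
best_junction_hit(q, junc)
# refid 2, ldiff 0, rdiff 0, nref 2, sod 0
```

Here junctions 1 and 2 both overlap the query, with sod values 30 and 0, so junction 2 is chosen as an exact hit.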

Examples

# NOT RUN {
##-------------------------------------##
## A) Example query data
##-------------------------------------##
##                          1       2       3       4       5       6       7 ##
qry <- data.frame(id = 1:7, seqid = "1",
            lstart = c(10100L, 11800L, 12220L, 12220L, 12220L, 32000L, 40000L),
            lend =   c(10100L, 12000L, 12225L, 12227L, 12227L, 32100L, 40100L),
            rstart = c(10200L, 12200L, 12057L, 12613L, 12650L, 32200L, 40200L),
            rend =   c(10300L, 12250L, 12179L, 12620L, 12700L, 32300L, 40300L))
##                          1       2       3       4       5       6       7 ##

##-------------------------------------##
## B) Example reference data
##-------------------------------------##
# B.1) Ensembl genome:
library(refGenome)
ensfile <- system.file("extdata", "hs.ensembl.62.small.RData", package="refGenome")
ens <- loadGenome(ensfile)
gp <- getGenePositions(ens)
# B.2) Ensembl junctions:
junc <- getSpliceTable(ens)
##-------------------------------------##
## C) Do overlap
##-------------------------------------##
res <- overlapJuncs(qry, junc)
# }
