qProfile: Quantify alignments by relative position

Description

Quantify alignments from sequencing data, relative to their position in query regions.

Usage

qProfile(proj,  query, upstream=1000, downstream=upstream, selectReadPosition=c("start", "end"), shift=0L, orientation=c("any", "same", "opposite"), useRead=c("any", "first", "last"), auxiliaryName=NULL, mask=NULL, collapseBySample=TRUE, includeSpliced=TRUE, includeSecondary=TRUE, mapqMin=0L, mapqMax=255L, absIsizeMin=NULL, absIsizeMax=NULL, maxInsertSize=500L, clObj=NULL)

Arguments

proj

a qProject object representing a sequencing experiment as returned by qAlign

query

an object of type GRanges with the regions to be profiled. All regions in query will be anchored at their biological start position (start(query) for regions on strand “+” or “*”, end(query) for regions on strand “-”). This position will become position zero in the return value.

upstream

An “integer” vector of length one or the same length as query indicating the number of bases upstream of the anchor position to include in the profile.

downstream

An “integer” vector of length one or the same length as query indicating the number of bases downstream of the anchor position to include in the profile.

selectReadPosition

defines the part of the alignment that has to be contained within a query region to produce an overlap (see Details), and that is used to calculate the relative position within the query region. Possible values are:

start (default): start of the alignment
end: end of the alignment

shift

controls the shifting alignments towards their 3'-end before quantification. shift can be one of:

an “integer” vector of the same length as the number of alignment files
a single “integer” value
the character string "halfInsert" (only available for paired-end experiments)

The default of 0 will not shift any alignments.

orientation

sets the required orientation of the alignments relative to the query region in order to be counted, one of:

any (default): count alignment on the same and opposite strand
same : count only alignment on the same strand
opposite : count only alignment on the opposite strand

useRead

For paired-end experiments, selects the read mate whose alignments should be counted, one of:

any (default): count all alignments
first : count only alignments from the first read
last : count only alignments from the last read

auxiliaryName

which bam files to use in an experiments with auxiliary alignments (see Details).

mask

If not NULL, a GRanges object with reference regions to be masked, i.e. excluded from the quantification, such as unmappable or highly repetitive regions (see Details).

collapseBySample

if TRUE (the default), sum alignment counts from bam files with the same sample name.

includeSpliced

if TRUE (the default), include spliced alignments when counting. A spliced alignment is defined as an alignment with a gap in the read of at least 60 bases.

includeSecondary

if TRUE (the default), include alignments with the secondary bit (0x0100) set in the FLAG when counting.

mapqMin

minimal mapping quality of alignments to be included when counting (mapping quality must be greater than or equal to mapqMin). Valid values are between 0 and 255. The default (0) will include all alignments.

mapqMax

maximal mapping quality of alignments to be included when counting (mapping quality must be less than or equal to mapqMax). Valid values are between 0 and 255. The default (255) will include all alignments.

absIsizeMin

For paired-end experiments, minimal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included when counting. Valid values are greater than 0 or NULL (default), which will not apply any minimum insert size filtering.

absIsizeMax

For paired-end experiments, maximal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included when counting. Valid values are greater than 0 or NULL (default), which will not apply any maximum insert size filtering.

maxInsertSize

Maximal fragment size of the paired-end experiment. This parameter is used if shift="halfInsert" and will ensure that query regions are made wide enough to emcompass all alignment pairs whose mid falls into the query region. The default value is 500 bases.

clObj

a cluster object to be used for parallel processing (see ‘Details’).

Value

A list of matrices with length(unique(names(query))) rows with profile names, and max(upstream)+max(downstream)+1 columns indicating relative position.The first list element is called “coverage” and contains, for each profile and relative position, the number of overlapping regions that contributed to the profile.Subsequent list elements contain the alignment counts for individual sequence files (collapseBySample=FALSE) or samples (collapseBySample=TRUE) in proj.For projects with allele-specific quantification, i.e. if a file with single nucleotide polymorphisms was supplied to the snpFile argument of qAlign, there will be three rows instead of one row with counts per unique region name, with numbers of alignments for Reference, Unknown and Alternative genotypes (suffixed _R, _U and _A).

Details

qProfile is used to count alignments in each sample from a qProject object, relative to their position in query regions. Most arguments are identical to the ones of qCount.

The query argument is a GRanges object that defines the regions for the profile. All regions in query will be aligned to one another at their anchor position, which corresponds to their biological start position (start(query) for regions on strand “+” or “*”, end(query) for regions on strand “-”). This anchor position will be extended (with regard to strand) by the number of bases specified by upstream and downstream. In the return value, the anchor position will be at position zero. Regions with identical names in names{query} will be summed, and profiles will be padded with zeros to accomodate the length of all profiles (max(upstream)+max(downstream)+1).

Examples

Run this code

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# create alignments (single-end experiment)
genomeFile <- "extdata/hg19sub.fa"
sampleFile <- "extdata/samples_chip_single.txt"
proj <- qAlign(sampleFile, genomeFile)

# load transcript start site coordinates
library(rtracklayer)
annotationFile <- "extdata/hg19sub_annotation.gtf"
tssRegions <- import.gff(annotationFile, format="gtf",
                         feature.type="start_codon")

# obtain a combined TSS profile
pr1 <- qProfile(proj, tssRegions)
lapply(pr1, dim)
lapply(pr1, "[", , 1:5)

prComb <- do.call("+", lapply(pr1[-1], function(x) x/pr1[[1]]))
barplot(prComb, xlab="Position", ylab="Mean no. of alignments")

# obtain TSS profiles for individual regions
names(tssRegions) <- mcols(tssRegions)$transcript_id
pr2 <- qProfile(proj, tssRegions)
lapply(pr2, dim)
lapply(pr2, "[", 1:3, 1:5)

Run the code above in your browser using DataLab