qCount(proj,  query, reportLevel=c(NULL, "gene", "exon","promoter","junction"), selectReadPosition=c("start", "end"), shift=0L, orientation=c("any", "same", "opposite"), useRead=c("any", "first", "last"), auxiliaryName=NULL, mask=NULL, collapseBySample=TRUE, includeSpliced=TRUE, includeSecondary=TRUE, mapqMin=0L, mapqMax=255L, absIsizeMin=NULL, absIsizeMax=NULL, maxInsertSize=500L, clObj=NULL)qProject object representing a
    sequencing experiment as returned by qAlignGRanges, 
    GRangesList or 
    TxDb with the regions to be
    quantified. The type of query will determine the mode of
    quantification (see Details). For
    reportLevel="junction", query is ignored and can also
    be NULL.query of type
    TxDb or NULL), one of
    gene (default): one value per gene
      exon: one value per exon
      promoter: one value per promoter
      junction: one count per detected exon-exon junction
      (query will be ignored in this case)
    
start (default): start of the alignment
      end: end of the alignment
    
shift can be one of:
    "halfInsert" (only available for
      paired-end experiments)
        The default of 0 will not shift any alignments.
any (default): count alignment on the same and opposite strand
      same : count only alignment on the same strand
      opposite : count only alignment on the opposite strand
    
any (default): count all alignments
      first : count only alignments from the first read
      last : count only alignments from the last read
    
NULL, a GRanges
    object with reference regions to be masked, i.e. excluded from the
    quantification, such as unmappable or highly repetitive regions (see
    Details).TRUE (the default), sum alignment
    counts from bam files with the same sample name.TRUE (the default), include spliced
    alignments when counting. A spliced alignment is defined as an
    alignment with a gap in the read of at least 60 bases.TRUE (the default), include alignments
    with the secondary bit (0x0100) set in the FLAG when counting.mapqMin). Valid values are between 0 and 255. The default (0)
    will include all alignments.mapqMax).
    Valid values are between 0 and 255. The default (255) will include
    all alignments.NULL (default),
    which will not apply any minimum insert size filtering.NULL (default),
    which will not apply any maximum insert size filtering.shift="halfInsert" and will
    ensure that query regions are made wide enough to emcompass all
    alignment pairs whose mid falls into the query region. The default
    value is 500 bases.matrix with effective query regions width in the first
  column, and alignment counts in subsequent columns, or a
  GRanges object if reportLevel="junction".The effective query region width returned as first column in the
  matrix is calculated by the number of unique, non-masked bases in the
  reference sequence that contributed to the count of this query
  name (irrespective if the bases were covered by alignments or not).
  An effective width of zero indicates that the region was fully
  masked and will have zero counts in all samples.The alignment counts in the matrix are contained from column two
  onwards. For projects with allele-specific quantification, i.e. if a
  file with single nucleotide polymorphisms was supplied to the
  snpFile argument of qAlign, there will be
  three columns per bam file (number of alignments for Reference,
  Unknown and Alternative genotypes, with suffixed _R, _U and
  _A). Otherwise there is a single columns per bam file.If collapseBySample=TRUE, groups of bam files with identical 
  sample name are combined by summing their alignment counts.For reportLevel="junction", the return value is a
  GRanges object. The start and end coordinates correspond to the
  first and last base in each detected intron. Plus- and minus-strand
  alignments are quantified separately, so that in an unstranded RNA-seq
  experiment, the same intron may be represented twice; once for each
  strand. The counts for each sample are contained in the mcols
  of the GRanges object.
qCount is used to count alignments in each sample from a
  qProject object. The features to be quantified, together with
  the mode of quantification, are specified by the query
  argument, which is one of:
  GRanges: Overlapping alignments
    are counted separately for each coordinate region. If multiple
    regions have identical names, their counts will be summed, counting
    each alignment only once even if it overlaps more than one of these
    regions. Alignments may be counted more than once if they overlap
    multiple regions that have different names.
    This mode is for example used to quantify ChIP-seq alignments in
    promoter regions, or gene expression levels in an RNA-seq experiment
    (using a query with exon regions named by gene).
    GRangesList: Alignments are
    counted and summed for each list element in query if they
    overlap with any of the regions contained in the list element. The
    order of the list elements defines a hierarchy for quantification:
    Alignment will only be counted for the first element (the one with
    the lowest index in query) that they overlap, but not for any
    potential further list elements containing overlapping regions.
    This mode can be used to hierarchically and uniquely count (assign)
    each alignment to a one of several groups of regions (the elements
    in query), for example to estimate the fractions of different
    classes of RNA in an RNA-seq experiment (rRNA, tRNA, snRNA, snoRNA,
    mRNA, etc.)
    TxDb: Used to extract
    regions from annotation and report alignment counts depending on the
    value of reportLevel. If reportLevel="exon",
    alignments overlapping each exon in query are counted.
    If reportLevel="gene", alignment counts for all exons of a
    gene will be summed, counting each alignment only once even if it
    overlaps multiple annotated exons of a gene. These are useful to
    calculate exon or gene expression levels in RNA-seq experiments
    based on the annotation in a TxDb object. If
    reportLevel="promoter", the promoters function from package
    GenomicFeatures is used with default arguments to extract
    promoter regions around transcript start sites, e.g. to quantify
    alignments inf a ChIP-seq experiment.
    NULL for
    reportLevel="junction": The query argument is ignored
    if reportLevel is set to "junction", and qCount
    will count the number of alignments supporting each exon-exon
    junction detected in any of the samples in proj. The
    arguments selectReadPosition, shift,
    orientation, useRead and mask will have no
    effect in this quantification mode.
  The additional arguments allow to fine-tune the quantification:
  selectReadPosition defines the part of the alignment that has
  to be contained within a query region for an overlap. The values
  start (default) and end refer to the biological start
  (5'-end) and end (3'-end) of the alignment. For example, the
  start of an alignment on the plus strand is its leftmost
  (lowest) base, and the end of an alignment on the minus strand
  is also the leftmost base.
  shift allows on-the-fly shifting of alignments towards their
  3'-end prior to overlap determination and counting. This can be
  helpful to increase resolution of ChIP-seq experiments by moving
  alignments by half the immuno-precipitated fragment size towards the
  middle of fragments. shift is either an integer vector
  with one value per alignment file in proj, or a single
  integer value, in which case all alignment files will be
  shifted by the same value. For paired-end experiments, it can be
  alternatively set to "halfInsert", which will estimate the true
  fragment size from the distance between aligned read pairs and shift
  the alignments accordingly.
  
  orientation controls the interpretation of alignment strand
  when counting, relative to the strand of the query region. any
  will count all overlapping alignments, irrespective of the alignment
  strand (e.g. used in an unstranded RNA-seq experiment). same
  will only count the alignments on the same strand as the query region
  (e.g. in a stranded RNA-seq experiment), and opposite will only
  count the alignments on the opposite strand from the query region
  (e.g. to quantify anti-sense transcription in a stranded RNA-seq
  experiment).
  
  includeSpliced and includeSecondary can be used to
  include or exclude spliced or secondary alignments,
  respectively. mapqMin and mapqMax allow to select alignments
  based on their mapping qualities. mapqMin and mapqMax can
  take integer values between 0 and 255 and equal to
  $-10
  log10 Pr(mapping position is wrong)$, rounded to the nearest
  integer. A value 255 indicates that the mapping quality is not available.
  In paired-end experiments, useRead allows to quantify either
  all alignments (useRead="any"), or only the first
  (useRead="first") or last (useRead="last") read from a
  read pair or read group. Note that for useRead="any" (the
  default), an alignment pair that is fully contained within a query
  region will contribute two counts to the value of that
  region. absIsizeMin and absIsizeMax can be used to
  select alignments based on their insert size (TLEN field in SAM Spec
  v1.4).
  auxiliaryName selects the reference sequence for which
  alignments should be quantified. NULL (the default) will
  select alignments against the genome. If set to a character string
  that matches one of the auxiliary target names (as specified in
  the auxiliaryFile argument of qAlign), the
  corresponding alignments will be counted.
  mask can be used to specify a
  GRanges object with regions in the
  reference sequence to be excluded from quantification. The regions
  will be considered unstranded (strand="*"). Alignments that
  overlap with a region in mask will not be counted. Masking may
  reduce the effective width of query regions reported by qCount,
  even down to zero for regions that are fully contained in mask.
  
  
  
  If clObj is set to an object that inherits from class
  cluster, for example an object returned by
  makeCluster from package parallel, the
  quantification task is split into multiple chunks and processed in
  parallel using clusterMap. Currently, not all
  tasks will be efficiently parallelized: For example, a single query
  region and a single (group of) bam files will not be split into
  multiple chunks.
qAlign,
  qProject,
  makeCluster from package parallel
library(GenomicRanges)
library(Biostrings)
library(Rsamtools)
# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
# load genome sequence
genomeFile <- "extdata/hg19sub.fa"
gseq <- readDNAStringSet(genomeFile)
chrRegions <- GRanges(names(gseq), IRanges(start=1,width=width(gseq),names=names(gseq)))
# create alignments (paired-end experiment)
sampleFile <- "extdata/samples_rna_paired.txt"
proj <- qAlign(sampleFile, genomeFile, splicedAlignment=TRUE)
# count reads using a "GRanges" query
qCount(proj, query=chrRegions)
qCount(proj, query=chrRegions, useRead="first")
# hierarchical counting using a "GRangesList" query
library(rtracklayer)
annotationFile <- "extdata/hg19sub_annotation.gtf"
gtfRegions <- import.gff(annotationFile, format="gtf", feature.type="exon")
names(gtfRegions) <- mcols(gtfRegions)$source
gtfRegionList <- split(gtfRegions, names(gtfRegions))
names(gtfRegionList)
res3 <- qCount(proj, gtfRegionList)
res3
# gene expression levels using a "TxDb" query
library("GenomicFeatures")
genomeRegion <- scanFaIndex(genomeFile)
chrominfo <- data.frame(chrom=as.character(seqnames(genomeRegion)),
                        length=end(genomeRegion),
                        is_circular=rep(FALSE, length(genomeRegion)))
txdb <- makeTxDbFromGFF(annotationFile, 
                        format="gtf", 
                        chrominfo=chrominfo,
                        dataSource="Ensembl modified",
                        organism="Homo sapiens")
res4 <- qCount(proj, txdb, reportLevel="gene")
res4
# exon-exon junctions
res5 <- qCount(proj, NULL, reportLevel="junction")
res5
Run the code above in your browser using DataLab