
AlignedData
new("AlignedData", readID, seqnames, strand, position, mapq,
isize, qwidth, flag, mrnm, notAligned, pairedEnds, weight)
, or by using the
dedicated constructor that builds an object from a file.readAlignedData(folderName, fileName, fileType="BAM",
pairedEnds=FALSE)
readID(object)
, readID(object) <- value
seqnames(object))
, seqnames(object) <- value
strand(object)
, strand(object) <- value
position(object)
, position(object) <- value
mapq(object)
, mapq(object) <- value
isize(object)
, isize(object) <- value
qwidth(object)
, qwidth(object) <- value
flag(object)
, flag(object) <- value
mrnm(object)
, mrnm(object) <- value
notAligned(object)
, notAligned(object) <- value
pairedEnds(object)
, pairedEnds(object) <- value
weight(object)
, weight(object) <- value
show(object)
checkPairsOK(object)
length(object)
dropChromosomePattern(object, pattern, quiet=TRUE)
filterInsertSize(object, rangeMin, rangeMax, includeLower=FALSE,
quiet=TRUE)
filterReadSize(object, rangeMin, rangeMax, includeLower=FALSE,
quiet=TRUE)
getOrphansIndexes(object, quiet=TRUE)
cleanNonSimplePairs(object, quiet=TRUE)
normalizeChrNames(object, chrPrefix, chrSuffix)
sortByPairs(object, quiet=TRUE)
dropUndefinedStrand, quiet=TRUE
This class is a simplistic representation of aligned reads for users aiming at using piling options offered by Pasha package. Secondary alignments, reads aligned with insertions/deletions (non simple CIGAR), and not aligned reads are discarded by the constructor, and reads with a strand value other than -/+ will raise a warning (strand representation keeps levels c('+', '-', '*') for consistency with bioconductor packages). For other applications requiring exhaustive handling of CIGAR strings (insertions/deletions), undefined strand values, or to keep reads that could not be aligned, the use of GAlignments class family is recommended.
Specific functions are provided for a seamless conversion to/from GAlignments class. It is however important to note that eventual reads with non-simple CIGAR string will be dropped (with a warning) during conversions from GAlignments objects.
Internally, this class defines different kinds of slots. First, slots where a value is mandatory for each read. Second, slots with atomic values (one for the whole object : "notAligned", "pairedEnds"), and finally optionnal slots which can be empty (no information) or should provide information for each read ("mapq", "isize", "flag", "mrnm", "weight", "qwidth").
Paired-end specificities
The limitations imposed by the constructor when reading paired-end datasets (no unaligned reads, and no secondary alignments) allow to represent the mate reads association by their shared 'qname' (reported as 'readID'). Altering such IDs can thus break the pairs relation as seen by methods of the class.
Several methods are dedicated for paired-ends datasets (see 'checkPairsOK', 'filterInsertSize', 'getOrphansIndexes', 'cleanNonSimplePairs', 'sortByPairs'). These methods will emit a warning and not process data when called for a single-end object.
readAligned
GAlignments
scanBam
BamFile
# Get the BAM file used in RSamTools as example
BAMfile <- system.file("extdata","ex1.bam",package="Rsamtools")
# Testing the public constructor of AlignedData class (from Pasha)
alignedDataObject <- readAlignedData(folderName="" ,
fileName=BAMfile,
fileType="BAM",
pairedEnds=TRUE)
print(alignedDataObject)
# Rename the chromosomes from seqX to chrX
alignedDataObject <- normalizeChrNames(alignedDataObject,
chrPrefix="seq",
chrSuffix="")
print(alignedDataObject)
#### Splitting object (indirect use of '[')
# Split the object and the associated data by chromosomes
splitListalignedDataObject <- split(alignedDataObject,
seqnames(alignedDataObject))
# check that the object has correctly been split
print(names(splitListalignedDataObject))
print(sapply(lapply(splitListalignedDataObject, seqnames), unique))
#### Orphans and pairs consistency
print(length(alignedDataObject))
# Identify the reads with no mate and remove them
orphansReadsIndex <- getOrphansIndexes(alignedDataObject)
print(paste("Nb of orphan reads :", sum(orphansReadsIndex)))
alignedDataObject <- alignedDataObject[!orphansReadsIndex]
print(length(alignedDataObject))
# Check that pairs are correctly represented in the object
# ie. mate pairs are consecutive
print(checkPairsOK(alignedDataObject))
# Sort by pairs
alignedDataObject <- sortByPairs(alignedDataObject)
# Check pairs consistency again
print(checkPairsOK(alignedDataObject))
#### Filtering
## Chromosome patterns
# (drop reads on chromosomes containing the character '1')
print(alignedDataObject)
resultFiltering <- dropChromosomePattern(alignedDataObject,
pattern="1")
print(resultFiltering)
## Range of insert size
print(table(isize(alignedDataObject)))
resultFiltering <- filterInsertSize(alignedDataObject,
rangeMin=190,
rangeMax=195)
print(table(isize(resultFiltering)))
print(resultFiltering)
## Reads sequence size
print(table(qwidth(alignedDataObject)))
resultFiltering <- filterReadSize(alignedDataObject,
rangeMin=35,
rangeMax=50)
print(table(qwidth(resultFiltering)))
print(resultFiltering)
## Reads with atypic characteristics for chromatin alignments
# Mate unmapped flag set, or mates aligned on same strand
resultFiltering <- cleanNonSimplePairs(alignedDataObject, quiet=FALSE)
# Reads with a strand value different from expected '+' or '-'
resultFiltering <- dropUndefinedStrand(resultFiltering, quiet=FALSE)
Run the code above in your browser using DataLab