Learn R Programming

DECIPHER (version 2.0.2)

OrientNucleotides: Orient nucleotide sequences

Description

Orients nucleotide sequences to match the directionality and complementarity of specified reference sequences.

Usage

OrientNucleotides(myXStringSet, reference = which.max(width(myXStringSet)), type = "sequences", orientation = "all", threshold = 0.05, verbose = TRUE, processors = 1)

Arguments

myXStringSet
A DNAStringSet or RNAStringSet of unaligned sequences.
reference
The index of reference sequences with the same (desired) orientation. By default the first sequence with maximum width will be used.
type
Character string indicating the type of results desired. This should be (an abbreviation of) either "sequences", "orientations", or "both".
orientation
Character string(s) indicating the allowed reorientation(s) of non-reference sequences. This should be (an abbreviation of) either "all", "reverse", "complement", and/or "both" (for reverse complement).
threshold
Numeric giving the decrease in k-mer distance required to adopt the alternative orientation.
verbose
Logical indicating whether to display progress.
processors
The number of processors to use, or NULL to automatically detect and use all available processors.

Value

OrientNucleotides can return two types of results: the relative orientations of sequences and/or the reoriented sequences. If type is "sequences" (the default) then the reoriented sequences are returned. If type is "orientations" then a character vector is returned that specifies whether sequences were reversed ("r"), complemented ("c"), reversed complemented ("rc"), or in the same orientation ("") as the reference sequences (marked by NA). If type is "both" then the output is a list with the first component containing the "orientations" and the second component containing the "sequences".

Details

Biological sequences can sometimes have inconsistent orientation that interferes with their analysis. OrientNucleotides will reorient sequences by changing their directionality and/or complementarity to match specified reference sequences in the same set. The process works by finding the k-mer distance between the reference sequence(s) and each allowed orientation of the sequences. Alternative orientations that lessen the distance by at least threshold are adopted. Note that this procedure requires a moderately similar reference sequence be available for each sequence that needs to be reoriented. Sequences for which a corresponding reference is unavailable will most likely be left alone because alternative orientations will not pass the threshold. For this reason, it is recommended to specify several markedly different sequences as references.

See Also

CorrectFrameshifts

Examples

Run this code
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, remove="all")
DNA <- dna # 175 sequences

# reorient subsamples of the first 169 sequences
s <- sample(169, 30)
DNA[s] <- reverseComplement(dna[s])
s <- sample(169, 30)
DNA[s] <- reverse(dna[s])
s <- sample(169, 30)
DNA[s] <- complement(dna[s])

DNA <- OrientNucleotides(DNA, reference=170:175)
DNA==dna # all were correctly reoriented

Run the code above in your browser using DataLab