AlignSeqs(myXStringSet, guideTree = NULL, iterations = 1, refinements = 1, gapOpening = c(-16, -12), gapExtension = c(-2, -1), structures = NULL, FUN = AdjustAlignment, levels = c(0.95, 0.7, 10, 5), processors = 1, verbose = TRUE, ...)
An AAStringSet, DNAStringSet, or RNAStringSet object of unaligned sequences.
NULL or a data.frame giving the ordered tree structure in which to align profiles. If NULL, a guide tree is constructed automatically based on the order of shared k-mers.
structureMatrix, such as that output by PredictHEC, or NULL to generate the structures automatically. Only applicable if myXStringSet is an AAStringSet.
FUN. (See the Details section below.)
NULL to automatically detect and use all available processors.
AlignProfiles, including perfectMatch, misMatch, gapPower, terminalGap, restrict, anchor, normPower, substitutionMatrix, and structureMatrix.
An XStringSet of aligned sequences.
(1) If guideTree=NULL, an initial single-linkage guide tree is constructed based on a distance matrix of shared k-mers. If an initial guideTree is provided, it should be in the format output by IdClusters with ascending levels of cutoff. (2) If iterations is greater than zero, a UPGMA guide tree is built from the initial alignment and the sequences are re-aligned along this tree. This process is repeated iterations times or until convergence. (3) If refinements is greater than zero, groups of sequences are iteratively realigned to the full alignment. This process generates two alignments, the better of which is chosen based on its sum-of-pairs score. This refinement process is repeated refinements times, or until no improvement can be made.
The FUN function is applied during each of the three steps based on levels. The purpose of levels is to speed up the alignment process by not running FUN on the alignment when it is unnecessary. The default levels specify that FUN should be run on the sequences when the initial tree is above 0.95 average dissimilarity, when the iterative tree is above 0.7 average dissimilarity, and after every tenth improvement made during refinement. The final element of levels prevents FUN from being applied at any point to fewer than 5 sequences. The FUN function is always applied just before returning the alignment, independently of the first three values of levels. The default FUN is AdjustAlignment, but FUN accepts any function that takes an XStringSet as its first argument, and weights, processors, and substitutionMatrix as optional arguments. For example, FUN can be made a no-op by setting FUN=function(x, ...) return(x), where x is an XStringSet.
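The FUN contract described above can be sketched as follows. This is a minimal illustration, not part of the package: the names passThrough, makeCountingFUN, and counter are hypothetical, and the AlignSeqs call is guarded so that the block still runs when DECIPHER is not installed. It shows a pass-through replacement for FUN and a wrapper that counts how often FUN is invoked, which is one way to observe the effect of levels.

```r
# Pass-through: accepts the alignment plus any optional arguments
# (weights, processors, substitutionMatrix) and returns it unchanged.
passThrough <- function(x, ...) x

# Hypothetical helper: wrap any FUN so that each invocation is counted.
makeCountingFUN <- function(FUN) {
  calls <- 0L
  list(
    FUN = function(x, ...) {
      calls <<- calls + 1L  # update the counter in the enclosing environment
      FUN(x, ...)           # forward the alignment and optional arguments
    },
    count = function() calls
  )
}

counter <- makeCountingFUN(passThrough)

# Assumes DECIPHER and its bundled example database are available.
if (requireNamespace("DECIPHER", quietly = TRUE)) {
  library(DECIPHER)
  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  dna <- SearchDB(db, remove = "all")
  aligned <- AlignSeqs(dna, FUN = counter$FUN)
  cat("FUN was applied", counter$count(), "times\n")
}
```

Wrapping AdjustAlignment instead of passThrough, with makeCountingFUN(AdjustAlignment), would keep the default behavior while still reporting the number of calls.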
AdjustAlignment, AlignDB, AlignProfiles, AlignSynteny, AlignTranslation, IdClusters, StaggerAlignment
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, remove="all")
alignedDNA <- AlignSeqs(dna)
BrowseSeqs(alignedDNA, highlight=1)
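The three steps in the Details section map onto the iterations and refinements arguments, so a further example may help. The sketch below assumes DECIPHER and its bundled example database are installed; the variable names quickDNA and fullDNA are hypothetical. It compares an alignment that stops after step (1) with one that uses the default settings on all available processors.

```r
# Guarded so the block is a no-op when DECIPHER is not installed.
if (requireNamespace("DECIPHER", quietly = TRUE)) {
  library(DECIPHER)
  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  dna <- SearchDB(db, remove = "all")

  # Step (1) only: skip the re-alignment and refinement steps.
  quickDNA <- AlignSeqs(dna, iterations = 0, refinements = 0, verbose = FALSE)

  # Default iterations and refinements; processors = NULL uses all
  # available processors.
  fullDNA <- AlignSeqs(dna, processors = NULL, verbose = FALSE)

  # Within an alignment, every sequence has the same width.
  print(unique(width(quickDNA)))
  print(unique(width(fullDNA)))
}
```

Setting both iterations and refinements to zero trades some accuracy for speed, which can be a reasonable choice for a first look at a large sequence set.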