The msa function provides a unified interface to the three multiple
sequence alignment algorithms in this package: ClustalW, ClustalOmega,
and MUSCLE.
msa(inputSeqs, method=c("ClustalW", "ClustalOmega", "Muscle"),
    cluster="default", gapOpening="default", gapExtension="default",
    maxiters="default", substitutionMatrix="default", type="default",
    order=c("aligned", "input"), verbose=FALSE, help=FALSE, ...)

inputSeqs: input sequences, given as a character vector, an object of
class XStringSet (includes the classes AAStringSet, DNAStringSet, and
RNAStringSet), or a single character string with a file name. In the
latter case, the file name is required to have the suffix .fa or
.fasta, and the file must be in FASTA format.

method: the multiple sequence alignment algorithm to use; "ClustalW",
"ClustalOmega", and "Muscle" are supported.

cluster: see msaClustalW, msaClustalOmega, or msaMuscle for
algorithm-specific information.

gapOpening: gap opening penalty (see msaClustalW and msaMuscle). Note
that the sign of this parameter is ignored. The sign is automatically
adjusted such that the called algorithm penalizes gaps instead of
rewarding them.

gapExtension: gap extension penalty (see msaClustalW and msaMuscle).
Note that the sign of this parameter is ignored. The sign is
automatically adjusted such that the called algorithm penalizes gaps
instead of rewarding them.

maxiters: see msaClustalW, msaClustalOmega, or msaMuscle for
algorithm-specific information.

substitutionMatrix: see msaClustalW, msaClustalOmega, or msaMuscle for
algorithm-specific information.

type: type of the input sequences inputSeqs; possible values are
"dna", "rna", or "protein".
In the original ClustalW implementation, this parameter is also called
-type; "auto" is also possible in the original ClustalW, but, in this
package, "auto" is deactivated. The type argument is mandatory if
inputSeqs is a character vector or the file name of a FASTA file (see
above). If inputSeqs is an object of class AAStringSet, DNAStringSet,
or RNAStringSet, the type of sequences is determined by the class of
inputSeqs and the type parameter is not necessary. If it is
nevertheless specified and the type does not match the class of
inputSeqs, the function stops with an error.

order: if "aligned" is chosen, the sequences are ordered in the way
the multiple sequence alignment algorithm orders them. If "input" is
chosen, the sequences in the output object are ordered in the same way
as the input sequences. For MUSCLE, the choice "input" is not
available for sequence data that is read directly from a FASTA file.
Even if sequences are supplied directly via R, the sequences must have
unique names, otherwise the input order cannot be recovered. If the
sequences do not have names or if the names are not unique, the
msaMuscle function assigns generic unique names "Seq1"-"Seqn" to the
sequences and issues a warning.

verbose: if TRUE, the algorithm displays detailed information and
progress messages.

help: if TRUE, information about algorithm-specific parameters is
displayed. In this case, no multiple sequence alignment is performed
and the function quits after displaying the additional help
information.

...: further parameters passed on to msaClustalW, msaClustalOmega, or
msaMuscle. An overview of parameters that are available for the chosen
method is shown when calling msa with help=TRUE.
For more details, see also the documentation of the chosen multiple
sequence alignment algorithm.

msa returns an MsaAAMultipleAlignment, MsaDNAMultipleAlignment, or
MsaRNAMultipleAlignment object. If called with help=TRUE, msa returns
an invisible NULL.

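
The rules for type and the return values described above can be tried out with a few short calls. The following is only a minimal sketch: it assumes the msa package (and with it Biostrings) is installed, and the three toy DNA sequences and their names s1 to s3 are made up for illustration.

```r
library(msa)

## Toy sequences (made up for illustration); a named character vector
seqs <- c(s1 = "ACGTACGTACGTTAGC",
          s2 = "ACGTACGAACGTTAGC",
          s3 = "ACGTTCGTACGTAGC")

## Character vector input: 'type' has to be given explicitly
aln <- msa(seqs, type = "dna")
is(aln, "MsaDNAMultipleAlignment")

## XStringSet input: the class of the object determines the type
dnaSet <- DNAStringSet(seqs)
aln2 <- msa(dnaSet)

## A 'type' that contradicts the class is documented to stop with an
## error; the message is captured here instead of aborting the script
res <- tryCatch(msa(dnaSet, type = "protein"),
                error = function(e) conditionMessage(e))

## help=TRUE only prints parameter information; no alignment is run
out <- msa(dnaSet, method = "Muscle", help = TRUE)
is.null(out)
```

Capturing the error with tryCatch keeps the sketch runnable from top to bottom; in interactive use the plain call would simply stop with the error message.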
msa is a simple wrapper function that unifies the interfaces of the
three functions msaClustalW, msaClustalOmega, and msaMuscle. Which
function is called is controlled by the method argument.

Note that the input sequences may be reordered by the multiple
sequence alignment algorithms in order to group together similar
sequences (see also the description of the argument order above).
So, if the input order should be preserved or if the input order
should be recovered later, we strongly recommend always assigning
unique names to the input sequences.

As noted in the description of the inputSeqs argument above, all
functions, msa, msaClustalW, msaClustalOmega, and msaMuscle, also
allow for direct reading from FASTA files. This is mainly for memory
efficiency if the sequence data set is very large. Otherwise, we
encourage users to first read the sequences into the R workspace. If
sequences are read from a FASTA file directly, the order of output
sequences is completely under the control of the respective algorithm,
and it is not possible to check whether the sequences are named
uniquely in the FASTA file. The preservation of the input order also
works for sequence data read from a FASTA file, but only for ClustalW
and ClustalOmega; MUSCLE does not support this (see also the argument
order above and msaMuscle).

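
The recommendation to assign unique names can be followed before any alignment function is called. The sketch below uses base R only. The helper ensureUniqueNames is hypothetical and not part of the msa package; it merely mimics the documented fallback of generic names "Seq1" to "Seqn", and the toy sequences are made up.

```r
## Hypothetical helper (not part of msa): keep existing names if they
## are complete and unique, otherwise assign generic names Seq1..Seqn
ensureUniqueNames <- function(seqs) {
  nms <- names(seqs)
  if (is.null(nms) || anyNA(nms) || any(nms == "") ||
      anyDuplicated(nms) > 0) {
    warning("sequence names missing or not unique; ",
            "assigning generic names")
    names(seqs) <- paste0("Seq", seq_along(seqs))
  }
  seqs
}

## Unnamed toy sequences (made up for illustration)
seqs <- c("ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC")
named <- suppressWarnings(ensureUniqueNames(seqs))
names(named)

## With unique names, a reordered result can be put back into input
## order by indexing with the original names; rev() stands in for the
## reordering an alignment algorithm might apply
reordered <- rev(named)
restored <- reordered[names(named)]
identical(restored, named)
```

A vector prepared this way could then be passed on, for example as msa(named, type="dna", order="input").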
http://www.clustal.org/omega/README
http://www.drive5.com/muscle/muscle.html
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-4680. DOI: 10.1093/nar/22.22.4673.
Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Soeding, J., Thompson, J. D., and Higgins, D. G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7:539. DOI: 10.1038/msb.2011.75.
Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797. DOI: 10.1093/nar/gkh340.
Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. DOI: 10.1186/1471-2105-5-113.
msaClustalW, msaClustalOmega, msaMuscle, msaPrettyPrint,
MsaAAMultipleAlignment, MsaDNAMultipleAlignment,
MsaRNAMultipleAlignment, MsaMetaData

## load the package (also makes readAAStringSet() available)
library(msa)

## read sequences
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## call unified interface msa() for default method (ClustalW) and
## default parameters
msa(mySeqs)

## call ClustalOmega through unified interface
msa(mySeqs, method="ClustalOmega")

## call MUSCLE through unified interface with some custom parameters
msa(mySeqs, method="Muscle", gapOpening=12, gapExtension=3, maxiters=16,
    cluster="upgmamax", SUEFF=0.4, brenner=FALSE,
    order="input", verbose=FALSE)
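
Because msa is described as a wrapper whose method argument selects the underlying function, a call through msa with method="Muscle" should correspond to calling msaMuscle directly with the same parameters. The following sketch, which assumes the msa package is installed, runs both variants on the example file shipped with the package and compares the aligned sequences. The comparison result is deliberately not stated here; it is worth checking in your own session.

```r
library(msa)

## Example protein sequences shipped with the msa package
filepath <- system.file("examples", "exampleAA.fasta", package = "msa")
mySeqs <- readAAStringSet(filepath)

## Same MUSCLE run, once through the wrapper and once directly
viaWrapper <- msa(mySeqs, method = "Muscle",
                  gapOpening = 12, gapExtension = 3)
direct <- msaMuscle(mySeqs, gapOpening = 12, gapExtension = 3)

## Compare the aligned sequences as plain character vectors
identical(as.character(unmasked(viaWrapper)),
          as.character(unmasked(direct)))
```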