mafft: Sequence Alignment with MAFFT

Description

This function is a wrapper for MAFFT and can be used for alignment, including profile alignment, of DNA and amino acid sequences.

Usage

mafft(x, y, add, method = "auto", maxiterate = 0, op = 1.53,
  ep = 0, gt, options, thread = -1, exec, quiet, file)

Arguments

x

An object of class DNAbin or AAbin.

y

An object of class DNAbin or AAbin; if given, both x and y are preserved and aligned to each other ("profile alignment").

add

A character string giving the method used for adding y to x: "add", "addprofile" (default), or any unambiguous abbreviation of these.

method

A character string giving the alignment method. The accuracy-oriented methods, intended for fewer than 200 sequences, are "localpair", "globalpair", and "genafpair"; "retree 1" and "retree 2" are speed-oriented. The default is "auto", which lets MAFFT choose an appropriate alignment method.

maxiterate

An integer giving the number of cycles of iterative refinement to perform. Possible choices are 0: progressive method, no iterative refinement (default); 2: two cycles of iterative refinement; 1000: at most 1000 cycles of iterative refinement.

op

A numeric giving the gap opening penalty at group-to-group alignment; default 1.53.

ep

A numeric giving the offset value, which works like a gap extension penalty, for group-to-group alignment; default 0.0, but 0.123 is recommended if no long indels are expected.

gt

An object of class phylo that is to be used as a guide tree during alignment.

options

A character vector specifying additional arguments to MAFFT that are not covered by the arguments of mafft, e.g., --adjustdirection.

thread

Integer giving the number of physical cores MAFFT should use; with thread = -1 the number of cores is determined automatically.

exec

A character string giving the path to the MAFFT executable, including its name, e.g. /usr/local/bin/mafft under UNIX-alikes.

quiet

Logical; if set to TRUE, MAFFT progress is not printed to the screen.

file

A character string indicating the filename of the output FASTA file; if this is missing, the alignment will be returned as a matrix of class DNAbin or AAbin.
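To make the relationship between these arguments and the MAFFT command line more concrete, the following sketch assembles the corresponding flags. The helper mafft_flags and the file name input.fas are hypothetical and not part of this package; the wrapper's actual internals may differ. Only flags documented by MAFFT itself (--auto, --localpair, --retree, --maxiterate, --op, --ep, --thread, --adjustdirection) are used.

```r
# Hypothetical helper, not part of this package: assemble the MAFFT flags
# that correspond to some of the arguments documented above.
mafft_flags <- function(method = "auto", maxiterate = 0, op = 1.53, ep = 0,
                        thread = -1, options = character()) {
  c(
    paste0("--", method),               # e.g. "--localpair" or "--retree 2"
    paste("--maxiterate", maxiterate),  # cycles of iterative refinement
    paste("--op", op),                  # gap opening penalty
    paste("--ep", ep),                  # offset value
    paste("--thread", thread),          # -1 lets MAFFT pick the core count
    options                             # passed through unchanged
  )
}

flags <- mafft_flags(method = "localpair", maxiterate = 1000, ep = 0.123,
                     options = "--adjustdirection")

# "input.fas" is a placeholder file name used for illustration only.
cat(paste(c("mafft", flags, "input.fas"), collapse = " "), "\n")
```

Keeping the flags in a character vector, one entry per option, makes it easy to append further entries in the same way the options argument does.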

Value

A matrix of class "DNAbin" or "AAbin".
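A minimal call might look like the sketch below. It rests on several assumptions: that this function is the one exported by the ips package, that the ape package (with its woodmouse example data) is installed, and that a MAFFT executable can be found on the search path. The code does nothing if any of these is missing.

```r
# Sketch only: runs the alignment when ips, ape, and MAFFT are all available.
mafft_path <- unname(Sys.which("mafft"))  # "" if MAFFT is not on the PATH

if (requireNamespace("ips", quietly = TRUE) &&
    requireNamespace("ape", quietly = TRUE) &&
    nzchar(mafft_path)) {
  data(woodmouse, package = "ape")        # example DNAbin matrix from ape
  unaligned <- ape::del.gaps(woodmouse)   # strip gaps to get raw sequences

  # Accuracy-oriented alignment (L-INS-i) with iterative refinement
  aln <- ips::mafft(unaligned, method = "localpair", maxiterate = 1000,
                    exec = mafft_path, quiet = TRUE)
  print(class(aln))
}
```

Because file is not given, the result is returned to R rather than written to a FASTA file.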

Details

"localpair" selects the L-INS-i algorithm, probably the most accurate; recommended for <200 sequences; iterative refinement method incorporating local pairwise alignment information.

"globalpair" selects the G-INS-i algorithm, suitable for sequences of similar lengths; recommended for <200 sequences; iterative refinement method incorporating global pairwise alignment information.

"genafpair" selects the E-INS-i algorithm, suitable for sequences containing large unalignable regions; recommended for <200 sequences.

"retree 1" selects the FFT-NS-1 algorithm, the simplest progressive option in MAFFT; recommended for >200 sequences.

"retree 2" selects the FFT-NS-2 algorithm that uses a second iteration of alignment based on a guide tree computed from an FFT-NS-1 alignment; this is the default in MAFFT; recommended for >200 sequences.
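The recommendations above can be condensed into a small lookup and a rule of thumb. The names mafft_algorithms and suggest_method are hypothetical and not part of this package. The text does not say what to do at exactly 200 sequences, so treating 200 and more as the speed-oriented case is an assumption, as is preferring "localpair" among the accuracy-oriented methods.

```r
# Algorithm selected by each value of `method`, as listed above
mafft_algorithms <- c(
  "localpair"  = "L-INS-i",
  "globalpair" = "G-INS-i",
  "genafpair"  = "E-INS-i",
  "retree 1"   = "FFT-NS-1",
  "retree 2"   = "FFT-NS-2"
)

# Hypothetical helper: suggest a `method` value from the number of sequences.
# Assumption: the cut-off is applied as n_seq < 200 versus n_seq >= 200.
suggest_method <- function(n_seq) {
  stopifnot(is.numeric(n_seq), length(n_seq) == 1, n_seq >= 2)
  if (n_seq < 200) "localpair" else "retree 2"
}

m <- suggest_method(50)
cat(m, "selects", mafft_algorithms[[m]], "\n")
```

In practice method = "auto" leaves this choice to MAFFT; an explicit rule like this is mainly useful when the same method must be applied reproducibly across data sets.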

References

Katoh, K. and H. Toh. 2008. Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9: 286-298.

Katoh, K., K.-i. Kuma, H. Toh, and T. Miyata. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511-518.

Katoh, K., K. Misawa, K.-i. Kuma, and T. Miyata. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059-3066.

http://mafft.cbrc.jp/alignment/software/index.html