Provides a wrapper for Gblocks, a computer program written in ANSI C that eliminates poorly aligned positions and divergent regions from an alignment of DNA or protein sequences. Gblocks selects conserved blocks from a multiple alignment according to a set of features of the alignment positions.
gblocks(x, b1 = 0.5, b2 = b1, b3 = ncol(x), b4 = 2, b5 = "a",
target = "alignment", exec)
x: A matrix of DNA sequences of class "DNAbin".
b1: A real number, the minimum number of sequences for a conserved position, given as a fraction. Values between 0.5 and 1.0 are allowed. Larger values decrease the number of selected positions, i.e. are more conservative. Defaults to 0.5.
b2: A real number, the minimum number of sequences for a flank position, given as a fraction. Values must be equal to or larger than b1. Larger values decrease the number of selected positions, i.e. are more conservative. Defaults to b1 (0.5 unless b1 is changed).
b3: An integer, the maximum number of contiguous nonconserved positions; any integer is allowed. Larger values increase the number of selected positions, i.e. are less conservative. Defaults to the number of positions in the alignment.
b4: An integer, the minimum length of a block; any integer equal to or greater than 2 is allowed. Larger values decrease the number of selected positions, i.e. are more conservative. Defaults to 2.
b5: A character string indicating the treatment of gap positions. Three choices are possible:
1. "n": No gap positions are allowed in the final alignment. All positions with a single gap or more are treated as gap positions for the block selection procedure, and they and the adjacent nonconserved positions are eliminated.
2. "h": Only positions where 50% or more of the sequences have a gap are treated as gap positions. Thus, positions with a gap in less than 50% of the sequences can be selected in the final alignment if they are within an appropriate block.
3. "a": All gap positions can be selected. Positions with gaps are not treated differently from other positions (default).
target: A character string giving the output format (default "alignment"): "alignment" returns the alignment with only the selected positions, "index" returns the indices of the selected positions, and "score" returns a score for every position in the original alignment (0 for excluded, 1 for included).
exec: A character string indicating the path to the Gblocks executable.
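To make the per-column criteria behind b1, b2 and b5 more concrete, here is a minimal base-R sketch. The helpers classify_positions() and gap_positions() are hypothetical and are not part of ips or Gblocks. They assume a character matrix with one sequence per row and "-" for gaps, and classify_positions() ignores gaps when counting the most frequent residue in a column, which may differ from how Gblocks itself counts.

```r
# Hypothetical helpers (not part of ips or Gblocks). m is a character matrix,
# one sequence per row, with "-" for gaps.

# Conservation class from b1/b2. Assumption: gaps are ignored when counting
# the most frequent residue in a column.
classify_positions <- function(m, b1 = 0.5, b2 = b1) {
  stopifnot(b1 >= 0.5, b1 <= 1, b2 >= b1)
  n <- nrow(m)
  top <- apply(m, 2, function(col) {
    res <- col[col != "-"]                       # drop gaps
    if (length(res) == 0) 0 else max(table(res)) # largest identical group
  })
  unname(ifelse(top >= b2 * n, "highly conserved",
         ifelse(top >= b1 * n, "conserved", "nonconserved")))
}

# Which columns count as gap positions under each b5 setting.
gap_positions <- function(m, b5 = c("a", "h", "n")) {
  b5 <- match.arg(b5)
  gap_frac <- colMeans(m == "-")  # fraction of sequences with a gap
  switch(b5,
    n = gap_frac > 0,             # a single gap is enough
    h = gap_frac >= 0.5,          # gaps in at least half of the sequences
    a = rep(FALSE, ncol(m)))      # gaps not treated differently
}

m <- rbind(c("a", "c", "-", "-"),
           c("a", "c", "t", "t"),
           c("a", "g", "c", "-"),
           c("a", "g", "a", "-"))
classify_positions(m, b1 = 0.5, b2 = 0.75)
#> [1] "highly conserved" "conserved"        "nonconserved"     "nonconserved"
gap_positions(m, "n")
#> [1] FALSE FALSE  TRUE  TRUE
gap_positions(m, "h")
#> [1] FALSE FALSE FALSE  TRUE
```

With four sequences, b1 = 0.5 and b2 = 0.75, a column needs at least 2 identical residues to count as conserved and at least 3 to count as highly conserved.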
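With target = "alignment", a matrix of class "DNAbin" containing only the selected positions; with target = "index" or "score", the indices or per-position scores described above.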
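The three target formats carry the same selection at different levels of detail. The toy example below uses a made-up score vector rather than real Gblocks output, and assumes that "index" lists exactly the positions scored 1. A plain character matrix stands in for the alignment; for a real "DNAbin" matrix x, the corresponding subset would be x[, index].

```r
# Made-up score vector (not real Gblocks output): 1 = kept, 0 = excluded
score <- c(1, 1, 0, 0, 1, 1, 1, 0)

# Positions scored 1, i.e. what target = "index" is assumed to return
index <- which(score == 1)
index
#> [1] 1 2 5 6 7

# Keeping only those columns corresponds to target = "alignment"
aln <- matrix(c("a", "c", "g", "t", "a", "c", "g", "t"), nrow = 1)
aln[, index, drop = FALSE]
```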
The following explanation of the routine is taken from the Gblocks online documentation: First, the degree of conservation of every position of the multiple alignment is evaluated and classified as nonconserved, conserved, or highly conserved. All stretches of contiguous nonconserved positions longer than a certain value (b3) are rejected. In such stretches, alignments are normally ambiguous and, even when in some cases a unique alignment could be given, multiple hidden substitutions make them inadequate for phylogenetic analysis. In the remaining blocks, flanks are examined and positions are removed until blocks are surrounded by highly conserved positions at both flanks. This way, selected blocks are anchored by positions that can be aligned with high confidence. Then, all gap positions (which can be defined in three different ways, see b5) are removed. Furthermore, nonconserved positions adjacent to a gap position are also eliminated until a conserved position is reached, because regions adjacent to a gap are the most difficult to align. Finally, small blocks (shorter than b4) remaining after gap cleaning are also removed.
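The b3 and b4 rules from this description can be sketched with run-length encoding via rle(). This is a simplified, hypothetical illustration: select_blocks() is not an ips function, it skips the flank trimming and gap cleaning steps, and it assumes each column has already been coded as "N" (nonconserved), "C" (conserved) or "H" (highly conserved).

```r
# Hypothetical sketch: only the b3 and b4 rules, no flank or gap handling.
select_blocks <- function(cls, b3, b4 = 2) {
  keep <- rep(TRUE, length(cls))

  # Step 1 (b3): reject runs of contiguous nonconserved positions longer than b3
  r <- rle(cls == "N")
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  for (i in seq_along(r$lengths)) {
    if (r$values[i] && r$lengths[i] > b3) keep[starts[i]:ends[i]] <- FALSE
  }

  # Step 2 (b4): drop remaining blocks of kept positions shorter than b4
  r <- rle(keep)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  for (i in seq_along(r$lengths)) {
    if (r$values[i] && r$lengths[i] < b4) keep[starts[i]:ends[i]] <- FALSE
  }

  as.integer(keep)  # 0/1 per position, in the spirit of target = "score"
}

cls <- c("H", "C", "N", "N", "N", "H", "H", "N", "H", "C", "C", "H")
select_blocks(cls, b3 = 2, b4 = 3)
#>  [1] 0 0 0 0 0 1 1 1 1 1 1 1
```

The run of three nonconserved positions (3 to 5) exceeds b3 = 2 and is rejected, which leaves positions 1 and 2 as a block shorter than b4 = 3. With the wrapper's default b3 = ncol(x), no stretch can exceed b3, so step 1 never rejects anything.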
Castresana, J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552.
Talavera, G., and J. Castresana. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.
Gblocks website: http://molevol.cmima.csic.es/castresana/Gblocks.html
See also: mafft and prank for multiple sequence alignment; aliscore for another alignment masking algorithm.
library(ips)
data(ips.28S)
# Not run: requires a local Gblocks installation (see exec)
gblocks(ips.28S)