This function reads a file with sequences in the NEXUS format.
read.nexus.data(file)
file: a file name, specified either by a variable of mode character or by a double-quoted string.
A list of sequences, each made of a single vector of mode character in which every element is one (phylogenetic) character state.
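One way to work with the returned list is to bind it into a character matrix with one row per taxon. The sketch below does not read a file. It builds a small list by hand, with hypothetical taxon names, and assumes that the list elements are named after the taxa and that all sequences have the same length (as they should in an aligned NEXUS matrix).

```r
# Hypothetical object shaped like the documented return value:
# a list of character vectors, one per taxon, one element per character state.
x <- list(
  taxon_A = c("A", "C", "G", "T"),
  taxon_B = c("A", "C", "-", "T"),
  taxon_C = c("A", "?", "G", "T")
)

# Every sequence should have the same number of characters (nchar)
# before the vectors are stacked into a matrix.
lens <- lengths(x)
stopifnot(length(unique(lens)) == 1)

# do.call(rbind, ...) turns each vector into one row; rbind takes the
# row names from the (assumed) taxon names of the list elements.
m <- do.call(rbind, x)

dim(m)            # taxa x characters
m["taxon_B", 3]   # third character state of taxon_B
```

If the sequences differ in length, the `stopifnot()` call stops early, before `rbind` would silently recycle the shorter vectors.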
This parser tries to read data from a file written in a restricted NEXUS format (see the examples below).
See the files data.nex and taxacharacters.nex for
examples of formats that will work.
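A small test file can be written from R to experiment with the parser. The sketch below writes a hypothetical three-taxon DNA matrix to a temporary file. Its contents are invented for illustration and are only intended to respect the restrictions listed below. They have not been compared with data.nex or taxacharacters.nex. The actual read is left commented out because it needs the ape package, which provides read.nexus.data.

```r
# Hypothetical minimal file: comment on its own line, no spaces inside
# sequences or taxon names, NTAX/NCHAR values on the same line as the
# keyword, no multistate codes, and "END;" on a separate line.
nexus_lines <- c(
  "#NEXUS",
  "[A comment on its own line]",
  "BEGIN DATA;",
  "DIMENSIONS NTAX=3 NCHAR=8;",
  "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
  "MATRIX",
  "Genus_one ACGTACGT",
  "Genus_two ACGTAC-T",
  "Genus_three ACG?ACGT;",
  "END;"
)

path <- tempfile(fileext = ".nex")
writeLines(nexus_lines, path)

# Reading it back requires the ape package (not run here):
# x <- ape::read.nexus.data(path)
```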
Some notable exceptions to the NEXUS standard (non-exhaustive list):
I. Comments must be either on separate lines or at the
end of lines. Examples:
[Comment] --- OK
Taxon ACGTACG [Comment] --- OK
[Comment line 1
Comment line 2] --- NOT OK!
Tax[Comment]on ACG[Comment]T --- NOT OK!
II. No spaces (or comments) are allowed in the
sequences. Examples:
name ACGT --- OK
name AC GT --- NOT OK!
III. No spaces are allowed in taxon names, not even if
the names are in single quotes; that is, the parser does not treat
single-quoted names as quoted strings. Examples:
Genus_species --- OK
'Genus_species' --- OK
'Genus species' --- NOT OK!
IV. The trailing end that closes the
matrix must be on a separate line. Examples:
taxon AACCGGT
end; --- OK
taxon AACCGGT;
end; --- OK
taxon AACCCGT; end; --- NOT OK!
V. Multistate characters are not allowed. That is,
NEXUS allows you to specify multiple character states at a
character position either as an uncertainty, {XY}, or as an
actual occurrence of multiple states (polymorphism), (XY). The
parser does not handle this information. Examples:
taxon 0011?110 --- OK
taxon 0011{01}110 --- NOT OK!
taxon 0011(01)110 --- NOT OK!
VI. The number of taxa must be on the same line as
ntax. The same applies to nchar. Examples:
ntax = 12 --- OK
ntax =
12 --- NOT OK!
VII. The word “matrix” cannot occur anywhere in
the file before the actual matrix command, unless it is in
a comment. Examples:
BEGIN CHARACTERS;
TITLE 'Data in file "03a-cytochromeB.nex"';
DIMENSIONS NCHAR=382;
FORMAT DATATYPE=Protein GAP=- MISSING=?;
["This is The Matrix"] --- OK
MATRIX
BEGIN CHARACTERS;
TITLE 'Matrix in file "03a-cytochromeB.nex"'; --- NOT OK!
DIMENSIONS NCHAR=382;
FORMAT DATATYPE=Protein GAP=- MISSING=?;
MATRIX
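Several of the restrictions above can be screened for before calling the parser. The sketch below is a hypothetical helper, `check_nexus_lines()`, that is not part of read.nexus.data or its package. It applies simple regular expressions to the lines of a file and reports which rule (I to VII) each suspicious line may break. It assumes the matrix command is a line containing only the word MATRIX, does not attempt a full NEXUS parse, and can miss cases or raise false alarms.

```r
# Hypothetical pre-flight check; returns a data frame of (line, rule).
check_nexus_lines <- function(lines) {
  problems <- data.frame(line = integer(0), rule = character(0))
  flag <- function(idx, rule) {
    if (length(idx) > 0) {
      problems <<- rbind(problems, data.frame(line = idx, rule = rule))
    }
  }
  # Drop complete one-line comments such as "[Comment]".
  no_comments <- gsub("\\[[^]]*\\]", "", lines)

  # I: "[" without a closing "]" on the same line (multi-line comment),
  #    or text following a "]" (comment in the middle of a line).
  flag(which(grepl("\\[[^]]*$", lines) |
             grepl("\\][[:space:]]*[^[:space:]]", lines)), "I")

  # IV: the closing "end;" shares a line with the last sequence.
  flag(which(grepl(";[[:space:]]*end;", lines, ignore.case = TRUE)), "IV")

  # VI: ntax or nchar whose value is not on the same line.
  flag(which(grepl("(ntax|nchar)[[:space:]]*=[[:space:]]*$", no_comments,
                   ignore.case = TRUE)), "VI")

  # Assumed matrix command: a line holding only the word "matrix".
  cmd <- which(grepl("^[[:space:]]*matrix[[:space:]]*$", no_comments,
                     ignore.case = TRUE))
  if (length(cmd) > 0) {
    # VII: "matrix" outside a comment before the matrix command.
    before <- seq_len(cmd[1] - 1)
    flag(before[grepl("matrix", no_comments[before], ignore.case = TRUE)],
         "VII")

    # Matrix rows run until the first line starting with "end" or ";".
    after <- if (cmd[1] < length(lines)) (cmd[1] + 1):length(lines) else integer(0)
    stop_at <- after[grepl("^[[:space:]]*(end|;)", no_comments[after],
                           ignore.case = TRUE)]
    rows <- if (length(stop_at) > 0) after[after < stop_at[1]] else after
    rows <- rows[nzchar(trimws(no_comments[rows]))]

    # II/III: each row should split into exactly "name sequence".
    tokens <- strsplit(trimws(no_comments[rows]), "[[:space:]]+")
    flag(rows[lengths(tokens) > 2], "II/III")

    # V: "(" or "{" marks polymorphic or uncertain states.
    flag(rows[grepl("[({]", no_comments[rows])], "V")
  }
  problems[order(problems$line), , drop = FALSE]
}
```

For a real file, `check_nexus_lines(readLines("file.nex"))` lists the line numbers to inspect; an empty result means only that these particular patterns were not found, not that the file is guaranteed to parse.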
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
# NOT RUN {
## Use read.nexus.data to read a file in NEXUS format into object x
x <- read.nexus.data("file.nex")
# }