read_nexus_matrix: Reads in a morphological #NEXUS data file

Description

Reads in a morphological data file in #NEXUS format.

Usage

read_nexus_matrix(file_name, equalize_weights = FALSE)

Value

topper: Contains any header text or cost matrices; this information pertains to the entire file.
matrix_N: One or more matrix blocks (numbered 1 to N), each with information pertaining only to that block: the block name (if specified; NA if not); the block datatype (one of "CONTINUOUS", "DNA", "NUCLEOTIDE", "PROTEIN", "RESTRICTION", "RNA", or "STANDARD"); the matrix itself (taxa as rows, with taxon names stored as rownames, and characters as columns); the ordering type of each character ("ordered" or "unordered"); the character weights; the minimum and maximum values (used by Claddis' distance functions); and the original characters (symbols, missing, and gap values) used when writing the data back out.
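The per-character minimum and maximum values can be illustrated with a short base R sketch. This is not Claddis' own code: `state_range()` is a hypothetical helper, and it assumes numeric state symbols, NA for missing data, "" for gaps, and "&" or "/" separating multiple states in a cell.

```r
# Hypothetical helper (not part of Claddis): derive per-character minimum
# and maximum state values from a character matrix in which missing data
# are NA, gaps are "", and multiple states are separated by "&" or "/".
# Assumes all state symbols can be converted to numbers.
state_range <- function(mat) {
  t(apply(mat, 2, function(column) {
    # Drop missing and gap cells, then split multistate cells into states
    states <- unlist(strsplit(column[!is.na(column) & column != ""], "[&/]"))
    if (length(states) == 0) return(c(min = NA_real_, max = NA_real_))
    values <- as.numeric(states)
    c(min = min(values), max = max(values))
  }))
}

# Toy matrix mirroring how the first three taxa of the example below
# would be coded (characters as columns, taxon names as rownames)
mat <- matrix(c("0", "1", "0", NA, "0",
                "0", "2", "1", NA, "0",
                "0", "2", "1", "1", "1"),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("Taxon_", 1:3), NULL))
state_range(mat)
```

Each row of the result corresponds to one character (column) of the input matrix.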

Arguments

file_name: The file name or path of the #NEXUS file.
equalize_weights: Optional logical (default FALSE). If TRUE, overrides the weights specified in the file so that all characters are truly equally weighted.

Author

Graeme T. Lloyd <graemetlloyd@gmail.com>

Details

Reads in a #NEXUS (Maddison et al. 1997) data file representing the distribution of characters (continuous, discrete, DNA, etc.) in a set of taxa. Unlike read.nexus.data (from the ape package), this function can handle polymorphisms (e.g., (012)).
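To show what handling polymorphisms involves at the level of a single matrix row, here is a minimal base R sketch. It is not the parser Claddis uses: `tokenize_nexus_row()` is a hypothetical helper that assumes one symbol per state, no whitespace inside brackets, the standard #NEXUS convention of "(...)" for polymorphisms and "{...}" for uncertainties, and the output coding described under Details ("&", "/", NA and "").

```r
# Hypothetical helper (not part of Claddis): split one #NEXUS matrix row
# (taxon name already removed) into per-character cells. Assumes one
# symbol per state and no whitespace inside brackets.
tokenize_nexus_row <- function(row, missing = "?", gap = "-") {
  chars <- strsplit(row, "")[[1]]
  cells <- character(0)
  i <- 1
  while (i <= length(chars)) {
    ch <- chars[i]
    if (ch %in% c("(", "{")) {
      # "(...)" marks a polymorphism, "{...}" an uncertainty
      closer <- if (ch == "(") ")" else "}"
      rest <- chars[-seq_len(i)]
      k <- match(closer, rest)
      if (is.na(k)) stop("Unclosed bracket in row: ", row)
      states <- rest[seq_len(k - 1)]
      sep <- if (ch == "(") "&" else "/"
      cells <- c(cells, paste(states, collapse = sep))
      i <- i + k + 1                      # skip past the closing bracket
    } else if (ch == missing) {
      cells <- c(cells, NA_character_)   # missing data -> NA
      i <- i + 1
    } else if (ch == gap) {
      cells <- c(cells, "")              # gap -> empty string
      i <- i + 1
    } else {
      cells <- c(cells, ch)              # single state
      i <- i + 1
    }
  }
  cells
}

tokenize_nexus_row("0(12)?-{01}")
# returns c("0", "1&2", NA, "", "0/1")
```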

Note that the function is generally intolerant of deviations from a standard format, so it is recommended that your data be formatted like the morphmatrix.nex example below. However, the function does produce informative error messages when it encounters anticipated deviations.

Previously, all empty values (missing or inapplicable) were treated as NAs. Now, anything coded as a "gap" appears as an empty text string ("") in the matrix. Additionally, polymorphisms and uncertainties were previously both treated as polymorphisms, with multiple states separated by an ampersand ("&"). Now polymorphisms use the ampersand ("&") and uncertainties use a slash ("/"), allowing them to be treated differently later and written out correctly in #NEXUS format. (NB: TNT does not allow this distinction, so both polymorphisms and uncertainties will be output as polymorphisms.)
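As a rough illustration of how these codings might be interpreted downstream, the base R sketch below labels individual cells and splits multistate cells into their component states. Both `classify_cell()` and `cell_states()` are hypothetical helpers, not Claddis functions, and they assume missing data are NA, gaps are "", polymorphisms use "&" and uncertainties use "/".

```r
# Hypothetical helper (not part of Claddis): label a single cell of a
# character matrix, assuming missing = NA, gap = "", polymorphism = "&",
# uncertainty = "/".
classify_cell <- function(cell) {
  if (is.na(cell)) return("missing")
  if (cell == "") return("gap")
  if (grepl("&", cell, fixed = TRUE)) return("polymorphism")
  if (grepl("/", cell, fixed = TRUE)) return("uncertainty")
  "single state"
}

# Hypothetical helper: split a multistate cell into its component states,
# ignoring whether they form a polymorphism or an uncertainty
cell_states <- function(cell) strsplit(cell, "[&/]")[[1]]

cells <- c("0", NA, "", "0&1", "1/2")
vapply(cells, classify_cell, character(1), USE.NAMES = FALSE)
# returns c("single state", "missing", "gap", "polymorphism", "uncertainty")
cell_states("0&1&2")
# returns c("0", "1", "2")
```

Collapsing both separators into one set of states, as `cell_states()` does, mirrors the loss of information that occurs when writing to TNT.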

References

Maddison, D. R., Swofford, D. L. and Maddison, W. P., 1997. NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590-621.

Examples



# Load Claddis (provides read_nexus_matrix):
library(Claddis)

# Create example matrix:
example_matrix <- paste("#NEXUS", "", "BEGIN DATA;",
                        "\tDIMENSIONS  NTAX=5 NCHAR=5;",
                        "\tFORMAT SYMBOLS= \" 0 1 2\" MISSING=? GAP=- ;",
                        "MATRIX", "", "Taxon_1  010?0", "Taxon_2  021?0",
                        "Taxon_3  02111", "Taxon_4  011-1",
                        "Taxon_5  001-1", ";", "END;", "",
                        "BEGIN ASSUMPTIONS;",
                        "\tOPTIONS  DEFTYPE=unord PolyTcount=MINSTEPS ;",
                        "\tTYPESET * UNTITLED  = unord: 1 3-5, ord: 2;",
                        "\tWTSET * UNTITLED  = 1: 2, 2: 1 3-5;",
                        "END;", sep = "\n")

# Write example matrix to current working directory called
# "morphmatrix.nex":
cat(example_matrix, file = "morphmatrix.nex")

# Read in example matrix:
morph.matrix <- read_nexus_matrix("morphmatrix.nex")

# View example matrix in R:
morph.matrix

# Remove the generated data set:
file.remove("morphmatrix.nex")
