The function fasta2DNAbin
reads alignments with the fasta
format (extensions ".fasta", ".fas", or ".fa"), and outputs a
DNAbin
object (the efficient DNA representation from the
ape package). The output contains either the full alignments, or only
SNPs. This implementation is designed for memory-efficiency,
and can read in larger datasets than Ape's read.dna
.
The function reads data by chunks of a few genomes (minimum 1, no
maximum) at a time, which allows one to read massive datasets with
negligible RAM requirements (albeit at a cost of computational
time). The argument chunkSize
indicates the number of genomes
read at a time. Increasing this value decreases the computational time
required to read data in, while increasing memory requirements.