structure2conStruct: Convert a dataset from STRUCTURE to conStruct format

Description

structure2conStruct converts a STRUCTURE dataset to conStruct format.

Usage

structure2conStruct(
  infile,
  onerowperind,
  start.loci,
  start.samples = 1,
  missing.datum,
  outfile
)

Value

This function returns an allele frequency data matrix that can be used as the freqs argument in an analysis run with conStruct. It also saves this object as an .RData file so that it can be used in future analyses.

Arguments

infile: The name and path of the file in STRUCTURE format to be converted to conStruct format.
onerowperind: Indicates whether the file format has one row per individual (TRUE) or two rows per individual (FALSE).
start.loci: The index of the first column in the dataset that contains genotype data.
start.samples: The index of the first row in the dataset that contains genotype data (e.g., after any headers). Default value is 1.
missing.datum: The character or value used to denote missing data in the STRUCTURE dataset (often 0 or -9).
outfile: The name and path of the file containing the conStruct formatted dataset to be generated by this function.
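
As an illustration of how the arguments fit together, the sketch below writes a toy STRUCTURE file and converts it. The toy data, the column layout (sample names in column 1, a population label in column 2, genotypes from column 3 onward), the allele coding, and the file names are all assumptions made for this example. The call is skipped if conStruct is not installed.

```r
# Toy STRUCTURE file: one row per individual, two columns per locus.
# Column 1 = sample name, column 2 = population label (both hypothetical),
# columns 3-8 = three bi-allelic loci, -9 = missing data.
toy <- data.frame(
  id  = c("ind1", "ind2", "ind3"),
  pop = c(1, 1, 2),
  l1a = c(1, 1, 2),  l1b = c(1, 2, 2),
  l2a = c(3, -9, 3), l2b = c(4, -9, 3),
  l3a = c(2, 2, 2),  l3b = c(2, 1, 1)
)
infile <- tempfile(pattern = "toy_structure_", fileext = ".txt")
write.table(toy, infile, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

if (requireNamespace("conStruct", quietly = TRUE)) {
  freqs <- conStruct::structure2conStruct(
    infile        = infile,
    onerowperind  = TRUE,   # one row per individual in the toy file
    start.loci    = 3,      # genotypes begin in column 3
    start.samples = 1,      # no header lines before the data
    missing.datum = -9,
    outfile       = tempfile(pattern = "toy_conStruct_")
  )
}
```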

Details

This function takes a population genetics dataset in STRUCTURE format and converts it to conStruct format. The STRUCTURE file can have one row per individual and two columns per locus, or two rows per individual and one column per locus. It can only contain bi-allelic SNPs. Missing data is acceptable, but must be indicated with a single value throughout the dataset.
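
To make the idea of an allele frequency matrix concrete, here is a minimal base R sketch that turns diploid, bi-allelic genotypes into per-individual allele frequencies. It is not the code used by structure2conStruct: the helper allele_freqs, the choice of which allele is counted, and the treatment of a genotype with any missing copy as NA are all assumptions made for illustration.

```r
# Hypothetical helper: per-individual frequency of one counted allele per locus.
# genos:   matrix in one-row-per-individual layout (two columns per locus)
# missing: the single value used to mark missing data
allele_freqs <- function(genos, missing) {
  genos[genos == missing] <- NA
  n_loci <- ncol(genos) / 2
  freqs <- matrix(NA_real_, nrow = nrow(genos), ncol = n_loci,
                  dimnames = list(rownames(genos), NULL))
  for (l in seq_len(n_loci)) {
    locus <- genos[, c(2 * l - 1, 2 * l), drop = FALSE]
    alleles <- sort(unique(locus[!is.na(locus)]))
    if (length(alleles) == 0) next   # locus missing in every individual
    counted <- alleles[1]            # arbitrary choice of counted allele
    # Mean over the two allele copies; NA if either copy is missing
    freqs[, l] <- rowMeans(locus == counted)
  }
  freqs
}

# Two individuals, three loci, -9 marks missing data
genos <- matrix(c(1, 2,  3, 3,   2,  2,
                  1, 1,  3, 4,  -9, -9),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("ind1", "ind2"), NULL))
toy_freqs <- allele_freqs(genos, missing = -9)
```

Each entry is 0, 0.5, or 1 for a diploid individual, with NA where the genotype is missing.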

This function can only be applied to diploid organisms. The STRUCTURE data file must be a plain text file. If there is extraneous text or column headers before the data starts, those extra lines should be deleted by hand or taken into account via the start.samples argument.
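
One way to choose start.samples and start.loci is to inspect the top of the file before converting it. The sketch below does this on a toy file. The file contents and the header line are hypothetical, and the read.table call is only a convenient way to check the choice, not a description of how the function reads its input.

```r
# Toy file with one header line followed by two data rows (hypothetical layout).
path <- tempfile(fileext = ".txt")
writeLines(c("sample pop L1 L1 L2 L2",
             "ind1 1 1 2 3 3",
             "ind2 1 1 1 3 4"), path)

# Look at the first few lines to find the first data row and genotype column.
head_lines <- readLines(path, n = 3)
print(head_lines)

# In this toy file the data begin on line 2 and genotypes in column 3.
start.samples <- 2
start.loci    <- 3

# Sanity check: skipping the header should leave one row per data line,
# and the genotype block should have an even number of columns.
dat   <- read.table(path, header = FALSE, skip = start.samples - 1,
                    stringsAsFactors = FALSE)
genos <- dat[, start.loci:ncol(dat)]
```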

The STRUCTURE dataset can either be in the ONEROWPERIND=1 file format, with one row per individual and two columns per locus, or the ONEROWPERIND=0 format, with two rows per individual and one column per locus. The first column of the STRUCTURE dataset should be individual names. There may be any number of other columns that contain non-genotype information before the first column that contains genotype data, but there can be no extraneous columns at the end of the dataset, after the genotype data.
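
The two layouts hold the same information, as the following sketch shows by reshaping a small genotype block from the ONEROWPERIND=1 layout into the ONEROWPERIND=0 layout. The data are made up, and the matrices contain only the genotype columns; a real file would also carry the individual names in its first column.

```r
# Genotypes for 2 individuals at 3 loci, ONEROWPERIND=1 layout:
# columns are locus 1 copy a, locus 1 copy b, locus 2 copy a, ...
one_row <- matrix(c(1, 2,  3, 3,   2,  2,
                    1, 1,  3, 4,  -9, -9),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("ind1", "ind2"), NULL))

n_ind  <- nrow(one_row)
n_loci <- ncol(one_row) / 2

# ONEROWPERIND=0 layout: two consecutive rows per individual,
# one column per locus.
two_row <- matrix(NA_real_, nrow = 2 * n_ind, ncol = n_loci)
two_row[seq(1, 2 * n_ind, by = 2), ] <- one_row[, seq(1, 2 * n_loci, by = 2)]
two_row[seq(2, 2 * n_ind, by = 2), ] <- one_row[, seq(2, 2 * n_loci, by = 2)]
rownames(two_row) <- rep(rownames(one_row), each = 2)
```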

The genotype data must be bi-allelic single nucleotide polymorphisms (SNPs). Applying this function to datasets with more than two alleles per locus may result in cryptic failure. For more details, see the format-data vignette.
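
Because a dataset with more than two alleles at a locus may not produce a clear error, it can help to check the genotypes before converting. The sketch below is one possible check, assuming the genotype columns have already been read into a matrix in the one-row-per-individual layout. The helper non_biallelic_loci is hypothetical and is not part of conStruct.

```r
# Hypothetical pre-conversion check: which loci have more than two alleles?
# genos:   genotype matrix, one row per individual, two columns per locus
# missing: the single value used to mark missing data
non_biallelic_loci <- function(genos, missing) {
  genos[genos == missing] <- NA
  n_loci <- ncol(genos) / 2
  n_alleles <- vapply(seq_len(n_loci), function(l) {
    locus <- genos[, c(2 * l - 1, 2 * l)]
    length(unique(locus[!is.na(locus)]))   # distinct non-missing alleles
  }, integer(1))
  which(n_alleles > 2)
}

# Three individuals, two loci; the second locus carries alleles 1, 2 and 3
check_genos <- matrix(c(1, 2,  1,  2,
                        1, 1,  3, -9,
                        2, 2,  1,  1),
                      nrow = 3, byrow = TRUE)
bad_loci <- non_biallelic_loci(check_genos, missing = -9)
```

An empty result means every locus has at most two distinct alleles among the non-missing genotypes.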