Structure 2.2 and higher can process autopolyploid microsatellite data,
although version 2.3.3 or higher is recommended for its improved
handling of polyploids. The input format of Structure requires that
each locus take up one column and that each individual take up as
many rows as the parameter PLOIDY. Because of the multiple rows per
sample, each sample name must be duplicated, as well as any
population, location, or phenotype data. Partially heterozygous
genotypes also must have one arbitrary allele duplicated up to the
ploidy of the sample, and samples that have a lower ploidy than that
used in the file (for mixed polyploid data sets) must have a missing
data symbol inserted to fill in the extra rows. Additionally, if
some samples have more alleles than PLOIDY (if you are using a lower
PLOIDY to save processing time, or if there are extra alleles from
scoring errors), some alleles must be randomly removed from the data.
write.Structure performs this duplication, insertion, and random
deletion of data.
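
As a rough illustration of this layout (not the code that write.Structure itself uses), the base R sketch below builds the data rows for two hypothetical tetraploid samples at two loci. The sample names, locus names, and allele values are invented for the example, and -9 is assumed as the missing data symbol.

```r
# Hypothetical data: allele vectors already padded or trimmed to the file ploidy.
ploidy <- 4
samples <- c("ind1", "ind2")
genotypes <- list(
  # ind2 is partially heterozygous at locA, so allele 102 appears twice.
  locA = list(ind1 = c(102, 104, 106, 108), ind2 = c(102, 102, 110, 112)),
  # ind2 is missing at locB, so every row holds the missing data symbol.
  locB = list(ind1 = c(201, 203, 203, 205), ind2 = c(-9, -9, -9, -9))
)

# One column per locus: stack the alleles of each sample on top of each other.
allele_cols <- sapply(genotypes,
                      function(loc) unlist(loc[samples], use.names = FALSE))

# Each sample name is repeated once for every row that the sample occupies.
structure_rows <- data.frame(sample = rep(samples, each = ploidy),
                             allele_cols, stringsAsFactors = FALSE)
print(structure_rows)
```

The result has ploidy rows per sample and one column per locus, in addition to the duplicated sample names.
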
The sample names from samples will be used as row names in the
Structure file. Each sample name should only be in the vector samples
once, because write.Structure will duplicate the sample names a number
of times as dictated by ploidy.
In writing genotypes to the file, write.Structure compares the number
of alleles in the genotype, the ploidy of the sample*locus as stored in
Ploidies, and the ploidy of the file as stored in ploidy, and does one
of six things (for a given sample x and locus loc):
1) If Ploidies(object,x,loc) is greater than or equal to ploidy, and
length(Genotype(object, x, loc)) is equal to ploidy, the genotype data
are used as is.
2) If Ploidies(object,x,loc) is greater than or equal to ploidy, and
length(Genotype(object, x, loc)) is less than ploidy, the first allele
is duplicated as many times as necessary for there to be as many
alleles as ploidy.
3) If Ploidies(object,x,loc) is greater than or equal to ploidy, and
length(Genotype(object, x, loc)) is greater than ploidy, a random
sample of the alleles, without replacement, is used as the genotype.
4) If Ploidies(object,x,loc) is less than ploidy, and
length(Genotype(object, x, loc)) is equal to Ploidies(object,x,loc),
the genotype data are used as is and missing data symbols are inserted
in the extra rows.
5) If Ploidies(object,x,loc) is less than ploidy, and
length(Genotype(object, x, loc)) is less than Ploidies(object,x,loc),
the first allele is duplicated as many times as necessary for there to
be as many alleles as Ploidies(object,x,loc), and missing data symbols
are inserted in the extra rows.
6) If Ploidies(object,x,loc) is less than ploidy, and
length(Genotype(object, x, loc)) is greater than Ploidies(object,x,loc),
a random sample, without replacement, of Ploidies(object,x,loc) alleles
is used, and missing data symbols are inserted in the extra rows.
(Alleles are removed even though there is room for them in the file.)
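
The six cases above can be condensed into a short base R sketch. This is not the code used by write.Structure; pad_genotype is a hypothetical helper written for illustration. It assumes -9 as the missing data symbol and places duplicated alleles after the observed ones, which is an assumption about ordering rather than something stated above.

```r
# Illustrative helper (hypothetical name); not part of polysat.
# genotype:      vector of distinct alleles for one sample at one locus
# sample_ploidy: ploidy of that sample at that locus
# file_ploidy:   number of rows per individual in the Structure file
# missing:       missing data symbol (-9 is assumed here)
pad_genotype <- function(genotype, sample_ploidy, file_ploidy, missing = -9) {
  # Number of rows that may hold alleles before padding with missing data.
  n_slots <- min(sample_ploidy, file_ploidy)
  n_alleles <- length(genotype)

  if (n_alleles > n_slots) {
    # Cases 3 and 6: too many alleles, so keep a random subset.
    # Sampling indices avoids sample()'s special handling of a single number.
    alleles <- genotype[sample.int(n_alleles, n_slots)]
  } else {
    # Cases 1 and 4 add nothing; cases 2 and 5 repeat the first allele.
    alleles <- c(genotype, rep(genotype[1], n_slots - n_alleles))
  }

  # Cases 4 to 6: fill the remaining rows with the missing data symbol.
  c(alleles, rep(missing, file_ploidy - n_slots))
}

pad_genotype(c(100, 104), sample_ploidy = 4, file_ploidy = 4)  # case 2
pad_genotype(c(100, 104), sample_ploidy = 2, file_ploidy = 4)  # case 4
pad_genotype(c(100, 104, 108), sample_ploidy = 2, file_ploidy = 4)  # case 6
```

Taking the smaller of the two ploidies as the number of allele rows lets the three "greater than or equal to" cases and the three "less than" cases share the same logic, with missing data padding only needed in the latter.
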
Two of the header rows that are optional for Structure are written by
write.Structure. These are ‘Marker Names’, containing the names of
loci supplied in gendata, and ‘Recessive Alleles’, which contains the
missing data symbol once for each locus.
This indicates to the program that all alleles are codominant with
copy number ambiguity.
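
As a minimal sketch of what these two header rows contain, the base R code below assembles them for three hypothetical loci. The locus names, the -9 missing data symbol, and the tab delimiter are assumptions made for the example; the exact text that write.Structure produces may differ.

```r
# Hypothetical locus names and missing data symbol.
loci <- c("locA", "locB", "locC")
missing <- -9

# 'Marker Names' row: one name per locus.
marker_row <- paste(loci, collapse = "\t")

# 'Recessive Alleles' row: the missing data symbol once per locus.
recessive_row <- paste(rep(missing, length(loci)), collapse = "\t")

header_lines <- c(marker_row, recessive_row)
cat(header_lines, sep = "\n")
```

For Structure to read these rows, the corresponding options in its parameter file (MARKERNAMES and RECESSIVEALLELES) would need to be switched on.
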