Codominant marker data (which here means: data with several diploid
loci; two alleles per locus) can be represented in various ways. This
function converts the formats "genepop", "structure" and "structureb"
into "structurama" and "prabclus". "genepop" is a version of the format
used by the package GENEPOP (Rousset, 2008); "structure" and "structureb"
are two versions of what is used by STRUCTURE (Pritchard et al., 2000).
"structurama" is a version of what is used by STRUCTURAMA (Huelsenbeck and
Andolfatto, 2007), and "prabclus" is required by the function alleleinit
in the present package.
alleleconvert(file=NULL, strmatrix=NULL, format.in="genepop",
              format.out="prabclus", alength=3, orig.nachar="000",
              new.nachar="-", rows.are.individuals=TRUE, firstcolname=FALSE,
              aletters=intToUtf8(c(65:90,97:122), multiple=TRUE),
              outfile=NULL, skip=0)
The function returns a matrix of strings in the format specified by
format.out, with an attribute "alevels". If format.out=="prabclus",
"alevels" is a vector of all allele codes used; otherwise it is the
vector of allele codes of the last locus.
file: string. Name of the input file, see details. One of file and
strmatrix needs to be specified.
strmatrix: matrix or data frame of strings, see details. One of file and
strmatrix needs to be specified.
format.in: string. One of "genepop", "structure", or "structureb", see details.
format.out: string. One of "structurama" or "prabclus", see details.
alength: integer. If format.in="genepop", the length of the code for a
single allele.
orig.nachar: string. Code for missing values in the input data.
new.nachar: string. Code for missing values in the output data.
rows.are.individuals: logical. If TRUE, rows are interpreted as
individuals and columns (variables if strmatrix is a data frame) as loci.
firstcolname: logical. If TRUE, the first column is assumed to contain
row names.
aletters: character vector of default single characters used as allele
codes if format.out=="prabclus". The default (the 52 upper- and
lower-case Latin letters) is fine unless some locus has more than 52
different alleles in the dataset.
outfile: string. If specified, the output matrix is written (without
quotes) to a file of this name, including row names if
firstcolname==TRUE.
skip: number of lines to be skipped when reading data from a file (the
skip argument of read.table).
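When file is given, the data are read with read.table(file, colClasses="character"), and skip is passed on to read.table. The base-R sketch below illustrates why the character column classes matter for "genepop" codes with leading zeros, and what skip does. It is only an illustration: the toy input, including its first line, is made up, and the text argument of read.table stands in for a file.

```r
# Made-up "genepop"-style input (alength = 3): one line to be skipped,
# then two individuals and two loci; "000000" is a missing genotype.
txt <- "some header line
258260 122144
000000 122122"

# With colClasses = "character" every entry stays a string,
# so the leading zeros (and the fixed code length) survive
kept <- read.table(text = txt, colClasses = "character", skip = 1)
kept[2, 1]   # "000000"

# Without it, read.table converts the columns to numbers
lost <- read.table(text = txt, skip = 1)
lost[2, 1]   # 0
```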
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en
The formats are described below as they appear within R, i.e., for the
input, as the format of strmatrix. If file is specified, the file is
read with read.table(file, colClasses="character") and should give the
format explained below (note that colClasses="character" implies that
quotes are not needed in the input file):
"genepop": Alleles are coded by strings of length alength, and there is
no space between the two alleles in a locus, so a value of "258260"
means that in the corresponding locus the two alleles have codes 258
and 260.
"structure": Alleles are coded by strings of arbitrary length. Two rows correspond to each individual, the first row containing the first alleles in all loci and the second row containing the second ones.
"structureb": Alleles are coded by strings of arbitrary length. One row corresponds to each individual, containing the first and second alleles in all loci (first and second allele of the first locus, first and second allele of the second locus, etc.). The data start in the third row (the first two rows contain locus names and other information).
"structurama": Alleles are coded by strings of arbitrary length. The two
alleles in each locus are written in brackets with a comma in between,
so "258260" in "genepop" corresponds to "(258,260)" in "structurama".
"prabclus": Alleles are coded by a single character, and there is no
space between the two alleles in a locus (e.g., "AC").
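To make the format descriptions above concrete, the following base-R sketch rewrites one made-up "genepop" locus in the "structurama" and "prabclus" notations. It is not how alleleconvert is implemented, and the variable names are hypothetical. In particular, assigning letters to allele codes in sorted order is an assumption of this sketch; alleleconvert may order them differently.

```r
# Made-up data: one locus in "genepop" notation for three individuals
alength <- 3
locus <- c("258260", "122144", "122122")

# "genepop": split each entry into its two allele codes of length alength
a1 <- substr(locus, 1, alength)
a2 <- substr(locus, alength + 1, 2 * alength)

# "structurama": brackets around the pair, comma in between
struct_coded <- sprintf("(%s,%s)", a1, a2)
struct_coded   # "(258,260)" "(122,144)" "(122,122)"

# "prabclus": one character per allele, taken from the default aletters.
# Sorted order of the codes is an assumption, not the package's rule.
aletters <- intToUtf8(c(65:90, 97:122), multiple = TRUE)
length(aletters)                  # 52 single-character codes available
codes <- sort(unique(c(a1, a2)))  # "122" "144" "258" "260"
letter <- aletters[seq_along(codes)]
names(letter) <- codes
prab_coded <- paste0(letter[a1], letter[a2])
prab_coded     # "CD" "AB" "AA"
```

Because every allele gets exactly one character in "prabclus", the size of aletters bounds the number of distinct alleles that can be coded in a locus.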
Huelsenbeck, J. P., and P. Andolfatto (2007) Inference of population structure under a Dirichlet process model. Genetics 175, 1787-1802.
Pritchard, J. K., M. Stephens, and P. Donnelly (2000) Inference of population structure using multi-locus genotype data. Genetics 155, 945-959.
Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources 8, 103-106.
See also alleleinit.
data(tetragonula)
# This uses example data file Heterotrigona_indoFO.dat
str(alleleconvert(strmatrix=tetragonula))
strucmatrix <- cbind(c("I1","I1","I2","I2","I3","I3"),
                     c("122","144","122","122","144","144"),
                     c("0","0","21","33","35","44"))
alleleconvert(strmatrix=strucmatrix, format.in="structure",
              format.out="prabclus", orig.nachar="0", firstcolname=TRUE)
alleleconvert(strmatrix=strucmatrix, format.in="structure",
              format.out="structurama", orig.nachar="0", new.nachar="-9",
              firstcolname=TRUE)
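In the "structure" layout of strucmatrix, rows 1 and 2 belong to I1, rows 3 and 4 to I2, and rows 5 and 6 to I3. The base-R sketch below regroups the two allele columns per individual and checks that a "structureb"-like arrangement, with both alleles of a locus in adjacent columns of one row, carries the same information. It is only an illustration of the layouts, not the package's code; the "a/b" pair notation is made up for display, and the two header rows of "structureb" are left out.

```r
# Allele columns of strucmatrix ("structure": two rows per individual)
s <- cbind(c("122", "144", "122", "122", "144", "144"),
           c("0", "0", "21", "33", "35", "44"))
first  <- s[seq(1, nrow(s), by = 2), , drop = FALSE]  # first alleles
second <- s[seq(2, nrow(s), by = 2), , drop = FALSE]  # second alleles

# One "a/b" string per individual (row) and locus (column);
# "0" is the missing-value code of this example (orig.nachar = "0")
pairs <- matrix(paste(first, second, sep = "/"), nrow = nrow(first))
pairs

# "structureb"-like rows: first and second allele of each locus side by side
sb <- cbind(first[, 1], second[, 1], first[, 2], second[, 2])
odd <- seq(1, ncol(sb), by = 2)  # columns holding first alleles
pairs_b <- matrix(paste(sb[, odd], sb[, odd + 1], sep = "/"), nrow = nrow(sb))
identical(pairs, pairs_b)  # TRUE
```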