DetSel (version 1.0.4)

read.data: Read Data

Description

Read the data file in DetSel format.

Usage

read.data(infile,dominance,maf,a,b)

Arguments

infile

An input file in DetSel format.

dominance

A logical variable: FALSE if co-dominant data are considered (e.g., microsatellite markers, SNPs), or TRUE if bi-allelic dominant data are considered (e.g., AFLPs).

maf

The maximum allele frequency (the frequency of the most frequent allele over the full sample) to be considered in both the input file and the simulated data.

a,b

The parameters of the beta prior distribution used in Zhivotovsky's (1999) Bayesian method to compute the underlying allele frequencies. The default values are a = b = 0.25, as suggested by Mark A. Beaumont in the DFdist manual; alternatively, the user may choose to use Zhivotovsky's equation (13) to estimate a and b from the data. Note that neither a nor b is needed if dominance = FALSE.

Value

The output files are saved in the current directory.

Details

The input file should be a space- or tab-delimited ASCII text file. The first line is a 0 / 1 indicator: ‘0’ indicates that the data matrix for each locus is a populations x alleles matrix; ‘1’ indicates that it is an alleles x populations matrix. The second line contains the number of populations, and the third line contains the number of loci. Then, the data for each locus consist of the number of alleles at that locus, followed by the data matrix at that locus, with each row corresponding to the same allele (if the indicator variable is 1) or to the same population (if the indicator variable is 0). For dominant data, the entries are numbers of genotypes, not numbers of alleles. In the following example, the data consist of 2 populations and 2 loci, with 5 alleles at the first locus and 8 alleles at the second locus.

0
2
2

5
1 0 4 10 5
0 1 13 0 6

8
3 1 1 0 0 0 1 14
6 0 2 1 2 5 2 2

Spaces and blank lines can be included as desired.
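
To make the layout concrete, the following sketch writes the example above to a temporary file and reads it back with base R. The helper parse_detsel() is hypothetical (it is not part of DetSel) and only illustrates how the indicator, the population and locus counts, and the per-locus matrices follow one another in the file.

```r
# Hypothetical helper, not a DetSel function: parse the layout described above.
parse_detsel <- function(path) {
  x <- scan(path, quiet = TRUE)   # whitespace and blank lines are skipped
  by_allele <- x[1] == 1          # 1: alleles x populations; 0: populations x alleles
  n_pop <- x[2]
  n_loc <- x[3]
  pos <- 4                        # position of the first per-locus allele count
  loci <- vector("list", n_loc)
  for (l in seq_len(n_loc)) {
    k <- x[pos]                   # number of alleles (or genotypes) at this locus
    vals <- x[pos + seq_len(k * n_pop)]
    loci[[l]] <- if (by_allele) {
      t(matrix(vals, nrow = k, byrow = TRUE))
    } else {
      matrix(vals, nrow = n_pop, byrow = TRUE)
    }                             # always stored as populations x alleles
    pos <- pos + 1 + k * n_pop
  }
  loci
}

example_file <- tempfile(fileext = ".dat")
writeLines(c("0", "2", "2", "",
             "5", "1 0 4 10 5", "0 1 13 0 6", "",
             "8", "3 1 1 0 0 0 1 14", "6 0 2 1 2 5 2 2"),
           example_file)

loci <- parse_detsel(example_file)
dim(loci[[1]])       # populations x alleles at the first locus
rowSums(loci[[1]])   # total count per population at the first locus
```

Storing every locus as a populations x alleles matrix, whatever the indicator, keeps later per-population summaries independent of the orientation used in the file.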

For dominant data, it is important to note that the frequency of the homozygote individuals for the recessive allele appears first in either the rows or columns of the data matrix.
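
For dominant data, the arguments a and b parameterize the beta prior used to estimate the underlying allele frequencies. As a rough illustration only, suppose the Beta(a, b) prior is placed on the frequency of recessive homozygotes; observing m such individuals among n then gives a Beta(m + a, n - m + b) posterior, and the posterior mean of its square root is B(m + a + 1/2, n - m + b) / B(m + a, n - m + b). The helper recessive_freq() below is hypothetical and has not been checked against DetSel's internal computation.

```r
# Hypothetical helper, not part of DetSel: posterior mean of the recessive
# allele frequency, assuming a Beta(a, b) prior on the frequency of
# recessive homozygotes.
recessive_freq <- function(m, n, a = 0.25, b = 0.25) {
  # m: number of recessive homozygotes; n: number of individuals sampled
  # lbeta() keeps the ratio of beta functions numerically stable
  exp(lbeta(m + a + 0.5, n - m + b) - lbeta(m + a, n - m + b))
}

recessive_freq(m = 5, n = 20)   # compare with the naive estimate sqrt(5 / 20)
```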

The function read.data creates a file named ‘infile.dat’, a file named ‘sample_sizes.dat’ and a set of files named ‘plot_i_j.dat’, where \(i\) and \(j\) correspond to population numbers, so that each file ‘plot_i_j.dat’ corresponds to the pairwise analysis of populations \(i\) and \(j\).

In the file ‘infile.dat’, each line corresponds to the pairwise analysis of populations \(i\) and \(j\). Each line contains (in that order): the name of the output simulation file, the numbers \(i\) and \(j\), the multi-locus estimates of \(F_1\) and \(F_2\), and Weir and Cockerham's (1984) estimate of \(F_{ST}\). The file ‘sample_sizes.dat’ contains sample-size information, for internal use only.

In the files ‘plot_i_j.dat’, each line corresponds to one locus observed in the data set. Each line contains (in that order): the locus-specific estimates of \(F_1\) and \(F_2\), Weir and Cockerham's (1984) estimate of \(F_{ST}\), Nei's heterozygosity (\(H_e\)), the number of alleles at that locus in the pooled sample, and the rank of the locus in the data set.
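
A file such as ‘plot_i_j.dat’ could be loaded for inspection along the following lines. This is a sketch that assumes the file is whitespace-delimited, numeric and has no header line; the column labels and the helper read_plot_file() are chosen here for illustration and are not defined by DetSel. The stand-in file contains placeholder numbers, not real output.

```r
# Labels chosen for this sketch, following the column order described above.
plot_cols <- c("F1", "F2", "FST", "He", "n_alleles", "locus_rank")

# Hypothetical helper: assumes whitespace-delimited columns and no header.
read_plot_file <- function(path) {
  read.table(path, header = FALSE, col.names = plot_cols)
}

# Stand-in file with placeholder numbers (NOT real DetSel output),
# used only to show the call.
demo_file <- tempfile(fileext = ".dat")
writeLines(c("0.10 0.20 0.15 0.50 5 1",
             "0.05 0.30 0.18 0.62 8 2"), demo_file)

plot_1_2 <- read_plot_file(demo_file)
str(plot_1_2)
```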

References

Weir, B. S., and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of population structure. Evolution 38: 1358–1370.

Zhivotovsky, L. A. (1999) Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology 8: 907–913.

Examples

# NOT RUN {
## This is to generate an example file in the working directory.
make.example.files()

## This will read an input file named 'data.dat' that contains co-dominant markers,
## and a maximum allele frequency of 0.99 will be applied (i.e., by removing 
## marker loci in the observed and simulated datasets that have an allele with
## frequency larger than 0.99).
read.data(infile = 'data.dat', dominance = FALSE, maf = 0.99)

## This is to clean up the working directory.
remove.example.files()
# }
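
The effect of the maf argument on observed co-dominant data can be pictured with the sketch below, which pools allele counts over all populations and compares the frequency of the most frequent allele with the threshold. The helper passes_maf() is hypothetical; it mirrors the description in the comments above, not DetSel's internal code. The counts are those of the first locus in the Details example.

```r
# Hypothetical helper, not part of DetSel: TRUE if the locus is kept.
# counts: populations x alleles matrix of allele counts for one locus.
passes_maf <- function(counts, maf = 0.99) {
  pooled <- colSums(counts)            # allele counts over the full sample
  max(pooled) / sum(pooled) <= maf     # loci above the threshold are removed
}

locus1 <- matrix(c(1, 0, 4, 10, 5,
                   0, 1, 13, 0, 6), nrow = 2, byrow = TRUE)

passes_maf(locus1)               # most frequent allele has 17 of 40 counts
passes_maf(locus1, maf = 0.4)    # a stricter threshold would drop this locus
```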
