run.detsel: Create Simulated Data

Description

This runs the simulated data, using a coalescent-based algorithm.

Usage

run.detsel(example)

Arguments

example

a logical variable, which is TRUE if the user wants to run a toy example with the example file.

Value

The output files are saved in the current directory.

Details

Once the read.data command line has been executed, run.detsel executes the simulations. The user is first asked to provide the total number of simulations for the entire set of parameter values (default: 500000). For bi-allelic data, I recommend to run no less than 1000000 simulations to estimate correctly the P-values for each empirical locus. With less than a million simulations, indeed, simulation tests have shown that the P-values may be biased. The user is then asked to provide the average mutation rate, and the mutation model: type ‘0’ for the infinite allele model, where each mutation creates a new allelic state in the population; type ‘1’ for the stepwise mutation model, where each mutation consists in increasing or decreasing by one step, the size of the current allele; and type any integer \(k\) (with \(k > 1\)) for a k-allele model, where each mutation consists in drawing randomly one allele among k possible states, provided it is different from the current state. For example, for SNP data, type ‘2’. Finally, the user is asked to provide the number of distinct sets of nuisance parameters (the default is a single set of parameters). Because of the uncertainty in the nuisance parameter values, it is recommended to perform simulations using different combinations of values for the ancestral population size, divergence time and bottleneck parameters. Then, the user is asked to provide as many sets of parameters as he/she indicated. Each set comprises four parameters that should be given in the following order: \(t, N_0,t_0\) and \(N_e\). Here, the user must chose parameter values, including the mutation model, that correspond to his/her knowledge of the biological model.

The command line run.detsel creates a list of files named ‘Pair_i_j_ni_nj.dat’, where i and j are the indices of populations pairs, and ni and nj are the sample sizes of populations i and j, respectively. Because some marker loci may have missing data, several ‘Pair_i_j_ni_nj.dat’ files may be created for a given pair of populations. Simulating the exact sample size for each locus is required to precisely calculate the empirical P-values, especially for bi-allelic markers. Note that if negative multi-locus Fi estimates are observed for a pairwise comparison, then the simulations will not be run for that pair. Each line of the ‘Pair_i_j_ni_nj.dat’ files contains the locus-specific estimates of \(F_1\) and \(F_2\), Weir and Cockerham's estimate of \(F_{ST}\) (Weir and Cockerham 1984), Nei's heterozygosity (\(H_e\)), and the number of alleles at that locus in the pooled sample. The command line run.detsel() also creates a file named out.dat that contains the estimates of the above statistics averaged over all the simulated data. In the file out.dat, each line corresponds to the pairwise analysis of populations i and j with sample sizes and ni and nj. Each line contains (in that order): the name of the output simulation file (‘Pair_i_j_ni_nj.dat’), the multi-locus estimates of \(F_1\) and \(F_2\), and Weir and Cockerham's estimate of \(F_{ST}\) (Weir and Cockerham 1984). An important point to consider is to make sure that for each pairwise comparison, the and estimates averaged over the simulated data (in the file out.dat) closely match to the observed values in the real dataset (in the file infile.dat). If not, this suggests that the simulated datasets do not fit to the observed data, which urges to choose other parameter values for the nuisance parameters.

References

Weir, B. S., and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of population structure, Evolution 38: 1358--1370.

Examples

Run this code

# NOT RUN {
## This is to generate an example file in the working directory.
make.example.files()

## This will read an input file named 'data.dat' that contains co-dominant markers,
## and a maximum allele frequency of 0.99 will be applied (i.e., by removing 
## marker loci in the observed and simulated datasets that have an allele with
## frequency larger than 0.99).
read.data(infile = 'data.dat',dominance = FALSE,maf = 0.99)

## The following command line executes the simulations:
run.detsel(example = TRUE)

## This is to clean up the working directory.
remove.example.files()
# }

Run the code above in your browser using DataLab