The function glSim simulates simple SNP data, with the possibility of contrasted structures between two groups as well as background ancestral population structure. Returned objects are instances of the class genlight.
glSim(n.ind, n.snp.nonstruc, n.snp.struc = 0, grp.size = c(0.5, 0.5), k = NULL,
pop.freq = NULL, ploidy = 1, alpha = 0, parallel = FALSE,
LD = TRUE, block.minsize = 10, block.maxsize = 1000, theta = NULL,
sort.pop = FALSE, ...)
A genlight object.
- n.ind: an integer indicating the number of individuals to be simulated.
- n.snp.nonstruc: an integer indicating the number of non-structured SNPs to be simulated; for these SNPs, all individuals are drawn from the same binomial distribution.
- n.snp.struc: an integer indicating the number of structured SNPs to be simulated; for these SNPs, different binomial distributions are used for the two simulated groups; frequencies of the derived alleles in groups A and B are built to differ (see details).
- grp.size: a vector of length 2 specifying the proportions of the two phenotypic groups (must sum to 1). By default, both groups have the same size.
- k: an integer specifying the number of ancestral populations to be generated.
- pop.freq: a vector of length k specifying the proportions of the k ancestral populations (must sum to 1). If, as by default, pop.freq is NULL and k is non-NULL, pop.freq will be the result of random sampling into k population groups.
- ploidy: an integer indicating the ploidy of the simulated genotypes.
- alpha: asymmetry parameter: a numeric value between 0 and 0.5, used to enforce allelic differences between the groups. Differences between groups are strongest when alpha = 0.5 and weakest when alpha = 0 (see details).
- parallel: a logical indicating whether multiple cores should be used in generating the simulated data (TRUE). This option can reduce the computational time required to simulate the data, but is not supported on Windows.
- LD: a logical indicating whether loci should display linkage disequilibrium (TRUE, the default according to the function signature) or be generated independently (FALSE). When set to TRUE, data are generated by blocks of correlated SNPs (see details).
- block.minsize: an optional integer indicating the minimum number of SNPs to be handled at a time during the simulation of linked SNPs (when LD=TRUE). Increasing the minimum block size will increase the RAM requirement but decrease the computational time required to simulate the genotypes.
- block.maxsize: an optional integer indicating the maximum number of SNPs to be handled at a time during the simulation of linked SNPs. Note: if LD blocks of equal size are desired, set block.minsize = block.maxsize.
- theta: an optional numeric value between 0 and 0.5 specifying the extent to which linkage should be diluted. Linkage is strongest when theta = 0 and weakest when theta = 0.5.
- sort.pop: a logical specifying whether individuals should be ordered by ancestral population (sort.pop=TRUE) or phenotypic population (sort.pop=FALSE).
- ...: arguments to be passed to the genlight constructor.
Caitlin Collins caitlin.collins12@imperial.ac.uk, Thibaut Jombart t.jombart@imperial.ac.uk
=== Allele frequencies in contrasted groups ===
When n.snp.struc is greater than 0, some SNPs are simulated so as to differ between groups (denoted 'A' and 'B'). Different patterns between groups are achieved by using different frequencies of the second allele for A and B, denoted \(p_A\) and \(p_B\). For a given SNP, \(p_A\) is drawn from a uniform distribution between 0 and (0.5 - alpha). \(p_B\) is then computed as 1 - \(p_A\). Therefore, differences between groups are mild for alpha = 0, and total for alpha = 0.5.
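The frequency scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used inside glSim: the helper name sim_struc_snps and its arguments are hypothetical, and genotypes are assumed to be binomial draws of size ploidy.

```r
# Hypothetical helper (not part of adegenet): draws structured SNPs for
# two groups, A and B, following the scheme described in the text.
sim_struc_snps <- function(n.a, n.b, n.snp, alpha = 0, ploidy = 1) {
  stopifnot(alpha >= 0, alpha <= 0.5)

  # Frequency of the second allele in group A: uniform on [0, 0.5 - alpha]
  p.a <- runif(n.snp, min = 0, max = 0.5 - alpha)
  # Mirrored frequency in group B
  p.b <- 1 - p.a

  # One row per individual, one column per SNP; each entry counts copies
  # of the second allele (assumption: binomial with size = ploidy).
  geno.a <- matrix(rbinom(n.a * n.snp, size = ploidy,
                          prob = rep(p.a, each = n.a)), nrow = n.a)
  geno.b <- matrix(rbinom(n.b * n.snp, size = ploidy,
                          prob = rep(p.b, each = n.b)), nrow = n.b)

  list(p.a = p.a, p.b = p.b, geno = rbind(geno.a, geno.b))
}

set.seed(1)
sim <- sim_struc_snps(n.a = 50, n.b = 50, n.snp = 200, alpha = 0.4, ploidy = 2)
range(sim$p.a)  # all values fall within [0, 0.1] because alpha = 0.4
```

With alpha = 0.5 the upper bound of the uniform draw collapses to 0, so \(p_A\) = 0 and \(p_B\) = 1 for every structured SNP, which is the "total" difference mentioned above.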
=== Linked or independent loci ===
Independent loci (LD=FALSE) are simulated using the standard binomial distribution, with randomly generated allele frequencies. Linked loci (LD=TRUE) are trickier to simulate, because we need to generate discrete variables with a pre-defined correlation structure. Here, we first generate deviates from multivariate normal distributions with randomly generated correlation structures. These variables are then discretized using the quantiles of the distribution. Further improvement of the procedure will aim at i) specifying the strength of the correlations between blocks of alleles and ii) enforcing contrasted structures between groups.
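The idea of generating correlated normal deviates and then discretizing them can be illustrated with a base R sketch for a single block of haploid SNPs. It is an illustration under assumptions rather than glSim's internal code: the helper name sim_ld_block is hypothetical, the random correlation matrix is built from a rescaled random cross-product, and each variable is cut at the normal quantile that matches a randomly drawn allele frequency.

```r
# Hypothetical helper (not part of adegenet): simulates one block of
# correlated haploid SNPs (0/1 coding).
sim_ld_block <- function(n.ind, n.snp) {
  # Random correlation matrix: cross-product of a random matrix,
  # rescaled to unit diagonal (positive definite by construction).
  a <- matrix(rnorm(2 * n.snp * n.snp), nrow = 2 * n.snp)
  r <- cov2cor(crossprod(a))

  # Multivariate normal deviates with correlation r, obtained by
  # multiplying independent normals by the Cholesky factor of r.
  z <- matrix(rnorm(n.ind * n.snp), nrow = n.ind) %*% chol(r)

  # Discretize each column at the quantile giving the target frequency:
  # an individual carries the second allele when its deviate exceeds
  # qnorm(1 - p), which happens with probability p.
  p <- runif(n.snp)
  cutoff <- qnorm(1 - p)
  x <- sweep(z, 2, cutoff, FUN = ">") * 1L

  list(cor = r, freq = p, geno = x)
}

set.seed(2)
blk <- sim_ld_block(n.ind = 200, n.snp = 20)
dim(blk$geno)  # 200 individuals by 20 SNPs
```

Thresholding weakens the correlations relative to those of the underlying normal variables, so the entries of blk$cor should be read as the latent structure, not as the correlations between the resulting 0/1 genotypes.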
- genlight: class of object for storing massive binary SNP data.
- glPlot: plotting genlight objects.
- glPca: PCA for genlight objects.
if (FALSE) {
library(adegenet)

## no structure
x <- glSim(100, 1e3, ploidy=2)
plot(x)

## 1,000 non-structured SNPs, 100 structured SNPs
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=2)
plot(x)

## 1,000 non-structured SNPs, 100 structured SNPs, ploidy=4
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=4)
plot(x)

## same thing (ploidy=2), stronger differences between groups
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=2, alpha=0.4)
plot(x)

## same thing, loci with LD structures
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=2, alpha=0.4, LD=TRUE, block.minsize=100)
plot(x)
}