
bnpsd (version 1.3.13)

draw_all_admix: Simulate random allele frequencies and genotypes from the BN-PSD admixture model

Description

This function returns simulated ancestral, intermediate, and individual-specific allele frequencies and genotypes given the admixture structure, which is determined by the admixture proportions and the vector or tree of intermediate subpopulation FST values. The function is a wrapper around draw_p_anc(), draw_p_subpops()/draw_p_subpops_tree(), make_p_ind_admix(), and draw_genotypes_admix(), with additional features such as requiring polymorphic loci. Importantly, by default fixed loci (where all individuals are homozygous for the same allele) are redrawn from the start (starting from the ancestral allele frequencies), so the output contains no fixed loci and no biases are introduced by redrawing genotypes conditional on any of the previous allele frequencies (ancestral, intermediate, or individual-specific). Below, m_loci (also m) is the number of loci, n is the number of individuals, and k is the number of intermediate subpopulations.
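The redraw-from-the-start behavior described above can be sketched in base R. This is an illustration only, not the bnpsd implementation: the helper names `draw_loci` and `is_fixed` are hypothetical, the toy admixture matrix is made up, and the Beta parameterization of the intermediate subpopulation frequencies is the standard Balding-Nichols form, assumed here rather than taken from this page.

```r
# Base-R sketch of a BN-PSD draw with fixed loci redrawn from scratch.
# NOT the bnpsd code; helper names and toy inputs are hypothetical.
set.seed(1)
m_loci <- 20
inbr_subpops <- c(0.1, 0.3)  # FST of each intermediate subpopulation (k = 2)
# toy admixture proportions for n = 4 individuals (rows sum to 1)
admix_proportions <- cbind(c(1, 0.7, 0.3, 0), c(0, 0.3, 0.7, 1))

# draw genotypes for `m` loci, starting from new ancestral allele frequencies
draw_loci <- function(m) {
  p_anc <- runif(m, 0.01, 0.5)
  # Balding-Nichols (assumed form): Beta draw per locus and subpopulation
  p_subpops <- sapply(inbr_subpops, function(f) {
    nu <- (1 - f) / f
    rbeta(m, nu * p_anc, nu * (1 - p_anc))
  })
  p_subpops <- matrix(p_subpops, nrow = m)  # keep m-by-k shape when m = 1
  # individual-specific frequencies: admixture-weighted averages (m-by-n)
  p_ind <- p_subpops %*% t(admix_proportions)
  p_ind <- pmin(pmax(p_ind, 0), 1)  # guard against rounding outside [0, 1]
  matrix(rbinom(length(p_ind), 2, p_ind), nrow = m)
}

# a locus is fixed if every individual is homozygous for the same allele
is_fixed <- function(X) {
  s <- rowSums(X)
  s == 0 | s == 2 * ncol(X)
}

X <- draw_loci(m_loci)
fixed <- is_fixed(X)
while (any(fixed)) {
  # redraw only the fixed loci, again from the ancestral frequencies
  X[fixed, ] <- draw_loci(sum(fixed))
  fixed <- is_fixed(X)
}
```

Because each redraw starts over at the ancestral allele frequency, the loop never conditions on an earlier intermediate or individual-specific frequency, which is the property the description emphasizes.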

Usage

draw_all_admix(
  admix_proportions,
  inbr_subpops = NULL,
  m_loci,
  tree_subpops = NULL,
  want_genotypes = TRUE,
  want_p_ind = FALSE,
  want_p_subpops = FALSE,
  want_p_anc = TRUE,
  verbose = FALSE,
  require_polymorphic_loci = TRUE,
  maf_min = 0,
  beta = NA,
  p_anc = NULL,
  p_anc_distr = NULL
)

Arguments

admix_proportions

The n-by-k matrix of admixture proportions.

inbr_subpops

The length-k vector (or scalar) of intermediate subpopulation FST values. Either this or tree_subpops must be provided (but not both).

m_loci

The number of loci to draw.

tree_subpops

The coancestry tree relating the k intermediate subpopulations. Must be a phylo object from the ape package (see ape::read.tree()). Either this or inbr_subpops must be provided (but not both).

want_genotypes

If TRUE (default), includes the matrix of random genotypes in the return list.

want_p_ind

If TRUE (NOT default), includes the matrix of individual-specific allele frequencies in the return list. Note that by default p_ind is never constructed in full; instead, a fast low-memory algorithm constructs it in parts, only as needed. Beware that setting want_p_ind = TRUE increases memory usage in comparison.

want_p_subpops

If TRUE (NOT default), includes the matrix of random intermediate subpopulation allele frequencies in the return list.

want_p_anc

If TRUE (default), includes the vector of random ancestral allele frequencies in the return list.

verbose

If TRUE, prints messages for every stage in the algorithm.

require_polymorphic_loci

If TRUE (default), the returned genotype matrix will not include any fixed loci. Loci that happen to be fixed are drawn again, starting from their ancestral allele frequencies, and checked iteratively until no fixed loci remain, so that the final number of polymorphic loci is exactly m_loci.

maf_min

The minimum minor allele frequency (default zero), which extends the working definition of "fixed" above to include rare variants. This helps simulate a frequency-based locus ascertainment bias. Loci with minor allele frequencies less than or equal to this value are treated as fixed (the value is passed to fixed_loci()). This parameter has no effect if require_polymorphic_loci is FALSE.

beta

Shape parameter of the symmetric Beta distribution for the ancestral allele frequencies p_anc. If NA (default), p_anc is uniform with range in [0.01, 0.5]. Otherwise, p_anc has a symmetric Beta distribution with range in [0, 1]. Has no effect if either p_anc or p_anc_distr is non-NULL.

p_anc

If provided, it is used as the ancestral allele frequencies (instead of drawing random ones). Must either be a scalar or a length-m_loci vector. If scalar and want_p_anc = TRUE, then the returned p_anc is the scalar value repeated m_loci times (it is always a vector). If a locus was fixed and has to be redrawn, the ancestral allele frequency in p_anc is retained and only downstream allele frequencies and genotypes are redrawn (contrast to p_anc_distr below).

p_anc_distr

If provided, ancestral allele frequencies are drawn with replacement from this vector (which may have any length) or function, instead of from draw_p_anc(). If a function, must accept a single parameter specifying the number of loci to draw. If a locus was fixed and has to be redrawn, the ancestral allele frequency is redrawn from the distribution (contrast to p_anc above).
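As a hedged illustration of several optional arguments together, the sketch below passes a coancestry tree, a custom ancestral frequency distribution, and a nonzero maf_min. It assumes the bnpsd and ape packages are installed. The Newick string, the Beta(2, 2) choice, and the name `p_anc_fun` are made up for the example.

```r
library(bnpsd)
library(ape)

set.seed(1)
m_loci <- 50
n_ind <- 10
k_subpops <- 3

# hypothetical coancestry tree for k = 3 subpopulations (edge lengths in [0, 1])
tree_subpops <- read.tree(text = "(S1:0.1,(S2:0.1,S3:0.1)N1:0.1)T;")

admix_proportions <- admix_prop_1d_linear(n_ind, k_subpops, sigma = 1)
# make the subpopulation names agree with the tree tips explicitly
colnames(admix_proportions) <- tree_subpops$tip.label

# hypothetical distribution: takes the number of loci, returns that many frequencies
p_anc_fun <- function(m) rbeta(m, 2, 2)

out <- draw_all_admix(
  admix_proportions,
  tree_subpops = tree_subpops,
  m_loci = m_loci,
  maf_min = 0.05,
  p_anc_distr = p_anc_fun
)

# minor allele counts per locus; with n = 10 individuals (20 alleles),
# MAF > 0.05 means each locus has at least 2 copies of the minor allele
allele_count <- rowSums(out$X)
minor_count <- pmin(allele_count, 2 * n_ind - allele_count)
range(minor_count)
```

Since p_anc_distr is used rather than p_anc, any locus that fails the maf_min threshold gets a new ancestral allele frequency on redraw, as described in the argument entries above.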

Value

A named list with the following items (which may be missing depending on options):

  • X: An m-by-n matrix of genotypes. Included if want_genotypes = TRUE.

  • p_anc: A length-m vector of ancestral allele frequencies. Included if want_p_anc = TRUE.

  • p_subpops: An m-by-k matrix of intermediate subpopulation allele frequencies. Included if want_p_subpops = TRUE.

  • p_ind: An m-by-n matrix of individual-specific allele frequencies. Included if want_p_ind = TRUE.
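A short sketch of requesting every item and checking the dimensions listed above, assuming bnpsd is installed. The small sizes are arbitrary.

```r
library(bnpsd)

m_loci <- 10
n_ind <- 5
k_subpops <- 2
inbr_subpops <- c(0.1, 0.3)
admix_proportions <- admix_prop_1d_linear(n_ind, k_subpops, sigma = 1)

# request the two items that are excluded by default
out <- draw_all_admix(
  admix_proportions,
  inbr_subpops,
  m_loci,
  want_p_subpops = TRUE,
  want_p_ind = TRUE
)

names(out)           # items present in the list
dim(out$X)           # m-by-n genotypes
length(out$p_anc)    # length m
dim(out$p_subpops)   # m-by-k
dim(out$p_ind)       # m-by-n
```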

Details

As a precaution, the function stops if the column names of admix_proportions and the names in inbr_subpops or tree_subpops both exist and disagree, which may indicate that the two inputs are not aligned or are otherwise inconsistent.
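The name check can be observed with deliberately mismatched names. This sketch assumes bnpsd is installed. The labels "S1", "S2", and "S3" are made up, and the exact error message is not shown here because it is not stated on this page.

```r
library(bnpsd)

admix_proportions <- admix_prop_1d_linear(5, 2, sigma = 1)
colnames(admix_proportions) <- c("S1", "S2")

# "S3" deliberately disagrees with the column name "S2"
inbr_subpops <- c(S1 = 0.1, S3 = 0.3)

# capture the error instead of halting the script
res <- tryCatch(
  draw_all_admix(admix_proportions, inbr_subpops, m_loci = 10),
  error = function(e) e
)
inherits(res, "error")
```

Removing the names from either input, or making them agree, should let the call proceed.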

Examples

# dimensions
# number of loci
m_loci <- 10
# number of individuals
n_ind <- 5
# number of intermediate subpops
k_subpops <- 2

# define population structure
# FST values for k = 2 subpopulations
inbr_subpops <- c(0.1, 0.3)
# admixture proportions from 1D geography
admix_proportions <- admix_prop_1d_linear(n_ind, k_subpops, sigma = 1)

# draw all random allele freqs and genotypes
out <- draw_all_admix(admix_proportions, inbr_subpops, m_loci)

# return value is a list with these items:

# genotypes
X <- out$X

# ancestral AFs
p_anc <- out$p_anc

# # these are excluded by default, but would be included if ...
# # ... `want_p_subpops == TRUE`
# # intermediate subpopulation AFs
# p_subpops <- out$p_subpops
# 
# # ... `want_p_ind == TRUE`
# # individual-specific AFs
# p_ind <- out$p_ind
