snp_simuPheno: Simulate phenotypes

Description

Simulate phenotypes using a linear model. When a prevalence (K) is given, the liability threshold model is used to convert liabilities to a binary outcome. The genetic and environmental liabilities are scaled so that the variance of the genetic liability is exactly equal to the requested heritability (h2), and the variance of the total liability is equal to 1.
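
The scaling and thresholding described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual code: the toy genotype matrix `geno`, the sizes `n` and `m`, and the coefficient `a` used to bring the total variance to 1 are all made up for this example.

```r
set.seed(1)
n  <- 2000   # individuals (illustrative)
m  <- 50     # causal variants (illustrative)
h2 <- 0.5    # requested heritability
K  <- 0.1    # prevalence

# Toy genotypes coded 0/1/2, with one allele frequency per column
af   <- runif(m, 0.1, 0.5)
geno <- matrix(rbinom(n * m, size = 2, prob = rep(af, each = n)), nrow = n)

# Genetic liability from random effects on standardized genotypes,
# rescaled so that its sample variance is exactly h2
gen_liab <- drop(scale(geno) %*% rnorm(m))
gen_liab <- gen_liab * sqrt(h2) / sd(gen_liab)

# Environmental liability, multiplied by a coefficient `a` (an assumption of
# this sketch) that solves var(gen_liab + a * env) == 1, taking the sample
# covariance between the two components into account
env  <- rnorm(n)
v_e  <- var(env)
c_ge <- cov(gen_liab, env)
a    <- (sqrt(c_ge^2 + v_e * (1 - h2)) - c_ge) / v_e
liab <- gen_liab + a * env

# Liability threshold: cases are individuals above the (1 - K) normal quantile
pheno <- as.integer(liab > qnorm(K, lower.tail = FALSE))
```

Here `var(gen_liab)` equals `h2` and `var(liab)` equals 1 up to floating-point error, while `mean(pheno)` is only approximately `K` because the threshold is taken from the standard normal distribution.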

Usage

snp_simuPheno(
  G,
  h2,
  M,
  K = NULL,
  alpha = -1,
  ind.row = rows_along(G),
  ind.possible = cols_along(G),
  prob = NULL,
  effects.dist = c("gaussian", "laplace"),
  ncores = 1
)

Value

A list with 4 elements:

$pheno: vector of phenotypes,
$set: indices of causal variants,
$effects: effect sizes (of scaled genotypes) corresponding to set,
$allelic_effects: the same effect sizes, but on the allele scale (genotypes coded 0|1|2).
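
A minimal call might look like the following sketch. It assumes that bigsnpr is installed and uses the example dataset attached by `snp_attachExtdata()`; the values chosen for `h2`, `M`, `K` and `alpha` are arbitrary.

```r
library(bigsnpr)

# Example data shipped with bigsnpr
bigsnp <- snp_attachExtdata()
G <- bigsnp$genotypes

set.seed(1)

# Continuous trait: heritability 0.2 spread over 100 causal variants
simu <- snp_simuPheno(G, h2 = 0.2, M = 100)
str(simu)  # inspect the elements of the returned list

# Binary trait: same settings, with a prevalence of 10% and alpha = 0.5
simu_bin <- snp_simuPheno(G, h2 = 0.2, M = 100, K = 0.1, alpha = 0.5)
table(simu_bin$pheno)
```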

Arguments

G: An FBM.code256 (typically <bigSNP>$genotypes).
It should not contain missing values. Also, remember to do quality control first, e.g. some algorithms in this package won't work if you use SNPs with 0 MAF.
h2: Heritability.
M: Number of causal variants.
K: Prevalence. Default is NULL, giving a continuous trait.
alpha: Assumes that the average contribution (e.g. heritability) of a SNP of frequency $p$ is proportional to $[2p(1-p)]^{1+\alpha}$. Default is -1.
ind.row: An optional vector of the row indices (individuals) that are used. If not specified, all rows are used.
Don't use negative indices.
ind.possible: Indices of possible causal variants.
prob: Vector of probability weights for sampling causal indices. It can have 0s (discarded) and is automatically scaled to sum to 1. Default is NULL (all indices have the same probability).
effects.dist: Distribution of effects. Either "gaussian" (the default) or "laplace".
ncores: Number of cores used. Default is 1 (no parallelism). You may use bigstatsr::nb_cores().
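
To make the roles of alpha and prob more concrete, here is a small base R sketch. It is not taken from the package: `h2_weight` is a hypothetical helper that simply evaluates the formula given for alpha, and the sampling step assumes that causal indices are drawn without replacement using weights like those in prob.

```r
set.seed(42)

# Hypothetical helper: relative expected heritability contribution of a
# variant with allele frequency p, i.e. [2p(1-p)]^(1 + alpha)
h2_weight <- function(p, alpha) (2 * p * (1 - p))^(1 + alpha)

p <- c(0.01, 0.1, 0.5)
w_flat <- h2_weight(p, alpha = -1)  # exponent is 0, so every variant gets weight 1
w_freq <- h2_weight(p, alpha = 0)   # weights are 2p(1-p): 0.0198, 0.18, 0.5

# Drawing M causal variants with unnormalized weights: sample() rescales the
# weights internally, and indices with weight 0 are never drawn
ind.possible <- 1:10
prob <- c(0, 0, 5, 1, 1, 1, 1, 1, 0, 0)
M <- 3
set <- sort(sample(ind.possible, size = M, prob = prob))
```

With the default alpha = -1, rare and common variants are expected to contribute equally to heritability; larger values of alpha shift the expected contribution towards common variants.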