lymphclon (version 1.3.0)

generate.clonal.data: generate.clonal.data (part of lymphclon package)

Description

This function generates simulated data for evaluation purposes. It starts with an underlying multinomial population whose entries are proportional to a rank^-power distribution, where the power is fixed (see clonal.distribution.power). For each desired replicate, a fixed number of cells is then drawn multinomially from this distribution. The drawn counts are subjected to PCR noise (log-normal, or Pareto, which the Usage section shows as the default pcr.noise.type) and subsequently scaled up to the expected number of reads. Finally, the expected number of reads for each clone is passed through a Poisson process to generate integer read counts. This Poisson "rounding" step is why the resulting read counts are not exactly as specified.
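The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's source code: the names `n.clones`, `alpha`, `clone.prob` and `simulate.replicate` are hypothetical, the noise is assumed to act multiplicatively on the drawn cell counts, the log-normal noise option is used, and the exponent `alpha` follows its own sign convention (weights proportional to rank^(-alpha)) rather than claiming to match that of clonal.distribution.power.

```r
# Illustrative sketch only; not the lymphclon implementation.
set.seed(1)

n.clones  <- 2000            # hypothetical number of distinct clones
alpha     <- sqrt(2)         # hypothetical decay exponent for the rank weights
num.cells <- c(2000, 5000)   # cells taken per replicate
num.reads <- c(20000, 20000) # target reads per replicate

# 1. Underlying clone probabilities, proportional to rank^(-alpha).
clone.prob <- (1:n.clones)^(-alpha)
clone.prob <- clone.prob / sum(clone.prob)

simulate.replicate <- function(cells, reads) {
  # 2. Multinomial draw of a fixed number of cells.
  cell.counts <- as.vector(rmultinom(1, size = cells, prob = clone.prob))
  # 3. Log-normal noise on each clone's abundance (assumed multiplicative).
  noisy <- cell.counts * rlnorm(n.clones, meanlog = 0, sdlog = 1)
  # 4. Scale so the expected total equals the target number of reads.
  expected.reads <- noisy / sum(noisy) * reads
  # 5. A Poisson draw turns expected reads into integer counts.
  rpois(n.clones, lambda = expected.reads)
}

# One column per replicate, one row per clone.
sim.counts <- mapply(simulate.replicate, num.cells, num.reads)
dim(sim.counts)
colSums(sim.counts)  # close to, but generally not exactly, the target reads
```

The column sums differ slightly from the requested read counts because each entry is an independent Poisson draw around its expected value.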

Usage

generate.clonal.data(
  n = 2e+07, 
  num.cells.taken.vector = c(2000, 5000, 10000, 20000, 50000, 50000), 
  read.count.per.replicate.vector = rep(20000, length(num.cells.taken.vector)), 
  clonal.distribution.power = -sqrt(2),
  pcr.noise.type = 'pareto',
  pcr.pareto.location = 1,
  pcr.pareto.shape = 1,
  pcr.lognormal.meanlog = 0,
  pcr.lognormal.sdlog = 1)

Arguments

n
The true number of distinct clones in the underlying assemblage
num.cells.taken.vector
A vector specifying the number of cells taken in each independent biological replicate
read.count.per.replicate.vector
A vector of the same length as num.cells.taken.vector, specifying the number of reads generated from each biological replicate, of the same corresponding indices
clonal.distribution.power
The true underlying clonal multinomial distribution is proportional to (1:n)^-clonal.distribution.power
pcr.noise.type
A string denoting the type of PCR noise: either 'pareto' (default), or 'lognormal'. The package author Yi Liu has found anecdotally and empirically that pareto distributions model sequencing amplification bonanzas much better than lognormal distributions.
pcr.pareto.location
The location parameter for the pareto distribution; matters only if the noise type is pareto.
pcr.pareto.shape
The shape parameter for the pareto distribution; matters only if the noise type is pareto.
pcr.lognormal.meanlog
The meanlog parameter for the lognormal distribution; matters only if the noise type is lognormal.
pcr.lognormal.sdlog
The sdlog parameter for the lognormal distribution; matters only if the noise type is lognormal.
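To give a feel for the difference between the two noise types, the following base R sketch draws from both distributions with the default parameter values shown in Usage. The helper `draw.pcr.noise` is hypothetical and not part of lymphclon. It assumes the standard Pareto (type I) parameterization, in which the location is the minimum value and the shape is the tail index; whether the package uses exactly this parameterization is an assumption.

```r
# Hypothetical helper for illustration; not a lymphclon function.
draw.pcr.noise <- function(k, type = c("pareto", "lognormal"),
                           location = 1, shape = 1,
                           meanlog = 0, sdlog = 1) {
  type <- match.arg(type)
  if (type == "pareto") {
    # Inverse-transform sampling: if U ~ Uniform(0, 1), then
    # location * U^(-1 / shape) follows a Pareto (type I) distribution.
    location * runif(k)^(-1 / shape)
  } else {
    rlnorm(k, meanlog = meanlog, sdlog = sdlog)
  }
}

set.seed(42)
pareto.noise    <- draw.pcr.noise(1e5, "pareto")
lognormal.noise <- draw.pcr.noise(1e5, "lognormal")

# Compare the median with the upper quantiles of each sample.
quantile(pareto.noise,    c(0.5, 0.99, 0.9999))
quantile(lognormal.noise, c(0.5, 0.99, 0.9999))
```

With these parameters the Pareto sample has a much heavier upper tail than the log-normal sample, so a few clones receive very large amplification factors.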

Value

  • read.count.matrix: This is a matrix of simulated counts, with rows corresponding to clones (classes, or species), and columns corresponding to biological replicates
  • true.clone.prob: This is the underlying simulated assemblage multinomial distribution used to generate read.count.matrix
  • true.clonality: This is the true clonality score of the underlying simulated assemblage
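The Examples section describes clonality as the squared 2-norm of the underlying multinomial distribution. A minimal sketch of that quantity on two small probability vectors follows; the function name `clonality` and both vectors are made up for illustration.

```r
# Clonality as the sum of squared clone probabilities
# (hypothetical helper, not a lymphclon function).
clonality <- function(p) sum(p^2)

p.uniform <- rep(1 / 4, 4)               # four equally frequent clones
p.skewed  <- c(0.97, 0.01, 0.01, 0.01)   # one dominant clone

clonality(p.uniform)  # 0.25
clonality(p.skewed)   # 0.9412
```

A uniform distribution over k clones gives the minimum value 1/k, while a single dominant clone pushes the score toward 1. If the package follows this definition, sum(my.data$true.clone.prob^2) would be expected to agree with my.data$true.clonality; this is inferred from the documentation and has not been verified here.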

Examples

library(lymphclon)
my.data <- generate.clonal.data(n=2e3)
# n ~ 2e7 is more appropriate for a realistic B cell repertoire
my.lymphclon.results <- infer.clonality(my.data$read.count.matrix)
# a consistently improved estimate of clonality (the squared 
# 2-norm of the underlying multinomial distribution)
