snapCGH (version 1.42.0)

simulateData: A function for simulating aCGH data and the corresponding clone layout

Description

The simulation scheme operates in two stages. First, we simulate the layout of clones; then we use a modified version of the scheme developed by Willenbrock et al. (2005) to generate the $\log_2$ ratios. For each simulated clone layout we generate 20 sets of simulated $\log_2$ ratios from one of five templates. The simulation also takes account of the cellularity of the test sample.
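As a rough illustration of how cellularity can enter a simulation of this kind, the sketch below attenuates the expected $\log_2$ ratio of a copy-number state by mixing tumour cells with normal (copy number 2) cells, then adds Gaussian noise. The mixing formula, the helper name `expected_log2_ratio`, the three example states and the uniform draw of the noise level are assumptions made for illustration; the source does not state the exact formula that simulateData uses.

```r
# Hypothetical helper (not part of snapCGH): expected log2 ratio for a clone
# with tumour copy number `copy_number` when only a fraction `cellularity`
# of the test sample consists of tumour cells. The remaining cells are
# assumed to be diploid (copy number 2).
expected_log2_ratio <- function(copy_number, cellularity) {
  stopifnot(all(cellularity >= 0), all(cellularity <= 1))
  log2((cellularity * copy_number + (1 - cellularity) * 2) / 2)
}

set.seed(1)
states <- c(loss = 1, normal = 2, gain = 3)

# A pure tumour sample versus a sample containing 50% normal cells
pure  <- expected_log2_ratio(states, cellularity = 1.0)
mixed <- expected_log2_ratio(states, cellularity = 0.5)

# Add Gaussian noise; the standard deviation is drawn from the range the
# `sd` argument documents as its default (uniform sampling is an assumption)
noise_sd <- runif(1, min = 0.1, max = 0.2)
noisy <- mixed + rnorm(length(mixed), mean = 0, sd = noise_sd)

print(round(rbind(pure, mixed), 3))
```

Under this assumed model, lower cellularity pulls every state towards a $\log_2$ ratio of zero, which is why gains and losses are harder to detect in impure samples.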

Usage

simulateData(nArrays = 20, chrominfo = NULL,
             prb.short.tiled = 0.5, prb.long.tiled = 0.5,
             non.tiled.lower.res = 0.9, non.tiled.upper.res = 1.1,
             length.clone.lower = 0.05, length.clone.upper = 0.2,
             tiled.lower.res = -0.05, tiled.upper.res = 0,
             sd = NULL, output = FALSE,
             prb.proportion.tiled = c(0.2, 0.2, 0.2, 0.2, 0.2),
             zerolengthnontiled = NULL, zerolengthtiled = NULL,
             nonzerolengthnontiled = NULL, nonzerolengthtiled = NULL,
             seed = 1)

Arguments

nArrays
The number of arrays to simulate (defaults to 20).
chrominfo
The information about chromosome length and centromere location to be used. Defaults to the information provided in the aCGH package by Jane Fridlyand and Peter Dimitrov.
prb.short.tiled
The probability of a tiled region on the short arm of the simulated chromosome (defaults to 0.5).
prb.long.tiled
The probability of a tiled region on the long arm of the simulated chromosome (defaults to 0.5).
non.tiled.lower.res
The lower limit for the distance (in Mb) between adjacent clones in non-tiled regions of the genome (defaults to 0.9 Mb).
non.tiled.upper.res
The upper limit for the distance (in Mb) between adjacent clones in non-tiled regions of the genome (defaults to 1.1 Mb).
length.clone.lower
The lower limit for the length (in Mb) of a clone (defaults to 0.05 Mb).
length.clone.upper
The upper limit for the length (in Mb) of a clone (defaults to 0.2 Mb).
tiled.lower.res
The lower limit for the distance (in Mb) between adjacent clones in tiled regions of the genome (defaults to -0.05 Mb; a negative distance corresponds to overlapping clones).
tiled.upper.res
The upper limit for the distance (in Mb) between adjacent clones in tiled regions of the genome (defaults to 0 Mb).
sd
The standard deviation of the simulated data in each of the states. By default it is sampled at random between 0.1 and 0.2.
output
A logical value; if TRUE, the output is written to txt files in the current working directory. Defaults to FALSE.
prb.proportion.tiled
Given that an arm of a chromosome contains a tiled region, this vector of length 5 gives the probabilities that 20, 30, 40, 50 or 100% of the chromosome is tiled. Defaults to c(0.2, 0.2, 0.2, 0.2, 0.2).
zerolengthnontiled
The empirical distribution for regions of the genome which are non-tiled and contain no copy number gains or losses. Defaults to zero.length.distr.non.tiled.
zerolengthtiled
The empirical distribution for regions of the genome which are tiled and contain no copy number gains or losses. Defaults to zero.length.distr.tiled.
nonzerolengthnontiled
The empirical distribution for regions of the genome which are non-tiled and contain copy number gains or losses. Defaults to non.zero.length.distr.non.tiled.
nonzerolengthtiled
The empirical distribution for regions of the genome which are tiled and contain copy number gains or losses. Defaults to non.zero.length.distr.tiled.
seed
Seed value; setting the same seed allows a simulation to be reproduced (defaults to 1).
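To make the layout arguments concrete, the following sketch places clones along a single region using the documented default bounds. Drawing gaps and clone lengths uniformly between the limits, reading the "distance between adjacent clones" as the gap from the end of one clone to the start of the next, the 10 Mb region length and the helper name `sketch_layout` are all assumptions; this is not the package's implementation.

```r
# Hypothetical helper (not part of snapCGH): lay out clones on one region.
# Assumes uniform sampling between the documented lower/upper limits and
# treats "distance" as the gap from one clone's end to the next clone's start.
sketch_layout <- function(region_length, gap_lower, gap_upper,
                          len_lower = 0.05, len_upper = 0.2) {
  start <- numeric(0)
  end <- numeric(0)
  pos <- 0
  repeat {
    clone_len <- runif(1, len_lower, len_upper)
    # Stop once the next clone would run past the end of the region
    if (pos + clone_len > region_length) break
    start <- c(start, pos)
    end <- c(end, pos + clone_len)
    # A negative gap (tiled regions) makes the next clone overlap this one
    pos <- pos + clone_len + runif(1, gap_lower, gap_upper)
  }
  data.frame(start = start, end = end, mid = (start + end) / 2)
}

set.seed(1)
# All positions are in Mb
non_tiled <- sketch_layout(10, gap_lower = 0.9, gap_upper = 1.1)
tiled     <- sketch_layout(10, gap_lower = -0.05, gap_upper = 0)

# Given that an arm is tiled, pick the tiled fraction with the default
# weights of prb.proportion.tiled
tiled_fraction <- sample(c(0.2, 0.3, 0.4, 0.5, 1),
                         size = 1, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))

print(head(non_tiled))
print(c(non_tiled = nrow(non_tiled), tiled = nrow(tiled)))
```

With these bounds the tiled layout packs far more clones into the same region than the non-tiled one, which is the practical effect of the two pairs of resolution arguments.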

Value

The function returns a list containing the following elements.
clones
Gives the start, end and midpoint of the simulated clones.
class.output
A list giving the true underlying state to which each clone is assigned, for each of the twenty simulations associated with each clone layout.
class.matrix
Defines the true underlying state to which clones are assigned in each of the five classes.
classes
Indicates which of the five class outputs has been used to simulate the $\log_2$ ratios.
datamatrix
A matrix containing twenty columns each of which contains the simulated $\log_2$ ratios associated with each of the simulations for a particular clone layout.
samples
Gives information about the cellularity associated with each of the samples.
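The elements listed above can be inspected as in the following sketch. It uses only the arguments shown in Usage and the element names listed in Value; the choice of `nArrays = 4` is arbitrary, and the call is guarded so that it runs only where the Bioconductor package snapCGH is installed. The shape of each element is not asserted here, since the source does not specify it beyond the descriptions above.

```r
# Runs only if the Bioconductor package snapCGH is installed
sim <- NULL
if (requireNamespace("snapCGH", quietly = TRUE)) {
  suppressPackageStartupMessages(library(snapCGH))

  # A small simulation; the same seed should reproduce the same result
  sim <- simulateData(nArrays = 4, seed = 1)

  str(sim, max.level = 1)      # overview of the returned list
  print(head(sim$clones))      # start, end and midpoint of the clones
  print(dim(sim$datamatrix))   # matrix of simulated log2 ratios
  print(sim$classes)           # which class output each simulation used
}
```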

Details

For more details see the article by Marioni and Thorne published in Bioinformatics.

References

See the relevant article in Bioinformatics or the following website: www.damtp.cam.ac.uk/user/jcm68