methylKit (version 0.99.2)

dataSim: Simulate DNA methylation data

Description

The function simulates DNA methylation data for multiple samples. See the references for a detailed explanation of the statistics.

Usage

dataSim(replicates, sites, treatment, percentage = 10, effect = 25,
  alpha = 0.4, beta = 0.5, theta = 10, covariates = NULL,
  sample.ids = NULL, assembly = "hg18", context = "CpG",
  add.info = FALSE)

Arguments

replicates

the number of samples that should be simulated.

sites

the number of CpG sites per sample.

treatment

a vector containing treatment information.

percentage

the percentage of sites that should be affected by the treatment (e.g. 10 for 10% of the sites).

effect

a number or vector specifying the effect size of the treatment. See `Examples'.

alpha

shape1 parameter for beta distribution (used for substitution probabilities)

beta

shape2 parameter for beta distribution (used for substitution probabilities)

theta

dispersion parameter for beta distribution (used for substitution probabilities)

covariates

a data.frame containing covariates (optional)

sample.ids

will be generated automatically from treatment, but can be overwritten by a character vector containing sample names.

assembly

the assembly description (e.g. "hg18")

context

the experimental context of the data (e.g. "CpG")

add.info

if set to TRUE, the output will be a list with the methylBase object as the first element and a vector containing the treatment effect sizes of all sites as the second element.

Value

a methylBase object containing simulated methylation data, or a list containing the methylBase object and the indices of all treated sites as the second element.
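
The two return shapes can be told apart by inspecting the result. The following is a minimal sketch, assuming methylKit is installed; the variable names sim, mb and info are illustrative only, and the call is guarded so that nothing runs when the package is missing. It makes no assumption about what the second element holds beyond what is stated above, and simply prints its structure.

```r
# Guarded so the snippet does nothing when methylKit is not installed.
if (requireNamespace("methylKit", quietly = TRUE)) {
  set.seed(42)  # simulation is random; fix the seed for repeatability

  # Same settings as the first example, but asking for the extra information.
  sim <- methylKit::dataSim(replicates = 4, sites = 2000,
                            treatment = c(1, 1, 0, 0),
                            percentage = 10, effect = 25,
                            add.info = TRUE)

  mb   <- sim[[1]]  # first element: the methylBase object
  info <- sim[[2]]  # second element: per-site treatment information

  # Inspect the second element rather than assuming its layout.
  str(info)
}
```

With add.info = FALSE (the default) the call returns the methylBase object directly, so no list indexing is needed.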

Details

While the coverage is modeled with a binomial distribution, the function uses a beta distribution to simulate the methylation background across all samples. The parameters alpha, beta and theta determine this beta distribution and thereby the methylation values. The parameters percentage and effect determine the proportion of sites that are affected by the treatment and the strength of this influence, respectively. The additional information needed for a valid methylBase object is generated as "dummy values", but can be overwritten as needed.
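
The general idea described above can be illustrated in base R. This is a simplified sketch, not the code used by dataSim: the coverage settings max_cov and p_cov are hypothetical, the dispersion parameter theta is left out, and the effect is assumed to act as a shift in percentage points on the methylation probability of treated samples.

```r
set.seed(1)

sites      <- 2000           # number of sites per sample
treatment  <- c(1, 1, 0, 0)  # 1 = treated sample, 0 = control sample
percentage <- 10             # percent of sites affected by the treatment
effect     <- 25             # assumed here: shift in percentage points
alpha      <- 0.4            # shape1 of the background beta distribution
beta       <- 0.5            # shape2 of the background beta distribution

# Background methylation level of each site, shared by all samples.
background <- rbeta(sites, shape1 = alpha, shape2 = beta)

# Choose which sites the treatment affects.
n_affected <- round(sites * percentage / 100)
affected   <- sample(sites, n_affected)

# Hypothetical coverage model: binomial read depth per site and sample.
max_cov  <- 100
p_cov    <- 0.3
coverage <- matrix(rbinom(sites * length(treatment), size = max_cov, prob = p_cov),
                   nrow = sites)

# Methylation probability per site and sample: the background, shifted
# upwards at affected sites in treated samples and capped at 1.
prob <- matrix(background, nrow = sites, ncol = length(treatment))
treated_cols <- which(treatment == 1)
prob[affected, treated_cols] <- pmin(prob[affected, treated_cols] + effect / 100, 1)

# Methylated (numCs) and unmethylated (numTs) read counts.
numCs <- matrix(rbinom(length(prob), size = coverage, prob = prob), nrow = sites)
numTs <- coverage - numCs
```

With these settings, 200 of the 2000 sites are shifted in the two treated samples, while the control samples keep the background probabilities unchanged.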

Examples

# NOT RUN {
data(methylKit)

# Simulate data for 4 samples with 2000 sites each.
# The methylation in 10% of the sites is elevated with an effect size of 25.
my.methylBase=dataSim(replicates=4,sites=2000,treatment=c(1,1,0,0),
percentage=10,effect=25)

# Simulate data with variable effect sizes of the treatment
# The methylation in 30% of the sites is elevated, with effect sizes taken from 10 to 40.
my.methylBase2=dataSim(replicates=4,sites=2000,treatment=c(1,1,0,0),
percentage=30,effect=10:40)

# }