
simulatorZ (version 1.6.0)

simData: simData

Description

simData performs non-parametric bootstrap resampling on a list of (original) data sets, at both the set level and the patient level, in order to simulate independent genomic data sets.
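
To make the two resampling levels concrete, here is a minimal base-R sketch of a two-step non-parametric bootstrap on a list of matrices. It is only an illustration under stated assumptions: two_step_bootstrap, toy and set.ids are hypothetical names introduced here, and the code is not taken from simulatorZ and may differ from what simData does internally.

```r
# Hypothetical helper, NOT part of simulatorZ: a two-step non-parametric
# bootstrap over a list of matrices whose columns are samples (patients).
two_step_bootstrap <- function(mat.list, n.samples) {
  n.sets <- length(mat.list)
  # Step 1 (set level): draw data sets with replacement.
  set.ids <- sample(seq_len(n.sets), n.sets, replace = TRUE)
  # Step 2 (patient level): within each drawn set, draw columns with replacement.
  sims <- lapply(set.ids, function(i) {
    cols <- sample(seq_len(ncol(mat.list[[i]])), n.samples, replace = TRUE)
    mat.list[[i]][, cols, drop = FALSE]
  })
  list(sets = sims, set.ids = set.ids)
}

set.seed(1)
toy <- list(A = matrix(rnorm(50), nrow = 5),  # 5 features x 10 samples
            B = matrix(rnorm(40), nrow = 5))  # 5 features x 8 samples
sim <- two_step_bootstrap(toy, n.samples = 6)
```

Skipping step 1 and resampling columns from each original set in order would correspond to the idea behind a one-step scheme.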

Usage

simData(obj, n.samples, y.vars = list(), type = "two-steps",
        balance.variables = NULL)

Arguments

obj
a list of ExpressionSets, matrices or RangedSummarizedExperiments. If the elements are matrices, columns represent samples.

n.samples
an integer indicating how many samples should be resampled from each set.

y.vars
a list of response variables; each element can be a Surv object, or a matrix or data.frame with two columns.

type
string "one-step" or "two-steps". If type="one-step", the function skips resampling the data sets and resamples directly from the original list in obj.

balance.variables
a vector of covariate names that should be balanced in the simulation. After balancing, the prevalence of each covariate in every resulting set should match the overall distribution across all original data sets. Defaults to NULL, in which case no covariate is balanced. If it is not NULL, obj should contain only objects of class ExpressionSet.
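
The balancing idea described for balance.variables can be sketched with a simple reweighting scheme in base R. This is a hedged illustration of one common approach, not simulatorZ code. The toy data and the names stage.by.set and sampling.probs are hypothetical, and prob.desired is named after the entry described under Value only for readability.

```r
# Hypothetical illustration, NOT simulatorZ code.
# One categorical covariate recorded for the patients of two toy data sets.
stage.by.set <- list(
  set1 = c("early", "early", "early", "late"),
  set2 = c("early", "late", "late", "late", "late", "late")
)

# Desired prevalence: the pooled distribution across all original sets.
pooled <- unlist(stage.by.set, use.names = FALSE)
prob.desired <- prop.table(table(pooled))

# Per-set sampling probabilities: weight each patient by
# (desired prevalence) / (observed prevalence of that level in the set),
# then normalise so the probabilities sum to one within the set.
sampling.probs <- lapply(stage.by.set, function(x) {
  observed <- prop.table(table(x))
  w <- as.numeric(prob.desired[x]) / as.numeric(observed[x])
  w / sum(w)
})

# Expected prevalence of each level under weighted resampling, per set.
expected <- lapply(names(stage.by.set), function(s) {
  tapply(sampling.probs[[s]], stage.by.set[[s]], sum)
})

# Resampling patient indices from set1 with the balanced probabilities.
set.seed(3)
idx <- sample(seq_along(stage.by.set$set1), 5, replace = TRUE,
              prob = sampling.probs$set1)
```

Under this scheme, the expected prevalence of each level in a resampled set equals the pooled prevalence, provided every level occurs at least once in that set.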

Value

prob.desired and prob.real are only useful when balance.variables is set. prob.desired shows the overall distribution of the specified covariate. prob.list shows the sampling probability in each set after balancing.

Examples

library(simulatorZ)  # provides simData
library(curatedOvarianData)
library(GenomicRanges)

data(GSE17260_eset)
data(E.MTAB.386_eset)
data(GSE14764_eset)

esets <- list(GSE17260=GSE17260_eset,
              E.MTAB.386=E.MTAB.386_eset,
              GSE14764=GSE14764_eset)
# keep the first 1000 features and the first 10 samples of each set
esets.list <- lapply(esets, function(eset){
  return(eset[1:1000, 1:10])
})

## simulate on multiple ExpressionSets
set.seed(8)
# one-step bootstrap: skip resampling set labels
simmodels <- simData(esets.list, 20, type="one-step")
# two-step non-parametric bootstrap
simmodels <- simData(esets.list, 10, type="two-steps")

## simulate one set
simmodels <- simData(list(esets.list[[1]]), 10, type="two-steps")

## balancing covariates
# single covariate
simmodels <- simData(list(esets.list[[1]]), 5, balance.variables="tumorstage")

# multiple covariates
simmodels <- simData(list(esets.list[[1]]), 5,
                     balance.variables=c("tumorstage", "age_at_initial_pathologic_diagnosis"))

## Support matrices
X.list <- lapply(esets.list, function(eset){
  return(exprs(eset))
})
simmodels <- simData(X.list, 20, type="two-steps")

## Support RangedSummarizedExperiment
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)

s.list <- list(sset[,1:3], sset[,4:6])
simmodels <- simData(s.list, 20, type="two-steps")
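
The y.vars argument is described above as a list of response variables that may be two-column matrices. A minimal base-R sketch of building such a list for matrix input follows. The names X.toy and y.toy are hypothetical, the data are simulated, and the column layout (time first, status second) is an assumption rather than something this page confirms.

```r
set.seed(2)
# Toy expression matrices: rows are features, columns are samples.
X.toy <- list(matrix(rnorm(60), nrow = 6),   # 6 features x 10 samples
              matrix(rnorm(48), nrow = 6))   # 6 features x 8 samples

# One response per data set, one row per sample.
# ASSUMED layout: column 1 = survival time, column 2 = event status (0/1).
y.toy <- lapply(X.toy, function(X) {
  cbind(time = rexp(ncol(X), rate = 0.1),
        status = rbinom(ncol(X), size = 1, prob = 0.7))
})

# Each response needs as many rows as its matrix has columns.
stopifnot(all(mapply(function(X, y) ncol(X) == nrow(y), X.toy, y.toy)))

# With simulatorZ loaded, the lists could then be passed along these lines:
# simmodels <- simData(X.toy, 20, y.vars = y.toy)
```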

