get.mser.interpolation: Interpolate MSER dependency on the tag count

Description

MSER generally decreases with increasing sequencing depth. This function models the dependency of MSER on the tag count as a linear function in log-log space. The resulting fit is used to estimate the sequencing depth required to reach the desired target.fold.enrichment.
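The log-log extrapolation described above can be sketched in a few lines of base R. This is only an illustration on made-up numbers, not the function's internal code: the variable names (tag.counts, mser, predicted.depth) are hypothetical, and regressing log10(MSER) on log10(tag count) is an assumption suggested by the name of the log10.fit return field.

```r
# Synthetic (made-up) data: MSER follows an exact power law of the tag count.
tag.counts <- c(1e5, 2e5, 4e5, 8e5)
mser <- 1e4 * tag.counts^(-0.5)

# Linear fit in log-log space (assumed form of the model).
fit <- lm(log10(mser) ~ log10(tag.counts))
cf <- coef(fit)  # cf[[1]] is the intercept, cf[[2]] is the slope

# Invert the fitted line to find the depth at which MSER reaches the target.
target <- 5
predicted.depth <- 10^((log10(target) - cf[[1]]) / cf[[2]])
```

Because the synthetic MSER values decrease with depth, the fitted slope is negative, and a target below the observed MSER values maps to a depth beyond the observed range.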

Usage

get.mser.interpolation(signal.data, 
  control.data, 
  target.fold.enrichment = 5, 
  n.chains = 10, 
  n.steps = 6, 
  step.size = 1e+05, 
  chains = NULL, 
  test.agreement = 0.99, 
  return.chains = F, 
  enrichment.background.scales = c(1), 
  excluded.steps = c(seq(2, n.steps - 2)), ...)

Arguments

signal.data

signal chromosome tag vector list

control.data

control chromosome tag vector list

target.fold.enrichment

target MSER for which the depth should be estimated

n.steps

number of steps in each subset chain.

step.size

Either a number of tags or a fraction of the dataset size. See the step.size parameter for get.mser.

test.agreement

Fraction of the detected peaks that should agree between the full and subsampled datasets. See the test.agreement parameter for get.mser.

n.chains

number of random subset chains

chains

optional structure of pre-calculated chains (e.g. generated by an earlier call with return.chains=T).

return.chains

whether to return peak predictions calculated on random chains. These can be passed back using the chains argument to skip the subsampling/prediction steps and only recalculate the depth estimate for a different MSER.

enrichment.background.scales

see enrichment.background.scales parameter for get.mser

excluded.steps

Intermediate subsampling steps that should be excluded from the chains to speed up the calculation. By default, all intermediate steps except for the first two and the last two are skipped. Adding intermediate steps improves the interpolation at the expense of computation time.
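As a small illustration, the default expression from the Usage section can be evaluated on its own to see which step indices it produces for a given n.steps. The snippet only evaluates that expression in base R; how the function itself numbers the steps is not shown here.

```r
# Default expression from the Usage section, evaluated for the default n.steps.
n.steps <- 6
excluded.default <- c(seq(2, n.steps - 2))   # 2 3 4

# A longer chain excludes more intermediate steps under the same default.
n.steps <- 8
excluded.longer <- c(seq(2, n.steps - 2))    # 2 3 4 5 6

# Caveat: for small n.steps the expression counts downwards, because
# seq(2, 1) yields c(2, 1), so an explicit value may be preferable there.
n.steps <- 3
excluded.small <- c(seq(2, n.steps - 2))     # 2 1
```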

...

additional parameters are passed to get.mser

Value

Normally returns a list, specifying for each background scale:

prediction

estimated sequencing depth required to reach specified target MSER

log10.fit

linear fit model, a result of lm() call

If return.chains=T, the above structure is returned under the interpolation field, along with a chains field containing the results of find.binding.positions calls on the subsampled chains.

Details

To simulate sequencing growth, the method calculates peak predictions on random chains. Each chain is produced by sequential random subsampling of the original data. The number of steps in the chain indicates how many times the random subsampling will be performed.
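The chain construction can be pictured with a minimal base R sketch. It is a simplified stand-in, not the package's implementation: it uses a single hypothetical vector of tag positions rather than a chromosome tag vector list, and it assumes that each step removes a fixed number of tags (one of the two step.size interpretations mentioned above).

```r
set.seed(1)

# Hypothetical tag positions on a single chromosome (unique integers).
tags <- sample.int(1e6, 1000)

n.steps <- 6
step.size <- 100   # tags removed at each step (assumed interpretation)

# Each step subsamples the result of the previous step, so the subsets are nested.
chain <- vector("list", n.steps)
current <- tags
for (i in seq_len(n.steps)) {
  current <- sample(current, length(current) - step.size)
  chain[[i]] <- current
}

# Number of tags remaining after each step.
sizes <- vapply(chain, length, integer(1))
```

Repeating this procedure n.chains times with different random draws would give independent chains, each of which traces a different path from the full dataset to its smallest subset.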