
SnapATAC (version 1.0.0)

runJDA: Jaccard Distance Analysis

Description

This function takes a snap object with a bmat or pmat slot as input and runs Jaccard distance based analysis (JDA).

Usage

runJDA(obj, input.mat = c("bmat", "pmat"), bin.cov.zscore.lower = -Inf,
  bin.cov.zscore.upper = Inf, bin.downsample = 1, pc.num = 50,
  norm.method = c("residual", "zscore", "None"), tmp.folder, max.var = 2000,
  row.center = TRUE, row.scale = TRUE, low.threshold = -5,
  high.threshold = 5, do.par = TRUE, ncell.chunk = 2000, num.cores = 1,
  seed.use = 10, keep.jmat = FALSE)

Arguments

obj

A snap object.

input.mat

Input matrix to be used for JDA. This must be one of c("bmat", "pmat").

bin.cov.zscore.lower

Bin coverage is converted to a z-score, and bins with a z-score lower than bin.cov.zscore.lower will be filtered out [-Inf].

bin.cov.zscore.upper

Bin coverage is converted to a z-score, and bins with a z-score higher than bin.cov.zscore.upper will be filtered out [Inf].

bin.downsample

Fraction of bins to downsample to [1].

pc.num

An integer giving the number of dimensions to return [50].

norm.method

A character string indicating the normalization method to be used. This must be one of c("residual", "zscore", "None").

tmp.folder

A non-empty character vector giving the name of the directory in which temporary files are saved.

max.var

A numeric value indicating how many dimensions the Jaccard index is calculated for [2000].

row.center

A logical value indicating whether rows of the normalized Jaccard index matrix should be centered by subtracting the row means (omitting 'NA's) [TRUE].

row.scale

A logical value indicating whether rows of the normalized Jaccard index matrix should be scaled by dividing the (centered) rows by their standard deviations when row.center is TRUE [TRUE].

low.threshold

A numeric value giving the minimum value for the normalized Jaccard index [-5].

high.threshold

A numeric value giving the maximum value for the normalized Jaccard index [5].

do.par

A logical value indicating whether to run this in parallel with multiple processors [TRUE].

ncell.chunk

A numeric value giving the number of cells to process per processing core [2000].

num.cores

Number of processors to use.

seed.use

A numeric value giving the random seed to use [10].

keep.jmat

A logical value indicating whether to keep the Jaccard index matrix [FALSE].
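
The bin filtering controlled by bin.cov.zscore.lower and bin.cov.zscore.upper can be illustrated with a minimal base R sketch. The helper filter_bins_by_zscore() and the toy matrix are hypothetical and not part of SnapATAC, and the sketch assumes that coverage is the raw per-bin count with no further transformation before the z-score is computed; the package may compute coverage differently.

```r
# Hypothetical helper for illustration only; not a SnapATAC function.
# mat: binary cell-by-bin matrix (cells in rows, bins in columns).
filter_bins_by_zscore <- function(mat, lower = -Inf, upper = Inf) {
  bin_cov <- colSums(mat)                        # number of cells in which each bin is open
  z <- (bin_cov - mean(bin_cov)) / sd(bin_cov)   # coverage expressed as a z-score
  keep <- z >= lower & z <= upper                # drop bins outside [lower, upper]
  mat[, keep, drop = FALSE]
}

set.seed(10)
# Toy data: 20 cells x 100 bins, each bin with its own probability of being open.
bin_prob <- runif(100, 0.05, 0.9)
toy <- matrix(rbinom(20 * 100, 1, rep(bin_prob, each = 20)), nrow = 20)

filtered <- filter_bins_by_zscore(toy, lower = -2, upper = 2)
dim(filtered)
```

With the default bounds of -Inf and Inf, no bin is removed, which matches the defaults shown in Usage.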

Details

runJDA performs the following steps:
1) runJaccard - calculates the pairwise Jaccard index
2) normJaccard - adjusts for read depth and other artifacts in the observed Jaccard matrix
3) SVD - runs SVD on the normalized Jaccard matrix
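
The three steps can be sketched in base R on simulated data. This is an illustrative sketch under stated assumptions, not the package's implementation: the toy matrix, the variable names, and pc_num = 10 are made up for the example, and step 2 is replaced by a stand-in that only centers, scales, and clips rows (mirroring row.center, row.scale, low.threshold and high.threshold) without any read-depth adjustment.

```r
set.seed(10)
n_cells <- 30
n_bins <- 200
# Toy binary cell-by-bin matrix (cells in rows), simulated for illustration.
x <- matrix(rbinom(n_cells * n_bins, 1, 0.2), nrow = n_cells)

# Step 1: pairwise Jaccard index, |A and B| / |A or B|, for every pair of cells.
inter <- tcrossprod(x)                       # bins open in both cells
cell_cov <- rowSums(x)                       # bins open in each cell
jmat <- inter / (outer(cell_cov, cell_cov, "+") - inter)

# Step 2 (stand-in): center and scale each row, then clip to [-5, 5].
# The read-depth normalization itself is NOT reproduced here.
nmat <- t(scale(t(jmat), center = TRUE, scale = TRUE))
nmat[nmat < -5] <- -5
nmat[nmat > 5] <- 5

# Step 3: SVD of the normalized matrix, keeping the first pc_num components.
pc_num <- 10
sv <- svd(nmat, nu = pc_num, nv = 0)
# Scaling the left singular vectors by the singular values is a choice made
# for this sketch; the package may return a different quantity.
embedding <- sv$u %*% diag(sv$d[1:pc_num])
dim(embedding)
```

Each row of embedding is a low-dimensional representation of one cell, which is the kind of input that downstream clustering or visualization would use.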

In theory, each entry of the Jaccard index matrix calculated by calJaccard() should reflect the true similarity between two cells; in practice, it does not. We observed that a cell with higher coverage tends to have a higher similarity with another cell regardless of whether the two cells are actually similar. This bias, which we term "coverage bias" and which has also been observed in other studies, can later result in misleading cell grouping. Therefore, it is crucial to normalize for it.
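
The coverage bias can be reproduced on simulated data. In the sketch below, all cells are drawn from the same population and differ only in sequencing depth, so any structure in the Jaccard matrix comes from coverage alone. The residual step is one simple way to remove it and is an assumption of this example rather than the package's exact procedure: the expected Jaccard index of two independent cells with open-bin probabilities p_i and p_j is taken to be p_i p_j / (p_i + p_j - p_i p_j), the observed values are regressed on it, and the residuals are kept.

```r
set.seed(10)
n_cells <- 40
n_bins <- 500
# Per-cell probability that a bin is open; cells differ only in depth.
depth <- seq(0.05, 0.5, length.out = n_cells)
x <- matrix(rbinom(n_cells * n_bins, 1, rep(depth, times = n_bins)),
            nrow = n_cells)

# Observed pairwise Jaccard index.
inter <- tcrossprod(x)
cell_cov <- rowSums(x)
jmat <- inter / (outer(cell_cov, cell_cov, "+") - inter)

# Compare each pair's Jaccard index with the pair's mean coverage.
pairs <- upper.tri(jmat)
observed <- jmat[pairs]
pair_cov <- (outer(cell_cov, cell_cov, "+") / 2)[pairs]
bias_before <- cor(observed, pair_cov)       # clearly positive: coverage bias

# Expected Jaccard index under independence, from empirical open-bin rates.
p <- cell_cov / n_bins
pp <- outer(p, p)
expected <- (pp / (outer(p, p, "+") - pp))[pairs]

# Residuals of observed on expected serve as the normalized similarity.
fit <- lm(observed ~ expected)
normalized <- residuals(fit)
bias_after <- cor(normalized, pair_cov)

c(before = bias_before, after = bias_after)
```

Comparing the two correlations shows how much of the apparent similarity between cells is explained by coverage rather than by shared biology.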

Examples

# NOT RUN {
data(demo.sp)
demo.sp <- makeBinary(demo.sp)
demo.sp <- runJDA(
  obj = demo.sp,
  input.mat = "bmat",
  bin.cov.zscore.lower = -2,
  bin.cov.zscore.upper = 2,
  bin.downsample = 1,
  pc.num = 50,
  norm.method = "residual",
  tmp.folder = tempdir(),
  max.var = 2000,
  do.par = TRUE,
  ncell.chunk = 1000,
  num.cores = 5,
  seed.use = 10
)
# }
