Usage
sc3(filename, ks = 3:7, cell.filter = FALSE, cell.filter.genes = 2000, gene.filter = TRUE, gene.filter.fraction = 0.06, log.scale = TRUE, d.region.min = 0.04, d.region.max = 0.07, interactivity = TRUE, show.original.labels = FALSE, svm = FALSE, svm.num.cells = NA, n.cores = NA, seed = 1)
Arguments
filename
either an R matrix or data.frame object, or a
path to an input file containing an expression matrix. The expression
matrix must contain both colnames (cell IDs) and rownames (gene IDs).
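A minimal sketch of an in-memory input with the required structure, in base R. The toy values, the gene and cell names, and the helper has_required_dimnames() are hypothetical. The helper only checks the stated requirement (gene IDs as rownames, cell IDs as colnames) and is not part of SC3.

```r
# Toy expression matrix: genes in rows, cells in columns (values are made up).
# matrix() fills column by column: cell1 = (0, 3.1, 12.0), cell2 = (5.2, 0, 7.4).
expr <- matrix(
  c(0, 3.1, 12.0,
    5.2, 0, 7.4),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),   # rownames = gene IDs
                  c("cell1", "cell2"))            # colnames = cell IDs
)

# Hypothetical helper: TRUE if both gene IDs (rownames) and cell IDs (colnames) are set.
has_required_dimnames <- function(x) {
  !is.null(rownames(x)) && !is.null(colnames(x))
}

has_required_dimnames(expr)            # TRUE
has_required_dimnames(unname(expr))    # FALSE: unname() drops both sets of IDs
```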
ks
the range of numbers of clusters to be tested, given as k.min:k.max,
where k.min is the minimum and k.max the maximum number of clusters.
The default is 3:7, i.e. k.min = 3 and k.max = 7.
cell.filter
defines whether to filter out cells that express fewer than
cell.filter.genes genes (lowly expressed cells). By default it is FALSE.
The cell filter should be used if the quality of the data is low, i.e. if one
suspects that some of the cells may be technical outliers with poor coverage.
Filtering out lowly expressed cells usually improves clustering.
cell.filter.genes
if cell.filter is TRUE, this parameter defines the minimum number
of genes that must be expressed (expression value > 1e-2) in each cell.
Cells expressing fewer genes are removed from the analysis.
The default is 2000.
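The cell filter rule described above can be sketched in base R as follows. This is a minimal illustration, not SC3's own code: cell_filter_sketch() and the toy matrix are hypothetical, and keeping cells with exactly min.genes expressed genes follows from the wording "if there are fewer, the cell will be removed".

```r
# Hypothetical helper mirroring the documented cell filter: a gene counts as
# expressed in a cell when its value is > 1e-2; cells expressing fewer than
# min.genes genes are dropped.
cell_filter_sketch <- function(expr, min.genes = 2000) {
  genes.per.cell <- colSums(expr > 1e-2)       # number of expressed genes per cell
  expr[, genes.per.cell >= min.genes, drop = FALSE]
}

toy <- matrix(
  c(1, 1, 1, 1,     1,    # cell A: 5 expressed genes
    1, 1, 0, 0,     0,    # cell B: 2 expressed genes
    1, 1, 1, 0.005, 0),   # cell C: 3 expressed genes (0.005 is below 1e-2)
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("A", "B", "C"))
)
colnames(cell_filter_sketch(toy, min.genes = 3))   # "A" "C"
```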
gene.filter
defines whether to perform gene filtering or not. Boolean,
default is TRUE.
gene.filter.fraction
the fraction of cells, equal to 1 - X/100, that sets the gene filter
threshold; the default is 0.06, which corresponds to X = 94.
The gene filter removes genes that are either expressed or absent
(expression value is less than 2) in at least X% of cells.
The motivation for the gene filter is that ubiquitous and rare genes are
usually not informative for the clustering.
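A minimal sketch of the gene filter rule in base R, assuming that a gene is "absent" from a cell when its value is below 2 and "expressed" otherwise, and that X/100 = 1 - gene.filter.fraction. The helper gene_filter_sketch() and the toy data are hypothetical; this illustrates the documented rule rather than reproducing SC3's implementation.

```r
# Hypothetical helper mirroring the documented gene filter.
# Assumption: "absent" means value < threshold, "expressed" means value >= threshold.
gene_filter_sketch <- function(expr, fraction = 0.06, threshold = 2) {
  n.cells <- ncol(expr)
  cutoff <- (1 - fraction) * n.cells           # X% of the cells (94% by default)
  absent <- rowSums(expr < threshold)          # cells in which each gene is absent
  expressed <- n.cells - absent                # cells in which each gene is expressed
  expr[absent < cutoff & expressed < cutoff, , drop = FALSE]
}

n <- 100
genes <- rbind(
  ubiquitous  = rep(5, n),                     # expressed in 100% of cells: removed
  rare        = c(rep(5, 3), rep(0, n - 3)),   # absent in 97% of cells: removed
  common      = c(rep(5, 95), rep(0, 5)),      # expressed in 95% of cells: removed
  informative = c(rep(5, 50), rep(0, 50)),     # expressed in 50% of cells: kept
  frequent    = c(rep(5, 90), rep(0, 10))      # expressed in 90% of cells: kept
)
colnames(genes) <- paste0("cell", seq_len(n))
rownames(gene_filter_sketch(genes))   # "informative" "frequent"
```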
log.scale
defines whether to perform log2 scaling or not. Boolean,
default is TRUE.
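A minimal sketch of log2 scaling in base R. The pseudocount of 1, which maps zero expression to 0 instead of -Inf, is an assumption for illustration; the documentation only states that log2 scaling is performed. log_scale_sketch() is a hypothetical name.

```r
# Hypothetical helper: log2 scaling with an assumed pseudocount of 1,
# so that zero values become 0 rather than -Inf.
log_scale_sketch <- function(expr) log2(expr + 1)

log_scale_sketch(matrix(c(0, 1, 3, 7), nrow = 2))
#      [,1] [,2]
# [1,]    0    2
# [2,]    1    3
```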
d.region.min
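the lower boundary of the optimum region of d, where d is the number
of eigenvectors used for clustering, expressed as a fraction of the total
number of cells. The default is 0.04.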
d.region.max
the upper boundary of the optimum region of d,
default is 0.07.
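Because d.region.min and d.region.max are fractions of the number of cells, the corresponding integer numbers of eigenvectors grow with the dataset. The sketch below converts the region into a range of integers; rounding the lower bound down and the upper bound up is an assumption, and d_range_sketch() is a hypothetical helper, not an SC3 function.

```r
# Hypothetical helper: integer numbers of eigenvectors covered by the d region.
# Assumption: the lower bound is rounded down and the upper bound rounded up.
d_range_sketch <- function(n.cells, d.min = 0.04, d.max = 0.07) {
  floor(d.min * n.cells):ceiling(d.max * n.cells)
}

d_range_sketch(250)   # 10 11 12 13 14 15 16 17 18
```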
interactivity
defines whether an interactive browser window should be
opened after all computations are done. By default it is TRUE. This option can
be used to separate clustering calculations from visualisation,
e.g. long and time-consuming clustering of very big datasets can be run
on a compute farm and the visualisation can be done afterwards on a personal
laptop. If interactivity is FALSE, all clustering results
are saved in a list called sc3.interactive.arg. To run the interactive
visualisation with the precomputed clustering results, use
sc3_interactive(sc3.interactive.arg).
show.original.labels
if cell labels in the dataset are not unique,
but represent clusters expected from the experiment, they can be visualised
by setting this parameter to TRUE. The default is FALSE.
svm
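if TRUE, only a subset of training cells (see svm.num.cells) is clustered,
and a support vector machine (SVM) trained on the resulting cluster labels
predicts the clusters of the remaining cells. The default is FALSE.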
svm.num.cells
number of training cells to be used for SVM prediction.
The default is NA. If svm is TRUE and svm.num.cells is not provided,
the SC3 defaults are used: if the number of cells is more than 5000,
then svm.num.cells = 1000; otherwise svm.num.cells is 20 percent of the total number of cells.
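The default rule for svm.num.cells can be written as a small function. This is a sketch of the documented rule only: default_svm_num_cells() is a hypothetical name, and rounding 20 percent down with floor() is an assumption, since the documentation does not say how fractional values are handled.

```r
# Hypothetical helper encoding the documented default for svm.num.cells.
# Assumption: 20% of the cells is rounded down with floor().
default_svm_num_cells <- function(n.cells) {
  if (n.cells > 5000) 1000 else floor(0.2 * n.cells)
}

default_svm_num_cells(12000)  # 1000
default_svm_num_cells(3000)   # 600
default_svm_num_cells(5000)   # 1000 (not more than 5000, so 20% is used)
```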
n.cores
defines the number
of cores to be used on the user's machine. Default is NA.
seed
sets the seed for the random number generator; the default is 1.
It can be used to check the stability of clustering results: if the results are
the same after changing the seed several times, then the clustering solution
is stable.
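The seed-based stability check can be illustrated with a stand-in clustering method. In the sketch below, stats::kmeans replaces the full SC3 pipeline purely for illustration; the helpers same_partition() and cluster_with_seed() and the toy data are hypothetical. Two runs count as identical when their cluster labels describe the same partition up to a renaming of the clusters, and well-separated toy data like this should give the same partition for every seed.

```r
# Illustration only: stats::kmeans stands in for the SC3 pipeline.
set.seed(42)  # used only to generate the toy data
pts <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.5), ncol = 2),   # 20 points near (0, 0)
  matrix(rnorm(40, mean = 10, sd = 0.5), ncol = 2),   # 20 points near (10, 10)
  matrix(rnorm(40, mean = 20, sd = 0.5), ncol = 2)    # 20 points near (20, 20)
)

# TRUE if two label vectors describe the same partition, up to renaming of clusters:
# every cluster in one run maps to exactly one cluster in the other.
same_partition <- function(a, b) {
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Cluster the data after fixing the seed, analogous to changing the seed argument.
cluster_with_seed <- function(x, k, seed) {
  set.seed(seed)
  kmeans(x, centers = k, nstart = 50)$cluster
}

runs <- lapply(1:5, function(s) cluster_with_seed(pts, k = 3, seed = s))
stable <- all(vapply(runs[-1], same_partition, logical(1), b = runs[[1]]))
stable
```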