"cyclone"(x, pairs, gene.names=rownames(x), iter=1000, min.iter=100, min.pairs=50, BPPARAM=bpparam(), verbose=FALSE)
"cyclone"(x, ..., assay="counts", get.spikes=FALSE)
[…] sandbag, containing pairs of marker genes.
[…] bplapply for parallel processing.
[…] cyclone,matrix-method.
[…] counts or exprs.
[…] scores, containing the phase scores for each phase and cell (i.e., each row is a cell); and normalized.scores, containing the row-normalized scores (i.e., where the row sum for each cell is equal to 1).
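The relationship between the raw and row-normalized scores can be illustrated with a small sketch. The score values below are made up for illustration, and the code only shows what row normalization means; it is not taken from the function itself.

```r
# Hypothetical phase scores for three cells (rows) and three phases (columns).
scores <- data.frame(G1=c(0.9, 0.2, 0.1),
                     S=c(0.3, 0.6, 0.2),
                     G2M=c(0.0, 0.2, 0.7))

# Divide each row by its sum, so that the scores for each cell add up to 1.
normalized <- scores / rowSums(scores)

# Check that every row now sums to 1 (up to floating-point error).
all.equal(unname(rowSums(normalized)), rep(1, nrow(scores)))
```

A cell whose raw scores are all zero would yield NaN values under this naive division.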
[…] sandbag, where the sign of the relative expression between the genes in each pair changes across phases.
For each phase and each cell, the function calculates the proportion of all marker pairs where the expression of the first gene is greater than the second
(pairs with the same expression are ignored).
A distribution of proportions is constructed by shuffling the expression values within the cell and recalculating the proportion at each iteration.
The phase score for that cell is then defined as the lower tail probability of this distribution at the observed proportion.
By default, shuffling is performed iter times to obtain the distribution from which the score is estimated.
However, some iterations may not be used if there are fewer than min.pairs pairs with different expression, such that the proportion cannot be calculated precisely.
Also, a score is only returned if the distribution is large enough for stable calculation of the tail probability, i.e., consists of results from at least min.iter iterations.
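The scoring procedure described above can be sketched in plain R for a single cell and a single phase. This is an illustrative re-implementation under stated assumptions, not the package's internal code: phase_score, get_prop, first and second are hypothetical names, the tail probability is taken as the fraction of shuffled proportions strictly below the observed one, and NA is returned when min.pairs or min.iter is not satisfied.

```r
# Illustrative sketch only: 'phase_score' and its arguments are hypothetical
# names and do not reproduce scran's internal code.
# expr:   numeric vector of expression values for one cell.
# first:  indices of the first gene in each marker pair.
# second: indices of the second gene in each marker pair.
phase_score <- function(expr, first, second, iter=1000, min.iter=100, min.pairs=50) {
    # Proportion of pairs where the first gene exceeds the second,
    # ignoring tied pairs; NA if too few informative pairs remain.
    get_prop <- function(values) {
        a <- values[first]
        b <- values[second]
        keep <- a != b
        if (sum(keep) < min.pairs) return(NA_real_)
        mean(a[keep] > b[keep])
    }

    observed <- get_prop(expr)
    if (is.na(observed)) return(NA_real_)

    # Null distribution: shuffle the expression values within the cell.
    null <- replicate(iter, get_prop(sample(expr)))
    null <- null[!is.na(null)]  # drop iterations with too few usable pairs
    if (length(null) < min.iter) return(NA_real_)

    # Lower tail probability (assumed here to use a strict inequality).
    mean(null < observed)
}

set.seed(100)
first <- 1:100
second <- 101:200
# First gene always higher than the second in every pair.
high <- phase_score(c(101:200, 1:100), first, second)
# First gene always lower than the second in every pair.
low <- phase_score(c(1:100, 101:200), first, second)
# Only 100 informative pairs are available, fewer than min.pairs=200.
too_few <- phase_score(c(101:200, 1:100), first, second, min.pairs=200)
```

In this toy setup, the two extreme cells should receive scores at opposite ends of the [0, 1] range, while the last call returns NA because the proportion cannot be computed from enough pairs.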
Cells with a G1 or G2M score above 0.5 should be assigned to the G1 or G2M phase, respectively. This is based on the interpretation of the score as 1 minus the p-value for the null distribution of proportions. The null hypothesis here is that expression of the marker genes is independent within each cell, i.e., with no cycle-induced correlations between marker pairs. Cells can be assigned to S phase based on the S phase score, but a more reliable approach is to define S phase cells as those with both G1 and G2M scores below 0.5.
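The assignment rule can be written as a short sketch. The scores below are invented for illustration. The text does not say what to do when both the G1 and G2M scores exceed 0.5, so assigning such cells to the phase with the higher score is an assumption made here.

```r
# Hypothetical scores for four cells.
scores <- data.frame(G1=c(0.9, 0.1, 0.2, 0.7),
                     S=c(0.2, 0.3, 0.8, 0.4),
                     G2M=c(0.1, 0.8, 0.3, 0.9))

# Start with S phase, i.e., cells with both G1 and G2M scores at or below 0.5.
phase <- rep("S", nrow(scores))

# Assign G1 or G2M when the corresponding score is above 0.5.
# Assumption: if both are above 0.5, the higher score wins.
phase[scores$G1 > 0.5 & scores$G1 >= scores$G2M] <- "G1"
phase[scores$G2M > 0.5 & scores$G2M > scores$G1] <- "G2M"

table(phase)
```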
For cyclone,SCESet-method, the matrix of counts is used but can be replaced with expression values by setting assay.
By default, get.spikes=FALSE, which means that any rows corresponding to spike-in transcripts will not be considered for score calculation.
This is for the same reasons as described in ?sandbag.
sandbag
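# Run the sandbag example first; the objects used below (training, out,
# ncells, is.G1, is.S and is.G2M) are taken from it.
example(sandbag)
# Classifying (note: in real analyses, the test data should differ from the training data)
test <- training
assignments <- cyclone(test, out)
# Visualizing the G1 and G2M scores, colored by the known phase of each cell
col <- character(ncells)
col[is.G1] <- "red"
col[is.G2M] <- "blue"
col[is.S] <- "darkgreen"
plot(assignments$scores$G1, assignments$scores$G2M, col=col, pch=16)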