UMAP
dimred_umap(
x,
ndim = 2,
distance_method = c("euclidean", "cosine", "manhattan"),
pca_components = 50,
n_neighbors = 15L,
init = "spectral",
n_threads = 1
)
x: Log-transformed expression data, with rows as cells and columns as features.
ndim: The number of dimensions.
distance_method: The name of the distance metric, see dynutils::calculate_distance.
pca_components: The number of PCA components to use for UMAP. If NULL, PCA will not be performed first.
n_neighbors: The size of the local neighborhood (in terms of number of neighboring sample points).
init: Type of initialization for the coordinates. Options are:
"spectral": Spectral embedding using the normalized Laplacian of the fuzzy 1-skeleton, with Gaussian noise added.
"normlaplacian": Spectral embedding using the normalized Laplacian of the fuzzy 1-skeleton, without noise.
"random": Coordinates assigned using a uniform random distribution between -10 and 10.
"lvrandom": Coordinates assigned using a Gaussian distribution with standard deviation 1e-4, as used in LargeVis (Tang et al., 2016) and t-SNE.
"laplacian": Spectral embedding using the Laplacian Eigenmap (Belkin and Niyogi, 2002).
"pca": The first two principal components from PCA of X if X is a data frame, and from a 2-dimensional classical MDS if X is of class "dist".
"spca": Like "pca", but each dimension is then scaled so the standard deviation is 1e-4, to give a distribution similar to that used in t-SNE. This is an alias for init = "pca", init_sdev = 1e-4.
"agspectral": An "approximate global" modification of "spectral" which sets all edges in the graph to a value of 1, and then sets a random number of edges (negative_sample_rate edges per vertex) to 0.1, to approximate the effect of non-local affinities.
A matrix of initial coordinates.
For spectral initializations ("spectral", "normlaplacian", "laplacian"), if more than one connected component is identified, each connected component is initialized separately and the results are merged. If verbose = TRUE, the number of connected components is logged to the console. The existence of multiple connected components implies that a global view of the data cannot be attained with this initialization. Either a PCA-based initialization or increasing the value of n_neighbors may be more appropriate.
n_threads: Number of threads to use (except during stochastic gradient descent). The usage above sets the default to 1; the underlying UMAP implementation describes its own default as half the number of concurrent threads supported by the system. For nearest neighbor search, this only applies if nn_method = "annoy". If n_threads > 1, then the Annoy index will be temporarily written to disk in the location determined by tempfile.
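The init options described above include passing a matrix of initial coordinates. The following is a minimal sketch, not taken from the package itself, of building an "spca"-style start by hand with base R: the first two principal components, rescaled so each column has standard deviation 1e-4. The names expr, pcs and init_coords are hypothetical, and the final call assumes that dyndimred and uwot are installed and that a coordinate matrix is accepted as documented.

```r
set.seed(1)

# Toy stand-in for log-transformed expression: 100 cells (rows) x 20 features.
expr <- matrix(abs(rnorm(100 * 20)), nrow = 100, ncol = 20)

# First two principal components, one row per cell.
pcs <- prcomp(expr, center = TRUE, scale. = FALSE)$x[, 1:2]

# Rescale each column to standard deviation 1e-4, mimicking init = "spca".
init_coords <- sweep(pcs, 2, apply(pcs, 2, sd), "/") * 1e-4

# Guarded call (assumption: both packages are available on this machine).
if (requireNamespace("dyndimred", quietly = TRUE) &&
    requireNamespace("uwot", quietly = TRUE)) {
  emb <- dyndimred::dimred_umap(
    expr,
    ndim = 2,
    pca_components = NULL,  # skip the PCA preprocessing step
    init = init_coords      # matrix of initial coordinates
  )
  print(dim(emb))
}
```

Supplying the matrix explicitly is mainly useful when the starting layout should be reproducible or shared between several runs; for ordinary use the string options are simpler.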
# NOT RUN {
library(Matrix)
dataset <- abs(Matrix::rsparsematrix(100, 100, .5))
dimred_umap(dataset, ndim = 2, pca_components = NULL)
# }
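As a variation on the example above, this sketch applies the advice given for init: when a spectral start may split the data into several connected components, a PCA-based initialization or a larger n_neighbors may be more appropriate. The specific values (cosine distance, 30 neighbors) are illustrative assumptions, not recommendations from the package authors, and the call is guarded because it assumes dyndimred and uwot are installed.

```r
library(Matrix)

set.seed(42)

# Same kind of toy input as above: sparse, non-negative, 100 cells x 100 features.
dataset <- abs(Matrix::rsparsematrix(100, 100, 0.5))

# Guarded call (assumption: both packages are available on this machine).
if (requireNamespace("dyndimred", quietly = TRUE) &&
    requireNamespace("uwot", quietly = TRUE)) {
  emb <- dyndimred::dimred_umap(
    dataset,
    ndim = 2,
    distance_method = "cosine",  # one of the three documented metrics
    pca_components = NULL,       # no PCA preprocessing
    n_neighbors = 30L,           # wider neighborhood than the default of 15
    init = "pca",                # PCA-based start instead of "spectral"
    n_threads = 1
  )
  # Expect one row per cell and one column per requested dimension.
  print(dim(emb))
}
```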