Runs the Uniform Manifold Approximation and Projection (UMAP) dimensional
reduction technique. To run, you must first install the umap-learn Python
package (e.g. via pip install umap-learn). Details on this package can be
found here: https://github.com/lmcinnes/umap. For a more in-depth
discussion of the mathematics underlying UMAP, see the arXiv paper here:
https://arxiv.org/abs/1802.03426.
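Before calling RunUMAP, it can help to confirm from R that the Python module is importable. The following is a minimal sketch, assuming the reticulate package is the bridge between R and Python in your setup (an assumption, not something this page states). Note that the pip package is named umap-learn while the importable module is named umap.

```r
# Sketch: check whether the Python module behind RunUMAP can be imported.
# Assumption: reticulate is the R-to-Python bridge in this environment.
if (requireNamespace("reticulate", quietly = TRUE)) {
  # TRUE if `import umap` would succeed in the active Python environment
  umap_available <- reticulate::py_module_available("umap")
  if (!umap_available) {
    message("Python module 'umap' not found; try: pip install umap-learn")
  }
} else {
  umap_available <- FALSE
  message("The reticulate R package is not installed.")
}
```

If umap_available is FALSE, install umap-learn into the Python environment that R is using and check again.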
RunUMAP(object, ...)
# S3 method for default
RunUMAP(object, assay = NULL, n.neighbors = 30L,
n.components = 2L, metric = "correlation", n.epochs = NULL,
learning.rate = 1, min.dist = 0.3, spread = 1,
set.op.mix.ratio = 1, local.connectivity = 1L,
repulsion.strength = 1, negative.sample.rate = 5, a = NULL,
b = NULL, seed.use = 42, metric.kwds = NULL,
angular.rp.forest = FALSE, reduction.key = "UMAP_", verbose = TRUE,
...)
# S3 method for Graph
RunUMAP(object, assay = NULL, n.components = 2L,
metric = "correlation", n.epochs = 0L, learning.rate = 1,
min.dist = 0.3, spread = 1, repulsion.strength = 1,
negative.sample.rate = 5L, a = NULL, b = NULL, seed.use = 42L,
metric.kwds = NULL, verbose = TRUE, reduction.key = "UMAP_", ...)
# S3 method for Seurat
RunUMAP(object, dims = NULL, reduction = "pca",
features = NULL, graph = NULL, assay = "RNA", n.neighbors = 30L,
n.components = 2L, metric = "correlation", n.epochs = NULL,
learning.rate = 1, min.dist = 0.3, spread = 1,
set.op.mix.ratio = 1, local.connectivity = 1L,
repulsion.strength = 1, negative.sample.rate = 5L, a = NULL,
b = NULL, seed.use = 42L, metric.kwds = NULL,
angular.rp.forest = FALSE, verbose = TRUE, reduction.name = "umap",
reduction.key = "UMAP_", ...)
object: An object
...: Arguments passed to other methods and UMAP
assay: Assay to pull data for when using features, or assay used to construct Graph if running UMAP on a Graph
n.neighbors: This determines the number of neighboring points used in local approximations of manifold structure. Larger values will result in more global structure being preserved at the loss of detailed local structure. This parameter should generally be in the range 5 to 50.
n.components: The dimension of the space to embed into.
metric: This determines the choice of metric used to measure distance in the input space. A wide variety of metrics are already coded, and a user-defined function can be passed as long as it has been JIT-compiled by numba.
n.epochs: The number of training epochs to be used in optimizing the low dimensional embedding. Larger values result in more accurate embeddings. If NULL is specified, a value will be selected based on the size of the input dataset (200 for large datasets, 500 for small).
learning.rate: The initial learning rate for the embedding optimization.
min.dist: This controls how tightly the embedding is allowed to compress points together. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5.
spread: The effective scale of embedded points. In combination with min.dist this determines how clustered/clumped the embedded points are.
set.op.mix.ratio: Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.
local.connectivity: The local connectivity required, i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value, the more connected the manifold becomes locally. In practice this should be no more than the local intrinsic dimension of the manifold.
repulsion.strength: Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples.
negative.sample.rate: The number of negative samples to select per positive sample in the optimization process. Increasing this value will result in greater repulsive force being applied, greater optimization cost, but slightly more accuracy.
a: More specific parameters controlling the embedding. If NULL, these values are set automatically as determined by min.dist and spread. Parameter of differentiable approximation of right adjoint functor.
b: More specific parameters controlling the embedding. If NULL, these values are set automatically as determined by min.dist and spread. Parameter of differentiable approximation of right adjoint functor.
seed.use: Set a random seed. By default, sets the seed to 42. Setting NULL will not set a seed.
metric.kwds: A dictionary of arguments to pass on to the metric, such as the p value for Minkowski distance. If NULL then no arguments are passed on.
angular.rp.forest: Whether to use an angular random projection forest to initialise the approximate nearest neighbor search. This can be faster, but is mostly only useful for metrics that use an angular style distance such as cosine, correlation etc. In the case of those metrics angular forests will be chosen automatically.
reduction.key: Dimensional reduction key, specifies the string before the number for the dimension names. UMAP_ by default.
verbose: Controls verbosity
dims: Which dimensions to use as input features, used only if features is NULL
reduction: Which dimensional reduction (PCA or ICA) to use for the UMAP input. Default is PCA
features: If set, run UMAP on this subset of features (instead of running on a set of reduced dimensions). Not set (NULL) by default; dims must be NULL to run on features
graph: Name of graph on which to run UMAP
reduction.name: Name to store dimensional reduction under in the Seurat object
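The a and b arguments are described above as being set automatically from min.dist and spread when left NULL. As a rough illustration of what that can mean, the sketch below fits the curve 1 / (1 + a * x^(2b)) to a target that equals 1 up to min.dist and decays exponentially with scale spread beyond it. This follows the kind of curve fit umap-learn is understood to perform, but the function name fit_ab_params, the grid of 300 points, and the starting values are assumptions made for this example, not part of RunUMAP.

```r
# Sketch only: fit_ab_params is a hypothetical helper, not a Seurat function.
fit_ab_params <- function(min.dist = 0.3, spread = 1) {
  # Evaluate the target on a grid of embedding-space distances
  x <- seq(from = 0, to = 3 * spread, length.out = 300)
  # Target membership: 1 within min.dist, exponential decay beyond it
  y <- ifelse(x < min.dist, 1, exp(-(x - min.dist) / spread))
  # Nonlinear least squares fit of the smooth approximation
  fit <- stats::nls(
    y ~ 1 / (1 + a * x^(2 * b)),
    start = list(a = 1, b = 1)
  )
  stats::coef(fit)
}

# Named numeric vector with elements "a" and "b"
ab <- fit_ab_params(min.dist = 0.3, spread = 1)
print(ab)
```

Passing explicit values for a and b to RunUMAP bypasses this automatic choice, so it is usually simpler to tune min.dist and spread and leave a and b as NULL.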
Returns a Seurat object containing a UMAP representation
McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018
# NOT RUN {
pbmc_small
# Run UMAP on the first 5 PCs
pbmc_small <- RunUMAP(object = pbmc_small, dims = 1:5)
# Plot results
DimPlot(object = pbmc_small, reduction = 'umap')
# }
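As a complement to the example above, the following sketch runs UMAP on a feature subset rather than on principal components, then extracts the resulting coordinates. It assumes the pbmc_small object bundled with Seurat, a working umap-learn installation, and that VariableFeatures and Embeddings are available in the installed Seurat version. The names "umap.features" and "UMAPfeat_" are arbitrary choices for illustration.

```r
library(Seurat)

# Run UMAP directly on the variable features; dims is left at its
# default of NULL, as the features argument requires.
pbmc_small <- RunUMAP(
  object = pbmc_small,
  features = VariableFeatures(object = pbmc_small),
  reduction.name = "umap.features",  # illustrative name, not a default
  reduction.key = "UMAPfeat_"        # illustrative key, not a default
)

# Pull the cell-by-dimension coordinate matrix out of the stored reduction
umap_coords <- Embeddings(object = pbmc_small, reduction = "umap.features")
head(umap_coords)
```

Storing the result under a separate reduction.name keeps any existing "umap" reduction in the object intact, so both embeddings can be plotted and compared.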