largeVis: Apply the LargeVis algorithm for visualizing large high-dimensional datasets.

Description

Apply the LargeVis algorithm for visualizing large high-dimensional datasets.

Usage

largeVis(x, dim = 2, K = 50, n_trees = 50, tree_threshold = max(10,
  min(nrow(x), ncol(x))), max_iter = 1, distance_method = "Euclidean",
  perplexity = max(50, K/3), save_neighbors = TRUE, save_edges = TRUE,
  threads = NULL, verbose = getOption("verbose", TRUE), ...)

Arguments

x

A matrix, where the features are rows and the examples are columns.

dim

The number of dimensions in the output.

K

The number of nearest neighbors to use in computing the kNN graph.

n_trees

See randomProjectionTreeSearch. The default is set at 50, which is the number used in the examples in the original paper.

tree_threshold

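See randomProjectionTreeSearch. By default, this is max(10, min(nrow(x), ncol(x))), which is the number of features in the input set whenever there are at least 10 features and no more features than examples.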

max_iter

See randomProjectionTreeSearch.

distance_method

One of "Euclidean" or "Cosine". See randomProjectionTreeSearch.

perplexity

See buildWijMatrix.

save_neighbors

Whether to include in the output the adjacency matrix of nearest neighbors.

save_edges

Whether to include in the output the distance matrix of nearest neighbors.

threads

The maximum number of threads to spawn. Determined automatically if NULL (the default). This parameter should rarely need to be adjusted; it exists only to make it possible to abide by the CRAN limitation that no package use more than two cores.

verbose

Whether to print progress messages. Defaults to getOption("verbose", TRUE).

...

Additional arguments passed to projectKNNs.
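
As a minimal sketch of the input layout and of the default expressions shown in Usage, the following base R code prepares a matrix with features in rows and evaluates the defaults for tree_threshold and perplexity by hand. It does not call largeVis; the variable name examples_by_features is illustrative only.

```r
# Example data: 150 examples (rows) by 4 features (columns).
data(iris)
examples_by_features <- as.matrix(iris[, 1:4])

# The x argument is documented as features in rows, examples in columns,
# so transpose before passing the matrix on.
x <- t(examples_by_features)
dim(x)

# Evaluate the default expressions from the Usage section by hand.
K <- 50
tree_threshold <- max(10, min(nrow(x), ncol(x)))
perplexity <- max(50, K / 3)
```

With only 4 features, the floor of 10 in the tree_threshold default applies. The perplexity default stays at 50 for any K up to 150, since K/3 exceeds 50 only beyond that.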

Value

A `largeVis` object with the following slots:

'knns': If save_neighbors=TRUE, an [N,K] 0-indexed integer matrix, which is an adjacency list of each vertex's identified nearest neighbors. If the algorithm failed to find K neighbors, the matrix is padded with NAs. Note that this matrix is not identical to the output from randomProjectionTreeSearch: missing neighbors are NAs rather than -1s, and the matrix is transposed.
'edges': If save_edges=TRUE, a [N,N] sparse matrix of distances between nearest neighbors.
'wij': A sparse [N,N] matrix where each cell represents \(w_{ij}\).
'call': The call.
'coords': A [D,N] matrix of the embedding of the dataset in the low-dimensional space.
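
Because 'knns' is 0-indexed while R indexes from 1, and 'coords' is stored as [D,N], some reshaping is usually needed before further use. The sketch below uses small hand-made matrices as stand-ins for visObject$knns and visObject$coords (their values are invented for illustration) and assumes the shapes described above.

```r
# Mock stand-in for visObject$knns: N = 4 vertices, K = 2 neighbors,
# 0-indexed, padded with NA where fewer than K neighbors were found.
knns <- matrix(c(1L, 2L,
                 0L, 3L,
                 0L, NA,
                 1L, 2L),
               ncol = 2, byrow = TRUE)

# Shift to R's 1-based indexing; NA entries stay NA.
knns1 <- knns + 1L

# Neighbors of the third vertex, dropping the NA padding.
nbrs <- knns1[3, ]
nbrs <- nbrs[!is.na(nbrs)]

# Mock stand-in for visObject$coords: a [D, N] matrix with D = 2.
coords <- matrix(c(0, 0,  1, 0,  0, 1,  1, 1), nrow = 2)

# Transpose so that each row is one example, the layout plot() expects.
xy <- t(coords)
```

The same transpose appears in the examples below as plot(t(visObject$coords)).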

References

Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei. Visualizing Large-scale and High-dimensional Data.

Examples

# NOT RUN {
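# iris
data(iris)
# largeVis expects features in rows and examples in columns, so transpose.
dat <- t(as.matrix(iris[, 1:4]))
# sgd_batches is passed through ... to projectKNNs.
visObject <- largeVis(dat, max_iter = 20, K = 10, sgd_batches = 10000, threads = 1)
# coords is [D,N]; transpose so each row is one example.
plot(t(visObject$coords))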

# }
# NOT RUN {
# mnist
# Note: The MNIST dataset may be obtained using the deepnet package.
load("./mnist.Rda")
dat <- mnist$images
dim(dat) <- c(42000, 28 * 28)
dat <- (dat / 255) - 0.5
dat <- t(dat)
visObject <- largeVis(dat, n_trees = 50, tree_threshold = 200, K = 50)
plot(t(visObject$coords))
# }