largeVis-package: largeVis: high-quality visualizations for large, high-dimensionality datasets

Description

This is an implementation of the largeVis algorithm by Tang et al., and related functions and algorithms.

Arguments

Details

largeVis estimates a low-dimensional embedding for high-dimensional data, where the distance between vertices in the low-dimensional space is proportional to the distance between them in the high-dimensional space. The algorithm works in 4 phases:

Estimate candidate nearest-neighbors for each vertex by building n.trees random projection trees.
Estimate K nearest-neighbors for each vertex by visiting each vertex' 2d-degree neighbors (its neighbors' neighbors). This is repeated max.iter times. Note that the original paper suggested a max.iter of 1, however a larger number may be appropriate for some datasets if the algorithm has trouble finding K neighbors for every vertex.
Estimate \(p_{j|i}\), the conditional probability that each edge found in the previous step is actually to a nearest neighbor of each of its nodes.
Using stochastic gradient descent, estimate an embedding for each vertex in the low-dimensional space.

The nearest-neighbor search functionality is also available as a separate function, where it offers an extremely fast approximate nearest-neighbor search. (See the Benchmarks vignette for details.)

The package also includes implementations of the HDBSCAN, DBSCAN, and OPTICS clustering algorithms, and LOF outlier detection, optimized to use data generated by running largeVis.

References

Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei. Visualizing Large-scale and High-dimensional Data. R. Campello, D. Moulavi, and J. Sander, Density-Based Clustering Based on Hierarchical Density Estimates In: Advances in Knowledge Discovery and Data Mining, Springer, pp 160-172. 2013 Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jorg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. 49-60. Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu (1996). Evangelos Simoudis, Jiawei Han, Usama M. Fayyad, eds. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226-231. ISBN 1-57735-004-9.

Description

Arguments

Details

References

See Also