Rdimtools

Rdimtools is an R package for dimension reduction (DR) - including feature selection and manifold learning - and intrinsic dimension estimation (IDE). We aim to build one of the most comprehensive toolboxes available online; the current version delivers 145 DR algorithms and 17 IDE methods.
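
The function inventory can be inspected from within R itself; a minimal sketch, assuming that aux.pkgstat() (listed in the function index below) takes no arguments and prints per-category counts to the console:

```r
# load the package
library(Rdimtools)

# print how many DR and IDE functions the installed version provides
aux.pkgstat()
```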

The philosophy is simple: the more we have at hand, the better we can play.

Elephant

Our logo reflects the foundational nature of multivariate data analysis: like the blind men examining an elephant, we wrangle the data to grasp what it looks like, with each algorithm contributing only partial information.

Installation

You can install a release version from CRAN:

install.packages("Rdimtools")

or the development version from GitHub:

## install.packages("devtools")
devtools::install_github("kisungyou/Rdimtools")

Minimal Example : Dimension Reduction

Here is an example of dimension reduction on the famous iris dataset. Principal Component Analysis (do.pca), Laplacian Score (do.lscore), and Diffusion Maps (do.dm) are compared, each representing a family of algorithms: linear reduction, feature selection, and nonlinear reduction, respectively.

# load the library
library(Rdimtools)

# load the data
X   = as.matrix(iris[,1:4])
lab = as.factor(iris[,5])

# run 3 algorithms mentioned above
mypca = do.pca(X, ndim=2)
mylap = do.lscore(X, ndim=2)
mydfm = do.dm(X, ndim=2, bandwidth=10)

# visualize
par(mfrow=c(1,3))
plot(mypca$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="PCA")
plot(mylap$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Laplacian Score")
plot(mydfm$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Diffusion Maps")
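
For linear methods such as PCA, new observations can be mapped into a learned embedding without refitting. A minimal sketch, assuming the signature oos.linproj(Xold, Yold, Xnew) for the out-of-sample projection routine listed in the function index below:

```r
# split iris rows into a fitting set and a held-out set
idold = 1:140
idnew = 141:150

# fit PCA on the fitting set only
pcaold = do.pca(X[idold,], ndim=2)

# project the held-out rows onto the learned 2-dimensional subspace
Ynew = oos.linproj(X[idold,], pcaold$Y, X[idnew,])
```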

Minimal Example : Dimension Estimation

Swiss Roll is a classic example of a 2-dimensional manifold embedded in $\mathbb{R}^3$ and one of 11 model-based samples available from the aux.gensamples() function. Given the ground truth that $d=2$, let's apply several methods for intrinsic dimension estimation.

# generate sample data
set.seed(100)
roll = aux.gensamples(dname="swiss")

# we will compare 5 methods (out of 17 methods from version 1.0.0)
vecd = rep(0,5)
vecd[1] = est.Ustat(roll)$estdim       # convergence rate of U-statistic on manifold
vecd[2] = est.correlation(roll)$estdim # correlation dimension
vecd[3] = est.made(roll)$estdim        # manifold-adaptive dimension estimation
vecd[4] = est.mle1(roll)$estdim        # MLE with Poisson process
vecd[5] = est.twonn(roll)$estdim       # minimal neighborhood information

# let's visualize
plot(1:5, vecd, type="b", ylim=c(1.5,2.5), 
     main="true dimension is d=2",
     xaxt="n",xlab="",ylab="estimated dimension")
xtick = seq(1,5,by=1)
axis(side=1, at=xtick, labels = FALSE)
text(x=xtick,  par("usr")[3], 
     labels = c("Ustat","correlation","made","mle1","twonn"), pos=1, xpd = TRUE)

We can observe that all 5 methods we tested estimate the intrinsic dimension to be around $d=2$. Note that the estimates need not be integer-valued, owing to the characteristics of each method.
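
When an integer dimension is required downstream, the fractional estimates can simply be rounded; a minimal sketch continuing from the code above:

```r
# round each fractional estimate to the nearest integer dimension
round(vecd)

# or take a consensus across the 5 estimators, e.g. the rounded median
round(median(vecd))
```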

Acknowledgements

The logo icon is made by Freepik from www.flaticon.com. The rotating Swiss Roll image is taken from Dinoj Surendran's website.

Version

1.1.2

License

MIT + file LICENSE

Last Published

December 15th, 2022

Functions in Rdimtools (1.1.2)

est.clustering

Intrinsic Dimension Estimation via Clustering
aux.pkgstat

Show the number of functions for Rdimtools.
aux.gensamples

Generate model-based samples
est.gdistnn

Intrinsic Dimension Estimation based on Manifold Assumption and Graph Distance
est.nearneighbor1

Intrinsic Dimension Estimation with Near-Neighbor Information
est.mindkl

MiNDkl
est.made

Manifold-Adaptive Dimension Estimation
est.danco

Intrinsic Dimensionality Estimation with DANCo
est.mindml

MINDml
do.cscoreg

Constraint Score using Spectral Graph
est.twonn

Intrinsic Dimension Estimation by a Minimal Neighborhood Information
est.incisingball

Intrinsic Dimension Estimation with Incising Ball
do.fscore

Fisher Score
do.cscore

Constraint Score
est.nearneighbor2

Near-Neighbor Information with Bias Correction
est.mle1

Maximum Likelihood Estimation with Poisson Process
est.mle2

Maximum Likelihood Estimation with Poisson Process and Bias Correction
do.lsls

Locality Sensitive Laplacian Score
do.fosmod

Forward Orthogonal Search by Maximizing the Overall Dependency
do.enet

Elastic Net Regularization
est.packing

Intrinsic Dimension Estimation using Packing Numbers
do.mifs

Mutual Information for Selecting Features
do.mcfs

Multi-Cluster Feature Selection
do.nrsr

Non-convex Regularized Self-Representation
est.pcathr

PCA Thresholding with Accumulated Variance
do.specs

Supervised Spectral Feature Selection
do.rsr

Regularized Self-Representation
do.procrustes

Feature Selection using PCA and Procrustes Analysis
do.spufs

Structure Preserving Unsupervised Feature Selection
do.disr

Diversity-Induced Self-Representation
do.lsdf

Locality Sensitive Discriminant Feature
do.specu

Unsupervised Spectral Feature Selection
do.lasso

Least Absolute Shrinkage and Selection Operator
do.lscore

Laplacian Score
do.udfs

Unsupervised Discriminative Features Selection
do.lspe

Locality and Similarity Preserving Embedding
do.uwdfs

Uncorrelated Worst-Case Discriminative Feature Selection
do.cnpe

Complete Neighborhood Preserving Embedding
do.dagdne

Double-Adjacency Graphs-based Discriminant Neighborhood Embedding
do.ugfs

Unsupervised Graph-based Feature Selection
do.crp

Collaborative Representation-based Projection
do.bpca

Bayesian Principal Component Analysis
do.extlpp

Extended Locality Preserving Projection
do.pfa

Principal Feature Analysis
do.ammc

Adaptive Maximum Margin Criterion
do.adr

Adaptive Dimension Reduction
do.dne

Discriminant Neighborhood Embedding
do.fa

Exploratory Factor Analysis
do.lpca2006

Locally Principal Component Analysis by Yang et al. (2006)
do.lpe

Locality Pursuit Embedding
do.cca

Canonical Correlation Analysis
do.elpp2

Enhanced Locality Preserving Projection (2013)
do.lltsa

Linear Local Tangent Space Alignment
do.lmds

Landmark Multidimensional Scaling
do.lea

Locally Linear Embedded Eigenspace Analysis
do.lspp

Local Similarity Preserving Projection
do.eslpp

Extended Supervised Locality Preserving Projection
do.ldp

Locally Discriminating Projection
do.lsir

Localized Sliced Inverse Regression
do.lsda

Locality Sensitive Discriminant Analysis
do.fssem

Feature Subset Selection using Expectation-Maximization
do.anmm

Average Neighborhood Margin Maximization
do.asi

Adaptive Subspace Iteration
do.ldakm

Combination of LDA and K-means
do.lde

Local Discriminant Embedding
do.isoproj

Isometric Projection
do.kmvp

Kernel-Weighted Maximum Variance Projection
do.lpp

Locality Preserving Projection
do.lqmi

Linear Quadratic Mutual Information
do.ica

Independent Component Analysis
do.llp

Local Learning Projections
do.lfda

Local Fisher Discriminant Analysis
do.nolpp

Nonnegative Orthogonal Locality Preserving Projection
do.mfa

Marginal Fisher Analysis
do.mvp

Maximum Variance Projection
do.mlie

Maximal Local Interclass Embedding
do.msd

Maximum Scatter Difference
do.pls

Partial Least Squares
do.slpe

Supervised Locality Pursuit Embedding
do.slpp

Supervised Locality Preserving Projection
do.pflpp

Parameter-Free Locality Preserving Projection
do.pca

Principal Component Analysis
do.opls

Orthogonal Partial Least Squares
do.rpcag

Robust Principal Component Analysis via Geometric Median
do.rndproj

Random Projection
do.nonpp

Nonnegative Orthogonal Neighborhood Preserving Projections
do.sir

Sliced Inverse Regression
do.sdlpp

Sample-Dependent Locality Preserving Projection
do.mds

(Classical) Multidimensional Scaling
do.wdfs

Worst-Case Discriminative Feature Selection
do.bmds

Bayesian Multidimensional Scaling
do.cge

Constrained Graph Embedding
do.spca

Sparse Principal Component Analysis
do.spc

Supervised Principal Component Analysis
do.dspp

Discriminative Sparsity Preserving Projection
do.elde

Exponential Local Discriminant Embedding
do.lpfda

Locality Preserving Fisher Discriminant Analysis
do.kudp

Kernel-Weighted Unsupervised Discriminant Projection
iris

Load Iris data
do.lda

Linear Discriminant Analysis
do.odp

Orthogonal Discriminant Projection
do.fastmap

FastMap
do.hydra

Hyperbolic Distance Recovery and Approximation
do.npca

Nonnegative Principal Component Analysis
do.npe

Neighborhood Preserving Embedding
do.mmc

Maximum Margin Criterion
do.ppca

Probabilistic Principal Component Analysis
do.modp

Modified Orthogonal Discriminant Projection
do.rlda

Regularized Linear Discriminant Analysis
do.mmsd

Multiple Maximum Scatter Difference
do.udp

Unsupervised Discriminant Projection
do.lpmip

Locality-Preserved Maximum Information Projection
do.dm

Diffusion Maps
do.ulda

Uncorrelated Linear Discriminant Analysis
do.klfda

Kernel Local Fisher Discriminant Analysis
do.crda

Curvilinear Distance Analysis
do.idmap

Interactive Document Map
do.klsda

Kernel Locality Sensitive Discriminant Analysis
do.mvu

Maximum Variance Unfolding / Semidefinite Embedding
do.mve

Minimum Volume Embedding
do.iltsa

Improved Local Tangent Space Alignment
do.olpp

Orthogonal Locality Preserving Projection
do.save

Sliced Average Variance Estimation
do.onpp

Orthogonal Neighborhood Preserving Projections
do.sda

Semi-Supervised Discriminant Analysis
do.mmp

Maximum Margin Projection
do.spp

Sparsity Preserving Projection
do.sammc

Semi-Supervised Adaptive Maximum Margin Criterion
do.llle

Local Linear Laplacian Eigenmaps
do.crca

Curvilinear Component Analysis
do.kmfa

Kernel Marginal Fisher Analysis
do.nnp

Nearest Neighbor Projection
do.ssldp

Semi-Supervised Locally Discriminant Projection
do.lle

Locally Linear Embedding
do.phate

Potential of Heat Diffusion for Affinity-based Transition Embedding
do.rsir

Regularized Sliced Inverse Regression
do.kpca

Kernel Principal Component Analysis
do.cisomap

Conformal Isometric Feature Mapping
do.olda

Orthogonal Linear Discriminant Analysis
do.kqmi

Kernel Quadratic Mutual Information
do.kmmc

Kernel Maximum Margin Criterion
do.isomap

Isometric Feature Mapping
do.tsne

t-distributed Stochastic Neighbor Embedding
do.ksda

Kernel Semi-Supervised Discriminant Analysis
do.sammon

Sammon Mapping
do.rpca

Robust Principal Component Analysis
do.plp

Piecewise Laplacian-based Projection (PLP)
do.lamp

Local Affine Multidimensional Projection
do.lapeig

Laplacian Eigenmaps
do.klde

Kernel Local Discriminant Embedding
do.dppca

Dual Probabilistic Principal Component Analysis
oos.linproj

OOS : Linear Projection
do.ltsa

Local Tangent Space Alignment
do.splapeig

Supervised Laplacian Eigenmaps
do.keca

Kernel Entropy Component Analysis
do.dve

Distinguishing Variance Embedding
do.ispe

Isometric Stochastic Proximity Embedding
do.ree

Robust Euclidean Embedding
do.mmds

Metric Multidimensional Scaling
do.spmds

Spectral Multidimensional Scaling
do.spe

Stochastic Proximity Embedding
usps

Load USPS handwritten digits data
do.lisomap

Landmark Isometric Feature Mapping
do.sne

Stochastic Neighbor Embedding
est.boxcount

Box-counting Dimension
est.Ustat

ID Estimation with Convergence Rate of U-statistic on Manifold
aux.preprocess

Preprocessing the data
aux.graphnbd

Construct Nearest-Neighborhood Graph
est.correlation

Correlation Dimension
aux.kernelcov

Build a centered kernel matrix K
aux.shortestpath

Find shortest path using Floyd-Warshall algorithm