IDmining (version 1.0.7)

MINDID: The (Multipoint) Morisita Index for Intrinsic Dimension Estimation

Description

Estimates the intrinsic dimension of data using the Morisita estimator of intrinsic dimension.

Usage

MINDID(X, scaleQ=1:5, mMin=2, mMax=2)

Arguments

X

An \(N \times E\) matrix, data.frame or data.table, where \(N\) is the number of data points and \(E\) is the number of variables (or features). The function rescales each variable to the \([0,1]\) interval.
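
As a rough illustration of what this rescaling does, the sketch below applies a column-wise min-max transformation. That MINDID uses exactly this transformation internally is an assumption; `rescale01` is a hypothetical helper and not part of IDmining.

```r
# Hypothetical helper (not part of IDmining): column-wise min-max rescaling.
# Assumes every column is numeric and not constant.
rescale01 <- function(X) {
  X <- as.matrix(X)
  apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))
}

X <- cbind(a = c(2, 4, 6), b = c(10, 30, 20))
X01 <- rescale01(X)
range(X01[, "a"])  # 0 1
```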

scaleQ

A vector of at least two values of \(\ell^{-1}\) chosen by the user (by default: scaleQ = 1:5).

mMin

The minimum value of \(m\) (by default: mMin = 2).

mMax

The maximum value of \(m\) (by default: mMax = 2).

Value

A list of two elements:

  1. a data.frame containing the natural logarithm of the \(m\)-Morisita index for each value of \(\ln (\delta)\) and \(m\). The values of \(\ln (\delta)\) are expressed relative to the rescaled \([0,1]\) interval.

  2. a data.frame containing the values of \(S_m\) and \(M_m\) for each value of \(m\).
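
To make the two outputs more concrete, here is a minimal sketch of how such quantities could be computed by hand for \(m = 2\). It assumes the definitions given in Golay and Kanevski (2015): the index is \(Q^{m-1} \sum_i n_i (n_i - 1) / (N (N - 1))\), where \(n_i\) is the number of points in cell \(i\), \(S_m\) is the slope of the log-index against \(\ln(1/\delta)\), and \(M_m = E - S_m / (m - 1)\). It is not the package's implementation, all variable names are illustrative, and the toy data are assumed to lie in \([0,1]\) already.

```r
# Sketch only: 2-Morisita index on regular grids, then M_2 = E - S_2 / (m - 1).
set.seed(1)
N <- 2000
u <- runif(N)
X <- cbind(u, u)          # points on a line (intrinsic dimension 1), E = 2
E <- ncol(X)
m <- 2
scaleQ <- 2:10            # values of l^(-1)

log_index <- sapply(scaleQ, function(q) {
  # Assign each point to a cell; pmin keeps a coordinate of 1 in the last cell.
  cell <- pmin(floor(X * q), q - 1)
  n_i <- as.numeric(table(apply(cell, 1, paste, collapse = "-")))
  Q <- q^E                # total number of cells
  log(Q^(m - 1) * sum(n_i * (n_i - 1)) / (N * (N - 1)))
})

delta <- sqrt(E) / scaleQ # cell diagonal on the unit hypercube
S_m <- unname(coef(lm(log_index ~ log(1 / delta)))[2])
M_m <- E - S_m / (m - 1)
round(M_m, 2)
```

For these toy data, which lie on a straight line, the estimate should come out close to 1.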

Details

  1. \(\ell\) is the edge length of the grid cells (or quadrats). Since the variables (and consequently the grid) are rescaled to the \([0,1]\) interval, \(\ell\) is equal to \(1\) for a grid consisting of only one cell.

  2. \(\ell^{-1}\) is the number of grid cells (or quadrats) along each axis of the Euclidean space in which the data points are embedded.

  3. \(\ell^{-1}\) is equal to \(Q^{(1/E)}\) where \(Q\) is the number of grid cells and \(E\) is the number of variables (or features).

  4. \(\ell^{-1}\) is directly related to \(\delta\) (see References).

  5. \(\delta\) is the diagonal length of the grid cells.
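
The relations in items 1 to 5 can be checked numerically. The sketch below assumes cubic cells in an \(E\)-dimensional unit hypercube, so that the cell diagonal is \(\delta = \ell \sqrt{E}\) by the Pythagorean theorem. The value of `E` and the variable names are illustrative choices.

```r
E <- 3                      # number of variables (illustrative choice)
scaleQ <- 1:5               # values of l^(-1), as in the default
l <- 1 / scaleQ             # edge length of a cell
Q <- scaleQ^E               # total number of cells
delta <- l * sqrt(E)        # diagonal length of a cell (cubic cells assumed)

data.frame(scaleQ, l, Q, delta)
all.equal(Q^(1 / E), as.numeric(scaleQ))  # TRUE: l^(-1) equals Q^(1/E)
```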

References

J. Golay and M. Kanevski (2015). A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48 (12):4070–4081.

J. Golay, M. Leuenberger and M. Kanevski (2017). Feature selection for regression problems based on the Morisita estimator of intrinsic dimension, Pattern Recognition 70:126–138.

J. Golay and M. Kanevski (2017). Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, Knowledge-Based Systems 135:125–134.

J. Golay, M. Leuenberger and M. Kanevski (2015). Morisita-based feature selection for regression problems. Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium).

Examples

# NOT RUN {
sim_dat <- SwissRoll(1000)

scaleQ <- 1:15 # It starts with a grid of 1^E cell (or quadrat).
               # It ends with a grid of 15^E cells (or quadrats).
mMI_ID <- MINDID(sim_dat, scaleQ[5:15])

print(paste("The ID estimate is equal to", round(mMI_ID[[1]][1, 3], 2)))
# }