IDmining (version 1.0.7)

RenDim: Renyi's Generalized Dimensions

Description

Estimates Rényi's generalized dimensions (or Rényi's dimensions of \(q\)th order). The result for \(q=2\) is the one mainly used as an estimate of the intrinsic dimension of data.

Usage

RenDim(X, scaleQ=1:5, qMin=2, qMax=2)

Arguments

X

An \(N \times E\) matrix, data.frame or data.table where \(N\) is the number of data points and \(E\) is the number of variables (or features). Each variable is rescaled to the \([0,1]\) interval by the function.

scaleQ

A vector of at least two values of \(\ell^{-1}\) chosen by the user (by default: scaleQ = 1:5).

qMin

The minimum value of \(q\) (by default: qMin = 2).

qMax

The maximum value of \(q\) (by default: qMax = 2).

Value

A list of two elements:

  1. a data.frame containing the value of Rényi's information of \(q\)th order (computed using the natural logarithm) for each value of \(\ln (\delta)\) and \(q\). The values of \(\ln (\delta)\) are given for variables rescaled to the \([0,1]\) interval.

  2. a data.frame containing the value of \(D_q\) for each value of \(q\).
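The relationship between the two returned elements can be illustrated with a minimal base R sketch. It is not the IDmining implementation: the function name `renyi_dim_sketch`, its arguments and its output names are hypothetical, and it assumes that \(D_q\) is taken as minus the least-squares slope of Rényi's information against \(\ln (\delta)\), with \(\delta = \ell \sqrt{E}\) and \(q \neq 1\).

```r
# Hypothetical helper (not part of IDmining): grid-based estimate of D_q, q != 1.
renyi_dim_sketch <- function(X, scaleQ = 1:5, q = 2) {
  X <- as.matrix(X)
  N <- nrow(X)
  E <- ncol(X)

  # Rescale each variable to the [0, 1] interval.
  X01 <- apply(X, 2, function(x) (x - min(x)) / (max(x) - min(x)))

  # Renyi's information of order q (natural logarithm) for each grid.
  info <- sapply(scaleQ, function(k) {
    idx <- floor(X01 * k) + 1            # cell index along each axis
    idx[idx > k] <- k                    # points equal to 1 go to the last cell
    cell <- apply(idx, 1, paste, collapse = "-")
    p <- as.numeric(table(cell)) / N     # fraction of points per occupied cell
    log(sum(p^q)) / (1 - q)
  })

  # delta is the cell diagonal: edge length (1 / scaleQ) times sqrt(E).
  ln_delta <- log(sqrt(E) / scaleQ)

  # Assumption: D_q is minus the slope of the information against ln(delta).
  fit <- lm(info ~ ln_delta)
  list(information = data.frame(ln_delta = ln_delta, q = q, info = info),
       Dq = data.frame(q = q, Dq = -unname(coef(fit)[2])))
}

# Usage: points spread uniformly over a square, and points lying on a line.
set.seed(1)
square <- matrix(runif(4000), ncol = 2)
t_seq  <- seq(0, 1, length.out = 1001)
diag_line <- cbind(t_seq, t_seq)

d_square <- renyi_dim_sketch(square)$Dq$Dq
d_line   <- renyi_dim_sketch(diag_line)$Dq$Dq
```

Under these assumptions, the estimate for the filled square should be close to 2 and the estimate for the line close to 1, which is the behaviour expected of an intrinsic dimension estimate.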

Details

  1. \(\ell\) is the edge length of the grid cells (or quadrats). Since the variables (and consequently the grid) are rescaled to the \([0,1]\) interval, \(\ell\) is equal to \(1\) for a grid consisting of only one cell.

  2. \(\ell^{-1}\) is the number of grid cells (or quadrats) along each axis of the Euclidean space in which the data points are embedded.

  3. \(\ell^{-1}\) is equal to \(Q^{(1/E)}\) where \(Q\) is the number of grid cells and \(E\) is the number of variables (or features).

  4. \(\ell^{-1}\) is directly related to \(\delta\) (see References).

  5. \(\delta\) is the diagonal length of the grid cells.
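The quantities listed above can be tabulated for a given choice of scaleQ. The following base R sketch is only an illustration: the variable names are hypothetical, \(E = 2\) is an arbitrary choice, and \(\delta = \ell \sqrt{E}\) is used because it is the diagonal of an \(E\)-dimensional hypercube of edge length \(\ell\).

```r
E      <- 2                 # number of variables (arbitrary choice for illustration)
scaleQ <- 1:5               # values of l^(-1): cells along each axis

l     <- 1 / scaleQ         # edge length of the grid cells on the [0, 1] scale
Q     <- scaleQ^E           # total number of grid cells
delta <- l * sqrt(E)        # diagonal length of the grid cells

grid_tbl <- data.frame(inv_l = scaleQ, l = l, Q = Q,
                       delta = delta, ln_delta = log(delta))
print(grid_tbl)
```

The first row corresponds to a grid consisting of a single cell (\(\ell = 1\)), and each following row to a finer grid with smaller \(\delta\).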

References

C. Traina Jr., A. J. M. Traina, L. Wu and C. Faloutsos (2000). Fast feature selection using fractal dimension. Proceedings of the 15th Brazilian Symposium on Databases (SBBD 2000), João Pessoa (Brazil).

E. P. M. De Sousa, C. Traina Jr., A. J. M. Traina, L. Wu and C. Faloutsos (2007). A fast and effective method to find correlations among attributes in databases, Data Mining and Knowledge Discovery 14(3):367-407.

J. Golay and M. Kanevski (2015). A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48(12):4070–4081.

H. Hentschel and I. Procaccia (1983). The infinite number of generalized dimensions of fractals and strange attractors, Physica D 8(3):435-444.

Examples

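# NOT RUN {
library(IDmining)  # provides SwissRoll() and RenDim()

sim_dat <- SwissRoll(1000)

scaleQ <- 1:15 # It starts with a grid of 1^E cell (or quadrat).
               # It ends with a grid of 15^E cells (or quadrats).

# Only the scales 5 to 15 are passed to RenDim() here.
qRI_ID <- RenDim(sim_dat[, c(1, 2)], scaleQ[5:15])

# Per the Value section, the second list element holds D_q for each q.
print(paste("The ID estimate is equal to", round(qRI_ID[[2]][1, 2], 2)))
# }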