The function dtm computes the "distance to measure function" on a set of points Grid, using the uniform empirical measure on a set of points X. Given a probability measure \(P\), the distance to measure function, for each \(y \in \mathbb{R}^d\), is defined by
$$
d_{m0}(y) = \left(\frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du\right)^{1/r},
$$
where \(G_y(t) = P( \Vert X-y \Vert \le t)\), and \(m0 \in (0,1)\) and \(r \in [1,\infty)\) are tuning parameters. As m0 increases, the DTM function becomes smoother, so m0 can be understood as a smoothing parameter. The parameter r has a smaller effect, but it also changes the DTM function. The DTM can be seen as a smoothed version of the distance function. See Details and References.
Given \(X=\{x_1, \dots, x_n\}\), the empirical version of the distance to measure is
$$
\hat d_{m0}(y) = \left(\frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i-y \Vert^{r}\right)^{1/r},
$$
where \(k= \lceil m0 \cdot n \rceil\) and \(N_k(y)\) is the set containing the \(k\) nearest neighbors of \(y\) among \(x_1, \ldots, x_n\).
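The empirical formula can be sketched directly in base R. The function name dtm_sketch is hypothetical, and this brute-force version is only an illustration of the unweighted formula above, not the package's implementation; it computes all \(n\) distances for every grid point, so it is only suitable for small inputs.

```r
# Brute-force empirical distance to measure (illustrative sketch only;
# dtm_sketch is a hypothetical name, not part of the package).
dtm_sketch <- function(X, Grid, m0, r = 2) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  k <- ceiling(m0 * n)  # number of nearest neighbours used
  apply(Grid, 1, function(y) {
    # Euclidean distances from y to every point of X
    d <- sqrt(rowSums(sweep(X, 2, y)^2))
    # r-th power mean of the k smallest distances
    mean(sort(d)[seq_len(k)]^r)^(1 / r)
  })
}

# Tiny check on the real line: X = {0, 1, 2, 5}, y = 0, m0 = 0.5 gives k = 2,
# so the value is sqrt((0^2 + 1^2) / 2).
X_small <- matrix(c(0, 1, 2, 5), ncol = 1)
Grid_small <- matrix(0, ncol = 1)
val <- dtm_sketch(X_small, Grid_small, m0 = 0.5)
```

In the check, the two nearest neighbours of \(y = 0\) are 0 and 1, so val equals \(\sqrt{1/2}\).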
dtm(X, Grid, m0, r = 2, weight = 1)
The function dtm returns a vector of length \(m\) (the number of points stored in Grid) containing the value of the distance to measure function evaluated at each point of Grid.
X: an \(n\) by \(d\) matrix of coordinates of points used to construct the uniform empirical measure for the distance to measure, where \(n\) is the number of points and \(d\) is the dimension.
Grid: an \(m\) by \(d\) matrix of coordinates of points where the distance to measure is computed, where \(m\) is the number of points in Grid and \(d\) is the dimension.
m0: a numeric variable for the smoothing parameter of the distance to measure. Roughly, m0 is the fraction of points of X that are considered when the distance to measure is computed for each point of Grid. The value of m0 should be in \((0,1)\).
r: a numeric variable for the tuning parameter of the distance to measure. The value of r should be in \([1,\infty)\), and the default value is 2.
weight: either a number, or a vector of length \(n\). If it is a number, then the same weight is applied to each point of X. If it is a vector, weight gives the weight of each point of X. The default value is 1.
Jisu Kim and Fabrizio Lecci
See (Chazal, Cohen-Steiner, and Merigot, 2011, Definition 3.2) and (Chazal, Massart, and Michel, 2015, Equation (2)) for a formal definition of the "distance to measure" function.
Chazal F, Cohen-Steiner D, Merigot Q (2011). "Geometric inference for probability measures." Foundations of Computational Mathematics 11.6, 733-751.
Chazal F, Massart P, Michel B (2015). "Rates of convergence for robust geometric inference."
Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.
kde, kernelDist, distFct
library(TDA)  # provides circleUnif() and dtm()
## Generate data from the unit circle
n <- 300
X <- circleUnif(n)
## Construct a grid of points over which we evaluate the function
by <- 0.065
Xseq <- seq(-1.6, 1.6, by = by)
Yseq <- seq(-1.7, 1.7, by = by)
Grid <- expand.grid(Xseq, Yseq)
## distance to measure
m0 <- 0.1
DTM <- dtm(X, Grid, m0)
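Because dtm returns a plain vector with one value per row of Grid, plotting it as a surface requires reshaping it to match the grid axes. The following is a minimal sketch of that reshaping step in base R; it uses a small stand-in grid and made-up values (the names xs, ys, G, vals and Z are hypothetical) so that it runs without the package.

```r
# Small stand-in grid; in the example above these would be Xseq and Yseq.
xs <- seq(-1, 1, by = 0.5)   # 5 values
ys <- seq(-2, 2, by = 2)     # 3 values
G <- expand.grid(xs, ys)

# Stand-in values, one per grid row (here simply x + 10 * y), playing the
# role of the vector returned by dtm().
vals <- G[, 1] + 10 * G[, 2]

# expand.grid varies its first argument fastest and matrix() fills by column,
# so Z[i, j] corresponds to the point (xs[i], ys[j]).
Z <- matrix(vals, nrow = length(xs), ncol = length(ys))

# Z can then be passed to base graphics, for example
# persp(x = xs, y = ys, z = Z) or image(xs, ys, Z).
```

With the example above, the same pattern would be matrix(DTM, nrow = length(Xseq), ncol = length(Yseq)).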