semimetric.NPFDA: Proximities between functional data (semi-metrics)

Description

Computes semi-metric distances between functional data, based on Ferraty, F. and Vieu, P. (2006).

Usage

semimetric.deriv(
  fdata1,
  fdata2 = fdata1,
  nderiv = 1,
  nknot = ifelse(floor(ncol(DATA1)/3) > floor((ncol(DATA1) - nderiv - 4)/2),
    floor((ncol(DATA1) - nderiv - 4)/2), floor(ncol(DATA1)/3)),
  ...
)
semimetric.fourier(
  fdata1,
  fdata2 = fdata1,
  nderiv = 0,
  nbasis = ifelse(floor(ncol(DATA1)/3) > floor((ncol(DATA1) - nderiv - 4)/2),
    floor((ncol(DATA1) - nderiv - 4)/2), floor(ncol(DATA1)/3)),
  period = NULL,
  ...
)
semimetric.hshift(fdata1, fdata2 = fdata1, t = 1:ncol(DATA1), ...)
semimetric.mplsr(fdata1, fdata2 = fdata1, q = 2, class1, ...)
semimetric.pca(fdata1, fdata2 = fdata1, q = 1, ...)

Value

Returns a matrix of proximities between two functional datasets.

Arguments

fdata1: Functional data 1 or curve 1. DATA1 is its data matrix, of dimension (n1 x m), where n1 is the number of curves and m is the number of points observed on each curve.
fdata2: Functional data 2 or curve 2. DATA2 is its data matrix, of dimension (n2 x m), where n2 is the number of curves and m is the number of points observed on each curve.
nderiv: Order of derivation, used in semimetric.deriv and semimetric.fourier.
nknot: semimetric.deriv: number of interior knots (needed to define the B-spline basis).
...: Further arguments passed to or from other methods.
nbasis: semimetric.fourier: size of the basis.
period: semimetric.fourier: period of the Fourier expansion.
t: semimetric.hshift: vector of discretization points (one can choose 1, 2, ..., nbt, where nbt is the number of points of the discretization).
q: If semimetric.pca: the number of retained principal components. If semimetric.mplsr: the number of retained factors.
class1: semimetric.mplsr: vector containing a categorical response, namely the class number of each curve stored in DATA1.

Details

semimetric.deriv: approximates the $L_2$ metric between the derivatives of the curves based on their B-spline representation. The order of the derivatives is set with the argument nderiv.
semimetric.fourier: approximates the $L_2$ metric between the curves (or their derivatives) based on their Fourier representation. The order of the derivatives is set with the argument nderiv.
semimetric.hshift: computes the distance between curves taking into account a horizontal shift effect.
semimetric.mplsr: computes the distance between curves based on the partial least squares method.
semimetric.pca: computes the distance between curves based on functional principal components analysis.

In the following semi-metric functions, each functional datum $X_i$ is approximated by $k_n$ elements of the Fourier, B-spline, PC or PLS basis, $\hat{X}_i=\sum_{k=1}^{k_n}\nu_{k,i}\xi_k$, where $\nu_{k,i}$ are the coefficients of the expansion of $X_i$ on the basis functions $\left\{\xi_k\right\}_{k=1}^{\infty}$.
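As a rough illustration of the expansion $\hat{X}_i=\sum_{k}\nu_{k,i}\xi_k$, the base-R sketch below builds an orthonormal Fourier basis on an equispaced grid covering one period and obtains the coefficients $\nu_{k,i}$ by least squares. The helper names fourier_basis and fourier_coef, the ordering of the basis functions and the normalization (orthonormal with respect to $\frac{1}{T}\int_T$) are assumptions made for this example; it is not the internal code of semimetric.fourier.

```r
# Hypothetical helper: orthonormal Fourier basis evaluated at argvals, ordered as
# 1, sqrt(2) sin(2 pi k t / T), sqrt(2) cos(2 pi k t / T), k = 1, 2, ...
fourier_basis <- function(argvals, nbasis, period = 1) {
  B <- matrix(1, nrow = length(argvals), ncol = nbasis)
  for (j in seq_len(nbasis)[-1]) {
    k <- j %/% 2
    w <- 2 * pi * k * argvals / period
    B[, j] <- if (j %% 2 == 0) sqrt(2) * sin(w) else sqrt(2) * cos(w)
  }
  B
}

# Hypothetical helper: least-squares coefficients nu[i, k] of each curve
# (rows of X, observed at argvals) on the first nbasis basis functions
fourier_coef <- function(X, argvals, nbasis, period = 1) {
  B <- fourier_basis(argvals, nbasis, period)
  t(qr.coef(qr(B), t(X)))
}

argvals <- seq(0, 1, length.out = 101)[-101]  # 100 equispaced points over one period
x1 <- 3 + 2 * sqrt(2) * sin(2 * pi * argvals) - sqrt(2) * cos(4 * pi * argvals)
x2 <- 1 + sqrt(2) * cos(2 * pi * argvals)
nu <- fourier_coef(rbind(x1, x2), argvals, nbasis = 5)
round(nu, 10)  # rows: (3, 2, 0, 0, -1) and (1, 0, 1, 0, 0)

# With an orthonormal basis, the distance between coefficient vectors matches
# the discretized (1/T) * integral form of the L2 distance (both are sqrt(10) here)
sqrt(sum((nu[1, ] - nu[2, ])^2))
sqrt(mean((x1 - x2)^2))
```

The exact match in the last two lines relies on the grid being uniform over a full period, where the sampled sines and cosines of low frequency stay exactly orthonormal.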
The distance between the $q$-order derivatives of two curves $X_1$ and $X_2$ is $$d_{2}^{(q)}\left(X_1,X_2\right)_{k_n}=\sqrt{\frac{1}{T}\int_{T}\left(X_{1}^{(q)}(t)-X_{2}^{(q)}(t)\right)^2 dt}$$ where $X_{i}^{(q)}(t)$ denotes the $q$-th derivative of $X_i$.
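The definition of $d_2^{(q)}$ can be checked numerically on curves observed on a fine grid. The sketch below is a crude stand-in, not the fda.usc implementation: instead of differentiating a B-spline or Fourier expansion analytically, it approximates the $q$-th derivative by repeated finite differences and the integral by the trapezoidal rule, dividing by the length of the interval. The function names trapz, num_deriv and d2q are hypothetical.

```r
# Trapezoidal rule for the integral of y over the points argvals
trapz <- function(argvals, y) {
  sum(diff(argvals) * (head(y, -1) + tail(y, -1)) / 2)
}

# q-th derivative by repeated finite differences; each step moves the
# evaluation points to the midpoints of the previous grid (one point fewer)
num_deriv <- function(argvals, y, q) {
  for (i in seq_len(q)) {
    y <- diff(y) / diff(argvals)
    argvals <- (head(argvals, -1) + tail(argvals, -1)) / 2
  }
  list(argvals = argvals, y = y)
}

# d_2^(q)(x1, x2): square root of the normalized integral of the squared
# difference between the q-th derivatives
d2q <- function(x1, x2, argvals, q = 0) {
  d1 <- num_deriv(argvals, x1, q)
  d2 <- num_deriv(argvals, x2, q)
  len <- diff(range(d1$argvals))
  sqrt(trapz(d1$argvals, (d1$y - d2$y)^2) / len)
}

argvals <- seq(0, 1, length.out = 1001)
x1 <- sin(2 * pi * argvals)
x2 <- rep(0, length(argvals))
d2q(x1, x2, argvals, q = 0)  # close to sqrt(1/2)
d2q(x1, x2, argvals, q = 1)  # close to pi * sqrt(2), the norm of 2 pi cos(2 pi t)
```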

semimetric.deriv and semimetric.fourier use a B-spline and a Fourier approximation of each curve, respectively, and the derivatives are computed directly by differentiating the analytic form of the expansion as many times as required; by default nderiv = 1 and nderiv = 0, respectively. semimetric.pca and semimetric.mplsr compute proximities between curves based on functional principal components analysis (FPCA) and functional partial least squares (FPLS), respectively. FPCA and FPLS project the functional data onto a reduced-dimensional space of q components. semimetric.mplsr requires a response: the class labels given in class1.
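The idea behind semimetric.pca can be sketched in base R: estimate principal component directions from the first sample, project both samples onto the first q directions, and take Euclidean distances between the resulting scores. This is a simplified sketch under stated assumptions (curves stored as rows of a matrix on a common equispaced grid, directions obtained with prcomp on the centered first sample, no quadrature weights); the function name pca_semimetric is hypothetical and its output need not match semimetric.pca exactly.

```r
# Hypothetical sketch of a PCA-based semi-metric between the rows of X1 and X2
pca_semimetric <- function(X1, X2 = X1, q = 1) {
  pc <- prcomp(X1, center = TRUE, scale. = FALSE)
  V <- pc$rotation[, seq_len(q), drop = FALSE]  # first q principal directions
  S1 <- X1 %*% V                                # scores of the first sample
  S2 <- X2 %*% V                                # scores of the second sample
  # squared Euclidean distances: ||s1 - s2||^2 = ||s1||^2 + ||s2||^2 - 2 s1's2
  D2 <- outer(rowSums(S1^2), rowSums(S2^2), "+") - 2 * S1 %*% t(S2)
  sqrt(pmax(D2, 0))  # pmax guards against tiny negative rounding errors
}

set.seed(1)
X <- matrix(rnorm(10 * 5), nrow = 10)
D1 <- pca_semimetric(X, q = 1)  # 10 x 10 semi-metric matrix
D5 <- pca_semimetric(X, q = 5)  # all components kept
# With all components kept, the rotation is orthogonal and the
# semi-metric reduces to the ordinary Euclidean distance
max(abs(D5 - as.matrix(dist(X))))
```

With q smaller than the number of grid points, two different curves can have the same scores and therefore zero distance, which is why this is only a semi-metric.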

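When the derivatives $\xi_k^{(q)}$ of the basis functions are mutually orthogonal (as for the Fourier and PC bases), the distance $d_2^{(q)}$ can be computed from the expansion coefficients: $$d_{2}^{(q)}\left(X_1,X_2\right)_{k_n}\approx\sqrt{\sum_{k=1}^{k_n}\left(\nu_{k,1}-\nu_{k,2}\right)^2\left\|\xi_k^{(q)}\right\|^2}$$
semimetric.hshift computes proximities between curves taking into account a horizontal shift effect: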

$$d_{hshift}\left(X_1,X_2\right)=\min_{h\in\left[-mh,mh\right]}d_2(X_1(t),X_2(t+h))$$ where $mh$ is the maximum horizontal shift allowed.
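A discretized version of $d_{hshift}$ can be sketched as follows, under simplifying assumptions that are not taken from fda.usc: both curves are observed on the same equispaced grid, shifts $h$ are restricted to whole grid steps between $-mh$ and $mh$ (the hypothetical argument max_shift, counted in grid steps), and $d_2$ is evaluated as a root mean square over the part of the grid where both $X_1(t)$ and $X_2(t+h)$ are observed. The function name hshift_dist is hypothetical.

```r
# Hypothetical sketch: minimum over integer shifts s of the RMS difference
# between x1[i] and x2[i + s], using only indices where both values exist
hshift_dist <- function(x1, x2, max_shift) {
  m <- length(x1)
  stopifnot(length(x2) == m, max_shift >= 0, max_shift < m)
  d <- vapply(-max_shift:max_shift, function(s) {
    i <- max(1, 1 - s):min(m, m - s)  # overlap: 1 <= i <= m and 1 <= i + s <= m
    sqrt(mean((x1[i] - x2[i + s])^2))
  }, numeric(1))
  min(d)
}

argvals <- seq(0, 1, length.out = 201)  # grid step 0.005
x1 <- exp(-((argvals - 0.4) / 0.05)^2)  # bump centred at 0.4
x2 <- exp(-((argvals - 0.5) / 0.05)^2)  # same bump centred at 0.5
hshift_dist(x1, x2, max_shift = 0)      # no shift allowed: clearly positive
hshift_dist(x1, x2, max_shift = 30)     # a shift of 20 steps (0.1) aligns the bumps
```

Trying every shift in the window is what makes this kind of semi-metric expensive, which is consistent with the remark in the examples that semimetric.hshift takes a long time to run.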

References

Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis. Springer Series in Statistics, New York.

Ferraty, F. and Vieu, P. (2006). NPFDA in practice. Freely available online at https://www.math.univ-toulouse.fr/~ferraty/SOFTWARES/NPFDA/

Examples


if (FALSE) {
library(fda.usc)
# INFERENCE PHONDAT
data(phoneme)
ind <- 1:100 # 2 groups
mlearn <- phoneme$learn[ind, ]
mtest <- phoneme$test[ind, ]
n <- nrow(mlearn[["data"]])
np <- ncol(mlearn[["data"]])
mdist1 <- semimetric.pca(mlearn, mtest)
mdist2 <- semimetric.pca(mlearn, mtest, q = 2)
mdist3 <- semimetric.deriv(mlearn, mtest, nderiv = 0)
mdist4 <- semimetric.fourier(mlearn, mtest, nderiv = 2, nbasis = 21)
# uses the hshift function
# mdist5 <- semimetric.hshift(mlearn, mtest) # slow: takes a long time to run
glearn <- phoneme$classlearn[ind]
# uses the mplsr function
mdist6 <- semimetric.mplsr(mlearn, mtest, 5, glearn)
mdist0 <- metric.lp(mlearn, mtest)
# hierarchical clustering needs a symmetric distance matrix within one sample,
# so compute the mplsr semi-metric among the learning curves
mdist7 <- semimetric.mplsr(mlearn, mlearn, 5, glearn)
b <- as.dist(mdist7)
c2 <- hclust(b)
plot(c2)
memb <- cutree(c2, k = 2)
table(memb, glearn)
}
